Multithreading problem with print_png and font object?

I originally posted this to the user’s list but got no response there. As I think there’s a bug in matplotlib here, I’m re-trying on the development list. Here’s what I sent to -users back on Mar 13:

I am using matplotlib’s object-oriented API to dynamically generate some graphs served by a web site. The web site is built with Django and I have generally followed the cookbook example I found here: http://www.scipy.org/Cookbook/Matplotlib/Django for serving matplotlib figures under Django. Specifically my code looks like this:

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

def generate_png(request, f, year, cid, pid, ic):

    # ...snipped code that generates the data to graph...

    fig = Figure()
    ax = fig.add_subplot(111)
    ax.set_title(fig_title)
    ax.set_xlabel("Score")
    ax.set_ylabel("Frequency")
    n, bins2, patches = ax.hist(vals, bins, facecolor='blue', edgecolor='blue')
    if x is not None:
        patches[x].set_facecolor('red')
        patches[x].set_edgecolor('red')
        fig.legend((patches[x],), ('%s (%d)' % (cname, cval),), 'lower left')
    canvas = FigureCanvas(fig)
    canvas.print_png(f)

    # ... snip remainder ...

This works fine, except when I run it under a multi-threaded web server (Apache with mod_wsgi in daemon mode with multi-threaded processes) it sometimes (not always) fails with this traceback:

File "/home/kmt/django/Django-1.1-alpha-1/django/core/handlers/base.py", line 86, in get_response
   response = callback(request, *callback_args, **callback_kwargs)
File "/home/kmt/software/web/xword/acpt/views.py", line 321, in get_png
   response = generate_png(request, f, year, cid, pid, ic)
File "/home/kmt/software/web/xword/acpt/views.py", line 308, in generate_png
   canvas.print_png(f)
File "/usr/lib/python2.5/site-packages/matplotlib/backends/backend_agg.py", line 305, in print_png
   FigureCanvasAgg.draw(self)
File "/usr/lib/python2.5/site-packages/matplotlib/backends/backend_agg.py", line 261, in draw
   self.figure.draw(self.renderer)
File "/usr/lib/python2.5/site-packages/matplotlib/figure.py", line 765, in draw
   legend.draw(renderer)
File "/usr/lib/python2.5/site-packages/matplotlib/legend.py", line 215, in draw
   t.draw(renderer)
File "/usr/lib/python2.5/site-packages/matplotlib/text.py", line 329, in draw
   ismath=self.is_math_text(line))
File "/usr/lib/python2.5/site-packages/matplotlib/backends/backend_agg.py", line 113, in draw_text
   self._renderer.draw_text_image(font.get_image(), int(x), int(y) + 1, angle, gc)
RuntimeError: You must call .set_text() before .get_image()

I’m not at all familiar with the internals (truly I’m barely familiar with the public APIs) of matplotlib but it appears from this exception that internally there’s a ‘font’ object being shared between threads here, such that one thread can come in and change the font state resulting in a subsequent error in a different thread that was also in the middle of using that font object? If I protect that block of code above with a thread lock so that only one thread is allowed in at a time, the problem goes away.
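For illustration, the lock workaround mentioned above could look roughly like the sketch below. It is not the code from the site: `render_lock`, `serialized`, and `render` are hypothetical names, and `render` is only a stand-in for the function that builds the Figure and calls print_png. It counts how many threads are inside it at once so the effect of the lock is visible without matplotlib installed.

```python
import functools
import threading

# One lock per process, shared by every request thread (hypothetical name).
render_lock = threading.Lock()

def serialized(func):
    """Allow only one thread at a time to run func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with render_lock:
            return func(*args, **kwargs)
    return wrapper

# Bookkeeping for the stand-in: how many threads are inside render() now,
# and the highest number ever observed.
active = 0
max_active = 0

@serialized
def render(tn):
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    # ... build the Figure and call canvas.print_png(f) here ...
    active -= 1
    return tn

def run_threads(count=8):
    threads = [threading.Thread(target=render, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active
```

Because every call goes through the same lock, `run_threads()` should report that at most one thread was ever inside `render` at a time. The lock is per process, which should be enough here on the assumption that the shared font state lives in process memory.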

For reference I’m using the latest matplotlib available in the Ubuntu Intrepid (8.10) repositories, which looks to be 0.98.3. In a brief scan I didn’t see anything relevant listed in the “what’s new” page for 0.98.4 (and can’t find a “what’s new in 0.98.5” on the matplotlib web site though that is what is listed as most recent?). Nor can I find anything that looks similar logged as a bug in the tracker.

Is there something (besides bracketing all access to the matplotlib code with a thread mutex) that I should be doing to make my use of matplotlib thread safe or does it seem like there’s a multi-threading bug in matplotlib here?

Apologies if this is the wrong list and there is in fact something I ought to be doing in my code (other than using a mutex) to prevent this – I haven’t been able to find anything. My impression from various doc I’ve read is the object-oriented API is supposed to be thread-safe. Is that true? If so should I file a ticket for this?

Thanks for any feedback,
Karen

Karen Tracey wrote:

I originally posted this to the user's list but got no response there. As I think there's a bug in matplotlib here, I'm re-trying on the development list. Here's what I sent to -users back on Mar 13:

Karen,

(I saw your question to the users group, and was hoping someone more knowledgeable than myself about threading problems would reply to you.)

http://www.mail-archive.com/matplotlib-users@lists.sourceforge.net/msg06152.html

I don't think there has been any progress on this question since it came up a year ago in the above thread. Evidently it is something that needs more attention. I would say that you have found a bug, and there are probably many more. There are global objects outside the pyplot interface; maybe they provide opportunities for threads to trip over each other. A reasonable goal would be to have matplotlib be thread-safe at least when the pure OO interface is used, and possibly with some additional restrictions such as "don't mess with rcParams when more than one thread might be running". I suspect this will take some time and effort to achieve--and I don't know who might be able to put in that time and effort. What might help would be a simple but brutal testing framework, independent of web servers etc., that would be likely to make mpl fail quickly unless it really is thread-safe. Maybe you can provide such a test program? And if you are willing to dive into mpl internals and come up with solutions for thread problems, that's even better.

For your immediate needs, I suggest that you use only the OO interface (which means that you won't import pylab or pyplot, and so won't use the figure() function), and/or use a mutex as you describe below. (I don't think that dumping "figure" in itself will help with the problem you have run into so far; you will still need the mutex. And if the mutex locks all your plotting, then you probably won't hit any other mpl threading problems.)

If you do keep using the figure() function (or any related aspect of the pyplot interface) then be sure to explicitly close each figure. Otherwise your process will chew up all available memory.

Eric

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

It might not be too much trouble to protect RcParams and its data, although I don't know how disruptive it would be to the mpl codebase and to users for rcParams to begin returning copies of things like font lists.

Darren

···

On Thu, Mar 26, 2009 at 8:10 PM, Eric Firing <efiring@…552…229…> wrote:

Karen Tracey wrote:

I originally posted this to the user’s list but got no response there. As I think there’s a bug in matplotlib here, I’m re-trying on the development list. Here’s what I sent to -users back on Mar 13:

Karen,

(I saw your question to the users group, and was hoping someone more knowledgeable than myself about threading problems would reply to you.)

http://www.mail-archive.com/matplotlib-users@lists.sourceforge.net/msg06152.html

I don’t think there has been any progress on this question since it came up a year ago in the above thread. Evidently it is something that needs more attention. I would say that you have found a bug, and there are probably many more. There are global objects outside the pyplot interface; maybe they provide opportunities for threads to trip over each other. A reasonable goal would be to have matplotlib be thread-safe at least when the pure OO interface is used, and possibly with some additional restrictions such as "don’t mess with rcParams when more than one thread might be running".

In addition to the rc params, I suspect some of the caching we do at the class level, e.g. in agg

class RendererAgg(RendererBase):
    texd = maxdict(50)  # a cache of tex image rasters
    _fontd = maxdict(50)

might be what is hurting here. Karen, could you try editing site-packages/matplotlib/backends/backend_agg.py and moving these two lines into the __init__ method so they are at the instance level rather than the class level? E.g.,

def __init__(self, width, height, dpi):
    if __debug__: verbose.report('RendererAgg.__init__', 'debug-annoying')
    RendererBase.__init__(self)
    self.texd = maxdict(50)  # a cache of tex image rasters
    self._fontd = maxdict(50)

and see if that helps. Also, make sure you have disabled usetex in matplotlibrc, since the use of the filesystem for caching the tex datafiles is probably not thread safe. My guess is that the font cache on the file system is not thread safe either, but this may only affect the first run of mpl after a clean install.
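As a minimal sketch of why moving the caches matters: a class attribute is a single object shared by every instance (and so by every thread that creates a renderer), while an attribute assigned in __init__ is private to each instance. The class names below are hypothetical, and plain dicts stand in for maxdict(50).

```python
class SharedCacheRenderer(object):
    # Class attribute: one dict object shared by all instances.
    cache = {}

class PrivateCacheRenderer(object):
    def __init__(self):
        # Instance attribute: each renderer gets its own dict.
        self.cache = {}

a, b = SharedCacheRenderer(), SharedCacheRenderer()
a.cache['font'] = 'state set through renderer a'
shared_leaks = 'font' in b.cache      # b sees the entry a stored

c, d = PrivateCacheRenderer(), PrivateCacheRenderer()
c.cache['font'] = 'state set through renderer c'
private_leaks = 'font' in d.cache     # d is unaffected
```

With the shared version, two threads that each build their own renderer can still end up handing each other the same cached font object, which would be consistent with the set_text()/get_image() error in the traceback.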

Also, as Eric suggests, a freestanding script which is thread enabled (preferably just using mpl and the standard threading library rather than django et al) which uses the agg backend that we could use for debugging would be very helpful.

JDH

···

On Fri, Mar 27, 2009 at 5:54 AM, Darren Dale <dsdale24@…55…149…> wrote:

It might not be too much trouble to protect RcParams and its data, although I don't know how disruptive it would be to the mpl codebase and to users for rcParams to begin returning copies of things like font lists.

Thanks for the responses everyone. More below inline…

In addition to the rc params, I suspect some of the caching we do at the class level, e.g. in agg

class RendererAgg(RendererBase):
    texd = maxdict(50)  # a cache of tex image rasters
    _fontd = maxdict(50)

might be what is hurting here. Karen, could you try editing site-packages/matplotlib/backends/backend_agg.py and moving these two lines into the __init__ method so they are at the instance level rather than the class level? E.g.,

def __init__(self, width, height, dpi):
    if __debug__: verbose.report('RendererAgg.__init__', 'debug-annoying')
    RendererBase.__init__(self)
    self.texd = maxdict(50)  # a cache of tex image rasters
    self._fontd = maxdict(50)

and see if that helps.

Yes, this change appears to fix the problem. I created a standalone script to recreate the problem. The script starts several threads. Each thread enters a loop generating a (different) png file. With the original matplotlib code, the threads seem unable to complete 50 iterations without the exception being raised. (I tried the script at least a dozen times, it never finished successfully with 50 iterations. Usually failed before 5, but once made it as high as 20, so I bumped it up to 50 and that seemed to ensure it would fail before finishing). With the above change to make the texd and _fontd caches per-instance, the script has finished successfully 4 times. Seems pretty convincing evidence that it fixes the problem.

Also, make sure you have disabled usetex in matplotlibrc, since the use of the filesystem for caching the tex datafiles is probably not thread safe. My guess is that the font cache on the file system is not thread safe either, but this may only affect the first run of mpl after a clean install.

Not sure what matplotlibrc is? I found a file /usr/share/matplotlib/matplotlib.conf that has usetex set to False. Is that it? If so I guess it’s been disabled all along, this is not something I have fiddled with.

Also, as Eric suggests, a freestanding script which is thread enabled (preferably just using mpl and the standard threading library rather than django et al) which uses the agg backend that we could use for debugging would be very helpful.

Understood, and I did make a standalone script (plus a set of pickle files for the data to graph) to recreate it. I can zip them up and send them somewhere, or open a ticket and attach it there, or whatever. Let me know what would be most convenient/helpful.

Thanks!

Karen

···

On Fri, Mar 27, 2009 at 9:12 AM, John Hunter <jdh2358@…149…> wrote:

On Fri, Mar 27, 2009 at 5:54 AM, Darren Dale <dsdale24@…55…149…> wrote:

Yes, this change appears to fix the problem. I created a standalone script to recreate the problem. The script starts several threads. Each thread enters a loop generating a (different) png file. With the original matplotlib code, the threads seem unable to complete 50 iterations without the exception being raised. (I tried the script at least a dozen times, it never finished successfully with 50 iterations. Usually failed before 5, but once made it as high as 20, so I bumped it up to 50 and that seemed to ensure it would fail before finishing). With the above change to make the texd and _fontd caches per-instance, the script has finished successfully 4 times. Seems pretty convincing evidence that it fixes the problem.

OK, I made this change to svn r7008. We probably do not gain that much by caching across renderers anyhow.

Also, make sure you have disabled usetex in matplotlibrc, since the use of the filesystem for caching the tex datafiles is probably not thread safe. My guess is that the font cache on the file system is not thread safe either, but this may only affect the first run of mpl after a clean install.

Not sure what matplotlibrc is? I found a file /usr/share/matplotlib/matplotlib.conf that has usetex set to False. Is that it? If so I guess it’s been disabled all along, this is not something I have fiddled with.

matplotlib.conf is an experimental config that is no longer in use (but it won’t hurt you to have it lying around). See http://matplotlib.sourceforge.net/users/customizing.html for details on the matplotlibrc file

Understood, and I did make a standalone script (plus a set of pickle files for the data to graph) to recreate it. I can zip them up and send them somewhere, or open a ticket and attach it there, or whatever. Let me know what would be most convenient/helpful.

No need for data files – just use np.random.rand to make up some random data for plotting. That keeps things simple and small, and you can post the script to the list, and hopefully we can find a home for it in our unit tests.

Thanks,
JDH

···

On Fri, Mar 27, 2009 at 1:23 PM, Karen Tracey <kmtracey@…149…> wrote:

With the above change to make the texd and _fontd caches per-instance, the script has finished successfully 4 times. Seems pretty convincing evidence that it fixes the problem.

OK, I made this change to svn r7008. We probably do not gain that much by caching across renderers anyhow.

Cool, thanks!

Also, make sure you have disabled usetex in matplotlibrc, since the use of the filesystem for caching the tex datafiles is probably not thread safe. My guess is that the font cache on the file system is not thread safe either, but this may only affect the first run of mpl after a clean install.

Not sure what matplotlibrc is? I found a file /usr/share/matplotlib/matplotlib.conf that has usetex set to False. Is that it? If so I guess it’s been disabled all along, this is not something I have fiddled with.

matplotlib.conf is an experimental config that is no longer in use (but it won’t hurt you to have it lying around). See http://matplotlib.sourceforge.net/users/customizing.html for details on the matplotlibrc file

Hmm, the Ubuntu packaging for matplotlib seems to put a copy of matplotlibrc only in /etc, where, I gather, it will not ever be used by matplotlib? I guess one is supposed to copy it to one’s home directory and do per-user customization there. For my case, where the code is running under Apache, I’d guess no matplotlibrc is being found so all defaults are being used.

No need for data files – just use np.random.rand to make up some random data for plotting. That keeps things simple and small, and you can post the script to the list, and hopefully we can find a home for it in our unit tests.

OK, I reduced it to a bare minimum, using random data. I removed stuff that was in my code like customizing colors, highlighting a particular bar, and even setting of the title and axis labels since it turns out none of that is necessary to trip the exception. I’ll append below a script that reliably reproduces the error on my machine. 8 threads at 50 iterations seems sure to hit it, fewer threads or fewer iterations may not. Hope this is helpful,

Karen

#! /usr/bin/python
import os
import threading
import traceback

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

thread_count = 8
max_iterations = 50
exception_raised = False

def png_thread(tn):
    png_fname = 'out%d.png' % tn
    vals = 100 + 15 * np.random.randn(10000)

    i = 0

    global exception_raised
    while not exception_raised and i < max_iterations:
        i += 1
        tb = None
        png_f = open(png_fname, 'wb')

        try:
            fig = Figure()
            ax = fig.add_subplot(111)

            ax.hist(vals, 50)
            FigureCanvas(fig).print_png(png_f)
        except Exception:
            # Capture the traceback while the exception is still being handled.
            tb = traceback.format_exc()

        png_f.close()
        if tb:
            print('png_thread %d failed on iteration %d:' % (tn, i))
            print(tb)
            exception_raised = True
        else:
            print('png_thread %d completed iteration %d.' % (tn, i))

    os.unlink(png_fname)

def main(tc):
    threads = []
    for i in range(tc):
        threads.append(threading.Thread(target=png_thread, args=(i+1,)))

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    if not exception_raised:
        msg = 'Success! %d threads completed %d iterations with no exceptions raised.'
    else:
        msg = 'Failed! Exception raised before %d threads completed %d iterations.'

    print(msg % (tc, max_iterations))

if __name__ == "__main__":
    main(thread_count)

···

On Fri, Mar 27, 2009 at 5:11 PM, John Hunter <jdh2358@…149…> wrote:

On Fri, Mar 27, 2009 at 1:23 PM, Karen Tracey <kmtracey@…149…> wrote:

Karen Tracey wrote:

Hmm, the Ubuntu packaging for matplotlib seems to put a copy of matplotlibrc only in /etc, where, I gather, it will not ever be used by matplotlib? I guess one is supposed to copy it to one's home directory and do per-user customization there. For my case, where the code is running under Apache, I'd guess no matplotlibrc is being found so all defaults are being used.

Karen,

That seems a little odd; matplotlib doesn't look in /etc by default. Although I run Ubuntu, I have never used the Ubuntu package, so I have not run into this.

The default matplotlibrc has been stripped down to a bare minimum: everything but the default backend selection is commented out.

You may have already discovered this, but in case you haven't, you can find out where the active matplotlibrc is being found by using matplotlib_fname():

In [1]: import matplotlib

In [2]: matplotlib.matplotlib_fname()
Out[2]: '/usr/local/lib/python2.5/site-packages/matplotlib/mpl-data/matplotlibrc'

The docstring explains the search order:

     Return the path to the rc file

     Search order:

      * current working dir
      * environ var MATPLOTLIBRC
      * HOME/.matplotlib/matplotlibrc
      * MATPLOTLIBDATA/matplotlibrc
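To make the search order above concrete, here is a rough sketch of a lookup that follows it. This is an illustration only, not matplotlib's actual implementation: `find_rc_file` and its parameters are hypothetical, and it assumes that MATPLOTLIBRC names a directory containing the file.

```python
import os

def find_rc_file(cwd, environ, home, datapath, fname='matplotlibrc'):
    """Return the first existing candidate in the documented order, or None."""
    candidates = [os.path.join(cwd, fname)]
    if 'MATPLOTLIBRC' in environ:
        # Assumption: the variable points at a directory, not at the file.
        candidates.append(os.path.join(environ['MATPLOTLIBRC'], fname))
    candidates.append(os.path.join(home, '.matplotlib', fname))
    candidates.append(os.path.join(datapath, fname))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Note that /etc does not appear among the candidates, so under this order a copy placed only there would never be picked up unless MATPLOTLIBRC pointed at it or the process happened to run with /etc as its working directory.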

Thank you for the test script. I have added it to the "unit" subdirectory of matplotlib, after adding a short docstring. JDH may want to modify or move it.

Eric