savefig Memory Leak

Hello,

I have written a small script that, I think, demonstrates a memory leak in savefig. A search of the mailing list shows a thread started by Ralf Gommers <ralf.gommers@...982...> about 2009-07-01 that seems to cover a very similar issue. I have appended the demonstration script at the end of this e-mail text.

The demonstration script sits in a relatively tight loop, creating figures and then saving them while monitoring memory usage. A plot of VmRSS vs. number of loop iterations, as generated on my system, is attached as "data.png" (you can create your own plots with the sample script). Although I have only tested this on Fedora 12, I expect that most Linux users should be able to run the script for themselves. Users should be able to comment out the "savefig" line and watch memory usage go from unbounded to (relatively) bounded.
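For readers unfamiliar with the monitoring approach, here is a minimal sketch of how a VmRSS value can be pulled out of the text of a /proc/{PID}/status file. The helper names `parse_status` and `rss_kb` and the sample text are illustrative assumptions, not part of the appended script, and the unit check assumes the kernel reports the value in kB.

```python
def parse_status(text):
    """Parse the text of a /proc/{PID}/status file into a dict."""
    status = {}
    for line in text.splitlines():
        if ':' not in line:
            continue  # skip anything that is not a "Key: value" line
        key, value = line.split(':', 1)
        status[key.strip()] = value.strip()
    return status


def rss_kb(status):
    """Return the VmRSS entry of a parsed status dict as an int (kB)."""
    value, unit = status['VmRSS'].split()
    if unit != 'kB':
        # Assumption: the kernel reports VmRSS in kB.
        raise ValueError('unexpected VmRSS unit: %r' % unit)
    return int(value)


# Hypothetical excerpt of a status file, for illustration only.
sample = "Name:\tpython\nVmPeak:\t  120000 kB\nVmRSS:\t   23456 kB\nThreads:\t1\n"
print(rss_kb(parse_status(sample)))
```

On a live Linux system the same helpers could be fed the contents of /proc/self/status instead of the sample string.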

Can anybody see a cause for this leak hidden in my code? Has anybody seen this issue and solved it? I would also appreciate it if other people would run this script and report their findings, so that there is some indication of how widely the problem occurs.
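To make reported findings easier to compare, one option is to reduce the recorded VmRSS samples to a single growth figure in kB per iteration. The following is only a sketch under stated assumptions: `growth_per_iteration` and `load_samples` are hypothetical helpers, the fit is an ordinary least-squares line, and the two sample series are synthetic illustrations rather than measurements.

```python
def growth_per_iteration(samples):
    """Least-squares slope of samples against their index."""
    n = len(samples)
    if n < 2:
        raise ValueError('need at least two samples')
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / float(n)
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def load_samples(path):
    """Read one integer RSS value per line, as written to data.txt."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


# Synthetic series for illustration only (not measured data).
leaky = [20000 + 50 * i for i in range(100)]  # grows steadily
flat = [20000] * 100                          # stays bounded
print(growth_per_iteration(leaky))
print(growth_per_iteration(flat))
```

After a run of the script, `growth_per_iteration(load_samples('data.txt'))` would give the corresponding figure for real data. A slope near zero suggests bounded usage, while a clearly positive slope is consistent with a leak. Because memory takes some iterations to settle, it may be fairer to drop the first samples before fitting.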

Sincerely,
Keegan Callin

data.png


************************************************************************

'''Script to demonstrate memory leakage in savefig call.

Requirements:
Tested in Fedora 12. It should work on other systems where
/proc/{PID}/status files exist and those files contain a 'VmRSS' entry
(this is how the script monitors its memory usage).

System Details on Original Test System:

[keegan@...3070... test]$ uname -a
Linux grizzly 2.6.32.9-70.fc12.x86_64 #1 SMP Wed Mar 3 04:40:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

[keegan@...3070... ~]$ gcc --version
gcc (GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[keegan@...3070... ~]$ cd ~/src/matplotlib-0.99.1.1
[keegan@...3070... matplotlib-0.99.1.1]$ rm -rf build
[keegan@...3070... matplotlib-0.99.1.1]$ python setup.py build &> out.log
[keegan@...3070... matplotlib-0.99.1.1]$ head -38 out.log

BUILDING MATPLOTLIB
             matplotlib: 0.99.1.1
                 python: 2.6.4 (r264:75706, Jan 20 2010, 12:34:05) [GCC
                         4.4.2 20091222 (Red Hat 4.4.2-20)]
               platform: linux2

REQUIRED DEPENDENCIES
                  numpy: 1.4.0
              freetype2: 9.22.3

OPTIONAL BACKEND DEPENDENCIES
                 libpng: 1.2.43
                Tkinter: no
                         * TKAgg requires Tkinter
               wxPython: no
                         * wxPython not found
                   Gtk+: no
                         * Building for Gtk+ requires pygtk; you must be able
                         * to "import gtk" in your build/install environment
        Mac OS X native: no
                     Qt: no
                    Qt4: no
                  Cairo: no

OPTIONAL DATE/TIMEZONE DEPENDENCIES
               datetime: present, version unknown
               dateutil: matplotlib will provide
                   pytz: 2010b

OPTIONAL USETEX DEPENDENCIES
                 dvipng: no
            ghostscript: 8.71
                  latex: no
                pdftops: 0.12.4

[Edit setup.cfg to suppress the above messages]

[keegan@...3070... matplotlib-0.99.1.1]$ bzip2 out.log
# out.log.bz2 is attached to the message containing this program.

[keegan@...3070... ~]$ python2.6
Python 2.6.4 (r264:75706, Jan 20 2010, 12:34:05)
[GCC 4.4.2 20091222 (Red Hat 4.4.2-20)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.__version__
'0.99.1.1'
'''
# Import standard python modules
import sys
import os
from ConfigParser import SafeConfigParser as ConfigParser
from cStringIO import StringIO

# import numpy
import numpy
from numpy import zeros

# Import matplotlib
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

def build_figure(a):
     '''Returns a new figure containing array a.'''

     # Create figure and setup graph
     fig = Figure()
     FigureCanvas(fig)
     ax = fig.add_subplot(1, 1, 1)
     ax.plot(a)

     return fig

_proc_status = '/proc/%d/status' % os.getpid()
def load_status():
     '''Returns a dict of process statistics from /proc/{PID}/status.'''
     status = {}

     with open(_proc_status) as f:
         for line in f:
             key, value = line.split(':', 1)
             key = key.strip()
             value = value.strip()
             status[key] = value

     return status

def main():
     data_file = 'data.txt'
     image_file = 'data.png'
     num_iterations = 1000

     with open(data_file, 'w') as f:
         # Tried running without matplotlib or numpy such that the
         # only thing happening in the process is the dumping of process
         # status information to `data_file` from the loop. Memory
         # usage reaches a bound _very_ quickly.
         status = load_status()
         rss, unit = status['VmRSS'].split()
         print >>f, rss

         print 'Executing', num_iterations, 'iterations.'
         a = zeros(10000)
         for i in xrange(0, num_iterations):
             # Random data is shifted into a numpy array.
             # With numpy and the process status dump enabled, memory
             # usage reaches a bound very quickly.
             a[0:-1] = a[1:]
             a[-1] = numpy.random.rand(1)[0]

             # When figures of the array are generated in each loop,
             # memory reaches a bound more slowly (~50 iterations) than
             # without matplotlib; nevertheless, memory usage still
             # appears to be bounded.
             fig = build_figure(a)

             # Savefig alone causes memory usage to become unbounded.
             # Memory usage increase seems to be linear with the number
             # of iterations.
             sink = StringIO()
             fig.savefig(sink, format='png', dpi=80, transparent=False, bbox_inches="tight", pad_inches=0.15)
             # This line below can be used to demonstrate that StringIO
             # does not leak without the savefig call.
             #sink.write(1000*'hello')
             sink.close()

             status = load_status()
             rss, unit = status['VmRSS'].split()
             print >>f, rss
             sys.stdout.write('#')
             sys.stdout.flush()

     # Load the recorded RSS samples from data_file and graph them.
     print
     print 'Graphing memory usage data from', data_file, 'to', image_file
     with open(data_file) as f:
         rss = [int(r) for r in f]

     with open(image_file, 'wb') as f:
         fig = build_figure(rss)
         fig.savefig(f, format='png', dpi=80, transparent=False, bbox_inches="tight", pad_inches=0.15)

     return 0

if __name__ == '__main__':
     sys.exit(main())