Possible memory leak while plotting in a loop

Hi all,

I'm using MPL to plot a 6-panel figure of 2D data using pcolormesh. I started this script last night and found it consuming over 3GB of memory when I got in this morning. After reading through old posts to this list, I came across this suggestion:

http://sourceforge.net/mailarchive/forum.php?thread_name=47558A63.8050307%40cornell.edu&forum_name=matplotlib-users

…to use gc.collect(). Unfortunately, this does not solve my problem: after implementing it, my script is already at nearly 800MB within 15 minutes of running. I am looping over several thousand data files.
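In case the overall structure matters, the loop looks roughly like this (heavily simplified -- random data stands in for the real files, and the array sizes and filenames are just placeholders, not what slicer.py actually uses):

import gc
import numpy
import matplotlib
matplotlib.use('Agg')                      # non-interactive Agg backend
import pylab

for i in range(5000):                      # one iteration per data file
    data = numpy.random.rand(6, 200, 200)  # placeholder for the six 2D slices read from a file
    fig = pylab.figure()
    for panel in range(6):
        ax = fig.add_subplot(2, 3, panel + 1)
        ax.pcolormesh(data[panel])
    fig.savefig('slice_%05d.png' % i)
    pylab.close(fig)                       # close the figure each pass
    gc.collect()                           # the suggestion from the thread above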

I am using MPL 0.92.2, numpy 1.0.1, and the Agg backend, running in non-interactive mode on a Fedora Linux box with Python 2.4. MPL and numpy were installed from source rather than from the distribution's packages.

Following the suggestions for tracking down memory leaks, I added a cbook.print_cycles(gc.garbage) call, which returns None at every iteration of the loop. However, comparing the original objects (from gc.get_objects()) with those present after the first loop iteration shows a lot of new objects.
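That comparison is essentially the following sketch -- a snapshot of object ids taken before the loop and checked again after the first pass:

import gc

gc.collect()
before = set(id(o) for o in gc.get_objects())   # snapshot before the loop

# ... one pass through the plotting loop goes here ...

gc.collect()
new = [o for o in gc.get_objects() if id(o) not in before]
print('%d new objects after one iteration' % len(new))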

I’ve included a copy of the script in an attachment (slicer.py).

Any help would be great, thanks!

– Aaron Botnick

slicer.py (3.38 KB)

Aaron Botnick wrote:

I'm using MPL to plot a 6-panel figure of 2D data using pcolormesh. I started this script last night and found it consuming over 3GB of memory when I got in this morning.

Another one... ;-)

After reading through old posts to this list, I came across this suggestion:

http://sourceforge.net/mailarchive/forum.php?thread_name=47558A63.8050307%40cornell.edu&forum_name=matplotlib-users

...to use gc.collect(). Unfortunately, this does not solve my problem: after implementing it, my script is already at nearly 800MB within 15 minutes of running. I am looping over several thousand data files.

I am using MPL 0.92.2,

I assume you mean 0.91.2?

numpy 1.0.1, and the Agg backend, running in non-interactive mode on a Fedora Linux box with Python 2.4. MPL and numpy were installed from source rather than from the distribution's packages.

Following the suggestions for tracking down memory leaks, I added a cbook.print_cycles(gc.garbage) call, which returns None at every iteration of the loop.

That function doesn't return anything -- it prints to the console. Is it producing any console output?

However, comparing the original objects (from gc.get_objects()) with those present after the first loop iteration shows a lot of new objects.

Do you know what type those objects are?
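If not, a quick tally by type usually narrows it down. Something along these lines (just a sketch, with count_types as a made-up helper name) applied to the list of new objects from your gc.get_objects() comparison should show which types dominate:

def count_types(objects):
    # Tally a list of objects by type name and print the most common first.
    counts = {}
    for obj in objects:
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    items = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    for name, count in items[:20]:
        print('%6d  %s' % (count, name))

Knowing whether they are mostly matplotlib artists, numpy arrays, or plain Python containers would narrow things down considerably.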

I've included a copy of the script in an attachment (slicer.py).

Would it be possible to get a revised version of the script that doesn't require the data files, i.e. just random data that closely resembles what's in the files, and reduced to the bare minimum that reproduces the leak? I could hack at it myself for a while, but if I can't reproduce the leak, I wouldn't know if it's because I removed something critical or the leak is somehow platform-dependent.

Cheers,
Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

Also, if you're adventurous, try installing valgrind and running:

valgrind --tool=massif python slicer.py

valgrind --tool=memcheck --leak-check=yes --log-file=slicer_leak python slicer.py

and send me the output (probably off-list, because they will be large files).

Cheers,
Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA