Images and memory management

I have a friend who’s having strange memory issues when opening and displaying images (using Matplotlib).

Here’s what he says:

···

#######################################

pylab seems really inefficient: Opening a few images and displaying them eats up tons of memory, and the memory doesn’t get freed.

Starting python, and run

In [5]: from glob import *;

In [6]: from pylab import *

python has 33MB of memory.

Run

In [7]: i = 1

In [8]: for imname in glob("*.JPG"):
   ...:     im = imread(imname)
   ...:     figure(i); i = i+1
   ...:     imshow(im)
   ...:

This opens 10 figures and displays them. Python takes 480MB of memory. This is crazy, for 10 images – 40+MB of memory for each!

In [14]: close("all")

In [15]: i = 1

In [16]: for imname in glob("*.JPG"):
   ....:     im = imread(imname)
   ....:     figure(i); i = i+1
   ....:     imshow(im)
   ....:
   ....:

This closes all figures and opens them again. Python takes up 837MB of memory.

and so on… Something is really wrong with memory management.

System info:

(using macosx backend)

2.4GHz MacBook Pro Intel Core 2 Duo

4GB 667MHz DDR2 SDRAM

In [5]: sys.version
Out[5]: '2.6.2 (r262:71600, Oct 1 2009, 16:44:23) \n[GCC 4.2.1 (Apple Inc. build 5646)]'

In [6]: numpy.__version__
Out[6]: '1.3.0'

In [7]: matplotlib.__version__
Out[7]: '0.99.1.1'

In [8]: scipy.__version__
Out[8]: '0.7.1'

In [9]:

If you assign each figure to a new number, it will keep all of those
figures around in memory (because pyplot thinks you may want to use them
again). The best route is to call close('all') or close(fig) with
each loop iteration.

40MB per image doesn’t sound way out of reason to me. How big are your
images?

Mike
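
A minimal sketch of the close-per-iteration advice above. It assumes the non-interactive Agg backend and uses small synthetic arrays in place of `imread()` results, so it runs without a display or image files. `show_and_close` is a hypothetical helper name, not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np


def show_and_close(images):
    """Draw each image in its own figure, then release that figure."""
    open_counts = []
    for im in images:
        fig = plt.figure()
        plt.imshow(im)
        fig.canvas.draw()   # force rendering (stand-in for showing or saving)
        plt.close(fig)      # drop pyplot's reference to the figure
        open_counts.append(len(plt.get_fignums()))
    return open_counts


# Synthetic stand-ins for the arrays imread() would return.
images = [np.zeros((40, 60, 3), dtype=np.uint8) for _ in range(3)]
counts = show_and_close(images)
```

Because each figure is closed before the next one is created, pyplot never tracks more than one figure at a time, which is the difference from the `figure(i); i = i+1` loop in the original post.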

···

Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Hi,

I think I’ve figured out what’s going on. It’s a combination of things:

  1. iPython is ignorant of the problems associated with caching massive data output

  2. iPython doesn’t seem to have a good way to clear data from memory reliably (https://bugs.launchpad.net/ipython/+bug/412350)

  3. matplotlib/Python seems to be insufficiently aggressive in its garbage collection (??)

  4. For obvious reasons, JPGs are much bigger when stored as arrays
    (though they still seem to take up more memory than they should)

Problems 1-3 seem problematic enough that they will get fixed eventually.
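
Points 1 and 2 come down to references: going by the subject of the linked bug report, values echoed at the prompt stay reachable as `_NN` and `Out[NN]`. The sketch below is only an illustration of that mechanism, not IPython's actual code. It assumes a plain dict as a stand-in for the output history, and `BigResult`, `out_cache`, and `is_alive` are hypothetical names.

```python
import gc
import weakref


class BigResult:
    """Stand-in for a large array echoed at the prompt."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


def is_alive(ref):
    """Report whether the object behind a weak reference still exists."""
    gc.collect()
    return ref() is not None


out_cache = {}                    # hypothetical stand-in for the Out[NN] history

result = BigResult(1_000_000)
probe = weakref.ref(result)       # lets us observe the object without owning it
out_cache[1] = result             # the history now holds a second reference
del result                        # the user's own name is gone...
alive_while_cached = is_alive(probe)    # ...but the cached reference keeps it alive

out_cache.clear()                 # dropping the cached reference frees the object
alive_after_clear = is_alive(probe)
```

As long as any cache entry points at the object, deleting the user-visible name does not release its memory.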

… but (4) is a design issue. Assuming it’s possible, it looks like there could be benefits to making an array-like wrapper around PIL image objects (perhaps similar in principle to a sparse matrix). Given PIL.ImageMath, ImagePath, etc., it seems actually fairly doable. Wouldn’t something like this be of major benefit to people using SciPy for anything image-related?

Leo

···

On Fri, Oct 2, 2009 at 7:45 AM, Michael Droettboom <mdroe@…86…> wrote:

If you assign each figure to a new number, it will keep all of those
figures around in memory (because pyplot thinks you may want to use them
again). The best route is to call close('all') or close(fig) with
each loop iteration.

40MB per image doesn’t sound way out of reason to me. How big are your
images?

Mike

On 10/01/2009 10:25 PM, Leo Trottier wrote:

I have a friend who’s having strange memory
issues when opening and displaying images (using Matplotlib).



---


For some reason, my earlier reply didn’t seem to make it to the mailing
list. Here it is in its entirety:

"""

If you assign each figure to a new number, it will keep all of those
figures around in memory (because pyplot thinks you may want to use them
again). The best route is to call close('all') or close(fig) with
each loop iteration.

40MB per image doesn’t sound way out of reason to me. How big are your
images?

"""

On 10/05/2009 03:46 AM, Leo Trottier wrote:

> Hi,
>
> I think I’ve figured out what’s going on. It’s a combination of things:
>
> 1. iPython is ignorant of the problems associated with caching massive
>    data output
>
> 2. iPython doesn’t seem to have a good way to clear data from memory
>    reliably (https://bugs.launchpad.net/ipython/+bug/412350)

iPython is designed for interactive use, and stores a lot of values so
they can be conveniently reused later. For long-running “batch”
scripts, you can use “regular” Python, or run the code in iPython such
that it isn’t displayed at the console (by using “import” or “%run”).
Bug 2) may help, but it looks like it would still require some manual
intervention to be useful. You’re still using a tool designed for
fine-grained interactive use (e.g. a pen) where one designed for
automation may be more appropriate (e.g. a laser printer) :)

> 3. matplotlib/Python seems to be insufficiently aggressive
>    in its garbage collection (??)

Is that still true after forcibly closing the figures on each loop
iteration as I suggested? Many hours have been spent squashing memory
leaks in matplotlib, and I am not aware of any in at least 0.98 and
later (other than some unavoidable small leaks in certain GUI
backends). Do you have a standalone example that illustrates this on a
recent version of matplotlib?

> 4. For obvious reasons, JPGs are much bigger when stored
>    as arrays
>    (though they still seem to take up more memory than they should)

It’s pretty easy to estimate the memory requirements for an image. If
the image is true-color (by this, I mean not color-mapped), you’ll need
4 bytes per pixel for the original image, plus a cached scaled copy
(the size of which depends on the output dpi), again with 4 bytes per
pixel. For color-mapped images, you’ll have 4-byte floats for each
pixel, 4-byte rgba for the color-mapped image, and again a cached
scaled copy of that. Not knowing the size of your input images, it’s
impossible to say if 40MB per image is way too big or not, but it’s not
unheard of by any means.
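
The estimate described above can be written out as arithmetic. This is a sketch of that rule of thumb only, not a measurement of matplotlib itself. The function name and the example dimensions (a 3648x2736 photo drawn at 800x600) are assumptions for illustration, since the thread never states the actual image size.

```python
def estimate_image_bytes(width, height, scaled_width, scaled_height,
                         color_mapped=False):
    """Rough memory estimate for one displayed image, following the reply above."""
    pixels = width * height
    scaled_pixels = scaled_width * scaled_height

    total = 4 * pixels               # 4 bytes per pixel (RGBA) for the full image
    if color_mapped:
        total += 4 * pixels          # plus a 4-byte float per pixel before mapping
    total += 4 * scaled_pixels       # cached copy scaled to the output size
    return total


# Hypothetical 10-megapixel photo drawn into an 800x600 pixel area.
photo_bytes = estimate_image_bytes(3648, 2736, 800, 600)
photo_mib = photo_bytes / 2**20      # roughly 40 MiB under these assumptions
```

Under these assumed dimensions a single true-color photo already lands near the 40MB per image reported in the original post.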

> Problems 1-3 seem problematic enough that they will get fixed
> eventually.
>
> … but (4) is a design issue. Assuming it’s possible, it looks like
> there could be benefits to making an array-like wrapper around PIL
> image objects (perhaps similar in principle to a sparse matrix). Given
> PIL.ImageMath, ImagePath, etc., it seems actually fairly doable.
> Wouldn’t something like this be of major benefit to people using SciPy
> for anything image-related?

Are you suggesting decompressing the JPEG on-the-fly with each redraw?
I’m not certain that would be fast enough for interactive use. It may
be worth experimenting with, but it would require a lot of changes to
how matplotlib works. It’s also very tricky to get right – I’m not
aware of any image processing applications that don’t ultimately store
a dense matrix of uncompressed image data in memory, except for
something like compressed OpenGL textures on a graphics card. PIL
certainly doesn’t retain the compressed JPEG in memory. So, I’m not
sure the cost/benefit tradeoff is right here – the problems it solves
can be solved much more easily without sacrificing speed in other
ways. That is, if the image data is simply too large, it can be scaled
before feeding it to imshow(). And generating multiple figures in
batch is not a problem if the figure is explicitly closed.
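
One simple way to scale the data before `imshow()` is to keep every n-th row and column. The sketch below only computes the stride and the resulting shape. The helper names and the 1000-pixel limit are assumptions, and plain striding does no smoothing, so a proper resize may look better.

```python
def downsample_step(height, width, max_side):
    """Smallest integer stride that brings the longer side to max_side or less."""
    if max_side <= 0:
        raise ValueError("max_side must be positive")
    longest = max(height, width)
    return max(1, -(-longest // max_side))   # integer ceiling division


def strided_shape(height, width, step):
    """Rows and columns left after slicing an array as im[::step, ::step]."""
    return (-(-height // step), -(-width // step))


# Hypothetical 3648x2736 photo (array shape is rows x columns).
step = downsample_step(2736, 3648, 1000)
small_shape = strided_shape(2736, 3648, step)
```

With a NumPy array `im` from `imread()`, the reduced image would be `im[::step, ::step]`, which here holds one sixteenth of the original pixels.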

Hope this helps. I would like to get to the bottom of any memory
leaks, so if you can provide a standalone script that leaks, despite
calling close(fig) in each iteration, please let me know.

Cheers,

Mike
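
A standalone leak check of the kind requested above could be structured like this sketch. It assumes Python 3's `tracemalloc` module, which did not exist in the Python 2.6 used in this thread. It only sees allocations made through Python's own allocators, so memory held by a GUI backend would need a process-level measurement instead. `traced_growth`, `leaky`, and `clean` are hypothetical names.

```python
import gc
import tracemalloc


def traced_growth(step, repeats=5):
    """Call step() repeatedly; return traced bytes above the baseline after each call."""
    gc.collect()
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    growth = []
    for _ in range(repeats):
        step()
        gc.collect()
        current, _ = tracemalloc.get_traced_memory()
        growth.append(current - baseline)
    tracemalloc.stop()
    return growth


kept = []


def leaky():
    """Keeps every buffer reachable, like figures that are never closed."""
    kept.append(bytearray(100_000))


def clean():
    """Allocates the same buffer but lets it go immediately."""
    bytearray(100_000)


leaky_growth = traced_growth(leaky)
clean_growth = traced_growth(clean)
```

In a real report, `step` would be one loop iteration (read an image, draw it, close the figure). Growth that keeps climbing across iterations points at a leak, while a flat curve does not.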

···


Hi Michael,

I suppose I'm a bit confused -- I thought that jpeglib, part of which
is implemented by PIL (??) could process compressed images without
decompressing them to a dense raster-image matrix
(libjpeg - Wikipedia).

That said, I tried to do some PIL things, and as soon as I converted
an image (or something similar) the memory taken up suggested that the
image was represented completely and uncompressed (memory was more or
less evenly split between virtual and real memory).

So, I guess what remains are the problems with iPython. My
MATLAB-loving friend has stuck his nose up because of the memory-leaky
interactive prompt, claiming that MATLAB has no such problems ...

Thanks for your help, in any case.

Leo

···

On Mon, Oct 5, 2009 at 5:47 AM, Michael Droettboom <mdroe@...86...> wrote:

For some reason, my earlier reply didn't seem to make it to the mailing
list. Here it is in its entirety:

"""
If you assign each figure to a new number, it will keep all of those figures
around in memory (because pyplot thinks you may want to use them again). The
best route is to call close('all') or close(fig) with each loop iteration.

40MB per image doesn't sound way out of reason to me. How big are your
images?
"""

________________________________


Leo Trottier wrote:

> Hi Michael,
>
> I suppose I'm a bit confused -- I thought that jpeglib, part of which
> is implemented by PIL (??)

Other way around. PIL uses jpeglib to read JPEG files.

> could process compressed images without
> decompressing them to a dense raster-image matrix
> (libjpeg - Wikipedia).

However, PIL does not make use of such capabilities. It just reads the data into uncompressed memory, just like it does with any other image format.

···

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco