Matplotlib eating memory

Hi,
  rendering some of my charts takes almost 50GB of RAM. I believe below is a stack trace
of one such situation, taken when it had already used 15GB. Would somebody comment on what
matplotlib is doing at that very moment? Why the recursion?

  The charts have 262422 data points in a 2D scatter plot, and each point is assigned
its own color. The points come in batches, so there are only 153 distinct colors, but
nevertheless I assigned a color value to each data point. There are also 153 legend items
(one color won't be used).

^CTraceback (most recent call last):
...
    _figure.savefig(filename, dpi=100)
  File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
    FigureCanvasAgg.draw(self)
  File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
    self.figure.draw(self.renderer)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
    func(*args)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
    a.draw(renderer)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
    return Collection.draw(self, renderer)
  File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
    offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
  File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
    return self._edgecolors
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
    gc.collect()
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
    gc.collect()
KeyboardInterrupt

^C

Any clues as to what the code is doing? I use mpl-1.3.0.
Thank you,
Martin

Can you provide a complete, standalone example that reproduces the problem? Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with them.
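
A minimal sketch of that pattern, assuming the pyplot interface and the Agg backend. The helper name save_chart and its arguments are illustrative and do not come from the original post:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; select it before importing pyplot
import matplotlib.pyplot as plt


def save_chart(target, x, y):
    """Draw one scatter chart, save it, and always close the figure.

    `target` may be a file name or a writable binary file-like object.
    """
    fig, ax = plt.subplots()
    try:
        ax.scatter(x, y, s=1)
        fig.savefig(target, dpi=100)
    finally:
        # pyplot keeps a reference to every figure it created until it is
        # closed, so close it even when savefig() raises.
        plt.close(fig)
```

Calling plt.get_fignums() after the function returns should give an empty list, which is a quick way to check that no figures were left open.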

Mike


--
                    _

\/|o _|_ _. _ | | \.__ __|__|_|_ _ _ ._ _
>>(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | |

http://www.droettboom.com

Unfortunately, that stacktrace isn't very useful. There is no recursion
there, but rather the perfectly normal drawing of the figure object that
has a child axes, which has child collections which have child artist
objects.
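
The repeated draw_wrapper frames can be mimicked with a toy model. This is only a sketch with made-up classes, not matplotlib's own code: a decorator wraps every draw method, so its inner function shows up once per level of the artist tree even though nothing calls itself recursively.

```python
def wrap_draw(draw):
    """Toy decorator: stands in for a wrapper applied to every draw method."""
    def draw_wrapper(artist, log):
        log.append(type(artist).__name__)  # record which artist is being drawn
        return draw(artist, log)
    return draw_wrapper


class Artist(object):
    def __init__(self, children=()):
        self.children = list(children)

    @wrap_draw
    def draw(self, log):
        # A parent simply asks each of its children to draw itself.
        for child in self.children:
            child.draw(log)


class Figure(Artist):
    pass


class Axes(Artist):
    pass


class Collection(Artist):
    pass


log = []
Figure([Axes([Collection(), Collection()])]).draw(log)
print(log)
```

Interrupting such a program inside the innermost draw would show draw_wrapper three times in the traceback, once each for the figure, the axes and the collection.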

Without the accompanying code, it would be difficult to determine where the
memory hog is.

Ben Root

Could there be places where gc.collect() could be introduced? Are there places where matplotlib
could del() unnecessary objects right away? I think the problem is with huge lists or Python
dicts. I saved 10GB of RAM when I converted one Python dict to a bsddb3 file that takes just
10MB on disk. I speculate that matplotlib keeps the data in some huge list, or more likely
a dict, in that code, and that it is the same issue.

Are you sure you cannot see where the problem is? It happens (is visible) only with a huge number of
dots, of course.

Thanks,
Martin

I am not going to claim that matplotlib is the most lean graphing library
out there, and we already do know where we can make continued improvements,
but the symptom you are describing (50 GB for a couple hundred thousand
scatter points) is just unheard of for matplotlib. Without a simple,
concise, complete code example to demonstrate your problem, we can only
hazard guesses. For all I know, you might be "appending" to numpy arrays in
a loop prior to plotting, which would eat up a significant amount of memory
without it being the fault of matplotlib.
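
To illustrate the kind of pattern meant here (a sketch only; nothing in this thread shows the poster's upstream code, and both function names are made up): np.append never grows an array in place. Each call allocates a new array and copies the old contents into it, so a loop of appends keeps reallocating and copying.

```python
import numpy as np


def collect_by_append(n):
    """Anti-pattern: every np.append call allocates and copies a new array."""
    values = np.empty(0)
    for i in range(n):
        values = np.append(values, float(i))
    return values


def collect_preallocated(n):
    """Allocate the final array once, then fill it in place."""
    values = np.empty(n)
    for i in range(n):
        values[i] = float(i)
    return values
```

Both functions return the same array. The second one performs a single allocation, which is why preallocating (or collecting into a Python list and converting once at the end) is usually preferred.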

As far as I am aware, we don't do very large dictionaries, so I am doubtful
that is the issue either.

As a side note, I have found that situations where del()
significantly improved memory usage were typically situations where I was
"doing it wrong" in the first place and a simple refactor of the code
improved memory and (sometimes) speed, with an added benefit of improved
readability. I have even seen situations where calling del() in the wrong
places (say, for a list created at the beginning of the loop) actually hurt
performance because python couldn't recycle that chunk of memory.
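
As a small illustration of why del() alone often changes little (a sketch with made-up names, relying on CPython's reference counting): del only unbinds a name. The object itself is released once nothing else refers to it, for example once it is also gone from any list that collected it.

```python
import gc
import weakref


class Payload(object):
    """Stand-in for a large object such as a plotted collection."""


obj = Payload()
registry = [obj]          # a second reference, e.g. a list of plotted series
probe = weakref.ref(obj)  # lets us observe the object without keeping it alive

del obj                   # removes the name only
gc.collect()
still_alive = probe() is not None   # the list still holds the object

del registry[:]           # drop the last reference
gc.collect()
freed = probe() is None
```

Here still_alive ends up True and freed ends up True: the first del() released nothing, and the memory became reclaimable only when the list let go of the object.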

Give us a code example that reproduces your problem, and then we can start
doing some more serious debugging.

Ben Root

Matplotlib generally keeps data in Numpy arrays, not lists or dictionaries (though given that matplotlib predates Numpy, there are some corner cases we've found recently where arrays are converted to lists and back unintentionally).

As Ben said, the traceback looks quite normal -- and it doesn't show what any of the values are. If you can provide us with a script that reproduces this, that's the only way we can really plug in and see what might be going wrong. It doesn't have to have anything proprietary, such as your data. You can even start with one of the existing examples, if that helps.

Mike

Michael Droettboom wrote:

Can you provide a complete, standalone example that reproduces the
problem. Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with
them.

Thanks, I learned that when matplotlib-1.3.0 started spitting a warning message at me some weeks
ago. Yes, I do call _figure.clear() and pylab.clf(), but only after savefig() returns, which
is not the case here. I also use gc.collect() a lot throughout the code, especially before and after
I draw every figure. That is not enough here.

import gc
import sys
from itertools import izip, imap, ifilter
from textwrap import wrap

import matplotlib
# Force matplotlib not to use any X-windows backend (must run before pylab is imported).
matplotlib.use('Agg')
import pylab

F = pylab.gcf()

# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
DefaultSize = tuple(F.get_size_inches())

def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data, xlabel_data, ylabel_data, legends, legend_loc='upper right', legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None, xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, legend_fontsize=8, dpi=100, tight_layout=False, legend_inside=False, objsize=0.1):
    # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)

    if len(mydata_x) != len(mydata_y):
        raise ValueError("%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y)))

    if colors and len(mydata_x) != len(colors):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

    if colors and legends and len(colors) != len(legends):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

    if mydata_x and mydata_y and filename:
        if legends:
            if not legend_ncol:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
            else:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
        else:
            _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

        set_my_pylab_defaults()
        pylab.clf()
        _figure = pylab.figure()
        _figure.clear()
        _figure.set_tight_layout(True)
        gc.collect()

        if legends:
            # do not crash on too tall figures
            if 8.4 * _subfigs < 200:
                _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
            else:
                # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                # ValueError: width and height must each be below 32768
                _figure.set_size_inches(11.2, 200)
                sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
            _ax1 = _figure.add_subplot(_ax1_num)
            _ax2 = _figure.add_subplot(_ax2_num)
        else:
            _figure.set_size_inches(11.2, 8.4 * 2)
            _ax1 = _figure.gca()
        if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

        _series = []
        #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

        if legends:
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                _series.append(_my_PathCollection)

            _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand', fontsize=legend_fontsize)
            _ax2.set_frame_on(False)
            _ax2.tick_params(bottom='off', left='off', right='off', top='off')
            pylab.setp(_ax2.get_yticklabels(), visible=False)
            pylab.setp(_ax2.get_xticklabels(), visible=False)
        else:
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^') # keeps eating memory in:
                #
                # draw_hist2d_plot(filename, _data_xrow, _data_yrow, _my_colors, _title, _xlabel, _ylabel, [], xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, dpi=100)
                # File "/blah.py", line 14080, in draw_hist2d_plot
                # _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^')
                # File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 6247, in scatter
                # self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
                # File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 1685, in _process_unit_info
                # self.xaxis.update_units(xdata)
                # File "/usr/lib64/python2.7/site-packages/matplotlib/axis.py", line 1332, in update_units
                # converter = munits.registry.get_converter(data)

            # pylab.subplots_adjust(left = (5/25.4)/_figure.xsize, bottom = (4/25.4)/_figure.ysize, right = 1 - (1/25.4)/_figure.xsize, top = 1 - (3/25.4)/_figure.ysize)

        _ax1.set_xlabel(xlabel_data, fontsize=fontsize)
        _ax1.set_ylabel(ylabel_data, fontsize=fontsize)
        _ax1.set_xmargin(0.05)
        _ax1.set_ymargin(0.05)
        _ax1.set_autoscale_on(False)

        set_limits(_ax1, xmin, xmax, ymin, ymax)

        if fontsize == 10:
            _ax1.set_title('\n'.join(wrap(title_data, 100)), fontsize=fontsize+2)
        elif fontsize == 12:
            _ax1.set_title('\n'.join(wrap(title_data, 90)), fontsize=fontsize+2)
        else:
            _ax1.set_title('\n'.join(wrap(title_data, 100)), fontsize=fontsize+2)

        if legends:
            _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
            del(_my_PathCollection)
            del(_ax2)
        else:
            _figure.savefig(filename, dpi=100)

        del(_series)
        del(_ax1)
        _figure.clear()
        del(_figure)
        pylab.clf()
        pylab.close()
        # pylab.rcdefaults()

        gc.collect()

That's the whole function. I used to suspect _ax1.scatter() in the past, but probably
only because I hit the memory problems earlier. That is worked around now by using an
on-disk bsddb3 or gdbm file somewhere upstream. This particular function is nevertheless
fed with just a huge list of numbers, and that is not the issue in itself.

I would be glad if I could tell matplotlib: Here you have 100 colors, use them for all data
as you wish, just spread them evenly over the whole dataset so that first 1/100th of the data
gets the first color, second 1/100th of the data gets the second color, and so on. Optionally,
if you would like to say: use the 100 colors in cycles for all data points, just loop through
the colors as long as you need some. In both scenarios, I could have avoided the two for loops
in the above code and necessity to generate those objects. Same for legend stuff.
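
A minimal sketch of the two color-assignment schemes described above, written in plain Python. The names spread_colors and cycle_colors are hypothetical helpers, not matplotlib functions. The assumption is that the resulting list, which has one color per data point, could then be handed to a single scatter() call instead of one call per point.

```python
from itertools import cycle, islice


def spread_colors(n_points, colors):
    """Give the first 1/len(colors) of the points the first color, and so on."""
    n_colors = len(colors)
    # Integer arithmetic maps point index i to a block number in [0, n_colors - 1].
    return [colors[i * n_colors // n_points] for i in range(n_points)]


def cycle_colors(n_points, colors):
    """Loop through the colors for as long as there are points."""
    return list(islice(cycle(colors), n_points))


palette = ["red", "green", "blue"]
print(spread_colors(6, palette))  # ['red', 'red', 'green', 'green', 'blue', 'blue']
print(cycle_colors(7, palette))   # ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']
```

Both helpers only build a list of color names; whether that removes the memory problem would still have to be measured.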

Martin

Mike


_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

--
Martin Mokrejs, Ph.D.
Bioinformatics
Donovalska 1658
149 00 Prague
Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs

Benjamin Root wrote:

    > Unfortunately, that stacktrace isn't very useful. There is no recursion there, but rather the perfectly normal drawing of the figure object that has a child axes, which has child collections which have child artist objects.
    >
    > Without the accompanying code, it would be difficult to determine where the memory hog is.

    Could there be places where gc.collect() could be introduced? Are there places where matplotlib
    could del() unnecessary objects right away? I think the problem is with huge lists or Python
    dicts. I saved 10GB of RAM when I converted one Python dict to a bsddb3 file taking just
    10MB on disk. I speculate that matplotlib keeps the data in some huge list, or more likely
    a dict, in that code, and that it is the same issue.

    Are you sure you cannot see where the problem is? It happens (is visible) only with a huge number of
    dots, of course.

I am not going to claim that matplotlib is the most lean graphing library out there, and we already know where we can make continued improvements, but the symptom you are describing (50 GB for a couple hundred thousand scatter points) is just unheard of for matplotlib. Without a simple, concise, complete code example to demonstrate your problem, we can only hazard guesses. For all I know, you might be "appending" to numpy arrays in a loop prior to plotting, which would eat up a significant amount of memory without it being the fault of matplotlib.

As far as I am aware, we don't do very large dictionaries, so I am doubtful that is the issue either.

As a side note, I have typically found that situations where del() significantly improved memory usage were situations where I was "doing it wrong" in the first place, and a simple refactor of the code improved memory and (sometimes) speed, with the added benefit of improved readability. I have even seen situations where calling del() in the wrong places (say, for a list created at the beginning of the loop) actually hurt performance because Python couldn't recycle that chunk of memory.

Give us a code example that reproduces your problem, and then we can start doing some more serious debugging.

It should be in your inboxes now. I have to rush to a meeting, so there is no example call
of that function with sample data, but I hope I have already written enough, as I gave the number of dots and legends
to be drawn. Yes, the number of columns is determined elsewhere; put 2 as the value of that variable.

Surely one can rewrite the code, but ideally I would also propose that matplotlib be improved so that
others with a similarly bad coding style do not hit the issue. :wink:

Thank you for your time,
Martin

···

On Thu, Oct 10, 2013 at 9:47 AM, Martin MOKREJŠ <mmokrejs@…287…> wrote:
    > On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ <mmokrejs@…287…> wrote:

Michael Droettboom wrote:

Matplotlib generally keeps data in Numpy arrays, not lists or
dictionaries (though given that matplotlib predates Numpy, there are
some corner cases we've found recently where arrays are converted to
lists and back unintentionally).

Just a brief note: I don't use Numpy in my own code, so consider that
while replicating my use case. :wink: The code is merely what I think Tony Yu
or Chao Yue, or somebody else (sorry, I don't remember now), proposed to
me on this list in the past. I am really writing this off the top of my head,
so maybe I remember rubbish. :wink:

Martin

Thanks. This is much more helpful.

What we need, however, is a "self contained, standalone example". The code below calls functions that are not present. See http://sscce.org/ for why this is so important. Again, I would have to guess what those functions do -- it may be relevant, it may not. If I have something that I can *just run* then I can use various introspection tools to see what is going wrong.

Mike
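
As a rough illustration of what the data-generating half of such a standalone example might look like, here is a sketch that builds reproducible stand-in data with the sizes from the original report (262422 points in 153 color batches) and measures peak memory around a function call. make_sample_data and peak_memory_of are hypothetical helper names. tracemalloc is in the standard library only from Python 3.4 on, so it would not be available on the Python 2.7 setup used in this thread.

```python
import random
import tracemalloc


def make_sample_data(n_points, n_colors, seed=0):
    """Generate reproducible stand-in data: x, y and one color index per point."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_points)]
    ys = [rng.random() for _ in range(n_points)]
    # Points arrive in contiguous batches, one batch per color index.
    color_ids = [i * n_colors // n_points for i in range(n_points)]
    return xs, ys, color_ids


def peak_memory_of(func, *args):
    """Run func(*args) and return (result, peak number of traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


(xs, ys, color_ids), peak = peak_memory_of(make_sample_data, 262422, 153)
print(len(xs), len(set(color_ids)), peak)
```

The plotting function under suspicion could be wrapped in peak_memory_of() the same way, which would give a number to compare before and after each change.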

···

On 10/10/2013 10:12 AM, Martin MOKREJŠ wrote:

Michael Droettboom wrote:

Can you provide a complete, standalone example that reproduces the
problem? Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with
them.

Thanks, I learned that when matplotlib-1.3.0 spat a warning message at me some weeks
ago. Yes, I do call _figure.clear() and pylab.clf(), but only after savefig() returns, which
is not the case here. I also use gc.collect() a lot throughout the code, especially before and after
I draw every figure. That is not enough here.

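import gc
import sys
from itertools import izip, imap, ifilter
from textwrap import wrap  # assumed source of the wrap() used for the title below

import matplotlib
# Force matplotlib not to use any X-windows backend; this must run before pylab is imported.
matplotlib.use('Agg')
import pylab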

F = pylab.gcf()

# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
DefaultSize = tuple(F.get_size_inches())

def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data, xlabel_data, ylabel_data, legends, legend_loc='upper right', legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None, xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, legend_fontsize=8, dpi=100, tight_layout=False, legend_inside=False, objsize=0.1):
     # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)

     if len(mydata_x) != len(mydata_y):
         raise ValueError, "%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y))

     if colors and len(mydata_x) != len(colors):
         sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

     if colors and legends and len(colors) != len(legends):
         sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

     if mydata_x and mydata_y and filename:
         if legends:
             if not legend_ncol:
                 _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
             else:
                 _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
         else:
             _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

         set_my_pylab_defaults()
         pylab.clf()
         _figure = pylab.figure()
         _figure.clear()
         _figure.set_tight_layout(True)
         gc.collect()

         if legends:
             # do not crash on too tall figures
             if 8.4 * _subfigs < 200:
                 _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
             else:
                 # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                 # ValueError: width and height must each be below 32768
                 _figure.set_size_inches(11.2, 200)
                 sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
             if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
             _ax1 = _figure.add_subplot(_ax1_num)
             _ax2 = _figure.add_subplot(_ax2_num)
         else:
             _figure.set_size_inches(11.2, 8.4 * 2)
             _ax1 = _figure.gca()
         if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

         _series = []
         #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
         for _x, _y, _c in izip(mydata_x, mydata_y, colors):
             # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
             _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
             _series.append(_my_PathCollection)

         if legends:
             #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
             for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                 _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                 _series.append(_my_PathCollection)

             _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand', fontsize=legend_fontsize)
             _ax2.set_frame_on(False)
             _ax2.tick_params(bottom='off', left='off', right='off', top='off')
             pylab.setp(_ax2.get_yticklabels(), visible=False)
             pylab.setp(_ax2.get_xticklabels(), visible=False)
         else:
             for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                 _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^') # keeps eating memory in:
                 #
                 # draw_hist2d_plot(filename, _data_xrow, _data_yrow, _my_colors, _title, _xlabel, _ylabel, [], xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, dpi=100)
                 # File "/blah.py", line 14080, in draw_hist2d_plot
                 # _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^')
                 # File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 6247, in scatter
                 # self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
                 # File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 1685, in _process_unit_info
                 # self.xaxis.update_units(xdata)
                 # File "/usr/lib64/python2.7/site-packages/matplotlib/axis.py", line 1332, in update_units
                 # converter = munits.registry.get_converter(data)

             # pylab.subplots_adjust(left = (5/25.4)/_figure.xsize, bottom = (4/25.4)/_figure.ysize, right = 1 - (1/25.4)/_figure.xsize, top = 1 - (3/25.4)/_figure.ysize)

         _ax1.set_xlabel(xlabel_data, fontsize=fontsize)
         _ax1.set_ylabel(ylabel_data, fontsize=fontsize)
         _ax1.set_xmargin(0.05)
         _ax1.set_ymargin(0.05)
         _ax1.set_autoscale_on(False)

         set_limits(_ax1, xmin, xmax, ymin, ymax)

         if fontsize == 10:
             _ax1.set_title('\n'.join(wrap(title_data, 100)), fontsize=fontsize+2)
         elif fontsize == 12:
             _ax1.set_title('\n'.join(wrap(title_data, 90)), fontsize=fontsize+2)
         else:
             _ax1.set_title('\n'.join(wrap(title_data, 100)), fontsize=fontsize+2)

         if legends:
             _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
             del(_my_PathCollection)
             del(_ax2)
         else:
             _figure.savefig(filename, dpi=100)

         del(_series)
         del(_ax1)
         _figure.clear()
         del(_figure)
         pylab.clf()
         pylab.close()
         # pylab.rcdefaults()

         gc.collect()

--
                    _

\/|o _|_ _. _ | | \.__ __|__|_|_ _ _ ._ _
>>(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | |

http://www.droettboom.com

That being said, I do see a number of anti-patterns here that could be
significant. For example:

        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

Could be more concisely written as:

        _series = [_ax1.scatter(_x, _y, color=_c, s=objsize) for _x, _y, _c in izip(mydata_x, mydata_y, colors)]

Python can then handle memory management more intelligently when allocating
the memory for _series. You can then use _series.extend() when you are doing
the scatter plots for _ax2, with a similar list comprehension (or even a
generator expression).

I would also question the need to store _series in the first place. You use
it for the call to legend, but you could have simply passed a label to each
call of scatter as well.
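
One way to read the suggestion above, sketched in plain Python: collect the points of each color into one batch first, so that there is one batch per legend entry. group_by_color is a hypothetical helper. The assumption, which is not confirmed anywhere in this thread, is that each batch would then go into a single scatter() call with its own label.

```python
from collections import OrderedDict


def group_by_color(xs, ys, colors):
    """Collect the x and y values belonging to each color, keeping first-seen order."""
    groups = OrderedDict()
    for x, y, c in zip(xs, ys, colors):
        gx, gy = groups.setdefault(c, ([], []))
        gx.append(x)
        gy.append(y)
    return groups


groups = group_by_color([1, 2, 3, 4], [5, 6, 7, 8], ["r", "g", "r", "g"])
for color, (gx, gy) in groups.items():
    # In the real function, one plotting call per color would go here.
    print(color, gx, gy)
```

With 153 colors this would mean 153 batches rather than one object per data point.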

Some other things of note:

1) The clear() call here is completely useless as the figure is already
clear.
        _figure = pylab.figure()
        _figure.clear()

2) When limits are set on an axis, autoscaling for that axis is
automatically turned off anyway, so there is no need to turn it off yourself (also
not sure why you are calling out to an external function here):
        _ax1.set_autoscale_on(False)
        set_limits(_ax1, xmin, xmax, ymin, ymax)

3) Finally, some discussion on the end of your function here:
        if legends:
            _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
            del(_my_PathCollection)
            del(_ax2)
        else:
            _figure.savefig(filename, dpi=100)

        del(_series)
        del(_ax1)
        _figure.clear()
        del(_figure)
        pylab.clf()
        pylab.close()
First, as discussed, you can easily eliminate the need for
_my_PathCollection and possibly even _series. Second, when calling
_figure.clear(), all of its axes objects are deleted for you, so you don't
need to delete them yourself. Third, you delete the _figure object, but
then call "pylab.clf()". I haven't double-checked exactly what would
happen, but I think you might run the risk of accidentally clearing some
other existing figure by doing that. Lastly, you then call pylab.close(),
for which I would point out the same caveat as before. Really, all you needed was
pylab.close(), and you can eliminate the 5 preceding lines and the other two
del()'s. All del() really does is remove the variable from scope. Once
that object is out of everybody's scope, the gc can clean it up. Since
the function was ending anyway, there is no point in deleting the variable.
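
The point about del() can be checked with a small experiment using only the standard library. This is an illustrative sketch: Payload is a made-up stand-in for a large object such as a figure, and a weak reference is used to observe the object without keeping it alive.

```python
import gc
import weakref


class Payload(object):
    """Stand-in for a large object; purely illustrative."""


def make_and_drop():
    obj = Payload()
    alias = obj                # a second name bound to the same object
    probe = weakref.ref(obj)   # observes the object without keeping it alive
    del obj                    # removes only the name 'obj'
    alive_after_del = probe() is not None
    del alias                  # the last remaining reference goes away
    gc.collect()               # not required on CPython here, but harmless
    alive_after_last = probe() is not None
    return alive_after_del, alive_after_last


print(make_and_drop())
```

On CPython this is expected to print (True, False): the object survives the first del because another name still refers to it, and it is freed only once the last reference is gone.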

I don't know if this would fix your problem, and there are a bunch of other
style issues here (particularly, pylab really shouldn't be used this way),
but hopefully this gives some food for thought.

Cheers!
Ben Root

Hi Ben,
  thank you for your comments. Looks like I will sleep badly tonight. :frowning: Some quick
answers below.

Benjamin Root wrote:

That being said, I do see a number of anti-patterns here that could be significant. For example:

        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

Could be more concisely written as:

        _series = [_ax1.scatter(_x, _y, color=_c, s=objsize) for _x, _y, _c in izip(mydata_x, mydata_y, colors)]

Python can then handle memory management more intelligently when allocating the memory for _series. You can then use _series.extend() when you are doing the scatter plots for _ax2, with a similar list comprehension (or even a generator expression).

You are right that the .append() is ugly; maybe it is the real source of the trouble. Right
now I do not understand myself why the code under "if legends:" uses ax1 instead of ax2.
Weird. I actually stopped using legends with this function because my first guess was
that they caused the memory issues. It seems the culprit is elsewhere, so I should add them
back and fix the ax2 vs. ax1 error (most likely a copy/paste mistake).

As you could have seen, I used in the past label=_l but for some reason I switched away
to the current ugly code. Will try to find out why I did that.

Hmm, I don't know what you mean by _series.extend() at the moment; I will read some
Python intro on using lists. :frowning:
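
For readers with the same question, here is a minimal sketch of the difference between append() and extend(). The strings are hypothetical stand-ins for the PathCollection objects that scatter() returns; nothing here is taken from the original function.

```python
# Hypothetical stand-ins for the objects returned by the scatter() calls.
ax1_results = ["a1", "a2", "a3"]
ax2_results = ["b1", "b2"]

# Build the list in one step with a list comprehension.
series = [item for item in ax1_results]

# extend() adds every element of another iterable, one by one.
# A generator expression works here as well as a list.
series.extend(item for item in ax2_results)

# append() would instead add the whole list as one nested element.
nested = list(ax1_results)
nested.append(ax2_results)

print(series)  # ['a1', 'a2', 'a3', 'b1', 'b2']
print(nested)  # ['a1', 'a2', 'a3', ['b1', 'b2']]
```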

I would also question the need to store _series in the first place. You use it for the call to legend, but you could have simply passed a label to each call of scatter as well.

As I said, I used that in the past but somehow that did not work. Maybe time to re-try that.
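
A minimal sketch of the label-based approach suggested above, assuming one scatter() call per series. The series names, colors and data are made up for illustration, the Agg backend is chosen only so the sketch runs without a display, and the code is written for Python 3 rather than the Python 2.7 used elsewhere in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
fig, ax = plt.subplots()

# Three hypothetical series; each call passes its own label.
for name, color in [("batch A", "red"), ("batch B", "green"), ("batch C", "blue")]:
    ax.scatter(rng.rand(100), rng.rand(100), color=color, s=10, label=name)

# legend() picks up the labelled artists itself, so no list of
# returned PathCollection objects has to be kept around.
legend = ax.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)

plt.close(fig)
```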

Some other things of note:

1) The clear() call here is completely useless as the figure is already clear.
        _figure = pylab.figure()
        _figure.clear()

Right, I was just trying to ensure everything is cleared. I somewhat suspect the Python
garbage collector does not run often enough, and therefore added more and more del()
and gc.collect() calls.

2) When limits are set on an axis, autoscaling for that axis is automatically turned off anyway, so there is no need to turn it off yourself (also not sure why you are calling out to an external function here):
        _ax1.set_autoscale_on(False)
        set_limits(_ax1, xmin, xmax, ymin, ymax)

The set_limits() is called because I got unstable coordinates in every figure.
Sometimes matplotlib used a wider offset from the axis lines and sometimes not.
So I basically force the same layout wherever I expect the same layout.
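
The point about autoscaling can be checked directly. This is a small sketch, not the set_limits() helper from the original code (which was never posted); it only uses the standard set_xlim()/set_ylim() calls, with the Agg backend assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
before = (ax.get_autoscalex_on(), ax.get_autoscaley_on())

# Setting explicit limits switches autoscaling off for that axis.
ax.set_xlim(0.0, 10.0)
ax.set_ylim(-1.0, 1.0)
after = (ax.get_autoscalex_on(), ax.get_autoscaley_on())

# Data outside the limits no longer changes the view.
ax.plot([0, 100], [0, 100])
limits = (tuple(ax.get_xlim()), tuple(ax.get_ylim()))

print(before, after, limits)
plt.close(fig)
```

With fixed limits the axes should come out the same from figure to figure, which appears to be what the set_limits() call was meant to achieve.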

3) Finally, some discussion on the end of your function here:
        if legends:
            _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
            del(_my_PathCollection)
            del(_ax2)
        else:
            _figure.savefig(filename, dpi=100)

        del(_series)
        del(_ax1)
        _figure.clear()
        del(_figure)
        pylab.clf()
        pylab.close()
First, as discussed, you can easily eliminate the need for _my_PathCollection and possibly even _series. Second, when calling _figure.clear(), all of its axes objects are deleted for you, so you don't need to delete them yourself. Third, you delete the _figure object, but then call "pylab.clf()". I haven't double-checked exactly what would happen, but I think you might run the risk of accidentally clearing some other existing figure by doing that. Lastly, you then call pylab.close(), to which the same caveat applies. Really, all you needed was pylab.close() and you can eliminate the 5 preceding lines and the other two del()'s. All del() really does is remove the variable from scope. Once that object is out of everybody's scope, then the gc can clean it up. Since the function was ending anyway, there is no point in deleting the variable.

Right, but I suspect that the garbage collector does not reclaim unused objects quickly enough
after the function is left. If I generate many figures in a loop, one after another, it
appeared helpful to me to interleave the function calls with gc.collect() calls.
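
A minimal sketch of the clean-up pattern discussed above, assuming each figure is created, saved and closed inside one function. The function name save_scatter, the file names and the data are hypothetical; the sketch uses pyplot rather than pylab, assumes the Agg backend, and is written for Python 3.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np


def save_scatter(filename, x, y):
    """Draw one scatter plot, save it, and release the figure."""
    fig, ax = plt.subplots()      # a fresh figure, so nothing old is drawn over
    ax.scatter(x, y, s=5)
    fig.savefig(filename, dpi=100)
    plt.close(fig)                # close this specific figure, not "the current one"


rng = np.random.RandomState(0)
outdir = tempfile.mkdtemp()
for i in range(3):
    save_scatter(os.path.join(outdir, "plot_%d.png" % i),
                 rng.rand(1000), rng.rand(1000))

# pyplot no longer tracks any figure once each one has been closed.
open_figures = plt.get_fignums()
print(open_figures)
```

Passing the figure object to close() avoids the risk mentioned above of clearing or closing some other figure that happens to be current.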

I don't know if this would fix your problem, and there are a bunch of other style issues here (particularly, pylab really shouldn't be used this way), but hopefully this gives some food for thought.

I think I will start tomorrow by finishing up the broken testcase so that we can be sure
where the culprit is. Then I should improve the function as you proposed. In some places
I am not sure what you really mean, but hopefully I will resolve it.

I was thinking about submitting several other functions like this one for discussion and
improvement, so that such wrapper functions could be included in matplotlib. I am
sure you would not like the many function arguments and would prefer kwargs instead, but
something with the same API would be helpful if I want to switch easily between scatter,
histplot, piechart. Actually, the hist2d substring in this function name is a remnant of
my attempts to do 2d charts, but I did not take that route in the end. Just in case you were
puzzled by the function name. :wink:

Thank you,
Martin

Hi,
  so here is a quick but working example. I added 2-3 (unused) functions
as a bonus; you can easily call them from the main function using the same API
(except the piechart). I hope this shows what I lack in matplotlib: a general API
so that I could easily switch from scatter plot to piechart or barchart without altering
the function arguments much. Messing with the returned Line2D, PathCollection and Rectangle objects
is awkward and I would like to stay away from matplotlib's internals. :wink: Some can be sliced,
some not, as you will see in the code.

  This eatmem.py will easily take all your memory. Drawing 300000 dots is not feasible
with 16GB of RAM. While the example is for sure inefficient in many places, generating the data
in Python does not eat RAM. That happens afterwards.

I would really like to hear whether matplotlib could be adjusted instead. :wink: I already mentioned
in this thread that it is awkward to pre-create colors before passing all data to a drawing
function. I think we could all save a lot if matplotlib could dynamically fetch colors
on the fly from user-created generator, same for legends descriptions. I think my example
code shows the inefficient approach here. If I had more time I would randomize the sublist
of each series a bit more so that the numbers in the legends would be more variable,
but that is a cosmetic issue.
  Probably due to my ignorance, you will see that figures with legends have different font
sizes, and that the axes and the figure are rescaled. Of course I wanted the drawing to be the same via both
approaches but failed badly. The files/figures with legends should just be accompanied by the
legend "table" underneath, but the drawing itself should be the same. Maybe it is an issue with the DPI settings,
but not only that.

  I placed some comments in the code, please don't take them personally. :wink: Of course
I am glad for the existing work and am happy to contribute my crap. I am fine if you revamp
this ugly code into the matplotlib testsuite and provide a similar function (the API mentioned above)
so that I could use your code directly. That would be great. I just tried to show multiple
issues at once, which is why I included those unused functions. You will for sure find
a way to use them.

Regarding the "unnecessary" del() calls etc., I think I have to keep some, Ben, because
the function is not always left soon enough. I could drop some, you are right, but for some
I don't think so. Matplotlib cannot recycle the memory until I (upstream) delete the reference
so ... go and test this lousy code. Now you have a testcase. :wink: Same with the gc.collect() calls.
Actually, the main loop with 10 iterations is there just to show why I always want to clear
a figure when entering a function and when leaving it as well. It happened too many times that
I drew over an old figure, and this was also posted a few times on this list by others. That is
weird behavior in my opinion. We, users, are just forced to use too low-level functions.

So, have fun eating your memory! :))
Martin

eatmem.py (26.9 KB)

Sorry to repeat myself, but please reduce this to a short, self contained example, that is absolutely minimal to demonstrate the problem. http://sscce.org/ should help better explain what I'm after. I don't want to find the needle in the haystack here -- there is code in your example that doesn't even run, for example.

That said, are you really after creating a legend entry for each of the dots? (See below). That just isn't going to work, and I'm not surprised it eats up excessive amounts of memory. I think you want (and can) reduce this to a single scatter call.

_series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.') for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)] # returns PathCollection object

Mike

--
                    _

\/|o _|_ _. _ | | \.__ __|__|_|_ _ _ ._ _
>>(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | |

http://www.droettboom.com

Michael Droettboom wrote:

Sorry to repeat myself, but please reduce this to a short, self contained example, that is absolutely minimal to demonstrate the problem. http://sscce.org/ should help better explain what I'm after. I don't want to find the needle in the haystack here -- there is code in your example that doesn't even run, for example.

That said, are you really after creating a legend entry for each of the dots? (See below). That just isn't going to work, and I'm not surprised it eats up excessive amounts of memory. I think you want (and can) reduce this to a single scatter call.

_series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.') for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)] # returns PathCollection object

Are you sure? I think it was concluded on this list that scatter cannot (or does not) take
nested lists of lists with series like histogram and piechart do. I cannot find the thread
but maybe you will have more luck. I even think that I already opened a bug report/feature request
for this in the past. But maybe not.
Martin
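
A minimal sketch of how nested per-series data could be flattened for a single scatter() call, assuming every point simply takes the color of its series. The data, labels, colormap and the use of Line2D proxy handles for the legend are illustrative choices, not something taken from eatmem.py; the sketch assumes the Agg backend and Python 3.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical nested data: five series of different lengths.
series_x = [rng.rand(n) for n in (10, 20, 30, 40, 50)]
series_y = [rng.rand(len(sx)) for sx in series_x]
labels = ["series %d" % i for i in range(len(series_x))]

# Flatten the nested lists and remember which series each point came from.
x = np.concatenate(series_x)
y = np.concatenate(series_y)
series_index = np.concatenate(
    [np.full(len(sx), i) for i, sx in enumerate(series_x)])

# One RGBA color per series, then one per point by indexing.
cmap = plt.get_cmap("viridis")
series_colors = cmap(np.linspace(0.0, 1.0, len(series_x)))
point_colors = series_colors[series_index]

fig, ax = plt.subplots()
collection = ax.scatter(x, y, c=point_colors, s=10)  # a single PathCollection

# One legend entry per series, built from proxy artists instead of
# one scatter() call per entry.
handles = [Line2D([], [], linestyle="none", marker="o", color=color)
           for color in series_colors]
legend = ax.legend(handles, labels)

n_points = len(collection.get_offsets())
n_entries = len(legend.get_texts())
n_collections = len(ax.collections)
print(n_points, n_entries, n_collections)

plt.close(fig)
```

The same idea should scale to 153 series; whether it resolves the memory use reported in this thread has not been verified here.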

--
Martin Mokrejs, Ph.D.
Bioinformatics
Donovalska 1658
149 00 Prague
Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs

Hello Martin,

can I ask what the point is of plotting a scatter plot with 200
thousand points in it? Either you visualize it on a screen much larger
than mine, or you are not going to be able to distinguish the single
data points. Maybe you should rethink the visualization tool you are using.

Nevertheless, I'm perfectly able to plot a scatter plot with 262422 data
points, each with its own color, just fine, and the python process
consumes a few hundred MB of RAM (having quite a few other datasets
loaded in memory)::

    import numpy as np
    import matplotlib.pyplot as plt
    n = 262422
    x = np.random.rand(n)
    y = np.random.rand(n)
    c = np.random.rand(n)
    f = plt.figure()
    a = f.add_subplot(111)
    a.scatter(x, y, c=c, s=50)
    plt.show()

and a possible solution using exactly 153 different colors, but again, I
don't see how you can distinguish between hundreds of different shades of
color::

    n = 262422 #22
    ncolors = 153
    x = np.random.rand(n)
    y = np.random.rand(n)
    c = np.random.rand(ncolors)
    f = plt.figure()
    a = f.add_subplot(111)
    for i in xrange(n // ncolors):
        a.scatter(x[i*ncolors:(i+1)*ncolors],
                  y[i*ncolors:(i+1)*ncolors], c=c, s=50)
    plt.show()

Unfortunately the code you provide is too contrived to be useful to
understand the root cause of your problem.

Cheers,
Daniele

···

On 10/10/2013 15:05, Martin MOKREJŠ wrote:

Hi,
  rendering some of my charts takes almost 50GB of RAM. I believe below is a stracktrace
of one such situation when it already took 15GB. Would somebody comments on what is
matplotlib doing at the very moment? Why the recursion?

  The charts had to have 262422 data points in a 2D scatter plot, each point has assigned
its own color. They are in batches so that there are 153 distinct colors but nevertheless,
I assigned to each data point a color value. There are 153 legend items also (one color
won't be used).