Removing lines leaks memory

_Pearu_Peterson · May 28, 2010, 8:18am

Hi,

In an application that updates a plot with
new experimental data, say, every second, where the experiment
can last for hours, I have tried two approaches:
1) clear the axes and plot the new experimental data - this is
slow and takes too much CPU.
2) remove the lines and plot the new experimental data - this is
fast enough, but unfortunately there seems to be a memory
leak, and the application runs out of memory.

Here follows a simple script that demonstrates the
leakage problem:

#
import numpy
from numpy.testing.utils import memusage
import matplotlib.pyplot as plt
x = range (1000)
axes1 = plt.figure().add_subplot( 111 )
y = numpy.random.rand (len (x))
while 1:
    if 1:
        # leakage
        for line in axes1.lines:
            if line.get_label ()=='data':
                line.remove()
    else:
        # no leak, but slow
        axes1.clear()
    axes1.plot(x, y, 'b', label='data')
    print memusage (), len (axes1.lines)
#eof

When running the script, the memory usage
increases by 132 kbytes per iteration; that is, at one update
per second this example application will consume
464 MB of RAM (132 kB x 3600) within an hour, even though no new
data has been generated. In a real application the effect will be
even worse.

So, I am looking for advice on how to avoid
this memory leak without clearing the axes.

I am using matplotlib from SVN.

Thanks,
Pearu

Joao_Luis_Silva · May 28, 2010, 2:10pm

Why don't you just update the existing line with the new data, as shown in the animation examples in http://matplotlib.sourceforge.net/examples/animation/index.html ?

For example:

#
import numpy
from numpy.testing.utils import memusage
import matplotlib.pyplot as plt
x = range (1000)
fig = plt.figure()
axes1 = fig.add_subplot( 111 )
y = numpy.random.rand (len (x))
line = None
while 1:
    if not line:
        line, = axes1.plot(x, y, 'b', label='data')
    else:
        line.set_data(x,y)
    fig.canvas.draw()
    print memusage ()/(1024.0*1024.0),"MB", len (axes1.lines)
#eof

Regards,
João Silva

_John_Hunter · May 28, 2010, 2:12pm

Hey Pearu -- thanks for the report. We'll try to track down and fix
this leak. In the interim, would an acceptable workaround for you be
to *reuse* an existing line by calling set_data on it? That way you
wouldn't have to do the add/remove that is causing your leak. Have
you confirmed this leak on various backends (e.g. Agg, PDF, PS)?

Michael_Droettboom · May 28, 2010, 2:47pm

I'm on to something -- some callbacks are being created that are never disconnected.

In Line2D.set_axes:

self._xcid = ax.xaxis.callbacks.connect('units', self.recache_always)

gets called twice. This is problematic because the id of the first connection is simply lost. Also, there doesn't seem to be any code to attempt to remove either of them.

I'm looking into it further -- forcibly deleting these callbacks reduces the reference count on the line object, but doesn't seem to completely eliminate the leak.
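
The mechanism described above can be illustrated outside matplotlib. What follows is only a minimal sketch, assuming a registry that stores callbacks by strong reference; `Registry` and `Line` are hypothetical stand-ins and not matplotlib's actual classes. Calling `set_axes` twice overwrites the stored connection id, so the first connection can never be disconnected and its bound method keeps the line alive:

```python
import gc
import weakref


class Registry:
    """Hypothetical registry that holds callbacks by strong reference."""

    def __init__(self):
        self._callbacks = {}
        self._next_id = 0

    def connect(self, signal, func):
        self._next_id += 1
        self._callbacks[self._next_id] = (signal, func)
        return self._next_id

    def disconnect(self, cid):
        self._callbacks.pop(cid, None)


class Line:
    """Hypothetical line that registers a 'units' callback on the registry."""

    def recache_always(self):
        pass

    def set_axes(self, registry):
        # Each call overwrites self._xcid, so an earlier id is lost.
        self._xcid = registry.connect('units', self.recache_always)


registry = Registry()
line = Line()
probe = weakref.ref(line)        # lets us observe whether the line is freed

line.set_axes(registry)          # first connection, id 1
line.set_axes(registry)          # second connection, id 2; id 1 is now lost

registry.disconnect(line._xcid)  # only the second connection can be removed
del line
gc.collect()

still_alive = probe() is not None
print(still_alive)  # True: the orphaned first connection still references the line
```

The bound method stored in the registry carries a reference to its object, which is why removing the line from the axes alone does not free it.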

Mike

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

--
Michael Droettboom
Science Software Branch
Space Telescope Science Institute
Baltimore, Maryland, USA

Michael_Droettboom · May 28, 2010, 5:48pm

There is a fix in r8341. It passes the regression tests, and all of the event handling examples I tried seem to still work.

It seems that many places in matplotlib were never disconnecting callbacks, and these callbacks keep references to the destination objects alive.

Unfortunately, it's not quite obvious where the "disconnect" calls should be added -- the lifetime of objects isn't very symmetrical. For example, the "units" callback is set up by Line2D inside of its "set_axes" method, but there is no "remove_axes" method in which to put the disconnect. Tracking down all of the ways in which a line could be removed from an axes seems daunting.

Instead, my solution is to store weak references to the methods stored in the CallbackRegistry -- that way the CallbackRegistry won't leak references like it does now. Since the Python stdlib weakref module doesn't directly support weak references to bound methods, the whole thing is a bit hairy -- but I think it's a more permanent solution than trying to ensure that all callbacks get explicitly disconnected.
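
To illustrate the idea (this is not the code from r8341, only a minimal sketch under stated assumptions): a bound method object is created afresh on every attribute access, so a plain `weakref.ref` to it dies immediately in CPython. One common workaround is to keep a weak reference to the object and a separate reference to the underlying function. `WeakBoundMethod` and `Line` below are hypothetical names introduced for the example:

```python
import gc
import weakref


class WeakBoundMethod:
    """Hypothetical helper: refer to a bound method without keeping its object alive."""

    def __init__(self, method):
        self._obj_ref = weakref.ref(method.__self__)  # weak reference to the owner
        self._func = method.__func__                  # plain function, no owner reference

    def __call__(self, *args, **kwargs):
        obj = self._obj_ref()
        if obj is None:
            return None  # owner has been collected; nothing to call
        return self._func(obj, *args, **kwargs)

    def is_dead(self):
        return self._obj_ref() is None


class Line:
    """Hypothetical object whose method is used as a callback."""

    def __init__(self):
        self.calls = 0

    def recache_always(self):
        self.calls += 1
        return self.calls


line = Line()

# Naive approach: the temporary bound method is freed right away (CPython).
naive = weakref.ref(line.recache_always)
naive_is_dead = naive() is None

# Workaround: the callback works while the line exists...
callback = WeakBoundMethod(line.recache_always)
first_result = callback()

# ...and does not prevent the line from being freed.
del line
gc.collect()
dead_after_del = callback.is_dead()
```

A registry that stores such wrappers can drop dead entries lazily instead of relying on every owner to call disconnect. Python 3.4 and later ship `weakref.WeakMethod`, which serves the same purpose.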

As this change is rather fundamental and may have unintended consequences, please play with it in your contexts and let me know if you see anything strange.

Mike

_Pearu_Peterson · May 28, 2010, 6:04pm

No, I haven't confirmed the leak on other backends, but I can try it.

Regarding reusing an existing line --- I had understood that this
works only if the length of the line data does not change.
In my case the data grows as more data points are acquired, and I have
not figured out how to make the axes set new limits after changing
the line data.

Currently I am using a workaround where the axes are cleared
every 60 seconds - this seems to keep memory usage under control.

It seems that Mike has resolved the problem. I'll try the latest
SVN.

Thanks!
Pearu

_John_Hunter · May 28, 2010, 6:15pm

> Regarding reusing existing line --- I have understood that this
> will work only if the length of the line data does not change.

This is not correct -- you can change the line length with calls to set_data.

> In my case the data grows as more data points are acquired and I have
> not figured out how to make axes to set new limits after changing
> the line data.

ax.relim()

Cheers,
JDH

Eric_Firing2 · May 28, 2010, 6:29pm

> Regarding reusing existing line --- I have understood that this
> will work only if the length of the line data does not change.

Not so.

> In my case the data grows as more data points are acquired and I have
> not figured out how to make axes to set new limits after changing
> the line data.

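import numpy as np
from matplotlib.pyplot import plot, gca, draw

lineobj = plot([0], [0])[0]   # create the line once
ax = gca()
x = np.arange(10)
y = 20 * np.sin(x)
lineobj.set_data(x, y)        # the new data may have a different length
xy = np.concatenate((x[:,np.newaxis], y[:,np.newaxis]), axis=1)
ax.update_datalim(xy)         # extend the data limits to cover the new points
ax.autoscale_view()           # rescale the view to the data limits
draw()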

Eric

_Pearu_Peterson · May 28, 2010, 6:37pm

Ok, very good. However, ax.relim() does not seem to have any effect.
Consider the following example:

#
import numpy
from numpy.testing.utils import memusage
import matplotlib
matplotlib.use('GTKAgg')

import matplotlib.pyplot as plt

fig = plt.figure()
axes1 = fig.add_subplot( 111 )

def animate():
    x = [0]
    while 1:
        y = numpy.random.rand (len (x))
        if 1:
            # updating line in place
            if not axes1.lines:
                line, = axes1.plot(x, y, 'b')
            else:
                line.set_data(x, y)
                # relim does not have effect in updating axes
                axes1.relim()
        else:
            # demonstrates expected behaviour, has leakage w/o Mike patch
            for line in axes1.lines:
                line.remove()
            line, = axes1.plot(x, y, 'b')
        fig.canvas.draw()
        print memusage ()/(1024.0*1024.0),"MB", len (axes1.lines), len(x)
        x.append(x[-1]+1)

import gobject
print 'adding idle'
gobject.idle_add(animate)
print 'showing'
plt.show()
#eof

While the new data is plotted correctly, the plot keeps the fixed axes
limits from the first plot call. What am I doing wrong?

Pearu

_John_Hunter · May 28, 2010, 6:41pm

ax.relim() causes the data limits to be updated based on the current
objects it contains, ax.autoscale_view() causes the view limits to be
updated based on the data limits, and fig.canvas.draw() forces a
redraw.
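
The three steps above can be sketched as a short self-contained script. This is a minimal example rather than Pearu's application: it assumes the non-interactive Agg backend so that it runs without a display, and the data sizes are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, chosen only for this sketch
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(111)
line, = ax.plot([0], [0], "b", label="data")  # create the line once

xlim_history = []
for n in (10, 100, 1000):
    x = np.arange(n)
    y = np.random.rand(n)
    line.set_data(x, y)   # reuse the line; the data length may change
    ax.relim()            # recompute data limits from the artists in the axes
    ax.autoscale_view()   # update the view limits from the data limits
    fig.canvas.draw()     # force a redraw
    xlim_history.append(ax.get_xlim())

print(len(ax.lines), xlim_history)
```

Only one line object ever exists in the axes, and the x limits grow with the data on each pass.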

JDH

_Pearu_Peterson · May 28, 2010, 7:31pm

Thanks, John!

Adding ax.autoscale_view() after ax.relim() makes the script work correctly.

Best regards,
Pearu