Alpha compositing of ~60000 line plots takes forever

I want to overlay many line plots using alpha transparency. However, plotting
them in Matplotlib takes about O(n**2) time, and I think I may be running into
memory limitations as well.

As a simple benchmark, I used IPython to run alco.ipy (below), which runs
alco.py for an increasing number of data series. Extrapolating from this,
plotting 60000 series would take something like 200 minutes. This is similar to
my actual use case, which takes about 3 hours to finish a plot. Zooming in and
saving again is much faster, taking only about 30 seconds.

I would appreciate suggestions on how to speed this up. For instance:

Is there a memoryless "canvas" object that I could draw on, just accumulating
the alpha in each pixel: new_alpha = old_alpha + (1 - old_alpha) * this_alpha?
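
To make the accumulation rule concrete, here is a minimal sketch of it for a single pixel. The helper name composite_alpha and the count of 100 overlapping lines are illustrative assumptions; alpha = 0.02 is the default from alco.py. Because each step multiplies the remaining transparency by (1 - this_alpha), repeating it k times with a constant alpha should agree with the closed form 1 - (1 - alpha)**k.

```python
def composite_alpha(old_alpha, this_alpha):
    """Accumulate coverage: new = old + (1 - old) * this."""
    return old_alpha + (1.0 - old_alpha) * this_alpha

alpha = 0.02            # per-line alpha, the default in alco.py
k = 100                 # hypothetical number of lines crossing one pixel

acc = 0.0
for _ in range(k):
    acc = composite_alpha(acc, alpha)

# Closed form for k identical layers: 1 - (1 - alpha)**k
closed_form = 1.0 - (1.0 - alpha) ** k
print(acc, closed_form)
```

If the closed form holds, a canvas would only need to store a per-pixel hit count rather than any per-line state.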

Failing that, I could do it manually by keeping a NumPy array of the pixels in
the image. For each series, find the x values corresponding to each column
index, then interpolate to find the row index corresponding to each y value.
Finally, use imshow() or something to add axes and annotation.
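
A rough sketch of that manual approach, under several assumptions: the function name rasterize_counts, the 400x300 image size, n = 500, and the output file name are all made up for illustration, and only one pixel per column is marked per series, so steep segments are not connected the way a real line renderer would connect them. The series index starts at 1 to avoid the division by zero that i = 0 would cause.

```python
import numpy as np
import matplotlib
matplotlib.use("agg")  # noninteractive plotting
import matplotlib.pyplot as plt

def rasterize_counts(x, ys, width, height, ylim):
    """Count, per pixel, how many series hit it (one pixel per column)."""
    ymin, ymax = ylim
    cols = np.linspace(x[0], x[-1], width)   # x value at each pixel column
    col_idx = np.arange(width)
    counts = np.zeros((height, width), dtype=np.int64)
    for y in ys:
        yi = np.interp(cols, x, y)           # series value at each column
        rows = np.round((yi - ymin) / (ymax - ymin) * (height - 1)).astype(int)
        ok = (rows >= 0) & (rows < height)   # drop points outside ylim
        # Each column index occurs once per series, so "+=" has no duplicates.
        counts[rows[ok], col_idx[ok]] += 1
    return counts

n, alpha = 500, 0.02
x = np.arange(200)
ys = [np.sin(x / (2 * np.pi * x[-1] * i)) for i in range(1, n + 1)]
ymin = min(y.min() for y in ys)
ymax = max(y.max() for y in ys)

counts = rasterize_counts(x, ys, width=400, height=300, ylim=(ymin, ymax))
coverage = 1.0 - (1.0 - alpha) ** counts     # accumulated alpha per pixel

fig, ax = plt.subplots()
ax.imshow(coverage, cmap="gray_r", vmin=0, vmax=1, origin="lower",
          extent=(x[0], x[-1], ymin, ymax), aspect="auto",
          interpolation="nearest")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("test_raster.png")
```

Memory use here depends on the image size only, not on the number of series, since each series is discarded after it has been counted.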

Thank you in advance for any help.

Best regards,
Jon Olav

== Output of alco.ipy ==

The columns are "number of series" and "seconds".

In [8]: run alco.ipy
1000 9.07
2000 24.8
3000 44.73
4000 67.85
5000 95.67
6000 135.1
7000 177.82
8000 226.03
9000 278.32
10000 340.81
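
The extrapolation can be reproduced with a least-squares quadratic fit to the timings above. This is only a sketch: the choice of a second-degree polynomial is an assumption based on the apparent O(n**2) growth, and working in thousands of series is just to keep the fit well scaled.

```python
import numpy as np

# Timings copied from the alco.ipy output above.
n = np.arange(1000, 10001, 1000)
t = np.array([9.07, 24.8, 44.73, 67.85, 95.67,
              135.1, 177.82, 226.03, 278.32, 340.81])

k = n / 1000.0                    # thousands of series
coeffs = np.polyfit(k, t, 2)      # fit t = a*k**2 + b*k + c
t60 = np.polyval(coeffs, 60.0)    # extrapolate to 60000 series

print("a, b, c =", coeffs)
print("predicted minutes for 60000 series:", t60 / 60.0)
```

The prediction should land in the neighbourhood of three hours, which is in line with both the rough 200-minute estimate and the actual use case.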

== alco.ipy ==

n, t = [], []
for i in range(1000, 10001, 1000):
    n.append(i)
    ti = !python alco.py $i
    t.append(float(ti.s))
    print n[-1], t[-1]

plot(n, t, '.-')

== alco.py ==

"""Alpha compositing of line plots. Usage: python alco.py NSERIES ALPHA"""
from sys import argv
import numpy as np
import matplotlib as mpl
mpl.use("agg") # noninteractive plotting
from pylab import *

n = int(argv[1])
try:
    alpha = float(argv[2])
except IndexError:
    alpha = 0.02

# generate some data
x = np.arange(200)
for i in range(n):
    y = np.sin(x / (2 * np.pi * x[-1] * i))
    plot(x, y, 'k-', alpha=alpha)

savefig("test.png")

If you're plotting lots of lines, do not use plot but use
LineCollection instead.

http://matplotlib.sourceforge.net/examples/api/collections_demo.html

http://matplotlib.sourceforge.net/api/collections_api.html#matplotlib.collections.LineCollection

Here is a slightly modified version of your code that uses
LineCollection (but I haven't checked whether the code is correct).
On my not-so-good MacBook, it took 3 seconds for 6000 lines, and it
seems like O(n) to me.

Regards,

-JJ

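    from matplotlib.collections import LineCollection

    ax = subplot(111)
    x = np.arange(200)
    # one (2, 200) array of x and y values per series
    yy = [np.array((x, np.sin(x / (2 * np.pi * x[-1] * i)))) for i in range(n)]
    # LineCollection expects each line as an array of (x, y) rows
    yyt = [np.transpose(y1) for y1 in yy]

    # draw all lines as a single artist, black with the given alpha
    lc = LineCollection(yyt, colors=[(0, 0, 0, alpha)])
    ax.add_collection(lc)
    ax.autoscale_view()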
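
For reference, a self-contained variant of the same idea that builds all segments in one NumPy array instead of a Python list. This is a sketch, not the code that was timed above: n = 2000, the explicit axis limits, and the output file name are assumptions, and the series index starts at 1 to avoid dividing by zero.

```python
import numpy as np
import matplotlib
matplotlib.use("agg")  # noninteractive plotting
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

n, alpha = 2000, 0.02
x = np.arange(200)
i = np.arange(1, n + 1)[:, None]           # column vector of series indices
y = np.sin(x / (2 * np.pi * x[-1] * i))    # shape (n, 200) via broadcasting

# Pair every y row with the shared x values: shape (n, 200, 2).
segments = np.stack([np.broadcast_to(x, y.shape), y], axis=-1)

fig, ax = plt.subplots()
lc = LineCollection(segments, colors=[(0, 0, 0, alpha)])
ax.add_collection(lc)
ax.set_xlim(x[0], x[-1])                   # set limits explicitly
ax.set_ylim(y.min(), y.max())
fig.savefig("test_lc.png")
```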

On Tue, Mar 16, 2010 at 7:26 AM, Jon Olav Vik <jonovik@...287...> wrote:

