large data sets and performance

    > Although the data I'm playing with right now is monotonic
    > (in x), I cannot assume that this will always be the case,
    > and need an efficient solution for all situations.

Agreed.

    > the 'lod' option in:
    >     l = plot(arange(10000), arange(20000,30000)) #dummy data.. 10,000 pairs
    >     set(l, 'lod', True)
    > does not work for me. It's still roughly 1000 points/second

I left out a *critical* detail. The new GD backend code implements
antialiased drawing by default, and it is very slow. Check out the
numbers below, based on the demo script you supplied:

    backend = 'GD'
    import matplotlib
    matplotlib.use(backend)
    from matplotlib.matlab import *
    l = plot(arange(10000), arange(20000,30000)) #dummy data.. 10,000 pairs
    lod, aa = False, False
    print 'Backend: %s, LOD %d, AA %d' % (backend, lod, aa)
    set(l, 'lod', lod, 'antialiased', aa)
    savefig('test')

  Backend: GD, LOD 1, AA 1
  23.770u 0.030s 0:23.77 100.1% 0+0k 0+0io 793pf+0w

  Backend: GD, LOD 0, AA 1
  23.500u 0.020s 0:23.52 100.0% 0+0k 0+0io 793pf+0w

  Backend: GD, LOD 1, AA 0
  0.270u 0.000s 0:00.28 96.4% 0+0k 0+0io 794pf+0w

  Backend: GD, LOD 0, AA 0
  0.240u 0.030s 0:00.27 100.0% 0+0k 0+0io 794pf+0w

In other words, if you are using the new GD in its default
configuration, you are paying a *100-fold performance hit* for
antialiased line drawing. Without it, I can draw and save your figure
(including python startup time, etc, etc) in 0.25s on a 2GHz Pentium
4. Is this in the ballpark for you, performance-wise?
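
As a quick check on the "100-fold" figure, the ratio can be worked out from the user CPU times quoted above. This is only a sketch: the `timings` table and the `slowdown` helper are names made up for illustration, and the numbers are copied from the GD timing lines.

```python
# User CPU seconds copied from the GD timing lines quoted above.
timings = {
    ("LOD 1", "AA 1"): 23.770,
    ("LOD 0", "AA 1"): 23.500,
    ("LOD 1", "AA 0"): 0.270,
    ("LOD 0", "AA 0"): 0.240,
}


def slowdown(timings, lod):
    """Ratio of antialiased to aliased user time for one LOD setting."""
    return timings[(lod, "AA 1")] / timings[(lod, "AA 0")]


for lod in ("LOD 1", "LOD 0"):
    print("%s: antialiasing costs about %.0fx" % (lod, slowdown(timings, lod)))
```

Both ratios come out a little under 100, and the LOD setting barely moves either of them.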

While we're on the subject of performance, I took the opportunity to
test the other backends. Note the numbers are not strictly comparable
(discussed below) but are informative.

  Backend: Paint, LOD 0, AA 0
  0.520u 0.000s 0:00.52 100.0% 0+0k 0+0io 726pf+0w

  Backend: PS, LOD 0, AA 0
  1.030u 0.040s 0:01.08 99.0% 0+0k 0+0io 582pf+0w

  Backend: Agg, LOD 0, AA 0
  0.320u 0.010s 0:00.28 117.8% 0+0k 0+0io 681pf+0w

  Backend: GTK, LOD 0, AA 0
  0.650u 0.020s 0:00.66 101.5% 0+0k 0+0io 3031pf+0w

The GTK results were run under xvfb, so GTK appears to be a no-go for
you even if we could figure out how to print to stdout. These numbers
are repeatable and consistent.

Worthy of comment:

  * GD with antialiasing off wins

  * paint is not as fast as I hoped

  * GTK is not as fast as I thought

  * Agg is an interesting case. It is doing antialiased drawing
    despite the AA 0 flag because I haven't made this conditional in
    the backend; it currently draws antialiased unconditionally. It
    also hasn't implemented text yet, so it's not strictly
    comparable, but it is noteworthy that it is on the order of 100
    times faster than GD at AA lines. It remains to be seen what
    speed we can get with plain vanilla aliased rendering.

My guess is: when you turn off antialiasing you'll be a whole lot
happier. Let me know.

The last thing I looked at was how the GD numbers scale with line
size. Below, N is the number of data points. These results are with
LOD true; with LOD false the numbers are very close to them.

  Backend: GD, LOD 1, AA 0, N 10000
  0.230u 0.040s 0:00.24 112.5% 0+0k 0+0io 794pf+0w

  Backend: GD, LOD 1, AA 0, N 20000
  0.260u 0.060s 0:00.31 103.2% 0+0k 0+0io 794pf+0w

  Backend: GD, LOD 1, AA 0, N 40000
  0.390u 0.030s 0:00.41 102.4% 0+0k 0+0io 794pf+0w

  Backend: GD, LOD 1, AA 0, N 80000
  0.590u 0.060s 0:00.60 108.3% 0+0k 0+0io 815pf+0w

  Backend: GD, LOD 1, AA 0, N 160000
  1.070u 0.090s 0:01.13 102.6% 0+0k 0+0io 818pf+0w
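
As a rough way to read these scaling numbers, a straight line can be fitted to elapsed time against N. This is a sketch under the assumption that the cost is a fixed overhead plus a constant cost per point; `runs` and `fit_line` are illustrative names, and the values are the elapsed times from the runs above.

```python
# (N, elapsed seconds) copied from the GD, LOD 1, AA 0 runs above.
runs = [(10000, 0.24), (20000, 0.31), (40000, 0.41), (80000, 0.60), (160000, 1.13)]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = float(len(points))
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(runs)
print("fixed overhead: %.2f s" % intercept)
print("cost per point: %.1f microseconds" % (slope * 1e6))
```

For these five runs the fit gives a bit under 0.2 s of fixed overhead and roughly 6 microseconds per point, so the time grows about linearly with N once startup is accounted for.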

JDH

John:
Thanks very much for your investigative work.

    > antialiased line drawing. Without it, I can draw and save your figure
    > (including python startup time, etc, etc) in 0.25s on a 2GHz Pentium
    > 4. Is this in the ballpark for you, performance wise?

yes.. yes..yes..

    > My guess is: when you turn off antialiasing you'll be a whole lot
    > happier. Let me know.

With antialiasing off, the performance is superb! I can plot 500,000 points in ~4-5 seconds. The visual quality of the graphs is (naturally) inferior to the antialiased counterparts, but the software is now feasible for my purposes.

Just a couple more questions:

1) It seems that setting 'lod' to true does not improve performance. I would imagine it should, because it limits the number of points used. What am I missing?

2) Is there any way to make the graphs look "prettier"? They really look quite OK, but in some cases a little more detail would be nice. Is it possible to specify just how much antialiasing is applied? Are there any other "visual enhancement" options that can be set without hurting performance too much?

3) When I do:

    plot1 = plot(arange(10000), arange(20000,30000)) #dummy data.. 10,000 pairs
    lod, aa = False, False
    set(plot1, 'lod', lod, 'antialiased', aa)

Do these options apply only to the current plot (i.e. plot1)?
Is it possible to have a plot inside a plot with one antialiased and the other not?
Do I have to reset them after I call savefig()? (I will test this.)

I have been playing around with the dpi setting a little. Is it supposed to change the size of the image, the resolution, or both?
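
On the dpi question: as a general rule for raster output (an assumption here, since nothing in this thread confirms it for these backends), the pixel dimensions of the saved image are the figure size in inches multiplied by the dpi. Raising the dpi would then give a larger image in pixels for the same nominal size in inches. A minimal sketch of that arithmetic, where `pixel_size` and the 8 x 6 inch figure are hypothetical:

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a raster image for a figure size given in inches."""
    return int(round(width_in * dpi)), int(round(height_in * dpi))


# Hypothetical 8 x 6 inch figure rendered at three dpi settings.
for dpi in (50, 100, 200):
    print(dpi, pixel_size(8, 6, dpi))
```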

Thanks again.

--
Peter Groszkowski       Gemini Observatory
Tel: +1 808 974-2509    670 N. A'ohoku Place
Fax: +1 808 935-9235    Hilo, Hawai'i 96720, USA