Speed improvements on the branch

Thought some of you may be interested to know that the speed on the branch is getting much better. Whereas earlier the branch was about 2x slower than the trunk, now most things are close to equal with the trunk speed-wise (with a few outliers for some things such as auto legends, quivers and the pcolor stuff that Eric and I have been working on).

Here are the results for the "simple_plot_fps.py" benchmark, which is meant to measure the interactive performance of panning and zooming:

   trunk: 21.63 fps
   branch: 23.25 fps
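For readers who want to try a similar measurement, here is a minimal sketch of a pan-style fps benchmark. It is not simple_plot_fps.py itself. The function name, the sine-curve data, the window width and the use of the Agg backend (which leaves out GUI blitting and event-loop overhead) are all assumptions made for illustration.

```python
import time

import matplotlib
matplotlib.use('Agg')  # off-screen rendering, no GUI event loop involved
import matplotlib.pyplot as plt
import numpy as np


def measure_pan_fps(nframes=50, npoints=1000):
    """Estimate the redraw rate while sliding the x-limits, mimicking a pan."""
    fig, ax = plt.subplots()
    x = np.linspace(0.0, 10.0, npoints)
    ax.plot(x, np.sin(x))
    tstart = time.perf_counter()
    for i in range(nframes):
        # Move a 5-unit-wide window across the data, one step per frame.
        lo = 5.0 * i / nframes
        ax.set_xlim(lo, lo + 5.0)
        fig.canvas.draw()
    elapsed = time.perf_counter() - tstart
    plt.close(fig)
    return nframes / elapsed


if __name__ == '__main__':
    print('%.2f fps' % measure_pan_fps())
```

Because the limits change on every frame, each draw has to recompute tick locations and redraw the line, which is roughly what an interactive pan costs.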

Attached are the timing differences for every script run by backend_driver.py, sorted by percentage difference in speed. Note that, unlike the above, this measures only a single draw of each plot. It would be interesting to measure the difference in interactive performance for some of these; I suspect the branch may do better.
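The attached results are not reproduced here. As a rough sketch, assume each row lists the script name, trunk time, branch time, difference and branch time as a percentage of trunk time. That reading is consistent with the log_demo.py row quoted later in this thread but is not stated in the post. Under that assumption, the sorting could look like this (compare_timings is a hypothetical name):

```python
def compare_timings(rows):
    """Take (script, trunk_seconds, branch_seconds) tuples and return
    (script, trunk, branch, difference, branch as % of trunk) tuples,
    sorted by that percentage so the biggest slowdowns come last."""
    results = []
    for name, trunk, branch in rows:
        diff = branch - trunk
        percent = 100.0 * branch / trunk
        results.append((name, trunk, branch, diff, percent))
    results.sort(key=lambda row: row[4])
    return results


for name, trunk, branch, diff, percent in compare_timings(
        [('log_demo.py', 1.769, 2.011)]):
    # prints: log_demo.py 1.769 2.011 0.242 113%
    print('%s %.3f %.3f %.3f %d%%' % (name, trunk, branch, diff, percent))
```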

Cheers,
Mike

results (5.86 KB)

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

On Nov 15, 2007 12:53 PM, Michael Droettboom <mdroe@...31...> wrote:

Thought some of you may be interested to know that the speed on the
branch is getting much better. Whereas earlier the branch was about 2x
slower than the trunk, now most things are close to equal with the trunk
speed-wise (with a few outliers for some things such as auto legends,
quivers and the pcolor stuff that Eric and I have been working on).

Hey Michael, this is very encouraging. I just wanted to let you know
about another important use case. I think you are aware of it, because
you've referred to optimized marker drawing in the past, but I put a lot
of effort into it (the Agg cached marker rasters in extension code)
because it matters. The script below is a useful test, and the
performance numbers follow it.

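import time

import matplotlib
matplotlib.use('Agg')  # select the backend before pyplot is imported
import numpy as np
from matplotlib.pyplot import figure

fig = figure()
ax = fig.add_subplot(111)
for i in range(1, 7):
    N = 10**i
    x, y = np.random.rand(2, N)  # N random points in the unit square
    ax.cla()                     # discard the previous set of markers
    tstart = time.time()
    ax.plot(x, y, 'o')           # circle markers, no connecting line
    fig.canvas.draw()            # force a full Agg render
    print('N=%d; elapsed=%1.3f' % (N, time.time() - tstart))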

Trunk:
N=10; elapsed=0.139
N=100; elapsed=0.092
N=1000; elapsed=0.082
N=10000; elapsed=0.133
N=100000; elapsed=0.594
N=1000000; elapsed=5.193

Branch:
N=10; elapsed=0.207
N=100; elapsed=0.118
N=1000; elapsed=0.138
N=10000; elapsed=0.280
N=100000; elapsed=1.671
N=1000000; elapsed=15.877
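The cached-marker-raster idea mentioned above is simple: render the marker once into a small bitmap, then copy that bitmap to every point instead of rasterizing the same circle N times. The following is a minimal NumPy sketch of the technique. It assumes data already scaled to [0, 1], a binary coverage mask with no antialiasing, and hypothetical function names. It is not the Agg extension code.

```python
import numpy as np


def make_disk(radius):
    """Rasterize a filled circle once; this small array is the cached marker."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (xx**2 + yy**2 <= radius**2).astype(np.uint8)


def stamp_markers(x, y, width, height, radius=3):
    """Draw every marker by copying the cached raster, not re-rasterizing it."""
    marker = make_disk(radius)
    size = 2 * radius + 1
    # Pad the canvas so markers near the edges need no special-case clipping.
    canvas = np.zeros((height + 2 * radius, width + 2 * radius), dtype=np.uint8)
    # Map data coordinates in [0, 1] to integer pixel centres (y points up).
    cols = np.round(np.asarray(x) * (width - 1)).astype(int)
    rows = np.round((1 - np.asarray(y)) * (height - 1)).astype(int)
    for r, c in zip(rows, cols):
        # In padded coordinates, the marker's top-left corner sits at (r, c).
        region = canvas[r:r + size, c:c + size]
        np.maximum(region, marker, out=region)  # in-place copy of the cached raster
    # Crop the padding away again.
    return canvas[radius:radius + height, radius:radius + width]
```

The per-point work is a small array copy, so the cost grows with the number of points rather than with the cost of rasterizing a circle each time.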

    log_demo.py 1.769 2.011 0.242 113%

Here is another area where there is an important difference. Panning
and zooming interactively with log scaling is much slower on the
branch, presumably because you have to redo the non-affine part every
time. Also, the old grid line bug on log plots seems to be back, as
evinced when you zoom from the "home" view.

Anyway, with a few exceptional cases, your new timing results are
starting to look very promising.

Thanks,
JDH

Very odd. I've been running my own very similar benchmark as I've been going, and the two code bases perform quite similarly. The branch continues to cache the markers in more or less the same way as on the trunk. Here are my results with your benchmark:

Trunk:
N=10; elapsed=0.056
N=100; elapsed=0.039
N=1000; elapsed=0.042
N=10000; elapsed=0.067
N=100000; elapsed=0.326
N=1000000; elapsed=2.913

Branch:
N=10; elapsed=0.033
N=100; elapsed=0.028
N=1000; elapsed=0.030
N=10000; elapsed=0.055
N=100000; elapsed=0.310
N=1000000; elapsed=2.858

I wonder what environmental or hardware difference could cause this. Perhaps a fresh rebuild would help? (distutils doesn't track dependencies, so a build that isn't clean can leave stale compiled extensions in place...)
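One cheap check before comparing numbers across builds is to confirm which installation each benchmark actually imports. The helper below is a hypothetical, present-day sketch and is not part of matplotlib. The spread in modification times of the compiled extensions is only a rough hint that some of them were not rebuilt. It is not proof.

```python
import glob
import os

import matplotlib
import numpy


def describe_install():
    """Report which matplotlib/numpy are imported and how old the extensions are."""
    pkg_dir = os.path.dirname(matplotlib.__file__)
    # Compiled extension modules: .so on Unix-like systems, .pyd on Windows.
    exts = glob.glob(os.path.join(pkg_dir, '**', '*.so'), recursive=True)
    exts += glob.glob(os.path.join(pkg_dir, '**', '*.pyd'), recursive=True)
    mtimes = sorted(os.path.getmtime(path) for path in exts)
    return {
        'matplotlib': matplotlib.__version__,
        'location': pkg_dir,
        'numpy': numpy.__version__,
        'n_extensions': len(exts),
        # A large spread may mean only some extensions were rebuilt.
        'ext_mtime_spread_s': (mtimes[-1] - mtimes[0]) if mtimes else 0.0,
    }


for key, value in describe_install().items():
    print('%s: %s' % (key, value))
```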

    log_demo.py 1.769 2.011 0.242 113%

Here is another area where there is an important difference. Panning
and zooming interactively with log scaling is much slower on the
branch, presumably because you have to redo the non-affine part every
time.

The non-affine part is not computed on every pan and zoom -- that was one of the main design goals of the branch. (You can put a print statement in Log10Transform.transform to see when it gets called.) I can't feel a speed difference between the two, but...
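The design goal described above is to compute the non-affine (log) part once and redo only the cheap affine part when the view changes. A toy class can illustrate it. This is a hypothetical NumPy sketch, not matplotlib's transform classes: the class and method names are invented, and the cache is invalidated only when the data changes.

```python
import numpy as np


class CachedLogView:
    """Cache the non-affine log10 step; redo only the affine step per view."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype=float)
        self._log_cache = None
        self.nonaffine_calls = 0  # how often the expensive step actually ran

    def _nonaffine(self):
        if self._log_cache is None:
            self.nonaffine_calls += 1
            self._log_cache = np.log10(self._data)
        return self._log_cache

    def to_pixels(self, vmin, vmax, npixels):
        """Map the data to pixel positions for the view [vmin, vmax] (data units)."""
        logged = self._nonaffine()
        lo, hi = np.log10(vmin), np.log10(vmax)
        # Affine part: a scale and an offset in log space, recomputed every call.
        return (logged - lo) * (npixels / (hi - lo))

    def set_data(self, data):
        # New data invalidates the cache; changing the view limits does not.
        self._data = np.asarray(data, dtype=float)
        self._log_cache = None
```

Panning and zooming only call to_pixels with new limits, so nonaffine_calls stays at 1 until set_data is called.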

Also, the old grid line bug on log plots seems to be back, as
evinced when you zoom from the "home" view.

...I should fix this bug first to have a fair comparison.

Cheers,
Mike

Hmm, good guess. I did a clean reinstall of both and the timing
numbers are very close. Thanks for catching this.

JDH

On Nov 15, 2007 1:51 PM, Michael Droettboom <mdroe@...31...> wrote:

Very odd. I've been running my own very similar benchmark as I've been
going, and the two code bases perform quite similarly. The branch
continues to cache the markers in more or less the same way as on the
trunk. Here are my results with your benchmark:

John Hunter wrote:

Hmm, good guess. I did a clean reinstall of both and the timing
numbers are very close. Thanks for catching this.

Phew! I was worried we'd have a real mystery on our hands... :wink:

Michael Droettboom wrote:

John Hunter wrote:

    log_demo.py 1.769 2.011 0.242 113%

Here is another area where there is an important difference. Panning
and zooming interactively with log scaling is much slower on the
branch, presumably because you have to redo the non-affine part every
time.

The non-affine part is not computed on every pan and zoom -- that was one of the main design goals of the branch. (You can put a print statement in Log10Transform.transform to see when it gets called.) I can't feel a speed difference between the two, but...

Also, the old grid line bug on log plots seems to be back, as
evinced when you zoom from the "home" view.

...I should fix this bug first to have a fair comparison.

Fixing that bug actually had a net positive effect on the benchmarks overall... (More correct *and* faster? Never happens.)

I created a benchmark (log_fps.py, attached) that uses the middle plot of log_demo.py and moves the bounds, just like simple_plot_fps.py, and I get the following:

Trunk: 23.68 fps
Branch: 16.83 fps

So there's definitely a slow down there. The profiler shows that a huge chunk of the time is spent in numpy/core/ma.py, suggesting that masked arrays are the culprit. I think further quarantining of masked arrays will help -- for instance, a masked array is created whether or not there are any values <= 0.0.
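Here is a minimal sketch of the quarantining idea. It assumes the goal is simply to avoid numpy.ma when no value is <= 0.0: check the input first and build a masked array only when one is needed. The function name is hypothetical, and the code uses present-day numpy.ma rather than the 2007 numpy/core/ma.py module.

```python
import numpy as np


def safe_log10(values):
    """Return log10(values), creating a masked array only if some value is <= 0."""
    a = np.asarray(values, dtype=float)
    if (a > 0).all():
        return np.log10(a)                  # fast path: plain ndarray, no numpy.ma
    masked = np.ma.masked_less_equal(a, 0)  # mask the entries log10 can't handle
    return np.ma.log10(masked)
```

In the common case of all-positive data, the masked-array machinery is never touched, which is where the profiler found the time going.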

Any thoughts on this are welcome.

Cheers,
Mike

log_fps.py (652 Bytes)