I am getting very inconsistent timings when looking into plotting a line with a very large number of points. Axes.add_line() is very slow, and the time is taken by Axes._update_line_limits(). But when I simply run the latter, on a Line2D of the same dimensions, it can be fast.

import matplotlib
matplotlib.use('template')
import numpy as np
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
ax = plt.gca()
LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
from time import time
t = time(); ax.add_line(LL); time()-t
###16.621543884277344
LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax.add_line(LL); time()-t
###16.579419136047363
## We added two identical lines, each took 16 seconds.

LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.1733548641204834
## But when we made another identical line, updating the limits was
## fast.

# Below are similar experiments:
LL = mlines.Line2D(np.arange(1.5e6), 2*np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.18362092971801758

## with a fresh axes:
plt.clf()
ax = plt.gca()
LL = mlines.Line2D(np.arange(1.5e6), 2*np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.22244811058044434

t = time(); ax.add_line(LL); time()-t
###16.724560976028442

What is going on? I used print statements inside add_line() to verify that all the time is in _update_line_limits(), which runs one or two orders of magnitude slower when run inside of add_line than when run outside--even if I run the preceding parts of add_line first.
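
The one-line timing pattern used in the session above can be wrapped in a small helper so that every measurement is taken the same way. This is only a sketch: `timed` is a hypothetical name, and the workload here is a stand-in (`sum` over a range) so that the snippet runs without matplotlib.

```python
from time import perf_counter


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = perf_counter()
    result = func(*args, **kwargs)
    return result, perf_counter() - start


# Stand-in workload; any callable can be measured the same way.
total, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.4f} s")
```

In the session above, the same helper could be called as timed(ax.add_line, LL) and timed(ax._update_line_limits, LL) to compare the two paths directly.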

Eric

According to lsprofcalltree, the slowness appears to be entirely in the units code by a wide margin -- which is unfortunately code I understand very little about. The difference in timing before and after adding the line to the axes appears to be because the unit conversion is not invalidated until the line has been added to an axes.
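
For anyone who wants to repeat this kind of measurement, here is a minimal sketch using the standard-library cProfile and pstats modules rather than lsprofcalltree. The two functions are hypothetical stand-ins for a caller and a per-element scan; they are not matplotlib code.

```python
import cProfile
import io
import pstats


def scan_every_value(values):
    # Hypothetical stand-in for a check that visits every element.
    for v in values:
        if hasattr(v, "unit"):
            return v
    return None


def add_line_like(values):
    # Hypothetical caller whose time is dominated by the scan above.
    return scan_every_value(values)


profiler = cProfile.Profile()
profiler.enable()
add_line_like(list(range(200_000)))
profiler.disable()

# Sort by cumulative time so the dominant call chain is listed first.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists the scan function near the top, which is the same kind of evidence that pointed at the units code here.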

In units.get_converter(), it iterates through every *value* in the data to see if any of them require unit conversion, and returns the first one it finds. It seems like if we're passing in a numpy array of numbers (i.e. not array of objects), then we're pretty much guaranteed from the get-go not to find a single value that requires unit conversion so we might as well not look. Am I making the wrong assumption?
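
To make the idea concrete, here is a rough sketch of the short-circuit. It is not the attached patch and not the real units.get_converter(): `Meters`, `CONVERTERS`, and `find_converter` are hypothetical names, and the registry is reduced to a plain dict keyed by type.

```python
import numpy as np


class Meters:
    """Hypothetical unit-carrying value, for illustration only."""

    def __init__(self, value):
        self.value = value


# Hypothetical registry mapping a value type to its converter.
CONVERTERS = {Meters: "meters-converter"}


def find_converter(data):
    """Return a converter for data, or None if no conversion is needed."""
    # A non-object numpy array holds plain numbers only, so no element
    # can carry units and the per-element scan can be skipped entirely.
    if isinstance(data, np.ndarray) and data.dtype.kind != "O":
        return None
    for value in data:
        converter = CONVERTERS.get(type(value))
        if converter is not None:
            return converter
    return None


numeric = np.arange(1.5e6)
tagged = np.array([Meters(1.0), Meters(2.0)], dtype=object)
print(find_converter(numeric))  # returns immediately, no scan
print(find_converter(tagged))   # object array, so elements are inspected
```

The dtype check costs the same regardless of array length, whereas the loop touches every element when no converter is found.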

However, for lists, it also seems that, since the code returns the first converter it finds, maybe it could just look at the first element of the sequence, rather than the entire sequence. If the first is not in the same unit as everything else, then the result will be broken anyway. For example, if I hack evans_test.py to contain a single int amongst the list of "Foo" objects in the data, I get an exception anyway, even as the code stands now.
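
A sketch of the "first element only" lookup for lists, under the same caveats: `Foo` here is a hypothetical stand-in for a unit-carrying class, and `CONVERTERS` and `converter_from_first` are made-up names, not the real units API.

```python
class Foo:
    """Hypothetical unit-carrying value, for illustration only."""

    def __init__(self, value):
        self.value = value


# Hypothetical registry mapping a value type to its converter.
CONVERTERS = {Foo: "foo-converter"}


def converter_from_first(data):
    """Return the converter registered for the first element, or None."""
    for value in data:
        # Decide from the first element alone, then stop.
        return CONVERTERS.get(type(value))
    return None  # empty sequence


print(converter_from_first([Foo(1), Foo(2)]))
print(converter_from_first([1, Foo(2)]))
print(converter_from_first([]))
```

With this approach a list whose first element is a plain int gets no converter even if later elements are Foo objects, which is the mixed-data case described above as broken either way.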

I have attached a patch against units.py to speed up the first case (passing Numpy arrays). I think I need more feedback from the units experts on whether my suggestion for lists (to only look at the first element) is reasonable.

Feel free to commit the patch if it seems reasonable to those who know more about units than I do.

Mike

units.py.patch (709 Bytes)

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

I made this change -- return the converter from the first element --
and added Michael's non-object numpy array optimization too. The
units code needs some attention, I just haven't been able to get to
it...

This helps performance considerably -- on backend driver:

Before:
Backend agg took 1.32 minutes to complete
Backend ps took 1.37 minutes to complete
Backend pdf took 1.78 minutes to complete
Backend template took 0.83 minutes to complete
Backend svg took 1.53 minutes to complete

After:
Backend agg took 1.08 minutes to complete
Backend ps took 1.15 minutes to complete
Backend pdf took 1.57 minutes to complete
Backend template took 0.61 minutes to complete
Backend svg took 1.31 minutes to complete

Obviously, the results for tests focused on lines with lots of data
would be more dramatic.
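
For reference, the relative savings can be worked out from the numbers above with a few lines of Python. This is plain arithmetic on the reported times; the variable names are only for illustration.

```python
# Backend driver times in minutes, copied from the lists above.
before = {"agg": 1.32, "ps": 1.37, "pdf": 1.78, "template": 0.83, "svg": 1.53}
after = {"agg": 1.08, "ps": 1.15, "pdf": 1.57, "template": 0.61, "svg": 1.31}

# Percentage reduction in run time for each backend.
reduction = {
    name: 100.0 * (before[name] - after[name]) / before[name]
    for name in before
}

for name, pct in reduction.items():
    print(f"{name:>8}: {before[name]:.2f} -> {after[name]:.2f} min "
          f"({pct:.1f}% less time)")
```

The reductions range from roughly 12 percent (pdf) to roughly 27 percent (template).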

Thanks for these suggestions.
JDH

On Tue, Oct 7, 2008 at 9:18 AM, Michael Droettboom <mdroe@...31...> wrote:
