I am getting very inconsistent timings while investigating plotting a line with a very large number of points. Axes.add_line() is very slow, and the time is spent in Axes._update_line_limits(). But when I simply run the latter on a Line2D of the same dimensions, it can be fast.

import matplotlib
matplotlib.use('template')
import numpy as np
import matplotlib.lines as mlines
import matplotlib.pyplot as plt

ax = plt.gca()
LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
from time import time
t = time(); ax.add_line(LL); time()-t
### 16.621543884277344

LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax.add_line(LL); time()-t
### 16.579419136047363
## We added two identical lines; each took about 16 seconds.

LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
### 0.1733548641204834
## But when we made another identical line, updating the limits was fast.

# Below are similar experiments:
LL = mlines.Line2D(np.arange(1.5e6), 2*np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
### 0.18362092971801758

## With a fresh axes:
plt.clf()
ax = plt.gca()
LL = mlines.Line2D(np.arange(1.5e6), 2*np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
### 0.22244811058044434

t = time(); ax.add_line(LL); time()-t
### 16.724560976028442

What is going on? I used print statements inside add_line() to verify that all the time is in _update_line_limits(), which runs one or two orders of magnitude slower when called inside add_line() than when called on its own, even if I run the preceding parts of add_line() first.
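For what it's worth, cProfile gives this kind of breakdown without print statements. Here is a minimal sketch using a stand-in function for the per-value scan (the names are hypothetical, not matplotlib's actual code):

```python
import cProfile
import io
import pstats

def find_converter(values):
    # Stand-in for a per-value scan like the one in units.get_converter():
    # walk the whole sequence looking for something unit-bearing.
    for v in values:
        if hasattr(v, "units"):
            return v
    return None

profiler = cProfile.Profile()
profiler.enable()
find_converter([float(i) for i in range(200_000)])
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(3)
profile_text = out.getvalue()
print(profile_text)
```

Sorting by cumulative time is what points the finger at the scan: the per-element loop dominates even though each individual hasattr() call is cheap.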

Eric

According to lsprofcalltree, the slowness appears to be entirely in the units code by a wide margin -- which is unfortunately code I understand very little about. The difference in timing before and after adding the line to the axes appears to be because the unit conversion is not invalidated until the line has been added to an axes.
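That invalidation story would explain the numbers. A toy model of the behavior (hypothetical names, an illustration only; the real Line2D/units machinery is more involved) might look like:

```python
class LineSketch:
    """Toy model of lazily cached conversion that is invalidated on add."""

    def __init__(self, data):
        self._data = data
        self._cache = None  # converted data, computed lazily

    def _recache(self):
        # Stand-in for the expensive unit-conversion pass that scans
        # every value in the data.
        self._cache = [float(v) for v in self._data]

    def update_limits(self):
        # Called on its own, this is fast once the cache is warm.
        if self._cache is None:
            self._recache()
        return min(self._cache), max(self._cache)

    def add_line(self):
        # Adding the line to an axes invalidates the conversion cache,
        # so the expensive pass runs again inside update_limits().
        self._cache = None
        return self.update_limits()

line = LineSketch([3, 1, 2])
print(line.add_line())       # pays the full recache cost
print(line.update_limits())  # cache is warm, so this is cheap
```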

In units.get_converter(), it iterates through every *value* in the data to see if any of them require unit conversion, and returns the first one it finds. It seems like if we're passing in a numpy array of numbers (i.e. not array of objects), then we're pretty much guaranteed from the get-go not to find a single value that requires unit conversion so we might as well not look. Am I making the wrong assumption?

However, for lists, since the code returns the first converter it finds, it seems it could just look at the first element of the sequence rather than the entire sequence. If the first element is not in the same unit as everything else, the result will be broken anyway. For example, if I hack evans_test.py to put a single int amongst the list of "Foo" objects in the data, I get an exception anyway, even as the code stands now.

I have attached a patch against units.py to speed up the first case (passing numpy arrays). I think I need more feedback from the units experts on whether my suggestion for lists (looking only at the first element) is reasonable.
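For reference, the shortcut is roughly the following. This is a sketch with hypothetical names, not the actual diff; `registry` is a plain dict standing in for matplotlib's units.Registry:

```python
def get_converter_sketch(x, registry):
    """Sketch of the proposed shortcut for looking up a unit converter.

    `registry` maps types to converter objects (a hypothetical stand-in
    for matplotlib's units.Registry.get_converter()).
    """
    # Fast path: a numpy array with a non-object dtype cannot contain
    # unitized values, so there is no point scanning every element.
    if hasattr(x, "dtype") and x.dtype.kind != "O":
        return registry.get(type(x))  # typically None for plain numbers
    # For lists and tuples, look only at the first element: if later
    # elements carry different units, the data is broken anyway.
    if isinstance(x, (list, tuple)) and len(x) > 0:
        return registry.get(type(x[0]))
    return registry.get(type(x))


class Foo:  # stand-in for a unit-bearing class, as in evans_test.py
    pass

registry = {Foo: "foo-converter"}
print(get_converter_sketch([Foo(), Foo()], registry))  # foo-converter
print(get_converter_sketch([1.0, 2.0], registry))      # None
```

The point of both branches is the same: the lookup cost becomes O(1) instead of O(n) in the length of the data.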

Feel free to commit the patch if it seems reasonable to those who know more about units than I do.

Mike

Eric Firing wrote:

units.py.patch (709 Bytes)



_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

I made this change -- return the converter from the first element -- and added Michael's non-object numpy array optimization too. The units code needs some attention; I just haven't been able to get to it...

This helps performance considerably -- on the backend driver:

Backend     Before      After
agg         1.32 min    1.08 min
ps          1.37 min    1.15 min
pdf         1.78 min    1.57 min
template    0.83 min    0.61 min
svg         1.53 min    1.31 min

Obviously, the results for tests focused on lines with lots of data would be more dramatic.

Thanks for these suggestions.

JDH


On Tue, Oct 7, 2008 at 9:18 AM, Michael Droettboom <mdroe@...31...> wrote:
