performance (speed) of logarithmic plots

_Andrew_Hawryluk · March 18, 2010, 11:21pm

I've observed a significant difference in the time required by different
plotting functions. With a plot of 5000 random data points (all
positive, non-zero), plt.semilogx takes 3.5 times as long as plt.plot.
(These numbers are for saving to PDF; for PNG the ratio is about 3.1
on my machine.)

I used cProfile (script attached) and found several significant
differences between the profiles of the plotting commands. On first
analysis, most of the difference appears to come from increased use
of mathtext in semilogx:

semilogPerformance.py (285 Bytes)


                                            Plotting command
                                ========================================
cumtime (s)                           plot  semilogx  semilogy    loglog
total running time                   0.618     2.192     0.953     1.362
axis.py:181(draw)                    0.118     1.500     0.412     0.569
text.py:504(draw)                    0.056     1.353     0.290     0.287
mathtext.py:2765(__init__)           0.000     1.018     0.104     0.103
mathtext.py:2772(parse)                ---     1.294     0.143     0.254
pyparsing.py:1018(parseString)         ---     0.215     0.216     0.221
pyparsing.py:3129(oneOf)               ---     0.991       ---       ---
pyparsing.py:3147(<lambda>)            ---     0.358       ---       ---
lines.py:918(_draw_solid)            0.243     0.358     0.234     0.352

It seems that semilogx could be made as fast as semilogy since they have
to do the same amount of work, but I'm not sure where the differences
lie. Can anyone suggest where I should look first?

Much thanks,

Andrew Hawryluk

matplotlib.__version__ = '0.99.1'
Windows XP Professional
Version 2002, Service Pack 3
Intel Pentium 4 CPU 3.00 GHz, 2.99 GHz, 0.99 GB of RAM

Gokhan_SEVER · March 19, 2010, 3:39pm

Hello,

How did you get the cumtime listing? The output of the run doesn’t produce a cumulative sum table as you showed here.


================================================================================

Platform : Linux-2.6.31.9-174.fc12.i686.PAE-i686-with-fedora-12-Constantine
Python : (‘CPython’, ‘tags/r262’, ‘71600’)
NumPy : 1.5.0.dev8038
Matplotlib : 1.0.svn

--
Gökhan

_Andrew_Hawryluk · March 19, 2010, 4:25pm

> How did you get the cumtime listing? The output of the run doesn't produce a
> cumulative sum table as you showed here.

No, it doesn't. The output of the run is four huge cProfile listings,
one for each plotting command tested. I manually searched the data for
functions with long cumtimes that differed between the plots and typed
the table myself.
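The comparison table could also be built automatically instead of by hand. Below is a minimal standard-library sketch; `cumtimes` and `format_table` are hypothetical helpers, not part of matplotlib. They read `pstats.Stats.stats`, an attribute CPython's `pstats` uses internally rather than a documented API, so treat that as an assumption. Each case is profiled once, and every function whose `file:line(name)` label contains one of the given patterns gets a row, with `---` where that function never ran for a case, as in the table above.

```python
import cProfile
import os
import pstats


def cumtimes(func, patterns):
    """Profile one call of func() and return {label: cumtime in seconds} for
    every profiled function whose "file:line(name)" label contains a pattern."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    found = {}
    # Assumption: stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, lineno, name), (_, _, _, cumtime, _) in stats.stats.items():
        label = f"{os.path.basename(filename)}:{lineno}({name})"
        if any(p in label for p in patterns):
            found[label] = cumtime
    return found


def format_table(cases, patterns, width=40):
    """Build a plain-text cumtime table with one column per case.

    cases maps a column name to a zero-argument callable; "---" marks a
    matching function that never ran for that case.
    """
    columns = {name: cumtimes(fn, patterns) for name, fn in cases.items()}
    labels = sorted(set().union(*columns.values()))
    lines = ["cumtime (s)".ljust(width) + "".join(n.rjust(10) for n in columns)]
    for label in labels:
        cells = "".join(
            f"{col[label]:10.3f}" if label in col else "---".rjust(10)
            for col in columns.values()
        )
        lines.append(label.ljust(width) + cells)
    return "\n".join(lines)
```

For the plots in this thread, `cases` might map `"semilogx"` to `lambda: (plt.semilogx(x, y), plt.savefig("test.pdf"), plt.close("all"))`, with `patterns` such as `["mathtext", "pyparsing", "draw"]`. Because every case is profiled in the same process, anything cached during an earlier case (parsed mathtext, for instance) is reused by later ones.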

I have also confirmed the speed differences on matplotlib 0.99.0 under
Ubuntu 9.10:

plot 0.629 CPU seconds
semilogx 3.430 CPU seconds
semilogy 1.044 CPU seconds
loglog 1.479 CPU seconds

I'll try to figure out why semilogx uses so much more mathtext than
semilogy, but if anyone familiar with the code is curious enough to
look into it they will probably beat me to the answer.

Andrew

Michael_Droettboom · March 19, 2010, 4:39pm

This is indeed a very interesting result and I am able to reproduce similar ratios for total running time.

However, I think the semilogx result is somewhat of a red herring. If you change the order of the tests in your script, you'll notice that the first "*log*" plot always takes the longest run time. If you run each test in a separate process, all of the "*log*" run times are approximately equal (with loglog being slightly slower). The reason for this is the caching of mathtext expressions. I agree that mathtext is the bottleneck -- but mathtext expressions are only parsed and rendered the first time they are encountered, and simply pulled from a cache after that.
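A minimal sketch of running each test in its own process, using only the standard library. `time_in_fresh_process` is a hypothetical helper, not part of matplotlib: it joins the setup code and the statement into a small script, runs it with `sys.executable -c`, and reads back the elapsed wall-clock time from `time.perf_counter` (wall-clock, not the CPU seconds quoted earlier in the thread).

```python
import subprocess
import sys
import textwrap


def time_in_fresh_process(setup, stmt):
    """Run `setup`, then time `stmt`, in a brand-new Python interpreter.

    A fresh process starts with empty module-level caches, so nothing cached
    by an earlier test can be reused by this one.
    """
    script = "\n".join([
        "import time",
        textwrap.dedent(setup),
        "_t0 = time.perf_counter()",
        textwrap.dedent(stmt),
        "print(time.perf_counter() - _t0)",
    ])
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,  # raise if the script fails
    )
    # The last line of stdout is the elapsed time printed by the script.
    return float(result.stdout.strip().splitlines()[-1])
```

For the comparison here, `setup` could import matplotlib with the Agg backend and build `x = np.arange(1, 5001)` and a strictly positive `y`, and `stmt` could be `plt.semilogx(x, y); plt.savefig('test.pdf')`, repeated once per plotting command. The clock starts after `setup` has run, so interpreter start-up and imports are not counted.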

It's sort of a "known issue" that mathtext is slow-ish. It's a very function-call heavy and object-oriented bit of code and most attempts at optimization seem to lead to too much uglification. The algorithms themselves are from TeX, so I don't know if there's much room for improvement, but there is something about the translation from Pascal/C to Python that creates a very different performance profile.

An interesting experiment would be to disable the mathtext rendering for log plots (by setting the axis formatters to something static) and compare those numbers. That would give a better sense of the overhead of merely log-transforming the points and of the transformation system itself. I don't think a factor of 2 is too problematic, given all of the extra work that has to be done to maintain two copies of the data, the extra care needed to calculate xlim and ylim, etc.
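One way to try this is sketched below, assuming a current matplotlib and NumPy rather than the 0.99 release discussed here. `time_log_plot` is a hypothetical helper. With `static_labels=True` it replaces the default tick formatters (on log axes these render labels such as 10^2 through mathtext) with a plain `FormatStrFormatter("%g")` for major ticks and a `NullFormatter` for minor ticks. The swap is applied to both axes, so the plain `plot` case gets the same treatment.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to memory, no window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

KINDS = ("plot", "semilogx", "semilogy", "loglog")


def time_log_plot(kind, static_labels, n=5000, seed=0):
    """Time one plot call plus an in-memory PNG save.

    With static_labels=True the tick formatters are swapped for plain-text
    ones after plotting, so no tick label needs mathtext.
    """
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1)       # strictly positive, valid on a log x-axis
    y = rng.random(n) + 0.01      # strictly positive, valid on a log y-axis
    fig, ax = plt.subplots()
    t0 = time.perf_counter()
    getattr(ax, kind)(x, y)       # e.g. ax.semilogx(x, y)
    if static_labels:
        # Set after plotting: the semilog/loglog calls install their own
        # log formatters, which these assignments then override.
        for axis in (ax.xaxis, ax.yaxis):
            axis.set_major_formatter(ticker.FormatStrFormatter("%g"))
            axis.set_minor_formatter(ticker.NullFormatter())
    fig.savefig(io.BytesIO(), format="png")  # tick labels are drawn here
    elapsed = time.perf_counter() - t0
    plt.close(fig)
    return elapsed
```

Given the mathtext cache, the first default-formatter log plot in a process pays the parsing cost, so each with/without pair is best timed in a fresh process or repeated a few times. Absolute timings will differ by machine; the comparison is what matters.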

Mike

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
