Options for speeding up matplotlib, spectrogram with log scale axis, etc.

Hi all,

A little background: I am from the space physics field where a lot of people watch/analyze satellite data for a living. This is a field currently dominated by IDL in terms of visualization/analysis software. I was a happy IDL user until I saw those very, very, I mean, seriously, very, very pretty matplotlib plots a couple of weeks ago. Although I was happy with IDL most of the time, I always hated the feel of IDL plots on screen.

So, I decided to make my move from IDL to python + numpy + scipy + matplotlib. However, this is not a trivial move. One major thing that made me stick with IDL in the first place is the Tplot package (bundled into the THEMIS Data Analysis Software, a.k.a. TDAS <http://themis.ssl.berkeley.edu/software.shtml>) developed at my own lab, the Space Sciences Lab at UC Berkeley. I must have something equivalent to Tplot to work efficiently on the python platform. To get there, there are two problems to solve. First, a utility module is required to load data that are in NASA CDF format. Second, a 2D plotting application is required with the following features: 1) able to handle large amounts of vector data, 2) able to display spectrograms with a log scale axis quickly, and 3) a convenient toolbar to navigate the data.

I have written a module in cython that can quickly load data from CDF files, with help from the cython and numpy communities. I have also gotten the third plotting feature working with a customized navigation toolbar, thanks to the help I received on this mailing list. However, I haven’t figured out how to get the first two plotting features. Matplotlib is known for its slow speed when it comes to large data sets. However, it seems some other packages can plot large data sets very fast, although not as prettily as matplotlib. So, I am wondering what makes matplotlib so slow. Is it because of the anti-aliasing engine? If so, is it possible to turn it on or off flexibly to trade off performance against quality? Also, is it possible to convert the bottleneck parts of the code into cython to speed up matplotlib? As for spectrograms with a log scale axis, I found a working solution on Stack Overflow <http://stackoverflow.com/questions/10812189/creating-a-log-frequency-axis-spectrogram-using-specgram-in-matplotlib>, but it is simply too slow. So, again, why is it so slow?

So, for my purposes, my real problem now is the slow speed of matplotlib. I tried other packages, such as pyqtgraph, PyQwt, and Chaco/Traits. They seem to be faster, but they have serious problems too. Pyqtgraph seems very promising, but it seems to be in its infancy for now, with serious bugs. For example, I can’t get it working together with matplotlib. PyQwt/guiqwt is reasonably robust, but it has too many dependencies in my opinion, and doesn’t seem to have a wide user base. Chaco/Traits seems to be another viable possibility, especially considering that it is actually supported by a company, but I didn’t get a chance to see its performance and quality because I can’t install Enable, a necessary piece of Chaco, on my Mac. (But the fact that Chaco/Traits is supported by a real company is a real plus to me. If I can’t eventually speed up matplotlib, I will probably give it another shot.)

I have one idea to speed up line plots in matplotlib on screen, which is to down-sample the data before plotting. The idea is to down-sample the data to a level where each pixel corresponds to only one data point. Obviously, one must have enough information to determine the mapping between the data and the pixels on screen. However, that overhead is just some house-keeping information, which I suppose is minimal.
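To make the down-sampling idea concrete, here is a minimal sketch, not a tested solution. It is a variation on the idea above: instead of one point per pixel it keeps two, the minimum and maximum in each pixel-wide bin, so that narrow spikes stay visible. The helper name `downsample_minmax` and the `n_pixels` parameter are hypothetical, and the sketch assumes the data are sorted and evenly sampled in x.

```python
import numpy as np


def downsample_minmax(x, y, n_pixels):
    """Reduce (x, y) to 2 * n_pixels points: the min and max of y per bin.

    Hypothetical helper for illustration; assumes x is sorted and evenly
    sampled, and that n_pixels is the plot width in screen pixels.
    """
    n = len(y)
    bin_size = n // n_pixels
    if bin_size < 2:
        return x, y  # already at or below screen resolution
    n_used = bin_size * n_pixels  # drop the ragged tail for simplicity
    yb = y[:n_used].reshape(n_pixels, bin_size)
    xb = x[:n_used].reshape(n_pixels, bin_size)
    # Interleave the per-bin minimum and maximum so narrow spikes survive.
    y_out = np.column_stack((yb.min(axis=1), yb.max(axis=1))).ravel()
    # Use the first x of each bin for both points; they share a pixel column.
    x_out = np.repeat(xb[:, 0], 2)
    return x_out, y_out


# Fake data: one million samples with a single narrow spike.
x = np.linspace(0.0, 1.0, 1_000_000)
y = np.sin(2 * np.pi * 5 * x)
y[123_456] = 10.0
xd, yd = downsample_minmax(x, y, n_pixels=800)
```

The reduced arrays `xd` and `yd` can then be passed to the usual plot call. When the user zooms or pans, the same reduction would have to be redone on the visible slice of the data, which is the house-keeping mentioned above.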

I have no idea how to speed up the log-scale spectrogram plot at the moment. :frowning:

So, the bottom line: What are the options to speed up matplotlib? Your comments and insights are very much appreciated. :slight_smile:

Thank you for reading.

Cheers,

Jianbao

Hi Jianbao,

One option for getting Chaco is to install the Enthought python distribution

http://www.enthought.com/

As you can see from their package index, they install Chaco (and all the libraries needed to make it work)

http://www.enthought.com/products/epdlibraries.php

If you have an email ending in ".edu" you can automatically get their academic version (fully functioning version - you just have to verify you are doing academic research). Since you mentioned you were at UC Berkeley, I assume you have .edu.
Their python installation works nicely, and installs itself in /Library/Frameworks/Python.framework/ so it plays nicely with the Mac GUI environment. Also, it will not overwrite any other installation you have - it makes its own install dir.

UNFORTUNATELY - at the moment, it appears they are rewriting their academic software licenses, so you cannot download it right now. But their message promises it will soon be available again.

I have found the Enthought installation to be MUCH more reliable than FINK or MacPorts (Enthought is also a private company - hence the quality installers etc, and they like to support academic work).

Cheers,

Andre

(In reply to Jianbao Tao's message of Oct 8, 2012, 10:55 AM.)

Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

For each type of plot, I suggest you provide a very minimal script, generating its own fake data, that illustrates the problem and that can serve as a benchmark and test jig for speed-ups. Without these examples, it is somewhere between difficult and impossible for anyone to make useful suggestions.
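As a rough illustration of the kind of script meant here, the following is a minimal sketch of a line-plot benchmark that generates its own fake data. The function name `time_line_plot` and the chosen sizes are assumptions for illustration. It uses the off-screen Agg backend so that the timing covers rendering only; interactive performance in a GUI window may differ.

```python
import time

import matplotlib
matplotlib.use("Agg")  # off-screen backend: times rendering, not the GUI
import matplotlib.pyplot as plt
import numpy as np


def time_line_plot(n_points):
    """Return the seconds taken to render one line with n_points samples."""
    rng = np.random.default_rng(0)
    t = np.arange(n_points)
    y = rng.standard_normal(n_points).cumsum()  # fake random-walk signal
    fig, ax = plt.subplots()
    ax.plot(t, y)
    start = time.perf_counter()
    fig.canvas.draw()  # force the actual rendering
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


timings = {n: time_line_plot(n) for n in (10_000, 100_000, 1_000_000)}
for n, seconds in timings.items():
    print(f"{n:>9d} points: {seconds:.3f} s")
```

Running the same script before and after a change (different backend, down-sampled input, different rc settings) gives a simple way to compare speed-ups.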

Eric

(In reply to Jianbao Tao's message of 2012/10/08, 7:55 AM.)

Hi Andre,

Thanks for your message. I like it. :slight_smile:

I do have a .edu email. I didn’t try to install Chaco with EPD because I tend to be skeptical of bundled packages that come with a lot of stuff. I like things to be as simple as possible. But it seems that I am probably better off installing EPD as a whole.

Cheers,

Jianbao

(In reply to the message from Andre’ Walker-Loud <walksloud@…287…> of Mon, Oct 8, 2012, 11:17 AM.)

Hi Jianbao,

I used to try and install my python suite from src code on my own.
Somewhere between Mac OS 10.5, 10.6, and migrating accounts, my python installation broke, and I never could get it all working again. Something in 10.6 didn't have full backwards compatibility because of the switch to 64-bit architecture, so my binaries stopped working... many long, frustrating days trying to figure it out. I eventually went to a friend of mine who does computing support for an astrophysics group to get help solving my installation problems. He said, "Do you know about the Enthought python distribution?"

So that changed my philosophy. If my computer-wiz-friend uses Enthought, I have no excuse not to.
I have been happier ever since :slight_smile:

Also - I have recently come to love HDF5 (Hierarchical Data Format, version 5), which is a smart binary database with smart metadata mapping (maybe good for your research). E.g., on the big machines at NERSC, Livermore, Argonne, etc. (meaning next-generation supercomputers), HDF5 is one of the pieces of software they use to benchmark the performance of their file systems and to make sure the code scales to these new architectures. HDF5 is also professionally maintained. And the Enthought distribution comes with HDF5 and two python interfaces to it. From your description, I thought maybe you guys already use this. And if not, maybe it is worth looking into.

Cheers,

Andre

(In reply to Jianbao Tao's message of Oct 8, 2012, 11:32 AM.)

Hi Eric,

Not sure if this is exactly what Jianbao is referring to, but:

import numpy as np
import matplotlib.pyplot as plt

ax = plt.gca()
X = np.random.randn(300, 300)  # fake 300 x 300 data
plt.pcolor(X)
ax.set_yscale('log')
plt.ylim(1, 100)

This brings my computer to a standstill for over a minute. Removing the "set_yscale" command speeds things up a lot.

Same thing takes about 2 s in Matlab.
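One variation that may be worth timing against the script above is `pcolormesh`, which the matplotlib documentation describes as much faster than `pcolor` for large arrays. The sketch below is an assumption-laden rewrite, not a confirmed fix: it places the rows on explicit, strictly positive, log-spaced y edges (an arbitrary choice made here so that the log axis never sees y = 0), which is not the same layout as `pcolor(X)` with its default integer coordinates. Whether it removes the slowdown described above has not been measured here.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 300))  # fake 300 x 300 data

# Cell edges: 301 edges bound 300 cells along each axis. The y edges are
# strictly positive and log-spaced (assumed layout, chosen for illustration).
x_edges = np.arange(301)
y_edges = np.logspace(0, 2, 301)  # from 1 to 100

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, X)
ax.set_yscale("log")
ax.set_ylim(1, 100)
fig.canvas.draw()  # force rendering
```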

Cheers, Jody

···

On Oct 8, 2012, at 11:29 AM, Eric Firing <efiring@...202...> wrote:

On 2012/10/08 7:55 AM, Jianbao Tao wrote:

Hi all,

A little background: I am from the space physics field where a lot of
people watch/analyze satellite data for a living. This is a field
currently dominated by IDL in terms of visualization/analysis software.
I was a happy IDL user until I saw those very, very, I mean, seriously,
very, very pretty matplotlib plots a couple of weeks ago. Although I was
happy with IDL most of the time, I always hated the feel of IDL plots on
screen.

So, I decided to make my move from IDL to python + numpy + scipy +
matplotlib. However, this is not a trivial move. One major thing that
makes me stick to IDL in the first place is the Tplot package (bundled
into THEMIS Data Analysis Software, a.k.a.,TDAS
<http://themis.ssl.berkeley.edu/software.shtml>) developed at my own
lab, the Space Sciences Lab at UC Berkeley. I must have something
equivalent to Tplot to work efficiently on the python platform. In order
to do that, there are two problems to solve. First, a utility module is
required to load data that are in NASA CDF format. Second, a 2D plotting
application is required with the following features: 1) Able to handle
large amount vector data, 2) able to display spectrogram with log scale
axis quickly, and 3) convenient toolbar to navigate the data.

I have written a module that can quickly load data in CDF files in
cython, with help from the cython and the numpy communities. I have also
gotten the third plotting feature working with a customized navigation
toolbar, thanks to the help I received in this mailing list. However, I
haven't figured out how to get the first two plotting features.
Matplotlib is known for its slow speed when it comes to large data sets.
However, it seems some other packages can plot large data sets very
fast, although not as pretty as matplotlib. So, I am wondering what
makes matplotlib so slow. Is it because the anti-aliasing engine? If so,
is it possible to turn it on or off flexibly to compromise between
performance and quality? Also, is it possible to convert the bottle-neck
bit of the code into cython to speed up matplotlib? As for spectrograms
with log scale axis, I found a working solution fromStack Overflow
<http://stackoverflow.com/questions/10812189/creating-a-log-frequency-axis-spectrogram-using-specgram-in-matplotlib>,
but it is simply too slow. So, again, why is it so slow?

So, for my purposes, my real problem now is the slow speed of
matplotlib. I tried other packages, such as pyqtgraph, pyqwt, and
Chaco/Traits. They seem to be faster, but they have serious problems
too. Pyqtgraph seems very promising, but it seems to be in an infant
stage for now with serious bugs. For example, I can't get it working
together with matplotlib. PyQwt/guiqwt is reasonably robust, but it has
too many dependencies in my opinion, and doesn't seem to have a wide
user base. Chaco/Traits seems another viable possibility, especially
considering the fact that it is actually supported by a company, but I
didn't get a chance to see their performance and quality because I can't
install Enable, a necessary bit for Chaco, on my mac. (But the fact that
Chaco/Traits is supported by a real company is a real plus to me. If I
can't eventually speed up matplotlib, I will probably give it another shot.)

I have one idea to speed up on-screen line plots in matplotlib:
down-sample the data before plotting, to a level where each pixel
corresponds to only one data point. Obviously, one must have enough
information to determine the mapping between the data and the pixels on
screen. However, that overhead is just a matter of maintaining some
house-keeping information, which I suppose is minimal.
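
The down-sampling idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name downsample_minmax and its n_pixels argument are hypothetical, the samples are assumed to be ordered in x, and instead of keeping exactly one point per pixel this variant keeps the minimum and maximum of each pixel-wide bin so that narrow spikes remain visible. That variation is an assumption of this sketch, not part of the proposal as stated.

```python
import numpy as np


def downsample_minmax(x, y, n_pixels):
    """Reduce (x, y) to at most 2 * n_pixels points.

    Hypothetical helper: splits the samples into n_pixels consecutive
    bins and keeps the samples holding the min and max y of each bin.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(y)
    if n <= 2 * n_pixels:
        return x, y  # already small enough, nothing to do

    bin_size = n // n_pixels
    usable = bin_size * n_pixels  # samples beyond this ragged tail are dropped
    y_bins = y[:usable].reshape(n_pixels, bin_size)

    # Index of the first sample of each bin in the original array.
    offsets = np.arange(n_pixels) * bin_size
    i_min = y_bins.argmin(axis=1) + offsets
    i_max = y_bins.argmax(axis=1) + offsets

    # np.unique sorts the indices and removes duplicates, so the
    # kept samples stay in their original order.
    idx = np.unique(np.concatenate([i_min, i_max]))
    return x[idx], y[idx]


# Fake data: one million samples with a single-sample spike.
t = np.linspace(0.0, 1.0, 1_000_000)
sig = np.sin(2 * np.pi * 5 * t)
sig[123_456] = 10.0

t_small, sig_small = downsample_minmax(t, sig, n_pixels=800)
```

In an interactive plot, n_pixels might be taken from the width of the axes in pixels (for example something like int(ax.bbox.width)), and the reduction would have to be redone for the visible x-range after each zoom or pan; that wiring is left out here.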

I have no idea how to speed up the log-scale spectrogram plot at the
moment. :frowning:

For each type of plot, I suggest you provide a very minimal script,
generating its own fake data, that illustrates the problem and that can
serve as a benchmark and test jig for speed-ups. Without these
examples, it is somewhere between difficult and impossible for anyone to
make useful suggestions.

Eric

So, the bottom line: What are the options to speed up matplotlib? Your
comments and insights are very much appreciated. :slight_smile:

Thank you for reading.

Cheers,
Jianbao

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

--
Jody Klymak
http://web.uvic.ca/~jklymak/

Hi Eric,

Not sure if this is exactly what Jianbao is referring to, but:

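from pylab import *   # provides gca, randn, pcolor, ylim

ax = gca()
X = randn(300, 300)   # 300 x 300 array of fake data
pcolor(X)             # slow: builds one polygon per cell
ax.set_yscale('log')
ylim(1, 100)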

Brings my computer to a standstill for over a minute. Removing the "set_yscale" command speeds things up a lot.

Same thing takes about 2 s in Matlab.

That's why I wanted to see an example of the problem. If you change "pcolor" above to "pcolormesh" you will see a big speedup. (0.008 s for the three plotting commands on my machine.)

pcolor is fundamentally slow, as you discovered, and should be used only for small arrays and/or when, for some reason, one really needs the Collection that it returns. The pcolormesh docstring notes that it is much faster than pcolor; the pcolor docstring probably should refer people to pcolormesh, since matlab users are likely to go straight to pcolor without realizing that they should be using pcolormesh. (For linear axes, the pcolorfast axes method can be even faster, but it does not support log axes at present.)
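
For reference, the pcolormesh version of the test might look like the sketch below. It assumes a reasonably recent matplotlib and NumPy; the Agg backend, the seeded random generator, and the explicit edge arrays x_edges and y_edges are choices made for this illustration (the y edges start at 1 so that every cell is valid on a log axis), not part of the original report.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 300))  # fake data, 300 x 300 cells

# 301 cell edges for 300 cells along each axis.
x_edges = np.arange(301)
y_edges = np.logspace(0, 2, 301)     # 1 ... 100, strictly positive

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, X)  # returns a single QuadMesh
ax.set_yscale("log")
ax.set_ylim(1, 100)

fig.canvas.draw()  # force a full render, e.g. when timing the plot
```

Swapping ax.pcolormesh for ax.pcolor in the same script gives a like-for-like comparison that can be timed on any machine.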

Eric

···

On 2012/10/08 6:21 PM, Jody Klymak wrote:

Cheers, Jody

On Oct 8, 2012, at 11:29 AM, Eric Firing <efiring@...202...> wrote:

On 2012/10/08 7:55 AM, Jianbao Tao wrote:

Hi Eric,

The pcolormesh docstring notes that it is
much faster than pcolor; the pcolor docstring probably should refer
people to pcolormesh, since matlab users are likely to go straight to
pcolor without realizing that they should be using pcolormesh.

I'd agree with this. pcolormesh is not even in the "See Also", and there is no warning about the efficiency of pcolor.

I'd even go so far as to suggest that pcolor be deprecated so new users are more likely to find pcolormesh.

Anyway, thanks for the pointer!

Cheers, Jody

···

--
Jody Klymak
http://web.uvic.ca/~jklymak/

While we're on this subject, pcolorfast doesn't have a pyplot function and its documentation refers to pcolor rather than itself. Also pcolor and pcolormesh ought to See Also this (or is it still experimental?)

M

pcolorfast is a bit odd in that it uses one of three different mechanisms depending on its inputs, and it is not quite as flexible about input dimensions as pcolormesh, so the lack of a pyplot function for it is deliberate.
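
A small sketch of how the input form selects the mechanism, assuming a recent matplotlib; the array sizes, the edge arrays, and the variable names are made up for this illustration. The three calls differ only in how the grid is described.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import QuadMesh
from matplotlib.image import AxesImage

C = np.random.default_rng(0).standard_normal((50, 80))  # shape (ny, nx)

fig, axes = plt.subplots(1, 3)

# 1) Only the x and y ranges are given: uniform grid, drawn as an image.
im_uniform = axes[0].pcolorfast((0, 80), (0, 50), C)

# 2) 1-D edge arrays with uneven spacing: rectangular, non-uniform grid.
x_edges = np.linspace(0, 80, 81)              # nx + 1 edges
y_edges = 50 * np.linspace(0, 1, 51) ** 2     # ny + 1 unevenly spaced edges
im_nonuniform = axes[1].pcolorfast(x_edges, y_edges, C)

# 3) 2-D edge arrays: general quadrilateral mesh.
Xe, Ye = np.meshgrid(x_edges, y_edges)        # each of shape (ny + 1, nx + 1)
im_quad = axes[2].pcolorfast(Xe, Ye, C)

# The returned artist type shows which mechanism was used.
kinds = [type(a).__name__ for a in (im_uniform, im_nonuniform, im_quad)]
```

Because the return type depends on the inputs, code that post-processes the result has to handle all three artist types.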

Thanks for pointing out the references to pcolor in the docstring--I don't know how that has passed apparently unnoticed so long!

Eric

···

On 2012/10/09 4:46 PM, Mike Kaufman wrote:

On 10/9/12 10:03 PM, Jody Klymak wrote:

M
