Hi all,
A little background: I am from the space physics field, where a lot of people watch and analyze satellite data for a living. The field is currently dominated by IDL for visualization and analysis. I was a happy IDL user until I saw those very, very, I mean, seriously, very, very pretty matplotlib plots a couple of weeks ago. Although I was happy with IDL most of the time, I always hated the feel of IDL plots on screen.
So, I decided to move from IDL to Python + NumPy + SciPy + matplotlib. However, this is not a trivial move. One major thing that has kept me on IDL is the Tplot package (bundled into the THEMIS Data Analysis Software, a.k.a. TDAS), developed at my own lab, the Space Sciences Lab at UC Berkeley. I need something equivalent to Tplot to work efficiently in Python, and that means solving two problems. First, I need a utility module to load data stored in NASA's CDF format. Second, I need a 2D plotting application with the following features: 1) able to handle large amounts of vector data, 2) able to display spectrograms with log-scale axes quickly, and 3) a convenient toolbar to navigate the data.
With help from the Cython and NumPy communities, I have written a Cython module that quickly loads data from CDF files. I have also gotten the third plotting feature working with a customized navigation toolbar, thanks to the help I received on this mailing list. However, I haven’t figured out how to get the first two features. Matplotlib is known to be slow with large data sets, yet some other packages can plot large data sets very fast, although not as prettily as matplotlib. So, I am wondering what makes matplotlib so slow. Is it the anti-aliasing engine? If so, is it possible to turn anti-aliasing on or off flexibly to trade quality against performance? Also, is it possible to rewrite the bottleneck parts of the code in Cython to speed up matplotlib? As for spectrograms with a log-scale axis, I found a working solution on Stack Overflow, but it is simply too slow. So, again, why is it so slow?
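On the anti-aliasing question: in current matplotlib versions, anti-aliasing can be switched per line with the `antialiased` keyword (or globally with the `lines.antialiased` rcParam), and path simplification can drop vertices that would not visibly change a line. A minimal sketch under those assumptions; the data are made up and the setting values are illustrative, not tuned:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for a long satellite time series.
n = 200_000
t = np.linspace(0.0, 1000.0, n)
y = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(n)

# Let matplotlib merge line segments that would be visually indistinguishable.
plt.rcParams["path.simplify"] = True
plt.rcParams["path.simplify_threshold"] = 1.0
# Have the Agg renderer draw very long paths in chunks of this many vertices.
plt.rcParams["agg.path.chunksize"] = 10000

fig, ax = plt.subplots()
# Anti-aliasing is a per-artist switch; here it is off for this line only.
line, = ax.plot(t, y, antialiased=False, linewidth=0.5)

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
```

Calling `line.set_antialiased(True)` and then `fig.canvas.draw_idle()` turns it back on, so one could use a fast mode while navigating and a pretty mode for final figures. Recent matplotlib versions also bundle these simplification settings into a `fast` style (`plt.style.use("fast")`); how much either helps depends on the data and backend, so it is worth timing on real files.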
So, for my purposes, my real problem now is matplotlib's slow speed. I tried other packages, such as pyqtgraph, PyQwt, and Chaco/Traits. They seem to be faster, but they have serious problems too. Pyqtgraph seems very promising, but it appears to be in its infancy, with serious bugs; for example, I can’t get it working together with matplotlib. PyQwt/guiqwt is reasonably robust, but in my opinion it has too many dependencies, and it doesn’t seem to have a wide user base. Chaco/Traits seems to be another viable option, especially since it is backed by a company, but I haven’t been able to judge its performance and quality because I can’t install Enable, a required component of Chaco, on my Mac. (Still, the fact that Chaco/Traits is backed by a real company is a real plus to me. If I can’t eventually speed up matplotlib, I will probably give it another shot.)
I have one idea for speeding up line plots in matplotlib on screen: down-sampling the data before plotting. Specifically, I would down-sample the data to the level where each pixel corresponds to only one data point. Of course, this requires enough information to determine the mapping between the data and the pixels on screen, but that overhead is just some housekeeping information, which I suppose is minimal.
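A minimal sketch of that idea, using min/max decimation: a common variant that keeps two values per pixel column (the minimum and maximum) instead of one, so short spikes in the data are not lost. The function name `minmax_downsample`, the assumption that `x` is sorted, and the test data are illustrative only; this is plain NumPy, not an existing matplotlib API:

```python
import numpy as np


def minmax_downsample(x, y, n_pixels):
    """Reduce sorted samples (x, y) to at most 2 * n_pixels points.

    Each pixel-wide bin along x keeps only its minimum and maximum y value,
    so narrow spikes still show up after down-sampling.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size <= 2 * n_pixels:
        return x, y  # already small enough to plot as-is

    # Assign every sample to one of n_pixels equal-width bins (x must be sorted).
    edges = np.linspace(x[0], x[-1], n_pixels + 1)
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_pixels - 1)

    # Index of the first sample in each non-empty bin.
    starts = np.flatnonzero(np.r_[True, bins[1:] != bins[:-1]])
    ymin = np.minimum.reduceat(y, starts)
    ymax = np.maximum.reduceat(y, starts)

    # Draw each bin as a vertical stroke from its min to its max at the bin center.
    centers = 0.5 * (edges[bins[starts]] + edges[bins[starts] + 1])
    x_out = np.repeat(centers, 2)
    y_out = np.column_stack([ymin, ymax]).ravel()
    return x_out, y_out


x = np.arange(10_000, dtype=float)
y = np.sin(x / 50.0)
xd, yd = minmax_downsample(x, y, n_pixels=100)
```

In a live figure, `n_pixels` could be taken from `int(ax.get_window_extent().width)`, and the visible x range could be re-decimated in a handler registered with `ax.callbacks.connect("xlim_changed", handler)`; that callback wiring is an assumption about how one might do the housekeeping, not something tested here.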
I have no idea how to speed up the log-scale spectrogram plot at the moment.
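It is not clear which Stack Overflow solution is meant, but if it draws with `pcolor`, one common speed-up is switching to `pcolormesh`, which the matplotlib documentation describes as generally faster because it draws a single `QuadMesh` instead of one polygon per cell. A minimal sketch, assuming a hypothetical spectrogram whose channels have known log-spaced frequency boundaries (the array sizes, `freq_edges`, and the random power values are illustrative only):

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

# Hypothetical spectrogram: 64 frequency channels x 2000 time steps.
n_freq, n_time = 64, 2000
freq_edges = np.logspace(0.0, 4.0, n_freq + 1)        # channel boundaries, 1 Hz to 10 kHz
time_edges = np.arange(n_time + 1, dtype=float) - 0.5  # one cell per time sample
power = np.random.default_rng(1).lognormal(size=(n_freq, n_time))  # strictly positive

fig, ax = plt.subplots()
# With shading="flat", C must have shape (len(Y) - 1, len(X) - 1).
# rasterized=True only matters for vector output such as PDF or SVG.
mesh = ax.pcolormesh(time_edges, freq_edges, power,
                     norm=LogNorm(), shading="flat", rasterized=True)
ax.set_yscale("log")  # log-spaced channels become equal-height rows
fig.colorbar(mesh, ax=ax, label="power (arbitrary units)")

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
```

Combining this with time-axis decimation along the lines of the line-plot idea above may help further for long intervals, but that is untested here.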
So, the bottom line: What are the options to speed up matplotlib? Your comments and insights are very much appreciated.
Thank you for reading.
Cheers,
Jianbao