Eric Firing wrote:
Mike,
I made a quick test and took a quick look, and I certainly see a ripe mango within reach. I don't know what your constraints and strategy are, but I thought I would give you the off-the-cuff idea before I forget what I did.
The test was pcolortest.py, and the kcachegrind input is the .log file.
The problem is the path initializer: it is converting everything to a masked array, which in the vast majority of cases is not needed, and is very costly.
Thanks for finding this. I agree completely. I think that was basically a typo that ended up "working", just suboptimally. The input to the path constructor may be either a numpy array, an ma array or a regular Python sequence. If it's the first two, it should be left alone (if there is an array mask, it is dealt with later on in the constructor), but if the latter, it should be converted to a numpy array.
What I meant to type was:
if not ma.isMaskedArray(vertices):
vertices = npy.asarray(vertices, npy.float_)
The argument against just "npy.asarray(vertices, npy.float_)" is that the mask needs to be preserved.
If I understand correctly, that will be essentially a no-op when the input is a numpy array, albeit with the overhead of some checks.
We need to think carefully about the levels of API, and what should be done at which levels. One possibility is that at the level of the path initializer, only ordinary ndarrays should be acceptable--any mask manipulations and compressions should already have been done. This would require a helper function to generate the codes for that case. Another is that the path initializer could get a flag telling it whether to check for masked arrays. And another is that a check for existance of a mask should be done at the start, and the mask processing done only if there is a mask.
This option was the intent.
Yet another is that if a mask is needed, it be passed in as an optional 1-D array kwarg. An advantage of this is that the code that calls the path initializer may be in a better position to know what is needed to generate the 1-D mask (that is, a mask for each (x,y) point rather than for x and y separately)--that mask may already be sitting around.
Many of these options I fear would significantly complicate the code. One of the driving motivations for the refactoring is to allow transformations to be combined more generally. Think of the case where you have a polar plot with a logarithmic scale on the r-axis (this wasn't ever possible in the trunk). The log scale means that there is potential for negative masked values, but the polar part of the transformation shouldn't have to know or care whether masked values are being passed through. Requiring it to do so would need the same checks currently performed in the Path constructor, but they would be copied all over the code in every kind of new transformation.
FWIW, there already is a deliberate "quarantining" of masked arrays -- it happens where the logical elements of the plot hit the drawing commands of the plot (the Path object). It could have been implemented such that the backends must understand masked arrays and draw accordingly, but it proved to be faster (based on the simple_plot_fps.py benchmark) to convert to a non-masked array with MOVETO codes upfront and reuse that. (Not surprising, given the overhead of masked arrays). This means that masked arrays are not used at all during panning and zooming operations where speed is perhaps the most crucial.
Masked arrays are pretty clunky and slow. The maskedarray implementation by Pierre GM is nicer, more complete, and faster for many operations than numpy.ma, but it still adds a lot of overhead, especially for small arrays. (It needs to have its core in C; so far I have failed dismally in trying to understand how to do that without repeating the bulk of the ndarray code.)
A related point: can you (or is it OK if I do it) change all the "import numpy.ma as ma" or whatever to "from matplotlib.numerix import npyma as ma"? The advantage is that it makes it easy to test the new version with either maskedarray or ma. This should be temporary; I am still hoping and expecting that maskedarray will replace ma in the core numpy distribution.
That sounds like a very good idea. I'll go ahead and do this (on the branch only).
Cheers,
Mike
···
--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA