Perry,
I would like to phase in matplotlib to replace Matlab ASAP for plotting physical oceanographic observations, primarily current profile measurements. I (and many other physical oceanographers) primarily use contourf to plot filled contours; I only rarely use line contours. It looks to me like gcntr.c has the necessary functionality--the ability to output polygons enclosing regions between a pair of specified levels. Is someone already working on exposing that functionality in matplotlib, or is it planned?
No one (as far as I know) is working on it right now, but adding this capability is in our plans. As you correctly note, the underlying C code can handle it. I'm not sure how long it will be; right now the priority is to finish the contour labeling capability, and the person working on that also has other work that competes for her time. I'm guessing that she could start looking at it in a couple of weeks. Of course, if someone wants to help now, that would be great.
I have started working on it. I don't know how far I will get; the necessary change to the C extension code was easy, but my first attempt to make a PolyCollection work in place of a LineCollection is failing. I will do a bit more research before asking for help, if necessary. (No promises--I don't have much time to work on this, and it is my first plunge into the innards of matplotlib.)
It appears that gcntr.c also has the ability to handle missing data via setting elements of the reg array to zero, and that this could be exposed fairly easily in the contour method in axes.py by adding "reg" to the set of kwargs. Correct? If so, is this also planned?
Correct. Yes (it is planned).
The question of missing data handling in contour plotting brings up the more general issue of how to handle data gaps in plots. For example, the ocean current profiles that I measure using a Doppler profiler extend to varying depths, and sometimes have holes in the middle where there are not enough acoustic scatterers to give a signal. This sort of thing--data gaps--is universal in physical oceanography. One of Matlab's major strengths is the way it handles them, using nan as a bad value flag. When a line is plotted with the plot command, it is broken at each nan; so if there is a hole in the data, the plot shows exactly that. The same goes for contouring: nans are automatically used as a mask.
Obviously, not everyone needs this kind of automatic handling of data gaps, but I think it would be very useful for many applications, so I hope it can be considered as a possible goal. At the plotting level, collections may make it easier to implement than would have been the case in the early days of matplotlib. At the array manipulation level, the implementation could involve either masked arrays or nans. I would greatly prefer the Matlab-style nan approach, but I don't know whether this would work with Numeric. Maybe in Numeric3? Numarray appears better equipped, with its ieeespecial.py module.
I think you touch on the key issue: we'd have to figure out how to handle this across Numeric and numarray (and potentially Numeric3). Would a mask array be a suitable substitute as an interim solution?
Are you suggesting something like this? Let each plotting function have a new kwarg, perhaps called "validmask", with the same dimensions as the dependent variable to be plotted, and with nonzero where the variable is valid and 0 where it is missing. The mask would then be used (1) to limit the autoranging tests to the valid data, (2) in the case of line plotting, to break the line up into segments so that a LineCollection would be plotted, (3) in the case of contouring, to set the reg array, (4) for images or pcolors to similarly mask out the invalid regions with white, or transparent, or perhaps some settable color.
This could be implemented in matplotlib in a way that would not depend on any special features, or likely changes, in the Numeric/Numeric3/numarray set.
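As a rough sketch of uses (1) and (2) above, using plain Python sequences only: the helper names split_by_mask and valid_range are hypothetical and are not part of matplotlib, and -999.0 is just an arbitrary stand-in for a missing sample. Each returned segment is the kind of point list that could be handed to a LineCollection.

```python
def split_by_mask(x, y, validmask):
    """Break a line into segments of (x, y) points wherever validmask is 0.

    Hypothetical helper for illustration; not an existing matplotlib function.
    """
    segments = []
    current = []
    for xi, yi, ok in zip(x, y, validmask):
        if ok:
            current.append((xi, yi))
        elif current:
            # A gap closes the segment that was being built.
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def valid_range(values, validmask):
    """Return (min, max) over the valid entries only, or None if none are valid."""
    good = [v for v, ok in zip(values, validmask) if ok]
    if not good:
        return None
    return min(good), max(good)


x = [0, 1, 2, 3, 4, 5]
y = [1.0, 2.0, -999.0, 4.0, 5.0, -999.0]  # -999.0 marks missing samples
validmask = [1, 1, 0, 1, 1, 0]

segments = split_by_mask(x, y, validmask)
y_limits = valid_range(y, validmask)
```

With this input the line comes out as two segments and the bad-value flag never reaches the autoranging. An isolated valid point between two gaps would produce a one-point segment, which the plotting code would have to decide whether to draw as a marker or drop.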
A numarray user could then use
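import numarray.ieeespecial
def notnan(y):
    # mask() flags the nans, so invert it to get nonzero where y is valid
    return numarray.logical_not(numarray.ieeespecial.mask(y, numarray.ieeespecial.NAN))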
and say
plot(x, y, validmask=notnan(y))
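For someone without numarray, a pure-Python stand-in can build the same kind of mask for a plain sequence of floats. This is only a sketch, and it assumes the platform's floats follow IEEE 754, where nan is the one value that does not compare equal to itself.

```python
def notnan(y):
    """Return 1 where an entry of y is a usable number and 0 where it is nan."""
    # nan is the only float value for which v == v is false.
    return [int(v == v) for v in y]


nan = float("nan")
mask = notnan([1.5, nan, 3.0])
```

The result follows the convention proposed above: nonzero where the variable is valid, 0 where it is missing.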
In any case, this "validmask kwarg" solution seems to me like a perfectly good one from a user's standpoint, and a good bridge to the happy day when Numeric/Numeric3/numarray converge or evolve to a single, dominant numerical module with good nan handling built in. (I very much hope such convergence will occur, and the sooner the better.)
Eric