David,
I have made some changes in svn that address all but one of the points you made:
[....]
    if self.clip:
        mask = ma.getmaskorNone(val)
        if mask == None:
            val = ma.array(clip(val.filled(vmax), vmin, vmax))
        else:
            val = ma.array(clip(val.filled(vmax), vmin, vmax),
                           mask=mask)
The real problem here is that I should not have been using getmaskorNone(). In numpy.ma, we need nomask, not None, so we want an ordinary getmask() call. ma.array(...., mask=ma.nomask) is very fast, so the problem goes away.
Actually, the problem is in ma.array: passing mask=None should behave the same as omitting the mask argument, right?
But it does, because for numpy it needs to be nomask; it does something with None, but whatever it is, it is very slow.
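A minimal sketch of the getmask() approach, written against present-day numpy.ma rather than the numerix layer this thread targets. The helper name clip_masked is hypothetical, and the sketch only illustrates that an unmasked input carries nomask (not None) through the call:

```python
import numpy as np
import numpy.ma as ma

def clip_masked(val, vmin, vmax):
    """Clip a (possibly masked) array to [vmin, vmax], keeping its mask.

    Hypothetical helper, for illustration only.
    """
    val = ma.asarray(val)
    # getmask() returns ma.nomask, never None, when nothing is masked,
    # so the result can be passed straight to ma.array().
    mask = ma.getmask(val)
    return ma.array(np.clip(val.filled(vmax), vmin, vmax), mask=mask)

# Unmasked input: the mask stays ma.nomask.
plain = clip_masked(np.array([-1.0, 0.5, 2.0]), 0.0, 1.0)

# Masked input: the first element stays masked after clipping.
masked = clip_masked(
    ma.array([9.0, 0.5, 2.0], mask=[True, False, False]), 0.0, 1.0
)
```

With this shape there is no need for a separate branch on whether a mask exists.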
I didn't change ma.array, to keep my change as local as possible. Changing only this operation as above gives a speedup from 1.8 s to ~1.0 s for to_rgba, which means calling show goes from ~2.2 s to ~1.4 s. I also changed
    result = (val-vmin)/float(vmax-vmin)
to
    invcache = 1.0 / (vmax - vmin)
    result = (val-vmin) * invcache
This is the one I did not address. I don't understand how this could be making much difference, and some testing using ipython and %prun with 1-line operations showed little difference with variations on this theme. The fastest would appear to be (and logically should be, I think) result = (val-vmin)*(1.0/(vmax-vmin)), but I don't think it makes much difference--it looks to me like maybe 10-20 msec, not 100, on my Pentium M 1.6 GHz. Maybe still worthwhile, so I may yet make the change after more careful testing.
which gives a moderate speedup (around 100 ms for an 8000x256-point array). Once you make both of those changes, the clip call is by far the most expensive operation in the normalize functor, but the functor is no longer expensive compared to the rest, so I did not look at it any further.
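For reference, the two forms under discussion can be put side by side. This is only a sketch using present-day numpy (an assumption, not the numerix code in question), and the function names are hypothetical. It shows that the two forms agree to within floating-point rounding; it makes no claim about timing, which would need to be measured on the machine at hand, for example with timeit:

```python
import numpy as np

def normalize_divide(val, vmin, vmax):
    # Divides every element of the array by the range.
    return (val - vmin) / float(vmax - vmin)

def normalize_multiply(val, vmin, vmax):
    # One scalar division up front, then an elementwise multiply.
    inv = 1.0 / (vmax - vmin)
    return (val - vmin) * inv

val = np.linspace(-3.0, 7.0, 1001)
by_divide = normalize_divide(val, -3.0, 7.0)
by_multiply = normalize_multiply(val, -3.0, 7.0)

# The results may differ in the last bit, so compare with a tolerance.
same = np.allclose(by_divide, by_multiply)
```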
For the where calls in the Colormap functor, I was wondering if they are necessary in all cases: some of those calls seem redundant, and it may be possible to detect that before calling them. This should be both easier and faster, at least in this case, than having a fast where?
You hit the nail squarely: where() is the wrong function to use, and I have eliminated it from colors.py. The much faster replacement is putmask, which does as well as direct indexing with a Boolean array but works with all three numerical packages. I think that using the fast putmask is better than trying to figure out special cases in which there would be nothing to put, although I could be convinced otherwise.
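A small sketch of the putmask pattern, using present-day numpy rather than numerix. The function name and the sentinel layout (under, over, and bad indices stored just past the end of an N-entry lookup table) are assumptions made for illustration. It also shows why the order of the two range checks matters when the under-range sentinel is itself larger than N-1:

```python
import numpy as np

N = 4  # hypothetical number of regular colormap entries
# Assumed layout: three extra slots after the regular entries.
I_UNDER, I_OVER, I_BAD = N, N + 1, N + 2

def apply_sentinels(indices, mask_bad, over_first=True):
    """Replace out-of-range and bad indices in place using putmask."""
    xa = np.array(indices)  # work on a copy
    if over_first:
        np.putmask(xa, xa > N - 1, I_OVER)
        np.putmask(xa, xa < 0, I_UNDER)
    else:
        # Wrong order: I_UNDER is > N-1, so it is then treated as over-range.
        np.putmask(xa, xa < 0, I_UNDER)
        np.putmask(xa, xa > N - 1, I_OVER)
    np.putmask(xa, mask_bad, I_BAD)
    return xa

raw = [-2, 0, 3, 9, 1]
bad = np.array([False, False, False, False, True])

good = apply_sentinels(raw, bad, over_first=True)
wrong = apply_sentinels(raw, bad, over_first=False)
```

Unlike where(), putmask modifies the array in place and does not allocate a new result for each call.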
I understand that supporting multiple array backends and masked arrays has a performance cost. But it looks like it may be possible to speed things up for cases where an array has only meaningful values and no mask.
The big gains here were essentially bug fixes--picking the appropriate function (getmask versus getmaskorNone and putmask versus where).
Here is the colors.py diff:
--- trunk/matplotlib/lib/matplotlib/colors.py 2006/12/03 21:54:38 2906
+++ trunk/matplotlib/lib/matplotlib/colors.py 2006/12/14 08:27:04 2923
@@ -30,9 +30,9 @@
"""
import re
-from numerix import array, arange, take, put, Float, Int, where, \
+from numerix import array, arange, take, put, Float, Int, putmask, \
zeros, asarray, sort, searchsorted, sometrue, ravel, divide,\
- ones, typecode, typecodes, alltrue
+ ones, typecode, typecodes, alltrue, clip
from numerix.mlab import amin, amax
import numerix.ma as ma
import numerix as nx
@@ -536,8 +536,9 @@
lut[0] = y1[0]
lut[-1] = y0[-1]
# ensure that the lut is confined to values between 0 and 1 by clipping it
- lut = where(lut > 1., 1., lut)
- lut = where(lut < 0., 0., lut)
+ clip(lut, 0.0, 1.0)
+ #lut = where(lut > 1., 1., lut)
+ #lut = where(lut < 0., 0., lut)
return lut
@@ -588,16 +589,16 @@
vtype = 'array'
xma = ma.asarray(X)
xa = xma.filled(0)
- mask_bad = ma.getmaskorNone(xma)
+ mask_bad = ma.getmask(xma)
if typecode(xa) in typecodes['Float']:
- xa = where(xa == 1.0, 0.9999999, xa) # Tweak so 1.0 is in range.
+ putmask(xa, xa==1.0, 0.9999999) #Treat 1.0 as slightly less than 1.
xa = (xa * self.N).astype(Int)
- mask_under = xa < 0
- mask_over = xa > self.N-1
- xa = where(mask_under, self._i_under, xa)
- xa = where(mask_over, self._i_over, xa)
- if mask_bad is not None: # and sometrue(mask_bad):
- xa = where(mask_bad, self._i_bad, xa)
+ # Set the over-range indices before the under-range;
+ # otherwise the under-range values get converted to over-range.
+ putmask(xa, xa>self.N-1, self._i_over)
+ putmask(xa, xa<0, self._i_under)
+ if mask_bad is not None and mask_bad.shape == xa.shape:
+ putmask(xa, mask_bad, self._i_bad)
rgba = take(self._lut, xa)
if vtype == 'scalar':
rgba = tuple(rgba[0,:])
@@ -752,7 +753,7 @@
return 0.*value
else:
if clip:
- mask = ma.getmaskorNone(val)
+ mask = ma.getmask(val)
val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
mask=mask)
result = (val-vmin)/float(vmax-vmin)
@@ -804,7 +805,7 @@
return 0.*value
else:
if clip:
- mask = ma.getmaskorNone(val)
+ mask = ma.getmask(val)
val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
mask=mask)
result = (ma.log(val)-nx.log(vmin))/(nx.log(vmax)-nx.log(vmin))
Eric