I have noticed two bugs related to NaN handling in the scatter()
function, plus one other bug that seems to be in numpy.
- The min and max for the axes are not computed properly when there are
NaNs in the data. Example:
import pylab as pl
import numpy as np
x = np.asarray([0, 1, 2, 3, None, 5, 6, 7, 8, 9], float)
y = np.asarray([0, None, 2, 3, 4, 5, 6, 7, 8, 9], float)
ax = pl.subplot(111)
ax.scatter(x, y)
pl.show()
The points with NaN values are left out of the plot as expected, but you
will see that everything before the NaN is ignored when computing the axis
ranges. (The X axis goes from 4 to 10, cutting off some data, when it
should be from -1 to 10. The Y axis goes from 1 to 10 when it should also
be from -1 to 10.) This is rather annoying since these simple calls fix
the issue:
ax.set_xlim(min(x), max(x))
ax.set_ylim(min(y), max(y))
- We see the same behavior for the ‘c’ axis. Example:
import pylab as pl
import numpy as np
x = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], float)
y = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], float)
z = np.asarray([0, 1, 2, 3, 4, 5, None, 7, 8, 9], float)
ax = pl.subplot(111)
ax.scatter(x, y, c=z)
pl.show()
Here everything before point 7 has zero color. A band-aid fix is to pass
vmin and vmax explicitly:
ax.scatter(x, y, c=z,
           vmin=min(z),
           vmax=max(z))
Then only the one NaN point has zero color.
- Both of the band-aid fixes above suffer from another bug (I think in
numpy): min() and max() of a numpy array misbehave when the first value
is NaN:
x = np.asarray([None, 1, 2, 3, 4, 5, 6, 7, 8, 9], float)
y = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, None], float)
z = np.asarray([0, 1, 2, 3, 4, 5, None, 7, 8, 9], float)
print min(x), max(x) #prints 1.#QNAN 1.#QNAN
print min(y), max(y) #prints 0.0 8.0
print min(z), max(z) #prints 0.0 9.0
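One possible explanation for the output above, offered as a sketch rather
than a confirmed diagnosis: the built-in min() and max() rely on < and >
comparisons, and every such comparison against NaN is False, so the result
depends on where the NaN sits in the sequence. The snippet below reproduces
that and compares it with np.nanmin() and np.nanmax(), which skip NaNs.
finite_limits() is a hypothetical helper name introduced only for this
illustration; it is not part of numpy or matplotlib.

```python
import math
import numpy as np

nan = float("nan")
x = np.asarray([nan, 1, 2, 3, 4, 5, 6, 7, 8, 9], float)  # NaN comes first
y = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, nan], float)  # NaN comes last

# Any ordering comparison against NaN evaluates to False.
assert not (1.0 < nan) and not (1.0 > nan)

# Built-in min()/max() keep the first element unless a later one compares
# smaller/larger, so a leading NaN is never replaced.
builtin_x = (min(x), max(x))
builtin_y = (min(y), max(y))
assert math.isnan(builtin_x[0]) and math.isnan(builtin_x[1])


def finite_limits(values):
    """Hypothetical helper: (min, max) of values, ignoring NaN entries."""
    return float(np.nanmin(values)), float(np.nanmax(values))


x_limits = finite_limits(x)
y_limits = finite_limits(y)
```

With a helper like this, the workarounds above could be written as
ax.set_xlim(*finite_limits(x)) and ax.set_ylim(*finite_limits(y)), and the
vmin/vmax pair for scatter() could come from finite_limits(z), regardless of
where the NaN appears in the array.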
FYI, I am using Matplotlib version 0.91.4 and NumPy 1.1.0 on Windows
and Debian Linux.
Thanks,
-Ben