I have noticed two bugs related to NaN handling in the scatter()
function, plus one other bug that seems to be in numpy.

- The min and max for the axes are not computed properly when there are
NaNs in the data. Example:

import pylab as pl
import numpy as np
x = np.asarray([0, 1, 2, 3, None, 5, 6, 7, 8, 9], float)
y = np.asarray([0, None, 2, 3, 4, 5, 6, 7, 8, 9], float)
ax = pl.subplot(111)
ax.scatter(x, y)
pl.show()

The points with NaN values are left out of the plot as expected, but
everything before the NaN is ignored when computing the axis ranges.
(The X axis goes from 4 to 10, cutting off some data, when it should go
from -1 to 10. The Y axis goes from 1 to 10 when it should also go from
-1 to 10.) This is rather annoying, since these simple calls fix the
issue:

ax.set_xlim(min(x), max(x))
ax.set_ylim(min(y), max(y))
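
A NaN-safe variant of this workaround might look like the sketch below. The helper name finite_limits and its pad argument are hypothetical (they are not part of matplotlib or numpy). The sketch simply drops non-finite values with np.isfinite before taking the extremes, and the pad of 1 is chosen only to reproduce the -1 to 10 range mentioned above.

```python
import numpy as np

def finite_limits(values, pad=0.0):
    """Return (low, high) over the finite entries of values.

    Hypothetical helper: NaN and +/-inf are dropped before taking min/max.
    """
    arr = np.asarray(values, dtype=float)
    finite = arr[np.isfinite(arr)]
    if finite.size == 0:
        raise ValueError("no finite values to compute limits from")
    return float(finite.min()) - pad, float(finite.max()) + pad

# Same data as the example above, with NaN written explicitly.
x = np.array([0, 1, 2, 3, np.nan, 5, 6, 7, 8, 9], dtype=float)
y = np.array([0, np.nan, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

xlim = finite_limits(x, pad=1.0)
ylim = finite_limits(y, pad=1.0)
```

The resulting tuples could then be passed on as ax.set_xlim(*xlim) and ax.set_ylim(*ylim).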

- We see the same behavior for the ‘c’ axis. Example:

import pylab as pl
import numpy as np
x = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], float)
y = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], float)
z = np.asarray([0, 1, 2, 3, 4, 5, None, 7, 8, 9], float)
ax = pl.subplot(111)
ax.scatter(x, y, c=z)
pl.show()

Everything before point 7 ends up with zero color. A band-aid fix is to
pass vmin and vmax explicitly in the scatter() call:

ax.scatter(x, y, c=z,
           vmin=min(z),
           vmax=max(z))

Then only the one NaN point has zero color.
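
The same band-aid can be made independent of where the NaN sits by computing the color range with numpy's NaN-ignoring reductions. This is only a sketch of the idea, assuming np.nanmin and np.nanmax are available in the numpy version in use. The names vmin and vmax are chosen to match the scatter() keyword arguments.

```python
import numpy as np

# Same color data as the example above, with NaN written explicitly.
z = np.array([0, 1, 2, 3, 4, 5, np.nan, 7, 8, 9], dtype=float)

# nanmin/nanmax skip NaN entries instead of comparing against them.
vmin = float(np.nanmin(z))
vmax = float(np.nanmax(z))
```

These values could then be passed as ax.scatter(x, y, c=z, vmin=vmin, vmax=vmax).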

- Both of the band-aid fixes above are affected by another bug (I think
in numpy): min() and max() give the wrong result for a numpy array
whose first value is NaN:

x = np.asarray([None, 1, 2, 3, 4, 5, 6, 7, 8, 9], float)
y = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, None], float)
z = np.asarray([0, 1, 2, 3, 4, 5, None, 7, 8, 9], float)
print min(x), max(x)  # prints 1.#QNAN 1.#QNAN
print min(y), max(y)  # prints 0.0 8.0
print min(z), max(z)  # prints 0.0 9.0
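
A possible explanation, sketched here with plain Python floats and not verified against NumPy 1.1.0: the min() and max() used above are Python's built-ins, which keep a running best value and replace it only when a comparison returns True. Every ordering comparison with NaN is False, so a leading NaN is never replaced, while a NaN later in the sequence is silently skipped. The list names below are illustrative only.

```python
import math

nan = float("nan")

# NaN compares False against everything, including itself.
nan_compares_false = not (nan < 1.0) and not (nan > 1.0) and nan != nan

first = [nan, 1.0, 2.0, 3.0]   # NaN in first position
last = [0.0, 1.0, 2.0, nan]    # NaN in last position
middle = [0.0, 1.0, nan, 3.0]  # NaN in the middle

# Built-in min/max: a leading NaN sticks, a later NaN is skipped.
results = {
    "first": (min(first), max(first)),
    "last": (min(last), max(last)),
    "middle": (min(middle), max(middle)),
}
```

If this is indeed the cause, the pattern matches the three outputs above, and NaN-ignoring reductions such as np.nanmin and np.nanmax would sidestep it.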

FYI, I am using matplotlib version 0.91.4 and NumPy 1.1.0 on Windows
and Debian Linux.

Thanks,

-Ben