date x y z
2008-01-01 10
2008-01-02 21 11
2008-01-02 32 15 5

How can I plot it such that all three lines are plotted but it's apparent that two of them are missing some data?
(I know I could just sub in zeros for the missing values, but I'd like the point not to be there, not just down the bottom of the graph...)

Use masked arrays. See masked_demo.py in the mpl examples subdirectory.

Eric

Hi Eric,

I took a look at that, but it uses:

import matplotlib.numerix.npyma as ma

...and matplotlib.numerix isn't listed in the API reference. Where are the docs for this?

numerix is obsolete, and numerix.npyma was a temporary method to provide access to either of two masked array implementations. It is probably time for me to remove it from the examples. Substitute

import numpy.ma as ma

The ma module is documented as part of numpy.

Specifically, what I have is an array like so:

['','','',1.1,2.2]

Try something like this:

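import numpy.ma as ma
from pylab import *

aa = [3.4, 2.5, '', '', '', 1.1, 2.2]

def to_num(arg):
    # Replace an empty string with a sentinel value that can be masked later.
    if arg == '':
        return 9999.0
    return arg

aanum = array([to_num(arg) for arg in aa])
# Mask the sentinel entries so that plot() leaves a gap there.
aamasked = ma.masked_where(aanum == 9999.0, aanum)
plot(aamasked)
show()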

Eric

···

I want to mask the strings out so I don't get ValueErrors raised when I call plot functions with that array.

Chris Withers wrote:
> Eric Firing wrote:
You should use numpy.masked_where(numpy.isnan(aa), aa).

or use masked_invalid directly (a shortcut for masked_where(isnan(aa) | isinf(aa), aa))

> I only wish that masked_equal didn't blow up when aa contains datetime
> objects

Could you send me an example of the kind of data you're using?
As it seems you're dealing with series indexed in time, you may want to try scikits.timeseries, a package Matt Knox and myself implemented for that very reason.

···

On Tuesday 18 March 2008 16:17:08 Eric Firing wrote:

You should use numpy.masked_where(numpy.isnan(aa), aa).

(I meant numpy.ma.masked_where(...))

or use masked_invalid directly (a shortcut for masked_where(isnan(aa) | isinf(aa), aa))

I don't see it in numpy.ma, with numpy from svn.

In any case, the fastest method is masked_where(~numpy.isfinite(aa), aa):

In [1]:import numpy

In [2]:xx = numpy.random.rand(10000)

In [3]:xx[xx>0.8] = numpy.nan

In [6]:timeit numpy.ma.masked_where(~numpy.isfinite(xx), xx)
10000 loops, best of 3: 83.9 µs per loop

In [7]:timeit numpy.ma.masked_where(numpy.isnan(xx), xx)
10000 loops, best of 3: 119 µs per loop

In [9]:timeit numpy.ma.masked_where((numpy.isnan(xx)|numpy.isinf(xx)), xx)
1000 loops, best of 3: 260 µs per loop

So, wherever you do have masked_invalid defined, you might want to use the faster implementation with ~isfinite.
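
As a rough check of the expressions timed above, the following sketch compares the masks they produce. It assumes a NumPy release in which numpy.ma.masked_invalid is available; the sample array and the mask_list helper are made up for illustration. Note that the isnan-only variant is not equivalent to the others, because it leaves infinities unmasked.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical sample data: finite values plus one NaN and one infinity.
xx = np.array([0.1, np.nan, 0.5, np.inf, 0.9])

m_finite = ma.masked_where(~np.isfinite(xx), xx)
m_nan_or_inf = ma.masked_where(np.isnan(xx) | np.isinf(xx), xx)
m_nan_only = ma.masked_where(np.isnan(xx), xx)
m_invalid = ma.masked_invalid(xx)


def mask_list(arr):
    """Return the mask as a plain list of booleans (illustrative helper)."""
    return ma.getmaskarray(arr).tolist()


for name, arr in [("~isfinite", m_finite), ("isnan|isinf", m_nan_or_inf),
                  ("isnan only", m_nan_only), ("masked_invalid", m_invalid)]:
    print(name, mask_list(arr))
```

The ~isfinite form and the isnan|isinf form mask the same elements, so choosing between them is purely a question of speed.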

Eric

···

Could you send me an example of the kind of data you're using ?

It's basically performance and volume data for a high-volume website.
Unfortunately, the data is gappy in places due to data collection errors in the past...
(it's important the gaps are shown, rather than trying to interpolate them away, however)

As it seems you're dealing with series indexed in time, you may want to try scikits.timeseries, a package Matt Knox and myself implemented for that very reason.

How would this help me here and where can I find out about it?

Indeed, I guess I was seeing nans being treated as missing values rather than being masked...

You should use numpy.masked_where(numpy.isnan(aa), aa).

I am now

However, I'm still running into problems when I try and plot the gappy data on a filled line as follows:

dates = *an array of datetimes*
values = *an array containing data values and a few nans*
values = numpy.ma.masked_where(numpy.isnan(values),values)
xs,ys = mlab.poly_between(dates,0,values)
pylab.fill(xs,ys,'r')

For starters, I get this warning:

numpy\core\ma.py:609: UserWarning: Cannot automatically convert masked array to numeric because data is masked in one or more locations.

...and wherever a NaN occurs in the data, the line is plotted off the top of the axes. I want it to appear at 0 if there's no data. Well, ideally just not appear at all, but I'd settle for appearing at 0...

Both with respect to documentation and functionality, what you are encountering is the historical aspect of masked arrays as a tacked-on part of python numeric packages, and of matplotlib. Support and integration are improving, but still far from perfect. A largely new, and substantially different, implementation of masked arrays has been transplanted into numpy since the last release. Similarly, mpl got a heart transplant since the last release, and it has some implications for the way nans and masked arrays are handled. There is lots more room for fundamental work on both numpy masked arrays (e.g., moving core code to pyrex/cython or C to speed them up) and on mpl.

Now with respect to your particular case here, trying to plot a filled line with gaps: poly_between has no notion of masked arrays at present. If it did, how should it behave? At the very least, additional arguments are needed to specify what should happen for fill-type plotting with missing values. If we can come up with a clear description of the behaviors that should be available, then maybe we can provide them in mpl. I would be happy to fix this gap in mpl's handling of gappy data, but I can't make it a priority use of my time right now.

For a quick fix, it sounds like what you need is either a function to break up your data set into gapless chunks, each of which could be plotted by a call to fill, or a function (a variant of poly_between) that would replace the gap regions with top and bottom lines at the same place (the bottom level? the x-axis?) so the whole thing could be plotted in one call to fill, provided the patch outline is suppressed.
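
The first option, breaking the data into gapless chunks, might be sketched as follows. This is only an illustration under stated assumptions: gapless_chunks is a hypothetical helper name, not a matplotlib function; the data are made up; and it relies on numpy.ma.clump_unmasked, which returns one slice per contiguous unmasked run of a 1-D masked array.

```python
import numpy as np
import numpy.ma as ma


def gapless_chunks(x, y):
    """Split x and y into runs over which y has no masked values.

    gapless_chunks is a hypothetical helper name, not a matplotlib function.
    Returns a list of (x_chunk, y_chunk) pairs of plain arrays.
    """
    x = np.asarray(x)
    y = ma.asarray(y)
    # clump_unmasked yields one slice per contiguous unmasked run of a 1-D array.
    return [(x[s], y[s].compressed()) for s in ma.clump_unmasked(y)]


# Made-up data with a two-point gap in the middle.
x = np.arange(7)
y = ma.masked_invalid([1.0, 2.0, np.nan, np.nan, 3.0, 4.0, 5.0])

chunks = gapless_chunks(x, y)
for xc, yc in chunks:
    print(xc.tolist(), yc.tolist())
```

Each (x_chunk, y_chunk) pair could then be closed against a baseline and passed to its own fill call, so that no polygon spans a gap.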

I seem to recall someone else with a similar need in the past few months, so maybe someone on the list has a ready-made solution for you.

Eric

Chris Withers wrote:

···

Eric Firing wrote:

This is not doing what you think it is,

Chris,
My 2c:
Your data is indexed in time, right? Your x-axis is a date object? Then use scikits.timeseries: http://scipy.org/scipy/scikits/wiki/TimeSeries
That package was designed to take missing dates/data into account. That way, you can plot your data with the gaps already taken into account: we have written a specific matplotlib interface; you'll find the details by following the link above. I must admit we didn't implement poly_between for timeseries. Most likely, we'd have to implement it for regular masked arrays first, as mentioned by Eric.
What you could do is fill your array with some kind of baseline, such as 0, or your minimum data, or whatever. That's just a quick trick, not a fix.

Both with respect to documentation and functionality, what you are encountering is the historical aspect of masked arrays as a tacked-on part of python numeric packages, and of matplotlib.

*sigh* I feel lucky

Support and integration are improving, but still far from perfect.

I wish I could help, but my knowledge is lacking...

Now with respect to your particular case here, trying to plot a filled line with gaps: poly_between has no notion of masked arrays at present. If it did, how should it behave?

Well, what I actually settled on was just using:

my_masked_array.filled(0)

...to plot with.
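
A minimal sketch of that workaround, with made-up values standing in for the real data: filled(0) returns a plain ndarray in which every masked entry is replaced by 0, and the masked array itself is left unchanged.

```python
import numpy as np
import numpy.ma as ma

# Made-up values; the NaNs stand in for the missing measurements.
values = ma.masked_invalid([3.0, np.nan, np.nan, 1.5, 2.0])

# Replace every masked entry with 0 and get an ordinary ndarray back.
plot_values = values.filled(0)

print(plot_values.tolist())
print(ma.getmaskarray(values).tolist())  # the original mask is untouched
```

The trade-off is the one raised at the start of the thread: the missing points are drawn at the bottom of the graph rather than left out.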

At the very least, additional arguments are needed to specify what should happen for fill-type plotting with missing values.

Indeed, what I personally would have liked was a complete gap where the data is missing, but I guess that would have to return multiple polygons, and I don't know how that would work?

provide them in mpl. I would be happy to fix this gap in mpl's handling of gappy data,

...heh

but I can't make it a priority use of my time right now.

I was interested in learning more about TimeSeries, and had a few questions…

Your data is indexed in time, right ? Your x-axis is a date object ?

Just to be clear on the language: “indexed in time” means data for which the x-axis is a series of dates, correct? But I am not sure what is meant by the “x-axis being a date object”. Wouldn’t it be an axis object with the values comprising it being date objects? I’m not trying to split hairs; I’m just unclear about the way this is typically described, and it would be useful for me to be clear about it.

That package was designed to take missing dates/data into account. That way, you can plot your data with the gaps already taken into account: we have written a specific matplotlib interface, you’ll find the details following the link above.

I’ve looked at the link. Could you explain what TimeSeries does that the mpl modules dates and dateutil don’t do, or when one would use one versus the other?

For my part, I need to simply plot values with dates (and yes with some dates missing no doubt) as the x-axis and am looking for various ways to do it well.

I'm not sure what this is giving me.
The dates are all python datetimes in a list already.
The missing values started off as '', I turned those into nan and then created a ma with the nan's masked.
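
The conversion described here might look roughly like the sketch below. It is an illustration only: the input list is the example array from earlier in the thread, and the names raw, as_float and masked are made up.

```python
import numpy as np
import numpy.ma as ma

# The example array from earlier in the thread: '' marks a missing value.
raw = ['', '', '', 1.1, 2.2]

# Step 1: turn the empty strings into NaN so the array is purely numeric.
as_float = np.array([np.nan if v == '' else float(v) for v in raw])

# Step 2: mask the NaNs so that plotting code can skip them.
masked = ma.masked_where(np.isnan(as_float), as_float)

print(ma.getmaskarray(masked).tolist())
print(masked.compressed().tolist())
```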

What more would TimeSeries give me?

the link above. I must admit we didn't implement poly_between for timeseries. Most likely, we'd have to implement it for regular masked arrays first, as mentioned by Eric.

OK.

What you could do is to fill your array with some kind of baseline, such as 0, or your minimum data, or wtvr. That's just a quick trick and no fix.

Indeed, that's what I had to do.

I have to admit, I see some interesting things while scanning that wiki page, but nothing that would have helped me...