SF.net SVN: matplotlib: [4325] trunk/matplotlib/lib/matplotlib/axes.py

dsdale@...189... wrote:

Revision: 4325
          http://matplotlib.svn.sourceforge.net/matplotlib/?rev=4325&view=rev
Author: dsdale
Date: 2007-11-15 13:23:27 -0800 (Thu, 15 Nov 2007)

Log Message:
-----------
added npy.seterr(invalid='ignore') to beginning of axes.py, to silence repeated warnings created by finding extrema of arrays containing nans
(discovered during calls to errorbar)

Darren,

Is this hiding a problem that will pop up farther down the line? I think the strategy so far has been that inputs to plotting functions should use masked arrays, not nans, and correspondingly, the plotting functions should handle masked arrays gracefully. Although nans are used at some internal stages, I don't think they are handled correctly from end to end. We could add nan checks at the early argument processing stage, but it would slow things down a bit.

Eric


> Darren,
>
> Is this hiding a problem that will pop up farther down the line?

I don't know; this problem was pretty well hidden to begin with. I consider it
a bug that numpy doesn't gracefully handle finding the extrema of an array
containing nans. Why should this warrant a warning?

> I think the strategy so far has been that inputs to plotting functions
> should use masked arrays, not nans, and correspondingly, the plotting
> functions should handle masked arrays gracefully. Although nans are
> used at some internal stages, I don't think they are handled correctly
> from end to end. We could add nan checks at the early argument
> processing stage, but it would slow things down a bit.

Do you mean that matplotlib does not support input that contains nans? Should
the average user really have to care if they are passing input with nans in
it? I think not. I must have misunderstood.

Darren

···

On Thursday 15 November 2007 06:12:32 pm Eric Firing wrote:

dsdale@...189... wrote:

Darren Dale wrote:


>> Darren,
>>
>> Is this hiding a problem that will pop up farther down the line?
>
> I don't know; this problem was pretty well hidden to begin with. I consider it a bug that numpy doesn't gracefully handle finding the extrema of an array containing nans. Why should this warrant a warning?

There are major differences of opinion, or differences of application, as to how nans and other floating point oddities should be handled. As a result, numpy was designed to allow the user to specify how floating point exceptions should be handled. Matlab-style handling of nans--which I have always found enormously useful in Matlab--imposes a significant computational cost, and neither the style nor the cost is acceptable to a substantial fraction of the numpy community. Therefore numpy supplies nanmax and nanmin for the case where you want to ignore nans, and amax and amin for the case where a nan means something is wrong and you don't want to ignore the nan. (There are also nanargmax, nanargmin, and nansum.)
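As a rough illustration of the two families of functions named above, here is a minimal sketch. It assumes a current NumPy imported as np (not the npy alias used in this thread), and the sample array is made up for the example.

```python
import numpy as np

# Made-up sample data with one nan in it
data = np.array([1.0, np.nan, 3.0])

# amax/amin propagate the nan, which signals that something is off in the data
propagated_max = np.amax(data)

# nanmax/nanmin skip the nan and reduce over the remaining values
ignored_max = np.nanmax(data)
ignored_min = np.nanmin(data)

# nansum treats nans as zero; nanargmax returns the index of the largest non-nan value
total = np.nansum(data)
index_of_max = np.nanargmax(data)
```

Which family is appropriate depends on whether a nan in the input is expected or is a sign of a bug, which is the difference of opinion described above.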

>> I think the strategy so far has been that inputs to plotting functions
>> should use masked arrays, not nans, and correspondingly, the plotting
>> functions should handle masked arrays gracefully. Although nans are
>> used at some internal stages, I don't think they are handled correctly
>> from end to end. We could add nan checks at the early argument
>> processing stage, but it would slow things down a bit.
>
> Do you mean that matplotlib does not support input that contains nans? Should the average user really have to care if they are passing input with nans in it? I think not. I must have misunderstood.

I think that nans "do the right thing" in some places but not others; they have never been explicitly supported in plot function input. The design decision was to use masked arrays, which are more general, instead. I have thought about a possible alternative, in which masked arrays would be immediately converted to nan-filled arrays, and nans would be fully used and supported internally as well as in the interface. I never came to the conclusion that this was a good idea, though, because masked arrays have some advantages. Therefore I have been trying to improve masked array use and support in mpl and in numpy.
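To make the comparison concrete, the sketch below shows a masked array next to its nan-filled equivalent. It assumes a current NumPy and its numpy.ma module; the arrays are invented for the example, and the integer case is only one possible reading of "more general", not something stated in this thread.

```python
import numpy as np
import numpy.ma as ma

# A masked array keeps the data and a boolean mask side by side
y = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])

# Reductions skip masked entries without needing nan-specific functions
y_max = y.max()

# The alternative discussed above: replace masked entries with nan ...
y_nan = y.filled(np.nan)

# ... and the reverse direction, masking every nan or inf entry
y_back = ma.masked_invalid(y_nan)

# Masks also work for integer data, which has no nan value to fall back on
counts = ma.array([10, 20, 30], mask=[False, False, True])
counts_sum = counts.sum()
```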

Eric


Darren

In any case, I think it's dangerous to set numpy's global error handling mode permanently. Is it feasible to do this on a need-to-protect basis by wrapping just the cases where this is needed with:

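import numpy as npy

# seterr returns the previous settings as a dict
npy_orig_err = npy.seterr(invalid='ignore')
try:
    do_potentially_risky_stuff()
finally:
    # restore the previous settings; the dict must be unpacked into keywords
    npy.seterr(**npy_orig_err)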

Users might have code, for example, where ignoring this error will lead to bad consequences (including hard-to-find bugs).
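The same need-to-protect idea can also be written with NumPy's errstate context manager, which restores the previous settings on exit. This is a minimal sketch assuming a current NumPy imported as np; safe_sqrt is a hypothetical helper standing in for the risky code, not anything in matplotlib.

```python
import numpy as np

def safe_sqrt(values):
    """Hypothetical helper: silence 'invalid' floating point errors only inside this call."""
    with np.errstate(invalid='ignore'):
        # sqrt of a negative number is an 'invalid' operation and yields nan
        return np.sqrt(values)

before = np.geterr()                         # caller's settings before the call
result = safe_sqrt(np.array([-1.0, 4.0]))
after = np.geterr()                          # unchanged once the with block exits
```

Because the override lives only inside the with block, user code that relies on the 'invalid' error being reported elsewhere is not affected.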

-Andrew


_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

I reverted my change this morning.

I think nans and infs are a fact of life. They sometimes pop up in my work,
and I would prefer that matplotlib handle them properly. But I haven't
contributed much to the actual plotting functions and don't know much about
the advantages of masked arrays, so I'll defer to you.

Darren

···

On Thursday 15 November 2007 08:51:11 pm Eric Firing wrote:


Darren Dale wrote:

> I think nans and infs are a fact of life. They sometimes pop up in my work, and I would prefer that matplotlib handle them properly. But I haven't contributed much to the actual plotting functions and don't know much about the advantages of masked arrays, so I'll defer to you.

Darren,

I would like to return to this. It may make good sense for us to use something like x = ma.masked_where(~npy.isfinite(x), x) at the outermost level of argument processing, in place of x = ma.asarray(x). It will add a few milliseconds, which is OK if it is done once or a few times per plot. It is not good if it happens hundreds of times per plot, though, so it is acceptable only if there is a clear separation between outer argument processing--the most public part of the API--and internal functions.

The method suggested above actually won't work with numpy.ma--only with maskedarray.
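For reference, the outer-level conversion might look like the sketch below. It assumes a current NumPy, whose numpy.ma module descends from the maskedarray package mentioned above; as_masked_finite is a hypothetical helper name, not an existing matplotlib function.

```python
import numpy as np
import numpy.ma as ma

def as_masked_finite(x):
    """Hypothetical helper: return x as a masked array with nans and infs masked."""
    x = ma.asarray(x, dtype=float)
    # Test the raw data for finiteness; masked_where also keeps any existing mask on x
    return ma.masked_where(~np.isfinite(ma.getdata(x)), x)

# Plain input containing nan and inf
m = as_masked_finite([1.0, float('nan'), 3.0, float('inf')])

# Input that is already masked keeps its mask and gains one for the nan
m2 = as_masked_finite(ma.array([1.0, 2.0, np.nan], mask=[True, False, False]))
```

Calling such a helper once per public plotting call, rather than inside internal functions, matches the cost concern raised above.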

I have to put this off for a while longer.

Eric