purpose of 'stepfilled' in hist()?

Benjamin_Root1 · August 11, 2010, 8:11pm

I am tracing down a bug in hist() and I am trying to figure out what is it about the ‘stepfilled’ mode that is different from the regular ‘bar’ mode. Currently, the hist() code has a separate if branch for dealing with ‘step’ and ‘stepfilled’, and that branch has a bunch of bugs. The best that I can figure is that it prevents lines from being drawn in between the bins. If that is the only difference, maybe we could somehow use the bar functions?

At the very least, I think this needs to be documented better to make it clear why this oddball approach is happening.

Thanks,
Ben Root

Benjamin_Root1 · August 11, 2010, 8:56pm

By the way, the bugs I was referring to both have to do with log=True and the two stepped modes.

First, the smallest of values to histogram (the minimum bin value) is, for some reason, used to clip the lower bounds of the histogram count. In some situations, this can result in most if not all the graph not being shown.

Second, related to the first is a problem with the ‘stepfilled’ mode in log space. The stepfilled mode uses the minimum bin value to anchor the patches. However, this can cause the polygon to not close correctly and can get some unsightly artifacts. I have attached an image demonstrating this bug. This has been reported before:

http://old.nabble.com/hist%28%29-with-log-and-step-tp28888742p28888742.html
http://old.nabble.com/Bug-in-stepfilled-hist-with-log-y-tp28538074p28538074.html

In one of the comments, the reporter concluded that it appeared fixed, but either he was experiencing a slightly different problem, or he was mistaken.

I have also included a possible patch for addressing these issues. The approach simply sets the minimum value to be zero when not doing log, and use 1.0 when log=True. This differs slightly from how the normal bar graphs are done where a value of 1e-100 is used when log=True, but then the axes limits are adjusted to use 1.0 as a lower limit.

Cheers,
Ben Root

histlog.patch (494 Bytes)

···

On Wed, Aug 11, 2010 at 3:11 PM, Benjamin Root <ben.root@…854…> wrote:

I am tracing down a bug in hist() and I am trying to figure out what is it about the ‘stepfilled’ mode that is different from the regular ‘bar’ mode. Currently, the hist() code has a separate if branch for dealing with ‘step’ and ‘stepfilled’, and that branch has a bunch of bugs. The best that I can figure is that it prevents lines from being drawn in between the bins. If that is the only difference, maybe we could somehow use the bar functions?

At the very least, I think this needs to be documented better to make it clear why this oddball approach is happening.

Thanks,
Ben Root

_Erik_Tollerud · August 24, 2010, 2:23am

This is definitely a bug, but I thought I'd clarify and add in a little more...

The distinction between 'step' and 'stepfilled' is that 'step' is
supposed to show only the outline of the histogram with no lines in
between bins (standard practice in some fields), while 'stepfilled' is
supposed to do the same, but have a different-colored fill between the
steps and the x-axis. This is different from 'bar' because the bars
always have either an outline bounding each bar, or no outline at all.
An alternative approach, presumably, would be to eliminate
'stepfilled' and instead just pass in some keyword that tells whether
or not to draw the filled color region or not, but that was judged
confusing because it would have no meaning for the bar types.

As for the log=True errors, I think what this was supposed to do was
have the minimum number of bin *counts* be the replacement for 0s,
rather the minimum *value*, so that's just a pure bug. This is might
have been my fault - the code has changed quite a bit from the
original patch, so I'm not sure at this point. The logic was that
this makes more sense than arbitrarily choosing 1 - if you have a
histogram where the bins are all within, say, 1000 and 10000, but one
of them is 0, it perhaps looks better to set the bottom to the 1000
rather than 1... It was really just an arbitrary choice that no one
objected to at the time.

As I think about it, it might make sense to change it so that the log
keyword can be used to set the assumed minimum value for empty bins if
it is greater than 0 (and stick with the default you suggested of 1 if
log=True). The attached patch includes this change, adopted from
Ben's original patch, as well as clarifying all of this in teh
docstring.

histfix2.patch (4.71 KB)

···

On Wed, Aug 11, 2010 at 1:56 PM, Benjamin Root <ben.root@...553...> wrote:

On Wed, Aug 11, 2010 at 3:11 PM, Benjamin Root <ben.root@...553...> wrote:

I am tracing down a bug in hist() and I am trying to figure out what is it
about the 'stepfilled' mode that is different from the regular 'bar' mode.
Currently, the hist() code has a separate if branch for dealing with 'step'
and 'stepfilled', and that branch has a bunch of bugs. The best that I can
figure is that it prevents lines from being drawn in between the bins. If
that is the only difference, maybe we could somehow use the bar functions?

At the very least, I think this needs to be documented better to make it
clear why this oddball approach is happening.

Thanks,
Ben Root

By the way, the bugs I was referring to both have to do with log=True and
the two stepped modes.

First, the smallest of values to histogram (the minimum bin value) is, for
some reason, used to clip the lower bounds of the histogram count. In some
situations, this can result in most if not all the graph not being shown.

Second, related to the first is a problem with the 'stepfilled' mode in log
space. The stepfilled mode uses the minimum bin value to anchor the
patches. However, this can cause the polygon to not close correctly and can
get some unsightly artifacts. I have attached an image demonstrating this
bug. This has been reported before:

http://old.nabble.com/hist()-with-log-and-step-tp28888742p28888742.html
http://old.nabble.com/Bug-in-stepfilled-hist-with-log-y-tp28538074p28538074.html

In one of the comments, the reporter concluded that it appeared fixed, but
either he was experiencing a slightly different problem, or he was mistaken.

I have also included a possible patch for addressing these issues. The
approach simply sets the minimum value to be zero when not doing log, and
use 1.0 when log=True. This differs slightly from how the normal bar graphs
are done where a value of 1e-100 is used when log=True, but then the axes
limits are adjusted to use 1.0 as a lower limit.

Cheers,
Ben Root

------------------------------------------------------------------------------
This SF.net email is sponsored by

Make an app they can't live without
Enter the BlackBerry Developer Challenge
http://p.sf.net/sfu/RIM-dev2dev
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

--
Erik Tollerud

_Erik_Tollerud · August 24, 2010, 6:39pm

I just realized the patch I sent before includes some other changes...
the attached version should only be the fix for this particular bug.

histfix2only.diff (2.06 KB)

···

On Mon, Aug 23, 2010 at 7:23 PM, Erik Tollerud <erik.tollerud@...149...> wrote:

This is definitely a bug, but I thought I'd clarify and add in a little more...

The distinction between 'step' and 'stepfilled' is that 'step' is
supposed to show only the outline of the histogram with no lines in
between bins (standard practice in some fields), while 'stepfilled' is
supposed to do the same, but have a different-colored fill between the
steps and the x-axis. This is different from 'bar' because the bars
always have either an outline bounding each bar, or no outline at all.
An alternative approach, presumably, would be to eliminate
'stepfilled' and instead just pass in some keyword that tells whether
or not to draw the filled color region or not, but that was judged
confusing because it would have no meaning for the bar types.

As for the log=True errors, I think what this was supposed to do was
have the minimum number of bin *counts* be the replacement for 0s,
rather the minimum *value*, so that's just a pure bug. This is might
have been my fault - the code has changed quite a bit from the
original patch, so I'm not sure at this point. The logic was that
this makes more sense than arbitrarily choosing 1 - if you have a
histogram where the bins are all within, say, 1000 and 10000, but one
of them is 0, it perhaps looks better to set the bottom to the 1000
rather than 1... It was really just an arbitrary choice that no one
objected to at the time.

As I think about it, it might make sense to change it so that the log
keyword can be used to set the assumed minimum value for empty bins if
it is greater than 0 (and stick with the default you suggested of 1 if
log=True). The attached patch includes this change, adopted from
Ben's original patch, as well as clarifying all of this in teh
docstring.

On Wed, Aug 11, 2010 at 1:56 PM, Benjamin Root <ben.root@...553...> wrote:

On Wed, Aug 11, 2010 at 3:11 PM, Benjamin Root <ben.root@...553...> wrote:

I am tracing down a bug in hist() and I am trying to figure out what is it
about the 'stepfilled' mode that is different from the regular 'bar' mode.
Currently, the hist() code has a separate if branch for dealing with 'step'
and 'stepfilled', and that branch has a bunch of bugs. The best that I can
figure is that it prevents lines from being drawn in between the bins. If
that is the only difference, maybe we could somehow use the bar functions?

At the very least, I think this needs to be documented better to make it
clear why this oddball approach is happening.

Thanks,
Ben Root

By the way, the bugs I was referring to both have to do with log=True and
the two stepped modes.

First, the smallest of values to histogram (the minimum bin value) is, for
some reason, used to clip the lower bounds of the histogram count. In some
situations, this can result in most if not all the graph not being shown.

Second, related to the first is a problem with the 'stepfilled' mode in log
space. The stepfilled mode uses the minimum bin value to anchor the
patches. However, this can cause the polygon to not close correctly and can
get some unsightly artifacts. I have attached an image demonstrating this
bug. This has been reported before:

http://old.nabble.com/hist()-with-log-and-step-tp28888742p28888742.html
http://old.nabble.com/Bug-in-stepfilled-hist-with-log-y-tp28538074p28538074.html

In one of the comments, the reporter concluded that it appeared fixed, but
either he was experiencing a slightly different problem, or he was mistaken.

I have also included a possible patch for addressing these issues. The
approach simply sets the minimum value to be zero when not doing log, and
use 1.0 when log=True. This differs slightly from how the normal bar graphs
are done where a value of 1e-100 is used when log=True, but then the axes
limits are adjusted to use 1.0 as a lower limit.

Cheers,
Ben Root

------------------------------------------------------------------------------
This SF.net email is sponsored by

Make an app they can't live without
Enter the BlackBerry Developer Challenge
http://p.sf.net/sfu/RIM-dev2dev
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

--
Erik Tollerud

--
Erik Tollerud

Eric_Firing2 · August 24, 2010, 8:53pm

+ if log is true:
+ minimum = 1.0

Don't you mean True, not true?

Eric

···

On 08/24/2010 08:39 AM, Erik Tollerud wrote:

I just realized the patch I sent before includes some other changes...
the attached version should only be the fix for this particular bug.

_Erik_Tollerud · August 24, 2010, 11:16pm

Whoops, yes, that should be True... Also realized a slight error in
the description of how the mimum is set - both of those are fixed in
the attached diff.

histfix3.diff (2.08 KB)

···

On Tue, Aug 24, 2010 at 1:53 PM, Eric Firing <efiring@...229...> wrote:

On 08/24/2010 08:39 AM, Erik Tollerud wrote:

I just realized the patch I sent before includes some other changes...
the attached version should only be the fix for this particular bug.

+ if log is true:
+ minimum = 1.0

Don't you mean True, not true?

Eric

------------------------------------------------------------------------------
Sell apps to millions through the Intel(R) Atom(Tm) Developer Program
Be part of this innovative community and reach millions of netbook users
worldwide. Take advantage of special opportunities to increase revenue and
speed time-to-market. Join now, and jumpstart your future.
http://p.sf.net/sfu/intel-atom-d2d
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

--
Erik Tollerud

Benjamin_Root1 · August 25, 2010, 1:04am

Just one quick issue, in the docstring, you have a typo “Toutlines” in the part that describes ‘steps’. I believe that was supposed to be “The outlines”.

Ben Root

···

On Tue, Aug 24, 2010 at 6:16 PM, Erik Tollerud <erik.tollerud@…149…> wrote:

Whoops, yes, that should be True… Also realized a slight error in

the description of how the mimum is set - both of those are fixed in

the attached diff.

On Tue, Aug 24, 2010 at 1:53 PM, Eric Firing <efiring@…229…> wrote:

On 08/24/2010 08:39 AM, Erik Tollerud wrote:

I just realized the patch I sent before includes some other changes…

the attached version should only be the fix for this particular bug.
       if log is true:
           minimum = 1.0
Don’t you mean True, not true?

Eric

Sell apps to millions through the Intel(R) Atom™ Developer Program

Be part of this innovative community and reach millions of netbook users

worldwide. Take advantage of special opportunities to increase revenue and

speed time-to-market. Join now, and jumpstart your future.

http://p.sf.net/sfu/intel-atom-d2d

Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

–

Erik Tollerud

Anne_Archibald · August 25, 2010, 2:01am

Whoops, yes, that should be True... Also realized a slight error in
the description of how the mimum is set - both of those are fixed in
the attached diff.

Um, this is a kind of important point of style: it is much better to
use "if foo:" than "if foo is True:" or even "if foo == True:".
Long-standing python convention allows things like 1, 7.0, numpy
booleans that are true, and nonempty lists to have a value of True.
Using "if foo:", this works. Using "if foo is True:", this cannot
possibly work; even though 1==True, it is not true that 1 is True.
"is" has a very specific meaning that should be used only when
appropriate (generally, for None or for mutable objects).

Incidentally, the stairstep look of histograms is something I use a
lot. But if we're looking for bells and whistles to add, I often need
error bars on the histogram values (usually the error bar should be
the square root of the value, though for really small values there's a
correction based on Poisson statistics). Since I also often deal with
background-subtracted histograms that often need to repeat the data, I
expect to need to use errorbar() regardless, so I wouldn't worry too
much about this.

Anne

···

On 24 August 2010 19:16, Erik Tollerud <erik.tollerud@...149...> wrote:

On Tue, Aug 24, 2010 at 1:53 PM, Eric Firing <efiring@...229...> wrote:

On 08/24/2010 08:39 AM, Erik Tollerud wrote:

I just realized the patch I sent before includes some other changes...
the attached version should only be the fix for this particular bug.

+ if log is true:
+ minimum = 1.0

Don't you mean True, not true?

Eric

------------------------------------------------------------------------------
Sell apps to millions through the Intel(R) Atom(Tm) Developer Program
Be part of this innovative community and reach millions of netbook users
worldwide. Take advantage of special opportunities to increase revenue and
speed time-to-market. Join now, and jumpstart your future.
http://p.sf.net/sfu/intel-atom-d2d
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

--
Erik Tollerud

------------------------------------------------------------------------------
Sell apps to millions through the Intel(R) Atom(Tm) Developer Program
Be part of this innovative community and reach millions of netbook users
worldwide. Take advantage of special opportunities to increase revenue and
speed time-to-market. Join now, and jumpstart your future.
http://p.sf.net/sfu/intel-atom-d2d
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

Benjamin_Root1 · August 25, 2010, 2:22am

Anne,

While it probably could be better done, the logic of the entire if statement is to first check to see if someone explicitly set a True value (default is False), and that sets the minimum to 1.0. Then, if it isn’t explicitly True, then it checks to see if it is a numerical value and uses that value to indicate the minimum. Only if it is None or False does it then go to the last branch which would set minimum to zero.

Maybe it should use a cbook function that test for a numerical value explicitly instead and do that first, then have a check for the Truthiness of log?

Ben Root

P.S. – I think firefox needs to update its spell-check dictionary, I thought Steven Colbert got “truthiness” to be added to Webster’s…

···

On Tue, Aug 24, 2010 at 9:01 PM, Anne Archibald <aarchiba@…877…> wrote:

On 24 August 2010 19:16, Erik Tollerud <erik.tollerud@…149…> wrote:

Whoops, yes, that should be True… Also realized a slight error in

the description of how the mimum is set - both of those are fixed in

the attached diff.

Um, this is a kind of important point of style: it is much better to

use “if foo:” than “if foo is True:” or even “if foo == True:”.

Long-standing python convention allows things like 1, 7.0, numpy

booleans that are true, and nonempty lists to have a value of True.

Using “if foo:”, this works. Using “if foo is True:”, this cannot

possibly work; even though 1==True, it is not true that 1 is True.

“is” has a very specific meaning that should be used only when

appropriate (generally, for None or for mutable objects).

Incidentally, the stairstep look of histograms is something I use a

lot. But if we’re looking for bells and whistles to add, I often need

error bars on the histogram values (usually the error bar should be

the square root of the value, though for really small values there’s a

correction based on Poisson statistics). Since I also often deal with

background-subtracted histograms that often need to repeat the data, I

expect to need to use errorbar() regardless, so I wouldn’t worry too

much about this.

Anne

On Tue, Aug 24, 2010 at 1:53 PM, Eric Firing <efiring@…229…> wrote:

On 08/24/2010 08:39 AM, Erik Tollerud wrote:

I just realized the patch I sent before includes some other changes…

the attached version should only be the fix for this particular bug.
       if log is true:
           minimum = 1.0
Don’t you mean True, not true?

Eric

Sell apps to millions through the Intel(R) Atom™ Developer Program

Be part of this innovative community and reach millions of netbook users

worldwide. Take advantage of special opportunities to increase revenue and

speed time-to-market. Join now, and jumpstart your future.

http://p.sf.net/sfu/intel-atom-d2d

Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

–

Erik Tollerud

Sell apps to millions through the Intel(R) Atom™ Developer Program

Be part of this innovative community and reach millions of netbook users

worldwide. Take advantage of special opportunities to increase revenue and

speed time-to-market. Join now, and jumpstart your future.

http://p.sf.net/sfu/intel-atom-d2d

Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

Anne_Archibald · August 25, 2010, 5:00am

> Whoops, yes, that should be True... Also realized a slight error in
> the description of how the mimum is set - both of those are fixed in
> the attached diff.

Um, this is a kind of important point of style: it is much better to
use "if foo:" than "if foo is True:" or even "if foo == True:".
Long-standing python convention allows things like 1, 7.0, numpy
booleans that are true, and nonempty lists to have a value of True.
Using "if foo:", this works. Using "if foo is True:", this cannot
possibly work; even though 1==True, it is not true that 1 is True.
"is" has a very specific meaning that should be used only when
appropriate (generally, for None or for mutable objects).

[snip]

While it probably could be better done, the logic of the entire if statement
is to first check to see if someone explicitly set a True value (default is
False), and that sets the minimum to 1.0. Then, if it isn't explicitly
True, then it checks to see if it is a numerical value and uses that value
to indicate the minimum. Only if it is None or False does it then go to the
last branch which would set minimum to zero.

Maybe it should use a cbook function that test for a numerical value
explicitly instead and do that first, then have a check for the Truthiness
of log?

I realize API changes are a pain, but this seems error-prone from a
user's point of view. If they accidentally use 1 instead of "True" -
common among C or old Python users - suddenly the function does
something startling. (There's also an ambiguity between zero and
False, but that's probably not so serious here.) If I were designing
an API from scratch I'd probably go with a separate parameter for the
minimum (or not, if ylim can fix it after the fact) and a dedicated
one for "should we use a log scale". Failing that, maybe the string
"auto" to indicate automatic minimum values and None for a default?

If you're going to use True to mean something different from 1,
though, I'd make sure to put a warning in the docstring. Unfortunately
you can't just rely on duck typing to tell numeric values from
booleans, since float(True) is 1.0. On the other hand, "is True" fails
for numpy booleans, and "== True" passes for 1.0. So if this is your
constraint, I'm not sure you can do better than "is True", but it's a
huge UI wart.

Anne

P.S. keep in mind that the values users pass are not always directly
visible to users - they might be passing a value output from someone
else's routine that is described as returning a boolean value but
actually returns an integer. This is particularly common among C or
Fortran routines, which are in turn very common in the numerical
world. From the other direction, if you pull a value out of a boolean
numpy array, you get a numpy boolean which will never "is True". -A

···

On 24 August 2010 22:22, Benjamin Root <ben.root@...553...> wrote:

On Tue, Aug 24, 2010 at 9:01 PM, Anne Archibald <aarchiba@...824...> > wrote:

On 24 August 2010 19:16, Erik Tollerud <erik.tollerud@...149...> wrote:

Benjamin_Root1 · August 25, 2010, 2:50pm

I agree. After thinking about it further, I realized a few other ways that this will silently fail. Quite personally, I feel that the hist/bar family of functions ought to be nuked from orbit. It is absolutely amazing just how convoluted those functions are. We seem to be acquiescing to every single feature request rather than thinking about how one could use the existing matplotlib tools. For example, if one wants error bars on their histograms/bar plots, then they should be able to call hist() and then errorbars() to achieve their needs.

For logarithmic hist(), I am not exactly sure why we should have a keyword argument to indicate a mode of operation. We have loglog(), semilogx(), semilogy(). A user could also call set_xscale() or set_yscale() and the limits accordingly. I am not saying they are the best examples/approaches, merely pointing out the different ways we have for log plotting.

Should we have histlog() and barlog() functions? Are we willing to make an effort to look at some of these plotting functions and clean them up?

Ben Root

···

On Wed, Aug 25, 2010 at 12:00 AM, Anne Archibald <aarchiba@…862…4…> wrote:

On 24 August 2010 22:22, Benjamin Root <ben.root@…553…> wrote:

On Tue, Aug 24, 2010 at 9:01 PM, Anne Archibald <aarchiba@…824…> > > > wrote:

On 24 August 2010 19:16, Erik Tollerud <erik.tollerud@…149…> wrote:

Whoops, yes, that should be True… Also realized a slight error in

the description of how the mimum is set - both of those are fixed in

the attached diff.

Um, this is a kind of important point of style: it is much better to

use “if foo:” than “if foo is True:” or even “if foo == True:”.

Long-standing python convention allows things like 1, 7.0, numpy

booleans that are true, and nonempty lists to have a value of True.

Using “if foo:”, this works. Using “if foo is True:”, this cannot

possibly work; even though 1==True, it is not true that 1 is True.

“is” has a very specific meaning that should be used only when

appropriate (generally, for None or for mutable objects).

[snip]

While it probably could be better done, the logic of the entire if statement

is to first check to see if someone explicitly set a True value (default is

False), and that sets the minimum to 1.0. Then, if it isn’t explicitly

True, then it checks to see if it is a numerical value and uses that value

to indicate the minimum. Only if it is None or False does it then go to the

last branch which would set minimum to zero.

Maybe it should use a cbook function that test for a numerical value

explicitly instead and do that first, then have a check for the Truthiness

of log?

I realize API changes are a pain, but this seems error-prone from a

user’s point of view. If they accidentally use 1 instead of “True” -

common among C or old Python users - suddenly the function does

something startling. (There’s also an ambiguity between zero and

False, but that’s probably not so serious here.) If I were designing

an API from scratch I’d probably go with a separate parameter for the

minimum (or not, if ylim can fix it after the fact) and a dedicated

one for “should we use a log scale”. Failing that, maybe the string

“auto” to indicate automatic minimum values and None for a default?

If you’re going to use True to mean something different from 1,

though, I’d make sure to put a warning in the docstring. Unfortunately

you can’t just rely on duck typing to tell numeric values from

booleans, since float(True) is 1.0. On the other hand, “is True” fails

for numpy booleans, and “== True” passes for 1.0. So if this is your

constraint, I’m not sure you can do better than “is True”, but it’s a

huge UI wart.

Anne

P.S. keep in mind that the values users pass are not always directly

visible to users - they might be passing a value output from someone

else’s routine that is described as returning a boolean value but

actually returns an integer. This is particularly common among C or

Fortran routines, which are in turn very common in the numerical

world. From the other direction, if you pull a value out of a boolean

numpy array, you get a numpy boolean which will never “is True”. -A

Eric_Firing2 · August 25, 2010, 5:06pm

     >>
     >> > Whoops, yes, that should be True... Also realized a slight
    error in
     >> > the description of how the mimum is set - both of those are
    fixed in
     >> > the attached diff.
     >>
     >> Um, this is a kind of important point of style: it is much better to
     >> use "if foo:" than "if foo is True:" or even "if foo == True:".
     >> Long-standing python convention allows things like 1, 7.0, numpy
     >> booleans that are true, and nonempty lists to have a value of True.
     >> Using "if foo:", this works. Using "if foo is True:", this cannot
     >> possibly work; even though 1==True, it is not true that 1 is True.
     >> "is" has a very specific meaning that should be used only when
     >> appropriate (generally, for None or for mutable objects).

    [snip]

     > While it probably could be better done, the logic of the entire
    if statement
     > is to first check to see if someone explicitly set a True value
    (default is
     > False), and that sets the minimum to 1.0. Then, if it isn't
    explicitly
     > True, then it checks to see if it is a numerical value and uses
    that value
     > to indicate the minimum. Only if it is None or False does it
    then go to the
     > last branch which would set minimum to zero.
     >
     > Maybe it should use a cbook function that test for a numerical value
     > explicitly instead and do that first, then have a check for the
    Truthiness
     > of log?

    I realize API changes are a pain, but this seems error-prone from a
    user's point of view. If they accidentally use 1 instead of "True" -
    common among C or old Python users - suddenly the function does
    something startling. (There's also an ambiguity between zero and
    False, but that's probably not so serious here.) If I were designing
    an API from scratch I'd probably go with a separate parameter for the
    minimum (or not, if ylim can fix it after the fact) and a dedicated
    one for "should we use a log scale". Failing that, maybe the string
    "auto" to indicate automatic minimum values and None for a default?

    If you're going to use True to mean something different from 1,
    though, I'd make sure to put a warning in the docstring. Unfortunately
    you can't just rely on duck typing to tell numeric values from
    booleans, since float(True) is 1.0. On the other hand, "is True" fails
    for numpy booleans, and "== True" passes for 1.0. So if this is your
    constraint, I'm not sure you can do better than "is True", but it's a
    huge UI wart.

    Anne

    P.S. keep in mind that the values users pass are not always directly
    visible to users - they might be passing a value output from someone
    else's routine that is described as returning a boolean value but
    actually returns an integer. This is particularly common among C or
    Fortran routines, which are in turn very common in the numerical
    world. From the other direction, if you pull a value out of a boolean
    numpy array, you get a numpy boolean which will never "is True". -A

I agree. After thinking about it further, I realized a few other ways
that this will silently fail. Quite personally, I feel that the
hist/bar family of functions ought to be nuked from orbit. It is
absolutely amazing just how convoluted those functions are. We seem to
be acquiescing to every single feature request rather than thinking
about how one could use the existing matplotlib tools. For example, if
one wants error bars on their histograms/bar plots, then they should be
able to call hist() and then errorbars() to achieve their needs.

Ben,

"Me, too!" regarding dismay over the hist/bar family (including box plots). They need to be rethought and re-implemented in their own module. Dealing with the transition may be a pain, but so is every experience with trying to maintain them.

Eric

···

On 08/25/2010 04:50 AM, Benjamin Root wrote:

On Wed, Aug 25, 2010 at 12:00 AM, Anne Archibald > <aarchiba@...824... <mailto:aarchiba@…824…>> wrote:
    On 24 August 2010 22:22, Benjamin Root <ben.root@...553... > <mailto:ben.root@…553…>> wrote:
     > On Tue, Aug 24, 2010 at 9:01 PM, Anne Archibald > <aarchiba@...824... <mailto:aarchiba@…824…>> > > wrote:
     >> On 24 August 2010 19:16, Erik Tollerud <erik.tollerud@...149... > <mailto:erik.tollerud@…149…>> wrote:

For logarithmic hist(), I am not exactly sure why we should have a
keyword argument to indicate a mode of operation. We have loglog(),
semilogx(), semilogy(). A user could also call set_xscale() or
set_yscale() and the limits accordingly. I am not saying they are the
best examples/approaches, merely pointing out the different ways we have
for log plotting.

Should we have histlog() and barlog() functions? Are we willing to make
an effort to look at some of these plotting functions and clean them up?

Ben Root

_Erik_Tollerud · August 27, 2010, 9:04pm

I also completely agree that, ideally, hist/bar/box would get a full
re-write... A re-write of the hist side of things would be very
useful for me, and something that I might end up doing myself, if I
can find the time. I haven't really done anything with the bar or box
plots, though, so I'm not sure what needs to be changed there.

But presumably we're stuck with backwards-compatibility for the
existing function... and as it stands now, it's clearly bugged, so I
still think the patch should be applied in the interm.

To address the True/1.0 issue, though: The reason I did it here is not
because True is really a special case - this was just a performance
tweak... The whole code snippet is as follows:

if log is True:
    minimum = 1.0
elif log:
    minimum = float(log)
else:
    minimum = 0.0

You'll note that if I change the "elif log:" to "if log" and remove
the first case completely, the behavior is unchanged, because
float(True) = 1.0 ... that was the reason for the choice of 1 as the
default. The only reason I'm special-casing "log is True" is because
the default is False, so all other things being equal, the most likely
thing someone would assume who didn't read the docs closely would
assume is that it should be "True" - so it's only a tiny performance
boost for those cases. Honestly, it's not a big deal - I did it out
of habit due to other projects where you get significant performance
gains by special-casing the default argument.

···

I realize API changes are a pain, but this seems error-prone from a
user's point of view. If they accidentally use 1 instead of "True" -
common among C or old Python users - suddenly the function does
something startling. (There's also an ambiguity between zero and
False, but that's probably not so serious here.) If I were designing
an API from scratch I'd probably go with a separate parameter for the
minimum (or not, if ylim can fix it after the fact) and a dedicated
one for "should we use a log scale". Failing that, maybe the string
"auto" to indicate automatic minimum values and None for a default?

On Wed, Aug 25, 2010 at 10:06 AM, Eric Firing <efiring@...229...> wrote:

On 08/25/2010 04:50 AM, Benjamin Root wrote:
On Wed, Aug 25, 2010 at 12:00 AM, Anne Archibald >> <aarchiba@...824... <mailto:aarchiba@…824…>> wrote:

On 24 August 2010 22:22, Benjamin Root <ben.root@...553... >> <mailto:ben.root@…553…>> wrote:
> On Tue, Aug 24, 2010 at 9:01 PM, Anne Archibald >> <aarchiba@...824... <mailto:aarchiba@…824…>> >> > wrote:
>>
>> On 24 August 2010 19:16, Erik Tollerud <erik.tollerud@...761..... >> <mailto:erik.tollerud@…149…>> wrote:
>> > Whoops, yes, that should be True... Also realized a slight
error in
>> > the description of how the mimum is set - both of those are
fixed in
>> > the attached diff.
>>
>> Um, this is a kind of important point of style: it is much better
to
>> use "if foo:" than "if foo is True:" or even "if foo == True:".
>> Long-standing python convention allows things like 1, 7.0, numpy
>> booleans that are true, and nonempty lists to have a value of True.
>> Using "if foo:", this works. Using "if foo is True:", this cannot
>> possibly work; even though 1==True, it is not true that 1 is True.
>> "is" has a very specific meaning that should be used only when
>> appropriate (generally, for None or for mutable objects).

[snip]
&gt; While it probably could be better done, the logic of the entire
if statement
> is to first check to see if someone explicitly set a True value
(default is
> False), and that sets the minimum to 1.0. Then, if it isn't
explicitly
> True, then it checks to see if it is a numerical value and uses
that value
> to indicate the minimum. Only if it is None or False does it
then go to the
> last branch which would set minimum to zero.
>
> Maybe it should use a cbook function that test for a numerical value
> explicitly instead and do that first, then have a check for the
Truthiness
> of log?

I realize API changes are a pain, but this seems error-prone from a
user's point of view. If they accidentally use 1 instead of "True" -
common among C or old Python users - suddenly the function does
something startling. (There's also an ambiguity between zero and
False, but that's probably not so serious here.) If I were designing
an API from scratch I'd probably go with a separate parameter for the
minimum (or not, if ylim can fix it after the fact) and a dedicated
one for "should we use a log scale". Failing that, maybe the string
"auto" to indicate automatic minimum values and None for a default?

If you're going to use True to mean something different from 1,
though, I'd make sure to put a warning in the docstring. Unfortunately
you can't just rely on duck typing to tell numeric values from
booleans, since float(True) is 1.0. On the other hand, "is True" fails
for numpy booleans, and "== True" passes for 1.0. So if this is your
constraint, I'm not sure you can do better than "is True", but it's a
huge UI wart.

Anne

P.S. keep in mind that the values users pass are not always directly
visible to users - they might be passing a value output from someone
else's routine that is described as returning a boolean value but
actually returns an integer. This is particularly common among C or
Fortran routines, which are in turn very common in the numerical
world. From the other direction, if you pull a value out of a boolean
numpy array, you get a numpy boolean which will never "is True". -A

I agree. After thinking about it further, I realized a few other ways
that this will silently fail. Quite personally, I feel that the
hist/bar family of functions ought to be nuked from orbit. It is
absolutely amazing just how convoluted those functions are. We seem to
be acquiescing to every single feature request rather than thinking
about how one could use the existing matplotlib tools. For example, if
one wants error bars on their histograms/bar plots, then they should be
able to call hist() and then errorbars() to achieve their needs.
Ben,

"Me, too!" regarding dismay over the hist/bar family (including box plots).
They need to be rethought and re-implemented in their own module. Dealing
with the transition may be a pain, but so is every experience with trying to
maintain them.

Eric

For logarithmic hist(), I am not exactly sure why we should have a
keyword argument to indicate a mode of operation. We have loglog(),
semilogx(), semilogy(). A user could also call set_xscale() or
set_yscale() and the limits accordingly. I am not saying they are the
best examples/approaches, merely pointing out the different ways we have
for log plotting.

Should we have histlog() and barlog() functions? Are we willing to make
an effort to look at some of these plotting functions and clean them up?

Ben Root

--
Erik Tollerud