Stacked plotting in matplotlib

Hi all,

I’d like to bring up a question spurred by PRs #847 (mine) and #819 (recently accepted). Both PRs deal with stacked plots. #819 adds a new stackplot function to axes.py, which plots different 2-d datasets stacked atop each other. #847 slightly modifies hist in axes.py by adding a new kwarg which allows datasets to be stacked. Currently this is only possible using the barstacked histtype. #847 makes it also work with the step and stepfilled histtypes.

One of the issues that has been raised in the comments of #847 is whether we want to take this opportunity to come up with a unified way to handle “stacked-ness”. Michael Droettboom suggested I raise this issue on this list. So far, there are 3 different approaches:

  1. The state before #819. AFAIK the only way to do any sort of stacking was to call hist with histtype="barstacked". This treats stacked histograms as a different type of histogram than non-stacked histograms. One of my motivations for writing #847 was to get stacked step and stepfilled histograms, which would require adding several new histtypes (stepstacked and stepfilledstacked). It seems to me that histtype mostly controls the style of the histogram plotted, and shouldn’t have anything to do with “stacked-ness”, so I think this is kind of clunky.

  2. The approach I take in #847. Add a new kwarg which controls whether or not multiple datasets are stacked. I think this is the cleanest implementation, although that’s probably obvious because it’s how I wrote my PR. To keep everything consistent in this approach, we should remove the stackplot function added in #819, and move that functionality to the plot function, adding a stacked kwarg there.

  3. The approach of #819. With this approach, we would add a separate function to handle stacked versions of different plots. I’d re-write #847 as a new function called stackhist. This approach, IMO, doesn’t scale well if we want to add “stacked-ness” to more plot types in the future.

Please take a look at this and send comments about these proposals or any others you might have. I hope the community can come to a consensus which unifies the handling of stacked-ness.

Whatever we end up choosing, I think adding a stacked step histogram will make it much easier to promote the use of mpl in high energy physics, where we use this style of plot frequently.

Thanks,

Nic Eggert

Graduate Fellow

Cornell University

Oops, sorry. I realized it was actually Ben Root who suggested I start this discussion. Don’t want to put words in anyone’s mouth.

Nic

Hey Nic,

Thanks for bringing this up. I was the author of #819, so I'd like to
get some discussion going on this, too. Sorry for the delay, I was in the
midst of writing a thesis, which I am now free of.

I'm in favour of numero dos, even though for #819 I took approach
number 3. I didn't really think about the bigger picture here with regard
to stackedness of other plot types. But since seeing your stacked
histogram changeset, this seems like a more sensible route.

I say this with zero authority, though.

It'd be nice to have a few people chime in with their two cents.

--
Damon McDougall
http://www.damon-is-a-geek.com
B2.39
Mathematics Institute
University of Warwick
Coventry
West Midlands
CV4 7AL
United Kingdom

OK, here are mine: I oppose overloading plot with a "stacked" kwarg and functionality. It is complicated enough as it is. I don't see any problem with having "stackplot" and hist(..., stacked=True). They are just not all that similar. Nor are "plot" and "stackplot" so very similar. But stacked and non-stacked histograms *are* very similar, so using the kwarg to turn on stacking there makes sense.

Elaborating slightly: stacking in plot makes sense only when there is a single abscissa in the data set, but plot supports inputs for which this is not the case; that means that using a stacked kwarg would require explaining this, and trapping invalid inputs when stacked is True. Messy. Much neater to have a separate function.

In the case of a histogram, there is a single set of bins, so a single abscissa. Therefore turning on stacking only affects the way the lines are displayed, and does not require additional input validity checking.
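
The single-abscissa point can be sketched as follows. This is only an illustration with made-up data: stacking amounts to a running sum over the series, which is well defined when every series shares one x array, whereas plot accepts a separate x for each line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 10.0, 50)  # one shared abscissa for every layer
y1 = 1.0 + np.sin(x) ** 2
y2 = 2.0 + np.cos(x) ** 2

# Stacking is a running sum over the series; the arrays must line up on x
tops = np.cumsum(np.vstack([y1, y2]), axis=0)

fig, (ax_plot, ax_stack) = plt.subplots(1, 2)
# plot() takes an x per line, so the second line may use a different abscissa
ax_plot.plot(x, y1, x[::2], y2[::2])
# stackplot() takes exactly one x, so no extra input checking is needed
layers = ax_stack.stackplot(x, y1, y2)
```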

I would be cautious about looking around for more places to add a "stacked" kwarg. Where is it really needed? Let's try to keep mpl from getting more complicated than necessary.

Eric

Eric, you make a good point. I'm okay with that approach as well. It
also has the benefit of being the least work.

Nic

Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net

Quick q: how would things like log plots be handled for the stacked
case? Log plots are really just axis scale choices on a normal plot,
but for historical reasons they happen to be implemented via a bunch
of different functions. For that reason, any interface changes
that make sense for plot should pretty much also apply to the *log*
functions, no?

Cheers,

f

I'm not sure I understand what you are getting at, but I don't think there should be any interface changes for plot or for its log variants.

Eric

I think this gives more reason to not add a stacked kwarg to plot. You
would need to add it to the log variants as well.

Nic

Hi Eric,

I probably phrased my question poorly. I'm just wondering, how would
one use the proposed stackplot function to obtain a stacked plot
that uses log axes (x, y or both)?

Cheers,

f

One would follow the stackplot call with calls to xscale('log') and/or yscale('log'). This works fine for the x-axis (if x values are positive), but when the y-axis is log, the bottom region is not filled, presumably because it is trying to fill down to zero. I haven't looked at the code, so I don't know whether there is some way of improving this behavior without the stackplot call knowing beforehand that it will be dealing with a log axis.
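
A minimal sketch of that call sequence, using the Axes methods set_xscale and set_yscale as the object-oriented equivalents of xscale and yscale. The data is invented, and the sketch does not attempt to fix the unfilled bottom region described above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.logspace(0, 3, 40)  # strictly positive x values, as a log x-axis needs
y1 = x ** 0.5
y2 = 10.0 + x ** 0.25

fig, ax = plt.subplots()
layers = ax.stackplot(x, y1, y2)

# Scale is chosen after the stackplot call; stackplot itself is unaware of it
ax.set_xscale("log")
ax.set_yscale("log")
```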

Eric

This is similar to the problem we face with bar() and hist()…

Ben Root

Stacked histograms have this problem as well. The solution I've
found is to do ax.set_yscale('log', nonposy='clip').
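
A sketch of that workaround on a stacked histogram follows. It assumes matplotlib 3.3 or later, where the keyword is spelled nonpositive; nonposy is the older spelling used in this thread. The data is made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical samples for two stacked datasets
rng = np.random.default_rng(1)
a = rng.exponential(1.0, 1000)
b = rng.exponential(2.0, 1000)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist([a, b], bins=20, histtype="stepfilled", stacked=True)

# Clip non-positive values (such as a baseline at zero) instead of masking
# them, so the filled region can still be drawn on the log axis.
# Assumption: matplotlib >= 3.3; older releases used nonposy="clip".
ax.set_yscale("log", nonpositive="clip")
```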
