Histogram with overlapping bins

Steven_Boada · October 20, 2012, 9:21pm

Let's say I generate a bunch of random numbers between 0 and 1 and then want to make a histogram of them. But here's the catch: I'd like my bins to overlap a bit. For example, if the first bin runs from 0 to 0.1, centered on 0.05, I'd like the second bin to be centered on 0.1 and run from 0.05 to 0.15.

So basically, I want the width of each bin to be greater than the spacing.

Is this something that could be done with the histogram function? I did a couple of Google searches and couldn't come up with anything meaningful. Apparently, 'rwidth' in the hist function only changes the drawn width of the bars relative to the bin width, not the binning itself.

Any thoughts?

--

Steven Boada

Doctoral Student
Dept of Physics and Astronomy
Texas A&M University
boada@...3847...

Steven_Boada · October 20, 2012, 9:25pm

It'd be cool if we could do something like

bins = [(0.0,0.05,0.1),(0.05,0.1,0.15)...]

Where I have specified the left edge, center and right edge of each bin. Yeah, that'd be pretty slick.
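As far as I know, `numpy.histogram` and `hist` expect a monotonically increasing array of bin edges, so overlapping windows cannot be passed in directly. A minimal sketch of counting by hand instead, assuming half-open windows of a fixed width around each center (the helper name `overlapping_counts` and that edge convention are made up for illustration, not part of NumPy or matplotlib):

```python
import numpy as np

def overlapping_counts(data, centers, width):
    """Count samples in [c - width/2, c + width/2) for each center c.

    Hypothetical helper for illustration; not a NumPy or matplotlib API.
    """
    data = np.sort(np.asarray(data, dtype=float))
    centers = np.asarray(centers, dtype=float)
    left = centers - width / 2.0
    right = centers + width / 2.0
    # On sorted data, searchsorted returns how many samples lie below each edge,
    # so the difference is the number of samples inside each window.
    below_right = np.searchsorted(data, right, side="left")
    below_left = np.searchsorted(data, left, side="left")
    return below_right - below_left

rng = np.random.default_rng(0)
samples = rng.random(1000)

# Centers every 0.05, windows 0.1 wide: [0, 0.1), [0.05, 0.15), ..., [0.9, 1.0)
centers = np.linspace(0.05, 0.95, 19)
counts = overlapping_counts(samples, centers, width=0.1)
```

The resulting `counts` could then be drawn with something like `plt.bar(centers, counts, width=0.1, alpha=0.5)`. Note that a sample in an overlap region is counted in both windows here.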

S

Damon_McDougall3 · October 20, 2012, 9:50pm

My thoughts are that this goes against everything a histogram is set
out to do: attempt to provide a 'discretised' probability distribution
function given a set of discrete samples. Let's say a sample lies in
the region where two bins overlap. How do you define which bin the
sample lies in? Both? If both, how do you define the value of the
approximated probability distribution on a bin? You could just take
the height of the bin, but some of the bin's mass lies in each of the
neighbouring bins.
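The double counting can be made concrete with a small sketch. It assumes the window layout from the original question (centers every 0.05, width 0.1) and counts a sample in every window that contains it; the variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.random(10_000)

width = 0.10
centers = np.linspace(0.05, 0.95, 19)  # spacing 0.05, half the width

# Boolean membership matrix: row i marks the samples that fall inside window i
inside = np.abs(samples[None, :] - centers[:, None]) < width / 2.0
counts = inside.sum(axis=1)            # samples per window
hits_per_sample = inside.sum(axis=0)   # windows per sample

# For disjoint bins covering the data this ratio would be exactly 1.0.
# Here most samples sit in two windows, so it comes out well above 1.
total_mass = counts.sum() / samples.size
```

Dividing each count by `samples.size * width` still gives a pointwise density estimate at each center (a boxcar kernel estimate), but the bars no longer partition the probability mass the way ordinary histogram bins do.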

If you don't want to apply mass to the neighbouring bins for a sample
that lies in the region where two bins overlap, you could just pick
one. You then have the problem of non-uniqueness. If you'd picked the
other bin you'd have a different probability distribution function.
This is a bad property to have.

If you don't want to pick a neighbouring bin to apply more mass, and
just increase the width of each bin's matplotlib.patches.Patch
object, then that is more sensible. Except now you have the problem of
displaying the histogram. Which bin gets displayed over its left
neighbour? And its right neighbour?
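A rough sketch of that display-only variant: bin the data normally, then draw each bar wider than the bin spacing. The factor of 1.5 and the use of transparency are arbitrary choices for illustration, not a recommendation:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
samples = rng.random(500)

# Ordinary, non-overlapping binning: 10 bins of spacing 0.1 on [0, 1]
counts, edges = np.histogram(samples, bins=10, range=(0.0, 1.0))
centers = 0.5 * (edges[:-1] + edges[1:])
spacing = edges[1] - edges[0]

fig, ax = plt.subplots()
# Each bar is 1.5x wider than the spacing, so neighbours overlap on screen.
# Bars share a z-order and are drawn in the order given (left to right),
# so each bar covers part of its left neighbour; alpha lets both show through.
bars = ax.bar(centers, counts, width=1.5 * spacing, alpha=0.5, edgecolor="black")
```

The counts are unchanged by this; only the rectangles overlap, which is exactly where the question of drawing order comes from.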

I dread to think what this would imply if you also wanted to stack
such histograms. A potential can of worms.

--
Damon McDougall
http://www.damon-is-a-geek.com
B2.39
Mathematics Institute
University of Warwick
Coventry
West Midlands
CV4 7AL
United Kingdom

Benjamin_Root · October 20, 2012, 10:37pm

The closest reasonable thing I can think of is to apply a convolution of some sort to the discrete PDF to produce an approximation of a continuous PDF.
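One way to read this, sketched with NumPy under the assumption of a fine ordinary histogram and a two-bin boxcar kernel (both choices are illustrative; a Gaussian or other kernel would work the same way):

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.random(2000)

# Fine, non-overlapping histogram normalised to a density (bin width 0.05)
density, edges = np.histogram(samples, bins=20, range=(0.0, 1.0), density=True)

# Boxcar kernel two bins wide; the weights sum to 1
kernel = np.ones(2) / 2.0

# mode="valid" keeps only positions where the kernel fully overlaps the data,
# so 20 bins become 19 smoothed values located at the interior bin edges
smoothed = np.convolve(density, kernel, mode="valid")
smoothed_centers = edges[1:-1]
```

With these particular choices, each smoothed value is the average of two adjacent 0.05-wide bins, which is the density in a 0.1-wide window centered on 0.05, 0.10, and so on. That happens to match the overlapping layout asked about at the start of the thread.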

Cheers!

Ben Root

Damon_McDougall3 · October 20, 2012, 10:42pm

Yes, that's possible. The issue here, though, is getting the discrete
case to start with. There are multiple ways to do it depending on your
choice of bins, and the result is not independent of this choice.
