two distributions on a histogram

Hello,

Is it possible to plot two histograms on the same axis without having
the bars on top of each other?

I'm trying to determine how similar a distribution of activity is
between a large data set and a small subset.

I have 2 million records with a last activity date. I can plot both
the sample and the full population on a normalized histogram, in
different colours, but the later plot covers the smaller values of the
earlier one.

Thanks
Neil

See below for Antonio Gonzalez's solution (from last year), which I have started using and am happy with.

-------- Original Message --------
Subject: Re: [Matplotlib-users] plotting overlapped histograms
Date: Mon, 13 Nov 2006 19:02:03 +0100
From: Antonio Gonzalez <Antonio.Gonzalez@...1053...>
To: David E. Konerding <dekonerding@...1352...>
CC: Matplotlib-users@lists.sourceforge.net
References: <4558AFCA.8060109@...1352...>

To compare two histograms you can plot a bihistogram as suggested on
http://www.itl.nist.gov/div898/handbook/eda/section3/bihistog.htm

The little function I've written to do so is below. See if it helps.

Antonio

import numpy as np
import matplotlib.pyplot as plt

def bihist(y1, y2, nbins=10, h=None):
  '''
  Bihistogram: y1 is drawn above the x axis, y2 is mirrored below it.
  h is an axis handle. If not present, a new figure is created.
  '''
  if h is None:
    h = plt.figure().add_subplot(111)
  y1 = np.asarray(y1)
  y2 = np.asarray(y2)
  # shared bin edges, so that both histograms are directly comparable
  # (numpy replaces the floor/ceil/linspace aliases that scipy no longer exports)
  xmin = np.floor(min(y1.min(), y2.min()))
  xmax = np.ceil(max(y1.max(), y2.max()))
  # current matplotlib treats a sequence of bins as edges: nbins bins need nbins + 1 edges
  bins = np.linspace(xmin, xmax, nbins + 1)
  n1, bins1, patch1 = h.hist(y1, bins=bins)
  n2, bins2, patch2 = h.hist(y2, bins=bins)
  # invert second histogram:
  for p in patch2:
    p.set_height(-p.get_height())
  # set ymax and ymin from the bin counts:
  ymax = n1.max()
  ymin = -n2.max()
  h.set_ylim(ymin*1.1, ymax*1.1)
  h.figure.canvas.draw()

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Neil,

I can think of two alternatives. If one of the two distributions has all values higher than the other, so you want it to be behind, then you can use the zorder property of the patches. From your description it sounds like this is the case. If not, however, you can set the alpha property so that both sets of bars are semi-transparent.

Both of these properties can be passed in as kwargs to the hist() function:

hist(randn(200), edgecolor='r', zorder=5, alpha=0.5)
hist(randn(500), edgecolor='g', zorder=4, alpha=0.5)
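
For the case described in the question (a large population against a small sample), a minimal self-contained sketch of the alpha approach might look like the following. The helper name overlay_hist, the synthetic data, and the choice of 30 bins are assumptions made for illustration. It also uses shared bin edges and density=True (the current spelling of a normalized histogram), so that the two sets of bars line up and are comparable despite the very different record counts.

```python
import numpy as np
import matplotlib.pyplot as plt

def overlay_hist(population, sample, nbins=30, ax=None):
    """Overlay two density-normalized histograms (hypothetical helper)."""
    if ax is None:
        ax = plt.figure().add_subplot(111)
    population = np.asarray(population)
    sample = np.asarray(sample)
    # Shared bin edges so the bars of both data sets line up exactly.
    lo = min(population.min(), sample.min())
    hi = max(population.max(), sample.max())
    bins = np.linspace(lo, hi, nbins + 1)
    # density=True scales each histogram to unit area; alpha=0.5 keeps
    # the bars drawn first visible through the bars drawn second.
    ax.hist(population, bins=bins, density=True, color='r', alpha=0.5,
            label='population')
    ax.hist(sample, bins=bins, density=True, color='g', alpha=0.5,
            label='sample')
    ax.legend()
    return ax

# Synthetic stand-in data: a large population and a small random subset.
rng = np.random.default_rng(0)
population = rng.normal(size=20000)
sample = rng.choice(population, size=500, replace=False)
ax = overlay_hist(population, sample)
```

Call plt.show() afterwards to display the figure. Where the two distributions agree, the overlapping region appears in the blended colour, and any mismatch shows up as pure red or pure green.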

Eric

Thanks

That was just what I was looking for.
The bihistogram solution is interesting too, but I think I will go
with alpha since I might want to use a log y scale.
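
The reason a log y scale rules out the bihistogram is that its second set of bars has negative heights, which a logarithmic axis cannot represent. A rough sketch of the alpha overlay on a log scale, using synthetic data and an arbitrary choice of 40 bins (both are assumptions for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data with a long tail, where a log y scale helps.
rng = np.random.default_rng(1)
full_data = rng.exponential(scale=30.0, size=20000)
subset = rng.choice(full_data, size=500, replace=False)

# 40 shared bins spanning the full data (the subset lies within it).
edges = np.linspace(0.0, full_data.max(), 41)

log_ax = plt.figure().add_subplot(111)
# log=True switches the y axis to a logarithmic scale.
log_ax.hist(full_data, bins=edges, density=True, color='r', alpha=0.5, log=True)
log_ax.hist(subset, bins=edges, density=True, color='g', alpha=0.5, log=True)
```

Empty bins have zero height and are simply not drawn on the log axis, so sparse tails in the small subset show up as gaps.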

Regards
Neil

···

On Nov 12, 2007 3:06 PM, Eric Firing <efiring@...202...> wrote:
