Normalized Histograms

Steven_Boada · November 30, 2011, 3:25pm

Hi Users,

I'm looking to make a histogram that is normalized by the total number of items shown in the histogram. For example:

Let's say that I have an array 1000 items long. If I make a histogram in the normal way, hist(x,10), then I get a histogram showing the total number of items in each bin. What I want to do is take the count in each bin, divide it by 1000, and then make the plot.

So if one of my bins has 350 objects in it, then it would be changed to 0.35.

Another way to say it would be that I want the height of the histogram to represent the fraction of the total. I am pretty sure that this is different than using the "normed=True" flag, but I couldn't find anyone talking about this when I searched.

Thanks

Steven
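
The difference Steven suspects can be checked numerically without plotting anything. This is a minimal sketch, assuming that the `density=True` option of `np.histogram` performs the same kind of normalization as the "normed=True" flag he mentions (the variable names are illustrative, not from the thread):

```python
import numpy as np

x = np.random.randn(1000)

counts, edges = np.histogram(x, bins=10)
widths = np.diff(edges)

# Fraction of the total in each bin: the heights themselves sum to 1.
fractions = counts / float(x.size)

# Density normalization: height times bin width sums to 1 instead.
density, _ = np.histogram(x, bins=10, density=True)

print(fractions.sum())
print((density * widths).sum())

# The two normalizations differ by the bin width.
print(np.allclose(density, fractions / widths))
```

So the two only coincide when every bin has a width of exactly 1.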

Jeff_Blackburne1 · November 30, 2011, 4:42pm

Hi Steven,

Try this:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)
h, binedg = np.histogram(x, 10)

# Bin widths; dividing the counts by x.size gives the fraction in each bin
wid = binedg[1:] - binedg[:-1]
# align='edge' so that each bar starts at its left bin edge
plt.bar(binedg[:-1], h/float(x.size), width=wid, align='edge')
plt.show()
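
Another route, not mentioned in the thread, is to give every sample a weight of 1/N so that `hist` itself produces fractions. This is a minimal sketch, assuming a matplotlib version whose `hist` accepts the `weights` argument; the `Agg` backend line is only there so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(1000)

# Each sample carries weight 1/N, so bar heights are fractions of the total.
weights = np.ones_like(x) / x.size

fig, ax = plt.subplots()
heights, edges, patches = ax.hist(x, bins=10, weights=weights)
ax.set_ylabel("Fraction of total")

print(heights.sum())
```

With an interactive backend, a final `plt.show()` displays the figure.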

Tony_S_Yu3 · November 30, 2011, 6:45pm

One option: You can plot the normal hist and then change the tick labels appropriately. Here’s some code for accomplishing that:

#~~~
import numpy as np

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator

N = 350
ytick_step = 0.05
data = np.random.normal(size=N)

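def norm_num(x, pos):
    # Label each tick with its count expressed as a fraction of N
    return '%g' % (x / float(N))

# Place ticks at counts that correspond to multiples of ytick_step
locator = MultipleLocator(N * ytick_step)
formatter = FuncFormatter(norm_num)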

f, ax = plt.subplots()
ax.yaxis.set_major_formatter(formatter)
ax.yaxis.set_major_locator(locator)
ax.hist(data)

plt.show()
#~~~

Note that the formatter object alone is enough to rescale the tick labels to fractions. On its own, though, the result usually looks ugly: the ticks stay at round counts, so the labels become long floating-point numbers. The locator object fixes that by placing the ticks at multiples of ytick_step.

Best,
-Tony
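
To see why the locator matters, the formatter function can be called directly on a few tick positions. This is a small sketch reusing N = 350 from the code above; the tick values 20, 35, and 17.5 are illustrative picks, and `pos` is given a default here only so the function can be called by hand:

```python
N = 350

def norm_num(x, pos=None):
    # Convert a raw count into a fraction of the total.
    return '%g' % (x / float(N))

# A tick at a "round" count gives an awkward fraction label...
print(norm_num(20))    # 0.0571429
# ...while ticks at multiples of N * 0.05 = 17.5 give clean ones.
print(norm_num(35))    # 0.1
print(norm_num(17.5))  # 0.05
```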