Integer histograms are a different beast altogether. For integer data
there is an obvious natural bin width: 1. The only sensible
alternatives are integer multiples of that.
import numpy as np
import matplotlib.pyplot as plt
data = np.int32(np.rint(200*np.random.randn(10000)))
axis = np.arange(data.min(), data.max()+1)
hist = np.zeros((data.max()-data.min()+1,), dtype=np.int32)
# Unfortunately the shortcut hist[data-data.min()] += 1 does not work:
# with fancy indexing the += is buffered, so a value that occurs several
# times in data is counted only once. Explicit loop:
for item in data:
    hist[item-data.min()] += 1
plt.plot(axis, hist)
plt.show()
This histogram can easily be regrouped into any sensible bin size, since
a width of 1 is the finest possible increment. With floats you have to
do things the hard way, because there is no such thing as a natural bin size.
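Because every bin of the unit-width histogram holds exactly one integer, a coarser histogram with bins of width k is just the sum of k consecutive entries. A minimal sketch, assuming `hist` and `axis` come from the explicit loop above; the function name `regroup` is hypothetical, and zero-padding the last partial group is one choice among several.

```python
import numpy as np

def regroup(hist, values, k):
    """Merge a unit-width integer histogram into bins of width k.

    hist[i] is the count for the integer values[i]; values must be
    consecutive integers. The last group is zero-padded so every output
    bin spans exactly k integers. Returns the grouped counts and the
    lowest integer contained in each output bin.
    """
    n = len(hist)
    ngroups = -(-n // k)  # ceiling division
    padded = np.zeros(ngroups * k, dtype=hist.dtype)
    padded[:n] = hist
    grouped = padded.reshape(ngroups, k).sum(axis=1)
    lower = values[0] + k * np.arange(ngroups)
    return grouped, lower
```

For example, `regroup(hist, axis, 5)` gives bins that each contain the integers `lower[i]` to `lower[i] + 4`; the total count is unchanged.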
And yes, the np.histogram() function is much faster.
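# Half-integer bin edges, so that each bin holds exactly one integer value
# (integer edges would put the data on the boundaries and merge the two
# largest values into the closed last bin).
edges = np.arange(data.min()-0.5, data.max()+1.5)
counts, edges = np.histogram(data, bins=edges)
plt.plot(edges[:-1]+0.5, counts)
plt.show()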
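For integer data there is also a vectorised version of the explicit loop: `np.bincount` counts repeated values correctly, and `np.add.at(hist, data-data.min(), 1)` is the unbuffered equivalent of the shortcut that fails above. A small sketch, assuming non-empty integer input; the helper name `integer_hist` is hypothetical.

```python
import numpy as np

def integer_hist(data):
    """Unit-width histogram of integer data, one bin per integer value.

    Gives the same counts as the explicit loop, but np.bincount counts
    every occurrence of a repeated index.
    """
    data = np.asarray(data)
    lo = data.min()
    counts = np.bincount(data - lo)  # counts[i] = number of items equal to lo + i
    values = np.arange(lo, lo + len(counts))
    return values, counts
```

Usage: `values, counts = integer_hist(data)` followed by `plt.plot(values, counts)` reproduces the first plot.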
I don't like putting the data on the bin boundaries, which is what integer
bin edges do, since in this case it is very clear what the bins should be.
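A tiny example of the boundary problem, using made-up data: with integer edges every value sits on an edge, and because `np.histogram` closes only its last bin on the right, the two largest values end up in the same bin. Half-integer edges give one integer per bin.

```python
import numpy as np

data = np.array([0, 1, 1, 2, 3, 3, 3])

# Integer edges 0, 1, 2, 3: the last bin [2, 3] is closed on the right,
# so the values 2 and 3 are counted together.
counts_int, edges_int = np.histogram(data, bins=data.max() - data.min())
# counts_int -> [1, 2, 4]

# Half-integer edges -0.5, 0.5, 1.5, 2.5, 3.5: each bin contains one integer.
edges_half = np.arange(data.min() - 0.5, data.max() + 1.5)
counts_half, _ = np.histogram(data, bins=edges_half)
# counts_half -> [1, 2, 1, 3]
```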
Yes, this is not so much a hard suggestion as a line of thought.
Treating integer data for histograms differently from pseudo-continuous
data is the natural way, in my view. Scaling (grouping bins) could be
done to ensure that the most populated bin contains 4*ndata/nbins points
(yes, this fails for uniformly distributed data).
Maarten
On Fri, 2010-10-22 at 13:39 -0500, Ryan May wrote:
Thanks for that. This actually led me here:
Histogram - Wikipedia, which gives a bunch of
different ways to estimate the number of bins/binsize. It might be
worth looking at one of these in general. However, ironically enough,
these wouldn't actually give the original poster the desired
results: the binsizes would lead to lots of bins, many of which would
be empty due to the integer data. In fact, it seems that all of these
methods are going to break down for integer data. I guess you could
take the ceiling of the calculated binsize... Does anyone have an opinion on
whether calculating binsize/nbins would be a step forward over leaving
the default (of 10) and letting the user calculate it if they like?
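The "take the ceiling" idea could look like the sketch below. It uses the Freedman-Diaconis rule (width = 2*IQR*n^(-1/3)), one of the estimators listed on that Wikipedia page, chosen here only as an example; rounding the width up to an integer of at least 1 and putting the edges on half-integers means every bin covers whole integers, so no bin is empty merely because it falls between two integer values. Both function names are hypothetical.

```python
import numpy as np

def integer_binsize_fd(data):
    """Freedman-Diaconis width 2*IQR*n**(-1/3), rounded up to an integer >= 1."""
    data = np.asarray(data)
    q75, q25 = np.percentile(data, [75, 25])
    h = 2.0 * (q75 - q25) * len(data) ** (-1.0 / 3.0)
    return max(1, int(np.ceil(h)))

def integer_edges(data, width):
    """Half-integer bin edges, `width` integers per bin, covering min..max."""
    data = np.asarray(data)
    lo, hi = data.min(), data.max()
    nbins = -(-(hi - lo + 1) // width)  # ceiling division
    return lo - 0.5 + width * np.arange(nbins + 1)
```

Usage: `np.histogram(data, bins=integer_edges(data, integer_binsize_fd(data)))`. Degenerate data (zero interquartile range) falls back to a width of 1.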
--
KNMI, De Bilt
T: 030 2206 747
E: Maarten.Sneep@...3329...
Room B 2.42