histogram

Hi,

just a small question about histograms. I noticed that the results of the hist
function from pylab and of histogram from numpy+scipy can be slightly different
when the array is big and contains real-valued (non-integer) data. I'm probably
saying something stupid, but wouldn't it be good to have consistency between
the two functions?

N.

humufr@...136... wrote:

Hi,

just a small question about histograms. I noticed that the results of the hist
function from pylab and of histogram from numpy+scipy can be slightly different
when the array is big and contains real-valued (non-integer) data. I'm probably
saying something stupid, but wouldn't it be good to have consistency between
the two functions?

There are lots of different, equally valid ways to construct a histogram.
pylab.hist() and scipy.stats.histogram() probably use different algorithms. It's
probably not worth changing one just to match the other. Much better would be to
provide a broader interface to let the user twiddle the various knobs he would
like to twiddle. I believe David Huard posted an improved histogram class that
implements a number of useful features.
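
As a minimal sketch of how two equally valid conventions can disagree, the plain-Python example below counts the same data with the same bin edges, once with left-closed bins and once with right-closed bins. The function names are hypothetical and chosen only for illustration; nothing here is a claim about which convention pylab.hist() or scipy.stats.histogram() actually uses.

```python
from bisect import bisect_left, bisect_right


def count_left_closed(data, edges):
    """Count with bins [e_i, e_{i+1}); the last bin also includes the top edge."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # ignore values outside the bin span
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def count_right_closed(data, edges):
    """Count with bins (e_i, e_{i+1}]; the first bin also includes the bottom edge."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # ignore values outside the bin span
        i = max(bisect_left(edges, x) - 1, 0)
        counts[i] += 1
    return counts


# Values that sit exactly on an interior edge are where the conventions differ.
data = [0.0, 0.5, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0]
edges = [0.0, 1.0, 2.0, 3.0]

left = count_left_closed(data, edges)
right = count_right_closed(data, edges)
print(left)   # [2, 3, 3]
print(right)  # [4, 2, 2]
```

Both results account for all eight values; only the assignment of points lying on a bin edge changes. With floating-point data, rounding in the computed edges can produce the same kind of small discrepancy.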


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco

Hum,
I did, but it is still pretty rough. I made some changes to it a while ago to use objects, and it still isn't complete.

I’ll try to get the class in working order by the weekend.

Cheers,

David

Here it is. Have fun.
Suggestions are welcome!

David

histogramc.py (13 KB)

"""Class constructor to compute weighted 1-D histogram.
Usage: Histogram(data, bins=10, range=None, weights=None, normed=False)

Input parameters
----------------
    data:  Input array.
    bins:  Number of bins, or the bin array (overrides the range).
    range: A tuple of two values defining the lower and upper ends of the bin span.
           If no argument is given, defaults to (data.min(), data.max()).
    weights: Array of weights stating the importance of each data point. This array must have the same shape as data.

Methods
-------
    add_data(values, weights): Add the values array to the existing data and update the bin count. This does not modify the histogram binning.
    optimize_binning(method): Choose an optimal number of bins. Available methods are: Freedman, Scott.
    score(percentile): Return the interpolated value at the given percentile.
    cdf(x, method): Return the interpolated cdf at x.

Attributes
----------
    freq: The resulting bin count. If normed is True, this is a frequency; if normed is False, it is simply the number of data points falling into each bin.
    cum_freq: The cumulative bin count.
    data: Array of data.
    weights: Array of weights.
    s_data: Array of sorted data.
    range: Bin limits: (min, max).
    Nbins: Number of bins.
    bin: The bin array (length Nbins + 1).
    normed: Normalization flag. Setting it to True normalizes the density to 1.
    weighted: Boolean indicating whether or not the data is weighted.

"""