Computing Simple Statistics from a Histogram

Wayne_Watson · December 1, 2009, 12:32pm

Is there some statistics function that computes the mean, std. dev., min/max, etc. from a frequency distribution?

--
Wayne Watson (Watson Adventures, Prop., Nevada City, CA)

             (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time)
              Obz Site: 39° 15' 7" N, 121° 2' 32" W, 2700 feet

          The popular press and many authorities believe the number
          of pedophiles that prowl the web is 50,000. There are no
          figures that support this. The number of children below
          18 years of age kidnapped by strangers is 1 in 600,000,
          or 115 per year. -- The Science of Fear by D. Gardner

              Web Page: <www.speckledwithstars.net/>

_John_Hunter1 · December 1, 2009, 12:48pm

numpy has many functions for basic descriptive statistics. If "data"
is an array of your data, you can do (import numpy as np)

mean: np.mean(data)
median: np.median(data)
standard deviation: np.std(data)
min: np.min(data)
max: np.max(data)

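In scipy.stats, there are many more (skew, kurtosis, etc.). See
also this example:

http://matplotlib.svn.sourceforge.net/viewvc/matplotlib/trunk/py4science/examples/stats_descriptives.py?view=markup&pathrev=4027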

JDH

Wayne_Watson · December 1, 2009, 4:51pm

I do not believe that any of those calculations work from the pdf, i.e. a frequency-of-occurrence histogram. They take this, (1, 2, 2, 4, 2, 5, 4), and not this, (1, 3, 0, 2, 1). The latter are the frequencies of occurrence for the values 1, 2, 3, 4, 5.
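
To make the distinction concrete, here is a small sketch (an illustration, not code from the thread) showing the two representations side by side. It recovers the counts from the raw samples with `np.unique` and checks that a frequency-weighted mean from `np.average(..., weights=...)` agrees with `np.mean` on the raw data. The variable names are made up for the example.

```python
import numpy as np

# Raw samples: the form np.mean, np.std, etc. expect
samples = np.array([1, 2, 2, 4, 2, 5, 4])

# Frequency form: how often each of the values 1..5 occurs
values = np.arange(1, 6)
counts = np.array([1, 3, 0, 2, 1])

# np.unique recovers the occupied values and their counts from the samples
uniq, uniq_counts = np.unique(samples, return_counts=True)

# Mean computed from the frequencies alone, without the raw samples
mean_from_counts = np.average(values, weights=counts)
mean_from_samples = np.mean(samples)

print(uniq, uniq_counts)
print(mean_from_counts, mean_from_samples)
```

Both means come from the same underlying data, so they should agree up to floating-point rounding.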

Matthias_Michler · December 2, 2009, 8:16am

Hi Wayne,

you are right: all these functions use the sample data and not the pdf /
frequency-of-occurrence histogram, because typically the data is available and
not the pdf. Maybe the scipy mailing list could give you a solution to your
problem.

If your frequencies of occurrence are integers, you could do something
like the following to regenerate the sample data and then use the previously
mentioned functions:

import numpy as np

bin_centers = np.array([1., 2., 3., 4., 5.])
n_vals = np.array([1, 3, 0, 2, 1])
sample_new = np.array([])  # start from an empty sample
for bin_center, n in zip(bin_centers, n_vals):
    # append the bin value 'n' times
    sample_new = np.concatenate((sample_new, [bin_center] * n))

Kind regards,
Matthias
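
As a side note, the same expansion can probably be written without an explicit loop using `np.repeat`, which repeats each element by a per-element integer count. This is only a sketch of an alternative to the loop above, reusing its variable names; like the loop, it assumes integer counts and rebuilds the full sample in memory.

```python
import numpy as np

bin_centers = np.array([1., 2., 3., 4., 5.])
n_vals = np.array([1, 3, 0, 2, 1])

# Repeat each bin center as many times as its count says
sample_new = np.repeat(bin_centers, n_vals)

# The usual sample-based functions now apply
mean = np.mean(sample_new)
std = np.std(sample_new)  # population standard deviation (ddof=0)

print(sample_new)
print(mean, std)
```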

Wayne_Watson · December 4, 2009, 9:58am

Hi, actually, my question was more informational than about how to do it. I
wrote a function to do it, but wondered why such a function didn't seem
to exist. In my case, the histogram comes from a small processor that
produces frequency data from 307K points. Unraveling the frequency data
and returning it to the original set of points seems counterproductive. The
data is produced from the pixel values in a 640x480 image.
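
For readers with the same problem, a function of this kind might look roughly like the sketch below. It is not Wayne's function (which was not posted), and `histogram_stats` is a hypothetical name, not something NumPy or SciPy provides. It works directly on the bin values and counts, so the 307,200 pixels never need to be reconstructed. The simulated image and the assumption of 8-bit pixel values (256 bins) are only there to make the example runnable.

```python
import numpy as np

def histogram_stats(values, counts):
    """Descriptive statistics from bin values and their counts.

    Hypothetical helper. Assumes `values` is sorted in increasing order
    and `counts` is non-negative with at least one nonzero entry.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    mean = np.sum(values * counts) / total
    # Population variance, the same convention as np.std with ddof=0
    var = np.sum(counts * (values - mean) ** 2) / total
    # Min and max only consider bins that actually occurred
    occupied = values[counts > 0]
    # Lower median: first value whose cumulative count reaches half the total
    median = values[np.searchsorted(np.cumsum(counts), total / 2.0)]
    return {"n": total, "mean": mean, "std": np.sqrt(var),
            "min": occupied.min(), "max": occupied.max(), "median": median}

# Simulated 640x480 image with assumed 8-bit pixel values
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640))
counts = np.bincount(image.ravel(), minlength=256)

stats = histogram_stats(np.arange(256), counts)
print(stats)
```

Note that for an even number of points this median is the lower of the two middle values, whereas `np.median` averages them, so the two can differ slightly.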
