 # Computing Simple Statistics from a Histogram

Is there some statistics function that computes the mean, std. dev., min/max, etc. from a frequency distribution?

···

--

(121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time)
Obz Site: 39° 15' 7" N, 121° 2' 32" W, 2700 feet
The popular press and many authorities believe the number
of pedophiles that prowl the web is 50,00. There are no
figures that support this. The number of children below
18 years of age kidnapped by strangers is 1 in 600,000,
or 115 per year. -- The Science of Fear by D. Gardner
Web Page: <www.speckledwithstars.net/>

numpy has many functions for basic descriptive statistics. If "data"
is an array of your data, you can do (import numpy as np)

mean: np.mean(data)
median: np.median(data)
standard deviation: np.std(data)
min: np.min(data)
max: np.max(data)

In scipy.stats, there are many more (skew, kurtosis, etc.). See
also this example:

http://matplotlib.svn.sourceforge.net/viewvc/matplotlib/trunk/py4science/examples/stats_descriptives.py?view=markup&pathrev=4027

JDH

···

On Tue, Dec 1, 2009 at 6:32 AM, Wayne Watson <sierra_mtnview@...209...> wrote:

Is there some statistics function that computes the mean, std. dev., min/max, etc. from a frequency distribution?

I do not believe that any of those calculations are based on the pdf (the frequency-of-occurrence histogram). They expect this: (1, 2, 2, 4, 2, 5, 4), and not this: (1, 3, 0, 2, 1). The latter are the frequencies of occurrence for 1, 2, 3, 4, 5.
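For the small example in this message, the mean and standard deviation can be obtained from the frequencies directly by treating the counts as weights. The following is a minimal sketch, assuming the values are 1 to 5 with the counts given above; the variable names are illustrative, and the standard deviation is the population form (the same default as `np.std` on raw data).

```python
import numpy as np

# Values and their frequencies of occurrence, taken from the message above.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
counts = np.array([1, 3, 0, 2, 1])

total = counts.sum()  # number of underlying data points

# Frequency-weighted mean: sum(value * count) / sum(count).
mean = np.average(values, weights=counts)

# Population variance (ddof=0), weighted by the same counts.
variance = np.average((values - mean) ** 2, weights=counts)
std = np.sqrt(variance)

print(total, mean, std)
```

These results should agree with `np.mean` and `np.std` applied to the unraveled data (1, 2, 2, 4, 2, 5, 4).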

John Hunter wrote:

···


Hi Wayne,

You are right: all these functions use the sample data and not the pdf /
frequency-of-occurrence histogram, because typically the data is available and
not the pdf. Maybe the scipy mailing list could give you a solution to your
problem.

If your frequencies of occurrence are integers, you could do something
like the following to generate the sample data and then use the previously
mentioned functions:

    import numpy as np

    bin_centers = np.array([1., 2., 3., 4., 5.])
    n_vals = np.array([1, 3, 0, 2, 1])
    sample_new = np.array([])
    for bin_center, n in zip(bin_centers, n_vals):
        # append the bin center 'n' times
        sample_new = np.concatenate((sample_new, [bin_center] * n))
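The loop above can probably be shortened with `np.repeat`, which accepts a per-element repeat count. This is a sketch under the same assumption of integer counts, reusing the names from the snippet above; it is one possible alternative, not the approach proposed in the message.

```python
import numpy as np

bin_centers = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n_vals = np.array([1, 3, 0, 2, 1])

# Repeat each bin center by its integer count to rebuild the sample data.
sample_new = np.repeat(bin_centers, n_vals)

# The usual sample-based functions can then be applied.
print(np.mean(sample_new), np.std(sample_new))
```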

Kind regards,
Matthias

···

On Tuesday 01 December 2009 17:51:31 Wayne Watson wrote:

I do not believe that any of those calculations are based on the pdf (the
frequency-of-occurrence histogram). They expect this: (1, 2, 2, 4, 2, 5, 4),
and not this: (1, 3, 0, 2, 1). The latter are the frequencies of occurrence
for 1, 2, 3, 4, 5.


Hi, actually, my question was more informational than about how to do it. I
wrote a function to do it, but wondered why such a function didn't seem
to exist. In my case, the histogram comes from a small processor that
produces frequency data from 307K points. Unraveling the frequency data
and returning it to the original set of points seems counterproductive. The
data is produced from the pixel values in a 640x480 image.
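A function of the kind described here might look like the sketch below. It is not the function mentioned in this message, which was not posted. The name `histogram_stats`, the returned keys, and the synthetic 8-bit image standing in for the processor's output are all assumptions made for illustration. It works on the counts only, so the 307,200 pixel values never need to be reconstructed.

```python
import numpy as np

def histogram_stats(values, counts):
    """Descriptive statistics from sorted values and their counts.

    Hypothetical helper: `values` must be in ascending order and
    `counts[i]` is the number of occurrences of `values[i]`.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("histogram is empty")

    mean = (values * counts).sum() / total
    # Population variance (ddof=0), like the default of np.std.
    variance = (counts * (values - mean) ** 2).sum() / total

    # Min and max are the extreme values that actually occur.
    occupied = values[counts > 0]

    # Lower median: first value whose cumulative count reaches half the total.
    median = values[np.searchsorted(np.cumsum(counts), total / 2.0)]

    return {
        "n": total,
        "mean": mean,
        "std": np.sqrt(variance),
        "min": occupied.min(),
        "max": occupied.max(),
        "median": median,
    }

# Synthetic stand-in for a 640x480 8-bit image (assumed data, not from the thread).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)

# One count per possible pixel value 0..255.
counts = np.bincount(image.ravel(), minlength=256)
stats = histogram_stats(np.arange(256), counts)
print(stats)
```

The cost depends on the number of bins (256 here) rather than on the number of pixels. Note that for an even number of points the median returned is the lower of the two middle values, which can differ from `np.median`.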

Matthias Michler wrote:

···


_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
