How to plot the empirical CDF of an array?

How can I plot the empirical CDF of an array of numbers in matplotlib
in Python? I'm looking for the cdf analog of pylab's "hist" function.

One thing I can think of is:

from scipy.stats import cumfreq
a = array([...]) # my array of numbers
num_bins = 20
b = cumfreq(a, num_bins)
plt.plot(b)

Is that correct though? Is there an easier/better way?

thanks.

Hi,

I would use pyplot.hist, which draws a histogram with one bar per bin. Passing
the keyword argument cumulative=True makes it plot the cumulative frequency
distribution instead of the per-bin frequencies.

Kind regards,
Matthias
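
For illustration, the pyplot.hist suggestion might look like the sketch below. The sample data, the seed, and histtype="step" are assumptions made for the example. density=True is the keyword current matplotlib uses for what older releases called normed=True.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=500)  # stand-in for "my array of numbers"

fig, ax = plt.subplots()
# cumulative=True accumulates the bin heights from left to right;
# density=True rescales them so the last bin reaches 1.
n, bins, patches = ax.hist(a, bins=20, cumulative=True, density=True,
                           histtype="step")
ax.set_xlabel("value")
ax.set_ylabel("cumulative fraction")
# Call plt.show() or fig.savefig(...) to look at the result.
```

This is still a binned approximation: the curve only changes at the 20 bin edges, not at each data point.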

···

On Friday July 9 2010 06:02:58 per freem wrote:

How can I plot the empirical CDF of an array of numbers in matplotlib
in Python? I'm looking for the cdf analog of pylab's "hist" function.

One thing I can think of is:

from scipy.stats import cumfreq
a = array([...]) # my array of numbers
num_bins = 20
b = cumfreq(a, num_bins)
plt.plot(b)

Is that correct though? Is there an easier/better way?

I recalled David Huard posted the below,
which apparently was once in the sandbox...
hth,
Alan Isaac

import numpy as np

def empiricalcdf(data, method='Hazen'):
    """Return the empirical cdf.

    Methods available (here i goes from 1 to N)
        Hazen: (i-0.5)/N
        Weibull: i/(N+1)
        Chegodayev: (i-.3)/(N+.4)
        Cunnane: (i-.4)/(N+.2)
        Gringorten: (i-.44)/(N+.12)
        California: (i-1)/N

    :see: http://svn.scipy.org/svn/scipy/trunk/scipy/sandbox/dhuard/stats.py
    :author: David Huard
    """
    # Rank (1..N) of each observation, in the original order of `data`.
    i = np.argsort(np.argsort(data)) + 1.
    nobs = len(data)
    method = method.lower()
    if method == 'hazen':
        cdf = (i-0.5)/nobs
    elif method == 'weibull':
        cdf = i/(nobs+1.)
    elif method == 'california':
        cdf = (i-1.)/nobs
    elif method == 'chegodayev':
        cdf = (i-.3)/(nobs+.4)
    elif method == 'cunnane':
        cdf = (i-.4)/(nobs+.2)
    elif method == 'gringorten':
        cdf = (i-.44)/(nobs+.12)
    else:
        # String exceptions are not valid Python; raise a real exception.
        raise ValueError('Unknown method. Choose among Weibull, Hazen, '
                         'Chegodayev, Cunnane, Gringorten and California.')
    return cdf
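
To make the rank step in the function above concrete, here is a small worked sketch. The four-element array is made up for the example, and only the Hazen and Weibull formulas from the docstring are shown.

```python
import numpy as np

data = np.array([3.2, 1.5, 2.7, 4.1])  # illustrative values, not sorted
n = len(data)

# argsort gives the indices that would sort the data; applying argsort
# again turns those indices into the 0-based rank of each element.
ranks = np.argsort(np.argsort(data)) + 1  # ranks are 3, 1, 2, 4

# Plotting positions, still in the original order of `data`.
hazen = (ranks - 0.5) / n     # 0.625, 0.125, 0.375, 0.875
weibull = ranks / (n + 1)     # 0.6, 0.2, 0.4, 0.8

print(ranks)
print(hazen)
print(weibull)
```

Note that the result follows the order of the input, so it only increases monotonically if the input was already sorted. Tied values receive distinct ranks in whatever order argsort happens to place them.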

I'd like to clarify: I want the empirical cdf, but I want it to be
normalized. There's a normed=True option to plt.hist but how can I do
the equivalent for CDFs?

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
matplotlib-users List Signup and Options

There is no such thing as a normalized empirical CDF. Or rather, there is no such thing as an unnormalized empirical CDF.

Alan's code is good. Unless you have a truly staggering number of points, there is no reason to bin the data first.
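
One way to see the point about normalization: the empirical CDF at x is the fraction of the sample that is less than or equal to x, so it runs from 0 to 1 by construction. A minimal unbinned sketch follows. The function name ecdf and the sample values are made up for the example, and the plain i/N convention used here is not one of the plotting positions listed in Alan's code.

```python
import numpy as np

def ecdf(sample):
    """Return sorted values x and F(x) = (number of points <= x) / N.

    Hypothetical helper for illustration only.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    y = np.arange(1, x.size + 1) / x.size  # i/N for i = 1..N
    return x, y

x, y = ecdf([2.0, 0.5, 1.5, 3.0])
print(x)  # sorted sample
print(y)  # fractions 0.25, 0.5, 0.75, 1.0
```

Every observation contributes one step of height 1/N, so no bin count needs to be chosen.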

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

How does Alan's code compare with using cumfreq and then plotting its
result? Is the only difference that cumfreq bins the data?
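
A rough sketch of the comparison, under some assumptions: it relies on scipy.stats.cumfreq returning a result with cumcount, lowerlimit and binsize fields, as it does in the SciPy releases I know of, and the sample data is made up. Besides the binning, note that cumfreq returns counts rather than fractions, while Alan's code returns plotting positions such as (i-0.5)/N rather than plain i/N.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(size=200)  # stand-in data

# Binned: cumfreq returns cumulative counts per bin plus the bin geometry,
# so the x positions have to be rebuilt before anything can be plotted.
res = stats.cumfreq(a, numbins=20)
upper_edges = res.lowerlimit + res.binsize * np.arange(1, res.cumcount.size + 1)
binned_cdf = res.cumcount / a.size  # divide by N to get fractions

# Unbinned: one point per observation, using the plain i/N convention.
x = np.sort(a)
unbinned_cdf = np.arange(1, a.size + 1) / a.size

# plt.plot(upper_edges, binned_cdf) and plt.plot(x, unbinned_cdf)
# would then draw the two curves for comparison.
print(binned_cdf.size, unbinned_cdf.size)
```

The binned curve has 20 points and the unbinned one has 200; both end at 1.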

Also, I am not sure how to use Alan's code.

If I try:

ec = empiricalcdf(my_data)
plt.plot(ec)

it doesn't actually look like a CDF.

Make sure my_data is sorted first.

plt.plot(my_data, ec)

You probably want to use one of the "steps" linestyles; I'm not sure which one would be best. It probably doesn't matter much.
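
Put together, the advice above might look like the following sketch. The sample data is made up, the Hazen formula is written out inline so the block stands on its own, and drawstyle="steps-post" is one assumed choice among the step variants that current matplotlib exposes through the drawstyle keyword.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
my_data = rng.exponential(size=100)  # stand-in data

# Sort first. For sorted data the ranks are simply 1..N, so the Hazen
# positions (i - 0.5) / N come out in increasing order.
x = np.sort(my_data)
ec = (np.arange(1, x.size + 1) - 0.5) / x.size

fig, ax = plt.subplots()
# Plot the values on the x axis, not the array index; "steps-post" holds
# each height until the next data value is reached.
line, = ax.plot(x, ec, drawstyle="steps-post")
ax.set_xlabel("value")
ax.set_ylabel("empirical CDF")
# Call plt.show() or fig.savefig(...) to look at the result.
```

Calling plt.plot(ec) on unsorted input plots the positions against the array index in input order, which is why it does not resemble a CDF.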

--
Robert Kern