# cumulative distribution function

Hi. I haven't been active for a while, but now I have another paper that I need to get out...

Anyway, I need to draw a cumulative distribution function, as the reviewers of my last paper really nailed me to the wall for including histograms instead of CDFs. Is there any way to plot a CDF with matplotlib?

For analytic CDFs, see scipy.stats. I assume you need an empirical
CDF. You can use numpy.histogram to compute the empirical PDF
(use density=True to return a PDF rather than a frequency count). Then
use numpy.cumsum to do the cumulative sum of the PDF, multiplying by
the bin size so that it approximates the integral.

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(10000)
# density=True returns a PDF rather than a frequency count
p, bins = np.histogram(x, 50, density=True)
db = bins[1] - bins[0]       # bin width (all bins are equally wide)
cdf = np.cumsum(p * db)      # approximate integral of the PDF

fig = plt.figure()
ax = fig.add_subplot(111)
# bins holds 51 edges for 50 bins; draw each bar from its left edge
ax.bar(bins[:-1], cdf, width=0.8 * db, align='edge')
plt.show()
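The histogram-and-cumsum recipe above can be sanity-checked against an analytic CDF. The following is only a sketch under stated assumptions: the helper names binned_cdf and normal_cdf are made up for illustration, the data are standard normal draws, and the analytic CDF is written with math.erf to avoid extra dependencies (scipy.stats.norm.cdf would be the more usual choice).

```python
import math
import numpy as np

def binned_cdf(x, nbins=50):
    """Hypothetical helper: binned CDF evaluated at the right bin edges."""
    p, edges = np.histogram(x, bins=nbins, density=True)
    widths = np.diff(edges)              # width of every bin
    cdf = np.cumsum(p * widths)          # running integral of the PDF
    return edges[1:], cdf

def normal_cdf(t):
    """Analytic CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

rng = np.random.default_rng(0)           # fixed seed, illustration only
x = rng.standard_normal(10000)

right_edges, cdf = binned_cdf(x, nbins=50)
analytic = np.array([normal_cdf(t) for t in right_edges])
max_err = np.max(np.abs(cdf - analytic))
print("largest gap between binned and analytic CDF:", max_err)
```

The cumulative sum is paired with the right bin edges because each partial sum covers all samples up to the end of its bin.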

···

On 3/17/07, Simson Garfinkel <simsong@...1340...> wrote:

Hi. I haven't been active for a while, but now I have another paper
that I need to get out...

Thanks. I've taken a new job, moved to California, and have been flying between the two coasts every week. It doesn't leave much time for mailing lists...

Thanks! I'll try it out and see what happens.

···

On Mar 18, 2007, at 12:41 PM, John Hunter wrote:

On 3/17/07, Simson Garfinkel <simsong@...1340...> wrote:

Thanks for the information. Unfortunately, this CDF doesn't look like the CDFs that we see in other published papers. I'm not sure what tools those were made with, but they show a thin line that traces the integral of all measurements, rather than a bar graph. The problem with a bar graph is that different bin widths give different results.

Gnuplot seems to do a decent job, as can be seen at http://chem.skku.ac.kr/~wkpark/tutor/gnuplot/gpdocs/prob.htm. But there should be a way to do this nicely with matplotlib, right?

···

Just replace the call to

ax.bar(...)

with

ax.plot(bins[:len(cdf)], cdf)

in the example code I posted previously...

JDH
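A line drawn through binned values still depends on the number of bins, because the plotted line interpolates linearly between bin edges. The sketch below is only an illustration of that effect: binned_cdf_line is a hypothetical helper name, and the sample data and bin counts are arbitrary choices.

```python
import numpy as np

def binned_cdf_line(x, nbins):
    """Hypothetical helper: points for a line plot of a binned CDF.

    Returns all bin edges and the cumulative fraction at each edge,
    starting at 0 on the leftmost edge and ending at 1 on the rightmost.
    """
    counts, edges = np.histogram(x, bins=nbins)
    cum = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    return edges, cum

rng = np.random.default_rng(1)           # fixed seed, illustration only
x = rng.standard_normal(1000)

coarse_x, coarse_y = binned_cdf_line(x, 5)
fine_x, fine_y = binned_cdf_line(x, 200)

# A line plot connects the points linearly, so np.interp reproduces
# the value each plotted line would show at x = 0.
at_zero_coarse = np.interp(0.0, coarse_x, coarse_y)
at_zero_fine = np.interp(0.0, fine_x, fine_y)
print(at_zero_coarse, at_zero_fine)
```

Passing either pair of arrays to ax.plot gives a line that runs from 0 to 1; the coarse version has only six points, so it smooths over detail that the fine version keeps.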

···

On 3/20/07, Simson Garfinkel <simsong@...1340...> wrote:

Thanks for the information. Unfortunately, this CDF doesn't look like
the CDF that we see in other published papers. I'm not sure what they
are done with... But they have a thin line that shows the integral
of all measurements, rather than a bar graph. The problem with a bar
graph is that different bin widths give different results.

Gnuplot seems to do a decent job, as can be seen at
http://chem.skku.ac.kr/~wkpark/tutor/gnuplot/gpdocs/prob.htm. But there
should be a way to do this nicely with matplotlib, right?

Hi Simson,

Try this one:

import numpy as np
import matplotlib.pyplot as plt

x = np.sin(np.arange(0, 100, 0.1))  ## your data

## plot the sorted values of your data against
## a linear vector from 0 to 1 with the same length

plt.plot(np.sort(x), np.arange(len(x)) / float(len(x)))
plt.show()
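The sort-based approach above needs no bins at all, which is why the result does not change with bin width. A minimal sketch of the same idea as reusable functions follows; the names ecdf and plot_ecdf are hypothetical, and this version uses the (i+1)/n convention so that the curve ends exactly at 1, which is a small variation on the snippet above rather than the poster's exact formula.

```python
import numpy as np

def ecdf(data):
    """Return sorted values and the fraction of samples <= each value."""
    xs = np.sort(np.asarray(data, dtype=float))
    n = xs.size
    ys = np.arange(1, n + 1) / n         # 1/n, 2/n, ..., 1
    return xs, ys

def plot_ecdf(data, ax=None):
    """Draw the empirical CDF as a step line and return the axes."""
    # Imported here so that ecdf() is usable without matplotlib.
    import matplotlib.pyplot as plt
    if ax is None:
        _, ax = plt.subplots()
    xs, ys = ecdf(data)
    ax.step(xs, ys, where="post")        # jump at each observed value
    ax.set_xlabel("value")
    ax.set_ylabel("fraction of samples <= value")
    return ax

xs, ys = ecdf([3.0, 1.0, 2.0, 2.0])
print(xs, ys)
```

Calling plot_ecdf(x) followed by matplotlib.pyplot.show() displays the thin-line CDF; the step style makes the jumps at individual samples visible for small data sets.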