Yes, I understand there are alternatives – but I still think a simple, binned histogram is a fairly basic feature.
KDEs are nice but can easily be over-tweaked: if I see one, I certainly want to know how the bandwidth was selected; otherwise it’s no better than a histogram, and even worse, because the issue is now hidden. CDFs (essentially your second proposition) can be useful, but some kinds of data are traditionally represented as histograms, and CDFs would only confuse readers.
Antony
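To make the bandwidth point concrete, here is a minimal NumPy sketch of a 1-D Gaussian KDE. The function name gaussian_kde_1d is hypothetical (it is not a matplotlib or SciPy API). It accepts an explicit bandwidth, falls back to Scott's rule of thumb (sample standard deviation times n^(-1/5)) when none is given, and returns the bandwidth it actually used so it can be reported with the plot. For real work, scipy.stats.gaussian_kde with its bw_method argument is the fuller library implementation.

```python
import numpy as np


def gaussian_kde_1d(data, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`.

    Hypothetical helper. If `bandwidth` is None, Scott's rule of thumb
    (sample std * n ** (-1/5)) is used. The bandwidth actually used is
    returned alongside the density so it can be reported.
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = data.size
    if bandwidth is None:
        # Scott's rule of thumb for one dimension.
        bandwidth = data.std(ddof=1) * n ** (-1.0 / 5.0)
    # Scaled distances between every grid point and every sample: shape (len(grid), n).
    u = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so the estimate integrates to 1.
    density = kernel.sum(axis=1) / (n * bandwidth)
    return density, bandwidth


rng = np.random.default_rng(0)
samples = rng.normal(size=200)
grid = np.linspace(-6.0, 6.0, 1201)
density, h = gaussian_kde_1d(samples, grid)
print(f"bandwidth used: {h:.3f}")
```

Returning the bandwidth rather than hiding it is the design choice here: whoever plots the curve can state exactly how smooth it was made.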
2014-05-30 15:11 GMT-07:00 Mark Voorhies <mark.voorhies@…4539…4…>:
On 05/30/2014 08:25 AM, Antony Lee wrote:
I may still need to bin data, e.g. when the data range is “large”, or at
least not small compared to the number of data points.
Antony
Two alternatives to histograms that you might consider:
Kernel density estimation (KDE)
This blog post has a good discussion motivating KDE from issues with bin choice in histograms:
And this follow-up explores the various KDE implementations in the “Scientific Python” stack:
http://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/
A rank vs. value plot, e.g.:
plot(sorted(r))
This is horizontal for peaks (lots of copies of similar values) and vertical for tails/gaps,
so it presents the same information as a histogram, but without requiring bin choice.
–Mark
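The rank vs. value plot can be sketched in a few lines of NumPy. The helper names rank_value and ecdf are hypothetical. The sketch also shows that the rank vs. value curve and an empirical CDF are the same curve with the axes swapped and the ranks rescaled to [0, 1].

```python
import numpy as np


def rank_value(data):
    """Return (rank, value) arrays for a rank vs. value plot (hypothetical helper).

    Value plotted against rank gives flat runs where many similar values
    pile up (peaks) and steep jumps across tails and gaps.
    """
    values = np.sort(np.asarray(data))
    ranks = np.arange(values.size)
    return ranks, values


def ecdf(data):
    """Empirical CDF (hypothetical helper): sorted values vs. fraction of samples <= value."""
    values = np.sort(np.asarray(data))
    fraction = np.arange(1, values.size + 1) / values.size
    return values, fraction


ranks, values = rank_value([3, 1, 2, 2, 2, 9])
# The flat run at 2 marks the peak; the jump from 3 to 9 marks the gap.
print(values)
```

With matplotlib, plt.plot(*rank_value(r)) draws the same curve as plot(sorted(r)), since a single-argument plot uses the index as x, and plt.step(*ecdf(r), where="post") draws the ECDF.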
2014-05-30 5:03 GMT-07:00 Yoshi Rokuko <yoshi@…3676…>:
On Thu, 29 May 2014 14:14:52 -0700,
Antony Lee <antony.lee@…1016…> wrote:
Hi,
When histogramming integer data, is there an easy way to tell
matplotlib that I want a certain number of bins, with each bin
covering an equal number of integers (except possibly the last one)?
This is to avoid having some bins taller than others merely because
they cover more integers. I know I can pass in an explicit bins array
(something like list(range(min, max, (max-min)//n)) + [max]), but I was
hoping for something simpler, like hist(data, nbins=42,
equal_integer_coverage=True).
Best,
Antony
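One way to build the explicit bins array the question describes is sketched below. The helper equal_integer_bins is hypothetical (matplotlib has no nbins or equal_integer_coverage argument). It assumes the goal is that every bin covers the same number of consecutive integers, except possibly a narrower last bin, and places the edges at half-integers so no integer sits on a boundary. Because the width is rounded up, the result has at most nbins bins, not always exactly nbins.

```python
import numpy as np


def equal_integer_bins(data, nbins):
    """Bin edges for integer data where each bin spans the same number of
    integers, except possibly the last, which may be narrower (hypothetical helper).

    Edges sit at half-integers, so every integer falls strictly inside one bin.
    Yields at most `nbins` bins, because the per-bin width is rounded up.
    """
    lo, hi = int(np.min(data)), int(np.max(data))
    span = hi - lo + 1                  # number of integer values from lo to hi inclusive
    width = -(-span // nbins)           # integers per bin: ceil(span / nbins)
    edges = np.arange(lo, hi + 1, width) - 0.5
    # Close the last bin just past the maximum value.
    return np.append(edges, hi + 0.5)


edges = equal_integer_bins(list(range(10)), 4)
counts, _ = np.histogram(list(range(10)), bins=edges)
print(edges, counts)
```

The edges can then be passed straight to matplotlib, e.g. plt.hist(data, bins=equal_integer_bins(data, 42)).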
Integer data is discrete. For discrete variables you don’t need bins: you
don’t estimate the frequency distribution, you know it exactly by
counting.
Of course you could do that with the hist function:
pl.hist(r, np.arange(min(r)-0.5, max(r)+1.5), histtype='step')
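The counting approach can be sketched directly with NumPy, assuming non-empty integer input. The helper integer_counts is hypothetical. It uses np.bincount to get the exact frequency of every integer from min(r) to max(r), and it gives the same counts as histogramming with the half-integer edges np.arange(min(r)-0.5, max(r)+1.5) from the hist call above, where each bin holds exactly one integer.

```python
import numpy as np


def integer_counts(data):
    """Exact frequency of each integer from min(data) to max(data) (hypothetical helper).

    Returns (values, counts); integers that never occur get a count of 0.
    """
    data = np.asarray(data, dtype=int)
    lo = data.min()
    counts = np.bincount(data - lo)     # bincount requires non-negative integers
    values = np.arange(lo, lo + counts.size)
    return values, counts


r = [2, 5, 5, 3, 2, 2, 7]
values, counts = integer_counts(r)
# Same counts as histogramming with one bin per integer (half-integer edges).
hist_counts, _ = np.histogram(r, bins=np.arange(min(r) - 0.5, max(r) + 1.5))
print(values, counts, hist_counts)
```

The result can be drawn with plt.bar(values, counts, width=1.0), which makes it explicit that each bar is one integer value.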
Matplotlib-users mailing list
Matplotlib-users@…1735…sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users