Looks like my evenings this week (after today) will be open. I was thinking about coding up a potentially major overhaul of axes.Axes.boxplot. Here's a rough outline of what I was thinking:

1) Improve the bootstrapping of the confidence intervals around the median

2) Add support for masked arrays (i.e., let user specify if masked values should be considered or not -- currently they are always considered, IIRC)

3) Improve the calculation of the percentiles to be consistent with SciPy and R.
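Items 1 and 2 could be prototyped together outside of boxplot before touching it. Below is a minimal sketch under stated assumptions: `bootstrap_median_ci` and its `ignore_masked` flag are hypothetical names (not existing matplotlib API), and it uses a plain percentile bootstrap, which is one common choice and not necessarily what boxplot does now. It needs NumPy 1.17 or later for `default_rng`.

```python
import numpy as np

def bootstrap_median_ci(data, n_boot=5000, alpha=0.05, ignore_masked=True, seed=None):
    """Hypothetical helper: percentile-bootstrap CI for the median (a sketch)."""
    if ignore_masked:
        # compressed() drops masked entries and returns a plain 1-D ndarray
        values = np.ma.asarray(data).compressed()
    else:
        # use the underlying data, treating masked entries as valid values
        values = np.ma.getdata(data).ravel()
    rng = np.random.default_rng(seed)
    # resample with replacement: one row of indices per bootstrap replicate
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    medians = np.median(values[idx], axis=1)
    # the alpha/2 and 1 - alpha/2 percentiles of the bootstrapped medians
    lo, hi = np.percentile(medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return np.median(values), lo, hi
```

With a masked array, the flag decides whether the masked value takes part in the statistic at all; for example, masking the outlier in `[1, 2, 3, 100]` gives a median of 2.0 instead of 2.5.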

#1 seems like something that'll be nice. #2 seems pretty essential to me. The third improvement is something for which I would want y'all's blessing before moving ahead. However, I think it's pretty critical. Note the 25th and 75th percentiles in the comparison below:

import numpy as np
import matplotlib.mlab as mlab
import scipy.stats as stats

def comparePercentiles(x):
    # mlab.prctile returns the 0th, 25th, 50th, 75th and 100th percentiles by default
    mlp = mlab.prctile(x)

    # collect the same five percentiles from scipy
    stp = np.array([])
    for p in (0.0, 25.0, 50.0, 75.0, 100.0):
        stp = np.hstack([stp, stats.scoreatpercentile(x, p)])

    outstring = """
mlab \t scipy
-------------
%0.3f \t %0.3f (0th)
%0.3f \t %0.3f (25th)
%0.3f \t %0.3f (50th)
%0.3f \t %0.3f (75th)
%0.3f \t %0.3f (100th)
""" % (mlp[0], stp[0], mlp[1], stp[1], mlp[2], stp[2],
       mlp[3], stp[3], mlp[4], stp[4])

    print(outstring)

# x is the 1-D data sample being compared (the data themselves aren't included here)
comparePercentiles(x)

 mlab     scipy
----------------------
-1.245   -1.245  (0th)
-0.950   -0.802  (25th)
-0.162   -0.162  (50th)
 0.571    0.266  (75th)
 1.067    1.067  (100th)

Copying and pasting the exact same data into R, I get:

quantile(x, probs=c(0.0, 0.25, 0.50, 0.75, 1.0))

        0%        25%        50%        75%       100%
-1.2448508 -0.8022337 -0.1617812  0.2661112  1.0666244
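For reference, R's quantile() defaults to type = 7, which interpolates linearly between order statistics. For a sorted sample x_(1) <= x_(2) <= ... <= x_(n) and a probability p in [0, 1] (so the 25th percentile is p = 0.25):

```latex
h = (n - 1)\,p + 1, \qquad
Q(p) = x_{(\lfloor h \rfloor)} + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)
```

When h is an integer the second term vanishes, so p = 0 and p = 1 return the minimum and maximum; that is why all three columns agree at the 0th and 100th percentiles and differ only in between.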

It seems clear that something needs to be done. AFAICT, scipy is not listed as a dependency of matplotlib, so it'll probably be easiest to retool mlab.prctile so that it returns values that agree with scipy and R. What do you think? Would this be a welcome contribution?
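A NumPy-only retooling could look roughly like the sketch below. `prctile_linear` is a hypothetical name used for illustration, not the existing mlab.prctile; it implements the linear interpolation between order statistics that R uses by default (type 7), which is also what np.percentile's default method and scipy.stats.scoreatpercentile's default interpolation compute.

```python
import numpy as np

def prctile_linear(x, p=(0.0, 25.0, 50.0, 75.0, 100.0)):
    """Hypothetical replacement: percentiles via linear interpolation (R type 7)."""
    x = np.sort(np.asarray(x, dtype=float).ravel())
    p = np.asarray(p, dtype=float)
    n = x.size
    # fractional 0-based position of each requested percentile in the sorted data
    h = (n - 1) * p / 100.0
    lo = np.floor(h).astype(int)
    # clamp so the 100th percentile doesn't index past the end
    hi = np.minimum(lo + 1, n - 1)
    frac = h - lo
    # interpolate between the two neighbouring order statistics
    return x[lo] + frac * (x[hi] - x[lo])
```

For example, `prctile_linear([3, 1, 4, 1, 5, 9, 2, 6])` gives 1.0, 1.75, 3.5, 5.25 and 9.0, matching `np.percentile` with its default settings.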

Thanks,

-Paul Hobson