of salt... I'd write that code as

notch_max = med + (iq/2) * (np.pi/np.sqrt(row))

and it makes more sense. The notch limits are an estimate of the confidence interval of the median: its half-width, applied once up and once down from the median, is half of the q3-q1 range times a normalization factor of pi/sqrt(n), where n == row == len(d). The 1/sqrt(n) makes some sense, as it's the usual statistical error normalization factor. The multiplication by pi I'm not so sure about, and I can't find that exact formula in any quick stats reference, but I'm sure someone who actually knows stats can point out where it comes from.
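To make the two readings of the constant concrete, here is a minimal sketch (not the actual boxplot code) that computes the notch limits for a 1-D sample. The function name `notch_limits` and its `const` parameter are hypothetical, and the quartiles come from `np.percentile` with its default linear interpolation, which may not match how the original code computes q1/q3. Passing `const=np.pi / 2` gives the `(iq/2) * (pi/np.sqrt(row))` form, so the two versions differ only by the gap between 1.57 and pi/2.

```python
import numpy as np

def notch_limits(d, const=1.57):
    """Hypothetical helper: median notch limits med -/+ const * IQR / sqrt(n).

    const=1.57 mirrors the boxplot code quoted in this thread;
    const=np.pi / 2 gives the (iq/2) * (pi/sqrt(n)) reading.
    """
    d = np.asarray(d, dtype=float)
    row = len(d)  # sample size n
    # Quartiles via np.percentile (linear interpolation by default); this is
    # an assumption, the original code may define quartiles differently.
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    iq = q3 - q1  # interquartile range
    half_width = const * iq / np.sqrt(row)  # shrinks like 1/sqrt(n)
    return med - half_width, med + half_width

data = np.arange(1, 101)
lo, hi = notch_limits(data)
lo_pi, hi_pi = notch_limits(data, const=np.pi / 2)
print(lo, hi)
print(hi_pi - hi)  # small: the constants differ by pi/2 - 1.57, about 8e-4
```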

Note that the code below does:

if notch_max > q3:
    notch_max = q3
if notch_min < q1:
    notch_min = q1

even though the Matlab boxplot documentation at

http://www.mathworks.com/access/helpdesk/help/toolbox/stats/boxplot.html

explicitly states that

"""

Interval endpoints are the extremes of the notches or the centers of the triangular markers. When the sample size is small, notches may extend beyond the end of the box.

"""

So it seems to me that the more principled thing to do would be to leave those notch markers outside the box if they land there, because that's a warning about the robustness of the estimate. Clipping them to q1/q3 effectively hides a problem...
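A minimal sketch of the unclipped alternative: compute the same limits, leave them where they land, and report whether they fall outside the box instead of hiding it. `notch_limits_unclipped` and its `outside_box` flag are hypothetical names, not matplotlib API, and the quartiles again come from `np.percentile`, which is an assumption about how q1/q3 are computed.

```python
import numpy as np

def notch_limits_unclipped(d):
    """Hypothetical helper: notch limits med -/+ 1.57 * IQR / sqrt(n), not clipped.

    Returns (notch_min, notch_max, outside_box); outside_box is True when
    either limit falls beyond [q1, q3], i.e. the notch sticks out of the box.
    """
    d = np.asarray(d, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])  # quartile method is an assumption
    half_width = 1.57 * (q3 - q1) / np.sqrt(len(d))
    notch_min, notch_max = med - half_width, med + half_width
    # Report the problem instead of clamping the limits to q1/q3.
    outside_box = bool(notch_min < q1 or notch_max > q3)
    return notch_min, notch_max, outside_box

small = [1, 2, 3, 4, 5]
print(notch_limits_unclipped(small))               # notch extends past the box
print(notch_limits_unclipped(np.arange(1, 101)))   # notch fits inside the box
```

With five points the notch reaches past q1 = 2 and q3 = 4, which is the small-sample case the Matlab note describes; the clipping code would instead draw it flush with the box edges.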

cheers,

f


On Tue, Dec 15, 2009 at 9:57 AM, Andrew Straw <strawman@...36...> wrote:

notch_max = med + 1.57*iq/np.sqrt(row)
notch_min = med - 1.57*iq/np.sqrt(row)

Is this code actually calculating a meaningful value? If so, what?

From the statistics ignoramus in the room, so take this with a grain