of salt... I'd write that code as
notch_max = med + (iq/2) * (pi/np.sqrt(row))
and it makes more sense. The notch limits are an estimate of the
confidence interval of the median: half of the q3-q1 range (one half
each for up and down) times a normalization factor pi/sqrt(n), where
n == row == len(d). The 1/sqrt(n) makes some sense, as it's the usual
statistical error normalization factor. The multiplication by pi I'm
not so sure about, and I can't find that exact formula in any quick
stats reference, but I'm sure someone who actually knows stats can
point out where it comes from.
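For concreteness, here is a minimal sketch of that formula as a standalone function. The name notch_limits and the use of np.percentile for the quartiles are my assumptions for illustration, not the actual boxplot code, which may compute its quartiles differently.

```python
import numpy as np

def notch_limits(d):
    """Return (notch_min, notch_max) around the median of d.

    Hypothetical helper: quartiles come from np.percentile, which is an
    assumption and may not match how the plotting code computes them.
    """
    d = np.asarray(d, dtype=float)
    n = d.size                      # called `row` in the quoted code
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    iq = q3 - q1                    # interquartile range
    # Half the IQR in each direction, scaled by pi/sqrt(n).
    half_width = (iq / 2) * (np.pi / np.sqrt(n))
    return med - half_width, med + half_width
```

Since pi/2 is about 1.5708, this agrees with the 1.57*iq/np.sqrt(row) form to within rounding of the constant.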
Note that the code below does:
if notch_max > q3:
    notch_max = q3
if notch_min < q1:
    notch_min = q1
though matlab explicitly states in:
Interval endpoints are the extremes of the notches or the centers of
the triangular markers. When the sample size is small, notches may
extend beyond the end of the box.
So it seems to me that the more principled thing to do would be to
leave those notch markers outside the box if they land there, because
that's a warning about the robustness of the estimate. Clipping them
to q1/q3 effectively hides a problem...
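To illustrate what clipping hides, here is a small sketch comparing clipped and unclipped notches on a tiny sample. The function box_and_notches and its clip flag are hypothetical names of mine, and the quartiles again come from np.percentile as an assumption.

```python
import numpy as np

def box_and_notches(d, clip=False):
    """Return (q1, q3, notch_min, notch_max) for the data d.

    Hypothetical helper for illustration only. With clip=True the
    notches are limited to the box, as in the code under discussion.
    """
    d = np.asarray(d, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    half_width = (q3 - q1) / 2 * np.pi / np.sqrt(d.size)
    notch_min, notch_max = med - half_width, med + half_width
    if clip:
        notch_min = max(notch_min, q1)
        notch_max = min(notch_max, q3)
    return q1, q3, notch_min, notch_max

small = [1.0, 2.0, 3.0, 4.0, 5.0]

# Unclipped: the notches land outside the box for this 5-point sample.
print(box_and_notches(small))
# Clipped: the notches coincide with q1 and q3, so the warning is lost.
print(box_and_notches(small, clip=True))
```

When the quartiles are symmetric about the median, the unclipped notch leaves the box whenever pi/sqrt(n) > 1, that is, for n below roughly 10.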
On Tue, Dec 15, 2009 at 9:57 AM, Andrew Straw <strawman@...36...> wrote:
notch_max = med + 1.57*iq/np.sqrt(row)
notch_min = med - 1.57*iq/np.sqrt(row)
Is this code actually calculating a meaningful value? If so, what?
From the statistics ignoramus in the room, so take this with a grain