incorrect boxplot?

system · September 14, 2009, 5:30pm

I tried the following (most output text is deleted):

In [1]: ob1=[1,1,2,2,1,2,4,3,2,2,2,3,4,5,6,7,8,9,7,6,4,5,5]
In [2]: import matplotlib.pyplot as plt
In [3]: plt.figure()
In [4]: plt.boxplot(ob1)
In [5]: plt.savefig('test.png')
In [6]: import scipy.stats
In [7]: scipy.stats.scoreatpercentile(ob1,75)
Out[7]: 5.5

Note that the 75th percentile is 5.5. R agrees with this calculation. However, in the boxplot, the top of the box is around 6, not 5.5. Isn't the top of the box supposed to be at the 75th percentile?

Thanks,

Jason

--
Jason Grout

Gokhan_SEVER1 · September 14, 2009, 6:49pm

In matplotlib/lib/matplotlib/axes.py you can see how matplotlib calculates the percentiles for the box. And yes, the result doesn't agree with scipy's scoreatpercentile():

        # get median and quartiles

        q1, med, q3 = mlab.prctile(d,[25,50,75])

I[36]: q1
O[36]: 2.0

I[37]: med
O[37]: 4.0

I[38]: q3
O[38]: 6.0

Could this be due to rounding? I don't know, but I am curious to hear an explanation for this discrepancy.

--
Gökhan

Robert_Kern3 · September 14, 2009, 7:07pm

prctile does not handle the case where the exact percentile lies between two items. scoreatpercentile does.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco
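
To make the point about prctile concrete, here is a minimal sketch of the two behaviours. It assumes the old prctile simply truncates the fractional index and returns a single sorted item; that rule is a guess that happens to reproduce the 2.0, 4.0, 6.0 shown above, not a copy of the mlab source. Both function names are made up for illustration.

```python
import math

ob1 = [1, 1, 2, 2, 1, 2, 4, 3, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 7, 6, 4, 5, 5]


def percentile_truncated(data, p):
    """Return one sorted item at index int(n * p / 100); no interpolation.

    Assumed rule, chosen because it matches the prctile output in this thread.
    """
    xs = sorted(data)
    idx = min(int(len(xs) * p / 100), len(xs) - 1)
    return float(xs[idx])


def percentile_linear(data, p):
    """Interpolate linearly at the fractional index (n - 1) * p / 100."""
    xs = sorted(data)
    pos = (len(xs) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


print([percentile_truncated(ob1, p) for p in (25, 50, 75)])  # [2.0, 4.0, 6.0]
print(percentile_linear(ob1, 75))                            # 5.5
```

With 23 items the 75th percentile falls at fractional index 16.5, between a 5 and a 6. Interpolating gives 5.5 (what scipy and R report), while picking a single item gives 6.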

system · September 14, 2009, 8:45pm

Robert Kern wrote:

prctile does not handle the case where the exact percentile lies between two items. scoreatpercentile does.

If mlab is supposed to be compatible with MATLAB, then isn't this a problem?

From MATLAB, version 7.2.0.283 (R2006a):

>> prctile([1 1 2 2 1 2 4 3 2 2 2 3 4 5 6 7 8 9 7 6 4 5 5],[0 25 50 75 100])

ans =

1.0000 2.0000 4.0000 5.7500 9.0000

Of course, the 75th percentile is different here too (5.75 instead of scipy's 5.5). I don't know how to explain that discrepancy.

Jason

--
Jason Grout
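
One possible explanation for MATLAB's 5.75, as a sketch: assume the k-th sorted sample (counting from 1) is placed at the 100*(k - 0.5)/n percentile, interpolate linearly between neighbouring samples, and clamp to the minimum and maximum outside that range. That placement rule is an assumption, not something stated in this thread, and the function name is made up for illustration.

```python
ob1 = [1, 1, 2, 2, 1, 2, 4, 3, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 7, 6, 4, 5, 5]


def percentile_midpoint_positions(data, p):
    """Percentile with sample k (1-based) assumed to sit at 100 * (k - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    # Zero-based fractional index implied by the placement rule above.
    pos = n * p / 100 - 0.5
    # Clamp so that p near 0 or 100 returns the smallest or largest item.
    pos = max(0.0, min(pos, n - 1))
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


print([percentile_midpoint_positions(ob1, p) for p in (0, 25, 50, 75, 100)])
# [1.0, 2.0, 4.0, 5.75, 9.0]
```

Under this rule the 75th percentile lands at fractional index 16.75 instead of 16.5, which is why the result is 5.75 rather than scipy's 5.5.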

Gokhan_SEVER1 · September 14, 2009, 9:08pm

Now there are three different 75th percentiles :). Any ideas which one is the most correct?

I have used matplotlib's percentile outputs in some of my abstracts and posters, though not yet in a paper. The differences among them are not big, but it still makes me wonder whether I should compare results from similar functions in other programs when I do data analysis.

--
Gökhan

Robert_Kern3 · September 14, 2009, 9:17pm

They are all reasonable. There are lots of different ways of handling this case. From the R documentation:

http://sekhon.berkeley.edu/stats/html/quantile.html
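
As a small illustration that several conventions coexist even inside one library, Python's standard statistics module (3.8 and later, so not something available when this thread was written) offers two methods for quartiles. As far as I can tell these correspond to R's types 7 and 6, but treat that mapping as an assumption.

```python
import statistics

ob1 = [1, 1, 2, 2, 1, 2, 4, 3, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 7, 6, 4, 5, 5]

# "inclusive" treats the data as covering the whole population range.
inclusive = statistics.quantiles(ob1, n=4, method="inclusive")

# "exclusive" assumes the population may extend beyond the sample extremes.
exclusive = statistics.quantiles(ob1, n=4, method="exclusive")

print(inclusive)  # [2.0, 4.0, 5.5]
print(exclusive)  # [2.0, 4.0, 6.0]
```

Both answers come from a documented, defensible definition, so the choice mostly matters when you compare numbers across programs.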


Andrew_Straw4 · December 21, 2009, 12:47am

Robert Kern wrote:

prctile does not handle the case where the exact percentile lies between two
items. scoreatpercentile does.

Fixed in r8039.