boxplot behaviour in an extreme scenario

Hi,

the outliers in the boxplot do not seem to be drawn in the following extreme
scenario:
Data Value: 1, Frequency: 5
Data Value: 2, Frequency: 100
Data Value: 3, Frequency: 5

Here, Q1 = Q2 = Q3, so IQR = 0.
Data values 1 and 3 are therefore outliers according to the definition in
the API (see the "whis" parameter under "boxplot":
http://matplotlib.org/api/pyplot_api.html ).
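
The quartile claim can be checked without matplotlib. The snippet below is only a minimal sketch using the standard library: it assumes quartiles computed by linear interpolation (statistics.quantiles with method="inclusive") and the conventional 1.5 * IQR fences, which may not match matplotlib's own computation in every detail.

```python
from statistics import quantiles

data = 100 * [2] + 5 * [1] + 5 * [3]

# Quartiles by linear interpolation between order statistics
# (an assumption; other quartile conventions exist).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Conventional fences at 1.5 * IQR beyond the quartiles.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Points strictly outside the fences are outliers under this definition.
outliers = [x for x in data if x < low_fence or x > high_fence]
print(q1, q2, q3, iqr)  # 2.0 2.0 2.0 0.0
print(len(outliers))    # 10
```

With IQR = 0 both fences collapse onto the median, so every 1 and every 3 falls outside them.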

But the code below produces a boxplot that shows them as max-min whiskers
(rather than fliers):

import matplotlib.pyplot as plt

# 110 points: 100 twos, 5 ones, 5 threes
data = 100 * [2] + 5 * [1] + 5 * [3]
ax = plt.gca()
bp = ax.boxplot(data, showfliers=True)
# Style any outlier markers so they stand out
for flier in bp['fliers']:
    flier.set(marker='o', color='gray')
plt.show()

<http://matplotlib.1069221.n5.nabble.com/file/n46027/figure_1.png>

What I thought it would look like is obtained by perturbing half of the data
points equal to 2 to 2.000001:

<http://matplotlib.1069221.n5.nabble.com/file/n46027/figure_2.png>

Is this a bug, or am I not getting something right?

rgds
marcus

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/boxplot-behaviour-in-an-extreme-scenario-tp46027.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Are you running Python 2 or Python 3? If you're on Python 2, what happens if you add "from __future__ import division" to the top of your script?

On Tue, Aug 25, 2015 at 10:31 PM, chtan <chtan@…4693…> wrote:

[snip]


Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Your perturbed and unperturbed scenarios draw the same figure on my machine (mpl v1.4.1).

The reason why you don’t get any outliers is the following:
Boxplot uses matplotlib.cbook.boxplot_stats under the hood to compute where everything will be drawn. If you look in there, you’ll see this little nugget:

    # interquartile range
    stats['iqr'] = q3 - q1
    if stats['iqr'] == 0:
        whis = 'range'
···

When whis = 'range', the whiskers fall back to extending to the min and max. So that is at least the intent of the code. I'm open to a different interpretation of what should be happening, though.
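
To make the effect of that fallback concrete, here is a simplified pure-Python sketch of the whisker logic as described in this thread. It is not matplotlib's code: whisker_stats is a hypothetical helper, the quartile method (statistics.quantiles with method="inclusive") is an assumption, and other matplotlib versions may handle IQR = 0 differently.

```python
from statistics import quantiles


def whisker_stats(data, whis=1.5):
    """Simplified sketch of the whisker logic described above (not matplotlib's code)."""
    xs = sorted(data)
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Fallback described in the thread: whiskers span the full data range.
        lo, hi = xs[0], xs[-1]
    else:
        # Whiskers reach the most extreme data points inside the fences.
        lo = min(x for x in xs if x >= q1 - whis * iqr)
        hi = max(x for x in xs if x <= q3 + whis * iqr)
    # Anything beyond the whiskers is drawn as a flier.
    fliers = [x for x in xs if x < lo or x > hi]
    return {"iqr": iqr, "whislo": lo, "whishi": hi, "fliers": fliers}


original = 100 * [2] + 5 * [1] + 5 * [3]
perturbed = 50 * [2] + 50 * [2.000001] + 5 * [1] + 5 * [3]

for name, data in [("original", original), ("perturbed", perturbed)]:
    s = whisker_stats(data)
    print(name, s["whislo"], s["whishi"], len(s["fliers"]))
```

Under these assumptions the original data gets whiskers at 1 and 3 with no fliers, while the perturbed data has a tiny but nonzero IQR, so the 1s and 3s land outside the fences and become fliers.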

On Wed, Aug 26, 2015 at 1:08 AM, Paul Hobson <pmhobson@…287…> wrote:

Are you running Python 2 or Python 3? If you're on Python 2, what happens if you add "from __future__ import division" to the top of your script?


I'm on python 2.

I get the same outputs after adding "from __future__ import division".

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/boxplot-behaviour-in-an-extreme-scenario-tp46027p46031.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Uh, now I understand why it's behaving this way. Tx Paul.

From the documentation, it seems natural to expect the behaviour to be
uniform throughout the meaningful range for IQR.

How may I go about searching for the responsible code on my own in
situations like this?

From the perplexing behaviour to the little nugget in
matplotlib.cbook.boxplot_stats, the path isn't clear to me.

Any general advice?

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/boxplot-behaviour-in-an-extreme-scenario-tp46027p46032.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Even though I’m familiar with the boxplot source code, I largely use IPython for quick investigations like this.

In IPython, doing something like "matplotlib.axes.Axes.boxplot??" shows the full source code for that function.

Then I saw/remembered that boxplot now just calls matplotlib.cbook.boxplot_stats and passes the results to matplotlib.axes.Axes.bxp.

So then I did "matplotlib.cbook.boxplot_stats??" to see how the whiskers were computed.
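
Outside IPython, roughly the same lookup can be done with the standard library's inspect module. The following is only a sketch: show_source is a hypothetical helper, and statistics.median is used as a stand-in target when matplotlib is not installed.

```python
import inspect
import statistics


def show_source(obj, max_lines=20):
    """Return the first max_lines lines of obj's Python source as one string."""
    lines = inspect.getsource(obj).splitlines()
    return "\n".join(lines[:max_lines])


try:
    # The function discussed in this thread, if matplotlib is available.
    from matplotlib import cbook
    target = cbook.boxplot_stats
except ImportError:
    # Stand-in so the sketch still runs without matplotlib.
    target = statistics.median

print(show_source(target))
```

inspect.getsource only works for objects defined in Python source files, so it will raise an error for functions implemented in compiled extensions.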

-paul

On Wed, Aug 26, 2015 at 8:43 PM, chtan <chtan@…4693…> wrote:

[snip]


Great, thanks!

Rgds
marcus

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/boxplot-behaviour-in-an-extreme-scenario-tp46027p46034.html
Sent from the matplotlib - users mailing list archive at Nabble.com.