One more question regarding boxplotting

Hello,

I construct my boxplots (shown in this figure: http://img204.imageshack.us/img204/7518/boxplot2.png) explicitly from the 5th, 25th, 50th, 75th, and 95th percentiles of my data. For some reason, boxplots 3 and 5 in the figure show fliers instead of whiskers on their lower parts.

Do you have any idea what could be the reason for this behaviour?

Gökhan

Why don't you plot a histogram of the data that produced that boxplot? Seeing the shape of that histogram may answer your own question. Is the distribution skewed or normal?
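Stephen's suggestion can be sketched in NumPy. The helper `describe_shape` below is hypothetical (not part of matplotlib): it returns histogram counts plus a moment-based skewness estimate. Values near 0 suggest a roughly symmetric distribution, and clearly negative values suggest a long lower tail. The samples are synthetic stand-ins, not the flight data from this thread.

```python
import numpy as np


def describe_shape(data, bins=10):
    """Hypothetical helper: histogram counts, bin edges and sample skewness."""
    data = np.asarray(data, dtype=float)
    counts, edges = np.histogram(data, bins=bins)
    centered = data - data.mean()
    # Moment-based skewness: mean of cubed deviations divided by the cubed
    # (population) standard deviation; close to 0 for symmetric data.
    skew = np.mean(centered**3) / np.std(data) ** 3
    return counts, edges, skew


rng = np.random.default_rng(0)
symmetric = rng.normal(loc=100.0, scale=1.0, size=1000)
left_skewed = 100.0 - rng.exponential(scale=1.0, size=1000)  # long lower tail

for name, sample in (("symmetric", symmetric), ("left-skewed", left_skewed)):
    counts, edges, skew = describe_shape(sample)
    print(f"{name}: skewness = {skew:+.2f}")
    print("  counts per bin:", counts)
```

With matplotlib, `plt.hist(sample, bins=10)` draws the same counts as a picture.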

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

I can't easily judge the distribution of the data points from histograms, since I am calling boxplot as follows:

In [32]: d[2][8:]
Out[32]: array([98.2507, 99.6293, 100.0359, 100.1859, 100.4691])

Here the elements of my array are the 5th, 25th, 50th, 75th, and 95th percentiles of the original data array, computed with a simple calculation before the boxplot command is called.

For example, when I call boxplot([5,25,50,75,95]) I get exactly the desired effect. For some reason, in my case 2 out of 12 boxplots have fliers drawn instead of whiskers. Might this be related to rounding these numbers?

Thanks.

Gökhan
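Because `boxplot` always treats its input as raw observations, recent matplotlib versions (not the 2009 release used in this thread) offer `Axes.bxp`, which draws boxes directly from precomputed statistics. Below is a minimal sketch, assuming a recent matplotlib and using the values of `d[2][8:]` shown above. Mapping the 5th and 95th percentiles to the whisker ends is a choice made here; `bxp` simply draws whatever it is given.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Precomputed 5th, 25th, 50th, 75th and 95th percentiles for one box
# (the values of d[2][8:] shown above).
p5, p25, p50, p75, p95 = 98.2507, 99.6293, 100.0359, 100.1859, 100.4691

# Axes.bxp takes one dict of summary statistics per box and draws it as-is:
# no percentiles and no 1.5 * IQR whisker rule are recomputed.
stats = [
    {
        "whislo": p5,  # lower whisker end, chosen here as the 5th percentile
        "q1": p25,
        "med": p50,
        "q3": p75,
        "whishi": p95,  # upper whisker end, chosen here as the 95th percentile
        "fliers": [],  # no individual outliers to draw
    }
]

fig, ax = plt.subplots()
artists = ax.bxp(stats)
```

For several boxes, pass one dict per box (for example, one per row `d[i][8:]`) and save the result with `fig.savefig(...)`.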

On Tue, May 12, 2009 at 5:32 PM, Stephen George <steve_geo@…887…> wrote:

Gökhan SEVER-2 wrote:

For some reason on boxplot 3 and 5 on the figure I get fliers instead of
whiskers on the lower parts.

When I look closely at your graphic, it looks to me like the lower whiskers
are in fact being plotted, but (essentially) overlaid on the lower-quartile
edge of the interquartile box. What do you see if you plot only d[2][8:],
with no other boxes? Showing only one of the problem boxes may give the
Y axis enough resolution to make this effect easier to see...
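Josh's check can also be done without eyeballing the plot. `boxplot` returns a dictionary of the line artists it drew, so the whisker coordinates can be printed. This sketch assumes a recent matplotlib, where each box contributes two whisker lines (lower first) and one flier line. It uses the five values of `d[2][8:]` from earlier in the thread.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

# Only the problem box: the five values of d[2][8:]
values = np.array([98.2507, 99.6293, 100.0359, 100.1859, 100.4691])

fig, ax = plt.subplots()
bp = ax.boxplot(values)  # default whisker rule (whis=1.5)

lower_whisker, upper_whisker = bp["whiskers"]
# Each whisker is a vertical line from the box edge to the whisker end.
print("lower whisker y:", lower_whisker.get_ydata())
print("upper whisker y:", upper_whisker.get_ydata())
print("fliers y:", bp["fliers"][0].get_ydata())
```

For these values, both y coordinates of the lower whisker equal 99.6293. The whisker therefore has zero length and hides under the box edge, while 98.2507 is drawn as a flier.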


--
View this message in context: http://www.nabble.com/One-more-question-regarding-to-boxplotting-tp23508395p23514606.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

I zoomed into the plot to see whether the whiskers are actually being plotted. There does seem to be a vertical line drawn over the lower part of the box, but not in the right place.

I am attaching the simple text file that holds the percentile values. I run ipython --pylab and do the following for a simple test:

d = loadtxt('tas', skiprows=2)

# one box, for a quick test
boxplot(d[0][8:])

# all boxplots in one plot
boxplot([d[i][8:] for i in range(12)])

Gökhan

tas (2.14 KB)

On Tue, May 12, 2009 at 10:05 PM, Josh Hemann <jhemann@…120…1899…> wrote:

Thanks for sending the data and code. After playing around with it I still don't
have a confident guess as to the problem (or solution), but here is what I
would look into further...

I issued plot(d[i][8:]) for i = 0, 1, ..., 11 and looked at the shapes of the
lines. For the two problem boxes, the plotted values jump steeply between the
5th and 25th percentiles compared with the data of the "good" boxes. So the
values you calculated as the 5th and 25th percentiles are not necessarily
treated as such by boxplot, because boxplot does not know that you are
handing it percentiles of your underlying data: it computes the percentiles
assuming that the input _is_ the raw data. I would guess that if you gave
boxplot the raw data you would not see this issue of missing whiskers.
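Josh's explanation can be checked by hand. The sketch below applies the default rule documented for matplotlib's `boxplot` to the five numbers of `d[2][8:]`, treating them as raw data the way `boxplot` does. Under that rule, the box spans the 25th to 75th percentile, each whisker reaches the most extreme data point within 1.5 times the interquartile range of the box, and anything beyond is a flier. The sketch uses NumPy's default linear-interpolation percentiles, as current matplotlib does; the 2009 code used `mlab.prctile`, which may differ in detail.

```python
import numpy as np

# The five summary values for box 3 (d[2][8:]); boxplot sees them as five
# raw observations, not as the 5th..95th percentiles of a larger sample.
values = np.array([98.2507, 99.6293, 100.0359, 100.1859, 100.4691])

# With n = 5 and linear interpolation, the 25th/50th/75th percentiles fall
# exactly on the 2nd, 3rd and 4th sorted values.
q1, med, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

whis = 1.5  # default whisker length, in units of the IQR
low_fence = q1 - whis * iqr
high_fence = q3 + whis * iqr

# Whiskers stop at the most extreme observations inside the fences;
# anything outside is drawn as a flier.
inside = values[(values >= low_fence) & (values <= high_fence)]
whislo, whishi = inside.min(), inside.max()
fliers = values[(values < low_fence) | (values > high_fence)]

print(f"q1 = {q1}, q3 = {q3}, IQR = {iqr:.4f}")
print(f"fences = ({low_fence:.4f}, {high_fence:.4f})")
print(f"whiskers end at {whislo} and {whishi}; fliers: {fliers}")
```

Here the lower fence is about 98.79. The intended 5th percentile, 98.2507, falls outside it and becomes a flier. The lowest value inside the fence is q1 itself, so the lower whisker has zero length. The same arithmetic on [5, 25, 50, 75, 95] gives an IQR of 50 and fences at -50 and 150, so 5 and 95 both stay inside, which is why that toy example looked right.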


--
View this message in context: http://www.nabble.com/One-more-question-regarding-to-boxplotting-tp23508395p23526653.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Thank you for the response once again.

That's why I actually suspect the raw data. At the problem points there might be excluded or missing values that do not occur in the normal plots.

I will find the original data and feed it to boxplot to see how it affects the final result.

Gökhan

On Wed, May 13, 2009 at 12:58 PM, Josh Hemann <jhemann@…120…1899…> wrote:

Ok,

With this figure it is easier to see what's wrong with two of my boxplots. I pulled the original data and fed it to boxplot.

The first boxplot uses only the precomputed percentiles; the second uses the actual data array.

http://img140.imageshack.us/img140/4705/boxplots.png

To me the second boxplot seems more suitable for an academic paper. What do you think? These boxplots show only the variation in true air speed over a short leg of a research flight.

Is there a better representation to use in addition to, or as an alternative to, boxplots?

Gökhan

On Wed, May 13, 2009 at 1:41 PM, Gökhan SEVER <gokhansever@…287…> wrote:

One more point to add.

I made one more boxplot, passing prctile(data) (an mlab function that boxplot calls internally to calculate percentiles) as its argument.

Guess what?

I get almost the same plot as I had initially :) without a lower whisker.

I don't know whether I am confusing myself or whether it is the data...

Gökhan
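The `prctile` experiment fits the same picture. By default `mlab.prctile` returned the 0th, 25th, 50th, 75th and 100th percentiles (the function is no longer part of recent matplotlib), so `boxplot(prctile(data))` again hands `boxplot` only five numbers. The sketch below uses `np.percentile` as a stand-in for `prctile`. It also uses `matplotlib.cbook.boxplot_stats`, the statistics helper behind `boxplot` in recent versions, on a small hypothetical sample with one low outlier. The box and the fences come out identical. Only the raw data, however, contain an observation between the outlier and the first quartile for the lower whisker to reach.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Hypothetical raw sample with one low outlier (not the flight data).
raw = np.array([80.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0])

# Five-number summary, standing in for prctile(raw) with its default
# percentiles (0, 25, 50, 75, 100).
summary = np.percentile(raw, [0, 25, 50, 75, 100])

raw_stats = boxplot_stats(raw)[0]
summary_stats = boxplot_stats(summary)[0]

for name, s in (("raw data", raw_stats), ("five-number summary", summary_stats)):
    print(
        f"{name}: q1={s['q1']}, q3={s['q3']}, "
        f"lower whisker={s['whislo']}, fliers={s['fliers']}"
    )
```

In both cases the quartiles are 98 and 102, and 80 lies below the lower fence at 92, so it is a flier. With the raw data the lower whisker reaches 97, the smallest observation inside the fence. With the summary, the smallest value inside the fence is the first quartile itself, so the whisker collapses onto the box.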

On Wed, May 13, 2009 at 7:56 PM, Gökhan SEVER <gokhansever@…287…> wrote:

Hello,

I have finally solved this riddle by reading the source code of boxplot in the axes.py file. Whisker plotting is indeed done differently than I expected. When I set the "whis" keyword to 3.0, the lower whisker is plotted in the right spot. And Josh, you were right: it did plot the lower whisker, as seen in my very first uploaded image.

One question still remains: how do you describe box-and-whisker plots in your writing when using matplotlib's boxplot command? It certainly uses the 25th, 50th, and 75th percentiles of the data, but, contrary to what I expected, the whiskers are not at the 5th and 95th percentiles.

Could someone please comment on this?

Gökhan
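For writing, the default `boxplot` can be described as a Tukey-style box plot. The box spans the 25th to 75th percentiles with a line at the median. Each whisker extends to the most extreme data point within `whis` times the interquartile range (default 1.5) of the box, and points beyond are drawn as fliers. That is also why `whis=3.0` helped: for box 3 the lower fence moves to about 99.6293 - 3.0 * 0.5566 ≈ 97.96, which admits 98.2507. If whiskers at the 5th and 95th percentiles are wanted instead, recent matplotlib versions (not the 2009 one) accept a pair of percentiles for `whis`. Below is a sketch on a synthetic stand-in sample. Note that each whisker ends at the most extreme observation inside the percentile range, not at the interpolated percentile value itself.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the raw true-air-speed samples of one flight leg.
rng = np.random.default_rng(42)
raw = rng.normal(loc=100.0, scale=0.5, size=500)

fig, ax = plt.subplots()
# whis=(5, 95): whiskers at the 5th/95th percentiles of the raw data instead
# of the 1.5 * IQR rule; points outside that range are drawn as fliers.
bp = ax.boxplot(raw, whis=(5, 95))

p5, p95 = np.percentile(raw, [5, 95])
whislo = bp["whiskers"][0].get_ydata()[1]  # far end of the lower whisker
whishi = bp["whiskers"][1].get_ydata()[1]  # far end of the upper whisker
n_fliers = len(bp["fliers"][0].get_ydata())

print(f"5th percentile {p5:.4f}, lower whisker ends at {whislo:.4f}")
print(f"95th percentile {p95:.4f}, upper whisker ends at {whishi:.4f}")
print(f"{n_fliers} of {raw.size} points drawn as fliers")
```

Roughly 5% of the points fall in each tail, so about a tenth of the sample is drawn as fliers with this setting.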

On Wed, May 13, 2009 at 8:43 PM, Gökhan SEVER <gokhansever@…287…> wrote: