Boxplots with Bootstrapped Intervals

Hey folks,

I recently modified the Axes method boxplot so that the confidence intervals around the median are computed not with a static formula, but by bootstrapping the median as many times as the user specifies. Also, I commented out the lines that prevent the boxplots from folding around the hinges (but that's obviously minor, and already in the current SVN if I'm not mistaken).

Is this something that would be worth including in matplotlib? I've never contributed to a project like this before and my code is probably pretty sloppy by MPL standards. I'm not really sure what's appropriate to contribute and what's not.

Regards,
-paul h.

Hi Paul,

This sounds interesting.

I think the best thing to do is to post the patch so that it can be
reviewed. Sending the output of "svn diff" as an attachment to this
email list would be easy from our end. (A GitHub-based submission --
fork the repo and push your commits -- would also work well for me, but
I'm not sure about the other MPL devs.)

-Andrew

Andrew,

Thanks for the reply. At the risk of embarrassment, I'm going to admit that I'm not at all familiar with SVN, other than knowing that it's version control software. Nonetheless, I gave it a shot.

I should add that the patch doesn't yet return the CIs along with the other boxplot properties, in case the user wants them. That shouldn't be too hard to add, though. Also, I'm using the percentile method: after I get my "normal" distribution of bootstrapped medians, I simply use mlab's percentile function to get the 2.5th and 97.5th percentiles of that distribution. The other method (bias-corrected and accelerated) was too complex for me to code up quickly without using Rpy2, and that just seemed silly.

Thanks again,
-paul

boxplot.patch (2.6 KB)
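
The attached patch is not reproduced in this thread. As a rough illustration of the percentile method described above, here is a minimal sketch in plain Python. The function name `bootstrap_median_ci` and its parameters are hypothetical, and it uses a simple index lookup on the sorted medians in place of mlab's percentile function, so it should not be read as the code in the patch.

```python
import random
import statistics


def bootstrap_median_ci(data, n_boot=1000, alpha=0.05, seed=None):
    """Percentile-method confidence interval for the median (illustrative).

    Hypothetical helper, not the code from boxplot.patch.
    """
    rng = random.Random(seed)
    n = len(data)

    # Resample the data with replacement n_boot times and record the
    # median of each resample.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )

    # Read the lower and upper percentiles straight off the sorted
    # medians (simple index lookup, no interpolation).
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    return medians[lo_idx], medians[hi_idx]
```

With the default `alpha=0.05` this returns approximately the 2.5th and 97.5th percentiles of the bootstrapped medians, for example `low, high = bootstrap_median_ci(samples, n_boot=5000)`.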

-----Original Message-----
From: Andrew Straw [mailto:strawman@…36…]
Sent: Wednesday, February 10, 2010 2:20 PM
To: Paul Hobson
Cc: matplotlib-devel@lists.sourceforge.net
Subject: Re: [matplotlib-devel] Boxplots with Bootstrapped Intervals
...
I think the best thing to do is to post the patch so that it can be
reviewed. Sending the output of "svn diff" as an attachment to this
email list would be easy from our end. (A github based submission --
fork the repo and push your commits -- would also work well for me, but
I'm not sure about the other MPL devs.)

Hi Paul,

I committed a modified version of your code in r8127. This new code is
backwards compatible in the sense that it doesn't change anything for
existing uses of boxplot, but allows use of the bootstrapped approach by
specifying "notch=1" and "bootstrap=N" where N is the number of
resampling steps.

Thanks,
Andrew
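
For readers who want to try the option described above, here is a small usage sketch. It assumes a matplotlib release that includes the `bootstrap` keyword, uses `notch=True` in place of `notch=1`, and picks the sample data, the resample count of 5000, and the variable names purely for illustration.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

# Illustrative data: 200 draws from a standard normal distribution.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

fig, (ax_default, ax_boot) = plt.subplots(1, 2, sharey=True)

# Existing behaviour: notches computed from the static formula.
default_result = ax_default.boxplot(data, notch=True)
ax_default.set_title("notch=True")

# Bootstrapped notches: the median is resampled 5000 times.
boot_result = ax_boot.boxplot(data, notch=True, bootstrap=5000)
ax_boot.set_title("notch=True, bootstrap=5000")
```

Adding a call such as `fig.savefig("notches.png")` writes both panels to a file for a side-by-side comparison of the notches.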


PHobson@...814... wrote:
-----Original Message-----