boxplot -- how (more)

That should all be in the boxplot docstring. Do you use ipython? If
not, you should :slight_smile:

if so, just do `plt.boxplot?` at the ipython terminal and it'll show up.
-paul

···

On Tue, Aug 21, 2012 at 8:56 AM, Virgil Stokes <vs@...2650...> wrote:

On 21-Aug-2012 17:50, Paul Hobson wrote:

On Tue, Aug 21, 2012 at 7:58 AM, Virgil Stokes <vs@...2650...> wrote:

In reference to my previous email.

How can I find the outliers (sample points beyond the whiskers) in the data
used for the boxplot?

Here is a code snippet that shows how it was used for the timings data (a list
of 4 sublists (y1,y2,y3,y4), each containing 400,000 real data values):
    ...
    ...
    ...
    # Box Plots
    plt.subplot(2,1,2)
    timings = [y1,y2,y3,y4]
    pos = np.array(range(len(timings)))+1
    bp = plt.boxplot( timings, sym='k+', patch_artist=True,
                      positions=pos, notch=1, bootstrap=5000 )

    plt.xlabel('Algorithm')
    plt.ylabel('Execution time (sec)')
    plt.ylim(0.9*ymin,1.1*ymax)

    plt.setp(bp['whiskers'], color='k', linestyle='-' )
    plt.setp(bp['fliers'], markersize=3.0)
    plt.title('Box plots (%4d trials)' %(n))
    plt.show()
    ...
    ...
    ...

Again my questions:
1) How to get the value of the median?
2) How to find the outliers (outside the whiskers)?
3) How to find the width of the notch?

Virgil, the objects stuffed inside the `bp` dictionary should have
methods to retrieve their values. Let's see:

In [35]: x = np.random.lognormal(mean=1.25, sigma=1.35, size=(37,3))

In [36]: bp = plt.boxplot(x, bootstrap=5000, notch=True)

In [37]: # Question 1
    ...: print('medians')
    ...: for n, median in enumerate(bp['medians']):
    ...:     print('%d: %f' % (n, median.get_ydata()[0]))
    ...:
medians
0: 6.339692
1: 3.449320
2: 4.503706

In [38]: # Question 2
    ...: print('fliers')
    ...: for n in range(0, len(bp['fliers']), 2):
    ...:     print('%d: upper outliers = \t' % (n/2,))
    ...:     print(bp['fliers'][n].get_ydata())
    ...:     print('\n%d: lower outliers = \t' % (n/2,))
    ...:     print(bp['fliers'][n+1].get_ydata())
    ...:     print('\n')
    ...:

You had no outliers!

In [39]: # Question 3
    ...: print('Confidence Intervals')
    ...: for n, box in enumerate(bp['boxes']):
    ...:     print('%d: lower CI: %f' % (n, box.get_ydata()[2]))
    ...:     print('%d: upper CI: %f' % (n, box.get_ydata()[4]))
    ...:
Confidence Intervals
0: lower CI: 1.760701
0: upper CI: 10.102221
1: lower CI: 1.626386
1: upper CI: 5.601927
2: lower CI: 2.173173

Hope that helps,
-paul
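
On newer matplotlib releases (1.4 and later, as far as I know; not the 1.1.0 used in this thread), the same numbers can also be read without going through the artists, via `matplotlib.cbook.boxplot_stats`. In those releases `bp['fliers']` appears to hold one line per box rather than an upper/lower pair, so the step-2 loop above would need adjusting. A minimal sketch under those assumptions, with made-up data and the illustrative name `all_stats`:

```python
import numpy as np
from matplotlib import cbook

# Illustrative data only: 3 columns of 37 log-normal samples each.
rng = np.random.RandomState(0)
x = rng.lognormal(mean=1.25, sigma=1.35, size=(37, 3))

# One dict per column; no figure is needed to get the statistics.
all_stats = cbook.boxplot_stats(x)

for n, s in enumerate(all_stats):
    print('%d: median = %f' % (n, s['med']))
    print('%d: notch  = (%f, %f)' % (n, s['cilo'], s['cihi']))
    print('%d: outliers = %s' % (n, s['fliers']))
```

Passing `bootstrap=5000` to `boxplot_stats` should give bootstrapped notch limits instead of the default approximation.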

Just what I was looking for Paul! Thanks very much.

One final question: where can I find the documentation that answers my
questions and gives more details about the equations used for the width of
the notch, etc.?

Thanks again :slight_smile:
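
For readers who want the equations without opening the docstring: as far as I know, when `bootstrap` is not given, matplotlib places the notch at median ± 1.57 × IQR / sqrt(n), and treats points more than `whis` (default 1.5) × IQR beyond the quartiles as fliers. The sketch below reproduces those numbers with plain NumPy. `box_stats` and `sample` are hypothetical names for illustration, and the bootstrapped notch used in the snippet above is not covered.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Summary numbers for one box (simplified sketch, no bootstrap)."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Notch: approximate 95% confidence interval around the median.
    half_width = 1.57 * iqr / np.sqrt(x.size)
    # Fences: points outside them are drawn as fliers (outliers).
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        'median': med,
        'notch': (med - half_width, med + half_width),
        # Whiskers end at the most extreme data points inside the fences.
        'whiskers': (inside.min(), inside.max()),
        'outliers': x[(x < lo_fence) | (x > hi_fence)],
    }

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 40.0]
stats = box_stats(sample)
print('median  :', stats['median'])
print('notch   :', stats['notch'])
print('whiskers:', stats['whiskers'])
print('outliers:', stats['outliers'])
```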

I still have a problem...
Let me show the updated code snippet again:
    ...
    # Box Plots
    iplt += 1
    plt.figure(iplt)
    timings = [ya[0],ya[1],ya[2],ya[3]]
    pos = np.array(range(len(timings)))+1
    bp = plt.boxplot( timings, sym='k+', patch_artist=True,
                      positions=pos, notch=1, bootstrap=5000 )
    print ('medians')
    for nn,median in enumerate(bp['medians']):
        print('%d: %f' %(nn,median.get_ydata()[0]))

    print('fliers')
    for nn in range(0, len(bp['fliers']), 2):
        print('%d: upper outliers = \t' % (nn/2,))
        print(bp['fliers'][nn].get_ydata())
        print('\n%d: lower outliers = \t' % (nn/2,))
        print(bp['fliers'][nn+1].get_ydata())
        print('\n')

    print('Confidence Intervals')
    for nn, box in enumerate(bp['boxes']):
        print('%d: lower CI: %f' % (nn, box.get_ydata()[2]))  # <--- FAILS!
        print('%d: upper CI: %f' % (nn, box.get_ydata()[4]))
    ...

Medians and fliers work perfectly; but, I get the following error message when trying to access the confidence intervals:

AttributeError: 'PathPatch' object has no attribute 'get_ydata'

Note, I am using boxplot with 4 sets of data and I am using matplotlib vers. 1.1.0.

Any suggestions on how to fix this problem?

···

On 21-Aug-2012 17:59, Paul Hobson wrote:

That should all be in the boxplot docstring. Do you use ipython? If
not, you should :slight_smile:

if so, just do `plt.boxplot?` at the ipython terminal and it'll show up.
-paul

I found the solution,

聽聽one must have,

patch_artist=False

in the boxplot call.

:slight_smile:
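
If the filled boxes from `patch_artist=True` are still wanted, the notch limits can probably be read from the patch's path instead, since `PathPatch` has `get_path()` but no `get_ydata()`. A minimal sketch, assuming vertical boxes, notches drawn without bootstrapping, and that the path keeps the same vertex order as the line version (indices 2 and 4). `notch_limits` and the random data are illustrative only.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only: 3 columns of 37 log-normal samples each.
rng = np.random.RandomState(0)
data = rng.lognormal(mean=1.25, sigma=1.35, size=(37, 3))

def notch_limits(box):
    """Return (lower, upper) notch limits for one entry of bp['boxes']."""
    if hasattr(box, 'get_ydata'):
        y = box.get_ydata()                 # Line2D (patch_artist=False)
    else:
        y = box.get_path().vertices[:, 1]   # PathPatch (patch_artist=True)
    return y[2], y[4]

bp_lines = plt.boxplot(data, notch=True, patch_artist=False)
bp_patch = plt.boxplot(data, notch=True, patch_artist=True)

for n, (line, patch) in enumerate(zip(bp_lines['boxes'], bp_patch['boxes'])):
    print('%d: from Line2D    %s' % (n, notch_limits(line)))
    print('%d: from PathPatch %s' % (n, notch_limits(patch)))
```

Both calls are given the same data, so the two sets of limits should agree box by box.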
