plt.hist() doesn't recognize the masked array?

Dear all matplotlib users,

Happy New Year.
I am trying to check the distribution of a 2D array, and I find that the histogram plot function doesn't seem to respect numpy masked arrays?

In [188]: a=range(1,6); b=np.array(a+a[::-1])

In [189]: b=np.ma.masked_equal(b,2); b=np.ma.masked_equal(b,5)

In [190]: b
Out[190]:
masked_array(data = [1 -- 3 4 -- -- 4 3 -- 1],
mask = [False True False False True True False False True False],
fill_value = 5)

In [191]: n,bins,patches=plt.hist(b)

In [192]: n
Out[192]: array([2, 0, 2, 0, 0, 2, 0, 2, 0, 2])

In [193]: n.sum()
Out[193]: 10

It seems that all the elements (masked or not) are counted in the histogram?
And the original values are used rather than the fill_value?

I attach a figure below.

In [194]: plt.show()

eg_hist.png

Chao YUE
Laboratoire des Sciences du Climat et de l’Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex

Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16


Yes, this is a known issue (at least, judging from the comments within the function). It looks like hist() uses np.asarray() instead of np.asanyarray(), which strips the array of its mask. However, I don't think the fix is as straightforward as changing that to np.asanyarray(). I will take a peek and see what can be done.
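
To illustrate the difference between the two conversion functions, here is a minimal sketch using only numpy. It does not reproduce the code inside hist(); it only shows what np.asarray() and np.asanyarray() each do to the array from the original post. The variable names `plain` and `kept` are made up for the example, and list() is added so the snippet also runs on Python 3.

```python
import numpy as np

# Rebuild the masked array from the original post.
a = list(range(1, 6))
b = np.ma.masked_equal(np.array(a + a[::-1]), 2)
b = np.ma.masked_equal(b, 5)

# np.asarray() converts to a base ndarray: the mask is dropped and the
# underlying data (including the masked 2s and 5s) is what remains.
plain = np.asarray(b)

# np.asanyarray() lets ndarray subclasses pass through, so the result
# is still a MaskedArray with its mask intact.
kept = np.asanyarray(b)

print(type(plain).__name__, plain.tolist())
print(type(kept).__name__, int(kept.count()))
```

This is consistent with the counts in the original post summing to 10: once the mask is gone, every underlying value is binned.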

Ben Root
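
In the meantime, one possible workaround (my own suggestion, not something proposed in this thread) is to drop the masked values before histogramming, using the masked array's compressed() method. The sketch below uses np.histogram() as a stand-in for the counting step so that it runs without a display; passing the same compressed array to plt.hist() should plot only the unmasked values. The name `valid` is made up for the example.

```python
import numpy as np

# Rebuild the masked array from the original post.
a = list(range(1, 6))
b = np.ma.masked_equal(np.array(a + a[::-1]), 2)
b = np.ma.masked_equal(b, 5)

# compressed() returns a plain 1-D ndarray holding only the unmasked values.
valid = b.compressed()

# Count the unmasked values only; the masked 2s and 5s never reach the bins.
n, bins = np.histogram(valid)

print(valid.tolist())
print(int(n.sum()))
```

Note that compressed() always flattens to 1-D, which is fine for checking the overall distribution of a 2D array but loses the per-column structure.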

On Mon, Jan 2, 2012 at 11:10 AM, Chao YUE <chaoyuejoy@…1896…> wrote:
[original message quoted in full, as above]

Thanks Ben.

cheers,

Chao

2012/1/2 Benjamin Root <ben.root@…1304…>
[earlier messages quoted in full, as above]