weird interaction: pickle, numpy, matplotlib.hist

Hi All,

I've encountered a strange problem. I've been running some Python code on both a Linux box and OS X, both with Python 2.4.1 and the latest numpy and matplotlib from svn.

I have found that when I transfer pickled numpy arrays from one machine to the other (in either direction), the resulting data *looks* all right (i.e., it is a numpy array of the correct type with the correct values at the correct indices), but it produces the wrong result in at least one circumstance: matplotlib.hist() gives a completely wrong picture (and set of bins).

This can be ameliorated by running the array through
    arr=numpy.asarray(arr, dtype=numpy.float64)
but this seems like a complete kludge (and is only needed when you do the transfer between machines).

I've attached a minimal script that exhibits the problem: try
  test_pickle_hist.test(write=True)
on one machine, transfer the output file to another machine, and run
  test_pickle_hist.test(write=False)
there, and you should see a very strange result (which should be fixed if you set asarray=True).

Any ideas?

Andrew

test_pickle_hist.py (508 Bytes)

Andrew Jaffe wrote:

> Hi All,
>
> I've encountered a strange problem: I've been running some python code
> on both a linux box and OS X, both with python 2.4.1 and the latest
> numpy and matplotlib from svn.
>
> I have found that when I transfer pickled numpy arrays from one machine
> to the other (in either direction), the resulting data *looks* all right
> (i.e., it is a numpy array of the correct type with the correct values
> at the correct indices), but it seems to produce the wrong result in (at
> least) one circumstance: matplotlib.hist() gives the completely wrong
> picture (and set of bins).
>
> This can be ameliorated by running the array through
>    arr=numpy.asarray(arr, dtype=numpy.float64)
> but this seems like a complete kludge (and is only needed when you do
> the transfer between machines).

You have a byteorder issue. Your Linux box, which I presume has an Intel or AMD
CPU, is little-endian, whereas your OS X box, which I presume has a PPC CPU, is
big-endian. numpy arrays can store their data in either endianness on either
kind of platform; their dtype objects tell you which byteorder they are using.
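A minimal sketch of how one might read the byteorder off a dtype, assuming a reasonably recent numpy. The helper name `describe_byteorder` is hypothetical and only for illustration. Note that numpy reports the machine's own byteorder as '=', so the helper falls back to `sys.byteorder` in that case.

```python
import sys
import numpy as np

def describe_byteorder(arr):
    """Return 'big', 'little', or 'n/a' for how the array's data is stored."""
    code = arr.dtype.byteorder
    if code == '=':
        # '=' means "native": whatever this machine uses.
        return sys.byteorder
    if code == '|':
        # Single-byte types have no byteorder at all.
        return 'n/a'
    return 'big' if code == '>' else 'little'

big = np.arange(3, dtype='>f8')      # explicitly big-endian storage
little = np.arange(3, dtype='<f8')   # explicitly little-endian storage

print(describe_byteorder(big), describe_byteorder(little))
print(big.dtype.isnative, little.dtype.isnative)  # exactly one is True
```

Exactly one of the two arrays is native on any given machine, which is why the same pickle behaves differently on the two boxes.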

In the dtype specifications below, '>' means big-endian (I am using a PPC
PowerBook), and '<' means little-endian.

In [31]: a = linspace(0, 10, 11)

In [32]: a
Out[32]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])

In [33]: a.dtype
Out[33]: dtype('>f8')

In [34]: b = a.newbyteorder()

In [35]: b
Out[35]:
array([ 0.00000000e+000, 3.03865194e-319, 3.16202013e-322,
         1.04346664e-320, 2.05531309e-320, 2.56123631e-320,
         3.06715953e-320, 3.57308275e-320, 4.07900597e-320,
         4.33196758e-320, 4.58492919e-320])

In [36]: b.dtype
Out[36]: dtype('<f8')

In [41]: a.tostring()[-8:]
Out[41]: '@$\x00\x00\x00\x00\x00\x00'

In [42]: b.tostring()[-8:]
Out[42]: '@$\x00\x00\x00\x00\x00\x00'
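The session above can be approximated on a current numpy roughly as follows. This is a sketch under assumptions: newer numpy releases no longer provide `ndarray.newbyteorder()` or `tostring()`, so it uses `view()` with `dtype.newbyteorder()` and `tobytes()` instead. The point it illustrates is the difference between relabelling the same bytes (values become garbage) and converting with `astype` (values are kept, bytes are swapped).

```python
import numpy as np

# Force big-endian storage so the example matches the PPC session above.
a = np.linspace(0, 10, 11).astype('>f8')

# Same memory, opposite byteorder label: the values are misinterpreted.
b = a.view(a.dtype.newbyteorder())

# Real conversion: values preserved, underlying bytes swapped.
c = a.astype(a.dtype.newbyteorder())

same_bytes = a.tobytes() == b.tobytes()
last_bytes = a.tobytes()[-8:]   # big-endian encoding of 10.0

print(same_bytes)
print(last_bytes)
print(b[-1], c[-1])
```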

Apparently, the pickle stores the data in the creating machine's byteorder and
marks it as such. When the reading machine loads the pickle, it recognizes from
the dtype that the byteorder is the opposite of its native byteorder.

Most operations work as you might expect:

In [44]: a.astype(dtype('<f8'))
Out[44]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])

In [45]: c = _

In [46]: c.dtype
Out[46]: dtype('<f8')

In [47]: a + c
Out[47]: array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])

Some don't:

In [54]: c.sort()

In [55]: c
Out[55]: array([ 0., 2., 3., 4., 5., 6., 7., 8., 9., 10., 1.])

This is a bug.

http://projects.scipy.org/scipy/numpy/ticket/47
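Until that is fixed, one possible workaround is to convert to the native byteorder right after unpickling. The sketch below assumes a reasonably recent numpy, and `to_native` is a hypothetical helper, not a numpy function. It is a slightly more general form of the asarray(arr, dtype=numpy.float64) call from the original message, which works for the same reason: numpy.float64 is always native.

```python
import numpy as np

def to_native(arr):
    """Return arr with native byteorder, copying only when needed."""
    arr = np.asarray(arr)
    if arr.dtype.isnative:
        return arr
    # '=' requests this machine's own byteorder; astype swaps the bytes
    # while keeping the values.
    return arr.astype(arr.dtype.newbyteorder('='))

# Stand-in for an array unpickled from the other architecture.
foreign = np.array([3.0, 1.0, 2.0]).astype(
    np.dtype(np.float64).newbyteorder('S'))

native = to_native(foreign)
native.sort()
print(native.dtype.isnative, native)
```

Calling it on an array that is already native returns the same object, so it is cheap to apply unconditionally after loading.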


--
Robert Kern
robert.kern@...287...

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco