load function

Hi John,

I made some changes to the load function; they help me, so perhaps other people will appreciate them too. I added the possibility to choose which columns of a file you want to use.

For example, I have a file like:

1 2 3

and I want to use only columns 2 and 3.

I can do this with the new load function:

load('toto.dat',columns=[2,3])

Perhaps you can rearrange this into something a little cleaner (I'm a beginner with Python).

I also added a transpose of the matrix, to fix the problem I had before of getting back rows instead of columns when doing:

x,y,z = load('toto.dat')

Thanks,

         Nicolas

PS: I added something to the docstring too, but you should correct it for proper English and/or clearer wording if you include these changes.


------------------------------------------------------------

def load(fname,comments='%',columns=None):
   """
   Load ASCII data from fname into an array and return the array.

   The data must be regular, same number of values in every row

   fname can be a filename or a file handle.

   A comment-delimiter character can be given (optional);

   the default is the MATLAB character '%'.

   A second optional argument, columns, tells which columns of the
      file you want to use.  It is a list of column numbers,
      starting at 1.

   matfile data is not currently supported, but see
   Nigel Wade's matfile ftp://ion.le.ac.uk/matfile/matfile.tar.gz

   Example usage:

   x,y = load('test.dat') # data in two columns

   X = load('test.dat') # a matrix of data

   x = load('test.dat') # a single column of data

   x = load('test.dat','#') # use '#' as the comment delimiter

   """

   if is_string_like(fname):
       fh = file(fname)
   elif hasattr(fname, 'seek'):
       fh = fname
   else:
       raise ValueError('fname must be a string or file handle')
   X = []
   numCols = None
   for line in fh:
       # split on the comment character; lines with no comment pass through
       line = line.split(comments)[0].strip()
       if not len(line): continue
       row = [float(val) for val in line.split()]
       thisLen = len(row)
       if numCols is not None and thisLen != numCols:
           raise ValueError('All rows must have the same number of columns')
       numCols = thisLen
       if columns is not None:
           # column numbers are one-based, so shift to zero-based indices
           row = [row[i-1] for i in columns]
       X.append(row)

   X = array(X)
   r,c = X.shape
   if r==1 or c==1:
       X.shape = max([r,c]),
   return transpose(X)
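The column-selection and transpose steps above can be sketched in isolation (a minimal sketch using NumPy in place of numarray; the in-memory rows stand in for a parsed file such as toto.dat):

```python
import numpy as np

# Stand-in for the parsed rows of a whitespace-delimited file,
# e.g. "1 2 3" / "4 5 6"; in the real function these come from line.split().
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# As in the patch, column numbers are one-based, so shift to zero-based.
columns = [2, 3]
picked = [[r[i - 1] for i in columns] for r in rows]

# Transposing lets tuple unpacking yield one name per column.
x, y = np.transpose(np.array(picked))
print(x)  # column 2 of every row
print(y)  # column 3 of every row
```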

I don't think the load function needs to be changed in the way you suggest. The "problem" is not the load function. It is the fact that numarray arrays are stored in row-major, not column-major, order, and so tuple unpacking a numarray array goes by row, not by column.

In [15]: A=arange(6)

In [16]: A.shape=(3,2)

In [17]: x,y=A


---------------------------------------------------------------------------
exceptions.ValueError Traceback (most recent call last)

ValueError: too many values to unpack

The transpose is required here as well if you want to unpack by columns. If I have a file containing 731 rows and 17 columns and use 'load', I get an array with 731 rows and 17 columns, exactly as I expect.
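The row-wise unpacking behaviour is easy to check directly (a sketch using NumPy, numarray's successor, with shapes chosen so that unpacking succeeds):

```python
import numpy as np

A = np.arange(6)
A.shape = (2, 3)   # two rows of three columns

# Tuple unpacking walks the first axis, i.e. the rows.
x, y = A
print(x)  # [0 1 2], the first row
print(y)  # [3 4 5], the second row

# To unpack by column instead, transpose first: three names, one per column.
c1, c2, c3 = np.transpose(A)
print(c1)  # [0 3], the first column
```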

I'm far from a Python expert myself ;-), but you can do what you're trying to do with the single line

x,y=transpose(load('toto.dat')[:,1:3])

(note that array indexing in Python is zero-based, not one-based, and also read up on how slices work).
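With made-up in-memory data in place of toto.dat, that one-liner works like this (the slice 1:3 selects the zero-based indices 1 and 2, i.e. the second and third columns, since the slice end is exclusive):

```python
import numpy as np

# Stand-in for load('toto.dat'): rows of a three-column file.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# [:, 1:3] keeps all rows and the columns at indices 1 and 2;
# transposing then lets x and y unpack column-wise.
x, y = np.transpose(data[:, 1:3])
```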