parsing tab-separated files into dictionaries - alternative to genfromtxt?

hi all,

I've been using genfromtxt to parse tab-separated files for plotting
purposes in matplotlib. The problem is that genfromtxt seems to give
only two ways to access the contents of the file. One is by column,
where you can use:

d = genfromtxt(...)

and then do d['header_name1'] to access the column named
'header_name1', d['header_name2'] to access the column named
'header_name2', etc. The other is to traverse the file line by line
and then access each field by number, i.e.

for line in d:
  field1 = line[0]
  field2 = line[1]
  # etc.
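
For concreteness, here is roughly what both look like together; the
filename "data.tsv" is just a placeholder, and names=True takes the
column names from the first row of the file:

import numpy as np

# column names are read from the header row of the tab-separated file
d = np.genfromtxt("data.tsv", delimiter="\t", names=True)

# first way: a whole column at once, by header name
col1 = d['header_name1']

# second way: line-by-line iteration, with fields accessed by position
for line in d:
    field1 = line[0]
    field2 = line[1]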

The problem is that the second method relies on knowing the order of
the fields rather than just their names, and the first method does not
allow line-by-line iteration.
Ideally, I would like to be able to traverse each line of the parsed
file and refer to each of its fields by header name, so that my
program is unaffected if the column order in the file changes:

for line in d:
  field1 = line['header_name1']
  field2 = line['header_name2']

Is there a way to do this using standard matplotlib/numpy/scipy
utilities? I could write my own code to do this, but it seems like
the kind of thing somebody has probably already found a good
representation for and implemented in a more optimized way than I
could on my own. Does such a thing exist?

thanks very much

I have a class whose constructor reads space-delimited ASCII files.

import numpy as np

class NasaFile(object):
    def __init__(self, filename):
        ...
        # Read the numeric data block; transposing gives one array per column
        _data = np.loadtxt(filename, dtype='float', skiprows=self.NLHEAD).T
        # Allows access with the data['Time'] syntax
        self.data = dict(zip(self.VDESC, _data))
        ...

There is a meta-header in this type of data, and NLHEAD is the variable
telling me how many lines to skip to reach the actual data. VDESC
tells me what each column is (starting with the Time variable, followed
by many other measurement results).

There is no column-order dependence in this case, and it can generically
read data of any length in this specific format. For instance:

from nasafile import NasaFile

c = NasaFile("mydata")

c.data['Time'] gets me the whole Time column as an ndarray. Why do
you think dictionaries are not sufficient for your case? I was using
locals() to create automatic names, but that was not a very wise
approach.
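
For the tab-separated case in the original question, the same
dictionary-of-columns idea might look roughly like this; the filename
"data.tsv" and the header names are placeholders, and this assumes the
first row of the file holds the column names:

import numpy as np

# Read the header row to get the column names
with open("data.tsv") as f:
    names = f.readline().strip().split("\t")

# skiprows=1 skips the header row; transposing gives one array per column
columns = np.loadtxt("data.tsv", delimiter="\t", skiprows=1).T
data = dict(zip(names, columns))

# Whole columns by name, independent of the column order in the file
field1 = data['header_name1']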

···

On Wed, Nov 11, 2009 at 11:53 AM, per freem <perfreem@...287...> wrote:

--
Gökhan