I suggest using numpy record arrays for this -- the columns have names
and the data can be of different types. You can save and load them
using pickle (numpy.load and numpy.save will use pickle under the
hood).
If you want to stick with ASCII flat file representation (eg for use
with other programs), in matplotlib svn there are two nice functions
to help here: rec2csv and csv2rec. They support saving numpy record
arrays to CSV files with column names, and loading these back up later
doign type introspection to figure out the types (datetime, str,
float, int).
In [1]: import numpy as n
In [2]: import matplotlib.mlab as mlab
In [3]: x = n.random.rand(20,4)
In [4]: r = n.rec.fromrecords(x, names='age,weight,height,cash')
In [5]: r.dtype
Out[5]: dtype([('age', '<f8'), ('weight', '<f8'), ('height', '<f8'),
('cash', '<f8')])
In [7]: mlab.rec2csv(r, 'mydata.csv')
In [8]: !head mydata.csv
age,weight,height,cash
0.0449935,0.252057,0.316116,0.0635711
0.777189,0.155186,0.0537382,0.233598
0.731376,0.654577,0.977792,0.0171022
0.685975,0.373741,0.714592,0.620079
0.634548,0.956708,0.360962,0.885379
0.431011,0.359094,0.21484,0.961865
0.115155,0.78767,0.352753,0.769402
0.984747,0.720163,0.887608,0.316844
0.0478857,0.813668,0.882535,0.8837
In [9]: newr = mlab.csv2rec('mydata.csv')
In [10]: newr.dtype
Out[10]: dtype([('age', '<f8'), ('weight', '<f8'), ('height', '<f8'),
('cash', '<f8')])
In [11]: newr
Out[11]:
recarray([ (0.044993499999999999, 0.25205699999999998,
0.31611600000000001, 0.063571100000000005),
(0.77718900000000002, 0.15518599999999999, 0.0537382, 0.233598),
(0.73137600000000003, 0.65457699999999996, 0.97779199999999999,
0.017102200000000001),
(0.685975, 0.37374099999999999, 0.714592, 0.62007900000000005),
(0.634548, 0.956708, 0.36096200000000001, 0.88537900000000003),
(0.43101099999999998, 0.35909400000000002, 0.21484, 0.96186499999999997),
(0.11515499999999999, 0.78766999999999998, 0.35275299999999998,
0.76940200000000003),
(0.98474700000000004, 0.720163, 0.88760799999999995,
0.31684400000000001),
(0.047885700000000003, 0.81366799999999995,
0.88253499999999996, 0.88370000000000004),
(0.044475599999999997, 0.89918900000000002,
0.076484499999999997, 0.114994),
(0.75139299999999998, 0.70954300000000003, 0.458505,
0.33839900000000001),
(0.14619299999999999, 0.907717, 0.24915200000000001,
0.67030400000000001),
(0.89663199999999998, 0.61957300000000004,
0.0060039200000000003, 0.048883500000000003),
(0.20794000000000001, 0.56046499999999999,
0.078303899999999996, 0.216032),
(0.28726000000000002, 0.14282500000000001, 0.51740200000000003,
0.553037),
(0.96326999999999996, 0.21327299999999999, 0.72040999999999999,
0.181446),
(0.31984000000000001, 0.39338299999999998, 0.45787899999999998,
0.33919199999999999),
(0.42086200000000001, 0.98801499999999998, 0.53429000000000004,
0.074105699999999997),
(0.104211, 0.15845100000000001, 0.13339200000000001,
0.99228300000000003),
(0.73563299999999998, 0.948407, 0.44708900000000001,
0.79521399999999998)],
dtype=[('age', '<f8'), ('weight', '<f8'), ('height', '<f8'),
('cash', '<f8')])
You can also work with non floating point data
In [14]: url = 'http://ichart.finance.yahoo.com/table.csv?s=GE&d=10&e=20&f=2007&g=d&a=0&b=2&c=1962&ignore=.csv’
In [15]: import urllib
In [16]: urllib.urlretrieve(url, 'ge.csv')
Out[16]: ('ge.csv', <httplib.HTTPMessage instance at 0x8fa7b2c>)
In [17]: r = mlab.csv2rec('ge.csv')
In [18]: !head ge.csv
Date,Open,High,Low,Close,Volume,Adj Close
2007-11-19,38.48,38.51,38.00,38.16,35415000,38.16
2007-11-16,38.50,38.67,37.87,38.65,50181100,38.65
2007-11-15,38.93,38.93,38.13,38.31,41590000,38.31
2007-11-14,39.90,39.95,38.82,39.01,39650800,39.01
2007-11-13,38.50,39.25,38.25,39.21,42053400,39.21
2007-11-12,38.24,39.04,38.17,38.25,36968000,38.25
2007-11-09,38.52,38.75,38.11,38.38,42662200,38.38
2007-11-08,39.20,39.32,37.50,39.02,52970300,39.02
2007-11-07,39.90,39.93,38.99,39.08,46720100,39.08
In [20]: r[:10]
Out[20]:
recarray([ (datetime.datetime(2007, 11, 19, 0, 0), 38.479999999999997,
38.509999999999998, 38.0, 38.159999999999997, 35415000,
38.159999999999997),
(datetime.datetime(2007, 11, 16, 0, 0), 38.5,
38.670000000000002, 37.869999999999997, 38.649999999999999, 50181100,
38.649999999999999),
(datetime.datetime(2007, 11, 15, 0, 0), 38.93, 38.93,
38.130000000000003, 38.310000000000002, 41590000, 38.310000000000002),
(datetime.datetime(2007, 11, 14, 0, 0), 39.899999999999999,
39.950000000000003, 38.82, 39.009999999999998, 39650800,
39.009999999999998),
(datetime.datetime(2007, 11, 13, 0, 0), 38.5, 39.25, 38.25,
39.210000000000001, 42053400, 39.210000000000001),
(datetime.datetime(2007, 11, 12, 0, 0), 38.240000000000002,
39.039999999999999, 38.170000000000002, 38.25, 36968000, 38.25),
(datetime.datetime(2007, 11, 9, 0, 0), 38.520000000000003,
38.75, 38.109999999999999, 38.380000000000003, 42662200,
38.380000000000003),
(datetime.datetime(2007, 11, 8, 0, 0), 39.200000000000003,
39.32, 37.5, 39.020000000000003, 52970300, 39.020000000000003),
(datetime.datetime(2007, 11, 7, 0, 0), 39.899999999999999,
39.93, 38.990000000000002, 39.079999999999998, 46720100,
39.079999999999998),
(datetime.datetime(2007, 11, 6, 0, 0), 40.200000000000003,
40.490000000000002, 39.969999999999999, 40.18, 42131000, 40.18)],
dtype=[('date', '|O4'), ('open', '<f8'), ('high', '<f8'),
('low', '<f8'), ('close', '<f8'), ('volume', '<i4'), ('adj_close',
'<f8')])
···
On Nov 20, 2007 10:43 AM, Jordan Atlas <jca33@...163...> wrote:
Can someone recomend a way to save the data in such a way that the
columns (or rows) are labeled? In otherwords, it would be nice to be
able to open the saved data and know what each row is without having to
refer to the script that created it. (referring to the creating script
feels error prone when you have many rows of data being saved). I'm
currently using the 'pylab.save' function to save the data.