csv2rec column names

Anton_Vasilescu · January 3, 2009, 5:21pm

Hi all,

I have a lot of csv files to process, all of them with the same number of
columns. The only issue is that each file has a unique column name for the
fourth column.

All the csv2rec examples I found are using the r.column_name format to
access the data in that column which is of no use for me because of the
unique names. Is there a way to access that data using the column number? I
bet this should be something simple but I cannot figure it out...

Thanks in advance,
Anton

--
View this message in context: http://www.nabble.com/csv2rec-column-names-tp21267055p21267055.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

_Patrick_Marsh1 · January 3, 2009, 5:28pm

I'm not sure what you need it for, but I would suggest looking
into numpy's loadtxt function. You can use this to load the csv data
into numpy arrays and pass the resulting arrays around.

-Patrick

------------------------------------------------------------------------------
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
matplotlib-users List Signup and Options

Anton_Vasilescu · January 3, 2009, 5:39pm

I am plotting the data in those csv files. The first 4 columns in the
files have the same title, but the 5th is named after the date and time,
so it is unique in each of the files. As I have about 600 files to batch
process, adjusting my script manually is not an option.

The way I have it for one test file is:

r = mlab.csv2rec('test.csv')
# I know that the column name for the 5th column is 'htsgw_12191800',
# so to read the data in the 5th column I just use:
z = r.htsgw_12191800

What I need is to be able to get that data by specifying the column number,
as that stays the same in all files.

I'll look at numpy but I hope there is a simpler way.

Thanks,
Anton


_Patrick_Marsh1 · January 3, 2009, 5:59pm

In my limited opinion, numpy's loadtxt is the way to go. Loadtxt
doesn't care about the header. You can read in the arrays like this:

import numpy as np

# read in all 5 columns as text (each array's first element is the heading)
col1, col2, col3, col4, col5 = np.loadtxt(filename, delimiter=',', dtype=str, unpack=True)

or, if you want to skip the column headings and read in just the last
column as a specific data type:

# read in only column 5 (usecols is zero-based, so index 4) as floats,
# skipping the heading row
col5_no_header = np.loadtxt(filename, delimiter=',', skiprows=1, usecols=(4,),
                            dtype=float, unpack=True)

-Patrick
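
A minimal, self-contained sketch of selecting a column by position with np.loadtxt. The in-memory file and the first four headings are made up for illustration; only the 'htsgw_12191800' heading comes from the thread. It assumes comma-separated numeric data with a single header row.

```python
import io
import numpy as np

# Hypothetical stand-in for one of the CSV files: four fixed headings plus a
# fifth heading that differs from file to file.
sample = io.StringIO(
    "lat,lon,depth,flag,htsgw_12191800\n"
    "10.0,20.0,1.0,0,2.5\n"
    "11.0,21.0,1.5,1,3.0\n"
)

# skiprows=1 drops the header line; usecols is zero-based, so index 4 selects
# the fifth column regardless of its heading.
col5 = np.loadtxt(sample, delimiter=",", skiprows=1, usecols=(4,))
print(col5)
```

Because the heading is never parsed, the same call should work unchanged on every file that shares this layout.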


Anton_Vasilescu · January 3, 2009, 6:05pm

You're right! I read more about recarrays and they were built specially for
being called by the column name, so I shouldn't have used csv2rec from the
start!

Thanks for the quick responses!
Anton


PGM · January 3, 2009, 8:31pm

FYI, I recoded np.loadtxt to handle missing data, automatic name definition and conversion functions, as a merge of np.loadtxt and mlab.csv2rec. You can access the code here:

Hopefully these functions will make it to numpy at one point or another.

Note also that you are not limited to recarrays: you can use what are called flexible-type arrays, which still let you access individual fields by key, without the overhead of recarrays (where fields can also be accessed as attributes). For example:
>>> x = np.array([(1,10.), (2,20.)], dtype=[('A',int),('B',float)])
>>> x['A']
array([1, 2])
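
A short sketch contrasting the access styles mentioned here, reusing the same two-field array. The variable names are illustrative only; the point is that dtype.names lists the field names in column order, so it can be indexed by position.

```python
import numpy as np

# Same structured (flexible-type) array as in the example above.
x = np.array([(1, 10.0), (2, 20.0)], dtype=[("A", int), ("B", float)])

by_key = x["A"]                # field access by name
second = x[x.dtype.names[1]]   # field access by position, via the tuple of names

# Viewing the array as a recarray adds attribute-style access to the fields.
as_rec = x.view(np.recarray)
by_attr = as_rec.B
```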


Gaius_Hammond · January 3, 2009, 9:30pm

Hi all,

Does anyone know if it's possible to make the polar plot look like a 12- or 24-hr clockface? I.e. 0 (or 12) at the top rather than the right, and labelled in 12ths (or 24ths) instead of degrees?

Thanks,

G

_Ryan_May1 · January 4, 2009, 1:00am

Pierre GM wrote:

Note also that you are not limited to recarrays: you can use what are
called flexible-type arrays, which still let you access individual
fields by key, without the overhead of recarrays (where fields can
also be accessed as attributes). For example:
>>> x = np.array([(1,10.), (2,20.)], dtype=[('A',int),('B',float)])
>>> x['A']
array([1, 2])

True, but the problem in this case is that he wants to access by column number,
which you can't really do with recarray or flexible dtype arrays.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Thomas_I-N · February 3, 2009, 1:23pm

Hello Anton,

I just had the same problem and came up with the following solution:

a = csv2rec(fname) # read a csv file into a
a[a.dtype.names[5]] # access column 6 (index 5) in the file

As a shorthand you could assign the column names to an attribute on the
recarray:

a.cols = a.dtype.names
a[a.cols[5]] # access column 6 (index 5) in the file

Hope this helps, even though it may not be good coding practice. I am a
novice myself...

Best regards,

Thomas
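
A sketch of how the dtype.names trick could be applied to a batch of files. Since mlab.csv2rec is not available in recent Matplotlib releases, this substitutes np.genfromtxt with names=True, which also builds field names from the header row. The helper read_column, the file names, and all headings except 'htsgw_12191800' are hypothetical.

```python
import os
import tempfile
import numpy as np

def read_column(path, index):
    """Return the column at zero-based position `index`, whatever its heading."""
    data = np.genfromtxt(path, delimiter=",", names=True)
    return data[data.dtype.names[index]]

# Hypothetical files: same layout, different heading on the fifth column.
tmpdir = tempfile.mkdtemp()
contents = {
    "a.csv": "lat,lon,depth,flag,htsgw_12191800\n1,2,3,0,2.5\n4,5,6,1,3.0\n",
    "b.csv": "lat,lon,depth,flag,htsgw_12192100\n1,2,3,0,4.5\n4,5,6,1,5.0\n",
}
for name, text in contents.items():
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write(text)

# Fifth column (index 4) of every file, keyed by file name.
columns = {name: read_column(os.path.join(tmpdir, name), 4) for name in contents}
```

Looking the name up by position on each file keeps the script independent of the per-file heading, which is what the original question asked for.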
