Newbie problem using csv2rec

Hi folks,

I have a (newbie) problem using csv2rec. I am a regular Python user,
but this is my first time using matplotlib and numpy, after being
inspired by a talk by Dr. John Hunter.

I am trying to read a csv file that has >6000 lines that look like this:

<code>
8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
8/17/2009,4:49:52 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:3888955:311247:166422::,From Disk..
8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,Exception in CVProcess.GetNewfile: The process cannot access the file because it is being used by another process..,
8/17/2009,4:49:51 PM,CVAgent,Information,2,447,N/A,THP-PR-APVL,SDAY -> R:20090210:::3893955:311247:166422::20090210:::3893955:388247:166422::50:,.
8/17/2009,4:29:55 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22088980:964878::,Completed..
8/17/2009,4:29:55 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22030980:964878::,From Disk..
8/17/2009,4:29:54 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,JJULIO -> R:20090728:::4888461:22030980:964878::20090728:::4888461:22030980:964878::50:,.
8/17/2009,4:24:02 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090226::7881558:2882501:325032:316888::,Completed..
8/17/2009,4:24:02 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090226::7881558:8822501:325882:318816::,From Disk..
8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.
8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.
8/17/2009,4:19:44 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278887:4020000::,Completed..
8/17/2009,4:19:43 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278777:4020000::,From Disk..
8/17/2009,4:19:42 PM,CVAgent,Information,2,793,N/A,THP-PR-APVL,MUTSCH -> R:20090210:::2882613:278887:4020000::20090210:::2882613:278887:4020000::50:,.
8/17/2009,4:11:02 PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,F:20090817::7881558:1776517:1211:58800::,Completed..
8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
</code>

I have given the columns names since there is not a header line:
<code>
In [150]: print names
('date', 'time', 'program', 'level', 'error_id', 'thread', 'na',
'machine', 'request', 'detail')
</code>

and I have provided converter functions to be sure the data is read correctly:
<code>
In [152]: print converterd
{'thread': <type 'int'>, 'level': <type 'str'>, 'na': <type 'str'>,
 'request': <type 'str'>, 'detail': <type 'str'>, 'machine': <type 'str'>,
 'program': <type 'str'>, 'time': <function str2time at 0x03795530>,
 'date': <function str2date at 0x037950B0>}
</code>
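The bodies of str2date and str2time are not shown in the post. As a rough sketch only, converters for the date and time formats seen in the sample rows could look like the following; the function bodies are an assumption, not the poster's code, and the sketch is written for Python 3 rather than the Python 2 session shown here.

```python
from datetime import datetime

def str2date(text):
    """Parse a date such as '8/17/2009' into a datetime.date (assumed format)."""
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()

def str2time(text):
    """Parse a 12-hour time such as '4:49:52 PM' into a datetime.time (assumed format)."""
    return datetime.strptime(text.strip(), "%I:%M:%S %p").time()

print(str2date("8/17/2009"))
print(str2time("4:49:52 PM"))
```

Returning real date and time objects, rather than strings, is what makes grouping by day straightforward later on.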

(I'm not sure if this is needed. IPython seems to recognize csv2rec
just fine but the sample program does an import like this.)
<code>
In [141]: import matplotlib.mlab as mlab
</code>

So now I call csv2rec on my file. It takes a second or so to gulp it
all in and then returns without error.
<code>
In [142]: r=mlab.csv2rec(filename,converterd=converterd,names=names)
</code>

So now I look to see what I have. And it's nothing like I thought it
would be. I expected thousands of records and I have 10. I expected
times and dates, ints and strings. And all I have are masked values.
<code>
In [143]: r
Out[143]:
masked_records(
       date : [-- -- -- -- -- -- -- -- -- --]
       time : [-- -- -- -- -- -- -- -- -- --]
    program : [-- -- -- -- -- -- -- -- -- --]
      level : [-- -- -- -- -- -- -- -- -- --]
   error_id : [-- -- -- -- -- -- -- -- -- --]
     thread : [-- -- -- -- -- -- -- -- -- --]
         na : [-- -- -- -- -- -- -- -- -- --]
    machine : [-- -- -- -- -- -- -- -- -- --]
    request : [-- -- -- -- -- -- -- -- -- --]
     detail : [-- -- -- -- -- -- -- -- -- --]
   fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
             )
</code>

So I look at the mask. I see no clues here.
<code>
In [144]: r.mask
Out[144]:
array([(True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True)],
      dtype=[('date', '|b1'), ('time', '|b1'), ('program', '|b1'),
             ('level', '|b1'), ('error_id', '|b1'), ('thread', '|b1'),
             ('na', '|b1'), ('machine', '|b1'), ('request', '|b1'),
             ('detail', '|b1')])
</code>

Well, maybe if I change the mask I can see what is being hidden.
<code>
In [145]: r.mask[0]
Out[145]: (True, True, True, True, True, True, True, True, True, True)

In [146]: r.mask[0]=(False,)*10

In [147]: r
Out[147]:
masked_records(
       date : [2009-08-17 -- -- -- -- -- -- -- -- --]
       time : [2009-08-17 -- -- -- -- -- -- -- -- --]
    program : [2009-08-17 -- -- -- -- -- -- -- -- --]
      level : [2009-08-17 -- -- -- -- -- -- -- -- --]
   error_id : [2009-08-17 -- -- -- -- -- -- -- -- --]
     thread : [2009-08-17 -- -- -- -- -- -- -- -- --]
         na : [2009-08-17 -- -- -- -- -- -- -- -- --]
    machine : [2009-08-17 -- -- -- -- -- -- -- -- --]
    request : [2009-08-17 -- -- -- -- -- -- -- -- --]
     detail : [2009-08-17 -- -- -- -- -- -- -- -- --]
   fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
             )
</code>

So I think I see what is going on. Rather than taking each line of
the input file as a record, it is taking each column as a record.
Since I said there are ten values per record, it stopped after ten rows,
because those were all the columns it had to fill in.
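One way to test the rows-versus-columns hunch independently of csv2rec is to count the fields on each line with the standard library csv module. This is only a diagnostic sketch in Python 3; SAMPLE holds two rows copied from the data above, and a real check would read the actual file instead.

```python
import csv
import io
from collections import Counter

# Two rows copied from the sample data in the post (unwrapped).
SAMPLE = (
    "8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,"
    "F:20090210::7881558:3893255:311247:166422::,Completed..\n"
    "8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,"
    "Exception in CVProcess.GetNewfile: The process cannot access the file "
    "because it is being used by another process..,\n"
)

# Map "number of fields on a line" -> "how many lines have that many fields".
field_counts = Counter(len(row) for row in csv.reader(io.StringIO(SAMPLE)))
print(field_counts)
```

If every line reports ten fields, the file itself matches the ten column names, and the odd result is more likely to come from how the reader was called than from the data.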

Now you know my problem.

How do I get csv2rec to read my file so I can start getting nice
histograms of counts per day?
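As a fallback that does not depend on csv2rec at all, counts per day can be collected with the standard library alone. The sketch below is Python 3; count_per_day is a hypothetical helper name, and SAMPLE is three rows copied from the post, standing in for the real file.

```python
import csv
import io
from collections import Counter
from datetime import datetime

NAMES = ('date', 'time', 'program', 'level', 'error_id', 'thread', 'na',
         'machine', 'request', 'detail')

# Three rows copied from the sample data in the post (unwrapped).
SAMPLE = (
    "8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,"
    "F:20090210::7881558:3893255:311247:166422::,Completed..\n"
    "8/17/2009,4:29:55 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,"
    "F:20090728::7881558:4888461:22088980:964878::,Completed..\n"
    "8/17/2009,4:11:02 PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,"
    "F:20090817::7881558:1776517:1211:58800::,Completed..\n"
)

def count_per_day(lines):
    """Return a Counter mapping each calendar date to its number of log rows."""
    counts = Counter()
    for row in csv.DictReader(lines, fieldnames=NAMES):
        day = datetime.strptime(row['date'], "%m/%d/%Y").date()
        counts[day] += 1
    return counts

counts = count_per_day(io.StringIO(SAMPLE))
for day, n_rows in sorted(counts.items()):
    print(day, n_rows)
```

For a real log, an open file object (opened with newline='') can be passed in place of the StringIO, and the resulting days and counts can be handed to a bar plot.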

A further question: why am I getting masked records at all, and how
do I control this? I don't see anything in the numpy or matplotlib
user guides that answers this. I did find a helpful document on the
web (http://www.bom.gov.au/bmrc/climdyn/staff/lih/pubs/docs/masks.pdf)
that explains what masks are and why and how they can be used.
I don't need them and would like to make sure that nothing is masked.
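For orientation, here is a small illustration of how masks behave, using numpy.ma directly rather than csv2rec output. The values are made up for the example, and it does not explain why csv2rec returned a fully masked result here; it only shows how to inspect a mask and get plain data back.

```python
import numpy as np
import numpy.ma as ma

# A masked array whose middle entry is hidden.
values = ma.masked_array([537, 556, 447], mask=[False, True, False])

print(values.mask)               # boolean array: True marks a hidden entry
visible = values.compressed()    # only the unmasked entries
filled = values.filled(-1)       # masked entries replaced by a chosen value
print(visible, filled)

# Clearing the mask exposes every underlying value again.
values.mask = ma.nomask
plain = np.asarray(values.data)  # ordinary ndarray view of the data
print(plain)
```

The underlying data is kept even while an entry is masked, which is why clearing the mask brings the hidden value back.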

Thanks in advance for helping a newbie over the hump.

Phil Robare

Are you able to post your file somewhere? Publicly if possible (e.g.
attach it to your response), or off-list if it contains information you
would rather not disseminate widely.

JDH

···

On Fri, Aug 21, 2009 at 11:27 AM, Phil Robare<verisimilidude@...287...> wrote:

The sixteen lines of data you sent work in a little histogram-generator for me, ignoring the masking (as a nearly-newbie, I can say that ignoring the stuff I don't yet care about usually works):

# Python 2 script; csv2rec is provided by matplotlib.mlab
from matplotlib.mlab import csv2rec
import pylab as p

names = ('date', 'time', 'program', 'level', 'error_id', 'thread', 'na', 'machine', 'request', 'detail')
r = csv2rec("/Users/clew/Documents/pycode/test.csv", names = names)
print r.shape   # number of records read
print r[3]      # one complete record
# dump every column by name
for name in names:
    print 'Values of ', name, ':'
    print r[name]

# show only the rows whose thread column is 537
for row in r:
    if row['thread'] == 537: print row

print type(r['thread'])

# histogram of the thread column; 'counts' avoids reusing the short name n
counts, bins, patches = p.hist(r['thread'])
print counts,bins,patches
p.savefig('csvhistogram')
p.show()

Does this work for you? On the whole file?

&C

Chloe Lewis
Graduate student, Amundson Lab
Division of Ecosystem Sciences, ESPM
University of California, Berkeley
137 Mulford Hall - #3114
Berkeley, CA 94720-3114
chlewis@...2456...
