NOAA .bull file parsing

Dear all,

I know this is not related to matplotlib, but this seems to be the only place
where I have found people with knowledge of both NOAA data and Python, so
please bear with me.

The .bull file that NOAA provides for download is an ASCII file formatted for
human readability, but that formatting creates a lot of issues when I try to
parse it. Here is a link to one of these files:

ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/wave/prod/wave.20090117/bulls.t00z/akw.46001.bull

Do you have any idea how to extract the data into columns for
plotting with matplotlib? If you look at the file you'll notice that it
has both a header and a footer that need to be eliminated, and that
the main columns also have sub-columns. Another issue is that some columns
have missing data, which should keep its relationship with the time
column. The last issue is that some of the values are preceded by a "*"
sign, which should simply be removed.

Any ideas are greatly appreciated!

Anton

--
View this message in context: http://www.nabble.com/NOAA-.bull-file-parsing-tp21513800p21513800.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Anton,
You may want to check on the numpy list as well.
I recently reimplemented a function to read text files, as a combination of numpy.loadtxt and mlab.csv2rec, that handles missing data nicely. You can get it here for the moment:
https://code.launchpad.net/~pierregm/numpy/numpy_addons
The function you would need is mafromtxt, in fromascii. Alternatively, you can try the scikits.timeseries package (http://pytseries.sourceforge.net/): recent SVN versions introduced tsfromtxt, which reads a text file and returns a time series.

However, neither of these will work out of the box, because of the footer. What you could do is first write a function that gets rid of it (for example: open the file, read all the lines into a list, drop the first 7 rows (the header) and the last 8 (the footer), and store the result in a file). Once you have only the data, use mafromtxt (for example) with space as the delimiter, and specify the columns you want with usecols (that way, you can get rid of the column with the '*'). The missing data should then be handled properly.
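The clean-up step described above could be sketched roughly as follows, using only the standard library. The function names (strip_header_footer, remove_stars, clean_bull) are made up for illustration, and the 7-line header and 8-line footer are simply the counts quoted above, so check them against your own file.

```python
def strip_header_footer(lines, n_header=7, n_footer=8):
    """Return only the data rows, dropping header and footer lines.

    The default counts are the ones quoted in this thread (assumption:
    they hold for every .bull file you want to read).
    """
    if n_footer:
        return lines[n_header:-n_footer]
    return lines[n_header:]


def remove_stars(lines):
    """Replace each '*' flag with a space so column positions are unchanged."""
    return [line.replace("*", " ") for line in lines]


def clean_bull(in_path, out_path, n_header=7, n_footer=8):
    """Read a .bull file, keep only the data rows, and write them to out_path."""
    with open(in_path) as src:
        lines = src.read().splitlines()
    data = remove_stars(strip_header_footer(lines, n_header, n_footer))
    with open(out_path, "w") as dst:
        dst.write("\n".join(data) + "\n")
    return data
```

A call such as clean_bull("akw.46001.bull", "akw.46001.dat") (the output name is arbitrary) leaves a file containing only data rows, which can then be handed to a text reader. In current NumPy releases the reader to look at is numpy.genfromtxt, which accepts usecols and usemask and also has skip_header and skip_footer arguments that may make the separate clean-up step unnecessary.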

Let me know how it goes.
P.

On Jan 17, 2009, at 2:16 AM, antonv wrote:

[...]

------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourceForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Hi Pierre,

Thanks for the quick and thorough response!
What I ended up doing is writing a custom function that does everything
I needed without using numpy or mlab.

Anton
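For readers looking for a starting point, here is one way a numpy-free parser might look. It is not the function mentioned above, which was not posted. It assumes, hypothetically, that the header and footer have already been removed and that each data row is divided into column groups by "|" characters; parse_row and series are illustrative names.

```python
def parse_row(line):
    """Split one data row on '|' and convert each group to floats.

    A group that contains only blanks is returned as None, so a missing
    entry stays in the same position relative to the time group.
    """
    cells = line.strip().strip("|").split("|")
    groups = [cell.replace("*", " ").split() for cell in cells]
    return [[float(x) for x in g] if g else None for g in groups]


def series(parsed, group, field):
    """Collect one sub-column across all rows, with None where data is missing."""
    return [row[group][field] if row[group] is not None else None
            for row in parsed]


# Made-up rows for illustration only, not copied from a real bulletin.
rows = [
    "| 17  0 | 4.30  3 |   3.96 11.0 253 |                 |",
    "| 17  3 | 4.10  4 | * 3.80 10.5 250 |   1.20  8.0 190 |",
]
parsed = [parse_row(r) for r in rows]
hours = series(parsed, 0, 1)
second_height = series(parsed, 3, 0)
```

Because every row yields the same number of groups, a value missing in one row does not shift the later columns. Before plotting, the None entries can be replaced with float("nan"), which matplotlib draws as a gap in the line.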

Pierre GM-2 wrote:

[...]

--
View this message in context: http://www.nabble.com/NOAA-.bull-file-parsing-tp21513800p21554671.html
Sent from the matplotlib - users mailing list archive at Nabble.com.