[Matplotlib-users] Reshaping Tabular Data to Column Data

Benson_Randall · March 24, 2020, 10:12pm

Hi –

I seem to be stuck here trying to convert downloaded data that is in tabular form since I need to analyze it in single column format. The data is from here and I’m
using read_csv as shown below –

pnadata1= pd.read_csv(‘https://www.esrl.noaa.gov/psd/data/correlation/pna.data’)

The downloaded data looks like this with the values in months 1-12 across the columns below for each year 1948-present –

1948 2020

0 1948 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9…

1 1949 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9…

2 1950 -3.65 -1.69 -0.06 -0.23 -0.40 -1.9…

3 1951 -1.18 -2.11 -1.09 0.47 1.19 -0.5…

4 1952 -3.19 0.08 -0.69 0.89 -0.79 0.5…

And, I need it to be reshaped to look like this in yyyymm values format so I can compare it easily to other downloaded data in the column format –

194801 -99.90

194802 -99.90

194803 -99.90

…

202001 -3.45

202002 0.87

I’ve tried working with “melt” and “group” in different combinations but cannot figure it
out given the online references. Thank you,

Randall

···

Randall P. Benson, PhD

Wind Asset Meteorologist, Energy Resource - Onshore

1125 NW Couch Street, Suite 700

Portland, Oregon, USA 97209

Telephone 503-796-7129
Cell 971-227-2477
randall.benson@avangrid.com

In the interest of the environment,
please print only if necessary and recycle.

Scott_Lasley_via_Mat · March 25, 2020, 1:28am

Does this do what you want?

# read data from noaa, set index to first column and column names to '00' - '12'
df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data’,
                 skiprows=1,
                 skipfooter=3,
                 index_col=[0],
                 header=None,
                 names=['{:02d}'.format(i) for i in range(13)],
                 delim_whitespace=True)
# unstack the dataframe to get a Series with and a multiindex
# consisting of the date and column name
df = df.unstack()
# sort index by [date, column name] to "group" the dates
df = df.sort_index(level=[1,0])
# create a new index using the existing multiindex levels
ndx = [int(''.join([y,c]))
       for y in df.index.levels[1].astype(str)
       for c in df.index.levels[0]]
df.index = ndx
df.head(28)

194801 -99.90
194802 -99.90
194803 -99.90
...
194911 -99.90
194912 -99.90
195001 -3.65
195002 -1.69
195003 -0.06
195004 -0.23

df.tail(12)

202001 -0.95
202002 -0.07
202003 -99.90
202004 -99.90
202005 -99.90
202006 -99.90
202007 -99.90
202008 -99.90
202009 -99.90
202010 -99.90
202011 -99.90
202012 -99.90

Note that the 202001 and 202002 values do not agree with the values in your email message but do agree with the values on the website.

Hth,
Scott

···

On Mar 24, 2020, at 6:12 PM, Benson, Randall <Randall.Benson@avangrid.com> wrote:

Hi –
I seem to be stuck here trying to convert downloaded data that is in tabular form since I need to analyze it in single column format. The data is from here and I’m using read_csv as shown below –

pnadata1= pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data’)

The downloaded data looks like this with the values in months 1-12 across the columns below for each year 1948-present –

1948 2020
0 1948 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
1 1949 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
2 1950 -3.65 -1.69 -0.06 -0.23 -0.40 -1.9...
3 1951 -1.18 -2.11 -1.09 0.47 1.19 -0.5...
4 1952 -3.19 0.08 -0.69 0.89 -0.79 0.5...

And, I need it to be reshaped to look like this in yyyymm values format so I can compare it easily to other downloaded data in the column format –

194801 -99.90
194802 -99.90
194803 -99.90
...
202001 -3.45
202002 0.87

I’ve tried working with “melt” and “group” in different combinations but cannot figure it out given the online references. Thank you,
Randall

<image001.jpg>
Randall P. Benson, PhD
Wind Asset Meteorologist, Energy Resource - Onshore

1125 NW Couch Street, Suite 700
Portland, Oregon, USA 97209
Telephone 503-796-7129
Cell 971-227-2477
randall.benson@avangrid.com

<image002.png>

In the interest of the environment,
please print only if necessary and recycle.

==============================================================

Please consider the environment before printing this email.

If you have received this message in error, please notify the sender and immediately delete this message and any attachment hereto and/or copy hereof, as such message contains confidential information intended solely for the individual or entity to whom it is addressed. The use or disclosure of such information to third parties is prohibited by law and may give rise to civil or criminal liability.

The views presented in this message are solely those of the author(s) and do not necessarily represent the opinion of Avangrid Renewables, LLC. or any company of its group. Neither Avangrid Renewables, LLC. nor any company of its group guarantees the integrity, security or proper receipt of this message. Likewise, neither Avangrid Renewables, LLC. nor any company of its group accepts any liability whatsoever for any possible damages arising from, or in connection with, data interception, software viruses or manipulation by third parties.

==============================================================

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@python.org
Matplotlib-users Info Page

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@python.org
https://mail.python.org/mailman/listinfo/matplotlib-users

Scott_Lasley_via_Mat · March 25, 2020, 2:56am

Here's a version without the unnecessary string manipulations

# read data from noaa, set the index to the first column
df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data’,
                 skiprows=1,
                 skipfooter=3,
                 index_col=[0],
                 header=None,
                 engine='python', # c engine doesn't have skipfooter
                 delim_whitespace=True)
# unstack the dataframe to get a Series with a multiindex
# consisting of the column name and date
df = df.unstack()
# create a new index using the existing multiindex levels
ndx = [y * 100 + c
       for (c, y) in df.index.to_flat_index()]
df.index = ndx
df = df.sort_index()

···

On Mar 24, 2020, at 9:28 PM, Scott Lasley via Matplotlib-users <matplotlib-users@python.org> wrote:

Does this do what you want?

# read data from noaa, set index to first column and column names to '00' - '12'
df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data’,
                skiprows=1,
                skipfooter=3,
                index_col=[0],
                header=None,
                names=['{:02d}'.format(i) for i in range(13)],
                delim_whitespace=True)
# unstack the dataframe to get a Series with and a multiindex
# consisting of the date and column name
df = df.unstack()
# sort index by [date, column name] to "group" the dates
df = df.sort_index(level=[1,0])
# create a new index using the existing multiindex levels
ndx = [int(''.join([y,c]))
      for y in df.index.levels[1].astype(str)
      for c in df.index.levels[0]]
df.index = ndx
df.head(28)

194801 -99.90
194802 -99.90
194803 -99.90
...
194911 -99.90
194912 -99.90
195001 -3.65
195002 -1.69
195003 -0.06
195004 -0.23

df.tail(12)

202001 -0.95
202002 -0.07
202003 -99.90
202004 -99.90
202005 -99.90
202006 -99.90
202007 -99.90
202008 -99.90
202009 -99.90
202010 -99.90
202011 -99.90
202012 -99.90

Note that the 202001 and 202002 values do not agree with the values in your email message but do agree with the values on the website.

Hth,
Scott

On Mar 24, 2020, at 6:12 PM, Benson, Randall <Randall.Benson@avangrid.com> wrote:

Hi –
I seem to be stuck here trying to convert downloaded data that is in tabular form since I need to analyze it in single column format. The data is from here and I’m using read_csv as shown below –

pnadata1= pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data’)

The downloaded data looks like this with the values in months 1-12 across the columns below for each year 1948-present –

                                   1948 2020
0 1948 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
1 1949 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
2 1950 -3.65 -1.69 -0.06 -0.23 -0.40 -1.9...
3 1951 -1.18 -2.11 -1.09 0.47 1.19 -0.5...
4 1952 -3.19 0.08 -0.69 0.89 -0.79 0.5...

And, I need it to be reshaped to look like this in yyyymm values format so I can compare it easily to other downloaded data in the column format –

194801 -99.90
194802 -99.90
194803 -99.90
...
202001 -3.45
202002 0.87

I’ve tried working with “melt” and “group” in different combinations but cannot figure it out given the online references. Thank you,
Randall

<image001.jpg>
Randall P. Benson, PhD
Wind Asset Meteorologist, Energy Resource - Onshore

1125 NW Couch Street, Suite 700
Portland, Oregon, USA 97209
Telephone 503-796-7129
Cell 971-227-2477
randall.benson@avangrid.com

<image002.png>

In the interest of the environment,
please print only if necessary and recycle.

==============================================================

Please consider the environment before printing this email.

If you have received this message in error, please notify the sender and immediately delete this message and any attachment hereto and/or copy hereof, as such message contains confidential information intended solely for the individual or entity to whom it is addressed. The use or disclosure of such information to third parties is prohibited by law and may give rise to civil or criminal liability.

The views presented in this message are solely those of the author(s) and do not necessarily represent the opinion of Avangrid Renewables, LLC. or any company of its group. Neither Avangrid Renewables, LLC. nor any company of its group guarantees the integrity, security or proper receipt of this message. Likewise, neither Avangrid Renewables, LLC. nor any company of its group accepts any liability whatsoever for any possible damages arising from, or in connection with, data interception, software viruses or manipulation by third parties.

==============================================================

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@python.org
Matplotlib-users Info Page

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@python.org
Matplotlib-users Info Page

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@python.org
https://mail.python.org/mailman/listinfo/matplotlib-users