Loading CSV file and plotting histogram of a particular column

AR12 · February 28, 2014, 7:06pm

Hi,

I have a csv file where head -5 looks like this:

A B C
100 0.45 0.3
67 0.25 0.4
50.6 0.2 0.6
56.4 0.4 0.3

The columns are tab separated. I want to load this CSV file and plot the
histogram of the third or second column. I was able to load the csv file
using this:
data=csv2rec('Downloads/Sample.txt',delimiter='\t',skiprows=0)
The file has 2792 rows including the top header row.

When I do

data['A'] I get this error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-856828b8eaa3> in <module>()
----> 1 data['A']

/Library/Python/2.7/site-packages/numpy-1.9.0.dev_297f54b-py2.7-macosx-10.9-intel.egg/numpy/core/records.pyc
in __getitem__(self, indx)
    457
    458 def __getitem__(self, indx):
--> 459 obj = ndarray.__getitem__(self, indx)
    460 if (isinstance(obj, ndarray) and obj.dtype.isbuiltin):
    461 return obj.view(ndarray)

ValueError: field named A not found

First, is data['A'] supposed to read the whole A column? Once I have read the
column I want to be able to plot it. Can I simply do

hist(data['A'], bins=30)

or something like that?

Many thanks,
AR

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/Loding-CSV-file-and-plotting-histogram-of-a-particular-column-tp42938.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

_Paul_Hobson1 · February 28, 2014, 8:42pm

Sounds like you want to use pandas, not numpy.

import pandas
import matplotlib.pyplot as plt

df = pandas.read_csv('myfile.txt', sep='\t')
plt.hist(df['A'], bins=30)

…should do it for you.
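
For anyone who wants to try the pandas route without the original file, here is a minimal sketch. The inline sample text is only a stand-in built from the five lines in the first post, and the Agg backend line is an assumption so that it runs without a display; neither comes from the original posts.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the real tab-separated file (first five lines from the post).
sample = "A\tB\tC\n100\t0.45\t0.3\n67\t0.25\t0.4\n50.6\t0.2\t0.6\n56.4\t0.4\t0.3\n"

# read_csv keeps the header names exactly as written, so the key is 'B', not 'b'.
df = pd.read_csv(io.StringIO(sample), sep="\t")

# Hand hist a plain NumPy array holding the second column.
counts, edges, _ = plt.hist(df["B"].to_numpy(), bins=30)
plt.xlabel("B")
plt.ylabel("count")
# Follow with plt.show() or plt.savefig(...) to actually see the figure.
```

With a real file, replacing `io.StringIO(sample)` with the file path should be the only change needed.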

Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Piet_van_Oostrum · February 28, 2014, 8:51pm

AR12 wrote:

> Hi,
>
> I have a csv file where head -5 looks like this:
>
> A B C
> 100 0.45 0.3
> 67 0.25 0.4
> 50.6 0.2 0.6
> 56.4 0.4 0.3
>
> The columns are tab separated. I want to load this CSV file and plot the
> histogram of the third or second column. I was able to load the csv file
> using this:
> data=csv2rec('Downloads/Sample.txt',delimiter='\t',skiprows=0)
> The file has 2792 rows including the top header row.
>
> When I do
> >> data['A'] I get this error:
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> <ipython-input-19-856828b8eaa3> in <module>()
> ----> 1 data['A']

csv2rec (which lives in matplotlib.mlab, not numpy) lowercases the column names. From its docstring:
"If *names* is *None*, a header row is required to automatically
assign the recarray names. The headers will be lower cased,
spaces will be converted to underscores, and illegal attribute
name characters removed. If *names* is not *None*, it is a
sequence of names to use for the column names. In this case, it
is assumed there is no header row."

So data['a'] should do it.
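
csv2rec may not be available in newer Matplotlib releases, so the following is only a rough analogue of the behaviour described above, using numpy.genfromtxt instead. The inline sample text is a stand-in for the real file, and `case_sensitive="lower"` is used here to mimic the lowercasing; it is not what csv2rec does internally.

```python
import io

import numpy as np

# Stand-in for the real tab-separated file (first five lines from the post).
sample = "A\tB\tC\n100\t0.45\t0.3\n67\t0.25\t0.4\n50.6\t0.2\t0.6\n56.4\t0.4\t0.3\n"

data = np.genfromtxt(
    io.StringIO(sample),
    delimiter="\t",
    names=True,              # take the field names from the header row
    case_sensitive="lower",  # lowercase them, similar to what csv2rec does
)

# Inspect the field names before indexing; the header 'A' is stored as 'a'.
print(data.dtype.names)
col_a = data["a"]
```

Printing `data.dtype.names` first is a quick way to see which keys actually exist when a lookup such as data['A'] fails.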

--
Piet van Oostrum <piet@...4462...>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]

AR12 · February 28, 2014, 9:02pm

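Thanks, this worked for two of the columns. For the third column, I get this error:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-ae5186552dfe> in <module>()
----> 1 plt.hist(df['Confidence'],bins=10)

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.9-intel.egg/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
   2875 histtype=histtype, align=align, orientation=orientation,
   2876 rwidth=rwidth, log=log, color=color, label=label,
-> 2877 stacked=stacked, **kwargs)
   2878 draw_if_interactive()
   2879 finally:

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.9-intel.egg/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5477 xmax = -np.inf
   5478 for xi in x:
-> 5479 if len(xi) > 0:
   5480 xmin = min(xmin, xi.min())
   5481 xmax = max(xmax, xi.max())

TypeError: len() of unsized object

Sorry to bug you about this. Do you know where I can find the solution to this problem?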

_Sterling_P_Smith · February 28, 2014, 9:31pm

You have an uppercase 'Confidence'. Are you using pandas or numpy? With csv2rec, as Piet's email explains, you need a lowercase key. What does
`print(df['Confidence'].shape)`
yield? The error suggests you have an array with no size (zero dimensions), so perhaps you are still not reading in your file correctly.

-Sterling
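
One way to run that kind of check, as a hedged sketch: the cause of the error in this thread was never confirmed, so the data below is hypothetical (a 'Confidence' column with one blank and one non-numeric entry) and only illustrates how to inspect a column and clean it before histogramming.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in data: one blank and one non-numeric 'Confidence' entry.
sample = "A\tConfidence\n100\t0.45\n67\t\n50.6\tunknown\n56.4\t0.4\n"
df = pd.read_csv(io.StringIO(sample), sep="\t")

col = df["Confidence"]
print(col.dtype, col.shape)  # a non-numeric dtype hints at stray text in the column

# Coerce anything unparseable to NaN, drop the gaps, keep a plain 1-D float array.
values = pd.to_numeric(col, errors="coerce").dropna().to_numpy()

# np.histogram does the same binning as plt.hist without drawing anything.
counts, edges = np.histogram(values, bins=10)
```

If `values` looks sensible, `plt.hist(values, bins=10)` should accept it, since it is a plain one-dimensional float array.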

On Feb 28, 2014, at 1:02PM, AR12 wrote:

Thanks, this worked for two of the columns. For the third column, I get this error: Sorry to bug you about this. Do you know where I can find the solution to this problem?

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-ae5186552dfe> in <module>()
----> 1 plt.hist(df['Confidence'],bins=10)

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.9-intel.egg/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
   2875 histtype=histtype, align=align, orientation=orientation,
   2876 rwidth=rwidth, log=log, color=color, label=label,
-> 2877 stacked=stacked, **kwargs)
   2878 draw_if_interactive()
   2879 finally:

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.9-intel.egg/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5477 xmax = -np.inf
   5478 for xi in x:
-> 5479 if len(xi) > 0:
   5480 xmin = min(xmin, xi.min())
   5481 xmax = max(xmax, xi.max())

TypeError: len() of unsized object
