Not able to access CSV file:

Hello friends,

I am a newbie to matplotlib and I am trying to make a scatter plot of some values. The data set is quite big and is stored in a CSV file. As a start, I thought I would use the loadrec.py example to see whether I can import the data from the CSV file. loadrec.py goes like this:

from matplotlib import mlab
from pylab import figure, show
import matplotlib.cbook as cbook

datafile = cbook.get_sample_data('msft.csv', asfileobj=False)
print 'loading', datafile
a = mlab.csv2rec(datafile)
a.sort()
print a.dtype

fig = figure()
ax = fig.add_subplot(111)
ax.plot(a.date, a.adj_close, '-')
fig.autofmt_xdate()

I believe that, for the CSV file to be accessed, it has to be placed in the sample_data folder (on Windows). So I placed my CSV file in the sample_data folder and ran the script.

The output was:

Traceback (most recent call last):
  File "C:\Python26\loadrec.py", line 5, in <module>
    datafile = cbook.get_sample_data('ch1.csv', asfileobj=False)
  File "C:\Python26\Lib\site-packages\matplotlib\cbook.py", line 662, in get_sample_data
    return myserver.get_sample_data(fname, asfileobj=asfileobj)
  File "C:\Python26\Lib\site-packages\matplotlib\cbook.py", line 620, in get_sample_data
    raise KeyError(msg)
KeyError: 'file ch1.csv not in cache; received HTTP Error 404: Not Found when trying to retrieve'

The data in my CSV file looks like this:

0.9963
0
0.499
0.9901
0.0025
0
1
0.0017
1
0.0173
0.9837

If anyone understands the problem, please give me your suggestions. I would be very thankful if someone could show me exactly how to make a scatter plot of this kind of data.

Karthikraja Velmurugan,

Graduate research assistant,

Dept of Biomedical Informatics,

Arizona State University,

248-421-7394

Hi,

Firstly, I do not fully understand why you have chosen such a complicated solution to a rather simple problem. If the data in your file really looks like the example, you can simply put the file 'ch1.csv' into the same directory as your Python script.

I have slightly modified your script (I don't like the "from" import statements too much) and commented out your lines.

#from matplotlib import mlab
#from pylab import figure, show
#import matplotlib.cbook as cbook
import pylab

#datafile = cbook.get_sample_data('ch1.csv', asfileobj=False)
datafile = 'ch1.csv'
print 'loading', datafile

#a = mlab.csv2rec(datafile)
a = pylab.loadtxt(datafile, comments='#', delimiter=';')
a.sort()
print a.dtype

fig = pylab.figure()
ax = fig.add_subplot(111)
#ax.plot(a.date, a.adj_close, '-')
#fig.autofmt_xdate()
ax.plot(a, 'o')
fig.show()
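
For readers on a current installation, the same idea can be sketched with numpy and matplotlib.pyplot directly. This is only a minimal sketch and not part of the original reply: it assumes Python 3 and a file with one value per line, as shown in the question. The in-memory stand-in `sample_text` and the `png` buffer are made up for illustration so that the snippet runs without any file on disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen instead of opening a window
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for 'ch1.csv': one value per line, copied from the question.
sample_text = "0.9963\n0\n0.499\n0.9901\n0.0025\n0\n1\n0.0017\n1\n0.0173\n0.9837\n"

# np.loadtxt accepts either a file name or a file-like object.
values = np.loadtxt(io.StringIO(sample_text))

# Scatter plot: x is the row index, y is the value read from the file.
index = np.arange(values.size)
fig, ax = plt.subplots()
ax.scatter(index, values, marker="o")
ax.set_xlabel("row index")
ax.set_ylabel("value")

# Write the figure into an in-memory PNG; a file name works here as well.
png = io.BytesIO()
fig.savefig(png, format="png")
```

With a real file, `io.StringIO(sample_text)` would be replaced by the file name (for example 'ch1.csv' placed next to the script). Dropping the `matplotlib.use("Agg")` line and calling `plt.show()` at the end should open an interactive window instead.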

I hope it helps; let me know whether you need a different approach!


Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Hello Daniel,

The code you have given is simple and works fab. Thank you very much. But I wasn't able to find an example that accesses the columns of a CSV file when I import data through the datafile = 'filename.csv' approach. It would be great if you could help with accessing individual columns. What exactly I am looking for is to access individual columns (of the same CSV file), do calculations using two columns, and plot them into separate subplots of the same graph.

I modified the script a little bit. Please find it below:

import matplotlib.pyplot as plt
import pylab
datafile1 = 'ch1_s1_lrr.csv'
datafile2 = 'ch1_s1_baf.csv'

a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';')
b1 = pylab.loadtxt(datafile2, comments='#', delimiter=';')

v1 = [0,98760,0,1]
v2 = [0,98760,-2,2]

plt.figure(1)

plt.subplot(4,1,1)
print 'loading', datafile1
plt.axis(v2)
plt.plot(a1, 'r.')

plt.subplot(4,1,2)
print 'loading', datafile2
plt.axis(v1)
plt.plot(b1, 'b.')

plt.show()

Thank you very much in advance for your time and suggestions.

Karthik

Hi,

the content of the CSV is stored as an array after reading. You can
simply access rows and columns like in Matlab:

firstrow = a1[0]
firstcol = a1.T[0]

The .T transposes the array.

The second element of the third row would be

elem32 = a1[2][1]
which is equivalent to
elem32 = a1[2,1]

A range of e.g. rows 3 to 6 is
range36 = a1[2:6]
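
The indexing rules above can be tried out on a small array without any CSV file. The following is just an illustrative sketch in Python 3; the 6x3 array of consecutive numbers is a made-up stand-in for the loaded data, and the variable names are chosen for this example only.

```python
import numpy as np

# Hypothetical 6x3 array standing in for the data loaded from the CSV file.
a1 = np.arange(18.0).reshape(6, 3)

first_row = a1[0]        # row 0, all columns
first_col = a1[:, 0]     # column 0, all rows; same values as a1.T[0]
elem32 = a1[2, 1]        # third row, second column
rows_3_to_6 = a1[2:6]    # rows with index 2, 3, 4 and 5

# Arithmetic on columns is element-wise, so two columns combine directly.
col_sum = a1[:, 0] + a1[:, 1]
```

Note that the slice `a1[2:6]` excludes index 6, which is why it yields exactly the third to sixth rows.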

Please have a look here for getting started with scipy/numpy:
http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html
and
http://www.scipy.org/NumPy_for_Matlab_Users

Hope this helps,
Daniel


Hello guys,

I was able to plot when I had only one column. But now I have a CSV file that has 10,000 rows and 12 columns. I am trying to write code to plot all 12 columns into 12 subplots of one graph. Below is my code for just one column per CSV file. By the way, csv2rec does not work in my version of matplotlib.

import matplotlib.pyplot as plt
import pylab
datafile1 = 'ch1_s1_lrr.csv'
datafile2 = 'ch1_s1_baf.csv'

a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';')
b1 = pylab.loadtxt(datafile2, comments='#', delimiter=';')

v1 = [0,98760,0,1]
v2 = [0,98760,-2,2]

plt.figure(1)

plt.subplot(2,1,1)
print 'loading', datafile1
plt.axis(v2)
plt.plot(a1, 'r.')

plt.subplot(2,1,2)
print 'loading', datafile2
plt.axis(v1)
plt.plot(b1, 'b.')

plt.show()

Now I want to import 12 columns from the same file and plot all the values of the first six columns, but only the values less than 0.05 for the next six columns.

I am a beginner in Python and matplotlib and I have never used arrays before, so I have been stuck at this point for more than a week. Please help!
Any help is appreciated. Thank you for your time and valuable suggestions.

Karthik

Hi,

have you tried the examples that I provided in my previous reply a couple
of days ago? I cannot see why they should not work. These are the absolute
basics that you need to understand.

Btw, there is no need to use csv2rec unless you want/need column or row headers.

Here's a full script that does what you want. Now, please take the
time and work through the example that I have provided. In case you
need further help, please don't start a new thread but reply to this
one.

Best regards,
Daniel

# -*- coding: utf-8 -*-

import matplotlib.pyplot as plt
import pylab
import scipy

datafile1 = 'ch1_s1_lrr.csv'
datafile2 = 'ch1_s1_baf.csv'

## create dummy data
data = pylab.rand(10000,12)
pylab.savetxt(datafile1, data, delimiter=';')
pylab.savetxt(datafile2, data, delimiter=';')

## load data and transpose
a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';').T
print 'loading', datafile1
b1 = pylab.loadtxt(datafile2, comments='#', delimiter=';').T
print 'loading', datafile2

## axis limits
#v1 = [0,98760,0,1]
#v2 = [0,98760,-2,2]
v1 = [0,1]
v2 = [-2,2]

plt.close('all')
plt.figure()

plt.subplot(2,1,1)
#plt.axis(v2)
plt.ylim(v2)
#plt.plot(a1, 'r.')
for i in range(6):
    plt.plot(a1[i])

plt.subplot(2,1,2)
#plt.axis(v1)
plt.ylim(v1)
#plt.plot(b1, 'b.')

## need masked arrays here
## http://physics.nmt.edu/~raymond/software/python_notes/paper003.html
m = b1 >= 0.05
b1masked = scipy.ma.array(b1,mask=m)
## print first two cols
print b1masked[0:2]

for i in range(6,12):
    plt.plot(b1masked[i])

plt.show()
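
The original question asked for 12 subplots in one figure, whereas the script above draws the columns into two. As a rough sketch of the 12-subplot variant (an assumption about what was wanted, written for Python 3 and current numpy/matplotlib rather than the versions used in this thread), one could loop over a 4x3 grid of axes and mask the last six columns. The random dummy data, the grid shape, and the titles are illustrative choices only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen instead of opening a window
import matplotlib.pyplot as plt
import numpy as np

# Dummy data with the shape described in the question: 10000 rows, 12 columns.
rng = np.random.default_rng(0)
data = rng.random((10000, 12))

# One subplot per column, arranged as 4 rows x 3 columns.
fig, axes = plt.subplots(4, 3, sharex=True, figsize=(12, 8))
for col, ax in enumerate(axes.flat):
    y = data[:, col]
    if col >= 6:
        # Hide every value >= 0.05 so only values below 0.05 are drawn.
        y = np.ma.masked_greater_equal(y, 0.05)
    ax.plot(y, ".", markersize=2)
    ax.set_title("column %d" % (col + 1))
fig.tight_layout()
```

With real data, `data` would come from `np.loadtxt(filename, delimiter=...)` using whatever delimiter the file actually contains. Masked points are simply skipped when plotting, so the x positions of the remaining points still correspond to their row numbers.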


Hi Daniel,

I used the code but there is a small issue. I forgot to mention that my values are signed and unsigned decimal values.

My values look like this:

0.0023
-0.0456
0.0419
0.094
-0.0004
0.0236
-0.0237
-0.0043
-0.0718
0.0095
0.0592
-0.0417
0.0023
0.0386
-0.0023
-0.0236
-0.1045
0.098
-0.0006
0.0516
0.0463
-0.0035
-0.0442
0.1371
0.022
-0.0222
0.256
0.4903
0.0662
-0.0763
0.0064
0.1404

After running the code the “pylab.savetxt” saves the same data something like this

8.205965840870644800e-01;8.034591567160346300e-01;5.493847743502982000e-01;2.581157685701491700e-01;6.409997826977161800e-01;3.719908502347885100e-01

When I tried to extract data and print them they look like this (totally different from the actual values!)

[ 0.18353712 0.30468928 0.16164556 …, 0.98860032 0.49681098
0.77393306]

When I tried not using the “pylab.savetxt” function it gives an error like below:

ValueError: invalid literal for float(): 0.0023,-0.0456,0.0419,0.094,0.0224,0.0365

Is there a specific way to handle signed decimal numbers? If so, please suggest some changes. I also tried using "array[]" to access individual columns, but I get an error saying that the numpy.ndarray object is not callable.

import matplotlib.pyplot as plt
import pylab
import scipy
import numpy
datafile1 = 'vet1.csv'
data = pylab.rand(98760,6)
pylab.savetxt(datafile1, data, delimiter=';')
a1 = pylab.loadtxt(datafile1, comments='#', delimiter=';').T
print 'loading', datafile1
v1 = [0,1]
v2 = [-2,2]

plt.close('all')
plt.figure()

plt.ylim(v2)
for i in range(2):
    plt.plot(a1[i])

plt.show()

-Karthik

Hi Karthik,

I cannot find any problem with your code. You are mixing modules a little too much for my taste, but that is not a technical problem.

Loading and saving the data works flawlessly here. Attached are an input file and a modified script; please try them.

2011/6/11 Karthikraja Velmurugan <velmurugan.karthikraja@…287…>

> Hi Daniel,
>
> I used the code but there is small issue. I forgot to mention that my values are signed and unsigned decimal values.

> After running the code the "pylab.savetxt" saves the same data something like this
>
> 8.205965840870644800e-01;8.034591567160346300e-01;5.493847743502982000e-01;2.581157685701491700e-01;6.409997826977161800e-01;3.719908502347885100e-01

I assume you are confused by the many decimals. Whenever floats are processed by Python they are real floating-point numbers; see here: http://docs.python.org/release/2.5.2/tut/node16.html

To me, it looks as if you have truncated the lines, but otherwise there is nothing wrong…
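
If the long exponent notation in the saved file is the concern, the output format can be controlled when saving. The sketch below is a hedged illustration in Python 3 using numpy directly; the three sample values are taken from the list earlier in the thread, and the in-memory buffers stand in for real files.

```python
import io

import numpy as np

# Three of the values quoted earlier in the thread, as one row.
values = np.array([[0.0023, -0.0456, 0.0419]])

# Default format: full-precision exponent notation, as seen in the saved file.
default_out = io.StringIO()
np.savetxt(default_out, values, delimiter=";")

# Explicit format: four decimal places, closer to how the data was typed.
short_out = io.StringIO()
np.savetxt(short_out, values, delimiter=";", fmt="%.4f")

# Reading the default output back recovers the same numbers.
round_trip = np.loadtxt(io.StringIO(default_out.getvalue()), delimiter=";")
```

The `fmt` argument only changes how the numbers are written as text; the values loaded back are ordinary floats either way.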

> When I tried to extract data and print them they look like this (totally different from the actual values!)
>
> [ 0.18353712 0.30468928 0.16164556 …, 0.98860032 0.49681098
> 0.77393306]

Yes, these are different numbers. But I assume you are comparing different rows or columns?!

> When I tried not using the "pylab.savetxt" function it gives an error like below:
>
> ValueError: invalid literal for float(): 0.0023,-0.0456,0.0419,0.094,0.0224,0.0365

This error message tells you that you are trying to save non-numeric data to a file with that command.
E.g., this will cause the same error: scipy.savetxt('asdfasdf.dat', 'asdfasdf')
It is VERY hard to tell what you are doing since you don't provide the exact pieces of code.
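
Another possible reading of this ValueError, offered only as an assumption since the exact failing code was not posted: the text in the message is a whole comma-separated line, which is what loadtxt would complain about if a comma-separated file were read with delimiter=';'. The sketch below (Python 3, numpy only) reproduces that situation with the line from the error message; the exact wording of the error differs between numpy versions.

```python
import io

import numpy as np

# The line quoted in the error message, used here as a one-line "file".
line = "0.0023,-0.0456,0.0419,0.094,0.0224,0.0365\n"

# With the wrong delimiter the whole line is treated as a single field,
# which cannot be converted to a float.
try:
    np.loadtxt(io.StringIO(line), delimiter=";")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# With the matching delimiter the signed values load without special handling.
row = np.loadtxt(io.StringIO(line), delimiter=",")
```

If the real file is indeed comma-separated, loading it with delimiter=',' should work. It is also worth noting that the script posted in this message calls pylab.rand followed by pylab.savetxt on the same file name, which overwrites the file with random numbers, so skipping that dummy-data step would keep the original values.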

> Is there a specific way to handle signed decimal number? If so please suggest some changes. And also I did try using the "array[]" to access individual comulns but I get an error saying the numpy.ndarray object not callable.

I must ask again: have you played with the examples that I provided? You are using the function in the wrong way (again, I can't tell for sure since you don't provide code):

In order to access the first row of a data array, you simply use data[0]; the first column is data.T[0].
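
A "not callable" error typically appears when an array is followed by parentheses instead of square brackets. Since the failing line was not posted, the following is only a guess at the cause, sketched in Python 3 with a made-up 2x2 array.

```python
import numpy as np

# Small made-up array; the values are borrowed from the list in this thread.
data = np.array([[0.0023, -0.0456], [0.0419, 0.094]])

# Parentheses try to *call* the array like a function, which fails.
try:
    data(0)
    call_error = None
except TypeError as exc:
    call_error = exc

# Square brackets index the array.
first_row = data[0]
first_col = data[:, 0]   # same values as data.T[0]
```

Here `call_error` holds the TypeError raised by `data(0)`, while the bracketed forms return the first row and the first column.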


Please do provide all the steps that cause problems, not just the results. It is impossible to help you based on assumptions and guesses :)

Best regards,
Daniel

karthikIN.csv (234 Bytes)

untitled0.py (1.07 KB)