Plotting large file (NetCDF)

RE: [Matplotlib-users] Plotting large file (NetCDF)
Hi all,

can somebody show me, with an example, how to use NumPy’s broadcasting feature?

At the moment I’m using ‘meshgrid’ in the script, but I know that it makes the plot take a long time.

Thank you.

Raf


-----Original Message-----
From: Raffaele Quarta [mailto:raffaele.quarta@…4572…]
Sent: Tue 9/9/2014 3:55 PM
To: Benjamin Root; Ryan Nelson
Cc: Matplotlib Users
Subject: Re: [Matplotlib-users] Plotting large file (NetCDF)

Hi Ben and Ryan,

I will try to figure out how it works.

Thank you.

Regards,

Raf

-----Original Message-----
From: ben.v.root@…287… on behalf of Benjamin Root
Sent: Tue 9/9/2014 3:25 PM
To: Ryan Nelson
Cc: Raffaele Quarta; Matplotlib Users
Subject: Re: [Matplotlib-users] Plotting large file (NetCDF)

Most of the time, you will not need to use meshgrid. Take advantage of numpy’s broadcasting feature:

http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

It saves significantly on memory and processing time. Most of Matplotlib’s plotting functions work well with broadcastable inputs, so that is a great way to save on memory. NumPy’s ogrid is also a neat tool for generating broadcastable grids.
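
A minimal sketch of the ogrid idea, assuming a regular grid; the grid sizes and the function being evaluated are made up for illustration. np.ogrid returns one column-shaped and one row-shaped array, and arithmetic on them broadcasts to the full 2-D result without building the two full coordinate arrays that a dense grid would need.

```python
import numpy as np

# Hypothetical grid size, chosen only for illustration.
ny, nx = 400, 600

# ogrid returns "open" grids: yy has shape (ny, 1), xx has shape (1, nx).
yy, xx = np.ogrid[0:ny, 0:nx]

# Arithmetic broadcasts the two small arrays to the full (ny, nx) result.
z = np.sin(xx / 50.0) * np.cos(yy / 50.0)

# The dense equivalent (mgrid) holds two full (ny, nx) coordinate arrays.
dense_y, dense_x = np.mgrid[0:ny, 0:nx]

print(yy.shape, xx.shape, z.shape)
print("open grid elements: ", yy.size + xx.size)
print("dense grid elements:", dense_y.size + dense_x.size)
```

Only the result z is full-sized here; the coordinates stay as small as the 1-D axes they describe.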

When I get a chance, I’ll look through the script for any other obvious savers.

Cheers!

Ben Root

On Tue, Sep 9, 2014 at 9:02 AM, Ryan Nelson <rnelsonchem@…287…> wrote:

Raffaele,

As Ben pointed out, you might be creating a lot of in-memory NumPy arrays that you probably don’t need/want.

For example, I think (?) slicing all of the variable below:

    lons = fh.variables['lon'][:]

is making a copy of all that (mmap’ed) data as a NumPy array in memory. Get rid of the slice ([:]). Of course, these variables are not NumPy arrays, so you’ll have to change some of your code. For example:

    lon_0 = lons.mean()

will have to become:

    lon_0 = np.mean(lons)

If lats and lons are very large sets of data, then meshgrid will make two very, very large arrays in memory.

For example, try this:

    np.meshgrid(np.arange(5), np.arange(5))

The output is two much larger arrays:

    [array([[0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4]]),
     array([[0, 0, 0, 0, 0],
           [1, 1, 1, 1, 1],
           [2, 2, 2, 2, 2],
           [3, 3, 3, 3, 3],
           [4, 4, 4, 4, 4]])]
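
To put rough numbers on this, here is a small sketch; the coordinate lengths are made up and are not taken from Raffaele's file. It compares the memory held by the 1-D coordinate vectors, by the dense meshgrid output, and by meshgrid called with sparse=True, which returns broadcastable arrays of shape (1, nx) and (ny, 1) instead of full grids.

```python
import numpy as np

# Hypothetical coordinate vectors; the sizes are illustrative only.
lons = np.linspace(-180.0, 180.0, 3600)
lats = np.linspace(-90.0, 90.0, 1800)

# Dense grids: each output has shape (len(lats), len(lons)).
xi, yi = np.meshgrid(lons, lats)

# Sparse grids: shapes (1, len(lons)) and (len(lats), 1).
xs, ys = np.meshgrid(lons, lats, sparse=True)

mb = 1024.0 ** 2
print("1-D vectors:  %.3f MB" % ((lons.nbytes + lats.nbytes) / mb))
print("dense grids:  %.1f MB" % ((xi.nbytes + yi.nbytes) / mb))
print("sparse grids: %.3f MB" % ((xs.nbytes + ys.nbytes) / mb))

# Broadcasting the sparse grid reproduces the dense values when needed.
same = np.array_equal(np.broadcast_to(xs, xi.shape), xi)
print("sparse broadcasts to dense:", same)
```

Whether the sparse form can be passed straight to a given plotting call depends on that call; the sketch only shows the difference in memory.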

I don’t know Basemap at all, so I don’t know if this is necessary. You might be able to force the meshgrid output into a memmap file, but I don’t know how to do that right now. Perhaps someone else has some suggestions.

Hope that helps.

Ryan

On Tue, Sep 9, 2014 at 4:07 AM, Raffaele Quarta <raffaele.quarta@…4572…> wrote:

Hi Jody and Ben,

thanks for your answers.

I tried to use pcolormesh instead of pcolor and the result is very good!

As for the memory problem, I wasn’t able to solve it. When I tried to use the bigger file, I got the same problem. Attached you will find the script that I’m using to make the plot. Maybe I didn’t understand very well how to use the mmap function.

Regards,

Raffaele.

-----Original Message-----
From: Jody Klymak [mailto:jklymak@…4192…]
Sent: Mon 9/8/2014 5:46 PM
To: Benjamin Root
Cc: Raffaele Quarta; Matplotlib Users
Subject: Re: [Matplotlib-users] Plotting large file (NetCDF)

It looks like you are calling pcolor. Can I suggest you try pcolormesh?

75 MB is not a big file!

Cheers, Jody

On Sep 8, 2014, at 7:38 AM, Benjamin Root <ben.root@…3203…04…> wrote:

(Keeping this on the mailing list so that others can benefit)

What might be happening is that you are keeping more numpy arrays in memory than you actually need. Take advantage of memmapping, which most netcdf tools provide by default. This keeps the data on disk rather than in RAM. Second, for very large images, I would suggest either pcolormesh() or just simply imshow() instead of pcolor(), as they are way more efficient than pcolor(). In addition, it sounds like you are dealing with re-sampled data (“at different zoom levels”). Does this mean that you are re-running contour on re-sampled data? I am not sure what the benefit of doing that is if one could just simply do the contour once at the highest resolution.
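
As a rough illustration of the pcolormesh() suggestion, here is a sketch using a plain Matplotlib Axes rather than Basemap. The data is synthetic, the array sizes are made up, and the Agg backend is assumed because the goal in this thread is PNG output. It passes the 1-D coordinate vectors directly, so no meshgrid call is involved; whether Basemap's own pcolormesh wrapper accepts 1-D inputs is not something this sketch tests.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for writing PNG files
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for a gridded NetCDF variable (hypothetical sizes).
lons = np.linspace(0.0, 30.0, 600)
lats = np.linspace(30.0, 50.0, 400)
data = (np.sin(np.radians(lons))[np.newaxis, :]
        * np.cos(np.radians(lats))[:, np.newaxis])

fig, ax = plt.subplots()
# pcolormesh accepts the 1-D coordinate vectors directly.
mesh = ax.pcolormesh(lons, lats, data)
fig.colorbar(mesh, ax=ax)

# Write the PNG to an in-memory buffer; pass a file name to save to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
plt.close(fig)
print("PNG size in bytes:", len(buf.getvalue()))
```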

Without seeing any code, though, I can only provide generic suggestions.

Cheers!

Ben Root

On Mon, Sep 8, 2014 at 10:12 AM, Raffaele Quarta <raffaele.quarta@…4572…> wrote:

Hi Ben,

sorry for giving you so few details. I’m trying to make a contour plot of a variable at different zoom levels using high-resolution data. The aim is to obtain .PNG output images. At the moment I’m working with a large dataset (a NetCDF file of about 75 MB). The Matplotlib version on my Ubuntu 14.04 machine is 1.3.1. My system has 8 GB of RAM.

Currently I’m running into memory problems when I try to make a plot. I got the following error message:


    cs = m.pcolor(xi,yi,np.squeeze(t))
  File "/usr/lib/pymodules/python2.7/mpl_toolkits/basemap/__init__.py", line 521, in with_transform
    return plotfunc(self,x,y,data,*args,**kwargs)
  File "/usr/lib/pymodules/python2.7/mpl_toolkits/basemap/__init__.py", line 3375, in pcolor
    x = ma.masked_values(np.where(x > 1.e20,1.e20,x), 1.e20)
  File "/usr/lib/python2.7/dist-packages/numpy/ma/core.py", line 2195, in masked_values
    condition = umath.less_equal(mabs(xnew - value), atol + rtol * mabs(value))
MemoryError


By contrast, when I plot a smaller file (about 5 MB), it works very well. I don’t believe there is anything wrong in the script; it might be a memory problem.

I hope that my message is clearer now.

Thanks for the help.

Regards,

Raffaele


Sent: Mon 9/8/2014 3:19 PM
To: Raffaele Quarta
Cc: Matplotlib Users
Subject: Re: [Matplotlib-users] Plotting large file (NetCDF)

You will need to be more specific… much more specific. What kind of plot are you making? How big is your data? What version of matplotlib are you using? How much RAM do you have available compared to the amount of data (most slowdowns are actually due to swap-thrashing issues)? Matplotlib can be used for large data, but there exist some specialty tools for truly large datasets. The solution depends on the situation.

Ben Root

On Mon, Sep 8, 2014 at 7:45 AM, Raffaele Quarta <raffaele.quarta@…4572…> wrote:

Hi,

I’m working with the NetCDF format. When I try to plot a very large file, I have to wait a long time for the plot. How can I solve this? Isn’t there a solution for this problem?

Raffaele

Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Jody Klymak

http://web.uvic.ca/~jklymak/



In the example you provided, you tried to broadcast two 1-D arrays against each other, which isn’t what you want because all you will get is another 1-D array. Broadcasting automatically repeats data for you along a dimension. It is rare to actually call np.broadcast(), as broadcasting usually happens automatically. Perhaps you should take your questions about broadcasting over to the numpy discussion mailing list, where somebody might be able to explain it better than I can.

Cheers!
Ben Root
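
The point about 1-D arrays can be seen in a few lines. This is only a sketch with made-up coordinate values: two 1-D arrays of the same length combine element by element and stay 1-D, whereas giving one of them an extra axis of length 1 lets NumPy repeat both along the missing dimension, which is what meshgrid otherwise does explicitly.

```python
import numpy as np

# Hypothetical 1-D coordinate vectors.
lons = np.array([10.0, 11.0, 12.0, 13.0])
lats = np.array([40.0, 41.0, 42.0])

# Two 1-D arrays of equal length combine element by element: still 1-D.
print((lons + lons).shape)

# Give lats a trailing axis of length 1 so it becomes a column, shape (3, 1).
# NumPy then repeats the column across and the row down: shape (3, 4).
total = lats[:, np.newaxis] + lons
print(total.shape)

# np.meshgrid builds the repeated copies explicitly; the values are identical.
xi, yi = np.meshgrid(lons, lats)
print(np.array_equal(total, yi + xi))
```

The broadcast version never stores the repeated coordinate copies; only the result has the full 2-D shape.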


On Mon, Sep 22, 2014 at 6:42 AM, Raffaele Quarta <raffaele.quarta@…4572…> wrote:

Hi all,

can somebody show me, with an example, how to use NumPy’s broadcasting feature?

At the moment I’m using ‘meshgrid’ in the script, but I know that it makes the plot take a long time.

Thank you.

Raf
