Plotting large file (NetCDF)

RE: [Matplotlib-users] Plotting large file (NetCDF)
Hi Jody and Ben,

thanks for your answers.

I tried pcolormesh instead of pcolor and the result is very good! As for the memory problem, I wasn't able to solve it: when I tried the bigger file, I got the same error. Attached you will find the script that I'm using to make the plot. Maybe I didn't understand how to use the mmap functionality correctly.

Regards,

Raffaele.

prova3_256_5x.py (1.01 KB)

···

-----Original Message-----

From: Jody Klymak [mailto:jklymak@…83…4192…]

Sent: Mon 9/8/2014 5:46 PM

To: Benjamin Root

Cc: Raffaele Quarta; Matplotlib Users

Subject: Re: [Matplotlib-users] Plotting large file (NetCDF)

It looks like you are calling pcolor. Can I suggest you try pcolormesh?

75 MB is not a big file!

Cheers, Jody

On Sep 8, 2014, at 7:38 AM, Benjamin Root <ben.root@…1304…> wrote:

(Keeping this on the mailing list so that others can benefit)

What might be happening is that you are keeping more numpy arrays in memory than you actually need. Take advantage of memmapping, which most netcdf tools provide by default; it keeps the data on disk rather than in RAM. Second, for very large images, I would suggest either pcolormesh() or simply imshow() instead of pcolor(), as they are far more efficient than pcolor(). In addition, it sounds like you are dealing with re-sampled data ("at different zoom levels"). Does this mean that you are re-running contour on re-sampled data? I am not sure what the benefit of that would be when you could simply run contour once at the highest resolution.
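For example, something along these lines is usually enough (a rough, untested sketch using the netCDF4 library; the file name and variable names are made up):

import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt

fh = Dataset('data.nc', 'r')          # placeholder file name
temp = fh.variables['temp']           # lazy variable: stays on disk until sliced
lon = fh.variables['lon'][:]          # 1-D coordinate axes are small, fine to read
lat = fh.variables['lat'][:]

fig, ax = plt.subplots()
# pcolormesh builds a single QuadMesh instead of one polygon per cell,
# so it is far cheaper than pcolor on large grids.
mesh = ax.pcolormesh(lon, lat, np.squeeze(temp[0]))   # assumes a (time, lat, lon) layout; reads one step
fig.colorbar(mesh)
fig.savefig('plot.png', dpi=150)
fh.close()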

Without seeing any code, though, I can only provide generic suggestions.

Cheers!

Ben Root

On Mon, Sep 8, 2014 at 10:12 AM, Raffaele Quarta <raffaele.quarta@…120…4572…> wrote:

Hi Ben,

Sorry for the few details I gave you. I'm trying to make a contour plot of a variable at different zoom levels using high-resolution data. The aim is to produce .PNG output images. I'm working with a fairly big NetCDF file, about 75 MB. The Matplotlib version on my Ubuntu 14.04 machine is 1.3.1, and the system has 8 GB of RAM.

At the moment I'm running into memory problems when I try to make a plot. I get the following error message:


  cs = m.pcolor(xi,yi,np.squeeze(t))
File "/usr/lib/pymodules/python2.7/mpl_toolkits/basemap/__init__.py", line 521, in with_transform
  return plotfunc(self,x,y,data,*args,**kwargs)
File "/usr/lib/pymodules/python2.7/mpl_toolkits/basemap/__init__.py", line 3375, in pcolor
  x = ma.masked_values(np.where(x > 1.e20,1.e20,x), 1.e20)
File "/usr/lib/python2.7/dist-packages/numpy/ma/core.py", line 2195, in masked_values
  condition = umath.less_equal(mabs(xnew - value), atol + rtol * mabs(value))
MemoryError


When I plot a smaller file (around 5 MB), it works fine, so I don't think there is anything wrong with the script itself; it looks like a memory problem.

I hope my message is clearer now.

Thanks for the help.

Regards,

Raffaele


Sent: Mon 9/8/2014 3:19 PM

To: Raffaele Quarta

Cc: Matplotlib Users

Subject: Re: [Matplotlib-users] Plotting large file (NetCDF)

You will need to be more specific… much more specific. What kind of plot are you making? How big is your data? What version of matplotlib are you using? How much RAM do you have available compared to the amount of data (most slowdowns are actually due to swap-thrashing)? Matplotlib can be used for large data, but there exist some specialty tools for truly large datasets. The solution depends on the situation.

Ben Root

On Mon, Sep 8, 2014 at 7:45 AM, Raffaele Quarta <raffaele.quarta@…878…4572…> wrote:

Hi,

I'm working with the NetCDF format. When I try to plot a very large file, I have to wait a long time for the plot to appear. How can I solve this? Is there a solution for this problem?

Raffaele


Jody Klymak

http://web.uvic.ca/~jklymak/

Raffaele,

As Ben pointed out, you might be creating a lot of in-memory NumPy arrays that you probably don't need or want.

For example, I think (?) that slicing the entire variable below:

lons = fh.variables['lon'][:]

is making a copy of all that (mmap'ed) data as a NumPy array in memory. Get rid of the slice ([:]). Of course, these variables are then not NumPy arrays, so you'll have to change some of your code. For example:

lon_0 = lons.mean()

will have to become:

lon_0 = np.mean(lons)
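Roughly speaking, the difference looks like this (an untested sketch using the netCDF4 library; the file name is made up, 'lon' is taken from your script):

import numpy as np
from netCDF4 import Dataset

fh = Dataset('data.nc', 'r')            # placeholder file name

lons = fh.variables['lon']              # a netCDF4 Variable: nothing is read yet
subset = lons[::10]                     # indexing reads only the requested slice
lon_0 = np.mean(subset)                 # reductions work on whatever was read

full_copy = fh.variables['lon'][:]      # [:] materializes the whole array in RAM
fh.close()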

If lats and lons are very large sets of data, then meshgrid will make two very, very large arrays in memory.

For example, try this:

np.meshgrid(np.arange(5), np.arange(5))

The output is two much larger arrays:

[array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]),
 array([[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]])]

I don’t know Basemap at all, so I don’t know if this is necessary. You might be able to force the meshgrid output into a memmap file, but I don’t know how to do that right now. Perhaps someone else has some suggestions.

Hope that helps.

Ryan


Most of the time, you will not need to use meshgrid. Take advantage of numpy’s broadcasting feature:
http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

It saves significantly on memory and processing time, and most of Matplotlib's plotting functions work well with broadcastable inputs, so that is a great way to avoid large intermediate arrays. NumPy's ogrid is also a neat tool for generating broadcastable grids.
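To make that concrete, here is a tiny sketch of the difference (plain numpy, nothing specific to your script):

import numpy as np

x = np.arange(5)                              # shape (5,)
y = np.arange(5)                              # shape (5,)

# meshgrid materializes two full (5, 5) arrays ...
X, Y = np.meshgrid(x, y)
z_mesh = X + Y

# ... while broadcasting gets the same result from (1, 5) and (5, 1) views,
# never building the intermediate grids.
z_bcast = x[np.newaxis, :] + y[:, np.newaxis]
assert np.array_equal(z_mesh, z_bcast)

# np.ogrid hands you the broadcastable "open" grids directly:
yy, xx = np.ogrid[0:5, 0:5]                   # shapes (5, 1) and (1, 5)
assert np.array_equal(xx + yy, z_mesh)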

When I get a chance, I’ll look through the script for any other obvious savers.

Cheers!

Ben Root
