Crash when using matplotlib.tri.LinearTriInterpolator

All,

I'm running into a crash while trying to construct a
tri.LinearTriInterpolator. Here is the short version of the code:

    import netCDF4
    import matplotlib.tri as tri

    var = netCDF4.Dataset('filename.cdf').variables
    x = var['x'][:]
    y = var['y'][:]
    data = var['attrname'][:]
    elems = var['element'][:,:]-1

    triang = tri.Triangulation(x, y, triangles=elems)

    # this crashes the python interpreter
    interp = tri.LinearTriInterpolator(triang, data)

The data arrays (x, y, data, elems) are fairly large (>1 million elements), all
represented as numpy arrays (as returned by netCDF4). The 'data' array is a
masked array and contains masked values.

If somebody cares, I'd be able to post a link to the netCDF data file
causing this.

All this happens when using matplotlib 1.3.1, Win32, Python 2.7.

Any help would be highly appreciated!
Regards Hartmut


Hartmut,

That is an excellent issue report; all the relevant information and nothing
extraneous. Hence the quick response.

The second argument to LinearTriInterpolator (and the other TriInterpolator
classes), i.e. your 'data' array, is expected to be an array of the same
size as the 'x' and 'y' arrays. It is not expecting a masked array. If a
masked array is used, the mask will be ignored, and the values behind the
mask will be used as though they were real values. If my memory of netCDF
is correct, these will be whatever '_FillValue' is defined for the file, but
it may depend on what was used to generate the netCDF file.

I would normally expect the code to work but produce useless output. A
crash is possible though. It would be best if you could post a link to the
netCDF file and I will take a closer look to check there is not something
else going wrong.
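
A minimal sketch (not from this thread) of one way to handle the masked values explicitly, using only standard numpy/matplotlib calls and the variable names from the example above: pass a plain ndarray to the interpolator and mask the triangles that touch any masked node, so the fill values cannot leak into the interpolation:

    import numpy as np

    # Mask out triangles containing any masked node; the trifinder skips
    # masked triangles, so the interpolator returns masked values there.
    node_mask = np.ma.getmaskarray(data)
    bad_triangles = np.any(node_mask[triang.triangles], axis=1)
    triang.set_mask(bad_triangles)

    # Pass a plain ndarray; the fill value is irrelevant inside masked triangles.
    interp = tri.LinearTriInterpolator(triang, np.ma.filled(data, 0.0))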

Ian Thomas


Ian,


Thanks for the quick response!

Here is the data file: http://tinyurl.com/ms7vzxw. I did some more experiments. The picture stays unchanged, even if I fill the masked values in the array with some real numbers (I'm not saying that this would give me any sensible results...):

    import netCDF4
    import matplotlib.tri as tri

    var = netCDF4.Dataset('maxele.63.nc').variables
    x = var['x'][:]
    y = var['y'][:]
    data = var['zeta_max'][:]
    elems = var['element'][:, :]-1

    triang = tri.Triangulation(x, y, triangles=elems)

    data = data.filled(0.0)

    # this still crashes the python interpreter
    interp = tri.LinearTriInterpolator(triang, data)

Thanks again!
Regards Hartmut


Hi Hartmut.

I ran the example on my machine (a 64-bit Linux box with 8 GB of RAM; Python 2.7, matplotlib 1.3.1) and it runs fine. However, it does use around 2 GB of memory, perhaps slightly more. I think the memory usage might be a problem for you if you are using 32-bit Windows. I'm not familiar with the details, but I believe the memory available to a single 32-bit process on Win32 may be only 2 GB. I'm also not familiar with the data you provided, but is it possible to reduce the number of points in order to test whether memory limitations are the underlying problem here?
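
One rough way to test that (a sketch, not from this thread: it re-triangulates a random subset of the nodes with a Delaunay triangulation instead of reusing the 'element' connectivity, so it only checks the memory behaviour, not the results):

    import numpy as np
    import matplotlib.tri as tri

    # Keep ~10% of the nodes and let Triangulation build a Delaunay
    # triangulation of the subset (no 'triangles' argument).
    rng = np.random.RandomState(0)
    keep = rng.choice(len(x), size=len(x) // 10, replace=False)
    x_sub, y_sub = x[keep], y[keep]
    data_sub = np.ma.filled(data[keep], 0.0)

    triang_sub = tri.Triangulation(x_sub, y_sub)
    interp_sub = tri.LinearTriInterpolator(triang_sub, data_sub)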

Dr Andrew Dawson
Atmospheric, Oceanic & Planetary Physics
Clarendon Laboratory
Parks Road
Oxford OX1 3PU, UK
Tel: +44 (0)1865 282438

Email: dawson@…4402…9…
Web Site: http://www2.physics.ox.ac.uk/contacts/people/dawson

Runs to completion without errors on my installation:

OS X 10.9.4
MacBook Air w/ 8GB of memory
Python 2.7 and matplotlib 1.3.1-1

-Dale


Andrew,


Nod, your suspicion is correct. The Python interpreter bails out once the memory footprint reaches 2 GB. That leaves us with the question of whether this is a quality-of-implementation issue - using up 2 GB of main memory for roughly a million nodes/elements seems a bit excessive...

Thanks everybody for verifying anyway!

Regards Hartmut


Just to round this issue off - I tried running this using Python 2.7 (64-bit) and it does not crash anymore. The memory requirement grows to almost 4 GB.

I will verify whether I can get the results I hope for and will report back.

Thanks again!
Regards Hartmut


Here are the results of my investigation. There is probably more information here than anyone else wants, but it is useful information for future improvements.

Most of the RAM is taken up by a trifinder object, which is at the heart of a
triinterpolator and is used to find the triangles of a Triangulation in which
(x, y) points lie. The code

    interp = tri.LinearTriInterpolator(triang, data)

is equivalent to

    trifinder = tri.TrapezoidMapTriFinder(triang)
    interp = tri.LinearTriInterpolator(triang, data, trifinder=trifinder)

Using the latter with memory_profiler (https://pypi.python.org/pypi/memory_profiler) indicates that this is where most of the RAM is being used. Here are some figures for trifinder RAM usage as a function of ntri, the number of triangles in the triangulation:

       ntri   trifinder MB
    -------   ------------
       1000             26
      10000             33
     100000            116
    1000000            912
    2140255           1936

The RAM usage is less than linear in ntri, but clearly too much for large triangulations unless you have a lot of RAM.
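
For reference, figures like these could be gathered with memory_profiler's memory_usage function (a sketch, not necessarily the exact invocation used above; it samples the process memory while the trifinder alone is built):

    from memory_profiler import memory_usage
    import matplotlib.tri as tri

    def build_trifinder():
        # Building the trifinder is the expensive step being measured.
        return tri.TrapezoidMapTriFinder(triang)

    # Sample process memory (in MB) every 0.1 s while build_trifinder runs.
    samples = memory_usage((build_trifinder, (), {}), interval=0.1)
    print("approx. extra RAM for trifinder: %.0f MB" % (max(samples) - min(samples)))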

The trifinder precomputes a tree of nodes to make looking up triangles quick. Searching through 2 million triangles in an ad-hoc manner would be very slow; the trifinder is very fast in comparison. Here are some stats for the tree that trifinder uses (the columns are number of nodes in the tree, maximum node depth, and mean node depth):

       ntri       nodes   max depth   mean depth
    -------   ---------   ---------   ----------
       1000      179097          37        23.24
      10000     3271933          53        30.74
     100000    36971309          69        37.15
    1000000   853117229          87        48.66

The mean depth is the mean number of nodes that have to be traversed to find a triangle, and the max depth is the worst case. The search time is therefore O(log ntri).

The triangle interpolator code is structured in such a way that it is easy to plug in a different trifinder if the default one isn’t appropriate. At the moment there is only the one available however (TrapezoidMapTriFinder). For the problem at hand, a trifinder that is slower but consumes less RAM would be preferable. There are various possibilities, they just have to be implemented! I will take a look at it sometime, but it probably will not be soon.

Ian Thomas

Thanks for your insights, Ian!

A somewhat slower trifinder that requires less memory might even be faster in the end, as creating the trifinder itself takes a lot of time (almost a minute in our case).
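
In the meantime, a sketch (not from the thread) of one way to amortise that construction cost when several fields live on the same mesh: build the trifinder once and pass it to each interpolator via the trifinder keyword.

    import time
    import matplotlib.tri as tri

    # Build the expensive trifinder once...
    t0 = time.time()
    finder = tri.TrapezoidMapTriFinder(triang)
    print("trifinder built in %.1f s" % (time.time() - t0))

    # ...and reuse it for every field interpolated on the same triangulation.
    interp_zeta = tri.LinearTriInterpolator(triang, data, trifinder=finder)
    # interp_other = tri.LinearTriInterpolator(triang, other_field, trifinder=finder)
    # ('other_field' is a hypothetical second data array on the same mesh.)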

Regards Hartmut
