speeding-up griddata()

Hi all,

I would like to use griddata() to interpolate a function given at specified points of a bunch of other points. While the method works well, it slows down considerably as the number of points to interpolate to increases.

The dependence of time/(number of points) is nonlinear (see the attachment) - it seems that while the Delaunay trinagulation itself is fast, I wonder how to speed-up the interpolation. The docstring says, that it is based on "natural neighbor interpolation" - how are the neighbors searched? Does it use the kd-trees like scipy.spatial? I have a very good experience with scipy.spatial performance.

Also, is there a way of reusing the triangulation when interpolating several times using the same grid?

cheers,
r.

ps: no natgrid
In [9]: mpl.__version__
Out[9]: '0.98.5.3'

times.png

Robert Cimrman wrote:

Hi all,

I would like to use griddata() to interpolate a function given at specified points of a bunch of other points. While the method works well, it slows down considerably as the number of points to interpolate to increases.

The dependence of time/(number of points) is nonlinear (see the attachment) - it seems that while the Delaunay trinagulation itself is fast, I wonder how to speed-up the interpolation. The docstring says, that it is based on "natural neighbor interpolation" - how are the neighbors searched? Does it use the kd-trees like scipy.spatial? I have a very good experience with scipy.spatial performance.

Also, is there a way of reusing the triangulation when interpolating several times using the same grid?

cheers,
r.

Robert: The griddata function uses the delaunay module, which is a straight copy of Robert Kern's delaunay scikit. No one here is that familiar with the internals of delaunay, so I'd suggest you either ask Robert, or dig into the source code yourself (which is here: http://scipy.org/scipy/scikits/browser/trunk/delaunay).

-Jeff

···

--
Jeffrey S. Whitaker Phone : (303)497-6313
Meteorologist FAX : (303)497-6449
NOAA/OAR/PSD R/PSD1 Email : Jeffrey.S.Whitaker@...259...
325 Broadway Office : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg

Hi all,

I would like to use griddata() to interpolate a function given at
specified points of a bunch of other points. While the method works
well, it slows down considerably as the number of points to interpolate
to increases.

The dependence of time/(number of points) is nonlinear (see the
attachment) - it seems that while the Delaunay trinagulation itself is
fast, I wonder how to speed-up the interpolation. The docstring says,
that it is based on "natural neighbor interpolation" - how are the
neighbors searched?

Using the Delaunay triangulation. The "natural neighbors" of an interpolation point are those points participating in triangles in the Delaunay triangulation whose circumcircles include the interpolation point. The triangle that encloses the interpolation point is found by a standard walking procedure, then the neighboring triangles (natural or otherwise) are explored in a breadth-first search around the starting triangle to find the natural neighbors.

Unfortunately, griddata() uses the unstructured-interpolation-points API rather than the more efficient grid-interpolation-points API. In the former, each interpolation point uses the last-found enclosing triangle as the start of the walking search. This works well where adjacent interpolation points are close to each other. This is not the case at the ends of the grid rows. The latter API is smarter and starts a new row of the grid with the triangle from the triangle from the *start* of the previous row rather than the end. I suspect this is largely the cause of the poor performance.

Does it use the kd-trees like scipy.spatial? I have
a very good experience with scipy.spatial performance.

Also, is there a way of reusing the triangulation when interpolating
several times using the same grid?

One would construct a Triangulation() object with the (x,y) data points, get a new NNInterpolator() object using the .nn_interpolator(z) method for each new z data set, and then interpolate your grid on the NNInterpolator.

···

On 2009-07-13 13:20, Robert Cimrman wrote:

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

Robert Kern wrote:

Hi all,

I would like to use griddata() to interpolate a function given at
specified points of a bunch of other points. While the method works
well, it slows down considerably as the number of points to interpolate
to increases.

The dependence of time/(number of points) is nonlinear (see the
attachment) - it seems that while the Delaunay trinagulation itself is
fast, I wonder how to speed-up the interpolation. The docstring says,
that it is based on "natural neighbor interpolation" - how are the
neighbors searched?

Using the Delaunay triangulation. The "natural neighbors" of an interpolation point are those points participating in triangles in the Delaunay triangulation whose circumcircles include the interpolation point. The triangle that encloses the interpolation point is found by a standard walking procedure, then the neighboring triangles (natural or otherwise) are explored in a breadth-first search around the starting triangle to find the natural neighbors.

I see, thanks for the explanation. The walking procedure is what is described e.g. in [1], right? (summary; starting from a random triangle, a line is made connecting that triangle with the interpolation point, and triangles along that line are probed.)

[1] http://www.geom.uiuc.edu/software/cglist/GeomDir/ptloc96.ps.gz

Unfortunately, griddata() uses the unstructured-interpolation-points API rather than the more efficient grid-interpolation-points API. In the former, each interpolation point uses the last-found enclosing triangle as the start of the walking search. This works well where adjacent interpolation points are close to each other. This is not the case at the ends of the grid rows. The latter API is smarter and starts a new row of the grid with the triangle from the triangle from the *start* of the previous row rather than the end. I suspect this is largely the cause of the poor performance.

Good to know, I will try to pass the points in groups of close points.

Does it use the kd-trees like scipy.spatial? I have
a very good experience with scipy.spatial performance.

Also, is there a way of reusing the triangulation when interpolating
several times using the same grid?

One would construct a Triangulation() object with the (x,y) data points, get a new NNInterpolator() object using the .nn_interpolator(z) method for each new z data set, and then interpolate your grid on the NNInterpolator.

So if the above fails, I can bypass griddata() by using the delaunay module directly, good.

thank you,
r.

···

On 2009-07-13 13:20, Robert Cimrman wrote:

Robert Kern wrote:

Hi all,

I would like to use griddata() to interpolate a function given at
specified points of a bunch of other points. While the method works
well, it slows down considerably as the number of points to interpolate
to increases.

The dependence of time/(number of points) is nonlinear (see the
attachment) - it seems that while the Delaunay trinagulation itself is
fast, I wonder how to speed-up the interpolation. The docstring says,
that it is based on "natural neighbor interpolation" - how are the
neighbors searched?

Using the Delaunay triangulation. The "natural neighbors" of an interpolation
point are those points participating in triangles in the Delaunay triangulation
whose circumcircles include the interpolation point. The triangle that encloses
the interpolation point is found by a standard walking procedure, then the
neighboring triangles (natural or otherwise) are explored in a breadth-first
search around the starting triangle to find the natural neighbors.

I see, thanks for the explanation. The walking procedure is what is
described e.g. in [1], right? (summary; starting from a random triangle,
a line is made connecting that triangle with the interpolation point,
and triangles along that line are probed.)

[1] http://www.geom.uiuc.edu/software/cglist/GeomDir/ptloc96.ps.gz

Yes.

Does it use the kd-trees like scipy.spatial? I have
a very good experience with scipy.spatial performance.

Also, is there a way of reusing the triangulation when interpolating
several times using the same grid?

One would construct a Triangulation() object with the (x,y) data points, get a
new NNInterpolator() object using the .nn_interpolator(z) method for each new z
data set, and then interpolate your grid on the NNInterpolator.

So if the above fails, I can bypass griddata() by using the delaunay
module directly, good.

Yes. griddata is a fairly light wrapper that exists mainly to sanitize inputs and allow use of the natgrid implementation easily.

···

On 2009-07-14 12:52, Robert Cimrman wrote:

On 2009-07-13 13:20, Robert Cimrman wrote:

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

Robert C, Robert K, folks,

  messing with the nice delaunay/testfuncs.py to time
linear_interpolate_grid nn_interpolate_grid and nn_interpolate_unstructured
in _delaunay, I see linear ~ 100 times faster than the nn_ s:

# from: trigrid Ntri=1000 Ngrid=100 run: 21 Jul 2009 17:33 mac 10.4.11 ppc

time: 0.027 sec trigrid: build Triangulation 1000
time: 0.0059 sec trigrid 100 "linear" corners: 0 1 2 1
time: 0.5 sec trigrid 100 "nn_grid" corners: 0 1 2 1
time: 0.49 sec trigrid 100 "nn_unstruct" corners: 0 1 2 1

Correct me: if all 3 methods do gridpoint-to-triangle in the same way,
then the huge diff is in find-neighboring-triangles (6 on average ?), not in
gridpoint-to-triangle ?

This is with the _delaunay.so that comes with the mac 98.5.3 egg,
however that was compiled (-O3 ?)

What to do ?

1) does it matter, how many people care ? (all who believe in telekinesis,
raise my right hand)

2) natgrid ? don't see it in matplotlib.sf.net

3) stick with fast linear, smooth the triangle planes a la 3t^2 - 2t^3 or
fancier smoothing

In any case, add griddata( ... method = "linear" / "nn" ... ) so users have
a choice.

Can a real user or two tell us about the flow,
with some rough numbers for Ntri Ngrid Npix --
    Ntri = nr original sample points, say 1000
    Ngrid 100 x 100
    Npix 800 x 600 ?
(Ntri -> Ngrid slowly and accurately,
then Ngrid -> Npix w fast inaccurate image interpolation ? hmm.)

cheers
  -- denis

···

--
View this message in context: http://www.nabble.com/speeding-up-griddata()-tp24467055p24591133.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Denis-B wrote:

Robert C, Robert K, folks,

  messing with the nice delaunay/testfuncs.py to time
linear_interpolate_grid nn_interpolate_grid and nn_interpolate_unstructured
in _delaunay, I see linear ~ 100 times faster than the nn_ s:

# from: trigrid Ntri=1000 Ngrid=100 run: 21 Jul 2009 17:33 mac 10.4.11 ppc

time: 0.027 sec trigrid: build Triangulation 1000
time: 0.0059 sec trigrid 100 "linear" corners: 0 1 2 1
time: 0.5 sec trigrid 100 "nn_grid" corners: 0 1 2 1
time: 0.49 sec trigrid 100 "nn_unstruct" corners: 0 1 2 1

Correct me: if all 3 methods do gridpoint-to-triangle in the same way, then the huge diff is in find-neighboring-triangles (6 on average ?), not in
gridpoint-to-triangle ?

This is with the _delaunay.so that comes with the mac 98.5.3 egg,
however that was compiled (-O3 ?)

What to do ?

1) does it matter, how many people care ? (all who believe in telekinesis,
raise my right hand)

2) natgrid ? don't see it in matplotlib.sf.net

3) stick with fast linear, smooth the triangle planes a la 3t^2 - 2t^3 or
fancier smoothing

In any case, add griddata( ... method = "linear" / "nn" ... ) so users have
a choice.
  
Denis: I have added an 'interp' keyword to griddata (svn revision 7287) so you can choose the faster linear interpolation with interp='linear'. However, for linear interpolation the output grid is assumed to be regular with constant dx and dy (and this is *not* checked for by griddata).

The natgrid toolkit is on the sf site under 'Files', then 'matplotlib-toolkits'. It is in fact slower than the Delaunay package included in matplotlib, but a little bit more robust for pathological cases.

-Jeff

···

Can a real user or two tell us about the flow,
with some rough numbers for Ntri Ngrid Npix --
    Ntri = nr original sample points, say 1000
    Ngrid 100 x 100
    Npix 800 x 600 ?
(Ntri -> Ngrid slowly and accurately,
then Ngrid -> Npix w fast inaccurate image interpolation ? hmm.)

cheers
  -- denis

Jeff Whitaker wrote:

Denis: I have added an 'interp' keyword to griddata (svn revision 7287)
so you can choose the faster linear interpolation with interp='linear'.

Thanks Jeff,
   that was quick. Do you also see linear waaaay faster than NN, factor 100
?!
(Fwiw, a quick run of the Mac Shark profiler shows lots of time in
NaturalNeighbors::interpolate_one
which uses stdlib stacks heavily -- overkill for tiny stacks.)

Did my last question on Ntri -> Ngrid -> Npix make any sense at all ?
It would be nice if one could go straight from a triangulation to pixels --
will ask AGG.

cheers
  -- denis

···

--
View this message in context: http://www.nabble.com/speeding-up-griddata()-tp24467055p24646383.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Denis-B wrote:

Jeff Whitaker wrote:
  

Denis: I have added an 'interp' keyword to griddata (svn revision 7287) so you can choose the faster linear interpolation with interp='linear'.

Thanks Jeff,
   that was quick. Do you also see linear waaaay faster than NN, factor 100
?!
(Fwiw, a quick run of the Mac Shark profiler shows lots of time in
NaturalNeighbors::interpolate_one
which uses stdlib stacks heavily -- overkill for tiny stacks.)
  
Denis: I see more like a factor of 3 or 4, not 100.
Did my last question on Ntri -> Ngrid -> Npix make any sense at all ?
  
Not really, but I haven't had the time to think about it very hard.

-Jeff

···

It would be nice if one could go straight from a triangulation to pixels --
will ask AGG.

cheers
  -- denis

--
Jeffrey S. Whitaker Phone : (303)497-6313
Meteorologist FAX : (303)497-6449
NOAA/OAR/PSD R/PSD1 Email : Jeffrey.S.Whitaker@...259...
325 Broadway Office : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web : http://tinyurl.com/5telg

<stack> in interpolate_one might be the time hog; I see claims that plain
<vector> is faster.
StackVector<int, 128> in
http://stackoverflow.com/questions/354442/looking-for-c-stl-like-vector-class-but-using-stack-storage
looks nice, but I haven't timed it myself.
cheers
  -- denis

···

--
View this message in context: http://www.nabble.com/speeding-up-griddata()-tp24467055p24679147.html
Sent from the matplotlib - users mailing list archive at Nabble.com.