Contribution offer: griddata with Gaussian average

Hi,

About a year ago I developed, for my own purposes, a routine for averaging
irregularly-sampled data using a Gaussian average. I would like to
contribute it to matplotlib, after a clean-up, if there is interest.

First it stores all data points in a (regular) grid. One can ask for
the value at an arbitrary point within the grid range: based on the stDev
parameter, all data points within some (user-settable) distance are
weighted using the normal distribution function and then the average is
returned. Storing the data in a grid avoids searching all data points every
time; only those within a certain grid range are used.
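The weighting step described above can be sketched in a few lines of numpy (an illustrative re-implementation, not the actual WeightedAverage2d code; the grid bucketing that limits the search is omitted here, and all names are hypothetical):

```python
import numpy as np

def gaussian_average(px, py, pv, x, y, st_dev, rel_threshold=3.0):
    """Gaussian-weighted average of scattered samples (px, py, pv)
    evaluated at the query point (x, y).  Points farther than
    rel_threshold * st_dev are ignored, mimicking the distance cutoff."""
    d2 = (px - x) ** 2 + (py - y) ** 2            # squared distances to query
    mask = d2 <= (rel_threshold * st_dev) ** 2    # user-settable cutoff radius
    if not mask.any():
        return np.nan                             # no samples near the query
    w = np.exp(-d2[mask] / (2.0 * st_dev ** 2))   # normal-distribution weights
    return np.sum(w * pv[mask]) / np.sum(w)
```

Two samples placed symmetrically about the query point get equal weight, so the result is simply their mean.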

It would be relatively easy to extend it to other distribution types
(lines 120-130), such as an elliptic-normal distribution with a different
stDev in each direction.
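The elliptic extension would only change the weight computation, e.g. by normalising each offset by its own stDev before applying the same kernel (again just a hypothetical numpy sketch, not the proposed code):

```python
import numpy as np

def elliptic_gaussian_average(px, py, pv, x, y, st_dev_x, st_dev_y,
                              rel_threshold=3.0):
    """Same Gaussian average, but with a separate stDev per axis,
    giving elliptic (anisotropic) weighting."""
    u = (px - x) / st_dev_x                # offsets normalised per axis
    v = (py - y) / st_dev_y
    r2 = u ** 2 + v ** 2
    mask = r2 <= rel_threshold ** 2        # elliptic cutoff region
    if not mask.any():
        return np.nan
    w = np.exp(-0.5 * r2[mask])
    return np.sum(w * pv[mask]) / np.sum(w)
```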

The code is here:
http://bazaar.launchpad.net/~yade-dev/yade/trunk/annotate/head%3A/lib/smoothing/WeightedAverage2d.hpp

I should be able to convert it to accept numpy arrays, wrap it using the
Python API instead of boost::python, and write some documentation. Let me know
if the matplotlib-devel list would be more appropriate for this discussion.

Cheers, Vaclav

Václav Šmilauer wrote:

About a year ago I developed, for my own purposes, a routine for averaging
irregularly-sampled data using a Gaussian average.

Is this similar to kernel density estimation?

http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html
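For contrast: a KDE estimates the probability density of the sample locations and never looks at sample values, whereas the routine offered above averages the values. A hand-rolled 1-D sketch of what a Gaussian KDE computes (bandwidth choice is arbitrary; scipy.stats.gaussian_kde is the production version):

```python
import numpy as np

def gaussian_kde_1d(samples, x, bandwidth):
    """Kernel density estimate at x: the mean of Gaussian bumps centred
    on the sample locations.  No sample values are involved."""
    u = (x - samples) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # standard normal pdf
    return kernel.mean() / bandwidth
```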

In any case, it sounds more like an appropriate contribution to scipy than MPL.

First it stores all data points in a (regular) grid.

What if your data points are not on a regular grid? Wouldn't this be a way to interpolate onto such a grid?

Storing the data in a grid avoids searching all data points every
time; only those within a certain grid range are used.

Another option would be to store the points in some sort of spatial index, like an R-tree.
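For illustration, the grid-bucket scheme the original post describes is itself a simple spatial index; here is a toy version (names hypothetical), which an R-tree or k-d tree would generalise to unevenly distributed points:

```python
import numpy as np
from collections import defaultdict

class GridIndex:
    """Bin points into square cells so a radius query scans only
    nearby cells instead of every point."""
    def __init__(self, px, py, cell):
        self.cell = cell
        self.px, self.py = np.asarray(px), np.asarray(py)
        self.buckets = defaultdict(list)
        for i in range(len(self.px)):
            key = (int(np.floor(self.px[i] / cell)),
                   int(np.floor(self.py[i] / cell)))
            self.buckets[key].append(i)

    def query(self, x, y, radius):
        """Return indices of all points within `radius` of (x, y)."""
        n = int(np.ceil(radius / self.cell))       # cells to scan per side
        cx = int(np.floor(x / self.cell))
        cy = int(np.floor(y / self.cell))
        hits = []
        for ix in range(cx - n, cx + n + 1):
            for iy in range(cy - n, cy + n + 1):
                for i in self.buckets.get((ix, iy), ()):
                    if (self.px[i] - x) ** 2 + (self.py[i] - y) ** 2 <= radius ** 2:
                        hits.append(i)
        return sorted(hits)
```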

I should be able to convert it to accept numpy arrays,

key.

wrap using Python
API instead of boost::python

I'd consider Cython for that.

Sounds like a useful module,

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...259...

No. It is probably closer to radial basis function interpolation (in fact, it almost certainly is a form of RBFs):

http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#id1
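In RBF interpolation the same Gaussian kernel appears, but one first solves a linear system so the interpolant reproduces the data exactly. A minimal 1-D numpy sketch (scipy.interpolate.Rbf implements the general case; names here are illustrative):

```python
import numpy as np

def rbf_interpolate(px, pv, qx, eps=1.0):
    """Gaussian RBF interpolation in 1-D: solve for weights that
    reproduce pv exactly at the nodes px, then evaluate at qx."""
    px, pv, qx = np.asarray(px), np.asarray(pv), np.asarray(qx)
    phi = np.exp(-(eps * (px[:, None] - px[None, :])) ** 2)  # kernel matrix
    weights = np.linalg.solve(phi, pv)                       # exact-fit condition
    return np.exp(-(eps * (qx[:, None] - px[None, :])) ** 2) @ weights
```

Evaluated back at the nodes, this returns the original values to machine precision, which is exactly the property a weighted average deliberately does not have.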


On 2009-10-04 15:27 PM, Christopher Barker wrote:

Václav Šmilauer wrote:

About a year ago I developed, for my own purposes, a routine for averaging
irregularly-sampled data using a Gaussian average.

Is this similar to kernel density estimation?

http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

Václav Šmilauer wrote:

Hi,

About a year ago I developed, for my own purposes, a routine for averaging
irregularly-sampled data using a Gaussian average. I would like to
contribute it to matplotlib, after a clean-up, if there is interest.

This sounds like a nice thing to have, but I am wondering whether it should go into scipy instead of mpl. On the other hand, I do see the rationale for simply putting it in mpl so it can be used for contouring etc. without adding an external dependency.

First it stores all data points in a (regular) grid. One can ask for
the value at an arbitrary point within the grid range: based on the stDev
parameter, all data points within some (user-settable) distance are
weighted using the normal distribution function and then the average is
returned. Storing the data in a grid avoids searching all data points every
time; only those within a certain grid range are used.

Good idea.

It would be relatively easy to extend it to other distribution types
(lines 120-130), such as an elliptic-normal distribution with a different
stDev in each direction.

This sounds a lot like "optimal interpolation". Comments? Certainly, in many fields it is essential to be able to specify different std for x and y.

The code is here:
http://bazaar.launchpad.net/~yade-dev/yade/trunk/annotate/head%3A/lib/smoothing/WeightedAverage2d.hpp

I should be able to convert it to accept numpy arrays, wrap it using the
Python API instead of boost::python, and write some documentation. Let me know
if the matplotlib-devel list would be more appropriate for this discussion.

matplotlib-users is fine for checking for interest; it may make sense to move more technical discussion to -devel. But I will ignore that for now.

Is wrapping with Cython a reasonable option? I don't think we have any Cython in mpl at present, but it seems to me like the way to go for many things, especially new things. It is readable, flexible, and promises to make the eventual transition to Python 3.1 painless for the components that use it. I realize it is C-oriented, not C++-oriented, but my understanding is that the wrapping of C++ is still reasonable in many cases. (I have not tried it.)

Eric


Cheers, Vaclav

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Except that in radial basis function interpolation, you solve for the
weights that give the original values at the original data points.
Here, it's just an inverse-distance weighted average, where the weights
are chosen using an exp(-x^2/A) relation. There's a huge difference
between the two when you're dealing with noisy data.
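A small numerical experiment makes the distinction concrete (all parameters are arbitrary illustrative choices; a rapid cosine oscillation stands in for noise):

```python
import numpy as np

px = np.linspace(0.0, 1.0, 20)
pv = 1.0 + 0.2 * np.cos(40.0 * px)     # "noisy" samples of a flat signal

def kernel(a, b, eps=20.0):
    """The exp(-x^2/A) weight, shared by both schemes."""
    return np.exp(-(eps * (a[:, None] - b[None, :])) ** 2)

k = kernel(px, px)

# RBF interpolation: solve for weights, so every sample is reproduced
rbf_at_nodes = k @ np.linalg.solve(k, pv)

# Gaussian-weighted average: the same kernel used directly as weights
avg_at_nodes = (k @ pv) / k.sum(axis=1)

rbf_resid = np.max(np.abs(rbf_at_nodes - pv))   # essentially zero: fits the noise
avg_resid = np.max(np.abs(avg_at_nodes - pv))   # clearly nonzero: noise smoothed out
```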

Ryan


On Sun, Oct 4, 2009 at 4:21 PM, Robert Kern <robert.kern@...287...> wrote:

On 2009-10-04 15:27 PM, Christopher Barker wrote:

Václav Šmilauer wrote:

About a year ago I developed, for my own purposes, a routine for averaging
irregularly-sampled data using a Gaussian average.

Is this similar to kernel density estimation?

http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html

No. It is probably closer to radial basis function interpolation (in fact, it
almost certainly is a form of RBFs):

http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#id1

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Ryan May wrote:


On Sun, Oct 4, 2009 at 4:21 PM, Robert Kern <robert.kern@...287...> wrote:

On 2009-10-04 15:27 PM, Christopher Barker wrote:

Václav Šmilauer wrote:

About a year ago I developed, for my own purposes, a routine for averaging
irregularly-sampled data using a Gaussian average.

Is this similar to kernel density estimation?

http://www.scipy.org/doc/api_docs/SciPy.stats.kde.gaussian_kde.html

No. It is probably closer to radial basis function interpolation (in fact, it
almost certainly is a form of RBFs):

http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#id1

Except that in radial basis function interpolation, you solve for the
weights that give the original values at the original data points.
Here, it's just an inverse-distance weighted average, where the weights
are chosen using an exp(-x^2/A) relation. There's a huge difference
between the two when you're dealing with noisy data.

Fair point.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco