I have been using matplotlib’s TrapezoidMapTriFinder for a while for it’s great performance when dealing with points in a triangulation.
Since I needed it to eventually work on general meshes I have been working on a Rust generalization using the same algorithm. I got it to work, but now I am thinking about the API. I noticed that the Triangulation class expects the x and y coordinates to be passed in separately instead of in a single array with shape (n, 2), and the same thing goes for derived classes of TriFinder which implement call(x, y).
My natural choice was to pass in a single array instead of 2, so that on the Rust side I can leverage the type system to make it impossible to pass incompatible arrays. I was wondering what the rationale was to use 2 separate arguments? My guess is that this is a more flexible API since with the matplotlib API I can do:
tri_finder(*xy.T)
if I have a single 2D xy array, or:
tri_finder(x, y)
if I have 2 separate 1D arrays. The API I wanted to use requires to allocate a new array in case x and y are two unrelated arrays.
No, that wasn’t the reason, the separate x and y arguments are for consistency with the rest of Matplotlib. Why do more common functions such as plot use separate x and y arguments?
They may have different dtypes or units such as in a timeseries plot of voltage (or whatever) versus datetime. Keeping the original dtypes and units is important for plotting routines to calculate sensible tick locations for example, and to identify missing data (np.nan vs np.NaT).
x and y may have different dimensions. For example use of plot(x, y) where x.shape is (5,) and y.shape is (5, 3) to plot 3 lines with the same x values.
Item 2 does not apply to Triangulation but item 1 does. It may be slightly dubious scientifically to triangulate voltage vs time but it is mathematically acceptable and fine for visualisation.
As you have identified, what you are really trying to avoid is copying the data. It would be quite possible to write some low level (C++ or Rust) triangulation/trifinder code that can deal with x and y as separate arrays or the same array and either interleaved or stacked. But that is extremely unlikely to happen in Matplotlib where backward compatibility and avoiding API bloat are serious considerations. If I was writing a new standalone C++/Rust library with a Python interface for mesh manipulation then I would consider keeping x and y separate in the low-level code but allow the various same/different array options at the Python level without any copying of data.
Thank you for the explanation, that’s much appreciated!
And thank you for sharing your thoughts as well
Unless I am mistaken you are the original author of the trapezoidal map implementation in matplotlib. I have a ton of respect for your work, I discovered this data structure by looking at your code and that led me to the book by de Berg. I have spent quite some time looking at your implementation to understand how it worked, and it has helped me a lot! So thank you so much I am a big fan.
My Rust implementation ended up being quite different because of the ownership model (it would have been messy if I had tried to use pointers everywhere, plus I would have had to write unsafe code in lots of places), but your add_edge_to_tree sure helped me a lot!
Thanks for your kind words. Yes I did write that code, although it feels like it was a lifetime ago now. It scales well but uses more RAM than it should if I remember correctly. Other people have reported better performance using some other algorithm based on a more generic quadtree structure, but nobody has stepped up to to plate to add a corresponding TriFinder wrapper to Matplotlib.
Oh really? Well my current Rust version uses even more RAM right now haha but I’m working on a way to make it better.
I’d be really interested in looking at these quadtree structures, do you have people or projects that I could look at?