path simplification can decrease the smoothness of data plots

Michael Droettboom <mdroe@...552...> writes:

>
Thanks for the pointers.

The original simplification code was written by John Hunter (I believe),
and I don't know whether he designed it himself or reimplemented something
published elsewhere. So I take no credit for it, and have little knowledge
of its original goals.

I'm not sure of everything it does, but it seems to do clipping and to remove
line segments where the change in slope is less than some limit. There are
probably better algorithms out there, but this one works surprisingly well
and is fast and simple. I think it should be a requirement that it returns
points which are a subset of the original points; with the change you've
made it does this, right?
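
To make the idea concrete, here is a rough sketch of the kind of slope-change
simplification being described (the function name and angle threshold are made
up for illustration; the real code lives in src/agg_py_path_iterator.h and
works on pixel coordinates):

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<double, double> Point;

// Keep a vertex only if the path bends by more than min_angle radians there,
// so the output is always a subset of the original points.
std::vector<Point> simplify_by_angle(const std::vector<Point>& pts,
                                     double min_angle)
{
    if (pts.size() < 3)
        return pts;

    const double PI = 3.14159265358979323846;
    std::vector<Point> out;
    out.push_back(pts.front());

    for (std::size_t i = 1; i + 1 < pts.size(); ++i)
    {
        const Point& a = out.back();   // last kept point
        const Point& b = pts[i];       // candidate vertex
        const Point& c = pts[i + 1];   // next point

        double angle_in  = std::atan2(b.second - a.second, b.first - a.first);
        double angle_out = std::atan2(c.second - b.second, c.first - b.first);
        double change    = std::fabs(angle_out - angle_in);
        if (change > PI)               // wrap the angle difference
            change = 2.0 * PI - change;

        if (change > min_angle)        // the slope changes enough: keep it
            out.push_back(b);
    }

    out.push_back(pts.back());
    return out;
}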

However, IMHO the primary purpose of the path simplification in
matplotlib is to improve interactive performance (smaller file size is
just a convenient side effect of that), so I would hesitate to use an
algorithm that is any worse than O(n): it must be recalculated on every
pan or zoom, because the simplification is defined in terms of *pixels*,
not data units. Even on modern hardware, it is a constant battle keeping
the inner drawing loop fast enough. We could, of course, make the
choice of algorithm user-configurable, or use something more precise
when using a non-interactive backend, but then we would have two
separate code paths to keep in sync and bug-free --- not a choice I
take lightly.

I see your point.

I originally encountered a problem when preparing a pdf figure: I had a lot
of high-resolution data, and with path simplification the resulting pdf
looked pretty bad (the lines were jagged). But the advantage was a massive
reduction in the file size of the pdf. I adjusted perpdNorm2 and got much
better results.

The trick with the present algorithm is to keep the error at the
subpixel level through the correct selection of perpdNorm. It seems to
me that a more advanced simplification algorithm is only necessary
when you want to simplify more aggressively than the pixel level. But
what hasn't been done is a proper study of the error along the
simplified path for the current approach vs. other possible approaches.
Even this latest change was verified just by looking at the results,
which seem better on the data I looked at. So I'm mostly speaking from
my gut rather than from evidence here.
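
To make the sub-pixel argument concrete, here is an illustrative computation
(made-up names, not the actual agg_py_path_iterator.h code) of a squared
perpendicular distance and what a 1/9-square-pixel threshold implies, namely
roughly a 1/3 pixel maximum deviation before a new segment is emitted:

#include <cmath>
#include <cstdio>

// Squared perpendicular distance, in square pixels, of the point (px, py)
// from the line through (ox, oy) along the direction (dx, dy).
static double perp_dist_squared(double ox, double oy, double dx, double dy,
                                double px, double py)
{
    double vx = px - ox, vy = py - oy;
    double cross = vx * dy - vy * dx;       // 2D cross product
    return (cross * cross) / (dx * dx + dy * dy);
}

int main()
{
    // A point 0.3 px off a horizontal run gives perpdNorm2 = 0.09.
    double d2 = perp_dist_squared(0.0, 0.0, 10.0, 0.0, 5.0, 0.3);

    // A threshold of 1/9 square pixels lets points deviate by up to
    // sqrt(1/9) = 1/3 of a pixel before a new segment is emitted.
    double threshold = 1.0 / 9.0;
    std::printf("perpdNorm2 = %.3f, threshold = %.3f, merge = %s\n",
                d2, threshold, d2 < threshold ? "yes" : "no");
    return 0;
}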
>
> #src/agg_py_path_iterator.h
>
> //if the perp vector is less than some number of (squared)
> //pixels in size, then merge the current vector
> if (perpdNorm2 < (1.0 / 9.0))
>
That sounds like a good idea. I'll have a look at doing that.

Right, perhaps the best thing to do is to make the tolerance parameter
adjustable, so it can be loosened to speed up drawing in the interactive
backends, but also easily tightened for extra resolution in the
non-interactive backends like pdf/ps.
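
As a sketch of what an adjustable tolerance could look like (purely
hypothetical names, not the current matplotlib interface), the threshold would
just become a parameter each backend can set instead of the hard-coded
1.0 / 9.0:

// Purely hypothetical interface: the tolerance becomes a parameter that each
// backend can set instead of the hard-coded 1.0 / 9.0.
class PathSimplifier
{
public:
    // threshold is in square pixels; larger values simplify more aggressively.
    explicit PathSimplifier(double threshold = 1.0 / 9.0)
        : m_threshold(threshold) {}

    // Merge the current vector while its perpendicular deviation stays below
    // the configured tolerance.
    bool should_merge(double perpdNorm2) const
    {
        return perpdNorm2 < m_threshold;
    }

private:
    double m_threshold;
};

// e.g. an interactive backend could accept a coarser result for speed, while
// pdf/ps could ask for a finer one:
//     PathSimplifier interactive(1.0 / 4.0);    // up to 1/2 pixel error
//     PathSimplifier pdf_backend(1.0 / 100.0);  // up to 1/10 pixel error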

Mike
