bug in path simplification

Mike,

In looking into the handling of inf and nan, I think I have found some complexities and inefficiencies that are easily eliminated (and I have committed some such changes; others are pending), but in the process I have also found what I am fairly sure is a bug in the path simplification code. It is illustrated by the attached modification of nan_test.py. With 128 or more points in the data set, so that the simplification is invoked, the moveto command that should jump across the gap is getting changed to a lineto. This can be seen most easily by applying the attached patch, which includes additional debugging statements to pin down the incorrect command yielded by the simplification, as well as pending changes to unify the handling of masked arrays, nans, and infs. The bug shows up with or without this patch, however. With the patch, it is also triggered by masked_demo.py, which is how I first found it. (The non-debugging, or substantive, parts of the patch are included here for your review or discussion as a separate matter.)

The middle part of the extra debugging output with the patch applied when running the nan_test.py looks like this:

2 214.726000 395.178372
3 return cmd: 2
2 218.012000 387.824331
4 skip: 2 218.012000 387.824331
1 359.310000 396.688044
3 return cmd: 2
2 362.596000 403.422341
3 return cmd: 2

The line starting with "1" is the moveto command and coordinates yielded by the c++ path iterator; the following line is showing that the corresponding command yielded by the simplification code is instead "2", and that it is being returned at a location I have called "3". All this will make sense only when you look at the patched code.

Eric

nan_test.py (698 Bytes)

pathpatch.diff (4.59 KB)

The simplification code was written with the assumption that all of the codes are LINETO. That is, it has no MOVETOs or CURVEs. There is code in backend_agg.h that tries to make sure not to run simplification when this is the case (see should_simplify -- it returns false whenever there is a codes array). This all worked before NaN support was added to the path iterator. However, we now have a case where this limitation of path simplification was not explicitly documented, and is now interacting with a new, more efficient, way to handle skipping that wasn't anticipated.

So the easy fix is to turn off simplification when the array contains NaNs (and bonus points if we can cache that so we don't have to run through the list to find NaNs ahead of time).

The harder fix is to rewrite the simplification code (which is rather opaque) to handle MOVETOs, or perhaps to handle NaNs directly (whichever is easier).

I may not get to this before SciPy, however.

Cheers,
Mike

Eric Firing wrote:

···

Mike,

In looking into the handling of inf and nan, I think I have found some complexities and inefficiencies that are easily eliminated (and I have committed some such changes; others are pending), but in the process I have also found what I am fairly sure is a bug in the path simplification code. It is illustrated by the attached modification of nan_test.py. With 128 or more points in the data set, so that the simplification is invoked, the moveto command that should jump across the gap is getting changed to a lineto. This can be seen most easily by applying the attached patch, which includes additional debugging statements to pin down the incorrect command yielded by the simplification, as well as pending changes to unify the handling of masked arrays, nans, and infs. The bug shows up with or without this patch, however. With the patch, it is also triggered by masked_demo.py, which is how I first found it. (The non-debugging, or substantive, parts of the patch are included here for your review or discussion as a separate matter.)

The middle part of the extra debugging output with the patch applied when running the nan_test.py looks like this:

2 214.726000 395.178372
3 return cmd: 2
2 218.012000 387.824331
4 skip: 2 218.012000 387.824331
1 359.310000 396.688044
3 return cmd: 2
2 362.596000 403.422341
3 return cmd: 2

The line starting with "1" is the moveto command and coordinates yielded by the c++ path iterator; the following line is showing that the corresponding command yielded by the simplification code is instead "2", and that it is being returned at a location I have called "3". All this will make sense only when you look at the patched code.

Eric

Michael Droettboom wrote:

So the easy fix is to turn off simplification when the array contains NaNs (and bonus points if we can cache that so we don't have to run through the list to find NaNs ahead of time).
  

On further thought, this shouldn't be too difficult -- so I'll go ahead and implement this...

The harder fix is to rewrite the simplification code (which is rather opaque) to handle MOVETOs, or perhaps to handle NaNs directly (whichever is easier).
  

...and defer this until I have a good long chunk of time.

Cheers,
Mike

Simplification is now turned off whenever there are nonfinite elements in the vertices array. The "should_simplify" determination is now made in Python (to make it easier to tweak and cache).

I also committed your patch to handle masked arrays in the same way as arrays-with-nonfinite values (which IMHO is rather elegant -- it gets rid of the "more than one way to do it" problem, and should be faster all around.)

Now -- anyone want to improve the simplification algorithm?

Cheers,
Mike

Eric Firing wrote:

···

Mike,

In looking into the handling of inf and nan, I think I have found some complexities and inefficiencies that are easily eliminated (and I have committed some such changes; others are pending), but in the process I have also found what I am fairly sure is a bug in the path simplification code. It is illustrated by the attached modification of nan_test.py. With 128 or more points in the data set, so that the simplification is invoked, the moveto command that should jump across the gap is getting changed to a lineto. This can be seen most easily by applying the attached patch, which includes additional debugging statements to pin down the incorrect command yielded by the simplification, as well as pending changes to unify the handling of masked arrays, nans, and infs. The bug shows up with or without this patch, however. With the patch, it is also triggered by masked_demo.py, which is how I first found it. (The non-debugging, or substantive, parts of the patch are included here for your review or discussion as a separate matter.)

The middle part of the extra debugging output with the patch applied when running the nan_test.py looks like this:

2 214.726000 395.178372
3 return cmd: 2
2 218.012000 387.824331
4 skip: 2 218.012000 387.824331
1 359.310000 396.688044
3 return cmd: 2
2 362.596000 403.422341
3 return cmd: 2

The line starting with "1" is the moveto command and coordinates yielded by the c++ path iterator; the following line is showing that the corresponding command yielded by the simplification code is instead "2", and that it is being returned at a location I have called "3". All this will make sense only when you look at the patched code.

Eric