path simplification can decrease the smoothness of data plots

I’m really excited about the new path simplification option for vector output formats. I tried it for the first time yesterday and reduced a PDF from 231 kB to 47 kB. Thanks very much for providing this feature!

However, I have noticed that the simplified paths often look more jagged than the original, at least for my data. I can recreate the effect with the following:

[start]
import numpy as np
import matplotlib.pyplot as plt

# Gaussian curve plus a small amount of noise, sampled densely
x = np.arange(-3, 3, 0.001)
y = np.exp(-x**2) + np.random.normal(scale=0.001, size=x.size)

plt.plot(x, y)
plt.savefig('test.png')
plt.savefig('test.pdf')
[end]

A sample output is attached, and close inspection shows that the PNG is a smooth curve with a small amount of noise while the PDF version has very noticeable changes in direction from one line segment to the next.

<<test.png>> <<test.pdf>>

The simplification algorithm (agg_py_path_iterator.h) does the following:

If line2 is nearly parallel to line1, add the parallel component to the length of line1, leaving its direction unchanged

which results in a new data point, not contained in the original data. Line1 will continue to be lengthened until it has deviated from the data curve enough that the next true data point is considered non-parallel. The cycle then continues. The result is a line that wanders around the data curve, and only the first point is guaranteed to have existed in the original data set.

Instead, could the simplification algorithm do:

If line2 is nearly parallel to line1, combine them by removing the common point, leaving a single line where both end points existed in the original data
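
The difference between the two rules can be illustrated with a small pure-Python sketch. This is a toy model, not the code in agg_py_path_iterator.h: the function name `simplify`, the `keep_original` switch, and the use of a perpendicular-distance tolerance `tol` as the "nearly parallel" test are all assumptions made for illustration.

```python
import math


def _unit(a, b):
    """Return (unit vector from a to b, length), or (None, 0.0) if a == b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return None, 0.0
    return (dx / length, dy / length), length


def _split(origin, u, p):
    """Components of (p - origin) parallel and perpendicular to unit vector u."""
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    return dx * u[0] + dy * u[1], abs(dx * u[1] - dy * u[0])


def simplify(points, tol, keep_original=True):
    """Toy simplifier (hypothetical, for illustration only).

    keep_original=False: lengthen line1 along its fixed direction, so the
        emitted end point is usually not one of the input points.
    keep_original=True: drop the shared points and end the merged line on
        the last input point that was absorbed.
    """
    out = [points[0]]
    i, n = 1, len(points)
    while i < n:
        origin = out[-1]
        u, reach = _unit(origin, points[i])
        if u is None:  # repeated point, nothing to draw
            i += 1
            continue
        last = points[i]
        extended = False
        i += 1
        # Absorb following points while they stay within tol of line1.
        while i < n:
            par, perp = _split(origin, u, points[i])
            if perp > tol:
                break
            reach = max(reach, par)  # backward motion is ignored in this toy
            last = points[i]
            extended = True
            i += 1
        if keep_original or not extended:
            out.append(last)
        else:
            out.append((origin[0] + u[0] * reach, origin[1] + u[1] * reach))
    return out


pts = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 1.0)]
old_style = simplify(pts, tol=0.2, keep_original=False)
new_style = simplify(pts, tol=0.2, keep_original=True)
print("extend line1:      ", old_style)
print("drop shared point: ", new_style)
```

In this example the first rule ends the merged segment on a point that lies on line1's original direction rather than on the data, while the second rule ends it on the input point (2.0, 0.0).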

Thanks again,

Andrew Hawryluk

test.png

test.pdf (14.1 KB)

Since I suspect this change will be a little bit of work, I just wanted to put my hand up and say I'm looking into it so we don't duplicate effort here.

I think it's a worthwhile experiment, in any case.

Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

I've attached a patch that will only include points from the original data in the simplified path. I hesitate to commit it to SVN, as these things are very hard to get right -- and just because it appears to work better on this data doesn't mean it doesn't create a regression on something else... ;) That said, it would be nice to confirm that this solution works, because it has the added benefit of being a little simpler computationally. Be sure to blitz your build directory when testing the patch -- distutils won't pick it up as a dependency.

I've attached two PDFs -- one with the original (current trunk) behavior, and one with the new behavior. I plotted the unsimplified plot in thick blue behind the simplified plot in green, so you can see how much deviation there is between the original data and the simplified line (you'll want to zoom way in with your PDF viewer to see it.)

I've also included a new version of your test script which detects "new" data values in the simplified path, and also seeds the random number generator so that results are comparable. I also set the solid_joinstyle to "round", as it makes the wiggliness less pronounced. (There was another thread on this list recently about making that the default setting).
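
The attached simplify.py is not reproduced in this archive. As a rough idea of what such a check could look like, here is a minimal pure-Python sketch. The helper names `new_vertices` and `max_deviation` are hypothetical, and the "simplified" path below is just every tenth point of the data, standing in for real simplifier output.

```python
import math
import random


def new_vertices(original, simplified):
    """Return simplified vertices that do not occur in the original data.

    Uses exact float equality, so it only makes sense when the simplifier
    is supposed to pass input points through unchanged.
    """
    known = set(original)
    return [p for p in simplified if p not in known]


def _dist_to_segment(p, a, b):
    """Distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def max_deviation(original, simplified):
    """Largest distance from any original point to the simplified polyline."""
    segments = list(zip(simplified, simplified[1:]))
    return max(
        min(_dist_to_segment(p, a, b) for a, b in segments) for p in original
    )


random.seed(0)  # fixed seed so repeated runs see the same noise
data = []
for i in range(600):
    x = -3.0 + 0.01 * i
    data.append((x, math.exp(-x * x) + random.gauss(0.0, 0.001)))

# Stand-in for a simplified path: every tenth point plus the final point.
decimated = data[::10] + [data[-1]]

print("new vertices:", new_vertices(data, decimated))
print("max deviation:", max_deviation(data, decimated))
```

A non-empty result from `new_vertices` would indicate that the simplifier invented points, and `max_deviation` puts a number on how far the simplified line strays from the data.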

Cheers,
Mike

agg_py_path_iterator.h.diff (4.21 KB)

test.check.new.pdf (22 KB)

test.check.orig.pdf (25.2 KB)

simplify.py (949 Bytes)


From: Michael Droettboom [mailto:mdroe@…31…]
Sent: 16 Jan 2009 1:31 PM
To: Andrew Hawryluk
Cc: matplotlib-devel@lists.sourceforge.net
Subject: Re: [matplotlib-devel] path simplification can decrease the
smoothness of data plots

Michael Droettboom wrote:

...


Thanks for looking into this! The new plot is much improved, and the
simplified calculations are a pleasant surprise. I was also testing the
previous algorithm with solid_joinstyle set to "round" as it is the
default in my matplotlibrc.

I am probably not able to build your patch here, unless building
matplotlib from source on Windows is easier than I anticipate. May I
send you some data off the list for you to test?

Regards,
Andrew

NOVA Chemicals Research & Technology Centre
Calgary, Canada


No problem. I'd also want testing from others -- there aren't a lot of examples in matplotlib itself where simplification even kicks in.

Mike


I've checked this change into SVN so others can test it out.

Assuming we don't discover any cases where this is clearly inferior, it should make it into the next major release.

Mike
