sluggish pdfs with large data sets

Daniel_Soto · March 3, 2009, 2:50am

hello,

i'm using matplotlib on os x and am having issues with plots of large data sets. some of my plots contain about 10000 points, and the generated pdf files bring preview.app and quicklook to their knees when opened.

here is a small script that reproduces the issue. at 1000 points it is snappy; at 10000 it is a pig.

is there a setting to downsample or otherwise compress?

best,
drs

import matplotlib.pyplot
import numpy

# scipy.rand was an alias for numpy.random.rand; newer SciPy releases no longer provide it
x = numpy.random.rand(10000)
matplotlib.pyplot.plot(x)
matplotlib.pyplot.savefig('rand.pdf')
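To put a number on "snappy" versus "a pig", the sketch below measures the PDF size for 1000 and 10000 random points. It assumes a current matplotlib (it uses `matplotlib.rc_context`, `Figure.subplots` and NumPy's `default_rng`, none of which 0.98.x had) and turns `path.simplify` off to roughly mimic 0.98.3. File size is only a proxy for how much work the viewer has to do.

```python
import io

import numpy as np
import matplotlib
from matplotlib.figure import Figure


def pdf_size(n_points, seed=0):
    """Return the size in bytes of a PDF line plot of n_points random values."""
    y = np.random.default_rng(seed).random(n_points)
    # Simplification off, to approximate 0.98.3, which lacked path.simplify.
    with matplotlib.rc_context({"path.simplify": False}):
        fig = Figure()
        ax = fig.subplots()
        ax.plot(y)
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
    return len(buf.getvalue())


sizes = {n: pdf_size(n) for n in (1000, 10000)}
for n, size in sizes.items():
    print(f"{n:>6} points: {size} bytes")
```

Every point ends up as a separate drawing operator in the PDF, so the file and the viewer's rendering work grow roughly with the number of points.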

Michael_Droettboom1 · March 3, 2009, 1:29pm

With recent versions of matplotlib, you can set the "path.simplify" rcParam to True, which should reduce the data so that vertices that have no impact on the plot appearance (at the given dpi) are removed.

You can do this either in your script:

from matplotlib import rcParams
rcParams['path.simplify'] = True

or in your matplotlibrc file:

path.simplify: True

Hope that helps. The amount of reduction this produces is somewhat data-dependent.
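A minimal sketch of the effect, assuming a current matplotlib where `path.simplify` already defaults to True and a companion `path.simplify_threshold` rcParam (default 1/9 of a pixel) controls how aggressively vertices are merged. It renders the same 10000 random points with simplification off and on and compares the PDF sizes. Note that in current matplotlib the setting is picked up when the line's path is built, so the sketch keeps it active while plotting as well as while saving.

```python
import io

import numpy as np
import matplotlib
from matplotlib.figure import Figure


def pdf_bytes(y, simplify):
    """Render y as a line plot and return the PDF output as bytes."""
    # path.simplify is read when the line's Path is created, so the
    # setting must be in effect while plotting, not only at savefig time.
    with matplotlib.rc_context({"path.simplify": simplify}):
        fig = Figure()
        ax = fig.subplots()
        ax.plot(y)
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
    return buf.getvalue()


y = np.random.default_rng(0).random(10000)
plain = pdf_bytes(y, simplify=False)
simplified = pdf_bytes(y, simplify=True)
print(f"simplify off: {len(plain)} bytes, simplify on: {len(simplified)} bytes")
```

As Mike says, the reduction is data-dependent: densely sampled, slowly varying waveforms tend to shrink much more than pure noise.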

Cheers,
Mike

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
matplotlib-users List Signup and Options

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

Daniel_Soto · March 3, 2009, 3:26pm

thanks for the suggestion. i'm running 0.98.3 and have tried

pdf.compression
path.simplify
agg.path.chunksize

without any change in filesize (176KB) or time to open file (13 sec).

are there any other options or backends that might help?

drs


Michael_Droettboom1 · March 3, 2009, 4:11pm

path.simplify was added some time after 0.98.3. You'll have to upgrade to 0.98.5.x for that feature.
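One way to check whether an installation knows about the option at all, sketched under the assumption that an unsupported key is simply absent from `matplotlib.rcParams` (which behaves like a dict):

```python
import matplotlib

# Releases that predate the option (such as 0.98.3) have no such key.
has_simplify = "path.simplify" in matplotlib.rcParams
print("matplotlib", matplotlib.__version__, "supports path.simplify:", has_simplify)

if has_simplify:
    matplotlib.rcParams["path.simplify"] = True
```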

pdf.compression should have some impact on file size, but I doubt it will have much impact on display times, since it doesn't actually remove any data. I'm surprised this isn't having any effect -- perhaps the matplotlibrc file you're editing is not the one being loaded? You can see where the file is being loaded from with:

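import matplotlib
print(matplotlib.matplotlib_fname())  # the matplotlibrc file actually loaded
print(matplotlib.get_configdir())     # the per-user configuration directory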
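To see that `pdf.compression` changes the file size but not the amount of drawing data, here is a small sketch (current matplotlib assumed) that saves the same plot at compression level 0 (off) and 6. The compressed streams are zlib-deflated, but a viewer still has to inflate them and draw every vertex, which is why display time is not expected to improve much.

```python
import io

import numpy as np
import matplotlib
from matplotlib.figure import Figure


def pdf_size(y, compression):
    """Return the PDF size in bytes for a given pdf.compression level (0-9)."""
    with matplotlib.rc_context({"pdf.compression": compression}):
        fig = Figure()
        fig.subplots().plot(y)
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
    return len(buf.getvalue())


y = np.random.default_rng(0).random(10000)
uncompressed = pdf_size(y, 0)
compressed = pdf_size(y, 6)
print(f"level 0: {uncompressed} bytes, level 6: {compressed} bytes")
```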

agg.path.chunksize has no effect on PDF output.

Is it possible you're using the Cairo backend, and not matplotlib's own Python-based PDF backend?
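A sketch for ruling out the Cairo backend, assuming matplotlib 3.1 or later, where `Figure.savefig` accepts a `backend` keyword. Passing `backend="pdf"` selects matplotlib's own `backend_pdf` writer explicitly, regardless of which backend the session is otherwise using.

```python
import io

import numpy as np
import matplotlib
from matplotlib.figure import Figure

# The backend pyplot would use in this session (may be Cairo-based).
print("session backend:", matplotlib.get_backend())

fig = Figure()
fig.subplots().plot(np.random.default_rng(0).random(10000))

buf = io.BytesIO()
# Force matplotlib's built-in PDF backend for this one file.
fig.savefig(buf, format="pdf", backend="pdf")
pdf = buf.getvalue()
print(pdf[:8])
```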

As a cheap workaround, you can also easily decimate your data using Numpy with something like:

data = data[::skip]

where 'skip' is the step size: only every skip-th data point is kept, so skip - 1 points are dropped between each pair of kept points.
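A slightly fuller sketch of the same slicing trick. The helper name `decimate` is hypothetical, not a NumPy or matplotlib function. It also slices the x positions so the decimated curve keeps its original x scale; plotting `y[::skip]` on its own would compress the x axis by a factor of skip.

```python
import numpy as np


def decimate(y, skip):
    """Keep every skip-th sample of y, together with its original x position."""
    if skip < 1:
        raise ValueError("skip must be a positive integer")
    x = np.arange(len(y))
    return x[::skip], y[::skip]


y = np.random.default_rng(0).random(10000)
x_small, y_small = decimate(y, 10)
print(len(y_small), x_small[:3])
```

The result can be plotted with `ax.plot(x_small, y_small)`. Plain striding is cheap but can miss narrow peaks that fall between kept samples, unlike `path.simplify`, which only drops vertices that do not change the rendered appearance.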

Cheers,
Mike


Christopher_Brown · March 3, 2009, 8:35pm

Hi Michael,

Michael Droettboom wrote:

With recent versions of matplotlib, you can set the "path.simplify" rcParam to True, which should reduce the data so that vertices that have no impact on the plot appearance (at the given dpi) are removed.

Wow. My time-domain waveform plots went from 3.3 MB to 84 KB. This was an incredibly timely tip, too: I was just about to go to print with my poster, which I laid out in Scribus, and Scribus would hang on the 3-4 MB PDF figure. I was about to switch to PNG, but now I don't have to. Thanks.


--
Christopher Brown, Ph.D.
Department of Speech and Hearing Science
Arizona State University

Daniel_Soto · March 4, 2009, 12:21am

ok. i managed to install 0.98.5.x from source into my enthought python distribution.
after that, using path.simplify helped considerably.

as far as the pdf.compression not working, i was using rcParams in the script so i'm
almost certain the options were being loaded.

thanks mike,
drs
