how to reduce the file size of plots generated with matplotlib

Hello:

I use matplotlib to generate x-y data plots; i.e., 2-D plots. The problem is that the output files (the PDF files containing plots that are generated with matplotlib) are huge. I can generate files that are hundreds of KB or even multiple MB. This seems absurd to me. These file sizes cause programs that use them to come to a grinding halt. My goal is to reduce the size of the plot files that I produce with matplotlib. Details follow.

----------

I use matplotlib from EPD.
Enthought Canopy Python 2.7.3 | 64-bit | (default, Aug 8 2013, 05:37:06)

Matplotlib version:

print matplotlib.__version__

1.3.0

OS:
I'm using Mac OS X Version 10.8.4.

----------

I use home-grown code whose starting point was an example on the matplotlib website.

My relevant imports are:

import numpy
import scipy
import pylab
import matplotlib.pyplot as plt
import matplotlib

My plotting code lines are:

        ## PDF.
        outfile = "basefile" + ".pdf"
        ## pylab.savefig(outfile, bbox_inches=0)
        pylab.savefig(outfile,bbox_inches='tight')

----------

My PDF files contain simple plots which consist of (a) data points only, (b) lines between data points (data points not plotted), or (c) both data points and lines.

I have a consistent problem in that the files produced have sizes that seem way too big.
For example, most recently, I am plotting 3 data sets; each data set has about 90,000 points. If I plot all three sets in one PDF figure, the file size is over 2MB.
This seems absurd to me. I used R plotting for many years (again, my own homegrown code, for 6 years) and never had this issue, and I was making these kinds of plots/figures.

I thought it might be a vector/raster issue, but the following web page says that PDFs are generated as vector images, which, to my understanding (which could be wrong), is the more compact format.
http://matplotlib.org/faq/usage_faq.html

Is there a command I can use to reduce the file size? Since I am using these in reports and publications, the figures are almost always smaller than 3 inches by 3 inches; i.e., I never need to take a raster figure and blow it up. So I am not concerned about the pixelation problems that occur when an image is increased in size.

Thank you very much.

c

2014-03-22 20:23 GMT+01:00 Christopher Kuhlman <ckuhlman@...4505...>:
[...]

For example, most recently, I am plotting 3 data sets; each data set has about 90,000 points. If I plot all three sets in one PDF figure, the file size is over 2MB.
This seems absurd to me. I used R plotting for many years (again, my own homegrown code, for 6 years) and never had this issue, and I was making these kinds of plots/figures.

I thought it may be a vector/raster issue, but the following web page says that PDF are generated as vector image, which, to my understanding (which could be wrong), is the more compact format.
http://matplotlib.org/faq/usage_faq.html

[...]

Roughly speaking, the size of a vector file depends on the number of
points, while the size of a raster file depends on the number of pixels.
For your use case (many points, small images) raster output should be
more compact.

Goyo
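
A quick way to check the points-versus-pixels argument on your own machine is to save one figure in both formats and compare the file sizes. The following is only a sketch: the data are synthetic random walks standing in for the three 90,000-point sets, the function name saved_sizes is made up for illustration, and it assumes a current matplotlib and NumPy. No particular sizes are claimed; the result depends on the data, marker style, and dpi.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def saved_sizes(n_points=90000, n_sets=3, dpi=300):
    """Save the same figure as PDF and PNG; return file sizes in bytes."""
    rng = np.random.RandomState(0)  # synthetic data, not the poster's
    fig, ax = plt.subplots(figsize=(3, 3))  # 3 in x 3 in, as in the thread
    x = np.arange(n_points)
    for _ in range(n_sets):
        ax.plot(x, rng.randn(n_points).cumsum(), ".", markersize=1)

    out_dir = tempfile.mkdtemp()
    sizes = {}
    for ext in ("pdf", "png"):
        path = os.path.join(out_dir, "basefile." + ext)
        # savefig picks the output format from the file extension
        fig.savefig(path, bbox_inches="tight", dpi=dpi)
        sizes[ext] = os.path.getsize(path)
    plt.close(fig)
    return sizes


if __name__ == "__main__":
    print(saved_sizes())
```

The PDF stores every plotted point, so its size grows with n_points; the PNG is bounded by the pixel grid (figure size times dpi).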

There is no way that a human eye (or a computer screen) is going to
distinguish, or even see, 90,000 points on a standard line plot, especially
if you reduce it to a 3 inch by 3 inch graph. You may want to
downsample/interpolate your data to a more manageable set of points and try
again. I'm no expert on the PDF side of things, but I agree with Goyo that
raster files may give you smaller file sizes.
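
One possible way to thin the data before plotting is min/max decimation: split each series into buckets and keep only the lowest and highest sample in each. This is a sketch, not something proposed in the thread; the function name minmax_decimate and the default bucket count are assumptions chosen for illustration, and it assumes the points are already ordered along x. Unlike plain striding (y[::k]), it keeps isolated spikes visible.

```python
import numpy as np


def minmax_decimate(x, y, n_buckets=500):
    """Keep the min and max sample of each bucket (at most 2*n_buckets points)."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(y)
    if n <= 2 * n_buckets:
        return x, y  # already small enough, nothing to do

    # Bucket boundaries as integer indices into the arrays
    edges = np.linspace(0, n, n_buckets + 1).astype(int)
    keep = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        segment = y[lo:hi]
        keep.append(lo + int(np.argmin(segment)))
        keep.append(lo + int(np.argmax(segment)))

    keep = np.unique(keep)  # sort indices and drop duplicates
    return x[keep], y[keep]


if __name__ == "__main__":
    x = np.arange(90000)
    y = np.sin(x / 2000.0)
    xd, yd = minmax_decimate(x, y)
    print(len(x), "->", len(xd))
```

The reduced arrays can then be passed to plot() in place of the originals, which shrinks a vector PDF roughly in proportion to the number of points dropped.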

Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net

--
Andrea.

"Imagination Is The Only Weapon In The War Against Reality."

# ------------------------------------------------------------- #
def ask_mailing_list_support(email):

    if mention_platform_and_version() and include_sample_app():
        send_message(email)
    else:
        install_malware()
        erase_hard_drives()
# ------------------------------------------------------------- #

Thank you both for your fast replies. (Just an aside, plotting all the points is a quick way to detect outliers.)

Before I sent the email, I tried to find a simple raster command in matplotlib to do just that (convert the image to raster), but I could not find one in my search. Is there such a thing?

Thanks again.

c

2014-03-22 21:38 GMT+01:00 Christopher Kuhlman <ckuhlman@...4505...>:

Thank you both for your fast replies. (Just an aside, plotting all the points is a quick way to detect outliers.)

Before I sent the email, I tried to find a simple raster command in matplotlib to do just that (convert the image to raster), but I could not find one in my search. Is there such a thing?

outfile = "basefile" + ".png"

Goyo
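
To spell out the suggestion above: savefig chooses the output format from the file extension, so a ".png" name produces a raster file, and the dpi argument sets its resolution. If the surrounding tools need a PDF, matplotlib artists also accept rasterized=True, which embeds only the data layer as an image while axes and text stay vector. The sketch below uses synthetic data and arbitrary values for dpi and marker size; it assumes a current matplotlib and NumPy rather than the 1.3.0 install described earlier.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(1)  # synthetic stand-in for one data set
x = np.arange(90000)
y = rng.randn(x.size).cumsum()

fig, ax = plt.subplots(figsize=(3, 3))
# rasterized=True only matters for vector outputs such as PDF:
# the points become an embedded image, labels and ticks stay vector.
ax.plot(x, y, ".", markersize=1, rasterized=True)
ax.set_xlabel("x")
ax.set_ylabel("y")

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "basefile.png")
pdf_path = os.path.join(out_dir, "basefile.pdf")

# Fully raster output: 3 in x 3 in at 300 dpi is roughly 900 x 900 pixels
fig.savefig(png_path, dpi=300, bbox_inches="tight")
# PDF container with a rasterized data layer rendered at the same dpi
fig.savefig(pdf_path, dpi=300, bbox_inches="tight")
plt.close(fig)

print(png_path, os.path.getsize(png_path))
print(pdf_path, os.path.getsize(pdf_path))
```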