PDF file sizes on newer versions of matplotlib are a lot larger

Jeffrey Spencer <jeffspencerd@...287...> writes:

> I have three different versions of matplotlib that all output different
> file sizes with matplotlib 1.1.1 providing the smallest. This is for the
> same exact script. I can post the script if that helps.
>
> MPL 1.4.x: 539.32kb, Ubuntu 12.10
> MPL 1.1.1: 172.56kb, Ubuntu 12.10
> MPL 1.2.1: 475.9kb, Ubuntu 13.04

Yes, it would be interesting to know what the plotting commands are.
Just as a guess, since all the sizes are a few hundred kilobytes, it
could be a difference in e.g. font embedding - many TrueType fonts are
of comparable size.

--
Jouni K. Seppänen
http://www.iki.fi/jks
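
One way to test the font-embedding guess is to look for embedded font programs in each PDF. The following is a minimal sketch and not part of the original thread. It assumes the font descriptors are stored uncompressed (if they sit inside compressed object streams, this scan will miss them), and the helper name `count_embedded_fonts` is made up for illustration. It counts the standard PDF keys `/FontFile` (Type 1), `/FontFile2` (TrueType) and `/FontFile3` (other formats such as CFF).

```python
import re
import sys
from collections import Counter

# Keys a PDF font descriptor uses to point at an embedded font program.
FONT_KEYS = {
    b"": "Type1 (/FontFile)",
    b"2": "TrueType (/FontFile2)",
    b"3": "CFF/other (/FontFile3)",
}

# Match /FontFile, /FontFile2 or /FontFile3, but not longer names.
_PATTERN = re.compile(rb"/FontFile([23]?)(?![0-9A-Za-z])")


def count_embedded_fonts(pdf_bytes):
    """Count references to embedded font programs in raw PDF bytes."""
    counts = Counter()
    for match in _PATTERN.finditer(pdf_bytes):
        counts[FONT_KEYS[match.group(1)]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Pass the PDF files to compare on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            data = handle.read()
        print(path, len(data), "bytes", count_embedded_fonts(data))
```

Running it on the PDFs produced by each matplotlib version should show whether the number or type of embedded fonts differs between them.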

In addition to your plot script, any matplotlibrc customizations that you may have in effect would be helpful.

Mike
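
To collect the customizations asked about here, one option is to compare the active settings with the defaults. A minimal sketch follows; the helper `changed_settings` is a hypothetical name and works on any pair of mappings, and the sample values are stand-ins rather than the poster's real settings. With matplotlib installed, `matplotlib.rcParams` and `matplotlib.rcParamsDefault` can be passed in, as the closing comment shows.

```python
def changed_settings(current, defaults):
    """Return {key: (default, current)} for every setting that differs."""
    changed = {}
    for key, value in current.items():
        default = defaults.get(key)
        if value != default:
            changed[key] = (default, value)
    return changed


# Stand-in values for illustration only; these are not the poster's settings.
defaults = {"text.usetex": False, "font.family": "sans-serif", "font.size": 10.0}
current = {"text.usetex": True, "font.family": "sans-serif", "font.size": 10.0}

for key, (old, new) in sorted(changed_settings(current, defaults).items()):
    print(f"{key}: {old!r} -> {new!r}")

# With matplotlib available, the same helper can be applied to the real settings:
#   import matplotlib
#   changed_settings(matplotlib.rcParams, matplotlib.rcParamsDefault)
```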

OK, I have just made the script self-contained, but it loads external data, so I have attached that as well. If you want me to just separate out the plotting commands, let me know. I have also attached my matplotlibrc file, which is the same on all three systems. All the modifications to the matplotlibrc file are copied to the top, in the first 30 lines or so.

Of note, the smallest PDF file sizes come from the pgf backend, at around 60 kB. Not sure if that helps at all. The size is about the same (about 60 kB) if I export to .eps and then convert to PDF. The problem with EPS for these 3D figures, though, is that the back wall, which I think has an alpha channel, just becomes a solid wall in the output, with no lines through it like the other two walls.

data_mod.npy (23.8 KB)

vowel_data_pb.csv (92 Bytes)

pb_plot_3d_spheres.py (8.05 KB)

matplotlibrc (19.9 KB)

Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users

There are two different things going on here.

Between 1.2.1 and now, there was a bugfix to the font selection routine that inadvertently introduced a bug selecting fonts in the usetex backend. You may notice that on master, the IPA font selected is different. The file size difference can be attributed to the slightly larger font size of the one it selected vs. the one it should have. Note that when usetex is True, the fonts are not subsetted, so you always get the full font embedded in the file (MEP14 work will fix this in the future).

See b5c340 for the commit that introduced the bug, and https://github.com/matplotlib/matplotlib/pull/2260 for the fix (which should make it into 1.3.0 final).

Between 1.1.1 and 1.2.1 a change was made in how collections are handled. Previously, each path was redrawn individually. In 1.2, if a path is reused multiple times, a "stamp" is created and then it is "used" multiple times. In principle, this generally reduces file sizes by a large amount. However, in the case of this figure with the 3D spheres, each path is used only once, so rather than getting the file size savings of that approach, we only get the overhead. The backend could be smarter by not doing this when the path is only used a small number of times. Such a fix would be welcome, but is probably too large/risky to try to get into the current release cycle. It will have to wait for 1.3.1.

Cheers,
Mike
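
The trade-off described in this message can be illustrated with a toy byte count. This is only a sketch: the definition overhead and per-use cost are made-up numbers, the function names are hypothetical, and none of it is matplotlib's actual backend code. It compares writing a path inline at every use with defining it once as a reusable "stamp" and referencing it.

```python
def inline_size(path_bytes, uses):
    """Bytes needed when the full path is written out at every use."""
    return path_bytes * uses


def stamped_size(path_bytes, uses, define_overhead=60, use_cost=25):
    """Bytes needed when the path is defined once and then referenced.

    define_overhead and use_cost are assumed figures for the object
    wrapper and for each placement command; real values will differ.
    """
    return path_bytes + define_overhead + use_cost * uses


def worth_stamping(path_bytes, uses, **costs):
    """True when the stamp approach is strictly smaller than inlining."""
    return stamped_size(path_bytes, uses, **costs) < inline_size(path_bytes, uses)


# A 200-byte marker reused 1000 times: the stamp pays for itself.
print(inline_size(200, 1000), stamped_size(200, 1000), worth_stamping(200, 1000))
# A 200-byte path used once (the 3D-sphere case): only overhead is added.
print(inline_size(200, 1), stamped_size(200, 1), worth_stamping(200, 1))
```

Under these assumed costs, a check along the lines of `worth_stamping` is the kind of per-path test a backend could use to skip the stamp when it does not pay off.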

Michael,

Thanks, that is very informative. It answers most of the problems I was having, and I read MEP14, which looks really useful.

That being said, does the ps backend subset the fonts or use collections for drawing (is the collections feature global or just in the pdf backend)? I usually use .eps output and convert to pdf using epstopdf, unless the figure has an alpha channel, because it always results in a much smaller file (roughly 60 kB for this file, or around 10 kB for a plain figure) than direct pdf output, with the output looking the same. I pretty much always have usetex=True, so maybe the pdf file is always embedding the full fonts.

Also, does the Cairo backend support usetex=True or subsetting? I know I had read it did not support usetex, but that was maybe 2 years ago or so. The x, y, z axes look correct with cairo, but the IPA fonts don’t render properly. The legend font says it is size 12, but if you zoom in extremely close you can see they are the correct fonts, just way too small. The file size is around 60 kB as well, so I am guessing it supports subsetting of fonts.

The pgf backend would also subset fonts if output to .pdf, I’m assuming, because that is the default with pdftex? It results in files of similar size to the .eps output for this file (roughly 60 kB also).

The IPA font comes from the tipa package (\usepackage{tipa}), which is why I think these look different. That package draws these fonts with its own font libraries instead of whatever is selected as the text font. Maybe I’m wrong about this, but that is my understanding, because even in normal LaTeX code the fonts look different from the standard text.

Cheers,

Jeff

pb_Gauss3dmales_perc0.3cairo.pdf (67.6 KB)

> That being said, does the ps backend subset the fonts or use collections for drawing (is the collections feature global or just in the pdf backend)?

The ps backend has the same behavior as pdf on both counts. TTF fonts are subsetted, but the fonts that come from TeX come to us as Type1 fonts, which matplotlib currently does not know how to subset. It also handles collections in the same way (by creating a “stamp” and reusing it).

> I pretty much always have usetex=True, so maybe the pdf file is always embedding the full fonts.

Yes, when usetex=True, matplotlib does not do any font subsetting (in any backend). To get around this limitation, one can use the pdftocairo tool (part of poppler utils) to convert from pdf to a pdf with subsetted fonts. With your example, I was able to get the pdf down to ~80k. With MEP14, we would basically move such functionality into matplotlib itself, but that’s sort of a long term, semi-back-burner project so it could be a while.

It’s possible that epstopdf is doing some font subsetting of its own. But as you point out, Postscript (as a specification) doesn’t support alpha, so it’s not useful when you need alpha.

> Also, does the Cairo backend support usetex=True or subsetting?

Cairo does support font subsetting, but the matplotlib Cairo backend has no support for usetex. I’m surprised this worked for you at all. When I run your example with the Cairo backend, the IPA characters appear as raw TeX source code, i.e. “\textipa{i}”, which is what I would expect given that the regular font renderer doesn’t understand that syntax.

> The pgf backend would also subset fonts if output to .pdf, I’m assuming, because that is the default with pdftex?

Yes.

> That package draws these fonts with its own font libraries instead of whatever is selected as the text font.

That is correct. The default font for usetex=True is Computer Modern, whereas it is Bitstream Vera Sans in the default font rendering. I was referring to the difference between 1.2 and 1.4, which were using TeX fonts in both cases, but due to a bug 1.3/1.4 was rendering the IPA in serif when you had requested sans-serif.

Mike
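
The pdftocairo workaround mentioned in this reply can be scripted. Below is a minimal sketch, assuming the poppler `pdftocairo` tool is on the PATH; the helper names and the file name `figure.pdf` are placeholders, not anything from the thread. It rewrites a PDF with `pdftocairo -pdf` and reports the sizes before and after.

```python
import os
import shutil
import subprocess


def pdftocairo_command(src, dst):
    """Build the command line that rewrites src as a new PDF at dst."""
    return ["pdftocairo", "-pdf", src, dst]


def resave_pdf(src, dst):
    """Run pdftocairo if available; return (old_size, new_size) or None."""
    if shutil.which("pdftocairo") is None:
        return None  # poppler utils are not installed
    subprocess.run(pdftocairo_command(src, dst), check=True)
    return os.path.getsize(src), os.path.getsize(dst)


if __name__ == "__main__":
    # "figure.pdf" is a placeholder file name.
    if os.path.exists("figure.pdf"):
        print(resave_pdf("figure.pdf", "figure_subset.pdf"))
```

How much the file shrinks depends on the fonts and content involved, so the size pair is worth checking for each figure.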

Michael,

pdftocairo is a good tool to know, so thanks for that tip.

I still think that using the current ‘stamp’ method in all cases is a regression. I understand that in a complicated figure with a bunch of subplots this would be beneficial and produce smaller output. I don’t see how in single figures this would often result in reduced file sizes. I usually output single figures with one plot, and I don’t think any of the ones I am currently working on was smaller in 1.4.x. They all came out smaller with mpl 1.1.1. This figure of 3D spheres came to 60 kB, compared with roughly 80 kB after running pdftocairo. Anyway, you said that in coming versions a threshold should be set before stamping of objects occurs, so a fix is on the way eventually.

Thanks for all the help,

Jeff
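
The question of when path reuse helps can be checked empirically on any installed matplotlib. The sketch below is an illustration only: the function name `pdf_sizes` and the two test figures are made up for this purpose, and the numbers will depend on the matplotlib version. It saves one figure that draws a single marker shape many times and one that draws many distinct triangles, and returns the two PDF sizes in bytes.

```python
import io


def pdf_sizes(n=400):
    """Return (repeated_marker_bytes, unique_polygon_bytes), or None
    when matplotlib is not installed."""
    try:
        import matplotlib
        matplotlib.use("Agg")  # no display needed
        import matplotlib.pyplot as plt
        from matplotlib.collections import PolyCollection
    except ImportError:
        return None

    def save_size(draw):
        # Render the figure to an in-memory PDF and measure it.
        fig, ax = plt.subplots()
        draw(ax)
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
        plt.close(fig)
        return buf.getbuffer().nbytes

    grid = [(i % 20, i // 20) for i in range(n)]

    def repeated(ax):
        # One marker shape drawn n times.
        ax.scatter([x for x, _ in grid], [y for _, y in grid], marker="o")

    def unique(ax):
        # n triangles, each with a different size, so no shape repeats.
        tris = []
        for k, (x, y) in enumerate(grid):
            side = 0.2 + 0.6 * k / n
            tris.append([(x, y), (x + side, y), (x, y + side)])
        ax.add_collection(PolyCollection(tris))
        ax.autoscale_view()

    return save_size(repeated), save_size(unique)


print(pdf_sizes())
```

Comparing the two numbers across matplotlib versions gives a rough picture of how each version handles reused versus one-off paths.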

···

On Wed, Jul 31, 2013 at 11:31 PM, Michael Droettboom <mdroe@…86…> wrote:

  On 07/30/2013 04:20 PM, Jeffrey Spencer

wrote:

Michael,

      Thanks that is very informative. Answers most of the

problems I was having and read MEP14 which looks really useful

      That being said does the ps backend subset the fonts or use

collections for drawing (is the collections feature global or
just in the pdf backend)?

The ps backend has the same behavior as pdf on both counts.  TTF

fonts are subsetted, but the fonts that come from TeX come to use as
Type1 fonts, which matplotlib currently does not know how to
subset. It also handles collections in the same way (by creating a
“stamp” and reusing it).

      I usually use .eps output and convert to pdf using

epstopdf unless the figure has an alpha channel because always
results in a much smaller file (60kB roughly for this file or
plain figure around 10kB) than direct pdf output with the
output looking the same. I pretty much always have usetex=True
so maybe the pdf file is always embedding the full fonts.

Yes, when usetex=True, matplotlib does not do any font subsetting

(in any backend). To get around this limitation, one can use the
pdftocairo tool (part of poppler utils), to convert from pdf to a
pdf with subsetted fonts. With your example, I was able to get the
pdf down to ~80k. With MEP14, we would basically move such
functionality into matplotlib itself, but that’s sort of a long
term, semi-back-burner project so it could be a while.

It's possible that epstopdf is doing some font subsetting of its

own. But as you point out, Postscript (as a specification) doesn’t
support alpha, so it’s not useful when you need alpha.

      Also, does the Cairo backend support usetex=True or

subsetting? I know I had read it did not support usetex but
that was maybe 2 years ago or so. The x,y,z axis look correct
with cairo but the IPA Fonts don’t render properly. The legend
font says it is size 12 but if you zoom in extremely close you
can see they are the correct fonts just way to small. The file
size is around 60kB as well so I am guessing it supports
subsetting of fonts.

Cairo does support font subsetting, but the matplotlib Cairo backend

has no support for usetex. I’m surprised this worked for you at
all. When I run your example with the Cairo backend, the IPA
characters appear as raw TeX source code, i.e. “\textipa{i}”, which
is what I would expect given that the regular font renderer doesn’t
understand that syntax.

      The pgf backend would also subset fonts if output to .pdf

I’m assuming because that is the default with pdftex? It
results in similar size files to the .eps output for this file
(roughly 60kB also).

Yes.

      The IPA font uses the package (\usepackage{tipa}) and

therefore that is why I think these look differently. That
package draws these fonts with its’ font libraries instead of
whatever is selected as the text font. Maybe I’m wrong about
this but that is my understanding because even in normal latex
code the fonts look different than the standard text.

That is correct.  The default font for usetex=True is Computer

Modern, whereas it is Bitstream Vera Sans in the default font
rendering. I was referring to the difference between 1.2 and 1.4
which was using TeX fonts in both cases, but due to a bug in 1.3/1.4
was rendering the IPA in serif when you had requested sans-serif.

Mike

Cheers,

Jeff

      On Wed, Jul 31, 2013 at 4:43 AM,

Michael Droettboom <mdroe@…86…> wrote:

There are two different things going on here.

            Between 1.2.1 and now, there was a bugfix to the font

selection routine that inadvertently introduced a bug
selecting fonts in the usetex backend. You may notice
that on master, the IPA font selected is different. The
file size difference can be attributed to the slightly
larger font size of the one it selected vs. the one it
should have. Note that when usetex is True, the fonts
are not subsetted, so you always get the full font
embedded in the file (MEP14 work will fix this in the
future).

            See b5c340 for the bug that introduced the commit, and [https://github.com/matplotlib/matplotlib/pull/2260](https://github.com/matplotlib/matplotlib/pull/2260)
            for the fix (which should make it into 1.3.0 final).



            Between 1.1.1 and 1.2.1 a change was made in how

collections are handled. Previously, each path was
redrawn individually. In 1.2, if a path is reused
multiple times, a “stamp” is created and then it is
“used” multiple times. In principle, this generally
reduces file sizes by a large amount. However, in the
case of this figure with the 3D spheres, each path is
used only once, so rather than getting the file size
savings of that approach, we only get the overhead. The
backend could be smarter by not doing this when the path
is only used a small number of times. Such a fix would
be welcome, but is probably too large/risky to try to
get into the current release cycle. It will have to
wait for 1.3.1

            Cheers,

            Mike






                On 07/30/2013 12:24 PM, Jeffrey Spencer wrote:
                  K, I have just made the script

self-contained but it loads external data so I
have attached that as well. If you want me to just
separate out the plotting commands let me know. I
have also attached my matplotlib rc file which is
the same on all three systems. All the
modifications to the matplotlibrc file are copied
to the top and in the first 30 lines or so.

                    Of note, the smallest file sizes for pdf are

using the pgf backend around 60kb. Not sure if
that helps at all. It is also around the same
size if I export to .eps and then convert to
pdf. About 60kb. The problem with eps in these
3d figures though is the back wall I think has
an alpha channel because just becomes a solid
wall in the output. No lines through it like the
other two walls.


The case where stamping has an enormous impact is when the same shape
is used multiple times, for example in a scatter, hexbin or pcolor
plot.

Yes, but it’s too complex of a fix to throw in quickly. I think the
overall benefit of stamping is preferable to not doing it at all at
this point.

Mike

···

On 07/31/2013 10:38 AM, Jeffrey Spencer wrote:

Michael,

Pdftocairo is a good tool to know, so thanks for that tip.

I still think it is currently a regression for the ‘stamp’ method to
be used in all cases. I understand that in a complicated figure with a
bunch of subplots this would be beneficial and create smaller code. I
don’t see how in single figures this would often result in reduced
file sizes.

I usually output single figures with one plot, and I don't think any
of the ones I am currently working on was smaller in 1.4.x. They all
resulted in reduced file sizes with mpl 1.1.1. This figure of 3d
spheres resulted in 60kb instead of roughly 80kb after running
pdftocairo. Anyway, you said that in coming versions a threshold
should be set before stamping of objects occurs, so a fix is on the
way eventually.

Thanks for all the help,

Jeff

On Wed, Jul 31, 2013 at 11:31 PM, Michael Droettboom <mdroe@…86…> wrote:

On 07/30/2013 04:20 PM, Jeffrey Spencer wrote:

> Michael,
>
> Thanks, that is very informative. It answers most of the problems I
> was having, and I read MEP14, which looks really useful.
>
> That being said, does the ps backend subset the fonts or use
> collections for drawing (is the collections feature global or just
> in the pdf backend)?

The ps backend has the same behavior as pdf on both counts. TTF fonts
are subsetted, but the fonts that come from TeX come to us as Type1
fonts, which matplotlib currently does not know how to subset. It also
handles collections in the same way (by creating a “stamp” and reusing
it).

> I usually use .eps output and convert to pdf using epstopdf, unless
> the figure has an alpha channel, because it always results in a much
> smaller file (roughly 60kB for this file, or around 10kB for a plain
> figure) than direct pdf output, with the output looking the same. I
> pretty much always have usetex=True, so maybe the pdf file is always
> embedding the full fonts.

Yes, when usetex=True, matplotlib does not do any font subsetting (in
any backend). To get around this limitation, one can use the
pdftocairo tool (part of poppler utils) to convert from pdf to a pdf
with subsetted fonts. With your example, I was able to get the pdf
down to ~80k. With MEP14, we would basically move such functionality
into matplotlib itself, but that’s sort of a long term,
semi-back-burner project so it could be a while.
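
For anyone wanting to script the pdftocairo step described above, a minimal sketch follows. It assumes pdftocairo from poppler-utils is on the PATH; the function names `pdftocairo_command` and `shrink_pdf` are made up for this illustration and are not part of matplotlib or poppler.

```python
import os
import shutil
import subprocess


def pdftocairo_command(src, dst):
    """Build the pdftocairo call that rewrites a PDF as a new PDF."""
    # "-pdf" selects PDF output; the file is re-emitted through cairo,
    # which embeds subsetted fonts.
    return ["pdftocairo", "-pdf", src, dst]


def shrink_pdf(src, dst):
    """Run pdftocairo if it is installed.

    Returns (old_size, new_size) in bytes, or None when pdftocairo
    cannot be found on the PATH.
    """
    if shutil.which("pdftocairo") is None:
        return None
    subprocess.run(pdftocairo_command(src, dst), check=True)
    return os.path.getsize(src), os.path.getsize(dst)
```

Calling `shrink_pdf("figure.pdf", "figure_small.pdf")` on a usetex figure should report the before and after sizes; how much is saved depends on which fonts were embedded.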

It's possible that epstopdf is doing some font subsetting of its own.
But as you point out, PostScript (as a specification) doesn’t support
alpha, so it’s not useful when you need alpha.

> Also, does the Cairo backend support usetex=True or subsetting? I
> know I had read it did not support usetex, but that was maybe 2
> years ago or so. The x, y, z axes look correct with cairo but the
> IPA fonts don’t render properly. The legend font says it is size 12,
> but if you zoom in extremely close you can see they are the correct
> fonts, just way too small. The file size is around 60kB as well, so
> I am guessing it supports subsetting of fonts.

Cairo does support font subsetting, but the matplotlib Cairo backend
has no support for usetex. I’m surprised this worked for you at all.
When I run your example with the Cairo backend, the IPA characters
appear as raw TeX source code, i.e. “\textipa{i}”, which is what I
would expect given that the regular font renderer doesn’t understand
that syntax.

> I’m assuming the pgf backend would also subset fonts if output to
> .pdf, because that is the default with pdftex? It results in similar
> size files to the .eps output for this file (roughly 60kB also).

Yes.

> The IPA font uses the tipa package (\usepackage{tipa}), and that is
> why I think these look different. That package draws these fonts
> with its own font libraries instead of whatever is selected as the
> text font. Maybe I’m wrong about this, but that is my understanding,
> because even in normal latex code the fonts look different than the
> standard text.

That is correct. The default font for usetex=True is Computer Modern,
whereas it is Bitstream Vera Sans in the default font rendering. I was
referring to the difference between 1.2 and 1.4, which were using TeX
fonts in both cases, but due to a bug 1.3/1.4 was rendering the IPA in
serif when you had requested sans-serif.

Mike

> Cheers,
>
> Jeff

On Wed, Jul 31, 2013 at 4:43 AM, Michael Droettboom <mdroe@…86…> wrote:

There are two different things going on here.

Between 1.2.1 and now, there was a bugfix to the font selection
routine that inadvertently introduced a bug selecting fonts in the
usetex backend. You may notice that on master, the IPA font selected
is different. The file size difference can be attributed to the
slightly larger size of the font it selected vs. the one it should
have. Note that when usetex is True, the fonts are not subsetted, so
you always get the full font embedded in the file (MEP14 work will fix
this in the future).

See b5c340 for the commit that introduced the bug, and
https://github.com/matplotlib/matplotlib/pull/2260 for the fix (which
should make it into 1.3.0 final).

Between 1.1.1 and 1.2.1 a change was made in how collections are
handled. Previously, each path was redrawn individually. In 1.2, if a
path is reused multiple times, a “stamp” is created and then it is
“used” multiple times. In principle, this generally reduces file sizes
by a large amount. However, in the case of this figure with the 3D
spheres, each path is used only once, so rather than getting the file
size savings of that approach, we only get the overhead. The backend
could be smarter by not doing this when the path is only used a small
number of times. Such a fix would be welcome, but is probably too
large/risky to try to get into the current release cycle. It will have
to wait for 1.3.1.

Cheers,

Mike
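
The trade-off described above can be made concrete with a toy byte-count model. This is only an illustration of the idea, not how the pdf backend actually measures anything: the function names and the `define_overhead` and `use_cost` numbers are invented for the example.

```python
def inline_size(path_bytes, uses):
    """Bytes written when every use redraws the full path."""
    return path_bytes * uses


def stamped_size(path_bytes, uses, define_overhead=60, use_cost=25):
    """Bytes written when the path is defined once and then referenced.

    define_overhead and use_cost are made-up numbers for illustration.
    """
    return path_bytes + define_overhead + use_cost * uses


def break_even_uses(path_bytes, define_overhead=60, use_cost=25):
    """Smallest use count at which stamping is no larger than inlining.

    Returns None if a reference costs at least as much as the path
    itself, in which case stamping never pays off in this model.
    """
    if path_bytes <= use_cost:
        return None
    uses = 1
    while (stamped_size(path_bytes, uses, define_overhead, use_cost)
           > inline_size(path_bytes, uses)):
        uses += 1
    return uses
```

In this model a path used once always costs more when stamped (the 3D spheres case), while a marker repeated a thousand times costs far less. A threshold like the one suggested above would amount to stamping only when the use count reaches something like `break_even_uses`.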







Yeah, I plot with pcolor a lot but haven’t recently, so next time I do I’ll check. It would make a lot of sense for saving overhead there, as you have stated.

The overhead doesn’t seem to be too big for small plots, but I was just curious where it was most useful.

Cheers,

Jeff
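
One way to do that check is to save the figure into memory and count the bytes, so the same script can be run under different matplotlib versions. A rough sketch follows; `saved_size` and `demo` are hypothetical helper names, and the 50x50 random grid is arbitrary. The only matplotlib feature relied on is that `savefig` accepts a file-like object and a `format` argument.

```python
import io


def saved_size(fig, fmt="pdf"):
    """Return the number of bytes fig.savefig writes for the given format."""
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt)
    return buf.getbuffer().nbytes


def demo():
    """Print the PDF size of a small pcolor plot (needs matplotlib and numpy)."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no display needed
    import matplotlib.pyplot as plt
    import numpy as np

    fig, ax = plt.subplots()
    ax.pcolor(np.random.rand(50, 50))  # arbitrary test data
    print("pcolor PDF size:", saved_size(fig), "bytes")
```

Running `demo()` under each installed version and comparing the printed sizes should show whether stamping helps for a given kind of plot.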
