TTF subsetting in PDF

I just committed changes that add TTF subsetting to the PDF backend. It is completely analogous to the font subsetting recently added to the PS backend.

I have added a configuration option, pdf.fonttype, to choose either "Type3" or "Truetype" font output. This may be removed in the future once the "Type3" stuff has been sufficiently tested.

Some results:

fonts_demo_kw.py: 201744 -> 37326
mathtext_demo.py: 129306 -> 26179
unicode_demo.py: 45303 -> 20084
over all demos in backend_driver.py: 5856001 -> 3390460

The differences aren't as dramatic as with PostScript, but IMHO they are still large enough to be worthwhile.

Again, please help by testing with your own favorite PDF tools.

Gory details about composite characters follow -->

In this new code, composite characters (such as a character composed of a letter and an accent) aren't handled as compactly as they could be. According to the PDF spec, a PDF-1.2 (Acrobat 3.x) Type 3 font can reference other glyphs with the "Do" command, to avoid duplicating the components of a composite glyph. I was able to get this to work with Acrobat 7, but xpdf-3.0 and ggv 2.8.0 both choked on the file. Therefore, I decided to err on the side of compatibility by including each component of a composite character inline where it is used. This makes the PDF files larger than they would otherwise have to be. However, it should only be a real problem if a plot contains an inordinate number of different accented characters.
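
As a rough illustration of the trade-off described above, here is a toy model in Python. It is not the backend's code and does not emit real PDF syntax: the glyph names, the outline byte counts, and the per-reference cost `REF_COST` are all made up for the example. It only shows why inlining costs more as the number of distinct accented characters grows.

```python
# Toy model only: hypothetical glyph names and made-up byte counts.
# SIMPLE maps a glyph name to the size of its outline description.
SIMPLE = {"a": 120, "e": 110, "acute": 40, "grave": 40}

# COMPOSITE maps a composite glyph to the simple glyphs it is built from.
COMPOSITE = {
    "aacute": ["a", "acute"],
    "eacute": ["e", "acute"],
    "agrave": ["a", "grave"],
}

REF_COST = 12  # hypothetical bytes needed to reference another glyph


def inlined_size(used):
    """Each composite carries its own copy of every component outline."""
    total = 0
    for glyph in used:
        if glyph in COMPOSITE:
            total += sum(SIMPLE[part] for part in COMPOSITE[glyph])
        else:
            total += SIMPLE[glyph]
    return total


def referenced_size(used):
    """Each component outline is stored once; composites only point to it."""
    needed = set()
    refs = 0
    for glyph in used:
        if glyph in COMPOSITE:
            needed.update(COMPOSITE[glyph])
            refs += REF_COST * len(COMPOSITE[glyph])
        else:
            needed.add(glyph)
    return refs + sum(SIMPLE[part] for part in needed)


used = ["a", "aacute", "agrave", "eacute"]  # distinct glyphs in a plot
print("inlined:   ", inlined_size(used))
print("referenced:", referenced_size(used))
```

With no composite glyphs in `used`, both functions return the same size, which matches the point that inlining only matters when many different accented characters appear.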

Cheers,
Mike

When you say "over all demos" do you mean just over the PS or PDF
output, depending on which you are testing? I'm a bit confused because
the single examples you show have between a 5 and 20 fold improvement,
but the overall number is less than 2 fold. So I wonder if you are
including the other backend driver output (PNG, SVG, etc.). Just
curious.

JDH

On 7/10/07, Michael Droettboom <mdroe@...31...> wrote:

I just committed changes that add TTF subsetting to the PDF backend. It
is completely analogous to the font subsetting recently added to the PS
backend.

I have added a configuration option, pdf.fonttype, to choose either
"Type3" or "Truetype" font output. This may be removed in the future
once the "Type3" stuff has been sufficiently tested.

Some results:

fonts_demo_kw.py: 201744 -> 37326
mathtext_demo.py: 129306 -> 26179
unicode_demo.py: 45303 -> 20084
over all demos in backend_driver.py: 5856001 -> 3390460

John Hunter wrote:

fonts_demo_kw.py: 201744 -> 37326
mathtext_demo.py: 129306 -> 26179
unicode_demo.py: 45303 -> 20084
over all demos in backend_driver.py: 5856001 -> 3390460

When you say "over all demos" do you mean just over the PS or PDF
output, depending on which you are testing? I'm a bit confused because
the single examples you show have between a 5 and 20 fold improvement,
but the overall number is less than 2 fold. So I wonder if you are
including the other backend driver output (PNG, SVG, etc.). Just
curious.

The comparison is just over the PDF files, old way (Truetype embedding) vs. new way (Type 3 subsetting). The ratios are different because I chose to highlight the examples that are quite "texty". That wasn't a deliberate attempt to mislead; it's just that this change is related to fonts. Most of the other examples use a single font, and the plotting content itself dominates the file size.
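
To make the arithmetic behind this explanation concrete, the following sketch recomputes the ratios from the byte counts quoted earlier in the thread. It assumes (this is not stated explicitly in the thread) that the three highlighted demos are among the files produced by backend_driver.py, so their sizes can be subtracted from the total to estimate the remaining demos. The function and variable names are only illustrative.

```python
# Byte counts quoted in this thread:
# old way (Truetype embedding) -> new way (Type 3 subsetting).
texty = {
    "fonts_demo_kw.py": (201744, 37326),
    "mathtext_demo.py": (129306, 26179),
    "unicode_demo.py": (45303, 20084),
}
all_old, all_new = 5856001, 3390460  # all demos in backend_driver.py


def factor(old, new):
    """How many times smaller the new output is than the old one."""
    return old / new


for name, (old, new) in texty.items():
    print(f"{name}: {factor(old, new):.2f}x smaller")

# Assumption: the three demos above are included in the overall totals,
# so subtracting them leaves the combined size of the remaining demos.
texty_old = sum(old for old, _ in texty.values())
texty_new = sum(new for _, new in texty.values())
rest_old = all_old - texty_old
rest_new = all_new - texty_new

print(f"all demos:       {factor(all_old, all_new):.2f}x smaller")
print(f"remaining demos: {factor(rest_old, rest_new):.2f}x smaller")
print(f"texty demos' share of the old total: {texty_old / all_old:.1%}")
```

The overall figure is a size-weighted ratio, so under that assumption the many plot-heavy files, which make up most of the bytes, pull it toward their own smaller factor.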

Cheers,
Mike


This must be dominated by some weird outliers. I'm seeing great
results with the canonical "simple_plot":

johnh@...539...:examples> python simple_plot.py -dPS
johnh@...539...:examples> mv simple_plot.ps new.ps
johnh@...539...:examples> PYTHONPATH=/my/old/site-packages python simple_plot.py -dPS
johnh@...539...:examples> mv simple_plot.ps old.ps
johnh@...539...:examples> ls -l old.ps new.ps
-rw-r--r-- 1 johnh research 19352 Jul 10 12:11 new.ps
-rw-r--r-- 1 johnh research 144227 Jul 10 12:11 old.ps

Though for some reason my 90.1 install is picking up Vera Serif and my
svn install is picking up Vera Sans, which is mysterious but
unrelated to your work.

In any case, excellent work!

JDH


John Hunter wrote:

The comparison is just over the PDF files, old way (Truetype embedding)
vs. new way (Type 3 subsetting).

[...]

In any case, excellent work!

JDH

Mike,

I second that! I greatly appreciate your contributions, first in chasing down memory leaks and now in reducing file sizes by embedding fonts.

Eric


Eric Firing wrote:

I second that! I greatly appreciate your contributions, first in chasing down memory leaks and now in reducing file sizes by embedding fonts.

It's been fun.

Now, Eric, I'm just waiting for you to tell me how this latest batch reveals another bug on Ubuntu ;) (with all due respect to Ubuntu)

Cheers,
Mike

Michael Droettboom wrote:

Eric Firing wrote:

I second that! I greatly appreciate your contributions, first in chasing down memory leaks and now in reducing file sizes by embedding fonts.

It's been fun.

Mike,

Good--what's next? You're ready for more fun, I hope. If you are looking for brain-benders, I know of two bugs lurking deep in extension code (one in cntr.c, the other in the Agg quadmesh rendering) that have completely eluded me. There is also a bug in the Agg image rendering that shows up at high magnification. If you are interested in grand strategy, let's discuss what a suitable target might be--maybe some refactoring and consolidation of backend code to reduce duplication, and make optimizations applicable to all (referring to recent work by Allan). A related idea that has been languishing is to consolidate the various image-like functionality, again so as to make maximum use of the different optimizations and options that are now spread among different functions. John probably has better ideas about what to attack.

Now, Eric, I'm just waiting for you to tell me how this latest batch reveals another bug on Ubuntu ;) (with all due respect to Ubuntu)

So, you use RHEL4 at work and Ubuntu at home; has the former actually managed to put together a set of relatively bug-free versions of GUI toolkits and other libraries? Or is it a choice between old bugs and newer bugs? Given all the different constantly-changing libraries, compilers, etc. involved in something like mpl, I am sometimes amazed that it works at all--and worried that at any moment it may cease to work.

Eric
