MEP14: Improve text handling

Michael_Droettboom · May 30, 2013, 3:59pm

I’ve drafted a MEP with a plan to improve some of the text and font
handling in matplotlib.

I'd love any and all feedback.

https://github.com/matplotlib/matplotlib/wiki/Mep14

Mike

_Chris.Barker · May 30, 2013, 6:27pm

Nice write-up, and thanks for working on this.

One idea (alternative?) would be to put more effort into the
"mathtext" renderer. TeX itself, of course, does an outstanding job of
laying out text, paragraphs, etc. I'm assuming that the core stuff is
already in mathtext, so adding better support for regular old non-math
text would be a less-than-huge deal. And we still wouldn't need the
full how-to-split-pages and all that code for MPL.

Not sure about properly handling unicode issues, though modern TeX
does support unicode.

With a fully functional mathtext, it could be the default (only?) text
layout system for MPL, simplifying things quite a bit.

... just a thought.

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

_Nicolas_P_Rougier1 · May 30, 2013, 7:33pm

For the FreeType wrapper, freetype-py may be of some help:
http://code.google.com/p/freetype-py/

I did not wrap all of the FreeType library, but it already allows a fair amount of font manipulation/rendering.

For Unicode/HarfBuzz, I've found this example (lxnt/ex-sdl-freetype-harfbuzz on GitHub, example code which uses SDL, FreeType, and HarfBuzz to do TTF/OTF text layout and rendering) to be incredibly useful for understanding the (poorly documented) library. The strong point of HarfBuzz is that it has no heavy dependencies (compared to Pango, for example). By the way, Behdad is considering a refactoring of the library, and it might be worth talking with him (on the HarfBuzz list) to see how this could make a Python wrapper easier (if you intend to use it, of course).

In the current draft, you mention rich text, but I found no reference to a possible markup (or equivalent) for specifying the font, color, boldness, etc.

Nicolas

Michael_Droettboom · May 31, 2013, 12:03am

> One idea (alternative?) would be to put more effort into the
> "mathtext" renderer. TeX itself, of course, does an outstanding job of
> laying out text, paragraphs, etc. I'm assuming that the core stuff is
> already in mathtext, so adding better support for regular old non-math
> text would be a less-than-huge deal. And we still wouldn't need the
> full how-to-split-pages and all that code for MPL.

That's an interesting idea, and one we should definitely ruminate on. It still doesn't address the Unicode issues, which are really complex to get right -- I'd really rather depend on something else for that. But what you suggest might be the best way forward to improve the built-in rendering for the good fraction of users who don't really care about Unicode.

> Not sure about properly handling unicode issues, though modern TeX
> does support unicode.

Right -- and I think moving to XeTeX for the "usetex" backend, which is now pretty widely available, might be a good improvement on that front. I still don't want to reimplement all of that, if I can avoid it.

> With a fully functional mathtext, it could be the default (only?) text
> layout system for MPL, simplifying things quite a bit.

I'm not sure that's realistic. The usetex backend gets a great deal of use, and I don't think it's only because it handles multiline text better -- it's also the easiest way to make the text match that of a larger TeX document in which it's included (though the new PGF backend goes some way to helping that in an entirely different way). It might be worth collating a list of reasons that users are using "usetex" to include in the MEP -- if we can address them all in another way, great, but if not, it's not too difficult to keep something that already works fairly well working. The problem I have with it is not really that it exists, only that it has tendrils all throughout matplotlib that could be better localized into a single set of modules.
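To make the idea of localizing usetex's tendrils more concrete, here is a minimal sketch of one possible shape for it: a registry of pluggable layout engines behind a single interface. Every name here (`TextLayoutEngine`, `Extent`, `register_engine`, `get_engine`, `MonospaceEngine`) is hypothetical and not part of matplotlib or of MEP14 as drafted; the toy engine's metrics are made up purely for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Extent:
    """Hypothetical layout result, in points."""
    width: float
    height: float
    descent: float


class TextLayoutEngine(ABC):
    """Hypothetical interface that every text-layout implementation provides."""

    @abstractmethod
    def layout(self, text: str, size: float) -> Extent:
        """Measure `text` at font size `size` (points)."""


_ENGINES: Dict[str, TextLayoutEngine] = {}


def register_engine(name: str, engine: TextLayoutEngine) -> None:
    _ENGINES[name] = engine


def get_engine(name: str) -> TextLayoutEngine:
    # The rest of the library only sees this interface, never engine internals,
    # so usetex-specific code could live entirely inside one engine class.
    try:
        return _ENGINES[name]
    except KeyError:
        raise ValueError(f"unknown text layout engine: {name!r}") from None


class MonospaceEngine(TextLayoutEngine):
    """Toy stand-in for the built-in renderer: every glyph is 0.6 em wide."""

    def layout(self, text: str, size: float) -> Extent:
        lines = text.split("\n")
        width = max(len(line) for line in lines) * 0.6 * size
        height = len(lines) * 1.2 * size
        return Extent(width=width, height=height, descent=0.2 * size)


register_engine("builtin", MonospaceEngine())
# A hypothetical UsetexEngine would be registered the same way, e.g.
# register_engine("usetex", UsetexEngine()), keeping all TeX calls in one module.
```

With this shape, the question "why do people use usetex?" turns into "which capabilities must an engine provide?", which is the list Mike suggests collating.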

> ... just a thought.

Thanks. Keep 'em coming!

Mike

Michael_Droettboom · May 31, 2013, 12:21am

> For the FreeType wrapper, freetype-py may be of some help:
> http://code.google.com/p/freetype-py/
>
> I did not wrap all of the FreeType library, but it already allows a fair amount of font manipulation/rendering.

I looked at this a number of years ago, and just looked at it again today. I think in general it's a better approach than what we have now in matplotlib, in that it's a thin wrapper around FreeType rather than a "just enough for what we need" approach, which should make things more flexible in the long run. It's a lot like what I have in mind.

However, I do have some concerns about it, and I'd like to get a sense of how receptive you'd be to these changes.

1) It's implemented in ctypes. I'm not much of a fan of ctypes, as it has the potential to segfault in nasty ways if the API changes in any way from what was expected (which would normally be caught at compile time in a C extension). I'm also concerned about the overhead of ctypes, given that there are already so many required optimizations in the matplotlib freetype wrapper to make it fast enough. But I'm willing to hold judgement on that until some measurements have been made.

2) It's not Numpy-aware. For example, it loads image buffers into regular Python lists. This really should use Numpy for speed.

3) It exposes the fixed point numbers to Python as integers -- it should really return all of these as floats -- the user shouldn't have to know or remember which values are 16.16 and which are 24.8 etc. It should just give floats. Double precision (with 52 bits in the mantissa) is enough for any of these 32-bit fixed-point values. I think that's just a remnant of older systems that needed to run on hardware without an FPU, and it doesn't need to be brought forward into the Python wrapper.

4) It should have another layer to handle the decoding of SFNT tables in a consistent manner. I know the sfnt-names.py example does this, but that should be built into the library. There are certain places where hiding the details of the underlying font file is a good thing -- and I think one of the reasons freetype doesn't do this is the lack of a standard Unicode type in C. We don't have that problem in Python.
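As an illustration of the SFNT decoding layer described in point 4, here is a minimal sketch that turns a raw `name`-table string into a Python `str` using its platform and encoding IDs. The function name `decode_name_record` is hypothetical (it is not part of freetype-py or FreeType), and it only covers the common cases: Unicode-platform (0) and Windows-platform (3) strings, which the OpenType spec stores as UTF-16BE, and Macintosh Roman (platform 1, encoding 0).

```python
# A few well-known name IDs from the OpenType 'name' table.
NAME_IDS = {1: "family", 2: "subfamily", 4: "full name", 6: "PostScript name"}


def decode_name_record(platform_id: int, encoding_id: int, raw: bytes) -> str:
    """Decode one raw string from an SFNT 'name' table (hypothetical helper)."""
    if platform_id in (0, 3):
        # Unicode (0) and Windows (3) platform strings are stored as UTF-16BE.
        return raw.decode("utf-16-be")
    if platform_id == 1 and encoding_id == 0:
        # Macintosh platform, Roman encoding.
        return raw.decode("mac_roman")
    # Other legacy Mac encodings would need their own codecs; fail loudly.
    raise NotImplementedError(
        f"unsupported name record: platform {platform_id}, encoding {encoding_id}"
    )
```

Raising on unknown encodings, rather than guessing, keeps mis-decoded font names from silently leaking into font matching.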

I think all of these are fixable by adding another layer on top, with the exception of (1), of course. Maybe it makes sense to build that intermediate layer, adapt matplotlib to it, benchmark the ctypes issue, and if necessary reimplement the core using the Python/C API.
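One way to read this plan is that the intermediate layer should depend only on a small interface, so a ctypes core could later be swapped for a compiled one without touching callers. A minimal sketch under that assumption; `LowLevelFace`, `Face` and `FakeCore` are hypothetical names, and the fake core stands in for either kind of real core by returning FreeType-style 26.6 fixed-point advances.

```python
from typing import Protocol


class LowLevelFace(Protocol):
    """The only thing the intermediate layer needs from a core."""

    def glyph_advance_26_6(self, char: str) -> int:
        """Horizontal advance of `char`, in 26.6 fixed point."""
        ...


class Face:
    """Intermediate layer: a Pythonic API that hides which core is in use."""

    def __init__(self, core: LowLevelFace) -> None:
        self._core = core

    def advance(self, char: str) -> float:
        # Callers always get floats, whatever the core returns internally.
        return self._core.glyph_advance_26_6(char) / 64.0

    def text_width(self, text: str) -> float:
        return sum(self.advance(char) for char in text)


class FakeCore:
    """Stand-in core for testing; a real one would call into FreeType."""

    def glyph_advance_26_6(self, char: str) -> int:
        return 640 if char == "W" else 384  # 10.0 and 6.0 in 26.6 format
```

Because `Face` only relies on the `LowLevelFace` protocol, benchmarking a ctypes core against a C-extension core would amount to constructing `Face` with each and timing the same calls.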

> For Unicode/HarfBuzz, I've found this example
>
> lxnt/ex-sdl-freetype-harfbuzz (GitHub): example code which uses SDL, FreeType, and HarfBuzz to do TTF/OTF text layout and rendering
>
> to be incredibly useful for understanding the (poorly documented) library. The strong point of HarfBuzz is that it has no heavy dependencies (compared to Pango, for example). By the way, Behdad is considering a refactoring of the library, and it might be worth talking with him (on the HarfBuzz list) to see how this could make a Python wrapper easier (if you intend to use it, of course).

That example is very helpful. Thanks. I should add to the MEP, for those who are not aware, that even though HarfBuzz is part of the Gtk/Gnome/Cairo ecosystem, it is a very standalone library in itself, and is the closest to "works everywhere with minimal requirements" of any of the available options. I should definitely clarify that even though there are many options for font layout libraries, including both cross-platform/open source and closed-source vendor ones, HarfBuzz could be the "one to rule them all", so we wouldn't necessarily need to wrap all of them.

> In the current draft, you mention rich text, but I found no reference to a possible markup (or equivalent) for specifying the font, color, boldness, etc.

Yeah -- I need to make that more explicit. I think MEP14 needs to consider the *possibility* of adding rich text support down the line so that the API can support it, but the details of how we might actually do that should be postponed for another MEP. It's already a lot to bite off as it is. Does that make sense to you -- are there things in the proposed API that would inhibit that from being added in the future?
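To illustrate what leaving room for rich text in the API might mean, here is a minimal sketch of one possible design (an assumption, not something MEP14 specifies): text is handled internally as a sequence of styled runs, so a plain string is just the one-run case and a future markup parser would only need to produce runs. `TextStyle`, `TextRun`, `as_runs` and `plain_text` are hypothetical names.

```python
from dataclasses import dataclass, field, replace
from typing import Sequence, Tuple, Union


@dataclass(frozen=True)
class TextStyle:
    family: str = "sans-serif"
    size: float = 10.0
    weight: str = "normal"
    color: str = "black"


@dataclass(frozen=True)
class TextRun:
    """A span of text sharing one style."""
    text: str
    style: TextStyle = field(default_factory=TextStyle)


def as_runs(text: Union[str, Sequence[TextRun]]) -> Tuple[TextRun, ...]:
    """Normalize input: a plain string becomes a single default-styled run."""
    if isinstance(text, str):
        return (TextRun(text),)
    return tuple(text)


def plain_text(runs: Sequence[TextRun]) -> str:
    """Drop styling, e.g. for a backend that cannot render mixed styles."""
    return "".join(run.text for run in runs)


# Plain and rich labels go through the same code path.
bold = replace(TextStyle(), weight="bold")
label = as_runs([TextRun("Temperature "), TextRun("(K)", bold)])
```

The design choice being illustrated is only that layout functions accept runs from day one; how users would write markup could still be decided in a later MEP.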

Cheers,
Mike

_Paul_Hobson · May 31, 2013, 1:10am

>> With a fully functional mathtext, it could be the default (only?) text
>> layout system for MPL, simplifying things quite a bit.
>
> I'm not sure that's realistic. The usetex backend gets a great deal of
> use, and I don't think it's only because it handles multiline text
> better -- it's also the easiest way to make the text match that of a
> larger TeX document in which it's included (though the new PGF backend
> goes some way to helping that in an entirely different way).

Exactly! I like that I can set text.usetex=True and add
\usepackage{fourier} and I *know* that my figures and document will look
the same.

That said, I've never been able to get the PGF backend to work well. Random
elements are pixelated. It's surely user error on my end, but usetex is
comparatively easy to set up.

> It might
> be worth collating a list of reasons that users are using "usetex" to
> include in the MEP -- if we can address them all in another way, great,
> but if not, it's not too difficult to keep something that already works
> fairly well working. The problem I have with it is not really that it
> exists, only that it has tendrils all throughout matplotlib that could
> be better localized into a single set of modules.

As I state above -- I absolutely require One Font throughout my documents.
If it's a serif font, I use the fourier TeX package. If it's a sans-serif
font, I do the weird \sansmath voodoo (I still owe you a PR with an example
of setting that up). Point is, it works well.
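For anyone wanting to reproduce this kind of setup, here is a sketch of the rcParams involved. It assumes a recent matplotlib, where `text.latex.preamble` is a single string (older releases took a list of strings), and a LaTeX installation that includes the `fourier` and `sansmath` packages. The helper name `use_tex_fonts` is hypothetical, and matplotlib's own rc file warns that custom preambles are unsupported, so treat this as a starting point rather than Paul's exact configuration.

```python
import matplotlib


def use_tex_fonts(serif: bool = True) -> None:
    """Render all figure text with LaTeX so it matches the surrounding document."""
    matplotlib.rcParams["text.usetex"] = True
    if serif:
        # fourier: Utopia text with matching math fonts.
        matplotlib.rcParams["font.family"] = "serif"
        matplotlib.rcParams["text.latex.preamble"] = r"\usepackage{fourier}"
    else:
        # sansmath provides \sansmath, which switches math to sans-serif.
        matplotlib.rcParams["font.family"] = "sans-serif"
        matplotlib.rcParams["text.latex.preamble"] = (
            r"\usepackage{sansmath}\sansmath"
        )
```

Only the rcParams are set here; LaTeX is not invoked until a figure with text is actually drawn.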

Cheers,
-paul

Michael_Droettboom · May 31, 2013, 3:21am

Additionally, I just discovered that ctypes isn't available on Google App Engine, for obvious security reasons. That, unfortunately, more or less makes it a non-starter for matplotlib.

I wish that weren't the case, but I think Google App Engine support is an important thing to keep going...
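If ctypes can be missing on some platforms, any ctypes-based code path would have to be optional. A minimal sketch of that kind of feature detection using `importlib.util.find_spec`, which locates a top-level module without importing it; `module_available` and `pick_freetype_wrapper` are hypothetical helpers, and real module names to pass in would depend on how the wrappers end up packaged.

```python
import importlib.util
from typing import Optional


def module_available(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_freetype_wrapper(compiled: str, ctypes_based: str) -> Optional[str]:
    """Prefer a compiled extension; use a ctypes wrapper only if ctypes exists."""
    if module_available(compiled):
        return compiled
    if module_available("ctypes") and module_available(ctypes_based):
        return ctypes_based
    return None
```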

Mike

_Nicolas_P_Rougier1 · May 31, 2013, 6:29am

> 1) It's implemented in ctypes. I'm not much of a fan of ctypes, as it
> has the potential to segfault in nasty ways if the API changes in any
> way from what was expected (which would normally be caught at compile
> time in a C extension). I'm also concerned about the overhead of
> ctypes, given that there are already so many required optimizations in
> the matplotlib freetype wrapper to make it fast enough. But I'm willing
> to hold judgement on that until some measurements have been made.

I would never have thought ctypes could be a speed problem, and I have never benchmarked freetype-py. I'm not sure how to do that, though.
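One rough way to start is to time a trivial C function called through ctypes against an equivalent Python builtin, which isolates the per-call overhead. A minimal sketch, assuming a Unix-like system where `ctypes.util.find_library("c")` finds the C library; libc's `abs` is used only because it is trivial. It also shows the hand-written `argtypes`/`restype` declarations that ctypes relies on: they make ctypes reject Python arguments of the wrong type, but nothing checks them against the real C header, which is the compile-time safety Mike mentions losing.

```python
import ctypes
import ctypes.util
import timeit

# On Unix-like systems this finds e.g. libc.so.6 (Linux) or libc.dylib (macOS).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

c_abs = libc.abs
# Declared by hand: ctypes trusts these, nothing compares them to <stdlib.h>.
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int


def per_call_seconds(func, arg, number: int = 100_000) -> float:
    """Average wall-clock seconds per call (includes the lambda's own cost)."""
    return timeit.timeit(lambda: func(arg), number=number) / number


if __name__ == "__main__":
    t_ctypes = per_call_seconds(c_abs, -5)
    t_builtin = per_call_seconds(abs, -5)
    print(f"ctypes libc abs: {t_ctypes * 1e9:.0f} ns/call")
    print(f"builtin abs:     {t_builtin * 1e9:.0f} ns/call")
```

The same harness could wrap a real freetype-py call (for example, loading and rendering one glyph) to compare against matplotlib's current extension.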

> 2) It's not Numpy-aware. For example, it loads image buffers into
> regular Python lists. This really should use Numpy for speed.

Yes, and I recently discovered it may make things really slow in some cases.
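For image buffers, the usual remedy is to view the bitmap bytes as a NumPy array instead of building a list. A minimal sketch, assuming an 8-bit grayscale bitmap described by `rows`, `width` and a positive `pitch` (bytes per row, possibly including padding), as in FreeType's `FT_Bitmap`; the function name is hypothetical.

```python
import numpy as np


def bitmap_to_array(buffer: bytes, rows: int, width: int, pitch: int) -> np.ndarray:
    """View an 8-bit grayscale glyph bitmap as a (rows, width) uint8 array.

    Assumes top-down rows (pitch >= width > 0); row padding is sliced off.
    """
    flat = np.frombuffer(buffer, dtype=np.uint8, count=rows * pitch)
    return flat.reshape(rows, pitch)[:, :width]
```

The result is a read-only view of the original bytes with no per-pixel Python work; call `.copy()` if the array needs to be modified.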

> 3) It exposes the fixed point numbers to Python as integers -- it should
> really return all of these as floats -- the user shouldn't have to know
> or remember which values are 16.16 and which are 24.8 etc. It should
> just give floats. Double precision (with 52 bits in the mantissa) is
> enough for any of these 32-bit fixed-point values. I think that's just a
> remnant of older systems that needed to run on hardware without an FPU,
> and it doesn't need to be brought forward into the Python wrapper.

You're right. I tried to keep the very low level close to the FreeType implementation and types, and the mid-level wrapper should use floats everywhere (I may need to check that).
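Mike's precision argument is easy to check: a double has a 53-bit significand (52 stored bits plus an implicit leading bit), so any 32-bit fixed-point value converts to float exactly and back again. A minimal sketch of such conversions for FreeType's 16.16 (`FT_Fixed`) and 26.6 (`FT_F26Dot6`) formats; the helper names are hypothetical.

```python
def fixed_to_float(value: int, frac_bits: int) -> float:
    """Signed fixed-point integer -> float. Exact for 32-bit inputs."""
    # The integer fits in a double's significand, and dividing by a power
    # of two only changes the exponent, so no rounding happens.
    return value / (1 << frac_bits)


def float_to_fixed(x: float, frac_bits: int) -> int:
    """Float -> fixed point, rounding to the nearest representable value."""
    return round(x * (1 << frac_bits))


def from_16_16(value: int) -> float:  # FreeType FT_Fixed
    return fixed_to_float(value, 16)


def from_26_6(value: int) -> float:  # FreeType FT_F26Dot6 (e.g. glyph metrics)
    return fixed_to_float(value, 6)
```

A mid-level wrapper would apply these at the boundary, so Python users only ever see floats.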

This, plus your comment on Google App Engine, makes me think that freetype-py might not be so useful in the end. Anyway, I would gladly (try to) contribute to the new system.

Nicolas

_Chris.Barker · June 3, 2013, 8:26pm

>> I'm also concerned about the overhead of
>> ctypes, given that there are already so many required optimizations in
>> the matplotlib freetype wrapper to make it fast enough. But I'm willing
>> to hold judgement on that until some measurements have been made.
>
> I would never have thought ctypes could be a speed problem, and I have never benchmarked freetype-py.

Well, I see it this way -- for high-performing Python code, you often
need to "vectorize" operations one way or another, i.e. if you need to
do a given operation on a bunch of numbers, objects, whatever, you
need to be able to pass the collection in to lower-level code, so you
don't have all the overhead of Python function calls, dynamic typing,
etc., inside your loop.
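The same principle in miniature, using NumPy rather than FreeType (an illustration of the idea, not freetype-specific code): the loop version pays Python-level dispatch and dynamic-typing costs for every element, while the vectorized version hands the whole array to compiled code in one call.

```python
import numpy as np


def sum_of_squares_loop(values) -> float:
    # One Python-level multiply and add (with type dispatch) per element.
    total = 0.0
    for v in values:
        total += v * v
    return float(total)


def sum_of_squares_vectorized(values: np.ndarray) -> float:
    # One call; the loop over elements runs in compiled code.
    return float(np.dot(values, values))
```

Both return the same result; only where the loop runs differs.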

Many (most) C libraries are not designed this way. So when writing
Python wrappers, you need to loop through a sequence in Python and
call the underlying C function for each item. With ctypes, you write
that loop in Python; with Cython, it's easy to write that loop in
Cython, which gets compiled down to C -- you can get major performance
benefits from this.

And Cython is almost as easy to write as Python.

How this applies to FreeType, I don't know.
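To make the ctypes side of this concrete: when a C function handles one item at a time, a ctypes wrapper has to drive the loop from Python, crossing the Python/C boundary once per item. A minimal sketch, assuming a Unix-like libc found via `ctypes.util.find_library("c")`; libc's `toupper` stands in for any per-item C function, and `bytes.upper()` stands in for a single call whose loop runs in C. For non-ASCII bytes the two can differ depending on the C locale, so the check below sticks to ASCII.

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
c_toupper = libc.toupper
c_toupper.argtypes = [ctypes.c_int]
c_toupper.restype = ctypes.c_int


def upper_per_item(data: bytes) -> bytes:
    # One ctypes call (with argument conversion) per byte, looped in Python.
    return bytes(c_toupper(b) for b in data)


def upper_one_call(data: bytes) -> bytes:
    # A single call; the per-byte loop happens inside compiled code.
    return data.upper()
```

Timing the two on a long input with `timeit` shows the per-item boundary cost Chris describes; in Cython, the equivalent loop would be compiled and avoid that cost.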

> 2) It's not Numpy-aware. For example, it loads image buffers into
> regular Python lists. This really should use Numpy for speed.

You can do this with ctypes, and it would work fine for image buffers, but
maybe not as well as Cython for, say, a large sequence of characters...
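For the image-buffer case, ctypes can indeed hand C code a pointer straight into a NumPy array, with no per-element Python work. A minimal sketch that uses `ctypes.memset` as a stand-in for a real C routine; a FreeType wrapper might pass `arr.ctypes.data_as(...)` to its own functions in the same way, which is an assumption about how such a wrapper could be structured.

```python
import ctypes

import numpy as np


def fill_with_c(arr: np.ndarray, value: int) -> None:
    """Fill a C-contiguous uint8 array in place via one C call."""
    if arr.dtype != np.uint8 or not arr.flags["C_CONTIGUOUS"]:
        raise ValueError("expected a C-contiguous uint8 array")
    # arr.ctypes.data is the raw address of the array's data buffer.
    ctypes.memset(arr.ctypes.data, value, arr.nbytes)


def as_byte_pointer(arr: np.ndarray):
    """Typed pointer for a ctypes function that takes `unsigned char *`."""
    return arr.ctypes.data_as(ctypes.POINTER(ctypes.c_ubyte))
```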

-Chris
