After some thorough research on the subject I decided to post my

conclusions/thoughts here. Beware, this is a long one.

# Font problems

There are currently no good, complete, free, Unicode Open/TrueType math
fonts. We will have to wait for the STIX fonts. The STIX site says that
the beta version of the fonts will be available in September, so
probably the next SoC could cover that - if we're lucky ;).

I had a look at the following Open/TrueType unicode fonts:

* CMU fonts. These fonts have practically no math symbols, so

they're not a solution.

* The fonts used by Open Office - Open Symbol (opens___.ttf), which

has a decent set of Unicode symbols. These fonts were made to play

well with Times, and could be used in mathtext with perhaps Nimbus

Roman fonts.

* FreeFont. GPL fonts, available on any Linux box. They have an

extensive list of supported symbols. Probably the best free TrueType

fonts out there.

The best solution to the problem of good fonts would be using the

currently available CM and AMS (and other) Type1 fonts which are free

and come with every TeX distribution. These fonts are complete, and

have pretty good Unicode support, which is illustrated by the following

code:

    from matplotlib.ft2font import FT2Font
    import unicodedata

    # Path to a Type1 font
    filename = r'c:\texmf\fonts\type1\bluesky\symbols\msam10.pfb'
    f = FT2Font(filename)

    # The charmap is iterated here as (glyph index, character code) pairs
    indexes = f.get_charmap()
    for index, uni in indexes.items():
        try:
            name = unicodedata.name(unichr(uni))
        except ValueError:
            # The Unicode database has no name for this code point
            name = None
        print f.get_glyph_name(index), index, name, repr(unichr(uni))

which outputs

    space 128 SPACE u' '
    diamond 6 BLACK DIAMOND SUIT u'\u2666'
    therefore 41 THEREFORE u'\u2234'
    because 42 BECAUSE u'\u2235'
    muchless 110 MUCH LESS-THAN u'\u226a'
    muchgreater 111 MUCH GREATER-THAN u'\u226b'
    dblarrowleft 18 LEFT RIGHT DOUBLE ARROW u'\u21d4'
    dblarrowright 19 RIGHTWARDS DOUBLE ARROW u'\u21d2'
    lessorgreater 55 LESS-THAN OR GREATER-THAN u'\u2276'
    greaterorless 63 GREATER-THAN OR LESS-THAN u'\u2277'
    angle 92 ANGLE u'\u2220'
    proportional 95 PROPORTIONAL TO u'\u221d'

The msam10 font was used in the code above, but other fonts behave similarly.

Unfortunately, the most important function in the FT2Font class

f.get_glyph(index)

raises

ValueError: Glyph index out of range

for Type1 fonts, but I think that this could be easily fixed.

Current C++ code for get_glyph:

    char FT2Font::get_glyph__doc__[] =
    "get_glyph(num)\n"
    "\n"
    "Return the glyph object with num num\n"
    ;

    Py::Object
    FT2Font::get_glyph(const Py::Tuple & args){
      _VERBOSE("FT2Font::get_glyph");
      args.verify_length(1);
      int num = Py::Int(args[0]);

      if ( (size_t)num >= gms.size())
        throw Py::ValueError("Glyph index out of range");

      //todo: refcount?
      return Py::asObject(gms[num]);
    }

The problem with this solution (if we get get_glyph to work with

Type1) could be the backends. Agg wouldn't have to change much (if at

all), but I don't know about the PS and SVG backends. Type 1 fonts are

installable on both Windows (via .pfm files) and Unix systems, so I

guess SVG files could be viewed/changed without much hassle, and the

PS backend could be changed a bit to support Type1 fonts.

Also, all the characters are spread around in a pretty large number of

files, but I suppose that with a little code this can be overcome.
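
The "little code" could be a merged index that remembers, for each Unicode code point, which font file provides it. Here is a minimal sketch in Python 3, assuming each font's charmap has already been read into a dict of glyph index to character code (the orientation used in the msam10 listing). The function name `build_glyph_index` and the `cmsy10` entry are hypothetical; only the msam10 entries are taken from the output shown earlier.

```python
def build_glyph_index(charmaps):
    """Merge per-font charmaps into {code point: (font name, glyph index)}.

    `charmaps` maps a font name to a dict of glyph index -> code point.
    When several fonts provide the same code point, the first font wins.
    """
    index = {}
    for font_name, charmap in charmaps.items():
        for glyph_index, codepoint in charmap.items():
            index.setdefault(codepoint, (font_name, glyph_index))
    return index


# Sample data: the msam10 entries come from the listing earlier in this
# post; the cmsy10 entry is made up for illustration.
charmaps = {
    "msam10": {41: 0x2234, 42: 0x2235, 92: 0x2220},
    "cmsy10": {7: 0x2234},
}
index = build_glyph_index(charmaps)
print(index[0x2220])  # ('msam10', 92)
```

Letting the first font win is only one possible policy; a real version would probably want an explicit priority order.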

# Unicode problems

The following is assembled from the report "Unicode Support for

Mathematics", which is the primary source of information regarding

mathematics and Unicode.

The biggest problem with *proper* math Unicode is the "Mathematical

Alphanumeric Symbols", which are found in the 1D400..1D7FF range, not

in the Basic Multilingual Plane. These are not found in any free font.

I also noticed that Python's support for Unicode outside the BMP is
not very good. The following example works on Linux (Ubuntu 6.06), but
doesn't work on Windows XP (32-bit):

    >>> import unicodedata
    >>> unicodedata.name(u'\U0001d400')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: need a single Unicode character as parameter

The output should say:

MATHEMATICAL BOLD CAPITAL A
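
The failure is what one would expect from a "narrow" Python build, which presumably is what runs on Windows: characters outside the BMP are stored as a UTF-16 surrogate pair, so the literal becomes two code units instead of one character. Below is a small sketch of that arithmetic, written for Python 3 (where the narrow/wide distinction no longer exists). The helper names are my own.

```python
import unicodedata


def to_surrogates(codepoint):
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    offset = codepoint - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Combine a UTF-16 surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1D400)
print(hex(high), hex(low))  # 0xd835 0xdc00
print(unicodedata.name(chr(from_surrogates(high, low))))
# MATHEMATICAL BOLD CAPITAL A
```

On a narrow build, code that needs the real code point would have to recombine the pair like this before looking anything up.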

The "Mathematical Alphanumeric Symbols" block contains:

* Mathematical bold letters

* Mathematical italic letters (used for variables, default font in

TeX math mode)

* Mathematical bold italic letters

* Mathematical script (calligraphic) letters

* Mathematical bold script letters

* Mathematical fraktur letters

* Mathematical double-struck letters

* Mathematical bold fraktur letters

* Mathematical sans-serif letters

* Mathematical sans-serif bold letters

* Mathematical sans-serif italic letters

* Mathematical sans-serif bold italic letters

* Mathematical monospace letters

* Dotless symbols

* Bold Greek symbols

* Additional bold Greek symbols

* Italic Greek symbols

* Additional italic Greek symbols

* Bold italic Greek symbols

* Additional bold italic Greek symbols

* Sans-serif bold Greek symbols

* Sans-serif bold italic Greek symbols

* Additional sans-serif bold Greek symbols

* Additional sans-serif bold italic Greek symbols

* Bold digits

* Double-struck digits

* Sans-serif digits

* Sans-serif bold digits

* Monospace digits

These were all put in the Unicode character set because of their

semantic meanings in mathematics, although practically all are just

font variations (<font>). The roman math letters (serif, normal, used

for digits) default to the "Basic Latin" block.
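
Because these are font variations of the same letters, the code point of a styled letter can be computed from the plain one by adding an offset. Here is a rough sketch in Python 3 covering only the bold and italic Latin runs; the function name `math_letter` and the two tables are my own. It also shows one catch: a few positions in the block are reserved because the character already existed in the BMP (italic small h is U+210E PLANCK CONSTANT).

```python
import unicodedata

# First code point of the capital and small runs for each style.
STYLE_STARTS = {
    "bold": (0x1D400, 0x1D41A),
    "italic": (0x1D434, 0x1D44E),
}

# Reserved positions whose character lives elsewhere in Unicode.
EXCEPTIONS = {0x1D455: 0x210E}  # italic small h -> PLANCK CONSTANT


def math_letter(letter, style):
    """Return the styled math character for a single ASCII letter."""
    upper_start, lower_start = STYLE_STARTS[style]
    if len(letter) == 1 and "A" <= letter <= "Z":
        codepoint = upper_start + ord(letter) - ord("A")
    elif len(letter) == 1 and "a" <= letter <= "z":
        codepoint = lower_start + ord(letter) - ord("a")
    else:
        raise ValueError("expected a single ASCII letter, got %r" % (letter,))
    return chr(EXCEPTIONS.get(codepoint, codepoint))


print(unicodedata.name(math_letter("A", "bold")))
# MATHEMATICAL BOLD CAPITAL A
print(unicodedata.name(math_letter("h", "italic")))
# PLANCK CONSTANT
```

The other styles (script, fraktur, double-struck) have more reserved positions of this kind, so a complete table would need more exceptions.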

It is interesting to note that the "Mathematical Alphanumeric Symbols"

block doesn't seem to be supported by, for example, Arial Unicode MS

(it supports only the BMP).

This issue cannot be successfully solved until the STIX fonts come

out. If they package them right (and they ought to), we could have a

single .ttf file for all the glyphs needed for mathtext. Until then,

any solution will need some sort of mapping between Unicode blocks
(character ranges) and font files (at least for italic, calligraphic,
etc. fonts).
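
Such a mapping could be as simple as a sorted table of ranges with a binary search on top. Here is a sketch in Python 3: the block boundaries are the real Unicode ones, but the font file names are placeholders and `font_for` is a name I made up.

```python
from bisect import bisect_right

# (first code point, last code point, font file), sorted by first code point.
# The file names are placeholders, not real fonts.
RANGES = [
    (0x0000, 0x007F, "roman.ttf"),        # Basic Latin
    (0x0370, 0x03FF, "greek.ttf"),        # Greek and Coptic
    (0x2200, 0x22FF, "symbols.ttf"),      # Mathematical Operators
    (0x1D400, 0x1D7FF, "mathalpha.ttf"),  # Mathematical Alphanumeric Symbols
]
RANGE_STARTS = [first for first, _, _ in RANGES]


def font_for(codepoint):
    """Return the font file covering `codepoint`, or None if no range does."""
    i = bisect_right(RANGE_STARTS, codepoint) - 1
    if i >= 0:
        first, last, fontfile = RANGES[i]
        if codepoint <= last:
            return fontfile
    return None


print(font_for(ord("a")))  # roman.ttf
print(font_for(0x2234))    # symbols.ttf
print(font_for(0x1D400))   # mathalpha.ttf
print(font_for(0x0100))    # None
```

Once a single font covers everything, the table shrinks to one entry and the rest of the code would not have to change.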

# Possible enhancements

I think there should be a thin Python wrapper around the FreeType2

FT2Font class. Then, for example, all the caching could be handled by

that class. This would allow not only caching for mathtext, but even

for *plain text* and would clean up code. This would also allow adding

new functionality, without messing around with C++, and without

breaking old code.

One could then, for example, have an FT2Font method
get_unicode_glyph that would return the glyph for a given Unicode
code point, or better yet, the following code would be easy to implement:

    glyphs = FT2Font('/path/to/font')
    glypha = glyphs['a']

or even:

    text_to_render = glyphs.text('Some lame text')

or something similar. Again, this would not break old code and would

ease writing new code. However, as John once said:

The font library is probably an SOC project of

it's own, because we would like to settle on one freetype library that

both matplotlib and enthought/chaco can use. How to deal with this

issue without becoming consumed by it will require some thought.
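
To make the wrapper idea a bit more concrete, here is a rough sketch in Python 3. It assumes only that the wrapped object has a `load_char(code)` method returning a glyph, as FT2Font does. The class name `GlyphCache` is hypothetical, and `FakeFont` is a stand-in so the example runs without a font file.

```python
class GlyphCache:
    """Thin wrapper that caches glyphs loaded from a font object."""

    def __init__(self, font):
        self._font = font    # any object with a load_char(code) method
        self._glyphs = {}    # code point -> glyph

    def __getitem__(self, char):
        code = ord(char)
        if code not in self._glyphs:
            # Only hit the underlying font the first time a char is seen.
            self._glyphs[code] = self._font.load_char(code)
        return self._glyphs[code]

    def text(self, string):
        """Return the glyphs for every character of `string`, in order."""
        return [self[char] for char in string]


class FakeFont:
    """Stand-in for FT2Font that counts how often load_char is called."""

    def __init__(self):
        self.loads = 0

    def load_char(self, code):
        self.loads += 1
        return "glyph-%04x" % code


font = FakeFont()
glyphs = GlyphCache(font)
glyphs.text("aab")
print(glyphs["a"], font.loads)  # glyph-0061 2
```

Since the wrapper only delegates, existing code that uses FT2Font directly would keep working unchanged.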

# Conclusion

John, what should I do? Please comment.

I think that the best solution right now is, unfortunately, the BaKoMa
fonts. If we could get the Type1 fonts to work, then I could probably
easily integrate them into the existing model. I could also try to do
something with the Open Symbol fonts and the FreeFont (Windows users
could download them separately).

Cheers,

Edin