> I would like to try this. Due to time constraints, it may
> take some time. As far as I understand I have to use the
> GlyphIDs as well as the map code from cmap_format_4 to
> create a latex_to_umbelleek dictionary. Any hints from font
> experts are appreciated.
The minimum you need to do is provide a dictionary that maps TeX
symbol name to the fontname/glyphindex for that symbol. E.g. for \pm in
bakoma, the font name is cmsy10.ttf, the glyph index is 8, the
character code is 167 (hex 0xa7) and the glyph name is plusminus.
The entry in the latex_to_bakoma dict is
r'\pm' : ('cmsy10', 8),
From the fontname and glyph index, we can get the character code and
glyphname from the ttf file. I have written a little helper script
for you. It's brute force and ain't terribly pretty, but it (mostly,
see below) works.
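To do that reverse lookup programmatically, one alternative to the helper script is the fontTools library. The lookup logic itself is sketched below as a plain function (the fontTools calls that would feed it are shown only in comments, since they need a real font file on disk):

```python
def glyph_info(glyph_order, cmap, glyph_index):
    """Given a font's glyph order (index -> name) and its cmap
    (char code -> glyph name), return the glyph name and the
    character code(s) that map to that glyph index."""
    name = glyph_order[glyph_index]
    codes = sorted(code for code, n in cmap.items() if n == name)
    return name, codes

# With fontTools, the two inputs would come from the font itself:
#   font = fontTools.ttLib.TTFont('cmsy10.ttf')
#   glyph_order = font.getGlyphOrder()   # list: index -> glyph name
#   cmap = font.getBestCmap()            # dict: char code -> glyph name
```

For the \pm example above, glyph index 8 in cmsy10 should come back as ('plusminus', [0xa7]).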
The script creates a font grid table png using the agg backend and
matplotlib's ft2font module - you'll probably want to get the latest
CVS matplotlib for this to work properly - I'm not 100% sure this is
required but it is at least strongly recommended.
It will produce font grid images for the font specified on the command
line, like the following for umr10.ttf
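Assuming the grid uses the conventional 16-columns-per-row layout, a cell's row and column are just the two hex nibbles of the character code, so reading a code off the table is mechanical:

```python
def grid_cell(ccode):
    """Row/column of a char code in a 16-column font grid table,
    i.e. the high and low hex digits of the code.  (Assumes the
    conventional 16-per-row layout, not a property of the script.)"""
    return ccode >> 4, ccode & 0xF

# The \pm character code 0xa7 sits in row 0xa, column 0x7.
```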
You can use these grid tables to get the hex char code of the
symbol you want, and the output of the script lists the glyphind,
ccode, hex(ccode), and name, sorted by char code, so you can look up
the glyphind from the hex code. I.e.:
1) Pick a new tex symbol.
2) Find the corresponding character in one of the umbellek font
table pngs, or by using the glyph names listed when you run the
font_table script.
3) Use the font_table output to get the glyphind corresponding to
the symbol/name of interest.
4) GOTO 1
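The listing that step 3 relies on is easy to reproduce from any (glyphind, ccode, name) triples; a sketch of the sort-and-format step (the plusminus row is the real cmsy10 example from above, but the exact output formatting here is an assumption, not the script's literal output):

```python
def format_table(triples):
    """Sort (glyphind, ccode, name) triples by char code and format
    each as 'glyphind ccode hex(ccode) name', one row per glyph."""
    rows = sorted(triples, key=lambda t: t[1])
    return ['%d %d %s %s' % (gi, cc, hex(cc), name)
            for gi, cc, name in rows]

print('\n'.join(format_table([(8, 0xa7, 'plusminus')])))
# 8 167 0xa7 plusminus
```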
There is probably a better way, but with a combination of glyphnames
and grid tables you can knock this out in several hours of tedious
work. Any other information you want to attach while you are in the
thick of it (mathml names, unicode chars) would be great, but is not
required.
> I would like to add codes for accented chars: r'?':
> ('umr10', <code>) Should _mathtext_data.py contain an
> encoding line, i.e. # -*- coding: latin1 -*- to allow
> non-ASCII chars?
Perhaps others can give input here about what would be the best way to
proceed. My inclination is to use the TeX names like \"a where
possible, but by all means add them if you have them - getting the
codes is the relatively tedious part, providing the proper interface
to them can be worked out later. It may require some changes to the
parser to support \"a and friends, but this is no problem.
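For what it's worth, recognizing \"a and friends is a small tokenizing change; here is a hedged sketch of what the token pattern might look like (this is not the actual mathtext parser, just an illustration of the shape of an accent token):

```python
import re

# An accent command is a backslash, one accent character, then the
# letter it decorates, e.g. \"a or \'e.  The set of accent characters
# here is illustrative, not exhaustive.
ACCENT = re.compile(r'\\(["\'`^~=.])(\w)')

def parse_accent(s):
    """Return (accent_char, letter) for an accent command, else None."""
    m = ACCENT.match(s)
    return (m.group(1), m.group(2)) if m else None
```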
Now, on to the "mostly working" part of the font_table script, which
is why I CCd Paul on this email. The font_table script is working on
the um*.ttf fonts but failing on the bl*.ttf fonts. The reason it is
failing is that FT2Font::get_charmap is returning an empty dict.
These fonts are not empty, e.g. ft2font reports 1 face, 2 charmaps, and
124 glyphs for blsy.ttf, but get_charmap is returning empty, because
the call to
FT_ULong code = FT_Get_First_Char(face, &index);
is returning 0 for code and index.
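For reference, the iteration get_charmap performs is the standard FreeType first-char/next-char walk. In Python pseudocode against a stand-in face object it looks roughly like this (a sketch only, not the actual ft2font source); note how an inactive or unrecognized charmap produces exactly the symptom above:

```python
def build_charmap(face):
    """Walk the face's active charmap, mapping char code -> glyph
    index, the way FT_Get_First_Char / FT_Get_Next_Char do.  If no
    charmap is active, the first call yields glyph index 0, the loop
    body never runs, and the result is the empty dict seen here."""
    charmap = {}
    code, index = face.get_first_char()
    while index != 0:
        charmap[code] = index
        code, index = face.get_next_char(code)
    return charmap
```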