TeX in xlabel ?

John_Hunter · October 29, 2004, 2:12pm

"Carl" == Carl Dr Kleffner <cmkleffner@...380...> writes:

    > I would like to try this. Due to time constraints, it may
    > take some time. As far as I understand I have to use the
    > GlyphIDs as well as the map code from cmap_format_4 to
    > create a latex_to_umbelleek dictionary. Any hints from font
    > experts are appreciated.

The minimum you need to do is provide a dictionary that maps each TeX
symbol name to the fontname/glyphindex for that symbol. E.g., for \pm in
bakoma, the font file is cmsy10.ttf, the glyph index is 8, the
character code is 167 (hex 0xa7) and the glyph name is plusminus.
The entry in the latex_to_bakoma dict is

r'\pm' : ('cmsy10', 8),
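
To make the shape of such a table concrete, here is a minimal sketch of
how an entry might be stored and looked up. Only the \pm entry and the
dict name latex_to_bakoma come from the text; the helper lookup_symbol
and the way it appends '.ttf' are assumptions for illustration, not
matplotlib's actual code.

```python
# Sketch only: the \pm entry is from the text; lookup_symbol is a
# hypothetical helper, not part of matplotlib.
latex_to_bakoma = {
    r'\pm': ('cmsy10', 8),   # (font name without extension, glyph index)
}


def lookup_symbol(sym, table=latex_to_bakoma):
    """Return (font file name, glyph index) for a TeX symbol name."""
    try:
        fontname, glyphind = table[sym]
    except KeyError:
        raise ValueError('no glyph known for TeX symbol %r' % sym)
    return fontname + '.ttf', glyphind


print(lookup_symbol(r'\pm'))
```

Keeping the font name and glyph index together in one tuple means a new
font family only needs a new dict of the same shape.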

From the fontname and glyph index, we can get the character code and
glyphname from the ttf file. I have written a little helper script
for you. It's brute force and ain't terribly pretty, but it (mostly,
see below) works.

http://matplotlib.sf.net/share/font_table.py

This creates a font grid table png using the agg backend and
matplotlib's ft2font module - you'll probably want to get the latest
CVS matplotlib for this to work properly - I'm not 100% sure this is
required but it is at least strongly recommended.

It will produce font grid images for the font specified on the command
line, like the following for umr10.ttf

http://matplotlib.sf.net/share/umr10.ttf.png

You can use these grid tables to get the hex character code of the
symbol you want, and the output of the script lists the glyphind,
ccode, hex(ccode), and name, sorted by charcode, so you can look up
the glyphind from the hex code. I.e.,

  1) Pick a new tex symbol.

  2) Find the corresponding character in one of the umbellek font
     table pngs, or by using the glyph names listed when you run the
     font_table script.

  3) Use the font_table output to get the glyphind corresponding to
     the symbol/name of interest.

  4) GOTO 1
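
The lookup in steps 2 and 3 can be sketched roughly as follows. This is
not the font_table script itself: charmap is assumed to be a dict
mapping character code to glyph index, glyph_name is assumed to be a
per-glyph name lookup, and the toy data is made up apart from the
plusminus row quoted earlier (glyph index 8, charcode 167).

```python
# Sketch only. `charmap` stands in for what a font object would report
# (character code -> glyph index) and `glyph_name` for a per-glyph name
# lookup. Both are assumptions for illustration.

def glyph_rows(charmap, glyph_name):
    """Return (glyphind, ccode, hex(ccode), name) rows sorted by charcode."""
    rows = []
    for ccode, glyphind in sorted(charmap.items()):
        rows.append((glyphind, ccode, hex(ccode), glyph_name(glyphind)))
    return rows


def glyphind_for_hex(rows, hexcode):
    """Look up the glyph index for a hex charcode read off the grid png."""
    wanted = int(hexcode, 16)
    for glyphind, ccode, _, _ in rows:
        if ccode == wanted:
            return glyphind
    raise KeyError(hexcode)


# Toy data: only the plusminus row is taken from the text; the other
# row is made up.
toy_charmap = {167: 8, 161: 3}
toy_names = {8: 'plusminus', 3: 'madeup'}

rows = glyph_rows(toy_charmap, toy_names.get)
for row in rows:
    print('%4d %4d %6s %s' % row)
print(glyphind_for_hex(rows, '0xa7'))
```

With a real font, the charmap and names would presumably come from the
ft2font font object (get_charmap and a glyph-name lookup); check which
direction get_charmap maps in your matplotlib version before relying on
this.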

There is probably a better way, but with a combination of glyphnames
and grid tables you can knock this out in several hours of tedious
work. Any other information you want to attach while you are in the
thick of it (mathml names, unicode chars) would be great, but is not
necessary.

    > I would like to add codes for accented chars: r'?':
    > ('umr10', <code>) Should _mathtext_data.py contain an
    > encoding line, i.e. # -*- coding: latin1 -*- to allow
    > non-ASCII chars?

Perhaps others can give input here about what would be the best way to
proceed. My inclination is to use the TeX names like \"a where
possible, but by all means add them if you have them - getting the
codes is the relatively tedious part, providing the proper interface
to them can be worked out later. It may require some changes to the
parser to support \"a and friends, but this is no problem.
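
As one possible illustration of the TeX-name route, the sketch below
turns accent commands such as \"a into the precomposed character while
keeping the source file pure ASCII, which would sidestep the
encoding-line question. The TEX_ACCENTS table is a small illustrative
subset and tex_accents_to_unicode is a hypothetical helper; neither is
part of the mathtext parser, and mapping the resulting character to a
glyph index in umr10 is not shown.

```python
import re
import unicodedata

# Sketch only: TeX accent command -> Unicode combining mark. Escapes
# are used so that this file contains no non-ASCII bytes.
TEX_ACCENTS = {
    '"': '\u0308',   # diaeresis, as in \"a
    "'": '\u0301',   # acute, as in \'e
    '`': '\u0300',   # grave, as in \`e
    '^': '\u0302',   # circumflex, as in \^o
    '~': '\u0303',   # tilde, as in \~n
}

# Accept both the bare form (\"a) and the braced form (\"{a}).
_accent_re = re.compile(r'\\(["\'`^~])(?:\{([A-Za-z])\}|([A-Za-z]))')


def tex_accents_to_unicode(text):
    """Replace TeX accent commands with precomposed Unicode characters."""
    def repl(match):
        letter = match.group(2) or match.group(3)
        combined = letter + TEX_ACCENTS[match.group(1)]
        # NFC folds letter + combining mark into one code point
        # whenever a precomposed form exists.
        return unicodedata.normalize('NFC', combined)
    return _accent_re.sub(repl, text)


print(ascii(tex_accents_to_unicode(r'Schr\"odinger')))
```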

Now, on to the "mostly working" part of the font_table script, which
is why I CCd Paul on this email. The font_table script is working on
the um*.ttf fonts but failing on the bl*.ttf fonts. The reason it is
failing is that FT2Font::get_charmap is returning an empty dict.
These fonts are not empty, e.g., ft2font reports 1 face, 2 charmaps, and
124 glyphs for blsy.ttf, but get_charmap is returning an empty dict,
because the call to

FT_ULong code = FT_Get_First_Char(face, &index);

is returning 0 for code and index.

Any ideas?

JDH

Paul_Barrett1 · November 1, 2004, 8:20pm

A possible alternative approach to getting the proper glyph from the TTF file is to map the LaTeX name to the PostScript name and then use the PS name to find the glyph index with ft2font::get_name_index(). This or a similar approach is what I had in mind when I first implemented the TTF code. It assumes that the glyphs associated with the PS names adhere to the Adobe PS naming definition. In that case, the PS name could be used to create a lookup dictionary of fontname/index entries on the fly.

My memory is a bit hazy on this issue, but I seem to recall that the TeX font names are not completely consistent with the Adobe PS names. In addition, there needed to be a mechanism to distinguish between the same glyph in different Bakoma fonts. I'm guessing that the more recent fonts probably adhere to the PS font naming convention, so it might be worthwhile pursuing this approach again. It sure would make it easier to create the math font tables and to use other fonts that contain such mathematical glyphs.
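
A rough sketch of the PS-name idea follows. FakeFont is a stand-in for a loaded font, not the real ft2font object; its get_name_index is assumed to return 0 when a name is missing. Only the plusminus/cmsy10/8 values come from this thread, and build_lookup and the "first font wins" rule are hypothetical.

```python
# Sketch only. FakeFont stands in for a loaded font object; its
# get_name_index mimics a lookup by glyph name and, as an assumption,
# returns 0 when the name is not in the font.
class FakeFont:
    def __init__(self, names):
        self._names = names          # glyph name -> glyph index

    def get_name_index(self, name):
        return self._names.get(name, 0)


# TeX name -> PostScript glyph name. Only plusminus is from the thread.
latex_to_ps = {r'\pm': 'plusminus', r'\madeup': 'nosuchglyph'}


def build_lookup(latex_to_ps, fonts):
    """Build {TeX name: (fontname, glyph index)} on the fly.

    `fonts` is an ordered list of (fontname, font) pairs; the first font
    that knows the PS name wins, which is one simple way to choose
    between fonts that carry the same glyph.
    """
    table = {}
    for texname, psname in latex_to_ps.items():
        for fontname, font in fonts:
            index = font.get_name_index(psname)
            if index:                # 0 is treated as "not found"
                table[texname] = (fontname, index)
                break
    return table


fonts = [
    ('cmr10', FakeFont({'A': 36})),           # made-up contents
    ('cmsy10', FakeFont({'plusminus': 8})),   # index 8 is from the thread
]
print(build_lookup(latex_to_ps, fonts))
```

TeX names whose PS name is found in no font are simply left out of the table, so inconsistent glyph names would show up as missing entries rather than wrong glyphs.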

John Hunter wrote:

    > Now, on to the "mostly working" part of the font_table script, which
    > is why I CCd Paul on this email. The font_table script is working on
    > the um*.ttf fonts but failing on the bl*.ttf fonts. The reason it is
    > failing is that FT2Font::get_charmap is returning an empty dict.
    > These fonts are not empty, e.g., ft2font reports 1 face, 2 charmaps, and
    > 124 glyphs for blsy.ttf, but get_charmap is returning empty, because
    > the call to
    >
    > FT_ULong code = FT_Get_First_Char(face, &index);
    >
    > is returning 0 for code and index.
    >
    > Any ideas?

John, you appear to have solved this one yourself.

-- Paul

--
Paul Barrett, PhD Space Telescope Science Institute
Phone: 410-338-4475 ESS/Science Software Branch
FAX: 410-338-4767 Baltimore, MD 21218