# Problems with Unicode in mathtext

> Conclusion
> ========
> John, what should I do? Please comment.

I don't think we should be distracted by Type1 fonts or the lack of a
good set of free Unicode TrueType math fonts. We will have those soon
enough (or at some point). What I would like to see is an
infrastructure where the user can point to an arbitrary set of Unicode
fonts and have mathtext work with that font set. Then, when STIX
or some other set of Unicode fonts becomes available, we can point to
them. Users who have proprietary Unicode math fonts can use them.

I don't think we are at the point now where we can easily test
mathtext with an arbitrary set of Unicode fonts. I'd like to be there
before we get distracted by other things. Or am I missing something?

JDH

This is the problem:
For now, mathtext knows about the \it (this should be \mit, as in plain
TeX), \rm, \cal and \tt font-face commands.

Suppose I define that \it is mapped to VeraIt.ttf (not important, it
could be any *italic* font).

Right now, with the Unicode font classes, the behavior is:
$abc$ gives "abc" in italic style. But in a properly designed italic
font the math symbols are italic too, so $\sum$ would also come out
italic, which should not happen.

That's why the Unicode standard defines math-italic characters in
Unicode Plane 1 (U+1D44E is MATHEMATICAL ITALIC SMALL A, etc.), so they
can be bundled together in a font that is normally Roman. So the parser
should transform "abc" to U"\U0001D44E\U0001D44F\U0001D450" or to some
appropriate TeX commands, like r"\uni1D44E\uni1D44F\uni1D450". However,
TeX command names cannot contain digits, so it would be better to call
them \unimia, \unimib, \unimic.
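A minimal sketch of that transform in Python 3, assuming only the layout of the Mathematical Alphanumeric Symbols block. The names `to_math_italic` and `to_uni_commands` are hypothetical and are not mathtext functions. One wrinkle worth knowing: U+1D455 is reserved, because the italic small h already exists in the BMP as U+210E PLANCK CONSTANT.

```python
# Hypothetical helpers for illustration; not part of mathtext.
ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A
ITALIC_SMALL_A = 0x1D44E    # MATHEMATICAL ITALIC SMALL A
PLANCK_CONSTANT = 0x210E    # italic "h" lives in the BMP; U+1D455 is reserved


def to_math_italic(text):
    """Replace ASCII letters with their math-italic code points."""
    out = []
    for ch in text:
        if ch == "h":
            out.append(chr(PLANCK_CONSTANT))
        elif "a" <= ch <= "z":
            out.append(chr(ITALIC_SMALL_A + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        else:
            out.append(ch)  # digits, operators etc. are left alone
    return "".join(out)


def to_uni_commands(text):
    """Spell every character of the converted text as a \\uniXXXX command."""
    return "".join("\\uni%04X" % ord(ch) for ch in to_math_italic(text))
```

Here `to_uni_commands("abc")` yields the `\uni1D44E\uni1D44F\uni1D450` form; a letters-only naming scheme such as \unimia would need its own lookup table on top of this.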

The problem with the Unicode transform is that, under Windows (where
Python is a narrow, UTF-16 build), Python converts Unicode chars outside
the BMP to surrogate pairs, so len(U"\U0001D44E") would return two, and
it would be difficult to interpret that as a single char, which is
needed to pull the glyph out of the font file. Again, this is not a
problem if we convert all the characters to some made-up TeX commands,
like the ones above.
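To make the surrogate-pair issue concrete, here is a small sketch that works on explicit UTF-16 code units. It is written for Python 3, where (since 3.3) len() of such a string is already 1 on every platform, so the narrow-build behavior is reproduced by encoding to UTF-16 by hand. Both function names are made up for this example.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_points(units):
    """Combine UTF-16 code units into code points, joining surrogate pairs."""
    points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            # Standard UTF-16 decoding: 10 bits from each surrogate.
            points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            points.append(unit)
            i += 1
    return points
```

U+1D44E comes out of `utf16_units` as the two units 0xD835 and 0xDC4E, and `code_points` joins them back into the single value needed for a glyph lookup.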

The same should be done with other font variants in math mode (cal,
tt, bold etc.).
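For variants whose alphabets have no reserved gaps, the same offset trick could be table-driven. The sketch below covers only bold and monospace; the style keys and the `restyle` name are assumptions made for this example. Script (\cal) is deliberately left out: like italic, it has gaps that are filled from the Letterlike Symbols block (for example U+212C SCRIPT CAPITAL B), so it would need an exception table.

```python
# Code points of "A" and "a" in two gap-free math alphabets.
# The style keys are illustrative, not existing mathtext commands.
STYLE_BASES = {
    "bf": (0x1D400, 0x1D41A),  # MATHEMATICAL BOLD CAPITAL A / SMALL A
    "tt": (0x1D670, 0x1D68A),  # MATHEMATICAL MONOSPACE CAPITAL A / SMALL A
}


def restyle(text, style):
    """Map ASCII letters in text to the given math alphabet."""
    capital_a, small_a = STYLE_BASES[style]
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(capital_a + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(small_a + ord(ch) - ord("a")))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return "".join(out)
```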

Everything would be fine as long as all the glyphs are in one file.
However, I still haven't found a font that defines the Unicode block
1D400..1D7FF; Mathematical Alphanumeric Symbols
but I suppose the STIX fonts will be one file only, and that they will
define even that block properly. If anyone knows a free font that
defines that range, please speak up.

If everything is in one font file, then the job will be pretty easy.
But if the glyphs are spread across several files it would be best to
have some Unicode block -> fontfile mapping so one could set:
0000..007F; Basic Latin -> file1
2200..22FF; Mathematical Operators -> file2
...
1D400..1D7FF; Mathematical Alphanumeric Symbols -> filen

Maybe this is the best approach?
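Such a block -> font file mapping could be sketched as a sorted table of ranges with a binary search. The file names below are placeholders taken from the list above, and `font_for` is a hypothetical helper, not an existing mathtext function.

```python
from bisect import bisect_right

# (first code point, last code point, font file), sorted by first code point.
# The file names are placeholders.
BLOCK_FONTS = [
    (0x0000, 0x007F, "file1"),    # Basic Latin
    (0x2200, 0x22FF, "file2"),    # Mathematical Operators
    (0x1D400, 0x1D7FF, "filen"),  # Mathematical Alphanumeric Symbols
]
BLOCK_STARTS = [first for first, _, _ in BLOCK_FONTS]


def font_for(code_point):
    """Return the font file covering code_point, or None if no block matches."""
    index = bisect_right(BLOCK_STARTS, code_point) - 1
    if index < 0:
        return None
    first, last, font = BLOCK_FONTS[index]
    return font if first <= code_point <= last else None
```

With this table, `font_for(0x2211)` (the summation sign) resolves to file2 and `font_for(0x1D44E)` to filen, while a code point outside every listed block returns None so the caller can fall back to a default font.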

Cheers,
Edin


On 7/14/06, John Hunter <jdhunter@...5...> wrote:

> Conclusion
> ========
> John, what should I do? Please comment.
