Mathtext improvements

I'm working on some improvements to the mathtext engine on a branch. Feel free to join in if curious, but I expect to break lots of things as I go.

https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/branches/mathtext_mgd/

I've collected a bunch of math expressions from the source tree for use in regression testing. If you have any math strings of your own that you want to make sure I don't break, please send them to me (probably should be off-list to conserve noise).

Here's the preliminary TODO list I'm working with in no particular order (compiled from the TODO list in mathtext.py and the list of improvements in mathtext2.py):

1. Deal with nested sub/superscripts, such as $x_i_j$, equivalent to $x_{i_j}$
2. Make the font change tags (\cal, \tt, \rm etc.) behave more like TeX, e.g. use ${\rm sin}$ instead of $\rm{sin}$
3. Support roman function names, e.g. $\sin$ as a shortcut for ${\rm sin}$
4. Implement \frac
5. Other layout commands, like large \sqrt (I suspect there's a very large list of these things and they will have to be prioritized.)
6. Support kerning (probably best put off until we have good fonts with kerning information to use, e.g. STIX fonts)
...

(1 and 2 are already implemented in the branch.)

I don't want to start a long thread about all the desired features for mathtext -- I'm sure there are lots of them. There will need to be some way to prioritize, which I leave up to those on this list who have been around long enough to have a sense of what features are more pressing than others.

In general, is the goal with mathtext to become as TeX-compatible as possible (for some subset of standard TeX math syntax)? The reason I ask is that (1) above is not valid LaTeX and raises the error "Double subscript". Task (2) will break backward compatibility with existing matplotlib plots. In the long run, maintaining two codebases or two separate paths through the same codebase probably won't scale.

Cheers,
Mike

I'm working on some improvements to the mathtext engine on a branch.
Feel free to join in if curious, but I expect to break lots of things as
I go.

https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/branches/mathtext_mgd/

I've collected a bunch of math expressions from the source tree for use
in regression testing. If you have any math strings of your own that
you want to make sure I don't break, please send them to me (probably
should be off-list to conserve noise).

Here's the preliminary TODO list I'm working with in no particular order
(compiled from the TODO list in mathtext.py and the list of improvements
in mathtext2.py):

1. Deal with nested sub/superscripts, such as $x_i_j$, equivalent to
$x_{i_j}$

Hmm, I thought that already worked -- it's been a long time since I
touched that code. What is the problem here?

2. Make the font change tags (\cal, \tt, \rm etc.) behave more like TeX,
e.g. use ${\rm sin}$ instead of $\rm{sin}$
3. Support roman function names, e.g. $\sin$ as a shortcut for ${\rm sin}$
4. Implement \frac
5. Other layout commands, like large \sqrt (I suspect there's a very
large list of these things and they will have to be prioritized.)

You will probably want to implement some primitive drawing in the
ft2font pixel buffer, to support things like a \frac bar (\frac would
be nice BTW)

6. Support kerning (probably best put off until we have good fonts with
kerning information to use, e.g. STIX fonts)

Ha, STIX, never made a deadline they could keep. Given their response
to Eric's inquiry, we could be waiting quite a while.

(1 and 2 are already implemented in the branch.)

I don't want to start a long thread about all the desired features for
mathtext -- I'm sure there are lots of them. There will need to be some

How about a short one :-) I think one of the most important
enhancements will be to support embedded mathtext expressions, e.g.

  r'some text $\sigma=1$ some more text'

Currently we have to do something like

  r'$\rm{some text\ }\sigma=1\rm{\ some more text}$'

which is pretty bad. This might require some changes to how mathtext
is identified -- the presence of an even number of dollar signs is
probably sufficient, but in some corner cases might give the wrong
results, requiring a flag to override, etc. That, or we simply adopt
the TeX standard and require all literal $ to be quoted. The latter
will break some code but is probably better than anything else.
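
To make the second option concrete, here is a rough sketch of splitting a string into plain-text and math segments when every literal dollar sign must be written as \$. This is not code from mathtext.py; the function name split_mathtext and the (kind, content) tuples are made up for illustration, and it deliberately ignores awkward cases such as an escaped backslash directly before a dollar sign.

```python
import re

# An unescaped dollar sign: "$" not preceded by a backslash.
# (Assumption: a backslash before "$" always means "literal dollar".)
UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def split_mathtext(s):
    """Split s into ("text", ...) and ("math", ...) segments.

    Hypothetical helper: segments between pairs of unescaped dollar
    signs are math, everything else is plain text.
    """
    parts = UNESCAPED_DOLLAR.split(s)
    # n unescaped dollars give n + 1 parts, so an even number of
    # parts means an odd (unbalanced) number of dollar signs.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced $ in %r" % s)
    segments = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Outside math: turn the quoted \$ back into a literal $.
            text = part.replace(r"\$", "$")
            if text:
                segments.append(("text", text))
        else:
            segments.append(("math", part))
    return segments


print(split_mathtext(r"some text $\sigma=1$ some more text"))
print(split_mathtext(r"costs \$5"))
```

With this rule a string such as r'costs \$5' stays plain text, and a string with a single unquoted $ is rejected instead of being guessed at, which is where the breakage for existing code would come from.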

JDH

···

On 7/16/07, Michael Droettboom <mdroe@...31...> wrote:

John Hunter wrote:

I'm working on some improvements to the mathtext engine on a branch.
Feel free to join in if curious, but I expect to break lots of things as
I go.

https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/branches/mathtext_mgd/

I've collected a bunch of math expressions from the source tree for use
in regression testing. If you have any math strings of your own that
you want to make sure I don't break, please send them to me (probably
should be off-list to conserve noise).

Here's the preliminary TODO list I'm working with in no particular order
(compiled from the TODO list in mathtext.py and the list of improvements
in mathtext2.py):

1. Deal with nested sub/superscripts, such as $x_i_j$, equivalent to
$x_{i_j}$

Hmm, I thought that already worked -- it's been a long time since I
touched that code. What is the problem here?

The j is the same size as the i, because its font size is (somehow) determined as if it were attached to the x. My fix was to make '_' be an infix operator that's right associative. That makes $x_i_j$ parse just like $x_{i_j}$, and like how LaTeX appears to behave.
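
For anyone who wants to see the idea without reading the branch, here is a toy sketch of a right-associative '_' in a recursive-descent parser. It is not the code in the branch: the grammar only knows single-character atoms, braces holding one expression, and '_', and the names TOKEN and parse plus the ("sub", base, subscript) tuples are made up for illustration.

```python
import re

# Toy tokenizer: braces, "_" and any other single non-space character.
TOKEN = re.compile(r"[{}_]|[^{}_\s]")


def parse(src):
    """Parse a subscript-only toy grammar where "_" is right-associative."""
    tokens = TOKEN.findall(src)
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "{":
            node = expr()
            if pos >= len(tokens) or tokens[pos] != "}":
                raise ValueError("expected '}'")
            pos += 1  # consume the closing brace
            return node
        return tok

    def expr():
        nonlocal pos
        base = atom()
        if pos < len(tokens) and tokens[pos] == "_":
            pos += 1
            # Recursing into expr() on the right-hand side is what makes
            # "_" right-associative: x_i_j groups as x_(i_j).
            return ("sub", base, expr())
        return base

    return expr()


print(parse("x_i_j"))
print(parse("x_{i_j}"))
```

Both inputs give the same tree, so the j ends up as a subscript of the i, and a renderer that shrinks the font once per nesting level would size it accordingly.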

How about a short one :-) I think one of the most important
enhancements will be to support embedded mathtext expressions, e.g.

r'some text $\sigma=1$ some more text'

Currently we have to do something like

r'$\rm{some text\ }\sigma=1\rm{\ some more text}$'

which is pretty bad.

I'll add that to the list.

Cheers,
Mike

···

On 7/16/07, Michael Droettboom <mdroe@...31...> wrote:

Sorry I didn't address these on the first pass.

Maximum compatibility with LaTeX is definitely the goal, within
reason. By that I mean, if we do something different and can fix it,
we should. I don't mean we should try and support everything that
LaTeX does, at least not this month <wink>

As for 1), the double subscript error, I thought it worked in TeX and
if it doesn't we should not support it. My comment in the TODO list
may have simply reflected this ignorance.

As for 2, the font syntax, we need not break existing mpl code,
because both \rm{text} and {\rm text} work in latex. We'll just be
adding support for the 2nd idiom.

JDH

···

On 7/16/07, Michael Droettboom <mdroe@...31...> wrote:

1. Deal with nested sub/superscripts, such as $x_i_j$, equivalent to
$x_{i_j}$
2. Make the font change tags (\cal, \tt, \rm etc.) behave more like TeX,
e.g. use ${\rm sin}$ instead of $\rm{sin}$

In general, is the goal with mathtext to become as TeX-compatible as
possible (for some subset of standard TeX math syntax)? The reason I
ask is that (1) above is not valid LaTeX and raises the error "Double
subscript". Task (2) will break backward compatibility with existing
matplotlib plots. In the long run, maintaining two codebases or two
separate paths through the same codebase probably won't scale.

> I'm working on some improvements to the mathtext engine on a branch.
> Feel free to join in if curious, but I expect to break lots of things as
> I go.
>
> https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/branches/mathtext_mgd/
>
> I've collected a bunch of math expressions from the source tree for use
> in regression testing. If you have any math strings of your own that
> you want to make sure I don't break, please send them to me (probably
> should be off-list to conserve noise).
>
> Here's the preliminary TODO list I'm working with in no particular order
> (compiled from the TODO list in mathtext.py and the list of improvements
> in mathtext2.py):
>
> 1. Deal with nested sub/superscripts, such as $x_i_j$, equivalent to
> $x_{i_j}$

Hmm, I thought that already worked -- it's been a long time since I
touched that code. What is the problem here?

I think $x_i_j$ is ambiguous. Maybe it is best to stick to Knuth-approved
syntax.

> 2. Make the font change tags (\cal, \tt, \rm etc.) behave more like TeX,
> e.g. use ${\rm sin}$ instead of $\rm{sin}$

Would it be possible to include the LaTeX equivalents, like \textrm{} (it's not
{\textrm ...}), \texttt{}, \textit{}...? They are more familiar to some of
us (like LaTeX's \frac{1}{2} is more familiar than TeX's {1\over 2}).

> 3. Support roman function names, e.g. $\sin$ as a shortcut for
> ${\rm sin}$
> 4. Implement \frac
> 5. Other layout commands, like large \sqrt (I suspect there's a very
> large list of these things and they will have to be prioritized.)

You will probably want to implement some primitive drawing in the
ft2font pixel buffer, to support things like a \frac bar (\frac would
be nice BTW)

Is mpl's mathtext guided by the algorithms published in "Computers and
Typesetting, TeX: The Program"?

> 6. Support kerning (probably best put off until we have good fonts with
> kerning information to use, e.g. STIX fonts)

Ha, STIX, never made a deadline they could keep. Given their response
to Eric's inquiry, we could be waiting quite a while.

They claim their website will be updated the week of July 9. They don't
indicate what year.

> (1 and 2 are already implemented in the branch.)
>
> I don't want to start a long thread about all the desired features for
> mathtext -- I'm sure there are lots of them. There will need to be some

How about a short one :-) I think one of the most important
enhancements will be to support embedded mathtext expressions, e.g.

  r'some text $\sigma=1$ some more text'

Currently we have to do something like

  r'$\rm{some text\ }\sigma=1\rm{\ some more text}$'

I agree, this is sorely lacking.

which is pretty bad. This might require some changes to how mathtext
is identified -- the presence of an even number of dollar signs is
probably sufficient, but in some corner cases might give the wrong
results, requiring a flag to override, etc. That, or we simply adopt
the TeX standard and require all literal $ to be quoted. The latter
will break some code but is probably better than anything else.

I think we should stick to the (La)TeX standard. The algorithms are available,
stable (when was the last time Donald Knuth had to fix a bug in TeX?), and
the syntax is familiar.

···

On Monday 16 July 2007 02:32:30 pm John Hunter wrote:

On 7/16/07, Michael Droettboom <mdroe@...31...> wrote:

John Hunter wrote:

That or we simply adopt the TeX standard

+1 TeX is widely used and well documented -- why have something almost the same?

Though I still think the real solution is to use TeX itself to do the typesetting. Not the way we do now, but:

Parsing the DVI and laying out stuff that way -- so it's scalable, and has fewer dependencies.

Including the fonts required (STIX again -- maybe it will really happen eventually)

Occasionally someone pops up with a plan to make an embeddable TeX engine, which is what we really want -- maybe some day.

-Chris

···

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...