Mathtext improvements (merging into trunk)

It seems that the improvements finally allow users to mix mathtext with
ordinary text, as in 'foo $a=b^c+d$ bar', which I believe has been
requested a lot. This is really cool, but I think it causes another
backward incompatibility: you could use dollar signs in text strings
(except if you wanted a dollar sign both at the beginning and at the end
of a string), but now dollar signs only work if you use an odd number of
them.

My suggestion is to distinguish between mathtext and normal text at a
level outside the string. For example,

  text(['foo ', Math(r'a=b^c+d'), ' bar'])

where Math is a wrapper object that signals to "text" that its contents
are to be passed to the mathtext interpreter.

Or, Math could be a function that parses the string and returns a
lower-level description (presumably a hierarchy of boxes) that "text"
can then intersperse with the simple boxes containing the ordinary
strings. Then we could also have a LaTeX object that passes its argument
to an external LaTeX process, reads the resulting dvi file and returns
a list of glyphs and rules that "text" knows how to draw on the canvas.
In other words, formulas could be interpreted by the internal mathtext
parser or the external LaTeX process selectively, not via a single
global usetex switch.
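
A minimal sketch of the wrapper idea in plain Python. The names `Math`, `LaTeX`, and `layout_text` are hypothetical and not part of matplotlib, and the "layout" step only tags each piece with the interpreter that would handle it instead of producing boxes or glyphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Math:
    """Hypothetical wrapper: contents go to the internal mathtext parser."""
    source: str


@dataclass(frozen=True)
class LaTeX:
    """Hypothetical wrapper: contents go to an external LaTeX process."""
    source: str


def layout_text(pieces):
    """Tag each piece with the interpreter that should handle it.

    A real implementation would return boxes or glyphs; this sketch
    returns (interpreter, source) pairs to show the dispatch only.
    """
    if isinstance(pieces, (str, Math, LaTeX)):
        pieces = [pieces]
    tagged = []
    for piece in pieces:
        if isinstance(piece, Math):
            tagged.append(("mathtext", piece.source))
        elif isinstance(piece, LaTeX):
            tagged.append(("latex", piece.source))
        elif isinstance(piece, str):
            # Plain strings are never scanned, so '$' stays a literal.
            tagged.append(("plain", piece))
        else:
            raise TypeError(f"unsupported text piece: {piece!r}")
    return tagged


mixed = layout_text(['foo ', Math(r'a=b^c+d'), ' bar'])
prices = layout_text('costs $5 and $10')
```

Because the distinction is carried by the object type, a plain string with any number of dollar signs never reaches a math interpreter.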

···

--
Jouni K. Seppänen
http://www.iki.fi/jks

I would like to voice my opinion against this idea. I think the backward
incompatibility will be rare, and does not justify the additional complexity
it adds to the far more common need to interpret mathtext.

Darren

···

On Thursday 26 July 2007 5:54:18 pm Jouni K. Seppänen wrote:

It seems that the improvements finally allow users to mix mathtext with
ordinary text, as in 'foo $a=b^c+d$ bar', which I believe has been
requested a lot. This is really cool, but I think it causes another
backward incompatibility: you could use dollar signs in text strings
(except if you wanted a dollar sign both at the beginning and at the end
of a string), but now dollar signs only work if you use an odd number of
them.

My suggestion is to distinguish between mathtext and normal text at a
level outside the string. For example,

  text(['foo ', Math(r'a=b^c+d'), ' bar'])

where Math is a wrapper object that signals to "text" that its contents
are to be passed to the mathtext interpreter.

I'm on the fence as to how to handle this case. The majority of our
users will think of $ as the US currency symbol, and will have never
heard of TeX. Option 1 is to educate them, and require them to \$
quote that symbol. Option 2 is to enable a text property eg mathtext,
and do

text(x, y, 'what is the $\sin(x)$', mathtext=True)

Option 3 is to try and be clever, and interpret an even number of
unquoted dollar symbols as mathtext, or any string that has a quoted
dollar sign symbol as mathtext, else assume plain text. Option 4 is
to treat *all* strings as mathtext, but I think we would pay a pretty
big performance hit to invoke the mathtext machinery for every piece
of text. But it is an option. In option 4, of course, users would be
required to quote all dollar signs, so it is related to option 1 but
slightly different in how it treats strings with no dollar signs.
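
A rough sketch of the option 3 heuristic, to make the ambiguity concrete. The function name `looks_like_mathtext` is hypothetical, and the check is simplified: a dollar sign counts as quoted whenever a backslash precedes it, so an escaped backslash followed by a dollar sign is not handled.

```python
import re

# A dollar sign is "unquoted" when it is not preceded by a backslash.
_UNQUOTED_DOLLAR = re.compile(r'(?<!\\)\$')


def looks_like_mathtext(s):
    """Guess whether a string is mathtext, following option 3.

    Math if there is a nonzero, even number of unquoted '$', or if the
    string contains a quoted dollar sign; otherwise plain text.
    """
    unquoted = len(_UNQUOTED_DOLLAR.findall(s))
    has_quoted = '\\$' in s
    return (unquoted > 0 and unquoted % 2 == 0) or has_quoted


math_label = looks_like_mathtext(r'what is the $\sin(x)$')
price_label = looks_like_mathtext('costs $5 and $10')
```

The second call shows the catch: a string with two currency amounts has an even number of unquoted dollar signs, so this heuristic would treat it as math.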

I'm not too keen on the text(x, y, Math('string')) proposal, which is
a little outside the normal matplotlib approach.

Michael, do you have a preference or an alternate proposal?
JDH

···

On 7/26/07, Darren Dale <dd55@...143...> wrote:

> where Math is a wrapper object that signals to "text" that its contents
> are to be passed to the mathtext interpreter.

I would like to voice my opinion against this idea. I think the backward
incompatibility will be rare, and does not justify the additional complexity
it adds to the far more common need to interpret mathtext.

[ That was meant for the list, sorry ]

I'm on the fence as to how to handle this case. The majority of our
users will think of $ as the US currency symbol, and will have never
heard of TeX. Option 1 is to educate them, and require them to \$
quote that symbol. Option 2 is to enable a text property eg mathtext,
and do

text(x, y, 'what is the $\sin(x)$', mathtext=True)

Option 3 is to try and be clever, and interpret an even number of
unquoted dollar symbols as mathtext, or any string that has a quoted
dollar sign symbol as mathtext, else assume plain text. Option 4 is
to treat *all* strings as mathtext, but I think we would pay a pretty
big performance hit to invoke the mathtext machinery for every piece
of text. But it is an option. In option 4, of course, users would be
required to quote all dollar signs, so it is related to option 1 but
slightly different in how it treats strings with no dollar signs.

I'm not too keen on the text(x, y, Math('string')) proposal, which is
a little outside the normal matplotlib approach.

Michael, do you have a preference or an alternate proposal?

I'm not Michael, but I s'pose I can still speak :)

This sounds to me like a good case for Guido's mantra of NOT putting
keywords in functions and instead just making two separate functions.
Why not just

text(x,y,"This year I lost a lot of $$$")
mtext(x,y,r"This year I lost \$$\infty$")

? Explicit is better than implicit and all that...

cheers,

f

···

On 7/26/07, John Hunter <jdh2358@...149...> wrote:

[ That was meant for the list, sorry ]

> I'm on the fence as to how to handle this case. The majority of our
> users will think of $ as the US currency symbol, and will have never
> heard of TeX. Option 1 is to educate them, and require them to \$
> quote that symbol. Option 2 is to enable a text property eg mathtext,
> and do
>
> text(x, y, 'what is the $\sin(x)$', mathtext=True)

But would this make sense:
text(x, y, 'what is the $\sin(x)$', mathtext=False)

[...]

This sounds to me like a good case for Guido's mantra of NOT putting
keywords in functions and instead just making two separate functions.
Why not just

text(x,y,"This year I lost a lot of $$$")
mtext(x,y,r"This year I lost \$$\infty$")

? Explicit is better than implicit and all that...

what about x/ylabels, titles, ticks, etc?

I think education is the best way to go. It's not that difficult to grasp, it's
an established standard... and we are designing tools primarily for
scientists and engineers after all. Most of the other options will probably
have a larger effect on existing code.

Darren

···

On Thursday 26 July 2007 9:05:41 pm Fernando Perez wrote:

On 7/26/07, John Hunter <jdh2358@...149...> wrote:

> This sounds to me like a good case for Guido's mantra of NOT putting
> keywords in functions and instead just making two separate functions.
> Why not just
>
> text(x,y,"This year I lost a lot of $$$")
> mtext(x,y,r"This year I lost \$$\infty$")
>
> ? Explicit is better than implicit and all that...

what about x/ylabels, titles, ticks, etc?

Oh, I'd forgotten about all of those :) Yes, this is pervasive across
MPL, I answered in haste. Duplicating the entire text-related API may
be a tad much, perhaps ;)

I think education is the best way to go. It's not that difficult to grasp, it's
an established standard... and we are designing tools primarily for
scientists and engineers after all. Most of the other options will probably
have a larger effect on existing code.

Well, I was trying to go with John's concern for non-latex users. I'm
quite happy with a system that treats *every string* via latex. But I
know for many reasons that's not realistic here (and PyX does
precisely that, if I really want it).

Cheers,

f

···

On 7/26/07, Darren Dale <dd55@...143...> wrote:

On Thursday 26 July 2007 9:05:41 pm Fernando Perez wrote:

John Hunter wrote:

Option 1 is to educate them, and require them to \$
quote that symbol. Option 2 is to enable a text property eg mathtext,
and do

text(x, y, 'what is the $\sin(x)$', mathtext=True)
  

Except for the backward incompatibility, I like this because it is explicit.

Option 3 is to try and be clever, and interpret an even number of
unquoted dollar symbols as mathtext, or any string that has a quoted
dollar sign symbol as mathtext, else assume plain text.

That's close to what it does at the moment.

  Option 4 is
to treat *all* strings as mathtext, but I think we would pay a pretty
big performance hit to invoke the mathtext machinery for every piece
of text. But it is an option.

I'm not sure the performance hit would be so bad. The parser is completely flat until it goes between the $'s. But it would require all $'s to be escaped, of course.
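
To illustrate why text outside the dollar signs is cheap to handle, here is a minimal sketch that splits a string into plain and math segments in one pass. It is not the actual mathtext parser; `split_segments` is a hypothetical name, and only the math segments would need real parsing.

```python
import re

# Split on unescaped '$'; odd-indexed chunks lie between a pair of '$'.
_UNESCAPED_DOLLAR = re.compile(r'(?<!\\)\$')


def split_segments(s):
    """Split s into ('text', ...) and ('math', ...) segments.

    Every literal dollar sign must be written as '\\$', as option 4
    would require. Raises ValueError on an unbalanced '$'.
    """
    chunks = _UNESCAPED_DOLLAR.split(s)
    if len(chunks) % 2 == 0:
        raise ValueError(f"unbalanced '$' in {s!r}")
    segments = []
    for i, chunk in enumerate(chunks):
        if i % 2:
            segments.append(("math", chunk))
        elif chunk:
            # Outside math, the only work is unescaping literal dollars.
            segments.append(("text", chunk.replace('\\$', '$')))
    return segments


mixed = split_segments(r'foo $a=b^c+d$ bar')
escaped = split_segments(r'costs \$5')
```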

  In option 4, of course, users would be
required to quote all dollar signs, so it is related to option 1 but
slightly different in how it treats strings with no dollar signs.

I'm not too keen on the text(x, y, Math('string')) proposal, which is
a little outside the normal matplotlib approach.

Michael, do you have a preference or an alternate proposal?
  

Well, there certainly is no shortage of options! ;) I think the decision should ultimately lie with someone with a better sense of the existing "feel" of matplotlib than I.

If we go with another delimiter, there are others in TeX to choose from. Plain TeX uses $$ for display math, and LaTeX uses \[, \]. Both of these are less likely to be legitimate literals. While display math normally implies that the math is placed on a separate line (not inline with the text), it's not far from what matplotlib does, since it follows the display math layout patterns.

Cheers,
Mike

Just a data point for the discussion. I think it would be very nice if a
script gave the same result on a system with or without TeX (as long as
you don't do TeX hacks).

My 2 cents,

Gaël

···

On Fri, Jul 27, 2007 at 08:38:49AM -0400, Michael Droettboom wrote:

> text(x, y, 'what is the $\sin(x)$', mathtext=True)

Except for the backward incompatibility, I like this because it is explicit.

Gael Varoquaux wrote:

  

text(x, y, 'what is the $\sin(x)$', mathtext=True)
      
Except for the backward incompatibility, I like this because it is explicit.
    
Just a data point for the discussion. I think it would be very nice if a
script gave the same result on a system with or without TeX (as long as
you don't do TeX hacks).
  

Using this "mathtext=True" option (as opposed to using a delimiter that TeX doesn't understand) or something else entirely, would certainly make it easier to make usetex vs. not usetex more consistent.

More broadly, it will probably never be 100% compatible -- I don't think reimplementing all of TeX is feasible or desirable, and the fact that it is a macro language makes it hard to fully emulate. Defining what is a "hack" vs. normal usage is also subjective, of course...

Cheers,
Mike

···

On Fri, Jul 27, 2007 at 08:38:49AM -0400, Michael Droettboom wrote:

Using this "mathtext=True" option (as opposed to using a delimiter that
TeX doesn't understand) or something else entirely, would certainly make
it easier to make usetex vs. not usetex more consistent.

I think so too.

More broadly, it will probably never be 100% compatible -- I don't think
reimplementing all of TeX is feasible or desirable, and the fact that it
is a macro language makes it hard to fully emulate. Defining what is a
"hack" vs. normal usage is also subjective, of course...

No, of course, but you are making some progress in this direction, and I
think that would be a great added value coming from your work.

Cheers,

Gaël

···

On Fri, Jul 27, 2007 at 08:52:27AM -0400, Michael Droettboom wrote:

I think $$ might be a bad idea, that has a very specific meaning in TeX, which
is different than $. Likewise, \[ means display math while \( means inline
math. \( ... \) is considered to be fragile, while $ ... $ is robust, but
maybe \( \) would be a good solution. Then you could even switch between
mathtext and usetex, and the usetex code wouldn't have to go through strings
trying to substitute latex mathmode indicators for mpl indicators.
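
A minimal sketch of what \( ... \) delimiters would look like to a splitter, assuming no nesting and no escaping of the delimiters themselves. `split_paren_math` is a hypothetical helper, not matplotlib code.

```python
import re

# Inline math delimited by \( ... \); the shortest match is taken.
_INLINE_MATH = re.compile(r'\\\((.*?)\\\)')


def split_paren_math(s):
    """Split s into text and math segments using \\( ... \\) delimiters."""
    segments = []
    pos = 0
    for match in _INLINE_MATH.finditer(s):
        if match.start() > pos:
            segments.append(("text", s[pos:match.start()]))
        segments.append(("math", match.group(1)))
        pos = match.end()
    if pos < len(s):
        segments.append(("text", s[pos:]))
    return segments


parts = split_paren_math(r'I lost $5 on \(\sin(x)\) today')
```

With these delimiters a bare dollar sign in the text part needs no escaping on the mathtext side, and the math part is already delimited the way LaTeX expects.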

Darren

···

On Friday 27 July 2007 08:38:49 am Michael Droettboom wrote:

If we go with another delimiter, there are others in TeX to choose
from. Plain TeX uses $$ for display math, and LaTeX uses \[, \]. Both
of these are less likely to be legitimate literals. While display math
normally implies that the math is placed on a separate line (not inline
with the text), it's not far from what matplotlib does, since it follows
the display math layout patterns.

John Hunter wrote:

where Math is a wrapper object that signals to "text" that its contents
are to be passed to the mathtext interpreter.

I would like to voice my opinion against this idea. I think the backward
incompatibility will be rare, and does not justify the additional complexity
it adds to the far more common need to interpret mathtext.

I'm on the fence as to how to handle this case. The majority of our
users will think of $ as the US currency symbol, and will have never
heard of TeX. Option 1 is to educate them, and require them to \$
quote that symbol. Option 2 is to enable a text property eg mathtext,
and do

text(x, y, 'what is the $\sin(x)$', mathtext=True)

Option 3 is to try and be clever, and interpret an even number of
unquoted dollar symbols as mathtext, or any string that has a quoted
dollar sign symbol as mathtext, else assume plain text. Option 4 is
to treat *all* strings as mathtext, but I think we would pay a pretty
big performance hit to invoke the mathtext machinery for every piece
of text. But it is an option. In option 4, of course, users would be
required to quote all dollar signs, so it is related to option 1 but
slightly different in how it treats strings with no dollar signs.

I'm not too keen on the text(x, y, Math('string')) proposal, which is
a little outside the normal matplotlib approach.

Michael, do you have a preference or an alternate proposal?
JDH

Let's rule out option 3 completely; it is an example of the type of cleverness that ends up causing more trouble and confusion than it is worth.

I also oppose using something other than the $ to delimit math, if delimiters are needed, which I think they are. At least in *TeX, a string of characters (a word) is rendered very differently depending on whether it is inside an equation or outside.

I suspect that options 1 and 4 will cause endless questions to matplotlib-users, and grumbling among people in the business and financial community who use lots of dollar signs and no math.

That leaves some variant of 2 and the Math('string') idea. I find the latter quite pythonic; it is a very concise, readable, and general way of attaching extra information to a string object, and it does not require passing yet another kwarg through a sequence of function and method calls. But if it is judged to be too out-of-character with the rest of the mpl api, or if in practice it would cause trouble that I don't see yet, then I am happy to let it go. I have not thought it through carefully, and I am not attached to it.

If a variant of 2 is chosen, one might shorten the kwarg to "math". Or use "format='math'" or something like that. This is more flexible than a boolean kwarg, leaving the door open to additional options for interpretation of strings--but not quite as flexible and powerful as the Math('string') idea.

Eric

···

On 7/26/07, Darren Dale <dd55@...143...> wrote:

I don't know if we ever reached consensus on how to specify math text vs. regular text. I agree with Eric that it's down to two options: using a new kw argument (probably format="math" to be most future-proof) or Math('string'). I don't think I have enough "historical perspective" to really make the call but I do have a concern about the second option that it may be confusing depending on how "Math" is imported. (It may have to be pylab.Math in some instances but not in others.) But I don't have a strong objection.

Any last objections to going with the new keyword argument?

Cheers,
Mike

Eric Firing wrote:

···

That leaves some variant of 2 [a keyword argument] and the Math('string') idea. I find the latter quite pythonic; it is a very concise, readable, and general way of attaching extra information to a string object, and it does not require passing yet another kwarg through a sequence of function and method calls. But if it is judged to be too out-of-character with the rest of the mpl api, or if in practice it would cause trouble that I don't see yet, then I am happy to let it go. I have not thought it through carefully, and I am not attached to it.

If a variant of 2 is chosen, one might shorten the kwarg to "math". Or use "format='math'" or something like that. This is more flexible than a boolean kwarg, leaving the door open to additional options for interpretation of strings--but not quite as flexible and powerful as the Math('string') idea.

I'm +1 on the kwarg approach -- it seems most consistent with our other usage.

···

On 8/2/07, Michael Droettboom <mdroe@...31...> wrote:

I don't know if we ever reached consensus on how to specify math text
vs. regular text. I agree with Eric that it's down to two options:
using a new kw argument (probably format="math" to be most future-proof)
or Math('string'). I don't think I have enough "historical perspective"
to really make the call but I do have a concern about the second option
that it may be confusing depending on how "Math" is imported. (It may
have to be pylab.Math in some instances but not in others.) But I don't
have a strong objection.

Any last objections to going with the new keyword argument?

Maybe the keyword should be format="TeX"? Or texformatting=True? Maybe it
would be appropriate to have the kwarg default to None, and if None reference
an rcoption like text.texformatting? That might be the least disruptive all
around.
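
A small sketch of the "default to None, then consult an rc option" pattern. The dictionary `rc` is a stand-in for matplotlib's rc settings, and 'text.texformatting' is only the name floated here, not an existing rc key; `resolve_texformatting` is a hypothetical helper.

```python
# Stand-in for the rc settings; the key name is an assumption taken
# from the suggestion above, not a real matplotlib option.
rc = {'text.texformatting': False}


def resolve_texformatting(texformatting=None):
    """Return the explicit kwarg if one was given, else the rc default."""
    if texformatting is None:
        return rc['text.texformatting']
    return bool(texformatting)


default_choice = resolve_texformatting()
explicit_choice = resolve_texformatting(True)
```

Existing scripts that never pass the kwarg keep whatever behavior the rc default selects, while a single call can still override it.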

Darren

···

On Thursday 02 August 2007 10:42:17 am John Hunter wrote:

On 8/2/07, Michael Droettboom <mdroe@...31...> wrote:
> I don't know if we ever reached consensus on how to specify math text
> vs. regular text. I agree with Eric that it's down to two options:
> using a new kw argument (probably format="math" to be most future-proof)
> or Math('string'). I don't think I have enough "historical perspective"
> to really make the call but I do have a concern about the second option
> that it may be confusing depending on how "Math" is imported. (It may
> have to be pylab.Math in some instances but not in others.) But I don't
> have a strong objection.
>
> Any last objections to going with the new keyword argument?

I'm +1 on the kwarg approach -- it seems most consistent with our other
usage.

Darren Dale wrote:

  

I don't know if we ever reached consensus on how to specify math text
vs. regular text. I agree with Eric that it's down to two options:
using a new kw argument (probably format="math" to be most future-proof)
or Math('string'). I don't think I have enough "historical perspective"
to really make the call but I do have a concern about the second option
that it may be confusing depending on how "Math" is imported. (It may
have to be pylab.Math in some instances but not in others.) But I don't
have a strong objection.

Any last objections to going with the new keyword argument?
      

I'm +1 on the kwarg approach -- it seems most consistent with our other
usage.
    
Maybe the keyword should be format="TeX"? Or texformatting=True? Maybe it would be appropriate to have the kwarg default to None, and if None reference an rcoption like text.texformatting? That might be the least disruptive all around.
  

I think format="TeX" may be a bit misleading, since it uses something TeX-like, but not really TeX (as the usetex stuff does). That said, I don't really have a better suggestion ;)

The idea also is that in the future this could support other values, e.g. format="html" might support "<b>bold</b>" for instance, so texformatting=True would be less extensible overall.
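
A minimal sketch of why a string-valued keyword extends more easily than a boolean. The registry `FORMATTERS` and the function `interpret` are hypothetical, and the handlers only tag the string instead of rendering it.

```python
def _plain(s):
    """No interpretation: dollar signs and tags are literal."""
    return ("plain", s)


def _math(s):
    """Would hand the string to the mathtext parser."""
    return ("mathtext", s)


# Supporting a new kind of markup means adding one entry here.
FORMATTERS = {None: _plain, "math": _math}


def interpret(s, format=None):
    """Dispatch on the format keyword (named as in the proposal)."""
    try:
        handler = FORMATTERS[format]
    except KeyError:
        raise ValueError(f"unknown text format: {format!r}") from None
    return handler(s)


label = interpret("a $5 bill")
formula = interpret(r"\sin(x)", format="math")
```

A later format="html" handler would be one more dictionary entry, with no change to the call signature.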

And yes, having a rcoption default seems like it could be handy.

Cheers,
Mike

···

On Thursday 02 August 2007 10:42:17 am John Hunter wrote:

On 8/2/07, Michael Droettboom <mdroe@...31...> wrote:

Darren Dale wrote:
>>> I don't know if we ever reached consensus on how to specify math text
>>> vs. regular text. I agree with Eric that it's down to two options:
>>> using a new kw argument (probably format="math" to be most
>>> future-proof) or Math('string'). I don't think I have enough
>>> "historical perspective" to really make the call but I do have a
>>> concern about the second option that it may be confusing depending on
>>> how "Math" is imported. (It may have to be pylab.Math in some
>>> instances but not in others.) But I don't have a strong objection.
>>>
>>> Any last objections to going with the new keyword argument?
>>
>> I'm +1 on the kwarg approach -- it seems most consistent with our other
>> usage.
>
> Maybe the keyword should be format="TeX"? Or texformatting=True? Maybe it
> would be appropriate to have the kwarg default to None, and if None
> reference an rcoption like text.texformatting? That might be the least
> disruptive all around.

I think format="TeX" may be a bit misleading, since it uses something
TeX-like, but not really TeX (as the usetex stuff does). That said, I
don't really have a better suggestion ;)

The idea also is that in the future this could support other values,
e.g. format="html" might support "<b>bold</b>" for instance, so
texformatting=True would be less extensible overall.

How about markup="TeX" then?

···

On Thursday 02 August 2007 11:03:09 am Michael Droettboom wrote:

> On Thursday 02 August 2007 10:42:17 am John Hunter wrote:
>> On 8/2/07, Michael Droettboom <mdroe@...31...> wrote:

And yes, having a rcoption default seems like it could be handy.

Darren Dale wrote:
[...]

How about markup="TeX" then?

"markup" is a good kwarg for this; it is descriptive and won't be confused with anything else.

Eric
