unicode question

_Darren_Dale · July 16, 2007, 3:13pm

I am cleaning up some of the code in ticker.ScalarFormatter, specifically some
of the text formatting for dealing with scientific notation.

We provide an option to format labels in sci. notation without using mathtext
or usetex, in which case I would like to use the unicode multiplication sign,
× instead of x. Is there any reason not to do so? If not, should we use
u'\xd7' or '×' in the actual sources (the latter requiring the file's
encoding to be declared at the beginning of the file, like: # -*- coding:
utf-8 -*-)? If we can use unicode, it might be nice to use real minus signs
as well, −123 rather than -123.

Darren

Michael_Droettboom · July 16, 2007, 3:48pm

Darren Dale wrote:
> If not, should we use u'\xd7' or '×' in the actual sources (the latter requiring the file's encoding to be declared at the beginning of the file, like: # -*- coding: utf-8 -*-)?

In an ideal world, I would prefer the latter, but we would want to verify that all the matplotlib developers are using an editor that respects those tags, or we could run into surprises if the files are accidentally re-encoded.

Cheers,
Mike

Eric_Firing2 · July 16, 2007, 5:25pm

Michael Droettboom wrote:
> Darren Dale wrote:
>> If not, should we use u'\xd7' or '×' in the actual sources (the latter requiring the file's encoding to be declared at the beginning of the file, like: # -*- coding: utf-8 -*-)?
>
> In an ideal world, I would prefer the latter, but we would want to verify that all the matplotlib developers are using an editor that respects those tags, or we could run into surprises if the files are accidentally re-encoded.
>
> Cheers,
> Mike

I use a good old-fashioned editor called zed, written by an Italian named Sandro Serrafini who seems to have left no trace for several years. I have modified it slightly, and I do minimal maintenance to keep it compiling with new OS releases. Yes, I am familiar with emacs and vi and nano and gedit and jed; I periodically survey the field of editors. And yes, emacs will brew your morning coffee, but no, it won't behave in the sane ways that I like an editor to behave.

So the suggestion to start using unicode in source code is a nightmare for me. Ascii is good: simple, universal, easy to work with, easy to understand. One byte, one character. Unambiguous. Undoubtedly unicode makes sense for the world in the long run, but for me it is an unadulterated pain.

Eric

_John_Hunter · July 16, 2007, 5:46pm

I am a huge emacs user, and am familiar with coffee.el though have
never used it, but I think putting unicode into the src is a bad idea.
Wouldn't this cause potential problems for people working over dumb
terminals?

JDH

On 7/16/07, Eric Firing <efiring@...229...> wrote:

I use a good old-fashioned editor called zed, written by an Italian
named Sandro Serrafini who seems to have left no trace for several
years. I have modified it slightly, and I do minimal maintenance to
keep it compiling with new OS releases. Yes, I am familiar with emacs
and vi and nano and gedit and jed; I periodically survey the field of
editors. And yes, emacs will brew your morning coffee, but no, it won't
behave in the sane ways that I like an editor to behave.

So the suggestion to start using unicode in source code is a nightmare
for me. Ascii is good: simple, universal, easy to work with, easy to
understand. One byte, one character. Unambiguous. Undoubtedly unicode
makes sense for the world in the long run, but for me it is an
unadulterated pain.

_Darren_Dale · July 16, 2007, 5:56pm

Michael Droettboom wrote:
> Darren Dale wrote:
>> If not, should we use
>> u'\xd7' or '×' in the actual sources (the latter requiring the file's
>> encoding to be declared at the beginning of the file, like: # -*-
>> coding: utf-8 -*-)?
>
> In an ideal world, I would prefer the latter, but we would want to
> verify that all the matplotlib developers are using an editor that
> respects those tags, or we could run into surprises if the files are
> accidentally re-encoded.
>
> Cheers,
> Mike

On Monday 16 July 2007 01:25:18 pm Eric Firing wrote:
> I use a good old-fashioned editor called zed, written by an Italian
> named Sandro Serrafini who seems to have left no trace for several
> years. I have modified it slightly, and I do minimal maintenance to
> keep it compiling with new OS releases. Yes, I am familiar with emacs
> and vi and nano and gedit and jed; I periodically survey the field of
> editors. And yes, emacs will brew your morning coffee, but no, it won't
> behave in the sane ways that I like an editor to behave.
>
> So the suggestion to start using unicode in source code is a nightmare
> for me. Ascii is good: simple, universal, easy to work with, easy to
> understand. One byte, one character. Unambiguous.

What about rendering unicode, but keeping the mpl sources ascii only?

> Undoubtedly unicode
> makes sense for the world in the long run, but for me it is an
> unadulterated pain.

In that case, I imagine you are not eagerly anticipating the arrival of Py3K.

Eric_Firing2 · July 16, 2007, 5:58pm

John Hunter wrote:
> On 7/16/07, Eric Firing <efiring@...229...> wrote:
>> I use a good old-fashioned editor called zed, written by an Italian
>> named Sandro Serrafini who seems to have left no trace for several
>> years. I have modified it slightly, and I do minimal maintenance to
>> keep it compiling with new OS releases. Yes, I am familiar with emacs
>> and vi and nano and gedit and jed; I periodically survey the field of
>> editors. And yes, emacs will brew your morning coffee, but no, it won't
>> behave in the sane ways that I like an editor to behave.
>>
>> So the suggestion to start using unicode in source code is a nightmare
>> for me. Ascii is good: simple, universal, easy to work with, easy to
>> understand. One byte, one character. Unambiguous. Undoubtedly unicode
>> makes sense for the world in the long run, but for me it is an
>> unadulterated pain.
>
> I am a huge emacs user, and am familiar with coffee.el though have
> never used it, but I think putting unicode into the src is a bad idea.
> Wouldn't this cause potential problems for people working over dumb
> terminals?
>
> JDH

(Or for dumb people (me) working over terminals? Probably all terminals by now are smarter than I am.)

I think that unicode does require a whole level of support--something of a paradigm shift, not quite as jarring as command-line to gui, but still quite a bit of support infrastructure.

My understanding is that Python 3000 will be all-unicode, so I will have to get used to it and get a different editor or be left behind. But I am not looking forward to it, and don't want to do it any sooner than I have to.

Eric

Eric_Firing2 · July 16, 2007, 6:10pm

Darren Dale wrote:
[...]

> What about rendering unicode, but keeping the mpl sources ascii only?

This sounds like the thing to do for now.
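
A minimal sketch of that approach, written for Python 3 (an assumption, since the thread predates it): the source file stays pure ASCII, and the multiplication and minus signs appear only as escapes. The helper name `format_sci` and its output layout are hypothetical and are not taken from ticker.ScalarFormatter.

```python
# ASCII-only source: the non-ASCII characters are spelled as escapes.
TIMES = "\u00d7"  # MULTIPLICATION SIGN
MINUS = "\u2212"  # MINUS SIGN


def format_sci(value, precision=1):
    """Hypothetical helper: render value as 'm x 10^e' using Unicode signs."""
    # Let Python produce the mantissa and exponent, e.g. '-1.2e+03'.
    mantissa, exponent = f"{value:.{precision}e}".split("e")
    text = f"{mantissa}{TIMES}10^{int(exponent)}"
    # Swap every ASCII hyphen-minus for a real minus sign.
    return text.replace("-", MINUS)


label = format_sci(-1230.0)
```

Because the characters are escapes, an ASCII-only editor can open and save the file without re-encoding anything, while the rendered label still contains the real signs.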

While you are at it, perhaps you can figure out how to stop unicode_demo from generating an error:

driving unicode_demo.py
File "_tmp_unicode_demo.py", line 10
SyntaxError: Non-ASCII character '\xe9' in file _tmp_unicode_demo.py on line 10, but no encoding declared; see PEP 263 – Defining Python Source Code Encodings | peps.python.org for details

>> Undoubtedly unicode makes sense for the world in the long run, but for me it is an
>> unadulterated pain.
>
> In that case, I imagine you are not eagerly anticipating the arrival of Py3K.

I am fairly open to attempts to make major improvements, or just clean things up. The unicode aspect of Py3K may make good sense in the long run, but I expect it will cause me some pain, and I am concerned that the use of unicode in source code will fragment the world's body of source code and may be counterproductive. The dominance of English is not entirely a bad thing, and its loss as a lingua franca may do more harm than good.

Eric

Michael_Droettboom · July 16, 2007, 6:28pm

Eric Firing wrote:

> While you are at it, perhaps you can figure out how to stop unicode_demo from generating an error:
>
> driving unicode_demo.py
> File "_tmp_unicode_demo.py", line 10
> SyntaxError: Non-ASCII character '\xe9' in file _tmp_unicode_demo.py on line 10, but no encoding declared; see PEP 263 – Defining Python Source Code Encodings | peps.python.org for details

I have a fix for this I'll commit momentarily. The backend_driver inserts non-comment lines before the -*- coding line.
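
To illustrate why the position of the coding line matters: PEP 263 only honours the declaration on the first or second line of a file. The sketch below uses Python 3's `tokenize.detect_encoding` to show this; it is not the backend_driver fix itself, and the helper name `declared_encoding` and the sample sources are made up for the example.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use for this source (PEP 263 rules)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Declaration on line 1: honoured.
good = b"# -*- coding: latin-1 -*-\nimport sys\n"

# Code inserted ahead of the declaration pushes it to line 3: ignored,
# so Python falls back to its default source encoding.
bad = b"import sys\nimport os\n# -*- coding: latin-1 -*-\n"

good_encoding = declared_encoding(good)
bad_encoding = declared_encoding(bad)
```

In Python 3 the fallback is UTF-8; in the Python 2 of this thread it was ASCII, which is why the demo failed with a SyntaxError once lines were inserted above the declaration.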

As for Unicode literals in Python source, there is a third option, other than u'\xd7' or '×'. Python will let you do u"\N{MULTIPLICATION SIGN}", which means you don't have to remember what \xd7 is. For single characters like this, I don't see much advantage (you can just name the variable something obvious), but for longer strings with embedded unicode characters (like docstrings), this might be something to consider.
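
A quick check that the escaped spellings name the same character, written for Python 3 (an assumption; there every string literal is Unicode, so the u prefix is optional). All of these keep the source file ASCII-only.

```python
import unicodedata

by_hex = "\xd7"                       # hex escape
by_name = "\N{MULTIPLICATION SIGN}"   # named escape
by_codepoint = chr(0xD7)              # built from the code point

# All three spellings produce the same one-character string.
same = by_hex == by_name == by_codepoint

# The name can be recovered from the character, which helps when
# reading someone else's hex escapes.
name = unicodedata.name(by_hex)

# The real minus sign mentioned earlier in the thread.
minus = "\N{MINUS SIGN}"
```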

Cheers,
Mike

_Edin_Salkovic · July 16, 2007, 7:25pm

As for Unicode literals in Python source, there is a third option, other
than u'\xd7' or '×'. Python will let you do u"\N{MULTIPLICATION SIGN}",
which means you don't have to remember what \xd7 is. For single
characters like this, I don't see much advantage (you can just name the
variable something obvious), but for longer strings with embedded
unicode characters (like docstrings), this might be something to consider.

There's also a TeX -> "unicode integer representation" dict in
_mathtext_data.py. It's called tex2uni.

You use it like this (notice no backslashes):

from matplotlib._mathtext_data import tex2uni
unichr(tex2uni['int'])

u'\u222b'

unichr(tex2uni['sum'])

u'\u2211'

Cheers,
Edin

···

On 7/16/07, Michael Droettboom <mdroe@...31...> wrote:

_John_Hunter · July 16, 2007, 8:16pm

That's a rather good solution, as it avoids duplication, and is
friendly to dumb terminals and developers <wink>. You might simply
define some constants at the ticker module level for efficient reuse,
eg

usumchar = unichr(tex2uni['sum'])

etc...
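
A sketch of the module-level constants idea, written for Python 3, where `chr` replaces `unichr`. To keep it self-contained, a small stand-in dict is used in place of matplotlib's tex2uni; the name `TEX2UNI_SUBSET` and the constant names are hypothetical. The 'int' and 'sum' values are the ones shown in this thread, and 'times' is U+00D7.

```python
# Stand-in for matplotlib._mathtext_data.tex2uni (hypothetical subset):
# TeX symbol name (no backslash) -> Unicode code point.
TEX2UNI_SUBSET = {
    "int": 0x222B,
    "sum": 0x2211,
    "times": 0x00D7,
}

# Module-level constants, computed once at import time and reused.
UINTCHAR = chr(TEX2UNI_SUBSET["int"])
USUMCHAR = chr(TEX2UNI_SUBSET["sum"])
UTIMESCHAR = chr(TEX2UNI_SUBSET["times"])

# Example use: an ASCII-only source line producing a Unicode label.
label = "1.2" + UTIMESCHAR + "10^3"
```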

On 7/16/07, Edin Salkovic <edin.salkovic@...149...> wrote:

You use it like this (notice no backslashes):
>>> from matplotlib._mathtext_data import tex2uni
>>> unichr(tex2uni['int'])
u'\u222b'
>>> unichr(tex2uni['sum'])
u'\u2211'