Encoding of files included in Sphinx

Hello,

I’m using the matplotlib Sphinx extension, which automatically includes the source
code and the figures it produces in the Sphinx document. This is a very handy
feature whose use goes far beyond documenting matplotlib itself (thanks for that, by the way).

However, I have trouble when the Python file passed to the plot directive contains
non-ASCII characters. I set up a simple example here:
http://github.com/sbarthelemy/SphinxEncoding

Running "make html" on it raises:
  Exception occurred:
    File "/usr/lib/pymodules/python2.6/sphinx/highlighting.py", line 167, in highlight_block
      source = source.decode()
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 37: ordinal not in range(128)
  The full traceback has been saved in /tmp/sphinx-err-5kW6ih.log, if you want to report the issue to the author.

So, I’ve got a few questions:

  • Is this expected?
  • Is there a workaround?
  • If not, how hard would it be to fix this problem? Maybe I could help a bit (with proper guidance).

Thank you for any help!

PS: I use Sphinx 0.6.2-1 and matplotlib 0.99.0-1ubuntu1, both as shipped with Ubuntu Karmic.

sphinx-err-5kW6ih.log (1.75 KB)

This is a bug, but it has a fairly straightforward fix: use Sphinx's "include" directive rather than rolling our own, as we currently do. This has been fixed in SVN r7972. The plot directive now takes an "encoding" option, exactly like the Sphinx include directive. It does not do automatic encoding detection (meaning it ignores "# coding: latin1" comments), just like the Sphinx include directive.
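
As a rough illustration of what an explicit encoding buys, here is a small sketch. It assumes Python 3 and is not the plot directive's actual code: the sample bytes and the helper names decode_source and declared_encoding are made up for the example. Decoding with an ASCII default fails on the accented byte, decoding with the stated encoding succeeds, and the standard library's tokenize.detect_encoding shows what reading the coding comment would look like if one wanted automatic detection.

```python
import io
import tokenize

# Made-up script contents, stored as Latin-1 with a PEP 263 coding comment.
SOURCE_BYTES = "# coding: latin1\nlabel = 'accent aigu \u00e9'\n".encode("latin-1")


def decode_source(data, encoding="ascii"):
    """Decode raw file bytes with a caller-supplied encoding (ASCII by default)."""
    return data.decode(encoding)


def declared_encoding(data):
    """Return the encoding named in the coding comment, as tokenize reads it."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


try:
    decode_source(SOURCE_BYTES)  # no encoding given, so ASCII is assumed
    ascii_worked = True
except UnicodeDecodeError:
    ascii_worked = False  # the 0xe9 byte is not valid ASCII

# Passing the encoding explicitly succeeds.
text = decode_source(SOURCE_BYTES, "latin-1")
```

The explicit argument mirrors the idea of an "encoding" option: the caller states the encoding instead of relying on a default.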

I'm not sure if there's a workaround "outside" of matplotlib, other than to ensure the source files are encoded in pure ASCII (by using Unicode escapes in literals instead of the real characters). But that's not a great workaround.
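
For what it's worth, the escape-based workaround could look like the sketch below (written for Python 3; the variable names are only illustrative). The source stays pure ASCII, yet the string still contains the accented character, whose UTF-8 form starts with the 0xc3 byte seen in the error above.

```python
# ASCII-only source: the accented character is written as a Unicode escape.
label = "accent aigu \u00e9"

# The escape denotes U+00E9 (e with acute accent); nothing non-ASCII is
# stored in the source file itself.
last_code_point = ord(label[-1])

# Encoded as UTF-8, that single character becomes the two bytes 0xc3 0xa9.
label_utf8 = label.encode("utf-8")
```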

Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

Hello Michael,

Thank you for your fast reply and action. I just tried the version from trunk (r7978) and I still have an encoding problem with the same test case. It seems to happen when the file is run (to produce the figure) rather than when it is included. I had a look at the code but cannot understand what is happening; I would have expected imp to properly guess the encoding.

Could you tell me if you have the same problem? Do you have any idea of what is going on?

Thanks!

$ git clone git://github.com/sbarthelemy/SphinxEncoding.git
$ cd SphinxEncoding/
$ make html
sphinx-build -b html -d _build/doctrees . _build/html
Making output directory...
Running Sphinx v0.6.2
loading pickled environment... not found
building [html]: targets for 1 source files that are out of date
updating environment: 1 added, 0 changed, 0 removed
/home/barthelemy/.local/lib/python2.6/site-packages/matplotlib/sphinxext/plot_directive.py:273: UserWarning: Exception running plot ./fileutf8.py
Traceback (most recent call last):
  File "/home/barthelemy/.local/lib/python2.6/site-packages/matplotlib/sphinxext/plot_directive.py", line 270, in render_figures
    run_code(plot_path, function_name, plot_code)
  File "/home/barthelemy/.local/lib/python2.6/site-packages/matplotlib/sphinxext/plot_directive.py", line 182, in run_code
    "__plot__", fd, fname, ('py', 'r', imp.PY_SOURCE))
  File "fileutf8.py", line 2, in <module>
    print(u"accent aigus é")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 13: ordinal not in range(128)

Hi,

I just wanted to raise this problem on the devel list, where it probably belongs. Also, if nobody has time to look at it now and you would prefer that I file a bug, please don’t hesitate to tell me.

The original post is here: http://thread.gmane.org/gmane.comp.python.matplotlib.general/20411

Cheers

Sorry this thread fell through the cracks. Thanks for the reminder.

The error is not actually in importing and parsing the .py file (it
seems to do that just fine). The error occurs when printing to the console,
at which point Python tries to convert the Unicode string to ASCII (which
fails because it contains code points > 127). One way around this
is to encode the Unicode string as UTF-8 (which seems to be the default for most
modern Linux X terminals etc.), e.g.:

print(u"accent aigus é".encode("utf8"))

Mike
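
The failure at print time can be reproduced without a real terminal. The sketch below assumes Python 3 and uses an in-memory stream as a stand-in for the console; print_to_console is a hypothetical helper, not part of matplotlib or Sphinx. With an ASCII stream the print raises the same kind of UnicodeEncodeError as in the traceback, while a UTF-8 stream accepts the text.

```python
import io

TEXT = "accent aigus \u00e9"


def print_to_console(text, encoding):
    """Print text to an in-memory stand-in for a console with the given encoding.

    Returns the bytes that the "console" would have received.
    """
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=console)
    console.flush()
    return raw.getvalue()


# A UTF-8 console accepts the accented character.
utf8_output = print_to_console(TEXT, "utf-8")

# An ASCII console cannot represent it, so printing raises an error.
try:
    print_to_console(TEXT, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
```

Encoding the string by hand before printing, as suggested above, sidesteps the stream's own encoding step; the sketch only shows where that step goes wrong.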
