Error when running multiple jobs utilizing the Tex utilities in matplotlib

Hi all,

My colleagues and I have used matplotlib and its TeX capabilities
extensively to create plots for the gravitational wave searches we
perform (and it has been a great tool for us :) ). Recently, however,
we started automating our plot generation by running multiple plotting
jobs concurrently under the Condor scheduler (and DAGMan), and we have
been running into problems. Many of our plotting jobs fail with
messages such as the one below:

---snip---

Traceback (most recent call last):
  File "/home/romain/Projects/ligovirgo/s5_2yr_lv_lowcbc_20080625/868815014-868901414/868815014-868901414/inj001_summary_plots/../executables/plotinjnum", line 298, in ?
    'eff_dist_h')
  File "/home/romain/Projects/ligovirgo/s5_2yr_lv_lowcbc_20080625/868815014-868901414/868815014-868901414/inj001_summary_plots/../executables/plotinjnum", line 119, in plot_found_missed
    fname_thumb = InspiralUtils.savefig_pylal(filename=fname, doThumb=True, dpi_thumb=opts.figure_resolution)
  File "/home/romain/codes/s5_2yr_lv_lowcbc_20080625/pylal/lib64/python2.4/site-packages/pylal/InspiralUtils.py", line 58, in savefig_pylal
    fig.savefig(filename_thumb, dpi=dpi_thumb)
  File "/usr/lib64/python2.4/site-packages/matplotlib/texmanager.py", line 259, in make_png
    os.remove(outfile)
OSError: [Errno 2] No such file or directory: '/home/romain/.matplotlib/tex.cache/ae479c90ff242327b54af004a0846188.output'

---snip---

My feeling is that when the code invokes TeX, it creates a temp file
in ~/.matplotlib/tex.cache and then, when it finishes, deletes it along
with all other temp TeX files. This would cause problems if another job
is in the middle of running TeX when the first job deletes its temp
files!

We are running a slightly old version of matplotlib (0.87.7). As we run
on multiple clusters, our sysadmins tend to update software only when
there is a need to, and we have had no other problems with matplotlib.
I apologize if this has been fixed in the meantime (I did a quick
search of the mailing list archive but found nothing). All our clusters
currently run Fedora Core 4 (we're going to move to CentOS 5).

Currently we are getting around this by forcing Condor to retry the
failed jobs 2 or 3 times, which catches most of these errors. Another
solution would be to limit the number of concurrently running jobs to
1, but as we run several DAGMans from within one 'super' DAGMan, it
would prove difficult to limit jobs across multiple DAGMans.

Anyway, if anyone has any ideas about how to solve this, I would
appreciate it. It would also help us if there were an option to set the
location of these temp TeX files, so that each job could use a
different directory (or a way to stop matplotlib deleting other jobs'
temp files).

Thanks in advance for any help

Ian Harry
School of Physics & Astronomy
Queens Buildings, The Parade

Cardiff, CF24 3AA
Email: Ian.Harry@…1663…
Phone: (+44) 29 208 75120
Mobile: (+44) 7890 479090

Hi Ian,

Anyway if anyone has any ideas of how to solve this I would appreciate
this. Also if there are any options where we can set the location of these
temp tex files and use a different directory for each job (or stop
matplotlib deleting other temp files) that would help us.

I'm really hesitant to mess around with the location of the temp files. It was
a bit painful trying to get usetex to work across platforms.

Instead, would you try replacing:

os.remove(outfile)

with:

try: os.remove(outfile)
except OSError: pass

Let me know if that fixes it, and if you need to wrap any other file
deletions.
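
A slightly narrower variant of the wrapper above, as a minimal sketch: it swallows only the "file already gone" case and lets other failures (permissions, stale handles) surface. It uses Python 3 syntax, which would not run on the Python 2.4 shown in the traceback, and `remove_if_exists` is a hypothetical helper name, not part of matplotlib.

```python
import errno
import os
import tempfile


def remove_if_exists(path):
    """Delete path; return True if removed, False if it was already gone."""
    try:
        os.remove(path)
    except OSError as exc:
        # Another job may have deleted the file first; ignore only that case.
        if exc.errno != errno.ENOENT:
            raise
        return False
    return True


# Usage: the second call finds nothing to delete and does not raise.
fd, scratch = tempfile.mkstemp(suffix=".output")
os.close(fd)
first = remove_if_exists(scratch)
second = remove_if_exists(scratch)
```

Re-raising everything except ENOENT keeps genuine I/O problems visible instead of hiding them behind a bare pass.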

Thanks,
Darren

On Thursday 10 July 2008 06:03:54 am Ian Harry wrote:

Hi Darren,

I have tried rerunning our code with the change you suggested applied in the make_dvi and make_png functions, but I am still seeing failures; I have put them at the bottom of this message.

Strangely enough, these errors don't seem to occur when there are a lot of files in my tex.cache directory. For example, I ran the code (~40 programs, each making ~10-20 plots) successfully 3 times; the OSError wasn't raised at all (I used a print statement to check). I then realised that a lot of temp files were in my tex.cache directory, so I emptied it, and on the next run I saw a lot of failures (the OSError I showed previously as well as the error messages shown below). It seems weird that it should run fine when a lot of files are left in my temp directory and not when it is empty.

Here are the error messages that are occurring now:

Traceback (most recent call last):
  File "/home/spxiwh/ihope/852450000-852700000/nsbhinj_summary_plots/../executables/plotinspmissed", line 625, in ?
    savePlot( opts, filename, titleText)
  File "/home/spxiwh/ihope/852450000-852700000/nsbhinj_summary_plots/../executables/plotinspmissed", line 108, in savePlot
    dpi_thumb=opts.figure_resolution)
  File "/home/spxiwh/lscsoft/executables/cbc_s5_1yr_20070129/pylal//lib64/python2.4/site-packages/pylal/InspiralUtils.py", line 54, in savefig_pylal
    fig.savefig(filename, dpi=dpi)
  File "/home/spxiwh/test/matplotlib/figure.py", line 682, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 456, in print_figure
    self.draw()
  File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 392, in draw
    self.figure.draw(renderer)
  File "/home/spxiwh/test/matplotlib/figure.py", line 544, in draw
    for a in self.axes: a.draw(renderer)
  File "/home/spxiwh/test/matplotlib/axes.py", line 1063, in draw
    a.draw(renderer)
  File "/home/spxiwh/test/matplotlib/axis.py", line 595, in draw
    self.label.draw(renderer)
  File "/home/spxiwh/test/matplotlib/text.py", line 340, in draw
    bbox, info = self._get_layout(renderer)
  File "/home/spxiwh/test/matplotlib/text.py", line 187, in _get_layout
    w,h = renderer.get_text_width_height(
  File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 240, in get_text_width_height
    Z = texmanager.get_rgba(s, size, self.dpi.get(), rgb)
  File "/home/spxiwh/test/matplotlib/texmanager.py", line 334, in get_rgba
    pngfile = self.make_png(tex, fontsize, dpi, force=False)
  File "/home/spxiwh/test/matplotlib/texmanager.py", line 255, in make_png
    fh = file(outfile)
IOError: [Errno 2] No such file or directory: '/home/spxiwh/.matplotlib/tex.cache/fb2014e54961855bd04020b61190867c.output'

Traceback (most recent call last):
  File "/home/spxiwh/ihope/852450000-852700000/bnsinj_summary_plots/../executables/plotinspinj", line 569, in ?
    'end_time', 'days', opts.time_axis, plot_type = 'linear' )
  File "/home/spxiwh/ihope/852450000-852700000/bnsinj_summary_plots/../executables/plotinspinj", line 94, in plot_parameter_accuracy
    dpi_thumb=opts.figure_resolution)
  File "/home/spxiwh/lscsoft/executables/cbc_s5_1yr_20070129/pylal//lib64/python2.4/site-packages/pylal/InspiralUtils.py", line 54, in savefig_pylal
    fig.savefig(filename, dpi=dpi)
  File "/home/spxiwh/test/matplotlib/figure.py", line 682, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 456, in print_figure
    self.draw()
  File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 392, in draw
    self.figure.draw(renderer)
  File "/home/spxiwh/test/matplotlib/figure.py", line 544, in draw
    for a in self.axes: a.draw(renderer)
  File "/home/spxiwh/test/matplotlib/axes.py", line 1063, in draw
    a.draw(renderer)
  File "/home/spxiwh/test/matplotlib/axis.py", line 561, in draw
    tick.draw(renderer)
  File "/home/spxiwh/test/matplotlib/axis.py", line 161, in draw
    if self.label1On: self.label1.draw(renderer)
  File "/home/spxiwh/test/matplotlib/text.py", line 838, in draw
    Text.draw(self, renderer)
  File "/home/spxiwh/test/matplotlib/text.py", line 340, in draw
    bbox, info = self._get_layout(renderer)
  File "/home/spxiwh/test/matplotlib/text.py", line 187, in _get_layout
    w,h = renderer.get_text_width_height(
  File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 240, in get_text_width_height
    Z = texmanager.get_rgba(s, size, self.dpi.get(), rgb)
  File "/home/spxiwh/test/matplotlib/texmanager.py", line 334, in get_rgba
    pngfile = self.make_png(tex, fontsize, dpi, force=False)
  File "/home/spxiwh/test/matplotlib/texmanager.py", line 247, in make_png
    dvifile = self.make_dvi(tex, fontsize)
  File "/home/spxiwh/test/matplotlib/texmanager.py", line 223, in make_dvi
    fh = file(outfile)
IOError: [Errno 2] No such file or directory: '/home/spxiwh/.matplotlib/tex.cache/7e534aafdc12681d1ef0d36df4963de8.output'

And once I noticed:

Traceback (most recent call last):
  File "/home/spxiwh/ihope/852450000-852700000/allinj_summary_plots/../executables/plotinspmissed", line 661, in ?
    dpi_thumb=opts.figure_resolution)
  File "/home/spxiwh/lscsoft/executables/cbc_s5_1yr_20070129/pylal//lib64/python2.4/site-packages/pylal/InspiralUtils.py", line 54, in savefig_pylal
    fig.savefig(filename, dpi=dpi)
  File "/usr/lib64/python2.4/site-packages/matplotlib/figure.py", line 682, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/matplotlib/backends/backend_agg.py", line 456, in print_figure
    self.draw()
  File "/usr/lib64/python2.4/site-packages/matplotlib/backends/backend_agg.py", line 392, in draw
    self.figure.draw(renderer)
  File "/usr/lib64/python2.4/site-packages/matplotlib/figure.py", line 544, in draw
    for a in self.axes: a.draw(renderer)
  File "/usr/lib64/python2.4/site-packages/matplotlib/axes.py", line 1063, in draw
    a.draw(renderer)
  File "/usr/lib64/python2.4/site-packages/matplotlib/text.py", line 340, in draw
    bbox, info = self._get_layout(renderer)
  File "/usr/lib64/python2.4/site-packages/matplotlib/text.py", line 187, in _get_layout
    w,h = renderer.get_text_width_height(
  File "/usr/lib64/python2.4/site-packages/matplotlib/backends/backend_agg.py", line 240, in get_text_width_height
    Z = texmanager.get_rgba(s, size, self.dpi.get(), rgb)
  File "/usr/lib64/python2.4/site-packages/matplotlib/texmanager.py", line 330, in get_rgba
    X = readpng(os.path.join(self.texcache, pngfile))
RuntimeError: _image_module::readpng: file not recognized as a PNG file
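
One possible explanation for this last error, offered as an assumption rather than a diagnosis, is that a job read a cached PNG while another job was still writing it. A quick way to test for that is to compare a cache file's first eight bytes with the PNG signature. `looks_like_png` is a hypothetical helper, not a matplotlib function, and the file names are made up:

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header of every PNG file


def looks_like_png(path):
    """Return True if the file starts with the PNG signature."""
    with open(path, "rb") as fh:
        return fh.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


# Usage: a truncated file fails the check, a file with a full header passes.
scratch_dir = tempfile.mkdtemp()
partial = os.path.join(scratch_dir, "partial.png")
complete = os.path.join(scratch_dir, "complete.png")
with open(partial, "wb") as fh:
    fh.write(PNG_SIGNATURE[:3])
with open(complete, "wb") as fh:
    fh.write(PNG_SIGNATURE + b"rest of file")
```

A passing check only shows that the header is intact; a file cut off later would still need a full decode to detect.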

Cheers

Ian


Hi Darren,

I have tried rerunning our code with the change you suggested in the
make_dvi and make_png functions. I am still noticing failures however. I
put these at the bottom of this message. Strangely enough, these errors
don't seem to occur when there are a lot of files in my tex.cache
directory. For example, I ran the code (consisting of ~40 codes all making
~10-20 plots each), successfully 3 times (the OSError wasn't raised at all,
I used a print statement to check). I realised after this that a lot of
temp files were in my tex.cache directory, so I emptied it and then I
noticed that a lot of failures occurred when I ran the code the next time
(the OSError I showed previously was raised as well as the error messages
shown below). It seems weird that it should run fine when a lot of files
are left in my temp directory and not when it is empty?

Most of those files are not temporary files, but cached files. The error you
reported only occurs when a required file does not already exist in the
cache, and like you said, it appears to be the case that two jobs are trying
to add the same file to the cache at the same time, and one job is failing
because the other deletes a temporary file that is being used by both. I
guess.
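
One common way to make a shared cache safe against this kind of collision is to have each job write to its own uniquely named temporary file and then rename it into place. The sketch below illustrates that general technique under stated assumptions; it is not how texmanager.py works, `write_cache_entry` is a hypothetical name, and rename atomicity is weaker on some network filesystems than on local disks.

```python
import os
import tempfile


def write_cache_entry(cache_dir, name, data):
    """Store data under cache_dir/name without exposing partial files."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, name)
    if os.path.exists(final_path):
        return final_path  # another job already produced this entry

    # Each writer gets its own temp file, so no job deletes another's scratch.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, final_path)  # rename into place in one step
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return final_path


# Usage: a second writer for the same name leaves the first result in place.
demo_cache = tempfile.mkdtemp()
entry = write_cache_entry(demo_cache, "example.png", b"first")
again = write_cache_entry(demo_cache, "example.png", b"second")
```

Readers then see either no file or a complete one, which is the property the shared tex.cache appears to lack here.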

Here are the error messages that are occurring now:

  File "/home/spxiwh/test/matplotlib/texmanager.py", line 255, in make_png
    fh = file(outfile)
IOError: [Errno 2] No such file or directory: '/home/spxiwh/.matplotlib/tex.cache/fb2014e54961855bd04020b61190867c.output'

That doesn't make any sense to me. file defaults to open a file in append mode,
it doesn't matter if a file exists or not. Maybe you could try to figure out
why that fails and report back.
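
For reference, Python's built-in open (and the Python 2 file constructor used in texmanager.py) defaults to read mode 'r', not append mode, so opening a path that does not exist raises an error. A minimal sketch in Python 3; the temporary path is made up for illustration:

```python
import os
import tempfile

missing = os.path.join(tempfile.mkdtemp(), "missing.output")

try:
    open(missing)  # no mode given, so this is open(missing, "r")
    default_open_failed = False
except IOError:  # alias of OSError in Python 3
    default_open_failed = True

# Append mode, by contrast, creates the file when it does not exist.
with open(missing, "a"):
    pass
created_by_append = os.path.exists(missing)
```

This matches the IOError in the traceback: the .output file had not been created, or had already been removed, when the read-mode open ran.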

And once I noticed:

  File "/usr/lib64/python2.4/site-packages/matplotlib/texmanager.py", line 330, in get_rgba
    X = readpng(os.path.join(self.texcache, pngfile))
RuntimeError: _image_module::readpng: file not recognized as a PNG file

No idea, sorry.

Darren

On Thursday 10 July 2008 10:48:01 am you wrote:

Hi Darren,

Thanks for helping with this problem.

I have investigated further this issue and here is what I have found out:

I have traced the errors back to two functions in texmanager.py (matplotlib.texmanager): make_dvi and make_png. Most of the errors mention 'Stale NFS file handles' and crop up at a variety of places throughout these functions. I guess this is because /home/[username] is not a local directory on our clusters; we have seen issues before with other code when a lot of nodes try to access the same directory on the NFS file system simultaneously. I tried altering __init__.py to force the code to put the .matplotlib directory on a filesystem local to each node. Moving the .matplotlib directory to a local drive solves almost all of these errors.

One error that remained was the one about file opening:
fh = file(outfile)
I added a 'w' to this, which seemed to solve the problem. I also commented out some of the commands that generate verbose output within these functions (specifically, fh.read() was causing a problem, which is probably expected with 'w'), and the errors went away. I guess 'a' would be better, but the commands only seem to be called if the file doesn't exist?
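
The fh.read() failures are consistent with how file modes behave: a handle opened with 'w' or 'a' is write-only, so reading from it raises an error. If the intent is "create the file when missing, then read whatever is there", 'a+' is one mode that allows both. A minimal Python 3 sketch, not the change that went into matplotlib; the file name is hypothetical:

```python
import os
import tempfile

outfile = os.path.join(tempfile.mkdtemp(), "example.output")

# A write-only handle cannot be read from.
with open(outfile, "w") as fh:
    fh.write("tex log line\n")
    try:
        fh.read()
        read_failed = False
    except (IOError, ValueError):  # io.UnsupportedOperation is both
        read_failed = True

# 'a+' creates the file if needed and permits reading after seeking back.
with open(outfile, "a+") as fh:
    fh.seek(0)
    contents = fh.read()
```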

As we have a lot of users running this code a solution like this is unworkable (as a lot of our users are unfamiliar with python/Linux and want to run a simple command). Do you have any ideas of how we could solve this issue?

Thanks again for your help

Ian Harry



Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users


Hi Ian,

Thanks for helping with this problem.

I have investigated further this issue and here is what I have found out:

I have traced the errors themselves back to two functions in texmanager.py
(matplotlib.texmanager), make_dvi and make_png. Most of the errors seem to
mention 'Stale NFS file handles' and crop up at a variety of different
places throughout these functions. I guess this is because on our clusters
/home/[username] is not a local directory, we have seen issues before with
other code if a lot of nodes try to access the same directory on the NFS
file system simultaneously. I tried altering the __init__.py to force the
code to put the .matplotlib directory on filesystems local to each node.
Moving the .matplotlib directory to a local drive solves almost all of
these errors.

I suggest you try backing out those changes you just described, and instead
try setting a MPLCONFIGDIR environment variable to point somewhere on the
local filesystem. If MPLCONFIGDIR is not defined, we use ~/.matplotlib.
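
As a rough sketch of that suggestion, a job wrapper can point MPLCONFIGDIR at a fresh node-local directory before matplotlib is imported. The helper name `use_private_mpl_config` and the choice of the system temp directory are assumptions for illustration; use whatever local scratch space your cluster provides.

```python
import os
import tempfile


def use_private_mpl_config(base_dir=None):
    """Create a per-job directory and export it as MPLCONFIGDIR.

    base_dir should be on a node-local filesystem; None means the
    system temporary directory.
    """
    config_dir = tempfile.mkdtemp(prefix="mplconfig-", dir=base_dir)
    os.environ["MPLCONFIGDIR"] = config_dir
    return config_dir


# Call this before the first "import matplotlib" in the job, so the
# variable is already set when matplotlib looks up its config directory.
job_config_dir = use_private_mpl_config()
```

The same effect can be had by exporting the variable in the shell or in the job's environment. Note that a fresh directory per job also means each job starts with an empty TeX cache.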

One error that remained was the one about file opening
fh = file(outfile)
I added a 'w' to this and this seemed to solve this problem, I also
commented out some of the verbose generating commands (specifically
fh.read() was causing a problem (probably expected with 'w')) within these
functions and the errors go away. I guess 'a' would be better but the
commands only seem to be called if the file doesn't exist?

Out of curiosity, if you added 'a' instead of 'w', does the error go away?
Either way, please let me know exactly what changes need to be made and I will
commit the changes to svn.

As we have a lot of users running this code a solution like this is
unworkable (as a lot of our users are unfamiliar with python/Linux and want
to run a simple command). Do you have any ideas of how we could solve this
issue?

Please try the environment variable I mentioned and let me know what happens.

Darren

On Tuesday 15 July 2008 10:13:02 am Ian Harry wrote:

2008/7/15 Darren Dale <dsdale24@…287…>:

I suggest you try backing out those changes you just described, and instead
try setting a MPLCONFIGDIR environment variable to point somewhere on the
local filesystem. If MPLCONFIGDIR is not defined, we use ~/.matplotlib.

Brilliant! This works perfectly and should be easy to implement on different systems!

Out of curiosity, if you added 'a' instead of 'w', does the error go away?
Either way, please let me know exactly what changes need to be made and I will
commit the changes to svn.

No, using ‘a’ gives the same errors as using ‘w’ (again in fh.read()). Here are the changes I made to stop the errors that didn’t seem to be due to ‘stale NFS file handle’:

--snip--

[spxiwh@…2098… 07:14 AM matplotlib]$ diff texmanager.py /usr/lib64/python2.4/site-packages/matplotlib/texmanager.py
248c248
< fh = file(outfile,'a')
---
> fh = file(outfile)
252,254c252
< else:
< try: verbose.report(fh.read(), 'debug')
< except: pass
---
> else: verbose.report(fh.read(), 'debug')
259,261c257,258
< else:
< try: os.remove(fname)
< except: pass
---
> else: os.remove(fname)
280c277
< fh = file(outfile,'a')
---
> fh = file(outfile)
285,287c282
< else:
< try: verbose.report(fh.read(), 'debug')
< except: pass
---
> else: verbose.report(fh.read(), 'debug')
289,290c284
< try: os.remove(outfile)
< except: pass
---
> os.remove(outfile)
314c308
< # else: verbose.report(fh.read(), 'debug')
---
> else: verbose.report(fh.read(), 'debug')

--snip--

Once again, thanks for the help.

Ian

Ian Harry
School of Physics & Astronomy
Queens Buildings, The Parade

Cardiff, CF24 3AA
Email: Ian.Harry@…1663…
Phone: (+44) 29 208 75120
Mobile: (+44) 7890 479090

I took a different approach:

Index: lib/matplotlib/texmanager.py

===================================================================
--- lib/matplotlib/texmanager.py (revision 5771)
+++ lib/matplotlib/texmanager.py (working copy)
@@ -273,16 +273,22 @@
                             %(os.path.split(texfile)[-1], outfile))
             mpl.verbose.report(command, 'debug')
             exit_status = os.system(command)
-            fh = file(outfile)
+            try:
+                fh = file(outfile)
+                report = fh.read()
+                fh.close()
+            except IOError:
+                report = 'No latex error report available.'
             if exit_status:
                 raise RuntimeError(('LaTeX was not able to process the following \
-string:\n%s\nHere is the full report generated by LaTeX: \n\n'% repr(tex)) + fh.read())
-            else: mpl.verbose.report(fh.read(), 'debug')
-            fh.close()
+string:\n%s\nHere is the full report generated by LaTeX: \n\n'% repr(tex)) + report)
+            else: mpl.verbose.report(report, 'debug')
             for fname in glob.glob(basefile+'*'):
                 if fname.endswith('dvi'): pass
                 elif fname.endswith('tex'): pass
-                else: os.remove(fname)
+                else:
+                    try: os.remove(fname)
+                    except OSError: pass

         return dvifile

@@ -305,14 +311,19 @@
                         os.path.split(dvifile)[-1], outfile))
             mpl.verbose.report(command, 'debug')
             exit_status = os.system(command)
-            fh = file(outfile)
+            try:
+                fh = file(outfile)
+                report = fh.read()
+                fh.close()
+            except IOError:
+                report = 'No dvipng error report available.'
             if exit_status:
                 raise RuntimeError('dvipng was not able to \
 process the flowing file:\n%s\nHere is the full report generated by dvipng: \
-\n\n'% dvifile + fh.read())
-            else: mpl.verbose.report(fh.read(), 'debug')
-            fh.close()
-            os.remove(outfile)
+\n\n'% dvifile + report)
+            else: mpl.verbose.report(report, 'debug')
+            try: os.remove(outfile)
+            except OSError: pass

         return pngfile

Would you update from svn and see if it works for you?

Thanks,
Darren
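
The two ideas in the patch above (read the log if it is there, and do not fail when a temporary file has already been deleted) can be sketched on their own. The helper names read_report and remove_quietly are made up for illustration and are not part of matplotlib; open() stands in for the older file() builtin used in the patch.

```python
import os


def read_report(path, fallback):
    """Return the contents of `path`, or `fallback` if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except IOError:
        # Another job may have removed the log already.
        return fallback


def remove_quietly(path):
    """Delete `path`, ignoring the error if it is already gone."""
    try:
        os.remove(path)
    except OSError:
        pass
```

Catching only IOError and OSError, rather than using a bare except, keeps unrelated failures visible while still tolerating a file that has disappeared.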

Hi Darren,

I have updated from svn and tried to run the code. It is not working, but the failures have nothing to do with texmanager.py. Some of our codes are failing from within one of our __init__.py files (my guess is a naming conflict), and some more are failing with:

File "/home/spxiwh/matplotlibinstall/lib64/python2.4/site-packages/matplotlib/axes.py", line 263, in _xy_from_xy
assert nrx == nry, 'Dimensions of x and y are incompatible'
AssertionError: Dimensions of x and y are incompatible

I also get:

/home/spxiwh/matplotlibinstall/lib64/python2.4/site-packages/matplotlib/__init__.py:801: UserWarning: This call to matplotlib.use() has no effect
because the the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

at the top of all of our plotting routine outputs now.

This sounds like we have bugs in our code, which we need to deal with before we can upgrade our numpy and matplotlib versions. Because of time constraints, it is likely that upgrading these modules on our systems will not happen for a few months. Using MPLCONFIGDIR should stop most of our failures anyway; I guess we can solve the rest by automatically retrying failed jobs.
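
A rough sketch of what retrying could look like if done from a Python wrapper rather than by the scheduler. The function name run_with_retries and the default of three attempts are made up for illustration, and the command below is only a stand-in for a real plotting job.

```python
import subprocess
import sys


def run_with_retries(command, max_attempts=3):
    """Run `command` until it exits with status 0 or attempts run out.

    Returns the number of attempts used; raises RuntimeError if every
    attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        exit_status = subprocess.call(command)
        if exit_status == 0:
            return attempt
    raise RuntimeError("%r failed %d times" % (command, max_attempts))


# Stand-in for a plotting job: a Python process that exits cleanly.
attempts = run_with_retries([sys.executable, "-c", "pass"])
```

Retrying only hides intermittent failures such as the stale NFS handles; a job that fails for a real reason will simply fail max_attempts times.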

Thanks for the help

Ian

For your own sake, this use bug should be fixed because it means mpl
is not doing what you think. The backend needs to be set before pylab
is imported. The two main ways to set the backend are in the rc file
and with the use directive. If you do the latter, make sure you put

  import matplotlib
  matplotlib.use('YourBackend')

near the top of your main driver code, before you import pylab or any
other modules which import it. You should also do this only in one
place in your code. If you try and do it after you import pylab, and
your backend is already set to something else from your rc files, your
code will break.

JDH

···

On Fri, Jul 18, 2008 at 6:12 AM, Ian Harry <ian.harry@...1663...> wrote:

Hi Darren,

/home/spxiwh/matplotlibinstall/lib64/python2.4/site-packages/matplotlib/__init__.py:801:
UserWarning: This call to matplotlib.use() has no effect
because the the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

at the top of all of our plotting routine outputs now.

This sounds like we have bugs in our code, which we need to deal with before
we can upgrade our numpy and matplotlib versions. Because of time
constraints, it is likely that upgrading these modules on our systems will
not happen for a few months. Using MPLCONFIGDIR should stop most of our
failures anyway; I guess we can solve the rest by automatically retrying
failed jobs.