tkinter segv under opensuse 10.3

Hi,

I had a look through the archives but couldn't find an answer to this.
Using the TkAgg backend (plain Agg is fine), I get a segmentation fault when doing a simple plot.
gdb returns the following:

362 Point* ll_api() {return _ll;}
Current language: auto; currently c++
(gdb) bt
#0 0xb6fc1bac in PyAggImagePhoto (clientdata=0x0, interp=0x87a9430, argc=5,
     argv=0xbf97fc9c) at src/_transforms.h:362
#1 0xb712ea5c in TclInvokeStringCommand () from /usr/lib/libtcl8.4.so
#2 0xb712ff05 in TclEvalObjvInternal () from /usr/lib/libtcl8.4.so
#3 0xb7131015 in Tcl_EvalObjv () from /usr/lib/libtcl8.4.so
#4 0xb73d34a6 in ?? () from /usr/lib/python2.5/lib-dynload/_tkinter.so
[snip]

This is matplotlib-0.91.2 with Python 2.5 on Linux (openSUSE 10.3).

Cheers,
Malte.

This looks like the same symptoms as this (unresolved) bug here:

http://sourceforge.net/tracker/index.php?func=detail&aid=1949982&group_id=80706&atid=560720

Your gdb backtrace reveals that you have debugging symbols in matplotlib, so we've got a little more information now. Thanks.

Since this crash is in an inlined C++ method, I would try forcing a clean rebuild of matplotlib (by deleting the build directory under the source tree) and rebuilding/reinstalling. Beyond that, I'm a little stumped. Could you run the following in gdb when the crash happens?

p bboxo
p bbox
p bbox->_ll

Cheers,
Mike

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
matplotlib-users List Signup and Options
  
--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

Hi,

The segv also occurs in matplotlib-0.90.1. A clean build doesn't help.

Here is the gdb output; it looks like something is pointing into nirvana.

(gdb) p bboxo
$1 = <value optimized out>
(gdb) p bbox
$2 = (Bbox *) 0x7ffffffb
(gdb) p bbox->_ll
Cannot access memory at address 0x7ffffffb

Cheers,
Malte

(In reply to Michael Droettboom, 13/05/2008, 10:19 PM.)

Ouch! The way that pointer is obtained is really weird (though I believe it is a common idiom in Tcl extensions):

PyAggImagePhoto(ClientData clientdata, Tcl_Interp* interp,
                int argc, char **argv) {
    ...
    bboxo = (PyObject*)atol(argv[4]);
    if (bboxo != Py_None) {
        bbox = (Bbox*)bboxo;

That means the pointer comes to us encoded as a string of digits, which gets converted to an integer, cast to a (PyObject*), and then cast to a (Bbox*) (which is a subclass of PyObject, in the C-object-oriented sense). That's just one of those things you'd rather not be doing :wink:
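The round trip described above can be sketched in isolation. This is a minimal, hypothetical illustration (the helper names `pointer_to_decimal` and `decimal_to_pointer` are made up and are not matplotlib or Tcl API): an address is written out as decimal text, the way it arrives in `argv`, and then parsed back with an unsigned, full-width conversion that rejects anything that is not a clean number.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: format an address as decimal text, which is how the
// pointer reaches the Tcl command in argv.
std::string pointer_to_decimal(const void* p) {
    return std::to_string(reinterpret_cast<std::uintptr_t>(p));
}

// Hypothetical helper: parse the decimal text back into a pointer.
// Returns nullptr if the text is not a plain, in-range unsigned number.
void* decimal_to_pointer(const char* text) {
    if (text[0] < '0' || text[0] > '9') {
        return nullptr;  // reject empty strings, signs and leading spaces
    }
    errno = 0;
    char* end = nullptr;
    unsigned long long value = std::strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0' || value > UINTPTR_MAX) {
        return nullptr;  // overflow or trailing garbage
    }
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(value));
}
```

The key design choice is parsing into an unsigned type at least as wide as a pointer; a signed `long` parse is exactly where this idiom can go wrong.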

Are you running the 64-bit version of OpenSUSE by any chance? That might explain this if the atol call is overflowing. That's only theoretical, as I think it *should* work. atol is supposed to return a "long", which is supposed to be 64-bit on a 64-bit Linux machine. Could you try replacing "atol" with "atoll", recompile and see what happens? Do you get any warnings during compilation of _tkagg.cpp?
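To make the `atoll` suggestion concrete: `atoll()` returns `long long`, which C and C++ require to be at least 64 bits, so it can hold any 32-bit address whether `long` is 32 or 64 bits wide. A minimal sketch (the function name `address_via_atoll` is hypothetical); like `atol()`, it still cannot report malformed input.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdlib>

// long long is required to hold at least 64 bits.
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");

// Hypothetical helper: parse a decimal address with atoll() instead of atol().
// Like atol(), atoll() gives no error indication for malformed input.
std::uintptr_t address_via_atoll(const char* text) {
    long long value = std::atoll(text);
    return static_cast<std::uintptr_t>(value);
}
```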

Failing that, it would be useful, I suppose, to print out "argv[4]" from the debugger.

Thanks for helping with this. Hopefully we're homing in on something.

Mike


Hi,

Unfortunately, it's 32-bit :wink: I tried digging around, but I don't know much about Tcl/Tk. Seeing a char string from argv atol'ed into a pointer address left me with an uncomfortable feeling...
Anyway, here is the argv value:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7c516c0 (LWP 5638)]
0xb6f4500e in PyAggImagePhoto (clientdata=0x0, interp=0x87396f0, argc=5,
     argv=0xbfc8874c) at src/_transforms.h:362
362 Point* ll_api() {return _ll;}
Current language: auto; currently c++
(gdb) p argv[4]
$1 = 0x895d0f0 "3085736160"
(gdb) p bboxo
$2 = (PyObject *) 0x7fffffff

The cast doesn't seem to work.

Cheers,
Malte

(In reply to Michael Droettboom, 14/05/2008, 10:47 PM.)

Malte Marquarding
Malte.Marquarding@...272...

Hi,

We were on the right track, just the other way round: the number in argv[4] is bigger than LONG_MAX on 32-bit.
So what I did was:

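// (requires #include <sstream>)
unsigned long tmplong;
std::stringstream ss;
ss.str(argv[4]);   // the pointer value, as decimal text
ss >> tmplong;     // unsigned long: values above LONG_MAX still fit
bboxo = (PyObject*)tmplong;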

Now it works.
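This also explains why the debugger showed `bboxo` as `0x7fffffff`: that is `LONG_MAX` on 32-bit Linux. With glibc, `atol()` is implemented on top of `strtol()`, which clamps out-of-range input to `LONG_MAX` (strictly, the C standard leaves `atol()` overflow undefined). The sketch below reproduces the clamping portably by parsing the decimal text for `LONG_MAX + 1` on whatever platform it runs on, and contrasts it with an unsigned `stringstream` extraction like the one above. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <sstream>
#include <string>

// One past LONG_MAX, as decimal text. On 32-bit Linux this is "2147483648",
// the same situation as the "3085736160" seen in argv[4].
std::string one_past_long_max() {
    return std::to_string(static_cast<unsigned long>(LONG_MAX) + 1UL);
}

// Signed parse: strtol() clamps to LONG_MAX and sets errno to ERANGE.
long parse_signed(const std::string& text, bool& overflowed) {
    errno = 0;
    long v = std::strtol(text.c_str(), nullptr, 10);
    overflowed = (errno == ERANGE);
    return v;
}

// Unsigned stringstream extraction; succeeds only if the whole string
// was consumed as a number.
bool parse_unsigned(const std::string& text, unsigned long& out) {
    std::istringstream ss(text);
    ss >> out;
    return !ss.fail() && ss.eof();
}
```

The `!fail() && eof()` check also catches trailing garbage, which a bare `ss >> tmplong` does not look at.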

Also, I will change all the other atol() calls to stringstream; we're writing C++ anyway.

I will check out the code from SVN and send a patch. Where should I send it?

Cheers,
Malte

(In reply to Malte Marquarding, 15/05/2008, 8:40 AM.)

Hi

Attached is the patch. It uses stringstream, so I don't know if it will work on all platforms. I am not a Windows person :wink:
I hadn't read your email properly, so I missed the suggestion of "atoll"; as a C++ person I am a bit more comfortable with stringstream anyway.

Cheers,
Malte.

_tkagg.cpp.patch (1.24 KB)

Yes, it looks like if it were an "unsigned int", we would have been okay. That looks like (essentially) what your patch does, but in a C++ idiom. I'll submit your patch and put a note out to the Windows guys to help test it. There's a good chance that if it compiles at all, it should work.
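The numbers bear this out. Below is a compile-time sketch, using `std::int32_t` and `std::uint32_t` as stand-ins for `long` and `unsigned long` on 32-bit Linux (the constant names are made up for illustration). The value from `argv[4]`, 3085736160, is 0xb7ec98e0, in the same 0xb7... range as the addresses in the backtrace; it is larger than the largest 32-bit signed value but fits in a 32-bit unsigned one.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The decimal value seen in argv[4], held in a 64-bit type so the
// comparisons below are exact on any platform.
constexpr std::uint64_t kObservedAddress = 3085736160ULL;

// Stand-ins for long / unsigned long on 32-bit Linux.
constexpr std::uint64_t kMaxSigned32 =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
constexpr std::uint64_t kMaxUnsigned32 =
    std::numeric_limits<std::uint32_t>::max();

static_assert(kObservedAddress > kMaxSigned32,
              "too large for a 32-bit signed long, so atol() overflows");
static_assert(kObservedAddress <= kMaxUnsigned32,
              "but it fits in a 32-bit unsigned integer");

// Stored in an unsigned 32-bit integer, the address survives unchanged.
constexpr std::uint32_t kAsUnsigned32 =
    static_cast<std::uint32_t>(kObservedAddress);
static_assert(kAsUnsigned32 == 0xB7EC98E0u, "same bits as the original address");
```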

Thanks for getting to the bottom of this.

Cheers,
Mike


Michael Droettboom wrote:

Yes, it looks like if it were an "unsigned int", we would have been okay. That looks like (essentially) what your patch does, but in a C++ idiom. I'll submit your patch and put a note out to the Windows guys to help test it. There's a good chance that if it compiles at all, it should work.

Mike,

If you understand what the patch does, and if you know how to do it in a C idiom (or C++ for that matter) that is readable and that *will* compile and run, then I would suggest that you modify the patch accordingly instead of waiting to see if it *does* compile on whatever compiler versions the Windows people happen to have right now.

Eric

Michael Droettboom wrote:

Yes, it looks like if it were an "unsigned int", we would have been okay. That looks like (essentially) what your patch does, but in a C++ idiom. I'll submit your patch and put a note out to the Windows guys to help test it. There's a good chance that if it compiles at all, it should work.

Looks like the C idiom would almost have to use something like sscanf(argv[2], "%lu", &tmpulong).
Is it bad form to use this in C++?
Is *anything* actually standardized in C++? How does one know whether something like stringstream is safe to use? The "ask someone to try it" approach is not very reassuring.

Eric

Eric Firing wrote:


Looks like the C idiom would almost have to use something like sscanf(argv[2], "%lu", &tmpulong).

That is a reasonable alternative.
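A minimal sketch of the sscanf alternative (the helper name `parse_with_sscanf` is hypothetical). Note that `sscanf` must be given the *address* of the variable it fills, and that, unlike `strtoul()`, it cannot report overflow: the C standard leaves the result undefined if the number does not fit, so this is only as safe as the assumption that the address fits in `unsigned long`. The `%n` conversion records how many characters were consumed, which lets the caller reject trailing garbage.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: parse a decimal address with sscanf("%lu").
// %n stores the number of characters consumed so far, so we can check that
// the whole string was a number. Overflow is NOT detected by sscanf.
bool parse_with_sscanf(const char* text, unsigned long* out) {
    int consumed = 0;
    if (std::sscanf(text, "%lu%n", out, &consumed) != 1) {
        return false;  // no number at the start of the string
    }
    return text[consumed] == '\0';  // reject trailing characters
}
```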

Is it bad form to use this in C++?
Is *anything* actually standardized in C++?

The stringstream class used is in the C++ standard library. My concern is only based on Visual Studio not always adhering to it. In this case, it's a pretty safe bet it does -- it's a very core and long-standing piece of functionality.

How does one know whether something like stringstream is safe to use? The "ask someone to try it" approach is not very reassuring.

Barring more obvious documentation from Visual Studio (which may be mostly ignorance on my part), the only sure way to know is to try it. But it is in the specification and mentioned in Stroustrup, "The C++ Programming Language".

Cheers,
Mike


Eric Firing wrote:


Mike,

If you understand what the patch does, and if you know how to do it in a C idiom (or C++ for that matter) that is readable and that *will* compile and run, then I would suggest that you modify the patch accordingly instead of waiting to see if it *does* compile on whatever compiler versions the Windows people happen to have right now.

It does compile and run on gcc-3.4 and gcc-4.2 on Linux, and solves Malte's crash (which I was never able to reproduce myself), so I committed the patch (with only cosmetic changes) to SVN. It just needs to be verified under Visual Studio, which I don't personally have.

Plus the whole Tcl thing where it encodes the pointer as a string of digits and then converts that back to an unsigned int gives me the heebie-jeebies, but I think we're stuck with that, for reasonable values of "stuck".

Cheers,
Mike
