By the way: I am still interested in my original question. How would
I use a debugger etc. to find the problem myself in such a situation?
I should know the answer because I created the glibc patch that fixes
the problem, but it was back in February and I can't remember all the
details.
It started when I thought, why should I run my AthlonXP in '386 emulator
mode' when I can use 'gcc -march=athlon-xp' and actually benefit from
the extra instructions my processor supports. This worked fine until I
compiled numarray and it failed its own tests with a floating-point
exception. But if I used the default gcc settings it worked OK. I filed
a numarray bug report (which I can no longer locate; perhaps they get
deleted after a certain date). They looked at it and said it was
probably a gcc bug. I filed a gcc bug report, and they closed it saying
it was not a gcc bug.
Then I thought it might be a bug with the way the kernel handles FP
exceptions and started looking through the kernel sources, but did not
make much progress. So I went back to the numarray source code and tried
to narrow down where the problem was occurring.
Now to answer your question:
Imagine you are on a TV game show where you have to guess a number x in
the range 1 to y and are told 'higher', 'lower' or 'correct' after each
turn. You can use a binary search and always guess the midpoint of the
range: you are either correct or eliminate half of the possibilities
each turn, so in ceil(log2(y)) turns or fewer you locate the correct
number.
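As a rough illustration of the guessing game, here is a minimal Python
sketch. The function names guess_number and worst_case_guesses are made
up for this example. Note that it counts the final 'correct' guess as a
turn, so its worst case is floor(log2(y)) + 1 guesses.

```python
def guess_number(secret, y):
    """Find secret in 1..y by always guessing the midpoint.

    Returns the number of guesses made, including the final correct one.
    Hypothetical helper, written only to illustrate the game.
    """
    low, high = 1, y
    turns = 0
    while True:
        guess = (low + high) // 2
        turns += 1
        if guess == secret:
            return turns          # host says 'correct'
        if guess < secret:
            low = guess + 1       # host says 'higher'
        else:
            high = guess - 1      # host says 'lower'


def worst_case_guesses(y):
    """floor(log2(y)) + 1, computed exactly with integer arithmetic."""
    return y.bit_length()
```

For y = 100 no secret needs more guesses than worst_case_guesses(100),
however unlucky you are.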
You can use a similar kind of binary search to locate bugs in software.
You know the bug occurs on some line x of the source code with y lines.
Use gdb and insert breakpoints in the code (I think I just inserted
printf() statements instead of using gdb) and see if the error occurs
before or after the breakpoint, move the breakpoint and try again. The
problem is that source code is rarely a linear list of statements in one
file that are executed in order, but a set of procedures/functions in
many files where the execution order can vary. You can start at the
main() function, split it in half, insert a breakpoint (or printf()),
run it and see in which half the error occurs, then repeat the process,
working your way down into other functions until you pinpoint the error.
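The same idea can be sketched in Python. This is only a toy model, not
what I actually ran against numarray: it assumes the program can be
treated as an ordered list of steps, that the run can be repeated from
scratch, and that the failure is deterministic. All the names (load,
scale, invert, report, fails_before, locate_bug) are invented for the
example, and a ZeroDivisionError stands in for the floating-point
exception.

```python
def load(state):
    state["values"] = [4.0, 2.0, 0.0]

def scale(state):
    state["values"] = [v * 2 for v in state["values"]]

def invert(state):
    # The planted bug: 1.0 / 0.0 raises ZeroDivisionError.
    state["values"] = [1.0 / v for v in state["values"]]

def report(state):
    state["total"] = sum(state["values"])

STEPS = [load, scale, invert, report]


def fails_before(steps, k):
    """Run steps[0:k] from a fresh state, like stopping at a breakpoint
    placed just before step k. Returns True if the error already occurred."""
    state = {}
    try:
        for step in steps[:k]:
            step(state)
    except ArithmeticError:
        return True
    return False


def locate_bug(steps):
    """Return the index of the step where the error first occurs,
    or None if the whole run is clean."""
    if not fails_before(steps, len(steps)):
        return None
    # Invariant: steps[:low] runs cleanly, steps[:high] fails.
    low, high = 0, len(steps)
    while high - low > 1:
        mid = (low + high) // 2
        if fails_before(steps, mid):
            high = mid   # error is in the first half
        else:
            low = mid    # error is in the second half
    return low


if __name__ == "__main__":
    index = locate_bug(STEPS)
    print(STEPS[index].__name__)
```

In real code you would then repeat the search inside the step it points
at, which is the 'working your way down into other functions' part.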
Hope that makes sense. You could now reinstall the old glibc, forget
that you know that glibc is the problem and start again to locate the
bug. It will be useful practice for the next bug that comes along!