PHYSICS 580                                                  Fall 2006

 

HINTS FOR DEBUGGING

 

A "bug" is an error in your program. (For an interesting history of the term "bug", see http://www.jamesshuggins.com/h/tek1/first_computer_bug.htm .) Almost no one writes a bug-free program from scratch the first time. According to lore, the standard for the software industry is 10 lines of error-free code per programmer per day.

Broadly speaking, there are two classes of bugs: bugs that cause your program to crash, and bugs that give you wrong answers.

There exist debugging tools that can be very useful; they are, however, platform dependent and so I will not discuss them. Such tools simplify the process discussed below. In general you have to compile the code with the right switch, such as -g, to make it usable with a debugger. Keep in mind that code compiled with the debug switch generally runs much more slowly.

 

Writing short routines and including error traps will help enormously in finding bugs.

 

Staring at code generally does not help you find bugs, except for elementary mistakes in syntax. This is true even for experts.

 

Common culprits that cause crashes and/or erroneous answers:

Arrays out-of-bounds. An array is dimensioned to, say, 100, but you write to element 101. This can lead to nasty consequences, but sometimes nothing at all happens. It is easily found with a bounds-checking switch while compiling (usually -CB for the Intel compiler, or -fbounds-check for the GNU compilers; newer versions of gfortran spell it -fcheck=bounds).
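As an illustration, here is a minimal sketch of an out-of-bounds write together with a hand-written error trap. The program name, the array x, and the variable last are made up for this example. If you delete the if-block and compile with bounds checking on, the compiler's run-time check should stop the program at the offending line instead.

```fortran
program bounds_demo
  implicit none
  integer, parameter :: n = 100
  real :: x(n)
  integer :: i, last

  x = 0.0
  last = n + 1        ! deliberate mistake: one past the declared size

  do i = 1, last
     ! error trap: refuse to touch an element outside 1..n
     if (i < 1 .or. i > n) then
        write(*,*) 'bounds_demo: index ', i, ' is outside 1..', n
        exit
     end if
     x(i) = real(i)
  end do

  write(*,*) 'last element written: x(', n, ') = ', x(n)
end program bounds_demo
```

The trap costs one comparison per pass through the loop, so it is cheap to leave in while you are still developing the code.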

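Mismatched calls for subroutines/functions. This happens if you declare

subroutine multiply(n,a,b,c,errflag)

but then call it with a different number of arguments:

call multiply(n,a,b,c)

When the subroutine is compiled separately from the calling code, the compiler often cannot catch this, and the program may crash or quietly give wrong answers.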

Mismatched declarations. In the above, this happens if you declare the variable a to be real in the calling code but double precision in the subroutine. The same problem arises if the arrays have incongruent dimensions.
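One way to let the compiler catch both kinds of mismatch is to put the subroutine in a Fortran 90 module, so that every call is checked against the declaration. The sketch below assumes that multiply forms the element-by-element product c = a*b and returns errflag = 0 on success; the handout does not say what multiply actually does, so the body, the module name multiply_mod, and the meaning of errflag are all invented for illustration.

```fortran
module multiply_mod
  implicit none
contains
  ! Hypothetical multiply: c(i) = a(i)*b(i); errflag = 0 on success, 1 if n < 1.
  subroutine multiply(n, a, b, c, errflag)
    integer, intent(in) :: n
    double precision, intent(in) :: a(n), b(n)
    double precision, intent(out) :: c(n)
    integer, intent(out) :: errflag
    integer :: i

    errflag = 0
    if (n < 1) then          ! error trap on the input
       errflag = 1
       return
    end if
    do i = 1, n
       c(i) = a(i)*b(i)
    end do
  end subroutine multiply
end module multiply_mod

program call_demo
  use multiply_mod
  implicit none
  integer, parameter :: n = 3
  double precision :: a(n), b(n), c(n)
  integer :: errflag

  a = (/ 1.d0, 2.d0, 3.d0 /)
  b = (/ 4.d0, 5.d0, 6.d0 /)

  call multiply(n, a, b, c, errflag)
  ! call multiply(n, a, b, c)    ! would not compile: errflag is missing
  if (errflag /= 0) then
     write(*,*) 'multiply failed, errflag = ', errflag
     stop
  end if
  write(*,*) 'c = ', c
end program call_demo
```

Because the main program sees the interface through the use statement, declaring a as real instead of double precision in call_demo would also be rejected at compile time.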

Failure to initialize variables. Some compilers will initialize a variable to zero, but not all. Thus the following loop

do i = 1,100

if(i/2*2 .eq. i)nevens = nevens+1

enddo

could give a nonsensical answer, because nevens was never initialized to 0.
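A corrected version of the loop might look like the following sketch. The program wrapper and the name count_evens are added only to make it runnable; implicit none is included because it forces every variable to be declared, which catches misspelled names (although it does not, by itself, detect a missing initialization).

```fortran
program count_evens
  implicit none
  integer :: i, nevens

  nevens = 0                  ! explicit initialization before the loop
  do i = 1, 100
     ! integer division drops the remainder, so i/2*2 equals i only for even i
     if (i/2*2 .eq. i) nevens = nevens + 1
  enddo

  write(*,*) 'nevens = ', nevens    ! there are 50 even numbers from 1 to 100
end program count_evens
```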

 

Beyond the above, you may have simply made a mistake in your algorithm or logic. There is no single route to finding a bug. Most broadly:

-- Build your code from the ground up. Test each subroutine as you create it. Don't wait until the very end to test it; by then it will be harder to find where the bug is. For example, if you are writing a code to do multidimensional quadrature, test your 1D integration routine separately and first.

-- Think of simple test cases for which you know the correct answer. Test thoroughly and obsessively. Don't do just one test and assume your code is correct; it probably isn't.

-- If you compare to another person's code, don't just look at the code itself; in the context of this course, that would be cheating, and besides, this method doesn't work for large, complex codes. Instead, look at intermediate results where you can compare the inner workings of the two codes. For example, suppose you are writing a multidimensional quadrature code using Gauss-Hermite quadrature, and your friend has a working code that does multidimensional quadrature using Bode's rule. You can compare 1-D or 2-D results. If you are not using a debugger, this generally means using lots of write statements. Get used to it.
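To make the advice above concrete, here is a minimal sketch of testing a 1D integration routine on its own, against a case with a known answer, using write statements to expose intermediate results. It uses the composite trapezoid rule rather than Gauss-Hermite quadrature or Bode's rule, simply because it is short; the names quad_mod, trapezoid, xsquared, and test_quad are invented for this example.

```fortran
module quad_mod
  implicit none
contains
  ! Composite trapezoid rule for the integral of f on [a,b] with n equal intervals.
  function trapezoid(f, a, b, n) result(s)
    interface
       function f(x)
         double precision, intent(in) :: x
         double precision :: f
       end function f
    end interface
    double precision, intent(in) :: a, b
    integer, intent(in) :: n
    double precision :: s, h
    integer :: i

    h = (b - a)/n
    s = 0.5d0*(f(a) + f(b))
    do i = 1, n - 1
       s = s + f(a + i*h)
    end do
    s = s*h
  end function trapezoid

  ! Test integrand with a known integral: x**2 on [0,1] integrates to 1/3.
  function xsquared(x)
    double precision, intent(in) :: x
    double precision :: xsquared
    xsquared = x*x
  end function xsquared
end module quad_mod

program test_quad
  use quad_mod
  implicit none
  double precision, parameter :: exact = 1.d0/3.d0
  double precision :: approx
  integer :: n

  n = 10
  do while (n <= 1000)
     approx = trapezoid(xsquared, 0.d0, 1.d0, n)
     write(*,*) 'n = ', n, ' result = ', approx, ' error = ', abs(approx - exact)
     n = n*10
  end do
end program test_quad
```

Since the trapezoid rule is second order, the printed error should fall by roughly a factor of 100 each time n grows by a factor of 10. If it does not, the 1D routine has a bug, and you have found it before burying the routine inside a multidimensional code.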

 

Again, I cannot emphasize enough the importance of validation through test cases. Try as many variations as practical. Assume something is wrong with your code, and try to find it.

(A good way to do this, incidentally, is to hand over your code to a colleague and let them try it out. They won't have the same unconscious assumptions as you and may very quickly find either bugs in your code, or something awkward in the way your code works.)