Got bugs? Here’s how to catch the errors in your scientific software

Science is becoming increasingly computational. Experimental data must be logged, cleaned, checked and analysed. Data analysis often involves iterative trial and error using ‘scripting’ programming languages such as Python and R. The outputs of such programs are then included in papers, presentations and grant applications.

A typical piece of professional software contains up to 50 errors per 1,000 lines of code (D. A. W. Soergel F1000Research 3, 303; 2015). But scientific code, which is written mainly by graduate students and postdocs who have little to no training in software development, is even more error-prone. Self-taught coders — and the artificial-intelligence-driven assistants they sometimes use — can create programs that seem to work yet generate nonsense, says computer scientist Amy Ko at the Information School at the University of Washington in Seattle. “If you have a program that computes something, it doesn’t mean that it’s correct.”


Sometimes code fails to run at all, for instance because of a syntax error. This “is annoying, but not the end of the world”, says ecologist and programmer Ethan White at the University of Florida in Gainesville. It’s easily fixed, he says. “The worst kind of code is code that executes but is wrong.”

Enter debugging, a crucial skill for software developers that is rarely taught to scientist-coders. Debugging “is like a detective story where you are both the investigator and the murderer”, says Andreas Zeller, a software engineer at the CISPA Helmholtz Center for Information Security in Saarbrücken, Germany, and author of The Debugging Book.

Debugging basics

Nature asked computing specialists to share their tips for debugging and ensuring that code does what it is supposed to do.

Document the conditions that cause the bug to appear. Is there a problematic input, for instance? If possible, identify a minimal working example (using stripped-down data or code, plus locked-down variables such as seeds for random-number generators) to replicate the problem easily. Then, iron the bugs out.
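As a rough illustration of locking down a seed, the Python sketch below uses a hypothetical function, noisy_mean, invented here to stand in for any analysis step that relies on random numbers. With the seed fixed, the same stripped-down input gives the same result on every run, so a suspect value can be reproduced at will.

```python
import random


def noisy_mean(values, seed):
    """Hypothetical analysis step: add Gaussian noise, then average."""
    # A dedicated generator with a fixed seed makes every run repeatable.
    rng = random.Random(seed)
    noisy = [v + rng.gauss(0, 1) for v in values]
    return sum(noisy) / len(noisy)


# Stripped-down input: three values instead of the full data set.
small_input = [1.0, 2.0, 3.0]

first_run = noisy_mean(small_input, seed=42)
second_run = noisy_mean(small_input, seed=42)

# Same seed and same input give an identical result.
print(first_run == second_run)
```

The seed value 42 is arbitrary; what matters is that it is recorded alongside the input that triggers the bug.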

Use print statements. The simplest approach to debugging is to litter your code with ‘print’ commands that reveal a program’s internal state as it runs. While iterating over a collection of files to compute a number, for instance, you can ask your code to output the current file, current value and running tally.
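A minimal sketch of that file-tallying case might look like the following. The function name sum_measurements, the file names and the one-number-per-line layout are assumptions made for illustration; the temporary files exist only so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def sum_measurements(paths):
    """Add up the numbers in several text files, printing state along the way."""
    running_tally = 0.0
    for path in paths:
        # Assumed layout: whitespace-separated numbers in each file.
        value = sum(float(token) for token in path.read_text().split())
        running_tally += value
        # Debugging output: current file, current value and running tally.
        print(f"file={path.name} value={value} running_tally={running_tally}")
    return running_tally


# Create two small throwaway files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name, text in [("a.txt", "1\n2\n"), ("b.txt", "3.5\n")]:
        path = Path(tmp) / name
        path.write_text(text)
        files.append(path)
    total = sum_measurements(files)
```

If the tally jumps unexpectedly at one file, the printed line points to the input worth inspecting first.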

Python’s ‘logging’ library provides a mechanism for doing this with varying degrees of verbosity, says Toby Hodges, who is based in Heidelberg, Germany, and is the curriculum director at The Carpentries, a global non-profit organization that teaches computational skills to researchers.
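The sketch below shows one way this could look with Python’s standard logging module. The logger name "analysis" and the function running_total are hypothetical. Messages sent with logger.debug stay hidden while the logger is set to INFO, and appear once the level is lowered to DEBUG, so the diagnostic lines can stay in the code without cluttering normal runs.

```python
import logging

# Attach a simple handler to the root logger so that messages are displayed.
logging.basicConfig(format="%(levelname)s %(message)s")

# Hypothetical logger name; INFO hides the more detailed DEBUG messages.
logger = logging.getLogger("analysis")
logger.setLevel(logging.INFO)


def running_total(values):
    """Sum the values, reporting progress at two levels of verbosity."""
    total = 0.0
    for step, value in enumerate(values):
        total += value
        # Only emitted when the logger level is DEBUG.
        logger.debug("step=%d value=%s total=%s", step, value, total)
    # Emitted at INFO and at any more verbose level.
    logger.info("processed %d values, total=%s", len(values), total)
    return total


result = running_total([1.0, 2.5, 4.0])

# To see the per-step messages while debugging, lower the threshold:
# logger.setLevel(logging.DEBUG)
```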
