Monday, 22 October 2012

Best Practices for Scientific Computing

I recently learned of an article that appeared in the Computer Science section of the arXiv entitled Best Practices for Scientific Computing. It's a sound list of software engineering practices and philosophies that lead to code that is generally easier to understand, operate and maintain. It also links to plenty of resources to help implement these practices.

Ease of understanding, operation and maintenance might not sound like interesting objectives to a lone coder rushing to produce the minimal code that yields a result, but remember that your program carries more value (and potentially citations!) if other people are able to use, modify and extend it. Also, don't forget that you need to write code that your future self can understand...

I recommend a quick read of the six-page article, but I've listed its section titles and emphasized directives here:
  1. Write programs for people, not computers.
    1. A program should not require its readers to hold more than a handful of facts in memory at once.
    2. Names should be consistent, distinctive, and meaningful.
    3. Code style and formatting should be consistent.
    4. All aspects of software development should be broken down into tasks roughly an hour long.
  2. Automate repetitive tasks.
    1. Rely on the computer to repeat tasks.
    2. Save recent commands in a file for re-use.
    3. Use a build tool to automate scientific workflows.
  3. Use the computer to record history.
    1. Software tools should be used to track computational work automatically.
  4. Make incremental changes.
    1. Work in small steps with frequent feedback and course correction.
  5. Use version control.
    1. Use a version control system.
    2. Everything that has been created manually should be put in version control.
  6. Don’t repeat yourself (or others).
    1. Every piece of data must have a single authoritative representation in the system.
    2. Code should be modularized rather than copied and pasted.
    3. Re-use code instead of rewriting it.
  7. Plan for mistakes.
    1. Defensive programming: programmers should add assertions to programs to check their operation.
    2. Use an off-the-shelf unit testing library.
    3. Turn bugs into test cases.
  8. Optimize software only after it works correctly.
    1. Use a profiler to identify bottlenecks.
    2. Write code in the highest-level language possible.
  9. Document the design and purpose of code rather than its mechanics.
    1. Document interfaces and reasons, not implementations.
    2. Refactor code instead of explaining how it works.
    3. Embed the documentation for a piece of software in that software.
  10. Conduct code reviews.
    1. Use code review and pair programming when bringing someone new up to speed and when tackling particularly tricky design, coding, and debugging problems.
    2. Use an issue tracking tool.
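
To make the "Plan for mistakes" directives in the list above a little more concrete, here is a minimal sketch in Python using the standard `unittest` module. The `mean` function, the `TestMean` class, and the bug report mentioned in the comments are hypothetical illustrations of mine, not examples taken from the article.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Defensive programming: fail loudly, with a clear message, on input
    # the function cannot handle.
    assert len(values) > 0, "mean() requires at least one value"
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    """Tests written with an off-the-shelf unit testing library."""

    def test_typical_values(self):
        self.assertAlmostEqual(mean([1.0, 2.0, 3.0]), 2.0)

    def test_empty_input_is_rejected(self):
        # A bug turned into a test case: in this hypothetical scenario an
        # empty list once slipped through and failed later with an obscure
        # ZeroDivisionError, so the fix is now pinned down by a test.
        with self.assertRaises(AssertionError):
            mean([])
```

If this were saved as, say, `test_mean.py` (a file name I'm assuming), running `python -m unittest test_mean` would execute both tests. Note that Python skips `assert` statements when run with the `-O` flag, so checks that must always run are better written as explicit exceptions.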
The bigger problem is motivating scientists to spend the time to adopt these practices, but showing how the practices help the people who write the code is a good place to start.

1 comment:

  1. Looking at points 3, 4 and 5, anyone who would waste time developing code on a platform other than VMS probably doesn't know VMS. 32767 versions of each file are provided automatically by the operating system. I routinely have the code in an editor, make a change, save the changes (i.e. create a new version of the file), compile, link and run in another window. Each such iteration produces new files with source code, object files, executable file and output file for the program, with timestamps of course. The only limit is disk space, but these days that is not even a concern for stuff like this.