Wednesday, 20 October 2010

The troubles with programming in science

As someone who subjects himself daily to struggling with a piece of code largely written before Fortran 77 even existed, this is an issue close to my heart. I warn the lay-reader to skip to the short summary at the end or skip this entirely. This 1500-word rant is not going to give you any insight about how to recycle more, end third world poverty or jailbreak the latest version of iOS. I also warn that I'm going to generalize terribly and accuse everyone of the worst case. I know this is unfair but it serves to highlight what we're all partially guilty of.

It's been my intention to write about the woes of scientific computing for some time. Last week's issue of Nature set the blogosphere alight with two contrapuntal articles about the state of scientific programming, so I too shall enter the fray. Zeeya Merali's News Feature reveals what anyone working with these codes already knew: they're nothing like commercial production quality and science suffers for it. Nick Barnes' World View calls upon scientists to release their code even if they don't think it's release-worthy. Although the idea of openness is part of what we need, unleashing these malformed monsters as they are now isn't the solution. But more on that later.

The problems

So what, ye who have not travelled the puzzling lands of scientific coding ask, are the problems? The code I use to model stars as they evolve is a pretty good example of most of them. In short, it seems that most scientists write code for their own particular problems with the expectation that no-one else will use their code or even want to. Maybe that isn't the underlying problem, but here's the seemingly endless list of troubles that arise.

The first corollary is that the code itself is quite ugly. Some graduate student probably learned some Java in high school, did a six-month course in C that they've now forgotten, and has finally taught themselves just enough Python or Fortran to write something that'll do the job. What's more, they've possibly piled their code on top of whatever the last person in the research group did. Like I said, the code I have to deal with is a prime example. The basics were laid down by one man back in 1971: Microsoft hadn't been founded and The King was still alive. To understand the layout of the code is to delve into the mind of an early-career scientist working with computers similar in scale to my office, never mind my desk, and then to understand the minds of four decades of successors. It's not quite that bad, since there have only been two or three substantial overhauls, but it's still very clear where each newcomer appended his segment.

The second corollary is that there is no guarantee that the code was tested properly at design time. If it was tested at all, it may only have been checked for reproducing observed data rather than matching a hard analytical result. Luckily, for a code old enough to have children in high school, the fact that it has survived means we can probably trust the results, but newer codes won't have that sagely edge.
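To make "matching a hard analytical result" concrete, here's the kind of test I mean. This Python sketch (all names invented; the real codes are Fortran) checks a toy integrator for dy/dt = -ky against the exact solution y0·exp(-kt), including the crucial check that refining the grid actually reduces the error:

```python
import math

def integrate_decay(y0, rate, t_end, n_steps):
    """Integrate dy/dt = -rate * y with a simple explicit Euler scheme."""
    dt = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y += dt * (-rate * y)
    return y

# The analytical solution y0 * exp(-rate * t) gives a hard target:
# the numerical answer should converge to it as the step size shrinks.
exact = 1.0 * math.exp(-2.0 * 1.0)
coarse_error = abs(integrate_decay(1.0, 2.0, 1.0, 100) - exact)
fine_error = abs(integrate_decay(1.0, 2.0, 1.0, 10000) - exact)
assert fine_error < coarse_error, "refining the grid should reduce the error"
assert fine_error < 1e-3, "fine solution should be close to the analytical one"
```

A test like this catches a broken solver even when the observed data it's meant to reproduce is itself uncertain.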

Third, documentation is often scarce and badly written when it exists at all. Part of my unruly code-creature inverts a matrix as a critical part of the calculation. The only direction the author gives is that the relevant subroutine "is a custom-built matrix inverter, which I believe is a good deal smarter than anything I was able to get off the shelf." I'll be amazed if anyone really knows how it works, including the original author. During a recent rewrite, he was quizzed on some of the boundary conditions and could only claim he'd had a good reason for them at the time...
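Even an undocumented black box can at least be checked against mathematics rather than folklore. Here's a minimal Python sketch (the names are invented; this is not the actual Fortran subroutine) of the obvious sanity test: a matrix times its claimed inverse must give the identity.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]].

    A hypothetical stand-in for the mystery subroutine; raises if the
    matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(x, y):
    """Multiply two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# However clever the inverter, this cheap check is always available.
m = [[4.0, 7.0], [2.0, 6.0]]
product = matmul_2x2(m, invert_2x2(m))
for i in range(2):
    for j in range(2):
        expected = 1.0 if i == j else 0.0
        assert abs(product[i][j] - expected) < 1e-12
```

A handful of checks like this, shipped alongside the code, would do more for the next user than the one-line comment we actually got.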

If these codes are examples of software engineering, then the automotive analogy must be a Ford Model T with a beefed-up Rolls-Royce Merlin engine duct-taped to a cut-out at the back, the windscreen replaced with an uncut perspex sheet, the doors long fallen off and replaced with poorly-carved wooden leftovers, and the only documentation a set of notes in Latin about the Apollo program. There's probably a decades-old post-it note on the dashboard saying "cut perspex!!! - JS 9/1978".

The product is a code no-one else will understand, much less trust. This means that when someone else comes to the same problem, they often write their own code. This has led to varying degrees of duplication, depending on which field you're in. Stellar evolution is honestly ridiculous: an entire journal volume was dedicated to trying to calibrate the various codes for the sake of the observers. Granted, some amount of this multiplicity is a good thing. It allows us to employ different methods for certain parts of the calculation and to compare them. But often these codes appear to be only slight variants, "forks", or near-duplicates of each other. Moreover, many aspects of the calculation don't warrant a new fork, just a different subroutine that could be specified at compile time. Finally, the subtle differences are sometimes concealed. As someone who works with stellar evolution codes, for example, I know when authors are actually comparing apples and oranges, but readers who themselves work on, say, galaxy formation might not.
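That "different subroutine" point deserves an illustration. Instead of forking the whole code to change one piece of physics, the interchangeable piece can be passed in as a parameter. A Python sketch, with invented toy physics standing in for the real thing:

```python
def opacity_constant(density, temperature):
    """Toy opacity law: a constant (hypothetical placeholder physics)."""
    return 0.2

def opacity_kramers_like(density, temperature):
    """Toy Kramers-like opacity law (hypothetical placeholder physics)."""
    return 0.2 * density * temperature ** -3.5

def structure_step(density, temperature, opacity_law):
    """One piece of a calculation that needs *an* opacity, not a fork.

    The physics is swapped by passing a different function, so two
    groups with different opacity tables can share everything else."""
    kappa = opacity_law(density, temperature)
    return 1.0 / kappa  # stand-in for the real physics

# Swapping methods is a one-argument change, not a new code base.
result_a = structure_step(1.0, 2.0, opacity_constant)
result_b = structure_step(1.0, 2.0, opacity_kramers_like)
```

The Fortran equivalent would be selecting a subroutine at compile time; the principle is the same, and it makes the differences between "codes" explicit instead of buried.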

Hydrodynamics' example

There is hope, however. Stellar evolution codes were born principally in the 1970s after an efficient algorithm for solving the equations was developed in the early 1960s. Other fields have only become computationally feasible much more recently. On the technological side, this has meant they are more in line with modern conventions. What's more important is that the people who wrote them have had, on average, more training or experience in coding. A good example is fluid modelling, especially on cosmological scales, which really picked up during the 1990s.

To put these simulations in context, note that we cannot construct an experiment the size of the universe. That doesn't make sense: there isn't enough space. Instead, people write simulations that model what we think happened. The Millennium Run was a high-profile example which saw weird purple-looking spiderwebs plastered across popular publications. These webs represent structures that form in a big, self-gravitating fluid. In the densest bits we expect to find galaxies. These simulations allow us to predict how material in the Universe, on the scale of galaxies and larger, should be distributed based on our theories. (I could write another blog post about what these simulations and their conflicts with observations have taught us.)

The code that produced all this violet violence, GADGET, was originally written by Volker Springel as a chunk of his PhD. It's well-maintained, tested, and documented, and is used by more people than just Dr Springel. The fluid dynamics folks in general (not just the cosmologists) seem to have a whole host of codes with equally contrived acronyms like ATHENA, FLASH and CASTRO, and all appear to be reasonably well-maintained and used outside just the research group that wrote them. The codes and documentation are updated regularly and released with tests against problems that can be solved analytically. Oh, how this lowly stellar evolutionist dreams of such fine software engineering... (As an aside, hope may have arrived in the form of a computer-scientist-turned-astrophysicist who has very recently tried to introduce a new stellar code.)

To top it all off, the situation we face with our codes is self-defeating. As Greg Wilson pointed out, "for the overwhelming majority of our fellow [scientists] computing is a price you have to pay to do something else that you actually care about". Most of us want to do science rather than write code. In fact, we're under a lot of pressure to produce results, so the less time we spend getting them, the better. But the catch is that unless someone else writes, documents, and maintains a code we can use, we have to do it ourselves. Few of us seem to have the time to polish our code to commercial-like quality, so we all write our own substandard packages.

The ways forward, for now and for later

The long-term solution, in my opinion, is to create positions that give people this time. My vision is codes being treated like instruments. The creation of computing facilities with resident scientists is an indication of the investment going into the hardware but the software needs attention too (and I don't just mean for keeping the clusters running). Instruments need instrument scientists.

That kind of paradigm shift won't happen tomorrow, so for now, next time you write a code, write it so that someone else could use it, even if you think it won't be useful to anyone else. You can take a look at The Joel Test and tips from AstroCompute, which outline some considerations for larger projects. The basics are to comment, document and test your code as thoroughly as your time allows, and to make sure the documentation and tests are available. Allow an appropriate amount of time for design, not just implementation. Modularization and version control are more advanced considerations but both are ultimately in your favour.
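As a tiny example of those basics, here's what a documented, self-checking function can look like. The formula is the standard textbook approximation for the mean molecular weight of a fully ionised hydrogen-helium mix, included purely for illustration:

```python
def mean_molecular_weight(hydrogen_fraction):
    """Return the mean molecular weight of a fully ionised H/He mixture.

    Uses the textbook approximation mu = 4 / (3 + 5X), where X is the
    hydrogen mass fraction and metals are neglected. Included here to
    illustrate documentation style, not as production physics.

    >>> round(mean_molecular_weight(1.0), 3)
    0.5
    """
    return 4.0 / (3.0 + 5.0 * hydrogen_fraction)

# The docstring example tells the next reader what to expect; a plain
# assertion serves the same purpose in the simplest setting.
assert abs(mean_molecular_weight(1.0) - 0.5) < 1e-12
```

Ten minutes of this per subroutine is cheap next to the weeks a successor spends reverse-engineering undocumented intent.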

Where Nick Barnes says "your code is good enough", I rather say "make it good enough". Not "perfect" or "amazing", but at least "good enough" for someone else to pick up and use rather than writing their own code. Most code is closer than the author thinks.


The short summary

Coding in science is often badly commented, documented and tested. It's also often not released publicly, which goes against the scientific process. This is changing in some fields and, with any luck, the change is starting to bleed into others. My hope is that codes will one day be treated like instruments, with dedicated support staff, but all scientists should start making an effort to design their codes properly and release them for scrutiny. Everybody wins!

1 comment:

  1. Ohhhh Warrick, I wish I could tell you that it's better out here in production-land. I work with a gargantuan front-end (fnarr) that was created specifically to be maintained and customised by clients, but the documentation is woeful, the commenting non-existent or worse, and the testing mechanical and shamefully inadequate in the face of real-world data.

    But I digress. This is an excellent article. Imagine the explosion the FOSS community would experience if all research software development were conducted in the public domain, with common-interest research communities hosting their own source. Source control, bug tracking, peer review and documentation are pretty much built-in then, and you'd have an army of informed co-workers for any given problem.

    Of course, then you're opening yourself up to the problem of losing the publishing-race if a large part of your research depends on developing algorithms and computational models. You'd lose that knowledge-capitalist engine of competitive research. It's a tough one...

    Anyway, thanks for the Joel Test link. I'll point my managers to it someday (esp. 8 & 9) and say, "This!! THIS is why I can't do Sys Admin stuff as well!"