Tuesday, 20 December 2011

Elements of scientific style

Scientists must write. In fact, between papers, posters and proposals, they must write a lot. So it's somewhat surprising that many science students are never given formal teaching in how to write well. They are expected to learn through a grand process of trial and error, with each submission throughout a degree corrected, graded and returned. The language in scientific publications is usually correct but I find that it is often unclear and cumbersome. What it lacks, I believe, is style.

All writers should be motivated to write well. What is better written is more widely read and that is what all scientists want. An important addition to this thought is that many readers of English-language journals are not first-language speakers. They are less likely to understand your use of the future perfect tense. All the more reason to avoid complicated constructions.

The time has at last come for me to start preparing my PhD thesis. To cultivate some inspiration, I've been trawling through a few materials about how to write better. This post is a summary of my findings. It is advice to myself that you may also find useful.

I think everyone should read two items. First, Section III of Strunk & White's Elements of Style describes straightforward ways to improve composition and provides contrasting examples of good and bad composition. Second, The Economist Style Guide, which was available online, is almost entirely dedicated to concise, jargon-free word choice. It is also populated with dry humour in its examples.

Technical points can be drawn from the style guides of relevant scientific organisations. For example, the American Institute of Physics and International Astronomical Union have style manuals that should be read by physicists and astronomers. Also, Donald Knuth (of TeX fame) released notes on mathematical writing based on a course at Stanford.

Composition Starts with a C...

... and so does everything about it. In many guides, the basics of style are Cs. The American Institute of Physics Style Manual is most explicit, declaring that writing should be clear, concise, and complete. But how does one achieve these things?

Clear writing is achieved through short, active sentences, at least for the important points. Readers remember short, sharp statements. Avoid long, wandering sentences, where a large number of clauses can lead to confusion, and especially avoid a sequence of loose sentences. Try to use simple tenses. The present tense usually suffices. If you find yourself writing in the future perfect or past continuous, consider for a moment whether it is necessary.

Concise writing is born of frugal use of words. For example, "The fact that..." can almost always be reduced to "That..." Do not declare that a result is "very interesting" or say things "in the opinion of the present authors". Let the reader decide if you're not just stating the obvious. Aside from using fewer words, you can also use shorter, simpler words. The Economist Style Guide is full of examples. The added danger of complicated words is that they can be misused and misunderstood. As The Economist writes of "underprivileged", sometimes they don't even make sense:
Since a privilege is a special favour or advantage, it is by definition not something to which everyone is entitled. So underprivileged, by implying the right to privileges for all, is not just ugly jargon but also nonsense.
To ensure that writing is complete is difficult in technical work. There is a fine line between writing everything an article needs to be logically complete and writing more than is necessary. To evaluate how much to write, you must first understand the audience for whom you are writing. Writing a conference proceeding for a small meeting of researchers on a specific subfield requires less background material than writing an article for a widely circulated journal or a fellowship proposal that will be read by scientists in other fields.

There are some simple things that are fairly obvious. For a start, all symbols and abbreviations must be defined and, if necessary, explained appropriately too. You may define all sorts of derived quantities but it will help the reader if you explain why they are interesting.

Another point about completeness regards the use of citations. Personally, I think some writers feel that a citation absolves them of any need to explain the content of the cited work. Many articles rely heavily on work presented fully in other articles. A citation should be accompanied by some explanation of the logic that leads to whatever conclusion or result is being employed, especially if that work is itself very long. A sentence or two may save your readers the trouble of looking up another paper and therefore make them more likely to continue reading. It is off-putting to open a six-page article to find that the authors rely heavily on detailed and subtle calculations or measurements in a poorly written 30-page epic.

Another C that can be added is to make sure your writing is correct. You shouldn't be leaving hanging participles or hidden verbs but there are subtler errors in English usage. Many are explained in The Economist Style Guide. A dry-witted example is the difference between "among" and "between":
To fall between two stools, however painful, is grammatically acceptable; to fall between the cracks is to challenge the laws of physics.

Structured writing

It's no secret that most scientists peruse papers briefly before committing to reading them in detail. It pays to write for these perusers by structuring content appropriately. Besides, structured writing helps lead the reader along structured thought.

The smallest logical unit of composition is the paragraph. Start each paragraph with a sentence that captures its content so that those who are skimming through will quickly get an idea of how the article presents its content. The paragraph principle leads me to plan a paper from the top down, all the way to the paragraph level.

When it comes to arranging paragraphs, I found an interesting point in a talk given at MIT. Points made in paragraphs form a logical sequence but there are usually multiple dependencies between these thoughts. More than one idea depends on more than one preceding idea. So how should one link the paragraphs? The answer is to choose an arrangement that gives the fewest "crossovers", as in the image below.



The logical relationship between the paragraphs is shown at the left. If we write all the starting points in sequence (the "layered" approach), we end up crossing back and forth between logical sequences. By instead using a "linear" approach, we avoid this problem, even though some logical jumps are necessary to bring everything together.
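The idea of counting crossovers can be made concrete with a small sketch. This is my own illustration rather than anything from the talk: I assume that a "crossover" is a pair of logical links that interleave when the paragraphs are laid out in reading order, and the paragraph names and links below are invented for the example. The search is brute force, so it only suits a handful of paragraphs.

```python
from itertools import permutations


def count_crossovers(order, links):
    """Count pairs of links that interleave when paragraphs appear in `order`.

    Assumption: two links "cross" if exactly one end of the second link
    falls strictly inside the span of the first.
    """
    pos = {name: i for i, name in enumerate(order)}
    spans = [tuple(sorted((pos[a], pos[b]))) for a, b in links]
    crossings = 0
    for i, (lo1, hi1) in enumerate(spans):
        for lo2, hi2 in spans[i + 1:]:
            if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
                crossings += 1
    return crossings


def respects_logic(order, links):
    """True if every idea appears before the ideas that depend on it."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in links)


def best_order(paragraphs, links):
    """Brute-force search for a valid order with the fewest crossovers."""
    valid = (o for o in permutations(paragraphs) if respects_logic(o, links))
    return min(valid, key=lambda o: count_crossovers(o, links))


# Hypothetical example: two lines of argument (A and B) that meet in C.
links = [("A1", "A2"), ("A2", "A3"), ("B1", "B2"), ("B2", "B3"),
         ("A3", "C"), ("B3", "C")]
layered = ["A1", "B1", "A2", "B2", "A3", "B3", "C"]
linear = ["A1", "A2", "A3", "B1", "B2", "B3", "C"]

layered_crossings = count_crossovers(layered, links)  # 4
linear_crossings = count_crossovers(linear, links)    # 0
best = best_order(linear, links)
```

Under this definition, the layered arrangement keeps hopping between the two lines of argument, while the linear one finishes the first line before starting the second and pays only one long jump to bring them together at the end.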

Special problems for technical writers

Some advice on style must be cast aside in technical writing. For example, it is difficult to write without jargon, though it can be kept to a minimum. There is no better way of describing a photometric redshift than to call it a photometric redshift. So use those words but explain them if your target audience warrants it.

There are potential problems with technical words that carry undesirable connotations in other contexts. For example, in a recent paper, I describe a choice of equations that allows "arbitrary boundary conditions". A co-author pointed out that "arbitrary" is usually synonymous with "pointless" or "not really worth pursuing". But in the mathematical context, it is precisely the correct word to use. It sounds bad to the lay reader but it is technically correct so I stuck with it. If you can avoid such words, do so, but not if the price is the precise technical meaning.

Mathematical and theoretical writing is often presented in a particular textbook-like style which makes for dense and heavy reading. You can help the reader by slowing down the concept-barrage. Restate definitions in simple ways. Explain why a definition or equation is interesting. Separate technical segments with paragraphs written in a more conversational style. Even in less technical or mathematical sections, make sure that you have always motivated the detailed calculations you have written.

The final C

The advice in the previous paragraph may seem to contradict the principle of concise writing and this brings us to a final C: compromise. George Orwell's sixth elementary rule is quoted in The Economist:
Break any of these rules sooner than say anything outright barbarous.
Ultimately, following these rules steadfastly may lead to a clunky sentence that simply does not read well. You can do well within the guidelines of good writing but there are occasions where a rule is better off broken. Perhaps a sentence sounds better if it begins with a conjunction or there is no escaping the precise meaning of a clunky phrase. In these cases, so be it.

The final piece of advice is point 15 from Knuth's opening section:
There is a definite rhythm in sentences. Read what you have written, and change the wording if it does not flow smoothly.
If the proof of the pudding is in the eating, then the proof of the writing is in the reading. Follow this rule above all.

Tuesday, 13 December 2011

Farewell, old friend

I hate waste. Most of my calculations are done on the flip-sides of single-printed pages. When arXiv articles are uploaded in referee format, I download the source and recompile in a journal style to save paper. The water that collects in our dehumidifier is poured into our toilet's cistern to save about one litre per day. I throw away little in the usually vain hope that it can be re-used. Thus, it was with great pain that I finally conceded that there is no longer a place in my life for an old friend. It's been a fixture in my rooms since I arrived in the UK, and before that, a fixture in my brother's since shortly after he arrived in the UK.

Maybe you'll be surprised to hear that said fixture was the desktop computer he bought, back in 2004. Though not top of the line even then, this particular Dell Inspiron 4600, helped by regular re-installation of Windows XP, a small RAM upgrade and occasional dust-removal, has been chugging along happily since then. It's in fantastic condition. After adding a GeForce 6600GT, it even ran StarCraft 2. I'd have kept it if I hadn't acquired a new laptop through my department to replace it.

Having cared so much for this computer, I wanted it to go to a good home, preferably a charity. It will soon feature at the hotdesk of The Humanitarian Centre, Cambridge. But in order to keep it useful, I figured I should include the hard drive rather than remove it for security purposes. And this meant securely formatting the drive, frequently called nuking.

The tool for this is Darik's Boot and Nuke. I had some trouble with the then latest version but the older 1.0.7 worked fine and I don't imagine much has changed. The system is quite idiot-proof. Burn the image as a bootable DVD, boot, and follow the instructions, which include a frightening-sounding selection of algorithms. Some are associated with the likes of the Royal Canadian Mounted Police or the US Department of Defense. Good enough for the DoD? Good enough for me.

So, having nuked the drive, I have passed my computer on. It's inspired a moment's reflection about how technology sometimes makes us lazy and wasteful. I now have a new computer, which I hope will also last at least 4 years, presuming I don't hurl it out a window in a Windows-inspired moment of blind rage. But in the age of phone upgrades every other year and a new iProduct iteration, I'm not sure how many people still build things to last.