Tuesday, 14 December 2010

Equations online

Do you need to include equations on your webpage or blog? I didn't think so. But at some point, maybe I will. So I started looking at how to accomplish that. It turns out that, like getting the world to agree on climate change targets, it isn't easy. Here are the options I've found. I'm presuming the kind of person looking to put an equation online is the kind of person who's familiar with LaTeX so all these tools will make use of it.

The language of maths on the web is supposed to be MathML but, although it's been around a while, it's generally poorly adopted. MathML is a part of HTML5, the next major revision of web-formatting as a whole, but that's a work in progress and implementation is variable. MathML code is also very, very ugly. Given that LaTeX was created to make things more compact, MathML looks like a big step backwards. Apparently this makes it versatile. If I need 23 open and close tags to write the quadratic formula, I'm not convinced that this constitutes progress...
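To make the verbosity concrete, here is a rough sketch in Python that parses one possible presentation-MathML encoding of the quadratic formula and counts its elements. The markup is hand-written for illustration (it is not the markup used elsewhere in this post), and the exact count depends on how the formula is written, for example whether invisible multiplication operators are included. The names QUADRATIC_MATHML and count_elements are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written presentation MathML for x = (-b ± sqrt(b^2 - 4ac)) / 2a.
# Illustrative only: other, equally valid encodings use more or fewer tags.
QUADRATIC_MATHML = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mi>x</mi>
  <mo>=</mo>
  <mfrac>
    <mrow>
      <mo>-</mo>
      <mi>b</mi>
      <mo>&#xB1;</mo>
      <msqrt>
        <msup><mi>b</mi><mn>2</mn></msup>
        <mo>-</mo>
        <mn>4</mn>
        <mi>a</mi>
        <mi>c</mi>
      </msqrt>
    </mrow>
    <mrow>
      <mn>2</mn>
      <mi>a</mi>
    </mrow>
  </mfrac>
</math>
"""


def count_elements(markup):
    """Return the number of XML elements (open/close tag pairs) in markup."""
    root = ET.fromstring(markup.strip())
    # iter() walks the root element and every descendant element.
    return sum(1 for _ in root.iter())


if __name__ == "__main__":
    print(count_elements(QUADRATIC_MATHML), "elements for one formula")
```

For comparison, the same formula in LaTeX is a single short line: x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.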

Embedded images

One way of putting equations on your webpage is to create images of your equations and host those. At first glance, that isn't that easy, since you'll want a quick way of producing the images from (La)TeX. Fortunately, CodeCogs have created an online equation editor that'll do the image creation for you. Punch in your equation and you'll be given an image in a chosen format. Better still, you can choose "HTML" from the drop-down menu at the bottom and it'll give you an HTML snippet that you can copy. This way they host the image for you.

Note that on some web interfaces, like Blogger, you may have to flip to editing in HTML mode when you paste. Also, I've chosen PNG as the format but it pixelates when scaled. I'm not sure anything else is much better since implementation is iffy. I'd choose SVG (also part of the HTML5 standard) but Internet Explorer is notorious for being the last major browser that doesn't support it. Then again, it's just another reason you shouldn't be using IE anyway.


Another option is to use JavaScript to generate MathML from (La)TeX markup in the page. Dr Douglas R. Woodall created such a script. A newer version exists but I haven't tried it. You can download the script to your own page or link to the version on his site. Either way, you'll need a snippet like

<script type="text/javascript"

inside the "head" block of your HTML file. All very well on your own site but what about a blog? On Blogger/Blogspot, you can do this by clicking "Design" (top right) then "Edit HTML". Place the snippet above after the <head> tag and you're good to go. I'm not sure how you'd do this on other blog-hosting sites. If you know, feel free to add a comment.

You can now include TeX in \$ ... \$ blocks. For example,

$G_{\mu\nu}+\Lambda g_{\mu\nu}=\frac{8\pi G}{c^4}T_{\mu\nu}$



As before, you may need to edit in HTML mode or similar. If you want a plain \$ sign, mark it up as \\$. This method might be better for inlining stuff. Also, you might not see that properly in Internet Explorer without the MathPlayer plugin. Did I mention that you shouldn't be using IE?

Pure MathML

The final option is to write your equation in MathML. It's a crap option, even if in the future it's meant to be the most universal. The hard part is converting (La)TeX into MathML. One way of doing this is to create a page using the JavaScript above, selecting the equation portion, right-clicking and selecting "View Selection Source". Well, in Firefox 3.6, anyway. Another way, in theory, might be to use the TeX/LaTeX to MathML Online Translator, except that it appears to load for five minutes and then times out when I try to use it. Did I get your hopes up there? I hope so. It got my hopes up too.

Back on track, this is what the MathML produces here on Blogger.


If you want to know, this is the markup that I used:


Did I mention that MathML is ugly? Did you believe me? I bet you do now. There's an obvious catch: this doesn't seem to work! (Let me know what browser you're using if the equation is rendered properly.) I don't understand why not. I'm pretty sure the markup is correct. I tried the example source on Wikipedia and that didn't work either. The JavaScript is turning the TeX markup into MathML, which is then rendered. Yet if I just plug in the MathML, Firefox ignores the tags? I presume there's something about the Blogger implementation that's doing this so it might work on other blogging sites. Either way, it's a lot like my PhD inasmuch as it doesn't make sense.

My choice

Really, MathML should be the way. It should be platform-independent and keep the page self-contained. Unfortunately, the Internet cares about standards about as much as Douglas Adams cares about deadlines. Wikipedia realised this too, which is why they invented their own markup that generates embedded images. The same (with CodeCogs' gadget) is my choice for now.

Friday, 10 December 2010

LibreOffice 3 Release Candidate

You may have noticed I'm a fan of OpenOffice.org. Well, its slightly-more-open fork, LibreOffice, has put out a release candidate, putting it one step closer to full release.

It installs alongside OpenOffice.org, unlike the betas, so you can safely take it for a test drive. I've done exactly this on my Eee 1000 and was positively underwhelmed. This looks like a quick exercise in rebranding to establish a community and kickstart the project. I could hardly find any difference between LO RC1 and OOo 3.2. I hope real work is coming in the future because my biggest use case, presentations, is almost entirely unchanged from OpenOffice.org 3.2. If anything, the slide transitions seem to be slower.

Wednesday, 24 November 2010

Windows XP tweaks: that damned language bar, reverting styles and mouseover focus

Despite Lifehacker's advice to the contrary, I keep my 6-year-old Dell Dimension 4600 going by re-installing Windows XP about once per year, usually at the start of the UK academic year. I have no problems quickly re-installing all the other software I use too but there are a few other tweaks that aren't so obvious. Note that these are explicitly for XP. I have no experience using Windows 7 or Vista.

Removing the language bar

If XP is installed with more than one language (in my case UK and US Englishes because I have to install US English), then a switch to quickly change them will be placed on the taskbar. There's no real reason for me to change to a US layout, so it's pointless. Better still, if you switch it off, it tends to come back. The best long-lasting way I've found to get rid of it is to run msconfig, go to the "Startup" tab and disable ctfmon.exe. On your first restart, Windows will whinge about running with a modified startup but you can tell it not to whine in future.

Preserving Windows Classic style

Speaking of resilient behaviour that keeps returning, you may find that your Windows Classic style occasionally reverts to Windows XP style. To disable this behaviour, run services.msc and disable the "Themes" service. I haven't found this foolproof but it keeps it fixed on Classic style most of the time.

Mouseover focus

One of the behaviours I like in Linux, though it isn't to everyone's taste, is mouseover focus. That is, if I point at a window, then my actions will occur in that window (without bringing it forward). I find it useful for being able to manipulate more applications without a big screen. I previously edited the registry to accomplish this but I've switched over to one of the XP PowerToys: Tweak UI. It makes a great many hidden options available and I prefer it over the old way. I also used it to increase the context menu speed.

Other PowerToys

If you followed the link, you may have noticed there's a whole range of PowerToys for XP. Most seem to be directed at bringing more modern UI features (e.g. multiple desktops, Alt-Tab previews) to Windows XP. By and large, I don't find them very effective and there seem to be better alternatives out there. I tried the Alt-Tab previewer and the Virtual Desktop Manager. Not to put too fine a point on it, both suck. Tweak UI is the only one I've found useful.

Monday, 1 November 2010

Presenter View in OpenOffice Impress

You may know of PowerPoint's Presenter View. Basically, when you're presenting from a laptop by connecting to a projector or large monitor, it displays the slideshow on the presentation screen and your notes and a timer on your laptop screen. This is a handy way of improving presentations by keeping the clutter off the screen but making sure you still remember to say everything. It also means if you distribute your presentation, you can supply a lot more of the information that you actually presented since your notes will be distributed too. Remember, we're avoiding the read-the-slide mode of presenting.

Now, saying it's a feature of PowerPoint is fine for people whose superiors pay for an MS Office license. If you prefer or are stuck with free and free (cost and code), the good news is that, although a presenter mode isn't a standard part of the package, an appropriate extension is available for OpenOffice.org. You can download it here. It's also available in the universe repository for Ubuntu 10.04+ as openoffice.org-presenter-console if that's how you prefer to roll.

I've found it works pretty much out-of-the-box on OpenOffice.org 3.2.0 (under Ubuntu 10.04) but there's a lot of discussion on the download page above about various bugs that might crop up. I've found the presenter window appears on the presentation display but just moving it back onto the laptop display fixes that.

Wednesday, 20 October 2010

The troubles with programming in science

As someone who subjects himself daily to struggling with a piece of code largely written before Fortran 77 even existed, this is an issue close to my heart. I warn the lay-reader to skip to the short summary at the end or skip this entirely. This 1500-word rant is not going to give you any insight about how to recycle more, end third world poverty or jailbreak the latest version of iOS. I also warn that I'm going to generalize terribly and accuse everyone of the worst case. I know this is unfair but it serves to highlight what we're all partially guilty of.

It's been my intention to write about the woes of scientific computing for some time. Last week's issue of Nature set the blogosphere alight with two contrapuntal articles about the state of scientific programming, so I too shall enter the fray. Zeeya Merali's News Feature reveals what anyone working with these codes already knew: they're nothing like commercial production quality and science suffers for it. Nick Barnes' World View calls upon scientists to release their code even if they don't think it's release-worthy. Although the idea of openness is part of what we need, unleashing these malformed monsters as they are now isn't the solution. But more on that later.

The problems

So what, ye who have not travelled the puzzling lands of scientific coding ask, are the problems? The code I use to model stars as they evolve is a pretty good example of most of them. In short, it seems that most scientists write code for their own particular problems with the expectation that no-one else will use their code or even want to. Maybe that isn't the underlying problem, but here's the seemingly endless list of troubles that arise.

The first corollary is that the code itself is quite ugly. Some graduate student probably learned some Java in high school, did a 6-month course in C that they've now forgotten, and has finally taught himself just enough Python or Fortran to write something that'll do the job. What's more, he's possibly piled his code on top of the same thing that the last guy in the research group did. Like I said, the code I have to deal with is a prime example. The basics were laid down by one guy back in 1971. Microsoft hadn't been founded and The King was still alive. To understand the layout of the code is to delve into the mind of a scientist in his early career working with computers similar in scale to my office, much less my desk, and then to understand the minds of four decades of successors. It's not quite that bad since there have only been two or three substantial overhauls but it's still very clear where someone new appended his segment.

The second corollary is that there is no guarantee that the code was tested properly at design time. If it was tested, it may have been only by the fact that it reproduces observed data rather than matches a hard analytical result. Luckily, for a code old enough to have children in high school, the fact that it survived means we can probably trust the results but newer codes won't have that sagely edge.

Third, documentation is often scarce and badly written when extant. Part of my unruly code-creature inverts a matrix as a critical part of the calculation. The only direction the author gives is that the relevant subroutine "is a custom-built matrix inverter, which I believe is a good deal smarter the anything I was able to get off the shelf." I'll be amazed if anyone really knows how it works, including the original author. On a recent rewrite, he was quizzed on some of the boundary conditions and could only claim he'd had a good reason for them at the time...

If these codes are examples of software engineering, then the automotive analogy must be a Ford Model T with a beefed up Rolls-Royce Merlin engine duct-taped to a cut out at the back, the windscreen replaced with a perspex sheet that hasn't been crafted at all, the doors long fallen off and replaced with poorly-carved wood leftovers and the only documentation a set of notes in Latin about the Apollo program. There's probably a five-year-old post-it note on the dashboard saying "cut perspex!!! - JS 9/1978".

The product is a code no-one else will understand, much less trust. This means when someone else comes to the same problem, they often write their own code. This has led to various amounts of multiplicity, depending on which field you're in. Stellar evolution is honestly ridiculous. An entire journal volume was dedicated to trying to calibrate them for the sake of the observers. Granted, some amount of this multiplicity is a good thing. It allows us to employ different methods for certain parts of the calculation and to compare methods. But often these codes appear to be only slight variants, "forks", or near-duplicates of each other. Moreover, there are many aspects of the calculation that don't warrant a new fork, just a different subroutine that could be specified at compile time. Finally, the subtle differences are sometimes concealed. As someone who works with stellar evolution codes, for example, I know when authors are actually comparing oranges and apples but readers who themselves work on, say, galaxy formation might not.

Hydrodynamics' example

There is hope, however. Stellar evolution codes were born principally in the 1970s after an efficient algorithm for solving the equations was developed in the early 1960s. Other fields have only become computationally feasible much more recently. On the technological side, this has meant they are more in line with modern conventions. What's more important is that the people who wrote them have had, on average, more training or experience in coding. A good example is fluid modelling, especially on cosmological scales, which really picked up during the 1990s.

To put these simulations in context, note that we cannot construct an experiment the size of the universe. That doesn't make sense. There isn't enough space. Instead, people write simulations that will model what we think happened. The Millennium Run was a high-profile example which saw weird purple-looking spiderwebs plastered across popular publications. These webs represent structures that form in a big, self-gravitating fluid. In the densest bits we expect to find galaxies. These simulations allow us to predict how material in the Universe, on the scale of galaxies and larger, should be distributed based on our theories. (I could write another blog post about what these simulations and their conflicts with observations have taught us.)

The code that produced all this violet violence, GADGET, was originally written by Volker Springel as a chunk of his PhD. It's well-maintained, tested, and documented, and is used by more people than just Dr Springel. The fluid dynamics folks in general (not just the cosmologists) seem to have a whole host of codes with equally contrived acronyms like ATHENA, FLASH and CASTRO, and all appear to be reasonably well-maintained and used outside just the research group that wrote them. The codes and documentation are updated regularly and released with tests against problems that can be solved analytically. Oh, how this lowly stellar evolutionist dreams of such fine software engineering... (As an aside, hope may have arrived in the form of a computer-scientist-turned-astrophysicist who has very recently tried to introduce a new stellar code.)

To top it all off, the situation we face with our codes is self-defeating. As Greg Wilson pointed out, "for the overwhelming majority of our fellow [scientists] computing is a price you have to pay to do something else that you actually care about". Most of us want to do science rather than write code. In fact, we're under a lot of pressure to produce results, so the less time we spend getting them, the better. But the catch is that unless someone else writes, documents, and maintains a code we can use, we have to do it ourselves. Few of us seem to have the time to polish our code to commercial-like quality so we all write our own substandard packages.

The ways forward, for now and for later

The long-term solution, in my opinion, is to create positions that give people this time. My vision is codes being treated like instruments. The creation of computing facilities with resident scientists is an indication of the investment going into the hardware but the software needs attention too (and I don't just mean for keeping the clusters running). Instruments need instrument scientists.

That kind of paradigm shift won't happen tomorrow, so for now, next time you write a code, write it so that someone else could use it, even if you think it won't be useful to anyone else. You can take a look at The Joel Test and tips from AstroCompute outlining some considerations for larger projects. The basics are to comment, document and test your code as broadly as you can manage for your time and make sure the documentation and tests are available. Give an appropriate amount of time for design, not just implementation. Modularization and version control are more advanced considerations but both are ultimately in your favour.
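As a minimal sketch of what "comment, document and test" can look like in practice, here is a small Python routine checked against a problem with a known analytical answer. The routine (a composite trapezoidal rule) and the names trapezoid and test_against_analytic_result are chosen purely for illustration; they don't come from any of the codes mentioned above.

```python
import math


def trapezoid(f, a, b, n):
    """Integrate f from a to b with the composite trapezoidal rule.

    f    : callable taking a float and returning a float
    a, b : lower and upper integration limits
    n    : number of equal-width panels (must be at least 1)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # End points carry half weight, interior points full weight.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


def test_against_analytic_result():
    """The integral of sin(x) from 0 to pi is exactly 2."""
    approx = trapezoid(math.sin, 0.0, math.pi, 1000)
    assert abs(approx - 2.0) < 1e-5


if __name__ == "__main__":
    test_against_analytic_result()
    print("all tests passed")
```

The point isn't the integrator. It's that the docstring tells the next person what goes in and what comes out, and the test pins the routine to a hard analytical result rather than to "it looked about right".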

Where Nick Barnes says "your code is good enough", I rather say "make it good enough". Not "perfect" or "amazing", but at least "good enough" for someone else to pick up and use rather than writing their own code. Most code is closer than the author thinks.


Coding in science is often badly commented, documented and tested. It's also often not released publicly, despite this going against the scientific process. This is changing in some fields and with any luck the change is starting to bleed into others. My hope is for codes to one day be treated like instruments and have dedicated support staff but all scientists should start making an effort to design codes more properly and release them for scrutiny. Everybody wins!

Thursday, 14 October 2010

Equations in OpenOffice.org with OOoLaTeX

I'm a fan of free software. I tell myself it's because I support (some of) the principles of the open source movement but the reality is that I'm a poor student who can't afford proprietary software. Whichever is the truth, it means I use OpenOffice.org as my office package, at least while LibreOffice takes shape. Like most office suites, the equation editor sucks, but there's a useful extension that helps, even if its name looks like an excerpt from a 12-year-old's text message: OOoLaTeX.

Installation and operation are straightforward and well-described on their website. Basically, download the package, then add it via the extensions interface in OpenOffice.org. You'll be granted a new toolbar with which to add equations by entering LaTeX markup.

The only worry I had was installing the fonts without superuser privileges. This might affect you too if you're working on a Linux-based system in your department. Download the archive from SourceForge and extract its contents into a ".fonts" subdirectory of your home directory (i.e. "~/.fonts/"). You might need to open each font in a viewer and install manually.

Now I can quickly add and edit equations in my presentations with LaTeX markup, saving time for more useful things. Like compulsively refreshing Google Reader. Or blogging...

Wednesday, 22 September 2010

Four gnuplot Tricks

Depending on what you do on a day-to-day basis, you might use the plotting program gnuplot. If you film dolphins underwater, probably not. But if, like me, you have to digest pages and pages of columned simulation data, maybe you do, and in that case I have a few tips and tricks you might not know about.

I'm going to presume that you're reading this as someone quite familiar with gnuplot. It might not be pretty but it's quick and shows me what I need to see. (This sentence could be removed from context with disastrous consequences...) I don't use it for producing plots for publication or presentation but if I just need a quick look at what's going wrong with my PhD, gnuplot fits the bill. With a few abbreviations to speed up the process, it's actually really useful.

All of these tips are from my daily routine which takes place in Bash terminals, so I can't guarantee they'll work on every system. In particular, I'm not sure about how much will work with Windows. But why are you trying to do work in Windows?

Plot Window Shortcuts

There are a couple of commands you can issue while the plot window is selected. For example, pressing "e" issues a replot command. For a complete list, select a plot window and press "h". The list will appear in the command window.

Terminal Abbreviations

This is a really simple trick that I'm including just in case you missed out on it. One can abbreviate just about every terminal command in gnuplot. For example,

plot "file" using 1:2

can be shortened to

p "file" u 1:2

If you're used to hacking away at the terminal, this is a must. What's more, this isn't from a standard list: as long as the abbreviation is unambiguous in context, it should work. So p above could just as well be pl.

Using Shell Commands in the Filename

What if you only want to plot from the last 200 lines of the file? There are some native gnuplot instructions that will do this but an alternative way, for the example above, is

p "< tail -200 file" u 1:2

See what I did there? You can effectively construct a new input stream with whatever shell command you'd like. I personally have a formidable armoury of sed, awk and egrep commands that I use to work with my data. Okay, formidable for the PhDemon rather than the US Marine Corps, but I don't have to wrestle with the US Marine Corps on a daily basis.
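The command after the "<" doesn't have to be a standard tool: anything that writes columns to standard output will do. As a rough sketch, here's a small Python filter that thins a file to every n-th data line. The script name every_nth.py and the function names are hypothetical, and gnuplot can already do this particular job natively with its "every" keyword, so treat it as a template for filters of your own.

```python
#!/usr/bin/env python3
"""every_nth.py (hypothetical name): print every n-th data line of a file."""
import sys


def every_nth(lines, n):
    """Yield every n-th data line, skipping blank lines and '#' comments.

    Note that dropping blank lines also drops gnuplot's data-block
    separators, so the output is treated as one continuous block.
    """
    count = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if count % n == 0:
            yield line
        count += 1


def main(argv):
    """Command-line entry point: every_nth.py FILE N."""
    if len(argv) != 3:
        print("usage: every_nth.py FILE N", file=sys.stderr)
        return
    with open(argv[1]) as handle:
        sys.stdout.writelines(every_nth(handle, int(argv[2])))


if __name__ == "__main__":
    main(sys.argv)
```

Assuming the script sits in the current directory, it slots into the same place as tail did:

p "< python3 every_nth.py file 10" u 1:2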

Command History Search

A familiar trick for Bash users is the ability to search your history for a previously issued command. Well, if you didn't know about that, now you do. If you don't feel like pressing the up arrow 206 times, press "Ctrl+R" to get a mini-prompt which you can use to search all your previous commands. I suspect this is an inherited Bash feature, so I'm not sure if it works in all circumstances but it definitely works in GNOME.

Tuesday, 7 September 2010

My reasons for using RSS and Google Reader

Taking advice from Matt Might, I took another step into the 21st century a short while ago by starting to use Google Reader to aggregate RSS feeds. For the sake of productivity, if you don't use RSS feeds already, you really should. Here are my reasons why, along with how I'm doing it. (If you don't know what RSS is, have a look at the opening paragraph of the Wikipedia article.)


By subscribing to an RSS feed, you're guaranteed to miss nothing. For webcomics, that means I save time because I don't have to open a new tab. It's on a page I'm already looking at. For news sites, it means I don't look at a frontpage of stories I've already seen.

For academic journals, it means I don't end up double-checking titles or abstracts I've already seen. I find this particularly useful because the journals I look at aren't very good about letting you know what's new since the last visit, even though they have facilities for displaying accepted articles before they appear in the printed copy.

For everything, it means that if I'm away for a while, I don't have to go back through the logs trying to work out where I left off. It's all waiting for me in Reader and I know whether I've seen it or not. Okay, granted that might be many hundreds of posts, but at least I won't miss a paper that scoops my PhD project...

As time goes on, you'll find it quick to sift through your feeds. But beware the temptation to gorge yourself like some kind of infovore by subscribing to every technology news site on the Internet.

Google Reader

Like all Google's web tools, I use Reader because it's accessible from anywhere. In my case, this means I recover some dead time at home. I mostly restrict working to the workplace. After all, I have better odds against the PhDemon when I have 4 cores, 4GB of RAM and a 24" screen. But with the RSS feeds for journals coming through, I can look at the feeds and clean out stuff I don't want to read even on my six-year-old P4 hand-me-down. Conversely, Flash isn't installed on the computers we use in the office, so I leave embedded videos for home viewing. Basically, I look at materials where it's most convenient.

A lesser point is that I've found Reader's recommendations pretty good. It's how I picked up on a blog like AstroBetter. It's also meme-tastic. My recommendations always seem to be right on top of dominant virals and popular websites. The Wilderness Downtown, Arcade Fire and Google's HTML5 demo, popped up on a recommendation shortly before appearing in every tech blog across the Internet. (Another reason for not being an infovore is the redundancy. I'm pretty sure all the science sites base their stories on the same limited set of press releases.)

Custom Feeds

All this said, some RSS feeds aren't perfect. I'm a reader of the webcomic Ctrl+Alt+Del. The RSS feed on the site doesn't seem to display the comic... which is kind of the point of a feed for a comic. That means I have to click to get it. Fortunately, someone else who knows more than me has made a fix.

Moreover, some feeds that you might want don't seem to exist. Don't despair though: if you search around a bit, you may well find that again, someone else has fixed the problem. I looked hard for a Zapiro feed on the sites of the newspapers that print his cartoons but instead I found it here.

So if there's something you read online, find a feed. Seek the orange button. Remember an unofficial one might be more efficient or exist where there's no official alternative at all. With a good aggregator, like Google Reader, you'll save yourself a lot of time, which either means you can spend more time elsewhere or just read more on the web. Instead of going out looking for stuff on the web, you can make sure it's sent to you.

Friday, 3 September 2010

Finding symbols in LaTeX

There are a lot of symbols in LaTeX and finding the right one, especially when first learning, can be a bit of a pain. It happens very infrequently now that I've been using LaTeX for more than four years, but when I need to find some symbol, I look to two resources.

Cheat Sheets and Reference Cards

A few people have put together one- or two-page tables of the most commonly used constructs. They're generally referred to as "cheat sheets", "reference cards" or something like that. Googling them will give you plenty of hits. The two that have sat on my desk since I started using LaTeX are Winston Chang's LaTeX 2e Cheat Sheet and J. H. Silverman's TeX Reference Card. Print out the PDFs and just stick them somewhere nearby and accessible.


DeTeXify is an amazing HTML5 tool by Philipp Kühl and Daniel Kirsch. Just try to draw the symbol you want and the script will give you its best guesses. If you're a seasoned LaTeX user, I'd recommend visiting it and trying your most obscure symbols so you can teach it. This tool should be spread as far and wide across the Internet as possible.

Note that because the applet uses HTML5, you'll need a recent browser. The page itself claims that Opera 9.6+, Firefox 3.5+ and Safari 4+ work. Google Chrome will almost certainly work too.

Wednesday, 1 September 2010

Joining and including PDFs in LaTeX

Have you ever wanted to join a set of separated PDFs together? Maybe a book broken into chapters or sections? There's a useful package that makes this easy with LaTeX: pdfpages. As far as I can tell, it's included in most LaTeX distributions. You can find the complete documentation at the CTAN entry but basic use is simple.

Basic Use

Include the package by placing the line

\usepackage{pdfpages}

in your document's preamble, i.e. before \begin{document}. Then, where you want a PDF to appear, put the line

\includepdf[pages=-]{file.pdf}

where file.pdf is the relevant file. If you only want to include pages x through y, use

\includepdf[pages=x-y]{file.pdf}

instead. If you omit x and y completely (leaving just the "-"), it includes the entire document. Note that when you compile, you have to compile straight to PDF: as far as I can tell, this won't work if you compile to DVI or PS first.

The main use I have for this is to join files that represent parts of a single document back together. I come across such situations often in my searches for lecture notes and other free materials. I now just have a single LaTeX file with a string of includepdf's that I adjust each time, compile and then rename the output.


As a part-worked example, here's how I joined the obsolete Numerical Recipes in Fortran 77, Second Edition, into a single (monstrous) PDF. The second edition of NR is available free for C, Fortran 77 and Fortran 90. The PDFs are separated into sections: each file in the F77 version is labeled fc-s.pdf, where c is the chapter number and s is the section number. Beyond the introductory materials, many sections don't start on new pages so some pages are doubled up. I included each file with pdfpages and specified the page range to skip the first pages of sections that don't start on new pages. The result is a LaTeX file that looks like this:





I compile it with pdflatex and now have NR F77 in a single file for reference.
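If the list of files gets long, the string of includepdf lines can be generated rather than typed. Here's a minimal Python sketch under some assumptions: the function name build_joiner is made up, the file names just follow the fc-s.pdf pattern described above, and the starting pages are placeholders rather than the values I actually used. It relies on pdfpages accepting an open-ended range such as pages=2- to mean "from page 2 to the end".

```python
def build_joiner(entries):
    """Return LaTeX source that joins the given PDFs with pdfpages.

    entries : list of (filename, first_page) tuples, where first_page is
              the first page of that file to include (1 means all of it).
    """
    lines = [
        r"\documentclass{article}",
        r"\usepackage{pdfpages}",
        r"\begin{document}",
    ]
    for filename, first_page in entries:
        if first_page < 1:
            raise ValueError("first_page must be at least 1")
        # "-" selects every page; "k-" selects page k through the last page.
        page_range = "-" if first_page == 1 else f"{first_page}-"
        lines.append(rf"\includepdf[pages={page_range}]{{{filename}}}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder entries: the second file skips its doubled-up first page.
    print(build_joiner([("f1-0.pdf", 1), ("f1-1.pdf", 2)]), end="")
```

Redirect the output to a .tex file and run pdflatex on it as before.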

Tuesday, 31 August 2010


Which straw broke the camel's back? I don't know, but my skepticism and resistance have waned and I've decided to start a blog. It was probably the fifth and sixth of Matt Might's six blog tips for busy academics.

I've spent part of my PhD hunting down little trickses and fixes for a variety of things (mostly related to computing) so my starting point is to share some of those. Maybe you already know them and found them elsewhere, but I imagine that if I had to look, other people like me will be looking too, so here's hoping I can help them.

Every now and then I might spill out an opinion or observation from some aspect of my daily toils and what I've been able to find out about it. So amongst the useful, I hope you'll find something interesting, too.