Wednesday, 31 October 2012

Learning languages online with gamification

Gamification (the use of game-like mechanics to motivate non-game activities) is a bit of a craze at the moment. All manner of websites are popping up to help you solve all of your problems. Whether or not they'll work remains to be seen, but for now, I've been experimenting with two that help you to learn (or, rather, consolidate learning of) languages. I'll save comment for later on whether or not gamification works, or even should or can work. Here, I'm just summarizing tools I've been trying, but if you've found others, let me know in the comments.

Duolingo

The idea behind Duolingo is to teach by immersion. This is achieved by forcing you to do a lot of translating and occasionally teaching you new vocabulary. It currently offers to teach English-speakers French, German or Spanish. 

You gain points by completing lessons that involve translating to and from the objective language, or by translating sentences from online content. This content translation is how Duolingo claims to remain free, but I'm not really sure who would be paying for such content. I could imagine a mass-translation of, say, Wikipedia articles by Wikimedia, but not by many others that wouldn't do it in house anyway.

From my experience so far, Duolingo is okay as long as you're building on existing knowledge of a language. In particular, it doesn't seem to be much use for learning the grammar of a language. You pretty much have to infer that for yourself. But if you've previously learned the language and just lack vocabulary and some idiomatic expressions, Duolingo is probably a reasonably good way to stay in practice.

Memrise

Memrise helps you remember any list of associated things by forcing you to recall them at increasing intervals. This is one area where I also found Duolingo a bit lacking, so it makes for a pretty good complement. I think it limits itself to testing you on 50 items at any one time, which is long but not too long. Possibly its biggest downside is its lame metaphor about watering plants and moving them from the greenhouse to the garden...

So far, I've only used one of the officially-endorsed vocabulary lists but there's an enormous number of community-supplied courses, too, including a few grammar courses. I'll let you know how these pan out. For vocab lists, it seems pretty good so far.

Monday, 22 October 2012

Best Practices for Scientific Computing

I recently learned of an article that appeared in the Computer Science section of the arXiv entitled Best Practices for Scientific Computing. It's a sound list of software engineering practices and philosophies that lead to code that is generally easier to understand, operate and maintain. It also links to plenty of resources to help implement these practices.

While understanding, operation and maintenance might not sound like interesting objectives for lonely coders rushing to a minimal result-producing code, remember that your program carries more value (and potentially citations!) if other people are able to use, modify and extend it. Also, don't forget that you need to write code that your future self can understand...

I recommend a quick read of the six-page article, but I've listed here the section titles and emphasized directives.
  1. Write programs for people, not computers.
    1. A program should not require its readers to hold more than a handful of facts in memory at once.
    2. Names should be consistent, distinctive, and meaningful.
    3. Code style and formatting should be consistent.
    4. All aspects of software development should be broken down into tasks roughly an hour long.
  2. Automate repetitive tasks.
    1. Rely on the computer to repeat tasks.
    2. Save recent commands in a file for re-use.
    3. Use a build tool to automate scientific workflows.
  3. Use the computer to record history.
    1. Software tools should be used to track computational work automatically.
  4. Make incremental changes.
    1. Work in small steps with frequent feedback and course correction.
  5. Use version control.
    1. Use a version control system.
    2. Everything that has been created manually should be put in version control.
  6. Don’t repeat yourself (or others).
    1. Every piece of data must have a single authoritative representation in the system.
    2. Code should be modularized rather than copied and pasted.
    3. Re-use code instead of rewriting it.
  7. Plan for mistakes.
    1. Defensive programming: programmers should add assertions to programs to check their operation.
    2. Use an off-the-shelf unit testing library.
    3. Turn bugs into test cases.
  8. Optimize software only after it works correctly.
    1. Use a profiler to identify bottlenecks.
    2. Write code in the highest-level language possible.
  9. Document the design and purpose of code rather than its mechanics.
    1. Document interfaces and reasons, not implementations.
    2. Refactor code instead of explaining how it works.
    3. Embed the documentation for a piece of software in that software.
  10. Conduct code reviews.
    1. Use code review and pair programming when bringing someone new up to speed and when tackling particularly tricky design, coding, and debugging problems.
    2. Use an issue tracking tool.
The bigger problem is motivating scientists to spend the time to adopt these practices. But arguing for why they are helpful to code writers is a good place to start.

Sunday, 9 September 2012

Four tips for better SSHing

If you use any large Linux-based computer network, you either are, will be or should be familiar with using Secure Shell (SSH) to connect to networked machines from inside or outside the network. I've been doing so for years, but I only recently discovered this post explaining a range of useful tricks for SSH. I invite you to read the linked post but I've reproduced below the tricks I found handy. Note that these tips are all written with the Linux terminal in mind. Mac users should be okay but I'm not sure about Windows.

Security without passwords

If you find yourself cursing the need to enter your password each time you connect to a machine, you can set up a secure RSA key for a particular connection. You'll be prompted for a passphrase once per boot but that's it. First, open a terminal and type

ssh-keygen

You'll be prompted for a filename and passphrase. You then need to copy the public key to the server, which is most easily accomplished by executing

ssh-copy-id user@your.server.com

at the command line. If this doesn't work or you can't get ssh-copy-id, you can read the full instructions in the link I provided above. Also, if you use ssh-keygen to generate different private keys for different servers, then you should execute

ssh-copy-id -i identity_file user@your.server.com

Use of the same private key on different systems means that if your private key is compromised, your identity can be faked on any relevant server. It's up to you whether you think this is worth risking.

Hostname aliases

Tired of always having to type out ssh user@gateway.group.department.institution.com? Fortunately, this is easily fixed by creating aliases for hosts. Open ~/.ssh/config in your favourite text editor and add the segment

Host YourAlias
  HostName gateway.group.department.institution.com

You can now log in simply with ssh YourAlias. If the remote username differs from your local one, you'll also need

  User remote_username

in the segment above. For really pro-use, you can use wildcards to group similar aliases. The examples in the link are

Host dev intranet backup
  HostName %h.internal.example.com

Host www* mail
  HostName %h.example.com

Automatic gateway skip

Networks are often set up so that remote access is through a gateway machine. That is, you log into something like gateway.dept.uni.ac.uk and thence connect to useful_computer.dept.uni.ac.uk. You can modify your SSH aliases so that this is done automatically. The alias should point to your destination host and have an extra line (called ProxyCommand) that gives the intermediate server. For example,

Host work1
  HostName work1.dept.uni.ac.uk
  ProxyCommand ssh username@gateway.dept.uni.ac.uk -W %h:%p
  User username

The %h:%p part is literal: these wildcards must appear in the alias. If your remote username is the same as your local one, you don't need to specify it in ProxyCommand. You also don't need the User line.
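
If you have several machines behind the same gateway, typing these segments by hand gets repetitive. Here's a minimal shell sketch (my own, not from the linked post) that prints a segment combining an alias, a remote username and an optional gateway. The function name ssh_stanza and its argument order are invented for illustration; the hostnames are the placeholders used above.

```shell
# ssh_stanza ALIAS HOSTNAME USER [GATEWAY]
# Print an ~/.ssh/config segment for one host (hypothetical helper).
# If GATEWAY is given, the connection is routed through it with ProxyCommand.
ssh_stanza() {
  printf 'Host %s\n' "$1"
  printf '  HostName %s\n' "$2"
  printf '  User %s\n' "$3"
  if [ -n "${4:-}" ]; then
    # %%h and %%p print as the literal %h and %p that SSH expands itself.
    printf '  ProxyCommand ssh %s@%s -W %%h:%%p\n' "$3" "$4"
  fi
}
```

For example, ssh_stanza work1 work1.dept.uni.ac.uk username gateway.dept.uni.ac.uk >> ~/.ssh/config should append the same segment as the work1 example, just with the User line before ProxyCommand. The order of those two lines shouldn't matter to SSH.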

Mount remote filesystems locally

This is a trick I've mentioned before but it's worth raising again now because it inherits behaviour from SSH. To mount a remote folder as if it were local, you can use the sshfs command. (You might need to install it.) First, create a suitable local mount point, e.g.

mkdir ~/remotefs

Then, mount the remote filesystem with

sshfs user@host:/absolute/location ~/remotefs

The useful thing to mention here is that sshfs knows about your aliases. So if you aliased gateway.dept.uni.ac.uk to uni, you can replace host with uni in the line above.

Have fun and spread the word about the power of SSH! If you have any more pro-tips, let me know in the comments.

Monday, 27 August 2012

Getting Gmail back on CyanogenMod 7

In the distant past, I mentioned my then newly acquired HTC Wildfire and looked forward to converting it into a productivity machine. That never happened, mostly on account of it being a pretty slow phone. For example, it can't really handle Skype. It has been very useful for allowing me to sift through my RSS feeds in otherwise wasted time but it hasn't revolutionized how I work. But recently, I've been finding that it has far too little space on the ROM to keep up with newer versions of major mobile apps. So I bit the bullet and ROM'ed my phone. That is, I installed CyanogenMod 7, which allows you to force applications onto external storage (i.e. the SD card).

This post won't tell you how to install the new OS. That's well-covered by the relevant article in the CyanogenMod wiki. However, after installing, I found myself without Gmail. Other Google apps installed correctly with the relevant package but Gmail remained conspicuously absent from the Play store.

The solution I found is to fake your location in the Play store to somewhere in the US, where Gmail is available from the store. For this, you need Market Enabler, available here but not in the store. You'll need to allow non-Market applications but if you haven't yet, you'll be prompted. Once it's installed, feign your location as a US provider by long-pressing an appropriate choice and changing the location. Gmail should now show up in the Play store and you can install it and happily reconnect to your mailbox.

Friday, 24 August 2012

.gnuplot, the startup script for Gnuplot

I've been using Gnuplot for a long time but I've only now come across the appropriate way to execute a default script when the program runs. Just put your favourite commands in ~/.gnuplot and they'll be executed whenever Gnuplot starts. For example, my .gnuplot file starts with

set term x11
set style data lines

Note that typing reset restores the defaults and does not run .gnuplot again.

Thursday, 7 June 2012

Outreach online

I take it as a given that most universities' scientific departments are involved in some form of "outreach": the organization of and participation in events that allow scientists to engage with the public at a non-specialist level. Love it or hate it, it's something all scientists should know something about and a worthwhile pursuit given things like misrepresentation in the media or various political causes that oppose what we all take as fact. It's up to us to defend, for example, the theory of evolution everywhere from South Carolina to South Korea. And in South Korea, they're winning.

Fortunately, there are several things you can do to help scientific literacy from the comfort of your own home; ways of engaging the public over the Internet about your science.

Wikipedia

A simple start is Wikipedia. The quality of articles varies quite wildly so you can do an unknowable number of people a service by checking the accuracy of articles in your field of expertise. You can surely find an appropriate WikiProject and, somewhere therein, a table of the importance of articles versus their quality. Naturally, you want the most important articles to be good and work downwards in priority from there.

In my case, I swung over to WikiProject Astronomy and found that the articles on accretion and asteroseismology are highly important but start-class articles. A good place to contribute. Another place is to start (or, where extant but lacking, improve) articles on individual scientists. There are lists of winners for a number of notable prizes and the associated articles can reasonably be created if they don't already exist.

Q&A sites

At best, outreach allows people a chance to hear from scientists themselves. Without being at such an event, you can also answer their questions on Q&A sites. There's a large and growing number of Stack Exchange websites, which I think are great systems for connecting high quality questions and answers. Look for one in your own area, bearing in mind that it might be under construction or part of another Exchange. (Astronomy.SE was recently subsumed into Physics.SE.)

Then, there's Quora, the StackExchange-like Q&A site for everything. The questions are not required to have objective answers but people ask a lot of science-related questions. You should find plenty under the appropriate tag. For me, they're astronomy and astrophysics.

Netizenship

Finally, I think an important process is to generally participate online on scientific issues. When you read a good story or editorial about a scientific issue, share it. If you read bad news, share it with a sad face; good news, with a happy face. There are plenty of good blogs that deserve support, too. I personally follow Martin Robbins, a journalist who repeatedly points fingers at science coverage in his industry.

I recognize that this last point probably carries the least penetration. If you're sensible about science, it's likely that your friends are too. But maybe your shares will catch the eye of aunt Mabel and she'll share them with the Midlandsville-cum-Stream book club and they'll see why evolution really should be taught in schools.

Thursday, 3 May 2012

Extracting data from rasterized images

A while ago, AstroBetter hosted a post about digitizing figures. That is, taking a raster image of a plot and somehow turning that into raw data for you to plot yourself. (Vector data is theoretically available in the image file. If I work out a good way of extracting this data, I'll let you know.) AstroBetter recommends Mac app GraphClick, the registered version of which you can buy for $8. But I don't have a Mac. In fact, I don't intend to ever own one. And why pay for something that can be done for free?

For the same job, I therefore recommend Engauge on Linux. You can download it as a binary for Windows and Linux. (Supposedly, this site will help you get it running on a Mac but I have no idea how that's done.) Usage is pretty straightforward. For example, I wanted to get the data from Figure 1 of Yabushita (1975).

A PNG image was created by cropping a screenshot of the PDF. Run Engauge and import the figure (Ctrl+I by default). The image comes up with the curves highlighted. First, make sure the "segment fill" tool is selected and then click on the segments of curve for which you'd like to extract raw data. Now, select the "axis tool" and define three axis points. I recommend the origin and the last well-defined point on each axis. Bear in mind that this is how Engauge calibrates the data so describe your axes prudently. For example, if an axis is logarithmic, I use the logarithm rather than the value. In Yabushita's figure, I claim that point "1000" on the x-axis is 3 rather than 1000.

With all the data selected, now hit "File > Export" in the menus and choose a destination. I selected CSV output, which looks like this

x,Curve1
-0.995727,0.870997
-0.950925,0.912782
-0.906123,0.954566
-0.866913,1.00322
-0.827708,1.04497
-0.788499,1.09363
-0.74929,1.14229
-0.710076,1.19786
...
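
If you entered the logarithm for an axis, as I did for x, the exported column holds the logarithm rather than the real value. Here's a small awk sketch for undoing that. It assumes that only the x-axis was entered as a base-10 logarithm and that the export has a single header line. The function name engauge_to_linear is made up for illustration.

```shell
# engauge_to_linear FILE
# Convert the first column of an Engauge CSV export from log10(x) back to x.
# Assumes one header line followed by comma-separated x,y rows (hypothetical helper).
engauge_to_linear() {
  awk -F, '
    NR == 1 { print; next }                  # pass the header through unchanged
    NF == 2 { printf "%g,%s\n", 10^$1, $2 }  # x becomes 10^x, y is left alone
  ' "$1"
}
```

You'd call it with something like engauge_to_linear figure1.csv > figure1_linear.csv, where both filenames are placeholders. Note that %g keeps about six significant figures, so change the format if you need more. Rows that don't have exactly two fields are skipped.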

Engauge isn't the only choice. Besides GraphClick, there's also NASA ADS's Dexter tool. There's an independent web implementation here but I found it clunky. Engauge is simple and it's a pity it seems to be deprecated. But it's good enough for me, for now.

What's your favourite tool for digitizing rasterized plots?

Tuesday, 24 April 2012

Scientific plotting with Veusz

If you pay any attention to this blog, you'll know that plotting my day-to-day investigations is done with gnuplot, for which I previously provided a few pro-tips. When it comes to producing plots for papers and presentations, I find gnuplot a bit too clunky. Enter Veusz, a Python-based graphical plotting program that is remarkably versatile. It's also written by Jeremy Sanders, currently just a few doors down from my office. He previously wrote a guest post on AstroBetter about it.

The GUI is intuitive and powerful. It's easy to make detailed and elegant plots quickly. And there are technical benefits too. First, the Python basis means the program runs on just about any platform. Second, it can read a wide range of formats, including FITS, the astronomical workhorse. I personally just have space-delimited ASCII, which is dead-easy to work with. Third, Veusz's own scripts are just Python batch files, so you can edit them directly. I found this really useful for batch-editing all the figures in my thesis, say, when I wanted to modify the margins of the figures in a uniform way. Finally, and I think most importantly, Veusz is able to export to a wide range of image formats. For paper submissions, there's EPS. For internal documents, like a thesis, there's PDF. For presentations, there's PNG. Many more are supported.

If you don't usually work in an environment with a natural plotting capability (e.g. IDL, MATLAB) or you want to make something more detailed without much effort, I highly recommend Veusz. I personally use it for all my publication and presentation plotting.

Examples of my own plots made with Veusz. Left: evolution of a black hole inside a Bondi-type quasi-star. Right: topology of the homologous Lane-Emden equation for six values of the polytropic index n.

Monday, 26 March 2012

Make MNRAS submissions look more like MNRAS

Have you ever submitted an article to Monthly Notices of the Royal Astronomical Society (MNRAS)? If not, I offer you this lousy joke,
Q: Who was the first electricity detective?
A: Sherlock Ohms.
and invite you to stop reading now. The rest of this post is irrelevant to you. 

If your answer was "yes", then you've probably noticed that just putting your article into mn2e.cls doesn't make it look quite like the final publication. If you care enough about such things, read on for a few steps that make a submission look more (but not exactly) like the final product. In short, you can change the font, fix the bibliography and stick to MNRAS style on some minor points.

Fix the font


I would love to know exactly what font MNRAS uses. The closest approximation I've found among the LaTeX defaults is Times. So the first step to improving your article's MNRAS-ness is to simply add

\usepackage{times}

to the preamble. To illustrate the difference, here's a comparison between the default font, Times and whatever MNRAS uses, all at the default font size.

Times is definitely an improvement.

Fix the bibliography

The first fix in the bibliography is to change its size. MNRAS sets the bibliography in smaller type than the main text but the mn2e.cls file doesn't emulate this. To fix things, enclose the bibliography in a \footnotesize block. That is, call the bibliography with something like

% \footnotesize is a declaration, so it goes inside the group it should affect
{\footnotesize
  \bibliographystyle{mn2e}
  \bibliography{paper}
}

If you use BibTeX and copy entries from NASA ADS, then you might find yourself being asked for more bibliographic information when your paper is being readied for print. Part of the problem is that the BibTeX entries are not always complete. For example, I often find that books and conference proceedings don't include the publishers' names and addresses. You can usually find these in the "default format" entry on ADS and enter them manually. For the appropriate entry names in the bibliography file, look in the LaTeX Wikibook.

The MNRAS bibliography breaks down on long author lists. There are several fixes for this. One is to modify the .bbl file after BibTeX runs and thereafter summon the bibliography with

{\footnotesize
  \input{paper.bbl}
}

so that it isn't recompiled each time. The other option is to use (or make) a different .bst file. Michael Williams has done such a thing and you can download his variant of mn2e.bst here. His variant doesn't work perfectly with the MNRAS directive of listing all three authors on the first citation and as "et al." thereafter. You get output like "Jones, Smith, & White (2012)", which is ugly. I've tried my own hand at a custom MNRAS bibliography style to fix all this. You're welcome to download it here, give it a try and send me feedback. If you've worked out your own happy medium between the original MNRAS file and Michael's fix, let me know below.

Fix minor things

For whatever reason, mn2e.cls doesn't appear to flush equations to the left like in the journal. This is fixed by calling the class file with the fleqn option, as in

\documentclass[fleqn]{mn2e}

plus whatever other options you use. I thought this was a result of an outdated mn2e.cls but I'm clearly not the only one.

Finally, there is a host of small things that you shouldn't forget. I've either made these mistakes myself or seen them in arXiv submissions. Hopefully a handy list of common errors will help someone not make the same errors.

  • The article title is set in sentence case.
  • The running header is limited to 45 characters. If the title is longer, you must provide a short one.
  • Names of software packages are set in small caps, i.e. \textsc{code} in LaTeX. I sometimes see folks use other fonts, e.g. \texttt.
  • Tables never have double lines.
  • Subreferencing is never done inline. That is, MNRAS won't allow "Cox & Giuli (1968, §27.3)". I don't think they allow page numbers either, even though I think they're useful...
  • No citations in the abstract.
  • "per cent", never %.

Further fixes?

Your submission should now look more like MNRAS but it isn't there yet. If you know how to fix this or any other outstanding discrepancies, let me know in the comments.

Wednesday, 7 March 2012

Astronomical Software Wants To Be Free

From two articles on the reborn Astrophysics Source Code Library, which are themselves worth reading, I found myself reading a fairly bold arXiv submission from 2009 entitled Astronomical Software Wants To Be Free: A Manifesto. The initial summary cuts to the chase.

We advocate that: (1) the astronomical community consider software as an integral and fundable part of facility construction and science programs; (2) that software release be considered as integral to the open and reproducible scientific process as are publication and data release; (3) that we adopt technologies and repositories for releasing and collaboration on software that have worked for open-source software; (4) that we seek structural incentives to make the release of software and related publications easier for scientist-authors; (5) that we consider new ways of funding the development of grass-roots software; (6) and that we rethink our values to acknowledge that astronomical software development is not just a technical endeavor, but a fundamental part of our scientific practice.
All these points are expanded somewhat in the conclusions. Paraphrasing only the lead sentence of each point, the authors write eight suggestions for moving forward.
  1. We should create an open central repository location at which authors can release software and documentation.
  2. Software release should be an integral and funded part of astronomical projects.
  3. Software release should become an integral part of the publication process.
  4. The barriers to publication of methods and descriptive papers should be lower.
  5. Astronomical programming, statistics and data analysis should be an integral part of the curriculum for undergrad and grad students.
  6. We should encourage interdisciplinary cooperation with like-minded and algorithmically sophisticated members of the computer science community.
  7. We should create more opportunities to fund grass-roots software projects of use to the wider community. 
  8. We should develop institutional support for science programs that attract and support talented scientists who generate software for public release.
Do you agree? Do you think adopting these recommendations would help to improve astronomical (or generally scientific) software?

In my own humble opinion, the real hurdles are release and documentation. If you can find a code, good luck installing it, learning how it works or finding out how the code is structured. But these are hurdles because it still isn't in a scientist's interest to commit time to what are ultimately support tasks. So I think the most important point, of the eight above, is 2: software development should be fundable and funded. There are big gains to be made with solid attempts to get some real software engineering into scientific problems and it costs a lot less than building a telescope.

Thursday, 1 March 2012

Mount a remote filesystem locally with sshfs

I only occasionally need or choose to work from home. When I do, I need to get at my files on the department filesystem. There are many ways to skin this particular cat but a recent blog post by Matt Might mentions one of the easiest ways yet: a small application called sshfs.

The name says it all. This command line program mounts a remote filesystem at a local point over SSH. Usage is simple too. Just enter

sshfs user@host:/remote-folder/ local-mount-point/

and you can start using the files. I now mount my departmental home folder on my laptop and start editing my remote thesis as if it were a local file. To unmount, enter

fusermount -u local-mount-point

I use Linux both in the office and on my personal machines but Matt Might's blog post mentions how to do it on other systems. It should be easy on Mac OS X because it's so Linux-like under the hood already. You also may have to install sshfs in some Linux distributions. I had to install it on Ubuntu 11.04 but it's as easy as

sudo apt-get install sshfs

which Ubuntu tells you anyway if you don't have sshfs installed.

Have you used sshfs for remote working? Problems to watch out for? Better choices? Let me know below.

Friday, 24 February 2012

Building a local LaTeX tree

If, like me, your sysadmins have you suffering the burden of an archaic OS then, like me, you may also keep a beady eye out for ways of circumnavigating the virtual obstruction. If a by-product is that you suffer old LaTeX packages, there's actually an easy way of installing your own, up-to-date versions. Basically, you can build a local LaTeX package tree, which is searched before the default location. Below, I describe how to do this in Linux. This worked perfectly for me so I can't offer any help if it goes wrong or if you're using another OS. Fortunately this is widely covered on the web so Googling something like "local texmf tree" should net you something useful. There's a lot of information in the LaTeX Wikibook.

In your home folder, create the folder texmf/tex/latex/. Installing a given package boils down to copying the style file and any associated things into a new subfolder in ~/texmf/tex/latex/. After each new package is installed, go back to your home folder and run texhash. Don't worry if it tells you about not being able to access the base folders. (On my system, they involve something like /usr/share/texmf.)

Installing packages comes in three basic flavours. Some packages, like quotchap, are just a single style file.  If you have the relevant .sty file to hand, you can copy that into a new subfolder, run texhash and then start using it. More complicated packages, like microtype, require that you download the .dtx and .ins files and run

latex microtype.ins

to create the package files, which must then be copied into the relevant subfolder in the local texmf tree. Finally, packages that contain a large number of smaller units, like oberdiek, offer an archive that just needs to be extracted into texmf.
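
For the simplest flavour, a single style file, the steps above can be sketched as a small shell function. This is only an illustration under my own assumptions: the function name install_sty and the TEXMF_LOCAL variable are made up, and I pass the tree to texhash explicitly rather than running it from the home folder.

```shell
# install_sty PACKAGE FILE.sty
# Copy a single style file into a local texmf tree and refresh the file database.
# TEXMF_LOCAL is a variable invented for this sketch; it defaults to ~/texmf.
install_sty() {
  tree="${TEXMF_LOCAL:-$HOME/texmf}"
  dest="$tree/tex/latex/$1"
  mkdir -p "$dest" || return 1
  cp "$2" "$dest/" || return 1
  # Rebuild the filename database, but only if texhash is installed.
  if command -v texhash >/dev/null 2>&1; then
    texhash "$tree" >/dev/null 2>&1
  fi
  # Report where the file ended up.
  echo "$dest"
}
```

With quotchap.sty in the current folder, install_sty quotchap quotchap.sty should leave it in ~/texmf/tex/latex/quotchap/. Running kpsewhich quotchap.sty afterwards is a quick way to check which copy LaTeX will actually pick up.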

I usually found that the README files gave me the necessary information to copy things to the right place. Failing that, you can usually infer the right location from the folder on the CTAN servers.

There are two caveats I'll mention. First, expect some dependencies to crop up when updating very old packages. For example, when I tried my local installation of microtype, I ended up having to update everything in the package under oberdiek. Second, the status of packages that are listed under tex/generic is unclear. I found I had to copy them to subfolders in tex/latex for them to be detected and work properly. (I believe this is because they need to be accessible to pure TeX too.)

Problems? Improvements? Extra tips? Found this useful? Let me know in the comments.

Monday, 20 February 2012

Installing Windows 7 OEM without disks (2)

I previously posted a half-useful rant that claimed to resolve the issue of installing Windows 7 from scratch with an OEM product key. My previous solution, which boiled down to activating via phone, stopped working and I've since found a better solution. It sounds stupid, actually. Basically, it turns out my laptop has a sticker with a W7 product key on it and, with that, it activated perfectly normally by connecting to Microsoft over the internet. So if you're trying to re-install W7 from the official ISO images, take a good look around your laptop for a sticker with a product key, particularly if your copy of Windows appears to be registered with the OEM key.

There are a few oddities about this sticker, though. First, the sticker's product key isn't the same as the OEM product key. That is, it isn't the same as the Dell product key that you can find online or the product key that the Windows registry had when I first turned on the laptop or restored factory settings. Second, no-one at Dell technical support thought to mention the sticker when I phoned them and told them that I was re-installing W7 from scratch and the OEM key wasn't working.

Third, the sticker is under the battery.

Looking for your product key? Don't expect it to be easy. Mine is under the battery. (It's the blurred Microsoft tag right of the centre.)
Yes, that's right, I had to remove the battery to find it. I only looked there because I happened to look at the comments to a Lifehacker post with the links to the W7 ISOs.

So when I say take a good look around your laptop, I mean take a good look around your laptop. Let me know in the comments if you've also found a secret sticker...

Friday, 17 February 2012

Good reasons to use PDF(La)TeX

If you're a long-time LaTeX user who still compiles to PDF by converting DVI to PostScript to PDF, you've probably asked yourself why you bother. After all, who even uses PostScript these days? I suspect the answer is no-one. Or at least, no-one who can't also use PDF. Here are a few more reasons to start compiling straight to PDF with PDF(La)TeX. If you always object to PDF because you mainly produce EPS plots, there are clean ways to convert EPS figures near the end of the post.

It's simpler. That is, instead of stringing together enough commands to reach the moon, as in,

latex thesis ; dvips thesis.dvi -o ; ps2pdf thesis.ps

you cut it down to

pdflatex thesis

Done.

PDF files are smaller than PS. As above, only a small point but worth mentioning. Most of this is because PDF files are binary whereas PS files can be read as text. PDF also has built-in compression.

PDF looks better. Specifically, it has better font-handling in general, related, I believe, to the way that the fonts are stored in the PDF file. Also, it allows you to use the microtype package, which gives you the awesomeness on the right rather than the ugly sister of the typographic world on the left.
The effect of microtype. The left sample is rendered without microtype; the right sample with. (texblog.net)
The microtype package makes maximum use of advanced typographic things like kerning. So tack \usepackage{microtype} in your next preamble. Even if you write garbage, it'll be beautiful garbage.
I have never been as self-conscious about my handwriting as when I was inking in the caption for this comic. (xkcd.com)
You get hyperlinking and clickable contents in the document. Okay, this is also achieved by compiling through other formats but it's done better when compiling straight to PDF. All you need is \usepackage{hyperref} in your preamble. There are plenty of options to set so explore the documentation.
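
As a minimal sketch of the last two points, assuming an article rather than a thesis (the section names and labels are placeholders, not taken from any real document), a file like the following can be compiled with pdflatex. It needs two runs before the contents and cross-references resolve.

```latex
% Minimal sketch: microtype and hyperref under pdflatex.
% All section names and labels below are placeholders.
\documentclass{article}
\usepackage{microtype} % character protrusion and font expansion under pdfTeX
\usepackage[colorlinks=true]{hyperref} % usually loaded late in the preamble
\begin{document}
\tableofcontents
\section{Introduction}\label{sec:intro}
Some text that refers forward to Section~\ref{sec:results}.
\section{Results}\label{sec:results}
Some text that refers back to Section~\ref{sec:intro}.
\end{document}
```

The contents entries and both cross-references should come out as clickable links in the resulting PDF.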

Your figures are automatically compressed. Now, I need to be careful with this one because it can be a downer if EPS figures are compressed with the low default quality factor. A lot of people still make use of EPS figures (especially in astronomy) so this might be why you haven't shifted to PDF(La)TeX before.

The easiest way around this is to export your plots straight to PDF. I make my publication plots with Veusz, which has a PDF option. Actually, Veusz is generally awesome and I highly recommend it for high-quality final plotting. I believe that MATLAB also exports to PDF but I'm not sure about gnuplot. Just another reason I only use it for day-to-day purposes.

If you must make EPS plots, you can dictate how the compression is done. Open the EPS file in a text editor and add one of the following snippets at the end of the preamble, i.e. the bit commented out with leading % symbols.

For lossless FlateEncode, add

systemdict /setdistillerparams known {
<< /AutoFilterColorImages false /ColorImageFilter /FlateEncode >> setdistillerparams
} if

Alternatively, you can use lossy DCTEncode but force the quality factor to be very high, in which case you should add

systemdict /setdistillerparams known {
<< /ColorACSImageDict << /QFactor 0.15 /Blend 1 /ColorTransform 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> >> setdistillerparams
} if

When you convert the EPS using, say, ps2pdf, there should be no (or little) loss of quality. Thanks to one Gary Steele for these tips. I haven't tested the two algorithms carefully but experiment to see what works for you. Using FlateEncode appeared to be lossless but still occasionally reaped some huge compression ratios.
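
If you'd rather not convert by hand, one alternative (not something covered by the tips above, so treat it as an assumption to test on your own setup) is the epstopdf package, which converts EPS figures on the fly when you run pdflatex. The sketch below assumes a hypothetical file plot.eps in the working directory and a TeX installation that allows the restricted shell escape the package relies on.

```latex
% Sketch: including an EPS figure directly under pdflatex.
% plot.eps is a hypothetical file name.
\documentclass{article}
\usepackage{graphicx}
\usepackage{epstopdf} % converts EPS to PDF at compile time
\begin{document}
\begin{figure}
  \centering
  \includegraphics[width=\linewidth]{plot.eps}
  \caption{An EPS plot included directly under pdflatex.}
\end{figure}
\end{document}
```

The converted file is written next to the original as plot-eps-converted-to.pdf. Whether the compression settings embedded in the EPS survive this route is worth checking by eye.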

So that's why you should switch to PDF(La)TeX. When my thesis is done, you can bet there won't be a PS version!

Tuesday, 31 January 2012

My simple LaTeX thesis template

I'm a minimalist so when the time came to finally write my thesis in LaTeX, I wasn't going to spend time fussing about making sure my chapter numbers came out in Cambridge blue. (Apparently, the navy blue default in that package was oh-so-Oxford...) I started, as all good research does, by searching for what others had done before. Some departments provide a thesis class; some host classes used by previous students; and some offer nothing at all. The closest thing in Cambridge is a thesis class at the Department of Engineering (CUED). I found this template, and several others from previous students, overly complicated so I elected to build my own file from the ground up.

I put together a very simple thesis file that covers the basics. In this post, I'll show you what I've done, what it does and how you can change it if you want.

Basics

My material is divided into "front" matter, like an abstract and plagiarism declaration; "main" matter, which is the thesis content itself; and "back" matter, which is here just the bibliography but could include an index. The three-way division led me to start with the default book class. For margins, I use the geometry package. It's a powerful package but I only use it to set the margins (1in) and the binding offset (0.5in). My bibliography is compiled using BibTeX so I call the natbib package too with my favourite options: author-year style with round brackets. The bare bones of my thesis file are thus

\documentclass{book}
\usepackage[twoside,margin=1in,bindingoffset=0.5in]{geometry}
\usepackage[round,authoryear]{natbib}
\begin{document}
\input{title}
\frontmatter
\input{dec} % Declaration
\input{abs} % Abstract
\tableofcontents
\listoffigures
\mainmatter
\input{ch1} % Chapter 1
\input{ch2} % Chapter 2
\input{ch3} % Chapter 3
\appendix
\input{apA} % Appendix A
\backmatter
\bibliographystyle{plainnat}
\bibliography{thesis}
\end{document}

That's it! That's all you really need for a LaTeX thesis. But I'm sure you'd like to add a bit more functionality and a light personal touch.

Extra features

There are three extra functional packages I call. First and foremost is graphicx for figures. Next, it's nice to hyperlink cross-references in the text so I summon hyperref. I took this straight from the CUED thesis so I'm not sure if all of the options are necessary. I've also omitted my colour choices for brevity but they're all listed in the documentation. Third, for some extra table of contents goodness, I invoke tocbibind. My actual thesis thus has the three extra preamble lines

\usepackage{tocbibind}
\usepackage{graphicx}
\usepackage[breaklinks = true, linktocpage, pagebackref,
colorlinks = true, hyperindex = true, hyperfigures]{hyperref}

Some style

Finally, we can add some personal style. The main thing is to change the chapter headings. There are a number of choices here. I personally use the quotchap package with a grey number. This gives me space for quotes and a nice chapter heading in only one package.

\usepackage[grey]{quotchap}

Example chapter heading with quotchap. Taken from the official example.
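
As a sketch of how a chapter file might use this, assuming the savequote environment and \qauthor command that quotchap provides (the quote, author and chapter title are placeholders):

```latex
% ch1.tex: sketch of a chapter opening with quotchap.
% Quote text, author and title are placeholders.
\begin{savequote}[60mm] % optional argument sets the width of the quote block
A placeholder quote goes here.
\qauthor{A placeholder author}
\end{savequote}

\chapter{Introduction}
Chapter text starts here.
```

The saved quote is typeset above the heading of the next \chapter.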

The other option I'd recommend is the fncychap package, which contains seven chapter styles you might try, some of which I have seen used here in Cambridge.

The seven chapter styles available in fncychap. Taken from the official examples.
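
A style is selected with a package option. A minimal sketch, picking Lenny arbitrarily:

```latex
% Use instead of quotchap, not alongside it: both redefine the chapter heading.
% Other styles: Sonny, Glenn, Conny, Rejne, Bjarne, Bjornstrup.
\usepackage[Lenny]{fncychap}
```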

In any case, I set the front matter bits as numberless chapters and use a quote block to increase the margins there.

\chapter*{Declaration}
\begin{quote}
\end{quote}

If you want the front matter to appear in the contents, add after \chapter*{} a line like

\addcontentsline{toc}{chapter}{Declaration}

where chapter indicates the insertion level and Declaration is the text that is added.
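
Put together, a front-matter file such as dec.tex might look like the following sketch (the declaration wording is a placeholder):

```latex
% dec.tex: sketch of a front-matter file.
\chapter*{Declaration}
\addcontentsline{toc}{chapter}{Declaration} % optional: list it in the contents
\begin{quote}
Placeholder text for the declaration goes here.
\end{quote}
```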

I'm in the UK and I subscribe to the style guidelines of Monthly Notices of the Royal Astronomical Society, so I use their bibliography style, use A4 paper and flush my equations to the left. The last two points are accomplished with the [a4paper,fleqn] options to the book class. The bibliography style is invoked with

\bibliographystyle{mn2e}

where mn2e replaces plainnat.
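
As a fragment rather than a complete file, and assuming mn2e.bst is installed where BibTeX can find it, the changed lines of the thesis file would read:

```latex
\documentclass[a4paper,fleqn]{book} % A4 paper, equations flushed left
% ... packages and content as in the bare-bones file ...
\bibliographystyle{mn2e} % replaces plainnat; assumes mn2e.bst is available
\bibliography{thesis}
```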

The last thing to customize is the title page. Mine is a hardcoded mess that works for me so I don't see any point putting it up. I encourage you to play around with some title.tex to put your final stamp on the front.

Throw that all together and you'll have just about everything in my thesis that isn't actual content. This is a quick and simple file which I encourage other people to build on. But it's here to prove that choosing a layout is actually quite straightforward so you can get down to actually writing.

Any suggestions from your thesis files? Further simple additions that add to the final product or things that needn't be here? Let me know in the comments.

Wednesday, 25 January 2012

The wonders of arcsinh

Do you sometimes need to plot a number that has large magnitude but can be positive or negative? I do and I have a trick to do it: instead of plotting $x$, I plot $\mathrm{arcsinh}(x/2)/\log(10)$. For positive $x$, it gives $\log_{10}(x)$; for negative $x$, $-\log_{10}(-x)$. It's quite accurate for $|x| > 10$ and is linear across zero. Pretty much exactly what I need!

Our friend, arcsinh, is the purple curve. The approximating logarithms are in red and blue.
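
As a rough check on how accurate this is, here is a sketch worked by hand. The correction term is my own large-$x$ expansion and the numbers are rounded; $\log$ is the natural logarithm.

```latex
\[
  \frac{\mathrm{arcsinh}(x/2)}{\log 10}
  = \log_{10} x
    + \frac{1}{\log 10}
      \log\left(\tfrac{1}{2} + \sqrt{\tfrac{1}{4} + \tfrac{1}{x^2}}\right)
  \approx \log_{10} x + \frac{1}{x^2 \log 10}
  \qquad (x \gg 1)
\]
\[
  x = 10:\ \approx 1.0043, \qquad
  x = 100:\ \approx 2.00004, \qquad
  x = -10:\ \approx -1.0043
\]
```

So the error relative to $\log_{10} x$ is already below half a percent at $x = 10$ and falls off as $1/x^2$. Near zero the function is approximately $x/(2\log 10) \approx 0.217\,x$.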


To see why this works, it helps to know that

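\[ \mathrm{arcsinh}\,x = \log\left(x + \sqrt{x^2+1}\right). \]

For $x \gg 0$,

\[ \log\left(x + \sqrt{x^2+1}\right) \approx \log(2x). \]

For $x \ll 0$,

\[ \log\left(x + \sqrt{x^2+1}\right) \approx \log\left(x + |x|\left(1 + \frac{1}{2x^2}\right)\right) = \log\left(\frac{1}{2|x|}\right) = -\log(-2x). \]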