Thursday, 3 May 2012

Extracting data from rasterized images

A while ago, AstroBetter hosted a post about digitizing figures. That is, taking a raster image of a plot and somehow turning it back into raw data that you can plot yourself. (Vector data is theoretically available in the image file. If I work out a good way of extracting this data, I'll let you know.) AstroBetter recommends the Mac app GraphClick, the registered version of which you can buy for $8. But I don't have a Mac. In fact, I don't intend to ever own one. And why pay for something that can be done for free?

For the same job, I therefore recommend Engauge on Linux. You can download it as a binary for Windows and Linux. (Supposedly, this site will help you get it running on a Mac but I have no idea how that's done.) Usage is pretty straightforward. For example, I wanted to get the data from Figure 1 of Yabushita (1975).

I created a PNG image by cropping a screenshot of the PDF. Run Engauge and import the figure (Ctrl+I by default). The image comes up with the curves highlighted. First, make sure the "segment fill" tool is selected and then click on the segments of the curve for which you'd like to extract raw data. Now, select the "axis tool" and define three axis points. I recommend the origin and the last well-defined point on each axis. Bear in mind that this is how Engauge calibrates the data, so choose the values you assign to these points carefully. For example, if an axis is logarithmic, I enter the logarithm rather than the value. In Yabushita's figure, I claim that point "1000" on the x-axis is 3 rather than 1000.
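To see why three axis points are enough, here is a minimal Python sketch of the idea: three non-collinear points with known values fix a linear map from pixel positions to data coordinates. This is my own illustration and not Engauge's actual code. The function name `make_calibration` and all the pixel positions and axis values below are made up for the example.

```python
def make_calibration(pixels, values):
    """Return a function mapping a pixel (px, py) to data coordinates (x, y).

    pixels: three (px, py) positions of the axis points in the image.
    values: the (x, y) data values assigned to those same three points.
    """
    (p0, q0), (p1, q1), (p2, q2) = pixels
    (x0, y0), (x1, y1), (x2, y2) = values

    # Two basis vectors in pixel space, measured from the first point.
    u = (p1 - p0, q1 - q0)
    v = (p2 - p0, q2 - q0)
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("the three axis points must not lie on one line")

    def to_data(px, py):
        # Write the pixel offset as s*u + t*v (Cramer's rule) ...
        dp, dq = px - p0, py - q0
        s = (dp * v[1] - dq * v[0]) / det
        t = (u[0] * dq - u[1] * dp) / det
        # ... and apply the same combination in data space.
        x = x0 + s * (x1 - x0) + t * (x2 - x0)
        y = y0 + s * (y1 - y0) + t * (y2 - y0)
        return x, y

    return to_data


# Hypothetical calibration: origin, end of the x-axis, top of the y-axis.
# The x values are log10 of the tick labels, so a tick labelled "1000" is 3.
to_data = make_calibration(
    pixels=[(100, 400), (500, 400), (100, 50)],
    values=[(-1.0, 0.0), (3.0, 0.0), (-1.0, 2.0)],
)
midpoint = to_data(300, 225)
print(midpoint)
```

Because the map is linear in whatever values you type in, entering logarithms for a logarithmic axis keeps the calibration consistent, and the exported numbers then come out as logarithms too.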

With all the data selected, hit "File > Export" in the menus and choose a destination. I selected CSV output, which looks like this:

x,Curve1
-0.995727,0.870997
-0.950925,0.912782
-0.906123,0.954566
-0.866913,1.00322
-0.827708,1.04497
-0.788499,1.09363
-0.74929,1.14229
-0.710076,1.19786
...
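Since I entered the logarithm for the x-axis, the x column above is log10 of the real value. Here is a small Python sketch for reading the export and undoing that. The function name `load_curve` and the `x_is_log10` flag are my own inventions for this example, and I only convert x, because that's the only axis I described as logarithmic above. The sample is just the first three rows shown.

```python
import csv
import io

# First few rows of the exported file, copied from above.
sample = """x,Curve1
-0.995727,0.870997
-0.950925,0.912782
-0.906123,0.954566
"""


def load_curve(handle, x_is_log10=True):
    """Read an exported two-column CSV and return a list of (x, y) tuples."""
    reader = csv.reader(handle)
    next(reader)  # skip the "x,Curve1" header row
    points = []
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        x, y = float(row[0]), float(row[1])
        if x_is_log10:
            x = 10.0 ** x  # undo the logarithm entered at calibration
        points.append((x, y))
    return points


points = load_curve(io.StringIO(sample))
for x, y in points:
    print(x, y)
```

For a real file, swap the `io.StringIO(sample)` for an `open(...)` call on wherever you exported to.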

Engauge isn't the only choice. Besides GraphClick, there's also NASA ADS's Dexter tool. There's an independent web implementation here but I found it clunky. Engauge is simple and it's a pity it seems to be deprecated. But it's good enough for me, for now.

What's your favourite tool for digitizing rasterized plots?