Tuesday, 5 April 2011

Finding co-citations on NASA ADS

The NASA Astrophysics Data System (ADS) is a marvellous tool. Not only is it the baseline for astronomical literature searches, it also locates each article in the citation web: it's easy to obtain the list of articles that cite or are cited by the reference in question. This means the database can be used for all sorts of other interesting experiments, for which I have a few ideas. Here's the first.

Lately, I've been working closely from two papers. Though they are authored by different groups (that had collaborated before and have since), their content is very similar and they were published back-to-back in the Astrophysical Journal. What became interesting to me was this: what articles cite both papers?

To answer this question, I needed the lists of article identifiers ("bibcodes") for the citation lists of both papers. For each paper, I navigated to the database page linked above, clicked "Refereed citations to the article" (or just "Citations to the article", if you prefer all of them), clicked "Select All Records" at the bottom, and requested the records in a custom format of %R. This returned a plain text list of the identifiers of all the refereed citations. I saved the two lists to plain text files, concatenated them, sorted the result, and picked out the identifiers that occurred twice in the combined list. Those are the articles that cite both papers. To turn this list of identifiers back into a list of hits, I copied it into the "Bibliographic Code Query" at this ADS query page.
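The concatenate, sort and find-duplicates step can be sketched in a few lines of Python. This is only a minimal illustration, not what I actually ran: the function names are my own, and the bibcodes in the example are made-up placeholders rather than real records.

```python
from collections import Counter


def read_bibcodes(text):
    """Return the set of non-empty lines in a %R export, stripped of whitespace."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def common_citations(*citation_lists):
    """Return the sorted bibcodes that appear in every one of the given lists."""
    counts = Counter()
    for bibcodes in citation_lists:
        # Count each bibcode at most once per list, so a duplicate inside
        # a single export cannot be mistaken for a co-citation.
        counts.update(set(bibcodes))
    return sorted(b for b, n in counts.items() if n == len(citation_lists))


# Placeholder exports (hypothetical bibcodes, one per line as %R gives them).
paper_a = "2011ApJ...000....1A\n2011ApJ...000....2B\n\n2011MNRAS.000....3C\n"
paper_b = "2011MNRAS.000....3C\n2011ApJ...000....1A\n2011A&A...000....4D\n"

for bibcode in common_citations(read_bibcodes(paper_a), read_bibcodes(paper_b)):
    print(bibcode)
```

With real data, the two strings would simply be the contents of the saved text files, and the printed list can be pasted straight into the bibcode query form.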

In short, all I did was get the lists of citations to each of my two articles, find the common hits, and locate them on ADS. The results revealed a few further papers that were worth going through. In addition, I think this kind of search reveals, from metadata alone, how much these papers have in common. The two articles have 176 and 167 refereed citations of their own. Of those, 112 papers cite both: 64% and 67%, respectively!
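For anyone who wants to repeat the arithmetic with their own pair of papers, the shares are just the number of common citations divided by each paper's own citation count. A tiny sketch, with a function name of my own choosing:

```python
def overlap_percentages(n_common, *n_citations):
    """Percentage of each paper's citations that also cite the other paper(s)."""
    return [100.0 * n_common / n for n in n_citations]


# The counts quoted above: 112 common citations out of 176 and 167.
shares = overlap_percentages(112, 176, 167)
print(", ".join(f"{share:.1f}%" for share in shares))
```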

This process can probably be automated quite easily. Using embedded queries or the Perl module, I imagine an able scripter could write a short program that finds the list of common citations for two or more papers. What I don't know is how useful this would generally be. A more interesting application might be to rank papers based on how often they are co-cited with a selected reference, but that could be a big calculation because it would require a second step in the citation web. Even so, if such a calculation were made only at regular intervals, it might be worthwhile.
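To make the ranking idea concrete, here is a rough sketch of the counting step. It assumes the expensive part is already done, namely that the reference list of every paper citing the selected reference has been fetched into a dictionary. The function name, the dictionary layout and the one-letter identifiers are all hypothetical, and nothing here talks to ADS.

```python
from collections import Counter


def rank_cocited(target, references_of):
    """Rank papers by how often they are cited alongside `target`.

    `references_of` maps each citing paper's identifier to the identifiers
    it cites (the "second step" in the citation web).
    """
    counts = Counter()
    for refs in references_of.values():
        refs = set(refs)
        if target in refs:
            # Every other reference in this paper is co-cited with the target.
            counts.update(refs - {target})
    return counts.most_common()


# Toy citation web: P1 and P2 cite A; P3 does not.
toy_web = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
for paper, n in rank_cocited("A", toy_web):
    print(paper, n)
```

The counting itself is cheap. The cost is in building the dictionary, which needs one reference-list query per citing paper, so caching the results between runs would matter more than anything in this function.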

This is one of a few ADS-based ideas I have. Let me know if you think it's useful or interesting (or have found it to be), and watch out for more ideas further down the line.

