Tuesday, 29 November 2011

Google weighs in on scholarly citations

Your Impact Factor. (PhD comics)

Much of the academic blogosphere was abuzz with the announcement that Google Scholar Citations is now open to everyone. So far, it just looks like a page where you can identify papers that you authored. Based on what Google knows, it then automatically tracks citation networks. Tracks them slowly, that is. I know of a citation to one of my papers that hasn't shown up yet, so I wonder whether Google only indexes citations from papers whose authors have claimed them. Maybe they just haven't fleshed out the network yet.

Is this going to be a game changer of some sort? There's a lot to that question, so let's pick it apart a bit. I don't know about other fields, but in astronomy and astrophysics, this is definitely not a new feature. We have the very powerful NASA Astrophysics Data System (ADS). Many researchers use it as an automatic index of their work on their personal webpages. It tracks citations quickly and cross-links to publicly available versions of articles that appear on arXiv.org. That's how I know I have that citation, which makes it all the more surprising that Google doesn't show it.

One advantage Google might have is that it indexes other things, too. When I signed up, I noticed that Google had picked up a publicly available draft version of one of my papers. Presumably the LaTeX leftovers in the PDF file told it that I was an author. It may well pick up appropriately tagged presentations and conference proceedings that haven't appeared elsewhere.

But over and above these practical details, what is there about the game that can (or even should) be changed? In this age of overwhelming data, there's a growing interest in bibliometrics: the science of science and scholarly publication itself. Maybe it's possible to cut through the dense web of citations to find who's really being productive or which neglected papers made big contributions. I'm interested in questions like these and I previously poked at problems with academic publication.

In physics, it turns out someone already tried ordering journal articles through a PageRank-like algorithm. The interesting detail is that a citation from a highly-cited paper is worth more than a citation from an obscure one. There are a few interesting outliers from the strong correlation between citation count and rank, but none of this sidesteps the problem that citations are a slow measure of meaningful work. The problem isn't working out if a 1960 paper was more relevant than its citation count suggests; it's whether a 2010 paper is going to have 100 citations in 3 years' time.
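To make the "a citation from a highly-cited paper is worth more" idea concrete, here is a minimal sketch of PageRank-style ranking on a toy citation graph. It is not the method from the physics study: the function name `citation_rank`, the damping value of 0.85, the treatment of papers with no references, and the made-up paper labels are all my own assumptions for illustration.

```python
def citation_rank(citations, damping=0.85, iterations=100):
    """Rank papers by PageRank-style power iteration.

    `citations` maps each paper to the list of papers it cites.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every paper that cites or is cited.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)

    # Start with rank spread evenly over all papers.
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            refs = citations.get(p, [])
            if refs:
                # A paper splits its rank equally among the papers it cites.
                share = damping * rank[p] / len(refs)
                for q in refs:
                    new_rank[q] += share
            else:
                # A paper with no references spreads its rank over everyone
                # (one common convention, assumed here).
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "hub" is cited three times and cites "x";
# "loner" is never cited and cites "y".
toy = {
    "p1": ["hub"],
    "p2": ["hub"],
    "p3": ["hub"],
    "hub": ["x"],
    "loner": ["y"],
}
ranks = citation_rank(toy)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{paper:6s} {ranks[paper]:.3f}")
```

Both "x" and "y" have exactly one citation, so a plain citation count can't tell them apart. In this sketch "x" ends up ranked higher, because its single citation comes from the well-cited "hub".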

So maybe once Google's built a dense citation network, it will start providing meaningful information about how science is done and how that can be improved. For now, my plan for improving my citation counts or h-index or i10-index or whatever-metric is simple: do good science.
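For anyone unsure what those metrics actually count, here is a small sketch using the standard definitions: the h-index is the largest h such that h of your papers have at least h citations each, and the i10-index is the number of papers with at least 10 citations. The function names and the citation counts are made up for illustration.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for position, count in enumerate(counts, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def i10_index(citation_counts):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citation_counts if count >= 10)


# Hypothetical citation counts for six papers.
my_papers = [25, 12, 10, 4, 3, 0]
print(h_index(my_papers))    # 4
print(i10_index(my_papers))  # 3
```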