Tuesday, 20 December 2011

Elements of scientific style

Scientists must write. In fact, between papers, posters and proposals, they must write a lot. So it's somewhat surprising that many science students are never given formal teaching in how to write well. They are expected to learn through a grand process of trial and error, with each submission throughout a degree corrected, graded and returned. The language in scientific publications is usually correct but I find that it is often unclear and cumbersome. What it lacks, I believe, is style.

All writers should be motivated to write well. What is better written is more widely read, and that is what all scientists want. It is worth adding that many readers of English-language journals are not native speakers, so they are less likely to follow your use of the future perfect tense: all the more reason to avoid complicated constructions.

The time has at last come for me to start preparing my PhD thesis. To cultivate some inspiration, I've been trawling through a few materials about how to write better. This post is a summary of my findings. It is advice to myself that you may also find useful.

I think everyone should read two items. First, Section III of Strunk & White's Elements of Style describes straightforward ways to improve composition and provides contrasting examples of good and bad composition. Second, The Economist Style Guide, which was available online, is almost entirely dedicated to concise, jargon-free word choice. Its examples are also peppered with dry humour.

Technical points can be drawn from the style guides of relevant scientific organisations. For example, the American Institute of Physics and International Astronomical Union have style manuals that should be read by physicists and astronomers. Also, Donald Knuth (of TeX fame) released notes on mathematical writing based on a course at Stanford.

Composition Starts with a C...

... and so does everything about it. In many guides, the basics of style are Cs. The American Institute of Physics Style Manual is most explicit, declaring that writing should be clear, concise, and complete. But how does one achieve these things?

Clear writing is achieved through short, active sentences, at least for the important points. Readers remember short, sharp statements. Avoid long, wandering sentences, where a large number of clauses can lead to confusion, and especially avoid a sequence of loose sentences. Try to use simple tenses. The present tense usually suffices. If you find yourself writing in the future perfect or past continuous, consider for a moment whether it is necessary.

Concise writing is born of a frugal use of words. For example, "The fact that..." can almost always be reduced to "That..." Do not declare that a result is "very interesting" or say things "in the opinion of the present authors". Let the reader decide whether you're just stating the obvious. Aside from using fewer words, you can also use shorter, simpler words. The Economist Style Guide is full of examples. The added danger of complicated words is that they can be misused and misunderstood. As The Economist writes of underprivileged, sometimes they don't even make sense.
Since a privilege is a special favour or advantage, it is by definition not something to which everyone is entitled. So underprivileged, by implying the right to privileges for all, is not just ugly jargon but also nonsense.
Ensuring that writing is complete is difficult in technical work. There is a fine line between writing everything an article needs to be logically complete and writing more than is necessary. To judge how much to write, you must first understand the audience for whom you are writing. A conference proceeding for a small meeting of researchers in a specific subfield needs less background material than an article in a widely circulated journal or a fellowship proposal that will be read by scientists in other fields.

There are some simple things that are fairly obvious. For a start, all symbols and abbreviations must be defined and, if necessary, explained appropriately too. You may define all sorts of derived quantities but it will help the reader if you explain why they are interesting.

Another point about completeness regards the use of citations. Personally, I think some writers feel that a citation absolves them of any need to explain the content of the cited work. Many articles rely heavily on work presented fully in other articles. A citation should be accompanied by some explanation of the logic that leads to whatever conclusion or result is being employed, especially if the cited work is itself very long. A sentence or two may save your readers the trouble of looking up another paper and therefore make them more likely to continue reading. It is off-putting to open a six-page article to find that the authors rely heavily on detailed and subtle calculations or measurements in a poorly-written 30-page epic.

Another C that can be added is to make sure your writing is correct. You shouldn't be leaving dangling participles or hidden verbs, but there are subtler errors in English usage. Many are explained in The Economist Style Guide. A dry-witted example is the difference between "among" and "between":
To fall between two stools, however painful, is grammatically acceptable; to fall between the cracks is to challenge the laws of physics.

Structured writing

It's no secret that most scientists skim papers briefly before committing to reading them in detail. It pays to write for these skimmers by structuring content appropriately. Besides, structured writing helps lead the reader along structured thought.

The smallest logical unit of composition is the paragraph. Start each paragraph with a sentence that captures its content so that those who are skimming through will quickly get an idea of how the article presents its content. The paragraph principle leads me to plan a paper from the top down, all the way to the paragraph level.

When it comes to arranging paragraphs, I found an interesting point in a talk given at MIT. Points made in paragraphs form a logical sequence but there are usually multiple dependencies between these thoughts. More than one idea depends on more than one preceding idea. So how should one link the paragraphs? The answer is to choose an arrangement that gives the fewest "crossovers", as in the image below.



The logical relationship between the paragraphs is shown at the left. If we write all the starting points in sequence (the "layered" approach), we end up crossing back and forth between logical sequences. By instead using a "linear" approach, we avoid this problem, even though some logical jumps are necessary to bring everything together.

Special problems for technical writers

Some advice on style must be cast aside in technical writing. For example, it is difficult to write without jargon, though it can be kept to a minimum. There is no better way of describing a photometric redshift than to call it a photometric redshift. So use those words but explain them if your target audience warrants it.

There are potential problems with technical words that carry undesirable connotations in other contexts. For example, in a recent paper, I describe a choice of equations that allows "arbitrary boundary conditions". A co-author pointed out that "arbitrary" is usually synonymous with "pointless" or "not really worth pursuing". But in the mathematical context, it is precisely the correct word to use. It sounds bad to the lay reader but it is technically correct so I stuck with it. If you can avoid such words, do so, but not if the price is the precise technical meaning.

Mathematical and theoretical work is often written in a dense, textbook-like style that makes for heavy reading. You can help the reader by slowing down the concept-barrage. Restate definitions in simple ways. Explain why a definition or equation is interesting. Separate technical segments with paragraphs written in a more conversational style. Even in less technical or mathematical sections, make sure that you have always motivated the detailed calculations you present.

The final C

The advice in the previous paragraph may seem to contradict the principle of concise writing and this brings us to a final C: compromise. George Orwell's sixth elementary rule is quoted in The Economist:
Break any of these rules sooner than say anything outright barbarous.
Ultimately, following these rules steadfastly may lead to a clunky sentence that simply does not read well. You can do well within the guidelines of good writing but there are occasions where a rule is better off broken. Perhaps a sentence sounds better if it begins with a conjunction, or there is no escaping the precise meaning of a clunky phrase. In these cases, so be it.

The final piece of advice is point 15 from Knuth's opening section:
There is a definite rhythm in sentences. Read what you have written, and change the wording if it does not flow smoothly.
If the proof of the pudding is in the eating, then the proof of the writing is in the reading. Follow this rule above all.

Tuesday, 13 December 2011

Farewell, old friend

I hate waste. Most of my calculations are done on the flip-sides of single-printed pages. When arXiv articles are uploaded in referee format, I download the source and recompile in a journal style to save paper. The water that collects in our dehumidifier is poured into our toilet's cistern to save about one litre per day. I throw away little in the usually vain hope that it can be re-used. Thus, it was with great pain that I finally conceded that there is no longer a place in my life for an old friend. It's been a fixture in my rooms since I arrived in the UK, and before that, a fixture in my brother's since shortly after he arrived in the UK.

Maybe you'll be surprised to hear that said fixture was the desktop computer he bought back in 2004. Though not top of the line even then, this particular Dell Inspiron 4600, helped by regular re-installation of Windows XP, a small RAM upgrade and occasional dust-removal, has been chugging along happily ever since. It's in fantastic condition. After adding a GeForce 6600GT, it even ran StarCraft 2. I'd have kept it if I hadn't acquired a new laptop through my department to replace it.

Having cared so much for this computer, I wanted it to go to a good home, preferably a charity. It will soon feature at the hotdesk of The Humanitarian Centre, Cambridge. But in order to keep it useful, I figured I should include the hard drive rather than remove it for security purposes. And this meant securely formatting the drive, frequently called nuking.

The tool for this is Darik's Boot and Nuke. I had some trouble with the then-latest version but the older 1.0.7 worked fine, and I don't imagine much has changed: the system is quite idiot-proof. Burn the image as a bootable DVD, boot from it, and follow the instructions, which include a frightening-sounding selection of algorithms. Some are associated with the likes of the Royal Canadian Mounted Police or the US Department of Defense. Good enough for the DoD? Good enough for me.
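For what it's worth, you don't need any special software to get the image onto a disc under Linux. Something like the line below should do it; the device node and image name are placeholders, so substitute your own:

growisofs -dvd-compat -Z /dev/dvd=dban.iso

(growisofs comes with the dvd+rw-tools package on most distributions.)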

So, with the drive nuked, my computer has been passed on. It's inspired a moment's reflection about how technology sometimes makes us lazy and wasteful. I now have a new computer, which I hope will also last at least four years, presuming I don't hurl it out of a window in a Windows-inspired moment of blind rage. But in the age of phone upgrades every other year and yet another iProduct iteration, I'm not sure how many people still build things to last.

Tuesday, 29 November 2011

Google weighs in on scholarly citations

Your Impact Factor. (PhD comics)

Much of the academic blogosphere was abuzz with the announcement that Google Scholar Citations is now open to everyone. So far, it just looks like a page where you can identify papers that you authored. Based on what Google knows, it then automatically tracks citation networks. Tracks them slowly, that is. I know of a citation to one of my papers that Google hasn't picked up, so I'm not sure whether it only indexes citations from papers whose authors have identified themselves. Maybe they just haven't fleshed out the network yet.

Is this going to be a game changer of some sort? There's a lot to that question, so let's pick it apart a bit. I don't know about other fields, but in astronomy and astrophysics, this is definitely not a new feature. We have the very powerful NASA Astrophysics Data System. Many researchers use it as an automatic index of their work on their personal webpages. It tracks citations quickly and cross-links to publicly available versions of articles that appear on arXiv.org. That's how I know I have that citation, which makes it all the more surprising that Google doesn't.

One advantage Google might have is that it indexes other things, too. When I signed up, I noticed that Google had picked up a publicly available draft version of one of my papers. Presumably the LaTeX leftovers in the PDF file told it that I was an author. It may well pick up appropriately tagged presentations and conference proceedings that haven't appeared elsewhere.

But over and above these practical details, what is there about the game that can (or even should) be changed? In this age of overwhelming data, there's a growing interest in bibliometrics: the science of science and scholarly publication itself. Maybe it's possible to cut through the dense web of citations to find who's really being productive or which neglected papers made big contributions. I'm interested in questions like these and I previously poked at problems with academic publication.

In physics, it turns out someone already tried ordering journal articles with a PageRank-like algorithm. The interesting detail is that a citation from a highly-cited paper is then worth more than one from a rarely-cited paper. There are a few interesting outliers from the strong correlation between citation count and rank, but none of this sidesteps the problem that citations are a slow measure of meaningful work. The problem isn't working out whether a 1960 paper was more relevant than its citation count suggests; it's whether a 2010 paper is going to have 100 citations in three years' time.

So maybe once Google's built a dense citation network, it will start providing meaningful information about how science is done and how that can be improved. For now, my plan for improving my citation counts or h-index or i10-index or whatever-metric is simple: do good science.

Monday, 31 October 2011

iDol

Though it's been three weeks since Steve Jobs shuffled off this mortal coil, discussion of his life and lifestyle rages on, partly thanks to the release of his biography, subsequent reviews and curious Taiwanese video renditions of its content. I'll declare upfront that I am neither an Apple fan nor an Apple user. I don't own any iProducts and never have. Though I find something like the iPhone quite simple to use, it takes about 60 to 90 seconds for me to go totally bonkers when using a Mac. Maybe my mind isn't letting go of Windows or UNIX paradigms, but OS X drives me mad. I find it genuinely bizarre how many astronomers use Macs.

The outpourings of sorrow, the likes of which were probably last seen when Pope John Paul II perished, weren't surprising, given what iCustomers are like. After all, brand loyalty has been compared to religion. Even so, I was disappointed at just how far the dramatic eulogizing penetrated. Even Nature weighed in with praise for Jobs. Fortunately, other sources were more nuanced. I stumbled upon a column in the Cambridge University newspaper that is my choice for the best analysis of the whole story.

The deification of Mr Jobs is, in homage to his own mantra, a simple, elegant, unconscious misdirection of our love of stuff. We cannot admit to ourselves the level to which our obsession with stuff has grown, for it would mean admitting the worship of icons for their own sake. Instead, we have placed Steve Jobs on a pedestal. It was not the technology we love – perish the thought. It was Steve. We love Steve for he was our prophet. But we worship his God at our peril.

Whenever an issue polarizes opinion, I normally find the truth languishing somewhere in the middle, where no-one seems to dare tread. Steve Jobs did make some contribution, at least to the rich world. In short, he brought advanced technology to the masses, often in ways that other manufacturers hadn't managed. The iPhone charged in with a large-form touchscreen where others had failed, and I'm left to concede how much my HTC Wildfire looks like Jobs' brainchild. You can arguably shoot a feature film on an iPhone. Or you could just send it into space. So he gets full credit for turning Apple into a consistent innovator.

But how much praise is warranted? Should we hail Steve for supplying our newfangled gadgetry? While iProducts may be pioneering a "post-PC" era, producing them has been highly profitable. Steve Jobs may have been driven, but I don't think he was driven to change the world as much as to make money. He certainly did the latter; I'm unconvinced about the former. The fact that we're ultimately venerating a master salesman is a worrying sign of what the West has come to value.

Jobs' death occurred around the same time as that of one Dennis Ritchie. Ritchie has a few claims to fame, but above all he is credited with the creation of the C programming language, something that has defined the digital world. More or less everything is written in C, a derivative of C, or a language whose compilers are written in C, including most of the software that Apple uses to run its hegemony. Ritchie's death hasn't gone unnoticed but it may have appeared in the mainstream only because the geeks of the world understood his importance (and/or shunned Jobs'). The Economist ran an obituary, but probably only because of a letter inciting them to do so.

Steve Jobs' passing is certainly cause for a moment's pause: not just as for any untimely death, or for his early contributions to personal computing, but also for a thought on what it is that really matters to each of us.

Tuesday, 13 September 2011

Installing Windows 7 OEM without disks (1)

This is basically a rant about a struggle I had with Microsoft product keys on a new Dell laptop. To keep it useful, the question at hand is this: without disks from the manufacturer, can you make a clean installation of Windows 7 using an OEM product key? The answer is yes. First, get your OEM product key, either from a list online or with a tool like Belarc Advisor. Download the relevant DVD image for the version of W7 you want to install, burn it to disc (I used ImgBurn), and start the installation process by booting from the DVD. When prompted for a product key, uncheck the automatic validation option and don't supply a key. Once you've booted into W7, activate by phone. It doesn't matter whether you install 32-bit or 64-bit via this method, but the flavour (Starter, Home, Professional or Ultimate) must match the key.

My full story goes like this. I finally unpackaged my new Dell Latitude E5520 that will replace my dear but ageing Inspiron 4600 desktop. I started the machine up to find that it had a lot of bloatware, as usual, but also a 32-bit version of Windows 7 Professional (W7Pro), even though the hardware is fully capable of handling 64-bit. So I set out to install the 64-bit version of W7Pro, without any disks from Dell.

I want to take a moment to highlight the level of bloatware I'm talking about. While wildly uninstalling most of the junk Dell had installed on the laptop, I noticed a "Modem Diagnostic Tool". My curiosity was piqued so I decided to pause in my trimming of the software list and see what this particular program would provide as a diagnosis...


No surprises there. When was the last time Dell even made a laptop with a modem? Have they not updated the driver/application list since then? A sure sign of the competence to come.

On with W7. Why would there be any problem just installing W7Pro from scratch? Product keys. Those 25-character strings that testify to one's right to a legal copy of Microsoft's product, the software that only runs thereupon, and updates for all of the above. There are some things that are well-known here. The product key doesn't transfer between different variants of Windows 7. If you have a key for Professional, it won't validate an instance of Ultimate. Keys do transfer between different architectures, despite what the helpful technician in the Dell call centre told me. That is, a key for 32-bit W7 will activate the 64-bit version too. I don't think having a disk with Service Pack 1 makes a difference. That wouldn't make sense, right? But not much in the last few days did. Certainly not when the Microsoft techie told me that there are no service packs for Windows 7.

What if your computer came with W7 already on it? In that case, you probably have an OEM version. Now things become very complicated. I toiled with a variety of solutions to my problem before I stumbled upon one that worked. I have burned three slightly different images of the W7Pro Retail DVD (and, accidentally, one Ultimate DVD too) in a bid to get an activated copy of W7Pro. You can Google the issue to your heart's content but the leading theory I've come across is that you can install a Retail version of W7 and activate it using an OEM product key, but you have to activate by phone. Don't ask.

So, the first thing you will need is an image of the W7 retail disk. Do not fear. These are entirely legal. They were created because some users suggested Microsoft could join in on this strange notion of digital distribution and they conceded. There is a full listing of disks with and without SP1 at My Digital Life. Because the validity of keys with SP1 is unclear, I avoided those, but it doesn't matter: I couldn't install from those disks and validate successfully. I used the same method as the one that eventually worked, but with these ISOs the phone activation failed. You've been warned.

I then found a lengthy post in their forums that describes cracking W7 by pretending to be an OEM machine. This is possible because the OEM keys are the same for all customers. That is, everyone who buys a Dell Latitude with W7Pro gets the same product key, and these keys can easily be found online. The crackers then load a piece of software into the BIOS that makes the computer pretend to be manufactured by Dell or HP or whoever, so the OEM version installs and validates. If you're wondering, yes, that practice is entirely illegal. However, the disk images they have must work with OEM product keys, which is what I have a legal entitlement to. The forum only has one disk image, so if you use it, don't forget to tinker with ei.cfg to change the edition. That's how I ended up with one Ultimate disc.
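For reference, ei.cfg lives in the sources folder of the disc image and, if memory serves, is just a few lines of plain text along these lines, with the EditionID entry being the one to change:

[EditionID]
Professional
[Channel]
OEM
[VL]
0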

What have I learned from this? I don't know how hard it might be to install an illegal copy of W7, but damn, I found it hard to install a legal one. Not with any help from Dell or Microsoft either. In fact, I got anti-help: each techie told me at least one thing that is simply not true.

Update 1: I filled in a customer experience survey over the weekend and Dell phoned yesterday to apologize for my being misinformed. Half-decent customer relations, but that isn't to say the guy was actually helpful. He just told me how sorry Dell was and corrected his co-worker's error.

Update 2: A few weeks after this post, I updated Windows and it started saying that my copy wasn't genuine. A few days later, I updated again and the problem disappeared. Warp forward to a few weeks ago and it started doing the same thing. But this time it hasn't stopped and it's been nearly a month now. No amount of other trickery has worked so I'm going to try installing W7Pro as I should have in the first place...

Update 3: W7 was gradually shutting down on my laptop so I tried to re-install. This method no longer worked even with the precisely correct (i.e. OEM channel, Professional) DVD but I discovered a sticker under the battery that had a different product key on it. This "other" key validated fine over the internet.

Monday, 29 August 2011

Einstein and Escher

It's old news now, but earlier this year, NASA announced the result of its Gravity Probe B mission. The mission was aimed at testing Einstein's theory of gravity and, 7 years and $750 million later, it turns out he's still right, at least to the level that we can measure. More fascinating, though, are the four niobium-coated quartz spheres, like those pictured below, that were involved in the measurement. These are the most spherical objects that mankind has ever manufactured. They are spherical to a thickness of 40 atoms. No surprise that they were fabricated in a German laboratory...

Examples of the orbs in Gravity Probe B. The sphere on the left has not been coated yet.

I immediately saw an uncanny resemblance with the left pair of spheres in Three Spheres II by M.C. Escher.
Three Spheres II by M.C. Escher
I think it's safe to say that many scientists and mathematicians appreciate Escher's work. I'm not sure if the similarity between the images was intentional, but I appreciated it. I thought you might, too.

Thursday, 21 July 2011

Favicon follow-up

I previously wrote about creating a favicon for your Blogger/Blogspot blog and inserting some HTML into the header to display it. I tinkered with my design a few days ago and noticed that the "Design" tab now has a favicon box at the top left of the blog layout. You can upload your favicon there and let Google worry about the rest. I've made this change and commented out the HTML I inserted. It all seems to be working, so, if you can be bothered to relinquish more control to our Google overlords, you can do the same.

Thursday, 7 July 2011

The Overflowing Stack of Exchanges

The Internet is not renowned for bringing out the best in people. Rather than raising us to the highest common factor, large online communities tend to degenerate to the lowest common denominator. Penny Arcade put it crudely: normal person + anonymity + audience = total f---wad. So it's no surprise that some famous attempts at useful Q&A sites haven't been resounding successes, although other examples of community-driven resources have.

However, against the tide of widespread inanity, it appears that two software developers managed to create a useful system. StackOverflow, a Q&A site for programmers, was launched in August 2008. It quickly spawned similar sites in related sectors. The first was ServerFault, for sysadmins, followed by SuperUser, for general "power users". The software platform on which these sites were based was consolidated into StackExchange. Notably, even Ubuntu launched its own Q&A site that uses the system, AskUbuntu, alongside Ubuntu 10.10. The StackExchange domain now hosts a variety of "exchanges", with many, many more in various stages of proposal or development.

What's special about this system? From the StackExchange's own description:
After someone asks a question, members of the community propose answers. Others vote on those answers. Very quickly, the answers with the most votes rise to the top. You don’t have to read through a lot of discussion to find the best answer. 

Like topics on Wikipedia, questions and answers on Stack Exchange can be edited. If someone writes the beginning of a great answer, someone else can embellish it and make it even better.
The site is free and open to everyone. You don’t have to register, but if you do, you collect reputation points when people vote up your answers, which will appear next to your name.
So, why does it seem to work so well? By their own reckoning, StackExchanges combine elements of forums, wikis, blogs and social bookmarking. I think the real key is that positive contribution is rewarded with increased privileges. This screens out a lot of noise and keeps the good contributors coming back. Users who consistently don't provide anything useful will find their answers ranked lower (or even voted down), making it difficult for them to make themselves heard without moderation. Users who provide solutions that the community endorses find themselves given increasingly free rein.

Of course, if the community degenerates sufficiently, then it could just grant privileges to negative contributors, but I suspect the level of cohesion in the group would have to be very high for that to happen, and outsiders would still broadly quell it. Moreover, StackExchanges are used for particular niches of knowledge. It isn't a stage for ordinary interaction: answering questions usefully is the initial attraction. An exchange would never host a subjective discussion on "Which is better, red or blue?", although it would probably succeed in answering "Why do some people prefer red over blue, and vice versa?" Trolls are better fed in the first case.

My dream, at first, was that this kind of behaviour could be introduced into something like Wikipedia. But on second thought, I'm not really sure how it would work (voting up individual edits seems crazy) and the simpler parts of the system are already there (basic restrictions on articles mean only established contributors can edit). Still, the idea is great: I found what I regard as the best explanations of the Fortran vs C debate on StackOverflow, and I've begun contributing to the new AstronomyExchange myself.

Have you had any experience on a StackExchange? How do you think these ideas could be spread around to clean up the Internet? Or will they go the way of Yahoo! Q&A?

Thursday, 30 June 2011

Like Google+'s circles? Then use Facebook lists

Back when I still wasted time following TechCrunch, I found a gem amidst the torrent of pointless speculation about Quora. The Real Life Social Network is a presentation by Paul Adams, a Google engineer who crossed to Facebook. The presentation is well worth viewing. For a start, it's just a good presentation. It makes good use of slides, animation and notes. More importantly, it carries a simple point: distinct groups of people that we know are thrown together when we interact with them online. Worlds collide. Broadly, we collect friends and acquaintances around certain interests. Our close friends might represent significant overlap that has grown over time, but most of your 130-on-average Facebook friends are no more than acquaintances made through some common activity or event. With you at the centre, these otherwise separate groups can see other sides of your life when, without the Internet, they never could have.

Fast forward to the present, and the interwebs are buzzing about Google's latest (re-)entry into the world of social networking. Part of what's being leveraged is the solution to the problem above, in the form of the Circles feature. This isn't at all new. Most Facebook-alternatives that have been springing up boast something like it. These are the cliques of InCliq; the aspects of Diaspora. Clearly, Facebook is drawing a lot of fire from one direction. Wired magazine's coverage of the new platform puts it like this:
Google believes that with Circles it has solved the tough sharing problem that Facebook has inexplicably failed to crack. "With Facebook I have 500 friends -- my mum's my friend, my boss is my friend," says Shimrit Ben-Yair, the product manager in charge of the social graph. "So when I share on Facebook, I overshare. On Twitter, I undershare, because it's public. If Google hits that spot in the middle, we can revolutionise social interaction."

There are many reasons to want separation of these groups. The first is to prevent overshares like the example in the presentation. While not all of us will facilitate minors seeing photos of gay strip clubs, it's different only in magnitude from keeping your boss separate from your officemates when sharing YouTube videos of sneezing pandas. Also, we're likely to share more if we know it's only going to people who actually care and, if everyone does it, we all have less to trawl through in our feeds. Finally, we can set different levels of privacy for different groups: whether you are seen by them at all, not just what you share with them.

You may have been nodding your head to all of this, thinking about how you share different things with your parents' friends than with your teammates in whatever sport you play. So, having squarely criticized Facebook for throwing together people who, without us as the hinge, would never meet, we make a U-turn: Facebook hasn't "failed to crack" this problem at all. This functionality exists in the form of lists.

Facebook's friend lists fit all the bills here. Whenever you share anything, be it a status message, a photo album or any old link, you customize with which lists (and individuals) that item will be shared. You can set different privacy settings for each list. You can even decide which lists will see you as available for chat. You can separate your online friends by the same lines that separate them in reality.

Engadget have written a detailed explanation of how to start using lists. They suggest their own lists for privacy purposes, but I also have certain groups of friends (sailors, divers, rowers, collegemates) for when I share things. It boils down to clicking Account, Edit Friends, and then Make a List. Once a few names are entered into a given list, Facebook will start suggesting friends to add, with pretty good accuracy, which speeds creation up a lot. With that done, the rest is straightforward. When sharing anything, you can click on the little lock icon and make a custom rule which applies only to that item.

I don't know why Facebook don't encourage people to use this feature. I don't know why they don't make it dead simple and plaster it in neon lights all over the front page. They could easily derail a common line of attack by their competitors, streamline everyone's feed, and eliminate some of their users' privacy concerns. The functionality is there, but they're failing to promote it.

I'll leave comment on Google+ for another time. We'll have to see whether it hits the mark or misses it like Wave and Buzz. But for now, start using lists. Go and do it right now, and get your friends to. Not because we think Facebook is awesome, but because it's what we've got, it's what everyone uses, and it seems to work better than everything that gets thrown at it. Besides, the only thing that gets thrown at it is... the same thing. Maybe that's all we want?

On one hand, you'll never be able to convince your parents to switch. On the other hand, you'll never be able to convince your parents to switch! (xkcd.com)

Thursday, 23 June 2011

Splitting PDF pages in two

It's been a while. To fill the void, here's something I finally worked out how to do. "Do what?" you ask. I'll try to explain, but maybe my inability to do so is why it took me so long to Google an answer.

Sometimes, you might find yourself with a document in which two pages of the real document appear on each page of the file, as if someone had scanned two A5 sheets onto one A4 sheet. The document in question is typically a scan of something like, say, the 1989 IAU Style Manual. My quest, with such documents, is to separate the pages; to take the double-page layout and split it in two; to separate those A5 sheets from their doubled-up A4 version; or to go from the first screenshot to the second, subtly different, one...




To refine the problem slightly, I'm firstly presuming you're working with a PDF. Most documents should be circulated in this format but you can probably print to a PDF anyway. Secondly, I'm working on the Linux command line. This should work wherever the standard tools I use are present and will probably work on Macs, given the Unix underpinnings of OS X. If you know how to do this in Windows, let me know in the comments. Finally, just before you tell me that this is easy, I'm not paying for any software.

The real work here is done by a tool called Unpaper. It's capable of much more and I invite you to check out the documentation to see what other tricks are possible. It can be downloaded as a binary (navigate to /bin/ in the tarball), so it doesn't require any special permissions to use. Given that it runs here, I guess the binary is compiled for 32-bit x86. Other architectures might require compilation from source.

Unpaper works with Portable Bitmap Files, or PBMs, so the first thing we need is to extract such images from the PDF using pdfimages.

pdfimages in.pdf in

This tool is part of the Xpdf package, which is itself bundled in just about every major Linux distro, as far as I know. It produces a set of files with names like in-012.pbm, where 012 is the page number in the PDF file. Unpaper can now get cracking. Following the example given there,

unpaper --layout double --output-pages 2 in-%03d.pbm out-%03d.pbm

The %03d is the wildcard for the numbers in the filenames. This will, unsurprisingly, produce twice as many output files. We now want to combine these PBM files back into a PDF. There might be a shortcut but I accomplish this by converting the PBMs to TIFFs, combining the TIFFs, and converting that. So, the first step is

ls out-*.pbm | xargs -I {} ksh -c 'pnmtotiff {} > {}.tiff'

where I've used xargs to pass the PBMs to pnmtotiff. I warn you that pnmtotiff might be deprecated, in which case pamtotiff should do it. Back on track, no-one really uses TIFFs, so as long as your pages are the only TIFFs around, you can combine them with

tiffcp *.tiff out.tiff

and finally convert to PDF with

tiff2pdf -z -o out.pdf out.tiff

where the -z flag indicates zip compression. That should be it!
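To put the whole chain in one place, here is the sequence above strung together as a minimal script. It assumes the input is called in.pdf and that no other PBM or TIFF files are lying around to confuse the globs:

#!/bin/sh
# extract the scanned page images from the PDF
pdfimages in.pdf in
# split each double-page image into two single pages
unpaper --layout double --output-pages 2 in-%03d.pbm out-%03d.pbm
# convert the split pages to TIFF, stitch them together and convert back to PDF
ls out-*.pbm | xargs -I {} sh -c 'pnmtotiff {} > {}.tiff'
tiffcp out-*.pbm.tiff out.tiff
tiff2pdf -z -o out.pdf out.tiff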

One note I will make is that this seemed to use quite a lot of space. I say this as someone who has no problem with 10GB of user data space, so it probably won't worry anyone else. Regardless, it's still worth pointing out that working with a 5MB PDF file generated PBMs and TIFFs adding up to several hundred MB in each format, so the better part of a GB when everything was around. I warned you.

Tuesday, 12 April 2011

Getting full articles in RSS

Most sites have the annoying habit of truncating their RSS feeds. Their motive is simple: an article snippet forces you to visit the site to read the whole thing, so they get more traffic, and hopefully ad-clicks. Still, it is annoying, especially on my phone, where the sites aren't guaranteed to display well and loading them isn't fast either. Fortunately, there are sites and scripts that will take a given feed and send the complete articles to your favourite reader.

The first (and my preferred) option is Full Text RSS Feed Builder. No secrets here: it really does build full text RSS feeds. Head over to the site, enter the desired feed into the box, and hit "Submit". You'll be given a new feed with articles restored to their full text. When I first tried the site, it didn't seem to work with many feeds, but it now seems to work perfectly with Lifehacker, Wired Science, and Scientific American. It isn't perfect with BBC Science & Environment and Technology feeds, but it only consistently skips the audio and video entries, which I tend not to read.

After experimenting with FTRFB (for want of a better abbreviation) above, I wondered if there were Yahoo! Pipes that would accomplish the same thing. It appears that there are. Just search "full text" and try some of the entries. I haven't tested these as fully. It appears that some pipes work and some don't. Good luck.

Tuesday, 5 April 2011

Finding co-citations on NASA ADS

The NASA Astrophysics Data System is a marvellous tool. Not only is it the baseline for astronomical literature searches, it also locates an article in the citation web. It's easy to obtain the list of articles that cite or are cited by the reference in question. This all means that the database can be used for all sorts of other interesting experiments, for which I have a few ideas. Here's the first.

Lately, I've been working closely from two papers. Though they are authored by different groups (that had collaborated before and have since), their content is very similar and they were published back-to-back in the Astrophysical Journal. What became interesting to me was this: what articles cite both papers?

To answer this question, I needed the lists of article identifiers ("bibcodes") for the citation lists of both papers. For each article, I navigated to the database page linked above, clicked "Refereed citations to the article" (or just "Citations to the article", if you prefer all of them), clicked "Select All Records" at the bottom, and requested the records in a custom format of %R. This returned a plain text list of the identifiers of all the refereed citations. I saved these lists to two plain text files, concatenated them, sorted them, and found the identifiers that occur twice in the combined list: these are the articles that cite both papers. To get this list of identifiers back into a list of hits, I copied the list into the "Bibliographic Code Query" at this ADS query page.
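The middle step is a one-liner at the shell. Assuming the two citation lists are saved as cites1.txt and cites2.txt (those names are just placeholders), with one bibcode per line, something like this prints the bibcodes common to both:

cat cites1.txt cites2.txt | sort | uniq -d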

In short then, all I did was get the lists of citations to each of my two articles, find the common hits, and locate them on ADS. The results revealed a few further papers that were worth going through. In addition, I think this kind of search reveals, from metadata alone, how much these papers have in common. The two articles have 176 and 167 refereed citations of their own. Of those, 112 papers cite both: 63% and 67%, respectively!

This process can probably be automated quite easily. Using embedded queries or the Perl module, I imagine an able scripter could quite easily write a short program that will find the list of common citations for two or more papers. What I don't know is how useful this would generally be. A more interesting application might be to rank papers based on how much they are co-cited with a selected reference, but that could be a big calculation because it would require a second step in the citation web. Still, if such a calculation is made only at regular intervals, it might still be useful.

This is one of a few ADS-based ideas I have. Let me know if you think it's useful or interesting (or have found it to be), and watch out for more ideas further down the line.

Monday, 21 March 2011

Installing RPM software without admin privileges

I'll kick off by saying this post is probably only going to be useful to other grad students in my department. I hope that no-one else is cursed by being without privilege or up-to-date software on a centrally administered RHEL 5 system. It works well enough but it's really old and not really supported by developers. (I'm looking at you, Chrome...)

So, the setup is this. The computer in my office runs RHEL. I can't install software because I don't have admin or sudo rights. Fortunately, there's a pretty neat way around all this: create your own, additional, RPM database and install software in your space. I've drawn heavily from this guide with a few tweaks.

Create Your Database

First thing is to make the folder that will store your database. I decided to build my own quasi-filesystem under ~/local, so I made this folder ~/local/var/lib/rpm. It then needs to be initialized as a database. Finally, we copy the main database so that new packages know what's already installed. These three things are accomplished by
mkdir -p $HOME/local/var/lib/rpm
rpm --initdb --dbpath $HOME/local/var/lib/rpm
cp /var/lib/rpm/* $HOME/local/var/lib/rpm/


where /var/lib/rpm/ is a usual location for the RPM database.

Installing RPM Packages

To install a new package, in theory all we have to enter is
rpm -i --dbpath $HOME/local/var/lib/rpm package.rpm

In reality, this will probably give a load of dependency errors and a failure to write to certain locations.

To solve the dependency problem, we add the --nodeps flag. This doesn't really solve the problem as much as it blatantly ignores it. Not a great strategy, broadly speaking, but it works. Just keep it in mind so that you can manually install the dependencies yourself.

To solve the location problem, we need to tell the package to use different locations. To get a list of locations that need relocating, type

rpm -qip package.rpm

and have a look at the "Relocations" information. For each necessary relocation, include a flag like

--relocate /old=/new
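Putting these pieces together, a typical installation ends up looking something like the line below. The relocation shown is only an example: use whatever the package's "Relocations" field demands.

rpm -ivh --dbpath $HOME/local/var/lib/rpm --nodeps --relocate /usr=$HOME/local/usr package.rpm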

Details...

That more or less covers it, but here are a few extra things I needed to do. I saw many of these when trying to install Google Earth, so I'll use it as an example.

Firstly, the installers may try to execute programs that are in your local database rather than the overall system. For them to be found successfully, you should add the additional path for executables to your shell startup script. I added /home/wball/local/usr/bin/ to my $PATH variable. In the Google Earth setup, this was necessary when it tried to execute xdg-icon-resource, which I'd installed through this whole method.

Secondly, any given installation might need some fine tuning. After installing Google Earth, I discovered that the executable (~/local/usr/bin/google-earth) is actually just a symbolic link to the real executable in the depths of /opt. I had to recreate the link to point to the right place.

Thirdly, I found myself using a whole whack of --relocate flags, so I just created an alias, rpmlocal, which includes everything I kept typing. In my tcsh setup, it looks like this

alias rpmlocal 'rpm -ivh --dbpath ~/local/var/lib/rpm/ \
--relocate /opt=/home/wball/local/opt \
--relocate /etc=/home/wball/local/etc \
--relocate /usr=/home/wball/local/usr \
--relocate /usr/bin/=/home/wball/local/usr/bin \
--relocate /etc/default=/home/wball/local/etc/default \
--nodeps \!*'

and I add to it each time I find a new relocation is necessary. The hardcoded user directory is probably a bad idea; I suggest using $HOME or ~ instead. If you're wondering, the -v and -h flags give verbose output and a hash-mark progress bar.

The Unsolvables

After all this, there's still only so much that works and a few handy things that don't. Google Earth and irssi worked fine. LibreOffice just wanted way too much space: it claimed to need 7GB! Skype and Google Chrome didn't work after installing but this might just be a compatibility error with an ageing infrastructure. Maybe RHEL 6 will be on these machines before I finish my PhD.

Thursday, 17 March 2011

Scintilla, Nature's Curious News Aggregator

I can't quite remember what wandering on the interwebs brought me there, but I recently discovered Nature's Scintilla service. I haven't entirely worked out what its purpose is, but if you click on "Top Stories", it aggregates articles on a common subject from a number of sources (based on how many mentions you request).

It does solidify my opinion that many news sources are horribly uncreative. They even have the same article titles at times. That said, this might be a selection effect based on how Scintilla's algorithm works and which page it places on top. Make of this tool what you will.

Wednesday, 23 February 2011

Making a favicon for your Blogger/Blogspot blog

As an escape from the PhDemon, I decided I would look at how to make a favicon for this very blog. You may be asking "What's a favicon?", in which case I suggest glancing at the relevant Wikipedia article: it's basically that little icon you see on your browser tab. If you aren't browsing with tabs, I'm presuming your browser predates "Thunder Thighs". More gory, illustrated details are over at the Google Chrome Browser blog, from which most of this is drawn.

The first step, obviously, is to actually make a favicon. I quickly put together the one you see here using Inkscape. I'm no design guru (although I find it a fascinating subject), so I just went for a shiny black background and white text, a bit like the blog title. Maybe I can increase the number of magpies that read this. I exported that from Inkscape as a PNG and used GIMP to shrink and convert to an icon file. I rendered the original at 160x160 but shrunk it down to 24x24 because there's possibly an upper limit to the size (36x36). If you're lazy, you can just use an online tool to faviconize an image.
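If you have ImageMagick installed, the shrink-and-convert step can also be done in one line. This is just a sketch: it assumes the exported image is called favicon.png and that 24x24 is the size you want.

convert favicon.png -resize 24x24 favicon.ico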

This brings us to the subject of image format. Basically, I can't seem to establish what formats are usable. PNG, GIF or ICO (Microsoft Icon) should all work. Apparently x-icon should too but support isn't as widespread. Your choice of format will correspond to an appropriate specification in the HTML that we're about to add to the template.

Secondly, you need the favicon to be hosted somewhere online. This part is left as an exercise for the reader. Most of the blog posts I found on the matter recommend defunct services. A current example would be ImageShack but, like I said, any service will do.

Now, to add the favicon to the Blogger template. From your blog's homepage, click "Design" and then "Edit HTML". The following code should be entered somewhere in the header block (i.e. between <head> and </head> but not inside any other tag)

<link href="http://www.example.com/username/favicon.ico" rel="shortcut icon" />

and one of the following (depending on file format).

<link href="http://www.example.com/username/favicon.ico" rel="icon" type="image/vnd.microsoft.icon" />
<link href="http://www.example.com/username/favicon.png" rel="icon" type="image/png" />
<link href="http://www.example.com/username/favicon.gif" rel="icon" type="image/gif" />

where the URL should be replaced with the location of your favicon. That should do it!

A couple of details. The first is that I haven't been able to test extensively for what formats and sizes do and don't work or on what browsers. It seems to work on Firefox 3.6.13 and Chrome 9.0 using a PNG for the second format. If you're testing with Firefox, you might find that the favicon isn't refreshed when you change it. One solution is to load the favicon by entering its URL in the address bar. This appears to flush Firefox's cache and worked for me.

Monday, 21 February 2011

More on bad astro code

Along with my own rant, a large portion of the blogosphere picked up on two articles in Nature regarding the state of scientific programming. Just before Christmas, this article turned up on arXiv. I mentioned it in a comment on AstroCompute and it was promoted into a post there.

The basic content agrees on what everyone seems to appreciate. The article gives a broader range of code (including databases and visualization tools) but all the ideas are the same. In fact, their list of common "features" in astronomy codes (Section 2.1) is almost a checklist for how bad said code can be. Compare your favourite code against this:
  • the code is written in Fortran (bonus if it's earlier than Fortran 77);
  • the code includes frequent GOTO statements;
  • compilation instructions are through a terminal script rather than a Makefile (or other standard builder);
  • in release form, code is non-portable;
  • no variable naming convention is adopted;
  • filenames and paths are hardcoded; or
  • standard algorithms (like matrix inversion, list sorting) are re-invented.
STARS doesn't fare too well. I'll give it the benefit of the doubt on portability and it does use a Makefile. That leaves a score of 5/7. We haven't even started on documentation yet...

How does your code compare? What else could be added to the scorecard?

Friday, 11 February 2011

WubEee, or dual-booting an Eee 1015P the easy way

While I was in South Africa over the Christmas break, my Eee 1000 was stolen from my parents' home. I guess it served as a reminder of why I and hundreds of thousands of other South Africans have moved abroad. More to the point, it meant I needed to replace my dearest netbook. Not that I had any other netbook, but you know what I mean.

As I began the quest to find a suitable successor, I realised that much in the market had changed. Firstly, the Eee PC is now only available with Windows 7 Starter on it. This annoys me but at least, in theory, I could reformat and install Ubuntu, as I had done on my last Eee. I ordered a 1015P from Amazon, which arrived, to my astonishment, less than 40 hours later. The plot thickened as I opened the box: there were no disks. Reformatting the drive therefore comes with the risk of never recovering factory settings (if you knock out the recovery partition). Hence, I cursed Microsoft for strong-arming the netbook market and, after the blood-haze of my craze lifted, I decided to try Wubi. This is an account of what I did, including some of the standard Ubuntu-on-Eee fixes that are required.

Installing Wubi

After removing a large amount of the crapware that Asus decided to bundle with my Eee and installing a few programs I'd like available in Windows (e.g. Chrome, Dropbox, Skype), I navigated to the Ubuntu download page. Windows was detected, so I was automatically taken to the Wubi download. Installing using Wubi is easy enough: just choose the drive where it's to be installed and the amount of space you'd like dedicated to Ubuntu's filesystem. The default (which I used) seems to be 17GB, with the minimum being 3GB and the maximum 31GB. As you'll see a bit further down, this only needs to be for the programs you install in Ubuntu and the corresponding user settings. You can access media that's also available to Windows.

One small alarm came when I noticed Wubi was downloading the Ubuntu distribution for the amd64 architecture. It turns out this covers all 64-bit capable systems, irrespective of whether you're on an AMD or Intel system; Debian (from which Ubuntu was forked), by contrast, has a separate ia64 architecture. It's beyond me, so this is just to say: don't worry if your 64-bit machine starts installing the amd64 version.

Linking Windows folders

From the start, my intention was to have Ubuntu and Windows access the same documents and media. Since Windows can't see into the Ubuntu filesystem, this has to be done from Ubuntu. Fortunately, it isn't difficult. To be precise, the behaviour I was going for is like this. In my Windows user folder, there are subfolders like "My Pictures" and "My Documents". In Ubuntu, the home folder (i.e. ~) also contains folders like "Pictures" and "Documents". I want these to point to the same place.

The first step I took was renaming the Windows subfolders. The "My..." bit annoys me, so I removed it. Windows doesn't have a problem with this. All the shortcuts in the Start Menu and other places are unaffected.

Next, in Ubuntu, I removed the folders that I planned to link with rmdir. Then I created symbolic links to the Windows folders. The drive that Wubi installs Ubuntu to, in my case C:, is in the Ubuntu filesystem as /host/. Windows stores the user data under C:\Users\(username)\. In Ubuntu, I created the appropriate links with

ln -s /host/Users/Warrick/Documents/ Documents

Obviously, Warrick is my Windows username. You can do the same for Music, Pictures, and Videos. I've linked Downloads too but that's up to you.

Dropbox is a bit more complicated. I don't know how Dropbox would respond to being told to sync a symbolic link, so instead I told Dropbox to sync the Windows Dropbox folder by specifying /host/Users/Warrick/Dropbox/ as the sync directory when I installed. I then made a symbolic link in my home folder. ~/Dropbox is now effectively synced because it links to the folder that Dropbox actively syncs (in both systems).
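The link itself is just another ln -s along the same lines as before, with Warrick again being my Windows username:

ln -s /host/Users/Warrick/Dropbox/ ~/Dropbox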

Adding Wifi hotkey support

The Eee PC has a set of hardware controls on the keyboard, along the lines of pressing Fn with some function key. For example, Fn+F5 dims the screen. Fn+F2 toggles the Wifi. From the first day I installed Ubuntu on my 1000, support for these keys has varied. At first (8.04), it required all sorts of trickery, but things improved to the point that everything actually worked out-of-the-box from Ubuntu 10.04 on.

On my new Eee, everything seemed to work, except the Wifi toggle. Bit of a bummer. Disabling the Wifi when not in use is critical for the battery. I first tried installing the Jupiter Applet to no avail. This is a small program that allows Eee users to access hardware controls that Ubuntu doesn't naturally recognize. After enquiring on the ever helpful Eee PC forums, it turns out the solution is to find the line

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

in /etc/default/grub and append it to read

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi=Linux"
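If I remember rightly, GRUB only picks up changes to /etc/default/grub once its configuration is regenerated, so you'll probably also need to run

sudo update-grub

and reboot before the change takes effect.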

It all now works beautifully. I'm not sure if this requires Jupiter to be installed but I've left it for the time being.

Those darned window controls

With Ubuntu 10.04, it was decided that it would be nice to distinguish Ubuntu as a brand by moving the window controls to the left corner, but leaving them in a different order to OS X. So it's not quite Windows and not quite Mac. It is quite annoying, but there's an easy fix. Entering

gconftool-2 --set /apps/metacity/general/button_layout --type string menu:minimize,maximize,close

at a terminal will do it. This is one of just a few things I do each time I update (normally by wiping the Ubuntu partition). I expect to make up a script next time to save me the trouble.
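
A minimal sketch of what such a script might look like (the package list is purely illustrative, not what I actually install):

#!/bin/bash
# Put the window controls back on the right
gconftool-2 --set /apps/metacity/general/button_layout --type string menu:minimize,maximize,close
# Re-create the links to the Windows folders (see the earlier section)
# ln -s /host/Users/Warrick/Documents/ ~/Documents, and so on
# Grab a few favourite packages
sudo apt-get update
sudo apt-get install -y emacs vlc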

Making Ubuntu the default OS in the Windows Boot Loader

Finally, I changed the Windows Boot Loader to load Ubuntu automatically. This is done by right-clicking My Computer, choosing Properties, selecting Advanced System Settings and then clicking Settings under Startup and Recovery. There should be a drop-down list of the OSes in the boot loader (and an option for the length of the timeout).
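
If you prefer the command line, the same thing can be done with bcdedit from an elevated command prompt. As a sketch (the identifier in braces is a placeholder for whatever bcdedit lists against the Ubuntu entry on your machine):

bcdedit /enum
bcdedit /default {identifier-of-the-Ubuntu-entry}
bcdedit /timeout 5

The first command lists the boot entries and their identifiers, the second makes the Ubuntu entry the default and the third sets the menu timeout in seconds.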

Remarks

Using Wubi is a pretty lazy way of dual-booting, but it does seem suitable for introducing newbies to Ubuntu. With my media linked, I'm not worried about disk space at all, and if I want to do a clean install of Ubuntu, I can back up my user settings and uninstall/delete Wubi.

If I want to do a clean install of Windows 7 Starter, however, Asus haven't left too many options. I may write a further post if I find out how to restore my system after reformatting the whole drive.

Tuesday, 25 January 2011

Custom Search in Firefox and Chrome

I was toying with Chrome on my Eee PC, partly for experimentation and partly because of the slightly-more-efficient use of the 10" screen. Alas, the experiment is on hold as my Eee was liberated from my parents' home in Cape Town while I was there over Christmas. I guess I should've remembered to install Prey...

Back on track, I found Chrome's default search inferior to Firefox's, so I went looking for a fix. In the process I've learned a bit more about the custom search tools available in both Chrome and Firefox. Now I'm recording it here, for your perusal (and probably mine after I've forgotten such tricks).

For the record, these tips were tested on Firefox 3.6.13 and Chrome 8. Major revisions are coming and might change some details.

Restoring Firefox-like behaviour in the Chrome address bar

The "problem" I had was that typing a string into Chrome's address bar would produce a Google search. Not really a problem, but I prefer Firefox's use of Google's "I'm Feeling Lucky" button. Basically, if Google's first hit is really spot on, it'll take you straight to that page. For example, if you type "met office cambridge" into Firefox's bar, you'll get the Met Office forecast for Cambridge. In Chrome, that will be the first hit, but you still need that extra click.

The fix is to change the default search behaviour to use the "I'm Feeling Lucky" feature. I found the explanation on this blog but I'll restate it here for you. First, go to Chrome's options and choose "Basics". Click "Manage" next to the default search setting and add a search with this URL:

{google:baseURL}search?btnI=I%27m+Feeling+Lucky&q=%s

Make that the default and you're set.

Other custom searches in Chrome

The custom searches are quite straightforward: Chrome just passes the string you enter in the address bar to the URL template you provide. The keywords are really useful, though. If you have (or add) a search for Wikipedia like

http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%s

and assign it the keyword "wiki", then typing "wiki electric field" in the address bar will use Wikipedia's search to find articles related to "electric field". Note that this is different to using "site:en.wikipedia.org" as a Google search argument. It's as if you're typing "electric field" straight into the Wikipedia search.

Other searches you might want are YouTube and Google Maps:

http://www.youtube.com/results?search_query=%s
http://maps.google.com/maps?f=q&source=s_q&hl=en&q=%s

In fact, it's not hard to add anything. This blog explains. The long and short of it is to go to your desired search site, search for something, copy the resulting URL and replace the search term you used with "%s". Assign keywords and you'll save yourself some clicks.
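
As a concrete example (using the Wikipedia search above): searching Wikipedia for "electric field" lands you on a URL something like

http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=electric+field

and swapping the search term for %s gives the template shown earlier.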

Custom searches in Firefox

Little did I know that the keyword trick is available in Firefox too. You can see a list of the available search engines in the drop-down menu next to the search bar. Click "Manage Search Engines..." and you'll be able to add keywords to the searches in the list. These work the same way as in Chrome, described above. Anything entered in the search bar itself will still use the selected engine.

To add new searches in Firefox, go to the site, click the search bar's drop-down menu and select "Add <site>". You can add a keyword later. You can also hit the "Get more search engines..." link for even more searches, including some that you can't add with the first method. For example, I can't add Google Maps in Firefox without downloading the add-on.

Now, if only I could find a way of getting this keyword/search-engine behaviour on Android's Google search widget...

Friday, 21 January 2011

Notes and tasks with Emacs' Org-Mode

In Afrikaans, I would be described as loskop. That probably translates best as "absent-minded" but, as ever, Afrikaans' simplicity cuts to the point: I really do just lose my head. Since I tried Tasks in Gmail, I've been looking for a superior way of keeping together everything from lists of books I want to read, things I need to pay and notes for upcoming blog posts. I've discovered I needed to look no further than the nearly omnipotent text editor: Emacs. I probably should never have doubted that it had this ability. Forgive me, Emacs-god. (Is there a M-x finish-phd command yet?)

Basic functionality is pretty straightforward. I haven't tried anything particularly advanced yet. Start Emacs and enter the command M-x org-mode. Obviously, if you get errors at this point, then you probably need to install the org-mode extension. The feature I yearned for, given its annoying absence from Tasks, is being able to nest collections of tasks or notes into collapsible trees. To make a collapsible heading, precede it with *. If you want a second-order heading, precede it with **. And so on. You can enter text under these headings and it, too, will collapse. To (un)collapse things, just hit Tab while the cursor is on that line.

For example, you could type the following.

* Notes
** Google CEOs
-- Eric Schmidt (til April 4)
-- Larry Page (after April 4)
** Perfect numbers
-- 6
-- 28
* Shopping list
- bread
- milk

Hitting Tab while on the "Google CEOs" line will reduce it to

* Notes
** Google CEOs...
** Perfect numbers
-- 6
-- 28
* Shopping list
- bread
- milk


Hitting Tab over "Notes" gives

* Notes...
* Shopping list
- bread
- milk

Tab again will re-open them to their previous state. It's all stored as plain text, so you can modify the file without Emacs, but you won't get the collapsibility and syntax-highlighting.

Org-mode can do much, much more, including setting deadlines and timestamping, never mind a bottomless bucketful of C-c shortcuts. You'll find more complete guides with a Google search but, if you're lazy, here are The Compact Org-mode Guide and an article at Linux Journal.
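
For instance, a deadline is just another line under a heading. A small sketch (in practice, C-c C-d inserts the DEADLINE stamp for you and C-c C-t cycles the TODO keyword):

* TODO Pay electricity bill
  DEADLINE: <2011-03-01 Tue>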