Thursday, 30 June 2011

Like Google+'s circles? Then use Facebook lists

Back when I still wasted time following TechCrunch, I found a gem amidst the torrent of pointless speculation about Quora. The Real Life Social Network is a presentation by Paul Adams, a Google user-experience researcher who crossed to Facebook. The presentation is well worth viewing. For a start, it's just a good presentation. It makes good use of slides, animation and notes. More importantly, it carries a simple point: distinct groups of people that we know are thrown together when we interact with them online. Worlds collide. Broadly, we collect friends and acquaintances around certain interests. Our close friends might represent significant overlap that has grown over time, but most of your 130-on-average Facebook friends are little more than acquaintances made through some common activity or event. With you at the centre, these otherwise separate groups can see other sides of your life that, without the Internet, they never could have.

Fast forward to the present, and the interwebs are buzzing about Google's latest (re-)entry into the world of social networking. Part of what's being leveraged is the solution to the problem above, in the form of the Circles feature. This isn't at all new. Most Facebook alternatives that have been springing up boast something like this. These are the cliques of InCliq; the aspects of Diaspora. Clearly, Facebook is drawing a lot of fire from one direction. Wired magazine's coverage of the new platform puts it like this:
Google believes that with Circles it has solved the tough sharing problem that Facebook has inexplicably failed to crack. "With Facebook I have 500 friends -- my mum's my friend, my boss is my friend," says Shimrit Ben-Yair, the product manager in charge of the social graph. "So when I share on Facebook, I overshare. On Twitter, I undershare, because it's public. If Google hits that spot in the middle, we can revolutionise social interaction."

There are many reasons to want separation of these groups. The first is to prevent overshares like the example in the presentation. While not all of us will facilitate minors seeing photos of gay strip clubs, it's different only in magnitude from keeping your boss separate from your officemates when sharing YouTube videos of sneezing pandas. Also, we're likely to share more if we know it's only going to people who actually care and, if everyone does it, we all have less to trawl through in our feeds. Finally, we can distinguish the overall privacy between various groups: where you are seen and by which groups, not just what you share with them.

You may have been nodding your head to all of this, thinking about how you share different things with your parents' friends than your teammates in whatever sport you play. So, having squarely criticized Facebook for throwing together people who would never meet with ourselves as the hinge, we make a U-turn: Facebook hasn't "failed to crack" this problem at all. This functionality exists in the form of lists.

Facebook's friend lists fit all the bills here. Whenever you share anything, be it a status message, a photo album or any old link, you can customize which lists (and individuals) that item will be shared with. You can set different privacy settings for each list. You can even decide which lists will see you as available for chat. You can separate your online friends along the same lines that separate them in reality.

Engadget have written a detailed explanation of how to start using lists. They suggest their own lists for privacy purposes, but I also have certain groups of friends (sailors, divers, rowers, collegemates) for when I share things. It boils down to clicking Account, Edit Friends, and then Make a List. Once a few names are entered into a given list, Facebook will start suggesting friends to add, with pretty good accuracy, which speeds creation up a lot. With that done, the rest is straightforward. When sharing anything, you can click on the little lock icon and make a custom rule which applies only to that item.

I don't know why Facebook don't encourage people to use this feature. I don't know why they don't make it dead simple and plaster it in neon lights all over the front page. They could easily derail a common line of attack by their competitors, streamline everyone's feed, and eliminate some of their users' privacy concerns. The functionality is there, but they're failing to promote it.

I'll leave comment on Google+ for another time. We'll have to see if this one hits the mark, or misses it like Wave and Buzz. But for now, start using lists. Go and do it right now, and get your friends to. Not because we think Facebook is awesome, but because it's what we've got, it's what everyone uses, and it seems to work better than everything that gets thrown at it. Besides, the only thing that gets thrown at it is... the same thing. Maybe that's all we want?

On one hand, you'll never be able to convince your parents to switch. On the other hand, you'll never be able to convince your parents to switch! (xkcd.com)

Thursday, 23 June 2011

Splitting PDF pages in two

Been a while. To fill the void, here's something I finally worked out how to do. "Do what?" you ask. I'll try to explain but maybe my inability is why I took so long to Google an answer.

Sometimes, you might find yourself with a document where two pages of real document are on each page of the document file. Like if you scanned two A5 sheets onto one A4 sheet. The document in question is typically a scan of something like, say, the 1989 IAU Style Manual. My quest, with such documents, is to separate the pages; to take the double-page layout and split it in two; to separate those A5 sheets from their doubled-up A4 version; or to go from the first screenshot to the second, subtly different, one...




To refine the problem slightly, I'm firstly presuming you're working with a PDF. Most documents should be circulated in this format but you can probably print to a PDF anyway. Secondly, I'm working on the Linux command line. This should work wherever the standard tools I use are present and will probably work on Macs, since OS X is Unix-based. If you know how to do this in Windows, let me know in the comments. Finally, just before you tell me that this is easy, I'm not paying for any software.

The real work here is done by a tool called Unpaper. It's capable of much more and I invite you to check out the documentation to see what other tricks are possible. It can be downloaded as a binary (navigate to /bin/ in the tarball) so it doesn't require root permissions to install. Given that it runs here, I guess the binary must be 32-bit x86 compiled. Other architectures might require compilation from source.

Unpaper works with Portable Bitmap Files, or PBMs, so the first thing we need is to extract such images from the PDF using pdfimages.

pdfimages in.pdf in

This tool is part of the Xpdf package, which is itself bundled in just about every major Linux distro, as far as I know. It produces a set of files with names like in-012.pbm, where 012 is the number of the image within the PDF file (for a scan with one image per page, that follows the page order). Unpaper can now get cracking. Following the example given in its documentation,

unpaper --layout double --output-pages 2 in-%03d.pbm out-%03d.pbm

The %03d is a printf-style wildcard for the zero-padded, three-digit numbers in the filenames. This will, unsurprisingly, produce twice as many output files. We now want to combine these PBM files back into a PDF. There might be a shortcut but I accomplish this by converting the PBMs to TIFFs, combining the TIFFs, and converting that. So, the first step is

ls out-*.pbm | xargs -I {} ksh -c 'pnmtotiff {} > {}.tiff'

where I've used xargs to pass the PBMs to pnmtotiff. I warn you that pnmtotiff might be deprecated, in which case pamtotiff should do it. Back on track, no-one really uses TIFFs, so as long as your pages are the only TIFFs around, you can combine them with

tiffcp *.tiff out.tiff

and finally convert to PDF with

tiff2pdf -z -o out.pdf out.tiff

where the -z flag indicates zip compression. That should be it!
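
The steps above can be strung together into a single shell function. What follows is only a sketch under a few assumptions: split_two_up is a made-up name, all five tools are assumed to be on the PATH, the scan is assumed to come out of pdfimages as PBMs, and the work happens in a temporary directory (created with mktemp) so that unrelated TIFFs lying around are not swept up. The commands are the ones used above, with a plain for loop standing in for the xargs line.

```shell
#!/bin/sh
# Sketch: split a two-up scanned PDF into single pages.
# split_two_up is a hypothetical helper, not part of any of the tools used.
# Assumes pdfimages, unpaper, pnmtotiff, tiffcp and tiff2pdf are installed.
split_two_up() {
    in_pdf=$1
    out_pdf=$2
    work=$(mktemp -d) || return 1

    # 1. Extract the scanned images as PBMs into the work directory.
    pdfimages "$in_pdf" "$work/in" || return 1

    # 2. Split each double page in two (same options as in the post;
    #    see Unpaper's documentation for how it numbers its sheets).
    unpaper --layout double --output-pages 2 \
        "$work/in-%03d.pbm" "$work/out-%03d.pbm" || return 1

    # 3. Convert every split page to TIFF (pamtotiff on newer Netpbm).
    for f in "$work"/out-*.pbm; do
        pnmtotiff "$f" > "${f%.pbm}.tiff" || return 1
    done

    # 4. Combine the TIFFs and convert the result to a zip-compressed PDF.
    tiffcp "$work"/out-*.tiff "$work/all.tiff" || return 1
    tiff2pdf -z -o "$out_pdf" "$work/all.tiff" || return 1

    # 5. Throw away the bulky intermediate files.
    rm -rf "$work"
}
```

With the function defined, split_two_up in.pdf out.pdf would run the whole chain. If any step fails, the function returns early and leaves the temporary directory in place for inspection.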

One note I will make is that this seemed to use quite a lot of space. I say this as someone who has no problem with 10GB of user data space, so it probably won't worry anyone else. Regardless, it's still worth pointing out that working with a 5MB PDF file generated PBMs and TIFFs adding up to several hundred MB in each format, so the better part of a GB when everything was around. I warned you.
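
To keep an eye on that, something like the following could report how much space the intermediate files take and clear them out once the final PDF looks right. It is a sketch rather than a polished tool: the two function names are made up, and the file patterns assume the names produced by the commands above (in-NNN.pbm, out-NNN.pbm, out-NNN.pbm.tiff and out.tiff) inside the directory passed as the first argument.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the combined size of the intermediate PBM and TIFF files in a directory.
intermediate_size() {
    du -ch "$1"/*.pbm "$1"/*.tiff 2>/dev/null | tail -n 1
}

# Delete the intermediate files, leaving the PDFs alone.
cleanup_intermediates() {
    rm -f "$1"/in-*.pbm "$1"/out-*.pbm "$1"/out-*.pbm.tiff "$1"/out.tiff
}
```

Running intermediate_size . before cleanup_intermediates . shows how much space is about to be freed.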