Saturday, 29 June 2013

Goodbye Google Reader; Hello Feedly

Exit Google Reader

It's no news that Google Reader is being shut down. Like many Reader-users (Readers? Rusers? Hmmm...), I've left it close to the last possible minute to find a replacement. But I've done so and here's a summary of my choice.

In defence of those who've put off the move, part of the problem, initially, was the lack of really Reader-like RSS readers. Since the announcement, though, many services have added features and functionality to better mimic our departing favourite. Which is great, because we like Google Reader, and all we really want is something just like it.

Enter Feedly

I've settled on Feedly. To be honest, I haven't done much experimentation with other services, but Feedly has been updating to fit my Reader-shaped requirements closely. Firstly, I wanted access to my RSS feeds on a webpage. (Feedly only fulfilled this quite recently.) This ruled out readers that install local clients or extensions. Secondly, I wanted an Android app, which ruled out iOS-exclusives like Reeder. And finally, I just wanted something that looked like Reader, so no Pulse. That is, one article per line. I don't want images all over my feeds. Not until I open an article, anyway.

Hence Feedly. It can be configured into an almost-perfect replica of Google Reader. In fact, the only UI difference I've found so far is that pressing n moves to the next article in the list and marks it as read, which Google Reader didn't do. But that's okay. I can get used to that.

The transition wasn't perfectly seamless either, but that may have been my own doing. I was using Feedly on my phone to read from my Google Reader account. When I transferred the feeds on my PC, I suspect I opened a second Feedly account, so for a few days the two devices weren't actually reading the same feeds. I think I've fixed this, but watch out for your own growing pains. Even if the move is a bit rough, you don't have much choice, since Google Reader is cutting us off anyway.

RSS is still useful

Finally, while we're here, I'd like to point out to anyone asking (is anyone asking?) that I still find RSS very useful. While New Scientist's and Scientific American's current stories can tumble down a Twitter or Google+ feed without my having to worry that I've missed some critical information, there are many things for which I like to be sure I've seen every release. For my own enjoyment, this includes, say, webcomics. I like to read every xkcd, whether I'm coming into the office as usual or computer-less on the Baltic Sea for a week. More notable, however, are journal articles. Want to make sure you miss nothing from your favourite scientific journals (or the arXiv preprints)? Subscribe to the RSS feed! That way, an article only goes away when you move past its title in your feed.

Have you moved on to a new RSS reader? Feedly or some other? Or have you finally abandoned RSS entirely? Let me know.

Tuesday, 4 June 2013

Parallel IPython

For a few months now, I've been using IPython to do a heavy but embarrassingly parallel calculation. I finally decided to work out how to use IPython's parallel computing mechanisms to do the job several times faster. Here's a summary of my routine to make the parallel calculation. Most of this can be found in the IPython documentation but I'll mention a few extra points I noted.

Starting the IPython cluster

To do parallel calculations, IPython needs to run a number of engines, which it calls on from the interface to do the heavy lifting. These are started with

ipcluster start -n 4

where here, for example, 4 engines will be started. My quad-core processor is hyper-threaded, so the OS actually sees 8 logical cores; I usually run 6 engines.

This command must be left running alongside your IPython session. You can, for example, run it in a different terminal or send it to the background of the same terminal, either by appending & to the command or by pressing Ctrl-Z and then typing bg. I tend to run it in a separate terminal tab and send it to the background.

When the time comes to stop the engines, you can either bring the ipcluster job to the foreground and abort (Ctrl-C) or type

ipcluster stop

Initializing the clients in IPython

Now, in your instance of IPython, you need to import IPython's parallel client module.

from IPython.parallel import Client

Then we can create a client object that has access to the engines you started with the ipcluster command.

c = Client()
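
As a quick sanity check, the client's ids attribute lists the engines it has found:

print(c.ids)  # e.g. [0, 1, 2, 3] with four engines running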

We aren't quite ready to start calculating. From the documentation,
The two primary models for interacting with engines are:
  • A Direct interface, where engines are addressed explicitly.
  • A LoadBalanced interface, where the Scheduler is trusted with assigning work to appropriate engines.
I use the LoadBalanced interface because it decides on the most efficient way to assign work to the engines. The interface object provides its own map function, which works like the built-in map function but farms the calls out to the engines in parallel. To create the interface, type

lbv = c.load_balanced_view()

We also need to put the view into blocking mode with the following command.

lbv.block = True
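
Blocking mode makes map wait for the engines and return the results directly; without it, map returns an asynchronous result object that you query later. A toy illustration (not part of the actual calculation):

lbv.block = False
async_result = lbv.map(lambda x: x**2, range(4))  # returns an AsyncMapResult immediately
print(async_result.get())                         # [0, 1, 4, 9], once the engines are done

lbv.block = True                                  # back to blocking mode for the rest of this post
print(lbv.map(lambda x: x**2, range(4)))          # waits and returns [0, 1, 4, 9] directly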

At this point, you could start calculating, if you have work that doesn't depend on having any data or any of your own functions. For example, try

lbv.map(lambda x:x**10, range(32))

In reality (or, at least, my reality), I need to make calculations that involve my own functions and data and there's a bit more to do to make all that work.

Preparing the engines

I think of the engines as fresh IPython instances that haven't issued any import commands or defined any variables or anything. So I need to make those imports and define those variables on each of them.

There are two ways to import packages. The first, which I use, boils down to telling the engines to issue the import command themselves. For example, to import NumPy,

c[:].execute('import numpy')

Alternatively, you can enter

with c[:].sync_imports():
    import numpy

I'm not aware of either method being preferred.
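
Either way, a quick check that the import actually took on every engine is to run a statement remotely and read the result back (indexing the view pulls a variable from each engine, just as assigning to it pushes one, as shown below):

c[:].execute('numpy_version = numpy.__version__')
print(c[:]['numpy_version'])  # a list with one version string per engine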

To define variables, we could use the execute function above. But that might get painful for complicated expressions like list comprehensions. Much better is to assign the variable directly into the engines' global namespaces by indexing the view. For a variable my_var defined in the local IPython instance, enter

c[:]['my_var'] = my_var
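
If you have several variables to send, you can push them all at once with a dictionary via the push method, which is what the indexing syntax wraps. (Here, my_other_var is just a stand-in for a second local variable.)

c[:].push(dict(my_var=my_var, my_other_var=my_other_var))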

Calling your own functions

My work originally used a function with a call signature something like

output = my_fun(var1, var2, var3, list_of_arrays1, list_of_arrays2, list_of_arrays3, constant)

I couldn't figure out how to make this play nice with the map command, so I re-organized the function in two ways. First, I pre-processed my data in such a way that the last constant was no longer necessary. I was lucky that this was very easy. (In fact, I should've done it before, because it removed a list comprehension from the innermost loop.) Second, I combined the lists with zip and had the function unpack them once called. So I then had a call signature

output = my_package.my_fun(var1, var2, var3, zipped_up_arrays)

Finally, I invoked the parallel calculation with

output = lbv.map(lambda x: my_package.my_fun(var1, var2, var3, x),
                 zip(list_of_arrays1, list_of_arrays2, list_of_arrays3))

Et voila! My calculation was done vastly faster.
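
To make the shape of that refactor concrete, here's a toy, self-contained version of the same pattern. The function, data and arithmetic are made up; only the structure of the call matters.

import numpy

def toy_fun(scale, offset, arrays):
    a1, a2 = arrays                  # unpack one zipped pair of arrays
    return scale * a1 + offset * a2  # stand-in for the real per-item work

c[:]['toy_fun'] = toy_fun            # the engines need the function in their namespaces too

list1 = [numpy.arange(5.0) for i in range(8)]
list2 = [numpy.ones(5) for i in range(8)]

output = lbv.map(lambda x: toy_fun(2.0, 0.5, x), zip(list1, list2))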

The only problem...

...is that there seems to be a memory leak somewhere in ipcluster or the engines themselves. The result is that I kill the engines once in a while and re-initialize the client and interface objects before I run out of memory. Apparently this is a known problem that can be circumvented by manually clearing the client and interface objects' cache

lbv.results.clear()
c.results.clear()
c.metadata.clear()

but I generally haven't found that this helps at all.

Have you used IPython's parallel routines? See something silly I'm doing? Let me know in the comments!