PubSub

Posted on by Chris Warburton

My PubSub library is starting to show promise. The biggie is that it now has a (probably rather dodgy, but working) asynchronous callback system to handle replies.

A synchronous system would look something like the following:

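from urllib.request import urlopen

def get_thing():
    # Blocks here until the whole file has been downloaded
    thing = urlopen("http://www.example.com/movie.avi").read()
    return thing

mything = get_thing()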

This is the regular way of doing things, but it suffers a big problem in situations where the "thing" takes a while to get. Prime examples are on the Internet, where this would make the entire program freeze until "get_thing" has downloaded a "thing", which is then given the name "mything". Another area where synchronous systems aren't well suited is graphical user interfaces (GUIs). In a GUI any button can be pressed, or any other action taken, at any time, making the following code pretty bad:

while running:
    button1_clicked = button1.check()
    button2_clicked = button2.check()
    ........
    buttonN_clicked = buttonN.check()
    time.sleep(0.1)

This checks every button, ten times every second, to see whether it has been clicked. This is wasteful, inefficient, ugly, hard to maintain, etc.

Asynchronous systems, on the other hand, are well suited to these situations. Instead of directly running code, an asynchronous program DEFINES what WILL be run in a given situation, then periodically checks for events using a 'main loop'. The further down the stack this loop is the better; in embedded system-on-chip devices it is actually hard-wired into the electronics as interrupts. If the main loop finds that an event has occurred then it checks what type of event it is and runs whatever has been defined to run on that event. For example, the Internet code written above could instead be done with this:

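results = {}

def assign_thing(name, thing):
    # Rebinding a local name would be lost on return, so store the result in a dict
    results[name] = thing

def get_thing():
    # Pseudocode: start_download stands for any non-blocking download which
    # calls the given function with the data once it has finished
    start_download("http://www.example.com/movie.avi",
                   lambda thing: signal("finished download", thing))

# Pass a function to be called later, rather than calling it straight away
attach_to_signal("finished download", lambda thing: assign_thing("mything", thing))
get_thing()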

while running:
    check_signals()

It seems rather over the top, and indeed it is for such a simple task, but it scales much more easily than synchronous code, and is pretty efficient: the synchronous example had to wait, doing nothing, until the download finished, whereas this asynchronous example can be doing anything else it likes during the download, safe in the knowledge that the correct code will be run as soon as the download finishes.
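To make the idea concrete, here is a minimal, runnable sketch of such a signal system. It is only an illustration: the in-memory queue, the "results" dictionary and the fake download data are all assumptions made for the example, and the function names are just borrowed from the pseudocode above.

```python
from collections import deque

# Hypothetical in-memory signal system; names mirror the pseudocode above
handlers = {}      # signal name -> list of functions to call
pending = deque()  # signals emitted but not yet dispatched

def attach_to_signal(name, func):
    """Register func to be called whenever the named signal is dispatched."""
    handlers.setdefault(name, []).append(func)

def signal(name, *args):
    """Queue a signal; nothing runs until the main loop checks for it."""
    pending.append((name, args))

def check_signals():
    """Dispatch every queued signal to its handlers; return how many were handled."""
    count = 0
    while pending:
        name, args = pending.popleft()
        for func in handlers.get(name, []):
            func(*args)
        count += 1
    return count

results = {}
attach_to_signal("finished download", lambda thing: results.update(mything=thing))

# Stand-in for the download completing; a real program would emit this
# from whatever code notices that the transfer has finished
signal("finished download", b"fake movie data")

check_signals()  # one pass of the main loop
print(results["mything"])
```

Note that emitting a signal only queues it. The attached function doesn't run until the main loop next calls check_signals, which is what leaves the program free to do other work in between.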

Here's an asynchronous version of the button checker from above:

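clicked = {}

def click(name):
    # Record the click in shared state; rebinding an argument would have no effect
    clicked[name] = True

# Pass functions to be called later, not the result of calling click() now
button1.attach("clicked", lambda: click("button1"))
button2.attach("clicked", lambda: click("button2"))
...
buttonN.attach("clicked", lambda: click("buttonN"))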

while running:
    check_mouse()

In this version the only thing which is checked again and again is the mouse, rather than each and every button. The buttons only enter the picture when something gets clicked, at which point the source is determined and the appropriate value is set.
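As a rough sketch of how "the source is determined", the following treats each button as a named rectangle and hands a click to whichever button contains it. The Button class, its geometry and dispatch_click are all hypothetical, made up for this example rather than taken from any real toolkit.

```python
# Hypothetical button: a named rectangle plus the functions attached to its events
class Button:
    def __init__(self, name, x, y, width, height):
        self.name = name
        self.x, self.y, self.width, self.height = x, y, width, height
        self.callbacks = {}

    def attach(self, event, func):
        self.callbacks.setdefault(event, []).append(func)

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def emit(self, event):
        for func in self.callbacks.get(event, []):
            func()

def dispatch_click(buttons, px, py):
    """Find the button under the click, if any, and run its "clicked" callbacks."""
    for button in buttons:
        if button.contains(px, py):
            button.emit("clicked")
            return button.name
    return None

clicked = {}
buttons = [Button("button1", 0, 0, 100, 30), Button("button2", 0, 40, 100, 30)]
for b in buttons:
    # Bind the name as a default argument so each callback keeps its own button
    b.attach("clicked", lambda name=b.name: clicked.update({name: True}))

hit = dispatch_click(buttons, 10, 50)     # a click inside button2
miss = dispatch_click(buttons, 500, 500)  # a click outside every button
```

The buttons are only looked at when a click actually happens, and only the button that was hit runs any code.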

Asynchronous programs are generally more difficult to write than synchronous ones, since events have to be set up, namespaces and arguments often have to be carefully thought out, a system for passing notifications around the program is needed, etc.

So, my library uses XMPP, which involves sending messages over the Internet. That means asynchronous programming is the best bet, so I'm not wasting time waiting for messages to arrive. In fact, XMPP itself is an asynchronous notification system (that's the entire point!): you only get messages when someone sends one to you, rather than constantly polling in the hope that eventually there might be something to get (as, for instance, a news reader or an email client does).

xmpppy, the library I'm using for sorting out the low-level XMPP stuff, is also asynchronous. It can be given a message handler, an iq (query) handler and a disconnect handler, each of which is run when its respective event occurs.

On top of the asynchronous XMPP and xmpppy I have built my own asynchronous layer, designed for handling any incoming replies to sent queries. This allows users of the library to request things (like, "Can I have a list of all people subscribed to a node please?") and then get on with other things instead of sitting idle waiting for a reply.

On top of this custom asynchronous layer I've added ANOTHER asynchronous layer which allows the users of the library to run arbitrary parts of their application after replies have been handled. This is separate since I want to make sure that using the library doesn't restrict the technologies that can be used (so, for example, it can be integrated into the main loops for GTK, Qt and others).

These external asynchronous calls are merely notifications at the moment, without payloads (so they say "The list of subscribers has arrived" rather than "Here is the list of subscribers"). The payloads instead show up as updates to an XML tree stored in the PubSubClient instance. This means that when a "The list of subscribers has arrived" function is run, the application knows that it can find the subscribers it asked for at a certain level of the tree (for example server_name/top-level_node/next-level_node/...../requested_node/subscriptions). Payload support MAY be added, depending on whether I can make it generic enough.
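Here is a small sketch of that "notification without payload" pattern, using Python's standard xml.etree.ElementTree. The tree layout, the element and attribute names and the helper functions are all assumptions made for illustration, not the library's actual structure or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tree held by a PubSubClient instance
tree = ET.Element("servers")

def ensure_path(root, names):
    """Walk down the tree by element name, creating missing elements on the way."""
    element = root
    for name in names:
        child = element.find(name)
        if child is None:
            child = ET.SubElement(element, name)
        element = child
    return element

def store_subscribers(path, jids):
    """What a reply handler might do: write the payload into the tree."""
    subscriptions = ensure_path(tree, path + ["subscriptions"])
    for jid in jids:
        ET.SubElement(subscriptions, "subscriber", jid=jid)

def subscribers_arrived(path):
    """A payload-free notification: the application reads the tree itself."""
    found = tree.find("/".join(path + ["subscriptions"]))
    return [s.get("jid") for s in found.findall("subscriber")]

path = ["server_name", "top_node", "requested_node"]
store_subscribers(path, ["alice@example.com", "bob@example.com"])
print(subscribers_arrived(path))
```

The notification function only needs to know where to look in the tree. The data itself was put there before the notification ran.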

So, what does this mean, I hear you ask? Well, I have written the first GUI application to make use of my library (using GTK, which is an asynchronous graphical toolkit :P ). It simply draws a GTK window, puts a vertical container inside, requests the nodes from a server, then for every node it gets in reply it adds a label to the vertical container with the node's name. This was pretty straightforward, and works in a regular pygtk way (no particularly warped programming needed to fit around the library), which is a good sign.

The typical flow of the library is something like this:

The application runs a library method, giving it a function to run when the reply arrives.

The library writes an appropriate XML stanza to do what the application has asked, defines a function which puts information from the reply into the main XML tree, and passes both, along with the application-supplied function, to the "send" function.

"send" stores the application-supplied function in a dictionary of callbacks, its key being the stanza's unique ID, and stores the method-defined reply handler in a similar way in a dictionary of handlers (at the moment each function is first placed into a singleton list, and these lists are then inserted into the dictionaries, although Python dictionaries can hold functions directly, so the wrapping isn't strictly needed). It then tells xmpppy to send a text version of the XML off to its destination.

The other end does whatever is required to meet the spec and sends a reply back.

xmpppy receives the reply and runs the handler function.

This function extracts the stanza ID and uses it to get the handler and callback functions from their dictionaries.

The handler is run, being passed the stanza and the callback function. It writes the contained data to the main XML tree then runs the callback function.

The callback function does whatever the application programmer told it to, probably updating the application.

The important point is that all of the way through that chain, the application has been free to work normally and not freeze.
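The chain above can be sketched in a few lines. This is a simplified illustration, not the library's code: stanzas are modelled as plain dictionaries, the "network" is a list, the main XML tree is a dictionary, and every name here (send, get_subscribers, reply_received and so on) is hypothetical.

```python
import itertools

callbacks = {}   # stanza ID -> application-supplied function
handlers = {}    # stanza ID -> library-defined reply handler
outbox = []      # stand-in for the network: stanzas "sent" so far
tree = {}        # stand-in for the main XML tree
_ids = itertools.count(1)

def send(stanza, handler, callback):
    """Remember both functions under the stanza's ID, then send the stanza."""
    stanza_id = str(next(_ids))
    stanza["id"] = stanza_id
    handlers[stanza_id] = handler
    callbacks[stanza_id] = callback
    outbox.append(stanza)
    return stanza_id

def get_subscribers(node, callback):
    """Library method: build the request and define how to handle its reply."""
    def handler(reply, callback):
        tree[node] = reply["subscribers"]  # record the data first
        callback()                         # then notify the application
    return send({"type": "get", "node": node}, handler, callback)

def reply_received(reply):
    """Run when a reply arrives: look up both functions by ID and run the handler."""
    handler = handlers.pop(reply["id"])
    callback = callbacks.pop(reply["id"])
    handler(reply, callback)

notified = []
request_id = get_subscribers("news", lambda: notified.append(tree["news"]))
# ... the application carries on with other work here ...
reply_received({"id": request_id, "subscribers": ["alice@example.com"]})
```

Between the call to get_subscribers and the call to reply_received nothing is blocked. The two dictionaries are the only record that a reply is still expected.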

Anyway, I'll continue programming later, since it's currently 5:30AM :P