Josh Haas's Web Log

Marrying Boto to Tornado: Greenlets bring them together


This one is for the techies.

I’m doing web app development for my startup KeywordSmart using Tornado, the pure-Python web server released by the Facebook folks.

Tornado is one of those super-trendy event-driven / non-blocking things that are becoming all the rage (like Node.js and all). The concept is, instead of the traditional thread-per-HTTP-connection approach, you have a single thread that asynchronously interacts with each request, allowing potentially thousands of simultaneous connections to a single server. Think of it as a Las Vegas blackjack dealer, whirling from one player to the next, touching each card only for an instant. (Or, you know, you could visualize it as a, like, tornado).

Actually, I don’t really use Tornado for its non-blocky awesomeness, I use it for the get out of deployment-hell free card. But, if I’m gonna add it to my tech stack, I do want my money’s worth — I’ll take the event loop too, thank you very much!

The problem is, asynchronous is all fine and good as long as you’re inside your own tech stack, but as soon as you start looking for 3rd party tools, the brutal realization sets in that most of the world still runs sync. You can mix synchronous with asynchronous by spinning up a thread pool alongside your main event loop, and delegating blocking operations to the pool to keep your main loop running briskly, but if you call a blocking operation in the majority of your methods, at that point you’ve given up a big chunk of the asynchronous performance advantage.
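
Here’s a rough sketch of that thread-pool delegation, to make the trade-off concrete. It is not code from my app: it uses the standard library’s asyncio loop as a stand-in for Tornado’s event loop, and blocking_fetch, handle_request, serve_all and run_demo are all hypothetical names invented for the illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(key):
    """Hypothetical stand-in for a synchronous library call (e.g. a boto request)."""
    time.sleep(0.05)  # simulate network latency
    return "value-for-" + key


async def handle_request(pool, key):
    loop = asyncio.get_running_loop()
    # The event loop stays free while a worker thread sits in blocking_fetch.
    return await loop.run_in_executor(pool, blocking_fetch, key)


async def serve_all(keys, workers):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return await asyncio.gather(*(handle_request(pool, k) for k in keys))


def run_demo():
    start = time.perf_counter()
    results = asyncio.run(serve_all(["a", "b", "c", "d"], workers=2))
    return results, time.perf_counter() - start
```

With two workers and four requests, the requests are served in two batches: once every handler makes a blocking call, the number of requests in flight is capped by the size of the pool rather than by the event loop.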

I use Amazon Web Services’ S3 and SimpleDB for the majority of my data storage right now. You access them over HTTP, so theoretically you could use Tornado’s built-in asynchronous HTTP client to make the requests, thus keeping your code async-kosher. However, life is way, way, way too short to parse SOAP, and Amazon’s documentation, while thorough, is not fun. Luckily, the good folks behind the boto project have done the painstaking work for us: boto provides a comprehensive Python library for accessing all facets of AWS in a nice, object-oriented way. But (you guessed it) boto is strictly synchronous. It’s bound to the HTTP facilities that come with the Python standard library, and there’s therefore no way to make non-blocking requests with it.

So what to do? It looks like a couple people have considered porting Boto to use Tornado’s HTTP facilities, and / or writing a new AWS interface from scratch, but there’s nothing even most-of-the-way complete that I was able to find. And I just have no desire or time to do a project like that myself.

It would be really cool if there were a way to just use Boto, as is, and somehow have it use Tornado’s HTTP classes instead of Python’s. Actually, this is what I did! It took a little hacking around, and the magic of greenlets, but I now have a working proof-of-concept. Let’s walk through how I got there….

(Update 1/21/12: Simon from MoPub packaged the greenlet-Tornado interaction piece of this as an easy-to-use decorator: https://github.com/mopub/greenlet-tornado)

It’s relatively easy to switch out the class that Boto calls to make HTTP connections. Boto’s insert-name-of-service-Connection objects call self.get_http_connection() to fetch the standard Python HTTPConnection, so all you need to do is subclass the Boto classes and replace get_http_connection with a method that returns a mock HTTPConnection, and you’re in.

import boto.s3.connection

class AsyncConnectionMixin(object):
    def get_http_connection(self, host, is_secure):
        # ...return my connection object...
        raise NotImplementedError("return your own connection object here")

# just like a normal S3Connection, but you've inserted your trojan horse!
class AsyncS3Connection(AsyncConnectionMixin, boto.s3.connection.S3Connection):
    pass

# this is how you use it:
conn = AsyncS3Connection('my key', 'my secret')

(For working code that shows how to do this, check out this project which I discuss further down.)
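
To see why the mixin wins, here’s a minimal sketch that runs without boto installed. FakeLibraryConnection is a hypothetical stand-in for a boto connection class (it is not boto’s real code), and the strings it returns are placeholders for actual connection objects.

```python
class FakeLibraryConnection(object):
    """Hypothetical stand-in for a boto connection class; not boto's real code."""

    def __init__(self, key, secret):
        self.key = key
        self.secret = secret

    def get_http_connection(self, host, is_secure):
        return "blocking connection to %s" % host

    def make_request(self, host):
        # The library asks *itself* for a connection, so an override is honored.
        return self.get_http_connection(host, is_secure=True)


class AsyncConnectionMixin(object):
    def get_http_connection(self, host, is_secure):
        return "async connection to %s" % host


# The mixin must come first in the base list so it is found first in the MRO.
class AsyncFakeConnection(AsyncConnectionMixin, FakeLibraryConnection):
    pass


conn = AsyncFakeConnection("my key", "my secret")
result = conn.make_request("example.invalid")
```

The base-class order matters: if the library class were listed first, its own get_http_connection would be found first and the override would be silently ignored.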

So far, all good. But now we’re at the hard part. Boto expects that when it calls the getresponse() method on the HTTPConnection object, that method will block until the HTTP request is complete and the method can return the results. Which is precisely what you don’t want to happen.

Inside Boto, there’s code that looks like this (faked for dramatic effect):

def fetch_me_some_s3_data_please(some_params):
    my_raw_xml = HTTPConnection.getresponse(build_request(some_params))
    the_data = parse_me_some_soap(my_raw_xml)
    return the_data

What we really need to do is freeze this function mid-execution: we want it to call getresponse(), feed the request to our asynchronous tornado library, and then we want to hibernate the function until we get the data back again, at which point we want this function to pick up where it left off.

Can we do that? No, and yes. No based on Python’s built-in capabilities. But yes, with the installation of an easy C extension. Before we go there, though, let’s think for a minute about why we can’t do it natively. There are a couple Python language features that would seem to be potential candidates for pulling it off:

  • Generators: this is the canonical way in Python to freeze a function mid-execution and resume it later. A generator function, instead of producing a return value, produces an object that can both yield and accept input. The only problem is, you can’t turn a function into a generator after it has been written: it has to be a generator from the start. And generators can only yield control directly up the call stack: the function that calls a generator has to be aware that that’s what it is dealing with. We control the code at the top of the call stack, and we control the code at the bottom, but boto sits in the middle, and that’s the code we need to change to use generators successfully. (In one desperate moment, I considered writing a utility that would traverse the AST and automatically transform target code from function calls to trampolining. And then I returned to sanity).
     
  • Threads: why not just launch the boto call on a separate thread, then freeze the thread? Unfortunately, Python threads are high overhead, and I wasn’t able to figure out a way of “freezing” them that saved their state while releasing their resources (if there’s a way of doing it, please chime in in the comments, because that would be awesome). I’m anticipating potentially thousands of open outbound connections simultaneously, and in my unscientific experiments, the Python interpreter started erroring around the creation of thread #700.
     
  • Stack frame inspection: I was intrigued by the possibilities of the inspect module, specifically the fact that it lets you change the currently executing line of code. However, it turns out that that’s all it lets you do — you can’t jump around on the call stack to an arbitrary location, which is what I want. (I can imagine writing a utility that inspects the current call stack, saves the values of all the local variables, then systematically recreates it later by re-running the code and advancing the line pointer directly to each function call until you’re back where you started — but this idea is right next to the auto-trampolining idea in my “totally insane” bucket)
     
  • Exceptions: this is the other vanilla Python mechanism for subverting the call stack. By throwing exceptions, we can jump from our code in the HTTPConnection all the way at the bottom of the call stack up to our application code at the top, skipping over boto (assuming it doesn’t have blanket try/except statements). This would allow us to pause boto’s execution while we launch our asynchronous request. But unfortunately, there’s no mechanism for jumping back down the call stack to the place where you initially threw the exception. Well, there is one way: the hard way. Yes, this guy’s code does exactly what you think it does: it calls the original boto method, throws exceptions to break out of the HTTP request, and then re-calls and keeps re-calling the method until every HTTP request necessary for its execution has completed. He has a working proof of concept, and for that, I salute him, but on my personal “is it brave or stupid?” scale, I have to lean towards stupid, at least for production code: there are just too many ways that irregularities in the boto codebase could break this technique, from over-aggressive try/except statements to side-effects that cause weirdness when you call the same method twice.
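
To make the generator limitation from the first bullet concrete, here’s a small sketch. All the names are hypothetical: bottom plays the role of our HTTP code, the two middle functions stand in for library code like boto, and drive is the application at the top. It uses yield from, which needs Python 3.3 or later.

```python
def bottom():
    """Our code at the bottom of the stack: wants to pause for I/O."""
    response = yield "request"  # pause here, resume when a response is sent in
    return "parsed " + response


def middle_unaware():
    """Stand-in for library code we can't edit: an ordinary function."""
    result = bottom()  # only creates a generator object; nothing has run yet
    return result


def middle_cooperating():
    """What the library would have to look like for generators to work."""
    result = yield from bottom()
    return result


def drive(gen):
    """Top of the stack: run a generator, answering each request it yields."""
    try:
        request = next(gen)
        while True:
            request = gen.send("response to " + request)
    except StopIteration as stop:
        return stop.value
```

Calling middle_unaware() hands back an unstarted generator instead of parsed data, while drive(middle_cooperating()) works only because the middle layer was rewritten to pass the pause along. That rewrite is exactly what we can’t do to a third-party library.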

So with all hope exhausted, I finally stumbled across the greenlet library. What are greenlets, you ask? They’re pieces of 100% carbon-neutral zero emission byte chunks that… oh wait, no, they’re actually a Python implementation of true coroutines. Unlike generators, greenlets can yield control to any arbitrary location in your code; whereas generators still operate in the paradigm of a single execution stack per thread, the greenlet extension introduces multiple parallel stacks. In other words, they are exactly what we need to solve this problem!

Greenlets work as follows: there’s a “master” greenlet that consists of the original call stack when you start your code, and you can create “child” greenlets that have their own, parallel call stacks, by creating a greenlet object, giving it a function to serve as the start of the call stack, and telling it to start. Unlike threads, only one greenlet is running at a time: you explicitly yield control via the switch() method, which you call on the object that represents the greenlet you want to switch to. switch() works a lot like the built-in yield: you can send values out of the greenlet, and when control is returned, values can be passed back in. But unlike yield, which passes control up the call stack to whatever function invoked the next iteration of the generator, switch() lets you target any greenlet you want, which means, crucially, that you can use it to put a call stack on hold for an arbitrary amount of time. Also, unlike yield, you can use switch() alongside standard return statements, which means that a parent function calling a greenlet-enabled child doesn’t need to know that anything special is going on: from its vantage point, it called a child, and got a return value as normal.
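
A minimal sketch of that switching behavior, assuming the greenlet package is installed. The names child_task and log are made up for the illustration.

```python
import greenlet

log = []


def child_task(first_value):
    log.append("child started with %s" % first_value)
    # Hand a value to the parent and pause here until someone switches back in.
    second_value = greenlet.getcurrent().parent.switch("child paused")
    log.append("child resumed with %s" % second_value)
    return "child finished"  # a plain return also hands control to the parent


child = greenlet.greenlet(child_task)
from_child = child.switch("hello")  # starts child_task("hello")
log.append("parent got %s" % from_child)
final = child.switch("world")  # resumes the child where it paused
log.append("parent got %s" % final)
```

The first switch() passes its argument in as the function’s argument; later ones become the return value of the switch() call the child is parked on. When the child returns, its return value comes out of the parent’s switch() call and the greenlet is marked dead.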

Let’s see how this works in practice to make boto asynchronous. The first step is to wrap your web-method in a greenlet:

import greenlet
import tornado.web

class MyApp(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        def business_logic():
            # ...do whatever work needs to be done, including making calls to boto...
            self.write("...stuff that gets returned to client...")
            self.finish()  # end the asynchronous request
        gr = greenlet.greenlet(business_logic)
        gr.switch()

The call to gr.switch() will start up your greenlet running the function business_logic. Since at this point we haven’t modified boto, business_logic will run through to the end, blocking on all the HTTP calls, and will then write its response to the client. When business_logic finishes, gr becomes “dead”, and control switches to the parent of gr, which is our original call stack, at the point we switched away: i.e., back to the line following gr.switch(). In other words, as written, the code above will run just as if you had called business_logic normally: the code will run, it will block on the boto requests, and then it will return the results to the client.

Our goal, however, is to have the “post” function terminate at the point we fire the boto requests, and then have a callback added to Tornado’s event loop once the relevant data is fetched from the server. So at this point, we inject our custom HTTP handler into boto as described above. Here’s the code for the class we pass to boto in lieu of the HTTPConnection that it is expecting:

import tornado.httpclient
import tornado.ioloop
import greenlet

class AsyncHttpConnection(object):
    def __init__(self):    #boring  
        self.host = None
        self.is_secure = None
        
    def request(self, method, path, data, headers): #boring
        self.method = method
        self.path = path
        self.data = data
        self.headers = headers     

    def getresponse(self):
        # this is the method boto calls to get the result of the request...
        # this is where we do our thing

        # prepare the request for Tornado's http client
        http_client = tornado.httpclient.AsyncHTTPClient()
        if self.is_secure:
            scheme = "https"
        else:
            scheme = "http"
        url = "%s://%s%s" % (scheme, self.host, self.path)
        request = tornado.httpclient.HTTPRequest(url, self.method, self.headers,
                                                 self.data or None)
        
        #Find the greenlet that is currently executing -- 
        #this should be the one we created in MyApp.post() above
        gr = greenlet.getcurrent()
        #Create the callback function to be fired when Tornado gets the 
        #results of the request back
        def callback(tornado_response):
            #see https://github.com/almost/asyncboto for the AsyncHttpResponse class: 
            #it's just a dummy class we used to coerce the response into
            # something that looks like the response that Boto expects
            response = AsyncHttpResponse(tornado_response.code, "???", 
                              tornado_response.body, tornado_response.headers)
            #resume our current greenlet, passing in the response
            gr.switch(response)
        #fire off the http request, with the callback we just created
        http_client.fetch(request, callback)
        #now, yield control back to the master greenlet, and wait for data to be sent to us
        response = gr.parent.switch()
        #hand the data back to boto
        return response

This takes a little staring at to really wrap your head around the control flow. When we hit gr.parent.switch(), this sends us back to the master greenlet, which is, at least the first time we enter getresponse(), in MyApp.post(). MyApp.post() returns, causing Tornado to move on and handle the next thing on its event loop. Then later, Tornado receives the results of the HTTP request, and fires the callback. The callback calls gr.switch(response), which re-activates getresponse() where we left off: “response = gr.parent.switch()”. The switch() passes the response through, it gets assigned to the variable, and then it gets returned to boto, which commences processing of it. Boto then returns the result back up the call stack to business_logic(), which writes the response to the client and calls self.finish(), ending the request.

Follow that? It gets mildly more complicated when there are multiple HTTP calls inside business_logic(): each subsequent time we call gr.parent.switch(), the code that gets resumed is actually the code in the callback() function, which is now the bottom layer of the master greenlet’s stack. But confusing as it is behind the scenes, from the perspective of you writing your application code in the business_logic function, it just works: you code in a synchronous style, but you automagically get the performance characteristics of asynchronous. Yay!
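
If the control flow is still hard to picture, here’s a toy version you can run without Tornado or boto, assuming only the greenlet package. Everything in it is a hypothetical stand-in: pending is a deque playing the event loop, fake_async_fetch plays AsyncHTTPClient.fetch, and trace records the order in which things happen.

```python
import collections

import greenlet

pending = collections.deque()  # stand-in for Tornado's event loop queue
trace = []


def fake_async_fetch(url, callback):
    """Hypothetical non-blocking fetch: queue the callback, don't call it yet."""
    pending.append(lambda: callback("body of " + url))


def getresponse(url):
    """Looks blocking to its caller, but actually yields to the parent greenlet."""
    gr = greenlet.getcurrent()
    fake_async_fetch(url, gr.switch)  # the callback resumes this greenlet
    trace.append("waiting for " + url)
    return gr.parent.switch()  # pause until the callback fires


def business_logic():
    first = getresponse("/a")
    second = getresponse("/b")
    trace.append("done: %s, %s" % (first, second))


def handle_request():
    greenlet.greenlet(business_logic).switch()
    trace.append("handler returned")


def run_loop():
    while pending:
        pending.popleft()()  # fire one queued callback
        trace.append("loop iteration finished")


handle_request()
run_loop()
```

The handler returns while the first request is still outstanding, and the second pause hands control back to the loop from inside the first callback, which is the “mildly more complicated” case.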

The code above is actually a simplified version of what I use: I like to wrap my business logic in timing code to make sure that the main event loop runs nippily, and exception-handling code to log bugs and send appropriate feedback to the clients. I omitted that for clarity, since it further obscures the control-flow. The thing to remember is that you need to wrap said code around every call to gr.switch(): both the original call in MyApp.post() that starts the execution of business_logic(), and then the subsequent calls in callback() that resume the business logic. The results of that timing won’t be the total time from request arrival to request fulfillment (if you want that number, put the timer at the start and end of business_logic). Instead, it will be the time spent by the main Tornado thread executing your business logic before switching to the next request. That is the relevant metric for determining whether one of your functions is slowing down the event loop, and thus how many simultaneous connections you can take on before you need another server.
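
As a rough idea of what such a wrapper could look like (a sketch, not the code I actually run), here’s a helper that times and guards a single call. The name guarded_switch, the "loop" logger name and the 0.1 second threshold are all assumptions for the example; it takes any callable, so you would pass it gr.switch.

```python
import logging
import time


def guarded_switch(switch, *args, slow_threshold=0.1,
                   log=logging.getLogger("loop")):
    """Call switch(*args), measuring how long the main loop is held up."""
    start = time.perf_counter()
    try:
        return switch(*args)
    except Exception:
        # An exception raised inside a greenlet propagates to its parent,
        # so it surfaces here at the switch call.
        log.exception("business logic raised")
        raise
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > slow_threshold:
            log.warning("event loop blocked for %.3fs", elapsed)
```

In the handler you would write guarded_switch(gr.switch) in place of gr.switch(), and guarded_switch(gr.switch, response) inside callback(). Each call measures only the slice of work done before the greenlet next pauses or finishes.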

So to sum up, greenlets allow us to freeze and then resume execution of arbitrary code without incurring the prohibitive overheads of multithreading. This can be used to convert synchronous libraries to asynchronous — we used boto above, but we should be able to use this technique on any library that makes a call to an underlying resource that we can switch out for an asynchronous driver.

Written by jphaas

June 19th, 2011 at 11:37 pm
