Josh Haas's Web Log

Marrying Boto to Tornado: Greenlets bring them together

with 13 comments

This one is for the techies.

I’m doing web app development for my startup KeywordSmart using Tornado, the pure-Python web server released by the Facebook folks.

Tornado is one of those super-trendy event-driven / non-blocking things that are becoming all the rage (like Node.js and all). The concept is, instead of the traditional thread-per-HTTP-connection approach, you have a single thread that asynchronously interacts with each request, allowing potentially thousands of simultaneous connections to a single server. Think of it as a Las Vegas blackjack dealer, whirling from one player to the next, touching each card only for an instant. (Or, you know, you could visualize it as a, like, tornado).

Actually, I don’t really use Tornado for its non-blocky awesomeness; I use it for the get-out-of-deployment-hell-free card. But, if I’m gonna add it to my tech stack, I do want my money’s worth — I’ll take the event loop too, thank you very much!

The problem is, asynchronous is all fine and good as long as you’re inside your own tech stack, but as soon as you start looking for 3rd party tools, the brutal realization sets in that most of the world still runs sync. You can mix synchronous with asynchronous by spinning up a thread pool alongside your main event loop, and delegating blocking operations to the pool to keep your main loop running briskly, but if you call a blocking operation in the majority of your methods, at that point you’ve given up a big chunk of the asynchronous performance advantage.
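
For concreteness, here’s a rough sketch of that thread-pool escape hatch with Tornado. This isn’t code from my app: run_blocking, blocking_call, and on_done are made-up names, and on Python 2 you’d need the futures backport for ThreadPoolExecutor.

import functools
from concurrent.futures import ThreadPoolExecutor
import tornado.ioloop

executor = ThreadPoolExecutor(max_workers=8)

def run_blocking(blocking_call, on_done):
    #run the blocking callable on a worker thread, then deliver its result
    #back on the event loop via add_callback, which is safe to call from
    #other threads
    io_loop = tornado.ioloop.IOLoop.instance()
    def worker():
        result = blocking_call()  #this blocks a pool thread, not the event loop
        io_loop.add_callback(functools.partial(on_done, result))
    executor.submit(worker)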

I use Amazon Web Services’ S3 and SimpleDB for the majority of my data storage right now. You access them over HTTP, so theoretically you could use Tornado’s built-in asynchronous HTTP client to make the requests, thus keeping your code async-kosher. However, life is way, way, way too short to parse SOAP, and Amazon’s documentation, while thorough, is not fun. Luckily, the good folks behind the boto project have done the painstaking work for us: boto provides a comprehensive Python library for accessing all facets of AWS in a nice, object-oriented way. But — you guessed it — boto is strictly synchronous. It’s bound to the HTTP facilities that come with the Python standard library, so there’s no way to make non-blocking requests with it.

So what to do? It looks like a couple of people have considered porting Boto to use Tornado’s HTTP facilities, and/or writing a new AWS interface from scratch, but I wasn’t able to find anything even most-of-the-way complete. And I just don’t have the desire or time to do a project like that myself.

It would be really cool if there were a way to just use Boto, as is, and somehow have it use Tornado’s HTTP classes instead of Python’s. Actually, this is what I did! It took a little hacking around, and the magic of greenlets, but I now have a working proof-of-concept. Let’s walk through how I got there….

(Update 1/21/12: Simon from MoPub packaged the greenlet-Tornado interaction piece of this as an easy-to-use decorator: https://github.com/mopub/greenlet-tornado)

It’s relatively easy to switch out the class that Boto calls to make HTTP connections. Boto’s insert-name-of-service-Connection objects call self.get_http_connection() to fetch the standard Python HTTPConnection, so all you need to do is subclass the Boto classes and replace get_http_connection with a method that returns a mock HTTPConnection, and you’re in.

class AsyncConnectionMixin(object):
    def get_http_connection(self, host, is_secure):
        #hand back our stand-in for HTTPConnection (the AsyncHttpConnection
        #class defined further down), primed with the connection details
        connection = AsyncHttpConnection()
        connection.host = host
        connection.is_secure = is_secure
        return connection

#just like a normal S3Connection, but you've inserted your trojan horse!
class AsyncS3Connection(AsyncConnectionMixin, boto.s3.connection.S3Connection):
    pass

#this is how you use it:
conn = AsyncS3Connection('my key', 'my secret')

(For working code that shows how to do this, check out this project which I discuss further down.)

So far, all good. But now we’re at the hard part. Boto expects that when it calls the getresponse() method on the HTTPConnection object, that method will block until the HTTP request is complete and the method can return the results. Which is precisely what you don’t want to happen.

Inside Boto, there’s code that looks like this (faked for dramatic effect):

def fetch_me_some_s3_data_please(some_params):
    my_raw_xml = HTTPConnection.getresponse(build_request(some_params))
    the_data = parse_me_some_soap(my_raw_xml)
    return the_data

What we really need to do is freeze this function mid-execution: we want it to call getresponse(), feed the request to our asynchronous tornado library, and then we want to hibernate the function until we get the data back again, at which point we want this function to pick up where it left off.

Can we do that? No, and yes. No, if we rely only on Python’s built-in capabilities. But yes, with the installation of an easy C extension. Before we go there, though, let’s think for a minute about why we can’t do it natively. There are a couple of Python language features that would seem to be potential candidates for pulling it off:

  • Generators: this is the canonical way in Python to freeze a function mid-execution and resume it later. A generator function, instead of producing a return value, produces an object that can both yield and accept input. The only problem is, you can’t turn a function into a generator after it has been written: it has to be a generator from the start. And generators can only yield control directly up the call stack: the function that calls a generator has to be aware that that’s what it’s dealing with (there’s a short sketch of this limitation right after this list). We control the code at the top of the call stack, and we control the code at the bottom, but boto sits in the middle, and that’s the code we need to change to use generators successfully. (In one desperate moment, I considered writing a utility that would traverse the AST and automatically transform target code from function calls to trampolining. And then I returned to sanity).
     
  • Threads: why not just launch the boto call on a separate thread, then freeze the thread? Unfortunately, Python threads are high overhead, and I wasn’t able to figure out a way of “freezing” them that saved their state while releasing their resources (if there’s a way of doing it, please chime in in the comments, because that would be awesome). I’m anticipating potentially thousands of open outbound connections simultaneously, and in my unscientific experiments, the Python interpreter started erroring around the creation of thread #700.
     
  • Stack frame inspection: I was intrigued by the possibilities of the inspect module, specifically the fact that it lets you change the currently executing line of code. However, it turns out that that’s all it lets you do — you can’t jump around on the call stack to an arbitrary location, which is what I want. (I can imagine writing a utility that inspects the current call stack, saves the values of all the local variables, then systematically recreates it later by re-running the code and advancing the line pointer directly to each function call until you’re back where you started — but this idea is right next to the auto-trampolining idea in my “totally insane” bucket)
     
  • Exceptions: this is the other vanilla Python mechanism for subverting the call stack. By throwing exceptions, we can jump from our code in the HTTPConnection all the way at the bottom of the call stack up to our application code at the top, skipping over boto (assuming it doesn’t have blanket try-catch statements). This would allow us to pause boto’s execution while we launch our asynchronous request. But unfortunately, there’s no mechanism for jumping back down the call stack to the place where you initially threw the exception. Well, there is one way — the hard way. Yes, this guy’s code does exactly what you think it does: it calls the original boto method, throws exceptions to break out of the http request, and then re-calls and keeps re-calling the method until every HTTP request necessary for its execution has completed. He has a working proof of concept, and for that, I salute him, but on my personal “is it brave or stupid?” scale, I have to lean towards stupid, at least for production code: there are just too many ways that irregularities in the boto codebase could break this technique, from over-aggressive try-catch statements to side-effects that cause weirdness when you call the same method twice.
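
Here’s the short sketch I promised above of why generators alone can’t save us. It’s not real boto code (low_level_fetch and unaware_middle_layer are made-up names); the point is just that yield only suspends the generator it appears in, so an unaware caller in the middle of the stack gets a generator object instead of data:

def low_level_fetch():
    #this is what we'd like the bottom of boto's stack to do: pause here...
    response = yield "please make this HTTP request for me"
    #...and carry on once someone sends the response back in
    print("got %r" % (response,))

def unaware_middle_layer():
    #boto-shaped code in the middle of the stack: it just calls the function
    #and expects data back, but all it actually gets is a generator object
    return low_level_fetch()

result = unaware_middle_layer()
print(result)           #a generator object, not an HTTP response
request = next(result)  #to make any progress, every layer of the stack
                        #would have to drive the generator by hand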

So with all hope exhausted, I finally stumbled across the greenlet library. What are greenlets, you ask? They’re pieces of 100% carbon-neutral zero emission byte chunks that… oh wait, no, they’re actually a Python implementation of true coroutines. Unlike generators, greenlets can yield control to any arbitrary location in your code; whereas generators still operate in the paradigm of a single execution stack per thread, the greenlet extension introduces multiple parallel stacks. In other words, they are exactly what we need to solve this problem!

Greenlets work as follows: there’s a “master” greenlet that consists of the original call stack when you start your code, and you can create “child” greenlets that have their own, parallel call stacks, by creating a greenlet object, giving it a function to serve as the start of the call stack, and telling it to start. Unlike threads, only one greenlet is running at a time: you explicitly yield control via the switch() method, which you call on the object that represents the greenlet you want to switch to. switch() works a lot like the built-in yield: you can send values out of the greenlet, and when control is returned, values can be passed back in. But unlike yield, which passes control up the call stack to whatever function invoked the next iteration of the generator, switch() lets you target any greenlet you want, which means, crucially, that you can use it to put a call stack on hold for an arbitrary amount of time. Also, unlike yield, you can use switch() alongside standard return statements, which means that a parent function calling a greenlet-enabled child doesn’t need to know that anything special is going on: from its vantage point, it called a child, and got a return value as normal.
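
To make this concrete, here’s a toy example (not from my app; child_work is a made-up name) of two greenlets trading control back and forth:

import greenlet

def child_work():
    print("child: started")
    #pause this greenlet and hand control back to the master, sending a value
    #out; execution resumes right here when somebody switches back in
    value = greenlet.getcurrent().parent.switch("paused")
    print("child: resumed with %r" % (value,))
    return "done"

child = greenlet.greenlet(child_work)
print("master: %r" % (child.switch(),))         #runs child until its parent.switch()
print("master: %r" % (child.switch("hello"),))  #resumes child; its return value comes back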

Let’s see how this works in practice to make boto asynchronous. The first step is to wrap your web-method in a greenlet:

import greenlet
import tornado.web

class MyApp(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        def business_logic():
            ...do whatever work needs to be done, including making calls to boto...
            self.write(...stuff that gets returned to client...)
            self.finish() #end the asynchronous request
        gr = greenlet.greenlet(business_logic)
        gr.switch()

The call to gr.switch() will start up your greenlet running the function business_logic. Since at this point we haven’t modified boto, business_logic will run through to the end, blocking on all the http calls, and will then write its response back to the client. When business_logic finishes, gr becomes “dead”, and control switches to the parent of gr, which is our original call stack, at the point we switched away: i.e., back to the line following gr.switch(). In other words, as written, the code above will run just as if you had called business_logic normally: the code will run, it will block on the boto requests, and then it will return the results to the client.

Our goal, however, is to have the “post” function terminate at the point we fire the boto requests, and then have a callback added to Tornado’s event loop once the relevant data is fetched from the server. So at this point, we inject our custom HTTP handler into boto as described above. Here’s the code for the class we pass to boto in lieu of the HTTPConnection that it is expecting:

import tornado.httpclient
import tornado.ioloop
import greenlet

class AsyncHttpConnection(object):
    def __init__(self):    #boring  
        self.host = None
        self.is_secure = None
        
    def request(self, method, path, data, headers): #boring
        self.method = method
        self.path = path
        self.data = data
        self.headers = headers     

    def getresponse(self):
        #this is the method boto calls to get the result of the request...
        #this is where we do our thing
        #prepare the request for Tornado's http client
        http_client = tornado.httpclient.AsyncHTTPClient()
        if self.is_secure:
            schema = "https"
        else:
            schema = "http"
        url = "%s://%s%s" % (schema, self.host, self.path)
        request = tornado.httpclient.HTTPRequest(url, self.method, self.headers,
                                                 self.data or None)
        
        #Find the greenlet that is currently executing -- 
        #this should be the one we created in MyApp.post() above
        gr = greenlet.getcurrent()
        #Create the callback function to be fired when Tornado gets the 
        #results of the request back
        def callback(tornado_response):
            #see https://github.com/almost/asyncboto for the AsyncHttpResponse class: 
            #it's just a dummy class we used to coerce the response into
            # something that looks like the response that Boto expects
            response = AsyncHttpResponse(tornado_response.code, "???",
                                         tornado_response.body, tornado_response.headers)
            #resume our current greenlet, passing in the response
            gr.switch(response)
        #fire off the http request, with the callback we just created
        http_client.fetch(request, callback)
        #now, yield control back to the master greenlet, and wait for data to be sent to us
        response = gr.parent.switch()
        #hand the data back to boto
        return response

This takes a little staring at to really wrap your head around the control flow. When we hit gr.parent.switch(), this sends us back to the master greenlet, which is, at least the first time we enter getresponse(), in MyApp.post(). MyApp.post() returns, causing Tornado to move on and handle the next thing on its event loop. Later, Tornado receives the results of the http request and fires the callback. The callback calls gr.switch(response), which re-activates getresponse() where we left off: “response = gr.parent.switch()”. The switch() passes the response through, it gets assigned to the variable, and then it gets returned to boto, which commences processing it. Boto then returns the result back up the call stack to business_logic(), which writes the response to the client and calls self.finish(), ending the request.

Follow that? It gets mildly more complicated when there are multiple http calls inside business_logic(): each subsequent time we call gr.parent.switch(), the code that gets resumed is actually the code in the callback() function, which is now the bottom layer of the master greenlet’s stack. But confusing as it is behind the scenes, from the perspective of you writing your application code in the business_logic function, it just works: you code in a synchronous style, but you automagically get the performance characteristics of asynchronous. Yay!
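
As an illustration of that “it just works” claim, here’s roughly what the business_logic closure inside post() might look like with several boto calls in a row (a sketch, not my actual code; the bucket and key names are made up). Each boto call triggers one or more HTTP round-trips, and every one of them transparently parks the greenlet while Tornado does the fetching:

def business_logic():
    conn = AsyncS3Connection('my key', 'my secret')
    bucket = conn.get_bucket('some-bucket')    #one HTTP round-trip
    key = bucket.get_key('some-object')        #another round-trip
    body = key.get_contents_as_string()        #and another
    self.write(body)                           #self comes from the enclosing post()
    self.finish()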

The code above is actually a simplified version of what I use: I like to wrap my business logic in timing code to make sure that the main event loop runs nippily, and exception-handling code to log bugs and send appropriate feedback to the clients. I omitted that for clarity, since it further obscures the control-flow. The thing to remember is that you need to wrap said code around every call to gr.switch(): both the original call in MyApp.post() that starts the execution of business_logic(), and then the subsequent calls in callback() that resume the business logic. The results of that timing won’t be the total time from request arrival to request fulfillment — if you want that number, put the timer at the start and end of business_logic — it will be the time spent by the main Tornado thread executing your business logic before switching to the next request, which is the relevant metric for determining whether or not one of your functions is slowing down the event loop and thus the number of simultaneous connections you can take on before you need another server.
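
In case it helps, here’s the shape of that timing wrapper (a minimal sketch, not my production code; timed_switch is a made-up name). You’d call it in place of both gr.switch() calls, the one in post() and the one in callback(), so it measures only the slice of event-loop time spent inside your business logic before it switches back out to wait on I/O:

import time

def timed_switch(gr, *args):
    #how long is the main Tornado greenlet blocked while the business logic runs?
    start = time.time()
    result = gr.switch(*args)
    elapsed_ms = (time.time() - start) * 1000.0
    print("event loop blocked for %.1f ms" % elapsed_ms)
    return result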

So to sum up, greenlets allow us to freeze and then resume execution of arbitrary code without incurring the prohibitive overheads of multithreading. This can be used to convert synchronous libraries to asynchronous — we used boto above, but we should be able to use this technique on any library that makes a call to an underlying resource that we can switch out for an asynchronous driver.

Written by jphaas

June 19, 2011 at 11:37 pm

Posted in Uncategorized

  • Sean Talts

    This is awesome.  Are you planning to open source it?  I’m about to do the same thing.

    • Anonymous

      Glad you like it :-) Yes, this is open source — I hereby release the
      source code on this blog entry under the MIT license:
      http://www.opensource.org/licenses/mit-license.php. (Unfortunately I don’t
      have a clean implementation you can just download and use — it’s part of a
      larger set of Tornado-related tools I’m working on, but it’s a total mess
      right now and I had to remove and edit the code for the purposes of this
      blog to make it all comprehensible).

      • Sean Talts

        Awesome, thanks for that and the great post.  Would you mind elaborating a little more on how it gets “mildly more complicated” with multiple http requests inside your business_logic function?  Does this require business_logic or post to do anything special in those cases?

        • Anonymous

          No, business_logic doesn’t need to do anything special. It’s only “more
          complicated” in the sense that it becomes harder to reason through the
          control flow, but the code as written will work. The complicated thing I
          was referring to is that after the callback fires from the first http
          request, that callback function, hosted by the main Tornado event loop, is
          actually the only thing on the stack: we are no longer in the original
          “post()” function. That’s important because if you want to, for instance,
          wrap a try…catch around your business_logic, you either have to do it in
          the business_logic function itself, or you have to do it *both* in post() ,
          *and* in callback() (as well as any other places in your code where the
          greenlet get resumed).

          • Sean Talts

            Interesting.  Thanks again.  Also, you might want to replace the new_http_connection instead of the get_http_connection (maybe). This allows the boto connection class to use its own get and put_http_connection methods, which are managing connection instances in a pool (on top of a nonblocking queue).

            I’m seeing what looks informally like there might be something else that could be blocking.  Sometimes there are long pauses between “post” getting called even when it’s regular (and quick) with something like httperf.  Have you seen anything like that?  I’m doing some light profiling and looking around in boto to see if anything besides getresponse might be blocking, but I haven’t come up with anything yet.

          • Anonymous

            Not sure why you’d want to use boto’s connection pool? The
            AsyncHttpConnection object basically costs nothing to construct, so I don’t
            really see a need to save instances between calls.

            What order of magnitude are we talking about with the “long pauses” you’re
            seeing?

            What I’ve done in terms of profiling is wrap the call to gr.switch() in both
            post() and in callback() with before-and-after clock checks, which should
            tell us how long each “visit” to business_logic() takes. Doing that, I’ve
            been able to squeeze a relatively complex application to a place where the
            majority of my business logic calls (all of which involve 1 or more calls to
            sdb) are returning control to Tornado in under 10 milliseconds, and none are
            taking more than 100 milliseconds. I don’t have enough experience managing
            Tornado at scale to know if that’s “good” performance or not.

            I haven’t done any testing yet to see what’s going on outside of
            business_logic in terms of performance — Tornado itself could be
            introducing delays, or the greenlet code might be slow, haven’t really
            looked into either of those possibilities. I guess the way you would do
            that is just invert the timer described above: I might try that later on and
            see what happens.

            Also, I think boto uses its http connections somewhat differently for
            different AWS services — I’ve only tested with s3 and sdb, what are you
            using?

          • Sean Talts

            I was seeing some long pauses on the order of seconds, but these went away after I did a couple things:

            1) update Boto to the latest version (I think we were on a 2 year old version that comes with 10.04 Ubuntu)
            2) Switch Tornado to use the CurlAsyncHTTPClient, which is orders of magnitude faster and additionally seemed to help the long pause issue
            3) Fix a memory leak / reference cycle in Kombu’s Channel class (not a problem you’d see, since you’re just using pure Boto I assume)

            We’re using just SQS here so far.  The only “problem” I’m seeing now is that I can’t get more than 100 or so requests a second filled, but this includes a response from SQS before the connection is “filled” (using httperf for this).  As you add more than that, the time to fill all connections raises dramatically.  But this is probably more of a network issue on our end than anything else (doing most of my testing locally and not in EC2).

            Thanks again for the article!  Greenlets seem like an awesome tool.  Let us know if you find any performance gotchas with them :)  

  • Anonymous

    You are awesome for sharing the solution. I am so lazy I don’t even want to think about doing this, and we all know this is exactly what we want: async calls to S3 or any of the several boto-like libraries. My skills are nowhere near as close to yours in pythoning stuff. I bow in reverence.

  • Simon Radford

    Thanks a lot for sharing this. We used this solution at MoPub.

    I adapted the code into a clean and easy-to-use decorator, and put it on GitHub, keeping the MIT license. You can check it out here: http://github.com/mopub/greenlet-tornado

    • http://blog.joshhaas.com/ Josh Haas

      This is great — you’ve packaged it in a really clean, simple way. Updating my post to link to it.

      The *really* cool thing would be monkey-patching python’s standard httplib module so it auto-detects whether or not it’s being called from a method wrapped with the decorator, and seamlessly switches to tornado’s http client. But that might be a little tricky :-)

  • Fotis Hantzis

    Thanks for sharing this! However, there seems to be a problem with the code when you put HTTP requests in business_logic() before the boto calls. What happens is that when the first HTTP request is being called (using yield gen.Task() in tornado) inside business_logic which has already been called through the gr.switch() call in post(), then the greenlet dies because there is no reference to it afterwards. So when the HTTP request returns (from the gen.Task()) business_logic continues on to make the boto calls but they fail when they reach the gr.parent.switch() because there is no parent greenlet available. Have you come across this problem? Do you know of a workaround?

  • Fotis Hantzis

    Thanks for sharing this! However there seems to be a problem with the code
    when you put HTTP requests inside business_logic() before the boto calls.
    So, suppose you have: post()–> business_logic() -> HTTP fetch() and then boto_call().
    HTTP fetch() is called through the ‘yield gen.Task()’ of tornado and
    business_logic is called from the gr.switch(). What happens is that when
    the asynchronous HTTP request is being made, then business_logic() returns
    temporarily and the greenlet that called it, dies, since there is no
    reference to it. Afterwards, when business_logic() resumes, when the data
    for the HTTP call have come and then the boto calls are made, the whole
    program crashes because there is no greenlet parent to switch to and
    gr.parent.switch() fails. Have you come across this problem? Do you know of
    any workaround? Thanks!

    • http://blog.joshhaas.com/ Josh Haas

      So, disclaimer before I answer — for the last year I’ve been working on a different project using node instead of python, so this stuff is a little fuzzy in my head right now.

      That said, I think the problem is that you’re using “yield”, which turns the business_logic function into a generator object. If you go down the path of using greenlets to control exiting and re-entering business_logic, you have to stick with greenlets the whole way; you can’t use “yield” as well. Tornado is no longer directly controlling business_logic; rather, we’ve wrapped it in a greenlet, which expects a normal function, not a generator. I’m not sure what happens when you pass a generator to a greenlet, but the bug you’re seeing is probably it :-)

      Instead, take a look at the way the AsyncHttpConnection.getresponse function works above. First it creates a callback that takes the http response and resumes the greenlet with the response via the call “gr.switch(response)” . Then it fires off the fetch request using that callback: “http_client.fetch(request, callback)”. Finally, it freezes the current greenlet until the callback is called: “response = gr.parent.switch()”. I think what you want to do is use those three lines, instead of the yield call.