I’ve been learning the new-in-Python-3.4 asyncio module recently, since I want to employ it in a project. I started reading the docs, and after reading a bit about the EventLoop I clicked through to the chapter on coroutines, Future and Task. And got rather confused.
After poking around for a while, reading other articles on asyncio, talking to one of the developers, and looking through the source code, I’m pretty sure I’ve figured out how it works, and what roles coroutines, Future and Task play. Someone who knows asyncio who reviewed this article briefly commented that it was way too long and complex, that the concepts really should be simple. I think if I were aiming to explain how to use asyncio I wouldn’t have an article this long (indeed, some of the ones I read were quite short). But what I wanted, as an experienced Python programmer new to both asyncio in particular and async programming in general, was an explanation of how it worked, and what role these various classes actually played in making it work.
I did not find any articles that explained things at this level (that doesn’t mean they don’t exist, I just didn’t find one), so I wrote one in order to solidify my understanding. And, indeed, I don’t feel my understanding was complete until I finished writing the article.
So, on to my (hopefully correct) explanation of how asyncio works.
The most fundamental building block of asyncio is the concept of the Future. This is similar to concurrent.futures.Future, but adapted so that it works with the second most fundamental component of asyncio, the EventLoop[1].
Conceptually a Future object is really very simple. It is a holder for (eventually) a result or exception, and also for a list of callbacks to be called when it is “done” (that is, when there is a result, an exception, or the Future has been canceled).
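To make the "holder" idea concrete, here is a minimal sketch using the real asyncio.Future on a private event loop. The helper make_future and the list notified are names I made up for illustration. It shows the three ways a Future becomes "done", and that the attached callbacks are only scheduled at that moment; they run once the loop gets a turn.

```python
import asyncio

loop = asyncio.new_event_loop()
notified = []

def make_future(name):
    # Hypothetical helper: a Future whose callback records that it fired.
    fut = loop.create_future()
    fut.add_done_callback(lambda f: notified.append(name))
    return fut

with_result = make_future("result")
with_error = make_future("error")
cancelled = make_future("cancelled")

# Three ways for a Future to become "done".
with_result.set_result(42)
with_error.set_exception(ValueError("boom"))
cancelled.cancel()

states = [f.done() for f in (with_result, with_error, cancelled)]
notified_before = list(notified)  # callbacks are scheduled, not yet run

# Give the loop one pass so the scheduled callbacks execute, then stop.
loop.call_soon(loop.stop)
loop.run_forever()
loop.close()

print(states, notified_before, notified)
```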
Conceptually, the EventLoop is also very simple: each time through the loop, it calls any callback in the list of ‘ready’ callbacks (the call_soon list), and then uses a selector to wait either for the next pending IO operation to complete or the time for the next scheduled task to arrive, at which point it adds the callback that will handle the event to the call_soon list and starts a new loop iteration.
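The loop described above can be sketched in a few lines. This is a toy, not asyncio's implementation: ToyLoop and its attribute names are my own, there is no IO handling at all, and time.sleep stands in for the selector wait. It only shows the shape of one pass: run the ready callbacks, wait for the next timed event, move it onto the ready list.

```python
import heapq
import time
from collections import deque

class ToyLoop:
    """Toy event loop: a ready queue plus a heap of timed callbacks."""

    def __init__(self):
        self.ready = deque()   # the "call_soon list"
        self.timers = []       # heap of (when, seq, callback, args)
        self._seq = 0          # tie-breaker so callbacks are never compared

    def call_soon(self, callback, *args):
        self.ready.append((callback, args))

    def call_later(self, delay, callback, *args):
        self._seq += 1
        when = time.monotonic() + delay
        heapq.heappush(self.timers, (when, self._seq, callback, args))

    def run(self):
        while self.ready or self.timers:
            # 1. Run the callbacks that were ready when this pass started.
            for _ in range(len(self.ready)):
                callback, args = self.ready.popleft()
                callback(*args)
            # 2. Nothing ready: wait for the next timer (a real loop would
            #    block in a selector here, which also watches for IO).
            if not self.ready and self.timers:
                wait = self.timers[0][0] - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
            # 3. Move every timer that is now due onto the ready queue.
            now = time.monotonic()
            while self.timers and self.timers[0][0] <= now:
                _, _, callback, args = heapq.heappop(self.timers)
                self.ready.append((callback, args))

order = []
loop = ToyLoop()
loop.call_later(0.02, order.append, "timer")
loop.call_soon(order.append, "first")
loop.call_soon(order.append, "second")
loop.run()
print(order)
```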
An asyncio program can be written in “callback” style using just these two components: Future objects are used for signalling by attaching callbacks to be scheduled for execution by the EventLoop when the Future’s set_result method is called (or some other call is made that marks the Future as “done”). Other callbacks are scheduled with the EventLoop to handle IO events and to run scheduled tasks, and when these callbacks run they call the appropriate methods on the appropriate Futures to mark them as “done” and therefore trigger the Future’s callbacks to run.
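Here is a small sketch of that callback style using real asyncio calls (new_event_loop, create_future, add_done_callback, call_later). The functions on_data and fake_io_event are hypothetical: the timer stands in for an IO event, since real socket handling would bury the point.

```python
import asyncio

loop = asyncio.new_event_loop()
log = []

def on_data(fut):
    # Scheduled by the Future when it is marked done; run by the loop.
    log.append("received " + fut.result())
    loop.stop()

def fake_io_event(fut):
    # Stand-in for the callback the loop would run when IO completes.
    log.append("io event fired")
    fut.set_result("payload")

data_ready = loop.create_future()
data_ready.add_done_callback(on_data)
loop.call_later(0.01, fake_io_event, data_ready)

loop.run_forever()   # returns once on_data calls loop.stop()
loop.close()
print(log)
```

Nothing here is a coroutine. Control flow is carried entirely by which callback marks which Future as done.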
The power of asyncio programming, however, comes from two additional components: Coroutines and Tasks. These two components tie Futures and the EventLoop into a system that allows one to write procedural-looking code that, under the hood, is async code.
Note: the following discussion simplifies certain advanced details of how coroutines work (and that I currently don’t understand :) in order to make the fundamental mechanisms clearer.
The nature of a coroutine is that it is a Python generator function that uses only[2] yield from. When writing code using asyncio, instead of calling a function using Python function call syntax and obtaining a result:
res = normal_function()
you use yield from:
res = yield from async_function()
In the above snippet, async_function is a function that returns either a Future or a coroutine.
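What that assignment does can be seen with plain generators and no asyncio at all. (I use plain generators for these sketches because the generator-based @asyncio.coroutine style was removed in Python 3.11.) The names fetch_number and caller are made up, and a string stands in for the Future that a real coroutine would yield.

```python
def fetch_number():
    # Stand-in for an async function: suspends once, then returns a value.
    yield "not ready yet"
    return 41

def caller():
    # yield from evaluates to whatever the inner generator returns.
    res = yield from fetch_number()
    return res + 1

gen = caller()
first = next(gen)        # runs until the inner generator suspends
try:
    next(gen)            # resumes; the inner generator returns 41
except StopIteration as stop:
    final = stop.value   # the return value of caller()

print(first, final)
```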
A Task is, itself, a Future, and it wraps a coroutine (or another Future, but there’s no reason to do that). When a Task is created, it adds a callback to the EventLoop’s call_soon queue that starts the iteration of the coroutine it is wrapping. That is, it arranges to call next on the coroutine. That call to next has one of three valid outcomes: it yields a Future, it raises a StopIteration exception with a value, or it raises some other exception.
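If it is some other exception, meaning the coroutine itself raised and did not handle it, the Task stores that exception as its own outcome via set_exception (remember, the Task is a Future), and whoever is waiting on the Task will see it. The related case is a Future that completes with an exception instead of a result: when the Task resumes the coroutine, it throws that exception into it. This means that the exception will be raised at the point where the (innermost) yield from call was made.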
If the result is a Future, the Task schedules a callback on the Future to call the Task when the Future has completed. When some other thread of control eventually causes the Future to move to the “done” state, the Future will schedule that callback to run. That callback in turn will schedule another call_soon callback that will call next on the coroutine.
If the result is a StopIteration exception, the Task sets the value associated with the exception (which will be what the wrapped coroutine specified in its return statement) as its result via set_result (remember, the Task is a Future).
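The three outcomes of calling next can be told apart with an ordinary try/except. The sketch below is not asyncio's Task code: step, PlaceholderFuture and the three little generators are hypothetical names, and step only reports which case occurred instead of acting on it.

```python
class PlaceholderFuture:
    """Stand-in for a Future; it only needs to be a distinct object here."""

def step(coro):
    """Advance coro once and report which of the three outcomes occurred."""
    try:
        yielded = next(coro)
    except StopIteration as stop:
        return ("finished", stop.value)   # coroutine executed return
    except Exception as exc:
        return ("failed", exc)            # coroutine raised
    else:
        return ("waiting", yielded)       # coroutine is waiting on something

def waits():
    yield PlaceholderFuture()

def returns():
    return "done"
    yield  # unreachable; present only to make this a generator function

def raises():
    raise ValueError("boom")
    yield  # unreachable; present only to make this a generator function

waiting = step(waits())
finished = step(returns())
failed = step(raises())
print(waiting[0], finished, failed[0])
```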
All coroutines make calls to other coroutines and Futures using yield from. What yield from does is to iterate over the object passed to it and yield each result in turn. If we call yield from on a generator, and that generator in turn calls yield from, the values from the inner iterator will be yielded as values from the outer yield from. Since coroutines only call yield from on other coroutines or on Futures, this means that when a Task callback calls next on the coroutine it wraps, what it gets back is a Future, and it then schedules a callback on the Future and control returns to the EventLoop. Control thus returns to the EventLoop after each iteration of the innermost iterator in the coroutine call chain, no matter how deeply nested in a chain of yield froms that Future was.
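The way values surface through a chain of yield froms is easy to check with plain generators. In this sketch the strings "future-A" and "future-B" stand in for Futures, and innermost, middle and top are made-up names; the code calling next plays the part of the Task.

```python
def innermost():
    yield "future-A"    # stand-in for a Future
    yield "future-B"    # stand-in for a second Future
    return "inner result"

def middle():
    value = yield from innermost()
    return value.upper()

def top():
    value = yield from middle()
    return "top got " + value

coro = top()
# Each next() runs until the innermost generator yields, however deep it is.
seen = [next(coro), next(coro)]
try:
    next(coro)
except StopIteration as stop:
    result = stop.value   # return values propagate back up the chain

print(seen, result)
```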
When a Future completes, it schedules the callback provided by the Task that wraps the coroutine at the top of the chain of yield froms that led to yield from being called on that Future. That callback (provided by the Task that wraps the top level coroutine) schedules another callback that will make another call to next on the coroutine. That call resumes the whole chain of yield froms at the innermost one, where the Future’s own iteration code executes a return statement, passing return the value that was set on the Future via set_result. The coroutine that executed the innermost yield from thereby obtains the value returned by the Future, and continues execution with value in hand. When that lowest level coroutine itself reaches its end and returns a value, the yield from that called it returns that value, and the next higher coroutine, the one that executed that yield from, continues execution with value in hand. And so on until the top level coroutine completes and returns the value that becomes the value of the Future that is the Task.
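The whole round trip fits in a toy model. To be clear about what is assumed: ToyFuture, ToyTask, call_soon and run_until_idle are my own simplified stand-ins, not asyncio's classes. Exceptions and cancellation are left out, and for brevity the wakeup callback calls next directly instead of scheduling a further callback. The part that mirrors asyncio's design is __iter__: it yields the Future itself up to the Task, and when resumed it returns the result.

```python
from collections import deque

ready = deque()   # stand-in for the EventLoop's call_soon queue

def call_soon(callback, *args):
    ready.append((callback, args))

def run_until_idle():
    while ready:
        callback, args = ready.popleft()
        callback(*args)

class ToyFuture:
    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def done(self):
        return self._done

    def result(self):
        return self._result

    def add_done_callback(self, callback):
        if self._done:
            call_soon(callback, self)
        else:
            self._callbacks.append(callback)

    def set_result(self, value):
        self._result = value
        self._done = True
        for callback in self._callbacks:
            call_soon(callback, self)
        self._callbacks = []

    def __iter__(self):
        if not self._done:
            yield self        # travels up the yield from chain to the task
        return self._result   # becomes the value of "yield from future"

class ToyTask(ToyFuture):
    def __init__(self, coro):
        super().__init__()
        self._coro = coro
        call_soon(self._step)             # start the coroutine soon

    def _step(self, _future=None):
        try:
            waiting_on = next(self._coro)
        except StopIteration as stop:
            self.set_result(stop.value)   # the task is itself a future
        else:
            waiting_on.add_done_callback(self._step)

def read_value(source):
    value = yield from source
    return value * 2

def main(source):
    doubled = yield from read_value(source)
    return doubled + 1

source = ToyFuture()
task = ToyTask(main(source))
run_until_idle()                 # task starts and suspends on source
suspended = not task.done()
source.set_result(20)            # pretend an IO event delivered a value
run_until_idle()                 # task wakes up and runs to completion
print(suspended, task.result())
```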
To summarise at a slightly higher level, the overall flow in an asyncio program is that we execute procedural style code, and every time we get to a yield from statement the execution of that procedural code is suspended. This may go on for several levels of yield from call, but eventually a Future will be yielded and make its way back up to the Task, and we will start a new pass through the EventLoop. The EventLoop will then run any call_soon callbacks. When all call_soon callbacks have run, the EventLoop uses a selector to wait for the next IO event or the next callback that was scheduled to run at a specific time. Those IO or timed events will provide values that will be set on certain Future objects. That triggers the scheduling of call_soon callbacks, which in turn call next on the coroutines that were waiting for those Futures, giving them another chance to run. This continues until all Futures are complete, including the Task or Tasks that the main EventLoop is waiting for (or the EventLoop is explicitly shut down).
From the point of view of the coroutine, this looks like procedural code: the coroutine (using yield from) calls a subroutine, gets back a value, and continues on with its computations. When you write the coroutine you don’t (for the most part) have to worry about the fact that there is an uncertain amount of time that will elapse between the yield from call and the acquisition of the result.
You do, of course, have to be cognizant of the potential for deadlocks and the mutation of shared data by other coroutines, just as you would in any programming involving multitasking. However, in async code, you do not have to worry about simultaneous modification of shared data: the other code can only execute when you call yield from[3].
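Both halves of that paragraph can be demonstrated with a toy round-robin scheduler over plain generators. Everything here is hypothetical scaffolding (run_round_robin, safe_increment, racy_increment), and a bare yield marks the suspension points where a real coroutine would use yield from. An update with no suspension point in the middle cannot be interrupted, while one that suspends between reading and writing loses updates.

```python
from collections import deque

counter = {"value": 0}

def safe_increment(times):
    for _ in range(times):
        current = counter["value"]
        counter["value"] = current + 1   # no suspension between read and write
        yield                            # only now can other code run

def racy_increment(times):
    for _ in range(times):
        current = counter["value"]
        yield                            # suspended holding a stale value
        counter["value"] = current + 1

def run_round_robin(*coros):
    # Toy scheduler: give each generator one turn at a time until all finish.
    queue = deque(coros)
    while queue:
        coro = queue.popleft()
        try:
            next(coro)
        except StopIteration:
            continue
        queue.append(coro)

run_round_robin(safe_increment(100), safe_increment(100))
safe_total = counter["value"]

counter["value"] = 0
run_round_robin(racy_increment(100), racy_increment(100))
racy_total = counter["value"]

print(safe_total, racy_total)
```

With this scheduler the safe version reaches 200, while the racy version ends at 100 because each pair of increments overwrites the same stale value.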
And there you have it. Using this “one cool trick” (yield from) we can write async code as if it were procedural code.
[1] | In fact, the asyncio EventLoop is pluggable. There are many different event loops that can be used, including third party loops such as Twisted. It is the concept of the EventLoop that is fundamental. |
[2] | A coroutine can also use a bare yield statement, which will yield control to the EventLoop but schedule the next iteration of the coroutine as call_soon. This is a way to cooperatively yield control during what might otherwise be a cycle-stealing long computation. |
[3] | Unless you are using the threading support to handle blocking calls. |