Guido van Rossum guido@python.org
9th LASER summer school, Sept. 2012
Async I/O is as old as computers
But programmers prefer synchronous I/O
  ◦ Easier to understand
  ◦ Easier to write code for
  ◦ Fewer chances of bugs
Many languages don't have async I/O at all
Or hide it deep down in advanced libraries
Or recommend using threads
Probably an old Mac OS interface (sound?)
  ◦ The OS would call a C function when done
  ◦ Problem was to call a Python function safely
  ◦ Ended up using a queue, polled by interpreter loop
  ◦ There's still mention of this in ceval.c...
Next were UNIX signal handlers
  ◦ Solution used the same mechanism
  ◦ Still in use
select() and ioctl() syscalls
  ◦ make file descriptor non-blocking
  ◦ use select() to wait or poll for I/O
  ◦ also available on Windows (for sockets only)
Later: poll(), epoll(), kqueue()
  ◦ these are all faster versions of select(), really
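A minimal sketch of the select() pattern above, written for Python 3. It assumes a local socket pair stands in for a real network peer; the helper name read_when_ready is illustrative and not from the talk.

```python
import select
import socket


def read_when_ready(sock, timeout=1.0):
    """Wait until sock is readable, then read without blocking."""
    # select() returns three lists: readable, writable, exceptional.
    readable, _, _ = select.select([sock], [], [], timeout)
    if sock in readable:
        return sock.recv(4096)
    return None  # timed out: nothing arrived


# socketpair() gives two connected sockets; a stand-in for a real peer.
left, right = socket.socketpair()
left.setblocking(False)   # make the file descriptors non-blocking
right.setblocking(False)

nothing_yet = read_when_ready(left, timeout=0)   # timeout 0 = poll
right.send(b"ping")
data = read_when_ready(left)                     # wait up to 1 second

left.close()
right.close()
```

With a timeout of 0 the call only polls; with a positive timeout it waits. poll(), epoll() and kqueue() follow the same wait-then-read shape with different registration calls.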
How to write code for all this?
Threads "solve" the problem differently
Just dedicate a separate thread to each socket
Use synchronous I/O in the thread
The OS will multiplex threads
Problems:
  ◦ thread switching overhead
  ◦ memory allocations for thread stacks
  ◦ max #sockets way higher than max #threads
  ◦ locking bugs
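The thread-per-socket approach might look like the following Python 3 sketch. The echo_upper function and the use of local socket pairs in place of accepted client connections are assumptions made for illustration.

```python
import socket
import threading


def echo_upper(conn):
    """Serve one connection using plain blocking calls."""
    with conn:
        while True:
            chunk = conn.recv(1024)      # blocks this thread only
            if not chunk:                # peer closed its end
                break
            conn.sendall(chunk.upper())


# Three (client, server) socket pairs stand in for accepted connections.
pairs = [socket.socketpair() for _ in range(3)]
workers = [threading.Thread(target=echo_upper, args=(server,))
           for _, server in pairs]
for w in workers:
    w.start()

replies = []
for i, (client, _) in enumerate(pairs):
    client.sendall(b"msg%d" % i)
    replies.append(client.recv(1024))
    client.close()                       # worker's recv() then returns b""

for w in workers:
    w.join()
```

Each handler reads like ordinary sequential code, but every connection costs a full OS thread and its stack, which is the scaling problem listed above.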
Modules asyncore.py, asynchat.py
  ◦ Sam Rushing, 1996
dispatcher class
  ◦ wraps non-blocking socket
  ◦ all active dispatchers collected in a table
  ◦ event handler methods called when data ready
    e.g. handle_read(); user subclasses to define actions
  ◦ loop() calls select() and calls handlers as needed
Sadly, considered inflexible and out of date
  ◦ also, async I/O is remarkably subtle
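The dispatcher design can be sketched in a few lines of Python 3. This does not import asyncore (it was removed from the standard library in Python 3.12); the Dispatcher, Collector, socket_map and loop names below only mimic the shape described above and are not the module's actual API.

```python
import select
import socket

socket_map = {}          # table of active dispatchers, keyed by fd


class Dispatcher:
    """Wraps a non-blocking socket; subclasses override handle_read()."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        socket_map[sock.fileno()] = self

    def handle_read(self):
        raise NotImplementedError

    def close(self):
        del socket_map[self.sock.fileno()]
        self.sock.close()


class Collector(Dispatcher):
    """User subclass: stores whatever arrives, closes on EOF."""

    def __init__(self, sock):
        super().__init__(sock)
        self.received = []

    def handle_read(self):
        data = self.sock.recv(4096)
        if data:
            self.received.append(data)
        else:
            self.close()


def loop(timeout=1.0):
    """Call select() and dispatch until no dispatchers remain."""
    while socket_map:
        readable, _, _ = select.select(list(socket_map), [], [], timeout)
        if not readable:
            break                          # timed out
        for fd in readable:
            handler = socket_map.get(fd)   # may have been closed meanwhile
            if handler is not None:
                handler.handle_read()


near, far = socket.socketpair()
collector = Collector(near)
far.sendall(b"hello")
far.close()              # EOF makes the collector close itself
loop()
```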
Twisted: 800 pound gorilla of Python's async I/O world
  ◦ Glyph Lefkowitz, 2002
flexible, robust, mature, maintained
separation of protocols and transports
deferreds and callbacks
integration with foreign event loops
  ◦ e.g. GUI libraries
integration with threads (via deferreds)
inline callbacks (based on generators)
Tornado: web framework designed for speed
  ◦ FriendFeed/Facebook, 2009
Lowest level: async I/O using callbacks
  ◦ IOLoop class
  ◦ IOLoop.add_handler(fd, handler, events)
Focus is web serving (that's a trend!)
Handles most requests/sec
  ◦ E.g. Tornado: 3k req/sec on 1 core (8k / 4 cores)
No locks required (usually)
Easy to handle timeouts
Familiar to JavaScript programmers
Logic spread out over multiple functions
Boilerplate overwhelms program logic
Passing state around is hard
Proliferation of higher order functions
Error handling often broken
Utilizes only a single core
Hard to debug
If you accidentally block, everyone hangs
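A toy Python 3 illustration of how callbacks spread one logical task over several functions and force state to be passed along by hand. Everything here is hypothetical: EventQueue is a stand-in event loop, and fetch_user and fetch_address are made-up operations that "complete" by scheduling their callback.

```python
from collections import deque


class EventQueue:
    """Toy event loop: runs queued callbacks in FIFO order."""

    def __init__(self):
        self._pending = deque()

    def call_soon(self, callback, *args):
        self._pending.append((callback, args))

    def run(self):
        while self._pending:
            callback, args = self._pending.popleft()
            callback(*args)


queue = EventQueue()
log = []


# Hypothetical async operations: each schedules its callback with a result.
def fetch_user(name, on_done):
    queue.call_soon(on_done, {"name": name, "address_id": 7})


def fetch_address(address_id, on_done):
    queue.call_soon(on_done, "%d Main St" % address_id)


# One logical task, split across three functions; the name has to be
# carried from step to step through closures.
def start(name):
    fetch_user(name, lambda user: got_user(name, user))


def got_user(name, user):
    fetch_address(user["address_id"], lambda addr: got_address(name, addr))


def got_address(name, addr):
    log.append("%s lives at %s" % (name, addr))


start("joe")
queue.run()
```

Written synchronously this would be three lines in one function; here an exception raised in got_address has no natural caller to propagate to.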
Stackless Python (Christian Tismer, 1999)
  ◦ Avoid using C stack for Python-to-Python calls
  ◦ Microthreads, channels, scheduling
  ◦ Implemented as a fork of CPython
  ◦ Stackless 3.1 (2012)
Greenlet (Armin Rigo, 2004)
  ◦ Derived from Stackless
  ◦ Microthreads called tasklets (just coroutines, really)
  ◦ Implemented as a C extension module
  ◦ May move the C stack around!
  ◦ Greenlet 0.4.0 (2012)
from greenlet import greenlet

def test1(x, y):
    z = gr2.switch(x+y)
    print z

def test2(u):
    print u
    gr1.switch(42)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch("hello", " world")

# Output:
# hello world
# 42
Both built on top of Greenlet
Eventlet (Bob Ippolito, 2006)
  ◦ "Asynchronous I/O with a synchronous interface"
  ◦ Seems pretty dead now
Gevent (Denis Bilenko, 2009)
  ◦ Successor to Eventlet
  ◦ Uses libev (an event loop C library)
  ◦ Monkey-patches networking modules in stdlib
import gevent

def foo():
    print('Running in foo')
    gevent.sleep(0)
    print('Explicit context switch to foo again')

def bar():
    print('Explicit context to bar')
    gevent.sleep(0)
    print('Implicit context switch back to bar')

gevent.joinall([gevent.spawn(foo), gevent.spawn(bar)])
import gevent, gevent.monkey, json, urllib2

# make the stdlib socket module cooperative
gevent.monkey.patch_socket()

url = 'http://json-time.appspot.com/time.json'

def fetch(pid):
    response = urllib2.urlopen(url)
    result = response.read()
    json_result = json.loads(result)
    datetime = json_result['datetime']
    print 'Process ', pid, datetime
    return json_result['datetime']

# synchronous
for i in range(1, 10):
    fetch(i)

# asynchronous
threads = []
for i in range(1, 10):
    threads.append(gevent.spawn(fetch, i))
gevent.joinall(threads)
No callbacks!
Logic remains in one place
Looks a lot like OS-level threads
Synchronous and async code looks the same
Still as fast as Twisted or Tornado
  ◦ In fact, Tornado+Gevent is even faster!
Monkey-patching may be brittle
It's easy to forget where switching happens
Synchronization primitives reimplemented
Not pure Python (won't work on App Engine)
Explicit coroutines, using Python generators
Typically built on top of event loop, callbacks
Twisted inlineCallbacks
Monocle (Greg+Steve Hazel, 2010)
  ◦ Works with asyncore, Twisted, Tornado
NDB (GvR, 2011)
  ◦ For Google App Engine only
Not quite classic Knuth-style coroutines
Python generator: function using yield
  ◦ Used to implement iterators
  ◦ One (!) stack frame is suspended by yield
  ◦ Resumed by .next() (Python 3: .__next__())
Addition in PEP 342: .send(), .throw()
  ◦ g.send(x) makes yield return x
  ◦ g.throw(e) makes yield raise e
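The PEP 342 methods can be seen in a small Python 3 example. The echo generator is made up for illustration; it hands back whatever the previous yield expression produced.

```python
def echo():
    """Generator that reports what each yield expression produced."""
    received = None
    while True:
        try:
            received = yield received       # value comes from send()
        except ValueError as exc:
            received = "caught %s" % exc    # exception came from throw()


g = echo()
first = next(g)                        # run to the first yield
sent = g.send("x")                     # the suspended yield returns "x"
thrown = g.throw(ValueError("boom"))   # the suspended yield raises
g.close()
```

Here first is None, sent is "x", and thrown is "caught boom": the generator stays suspended at the same yield each time, and the caller decides whether that yield returns a value or raises.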
A "trampoline" or scheduler can be written to implement nanothreads using generators
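One possible shape for such a scheduler, as a minimal Python 3 sketch. It is a plain round-robin loop with no I/O waiting, so it shows only the switching mechanism; run_all and worker are illustrative names.

```python
from collections import deque


def run_all(tasks):
    """Round-robin scheduler: resume each generator until it finishes."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run the task up to its next yield
        except StopIteration:
            continue              # task finished; drop it
        ready.append(task)        # still alive: back of the queue


trace = []


def worker(name, steps):
    for i in range(steps):
        trace.append("%s%d" % (name, i))
        yield                     # explicit switch point


run_all([worker("a", 2), worker("b", 3)])
```

The two workers interleave as a0, b0, a1, b1, b2. A real framework would have tasks yield something describing what they are waiting for, and resume them from an event loop when it is ready.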
from monocle import _o

@_o
def do_cmd(conn):
    cmd = yield conn.read_until("\n")
    if cmd.type == "get-address":
        user = yield db.query(cmd.username)
        yield conn.write(user.address)
    else:
        yield conn.write("unknown command")
from google.appengine.ext import ndb

class Employee(ndb.Model):
    empid = ndb.StringProperty(required=True)
    ...

@ndb.tasklet
def get_employee_async(empid):
    emp = yield Employee.get_by_id_async(empid)
    if emp is None:
        emp = Employee(empid=empid, id=empid)
        yield emp.put_async()
    raise ndb.Return(emp)

joe = get_employee_async('42').get_result()
No callbacks!
Logic remains in one place
Looks a lot like OS-level threads
Synchronous and async code looks the same
Still as fast as Twisted or Tornado
It's a Comonad!!!
Pure Python implementation possible
User is always aware of switch points
Less need for locking
Async and synchronous calls look different
Only a single frame can be suspended
It's easy to forget the yield keyword
Mixing async and sync calls is hazardous
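The forgotten-yield hazard is easy to reproduce in a toy Python 3 setting. The put_async operation and the run driver below are hypothetical stand-ins for a framework's async call and scheduler.

```python
import types

calls = []


def put_async(record):
    """Hypothetical async operation, written as a generator."""
    calls.append(record)
    yield                      # pretend to wait for I/O


def run(gen):
    """Toy driver: when a task yields a generator, run it to completion."""
    for item in gen:
        if isinstance(item, types.GeneratorType):
            run(item)


def handler():
    put_async("a")             # yield forgotten: creates a generator, runs nothing
    yield put_async("b")       # yielded to the driver, which runs it


run(handler())
```

Only "b" is recorded. The call without yield raises no error; it silently builds a generator object that is never run.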
???