Slide 1

Slide 1 text

Asynchronous web frameworks, Python, and MongoDB
A. Jesse Jiryu Davis
emptysquare.net

Slide 2

Slide 2 text

Agenda
• Talk about web services in a really dumb (“abstract”?) way
• Explain when we need async web servers
• Why is async hard?
• What is Tornado and how does it work?
• Using Tornado with PyMongo, and with AsyncMongo
• Motor, my experimental driver

Slide 3

Slide 3 text

CPU-bound web service
[Diagram: Clients connect to the Server over sockets]
• No need for async
• Just spawn one process per core

Slide 4

Slide 4 text

Normal web service
[Diagram: Clients connect to the Server over sockets; the Server connects to a Backend (DB, web service, SAN, …) over a socket]
• Assume backend is unbounded
• Service is bound by memory

Slide 5

Slide 5 text

What’s async for?
• Minimize resources per connection
• I.e., wait for backend as cheaply as possible

Slide 6

Slide 6 text

CPU- vs. Memory-bound
[Diagram: a spectrum from memory-bound to CPU-bound; labels: Crypto, Chat, Most web services?]

Slide 7

Slide 7 text

HTTP long-polling (“COMET”)
• E.g., chat server
• Async’s killer app
• Short-polling is CPU-bound: tradeoff between latency and load
• Long-polling is memory-bound
• “C10K problem”: kegel.com/c10k.html
• Tornado was invented for this
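The latency/load tradeoff of short-polling can be put in numbers. This is a back-of-the-envelope sketch, not a measurement: the function name is made up for illustration, and it assumes every client polls at a fixed interval and that messages arrive at uniformly random times between polls.

```python
def short_polling_cost(clients, interval_seconds):
    """Return (requests per second on the server, average delivery latency).

    Hypothetical helper for illustration. Assumes fixed-interval polling and
    messages arriving at uniformly random times between two polls.
    """
    requests_per_second = clients / float(interval_seconds)
    average_latency = interval_seconds / 2.0
    return requests_per_second, average_latency


# 10,000 clients polling every second: low latency, heavy load.
fast = short_polling_cost(10000, 1)
# The same clients polling every 10 seconds: a tenth of the load,
# ten times the latency.
slow = short_polling_cost(10000, 10)
```

With long-polling the request rate no longer depends on the polling interval, but each of those 10,000 clients holds a connection open, which is why the cost moves to memory per connection.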

Slide 8

Slide 8 text

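Why is async hard to code?
[Sequence diagram: Client, Server, Backend. The Client sends a request to the Server; the Server sends a request to the Backend and must store state until the Backend's response arrives; the Server then sends its response to the Client]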
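One way to picture the "store state" step is a callback closure. The sketch below is an illustration only: `backend_query`, `handle_request`, and `run_pending` are hypothetical names, and a plain queue stands in for both the backend and the event loop.

```python
from collections import deque

pending = deque()  # callbacks waiting on the (hypothetical) backend


def backend_query(query, callback):
    """Hypothetical non-blocking backend call: returns at once, answers later."""
    pending.append(lambda: callback("result for " + query))


def handle_request(request_id, query, responses):
    # The handler cannot block waiting for the backend, so whatever it needs
    # later (request_id, responses) has to be stored somewhere. Here the
    # closure stores it.
    def on_backend_response(result):
        responses[request_id] = result

    backend_query(query, on_backend_response)
    # handle_request returns here, before any response exists.


def run_pending():
    """Stand-in for the event loop delivering backend responses."""
    while pending:
        pending.popleft()()


responses = {}
handle_request(1, "a", responses)
handle_request(2, "b", responses)
nothing_yet = dict(responses)  # still empty: both requests are in flight
run_pending()
```

The handler's logic is split across two functions, and anything the second half needs must be carried over explicitly. With threads or greenlets the same state simply lives on the stack.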

Slide 9

Slide 9 text

Ways to store state
(this slide is in beta)
[Chart: axes "Coding difficulty" and "Memory per connection"; points: Multithreading; Tornado, Node.js; Greenlets / Gevent]

Slide 10

Slide 10 text

What’s a greenlet?
• A.K.A. “green threads”
• A feature of Stackless Python, packaged as a module for standard Python
• Greenlet stacks are stored on heap, copied to / from OS stack on resume / pause
• Cooperative
• Memory-efficient
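Cooperative scheduling can be illustrated with plain generators from the standard library. This is an analogy, not the greenlet API: a generator can only pause from its own top-level frame, whereas a greenlet can pause from inside nested calls, which is what lets it wrap unmodified synchronous code. The names `worker` and `run_round_robin` are made up for the sketch.

```python
from collections import deque


def worker(name, steps, log):
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield  # pause voluntarily; nothing preempts this task


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # run the task until its next pause
        except StopIteration:
            continue  # task finished; do not reschedule it
        queue.append(task)


def demo():
    log = []
    run_round_robin([worker("a", 2, log), worker("b", 2, log)])
    return log
```

Calling `demo()` interleaves the two workers one step at a time, because each one gives up control only at its own `yield`.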

Slide 11

Slide 11 text

Threads:

# pseudo-Python
sock = listen()
request = parse_http(sock.recv())
mongo_data = db.collection.find()
response = format_response(mongo_data)
sock.sendall(response)

Slide 12

Slide 12 text

Gevent:

# pseudo-Python
from gevent import monkey; monkey.patch_all()

sock = listen()
request = parse_http(sock.recv())
mongo_data = db.collection.find()
response = format_response(mongo_data)
sock.sendall(response)

Slide 13

Slide 13 text

Tornado:

import tornado.web
from tornado.httpclient import AsyncHTTPClient

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # Returns immediately; on_response runs when the fetch completes
        AsyncHTTPClient().fetch(
            "http://example.com",
            callback=self.on_response)

    def on_response(self, response):
        formatted = format_response(response)
        self.write(formatted)
        self.finish()

Slide 14

Slide 14 text

Tornado IOStream

class IOStream(object):
    def read_bytes(self, num_bytes, callback):
        self.read_bytes = num_bytes
        self.read_callback = callback
        io_loop.add_handler(
            self.socket.fileno(),
            self.handle_events,
            events=READ)

    def handle_events(self, fd, events):

Slide 15

Slide 15 text

Tornado IOLoop

class IOLoop(object):
    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        # _impl is epoll or kqueue or ...
        self._impl.register(fd, events)

    def start(self):
        while True:
            event_pairs = self._impl.poll()
            for fd, events in event_pairs:
                self._handlers[fd](fd, events)
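The same register-then-dispatch loop can be tried out with only the standard library. The sketch below is not Tornado's code: `MiniLoop` and `run_demo` are made-up names, the `selectors` module stands in for `_impl`, and a `stop()` method is added so the demo terminates.

```python
import selectors
import socket


class MiniLoop(object):
    """Toy event loop: maps sockets to handlers, in the spirit of the slide."""

    def __init__(self):
        # DefaultSelector picks epoll, kqueue, or select for the platform
        self._selector = selectors.DefaultSelector()
        self._running = False

    def add_handler(self, sock, handler):
        # Ask to be notified when sock is readable; remember whom to call
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def remove_handler(self, sock):
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while self._running:
            for key, events in self._selector.select(timeout=1.0):
                key.data(key.fileobj, events)  # key.data is the handler


def run_demo():
    loop = MiniLoop()
    client, server = socket.socketpair()
    received = []

    def on_readable(sock, events):
        received.append(sock.recv(1024))
        loop.remove_handler(sock)
        loop.stop()

    loop.add_handler(server, on_readable)
    client.sendall(b"ping")  # data is already waiting when the loop starts
    loop.start()
    client.close()
    server.close()
    return received
```

The loop itself never reads or writes; it only reports which sockets are ready and calls whoever registered for them. Everything else, including buffering in something like IOStream, is built on handlers.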

Slide 16

Slide 16 text

Python, MongoDB, & concurrency
• Threads work great with pymongo
• Gevent works great with pymongo
  – monkey.patch_socket(); monkey.patch_thread()
• Tornado works so-so
  – asyncmongo
    • No replica sets, only first batch, no SON manipulators, no document classes, …
  – pymongo
    • OK if all your queries are fast
    • Use extra Tornado processes

Slide 17

Slide 17 text

Demo: “Chirp”
https://github.com/ajdavis/chirp
• Using PyMongo
• Using AsyncMongo
• Using AsyncMongo with generators

Slide 18

Slide 18 text

Introducing: “Motor”
• Mongo + Tornado
• Experimental
• Might be official in a few months
• Uses Tornado IOLoop and IOStream
• Presents standard Tornado callback API
• Stores state internally with greenlets
• github.com/ajdavis/mongo-python-driver/tree/tornado_async

Slide 19

Slide 19 text

Motor

class MainHandler(tornado.web.RequestHandler):
    def __init__(self):
        self.c = MotorConnection()

    @tornado.web.asynchronous
    def post(self):
        # No-op if already open
        self.c.open(callback=self.connected)

    def connected(self, c, error):
        self.c.collection.insert(
            {'x': 1},
            callback=self.inserted)

    def inserted(self, result, error):

Slide 20

Slide 20 text

Motor (with Tornado Tasks!)

class MainHandler(tornado.web.RequestHandler):
    def __init__(self):
        self.c = MotorConnection()

    @tornado.web.asynchronous
    @gen.engine
    def post(self):
        yield gen.Task(self.c.open)
        self.c.db.collection.insert(
            {'foo': 'bar'},
            callback=(yield gen.Callback('insert')))
        while cursor.alive:
            for i in (yield gen.Wait('insert')):
                self.write(json.dumps(i))
        self.write(']')

Slide 21

Slide 21 text

Motor internals
[Diagram: participants: RequestHandler, pymongo, IOLoop; labels: request, schedule callback, start]

Slide 22

Slide 22 text

Motor internals: wrapper

class MotorCollection(object):
    def insert(self, *args, **kwargs):
        callback = kwargs['callback']
        del kwargs['callback']
        kwargs['safe'] = True

        def call_insert():
            # Runs on child greenlet
            result, error = None, None
            try:
                sync_insert = self.sync_collection.insert
                result = sync_insert(*args, **kwargs)
            except Exception as e:
                error = e

            # Schedule the callback to be run on the main greenlet
            tornado.ioloop.IOLoop.instance().add_callback(
                lambda: callback(result, error)
            )

        # Start child greenlet

(Execution-order callouts on this slide: 1, 2, 3, 6, 8)

Slide 23

Slide 23 text

Motor internals: fake socket

class MotorSocket(object):
    def __init__(self, socket):
        # Makes socket non-blocking
        self.stream = tornado.iostream.IOStream(socket)

    def sendall(self, data):
        child_gr = greenlet.getcurrent()

        # This is run by IOLoop on the main greenlet
        # when data has been sent;
        # switch back to child to continue processing
        def sendall_callback():
            child_gr.switch()

        self.stream.write(data, callback=sendall_callback)

(Execution-order callouts on this slide: 4, 5, 7)

Slide 24

Slide 24 text

Motor
• Shows a general method for asynchronizing synchronous network APIs in Python
• Who wants to try it with MySQL? Thrift?
• (Bonus round: resynchronizing Motor for testing)

Slide 25

Slide 25 text

Questions?
A. Jesse Jiryu Davis
emptysquare.net