
Asynchronous MongoDB with Python and Tornado - A. Jesse Jiryu Davis, Python Evangelist

mongodb
March 21, 2012

A. Jesse Jiryu Davis will review the state of the art for Tornado and MongoDB, and demonstrate building a real-time web app.

Transcript

  1. Agenda
     • Talk about web services in a really dumb (“abstract”?) way
     • Explain when we need async web servers
     • Why is async hard?
     • What is Tornado and how does it work?
     • Using Tornado with PyMongo, and with AsyncMongo
     • Motor, my experimental driver
  2. CPU-bound web service
     [Diagram: Clients connect to the Server over sockets]
     • No need for async
     • Just spawn one process per core
  3. Normal web service
     [Diagram: Clients connect to the Server over sockets; the Server connects to a Backend (DB, web service, SAN, …) over a socket]
     • Assume backend is unbounded
     • Service is bound by memory
  4. What’s async for?
     • Minimize resources per connection
     • I.e., wait for backend as cheaply as possible
  5. HTTP long-polling (“COMET”)
     • E.g., chat server
     • Async’s killer app
     • Short-polling is CPU-bound: tradeoff between latency and load
     • Long-polling is memory-bound
     • “C10K problem”: kegel.com/c10k.html
     • Tornado was invented for this
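The short-polling tradeoff on slide 5 can be made concrete with back-of-the-envelope arithmetic. This is a minimal sketch, assuming every client polls at one fixed interval and events arrive at uniformly random times; the function name and the numbers are illustrative and not from the talk.

```python
def short_polling_cost(num_clients, poll_interval_s):
    """Return (requests per second, mean notification latency in seconds).

    Assumes each client sends one request per interval, and that an event
    waits on average half an interval before the next poll picks it up.
    """
    requests_per_second = num_clients / poll_interval_s
    mean_latency_s = poll_interval_s / 2.0
    return requests_per_second, mean_latency_s


# Shrinking the interval lowers latency but raises load proportionally.
for interval in (10.0, 1.0, 0.1):
    load, latency = short_polling_cost(10000, interval)
    print("interval=%.1fs load=%.0f req/s mean latency=%.2fs"
          % (interval, load, latency))
```

With long-polling the request rate no longer depends on a polling interval; the cost moves to holding one open connection per waiting client, which is why the slide calls it memory-bound.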
  6. Why is async hard to code?
     [Sequence diagram over time. Participants: Client, Server, Backend. Labels: request, response, store state, request, response.]
  7. Ways to store state (this slide is in beta)
     [Chart. Axes: coding difficulty, memory per connection. Plotted: Multithreading; Tornado, Node.js; Greenlets / Gevent.]
  8. What’s a greenlet?
     • A.K.A. “green threads”
     • A feature of Stackless Python, packaged as a module for standard Python
     • Greenlet stacks are stored on heap, copied to / from OS stack on resume / pause
     • Cooperative
     • Memory-efficient
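To illustrate what "cooperative" means on slide 8 without installing the greenlet module, here is a sketch that uses plain Python generators as a stand-in. Generators are not greenlets: a greenlet can switch away from inside any nested function call, while a generator can only yield from its own frame. The names `task` and `run_round_robin` are hypothetical.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield  # voluntarily give up control; nothing preempts us


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)  # run the task until its next yield
        except StopIteration:
            continue  # task finished; do not reschedule it
        queue.append(current)


log = []
run_round_robin([task("a", 2, log), task("b", 3, log)])
print(log)
```

Each paused task keeps only its own small frame alive, which hints at why cooperative schemes can be cheaper per connection than one OS thread each.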
  9. Threads:
       # pseudo-Python
       sock = listen()
       request = parse_http(sock.recv())
       mongo_data = db.collection.find()
       response = format_response(mongo_data)
       sock.sendall(response)
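The pseudo-Python on slide 9 can be turned into something runnable with the standard library alone. In this sketch, `fake_find` is a hypothetical stand-in for `db.collection.find()`, and a `socket.socketpair()` replaces a real listening socket; the point is only that each blocking call parks one thread, whose stack holds the request state.

```python
import socket
import threading


def fake_find():
    # Hypothetical stand-in for db.collection.find(); no database involved.
    return [{"x": 1}]


def handle(conn):
    """Serve one connection; every call here may block this thread only."""
    request = conn.recv(1024)
    data = fake_find()
    response = "%s -> %r" % (request.decode(), data)
    conn.sendall(response.encode())
    conn.close()


# A connected socket pair plays the roles of client and server connection.
client, server_side = socket.socketpair()
worker = threading.Thread(target=handle, args=(server_side,))
worker.start()

client.sendall(b"GET /")
reply = client.recv(1024)
worker.join()
client.close()
print(reply)
```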
  10. Gevent:
        # pseudo-Python
        from gevent import monkey; monkey.patch_all()
        sock = listen()
        request = parse_http(sock.recv())
        mongo_data = db.collection.find()
        response = format_response(mongo_data)
        sock.sendall(response)
  11. Tornado:
        class MainHandler(tornado.web.RequestHandler):
            @tornado.web.asynchronous
            def get(self):
                AsyncHTTPClient().fetch(
                    "http://example.com",
                    callback=self.on_response)

            def on_response(self, response):
                formatted = format_response(response)
                self.write(formatted)
                self.finish()
  12. Tornado IOStream
        class IOStream(object):
            def read_bytes(self, num_bytes, callback):
                # Underscore avoids shadowing the read_bytes method itself
                self._read_bytes = num_bytes
                self.read_callback = callback
                io_loop.add_handler(
                    self.socket.fileno(),
                    self.handle_events,
                    events=READ)

            def handle_events(self, fd, events):
                # (method body is cut off in the transcript)
  13. Tornado IOLoop
        class IOLoop(object):
            def add_handler(self, fd, handler, events):
                self._handlers[fd] = handler
                # _impl is epoll or kqueue or ...
                self._impl.register(fd, events)

            def start(self):
                while True:
                    event_pairs = self._impl.poll()
                    for fd, events in event_pairs:
                        self._handlers[fd](fd, events)
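The loop on slide 13 can be sketched with the standard library's `selectors` module, which picks epoll, kqueue, or another mechanism for the platform. This is not Tornado's implementation: `MiniIOLoop`, `remove_handler`, and `stop` are hypothetical names added so the example can terminate, and a `socket.socketpair()` stands in for a client connection.

```python
import selectors
import socket


class MiniIOLoop:
    """Register handlers per socket, poll, and dispatch ready events."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._running = False

    def add_handler(self, sock, handler, events):
        # The selector stores the handler alongside the registration.
        self._selector.register(sock, events, data=handler)

    def remove_handler(self, sock):
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while self._running:
            for key, events in self._selector.select(timeout=1.0):
                key.data(key.fileobj, events)  # call the stored handler


loop = MiniIOLoop()
reader, writer = socket.socketpair()
reader.setblocking(False)
received = []


def on_readable(sock, events):
    received.append(sock.recv(1024))
    loop.remove_handler(sock)
    loop.stop()


loop.add_handler(reader, on_readable, selectors.EVENT_READ)
writer.sendall(b"hello")
loop.start()
reader.close()
writer.close()
print(received)
```

The handler never blocks: it runs only after the selector reports the socket readable, so one thread can wait on many sockets at once.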
  14. Python, MongoDB, & concurrency
      • Threads work great with pymongo
      • Gevent works great with pymongo
        – monkey.patch_socket(); monkey.patch_thread()
      • Tornado works so-so
        – asyncmongo
          • No replica sets, only first batch, no SON manipulators, no document classes, …
        – pymongo
          • OK if all your queries are fast
          • Use extra Tornado processes
  15. Introducing: “Motor”
      • Mongo + Tornado
      • Experimental
      • Might be official in a few months
      • Uses Tornado IOLoop and IOStream
      • Presents standard Tornado callback API
      • Stores state internally with greenlets
      • github.com/ajdavis/mongo-python-driver/tree/tornado_async
  16. Motor
        class MainHandler(tornado.web.RequestHandler):
            def __init__(self):
                self.c = MotorConnection()

            @tornado.web.asynchronous
            def post(self):
                # No-op if already open
                self.c.open(callback=self.connected)

            def connected(self, c, error):
                self.c.collection.insert(
                    {'x': 1},
                    callback=self.inserted)

            def inserted(self, result, error):
                # (method body is cut off in the transcript)
  17. Motor (with Tornado Tasks!)
        class MainHandler(tornado.web.RequestHandler):
            def __init__(self):
                self.c = MotorConnection()

            @tornado.web.asynchronous
            @gen.engine
            def post(self):
                yield gen.Task(self.c.open)
                self.c.db.collection.insert(
                    {'foo': 'bar'},
                    callback=(yield gen.Callback('insert')))
                while cursor.alive:
                    for i in (yield gen.Wait('insert')):
                        self.write(json.dumps(i))
                self.write(']')
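The `yield`-based style on slide 17 rests on a general idea: a driver runs a generator until it yields a pending operation, starts that operation with a callback, and resumes the generator with the result. The sketch below shows that idea with the standard library only. It is not Tornado's `gen.engine`; `engine`, `Task`, and `fake_insert` are hypothetical names, and the callback fires immediately instead of being scheduled by an IOLoop.

```python
import functools


class Task:
    """Describes a callback-style call to make: func(*args, callback=...)."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def engine(generator_func):
    """Drive a generator: run each yielded Task, resume with its result."""
    @functools.wraps(generator_func)
    def wrapper(*args, **kwargs):
        gen = generator_func(*args, **kwargs)

        def resume(result=None):
            try:
                pending = gen.send(result)  # run until the next yield
            except StopIteration:
                return  # generator finished
            pending.func(*pending.args, callback=resume)

        resume()
    return wrapper


def fake_insert(document, callback):
    # Stand-in for an async driver call; a real driver would invoke the
    # callback later, from the event loop, once the server has replied.
    callback({"ok": 1, "inserted": document})


results = []


@engine
def handler():
    # Reads like sequential code, but each yield hands control to the driver.
    first = yield Task(fake_insert, {"x": 1})
    results.append(first)
    second = yield Task(fake_insert, {"x": 2})
    results.append(second)


handler()
print(results)
```

The generator's frame holds the in-progress state between callbacks, so the handler does not have to split its logic across separate callback methods.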
  18. Motor internals
      [Diagram plotting stack depth against time. Participants: Client, RequestHandler, greenlet, pymongo, IOLoop. Labels: request, start, IOStream.sendall(callback), switch(), return, schedule callback, parse Mongo response, switch(), callback(), callback(), HTTP response.]
  19. Motor internals: wrapper
        class MotorCollection(object):
            def insert(self, *args, **kwargs):
                callback = kwargs['callback']
                del kwargs['callback']
                kwargs['safe'] = True

                def call_insert():
                    # Runs on child greenlet
                    result, error = None, None
                    try:
                        sync_insert = self.sync_collection.insert
                        result = sync_insert(*args, **kwargs)
                    except Exception as e:
                        error = e

                    # Schedule the callback to be run on the main greenlet
                    tornado.ioloop.IOLoop.instance().add_callback(
                        lambda: callback(result, error)
                    )

                # Start child greenlet
                # (remainder is cut off in the transcript)
      [Step callouts on this slide: 1, 2, 3, 6, 8]
  20. Motor internals: fake socket
        class MotorSocket(object):
            def __init__(self, socket):
                # Makes socket non-blocking
                self.stream = tornado.iostream.IOStream(socket)

            def sendall(self, data):
                child_gr = greenlet.getcurrent()

                # This is run by IOLoop on the main greenlet
                # when data has been sent;
                # switch back to child to continue processing
                def sendall_callback():
                    child_gr.switch()

                self.stream.write(data, callback=sendall_callback)
      [Step callouts on this slide: 4, 5, 7]
  21. Motor
      • Shows a general method for asynchronizing synchronous network APIs in Python
      • Who wants to try it with MySQL? Thrift?
      • (Bonus round: resynchronizing Motor for testing)