A tale of concurrency through creativity in Python: a deep dive into how gevent works

kavya
May 30, 2016

gevent is an open source Python library for asynchronous I/O. It provides a powerful construct to build concurrent applications; think threads, except lightweight and cooperatively scheduled. We will delve into how gevent is architected from its building blocks — sophisticated coroutines, an event loop, and a dash of creativity to neatly integrate them.

Abstract:

Asynchronous frameworks like gevent make it possible to write highly concurrent and performant applications in CPython. Unlike traditional concurrency constructs such as threads and processes, which tend to be heavyweight, gevent's greenlets are inexpensive to spawn. This makes them ideal for applications, such as servers, that need to handle tens of thousands of concurrent connections.
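To see why user-space, cooperatively scheduled tasks are cheap to create, here is a minimal stdlib-only sketch. It uses Python generators as a stand-in for greenlets (an assumption for illustration: real greenlets can switch from anywhere in a call stack, while a generator can only suspend at its own `yield`), and `worker` and `run_all` are hypothetical names. Each task is a small heap-allocated object resumed by a scheduler; no OS thread is created per task.

```python
from collections import deque


def worker(name, steps, log):
    """A green-thread-like task; each `yield` hands control back to the scheduler."""
    for i in range(steps):
        log.append((name, i))
        yield  # cooperative switch point: the scheduler decides who runs next


def run_all(tasks):
    """Hypothetical round-robin scheduler running entirely in user space."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)  # still alive; put it back in line


log = []
run_all([worker("a", 2, log), worker("b", 2, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]

# Spawning ten thousand tasks is just creating ten thousand generator objects.
many = []
run_all(worker(n, 1, many) for n in range(10_000))
print(len(many))  # 10000
```

The interleaved output shows the tasks taking turns only at their `yield` points, which is what "cooperatively scheduled" means: nothing preempts them.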

So how does gevent provide an execution unit that is both ideal for concurrency and lightweight?

At its heart, gevent runs on sophisticated coroutines (greenlets) and an event loop (the libev event loop). It neatly integrates greenlets into the libev event loop and builds additional mechanisms to schedule execution switches as well.
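A toy version of that integration, offered as a sketch under stated assumptions: generators again stand in for greenlets, Python's `selectors` module stands in for libev, and `ToyHub`, `spawn`, `reader` and `writer` are hypothetical names. A coroutine that yields a socket is asking the loop to resume it when the socket becomes readable; one that yields `None` is simply giving up its turn. Each loop iteration first runs the coroutines that are ready (the role a pre-block hook plays), then blocks for I/O, and the I/O callbacks put waiting coroutines back on the ready queue.

```python
import selectors
import socket
from collections import deque


class ToyHub:
    """Toy event loop that resumes generator-based coroutines.

    Illustrative only: gevent's hub is a greenlet running a libev loop.
    """

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.ready = deque()  # coroutines that can run right now

    def spawn(self, coro):
        # Schedule the coroutine; it does not start running until run().
        self.ready.append(coro)

    def _step(self, coro):
        try:
            wait_on = next(coro)  # run until the coroutine yields
        except StopIteration:
            return  # finished
        if wait_on is None:
            self.ready.append(coro)  # voluntary yield: run again next iteration
        else:
            # I/O watcher: resume this coroutine once wait_on is readable.
            self.selector.register(wait_on, selectors.EVENT_READ, coro)

    def run(self):
        while self.ready or self.selector.get_map():
            # "Pre-block" step: run the batch of coroutines that are ready now.
            for _ in range(len(self.ready)):
                self._step(self.ready.popleft())
            if not self.selector.get_map():
                continue  # no I/O to wait for
            # Block for I/O, or only poll if coroutines are already ready.
            timeout = 0 if self.ready else None
            for key, _events in self.selector.select(timeout):
                self.selector.unregister(key.fileobj)
                self.ready.append(key.data)  # pending watcher: resume coroutine


def reader(sock, log):
    yield sock  # "Hey loop, resume me when sock is readable"
    log.append(sock.recv(1024))


def writer(sock, log):
    sock.sendall(b"hello")
    log.append("sent")
    yield None  # give other coroutines a turn
    log.append("writer done")


a, b = socket.socketpair()
log = []
hub = ToyHub()
hub.spawn(reader(a, log))
hub.spawn(writer(b, log))
hub.run()
a.close()
b.close()
print(log)  # ['sent', 'writer done', b'hello']
```

Because `spawn` only queues a coroutine, nothing runs until `run()` is called. This loosely mirrors how `Greenlet.start()` schedules a switch on the loop rather than running the greenlet immediately.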

In this talk, we will dive into how gevent works. We will look at how greenlets and the libev event loop work, and what gevent uses each of them for. We will then examine how gevent integrates the two, along with the additional mechanisms it builds on top of them, to provide a concurrency model that, like threads, is transparent to the application.
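Part of that transparency comes from monkey-patching: `monkey.patch_all()`, used in the downloader code in the transcript, replaces blocking functions in standard-library modules with cooperative versions, so unmodified callers pick them up. The minimal sketch below shows the underlying mechanism using a hypothetical throwaway module `netlib` rather than patching anything real. It also shows why the order of imports matters: code that grabbed a direct reference to a function before patching keeps the old one.

```python
import types

# Hypothetical stand-in module; nothing real is patched here.
netlib = types.ModuleType("netlib")


def blocking_recv():
    return "blocking recv"


def cooperative_recv():
    return "cooperative recv"


netlib.recv = blocking_recv

# Like "from netlib import recv" executed BEFORE patching:
# the name is bound to the original function object.
early_recv = netlib.recv


def late_lookup():
    # Like "import netlib; netlib.recv()": the attribute is looked up per call.
    return netlib.recv()


# Analogue of monkey.patch_all(): swap the module attribute in place.
netlib.recv = cooperative_recv

print(late_lookup())  # cooperative recv
print(early_recv())   # blocking recv (the stale reference was taken too early)
```

This is why gevent's documentation recommends patching as early as possible in a program's lifecycle.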

Transcript

  1. def download_photos(user):
         # Open a connection to the server
         conn = get_authenticated_connection(user)
         # Download all photos
         photos = get_photos(conn)
         # Save for later display
         save_photos(user, photos)
  2. import multiprocessing as mp

     def downloader():
         pool = []
         for user in users:
             # target/args must be keywords: the first positional arg is `group`
             p = mp.Process(target=download_photos, args=(user,))
             pool.append(p)
             p.start()
         for p in pool:
             p.join()
  3. import threading

     def downloader():
         pool = []
         for user in users:
             # target/args must be keywords: the first positional arg is `group`
             t = threading.Thread(target=download_photos, args=(user,))
             pool.append(t)
             t.start()
         for t in pool:
             t.join()
  4. import twisted

     def download_photos():
         # Modify this to add callbacks
         ...

     def downloader():
         # Something something
         loop.run()
  5. green threads
     - user space: the OS does not create or manage them
     - cooperatively scheduled: the OS does not schedule or preempt them
     - lightweight
  6. import gevent
     from gevent import monkey; monkey.patch_all()

     def downloader():
         pool = []
         for user in users:
             g = gevent.Greenlet(download_photos, user)
             g.start()
             pool.append(g)
         gevent.joinall(pool)
  7. from greenlet import greenlet
     ...
     class Greenlet(greenlet):
         """ A light-weight cooperatively-scheduled execution unit. """
         ...

     ?  g = gevent.Greenlet(download_photos, user)
  8. from greenlet import greenlet

     def print_red():
         print('red')
         gr2.switch()
         print('red done!')

     def print_blue():
         print('blue')
         gr1.switch()
         print('blue done!')

     gr1 = greenlet(print_red)
     gr2 = greenlet(print_blue)
     gr1.switch()

     Output:
     red
     blue
     red done!
  9. [Diagram: the C STACK during gr1.switch(), gr2.switch(), gr1.switch(), marking stack pointers SP1, SP2, SP3 and each greenlet's saved stack bounds: { base = SP1 }, { base = SP1, start = SP2 }, { base = SP2 }]
  10. import gevent
      from gevent import monkey; monkey.patch_all()

      def downloader():
          pool = []
          for user in users:
              g = gevent.Greenlet(download_photos, user)
              g.start()
              pool.append(g)
          gevent.joinall(pool)
  11. def start(self):
          """ Schedule the greenlet to run in this loop iteration. """
          if self._start_event is None:
              self._start_event = \
                  ...loop.run_callback(self.switch)

      g.start()
  12. “Hey loop, wait for a write on this socket and call parse_recv() when that happens.”
  13. fd = make_nonblocking(socket_fd)
      loop.io_watch(fd, write, callback_fn)
      loop.run()

      # inside loop.run():
      while True:
          call all pre_block_watchers
          block for I/O
          call pending io_watchers
          call all post_block_watchers
  14. always call pre_block_watchers
      Hook to integrate other event mechanisms into the loop.
      “Hey loop, if there are coroutines ready to run, run them. Then, block for a write on...”
  15. import gevent
      from gevent import monkey; monkey.patch_all()

      def downloader():
          pool = []
          for user in users:
              g = gevent.Greenlet(download_photos, user)
              g.start()
              pool.append(g)
          gevent.joinall(pool)
  16. HUB

  17. Hub: loop.run()
          call pre_block_watchers = [] ...
          block for I/O
          call pending io_watchers = [g1.switch]
              g1.switch(): Hub resumes g1 -> download_photos(user1)
  18. minuses
      - no parallelism
      - non-cooperative code will block the entire process:
          - C-extensions -> use pure Python libraries
          - compute-bound greenlets -> use gevent.sleep(0)
                                    -> use greenlet blocking detection
      - monkey-patch may have confusing implications
      - order of imports matters
  19. …but excellent for workloads that are:
      - I/O-bound
      - highly concurrent -> 20-30k concurrent connections!
      Used at “web scale” at: Pinterest, Facebook, Mixpanel, PayPal, Disqus, Nylas…
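Slide 8's output ('red', 'blue', 'red done!', with 'blue done!' never printed) follows from greenlet's parent rule: when a greenlet finishes, control returns to its parent, here the main greenlet, so gr2 is left suspended. The sketch below is a stdlib-only simulation of that switching order, assuming generators as stand-ins and a hypothetical `trampoline` driver in place of the main greenlet. It reproduces the order of events, not greenlet's actual mechanism of saving and restoring slices of the C stack.

```python
def print_red(log):
    log.append("red")
    yield "blue"  # like gr2.switch(): name the task to switch to
    log.append("red done!")


def print_blue(log):
    log.append("blue")
    yield "red"  # like gr1.switch()
    log.append("blue done!")


def trampoline(tasks, start):
    """Resume tasks[current]; each yield names the next task to switch to.

    When a task returns, control comes back here (the 'parent'),
    mirroring how a finished greenlet switches back to its parent.
    """
    current = start
    while True:
        try:
            current = next(tasks[current])
        except StopIteration:
            return  # finished task: back to the parent; others stay suspended


log = []
trampoline({"red": print_red(log), "blue": print_blue(log)}, "red")
print(log)  # ['red', 'blue', 'red done!']
```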
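Slide 18's advice to call gevent.sleep(0) in compute-bound greenlets follows from cooperative scheduling: a task that never yields holds the process until it finishes. In the minimal stdlib sketch below, generators stand in for greenlets and a bare `yield` stands in for `gevent.sleep(0)`; both substitutions, and the names `run_all` and `crunch`, are assumptions for illustration.

```python
from collections import deque


def run_all(tasks):
    """Round-robin scheduler: a task only loses control when it yields."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished
        ready.append(task)


def crunch(name, rounds, log, cooperative):
    """CPU-bound task; `yield` plays the role of gevent.sleep(0)."""
    for _ in range(rounds):
        sum(range(10_000))  # stand-in for pure computation
        log.append(name)
        if cooperative:
            yield  # voluntarily let other tasks run


greedy = []
run_all([crunch("a", 3, greedy, False), crunch("b", 3, greedy, False)])
print(greedy)  # ['a', 'a', 'a', 'b', 'b', 'b']

polite = []
run_all([crunch("a", 3, polite, True), crunch("b", 3, polite, True)])
print(polite)  # ['a', 'b', 'a', 'b', 'a', 'b']
```

In the greedy run, task "b" cannot start until "a" has finished all its work; with explicit yields, the two tasks interleave.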