A tale of concurrency through creativity in Python: a deep dive into how gevent works

gevent is an open source Python library for asynchronous I/O. It provides a powerful construct to build concurrent applications; think threads, except lightweight and cooperatively scheduled. We will delve into how gevent is architected from its building blocks — sophisticated coroutines, an event loop, and a dash of creativity to neatly integrate them.

Abstract:

Asynchronous frameworks like gevent make it possible to write highly concurrent and performant applications in CPython. Unlike traditional concurrency constructs such as threads and processes, which tend to be heavyweight, gevent's "Greenlets" are inexpensive to spawn. That makes them ideal for applications like servers that need to handle tens of thousands of concurrent connections.

So how does gevent provide an execution unit that is both ideal for concurrency and lightweight?

At its heart, gevent runs on sophisticated coroutines (greenlets) and an event loop (the libev event loop). It integrates greenlets into the libev event loop and adds its own mechanisms for scheduling execution switches between them.

In this talk, we will dive into how gevent works. We will look at how greenlets and the libev event loop work, and what gevent uses each of them for. We will then examine how gevent integrates the two, and the additional mechanisms it builds on top, to provide a concurrency model that, like threads, is transparent to the application.

kavya

May 30, 2016

Transcript

  1. A TALE OF CONCURRENCY THROUGH CREATIVITY IN PYTHON: A DEEP DIVE INTO HOW GEVENT WORKS
  2. KAVYA

  3. GEVENT

  4. What is asynchronous I/O? What is gevent?

  5. download_photos() -> network

  6. def download_photos(user):
         # Open a connection to the server
         conn = get_authenticated_connection(user)
         # Download all photos
         photos = get_photos(conn)
         # Save for later display
         save_photos(user, photos)
  7. def downloader():
         users = get_users()
         for user in users:
             download_photos(user)   # network I/O
  8. import multiprocessing / threading / twisted / green_thread ?

  9. import multiprocessing

  10. import multiprocessing as mp
      def downloader():
          pool = []
          for user in users:
              # Process takes the function via target= and its arguments via args=
              p = mp.Process(target=download_photos, args=(user,))
              pool.append(p)
              p.start()
          for p in pool:
              p.join()
  11. import threading

  12. import threading
      def downloader():
          pool = []
          for user in users:
              # Thread takes the function via target= and its arguments via args=
              t = threading.Thread(target=download_photos, args=(user,))
              pool.append(t)
              t.start()
          for t in pool:
              t.join()
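A runnable version of the threading approach, assuming a hypothetical download_photos() that sleeps for 0.2 s to simulate network latency and records which users finished. threading.Thread takes the function as target= and its arguments as args=. Because the threads wait concurrently, the five simulated downloads should take roughly 0.2 s in total rather than 1 s.

```python
import threading
import time

def download_photos(user, done, lock):
    # Hypothetical stand-in: sleep to simulate waiting on the network.
    time.sleep(0.2)
    with lock:                  # protect the shared list from concurrent appends
        done.append(user)

def downloader(users):
    done, lock, pool = [], threading.Lock(), []
    for user in users:
        # target= is the function to run, args= its positional arguments
        t = threading.Thread(target=download_photos, args=(user, done, lock))
        pool.append(t)
        t.start()
    for t in pool:
        t.join()                # wait for every download to finish
    return done

start = time.perf_counter()
finished = downloader(["ana", "bo", "cy", "di", "ed"])
elapsed = time.perf_counter() - start
print(sorted(finished), f"{elapsed:.1f}s")
```

Each thread is a full OS thread, which is exactly the cost green threads try to avoid.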
  13. import twisted

  14. import twisted
      def download_photos():
          # Modify this to add callbacks
          ...
      def downloader():
          # Something something
          loop.run()
  15. green threads
      user space: the OS does not create or manage them
      cooperatively scheduled: the OS does not schedule or preempt them
      lightweight
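A minimal illustration of the two properties above, assuming Python generators as a stand-in for greenlets (gevent itself uses the greenlet C extension, not generators): the scheduler is ordinary user-space code running in a single OS thread, and a task gives up control only at an explicit switch point. The names worker and run are hypothetical.

```python
from collections import deque

def worker(name, steps, log):
    # A "green thread": plain user-space code that yields to give up control.
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # cooperative switch point; nothing preempts the task otherwise

def run(tasks):
    # Round-robin scheduler living entirely in user space: the OS sees one thread.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
        except StopIteration:
            continue            # task finished; drop it
        ready.append(task)      # still alive: back of the queue

log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```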
  16. import gevent

  17. import gevent
      from gevent import monkey; monkey.patch_all()
      def downloader():
          pool = []
          for user in users:
              g = gevent.Greenlet(download_photos, user)
              g.start()
              pool.append(g)
          gevent.joinall(pool)
  18. THE BUILDING BLOCKS · PUTTING IT TOGETHER · WRAP-UP / Q&A

  19. THE BUILDING BLOCKS

  20. from greenlet import greenlet
      ...
      class Greenlet(greenlet):
          """ A light-weight cooperatively-scheduled execution unit. """
          ...
      g = gevent.Greenlet(download_photos, user)   # ?
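  21. from greenlet import greenlet
      def print_red():
          print('red')
          gr2.switch()
          print('red done!')
      def print_blue():
          print('blue')
          gr1.switch()
          print('blue done!')
      gr1 = greenlet(print_red)
      gr2 = greenlet(print_blue)
      gr1.switch()
      # Output: red / blue / red done!
      # 'blue done!' is never printed: when gr1 finishes, control returns to
      # its parent (the main greenlet), not to gr2.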
  22. .switch(): pause the current greenlet and yield control flow; resume the next one with next.switch()
      -> a coroutine
  23. gr1 = greenlet(run_fn) -> { run_fn, parent, … }

  24. [Diagram: the C STACK across gr1.switch(), gr2.switch(), gr1.switch(); each greenlet records a stack base (SP1, SP2) and start (SP2, SP3)]
  25. [Diagram: a switched-out greenlet's stack slice (start, SP3, SP4) saved from the C STACK to the HEAP]

  26. greenlets for coroutines via assembly-based stack-slicing

  27. import gevent
      from gevent import monkey; monkey.patch_all()
      def downloader():
          pool = []
          for user in users:
              g = gevent.Greenlet(download_photos, user)
              g.start()
              pool.append(g)
          gevent.joinall(pool)
  28. g.start()
      def start(self):
          """ Schedule the greenlet to run in this loop iteration. """
          if self._start_event is None:
              self._start_event = \
                  ...loop.run_callback(self.switch)
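Slide 28's start() never runs the greenlet directly; it asks the loop to call self.switch later. Below is a toy model of that scheduling, assuming a hypothetical Loop class that has only a callback queue (gevent's real loop wraps libev and does far more) and a hypothetical Task standing in for gevent.Greenlet. Calling start() twice schedules a single switch, and nothing runs until the loop iterates.

```python
from collections import deque

class Loop:
    """Hypothetical minimal loop: only a callback queue, no I/O."""

    def __init__(self):
        self._callbacks = deque()

    def run_callback(self, fn):
        # Queue fn to run on the loop's next iteration.
        self._callbacks.append(fn)
        return fn  # any non-None value works as the "start event" handle here

    def run_once(self):
        # Run only the callbacks that were queued before this iteration began.
        for _ in range(len(self._callbacks)):
            self._callbacks.popleft()()

class Task:
    """Hypothetical stand-in for the start() logic on slide 28."""

    def __init__(self, loop, fn):
        self.loop, self.fn = loop, fn
        self._start_event = None

    def switch(self):
        self.fn()  # a real greenlet would switch stacks here

    def start(self):
        # As on the slide: schedule the switch only once.
        if self._start_event is None:
            self._start_event = self.loop.run_callback(self.switch)

loop, log = Loop(), []
task = Task(loop, lambda: log.append("ran"))
task.start()
task.start()        # no-op: _start_event is already set
print(log)          # [] : nothing runs until the loop iterates
loop.run_once()
print(log)          # ['ran']
```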
  29. libev
      API to register event_handler callbacks
      watches for events
      calls registered callbacks
  30. “Hey loop, wait for a write on this socket and call parse_recv() when that happens.”
  31. fd = make_nonblocking(socket_fd)
      loop.io_watch(fd, write, callback_fn)
      loop.run():
          while True:
              call all pre_block_watchers
              block for I/O
              call pending io_watchers
              call all post_block_watchers
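The loop in slide 31 can be approximated in pure Python with the standard selectors module. This is a sketch, not libev: MiniLoop, io_watch(), and the two watcher lists are hypothetical names chosen to match the slide's pseudocode. The usage watches for a read rather than the slide's write, because a freshly created socket is writable immediately.

```python
import selectors
import socket

class MiniLoop:
    """Hypothetical toy loop mirroring the slide's pseudocode; not libev's API."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.pre_block_watchers = []   # called before every blocking wait
        self.post_block_watchers = []  # called after every blocking wait

    def io_watch(self, fileobj, events, callback):
        # "Hey loop, wait for <events> on fileobj and call callback when that happens."
        self.selector.register(fileobj, events, callback)

    def run(self):
        # Keep iterating while at least one io watcher is registered.
        while self.selector.get_map():
            for watcher in self.pre_block_watchers:
                watcher()
            for key, _mask in self.selector.select():  # block for I/O
                key.data(key.fileobj)                    # call pending io watchers
            for watcher in self.post_block_watchers:
                watcher()

# Usage: watch one end of a socket pair for readability.
ours, theirs = socket.socketpair()
ours.setblocking(False)            # the slide's make_nonblocking(socket_fd)
loop, events = MiniLoop(), []
loop.pre_block_watchers.append(lambda: events.append("pre"))
loop.post_block_watchers.append(lambda: events.append("post"))

def on_readable(sock):
    events.append(sock.recv(16))
    loop.selector.unregister(sock)  # one-shot watcher, so run() can exit

loop.io_watch(ours, selectors.EVENT_READ, on_readable)
theirs.send(b"hi")
loop.run()
ours.close()
theirs.close()
print(events)  # ['pre', b'hi', 'post']
```

The pre-block hook is the interesting part for gevent: anything appended there runs on every iteration before the loop blocks.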
  32. always call pre_block_watchers: a hook to integrate other event mechanisms into the loop.
      “Hey loop, if there are coroutines ready to run, run them. Then, block for a write on...”
  33. libev for an event loop

  34. PUTTING IT TOGETHER

  35. import gevent
      from gevent import monkey; monkey.patch_all()
      def downloader():
          pool = []
          for user in users:
              g = gevent.Greenlet(download_photos, user)
              g.start()
              pool.append(g)
          gevent.joinall(pool)
  36. for user in users:
          g = gevent.Greenlet(download_photos, user)

  37. g = gevent.Greenlet(download_photos, user)
      class Greenlet(greenlet):
          def __init__(self, run=None, ...):
              greenlet.__init__(self, None, get_hub())
      # so g.parent = Hub
  38. class Greenlet(greenlet):
          greenlet.__init__(self, None, get_hub())   # g.parent = Hub
      class Hub(greenlet):
          def __init__(self):
              greenlet.__init__(self)
              self.loop = ...
  39. Greenlet(): a greenlet to run download_photos(); its .parent is the event loop, i.e. the Hub
  40. for user in users:
          g = gevent.Greenlet(download_photos, user)
          g.start()

  41. g.start() -> self.parent.loop.run_callback(self.switch): g.switch becomes a Hub pre_block_watcher

  42. loop.run():
          while True:
              call all pre_block_watchers   # = g.switch
              block for I/O
              ...
  43. .start(): “Hey loop, this coroutine is ready to run. Run it before you block...”
  44. for user in users:
          g = gevent.Greenlet(download_photos, user)
          g.start()
          pool.append(g)
      gevent.joinall(pool)

  45. gevent.joinall() -> g.join():
          result = self.parent.switch()   # switch to the Hub
      class Hub(greenlet):
          def run(self):
              while True:
                  self.loop.run()
  46. .join() runs the loop

  47. .join() = loop.run():
          while True:
              ...
              call pre_block_watchers   # = g.switch -> download_photos()
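Slides 45 to 47 say that join() does not run anything itself: it switches to the Hub, whose run() drives the loop, and the loop's pre-block callbacks switch into the ready greenlets. Below is a toy model of that control flow, assuming generators in place of greenlets; a finished generator simply hands control back to the Hub, mirroring greenlet's rule that a finished greenlet returns to its parent. Hub, Task, and joinall here are hypothetical names, not gevent's implementation, and unlike gevent's Hub (whose loop runs forever) this one stops once an until() condition holds.

```python
from collections import deque

class Hub:
    """Hypothetical generator-based stand-in for gevent's Hub (not gevent code)."""

    def __init__(self):
        self.callbacks = deque()   # plays the role of pre_block_watchers

    def run_callback(self, fn):
        self.callbacks.append(fn)

    def run(self, until):
        # The Hub's loop: keep running scheduled switches until until() holds.
        while not until():
            self.callbacks.popleft()()

class Task:
    """Hypothetical greenlet stand-in; its implicit parent is the Hub."""

    def __init__(self, hub, fn, *args):
        self.hub, self.gen, self.dead = hub, fn(*args), False

    def start(self):
        self.hub.run_callback(self.switch)       # "run me before you block"

    def switch(self):
        try:
            next(self.gen)                       # run until the task yields...
            self.hub.run_callback(self.switch)   # ...then reschedule it
        except StopIteration:
            self.dead = True                     # finished: control is back in the Hub

def joinall(hub, tasks):
    # Like g.join(): hand control to the Hub until every task has finished.
    hub.run(until=lambda: all(t.dead for t in tasks))

def download_photos(user, log):
    log.append(f"start {user}")
    yield                                        # pretend to wait on the network
    log.append(f"done {user}")

hub, log = Hub(), []
pool = [Task(hub, download_photos, u, log) for u in ("user1", "user2")]
for g in pool:
    g.start()
joinall(hub, pool)
print(log)  # ['start user1', 'start user2', 'done user1', 'done user2']
```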
  48. HUB

  49. loop.run() -> g.switch() -> download_photos() -> network I/O

  50. import gevent
      from gevent import monkey; monkey.patch_all()
      def downloader(): ...

  51. import socket -> gevent.socket

  52. create:
          fd = make_nonblocking(socket_fd)
      send:
          loop.io_watch(fd, write, callback_fn)   # callback_fn = g.switch
          Hub.switch()                            # the Hub runs loop.run()
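Slides 51 to 57 describe what monkey-patching buys: a patched socket is non-blocking, and when an operation would block, the greenlet registers an io watcher whose callback switches back to it, then switches to the Hub. The sketch below models this with stdlib pieces only: generators stand in for greenlets, selectors for libev, a hypothetical green_recv() for gevent's patched recv(), and threading.Timer threads for the remote server. It watches for reads rather than writes, and every name in it is illustrative.

```python
import selectors
import socket
import threading

class Hub:
    """Hypothetical toy Hub: a task is a generator that yields the socket it waits on."""

    def __init__(self):
        self.ready = []                             # tasks runnable before we block
        self.selector = selectors.DefaultSelector()

    def spawn(self, task):
        self.ready.append(task)

    def _step(self, task):
        try:
            sock = next(task)                       # run the task until it must wait
        except StopIteration:
            return                                  # task finished
        # Register an io watcher; the task resumes once the socket is readable.
        self.selector.register(sock, selectors.EVENT_READ, task)

    def run(self):
        while self.ready or self.selector.get_map():
            ready, self.ready = self.ready, []
            for task in ready:                      # run everything that is runnable
                self._step(task)
            if self.selector.get_map():
                for key, _ in self.selector.select():   # block for I/O
                    self.selector.unregister(key.fileobj)
                    self.ready.append(key.data)     # the task is runnable again

def green_recv(sock, n):
    """Roughly what a patched recv() does: try, else yield to the Hub and retry."""
    while True:
        try:
            return sock.recv(n)
        except BlockingIOError:
            yield sock

def download_photos(name, sock, log):
    log.append(f"{name} sent request")
    data = yield from green_recv(sock, 64)
    log.append(f"{name} got {data.decode()}")

log, hub, peers = [], Hub(), []
for name, reply, delay in [("g1", b"slow", 0.2), ("g2", b"fast", 0.1)]:
    ours, theirs = socket.socketpair()
    ours.setblocking(False)                         # like make_nonblocking(socket_fd)
    peers += [ours, theirs]
    # A timer thread plays the remote server, replying after `delay` seconds.
    threading.Timer(delay, theirs.send, args=(reply,)).start()
    hub.spawn(download_photos(name, ours, log))
hub.run()
for s in peers:
    s.close()
print(log)
```

Because g2's reply arrives first, g2 finishes before g1 even though g1 sent its request first, and the process never blocks on a single socket while other work is pending.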
  53. network I/O

  54. for user in users:
          g = gevent.Greenlet(download_photos, user)
          g.start()
      # pre_block_watchers = [g1.switch, g2.switch]
  55. gevent.joinall() -> Hub: loop.run() -> call pre_block_watchers = [g1.switch, ...] -> g1.switch()
      g1: download_photos(user1) -> network_request -> io_watchers = [g1.switch] -> Hub.switch()
  56. Hub: loop.run() -> call pre_block_watchers = [g2.switch] -> g2.switch()
      g2: download_photos(user2) -> network_request -> io_watchers = [g2.switch, g1.switch] -> Hub.switch()
  57. Hub: block for I/O -> call pending io_watchers = [g1.switch] -> g1.switch()
      g1: download_photos(user1) resumes
      Hub: loop.run() -> call pre_block_watchers = [] ...
  58. WRAP-UP

  59. minuses
      no parallelism
      non-cooperative code will block the entire process:
        C-extensions -> use pure Python libraries
        compute-bound greenlets -> use gevent.sleep(0)
                                -> use greenlet blocking detection
      monkey-patch may have confusing implications:
        order of imports matters
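The advice on compute-bound greenlets can be seen in a toy cooperative scheduler: a task that never yields starves everyone else. In this sketch generators stand in for greenlets and a bare `yield` stands in for gevent.sleep(0); run, crunch, heartbeat, and heartbeat_position are hypothetical names.

```python
from collections import deque

def run(tasks):
    # Round-robin cooperative scheduler: a task runs until it yields.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        ready.append(task)

def crunch(log, yield_every):
    # Compute-bound task; each log entry stands in for a slice of CPU work.
    for i in range(1, 7):
        log.append(f"crunch {i}")
        if i % yield_every == 0:
            yield  # the generator analogue of calling gevent.sleep(0)

def heartbeat(log):
    # Another task that just wants a turn.
    log.append("heartbeat")
    yield

def heartbeat_position(yield_every):
    log = []
    run([crunch(log, yield_every), heartbeat(log)])
    return log.index("heartbeat")

print(heartbeat_position(6))  # 6: the heartbeat waits for all six slices
print(heartbeat_position(2))  # 2: it gets its turn after the first two slices
```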
  60. …but excellent for workloads that are:
      I/O-bound, highly concurrent -> 20-30k concurrent connections!
      Used at “web scale” at: Pinterest, Facebook, Mixpanel, PayPal, Disqus, Nylas…
  61. greenlet libev Hub monkeypatching

  62. KAVYA @KAVYA719

  63. greenlet libev Hub monkeypatching