A tale of concurrency through creativity in Python: a deep dive into how gevent works

kavya
May 30, 2016

gevent is an open source Python library for asynchronous I/O. It provides a powerful construct to build concurrent applications; think threads, except lightweight and cooperatively scheduled. We will delve into how gevent is architected from its building blocks — sophisticated coroutines, an event loop, and a dash of creativity to neatly integrate them.

Abstract:

Asynchronous frameworks like gevent make it possible to write highly concurrent and performant applications in CPython. Unlike traditional concurrency constructs such as threads and processes, which tend to be heavyweight, gevent's "Greenlets" are inexpensive to spawn, making them ideal for applications like servers that need to handle tens of thousands of concurrent connections.

So how does gevent provide an execution unit that is both ideal for concurrency and lightweight?

At its heart, gevent runs on sophisticated coroutines (greenlets) and an event loop (the libev event loop). It neatly integrates greenlets into the libev event loop and builds additional mechanisms to schedule execution switches as well.

In this talk, we will dive into how gevent works. We will look into how greenlets and the libev event loop work, and what gevent uses them for. We will then delve into how gevent integrates them and its additional mechanisms to provide a concurrency model that, like threads, is transparent to the application.

Transcript

  1. A TALE OF CONCURRENCY
    THROUGH CREATIVITY IN PYTHON:
    A DEEP DIVE INTO HOW GEVENT WORKS

  2. KAVYA

  3. GEVENT

  4. What is asynchronous I/O?
    What is gevent?

  5. download_photos
    network

  6. def download_photos(user):
        # Open a connection to the server
        conn = get_authenticated_connection(user)
        # Download all photos
        photos = get_photos(conn)
        # Save for later display
        save_photos(user, photos)

  7. def downloader():
        users = get_users()
        for user in users:
            download_photos(user)  # network I/O

  8. import
    multiprocessing
    threading
    twisted
    green_thread ?

  9. import multiprocessing

  10. import multiprocessing as mp
    def downloader():
        pool = []
        for user in users:
            # Process takes the callable as target= and its arguments as args=
            p = mp.Process(target=download_photos, args=(user,))
            pool.append(p)
            p.start()
        for p in pool:
            p.join()

  11. import threading

  12. import threading
    def downloader():
        pool = []
        for user in users:
            # Thread takes the callable as target= and its arguments as args=
            t = threading.Thread(target=download_photos, args=(user,))
            pool.append(t)
            t.start()
        for t in pool:
            t.join()
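
For reference, a runnable variant of the threaded downloader might look like the sketch below. It is only an illustration: `download_photos` is a stand-in that sleeps instead of touching the network, and the user names, the `results` dict and the lock are assumptions that do not appear in the deck.

```python
import threading
import time

results = {}
results_lock = threading.Lock()

def download_photos(user):
    # Hypothetical stand-in for the real download: sleep to simulate network I/O.
    time.sleep(0.05)
    with results_lock:
        results[user] = ["%s-photo-%d" % (user, i) for i in range(3)]

def downloader(users):
    pool = []
    for user in users:
        # One OS thread per user; each one blocks independently on its "I/O".
        t = threading.Thread(target=download_photos, args=(user,))
        pool.append(t)
        t.start()
    for t in pool:
        t.join()

users = ["ann", "bob", "cat"]
downloader(users)
print(sorted(results))
```

Each user costs one OS thread here, which is the overhead the green-thread approach later in the deck tries to avoid.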

  13. import twisted

  14. import twisted
    def download_photos():
        # Modify this to add callbacks
        ...
    def downloader():
        # Something something loop.run()
        ...

  15. green threads
    user space: the OS does not create or manage them
    cooperatively scheduled: the OS does not schedule or preempt them
    lightweight
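
The three properties above can be illustrated without gevent at all. The sketch below uses plain Python generators as stand-ins for green threads and a hypothetical `run_all` round-robin scheduler; none of these names come from gevent, and real greenlets do not need an explicit `yield`.

```python
from collections import deque

def worker(name, steps, log):
    # A generator stands in for a green thread: each `yield` is a point
    # where the task voluntarily hands control back to the scheduler.
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield

def run_all(tasks):
    # Round-robin scheduler living entirely in user space; the OS never
    # sees these tasks and never preempts them.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue            # task finished; drop it
        ready.append(task)      # not finished; queue it for another turn

log = []
run_all([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

A task that never yields would starve every other task, which is the flip side of cooperative scheduling.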

  16. import gevent

  17. import gevent
    from gevent import monkey; monkey.patch_all()
    def downloader():
        pool = []
        for user in users:
            g = gevent.Greenlet(download_photos, user)
            g.start()
            pool.append(g)
        gevent.joinall(pool)

  18. THE BUILDING BLOCKS
    PUTTING IT TOGETHER
    WRAP-UP/ Q&A

  19. THE BUILDING BLOCKS

  20. g = gevent.Greenlet(download_photos, user)
    ?
    from greenlet import greenlet
    ...
    class Greenlet(greenlet):
        """
        A light-weight cooperatively-scheduled
        execution unit.
        """
        ...

  21. from greenlet import greenlet
    def print_red():
        print('red')
        gr2.switch()
        print('red done!')
    def print_blue():
        print('blue')
        gr1.switch()
        print('blue done!')
    gr1 = greenlet(print_red)
    gr2 = greenlet(print_blue)
    gr1.switch()
    # Output:
    # red
    # blue
    # red done!

  22. .switch()
    pause current + yield control flow
    resume next.switch()
    coroutine
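
The pause-and-resume behaviour of `.switch()` can be approximated with standard-library generators. This is only an analogy under stated assumptions: generators cannot switch to each other directly, so the hypothetical `run` trampoline below resumes whichever coroutine the current one names, and it plays the role of the parent that regains control when a coroutine finishes.

```python
def print_red(out):
    out.append("red")
    yield "blue"              # analogous to gr2.switch()
    out.append("red done!")

def print_blue(out):
    out.append("blue")
    yield "red"               # analogous to gr1.switch()
    out.append("blue done!")

def run(coroutines, first):
    # Trampoline: resume the named coroutine; whatever name it yields is
    # the next one to resume. When a coroutine finishes, control comes
    # back here (the "parent") and the run ends.
    current = first
    while True:
        try:
            current = next(coroutines[current])
        except StopIteration:
            return

out = []
coroutines = {"red": print_red(out), "blue": print_blue(out)}
run(coroutines, "red")
print("\n".join(out))
```

As in the greenlet version, "blue done!" is never recorded, because nothing switches back to the blue coroutine after red finishes.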

  23. gr1 = greenlet(run_fn)
    {
        run_fn
        parent
    }

  24. [Diagram: the C STACK across gr1.switch(), gr2.switch(), gr1.switch(), marked at stack pointers SP1, SP2 and SP3; greenlet records shown as { base = SP1 }, { base = SP1, start = SP2 } and { base = SP2 }]

  25. [Diagram: the C STACK and the HEAP, with a greenlet's start, SP3 and SP4 marked]

  26. greenlets
    for
    coroutines
    via
    assembly-based stack-slicing

  27. import gevent
    from gevent import monkey; monkey.patch_all()
    def downloader():
        pool = []
        for user in users:
            g = gevent.Greenlet(download_photos, user)
            g.start()
            pool.append(g)
        gevent.joinall(pool)

  28. g.start()
    def start(self):
        """ Schedule the greenlet to run in this
        loop iteration. """
        if self._start_event is None:
            self._start_event = \
                ...loop.run_callback(self.switch)

  29. libev
    API to register event_handler callbacks
    watches for events
    calls registered callbacks

  30. “Hey loop,
    Wait for a write on this socket and
    call parse_recv() when that happens.”

  31. fd = make_nonblocking(socket_fd)
    loop.io_watch(fd, write, callback_fn)
    loop.run()
    while True:
        call all pre_block_watchers
        block for I/O
        call all post_block_watchers
        call pending io_watchers
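
The register-then-run pattern above can be tried out with Python's standard `selectors` module instead of libev. The sketch is an assumption-laden toy: the `Loop` class and its `io_watch` and `stop_watching` methods are made-up names, it watches for readability rather than for a write, and it has no pre-block or post-block watchers.

```python
import selectors
import socket

class Loop:
    """Toy event loop: register callbacks for socket events, then run."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def io_watch(self, sock, events, callback):
        # Remember which callback to call when `sock` is ready.
        self.selector.register(sock, events, callback)

    def stop_watching(self, sock):
        self.selector.unregister(sock)

    def run(self):
        # Keep going while anything is still being watched.
        while self.selector.get_map():
            for key, _ in self.selector.select():    # block for I/O
                key.data(key.fileobj)                 # call the pending watcher

received = []
loop = Loop()
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

def parse_recv(sock):
    received.append(sock.recv(1024))
    loop.stop_watching(sock)      # nothing left to watch, so run() returns

loop.io_watch(b, selectors.EVENT_READ, parse_recv)
a.sendall(b"hello")
loop.run()

loop.selector.close()
a.close()
b.close()
print(received)
```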

  32. always call pre_block_watchers
    Hook to integrate other event mechanisms
    into the loop.
    “Hey loop,
    If there are coroutines ready to run,
    run them. Then, block for a write on...”
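
A minimal sketch of that hook, again using the standard `selectors` module rather than libev: the hypothetical `Loop.run_callback` queues work, and `run_once` drains the queue before it blocks. The class, the method names and the `order` list are illustrative assumptions, and the zero timeout exists only so that the example returns immediately.

```python
import selectors
from collections import deque

class Loop:
    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.pre_block = deque()    # callbacks to run before blocking
        self.order = []             # records what happened, for the demo

    def run_callback(self, fn):
        self.pre_block.append(fn)

    def run_once(self, timeout=0):
        # 1. Run everything that is ready to run.
        while self.pre_block:
            self.pre_block.popleft()()
        # 2. Only then block for I/O (nothing is registered in this demo).
        self.order.append("block")
        self.selector.select(timeout)

loop = Loop()
loop.run_callback(lambda: loop.order.append("coroutine 1"))
loop.run_callback(lambda: loop.order.append("coroutine 2"))
loop.run_once()
loop.selector.close()
print(loop.order)
```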

  33. libev
    for an
    event loop

  34. PUTTING IT TOGETHER

  35. import gevent
    from gevent import monkey; monkey.patch_all()
    def downloader():
        pool = []
        for user in users:
            g = gevent.Greenlet(download_photos, user)
            g.start()
            pool.append(g)
        gevent.joinall(pool)

  36. for user in users:
        g = gevent.Greenlet(download_photos, user)

  37. g = gevent.Greenlet(download_photos, user)
    class Greenlet(greenlet):
        def __init__(self, run=None, ...):
            greenlet.__init__(self, None, get_hub())
    g.parent = Hub

  38. class Greenlet(greenlet):
        ...
            greenlet.__init__(self, None, get_hub())
    g.parent = Hub
    class Hub(greenlet):
        def __init__(self):
            greenlet.__init__(self)
            self.loop = ...

  39. Greenlet()
    a greenlet to run download_photos()
    .parent
    the event loop, i.e. the Hub

  40. for user in users:
        g = gevent.Greenlet(download_photos, user)
        g.start()

  41. self.parent.loop.run_callback(self.switch)
    g.start()
    Hub
    pre_block_watcher

  42. loop.run()
    while True:
        call all pre_block_watchers = g.switch
        block for I/O
        ...

  43. .start()
    “Hey loop,
    This coroutine is ready to run.
    Run it before you block...”

  44. for user in users:
        g = gevent.Greenlet(download_photos, user)
        g.start()
        pool.append(g)
    gevent.joinall(pool)

  45. gevent.joinall()
    g.join()
    result = self.parent.switch()
    Hub
    class Hub(greenlet):
        def run(self):
            while True:
                self.loop.run()

  46. .join() runs the loop

  47. .join() = loop.run()
    while True:
        ...
        call pre_block_watchers = g.switch
            download_photos()
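
The start-then-join flow can be imitated in a few lines of plain Python. Everything below is a toy under stated assumptions: `Hub`, `ToyGreenlet` and `run_until` are made-up names, generators replace real greenlets, and no actual I/O takes place. The point is only that `start()` schedules work while `join()` is what makes the loop run it.

```python
from collections import deque

class Hub:
    """Toy stand-in for gevent's Hub: owns the loop's callback queue."""

    def __init__(self):
        self.callbacks = deque()

    def run_callback(self, fn):
        self.callbacks.append(fn)

    def run_until(self, done):
        # Stand-in for loop.run(): run ready callbacks until done() is true.
        while not done() and self.callbacks:
            self.callbacks.popleft()()

class ToyGreenlet:
    def __init__(self, hub, fn, *args):
        self.hub = hub              # plays the role of g.parent
        self.gen = fn(*args)        # a generator is the pausable body
        self.dead = False

    def start(self):
        # "Hey loop, this coroutine is ready to run."
        self.hub.run_callback(self.switch)

    def switch(self):
        try:
            next(self.gen)
        except StopIteration:
            self.dead = True
        else:
            self.hub.run_callback(self.switch)   # paused: reschedule

    def join(self):
        # Joining hands control to the hub, which runs the loop.
        self.hub.run_until(lambda: self.dead)

log = []

def download_photos(user):
    log.append("start " + user)
    yield                           # pretend to wait on the network
    log.append("done " + user)

hub = Hub()
pool = [ToyGreenlet(hub, download_photos, u) for u in ("ann", "bob")]
for g in pool:
    g.start()
log.append("nothing has run yet")
for g in pool:
    g.join()
print(log)
```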

  48. HUB

  49. loop.run() g.switch()
    download_photos()
    network I/O

  50. import gevent
    from gevent import monkey; monkey.patch_all()
    def downloader():
    ...

  51. socket
    gevent.socket
    import

  52. fd = make_nonblocking(socket_fd)
    loop.io_watch(fd, write, callback_fn)
    loop.run()
    g.switch
    Hub.switch
    create:
    send:

  53. network I/O

  54. for user in users:
        g = gevent.Greenlet(download_photos, user)
        g.start()
    pre_block_watchers = [g1.switch, g2.switch]

  55. gevent.joinall()
    g1.switch()
    loop.run()
    call pre_block_watchers = [g1.switch, ...]
    Hub
    download_photos(user1)
    network_request
    g1
    io_watchers = [g1.switch]
    Hub.switch()

  56. g2.switch()
    loop.run()
    call pre_block_watchers = [g2.switch]
    Hub
    download_photos(user2)
    network_request
    g2
    io_watchers = [g2.switch,
    g1.switch]
    Hub.switch()

  57. block for I/O
    call pending io_watchers = [g1.switch]
    Hub
    resumes download_photos(user1)
    g1
    g1.switch()
    loop.run()
    call pre_block_watchers = []
    ...
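
The whole walkthrough (run the ready coroutines, let each one register an io watcher and return to the hub, block, then resume whichever one has data) can be approximated with the standard library. This is a sketch under loud assumptions: generators stand in for greenlets, `selectors` for libev, `socket.socketpair()` for server connections, and `switch`, `hub_run` and `ready` are invented names rather than gevent APIs.

```python
import selectors
import socket
from collections import deque

selector = selectors.DefaultSelector()
ready = deque()        # plays the role of pre_block_watchers
log = []

def switch(task):
    # Resume `task` until it next needs the network or finishes.
    try:
        sock = next(task)           # the task yields the socket it waits on
    except StopIteration:
        return
    # Like a cooperative socket: register an io watcher, return to the hub.
    selector.register(sock, selectors.EVENT_READ, task)

def hub_run():
    while ready or selector.get_map():
        while ready:                                # pre-block callbacks
            switch(ready.popleft())
        if selector.get_map():
            for key, _ in selector.select():        # block for I/O
                selector.unregister(key.fileobj)
                switch(key.data)                    # pending io watcher

def download_photos(user, sock):
    log.append("request " + user)
    yield sock                                      # wait for the reply
    log.append("%s got %s" % (user, sock.recv(1024).decode()))

# Two socket pairs stand in for two server connections.
server1, client1 = socket.socketpair()
server2, client2 = socket.socketpair()
ready.append(download_photos("user1", client1))
ready.append(download_photos("user2", client2))

# Both replies are already buffered by the time the hub blocks.
server1.sendall(b"photos1")
server2.sendall(b"photos2")
hub_run()

for s in (server1, client1, server2, client2):
    s.close()
selector.close()
print(log)
```

Both requests are logged before either reply is handled, which is the interleaving the slides describe; the order of the two replies depends on the selector and is not guaranteed.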

  58. WRAP-UP

  59. minuses
    no parallelism
    non-cooperative code will block the entire process:
        C-extensions -> use pure Python libraries
        compute-bound greenlets -> use gevent.sleep(0)
        -> use greenlet blocking detection
    monkey-patch may have confusing implications
        order of imports matters
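
Why import order matters can be shown with a stand-in module. The sketch below assumes a hypothetical module object `netlib` in place of a real standard-library module, and `cooperative_recv` in place of a patched implementation; it does not use gevent's `monkey` module at all.

```python
import types

# A hypothetical stand-in for a standard-library module such as `socket`.
netlib = types.ModuleType("netlib")
netlib.recv = lambda: "blocking recv"

def cooperative_recv():
    return "cooperative recv"

# Code that did `from netlib import recv` BEFORE patching keeps its own
# reference to the original function.
early_recv = netlib.recv

# Monkey-patching rebinds the attribute on the module object.
netlib.recv = cooperative_recv

# Code that looks the name up through the module at call time sees the patch.
def late_caller():
    return netlib.recv()

print(early_recv())     # still the original function
print(late_caller())    # the patched version
```

Patching before anything else imports the module avoids leaving stale references like `early_recv` behind.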

  60. …but
    excellent for workloads that are:
        I/O bound, highly concurrent -> 20-30k concurrent connections!
    Used at “web scale” at:
        Pinterest, Facebook, Mixpanel, PayPal, Disqus, Nylas…

  61. greenlet
    libev
    Hub
    monkeypatching

  62. KAVYA
    @KAVYA719

  63. greenlet
    libev
    Hub
    monkeypatching
