
Modern Python Concurrency

What is concurrency? Is Python good at it? How do we scale from single-node concurrency to multi-node? What does Python's new concurrent.futures stdlib module give us? What does asyncio give us? How do we evaluate concurrent performance?

Andrew Montalenti

September 03, 2014

Transcript

  1. Modern Python concurrency
    toward futures and asyncio
    Andrew Montalenti
    Doug Turnbull
    Python Charlottesville Meetup
    September 2, 2014


  2. @softwaredoug
    @amontalenti


  3. Doug’s part


  4. What’s Concurrency?
    ● Our system’s ability to do more than one thing at once
    i.e., simultaneous:
    ○ web requests
    ○ database transactions
    ○ requests to drives
    ○ requests to databases
    ○ requests to web services
    ○ user input


  5. ● run your app with less
    ● simplify
    ● save green: $
    From 100 web requests per second per server...
    ...to 10,000 web requests per second per server.
    Getting it right is a Big Deal(R)


  6. Concurrency -- How?
    ● Nature of the work?
    ● My OS just does this, right?
    ● Is Python good at this?


  7. Nature of the work?
    CPU-bound -- “calculating digits of pi”
    IO-bound -- “take request, query db, send response”


  8. What stops us from doing work?
    ● contention -- fighting over a resource (like a lock)
    ● blocking -- stopping execution to wait (stops me, lets everyone else go)


  9. My OS just does this, right?
    ● processes: sandboxed memory space
    ● threads: run within a process, share memory with other threads
    ● Do these get me where I need to be?


  10. Is Python good at this?
    ● It’s half good
    GIL!!!! Who is GIL? The Global Interpreter Lock:
    ● Only one thread runs in the Python interpreter at once
    ● Threads tend to keep the GIL until they finish or block on IO
    But…
    ● This slows us down. Contention!
    ● To teh codez!


  11. Python CPU Bound Threads
    from queue import Queue
    from threading import Thread

    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            l = inQ.get()        # get work to do
            sumL = sum(l)        # CPU-bound work
            outQ.put(sumL)       # work output

    numWorkers = 10
    ts = [Thread(target=worker)  # create threads
          for i in range(numWorkers)]
    for t in ts:
        t.start()
    # main thread carries on


  12. Python IO Worker Threads
    import requests
    ...
    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            url = inQ.get()              # get work to do
            resp = requests.get(url)     # blocking IO
            outQ.put((url, resp.status_code, resp.text))  # work output

    numWorkers = 10
    ts = [Thread(target=worker)
          for i in range(numWorkers)]
    ...


  13. CPU Bound Threads… we like?
    from queue import Queue
    from threading import Thread

    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    numWorkers = 10
    ts = [Thread(target=worker)
          for i in range(numWorkers)]
    ...

    ☹ CPU-bound work doesn’t release the GIL, so...
    ☹ … no gain from more threads/cores
    ☹ … only more contention
    ☺ Code straightforward
    ☹ Contention points?


  14. IO Worker Threads… we like?
    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            url = inQ.get()
            resp = requests.get(url)
            outQ.put((url, resp.status_code, resp.text))

    numWorkers = 10
    ts = [Thread(target=worker)
          for i in range(numWorkers)]
    ...

    ☺ Gives up the GIL on blocking IO
    ☹ One blocking IO operation per thread, so...
    ☹ … how many threads to start? a pool?
    ☹ Contention points?
    ☺ Code straightforward


  15. Improve upon CPU bound?
    Using Processes (and an interprocess Queue) -- same worker code:

    from multiprocessing import Process, Queue

    def worker(inQ, outQ):
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    inQ = Queue()
    outQ = Queue()
    p = Process(target=worker, args=(inQ, outQ))
    p.start()
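
    A possible driver for the sketch above (the input lists here are
    hypothetical, just to show the round trip): feed work into inQ and
    read the sums back from outQ.

    for work in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
        inQ.put(work)           # hand work to the worker process

    for _ in range(3):
        print(outQ.get())       # blocks until a result arrives: 6, 15, 24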


  16. CPU Bound Processes… we like?
    def worker(inQ, outQ):
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    numWorkers = 10
    ps = [Process(target=worker, args=(inQ, outQ))
          for i in range(numWorkers)]
    ...

    ☺ Not sharing the GIL
    ☺ No GIL: more processes means concurrent work
    ☺ Max out all your cores!
    ☺ Code straightforward
    ☹ Contention?
    ☺ (less scary -- process abstractions)
    We LIKE!


  17. Processes Rule, Threads Drool

                       Threads                           Processes
    Light?             Yes ☺                             Almost as light ☺
    Danger?            High -- mutable, shared state;    Lower -- stricter
                       deadlocks ☹                       communication ☺
    Communication      Mutexes/locks, atomic CPU         OS abstractions: pipes,
    primitives?        instructions, thread-safe         sockets, shared memory,
                       data structures ☺                 etc. ☺
    If they crash...   whole program crashes ☹           only that process crashes ☺
    Control?           Extremely high ☺                  Moderate, through abstractions
    GIL?               Yes ☹                             No ☺


  18. Processes Rule, Threads Drool
    ● Processes: safer choice, sandboxed, coarse-grain control
    ● Threads: dangerous choice, very fine-grain control/interaction


  19. Improve on IO bound work?
    IO-bound work often looks like:

    def worker():
        while True:
            handle_ui_input()   # blocking
            handle_io()         # blocking

    def worker1():
        while True:
            resp = handle_io1()
            # contention -- need to lock and update
            update_shared_state(resp)

    def worker2():
        while True:
            resp = handle_io2()
            update_shared_state(resp)


  20. Other OS primitives help?
    def event_loop():
        while True:
            # simultaneously block on multiple IO operations
            whichIsReady = select(ui, io1, io2)
            if whichIsReady == io1:
                resp = handle_io1()
            if whichIsReady == ui:
                ...

    No longer need to lock shared state (it’s all in one thread)
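
    To make that concrete, here is a minimal runnable sketch of the same
    pattern using the stdlib’s select.select() over real sockets -- an echo
    server (host and port are illustrative, not from the slides):

    import select
    import socket

    server = socket.socket()
    server.bind(('localhost', 8000))
    server.listen(5)
    sockets = [server]

    while True:
        # block until at least one socket is ready to read
        readable, _, _ = select.select(sockets, [], [])
        for s in readable:
            if s is server:
                conn, addr = s.accept()    # new client connection
                sockets.append(conn)
            else:
                data = s.recv(4096)
                if data:
                    s.sendall(data)        # echo it back
                else:
                    sockets.remove(s)      # client hung up
                    s.close()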


  21. Non-blocking IO… we like?
    def event_loop():
        schedIo = [...]
        while True:
            whichIsReady = select(schedIo)
            whichIsReady.callback()

    httpReq = HttpReq('http://odu.edu')
    def callWhenDone():
        print(httpReq.data)
    httpReq.fetch(callback=callWhenDone)
    ...

    ☹ harder to read (Javascript, anyone?)
    ☹ extra “IO” scheduler/event loop
    ☺ concurrent IO not bound to the number of threads


  22. Non-blocking IO improvements
    Promises/Deferreds/Futures: a handle to future work.

    httpReq = HttpReq('http://odu.edu')

    def callWhenDone():
        print("Fetched and stored imgs!")

    def storeInDb(httpPromise):
        dbPromise = db.store(httpPromise)
        return dbPromise

    promise = httpReq.fetch()                  # handle to pending IO
    imgPromise = parseImgTags(promise)         # chainable with other operations
    dbPromise = storeInDb(imgPromise)
    dbPromise.onDone(callback=callWhenDone)    # can still use callbacks


  23. Non-blocking IO improvements
    Coroutines/cooperative multitasking: “I own this thread until I say I’m done.”

    httpReq = HttpReq('http://odu.edu')

    def storeInDb(httpPromise):
        dbPromise = db.store(httpPromise)
        return dbPromise

    promise = httpReq.fetch()
    imgPromise = parseImgTags(promise)
    dbPromise = storeInDb(imgPromise)
    # looks like a blocking call, but in reality
    # yields back to the event loop
    myCoroutine.yieldUntil(dbPromise)
    print("Fetched and stored IMG tags!")

    ☺ Readability of blocking IO
    ☺ Performance of non-blocking async IO


  24. Example: Greenlet/Gevent
    Greenlet:
    ● coroutine library; the greenlet decides when to give up control
    Gevent:
    ● monkey-patches a big event loop into Python, replacing the core blocking IO bits
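
    The monkey-patching step looks roughly like this (a sketch; patch_all()
    should run before anything else imports the blocking modules):

    from gevent import monkey
    monkey.patch_all()   # swaps sockets, time.sleep, etc. for gevent-aware versions

    import requests      # now does cooperative, non-blocking IO under the hood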


  25. Gevent is magic
    class Fetcher(object):
        def __init__(self, fetchUrl):
            self.fetchUrl = fetchUrl
            self.resp = None

        def fetch(self):
            # blocking? depends on your definition of “blocking”
            self.resp = requests.get(self.fetchUrl)

    def fetchMultiple(urls):
        fetchers = [Fetcher(url) for url in urls]
        handles = []
        for fetcher in fetchers:
            # spawn a gevent worker that calls fetch
            handles.append(gevent.spawn(fetcher.fetch))
        gevent.joinall(handles)   # wait till all done


  26. Other Solutions (aka future lightning talk fodder)
    Twisted
    CPython C modules (scipy, your module)
    Cython (compiles Python -> C)
    Jython/IronPython (JIT to JVM or .NET CLI)
    GPUs (CUDA, etc.)
    cluster frameworks (discussed later)


  27. Andrew’s part


  28. Python GIL
    Why it’s there
    What it messes up, concurrency-wise
    Failed efforts to remove GIL
    pypy and pypy-stm


  29. Python concurrency menu

                  CPU-bound work        IO-bound work
    single-node   multiprocessing       Twisted (Network)
                  ProcessPoolExecutor   Tornado (HTTP)
                                        ThreadPoolExecutor (Generic)
                                        gevent (Sockets)
                                        asyncore (Sockets)
                                        asyncio (Generic)
    multi-node    rq                    scrapyd? (HTTP)
                  celery
                  IPython.parallel


  30. I love processes.
    (you should, too)


  31. Options for multi-node concurrency
    start here:
    ● Redis as message broker
    ● Pure Python tasks
    ● Pure Python worker infrastructure
    ● Simple message patterns
    upgrade here:
    ● Choose your own message broker
    ● Pure Python tasks
    ● Pure Python worker infrastructure
    ● Advanced message patterns
    and beyond:
    ● Choose your own message broker
    ● Mix Python + Java or other languages
    ● Java worker infrastructure
    ● Advanced message patterns
    ● More complex operationally
    ● High availability & linear scalability
    ● “Lambda Architecture”


  32. Python concurrency in clusters
    ● mrjob: Hadoop Streaming (batch)
    ● streamparse: Apache Storm (real-time)
    ● parallelize through Python process model
    ● mixed workloads
    ○ CPU- and IO-bound
    ● mixed concurrency models are possible
    ○ threads within Storm Bolts
    ○ process pools within Hadoop Tasks


  33. My instinct:
    threads = bugs


  34. (image-only slide)

  35. But, sometimes necessary
    - BatchingBolt
    - IO-bound Drivers


  36. What is asyncore?
    ● stdlib-included async sockets (like libev)
    ● in the stdlib since 2000!
    Comment from the source code in 2000:
    “There are only two ways to have a program on a single processor do
    ‘more than one thing at a time’. Multi-threaded programming is the
    simplest and most popular way to do it, but there is another very
    different technique, that lets you have nearly all the advantages of
    multi-threading, without actually using multiple threads. it's really
    only practical if your program is largely I/O bound. If your program
    is CPU bound, then pre-emptive scheduled threads are probably what you
    really need. Network servers are rarely CPU-bound, however.”
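
    A minimal asyncore dispatcher, to make that concrete (the host and
    request are illustrative):

    import asyncore
    import socket

    class HttpClient(asyncore.dispatcher):
        def __init__(self, host):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.connect((host, 80))
            self.buffer = ('GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host).encode()

        def writable(self):
            return len(self.buffer) > 0

        def handle_write(self):
            sent = self.send(self.buffer)   # non-blocking write
            self.buffer = self.buffer[sent:]

        def handle_read(self):
            print(self.recv(4096))          # called when data is ready

        def handle_close(self):
            self.close()

    HttpClient('www.python.org')
    asyncore.loop()   # the event loop: select()s over all dispatchers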


  37. Python async networking
    comparison to nginx / Node.JS
    Twisted
    Tornado
    gevent / gthreads
    all using their own “reactor” / “event loop”


  38. What is concurrent.futures?
    ● PEP-3148
    ● new unified API for concurrency in Python
    ● in the stdlib in Python 3.2+
    ● backport for 2.7 (pip install futures)
    ● API design modeled on Java’s Executor framework
    ● a Future abstraction as a return value
    ● an Executor abstraction for running things
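
    A minimal sketch of the Executor/Future API (the URLs are illustrative);
    swapping ProcessPoolExecutor in for ThreadPoolExecutor gives the same
    API for CPU-bound work:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import requests

    urls = ['http://python.org', 'http://pypi.org']

    with ThreadPoolExecutor(max_workers=10) as executor:
        # submit() returns a Future immediately; the work runs in the pool
        futures = {executor.submit(requests.get, url): url for url in urls}
        for future in as_completed(futures):
            resp = future.result()   # blocks until this future is done
            print(futures[future], resp.status_code)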


  39. asyncio history
    ● PEP-3153: Async IO Support
    ● PEP-3156: Async IO Support “Rebooted”
    ● GvR’s pet project from 2012-2014
    ● Original implementation called tulip
    ● Released in Python 3.4 as asyncio
    ● PyCon 2013 keynote by GvR focused on it
    ● PEP-380 (yield from) utilized by it


  40. asyncio primitives
    ● a loop that starts and stops
    ● callback scheduling
    ○ now
    ○ at a time in the future
    ○ repeated / periodic
    ● associate callbacks with file I/O states
    ● offer a pluggable I/O multiplexing mechanism
    ○ select()
    ○ poll(), epoll(), others
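
    A sketch of those scheduling primitives in Python 3.4 syntax:

    import asyncio

    loop = asyncio.get_event_loop()

    def tick():
        print('tick')
        loop.call_later(1.0, tick)    # re-schedules itself: repeated/periodic

    loop.call_soon(print, 'hello')    # schedule "now" (next loop iteration)
    loop.call_later(1.0, tick)        # schedule at a time in the future
    loop.call_later(5.0, loop.stop)   # a loop that starts and stops
    loop.run_forever()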


  41. asyncio “ooooh ahhhh” moments
    ● introduces the @coroutine decorator
    ● uses yield from to simplify callback hell
    ● one event loop to rule them all
    ○ Twisted and Tornado and gevent in the same app!
    ● offers an asyncio.Future
    ○ asyncio.Future quacks like futures.Future
    ○ asyncio.wrap_future is an adapter
    ○ asyncio.Task is a subclass of Future
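
    A minimal @coroutine / yield from sketch in Python 3.4 syntax:

    import asyncio

    @asyncio.coroutine
    def work(delay):
        # reads like blocking code, but each "yield from" suspends
        # this coroutine and hands control back to the event loop
        yield from asyncio.sleep(delay)
        return 'slept %.1fs' % delay

    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(work(0.5)))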


  42. @dabeaz on generators


  43. asyncio coroutines

    code                            explanation
    result = yield from future      suspend until the future is done, then return its result
    result = yield from coroutine   suspend until the coroutine returns a result
    return expression               return a result to another coroutine
    raise exception                 raise an exception in another coroutine
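
    For example, the first row in action (a sketch): the coroutine suspends
    on a Future that an ordinary callback completes later.

    import asyncio

    @asyncio.coroutine
    def waiter(future):
        result = yield from future   # suspend until future done, then return result
        print('got:', result)

    loop = asyncio.get_event_loop()
    future = asyncio.Future()
    loop.call_later(1.0, future.set_result, 42)   # a plain callback fulfills it
    loop.run_until_complete(waiter(future))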


  44. @dabeaz on Future and Task unity
    asyncio (an event loop with a scheduler)
    ~ threading (a thread pool with an executor)


  45. What is the gain of “yield from”? (1)
    “So, why don’t I like green threads? In a simple program using stackless
    or gevent, it’s easy enough to say, ‘This is a call that goes to the
    scheduler -- it uses read() or send() or something. I know that’s a
    blocking call, I’ll be careful…. I don’t need explicit locking because
    between points A or B, I just need to make sure I don’t make any other
    calls to the scheduler.’ However, as code gets longer, it becomes hard
    to keep track. Sooner or later…”
    - Guido van Rossum


  46. What is the gain of “yield from”? (2)
    Just trust me; this problem happened to
    me at a young and impressionable age,
    and I’ve never forgotten it.
    - Guido van Rossum


  47. The Future() is now()
    - Tulip project update (Jan 2014)
    by @guidovanrossum
    - Unyielding (Feb 2014)
    by @glyph
    - Generators, The Final Frontier (Apr 2014)
    by @dabeaz


  48. Concurrency in the real world
    ● the Cassandra driver shows off different concurrency models
    ● pip install cassandra-driver
    ● asyncore and libev event loops preferred
    ● Twisted and gevent event loops also provided
    ● the performance differences are dramatic
    (ranges from 2k to 18k write ops per sec)
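
    A hedged sketch of the driver’s async API (keyspace, table, and query
    are illustrative, not from the talk):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('demo')

    # execute_async() returns a ResponseFuture immediately
    future = session.execute_async(
        "INSERT INTO events (id, body) VALUES (uuid(), %s)", ['hello'])

    def on_success(rows):
        print('write acknowledged')

    def on_error(exc):
        print('write failed:', exc)

    # chain callbacks instead of blocking on future.result()
    future.add_callbacks(on_success, on_error)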


  49. yield from CassandraDemo()
    ● spin up a Cassandra node
    ● execute / benchmark naive sync writes
    ● switch to batched futures
    ● switch to callback chaining
    ● try different event loops
    ● switch to pypy
    ● discuss how asyncio could clean this up


  50. bonus round: asyncio crawler
    ● if there’s time, show GvR’s example crawler
    ● asyncio, @coroutine, and yield from
    in a readable Python 3.4 program
    ● crawls 1,300 xkcd.com pages in 7 seconds
