Modern Python Concurrency

What is concurrency? Is Python good at it? How do we scale up from single-node concurrency to multi-node? What does Python's new concurrent.futures stdlib library give us? What does asyncio give us? How do we evaluate concurrent performance?

Andrew Montalenti

September 03, 2014

Transcript

  1. Modern Python concurrency: toward futures and asyncio

    Andrew Montalenti & Doug Turnbull
    Python Charlottesville Meetup, September 2, 2014
  2. @softwaredoug @amontalenti

  3. Doug’s part

  4. What’s Concurrency?

    • Our system’s ability to do more than one thing at once,
      i.e. simultaneously: web requests, database transactions,
      requests to drives, requests to databases, requests to
      web services, user input
  5. • run your app with less
     • simplify
     • save green: $

    From 100 web requests per second per server...
    ...to 10,000 web requests per second per server.
    Getting it right is a Big Deal(R)
  6. Concurrency -- How?

    • Nature of the work?
    • My OS just does this, right?
    • Is Python good at this?
  7. Nature of the work?

    • CPU-bound work -- “calculating digits of pi”
    • IO-bound work -- “take request, query db, send response”
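
    A quick illustration of the two shapes of work (these function bodies
    are hypothetical stand-ins, not from the talk):

        import requests

        def cpu_bound_work():
            # spends its time computing; never waits on the outside world
            return sum(i * i for i in range(10 ** 7))

        def io_bound_work(url):
            # spends its time waiting on the network, barely touching the CPU
            return requests.get(url).status_code
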
  8. What stops us from doing work?

    • contention -- fighting over a resource (like a lock)
    • blocking -- stopping execution to wait (stops me, lets everyone else go)
  9. My OS just does this, right?

    • processes: sandboxed memory space
    • threads: run within a process, share memory with other threads
    • Do these get me where I need to be?
  10. Is Python good at this?

    • It’s half good. GIL!!!!
    Who is GIL? The Global Interpreter Lock:
    • Only one thread runs in the Python interpreter at once
    • Threads tend to keep the GIL until done or until they block on IO
    But…
    • This slows us down. Contention!
    • To teh codez!
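
    One way to see the GIL’s cost yourself is the classic counting
    benchmark (a sketch in the spirit of @dabeaz’s GIL talks; exact
    timings vary by machine):

        import time
        from threading import Thread

        def count(n):
            while n > 0:
                n -= 1

        N = 10 ** 7

        start = time.time()
        count(N)
        count(N)
        print('sequential: %.2fs' % (time.time() - start))

        # two threads are often no faster (sometimes slower!) because the
        # GIL lets only one thread execute Python bytecode at a time
        start = time.time()
        t1 = Thread(target=count, args=(N,))
        t2 = Thread(target=count, args=(N,))
        t1.start(); t2.start()
        t1.join(); t2.join()
        print('threaded:   %.2fs' % (time.time() - start))
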
  11. Python CPU Bound Threads

    from queue import Queue
    from threading import Thread

    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            l = inQ.get()       # get work to do
            sumL = sum(l)       # CPU-bound work
            outQ.put(sumL)      # work output

    numWorkers = 10
    ts = [Thread(target=worker) for i in xrange(numWorkers)]  # create threads
    for t in ts:
        t.start()
    # main thread carries on
  12. Python IO Worker Threads

    import requests
    ...
    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            url = inQ.get()            # get work to do
            resp = requests.get(url)   # blocking IO
            outQ.put((url, resp.status_code, resp.text))  # work output

    numWorkers = 10
    ts = [Thread(target=worker) for i in xrange(numWorkers)]
    ...
  13. CPU Bound Threads… we like?

    from queue import Queue
    from threading import Thread

    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    numWorkers = 10
    ts = [Thread(target=worker) for i in xrange(numWorkers)]
    ...

    ☹ CPU-bound work doesn’t release the GIL, so...
    ☹ … no gain from more threads/cores
    ☹ … only more contention
    ☹ Contention points?
    ☺ Code straightforward
  14. IO Worker Threads… we like?

    ...
    def worker():
        while True:
            url = inQ.get()
            resp = requests.get(url)
            outQ.put((url, resp.status_code, resp.text))

    numWorkers = 10
    ts = [Thread(target=worker) for i in xrange(numWorkers)]
    ...

    ☺ Gives up the GIL during IO
    ☺ Code straightforward
    ☹ One blocking IO operation per thread, so...
    ☹ … how many to start? A pool?
    ☹ Contention points?
  15. Improve upon CPU bound?

    Using Processes (and an interprocess Queue):

    from multiprocessing import Process, Queue

    def worker(inQ, outQ):
        # same code as the threaded version
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    inQ = Queue()
    outQ = Queue()
    p = Process(target=worker, args=(inQ, outQ))
    p.start()
  16. CPU Bound Processes… we like?

    def worker(inQ, outQ):
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    numWorkers = 10
    ps = [Process(target=worker, args=(inQ, outQ)) for i in xrange(numWorkers)]
    ...

    ☺ Not sharing the GIL
    ☺ No GIL: more processes means concurrent work
    ☺ Max out all your cores!
    ☺ Code straightforward
    ☹ Contention? ☺ (less scary -- process abstractions)
    We LIKE!
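
    For one-shot batches, stdlib multiprocessing.Pool wraps this worker
    pattern up for you; a minimal sketch (not from the slides):

        from multiprocessing import Pool

        def sum_list(l):
            return sum(l)

        if __name__ == '__main__':      # guard required where workers are spawned
            work = [list(range(1000))] * 100
            pool = Pool(processes=4)    # roughly one worker per core
            results = pool.map(sum_list, work)  # blocks until all chunks finish
            pool.close()
            pool.join()
            print(results[:3])
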
  17. Processes Rule, Threads Drool

                        Threads                             Processes
    Light?              Yes ☺                               Almost as light ☺
    Danger?             High -- mutable shared state,       Lower -- stricter communication ☺
                        deadlocks ☹
    Communication       Mutexes/locks, atomic CPU           OS abstractions: pipes, sockets,
    primitives?         instructions, thread-safe           shared memory, etc. ☺
                        data structures ☺
    If they crash...    Whole program crashes ☹             Only that process crashes ☺
    Control?            Extremely high ☺                    Moderate, through abstractions
    GIL?                Yes ☹                               No ☺
  18. Processes Rule, Threads Drool

    • Processes: safer choice, sandboxed, coarse-grain control
    • Threads: dangerous choice, very fine-grain control/interaction
  19. Improve on IO bound work?

    IO-bound work often looks like:

    def worker():
        while True:
            handle_ui_input()   # blocking
            handle_io()         # blocking

    def worker1():
        while True:
            resp = handle_io1()
            update_shared_state(resp)   # contention -- need to lock and update

    def worker2():
        while True:
            resp = handle_io2()
            update_shared_state(resp)
  20. Other OS primitives help?

    def event_loop():
        while True:
            # simultaneously block on multiple IO operations
            whichIsReady = select(ui, io1, io2)
            if whichIsReady == io1:
                resp = handle_io1(req)
            if whichIsReady == ui:
                ...

    No longer need to lock shared state (in one thread)
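
    The slide’s select() is pseudocode; a runnable equivalent with the
    stdlib select module might look like this (the ports and the trivial
    handler are made up for illustration):

        import select
        import socket

        def make_listener(port):
            s = socket.socket()
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(('localhost', port))
            s.listen(5)
            return s

        # two listening sockets stand in for "ui" and "io1"
        listeners = [make_listener(9001), make_listener(9002)]

        while True:
            # block until at least one socket is ready -- the heart of an event loop
            ready, _, _ = select.select(listeners, [], [])
            for sock in ready:
                conn, addr = sock.accept()
                conn.sendall(b'hello\n')   # handle whichever one is ready
                conn.close()
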
  21. Non-blocking IO… we like?

    def event_loop():
        schedIo = [...]
        while True:
            whichIsReady = select(schedIo)
            whichIsReady.callback()

    httpReq = HttpReq('http://odu.edu')
    def callWhenDone():
        print httpReq.data
    httpReq.fetch(callback=callWhenDone)
    ...

    ☹ Harder to read (Javascript, anyone?)
    ☹ Extra "IO" scheduler / event loop
    ☺ Concurrent IO not bound to number of threads
  22. Non-blocking IO improvements

    Promise/Deferred/Futures: a handle to future work

    httpReq = HttpReq('http://odu.edu')

    def callWhenDone():
        print "Fetched and stored imgs!"

    def storeInDb(httpPromise):
        dbPromise = db.store(httpPromise)
        return dbPromise

    promise = httpReq.fetch()              # handle to pending IO
    imgPromise = parseImgTags(promise)     # chainable with other operations
    dbPromise = storeInDb(imgPromise)
    dbPromise.onDone(callback=callWhenDone)  # can still use callbacks
  23. Non-blocking IO improvements

    Coroutines / cooperative multitasking:
    “I own this thread until I say I’m done”

    httpReq = HttpReq('http://odu.edu')

    def storeInDb(httpPromise):
        dbPromise = db.store(httpPromise)
        return dbPromise

    promise = httpReq.fetch()
    promise.whenComplete(callWhenDone)   # can still mix in callbacks
    imgPromise = parseImgTags(promise)
    dbPromise = storeInDb(imgPromise)
    myCoroutine.yieldUntil(dbPromise)    # looks like a blocking call, but in
                                         # reality yields back to the event loop
    print "Fetched and stored IMG tags!"

    ☺ Readability of blocking IO
    ☺ Performance of non-blocking async IO
  24. Example: Greenlet/Gevent

    Greenlet:
    • coroutine library; the greenlet decides when to give up control
    Gevent:
    • monkey-patches a big event loop into Python, replacing core blocking IO bits
  25. Gevent is magic

    import gevent
    import requests

    class Fetcher(object):
        def __init__(self, fetchUrl):
            self.fetchUrl = fetchUrl
            self.resp = None

        def fetch(self):
            # Blocking? Depends on your definition of "blocking"
            self.resp = requests.get(self.fetchUrl)

    def fetchMultiple(urls):
        fetchers = [Fetcher(url) for url in urls]
        handles = []
        for fetcher in fetchers:
            # spawn a gevent worker that calls "fetch"
            handles.append(gevent.spawn(fetcher.fetch))
        gevent.joinall(handles)   # wait till all done
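
    One caveat the slide glosses over: plain requests only cooperates with
    gevent if the blocking socket internals have been monkey-patched first.
    A minimal sketch of standard gevent usage (the URLs are just examples):

        from gevent import monkey
        monkey.patch_all()   # swap blocking stdlib IO for cooperative versions

        import gevent
        import requests      # now rides on the patched sockets

        def fetch(url):
            return requests.get(url).status_code

        jobs = [gevent.spawn(fetch, url)
                for url in ('http://xkcd.com', 'http://python.org')]
        gevent.joinall(jobs)
        print([job.value for job in jobs])
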
  26. Other Solutions (aka future lightning talk fodder)

    • Twisted
    • CPython C modules (scipy, your module)
    • Cython (compiles Python to C)
    • Jython/IronPython (JIT on the JVM or .NET CLI)
    • GPUs (CUDA, etc.)
    • cluster frameworks (discussed later)
  27. Andrew’s part

  28. Python GIL

    • Why it’s there
    • What it messes up, concurrency-wise
    • Failed efforts to remove the GIL
    • pypy and pypy-stm
  29. Python concurrency menu

                  CPU-bound work         IO-bound work
    single-node   multiprocessing        Twisted (Network)
                  ProcessPoolExecutor    Tornado (HTTP)
                                         ThreadPoolExecutor (Generic)
                                         gevent (Sockets)
                                         asyncore (Sockets)
                                         asyncio (Generic)
    multi-node    rq                     scrapyd? (HTTP)
                  celery
                  IPython.parallel
  30. I love processes. (you should, too)

  31. Options for multi-node concurrency

    Start here:
    • Redis as message broker
    • Pure Python tasks
    • Pure Python worker infrastructure
    • Simple message patterns

    Upgrade here:
    • Choose your own message broker
    • Pure Python tasks
    • Pure Python worker infrastructure
    • Advanced message patterns

    Then:
    • Choose your own message broker
    • Mix Python + Java or other languages
    • Java worker infrastructure
    • Advanced message patterns
    • More complex operationally
    • High availability & linear scalability
    • “Lambda Architecture”
  32. Python concurrency in clusters

    • mrjob: Hadoop Streaming (batch)
    • streamparse: Apache Storm (real-time)
    • parallelize through the Python process model
    • mixed workloads
      ◦ CPU- and IO-bound
    • mixed concurrency models are possible
      ◦ threads within Storm Bolts
      ◦ process pools within Hadoop Tasks
  33. My instinct: threads = bugs

  34. (image-only slide)
  35. But, sometimes necessary
    • BatchingBolt
    • IO-bound drivers

  36. What is asyncore?

    • stdlib-included async sockets (like libev)
    • in the stdlib since 2000!

    Comment from the source code in 2000:

    “There are only two ways to have a program on a single processor do
    ‘more than one thing at a time.’ Multi-threaded programming is the
    simplest and most popular way to do it, but there is another very
    different technique, that lets you have nearly all the advantages of
    multi-threading, without actually using multiple threads. It’s really
    only practical if your program is largely I/O bound. If your program
    is CPU bound, then pre-emptive scheduled threads are probably what
    you really need. Network servers are rarely CPU-bound, however.”
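
    A minimal asyncore client in the style of the stdlib docs’ example
    (the hosts are illustrative); any number of these dispatchers can
    share the single loop at the bottom:

        import asyncore
        import socket

        class HttpClient(asyncore.dispatcher):
            def __init__(self, host):
                asyncore.dispatcher.__init__(self)
                self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
                self.connect((host, 80))
                self.buffer = ('GET / HTTP/1.0\r\nHost: ' + host +
                               '\r\n\r\n').encode('ascii')
                self.data = b''

            def handle_connect(self):
                pass

            def handle_read(self):
                self.data += self.recv(8192)   # socket became readable

            def writable(self):
                return len(self.buffer) > 0    # only want write events while data remains

            def handle_write(self):
                sent = self.send(self.buffer)  # socket became writable
                self.buffer = self.buffer[sent:]

            def handle_close(self):
                print(self.data[:60])          # start of the response
                self.close()

        clients = [HttpClient(h) for h in ('xkcd.com', 'python.org')]
        asyncore.loop()   # one event loop drives every dispatcher, in one thread
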
  37. Python async networking

    • comparison to nginx / Node.JS
    • Twisted, Tornado, gevent / gthreads
    • all using their own “reactor” / “event loop”
  38. What is concurrent.futures?

    • PEP-3148
    • new unified API for concurrency in Python
    • in the stdlib in Python 3.2+
    • backport for 2.7 (pip install futures)
    • API design like the Java Executor framework
    • a Future abstraction as a return value
    • an Executor abstraction for running things
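
    A minimal sketch of the Executor/Future API (the workload is a
    made-up stand-in):

        from concurrent.futures import ProcessPoolExecutor, as_completed

        def cpu_bound(n):
            return sum(range(n))

        if __name__ == '__main__':
            # submit() returns a Future immediately; the Executor runs the work
            with ProcessPoolExecutor(max_workers=4) as executor:
                futures = [executor.submit(cpu_bound, 10 ** 6) for _ in range(8)]
                for future in as_completed(futures):   # yields futures as they finish
                    print(future.result())

    Swap in ThreadPoolExecutor for IO-bound work; the calling code does
    not change.
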
  39. asyncio history

    • PEP-3153: Async IO Support
    • PEP-3156: Async IO Support “Rebooted”
    • GvR’s pet project from 2012-2014
    • original implementation called tulip
    • released in Python 3.4 as asyncio
    • PyCon 2013 keynote by GvR focused on it
    • PEP-380 (yield from) utilized by it
  40. asyncio primitives

    • a loop that starts and stops
    • callback scheduling
      ◦ now
      ◦ at a time in the future
      ◦ repeated / periodic
    • associate callbacks with file I/O states
    • pluggable I/O multiplexing mechanism
      ◦ select()
      ◦ poll(), epoll(), others
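
    A sketch of those scheduling primitives in Python 3.4 asyncio (the
    intervals are arbitrary):

        import asyncio

        loop = asyncio.get_event_loop()

        def tick():
            print('tick')
            loop.call_later(1.0, tick)     # re-arm: repeated / periodic

        loop.call_soon(tick)               # schedule for "now"
        loop.call_later(4.5, loop.stop)    # schedule at a time in the future
        loop.run_forever()                 # a loop that starts...
        loop.close()                       # ...and stops
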
  41. asyncio “ooooh ahhhh” moments

    • introduces the @coroutine decorator
    • uses yield from to simplify callback hell
    • one event loop to rule them all
      ◦ Twisted and Tornado and gevent in the same app!
    • offers an asyncio.Future
      ◦ asyncio.Future quacks like futures.Future
      ◦ asyncio.wrap_future is an adapter
      ◦ asyncio.Task is a subclass of Future
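
    For instance, asyncio.wrap_future lets a coroutine wait on work
    running in a plain concurrent.futures pool; a small sketch (the
    summing job is arbitrary):

        import asyncio
        from concurrent.futures import ThreadPoolExecutor

        executor = ThreadPoolExecutor(max_workers=2)

        @asyncio.coroutine
        def main():
            # adapt a concurrent.futures.Future into an asyncio.Future
            fut = asyncio.wrap_future(executor.submit(sum, range(10 ** 6)))
            result = yield from fut    # suspends without blocking the loop
            print(result)

        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
        executor.shutdown()
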
  42. @dabeaz on generators

  43. asyncio coroutines

    code                            explain
    result = yield from future      suspend until future done, then return its result
    result = yield from coroutine   suspend until coroutine returns a result
    return expression               return a result to another coroutine
    raise exception                 raise an exception to another coroutine
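
    Those rows in action, in a sketch runnable on Python 3.4 (the names
    and the sleep are arbitrary):

        import asyncio

        @asyncio.coroutine
        def double(x):
            yield from asyncio.sleep(0.1)   # suspend until the sleep future is done
            return x * 2                    # return a result to another coroutine

        @asyncio.coroutine
        def main():
            result = yield from double(21)  # suspend until the coroutine returns
            print(result)                   # 42

        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
        loop.close()
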
  44. @dabeaz on Future and Task unity

    • asyncio and an event loop with a scheduler
    • threading with a thread pool and an executor
  45. What is the gain of “yield from”? (1)

    “So, why don’t I like green threads? In a simple program using
    stackless or gevent, it’s easy enough to say, ‘This is a call that
    goes to the scheduler -- it uses read() or send() or something. I
    know that’s a blocking call, I’ll be careful… I don’t need explicit
    locking because between points A and B, I just need to make sure I
    don’t make any other calls to the scheduler.’ However, as code gets
    longer, it becomes hard to keep track. Sooner or later…”
    - Guido van Rossum
  46. What is the gain of “yield from”? (2)

    “Just trust me; this problem happened to me at a young and
    impressionable age, and I’ve never forgotten it.”
    - Guido van Rossum
  47. The Future() is now()

    - Tulip project update (Jan 2014) by @guidovanrossum
    - Unyielding (Feb 2014) by @glyph
    - Generators: The Final Frontier (Apr 2014) by @dabeaz
  48. Concurrency in the real world

    • the Cassandra driver shows the different models
    • pip install cassandra-driver
    • asyncore and libev event loops preferred
    • twisted and gevent also provided
    • performance benchmarking is dramatic
      (ranges from 2k to 18k write ops per sec)
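
    The driver’s async entry point is execute_async, which returns a
    ResponseFuture; a sketch assuming a local node plus a hypothetical
    "demo" keyspace and "events" table (names are illustrative):

        from cassandra.cluster import Cluster

        cluster = Cluster(['127.0.0.1'])
        session = cluster.connect('demo')

        # returns a ResponseFuture immediately instead of blocking
        future = session.execute_async(
            "INSERT INTO events (id, payload) VALUES (uuid(), %s)", ('hello',))

        def on_success(result):
            print('write ok')

        def on_error(exc):
            print('write failed: %s' % exc)

        future.add_callbacks(on_success, on_error)   # callback-chaining style
        # ...or block when confirmation is needed: rows = future.result()
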
  49. yield from CassandraDemo()

    • spin up a Cassandra node
    • execute / benchmark naive sync writes
    • switch to batched futures
    • switch to callback chaining
    • try different event loops
    • switch to pypy
    • discuss how asyncio could clean this up
  50. bonus round: asyncio crawler

    • if there’s time, show GvR’s example crawler
    • asyncio, @coroutine, and yield from in a readable Python 3.4 program
    • crawls 1,300 xkcd.com pages in 7 seconds