Slide 1

Modern Python concurrency: toward futures and asyncio
Andrew Montalenti and Doug Turnbull
Python Charlottesville Meetup, September 2, 2014

Slide 2

@softwaredoug @amontalenti

Slide 3

Doug’s part

Slide 4

What’s Concurrency?
● Our system’s ability to do more than one thing at once
● i.e., overlapping or simultaneous:
  ○ web requests
  ○ database transactions
  ○ requests to drives
  ○ requests to databases
  ○ requests to web services
  ○ user input

Slide 5

Getting it right is a Big Deal(R)
● run your app with less
● simplify
● save green: $
From 100 web requests per second per server...
...to 10,000 web requests per second per server

Slide 6

Concurrency -- How?
● Nature of the work?
● My OS just does this, right?
● Is Python good at this?

Slide 7

Nature of the work?
● CPU-bound work -- “calculating digits of pi”
● IO-bound work -- “take request, query db, send response”

Slide 8

What stops us from doing work?
● contention -- fighting over a resource (like a lock)
● blocking -- stopping execution to wait (stops me, lets everyone else go)
(a tiny sketch follows)
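
A tiny sketch (not on the slide) showing both at once: each thread blocks while holding the lock, so the other threads contend for it.

    import threading
    import time

    lock = threading.Lock()

    def worker(name):
        with lock:              # contention: threads fight over this lock
            time.sleep(0.1)     # blocking: this thread stops and waits
            print(name, 'done')

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()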

Slide 9

My OS just does this, right?
● processes: sandboxed memory space
● threads: run within a process, sharing memory with other threads
● Do these get me where I need to be?

Slide 10

Is Python good at this?
● It’s half good. GIL!!!! Who is GIL? The Global Interpreter Lock:
● only one thread runs in the Python interpreter at once
● threads tend to keep the GIL until they finish or hit IO
● But… this slows us down. Contention!
● To teh codez!

Slide 11

Python CPU Bound Threads

    from queue import Queue
    from threading import Thread

    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            l = inQ.get()          # get work to do
            sumL = sum(l)          # CPU-bound work
            outQ.put(sumL)         # work output

    numWorkers = 10
    ts = [Thread(target=worker) for i in range(numWorkers)]  # create threads
    for t in ts:
        t.start()                  # main thread carries on
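
A hypothetical usage sketch (not on the slide): the main thread hands lists to the workers via inQ and collects sums from outQ.

    inQ.put([1, 2, 3])             # hand work to some worker thread
    inQ.put([10, 20, 30])
    print(outQ.get())              # 6 or 60 -- completion order depends on scheduling
    print(outQ.get())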

Slide 12

Python IO Worker Threads

    import requests
    from queue import Queue
    from threading import Thread

    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            url = inQ.get()                # get work to do
            resp = requests.get(url)      # blocking IO
            outQ.put((url, resp.status_code, resp.text))  # work output

    numWorkers = 10
    ts = [Thread(target=worker) for i in range(numWorkers)]
    ...

Slide 13

CPU Bound Threads… we like?

    from queue import Queue
    from threading import Thread

    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    numWorkers = 10
    ts = [Thread(target=worker) for i in range(numWorkers)]
    ...

☺ Code straightforward
☹ CPU-bound work doesn’t release the GIL, so...
☹ ... no gain from more threads/cores
☹ ... only more contention
☹ Contention points?

Slide 14

IO Worker Threads… we like?

    ...
    inQ = Queue()
    outQ = Queue()

    def worker():
        while True:
            url = inQ.get()
            resp = requests.get(url)
            outQ.put((url, resp.status_code, resp.text))

    numWorkers = 10
    ts = [Thread(target=worker) for i in range(numWorkers)]
    ...

☺ Code straightforward
☺ Blocking IO gives up the GIL
☹ One blocking IO operation per thread, so...
☹ ... how many threads to start? A pool?
☹ Contention points?

Slide 15

Improve upon CPU bound? Use processes (and an interprocess Queue) -- the worker code stays the same:

    from multiprocessing import Process, Queue

    def worker(inQ, outQ):
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    inQ = Queue()
    outQ = Queue()
    p = Process(target=worker, args=(inQ, outQ))
    p.start()

Slide 16

CPU Bound Processes… we like?

    def worker(inQ, outQ):
        while True:
            l = inQ.get()
            sumL = sum(l)
            outQ.put(sumL)

    numWorkers = 10
    ts = [Process(target=worker, args=(inQ, outQ)) for i in range(numWorkers)]
    ...

☺ Not sharing the GIL: more processes means concurrent work
☺ Max out all your cores!
☺ Code straightforward
☹ Contention? ☺ (less scary -- process abstractions)
We LIKE!

Slide 17

Processes Rule, Threads Drool

● Light? Threads: yes ☺ -- Processes: almost as light ☺
● Danger? Threads: high; mutable, shared state; deadlocks ☹ -- Processes: lower; stricter communication ☺
● Communication primitives? Threads: mutexes/locks, atomic CPU instructions, thread-safe data structures ☺ -- Processes: OS abstractions, pipes, sockets, shared memory, etc. ☺
● If they crash... Threads: whole program crashes ☹ -- Processes: only that process crashes ☺
● Control? Threads: extremely high ☺ -- Processes: moderate, through abstractions
● GIL? Threads: yes ☹ -- Processes: no ☺

Slide 18

Processes Rule, Threads Drool
● Processes: safer choice, sandboxed, coarse-grain control
● Threads: dangerous choice, very fine-grain control/interaction

Slide 19

Improve on IO bound work? IO-bound work often looks like:

    def worker():
        while True:
            handle_ui_input()
            handle_io()                    # blocking

Or, with one thread per IO source:

    def worker1():
        while True:
            resp = handle_io1()            # blocking
            update_shared_state(resp)      # contention -- need to lock and update

    def worker2():
        while True:
            resp = handle_io2()
            update_shared_state(resp)

Slide 20

Other OS primitives help?

    def event_loop():
        while True:
            whichIsReady = select(ui, io1, io2)   # simultaneously block on multiple IO operations
            if whichIsReady == io1:
                resp = handle_io1(req)
            if whichIsReady == ui:
                ...

● Simultaneously block on multiple IO operations
● No longer need to lock shared state (everything runs in one thread)
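
The slide's select() is pseudocode; the stdlib select module does this on real sockets. A minimal sketch, assuming two connected sockets (the hosts and bare GET requests are illustrative only):

    import select
    import socket

    sock1 = socket.create_connection(('example.com', 80))
    sock2 = socket.create_connection(('example.org', 80))
    for s in (sock1, sock2):
        s.sendall(b'GET / HTTP/1.0\r\n\r\n')

    open_socks = [sock1, sock2]
    while open_socks:
        # block until at least one socket has data ready
        readable, _, _ = select.select(open_socks, [], [])
        for s in readable:
            data = s.recv(4096)
            if not data:               # peer closed the connection
                s.close()
                open_socks.remove(s)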

Slide 21

Non-blocking IO… we like?

    def event_loop():
        schedIo = [...]
        while True:
            whichIsReady = select(schedIo)
            whichIsReady.callback()

    httpReq = HttpReq('http://odu.edu')

    def callWhenDone():
        print(httpReq.data)

    httpReq.fetch(callback=callWhenDone)
    ...

☺ Concurrent IO not bound to the number of threads
☹ Harder to read (Javascript, anyone?)
☹ Extra “IO” scheduler/event loop

Slide 22

Non-blocking IO improvements: Promises/Deferreds/Futures -- a handle to future work

    httpReq = HttpReq('http://odu.edu')

    def callWhenDone():
        print("Fetched and stored imgs!")

    def storeInDb(httpPromise):
        dbPromise = db.store(httpPromise)
        return dbPromise

    promise = httpReq.fetch()                  # handle to pending IO
    imgPromise = parseImgTags(promise)         # chainable with other operations
    dbPromise = storeInDb(imgPromise)
    dbPromise.onDone(callback=callWhenDone)    # can still use callbacks

Slide 23

Non-blocking IO improvements: coroutines/cooperative multitasking -- “I own this thread until I say I’m done”

    httpReq = HttpReq('http://odu.edu')

    def storeInDb(httpPromise):
        dbPromise = db.store(httpPromise)
        return dbPromise

    promise = httpReq.fetch()
    imgPromise = parseImgTags(promise)
    dbPromise = storeInDb(imgPromise)
    # looks like a blocking call, but in reality yields back to the event loop
    myCoroutine.yieldUntil(dbPromise)
    print("Fetched and stored IMG tags!")

☺ Readability of blocking IO
☺ Performance of non-blocking async IO

Slide 24

Example: Greenlet/Gevent
Greenlet:
● coroutine library; the greenlet decides when to give up control
Gevent:
● monkey-patches a big event loop into Python, replacing the core blocking IO bits

Slide 25

Gevent is magic

    import gevent
    from gevent import monkey
    monkey.patch_all()      # replace blocking stdlib IO so requests cooperates

    import requests

    class Fetcher(object):
        def __init__(self, fetchUrl):
            self.fetchUrl = fetchUrl
            self.resp = None

        def fetch(self):
            # blocking? depends on your definition of “blocking”
            self.resp = requests.get(self.fetchUrl)

    def fetchMultiple(urls):
        fetchers = [Fetcher(url) for url in urls]
        handles = []
        for fetcher in fetchers:
            # spawn a gevent worker that calls fetch()
            handles.append(gevent.spawn(fetcher.fetch))
        gevent.joinall(handles)     # wait till all done

Slide 26

Other Solutions (aka future lightning talk fodder)
● Twisted
● CPython C modules (scipy, your own module)
● Cython (compiles Python -> C)
● Jython/IronPython (JIT to the JVM or .NET CLI)
● GPUs (CUDA, etc.)
● cluster frameworks (discussed later)

Slide 27

Andrew’s part

Slide 28

Python GIL
● Why it’s there
● What it messes up, concurrency-wise
● Failed efforts to remove the GIL
● pypy and pypy-stm

Slide 29

Python concurrency menu

● single-node, CPU-bound work: multiprocessing, ProcessPoolExecutor
● single-node, IO-bound work: Twisted (Network), Tornado (HTTP), ThreadPoolExecutor (Generic), gevent (Sockets), asyncore (Sockets), asyncio (Generic)
● multi-node: rq, celery, IPython.parallel, scrapyd? (HTTP)

Slide 30

I love processes. (you should, too)

Slide 31

Options for multi-node concurrency

Start here:
● Redis as message broker
● Pure Python tasks
● Pure Python worker infrastructure
● Simple message patterns

Upgrade here:
● Choose your own message broker
● Pure Python tasks
● Pure Python worker infrastructure
● Advanced message patterns

Beyond that:
● Choose your own message broker
● Mix Python + Java or other languages
● Java worker infrastructure
● Advanced message patterns
● More complex operationally
● High availability & linear scalability
● “Lambda Architecture”
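
The “start here” tier matches rq from the menu slide. A minimal sketch of that style, assuming a running Redis server and a hypothetical tasks.py module defining fetch_url:

    from redis import Redis
    from rq import Queue

    from tasks import fetch_url        # hypothetical pure-Python task

    q = Queue(connection=Redis())      # Redis as the message broker
    job = q.enqueue(fetch_url, 'http://xkcd.com')   # simple message pattern
    print(job.id)
    # pure Python worker infrastructure: start an rq worker process in another shell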

Slide 32

Python concurrency in clusters
● mrjob: Hadoop Streaming (batch) -- sketched below
● streamparse: Apache Storm (real-time)
● parallelize through the Python process model
● mixed workloads
  ○ CPU- and IO-bound
● mixed concurrency models are possible
  ○ threads within Storm Bolts
  ○ process pools within Hadoop Tasks
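
A minimal mrjob sketch (the canonical word count; illustrative, not from the slides) -- each mapper/reducer runs as a Python process inside a Hadoop task:

    from mrjob.job import MRJob

    class MRWordCount(MRJob):
        def mapper(self, _, line):
            for word in line.split():
                yield word, 1          # mappers run in parallel across the cluster

        def reducer(self, word, counts):
            yield word, sum(counts)

    if __name__ == '__main__':
        MRWordCount.run()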

Slide 33

My instinct: threads = bugs

Slide 34

No content

Slide 35

But, sometimes necessary
- BatchingBolt
- IO-bound drivers

Slide 36

What is asyncore?
● stdlib-included async sockets (like libev)
● in stdlib since 2000!

Comment from the source code in 2000:

“There are only two ways to have a program on a single processor do ‘more than one thing at a time’. Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is CPU bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely CPU-bound, however.”
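
A minimal asyncore sketch, adapted from the stdlib documentation’s HTTP client example (the host and path are illustrative):

    import asyncore
    import socket

    class HTTPClient(asyncore.dispatcher):
        def __init__(self, host, path):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.connect((host, 80))
            self.buffer = ('GET %s HTTP/1.0\r\n\r\n' % path).encode()

        def handle_connect(self):
            pass

        def handle_close(self):
            self.close()

        def handle_read(self):
            print(self.recv(8192))          # callback: socket has data

        def writable(self):
            return len(self.buffer) > 0     # tell the loop we still want to write

        def handle_write(self):
            sent = self.send(self.buffer)   # callback: socket ready for writing
            self.buffer = self.buffer[sent:]

    HTTPClient('xkcd.com', '/')
    asyncore.loop()                         # the event loop drives all dispatchers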

Slide 37

Python async networking
● comparison to nginx / Node.JS
● Twisted
● Tornado
● gevent / gthreads
● all using their own “reactor” / “event loop”

Slide 38

What is concurrent.futures?
● PEP-3148
● new unified API for concurrency in Python
● in stdlib in Python 3.2+
● backport for 2.7 (pip install futures)
● API design like the Java Executor Framework
● a Future abstraction as a return value
● an Executor abstraction for running things
(see the sketch below)
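
A minimal sketch of the Executor/Future pair (the URLs are illustrative); swapping ThreadPoolExecutor for ProcessPoolExecutor changes the concurrency model without changing the API:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import urllib.request

    def fetch(url):
        return url, urllib.request.urlopen(url).read()

    urls = ['http://xkcd.com', 'http://python.org']
    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns a Future immediately; the work runs in the pool
        futures = [executor.submit(fetch, url) for url in urls]
        for future in as_completed(futures):
            url, body = future.result()     # blocks until that future is done
            print(url, len(body))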

Slide 39

asyncio history
● PEP-3153: Async IO Support
● PEP-3156: Async IO Support “Rebooted”
● GvR’s pet project from 2012-2014
● original implementation called tulip
● released in Python 3.4 as asyncio
● PyCon 2013 keynote by GvR focused on it
● PEP-380 (yield from) utilized by it

Slide 40

asyncio primitives
● a loop that starts and stops
● callback scheduling
  ○ now
  ○ at a time in the future
  ○ repeated / periodic
● associate callbacks with file I/O states
● offer a pluggable I/O multiplexing mechanism
  ○ select()
  ○ poll(), epoll(), others
(see the sketch below)
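
A minimal sketch of the loop and callback-scheduling primitives (Python 3.4 style):

    import asyncio

    loop = asyncio.get_event_loop()

    def tick(n):
        print('tick', n)
        if n < 3:
            loop.call_later(1.0, tick, n + 1)   # schedule at a time in the future
        else:
            loop.stop()                         # a loop that starts and stops

    loop.call_soon(tick, 0)                     # schedule “now”
    loop.run_forever()
    loop.close()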

Slide 41

asyncio “ooooh ahhhh” moments
● introduces the @coroutine decorator
● uses yield from to simplify callback hell
● one event loop to rule them all
  ○ Twisted and Tornado and gevent in the same app!
● offers an asyncio.Future
  ○ asyncio.Future quacks like futures.Future
  ○ asyncio.wrap_future is an adapter
  ○ asyncio.Task is a subclass of Future
(a short sketch follows)
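
A short sketch of the wrap_future adapter: a concurrent.futures.Future from a thread pool, awaited from a coroutine (Python 3.4 style):

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=1)

    @asyncio.coroutine
    def compute():
        # adapt a futures.Future so the event loop can wait on it
        future = asyncio.wrap_future(executor.submit(sum, range(10)))
        result = yield from future
        return result

    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(compute()))   # 45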

Slide 42

@dabeaz on generators

Slide 43

asyncio coroutines
● result = yield from future -- suspend until the future is done, then return its result
● result = yield from coroutine -- suspend until the coroutine produces a result
● return expression -- return a result to another coroutine
● raise exception -- raise an exception to another coroutine
(illustrated in the sketch below)
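
A minimal sketch putting those four rows together (the URL is illustrative; no real IO happens):

    import asyncio

    @asyncio.coroutine
    def fetch_length(url):
        yield from asyncio.sleep(0.1)       # yield from a coroutine: suspend here
        if not url.startswith('http'):
            raise ValueError(url)           # raise: propagates to the calling coroutine
        return len(url)                     # return: hands a result to the caller

    @asyncio.coroutine
    def main():
        n = yield from fetch_length('http://xkcd.com')   # suspend until it returns
        print('length:', n)

    asyncio.get_event_loop().run_until_complete(main())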

Slide 44

@dabeaz on Future and Task unity
● asyncio: an event loop with a scheduler
● threading: a thread pool with an executor
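
One way to see that unity is loop.run_in_executor, which runs blocking work on a thread pool and hands back an asyncio-compatible future (a minimal sketch, not from the talk):

    import asyncio

    def cpu_work(n):
        return sum(range(n))    # blocking, CPU-bound function

    @asyncio.coroutine
    def main(loop):
        # runs in the default thread pool; awaited like any asyncio future
        result = yield from loop.run_in_executor(None, cpu_work, 10 ** 6)
        print(result)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))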

Slide 45

What is the gain of “yield from”? (1)

“So, why don’t I like green threads? In a simple program using stackless or gevent, it’s easy enough to say, ‘This is a call that goes to the scheduler -- it uses read() or send() or something. I know that’s a blocking call, I’ll be careful…. I don’t need explicit locking because between points A or B, I just need to make sure I don’t make any other calls to the scheduler.’ However, as code gets longer, it becomes hard to keep track. Sooner or later…”
- Guido van Rossum

Slide 46

What is the gain of “yield from”? (2)

“Just trust me; this problem happened to me at a young and impressionable age, and I’ve never forgotten it.”
- Guido van Rossum

Slide 47

The Future() is now()
- Tulip project update (Jan 2014) by @guidovanrossum
- Unyielding (Feb 2014) by @glyph
- Generators: The Final Frontier (Apr 2014) by @dabeaz

Slide 48

Concurrency in the real world
● the Cassandra driver shows the different models
● pip install cassandra-driver
● asyncore and libev event loops preferred
● twisted and gevent also provided
● performance benchmarking is dramatic (ranges from 2k to 18k write ops per sec)

Slide 49

yield from CassandraDemo()
● spin up a Cassandra node
● execute / benchmark naive sync writes
● switch to batched futures
● switch to callback chaining
● try different event loops
● switch to pypy
● discuss how asyncio could clean this up
(a sketch of the first two steps follows)
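
A hedged sketch of the sync-vs-futures contrast using the cassandra-driver API (the keyspace, table, and row counts are hypothetical):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])        # assumes a local Cassandra node
    session = cluster.connect('demo')       # hypothetical keyspace

    # naive sync writes: one network round-trip at a time
    for i in range(1000):
        session.execute("INSERT INTO kv (k, v) VALUES (%s, %s)", (i, 'x'))

    # batched futures: issue all writes asynchronously, then wait on each
    futures = [session.execute_async("INSERT INTO kv (k, v) VALUES (%s, %s)", (i, 'x'))
               for i in range(1000)]
    for f in futures:
        f.result()                          # raises if that write failed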

Slide 50

bonus round: asyncio crawler
● if there’s time, show GvR’s example crawler
● asyncio, @coroutine, and yield from in a readable Python 3.4 program
● crawls 1,300 xkcd.com pages in 7 seconds