What is concurrency? Is Python good at it? How do we scale up from single-node to multi-node concurrency? What does Python's new concurrent.futures stdlib module give us? What does asyncio give us? How do we evaluate concurrent performance?
What’s Concurrency?
● Our system’s ability to do more than one thing at once, i.e. simultaneous:
○ web requests
○ database transactions
○ requests to drives
○ requests to databases
○ requests to web services
○ user input
● run your app with less
● simplify
● save green: $
From 100 web requests per second per server...
...to 10,000 web requests per second per server.
Getting it right is a Big Deal(R)
What stops us from doing work?
● contention -- fighting over a resource (like a lock)
● blocking -- stopping execution to wait (stops me, lets everyone else go)
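For a concrete feel, here is a minimal sketch of both at once (the worker and lock names are our own illustration, not from the slides): each thread blocks inside time.sleep() while holding the lock, so every other thread contends for it.

import threading
import time

lock = threading.Lock()

def worker(name):
    with lock:             # contention: every thread fights over this one lock
        time.sleep(0.1)    # blocking: the holder stops to wait; others queue up
        print(name, 'done')

threads = [threading.Thread(target=worker, args=('t%d' % i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()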
My OS just does this, right?
● processes: sandboxed memory space
● threads: run within a process, share memory with other threads
● These get me where I need to be?
Is Python good at this?
● It’s half good
GIL!!!! Who is GIL? The Global Interpreter Lock:
● Only one thread runs in the Python interpreter at once
● Threads tend to keep the GIL until done or until they block on IO
But…
● This slows us down. Contention!
● To teh codez!
Python CPU Bound Threads

from queue import Queue
from threading import Thread

inQ = Queue()
outQ = Queue()

def worker():
    while True:
        l = inQ.get()       # get work to do
        sumL = sum(l)       # CPU-bound work
        outQ.put(sumL)      # work output

numWorkers = 10
ts = [Thread(target=worker) for i in range(numWorkers)]   # create threads
for t in ts:
    t.start()
# main thread carries on
Python IO Worker Threads

import requests
from queue import Queue
from threading import Thread

inQ = Queue()
outQ = Queue()

def worker():
    while True:
        url = inQ.get()                                   # get work to do
        resp = requests.get(url)                          # blocking IO
        outQ.put((url, resp.status_code, resp.text))      # work output

numWorkers = 10
ts = [Thread(target=worker) for i in range(numWorkers)]
for t in ts:
    t.start()
CPU Bound Threads… we like?
(same worker/Queue code as above)
☹ CPU-bound work doesn’t release the GIL, so...
☹ … no gain from more threads/cores
☹ … only more contention
☹ Contention points?
☺ Code is straightforward
IO Worker Threads… we like?
(same worker/Queue code as above)
☺ Blocking IO gives up the GIL
☺ Code is straightforward
☹ One blocking IO operation per thread, so...
☹ … how many threads to start? a pool?
☹ Contention points?
Improve upon CPU bound?
Using Processes (and an interprocess Queue) -- same worker code:

from multiprocessing import Process, Queue

def worker(inQ, outQ):
    while True:
        l = inQ.get()
        sumL = sum(l)
        outQ.put(sumL)

inQ = Queue()
outQ = Queue()
p = Process(target=worker, args=(inQ, outQ))
p.start()
CPU Bound Processes… we like?

numWorkers = 10
ts = [Process(target=worker, args=(inQ, outQ)) for i in range(numWorkers)]
...

☺ Not sharing the GIL: more processes means concurrent work
☺ Max out all your cores!
☺ Code is straightforward
☹ Contention? ☺ (less scary -- process abstractions)
We LIKE!
Processes Rule, Threads Drool

                 Threads                               Processes
Light?           Yes ☺                                 Almost as light ☺
Danger?          High -- mutable, shared state;        Lower -- stricter communication ☺
                 deadlocks ☹
Communication    Mutexes/locks, atomic CPU             OS abstractions: pipes, sockets,
primitives?      instructions, thread-safe             shared memory, etc. ☺
                 data structures ☺
If they crash…   Whole program crashes ☹               Only that process crashes ☺
Control?         Extremely high ☺                      Moderate, through abstractions
GIL?             Yes ☹                                 No ☺
Improve on IO bound work?
IO-bound work often looks like:

def worker():
    while True:
        handle_ui_input()
        handle_io()

Or, split across threads that share state:

def worker1():
    while True:
        resp = handle_io1()           # blocking
        update_shared_state(resp)     # contention -- need to lock and update

def worker2():
    while True:
        resp = handle_io2()           # blocking
        update_shared_state(resp)     # contention -- need to lock and update
Other OS primitives help?
Simultaneously block on multiple IO operations; no longer need to lock shared state (everything runs in one thread):

def event_loop():
    while True:
        whichIsReady = select(ui, io1, io2)   # block until any source is ready
        if whichIsReady == io1:
            resp = handle_io1(req)
        if whichIsReady == ui:
            ...
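The slide above is pseudocode; for the real flavor, here is a minimal sketch using the stdlib’s select.select() over actual sockets (the echo server setup is our own illustration):

import select
import socket

server = socket.socket()
server.bind(('127.0.0.1', 9000))
server.listen(5)
sockets = [server]

while True:
    # block until at least one socket is ready for reading
    readable, _, _ = select.select(sockets, [], [])
    for s in readable:
        if s is server:
            conn, addr = server.accept()
            sockets.append(conn)       # start watching the new connection
        else:
            data = s.recv(4096)
            if data:
                s.send(data)           # echo back
            else:
                sockets.remove(s)      # peer closed
                s.close()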
Non-blocking IO improvements
Promise/future style (pseudocode) -- chain callbacks instead of blocking:

httpReq = HttpReq('http://odu.edu')
promise = httpReq.fetch()
promise.whenComplete(callWhenDone)
imgPromise = parseImgTags(promise)
dbPromise = storeInDb(imgPromise)

Coroutines/cooperative multitasking -- I own this thread until I say I’m done:

myCoroutine.yieldUntil(dbPromise)   # looks like a blocking call, but in reality
                                    # yields back to the event loop
print("Fetched and stored IMG tags!")

☺ Readability of blocking IO
☺ Performance of non-blocking async IO
Example: Greenlet/Gevent
Greenlet:
● coroutine library; the greenlet decides when to give up control
Gevent:
● monkey-patches a big event loop into Python, replacing the core blocking-IO bits
Gevent is magic

import gevent
from gevent import monkey
monkey.patch_all()   # replace blocking stdlib IO with gevent-aware versions
import requests

class Fetcher(object):
    def __init__(self, fetchUrl):
        self.fetchUrl = fetchUrl
        self.resp = None

    def fetch(self):
        # blocking? depends on your definition of "blocking" --
        # under gevent this yields to the event loop while waiting
        self.resp = requests.get(self.fetchUrl)

def fetchMultiple(urls):
    fetchers = [Fetcher(url) for url in urls]
    handles = []
    for fetcher in fetchers:
        # spawn a gevent worker that calls "fetch"
        handles.append(gevent.spawn(fetcher.fetch))
    gevent.joinall(handles)   # wait till all done
    return fetchers
Other Solutions (aka future lightning talk fodder)
● Twisted
● CPython C modules (scipy, your own module)
● Cython (compiles Python -> C)
● Jython/IronPython (JIT on the JVM or .NET CLR)
● GPUs (CUDA, etc.)
● cluster frameworks (discussed later)
Options for multi-node concurrency

Start here:
● Redis as message broker
● pure Python tasks
● pure Python worker infrastructure
● simple message patterns

Upgrade here:
● choose your own message broker
● pure Python tasks
● pure Python worker infrastructure
● advanced message patterns

Then:
● choose your own message broker
● mix Python + Java or other languages
● Java worker infrastructure
● advanced message patterns
● more complex operationally
● high availability & linear scalability
● “Lambda Architecture”
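The slide doesn’t name a framework, but Celery is one common fit for the “start here” profile (Redis broker, pure Python tasks and workers). A minimal sketch, assuming Celery and a Redis server on localhost:

from celery import Celery

# broker/backend URLs assume a local Redis; adjust for your deployment
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/1')

@app.task
def add(x, y):
    return x + y

# from any other process: queue the task and wait for its result
# result = add.delay(2, 3)
# print(result.get(timeout=5))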
Python concurrency in clusters
● mrjob: Hadoop Streaming (batch) -- see the sketch after this list
● streamparse: Apache Storm (real-time)
● parallelize through the Python process model
● mixed workloads
○ CPU- and IO-bound
● mixed concurrency models are possible
○ threads within Storm Bolts
○ process pools within Hadoop Tasks
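A minimal mrjob sketch (the classic word count; the job name is our own). Hadoop Streaming runs one copy of this class per task process, which is how the Python process model parallelizes the work across the cluster:

from mrjob.job import MRJob

class MRWordCount(MRJob):

    def mapper(self, _, line):
        # each Hadoop task process runs this over its own input split
        for word in line.split():
            yield word, 1

    def reducer(self, word, counts):
        yield word, sum(counts)

if __name__ == '__main__':
    MRWordCount.run()

Run locally with `python word_count.py input.txt`, or on a cluster with `-r hadoop` (file names here are hypothetical).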
What is asyncore?
● stdlib-included async sockets (like libev)
● in stdlib since 2000!
Comment from the source code in 2000:
“There are only two ways to have a program on a single processor do ‘more than one thing at a time.’ Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is CPU bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely CPU-bound, however.”
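What that looks like in practice -- a minimal asyncore echo server, mirroring the dispatcher pattern from the stdlib docs (host/port are arbitrary):

import asyncore
import socket

class EchoHandler(asyncore.dispatcher_with_send):
    def handle_read(self):
        data = self.recv(4096)
        if data:
            self.send(data)

class EchoServer(asyncore.dispatcher):
    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind((host, port))
        self.listen(5)

    def handle_accept(self):
        pair = self.accept()
        if pair is not None:
            sock, addr = pair
            EchoHandler(sock)   # one handler per connection, all in one thread

server = EchoServer('127.0.0.1', 8080)
asyncore.loop()   # the event loop: select()-based multiplexing under the hood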
What is concurrent.futures? ● PEP-3148 ● new unified API for concurrency in Python ● in stdlib in Python 3.2+ ● backport in 2.7 (pip install futures) ● API design like Java Executor Framework ● a Future abstraction as a return value ● an Executor abstraction for running things
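A minimal sketch of the Executor/Future API (the URL list is hypothetical):

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

urls = ['http://odu.edu', 'http://xkcd.com']   # hypothetical work list

# the Executor abstraction runs things; submit() returns a Future
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = {executor.submit(requests.get, url): url for url in urls}
    for future in as_completed(futures):       # yields futures as they finish
        url = futures[future]
        try:
            resp = future.result()             # re-raises worker exceptions
            print(url, resp.status_code)
        except Exception as exc:
            print(url, 'failed:', exc)

For CPU-bound work, swap in ProcessPoolExecutor; the rest of the code is unchanged -- that is the point of the unified API.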
asyncio history ● PEP-3153: Async IO Support ● PEP-3156: Async IO Support “Rebooted” ● GvR’s pet project from 2012-2014 ● Original implementation called tulip ● Released in Python 3.4 as asyncio ● PyCon 2013 keynote by GvR focused on it ● PEP-380 (yield from) utilized by it
asyncio primitives
● a loop that starts and stops
● callback scheduling (see the sketch below)
○ now
○ at a time in the future
○ repeated / periodic
● associate callbacks with file I/O states
● offers a pluggable I/O multiplexing mechanism
○ select()
○ poll(), epoll(), others
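A minimal sketch of the scheduling primitives (the tick callback is our own illustration):

import asyncio

def tick(loop, n):
    print('tick', n)
    if n < 3:
        loop.call_later(1.0, tick, loop, n + 1)   # a time in the future
    else:
        loop.stop()

loop = asyncio.get_event_loop()
loop.call_soon(tick, loop, 0)   # "now": runs on the next loop iteration
loop.run_forever()
loop.close()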
asyncio “ooooh ahhhh” moments
● introduces the @coroutine decorator
● uses yield from to simplify callback hell
● one event loop to rule them all
○ Twisted and Tornado and gevent in the same app!
● offers an asyncio.Future
○ asyncio.Future quacks like futures.Future
○ asyncio.wrap_future is an adapter (sketch below)
○ asyncio.Task is a subclass of Future
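A minimal sketch of the wrap_future adapter (blocking_work is our own illustration): a concurrent.futures.Future from a thread pool becomes awaitable inside the event loop.

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def blocking_work():
    return sum(range(10 ** 6))

@asyncio.coroutine
def main():
    cf_future = executor.submit(blocking_work)    # a futures.Future
    aio_future = asyncio.wrap_future(cf_future)   # now quacks like asyncio.Future
    result = yield from aio_future
    print('result:', result)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
executor.shutdown()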
asyncio coroutines

code                            explanation
result = yield from future      suspend until the future is done, then return its result
result = yield from coroutine   suspend until the coroutine returns a result
return expression               return a result to another coroutine
raise exception                 raise an exception in another coroutine
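Putting those together -- a minimal Python 3.4-style sketch (compute and main are our own names):

import asyncio

@asyncio.coroutine
def compute(x, y):
    yield from asyncio.sleep(1.0)   # suspend; the loop runs other work meanwhile
    return x + y                    # return a result to the calling coroutine

@asyncio.coroutine
def main():
    result = yield from compute(1, 2)   # suspend until compute() returns
    print('result:', result)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()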
What is the gain of “yield from”? (1)
“So, why don’t I like green threads? In a simple program using stackless or gevent, it’s easy enough to say, ‘This is a call that goes to the scheduler -- it uses read() or send() or something. I know that’s a blocking call, I’ll be careful… I don’t need explicit locking because between points A and B, I just need to make sure I don’t make any other calls to the scheduler.’ However, as code gets longer, it becomes hard to keep track. Sooner or later…”
- Guido van Rossum
What is the gain of “yield from”? (2) Just trust me; this problem happened to me at a young and impressionable age, and I’ve never forgotten it. - Guido van Rossum
The Future() is now()
● Tulip project update (Jan 2014) by @guidovanrossum
● Unyielding (Feb 2014) by @glyph
● Generators: The Final Frontier (Apr 2014) by @dabeaz
Concurrency in the real world
● the Cassandra driver ships several concurrency models
● pip install cassandra-driver
● asyncore and libev event loops preferred
● twisted and gevent also provided
● performance differences are dramatic (ranging from 2k to 18k write ops per sec)
yield from CassandraDemo()
● spin up a Cassandra node
● execute / benchmark naive sync writes
● switch to batched futures (see the sketch after this list)
● switch to callback chaining
● try different event loops
● switch to pypy
● discuss how asyncio could clean this up
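A minimal sketch of the sync-vs-futures step, assuming cassandra-driver and a local node (the keyspace and table names here are hypothetical):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('demo')   # hypothetical keyspace
insert = session.prepare('INSERT INTO events (id, body) VALUES (?, ?)')

# naive sync: each execute() blocks until the write round-trips
for i in range(1000):
    session.execute(insert, (i, 'payload'))

# batched futures: start many writes in flight, then wait on them together
futures = [session.execute_async(insert, (i, 'payload')) for i in range(1000)]
for f in futures:
    f.result()   # blocks; raises if that write failed

cluster.shutdown()

The futures version keeps many writes in flight per connection instead of one, which is where the large throughput gap in the benchmark comes from.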
bonus round: asyncio crawler ● if there’s time, show GvR’s example crawler ● asyncio, @coroutine, and yield from in a readable Python 3.4 program ● crawls 1,300 xkcd.com pages in 7 seconds