
PyParallel - Trent Nelson


The talk will cover that, but will also give some real-life performance examples of where PyParallel shines in comparison to the existing options (e.g. asyncio, Twisted, Tornado). This will typically be web-server based stuff, as, well, you know, that's actually something that works properly in PyParallel :-)

PyGotham 2014

August 17, 2014

Transcript

1. PyParallel – PyGotham 2014
Trent Nelson
Managing Director, New York
Continuum Analytics
@ContinuumIO, @trentnelson
[email protected]
http://speakerdeck.com/trent/

2. About Me
• Systems Software Engineer
• Core Python Committer
• Apache/Subversion Committer
• Founded Snakebite @ Michigan State University
  o AIX RS/6000
  o SGI IRIX/MIPS
  o Alpha/Tru64
  o Solaris/SPARC
  o HP-UX/IA64
  o FreeBSD, NetBSD, OpenBSD, DragonFlyBSD
• Background is UNIX
• Made peace with Windows when XP came out

3. What is PyParallel?
• Set of modifications to the CPython interpreter
• Allows multiple interpreter threads to run in parallel without incurring any additional performance penalties
• Solves the GIL problem without removing the GIL
  o Because the problem isn't the GIL.
  o (The problem is that I want to exploit my hardware as efficiently as possible with a reasonable amount of development effort.)
• Started as a proof of concept
• I'm now convinced it's essential for Python to stay competitive for the next 20+ years
• That time is going to pass anyway; we may as well have a plan in place

4. "Describe what developing for each console you've developed for is like."
• Like all the best quotations, this one comes from reddit:
  o http://www.reddit.com/r/gamedev/comments/xddlp/describe_what_developing_for_each_console_youve/

5. PS2: "You are handed a 10-inch thick stack of manuals written by Japanese hardware engineers. The first time you read the stack, nothing makes any sense at all. The second time you read the stack, the 3rd book makes a bit more sense because of what you learned in the 8th book. The machine has 10 different processors (IOP, SPU1&2, MDEC, R5900, VU0&1, GIF, VIF, GS) and 6 different memory spaces (IOP, SPU, CPU, GS, VU0&1) that all work in completely different ways. There are so many amazing things you can do, but everything requires backflips through invisible blades of segfault. Getting the first triangle to appear on the screen took some teams over a month because it involved routing commands through R5900->VIF->VU1->GIF->GS oddities with no feedback about what you were doing wrong until you got every step along the way to be correct. If you were willing to twist your game to fit the machine, you could get awesome results. There was a debugger for the main CPU (R5900). It worked pretty OK. For the rest of the processors, you just had to write code without bugs."

"everything requires backflips through invisible blades of segfault"
  - PyParallel: The Early Days. [*]
  [*]: still applicable, 17th August, 2014, 2:18pm

6. Motivation behind PyParallel
• What problem was I trying to solve?
• Wasn't happy with the status quo
  o Parallel options (for compute-bound, data parallelism problems):
    • GIL prevents simultaneous multithreading
    • ...so you have to rely on separate Python processes if you want to exploit more than one core
  o Concurrency options (for I/O-bound or I/O-driven, task parallelism problems):
    • One thread per client, blocking I/O
    • Single thread, event loop, multiplexing system call (select/poll/epoll/kqueue) (minimal sketch below)

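To make that last option concrete, here is a minimal single-threaded echo server built on the stdlib selectors module (which wraps select/poll/epoll/kqueue). It is only a sketch: the callback names, port and buffer size are illustrative, not from the talk.

    # One thread, one event loop, one multiplexing syscall under the hood.
    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(4096)
        if data:
            conn.send(data)              # echo back (best-effort for a sketch)
        else:
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("127.0.0.1", 8000))     # illustrative address
    server.listen(100)
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:                          # the single-threaded event loop
        for key, _ in sel.select():
            key.data(key.fileobj)        # dispatch to the registered callback
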
7. What if I'm I/O-bound and compute-bound?
• Contemporary enterprise problems:
  o Computationally-intensive (compute-bound) work against TBs/PBs of data (I/O-bound)
  o Serving tens of thousands of network clients (I/O-driven) with non-trivial computation required per request (compute-bound)
  o Serving fewer clients, but providing ultra-low latency or maximum throughput to those you do serve (HFT, remote array servers, etc.)
• Contemporary data center hardware:
  o 128 cores, 512GB RAM
  o Quad 10Gb Ethernet NICs
  o SSDs & Fusion-io-style storage -> 500k-800k+ IOPS from a single device
  o 2016: 128Gb Fibre Channel (4x32Gb) -> 25.6GB/s throughput

8. Real Problems, Powerful Hardware
• I want to solve my problems as optimally as my hardware will allow
• Optimal hardware use necessitates things like:
  o One active thread per core
    • Any more results in unnecessary context switches
  o No unnecessary duplication of shared/common data in memory
  o Ability to saturate the bandwidth of my I/O devices
• And I want to do it all in Python
• ...yet still be competitive against C/C++ where it matters

9. What do you want to see next?
• Segfaul^WLive Demo!
• Benchmarks!
• Moar slides!
  o I have 74 more in this deck.
  o And 154 in my other one!
• Q&A!
• Exclamation points!

10. Concurrency versus Parallelism
• Concurrency:
  o Making progress on multiple things at the same time
    • Task A doesn't need to complete before you can start work on task B
  o Typically used to describe I/O-bound or I/O-driven systems, especially network-oriented socket servers
• Parallelism:
  o Making progress on one thing in multiple places at the same time
    • Task A is split into 8 parts, each part runs on a separate core
  o Typically used in compute-bound contexts
    • Map/reduce, aggregation, "embarrassingly parallelizable" data, etc.

11. So for a given time frame T (1us, 1ms, 1s, etc.)...
• Concurrency: how many things did I do?
  o Things = units of work (e.g. servicing network clients)
  o Performance benchmark:
    • How fast was everyone served? (i.e. request latency)
    • And were they served fairly?
• Parallelism: how many things did I do them on?
  o Things = hardware units (e.g. CPU cores, GPU cores)
  o Performance benchmark:
    • How much did I get done?
    • How long did it take?

12. Concurrent Python
• I/O-driven client/server systems (socket-oriented)
• There are some pretty decent Python libraries out there geared toward concurrency
  o Twisted, Tornado, Tulip/asyncio (3.x), etc.
• Common themes:
  o Set all your sockets and file descriptors to non-blocking
  o Write your Python in an event-oriented fashion (see the asyncio sketch below)
    • def data_received(self, data): ...
    • Hollywood Principle: don't call us, we'll call you
  o Appearance of asynchronous I/O achieved via a single-threaded event loop with a multiplexing system call
• Biggest drawback:
  o Inherently limited to a single core
  o Thus, inadequate for problems that are both concurrent and compute-bound

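A minimal asyncio sketch of that event-oriented style (Python 3.4-era Tulip/asyncio); EchoProtocol and the port are illustrative names, not part of any of the libraries mentioned above:

    import asyncio

    class EchoProtocol(asyncio.Protocol):
        def connection_made(self, transport):
            self.transport = transport

        def data_received(self, data):   # "don't call us, we'll call you"
            self.transport.write(data)

    loop = asyncio.get_event_loop()
    server = loop.run_until_complete(
        loop.create_server(EchoProtocol, "127.0.0.1", 8000))
    loop.run_forever()                   # one thread, one event loop, one core
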
13. Coarse-grained versus fine-grained parallelism
• Coarse-grained (task parallelism)
  o Batch processing daily files
  o Data mining distinct segments/chunks/partitions
  o Process A runs on data set X, independent of process B running on data set Y
• Fine-grained (data parallelism)
  o Map/reduce, divide & conquer, aggregation, etc.
  o Common theme: sequential execution, fan out to parallel work against a shared data set, collapse back down to sequential

14. Coarse-grained versus fine-grained parallelism
• Coarse-grained (multiple processes):
  o Typically adequate: multiple processes that don't need to talk to each other (or if they do, don't need to talk often)
  o Depending on shared state, could still benefit from being implemented with threads instead of processes
    • Better cache usage, less duplication of identical memory structures, less overhead overall
• Fine-grained (multiple threads):
  o Typically optimal: multiple threads within the same address space
  o IPC overhead can severely impact net performance when you have to use processes instead of threads

15. Python landscape for fine-grained parallelism
• Python's GIL (global interpreter lock) prevents more than one Python interpreter thread from running at a given time
• If you want to use multiple threads within the same Python process, you have to come up with a way to avoid the GIL
  o (Fine-grained parallelism =~ multithreading)
• Today, this relies on:
  o Extension modules or libraries
  o Bypassing the CPython interpreter entirely and compiling to machine code

16. Python landscape for fine-grained parallelism
• Options today:
  o Extension modules or libraries:
    • Accelerate/NumbaPro (GPU, multicore)
    • OpenCV
    • Intel MKL libraries
  o Bypassing the CPython interpreter entirely by compiling Python to machine code:
    • Numba with threading
    • Cython with OpenMP
• Options tomorrow (Python 4.x):
  o PyParallel?
    • Demonstrates it is possible to have multiple CPython interpreter threads running in parallel without incurring a performance overhead
  o PyPy-STM?

17. Python Landscape for Coarse-grained Parallelism
• Rich ecosystem depending on your problem:
  o https://wiki.python.org/moin/ParallelProcessing
  o batchlib, Celery, Deap, disco, dispy, DistributedPython, exec_proxy, execnet, IPython Parallel, jug, mpi4py, PaPy, pyMPI, pypar, pypvm, Pyro, rthread, SCOOP, seppo, superspy
• Python stdlib options:
  o multiprocessing (since 2.6)
  o concurrent.futures (introduced in 3.2, backported to 2.7) (sketch below)
• Common throughout:
  o Separate Python processes to achieve parallel execution

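A minimal sketch of the stdlib route to coarse-grained parallelism via concurrent.futures, which fans the work out to separate Python processes (one worker per core by default); crunch() and the work items are illustrative stand-ins for real per-chunk processing:

    from concurrent.futures import ProcessPoolExecutor

    def crunch(n):
        # stand-in for real compute-bound work on one chunk/partition
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        chunks = [2000000] * 8              # illustrative work items
        with ProcessPoolExecutor() as pool:  # defaults to one worker per core
            results = list(pool.map(crunch, chunks))
        print(results)
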
18. Python & the GIL
• No talk on parallelism and concurrency in Python would be complete without mentioning the GIL (global interpreter lock)
• What is it?
  o A lock that ensures only one thread can execute CPython innards at any given time
  o Create 100 threading.Thread() instances...
  o ...and only one will run at any given time
• So why even support threads if they can't run in parallel?
• Because they can be useful for blocking, I/O-bound problems
  o Ironically, they facilitate concurrency in Python, not parallelism
• But they won't solve your compute-bound problem any faster (sketch below)
• Nor will you ever exploit more than one core

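A small sketch of that point: four compute-bound threads take roughly as long as one, because the GIL lets only one of them execute bytecode at a time. The spin() helper is purely illustrative.

    import threading
    import time

    def spin(n=10000000):
        while n:
            n -= 1

    start = time.time()
    threads = [threading.Thread(target=spin) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("4 compute-bound threads took", time.time() - start, "seconds")
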
19. import multiprocessing
• Added in Python 2.6 (2008)
• Similar interface to the threading module (sketch below)
• Uses separate Python processes behind the scenes

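The same illustrative spin() work as in the threading sketch above, but with multiprocessing.Process swapped in for threading.Thread: the four workers are now separate processes and can genuinely run on four cores.

    from multiprocessing import Process

    def spin(n=10000000):
        while n:
            n -= 1

    if __name__ == "__main__":   # guard required on Windows (spawn-based start)
        procs = [Process(target=spin) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
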
20. from multiprocessing import pros
• It works
• It's in the stdlib
• It's adequate for coarse-grained parallelism
• It'll use all my cores if I'm compute-bound

21. from multiprocessing import cons
• Often sub-optimal depending on the problem
• Inadequate for fine-grained parallelism
• Inadequate for I/O-driven problems (specifically socket servers)
• Overhead of extra processes
• No shared memory out of the box (I'd have to set it up myself)
• Kinda quirky on Windows
• The examples in the docs are trivialized and don't really map to real-world problems
  o https://docs.python.org/2/library/multiprocessing.html
  o i.e. x*x for x in [1, 2, 3, 4]

22. from multiprocessing import subtleties
• Recap: contemporary data center hardware: 128 cores, 512GB RAM
• I want to use multiprocessing to solve my compute-bound problem
• And I want to use my hardware optimally; idle cores are useless
• So how big should my multiprocessing pool be? How many processes?
• 128, right?

23. 128 cores = 128 processes?
• Works fine... until you need to do I/O
• And you're probably going to be doing blocking I/O
  o i.e. synchronous read/write calls
  o Non-blocking I/O is poorly suited to multiprocessing, as you'd need per-process event loops doing the syscall multiplexing dance
• The problem is, as soon as you block, that's one less process able to do useful work
• Can quickly become pathological (sketch below):
  o Start a pool of 64 processes (for 64 cores)
  o A few minutes later: only 20-25 active
• Is the solution to create a bigger pool?

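A hedged sketch of that pathology: the pool is sized to the core count, but each worker spends part of its time blocked in read(), during which its core sits idle. The work() function and file names are illustrative only.

    import multiprocessing

    def work(path):
        with open(path, "rb") as f:
            data = f.read()          # blocking I/O: no useful work happens here
        return sum(data)             # the compute-bound part

    if __name__ == "__main__":
        paths = ["chunk-%d.dat" % i for i in range(1024)]   # illustrative inputs
        pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
        totals = pool.map(work, paths)
        pool.close()
        pool.join()
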
24. 128 cores = 132 processes? 194? 256?
• Simply increasing the number of processes isn't the solution
• Results in pathological behavior at the opposite end of the spectrum
  o Instead of idle cores, you have over-scheduled cores
  o Significant overhead incurred by context switching
    • Cache pollution, TLB contention
  o You can see this with basic tools like top: 20% user, 80% sys
• Neither approach is optimal today:
  o processes <= ncpu: idle cores
  o processes > ncpu: over-scheduled cores

25. What do we really need?
• We want to solve our problems optimally on our powerful hardware
• Avoid the sub-optimal:
  o Blocking I/O
  o Idleness (under-scheduled)
  o Context switching (over-scheduled)
  o Wasteful memory use
• Encourage the optimal:
  o One active thread per core
  o Efficient memory use

26. I want one active thread per core
• This is a subtly complex problem
• Intrinsically dependent upon the I/O facilities provided by the OS:
  o Readiness-oriented or completion-oriented?
  o Thread-agnostic I/O or thread-specific I/O?
• Plus one critical element:
  o Disassociating the work (computation) from the worker (thread)
  o Associating a desired concurrency level (i.e. use all my cores) with the work
• This allows the kernel to make intelligent thread dispatching decisions
  o Ensures only one active thread per core
  o No over-scheduling or unnecessary context switches
  (Slide annotation: "Blocking I/O")

27. The Desired Solution
• Getting the most out of my hardware...
• ...from a proportional amount of development time

28. Getting the most out of my hardware...
• The target should always be 100% core use or 100% I/O saturation, whichever comes first
• Why?
• Because I want to finish the job as fast as the hardware will allow
• Or serve the most clients with the least amount of hardware
• With sensible amounts of development effort
  o Python has always been fantastic for this
  o But not so great for getting the most out of my hardware

29. I/O Completion Ports
• IOCPs can be thought of as FIFO queues
• The I/O manager pushes completion packets asynchronously
• Threads pop completions off and process the results:

  do { s = GQCS(i); process(s); } while (1);   /* one such loop per thread; four shown on the slide */

  GQCS = GetQueuedCompletionStatus()
  (Diagram: NIC -> IRP -> I/O Manager -> Completion Packet -> IOCP -> waiting threads)

30. IOCP and Concurrency
• Set the I/O completion port's concurrency to the number of CPUs/cores (2)
• Create double the number of threads (4)
• An active thread does something that blocks (e.g. file I/O)

  (Diagram: four threads, each running do { s = GQCS(i); process(s); } while (1);, against an IOCP with concurrency=2)

31. IOCP and Concurrency
• Set the I/O completion port's concurrency to the number of CPUs/cores (2)
• Create double the number of threads (4)
• An active thread does something that blocks (e.g. file I/O)
• Windows can detect that the active thread count (1) has dropped below the max concurrency (2) and that there are still outstanding packets in the completion queue

  (Diagram: four threads, each running do { s = GQCS(i); process(s); } while (1);, against an IOCP with concurrency=2)

32. IOCP and Concurrency
• Set the I/O completion port's concurrency to the number of CPUs/cores (2)
• Create double the number of threads (4)
• An active thread does something that blocks (e.g. file I/O)
• Windows can detect that the active thread count (1) has dropped below the max concurrency (2) and that there are still outstanding packets in the completion queue
• ...and schedules another thread to run

  (Diagram: four threads, each running do { s = GQCS(i); process(s); } while (1);, against an IOCP with concurrency=2)

33. Windows and PyParallel
• The Windows concurrency and synchronization primitives and approach to asynchronous I/O are very well suited to what I wanted to do with PyParallel
• Vista introduced new thread pool APIs
• Tightly integrated into the IOCP/overlapped ecosystem
• Greatly reduced the amount of scaffolding code I needed to write to prototype the concept:

  void PxSocketClient_Callback();
  CreateThreadpoolIo(.., &PxSocketClient_Callback)
  ..
  StartThreadpoolIo(..)
  AcceptEx(..)/WSASend(..)/WSARecv(..)

• That's it. When the async I/O op completes, your callback gets invoked
• Windows manages everything: optimal thread pool size, NUMA-cognizant dispatching
• Didn't need to create a single thread, no mutexes, none of the normal headaches that come with multithreading

34. Post-PyParallel
• I now have the Python glue to optimally exploit my hardware
• But it's still Python...
• ...and Python can be kinda slow
• Especially when doing computationally-intensive work
  o Especially especially when doing numerically-oriented computation
• Enter... Numba! (sketch below)

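A sketch of the Numba idea (assuming a reasonably recent Numba): decorate a numeric hot loop and it is JIT-compiled to machine code, so the "Python is slow at numerics" penalty largely disappears for that function. The function name is illustrative.

    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def sum_of_squares(a):
        total = 0.0
        for i in range(a.shape[0]):
            total += a[i] * a[i]
        return total

    print(sum_of_squares(np.arange(1000000.0)))
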
35. What do I want in Python 4.x?
• Native cross-platform PyParallel support
• @jit hooks introduced in the stdlib
• ...and an API for multiple downstream JIT compilers ("jitters") to hook into
  o CPython broadcasts the AST/bytecode being executed in ceval to the jitters
  o Multiple jitters running in separate threads
  o CPython: "Hey, can you optimize this chunk of Python? Let me know."
  o Next time it encounters that chunk, it can check for optimized versions
• Could provide a viable way of hooking in Numba, PyPy, Pythran, ShedSkin, etc., whilst still staying within the confines of CPython

36. I/O on Contemporary Windows Kernels (Vista+)
• Fantastic support for asynchronous I/O
• Threads have been first-class citizens since day 1 (not bolted on as an afterthought)
• Designed to be programmed in a completion-oriented, multi-threaded fashion
• Overlapped I/O + IOCP + threads + kernel synchronization primitives = an excellent combo for achieving high performance

37. I/O Completion Ports
• The best way to grok IOCP is to understand the problem it was designed to solve:
  o Facilitate writing high-performance network/file servers (http, database, file server)
  o Extract maximum performance from multi-processor/multi-core hardware
  o (Which necessitates optimal resource usage)

38. IOCP: Goals
• Extract maximum performance through parallelism
  o A thread running on every core servicing a client request
  o Upon finishing a client request, immediately process the next request if one is waiting
  o Never block
  o (And if you do block, handle it as optimally as possible)
• Optimal resource usage
  o One active thread per core

39. On not blocking...
• UNIX approach (Python sketch below):
  o Set the file descriptor to non-blocking
  o Try to read or write data
  o Get EAGAIN instead of blocking
  o Try again later
• Windows approach:
  o Create an overlapped I/O structure
  o Issue a read or write, passing the overlapped structure and completion port info
  o Call returns immediately
  o Read/write done asynchronously by the I/O manager
  o Optional completion packet queued to the completion port a) on error, b) on completion
  o A thread waiting on the completion port dequeues the completion packet and processes the request

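A sketch of the UNIX half of that comparison in Python: a non-blocking socket raises BlockingIOError (EAGAIN/EWOULDBLOCK) instead of blocking, and the caller is expected to try again later, usually after select/poll reports readiness. Host and port are illustrative.

    import socket

    sock = socket.create_connection(("example.com", 80))
    sock.setblocking(False)
    try:
        data = sock.recv(4096)       # nothing readable yet?
    except BlockingIOError:
        data = None                  # EAGAIN: try again later
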
40. On not blocking...
• UNIX approach (readiness-oriented, reactor pattern):
  o Is this ready to write yet?
  o No? How about now?
  o Still no?
  o Now?
  o Yes!? Really? OK, write it!
  o Hi! Me again. Anything to read?
  o No?
  o How about now?
• Windows approach (completion-oriented, proactor pattern):
  o Here, do this. Let me know when it's done.

41. On not blocking...
• Windows provides an asynchronous/overlapped way to do just about everything
• Basically, if it could block, there's a way to do it asynchronously on Windows
• WSASend() and WSARecv()
• AcceptEx() vs accept()
• ConnectEx() vs connect()
• DisconnectEx() vs close()
• GetAddrInfoEx() vs getaddrinfo() (Windows 8+)
• (And that's just for sockets; all device I/O can be done asynchronously)

42. Thread-agnostic I/O with IOCP
• The secret sauce behind asynchronous I/O on Windows
• IOCPs allow IRP completion (copying data from non-paged kernel memory back to the user's buffer) to be deferred to a thread-agnostic queue
• Any thread can wait on this queue (the completion port) via GetQueuedCompletionStatus()
• IRP completion is done just before that call returns
• Allows the I/O manager to rapidly queue IRP completions
• ...and waiting threads to instantly dequeue and process them

43. IOCP and Concurrency
• IOCPs can be thought of as FIFO queues
• The I/O manager pushes completion packets asynchronously
• Threads pop completions off and process the results:

  do { s = GQCS(i); process(s); } while (1);   /* one such loop per thread; four shown on the slide */

  GQCS = GetQueuedCompletionStatus()
  (Diagram: NIC -> IRP -> I/O Manager -> Completion Packet -> IOCP -> waiting threads)

44. IOCP and Concurrency
• Remember the IOCP design goals:
  o Maximize performance
  o Optimize resource usage
• Optimal number of active threads running per core: 1
• Optimal number of total threads running: 1 * ncpu
• Windows can't control how many threads you create and then have waiting against the completion port
• But it can control when, and how many, threads get awoken
  o ...via the IOCP's maximum concurrency value
  o (Specified when you create the IOCP)

45. IOCP and Concurrency
• Set the I/O completion port's concurrency to the number of CPUs/cores (2)
• Create double the number of threads (4)
• An active thread does something that blocks (e.g. file I/O)

  (Diagram: four threads, each running do { s = GQCS(i); process(s); } while (1);, against an IOCP with concurrency=2)

46. IOCP and Concurrency
• Set the I/O completion port's concurrency to the number of CPUs/cores (2)
• Create double the number of threads (4)
• An active thread does something that blocks (e.g. file I/O)
• Windows can detect that the active thread count (1) has dropped below the max concurrency (2) and that there are still outstanding packets in the completion queue

  (Diagram: four threads, each running do { s = GQCS(i); process(s); } while (1);, against an IOCP with concurrency=2)

47. IOCP and Concurrency
• Set the I/O completion port's concurrency to the number of CPUs/cores (2)
• Create double the number of threads (4)
• An active thread does something that blocks (e.g. file I/O)
• Windows can detect that the active thread count (1) has dropped below the max concurrency (2) and that there are still outstanding packets in the completion queue
• ...and schedules another thread to run

  (Diagram: four threads, each running do { s = GQCS(i); process(s); } while (1);, against an IOCP with concurrency=2)

48. So how does it work?
• First, how it doesn't work:
  o No GIL removal
    • This was previously tried and rejected
    • Required fine-grained locking throughout the interpreter
    • Mutexes are expensive
    • Single-threaded execution became significantly slower
  o Not using PyPy's approach via Software Transactional Memory (STM)
    • Huge overhead
    • 64 threads trying to write to something: 1 wins and continues
    • 63 keep trying
    • 63 bottles of beer on the wall...
  o Doesn't support "free threading"
    • Existing code using threading.Thread won't magically run on all cores
    • You need to use the new async APIs

49. PyParallel's Approach
• Don't touch the GIL
  o It's great, serves a very useful purpose
• Instead, intercept all thread-sensitive calls:
  o Reference counting (Py_(INCREF|DECREF|CLEAR))
  o Memory management (PyMem_(Malloc|Free), PyObject_(INIT|NEW))
  o Free lists
  o Static C globals
  o Interned strings
• If we're the main thread, do what we normally do
• However, if we're a parallel thread, do a thread-safe alternative

50. Main Thread or Parallel Thread?
• "If we're a parallel thread, do X; if not, do Y"
  o X = thread-safe alternative
  o Y = what we normally do
• "If we're a parallel thread"
  o Thread-sensitive calls are ubiquitous
  o But we want a negligible performance impact
  o So the challenge is how quickly we can detect that we're a parallel thread
  o The quicker we can detect it, the less overhead incurred

51. The Py_PXCTX macro
• "Are we running in a parallel context?"

  #define Py_PXCTX (Py_MainThreadId != _Py_get_current_thread_id())

• What's so special about _Py_get_current_thread_id()?
  o On Windows, you could use GetCurrentThreadId()
  o On POSIX, pthread_self()
• Unnecessary overhead (this macro will be everywhere)
• Is there a quicker way?
• Can we determine whether we're running in a parallel context without needing a function call?

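For readers more at home in Python than C, here is a pure-Python analogue of the question Py_PXCTX answers; this is just the concept, not PyParallel's implementation, which avoids even the function call used below. The helper name is illustrative.

    import threading

    _MAIN_THREAD_ID = threading.get_ident()   # captured at import time, on the main thread

    def in_parallel_context():
        # "Are we running on some thread other than the main thread?"
        return threading.get_ident() != _MAIN_THREAD_ID
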
52. Windows Solution: Interrogate the TEB

  #ifdef WITH_INTRINSICS
  #  ifdef MS_WINDOWS
  #    include <intrin.h>
  #    if defined(MS_WIN64)
  #      pragma intrinsic(__readgsdword)
  #      define _Py_get_current_process_id() (__readgsdword(0x40))
  #      define _Py_get_current_thread_id()  (__readgsdword(0x48))
  #    elif defined(MS_WIN32)
  #      pragma intrinsic(__readfsdword)
  #      define _Py_get_current_process_id() __readfsdword(0x20)
  #      define _Py_get_current_thread_id()  __readfsdword(0x24)

53. Py_PXCTX Example

  -#define _Py_ForgetReference(op) _Py_INC_TPFREES(op)
  +#define _Py_ForgetReference(op)                  \
  +        do {                                     \
  +                if (Py_PXCTX)                    \
  +                        _Px_ForgetReference(op); \
  +                else                             \
  +                        _Py_INC_TPFREES(op);     \
  +        } while (0)
  +
  +#endif /* WITH_PARALLEL */

• On x64, Py_PXCTX expands to (Py_MainThreadId != __readgsdword(0x48))
• Overhead reduced to a couple more instructions and an extra branch (the cost of which can be eliminated by branch prediction)
• That's basically free compared to STM or fine-grained locking

54. PyParallel Advantages
• Initial profiling results: 0.01% overhead incurred by Py_PXCTX for normal single-threaded code
  o GIL removal: 40% overhead
  o PyPy's STM: "200-500% slower"
• Only touches a relatively small amount of code
  o No need for intrusive surgery like rewriting a thread-safe bucket memory allocator or garbage collector
• Keeps GIL semantics
  o Important for legacy code
  o 3rd-party libraries, C extension code
• Code executing in a parallel context has full visibility of "main thread objects" (in a read-only capacity, thus no need for locks)

55. PyParallel In Action
• Things to note with the chargen demo coming up:
  o One python_d.exe process
  o Constant memory use
  o CPU use proportional to concurrent client count (1 client = 25% CPU use)
  o Every 10,000 sends, a status message is printed
• Depicts dynamically switching from synchronous sends to async sends
• Illustrates awareness of active I/O hogs
• Environment:
  o MacBook Pro, 8-core i7 2.2GHz, 8GB RAM
  o 1-5 netcat instances on OS X
  o Windows 7 instance running in Parallels, 4 cores, 3GB

56. Thanks!
Follow us on Twitter for more PyParallel announcements!
@ContinuumIO
@trentnelson
http://continuum.io/