Slide 1

Slide 1 text

Kiwi PyCon 2013
Auckland, 6/7/8 September
Concurrent Programming using multiprocessing
Medhat Gayed
Python Developer at Yellow NZ

Slide 2

Slide 2 text

multiprocessing
● multiprocessing is a package that supports spawning processes using an API similar to the threading module.
● It side-steps the Global Interpreter Lock (GIL) by using subprocesses instead of threads.

Slide 3

Slide 3 text

Global Interpreter Lock (GIL)
● A global lock held by the interpreter so that code that is not thread-safe is never run by more than one thread at a time.
● There is one GIL for each interpreter process.
● Only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions, which makes the object model safe against concurrent access.
● Applications can use separate processes to achieve full parallelism, as each process has its own interpreter and therefore its own GIL.

Slide 4

Slide 4 text

multiprocessing
● Allows the programmer to fully leverage multiple processors on a given machine.
● Runs on both Unix and Windows.
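As a rough illustration of spreading work across processors, the sketch below uses multiprocessing.Pool, which is part of the package but is not covered in these slides. The workload function sum_of_squares and the wrapper parallel_sums are hypothetical names chosen for this example, and the code assumes Python 3 syntax.

```python
import multiprocessing


def sum_of_squares(n):
    """Hypothetical CPU-bound toy workload: sum of i*i for all i below n."""
    return sum(i * i for i in range(n))


def parallel_sums(inputs, workers=None):
    """Apply sum_of_squares to each input using a pool of worker processes."""
    # workers=None lets Pool pick a default based on the CPUs it detects.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    # The guard is needed on platforms that start children by re-importing
    # the main module (for example Windows).
    print("CPUs detected:", multiprocessing.cpu_count())
    print(parallel_sums([10, 100, 1000]))
```

pool.map returns results in the same order as the inputs, regardless of which worker finished first.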

Slide 5

Slide 5 text

The Process class
● Processes are spawned by creating a Process object and then calling its start() method.
● Follows the API of threading.Thread.
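A minimal sketch of the pattern described above, assuming Python 3. The functions greet and run_child are hypothetical names used only for illustration.

```python
import os
from multiprocessing import Process


def greet(name):
    """Runs in the child process and reports the child's own PID."""
    print("hello", name, "from process", os.getpid())


def run_child(name):
    """Spawn one child, wait for it to finish, and return its exit code."""
    child = Process(target=greet, args=(name,))
    child.start()   # begins running greet(name) in a new process
    child.join()    # blocks until the child has exited
    return child.exitcode


if __name__ == "__main__":
    print("parent process is", os.getpid())
    print("child exit code:", run_child("Kiwi PyCon"))
```

The target/args constructor arguments and the start()/join() calls mirror threading.Thread, so switching between the two usually means changing only the class name.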

Slide 6

Slide 6 text

Exchanging objects using Queues
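The following sketch shows one way a child process might pass objects to its parent through a multiprocessing.Queue. It assumes Python 3; producer and collect are hypothetical names, and using None as an end-of-stream marker is a convention of this example, not something the library requires.

```python
from multiprocessing import Process, Queue


def producer(queue, items):
    """Runs in the child: put each item on the queue, then a None sentinel."""
    for item in items:
        queue.put(item)
    queue.put(None)  # sentinel telling the consumer there is nothing more


def collect(items):
    """Start a producer child and gather everything it sends back."""
    queue = Queue()
    child = Process(target=producer, args=(queue, items))
    child.start()
    received = []
    while True:
        item = queue.get()      # blocks until the child has sent something
        if item is None:
            break
        received.append(item)
    child.join()                # join only after the queue has been drained
    return received


if __name__ == "__main__":
    print(collect([42, "hello", [1, 2, 3]]))
```

Objects placed on the queue are pickled, so anything sent this way has to be picklable, and the items themselves must not be None in this particular sketch.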

Slide 7

Slide 7 text

Exchanging objects using Pipes
● Pipe() returns a pair of connection objects connected by a pipe.
● Duplex by default.
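A small sketch of a two-way exchange over a pipe, assuming Python 3. The names echo_upper and round_trip are hypothetical; the point is that each end of a duplex pipe can both send and receive.

```python
from multiprocessing import Pipe, Process


def echo_upper(conn):
    """Runs in the child: receive one string and send it back upper-cased."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


def round_trip(text):
    """Send text to a child over a duplex pipe and return the child's reply."""
    parent_conn, child_conn = Pipe()  # duplex: both ends can send and recv
    child = Process(target=echo_upper, args=(child_conn,))
    child.start()
    parent_conn.send(text)
    reply = parent_conn.recv()
    child.join()
    return reply


if __name__ == "__main__":
    print(round_trip("ping"))
```

Passing duplex=False to Pipe() gives a one-way pipe instead, where the first connection can only receive and the second can only send.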

Slide 8

Slide 8 text

Sharing state
● When doing concurrent programming it is best to avoid using shared state.
● This is particularly true when using multiple processes.
● However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.

Slide 9

Slide 9 text

Sharing state using Shared memory
● Data can be stored in a shared memory map using Value or Array.
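A minimal sketch of Value and Array in use, assuming Python 3. The functions update and run_shared_memory are hypothetical names; the type codes "d" (double) and "i" (signed int) are the same ones used by the array module.

```python
from multiprocessing import Array, Process, Value


def update(number, values):
    """Runs in the child: modify the shared objects in place."""
    number.value = 3.14
    for i in range(len(values)):
        values[i] = -values[i]


def run_shared_memory():
    """Let a child modify shared memory and return what the parent then sees."""
    number = Value("d", 0.0)        # one shared double
    values = Array("i", range(5))   # shared array of five ints: 0..4
    child = Process(target=update, args=(number, values))
    child.start()
    child.join()
    return number.value, values[:]


if __name__ == "__main__":
    print(run_shared_memory())
```

The parent observes the child's changes without any explicit message passing, because both processes are reading and writing the same block of memory.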

Slide 10

Slide 10 text

Sharing state using Server process
● A manager object returned by Manager() controls a server process that holds Python objects and allows other processes to manipulate them.
● Managers are more flexible than shared memory objects because they can support arbitrary object types.
● However, they are slower than shared memory.

Slide 11

Slide 11 text

Sharing state using Server process
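A sketch of a manager holding a dict and a list that a child process modifies, assuming Python 3 (where Manager() can be used as a context manager). The functions fill and run_manager are hypothetical names for this example.

```python
from multiprocessing import Manager, Process


def fill(shared_dict, shared_list):
    """Runs in the child: each call here is forwarded to the server process."""
    shared_dict["answer"] = 42
    shared_dict["name"] = "kiwi"
    shared_list.reverse()


def run_manager():
    """Let a child modify manager-held objects and return plain copies."""
    with Manager() as manager:
        shared_dict = manager.dict()
        shared_list = manager.list(range(5))
        child = Process(target=fill, args=(shared_dict, shared_list))
        child.start()
        child.join()
        # Copy into ordinary objects before the manager shuts down.
        return shared_dict.copy(), list(shared_list)


if __name__ == "__main__":
    print(run_manager())
```

The objects handed to the child are proxies: every operation on them is a round trip to the server process, which is where the extra cost relative to shared memory comes from.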

Slide 12

Slide 12 text

Synchronization between processes
● multiprocessing contains equivalents of all the synchronization primitives from threading.
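As one example of these primitives, the sketch below uses multiprocessing.Lock to protect a shared counter. It assumes Python 3; add_many and count_with_lock are hypothetical names, and the counter is created with lock=False only so that the explicit Lock is doing all the protection.

```python
from multiprocessing import Lock, Process, Value


def add_many(counter, lock, times):
    """Runs in each child: increment the shared counter under the lock."""
    for _ in range(times):
        with lock:                 # only one process at a time gets past here
            counter.value += 1     # read-modify-write is not atomic by itself


def count_with_lock(workers=4, times=1000):
    """Have several processes increment one counter and return the total."""
    counter = Value("i", 0, lock=False)  # raw shared int, guarded by our Lock
    lock = Lock()
    procs = [
        Process(target=add_many, args=(counter, lock, times))
        for _ in range(workers)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_lock())
```

Without the lock, increments from different processes could interleave and some updates would be lost, so the final count could come out lower than workers * times.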

Slide 13

Slide 13 text

multiprocessing                            | threading
Runs multiple processes                    | Runs in a single process
Each process has own interpreter instance  | All threads run in same interpreter instance
Each process has own GIL                   | Threads share same GIL
Suited for CPU-bound processes             | Suited for I/O-bound processes
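The first row of the comparison can be observed directly by asking a thread and a child process for their process IDs. This is only an illustrative sketch assuming Python 3; report_pid and compare_pids are hypothetical names.

```python
import os
import threading
from multiprocessing import Process, Queue


def report_pid(results):
    """Put the PID of whichever process this function runs in on the queue."""
    results.put(os.getpid())


def compare_pids():
    """Return (parent PID, PID seen by a thread, PID seen by a child process)."""
    results = Queue()

    thread = threading.Thread(target=report_pid, args=(results,))
    thread.start()
    thread_pid = results.get()
    thread.join()

    child = Process(target=report_pid, args=(results,))
    child.start()
    process_pid = results.get()
    child.join()

    return os.getpid(), thread_pid, process_pid


if __name__ == "__main__":
    parent, from_thread, from_process = compare_pids()
    print("parent:", parent)
    print("thread ran in:", from_thread)    # same PID as the parent
    print("process ran in:", from_process)  # a different PID
```

The thread reports the parent's PID because it lives inside the same interpreter, while the child process reports its own PID because it is a separate interpreter instance.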

Slide 14

Slide 14 text

Conclusion
● multiprocessing is quick to set up and has a simple API.
● multiprocessing is useful for quick scripts when you don't want the overhead of setting up a queue server.
● multiprocessing is a good choice if you have multiple CPUs on your machine.
● multiprocessing is better suited than threading to CPU-bound work.
● multiprocessing achieves full parallelism, which is not possible with threads because of the GIL.

Slide 15

Slide 15 text

References
● http://docs.python.org/2/library/multiprocessing.html#module-multiprocessing
● http://docs.python.org/2/library/threading.html#module-threading
● http://docs.python.org/2/library/array.html#module-array
● http://en.wikipedia.org/wiki/Global_Interpreter_Lock
● http://docs.python.org/2/glossary.html#term-global-interpreter-lock
● http://docs.python.org/2/c-api/init.html#thread-state-and-the-global-interpreter-lock