Medhat Gayed: Concurrent Programming using multiprocessing

@ Kiwi PyCon 2013 - Saturday, 07 Sep 2013 - Track 2
http://nz.pycon.org/

**Audience level**

Intermediate

**Description**

How to use Python's built-in multiprocessing module to speed up the processing of large amounts of data.

**Abstract**

A brief tour of Python's built-in multiprocessing module and how it can be used to solve real-life programming problems, followed by a demo of how to speed up the processing of large amounts of data using multiprocessing.

**YouTube**

http://www.youtube.com/watch?v=CObboy8XzaM

New Zealand Python User Group

September 07, 2013

Transcript

  1. multiprocessing
    • multiprocessing is a package that supports spawning processes using an API similar to the threading module.
    • Side-stepping the Global Interpreter Lock (GIL) by using subprocesses instead of threads.
  2. Global Interpreter Lock (GIL)
    • A global lock held by the interpreter to avoid sharing code that is not thread-safe with other threads.
    • One GIL for each interpreter process.
    • Only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions, making the object model safe against concurrent access.
    • Applications can use separate processes to achieve full parallelism, as each process has its own interpreter and in turn has its own GIL.
  3. multiprocessing
    • Allows the programmer to fully leverage multiple processors on a given machine.
    • Runs on both Unix and Windows.
  4. The Process class
    • Processes are spawned by creating a Process object and then calling its start() method.
    • Follows the API of threading.Thread
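
The Process slide could be illustrated roughly as follows. This is a minimal sketch, not code from the talk: the names `square` and `run_workers` are hypothetical, and the use of a `multiprocessing.Queue` to carry results back to the parent is an assumption added so the example has something to return.

```python
from multiprocessing import Process, Queue


def square(n, results):
    """Runs in the child process: put the pair (n, n*n) on the queue."""
    results.put((n, n * n))


def run_workers(numbers):
    """Start one Process per number and collect the results in the parent."""
    results = Queue()  # assumption: a Queue is used to return values
    workers = [Process(target=square, args=(n, results)) for n in numbers]
    for worker in workers:
        worker.start()  # same call as threading.Thread.start()
    # Read one result per worker before joining, so no child is left
    # blocked while flushing its data to the queue.
    collected = dict(results.get() for _ in workers)
    for worker in workers:
        worker.join()  # same call as threading.Thread.join()
    return collected


if __name__ == "__main__":
    print(run_workers([1, 2, 3]))
```

Swapping `Process` for `threading.Thread` (and the queue for `queue.Queue`) should leave the structure of `run_workers` unchanged, which is the point of the shared API.
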
  5. Exchanging objects using Pipes
    • Pipe() returns a pair of connection objects connected by a pipe.
    • Duplex by default.
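
A small sketch of the duplex behaviour described above, assuming a hypothetical child function `echo_upper` that both receives and sends on its end of the pipe. The function names and the message are illustrative only.

```python
from multiprocessing import Pipe, Process


def echo_upper(conn):
    """Child: receive one message and send back an upper-cased reply."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


def round_trip(message):
    """Parent: send a message to the child and wait for its reply."""
    parent_conn, child_conn = Pipe()  # duplex: both ends can send and recv
    child = Process(target=echo_upper, args=(child_conn,))
    child.start()
    parent_conn.send(message)
    reply = parent_conn.recv()
    child.join()
    return reply


if __name__ == "__main__":
    print(round_trip("hello"))
```

Objects sent through a connection are pickled, so anything passed this way needs to be picklable.
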
  6. Sharing state
    • When doing concurrent programming it is best to avoid using shared state.
    • This is particularly true when using multiple processes.
    • However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.
  7. Sharing state using Shared memory
    • Data can be stored in a shared memory map using Value or Array
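
One way Value and Array might be used together is sketched below. The worker function, the interleaved split of indices between workers, and the `"i"` (C int) type code are choices made for this example, not details from the talk.

```python
from multiprocessing import Array, Process, Value


def fill_squares(total, squares, indices):
    """Child: write i*i into the shared array and add i to the shared total."""
    for i in indices:
        squares[i] = i * i
        # "+=" is a read-modify-write, so hold the lock around it.
        with total.get_lock():
            total.value += i


def shared_memory_demo(n=8, n_workers=2):
    total = Value("i", 0)    # one shared C int, initialised to 0
    squares = Array("i", n)  # n shared C ints, zero-initialised
    workers = [
        Process(target=fill_squares,
                args=(total, squares, range(k, n, n_workers)))
        for k in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return total.value, squares[:]


if __name__ == "__main__":
    print(shared_memory_demo())
```

Each worker writes to distinct array slots, so only the shared counter needs explicit locking here.
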
  8. Sharing state using Server process
    • A manager object returned by Manager() controls a server process and holds Python objects and allows other processes to manipulate them.
    • More flexible than shared memory objects because they can support arbitrary object types.
    • However, slower than shared memory.
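
The server-process approach could look something like this sketch, which assumes a managed dict as the shared object. The names `record_length` and `word_lengths` are hypothetical.

```python
from multiprocessing import Manager, Process


def record_length(word, lengths):
    """Child: store a result in the dict held by the manager's server process."""
    lengths[word] = len(word)


def word_lengths(words):
    with Manager() as manager:
        lengths = manager.dict()  # proxy to a dict living in the server process
        workers = [Process(target=record_length, args=(word, lengths))
                   for word in words]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Copy into a plain dict before the manager shuts down.
        return lengths.copy()


if __name__ == "__main__":
    print(word_lengths(["kiwi", "pycon"]))
```

Every access to `lengths` is a round trip to the server process, which is where the extra cost relative to Value and Array comes from.
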
  9. multiprocessing vs. threading

    | multiprocessing | threading |
    | --- | --- |
    | Runs multiple processes | Runs in a single process |
    | Each process has own interpreter instance | All threads run in same interpreter instance |
    | Each process has own GIL | Threads share same GIL |
    | Suited for CPU bound processes | Suited for I/O bound processes |
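
The first row of the comparison can be checked directly by asking a thread and a child process for their process IDs. This is an illustrative sketch only; `report_pid` and `compare_pids` are hypothetical names.

```python
import os
import threading
from multiprocessing import Process, Queue


def report_pid(out):
    """Report the ID of the process this function is running in."""
    out.put(os.getpid())


def compare_pids():
    """Return (parent pid, pid seen by a thread, pid seen by a child process)."""
    out = Queue()

    thread = threading.Thread(target=report_pid, args=(out,))
    thread.start()
    thread.join()
    thread_pid = out.get()

    child = Process(target=report_pid, args=(out,))
    child.start()
    process_pid = out.get()
    child.join()

    return os.getpid(), thread_pid, process_pid


if __name__ == "__main__":
    parent, thread_pid, process_pid = compare_pids()
    print("thread shares the parent's process:", thread_pid == parent)
    print("child is a separate process:", process_pid != parent)
```
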
  10. Conclusion
    • multiprocessing is quick to set up and has a simple API.
    • multiprocessing is useful for quick scripts when you don't want the overhead of setting up a queue server.
    • multiprocessing is a good choice if you have multiple CPUs on your machine.
    • multiprocessing is more suited than threading for CPU bound processes.
    • multiprocessing achieves full parallelism which is not possible with threads because of the GIL.
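
As a rough illustration of the CPU-bound case, the sketch below spreads a prime-counting job over worker processes. It is not the demo from the talk: the use of `multiprocessing.Pool`, the trial-division `is_prime`, and the `chunksize` and `processes` values are all assumptions chosen for the example.

```python
from multiprocessing import Pool


def is_prime(n):
    """CPU-bound check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(limit, processes=2):
    """Count primes below `limit`, splitting the work across processes."""
    with Pool(processes=processes) as pool:
        # chunksize batches the inputs so that inter-process overhead
        # does not dominate the per-item work.
        flags = pool.map(is_prime, range(limit), chunksize=1000)
    return sum(flags)


if __name__ == "__main__":
    print(count_primes(10000))
```

Whether this beats a plain loop depends on how expensive each item is; for very cheap per-item work the cost of pickling and process start-up can outweigh the gain.
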