Rudy Gilmore - Parallel processing - PyDSLA meetup - Nov 2014

Intro to Multiprocessing with Python
Rudy Gilmore
Data Scientist, TrueCar Analytics Team
PyData Meetup, 11/3/14

Code Parallelization
•  Modern processors are not becoming much faster, but are more numerous
•  Many problems in analytics are easily parallelizable
•  Writing parallel code will often allow you to get done in 1/nth the time on n cores
•  Amdahl’s Law: the part of a program that must run serially limits the speedup extra cores can deliver
•  Python has some barriers to parallelization, but there are simple workarounds

There are many options for high-performance parallel computing
•  Cluster Computing?
•  Hadoop?
•  Distributed Processing?
•  GPGPUs?

Let’s start simple: how to get multiple cores on one machine into the action
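The formula for Amdahl's Law did not survive in this transcript. As a rough sketch of the standard statement of the law (the function name and parameters below are illustrative, not taken from the slides), the theoretical speedup can be computed like this:

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Theoretical speedup when `parallel_fraction` of the run time is
    spread evenly over `n_workers` and the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # With 90% of the work parallelizable, the speedup stays below 10x
    # no matter how many workers are added.
    for workers in (1, 2, 4, 8, 1000):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

The takeaway is that "1/nth the time" is the best case: it only holds when essentially all of the work can be split up.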

“Embarrassingly Parallel” (Processes completely independent)
Examples:
1  independent for loop
2  .map ops on dataset
3  integration
4  Monte-Carlo methods
5  Some ML problems

“Inherently Serial” (Difficult or impossible to run in parallel)
Example: numerical PDE

“Somewhat Parallelizable” (Some communication needed)
Example: sorting

Parallel algorithms can be classified by the data transfer required between processes; this can be done via message passing or shared memory
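To make the "embarrassingly parallel" category concrete, here is a minimal Monte-Carlo sketch written for Python 3 (the slides use 2.7). The estimate-pi task, the function names, and the sample counts are all assumptions chosen for illustration; the point is only that each chunk of samples needs nothing from the others.

```python
import multiprocessing as mp
import random


def count_hits(args):
    """Count random points in the unit square that land inside the
    quarter circle. `args` is a (seed, n_samples) tuple."""
    seed, n_samples = args
    rng = random.Random(seed)  # private generator, no shared state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers=4, samples_per_worker=50000):
    """Split the sampling into independent chunks, one per worker."""
    tasks = [(seed, samples_per_worker) for seed in range(n_workers)]
    pool = mp.Pool(processes=n_workers)
    try:
        hits = pool.map(count_hits, tasks)
    finally:
        pool.close()
        pool.join()
    return 4.0 * sum(hits) / (n_workers * samples_per_worker)


if __name__ == "__main__":
    print(estimate_pi())
```

Each worker gets its own seed, so no communication is needed until the final sum.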

Python’s Global Interpreter Lock (GIL)
Only one thread may execute code in the Python interpreter at a time
•  Multiple threads automatically take turns, switching at a standard interval
•  The GIL is part of CPython; some other implementations, such as Jython and IronPython, do not have this limitation

Python’s thread and threading modules
•  Provide resources for splitting a program into multiple threads
•  However, for CPU-intensive tasks there will not be any speedup from multithreading alone
•  The GIL is still in effect
•  So what good is multithreading anyway?
•  CPU-bound vs. I/O-bound: threading is useful in the latter but not the former

[Figure: “What you want” vs. “What you’re gonna get”]
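The I/O-bound case can be sketched as follows (Python 3). The "download" is simulated with time.sleep(), which is an assumption standing in for a real blocking call; the function names are hypothetical.

```python
import threading
import time


def fake_download(name, delay, results):
    """Stand-in for a blocking I/O call. Sleeping releases the GIL,
    much as waiting on a socket or a database would."""
    time.sleep(delay)
    results[name] = "done"


def run_threaded(names, delay=0.3):
    """Start one thread per name; return (results, elapsed seconds)."""
    results = {}
    threads = [threading.Thread(target=fake_download, args=(n, delay, results))
               for n in names]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.time() - start


if __name__ == "__main__":
    results, elapsed = run_threaded(["a", "b", "c", "d"])
    print(sorted(results), round(elapsed, 2))
```

Because the threads spend their time waiting rather than computing, the four waits overlap and the total should be close to a single delay rather than four of them.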

multiprocessing module
•  part of the standard library as of Python 2.6
•  launches multiple processes
•  processes include separate interpreters, and therefore separate GILs
•  each process operates on a separate copy of memory from time of launch
•  similar syntax to threading
•  beware: processes have significant overhead on some operating systems, namely Windows

[Figure: two processes, each with its own GIL (“GIL 1”, “GIL 2”)]
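The "separate copy of memory" point is easy to trip over, so here is a small sketch (Python 3; the names `counter`, `bump`, and `main` are made up for this example). The child changes a module-level list, and the parent's copy is left untouched.

```python
import multiprocessing as mp

counter = [0]  # module-level state; each process gets its own copy


def bump(q):
    """Runs in the child: change the child's copy and report it."""
    counter[0] += 100
    q.put(counter[0])


def main():
    q = mp.Queue()
    p = mp.Process(target=bump, args=(q,))
    p.start()
    child_value = q.get()  # read before join so the child can exit
    p.join()
    # The parent's counter was never touched by the child.
    return child_value, counter[0]


if __name__ == "__main__":
    print(main())
```

Anything the child computes has to be sent back explicitly, here through a Queue.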

Some simple examples of threading and multiprocessing
Running CPython v2.7.6

First, let’s set up a CPU-bound task:

    def isprime(n):
        # trial division; note that, as written, 1 also passes this test
        for i in range(2, int(n**(0.5)) + 1):
            if n % i == 0:
                return False
        return True

    def prime(Nth, q=None):  # returns the Nth number that passes isprime()
        n_found = 0
        i = 0
        while n_found < Nth:
            i += 1
            n_found = n_found + int(isprime(i))
        if q is not None:
            q.put(i)  # send to Queue object if set
        return i

    import time
    import threading as th
    import multiprocessing as mp

    start = 20000

    if __name__ == '__main__':
        t1 = time.time()  # time serial segment
        print prime(start), prime(start+1), prime(start+2), prime(start+3)
        print 'Serial test took', time.time() - t1, 'seconds'

        q = mp.Queue()  # the queue must exist before the threads use it
        t2 = time.time()  # time multithreaded segment
        jobs = [th.Thread(target=prime, args=(start + k, q)) for k in range(4)]
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()
        print 'Multithreaded test took', time.time() - t2, 'seconds'

        q = mp.Queue()  # fresh queue for the process results
        t3 = time.time()  # time multiprocessing segment
        jobs = [mp.Process(target=prime, args=(start + k, q)) for k in range(4)]
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()
        print 'Multiprocessing test took', time.time() - t3, 'seconds'

Output:

    224729 224737 224743 224759
    Serial test took 3.68699979782 seconds
    Multithreaded test took 5.64900016785 seconds
    Multiprocessing test took 1.29299998283 seconds

multiprocessing.Pool() provides a map-like interface with automatic parallelization among a pool of workers

    # converting into a pool process
    t4 = time.time()
    pool = mp.Pool(processes=4)
    result = pool.map(prime, range(start, start+4))
    print result
    print 'Pool test took', time.time() - t4, 'seconds'

Output:

    Serial test took 3.68699979782 seconds
    Multithreaded test took 5.64900016785 seconds
    Multiprocessing test took 1.29299998283 seconds
    [224729, 224737, 224743, 224759]
    Pool test took 1.31299996376 seconds

Notes:
•  Tasks should be roughly equal size; adjust manually if possible
•  map() will block until the job is complete; map_async() returns immediately and lets you collect the result later
•  multiple args will need to be combined into a single list per task; unwrap with *
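The last two notes can be sketched together (Python 3). The two-argument `power` task and its `power_star` wrapper are hypothetical names used only to show the pattern; `map_async()` and the `get()` method on the object it returns are part of the multiprocessing API.

```python
import multiprocessing as mp


def power(base, exponent):
    """Hypothetical two-argument task."""
    return base ** exponent


def power_star(args):
    """Pool.map passes one object per task, so unpack it with *."""
    return power(*args)


def main():
    tasks = [(2, 3), (3, 2), (10, 0), (5, 1)]
    pool = mp.Pool(processes=2)
    try:
        async_result = pool.map_async(power_star, tasks)  # returns at once
        # ... the parent is free to do other work here ...
        values = async_result.get(timeout=30)  # block only when needed
    finally:
        pool.close()
        pool.join()
    return values


if __name__ == "__main__":
    print(main())
```

On Python 3.3 and later, Pool.starmap() performs the same unpacking without a wrapper function.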

Further reading:
•  multiprocessing supports inter-process communication using Queue() and Pipe()
•  support for sharing objects in memory using Value() and Array()
•  “premature optimization is the root of all evil”. Discuss.

In Conclusion:
•  Use threading if you have a potentially blocking I/O procedure, like a download or SQL query
•  Use multiprocessing.Process() and multiprocessing.Pool() to run CPU-intensive tasks in parallel

References:
http://sebastianraschka.com/Articles/2014_multiprocessing_intro.html#An-introduction-to-parallel-programming-using-Python%27s-multiprocessing-module
http://www.quantstart.com/articles/Parallelising-Python-with-Threading-and-Multiprocessing
http://www.dabeaz.com/python/GIL.pdf
http://calcul.math.cnrs.fr/Documents/Ecoles/2010/cours_multiprocessing.pdf
http://pymotw.com/2/multiprocessing/communication.html#process-pools
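As a pointer for the Pipe(), Value(), and Array() items listed under further reading, here is a minimal sketch (Python 3). The worker and its squares task are invented for illustration; only the multiprocessing calls themselves come from the library.

```python
import multiprocessing as mp


def worker(conn, total, squares):
    """Child: fill the shared array, store its sum in the shared value,
    then notify the parent over the pipe."""
    for i in range(len(squares)):
        squares[i] = i * i
    with total.get_lock():  # a Value carries its own lock
        total.value = sum(squares[:])
    conn.send("finished")
    conn.close()


def main():
    total = mp.Value("i", 0)    # shared C int
    squares = mp.Array("i", 5)  # shared array of 5 C ints, zero-filled
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(child_conn, total, squares))
    p.start()
    message = parent_conn.recv()  # wait for the child's message
    p.join()
    return message, total.value, squares[:]


if __name__ == "__main__":
    print(main())
```

Unlike ordinary variables, the Value and Array objects live in shared memory, so the parent sees what the child wrote.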
