much faster, but are more numerous
• Many problems in analytics are easily parallelizable
• Writing parallel code will often let you finish in close to 1/n of the time on n cores
• Amdahl’s Law: the achievable speedup is capped by the part of the program that must still run serially
• Python has some barriers to parallelization, but there are simple workarounds

There are many options for high-performance parallel computing:
Ø Cluster computing?
Ø Hadoop?
Ø Distributed processing?
Ø GPGPUs?

Let’s start simple: how to get multiple cores on one machine into the action.
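Amdahl’s Law is usually written as S(n) = 1 / ((1 - p) + p/n), where p is the fraction of the run time that can be parallelized and n is the number of cores (p and n are notation chosen here, not taken from the slides). A minimal Python sketch that evaluates it:

```python
def amdahl_speedup(p, n):
    """Ideal speedup on n cores when a fraction p of the run time is parallelizable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1.0 - p      # share of the work that still runs on a single core
    parallel = p / n      # share of the work split evenly across n cores
    return 1.0 / (serial + parallel)


if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 1000):
        print(cores, round(amdahl_speedup(0.95, cores), 2))
```

For p = 0.95, four cores give at most about 3.5x, and no number of cores can exceed 1/(1 - p) = 20x, which is why the ideal 1/n figure is rarely reached.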
Easily parallelizable (little or no communication needed), e.g.:
1. independent for loops
2. .map operations on a dataset
3. numerical integration
4. Monte Carlo methods
5. some ML problems

“Inherently Serial” (difficult or impossible to run in parallel)
Example: numerical PDE

“Somewhat Parallelizable” (some communication needed)
Example: sorting

Parallel algorithms can be classified by how much data must be transferred between processes; that transfer can happen via message passing or shared memory.
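Monte Carlo methods are a typical example: each worker can sample independently, and only the final counts need to be combined. A minimal sketch estimating π with multiprocessing.Pool; the split into 4 tasks of 200,000 samples and the per-task seeds are arbitrary choices for illustration:

```python
import multiprocessing as mp
import random


def count_hits(args):
    """Count random points in the unit square that land inside the quarter circle."""
    seed, n_samples = args
    rng = random.Random(seed)  # independent generator per task
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks=4, samples_per_task=200000):
    # Tasks never talk to each other; results are only combined at the end.
    tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
    pool = mp.Pool(processes=n_tasks)
    try:
        hits = pool.map(count_hits, tasks)
    finally:
        pool.close()
        pool.join()
    return 4.0 * sum(hits) / (n_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi())
```

The only "communication" is returning one integer per task, which is what makes this kind of problem easy to parallelize.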
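access code in the Python interpreter at a time
• Multiple threads automatically take turns, switching at a standard interval
• The GIL is a feature of CPython (the standard interpreter); some other implementations, such as Jython and IronPython, do not have this limitation (PyPy, however, also has a GIL)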
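In CPython 3.2 and later, the "standard interval" is a time slice that can be read with sys.getswitchinterval() (5 ms by default); Python 2.7, used elsewhere in these slides, instead counted bytecode instructions via sys.getcheckinterval(). A minimal Python 3 sketch:

```python
import sys

# CPython 3.2+: the thread holding the GIL is asked to release it after this many seconds.
interval = sys.getswitchinterval()
print("switch interval:", interval)  # about 0.005 s (5 ms) by default

# The interval is adjustable; a longer one means fewer forced switches between threads.
sys.setswitchinterval(0.01)
print("new interval:", sys.getswitchinterval())
sys.setswitchinterval(interval)  # restore the original setting
```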
for splitting a program into multiple threads
• However, for CPU-intensive tasks...
  ...there will not be any speedup from multithreading alone
• GIL still in effect
• So what good is multithreading anyway?
• CPU-bound vs. I/O-bound: threading is useful for the latter but not the former, because a thread waiting on I/O releases the GIL so other threads can run
[Figure: “What you want” vs. “What you’re gonna get”]
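To see why threading still helps I/O-bound work, a minimal sketch in which time.sleep() stands in for a blocking download or query (an assumption for illustration; a thread blocked in sleep() releases the GIL, as it does while waiting on a socket). The function names are hypothetical:

```python
import threading
import time


def fake_download(delay, results, index):
    """Stand-in for blocking I/O: the thread waits without holding the GIL."""
    time.sleep(delay)
    results[index] = delay


def run_threaded(n_tasks=4, delay=0.5):
    results = [None] * n_tasks
    threads = [threading.Thread(target=fake_download, args=(delay, results, i))
               for i in range(n_tasks)]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every "download" to finish
    return time.time() - t0, results


if __name__ == "__main__":
    elapsed, _ = run_threaded()
    print("4 x 0.5 s of waiting took %.2f s" % elapsed)  # roughly 0.5 s rather than 2 s
```

The waits overlap, so the total is close to the longest single wait; replacing sleep() with a CPU-bound loop would remove that benefit.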
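of Python 2.6
• launches multiple processes
• processes include separate interpreters, and therefore separate GILs
• each process operates on a separate copy of memory from the time of launch
• similar syntax to threading
• beware: starting processes has significant overhead on some operating systems, notably Windows, which has no fork() and must start each new process from scratch
[Figure: separate processes, each with its own GIL (GIL 1, GIL 2)]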
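A minimal sketch of the "separate copy of memory" point: a child process changes a module-level variable, but the parent’s copy is untouched. The Queue only carries the child’s value back to the parent; the names counter, increment and run are illustrative:

```python
import multiprocessing as mp

counter = 0  # module-level state; each child process gets its own copy


def increment(q):
    global counter
    counter += 100   # changes the child's copy only
    q.put(counter)   # report the child's value back to the parent


def run():
    q = mp.Queue()
    p = mp.Process(target=increment, args=(q,))
    p.start()
    child_value = q.get()  # read before join() so the child can flush its queue
    p.join()
    return child_value, counter


if __name__ == "__main__":
    child_value, parent_value = run()
    print("child saw", child_value, "- parent still has", parent_value)  # 100 and 0
```

The `if __name__ == "__main__":` guard matters on Windows, where each child re-imports the script.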
Python v2.7.6
First, let’s set up a CPU-bound task:

    def isprime(n):
        if n < 2:  # 0 and 1 are not prime
            return False
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                return False
        return True

    def prime(Nth, q=None):  # returns the Nth prime
        n_found = 0
        i = 0
        while n_found < Nth:
            i += 1
            n_found = n_found + int(isprime(i))
        if q is not None:
            q.put(i)  # send to Queue object if set
        return i
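The optional q argument lets prime() report its answer from another process. A minimal sketch of one way to use it, starting one multiprocessing.Process per N and collecting the answers from a shared Queue (primes_with_processes is a hypothetical helper; isprime and prime are repeated so the block runs on its own):

```python
import multiprocessing as mp


def isprime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True


def prime(Nth, q=None):
    n_found = 0
    i = 0
    while n_found < Nth:
        i += 1
        n_found += int(isprime(i))
    if q is not None:
        q.put(i)
    return i


def primes_with_processes(ns):
    """Compute prime(N) for each N in ns, one Process per N, results via a Queue."""
    q = mp.Queue()
    procs = [mp.Process(target=prime, args=(n, q)) for n in ns]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]  # arrives in completion order, not input order
    for p in procs:
        p.join()
    return sorted(results)


if __name__ == "__main__":
    print(primes_with_processes([10, 11, 12, 13]))  # [29, 31, 37, 41]
```

Sorting works here only because larger N always gives a larger prime; for general tasks, send (N, answer) pairs through the queue instead.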
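pool of workers

    # converting into a pool process
    import multiprocessing as mp
    import time

    if __name__ == '__main__':  # needed on Windows, where children re-import this script
        t4 = time.time()
        pool = mp.Pool(processes=4)
        # prime() is defined above; start (the first N) is defined earlier (not shown)
        result = pool.map(prime, range(start, start + 4))
        pool.close()
        pool.join()
        print(result)
        print('Pool test took %s seconds' % (time.time() - t4))

Output:

    Serial test took 3.68699979782 seconds
    Multithreaded test took 5.64900016785 seconds
    Multiprocessing test took 1.29299998283 seconds
    [224729, 224737, 224743, 224759]
    Pool test took 1.31299996376 seconds

Notes:
• Tasks should be roughly equal in size; adjust manually if possible
• map() blocks until the whole job is complete; map_async() returns an AsyncResult immediately, and its get() method waits for the results
• map() passes a single argument to the function, so multiple arguments must be packed into one tuple or list per task and unpacked with * inside a wrapper function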
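A minimal sketch of the last two notes: map_async() hands back an AsyncResult at once and its get() waits for the values, while a small wrapper unpacks each packed tuple with *. On Python 3.3+, Pool.starmap() does that unpacking itself. The power functions are purely illustrative:

```python
import multiprocessing as mp


def power(base, exponent):
    """A two-argument function; Pool.map() can only pass it one argument."""
    return base ** exponent


def power_packed(args):
    """Wrapper for Pool.map(): unpack one (base, exponent) tuple with *."""
    return power(*args)


def run():
    jobs = [(2, 3), (3, 2), (10, 4)]
    pool = mp.Pool(processes=2)
    try:
        async_result = pool.map_async(power_packed, jobs)  # returns immediately
        # ... the main process is free to do other work here ...
        packed = async_result.get(timeout=30)               # blocks until done
        starred = pool.starmap(power, jobs)                  # Python 3.3+ only
    finally:
        pool.close()
        pool.join()
    return packed, starred


if __name__ == "__main__":
    print(run())  # ([8, 9, 10000], [8, 9, 10000])
```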
Queue() and Pipe()
• support for sharing objects in memory using Value() and Array()
• “Premature optimization is the root of all evil” (Donald Knuth). Discuss.

In Conclusion:
• Use threading if you have a potentially blocking I/O procedure, like a download or SQL query
• Use multiprocessing.Process() and multiprocessing.Pool() to run CPU-intensive tasks in parallel
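A minimal sketch of the two mechanisms listed above: a Pipe() carries messages between processes, while a Value() and an Array() live in shared memory that both processes can modify. The type code "i" (C int) and the worker function are illustrative choices:

```python
import multiprocessing as mp


def worker(conn, total, squares):
    # Message passing: receive a list of numbers through one end of the Pipe.
    numbers = conn.recv()
    # Shared memory: write into a Value and an Array that the parent also sees.
    with total.get_lock():
        total.value += sum(numbers)
    for i, x in enumerate(numbers):
        squares[i] = x * x
    conn.send("done")
    conn.close()


def run():
    parent_conn, child_conn = mp.Pipe()
    total = mp.Value("i", 0)    # shared C int, starts at 0
    squares = mp.Array("i", 4)  # shared C int array of length 4, zero-filled
    p = mp.Process(target=worker, args=(child_conn, total, squares))
    p.start()
    parent_conn.send([1, 2, 3, 4])
    reply = parent_conn.recv()
    p.join()
    return reply, total.value, squares[:]


if __name__ == "__main__":
    print(run())  # ('done', 10, [1, 4, 9, 16])
```

Message passing copies data between processes; shared Value/Array objects avoid the copy but need a lock when several processes update them.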