of data a day!
• How do I process this huge volume of data?
• Can I process it faster (in less time)?
• I don’t want to compromise by using a smaller dataset!
Attribution-NonCommercial CC BY-NC
patterns, seismographic data, astronomical analysis, etc., deal with huge datasets. A telescope is nothing but a huge camera generating billions of bytes of data a day!
• Processing this raw data for analysis is a highly CPU-intensive task
• Typically, applications are not “well” designed
chunks
• Understands the trade-offs of using parallelism and concurrency
• Aware of the nuances of the programming language
• Designing and developing applications that use multiple CPU cores efficiently is THE way to achieve high performance!
address space, memory; hence are lightweight
• Context switching is less expensive
• Improved performance and concurrency
• Python allows writing multi-threaded applications with the threading module.
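As a minimal sketch of the threading module mentioned above, the following starts two threads that write to one shared list, which works because threads share the same address space. The names `worker` and `run_threads`, the thread labels, and the item counts are made up for illustration.

```python
import threading


def worker(name, count, results, lock):
    """Append `count` labelled items to the list shared by all threads."""
    for i in range(count):
        with lock:  # guard the shared list while it is updated
            results.append((name, i))


def run_threads():
    """Start two threads, wait for both, and return the shared list."""
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=("t1", 3, results, lock)),
        threading.Thread(target=worker, args=("t2", 3, results, lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until the thread has finished
    return results


if __name__ == "__main__":
    print(sorted(run_threads()))
```

The order in which the two threads append is not guaranteed, which is why the usage line sorts the result before printing.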
• Each ‘running’ thread requires exclusive access to the data structures in the Python interpreter
• The global interpreter lock (GIL) provides the synchronization (bookkeeping)
• The GIL is necessary mainly because CPython's memory management is not thread-safe.
[Figure: timeline of thread1 and thread2 handing over the GIL. Labels: Running, Suspended, Signals, Wakeup, GIL released, waiting for GIL, GIL acquired, check1, check2. Axis: time.]
Communication (Signaling)
 – Thread wake-up
 – GIL acquisition
• Result
 – Significant overhead
 – A thread waits if the GIL is unavailable
 – Threads run sequentially, rather than concurrently
time
• Conflicting goals of the OS scheduler and the Python interpreter
• The host OS can schedule threads concurrently on multiple cores
• GIL battle
thread runs only for 5 ms
• Less context switching and fewer signals
• Multicore perspective: GIL battle eliminated!
• More responsive threads (fair scheduling)
• Convoy effect
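The 5 ms time slice above corresponds to the interpreter's switch interval, which Python 3.2 and later expose through `sys.getswitchinterval()` and `sys.setswitchinterval()`. A small sketch of reading and tuning it follows; the value 0.01 is an arbitrary choice for illustration, not a recommendation.

```python
import sys

# Python 3.2+ only: the interval (in seconds) after which a running thread
# is asked to release the GIL so that another thread can run.
default_interval = sys.getswitchinterval()
print("switch interval: %.3f s" % default_interval)

# The interval is tunable; a larger value means fewer forced switches
# but less responsive threads. 0.01 is an illustrative value only.
sys.setswitchinterval(0.01)
print("new interval: %.3f s" % sys.getswitchinterval())

# Restore the original value so the rest of the program is unaffected.
sys.setswitchinterval(default_interval)
```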
Execution time:

              Single Core   Dual Core
Python v2.7   74 s          116 s
Python v3.2   55 s          65 s

[Charts: execution time (s) for v2.7 and v3.2 on a single core and on a dual core; execution time (s) for Python v3.2, single core vs. dual core]
• Performance dip still observed on dual cores
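The slides do not show the benchmark that produced these numbers. A comparable experiment can be sketched with a pure-Python CPU-bound loop run twice in sequence and then in two threads. The workload `count_down`, the loop size, and the helper names are assumptions, and pinning the process to a single core is OS-specific and not shown here.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread needs the GIL throughout."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Run the workload twice, one after the other; return elapsed seconds."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start


def time_threaded(n):
    """Run the same two workloads in two threads; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # assumed workload size, chosen only to keep the run short
    print("sequential: %.2f s" % time_sequential(n))
    print("threaded:   %.2f s" % time_threaded(n))
```

On a standard CPython build the threaded run is not expected to be faster than the sequential one for this kind of workload; the actual timings depend on the interpreter version and the machine.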
in a system
• Attractive alternative to multi-threading
• True parallelism with processes
• Python supports it with the multiprocessing module.
“multiprocessing” module spawns a new Python interpreter instance for a process.
• Each process is independent and has its own GIL, so the GIL does not serialize work across processes.
• Allows leveraging multiple cores better than threads.
• multiprocessing shares its API with the threading module.
 – “threading” => “multiprocessing”
 – “Thread” => “Process”
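The "Thread" => "Process" mapping above can be sketched as follows: the code has the same start/join shape as a threaded version, with `multiprocessing.Process` in place of `threading.Thread`. The workload `count_down`, the helper `run_in_processes`, and the loop size are assumptions made for illustration.

```python
import multiprocessing


def count_down(n):
    """CPU-bound loop; each process runs it in its own interpreter."""
    while n > 0:
        n -= 1
    return n


def run_in_processes(n, workers=2):
    """Run count_down(n) in `workers` processes; return their exit codes."""
    procs = [
        multiprocessing.Process(target=count_down, args=(n,))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait for the child process to exit
    return [p.exitcode for p in procs]


# The guard keeps child processes from re-running this block on platforms
# that start children by importing the main module.
if __name__ == "__main__":
    print(run_in_processes(1_000_000))
```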
• Lock – helps synchronize processes
• Value and Array – shared memory maps for sharing state between processes
• Pool – offloads tasks to a pool of worker processes
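A rough sketch of three of these building blocks working together: two processes increment a shared `Value` under a `Lock`, and a `Pool` maps a function over a small range. The functions `add_many` and `square`, the counts, and the pool size are hypothetical choices for illustration.

```python
import multiprocessing


def add_many(counter, lock, times):
    """Increment the shared counter `times` times, one locked update at a time."""
    for _ in range(times):
        with lock:  # only one process may update the counter at once
            counter.value += 1


def square(x):
    """Trivial task to hand to the worker pool."""
    return x * x


if __name__ == "__main__":
    # Shared integer in shared memory; lock=False because an explicit Lock is used.
    lock = multiprocessing.Lock()
    counter = multiprocessing.Value("i", 0, lock=False)
    procs = [
        multiprocessing.Process(target=add_many, args=(counter, lock, 1000))
        for _ in range(2)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)

    # Offload independent tasks to a pool of two worker processes.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))
```

Without the lock, the read-modify-write in `counter.value += 1` could interleave between processes and lose updates.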
better, as GIL is avoided.

Python v2.7       Single Core   Dual Core
threading         76 s          114 s
multiprocessing   72 s          43 s

[Chart: execution time (s) on single core and dual core, threading vs. multiprocessing]
• Cool! 40 % improvement in execution time on dual core!! ☺
• Need separate address space and resources, including file handles (FHs) and heap
• Context switching is costlier
• Inter-process communication is cumbersome to implement (message passing, shared memory)
thread safe; Jython threads are real Java threads
• Uses the Java garbage collector and no reference counting; hence, Jython is free of the GIL ☺
• It can fully exploit multiple cores, as per our experiments with Jython 2.5
 – Run with two CPU threads in tandem
• Experiment shows performance improvement on a multi-core system

Jython 2.5    Execution time
Single core   44 s
Dual core     25 s
Jython
• Partial or no support for core modules like os, expat and mmap.
• Jython-Python inconsistencies
• List of Jython issues at http://bugs.jython.org
• Multiprocessing and Jython… saviors, but with caveats.
• An intelligent awareness of Python (and its flavors) helps in exploiting the multi-core opportunity better!
• Understand and use ☺
Thank you for your time and attention ☺
• Please share your feedback/comments/suggestions with us at:
• [email protected], http://technobeans.com
• [email protected], http://freethreads.wordpress.com
in concurrency can be easily leveraged
• Jython uses Java's ConcurrentHashMap to implement dict and set for better concurrency
• Plug small scripts and code written in Jython into Java applications