Slide 1

Managed Runtime Systems
Lecture 09: Concurrency
Foivos Zakkak
https://foivos.zakkak.net

Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution 4.0 International License. Third party marks and brands are the property of their respective holders.

Slide 2

Concurrency

Processors are multi-core; take advantage of it.

■ Multiple single-threaded VMs, or
■ One multi-threaded VM?

Slide 3

Multiple Single-threaded VMs

Pros:
■ Simplicity
■ Less code
■ No data races

Cons:
■ Duplication of overheads (e.g. class loading, JIT compilation)
■ Slower communication (data transfers)
■ Slower synchronization

Slide 4

One Multi-threaded VM

Pros:
■ Fast communication
■ Better resource utilization
■ Opportunities for better scheduling/memory locality

Cons:
■ Increased complexity
■ Incompatibility with well-known optimizations
■ Increased contention due to shared runtime data structures

Slide 5

State-of-the-art

Multiple multi-threaded VMs.

■ VMs still don't scale well (no smart scheduling, GC on large heaps, profiling, etc.)
■ VMs still depend on shared memory (very slow otherwise)

Slide 6

Design Decisions

■ Atomicity (necessary for efficient concurrent algorithms)
■ Locking (necessary to provide mutual exclusion)
■ Scheduling (fundamental, especially for fine-grained parallelism)
■ Memory model (defines expected behavior; should be intuitive)
■ Explicit concurrency (threads, monitors, actors, etc.)
■ Implicit concurrency (auto-parallelization, SIMD HW-acceleration)

Slide 7

Atomicity

■ Provided through intrinsics
■ The interpreter invokes native code that translates to atomic instructions
■ JIT compilers generate atomic instructions directly
■ When atomic instructions are unavailable, locks are used instead
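As an illustration (my sketch, not from the slides): in Java these intrinsics surface through classes such as java.util.concurrent.atomic.AtomicInteger. The compareAndSet call below goes through native code when interpreted, while the JIT emits the hardware atomic instruction directly (e.g. LOCK CMPXCHG on x86). The class name AtomicCounter is made up for the example.

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger();

    // Classic CAS retry loop: read the current value, attempt to install
    // current + 1 atomically, and retry if another thread won the race.
    public int increment() {
        int current;
        do {
            current = count.get();
        } while (!count.compareAndSet(current, current + 1));
        return current + 1;
    }

    public int get() {
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicCounter c = new AtomicCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) c.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(c.get()); // expect 400000
    }
}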

Slide 8

Locking

■ Avoid it if possible (through careful code writing and code analysis)
■ Embedding locks in objects is inefficient (most objects are not used for locking)
■ Empirically:
  ■ Most locks are not contended
  ■ Once a thread acquires a lock, it usually acquires it again later
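To make the empirical observations concrete, a hypothetical example (not from the slides): java.util.Vector has synchronized methods, yet in code like the following its monitor is acquired a million times by a single thread and is never contended; this is exactly the pattern thin and biased locking optimize for.

import java.util.Vector;

public class UncontendedLocking {
    public static void main(String[] args) {
        // Every add() acquires the Vector's monitor, but only one thread
        // ever touches it, so the lock is repeatedly re-acquired by the
        // same thread and never contended.
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < 1_000_000; i++) {
            v.add(i);
        }
        System.out.println(v.size());
    }
}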

Slide 9

Thin Locks

■ 3-state locking (unlocked, thin, fat)
■ Embed a pointer to the lock record and the lock state in the object's header
■ Thin locking sets this pointer to a local lock record (in the locking thread's stack frame) using CAS
■ No further synchronization is needed as long as the lock remains uncontended
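A highly simplified model of the fast path (my sketch, not a real VM's header layout; real VMs pack this state into the object's header word rather than a separate field, and the names below are invented for the example):

import java.util.concurrent.atomic.AtomicReference;

// Simplified model of the three lock states.
class LockableObject {
    // null     -> unlocked
    // ThinLock -> thin-locked (record lives on the owner's stack in a real VM)
    // FatLock  -> inflated (OS mutex + condition; see the next slide's sketch)
    final AtomicReference<Object> header = new AtomicReference<>();
}

class ThinLock {
    final Thread owner = Thread.currentThread();
    int recursionCount = 1;
}

class ThinLocking {
    static boolean tryThinLock(LockableObject o) {
        Object state = o.header.get();
        if (state == null) {
            // Uncontended fast path: a single CAS installs our lock record.
            return o.header.compareAndSet(null, new ThinLock());
        }
        if (state instanceof ThinLock && ((ThinLock) state).owner == Thread.currentThread()) {
            // Recursive acquisition by the owner needs no atomic operation.
            ((ThinLock) state).recursionCount++;
            return true;
        }
        // Held by another thread (or already fat): the caller must fall back
        // to the fat-lock path.
        return false;
    }
}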

Slide 10

Thin Locks - Contended

If the lock is owned by another thread, create a fat lock:
1. Create an OS mutex and condition variable
2. Set the header pointer to a fat-lock record (containing pointers to the mutex and condition)
3. Wait on the fat lock to be notified

The owner thread will observe the fat lock on release and notify the waiters.
Once a lock has been contended, it is never thin-locked again.
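Continuing the sketch, a fat-lock record could be modeled roughly as follows (an illustration only; ReentrantLock and Condition from java.util.concurrent stand in for the OS mutex and condition variable, and recursion handling is omitted):

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a fat-lock record. A real VM stores a pointer to such a record
// in the object header; once installed, the object stays fat-locked.
class FatLock {
    private final ReentrantLock mutex = new ReentrantLock();
    private final Condition released = mutex.newCondition();
    private Thread owner;

    void lock() throws InterruptedException {
        mutex.lock();
        try {
            while (owner != null && owner != Thread.currentThread()) {
                released.await();           // step 3: wait to be notified
            }
            owner = Thread.currentThread();
        } finally {
            mutex.unlock();
        }
    }

    void unlock() {
        mutex.lock();
        try {
            owner = null;
            released.signalAll();           // owner notifies the waiters on release
        } finally {
            mutex.unlock();
        }
    }
}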

Slide 11

Biased Locks

See https://dl.acm.org/citation.cfm?id=1167496 for more.

Slide 12

Scheduling

■ Many VMs rely on OS threads for both VM and application threads
■ Scheduling is thus delegated to the OS
■ BUT:
  ■ OS threads are preemptable (what about safepoints?)
  ■ Runtime threads may cause big delays when preempted
  ■ The VM knows more about its threads than the OS does

Slide 13

Memory Model

See https://speakerdeck.com/zakkak/the-java-memory-model

Slide 14

Explicit Concurrency

■ Different models offer different benefits
■ Actors are based on message passing, so they are a better fit for systems without shared memory
■ Threads and locks give more flexibility but are hard to get right
■ There are more models, e.g. task-based, map-reduce, etc. (two of them are contrasted in the sketch below)
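As an illustrative sketch (not from the slides), the same array summation expressed in two of these models: explicit threads with a lock versus tasks submitted to an ExecutorService. All class and method names here are made up for the example.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExplicitModels {
    // Threads + locks: the programmer creates threads and guards shared state.
    static long sumWithThreads(int[] data) throws InterruptedException {
        final long[] total = {0};
        final Object lock = new Object();
        Thread left = new Thread(() -> {
            long s = 0;
            for (int i = 0; i < data.length / 2; i++) s += data[i];
            synchronized (lock) { total[0] += s; }
        });
        Thread right = new Thread(() -> {
            long s = 0;
            for (int i = data.length / 2; i < data.length; i++) s += data[i];
            synchronized (lock) { total[0] += s; }
        });
        left.start(); right.start();
        left.join(); right.join();
        return total[0];
    }

    // Task-based: the runtime schedules tasks onto a thread pool.
    static long sumWithTasks(int[] data) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Long> left = pool.submit(() -> {
                long s = 0;
                for (int i = 0; i < data.length / 2; i++) s += data[i];
                return s;
            });
            Future<Long> right = pool.submit(() -> {
                long s = 0;
                for (int i = data.length / 2; i < data.length; i++) s += data[i];
                return s;
            });
            return left.get() + right.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = 1;
        System.out.println(sumWithThreads(data)); // 1000000
        System.out.println(sumWithTasks(data));   // 1000000
    }
}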

Slide 15

Implicit Concurrency - Auto-parallelization

Limited by:
1. Hardware support (HW-acceleration)
2. Code analysis
3. Register allocation algorithms

The use of certain patterns/libraries may help (e.g. vectors, streams, etc.), as illustrated below.
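For example (a sketch, not from the slides), Java's Stream API is one such pattern: it exposes data parallelism in a form the library and runtime can exploit, while a plain counted loop relies on the JIT's auto-vectorization, subject to the limits listed above.

import java.util.stream.IntStream;

public class ImplicitParallelism {
    public static void main(String[] args) {
        int[] data = IntStream.range(0, 10_000_000).toArray();

        // The stream pattern hands the parallelism to the library/runtime:
        // .parallel() splits the work across the common fork/join pool
        // without the programmer managing threads.
        long parallelSum = IntStream.of(data).parallel().asLongStream().sum();

        // A plain counted loop like this is a typical candidate for JIT
        // auto-vectorization (SIMD), depending on hardware support and the
        // compiler's analysis.
        long loopSum = 0;
        for (int x : data) {
            loopSum += x;
        }

        System.out.println(parallelSum == loopSum); // true
    }
}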