Managed Runtime Systems: Lecture 09 - Concurrency

zakkak
March 26, 2018


  1. Managed Runtime Systems
    Lecture 09: Concurrency
    Foivos Zakkak
    https://foivos.zakkak.net
    Except where otherwise noted, this presentation is licensed under the
    Creative Commons Attribution 4.0 International License.
    Third party marks and brands are the property of their respective holders.


  2. Concurrency
    Processors are multi-core
    Take advantage of it
    Multiple single-threaded VMs or one multi-threaded VM?

  3. Multiple Single-threaded VMs
    Pros

    Simplicity

    Less code

    No data-races
    Cons

    Duplication of overheads
    (e.g. Class Loading, JIT)

    Slower communication
    (data-transfers)

    Slower synchronization

  4. One Multi-threaded VM
    Pros

    Fast communication

    Better resource utilization

    Opportunities for better
    scheduling/memory-locality
    Cons

    Increased complexity

    Incompatibility with
    well-known optimizations

    Increased contention due to
    shared runtime data structures

  5. State-of-the-art
    Multiple multi-threaded VMs
    VMs still don’t scale well
    (no smart scheduling, GC on large heaps, profiling, etc.)
    VMs still depend on shared memory (they are very slow otherwise)

  6. Design Decisions
    Atomicity (necessary for efficient concurrent algorithms)
    Locking (necessary to provide mutual exclusion)
    Scheduling (fundamental, especially for fine-grained parallelism)
    Memory model (defines expected behavior; should be intuitive)
    Explicit concurrency (Threads, Monitors, Actors, etc.)
    Implicit concurrency (Auto-parallelization, SIMD HW-acceleration)

  7. Atomicity
    Provided through intrinsics
    The interpreter invokes native code that executes atomic
    instructions
    JIT compilers generate atomic instructions inline (see the sketch below)
    When atomic instructions are unavailable, locks are used instead
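    A minimal sketch (not from the slides) of how this surfaces at the language
    level: AtomicInteger.compareAndSet is intrinsified by HotSpot-style JITs into
    a hardware compare-and-swap (e.g. lock cmpxchg on x86), so the retry loop
    below needs no locks.

      import java.util.concurrent.atomic.AtomicInteger;

      // Sketch: compareAndSet is treated as an intrinsic and typically compiles
      // down to a single hardware CAS instruction where one is available.
      public class AtomicCounter {
          private final AtomicInteger value = new AtomicInteger();

          // Classic CAS retry loop: read, compute, attempt to publish atomically.
          public int increment() {
              int current;
              do {
                  current = value.get();
              } while (!value.compareAndSet(current, current + 1));
              return current + 1;
          }

          public static void main(String[] args) throws InterruptedException {
              AtomicCounter c = new AtomicCounter();
              Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
              Thread t1 = new Thread(work), t2 = new Thread(work);
              t1.start(); t2.start();
              t1.join(); t2.join();
              System.out.println(c.value.get()); // 200000, without any locking
          }
      }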

  8. Locking
    Avoid it if possible (through careful code writing and code analysis)
    Embedding locks in objects is inefficient
    (most objects are not used for locking)
    Empirically:

    Most locks are not contended

    Once a thread acquires a lock, it usually acquires it again later
    (see the sketch below)
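    As a minimal illustration (not from the slides) of why embedding a full lock
    in every object would be wasteful: any Java object may be used as a monitor,
    but most objects, like the array below, never are, so VMs keep only a few
    bits of lock state in the header and materialize real lock data lazily.

      // Illustration: `lock` is used as a monitor, `data` never is.
      public class PerObjectLocking {
          private final Object lock = new Object();  // used as a monitor
          private final int[] data = new int[1024];  // never synchronized on
          private int next = 0;

          public void add(int v) {
              synchronized (lock) {          // lock state for `lock` is kept in
                  if (next < data.length) {  // its header and expanded lazily
                      data[next++] = v;
                  }
              }
          }
      }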

  9. Thin Locks
    3-state locking (unlocked, thin, fat)
    Embed a pointer to the lock record, plus the lock state, in the object’s header
    Thin locking installs a pointer to a lock record in the acquiring thread’s
    stack frame using CAS (see the sketch below)
    No OS-level synchronization as long as the lock remains uncontended
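    A hypothetical sketch of the uncontended path (class and field names are
    invented; real VMs pack this state into the object header word rather than
    using a separate Java object):

      import java.util.concurrent.atomic.AtomicReference;

      // Hypothetical model of thin locking, for illustration only.
      class ThinLockSketch {
          // Stand-in for the header word: null = unlocked,
          // a LockRecord = thin-locked by that record's owner.
          final AtomicReference<LockRecord> header = new AtomicReference<>();

          static final class LockRecord {   // conceptually lives in the owner's stack frame
              final Thread owner = Thread.currentThread();
          }

          // Uncontended acquire: a single CAS, no OS involvement.
          boolean tryThinLock(LockRecord record) {
              return header.compareAndSet(null, record);
          }

          // Uncontended release: clear the header word again.
          void thinUnlock(LockRecord record) {
              header.compareAndSet(record, null);
          }
      }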

  10. Thin Locks - Contended
    If the lock is owned by another thread, create a fat lock:
    1. Create an OS mutex and condition variable
    2. Set the header pointer to the fat lock record
    (containing pointers to the mutex and condition variable)
    3. Wait on the fat lock to be notified
    The owner thread will observe the fat lock on release and notify the
    waiters (see the sketch below)
    Once a lock has been contended, it is never thin-locked again
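    Complementing the thin-lock sketch above, a hypothetical sketch of the fat
    lock that a contending thread creates; ReentrantLock and Condition stand in
    for the OS mutex and condition variable, and the thin-to-fat transition
    races are not modelled here:

      import java.util.concurrent.locks.Condition;
      import java.util.concurrent.locks.ReentrantLock;

      // Hypothetical fat-lock record, for illustration only.
      final class FatLockSketch {
          private final ReentrantLock mutex = new ReentrantLock(); // "OS mutex"
          private final Condition released = mutex.newCondition(); // "OS condition"
          private Thread owner;                                    // guarded by mutex

          // Contending thread: block until the lock is free, then take ownership.
          void acquire() throws InterruptedException {
              mutex.lock();
              try {
                  while (owner != null) released.await();  // wait to be notified
                  owner = Thread.currentThread();
              } finally {
                  mutex.unlock();
              }
          }

          // Owner thread: on release, notify any waiters blocked on the fat lock.
          void release() {
              mutex.lock();
              try {
                  owner = null;
                  released.signalAll();
              } finally {
                  mutex.unlock();
              }
          }
      }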

  11. Biased Locks
    Bias the lock towards the first thread that acquires it, so repeated
    acquisitions by that thread need no atomic operations; the bias is revoked
    when another thread tries to acquire the lock
    See https://dl.acm.org/citation.cfm?id=1167496 for more

  12. Scheduling
    Many VMs rely on OS threads both for VM and application threads
    Scheduling is delegated to the OS
    BUT:

    OS threads are preemptible (what happens to safepoints?)

    Runtime threads may cause long delays when preempted

    The VM knows more about its threads than the OS does

  13. Memory Model
    See
    https://speakerdeck.com/zakkak/the-java-memory-model

  14. Explicit Concurrency
    Different models offer different benefits
    Actors are message-passing based, making them a better fit for systems
    without shared memory
    Threads and locks give more flexibility but are hard to get right
    There are more models, e.g., task-based, map-reduce, etc.
    (see the sketch below)
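    A small illustration (not from the slides) of the difference in flavor:
    the same computation expressed with an explicit thread plus a lock, and
    with a task submitted to an executor that owns the scheduling.

      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;

      public class ExplicitModels {
          public static void main(String[] args) throws Exception {
              // Threads and locks: flexible, but all coordination is manual.
              int[] result = new int[1];
              Object lock = new Object();
              Thread worker = new Thread(() -> {
                  int sum = 0;
                  for (int i = 1; i <= 100; i++) sum += i;
                  synchronized (lock) { result[0] = sum; }
              });
              worker.start();
              worker.join();
              synchronized (lock) { System.out.println(result[0]); } // 5050

              // Task-based: submit work and let the pool schedule it.
              ExecutorService pool = Executors.newFixedThreadPool(2);
              Future<Integer> future = pool.submit(() -> {
                  int sum = 0;
                  for (int i = 1; i <= 100; i++) sum += i;
                  return sum;
              });
              System.out.println(future.get());                      // 5050
              pool.shutdown();
          }
      }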

  15. Implicit Concurrency - Auto parallelization
    Limited by:
    1. hardware support (HW-acceleration)
    2. code analysis
    3. register allocation algorithms
    The use of certain patterns/libraries may help the runtime parallelize
    (e.g. vectors, streams, etc.; see the sketch below)
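    A minimal illustration (not from the slides): the hand-written loop forces
    the compiler to prove the iterations independent, while the stream pipeline
    states the data-parallelism explicitly and lets the library and runtime
    split the work.

      import java.util.stream.LongStream;

      public class ImplicitParallelism {
          public static void main(String[] args) {
              long n = 1_000_000;

              // Plain loop: auto-parallelization must analyse the loop-carried sum.
              long sum = 0;
              for (long i = 1; i <= n; i++) sum += i * i;

              // Stream pipeline: the pattern itself declares the work data-parallel.
              long parallelSum = LongStream.rangeClosed(1, n)
                                           .parallel()
                                           .map(i -> i * i)
                                           .sum();

              System.out.println(sum == parallelSum); // true
          }
      }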