Slide 1

Slide 1 text

Managed Runtime Systems
Lecture 11: Advanced Garbage Collection
Foivos Zakkak
https://foivos.zakkak.net
Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution 4.0 International License. Third party marks and brands are the property of their respective holders.

Slide 2

Slide 2 text

GC algorithms’ goals
Overall application throughput
Short pause times
Space:
  1. Reclamation accuracy
  2. Space overhead of the algorithm
Implementation difficulties
Managed Runtime Systems 1 of 20 https://foivos.zakkak.net

Slide 3

Slide 3 text

Parallel GC
(a) Stop-the-world collection, single thread
(b) Stop-the-world collection on multiprocessor, single collector thread
(c) Stop-the-world parallel collection
Figure 14.1: Stop-the-world garbage collection: each bar represents an execution on a single processor. The coloured regions represent different garbage collection cycles. (Horizontal axis: time.)
Figure Source: http://gchandbook.org/figures.html

Slide 4

Slide 4 text

Things to Consider
Is there enough work?
Load balancing
Synchronization
Termination of GC cycle

Slide 5

Slide 5 text

Parallel GC Taxonomy
Processor-centric
  ■ Work-stealing
  ■ No locality considerations
  ■ Various sizes of workloads
Memory-centric
  ■ Work on contiguous memory
  ■ Favor local data
  ■ Fixed-size workloads

Slide 6

Slide 6 text

Parallel Marking
Atomically(?) acquire object to process
  ■ Non-atomic acquisition only affects performance on non-moving GCs (two threads marking the same object is wasted work, not an error)
Push new objects to local pool
  ■ Deques to the rescue?
  ■ Push to a global pool when local is full?
Work-steal when idle
Split large objects?
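The marking steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production collector: the heap is a hypothetical dict mapping an object id to its child ids, a lock stands in for an atomic test-and-set on the mark bit, and termination is detected with a shared counter of unprocessed objects. The names `parallel_mark`, `try_mark`, `push` and `take` are made up for this example.

```python
import threading
import time
from collections import deque


def parallel_mark(graph, roots, num_workers=4):
    """Return the set of objects reachable from roots.

    graph: dict mapping object id -> list of child ids (assumed layout).
    """
    marked = set()
    mark_lock = threading.Lock()      # emulates an atomic test-and-set on the mark bit
    pools = [deque() for _ in range(num_workers)]  # one local pool per worker
    pending = [0]                     # objects pushed but not yet fully processed
    pending_lock = threading.Lock()

    def try_mark(obj):
        """Acquire obj for processing; only the first caller succeeds."""
        with mark_lock:
            if obj in marked:
                return False
            marked.add(obj)
            return True

    def push(worker, obj):
        with pending_lock:
            pending[0] += 1
        pools[worker].append(obj)

    def take(worker):
        try:
            return pools[worker].pop()            # owner takes the newest entry
        except IndexError:
            pass
        for victim in range(num_workers):         # idle: try to steal
            if victim != worker:
                try:
                    return pools[victim].popleft()  # thief takes the oldest entry
                except IndexError:
                    continue
        return None

    def work(worker):
        while True:
            obj = take(worker)
            if obj is None:
                with pending_lock:
                    if pending[0] == 0:           # no work left anywhere
                        return
                time.sleep(0)                     # yield and retry
                continue
            for child in graph[obj]:
                if try_mark(child):
                    push(worker, child)
            with pending_lock:
                pending[0] -= 1                   # obj is fully processed

    for i, root in enumerate(roots):
        if try_mark(root):
            push(i % num_workers, root)
    threads = [threading.Thread(target=work, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marked
```

Children are counted in `pending` before their parent is counted out, so a zero counter implies that every pool is empty and no worker is mid-scan. A real collector would typically use lock-free deques and a cheaper termination protocol.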

Slide 7

Slide 7 text

Parallel Copying
Race on the forwarding pointer (spinning)
  ■ The first to atomically set it wins
  ■ If the location is not yet known, write a special busy value
  ■ Other threads spin until the final forwarding pointer is written
Race on the forwarding pointer (speculative)
  ■ Speculatively copy the object
  ■ Try to atomically set the forwarding pointer
  ■ In case of failure, retract the copy
  ■ For immutable objects, consider replicating instead of retracting
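The speculative variant can be sketched as below. Python has no hardware compare-and-swap, so the sketch emulates one with a per-object lock; that emulation, and the names `Obj`, `ToSpace`, `cas_forward` and `forward`, are assumptions made for illustration only.

```python
import threading


class Obj:
    def __init__(self, payload):
        self.payload = payload
        self.forward = None             # forwarding pointer; None means "not copied yet"
        self._lock = threading.Lock()   # used only to emulate an atomic CAS

    def cas_forward(self, expected, new):
        """Emulated compare-and-swap on the forwarding pointer."""
        with self._lock:
            if self.forward is expected:
                self.forward = new
                return True
            return False


class ToSpace:
    """Per-thread bump allocator, so the most recent copy can be retracted."""

    def __init__(self):
        self.objects = []

    def allocate_copy(self, obj):
        copy = Obj(obj.payload)
        self.objects.append(copy)
        return copy

    def retract(self, copy):
        # Only the most recent allocation can be undone by moving the bump pointer back.
        assert self.objects[-1] is copy
        self.objects.pop()


def forward(obj, to_space):
    """Return the to-space address of obj, copying it if nobody has yet."""
    if obj.forward is not None:
        return obj.forward
    copy = to_space.allocate_copy(obj)   # speculative copy
    if obj.cas_forward(None, copy):      # the first thread to install the pointer wins
        return copy
    to_space.retract(copy)               # lost the race: undo the copy
    return obj.forward
```

Giving each thread its own `ToSpace` is what makes retraction cheap here: the losing copy is always the last allocation in that thread's buffer.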

Slide 8

Slide 8 text

Parallel Copying
Memory locality through dominant-thread tracing
Figure 14.4: Dominant-thread tracing. Threads 1 to 3, coloured black, grey and white respectively, have traced a graph of objects. Each object is coloured to indicate the processor to which it will be copied. The first field of each object is its header. Thread T0 was the last to lock object X. (Figure labels: thread stacks 0 to 2, thread T0, objects X and Y.)
Figure Source: http://gchandbook.org/figures.html

Slide 9

Slide 9 text

Parallel Copying
Breadth-first copying separates parents from children
Depth-first copying is expected to yield better locality
  ■ Requires an auxiliary stack
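A small sketch can make the layout difference concrete. It assumes a hypothetical heap given as a dict from object name to child names and returns the order in which objects would be placed in to-space; `copy_order` is an illustrative name. For simplicity, both orders use an explicit work list here.

```python
from collections import deque


def copy_order(graph, root, depth_first):
    """Return the order in which objects would be laid out in to-space."""
    order = []
    seen = {root}
    work = deque([root])
    while work:
        # A stack gives depth-first order; a queue gives breadth-first order.
        obj = work.pop() if depth_first else work.popleft()
        order.append(obj)
        children = graph[obj]
        # Push in reverse for the stack so the first child is visited first.
        for child in (reversed(children) if depth_first else children):
            if child not in seen:
                seen.add(child)
                work.append(child)
    return order


tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": [], "E": [], "F": [], "G": []}
print(copy_order(tree, "A", depth_first=False))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(copy_order(tree, "A", depth_first=True))   # ['A', 'B', 'D', 'E', 'C', 'F', 'G']
```

In the breadth-first layout, B's children D and E sit behind C, away from their parent. In the depth-first layout they follow B directly.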

Slide 10

Slide 10 text

Parallel Sweeping
Partition the heap into contiguous blocks
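A minimal sketch of block-partitioned sweeping, assuming a toy heap modelled as a list of slots (`None` means free) with a parallel list of mark bits. The names `sweep_block` and `parallel_sweep` and the per-block free lists are illustrative choices, not something the slides prescribe.

```python
from concurrent.futures import ThreadPoolExecutor


def sweep_block(heap, marks, start, end):
    """Sweep slots [start, end): free unmarked objects, clear mark bits."""
    free = []
    for i in range(start, end):
        if heap[i] is not None and not marks[i]:
            heap[i] = None          # reclaim a dead object
        if heap[i] is None:
            free.append(i)          # record the free slot
        marks[i] = False            # reset the mark bit for the next cycle
    return free


def parallel_sweep(heap, marks, block_size, num_workers=4):
    """Sweep the heap in contiguous blocks; return one free list per block."""
    blocks = [(s, min(s + block_size, len(heap)))
              for s in range(0, len(heap), block_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda b: sweep_block(heap, marks, *b), blocks))
```

Because the blocks are disjoint, the workers never touch the same slot and need no synchronization while sweeping.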

Slide 11

Slide 11 text

Parallel Compaction
Figure 14.8: Flood et al [2001] divide the heap into one region per thread and alternate the direction in which compacting threads slide live objects (shown in grey). The figure shows the heap, split into regions 0 to 3, before and after compaction.
Figure Source: http://gchandbook.org/figures.html

Slide 12

Slide 12 text

Parallel Compaction
Figure 14.9: Inter-block compaction. Rather than sliding object by object, Abuaiadh et al [2004] slide only complete blocks: free space within each block is not squeezed out. The figure shows the heap, split into blocks 0 to 3, before and after compaction.
Figure Source: http://gchandbook.org/figures.html

Slide 13

Slide 13 text

Concurrent GC
(a) Incremental uniprocessor collection
(b) Incremental multiprocessor collection
(c) Parallel incremental collection
(d) Mostly-concurrent collection
(Horizontal axis: time.)
Figure Source: http://gchandbook.org/figures.html

Slide 14

Slide 14 text

Concurrent GC
(c) Parallel incremental collection
(d) Mostly-concurrent collection
(e) Mostly-concurrent incremental collection
(f) On-the-fly collection
(g) On-the-fly incremental collection
Figure 15.1: Incremental and concurrent garbage collection. Each bar represents an execution on a single processor. The coloured regions represent different garbage collection cycles.
Figure Source: http://gchandbook.org/figures.html

Slide 15

Slide 15 text

Concurrent GC: Issue 1
D1: Mutator stores pointer b into scanned object Y
D2: Mutator removes pointer a from unscanned object X
D3: Collector scans object X
D4: Collector incorrectly frees object Z
Fig. 3. Erroneous collection of live object Z via deletion of direct pointer a from object X.
From a paper by Martin T. Vechev, David F. Bacon, Perry Cheng, and David Grove (page 8).
Figure Source: https://link.springer.com/chapter/10.1007%2F11531142_25

Slide 16

Slide 16 text

Concurrent GC: Issue 2
T1: Mutator stores pointer e into scanned object P
T2: Mutator removes pointer c from unscanned object Q
T3: Collector scans object Q
T4: Collector incorrectly frees object S
Fig. 4. Erroneous collection of live object S via deletion of pointer c from object Q which transitively reaches S through R.
From "Derivation and Evaluation of Concurrent Collectors" (page 9).
Figure Source: https://link.springer.com/chapter/10.1007%2F11531142_25

Slide 18

Slide 18 text

Concurrent GC: Losing a live object
1. The mutator stores a pointer to a white object into a black object
2. All paths from any grey objects to that white object are destroyed
Solution
Don’t allow pointers from black objects to white objects?
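The two conditions can be checked mechanically on a snapshot of the heap. The sketch below assumes a hypothetical representation: `graph` maps each object to the objects it points to, and `colour` maps each object to "white", "grey" or "black". The function name `lost_objects` is made up for this example.

```python
def lost_objects(graph, colour):
    """Return white objects that meet both conditions for being lost."""
    # White objects the collector will still find: those reachable from
    # some grey object through a chain of white objects.
    reachable = set()
    stack = [obj for obj, c in colour.items() if c == "grey"]
    while stack:
        obj = stack.pop()
        for child in graph[obj]:
            if colour[child] == "white" and child not in reachable:
                reachable.add(child)
                stack.append(child)
    # Condition 1: a black object points to the white object.
    # Condition 2: no path from a grey object leads to it any more.
    return {child
            for obj, children in graph.items() if colour[obj] == "black"
            for child in children
            if colour[child] == "white" and child not in reachable}


colour = {"X": "grey", "Y": "black", "Z": "white"}
# Y (already scanned) points to Z, and X (not yet scanned) still points to Z.
print(lost_objects({"X": ["Z"], "Y": ["Z"], "Z": []}, colour))  # set()
# The pointer from X to Z has been removed: Z will be missed.
print(lost_objects({"X": [], "Y": ["Z"], "Z": []}, colour))     # {'Z'}
```

Forbidding black-to-white pointers altogether makes condition 1 impossible, so the returned set is always empty.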

Slide 19

Slide 19 text

Concurrent GC: Mutator’s Coloring
White: Mutator’s roots have not been scanned
Grey: Mutator’s roots need to be (re-)scanned
Black: Mutator’s roots have been scanned

Slide 20

Slide 20 text

Concurrent GC: Coloring of new Objects
White and grey mutators can allocate white objects
Black mutators can only allocate black objects

Slide 21

Slide 21 text

Concurrent GC: Allow black-to-white pointers
Barrier-based approaches:
  ■ Change color of white object at assignment
  ■ Change color of black object to grey
  ■ Scan object before assignment
Snapshot-at-the-beginning:
  ■ Scan mutators at the beginning of the GC
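The first two barrier options can be illustrated with a toy incremental marker. Everything in this sketch is an assumption made for illustration: the class `Heap`, the barrier names `"shade_target"` and `"revert_source"`, and the `replay` scenario, which is modelled on a mutator storing a pointer into an already scanned object and then deleting the only other pointer to the target.

```python
class Heap:
    def __init__(self, graph, roots, barrier=None):
        self.graph = {obj: list(children) for obj, children in graph.items()}
        self.colour = {obj: "white" for obj in graph}
        self.barrier = barrier      # None, "shade_target" or "revert_source"
        self.grey = []              # collector work list
        for root in roots:
            self._shade(root)

    def _shade(self, obj):
        if self.colour[obj] == "white":
            self.colour[obj] = "grey"
            self.grey.append(obj)

    def scan(self, obj):
        """Collector step: shade the children of a grey object, then blacken it."""
        self.grey.remove(obj)
        for child in self.graph[obj]:
            self._shade(child)
        self.colour[obj] = "black"

    def write(self, src, dst):
        """Mutator step: store a pointer to dst into src, running the barrier."""
        if self.colour[src] == "black" and self.colour[dst] == "white":
            if self.barrier == "shade_target":
                self._shade(dst)                # white target becomes grey
            elif self.barrier == "revert_source":
                self.colour[src] = "grey"       # black source is rescanned later
                self.grey.append(src)
        self.graph[src].append(dst)

    def remove(self, src, dst):
        """Mutator step: delete the pointer from src to dst."""
        self.graph[src].remove(dst)

    def finish(self):
        """Run the collector to completion; return the objects it would free."""
        while self.grey:
            self.scan(self.grey[-1])
        return {obj for obj, c in self.colour.items() if c == "white"}


def replay(barrier):
    heap = Heap({"X": ["Z"], "Y": [], "Z": []}, roots=["X", "Y"], barrier=barrier)
    heap.scan("Y")            # Y is scanned before the mutator runs
    heap.write("Y", "Z")      # mutator stores a pointer to Z into scanned Y
    heap.remove("X", "Z")     # mutator removes the pointer from unscanned X
    return heap.finish()      # collector scans X and decides what to free


print(replay(None))             # {'Z'}: live object Z is freed
print(replay("shade_target"))   # set()
print(replay("revert_source"))  # set()
```

Without a barrier the marker frees Z even though Y still points to it. Either barrier keeps Z alive, at the cost of extra work on every pointer store that creates a black-to-white edge.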

Slide 22

Slide 22 text

For More!!!
https://gchandbook.org