
Managed Runtime Systems: Lecture 11 - Advanced Garbage Collection

zakkak
April 24, 2018


Transcript

  1. Managed Runtime Systems, Lecture 11: Advanced Garbage Collection

    Foivos Zakkak, https://foivos.zakkak.net

    Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution 4.0 International License. Third party marks and brands are the property of their respective holders.
  2. GC algorithms’ goals

    Overall application throughput
    Short pause times
    Space:
      1. Reclamation accuracy
      2. Space overhead of the algorithm
    Implementation difficulties

    Managed Runtime Systems 1 of 20 https://foivos.zakkak.net
  3. Parallel GC

    [Figure 14.1: Stop-the-world garbage collection: each bar represents an execution on a single processor. The coloured regions represent different garbage collection cycles. Panels, plotted against time: (a) Stop-the-world collection, single thread; (b) Stop-the-world collection on multiprocessor, single collector thread; (c) Stop-the-world parallel collection.]

    Figure Source: http://gchandbook.org/figures.html
  4. Things to Consider

    Is there enough work?
    Load balancing
    Synchronization
    Termination of GC cycle
  5. Parallel GC Taxonomy

    Processor-centric
    ▪ Work-stealing
    ▪ No locality considerations
    ▪ Various sizes of workloads

    Memory-centric
    ▪ Work on contiguous memory
    ▪ Favor local data
    ▪ Fixed-size workloads
  6. Parallel Marking

    Atomically(?) acquire object to process
    ▪ Non-atomic acquisition only affects performance on non-moving GCs

    Push new objects to local pool
    ▪ Deques to the rescue?
    ▪ Push to a global pool when local is full?

    Work-steal when idle

    Split large objects?
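The local-pool and work-stealing idea above can be sketched as a small, deterministic simulation. This is an illustration under assumptions, not the lecture's implementation: `parallel_mark`, the round-robin scheduling of "workers", and the steal-from-the-fullest-pool policy are all hypothetical choices, and the plain set membership test stands in for the atomic acquisition a real collector would need.

```python
from collections import deque


def parallel_mark(graph, roots, n_workers=2):
    """Simulate n_workers marking threads, scheduled round-robin.

    graph maps an object name to the names it points to.
    Returns (set of marked objects, number of steals).
    """
    marked = set()
    pools = [deque() for _ in range(n_workers)]  # one local pool per worker
    steals = 0

    # Distribute the roots over the local pools.
    for i, root in enumerate(roots):
        if root not in marked:  # stand-in for an atomic "acquire object"
            marked.add(root)
            pools[i % n_workers].append(root)

    # In this simulation the cycle terminates when every pool is empty.
    while any(pools):
        for me in range(n_workers):
            pool = pools[me]
            if not pool:
                # Idle worker: steal the oldest entry of the fullest pool.
                victim = max(pools, key=len)
                if not victim:
                    continue  # nothing left to steal
                pool.append(victim.popleft())
                steals += 1
            obj = pool.pop()  # the owner works on the newest entry
            for child in graph[obj]:
                if child not in marked:
                    marked.add(child)
                    pool.append(child)  # push new objects to the local pool
    return marked, steals


# Hypothetical object graph: "e" is unreachable from the root "a".
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
marked, steals = parallel_mark(graph, roots=["a"])
print(sorted(marked), steals)
```

With real threads, detecting that all pools are empty needs its own protocol, which is the "Termination of GC cycle" concern listed earlier in the deck.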
  7. Parallel Copying

    Race on the forwarding pointer (spinning)
    ▪ The first to atomically set it wins
    ▪ If location is not yet known, write a special busy value
    ▪ Other threads spin until the final forwarding pointer is written

    Race on the forwarding pointer (speculative)
    ▪ Speculatively copy object
    ▪ Try to atomically set forwarding pointer
    ▪ In case of failure retract copy
    ▪ In case of immutable objects consider not retracting but replicating
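A minimal sketch of the speculative variant, assuming a lock-backed `AtomicCell` as a stand-in for a hardware compare-and-swap word. `AtomicCell`, `Obj`, `copy_speculatively` and `race` are hypothetical names introduced only for this illustration.

```python
import threading


class AtomicCell:
    """Lock-backed stand-in for a word updated with compare-and-swap."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Obj:
    def __init__(self, payload):
        self.payload = payload
        self.forward = AtomicCell(None)  # None means "not copied yet"


def copy_speculatively(obj):
    """Return the to-space copy of obj, racing with other collector threads."""
    winner = obj.forward.get()
    if winner is not None:
        return winner                      # someone already copied it
    candidate = Obj(obj.payload)           # speculatively copy the object
    if obj.forward.compare_and_set(None, candidate):
        return candidate                   # we installed the forwarding pointer
    # Lost the race: drop our copy (a real collector would retract the
    # allocation) and use the copy installed by the winner.
    return obj.forward.get()


def race(n_threads=4):
    obj = Obj("data")
    results = [None] * n_threads

    def worker(i):
        results[i] = copy_speculatively(obj)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj, results


obj, results = race()
print(all(r is results[0] for r in results))
```

Whichever thread wins, every thread ends up with the same copy, so references updated by different threads agree.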
  8. Parallel Copying

    Memory locality through dominant-thread tracing

    [Figure 14.4: Dominant-thread tracing. Threads 1 to 3, coloured black, grey and white respectively, have traced a graph of objects. Each object is coloured to indicate the processor to which it will be copied. The first field of each object is its header. Thread T0 was the last to lock object X.]

    Figure Source: http://gchandbook.org/figures.html
  9. Parallel Copying

    Breadth-first copying separates parents from children

    Depth-first copying is expected to yield better locality
    ▪ Requires an auxiliary stack
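To make the locality difference concrete, the sketch below computes the order in which objects would be laid out in to-space under each traversal. The function `copy_order` and the small tree are assumptions for illustration; the explicit `deque` used as a stack is the auxiliary stack that depth-first copying requires.

```python
from collections import deque


def copy_order(graph, root, depth_first):
    """Return the order in which objects are copied (their to-space layout)."""
    order = []
    seen = {root}
    work = deque([root])
    while work:
        # Depth-first pops the newest entry (stack), breadth-first the oldest (queue).
        obj = work.pop() if depth_first else work.popleft()
        order.append(obj)
        children = graph[obj]
        # For the stack, push children reversed so the first child is copied first.
        for child in (reversed(children) if depth_first else children):
            if child not in seen:
                seen.add(child)
                work.append(child)
    return order


# Hypothetical tree: root has children a and b, each with two children.
tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": [], "a2": [], "b1": [], "b2": [],
}

bfs = copy_order(tree, "root", depth_first=False)
dfs = copy_order(tree, "root", depth_first=True)
print(bfs)  # ['root', 'a', 'b', 'a1', 'a2', 'b1', 'b2']
print(dfs)  # ['root', 'a', 'a1', 'a2', 'b', 'b1', 'b2']
```

In the breadth-first layout `b` sits three slots away from its first child `b1`; in the depth-first layout they are adjacent.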
  10. Parallel Compaction

    [Figure 14.8: Flood et al [2001] divide the heap into one region per thread and alternate the direction in which compacting threads slide live objects (shown in grey). The figure shows the heap, split into regions 0 to 3, before and after compaction.]

    Figure Source: http://gchandbook.org/figures.html
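The alternating-direction idea from the caption can be sketched on a toy heap. This is a simplified illustration, not the algorithm of Flood et al.: the heap is a list of regions, each a list of cells with `None` marking free space, and the choice that even-numbered regions slide towards low addresses is an assumption. `compact_regions` and `longest_free_run` are hypothetical helpers.

```python
def compact_regions(heap):
    """Slide live cells within each region, alternating the direction."""
    compacted = []
    for i, region in enumerate(heap):
        live = [cell for cell in region if cell is not None]
        free = [None] * (len(region) - len(live))
        # Even regions slide left, odd regions slide right, so the free
        # space of neighbouring regions becomes contiguous.
        compacted.append(live + free if i % 2 == 0 else free + live)
    return compacted


def longest_free_run(heap):
    """Length of the longest run of free cells across region boundaries."""
    best = run = 0
    for region in heap:
        for cell in region:
            run = run + 1 if cell is None else 0
            best = max(best, run)
    return best


heap = [
    ["a", None, "b", None],
    [None, "c", None, "d"],
    ["e", None, None, "f"],
    [None, None, "g", None],
]
after = compact_regions(heap)
print(longest_free_run(heap), longest_free_run(after))  # 2 5
```

Each region can be compacted by its own thread without touching the others, and the relative order of live objects inside a region is preserved.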
  11. Parallel Compaction

    [Figure 14.9: Inter-block compaction. Rather than sliding object by object, Abuaiadh et al [2004] slide only complete blocks: free space within each block is not squeezed out. The figure shows the heap, split into blocks 0 to 3, before and after compaction.]

    Figure Source: http://gchandbook.org/figures.html
  12. Concurrent GC

    [Figure panels, plotted against time: (a) Incremental uniprocessor collection; (b) Incremental multiprocessor collection; (c) Parallel incremental collection; (d) Mostly-concurrent collection.]

    Figure Source: http://gchandbook.org/figures.html
  13. Concurrent GC

    [Figure 15.1: Incremental and concurrent garbage collection. Each bar represents an execution on a single processor. The coloured regions represent different garbage collection cycles. Panels: (c) Parallel incremental collection; (d) Mostly-concurrent collection; (e) Mostly-concurrent incremental collection; (f) On-the-fly collection; (g) On-the-fly incremental collection.]

    Figure Source: http://gchandbook.org/figures.html
  14. Concurrent GC: Issue 1

    [Fig. 3: Erroneous collection of live object Z via deletion of direct pointer a from object X.]

    D1: Mutator stores pointer b into scanned object Y
    D2: Mutator removes pointer a from unscanned object X
    D3: Collector scans object X
    D4: Collector incorrectly frees object Z

    Figure Source: Martin T. Vechev, David F. Bacon, Perry Cheng, and David Grove, https://link.springer.com/chapter/10.1007%2F11531142_25
  15. Concurrent GC: Issue 2

    [Fig. 4: Erroneous collection of live object S via deletion of pointer c from object Q which transitively reaches S through R.]

    T1: Mutator stores pointer e into scanned object P
    T2: Mutator removes pointer c from unscanned object Q
    T3: Collector scans object Q
    T4: Collector incorrectly frees object S

    Figure Source: Derivation and Evaluation of Concurrent Collectors, https://link.springer.com/chapter/10.1007%2F11531142_25
  16. Concurrent GC: Losing a live object

    1. The mutator stores a pointer to a white object into a black object
    2. All paths from any grey objects to that white object are destroyed
  17. Concurrent GC: Losing a live object

    1. The mutator stores a pointer to a white object into a black object
    2. All paths from any grey objects to that white object are destroyed

    Solution
    Don’t allow pointers from black objects to white objects?
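The two conditions can be checked on the three-object scenario from the "Issue 1" slide (roots pointing to X and Y, X pointing to Z). The sketch below is a hand-rolled tricolour model; the dictionaries and the helper `reachable_from` are assumptions made for illustration, not part of any collector.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


def reachable_from(edges, starts):
    """Set of objects reachable from starts by following edges."""
    seen, stack = set(), list(starts)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(edges[obj])
    return seen


# State after the collector has scanned Y but not yet X.
edges = {"X": ["Z"], "Y": [], "Z": []}
roots = ["X", "Y"]
color = {"X": GREY, "Y": BLACK, "Z": WHITE}

edges["Y"].append("Z")  # condition 1: white Z is stored into black Y
edges["X"].remove("Z")  # condition 2: the only grey-to-Z path is destroyed

black_to_white = [
    (src, dst)
    for src, targets in edges.items()
    for dst in targets
    if color[src] == BLACK and color[dst] == WHITE
]
greys = [obj for obj, c in color.items() if c == GREY]
z_hidden = "Z" not in reachable_from(edges, greys)

# The collector finishes: it scans X, finds no children, and blackens it.
color["X"] = BLACK
freed = {obj for obj, c in color.items() if c == WHITE}
live = reachable_from(edges, roots)
lost = freed & live

print(black_to_white, z_hidden, lost)  # [('Y', 'Z')] True {'Z'}
```

Z is still reachable from the roots through Y, yet it stays white and is reclaimed, which is why at least one of the two conditions has to be prevented.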
  18. Concurrent GC: Mutator’s Coloring

    White: Mutator’s roots have not been scanned
    Grey: Mutator’s roots need to be (re-)scanned
    Black: Mutator’s roots have been scanned
  19. Concurrent GC: Coloring of new Objects

    White and grey mutators can allocate white objects
    Black mutators can only allocate black objects
  20. Concurrent GC: Allow black-to-white pointers

    Barrier-based approaches:
    ▪ Change color of white object at assignment
    ▪ Change color of black object to grey
    ▪ Scan object before assignment

    Snapshot-at-the-beginning:
    ▪ Scan mutators at the beginning of the GC
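The first two barrier-based approaches can be sketched as a write barrier on pointer stores. This is a toy model under assumptions: the `Heap` class, its method names and the barrier labels `"shade_target"` and `"regrey_source"` are hypothetical, and the third approach and snapshot-at-the-beginning are not modelled. The scenario replayed is the X, Y, Z example from the "Issue 1" slide.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Heap:
    def __init__(self, edges, roots, barrier):
        self.edges = {obj: list(targets) for obj, targets in edges.items()}
        self.color = {obj: WHITE for obj in edges}
        self.grey = []          # collector's worklist
        self.barrier = barrier  # "none", "shade_target" or "regrey_source"
        for root in roots:
            self.shade(root)

    def shade(self, obj):
        if self.color[obj] == WHITE:
            self.color[obj] = GREY
            self.grey.append(obj)

    def write(self, src, dst):
        """Mutator stores a pointer to dst into src, then runs the barrier."""
        self.edges[src].append(dst)
        if self.barrier == "shade_target":
            self.shade(dst)  # change colour of white object at assignment
        elif self.barrier == "regrey_source":
            if self.color[src] == BLACK and self.color[dst] == WHITE:
                self.color[src] = GREY  # change colour of black object to grey
                self.grey.append(src)

    def delete(self, src, dst):
        self.edges[src].remove(dst)

    def scan(self, obj):
        """Collector step: shade the children of a grey object, then blacken it."""
        self.grey.remove(obj)
        for child in self.edges[obj]:
            self.shade(child)
        self.color[obj] = BLACK

    def finish(self):
        while self.grey:
            self.scan(self.grey[-1])
        return {obj for obj, c in self.color.items() if c == WHITE}


def replay(barrier):
    heap = Heap({"X": ["Z"], "Y": [], "Z": []}, roots=["X", "Y"], barrier=barrier)
    heap.scan("Y")         # collector has scanned Y, X is still grey
    heap.write("Y", "Z")   # D1: store pointer into scanned object Y
    heap.delete("X", "Z")  # D2: remove pointer from unscanned object X
    heap.scan("X")         # D3: collector scans X
    return heap.finish()   # objects the collector would free


freed = {b: replay(b) for b in ("none", "shade_target", "regrey_source")}
print(freed)
```

Without a barrier the live object Z is freed; with either barrier nothing is freed, at the cost of extra work on every pointer store.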