Multicore OCaml GC

In a mostly functional language like OCaml, it is desirable to have each domain (our unit of parallelism) collect its own local garbage independently. Given that OCaml is commonly used for writing latency-sensitive code such as trading systems, UIs and networked unikernels, it is also desirable to minimise the stop-the-world phases in the GC. The difficulty, though obvious, is making this work in the presence of mutations. In this talk, we will present the overall design of the Multicore OCaml GC and also take a deep dive into a few of the interesting techniques that make it work.

KC Sivaramakrishnan

June 30, 2017

Transcript

  1. Multicore OCaml GC
    KC Sivaramakrishnan, Stephen Dolan
    OCaml Labs
    University of Cambridge

  5. • Adds native support for concurrency and parallelism in OCaml
    • Fibers for concurrency, Domains for parallelism
    ✦ M fibers over N domains
    ✦ M ≫ N (many more fibers than domains)
    • This talk
    ✦ Overview of multicore GC with a few deep dives.
    Multicore OCaml

  7. Outline
    • Difficult to appreciate GC choices in isolation
    • Begin with a GC for a purely functional language
    ✦ Gradually add mutations, parallelism and concurrency

  12. Purely functional
    • Stop-the-world mark and sweep
    • Tri-color marking
    ✦ States: White (Unmarked), Grey (Marking), Black (Marked)
    ✦ Roots = registers + stack
    • White → Grey (mark stack) → Black
    • Mark stack is empty => done
    ✦ White object = garbage
    (Diagram: registers and stack point into a heap of objects A to E; a mark stack holds the grey objects.)

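The marking loop on this slide can be sketched in OCaml as follows. This is a toy model, not runtime code: the heap is assumed to be an array of objects whose fields are indices of other objects, colours live in a separate array, and the mark stack is the standard library's `Stack`.

```ocaml
type color = White | Grey | Black

(* Toy heap: fields.(i) lists the indices of the objects that object i points to. *)
type heap = {
  fields : int list array;
  color : color array;
}

let make_heap fields = { fields; color = Array.make (Array.length fields) White }

(* Shade: White -> Grey, pushing the object on the mark stack. *)
let shade heap stack i =
  if heap.color.(i) = White then begin
    heap.color.(i) <- Grey;
    Stack.push i stack
  end

(* Stop-the-world mark: shade the roots, then drain the mark stack.
   Popping an object and shading its children turns it Black. *)
let mark heap roots =
  let stack = Stack.create () in
  List.iter (shade heap stack) roots;
  while not (Stack.is_empty stack) do
    let i = Stack.pop stack in
    List.iter (shade heap stack) heap.fields.(i);
    heap.color.(i) <- Black
  done

(* Sweep: objects still White are garbage; survivors are reset to White
   for the next cycle. Returns the garbage indices in ascending order. *)
let sweep heap =
  let garbage = ref [] in
  for i = Array.length heap.color - 1 downto 0 do
    if heap.color.(i) = White then garbage := i :: !garbage
    else heap.color.(i) <- White
  done;
  !garbage
```

With A=0 → C=2, B=1 → D=3 and E=4 → C=2, marking from roots A and B leaves only E white, so `sweep` returns `[4]`.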

  15. Purely functional
    • Pros
    ✦ Simple
    ✦ Can perform the GC incrementally
    ✤ … | mutator | mark | mutator | mark | mutator | sweep | …
    • Cons
    ✦ Need to maintain a free list of objects => allocation overheads + fragmentation
    (Diagram: the heap of objects A to E with the mark stack.)

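To make the free-list cost concrete, here is a minimal first-fit allocator over a list of holes. First-fit and the `hole` record are assumptions of this sketch, not something the slide prescribes. Every allocation walks the list, and after a sweep the free space can be split into holes too small to be useful.

```ocaml
(* A free hole of [size] words starting at address [start], kept in address order. *)
type hole = { start : int; size : int }

(* First-fit: take the first hole that is large enough and split it.
   Returns the allocated address (if any) and the updated free list. *)
let alloc (free_list : hole list) (n : int) : int option * hole list =
  let rec go acc = function
    | [] -> (None, List.rev acc)
    | h :: rest when h.size >= n ->
      let remaining =
        if h.size = n then rest
        else { start = h.start + n; size = h.size - n } :: rest
      in
      (Some h.start, List.rev_append acc remaining)
    | h :: rest -> go (h :: acc) rest
  in
  go [] free_list

(* Total number of free words across all holes. *)
let total_free free_list = List.fold_left (fun s h -> s + h.size) 0 free_list
```

For example, three 2-word holes give six free words, yet a 3-word request fails: that is fragmentation.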

  21. Generational GC
    • Generational Hypothesis
    ✦ Young objects are much more likely to die than old objects
    (Diagram: registers and stack point into the minor heap, allocated by bumping a frontier, and into the major heap.)
    • Minor heap collected by copying collection
    ✦ Survivors promoted to major heap
    • Roots are registers and stack
    ✦ purely functional => no pointers from major to minor

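A minimal sketch of the minor heap under the slide's purely functional assumption, where registers and stack are the only roots. The value representation (`Int`, `Minor`, `Major`), the tiny 8-slot minor heap and the `Hashtbl`-backed major heap are illustrative assumptions. Allocation bumps the frontier downwards, as OCaml's minor heap does, and the copying collection promotes everything reachable from the roots, using a forwarding table so a shared object is copied only once.

```ocaml
(* Minor i: slot i of the minor heap; Major j: object j of the major heap. *)
type value = Int of int | Minor of int | Major of int

let minor_size = 8
let minor : value array array = Array.make minor_size [||]
let young_ptr = ref minor_size                 (* frontier; allocation goes downwards *)
let major : (int, value array) Hashtbl.t = Hashtbl.create 16
let next_major = ref 0

exception Minor_heap_full

(* Bump allocation: move the frontier down by one slot and store the object. *)
let alloc fields =
  if !young_ptr = 0 then raise Minor_heap_full;
  decr young_ptr;
  minor.(!young_ptr) <- fields;
  Minor !young_ptr

(* Copying minor GC: promote everything reachable from the roots, update the
   roots, then reset the frontier. Returns the number of survivors. *)
let minor_gc (roots : value ref list) =
  let forward = Hashtbl.create 16 in           (* minor slot -> major index *)
  let rec promote v =
    match v with
    | Int _ | Major _ -> v
    | Minor i ->
      (match Hashtbl.find_opt forward i with
       | Some j -> Major j                      (* already copied *)
       | None ->
         let j = !next_major in
         incr next_major;
         Hashtbl.add forward i j;               (* record before recursing *)
         let fields = Array.copy minor.(i) in
         Hashtbl.add major j fields;
         Array.iteri (fun k f -> fields.(k) <- promote f) fields;
         Major j)
  in
  List.iter (fun r -> r := promote !r) roots;
  young_ptr := minor_size;                     (* the whole minor heap is free again *)
  Hashtbl.length forward
```

An object allocated but not reachable from the roots is simply never copied; resetting the frontier reclaims it at no cost.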

  25. Mutations — Minor GC
    • Old objects might point to young objects
    • Must know those pointers for minor GC
    ✦ (Naively) scan the major heap for such pointers
    • Intercept mutations with a write barrier
    (* Before r := x *)
    let write_barrier (r, x) =
      if is_major r && is_minor x then
        remembered_set.add r
    • Remembered set
    ✦ Set of major heap addresses that point to the minor heap
    ✦ Used as roots for minor collection
    ✦ Cleared after minor collection
    (Diagram: pointers from the major heap into the minor heap.)

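The write barrier and remembered set can be simulated in a few lines. In this sketch objects carry a `gen` tag and a single mutable field, and "promotion" just flips the tag instead of copying; all of this is an illustrative assumption. It shows that a young object reachable only from the major heap survives the minor GC because the remembered set is used as a root, and that the set is cleared afterwards.

```ocaml
type gen = Minor_gen | Major_gen
type obj = { mutable gen : gen; mutable field : obj option }

(* Major-heap objects whose field may point into the minor heap. *)
let remembered_set : obj list ref = ref []

let is_major o = o.gen = Major_gen
let is_minor o = o.gen = Minor_gen

(* Run before r.field <- Some x. *)
let write_barrier r x =
  if is_major r && is_minor x then remembered_set := r :: !remembered_set

let assign r x =
  write_barrier r x;
  r.field <- Some x

(* Minor GC: promote everything reachable from the roots and from the
   fields recorded in the remembered set, then clear the set. *)
let minor_gc roots =
  let rec promote o =
    if is_minor o then begin
      o.gen <- Major_gen;
      Option.iter promote o.field
    end
  in
  List.iter promote roots;
  List.iter (fun r -> Option.iter promote r.field) !remembered_set;
  remembered_set := []
```

With no stack or register roots at all, `assign old young` followed by `minor_gc []` still promotes `young`.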

  40. Mutations — Major GC
    • Mutations are problematic if both conditions hold
    1. There exists a Black → White pointer
    2. All Grey → White* → White paths to that white object are deleted
    • Insertion/Dijkstra/Incremental barrier prevents 1
    • Deletion/Yuasa/snapshot-at-beginning barrier prevents 2
    (Diagrams: objects A, B and C before and after each mutation, without a barrier and with each kind of barrier.)
    (* Before r := x *)
    let write_barrier (r, x) =
      if is_major r && is_minor x then
        remembered_set.add r
      else if is_major r && is_major x then
        mark (!r)

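The two barrier families can be compared on the scenario the slide describes: a black object A acquires a pointer to a white C (condition 1), then the only grey-to-white path, from B, is deleted (condition 2). The single-field objects, the explicit mark stack and the `barrier` switch are assumptions for illustration. Note that this Yuasa variant shades the overwritten value on every store, whereas the slide's barrier only does so for major-to-major stores; a real runtime would also need to restrict it, for example to fields in the major heap while marking is in progress.

```ocaml
type color = White | Grey | Black
type obj = { mutable color : color; mutable field : obj option }

let mark_stack : obj Stack.t = Stack.create ()

let shade o =
  if o.color = White then begin
    o.color <- Grey;
    Stack.push o mark_stack
  end

(* Finish marking by draining the mark stack. *)
let drain () =
  while not (Stack.is_empty mark_stack) do
    let o = Stack.pop mark_stack in
    Option.iter shade o.field;
    o.color <- Black
  done

type barrier = No_barrier | Dijkstra | Yuasa

(* r.field <- v, preceded by the chosen barrier. *)
let store barrier r v =
  (match barrier with
   | No_barrier -> ()
   | Dijkstra -> Option.iter shade v          (* insertion: shade the new target *)
   | Yuasa -> Option.iter shade r.field);     (* deletion: shade the old target *)
  r.field <- v

(* Mid-marking: A is Black, B is Grey and points to White C.
   Returns C's colour once marking has finished. *)
let scenario barrier =
  Stack.clear mark_stack;
  let c = { color = White; field = None } in
  let b = { color = Grey; field = Some c } in
  let a = { color = Black; field = None } in
  Stack.push b mark_stack;
  store barrier a (Some c);   (* condition 1: Black -> White *)
  store barrier b None;       (* condition 2: the Grey -> White path is deleted *)
  drain ();
  c.color
```

Without a barrier, C ends up white although A still points to it; either barrier keeps C alive.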

  47. Parallelism — Minor GC
    • Domain.spawn : (unit -> unit) -> unit
    • Collect each domain’s young garbage independently?
    • Invariant: Minor heap objects are only accessed by owning domain
    • Doligez-Leroy POPL’93
    ✦ No pointers between minor heaps
    ✦ No pointers from major to minor heaps
    • Before r := x, if is_major(r) && is_minor(x), then promote(x).
    • Too much promotion. Ex: work-stealing queue (every task pushed onto the shared queue gets promoted)
    (Diagram: one shared major heap; per-domain minor heaps for domain 0 … domain n.)

  50. Parallelism — Minor GC
    (Diagram: one shared major heap; per-domain minor heaps for domain 0 … domain n.)
    • Weaker invariant
    ✦ No pointers between minor heaps
    ✦ Objects in a foreign minor heap are not accessed directly
    • Read barrier. If the value loaded is
    ✦ an integer, an object in the shared heap or an object in the domain's own minor heap => continue
    ✦ an object in a foreign minor heap => read fault (interrupt the owning domain + promote)

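The read barrier's decision can be sketched as a function run on every load. The three-constructor value type, the `~self` domain parameter and the hypothetical `promote_on_owner` (which here allocates a shared address synchronously instead of interrupting the owning domain) are all assumptions. Writing the promoted address back into the field stands in for the owner forwarding pointers to the promoted object, so a second load of the same field takes the fast path.

```ocaml
type value =
  | Int of int
  | Shared of int              (* object in the shared major heap *)
  | Minor of int * int         (* (owning domain, slot in its minor heap) *)

let next_shared = ref 0
let read_faults = ref 0

(* Hypothetical stand-in for "interrupt the owner, which promotes the object". *)
let promote_on_owner (_owner : int) (_slot : int) : value =
  let a = !next_shared in
  incr next_shared;
  Shared a

(* Read barrier executed by domain [self] when loading field [i] of [obj]. *)
let read_field ~self (obj : value array) i =
  match obj.(i) with
  | Int _ | Shared _ -> obj.(i)                 (* fast path *)
  | Minor (d, _) when d = self -> obj.(i)       (* own minor heap: fast path *)
  | Minor (owner, slot) ->                      (* foreign minor heap: read fault *)
    incr read_faults;
    let v = promote_on_owner owner slot in
    obj.(i) <- v;                               (* stands in for pointer forwarding *)
    v
```

Under the weaker invariant, only fields of shared-heap objects can point into a foreign minor heap, which is why `obj` here is assumed to live in the shared heap.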

  56. Efficient read barrier check
    • Given x, is x (1) an integer, (2) in the shared heap, or (3) in the current domain's own minor heap?
    • Careful VM mapping + bit-twiddling
    • Example: 16-bit address space, 0xPQRS
    ✦ Minor area 0x4200 to 0x42ff
    ✦ Domain 0 : 0x4220 to 0x422f
    ✦ Domain 1 : 0x4250 to 0x425f
    ✦ Domain 2 : 0x42a0 to 0x42af
    • Integer: low_bit(S) = 1; minor heap: PQ = 0x42; R determines the domain
    • Compare x with some y known to lie within the current domain's minor heap => the allocation pointer!
    ✦ On amd64, the allocation pointer is in the r15 register
    (Diagram: the minor area 0x4200 to 0x42ff, containing the minor heaps of domains 0, 1 and 2.)

  59. Efficient read barrier check
    # %rax holds x (value of interest), %r15 the allocation pointer
    xor %r15, %rax
    sub $0x0010, %rax
    test $0xff01, %rax
    # Any bit set => ZF not set => not foreign minor
    Integer:
    # low_bit(%rax) = 1
    xor %r15, %rax
    # low_bit(%rax) = 1
    sub $0x0010, %rax
    # low_bit(%rax) = 1
    test $0xff01, %rax
    # ZF not set
    Shared heap:
    # PQ(%r15) != PQ(%rax)
    xor %r15, %rax
    # PQ(%rax) is non-zero
    sub $0x0010, %rax
    # PQ(%rax) is non-zero
    test $0xff01, %rax
    # ZF not set

  62. Efficient read barrier check
    # %rax holds x (value of interest), %r15 the allocation pointer
    xor %r15, %rax
    sub $0x0010, %rax
    test $0xff01, %rax
    # Any bit set => ZF not set => not foreign minor
    Own minor heap:
    # PQR(%r15) = PQR(%rax)
    xor %r15, %rax
    # PQR(%rax) is zero
    sub $0x0010, %rax
    # the borrow makes PQ(%rax) non-zero (0xff)
    test $0xff01, %rax
    # ZF not set
    Foreign minor heap:
    # PQ(%r15) = PQ(%rax)
    # low_bit(S(%r15)) = low_bit(S(%rax)) = 0
    # R(%r15) != R(%rax)
    xor %r15, %rax
    # R(%rax) is non-zero; PQ and low bit are 0
    sub $0x0010, %rax
    # no borrow: PQ and low bit are still 0
    test $0xff01, %rax
    # ZF set

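The same check can be written with OCaml integers on the slide's 16-bit layout, next to a slow range-based classification to compare against. `alloc_ptr` plays the role of `%r15` and is assumed to be even (word-aligned); the constants and the `classify` helper come from the toy example, not from the runtime's real address layout. One caveat follows from the arithmetic: an address such as 0x4320 (PQ = 0x43, same R as an allocation pointer like 0x4228) also yields zero, so the check relies on the VM mapping keeping such addresses out of the heap.

```ocaml
(* Mirrors: xor %r15, %rax; sub $0x0010, %rax; test $0xff01, %rax.
   Returns true exactly when ZF would be set, i.e. "foreign minor heap".
   OCaml ints are two's complement, so a borrow past PQ sets those bits too. *)
let is_foreign_minor ~alloc_ptr x =
  ((x lxor alloc_ptr) - 0x0010) land 0xff01 = 0

let minor_area_lo = 0x4200 and minor_area_hi = 0x42ff

(* Slow reference version for the toy layout: R (bits 4..7) names the domain. *)
let classify ~self_r x =
  if x land 1 = 1 then `Integer
  else if x < minor_area_lo || x > minor_area_hi then `Shared
  else if (x lsr 4) land 0xf = self_r then `Own_minor
  else `Foreign_minor
```

For domain 0 (R = 2) with the allocation pointer at 0x4228, both functions agree on an integer, a shared-heap address, an own-minor address and addresses in domains 1 and 2.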

  67. • How do you promote objects to the major heap on a read fault?
    • Several alternatives
    1. Copy the object to the major heap.
    ✤ Mutable objects, Abstract_tag, …
    2. Move the object's transitive closure + minor GC.
    ✤ False promotions, latency, …
    3. Move the object's transitive closure + scan the minor heap
    ✤ Need to examine all objects in the minor heap
    • Hypothesis: most objects promoted on read faults are young.
    ✦ 95% of promoted objects are among the youngest 5%
    • Combine 2 & 3
    Promotion

  71. • If the promoted object is among the youngest x%,
    ✦ move + fix pointers to the promoted object
    ❖ Scan roots = registers + current stack + remembered set
    ❖ Younger minor objects
    ❖ Older minor objects referring to younger objects (mutations!)
    • Otherwise, move + minor GC
    Promotion
    (* Before r := x *)
    let write_barrier (r, x) =
      if is_major r && is_minor x then
        remembered_set.add r
      else if is_major r && is_major x then
        mark (!r)
      (* the minor heap is allocated downwards, so addr r > addr x means r is older than x *)
      else if is_minor r && is_minor x && addr r > addr x then
        promotion_set.add r

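The fast-path decision and the minor-to-minor case of the barrier can be sketched with plain integer addresses. The `percent` threshold stands for the slide's x% and, like the helper names, is an assumption. The sketch assumes downward allocation, so the live minor heap is [alloc_ptr, minor_end) and smaller addresses are younger. `objects_to_scan` lists the minor objects the fast path must examine besides the roots: every object younger than the promoted one, plus older objects the barrier recorded in the promotion set.

```ocaml
type path = Fast_move_and_forward | Slow_move_and_minor_gc

(* Is the object at [addr] among the youngest [percent]% of the live minor heap? *)
let promotion_path ~percent ~alloc_ptr ~minor_end addr =
  let used = minor_end - alloc_ptr in
  if (addr - alloc_ptr) * 100 <= percent * used then Fast_move_and_forward
  else Slow_move_and_minor_gc

let promotion_set : int list ref = ref []

(* Minor-to-minor case of the barrier (caller has checked both are minor):
   an older r (higher address) now points to a younger x. *)
let minor_write_barrier ~r ~x =
  if r > x then promotion_set := r :: !promotion_set

(* Minor objects the fast path scans for pointers to the promoted object at
   [addr], in addition to registers, current stack and remembered set. *)
let objects_to_scan ~objects addr =
  let younger = List.filter (fun a -> a < addr) objects in
  let older_recorded = List.filter (fun a -> a > addr) !promotion_set in
  younger @ older_recorded
```

If most promoted objects really are very young, `younger` stays short and the fast path avoids a full minor GC.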

  82. Parallelism — Major GC
    • OCaml’s major GC is incremental; with parallelism it also needs to be concurrent
    • Design based on VCGC from the Inferno project (ISMM’98)
    ✦ Allows mutator, marker and sweeper threads to run concurrently
    • Multicore OCaml’s major GC is MCGC
    ✦ States: Unmarked, Marked, Garbage, Free
    ✦ Domains alternate between mutator and GC thread
    ✦ GC thread (state diagram)
    ✦ Marking is racy but idempotent
    • Stop-the-world
    (Diagrams: the Unmarked / Marked / Garbage / Free state cycle, shown for the GC thread and for the stop-the-world phase.)

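The state cycle can be sketched as follows, assuming a VCGC-style scheme: marking turns Unmarked into Marked, sweeping turns Garbage into Free, new objects are assumed to be allocated Marked, and the stop-the-world phase touches no object but rotates what the colours mean (Marked becomes Unmarked, Unmarked becomes Garbage, and the now-unused Garbage colour is reused as Marked). The integer colours and helper names are assumptions. Because `mark` only ever writes the current Marked colour, two domains racing to mark the same object write the same value.

```ocaml
type state = Unmarked | Marked | Garbage | Free

(* Which physical colour currently means what; Free is fixed. *)
type colours = { mutable unmarked : int; mutable marked : int; mutable garbage : int }

let colours = { unmarked = 0; marked = 1; garbage = 2 }
let free = 3

let state_of c =
  if c = free then Free
  else if c = colours.marked then Marked
  else if c = colours.unmarked then Unmarked
  else Garbage

(* Marker: Unmarked -> Marked. Racy but idempotent. *)
let mark heap i = if heap.(i) = colours.unmarked then heap.(i) <- colours.marked

(* Sweeper: Garbage -> Free. *)
let sweep heap = Array.iteri (fun i c -> if c = colours.garbage then heap.(i) <- free) heap

(* Stop-the-world at the end of a cycle: rotate the meaning of the colours. *)
let cycle () =
  let old_unmarked = colours.unmarked and old_garbage = colours.garbage in
  colours.unmarked <- colours.marked;
  colours.garbage <- old_unmarked;
  colours.marked <- old_garbage
```

An object that is not marked during a whole cycle becomes Garbage at the next rotation without any per-object work.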

  85. • Fibers = stack segment on heap
    Concurrency — Minor GC
    (Diagram: domain x's minor heap and the major heap, with the registers, the current stack, the remembered set and the remembered fiber set.)
    • Remembered fiber set
    ✦ Set of fibers in the major heap that were run in the current cycle of domain x
    ✦ Cleared after minor GC

  89. • Fibers transitively reachable from a promoted object are not promoted automatically
    ✦ Avoids false promotions
    ✦ Promote a fiber when it is continued on a foreign domain
    Concurrency — Promotions
    (Diagram: r and x in the major heap; fiber f and z in domain 0's minor heap, with the remembered set; continue f v executed on domain 1.)

  93. • Recall, promotion fast path = move + scan and forward
    ✦ Do not scan the remembered fiber set on every promotion
    ✤ Context switches ≪ promotions
    • Scan lazily before a context switch
    ✦ Only once per fiber per promotion
    Concurrency — Promotions

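Lazy scanning can be sketched with an epoch counter: each fast-path promotion bumps a global epoch, and a fiber is scanned before it is switched to only if a promotion happened since its last scan. The epoch mechanism, the record layout and the hypothetical `scan_and_forward` are assumptions of this sketch.

```ocaml
(* Epoch of the last promotion this fiber's stack was scanned for. *)
type fiber = { mutable scanned_at : int }

let promotion_epoch = ref 0
let scans = ref 0

(* Hypothetical: scan the fiber's stack and forward pointers to promoted objects. *)
let scan_and_forward (_ : fiber) = incr scans

(* Fast path: move + scan and forward roots, but not the remembered fiber set. *)
let promote_fast_path () = incr promotion_epoch

(* Before switching to f: scan it only if it is stale. *)
let before_switch f =
  if f.scanned_at < !promotion_epoch then begin
    scan_and_forward f;
    f.scanned_at <- !promotion_epoch
  end
```

Several promotions between two switches to the same fiber then cost a single scan of that fiber.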

  98. • (Multicore) OCaml uses a deletion barrier
    • Fiber stack pop is a deletion
    ✦ Before switching to an unmarked fiber, finish marking that fiber
    • Marking is racy but idempotent
    ✦ But a race between the mutator (context switch) and the GC (marking the fiber) is unsafe
    Concurrency — Major GC
    (Diagram: fiber states Unmarked → Marking → Marked.)

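One way to resolve the mutator/GC race on fibers is a compare-and-set on the fiber's state, sketched here with the standard `Atomic` module (OCaml 4.12 or later). The three-state `fiber` record, `mark_fiber_stack` and `switch_to` are hypothetical names, and this is not necessarily the runtime's actual protocol. Whoever wins the Unmarked → Marking transition scans the stack; anyone else, including a mutator about to switch to the fiber, waits until it is Marked.

```ocaml
type fiber_state = Unmarked | Marking | Marked

type fiber = { state : fiber_state Atomic.t; stack : int list }

(* Hypothetical: mark every object referenced from the fiber's stack. *)
let mark_fiber_stack marked f =
  List.iter (fun o -> Hashtbl.replace marked o ()) f.stack

(* Called by GC threads and by the mutator before a context switch. *)
let ensure_marked marked f =
  if Atomic.compare_and_set f.state Unmarked Marking then begin
    mark_fiber_stack marked f;
    Atomic.set f.state Marked
  end else
    (* someone else is marking (or has marked) it: wait until it is done *)
    while Atomic.get f.state <> Marked do () done

(* Switching away pops the current stack: a deletion, so mark f first. *)
let switch_to ~marking_in_progress marked f =
  if marking_in_progress then ensure_marked marked f
  (* ...then actually resume f, which is not modelled here *)
```

Because the CAS admits exactly one winner, a fiber's stack is scanned at most once per cycle even when the GC and the mutator reach it at the same time.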

  99. Summary
    • Multicore OCaml GC
    ✦ Optimize for latency
    ✦ Independent minor GCs + mostly-concurrent mark-and-sweep
                    Mutations          Concurrency        Parallelism
    Minor GC        rem set            rem fiber set      local heaps
    Promotions      o2y rem set        lazy scanning      read faults
    Major GC        deletion barrier   mark & switch      MCGC

  100. Questions?

  101. Backup Slides

  107. Purely functional GC
    (Diagram: registers and stack as roots; heap from 0x0000 to 0xffff with the allocation frontier.)
    • Stop-the-world mark and sweep
    • 2-pass mark compact
    ✦ Fast allocations by bumping the frontier
    • All heap pointers go right

  110. Purely functional GC
    (Diagram: registers and stack as roots; heap from 0x0000 to 0xffff with the frontier.)
    • Mark roots
    • Scan from frontier to start. For each marked object,
    ✦ mark reachable objects & reverse pointers
    • Scan from start to frontier. For each marked object,
    ✦ copy it to the next available free space & reverse pointers pointing left

  113. Purely functional GC
    (Diagram: registers and stack; heap from 0x0000 to 0xffff with the frontier.)
    • Pros
    ✦ Simple & fast allocation
    ✦ Efficient use of space
    • Cons
    ✦ Need to touch all the objects on the heap
    ✦ Compaction by default leads to long pause times
