Bounding Data Races in Space and Time

Multicore OCaml Memory Model

KC Sivaramakrishnan

February 26, 2018

Transcript

  1. Bounding Data Races in Space and Time
    KC Sivaramakrishnan
    University of Cambridge
    OCaml Labs
    Darwin College, Cambridge
    1851 Royal Commission

  5. Multicore OCaml
    • OCaml is an industrial-strength, functional programming language
    ★ Projects: MirageOS unikernel, Coq proof assistant, F* programming language
    ★ Companies: Facebook (Hack, Flow, Infer, Reason), Microsoft (Everest, F*), Jane Street (all trading & support systems), Docker (Docker for Mac & Windows), Citrix (XenStore)
    • No multicore support!
    • Multicore OCaml
    ★ Native support for concurrency and parallelism in OCaml
    ★ Led by OCaml Labs, with Jane Street, Microsoft Research and INRIA

  13. Modelling Memory
    • How do you reason about access to memory?
    ★ Spoiler: there is no single, global, sequentially consistent memory
    • Modern multicore processors reorder instructions for performance
    Initially a = 0 && b = 0
    Thread 1        Thread 2
    a = 1           b = 1
    r1 = b          r2 = a
    Can we end with r1 == 0 && r2 == 0?
    Allowed under x86, ARM and POWER, due to write buffering
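
The store-buffering test above can be tried in Java, the language of the deck's later examples. This is a sketch of our own, not code from the talk, and the class and method names are hypothetical. Declaring a and b `volatile` makes the Java memory model forbid r1 == 0 && r2 == 0; with plain `int` fields the outcome is permitted, though no particular JVM or CPU is guaranteed to show it.

```java
// Store-buffering litmus test (a sketch, not the talk's code).
// volatile a, b: the Java memory model forbids r1 == 0 && r2 == 0.
// Plain int a, b: the outcome is permitted (write buffering on real hardware).
public class StoreBuffering {
    static volatile int a, b;
    static int r1, r2;

    /** Runs the test repeatedly; returns how often r1 == 0 && r2 == 0 was observed. */
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            a = 0;
            b = 0;
            Thread t1 = new Thread(() -> { a = 1; r1 = b; });  // Thread 1
            Thread t2 = new Thread(() -> { b = 1; r2 = a; });  // Thread 2
            t1.start();
            t2.start();
            try {
                t1.join();  // join() makes the threads' writes to r1, r2 visible here
                t2.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            if (r1 == 0 && r2 == 0) bothZero++;
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("r1 == 0 && r2 == 0 seen " + countBothZero(1000) + " times");
    }
}
```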

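  20. Modelling Memory
    • Compiler optimisations also reorder memory access instructions
    Initially &a == &b && a = b = 1 (a and b alias the same location)
    Thread 1 (original)    Thread 1 (after CSE)
    r1 = a * 2             r1 = a * 2
    r2 = b + 1             r2 = b + 1
    r3 = a * 2             r3 = r1
    Thread 2
    b = 0
    If Thread 2's write lands between Thread 1's first two reads:
    before CSE: r1 == 2 && r2 == 1 && r3 == 0
    after CSE:  r1 == 2 && r2 == 1 && r3 == 2
    After CSE, r2 sees the new value of the location while r3 still reflects the old one, an outcome the original program cannot produce under SC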
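
A deterministic Java sketch of the CSE example, with hypothetical names. A one-element array models the single location that a and b alias, and a `Runnable` injects Thread 2's write b = 0 right after Thread 1's first read, instead of relying on real threads to hit that interleaving.

```java
import java.util.Arrays;

// Deterministic sketch of the CSE example: a and b alias one cell, and
// Thread 2's write (b = 0) is injected right after Thread 1's first read.
public class CseAliasing {
    /** Original Thread 1: r1 = a * 2; r2 = b + 1; r3 = a * 2. */
    static int[] original(int[] a, int[] b, Runnable thread2) {
        int r1 = a[0] * 2;
        thread2.run();          // Thread 2 runs here
        int r2 = b[0] + 1;
        int r3 = a[0] * 2;      // re-reads the shared cell
        return new int[] {r1, r2, r3};
    }

    /** Thread 1 after common-subexpression elimination: r3 = r1. */
    static int[] afterCse(int[] a, int[] b, Runnable thread2) {
        int r1 = a[0] * 2;
        thread2.run();          // same interleaving
        int r2 = b[0] + 1;
        int r3 = r1;            // reuses the stale value of a * 2
        return new int[] {r1, r2, r3};
    }

    public static void main(String[] args) {
        int[] cell = {1};       // initially a = b = 1, with &a == &b
        System.out.println(Arrays.toString(original(cell, cell, () -> cell[0] = 0)));
        cell[0] = 1;            // reset for the second run
        System.out.println(Arrays.toString(afterCse(cell, cell, () -> cell[0] = 0)));
    }
}
```

Running `main` prints `[2, 1, 0]` for the original code and `[2, 1, 2]` after CSE: the second result pairs the new value of the location (seen by r2) with the old one (reused for r3).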

  23. Memory Model
    • Unambiguous specification of program outcomes
    ★ More than just thread interleavings
    • Memory Model Desiderata
    ★ Not too weak (good for programmers)
    ★ Not too strong (good for hardware)
    ★ Admits optimisations (good for compilers)
    ★ Mathematically rigorous (good for verification)
    • Difficult to get right
    ★ C/C++11 memory model is flawed
    ★ Java memory model is flawed
    ★ Several papers every year in top PL conferences proposing or fixing models
    [Diagram labels: Memory model, OCaml compiler]

  28. Memory Model: Programmer’s view
    • Data race
    ★ Two concurrent accesses to the same memory location, at least one of which is a write
    • Sequential consistency (SC)
    ★ No intra-thread reordering, only inter-thread interleaving
    • DRF-SC: the primary tool in the concurrent programmer’s arsenal
    ★ If a program has no races (under SC semantics), then the program has SC semantics
    ★ Well-synchronised programs do not have surprising behaviours
    • Our observation: DRF-SC is too weak for programmers
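
A small Java sketch of DRF-SC, our own with hypothetical names. In the synchronised run the program has no data races, so DRF-SC guarantees SC behaviour and every interleaving of the two loops yields exactly 200,000. The unsynchronised run is racy, so DRF-SC promises nothing and updates may be lost.

```java
// Sketch: the same counter run with and without a data race.
public class DrfCounter {
    private int count = 0;

    void incrementRacy() { count++; }               // unsynchronised read-modify-write: a data race
    synchronized void incrementSafe() { count++; }  // every access holds the lock: race-free

    /** Two threads each increment perThread times, with or without the lock; returns the final count. */
    static int run(int perThread, boolean synchronise) {
        DrfCounter c = new DrfCounter();
        Runnable body = () -> {
            for (int i = 0; i < perThread; i++) {
                if (synchronise) c.incrementSafe(); else c.incrementRacy();
            }
        };
        Thread t1 = new Thread(body), t2 = new Thread(body);
        t1.start();
        t2.start();
        try {
            t1.join();  // join() orders the threads' writes before this read
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return c.count;
    }

    public static void main(String[] args) {
        System.out.println("synchronised:   " + run(100_000, true));   // always 200000
        System.out.println("unsynchronised: " + run(100_000, false));  // at most 200000
    }
}
```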

  33. C/C++ Memory Model
    • The C/C++ (C11) memory model offers DRF-SC, but…
    ★ If a program has races (even benign ones), then its behaviour is undefined!
    ★ Most C/C++ programs have races => most C/C++ programs are allowed to crash and burn
    • Races on unrelated locations can affect behaviour
    ★ We would like a memory model where data races are bounded in space

  40. Java Memory Model
    • Java also offers DRF-SC
    ★ Unlike C/C++, type safety necessitates defined behaviour under races
    ★ Races are bounded in space, but not in time…
    int a;
    volatile boolean flag;
    Thread 1
      a = 1;
      flag = true;
    Thread 2
      a = 2;
      if (flag) {
        // no race here
        r1 = a;
        r2 = a;
      }
    r1 == 1 && r2 == 2 is allowed: races in the past affect the future
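
A runnable Java harness for the example above, a sketch with hypothetical names. It records the (r1, r2) pairs that Thread 2 observes after seeing `flag`. The JMM allows mixed pairs such as 1,2, but no JVM is guaranteed to produce one; what is certain is that both reads return 1 or 2, never the initial 0.

```java
import java.util.HashSet;
import java.util.Set;

// Harness for the "races in time" example (a sketch, not the talk's code).
public class RacesInTime {
    static int a;
    static volatile boolean flag;

    /** Repeats the example; returns the distinct "r1,r2" pairs seen when Thread 2 read flag == true. */
    static Set<String> observe(int iterations) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < iterations; i++) {
            a = 0;
            flag = false;
            int[] r = new int[2];
            boolean[] sawFlag = {false};
            Thread t1 = new Thread(() -> { a = 1; flag = true; });
            Thread t2 = new Thread(() -> {
                a = 2;              // races with Thread 1's a = 1
                if (flag) {
                    sawFlag[0] = true;
                    r[0] = a;       // r1
                    r[1] = a;       // r2
                }
            });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            if (sawFlag[0]) seen.add(r[0] + "," + r[1]);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe(1000));
    }
}
```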

  47. Java Memory Model
    • Future data races can affect the past
    class C { int x; }
    C g;
    Thread 1
      C c = new C();
      c.x = 42;
      r1 = c.x;
      g = c;
    Thread 2
      g.x = 7;
    Can assert (r1 == 42) fail? Yes: the read r1 = c.x may return Thread 2's 7, for example if the compiler or hardware makes g = c visible before that read
    • We would like a memory model that bounds data races in time
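
A runnable Java harness for this example, again a sketch with hypothetical names. Thread 2 null-checks `g`, which the slide omits, so it does not throw when it runs before Thread 1 publishes `c`. The JMM permits r1 == 7; a given JVM may never exhibit it, but r1 is always 42 or 7.

```java
// Harness for the "future races affect the past" example (a sketch).
public class RacesInFuture {
    static class C { int x; }
    static C g;

    /** Runs the example once and returns Thread 1's r1. */
    static int runOnce() {
        g = null;
        int[] r1 = new int[1];
        Thread t1 = new Thread(() -> {
            C c = new C();
            c.x = 42;
            r1[0] = c.x;   // the JMM allows this read to return Thread 2's 7
            g = c;         // racy publication: g is not volatile
        });
        Thread t2 = new Thread(() -> {
            C c = g;
            if (c != null) c.x = 7;   // null check added: not in the slide
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return r1[0];
    }

    public static void main(String[] args) {
        int failures = 0;
        for (int i = 0; i < 1000; i++) {
            if (runOnce() != 42) failures++;
        }
        System.out.println("assert (r1 == 42) would have failed " + failures + " times");
    }
}
```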

  52. OCaml Memory Model: Goal
    • Language memory models should specify behaviours under data races
    ★ Not because they are useful
    ★ But to limit their damage
    If I read a variable twice and there are no concurrent writes, then both reads return the same value

  53. OCaml MM: Contributions
    • Memory Model Desiderata
    ★ Not too weak (good for programmers)
    ★ Not too strong (good for hardware)
    ★ Admits optimisations (good for compilers)
    ★ Mathematically rigorous (good for verification)
    • OCaml Memory Model
    ★ Local version of DRF-SC: the key discovery
    ★ Free on x86, 0.6% overhead on ARM, 2.6% overhead on POWER
    ★ Allows most common compiler optimisations
    ★ Simple operational and axiomatic semantics, with soundness proofs for optimisations and for compilation to hardware

  62. Local DRF
    • If there are no data races
    ★ on some variables (space)
    ★ in some interval (time)
    ★ then the program has SC behaviour on those variables in that time interval
    • With Space = {all variables} and Time = the whole execution, local DRF is exactly DRF-SC
    Flag is atomic
    Thread 1          Thread 2
    msg = 1;          b = 1;
    b = 0;            if (Flag) {
    Flag = 1;           r = msg;
                      }
    Due to local DRF, the message-passing idiom still works despite the race on b: if Thread 2 sees Flag set, it reads msg == 1
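
The message-passing idiom can be written in Java too, under the assumption that a `volatile` field stands in for the atomic Flag (in OCaml it would be an `Atomic.t`). This sketch, with hypothetical names, checks only the guarantee the slide relies on: whenever Thread 2 sees the flag, it reads msg == 1. In Java that guarantee comes from the happens-before order through the volatile flag; the slide's point is that OCaml's local DRF preserves it even though b is raced on.

```java
// Message passing with a racy side variable b (a sketch).
// Assumption: volatile int flag plays the role of the slide's atomic Flag.
public class MessagePassing {
    static int msg, b;
    static volatile int flag;

    /** Runs the idiom once; returns r, or -1 if Thread 2 did not see flag == 1. */
    static int runOnce() {
        msg = 0;
        b = 0;
        flag = 0;
        int[] r = {-1};
        Thread t1 = new Thread(() -> { msg = 1; b = 0; flag = 1; });
        Thread t2 = new Thread(() -> { b = 1; if (flag == 1) r[0] = msg; });  // b = 1 races with b = 0
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return r[0];
    }

    public static void main(String[] args) {
        int delivered = 0;
        for (int i = 0; i < 1000; i++) {
            if (runOnce() == 1) delivered++;
        }
        System.out.println("message received in " + delivered + " of 1000 runs");
    }
}
```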

  66. Formal Memory Model
    • Most programmers can live with local DRF
    ★ Experts demand more (concurrency libraries, high-performance code, etc.)
    • A simple operational semantics that captures all of the allowed behaviours

  67-79. Visualising operational semantics
    [Animated figure: histories of non-atomic and atomic locations along a time axis, with Thread 1 and Thread 2]
    • Non-atomic locations a, b and c each keep a history of values over time
    ★ read(b) -> 3/4/5: a read may return any one of several values from b's history
    ★ write(c,10): the write adds 10 to c's history
    • Atomic locations A and B each hold a single value (initially A = 10, B = 5)
    ★ read(B) -> 5
    ★ write(A,20): A now holds 20
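
The figure can be turned into a toy Java model. It is a sketch built on assumptions not shown on these slides, taken from our reading of the model: each thread carries a frontier recording, for each non-atomic location, the latest write it knows of; a non-atomic read may return any history entry at or after that frontier; and an atomic location carries a frontier too, which an atomic read merges into the reader's and an atomic write exchanges with the writer's. All names and the initial histories are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the pictured operational semantics (assumptions as stated above).
public class OpSemSketch {
    /** A non-atomic location: its history maps timestamps to values (never empty). */
    static final class NonAtomic {
        final TreeMap<Integer, Integer> history = new TreeMap<>();
        NonAtomic(int... initial) {
            for (int v : initial) history.put(history.size(), v);
        }
    }

    /** An atomic location: a single current value plus a frontier. */
    static final class Atomic {
        int value;
        final Map<NonAtomic, Integer> frontier = new HashMap<>();
        Atomic(int value) { this.value = value; }
    }

    /** A thread's frontier; an unseen location defaults to its oldest timestamp. */
    static final class ThreadState {
        final Map<NonAtomic, Integer> frontier = new HashMap<>();
        int frontierOf(NonAtomic x) { return frontier.getOrDefault(x, x.history.firstKey()); }
    }

    /** Values a non-atomic read may return: every history entry at or after the thread's frontier. */
    static List<Integer> readable(ThreadState t, NonAtomic x) {
        return new ArrayList<>(x.history.tailMap(t.frontierOf(x), true).values());
    }

    /** Non-atomic write: store at a fresh timestamp after the frontier (this sketch uses the end). */
    static void write(ThreadState t, NonAtomic x, int v) {
        int ts = x.history.lastKey() + 1;
        x.history.put(ts, v);
        t.frontier.put(x, ts);
    }

    /** Pointwise maximum of two frontiers, stored into `into`. */
    static void join(Map<NonAtomic, Integer> into, Map<NonAtomic, Integer> from) {
        from.forEach((loc, ts) -> into.merge(loc, ts, Integer::max));
    }

    /** Atomic read: merge the location's frontier into the thread's, return the value. */
    static int readAtomic(ThreadState t, Atomic a) {
        join(t.frontier, a.frontier);
        return a.value;
    }

    /** Atomic write: thread and location both end up with the join of their frontiers. */
    static void writeAtomic(ThreadState t, Atomic a, int v) {
        join(t.frontier, a.frontier);
        join(a.frontier, t.frontier);
        a.value = v;
    }

    public static void main(String[] args) {
        NonAtomic b = new NonAtomic(3, 4, 5);      // hypothetical histories
        NonAtomic c = new NonAtomic(5, 6, 7);
        Atomic bigA = new Atomic(10), bigB = new Atomic(5);
        ThreadState t1 = new ThreadState(), t2 = new ThreadState();

        System.out.println(readable(t1, b));       // [3, 4, 5]: read(b) may return any of them
        write(t2, c, 10);                          // write(c,10) by Thread 2
        System.out.println(readable(t1, c));       // [5, 6, 7, 10]: Thread 1 has not synchronised
        System.out.println(readAtomic(t1, bigB));  // read(B) -> 5
        writeAtomic(t2, bigA, 20);                 // write(A,20) also publishes Thread 2's frontier
        readAtomic(t1, bigA);                      // Thread 1 synchronises through A
        System.out.println(readable(t1, c));       // [10]
    }
}
```

The last two steps of `main` show why message passing works in this model: reading the atomic A pulls Thread 2's knowledge of the write c = 10 into Thread 1's frontier, so Thread 1 can no longer read the older values of c.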

  87. Formalizing Local DRF
    [Diagram: a trace of machine states (machine state = state of all threads + heap) linked by memory accesses]
    • Pick a set L of locations (space)
    • Pick a machine state M where there are no ongoing races in L
    ★ M is said to be L-stable
    • Local DRF Theorem
    ★ Starting from an L-stable state M, until the next race on any location in L under SC semantics, the program has SC semantics (time)

  94. Performance Implication
    • Local DRF prohibits certain hardware and software optimisations
    ★ Preserve load-to-store ordering
    • No compiler optimisation that breaks load-to-store ordering is allowed
    Example: redundant store elimination
      r1 = a;          r1 = a;
      b = c;     ->    b = c;
      a = r1;          ;
    • ARM & POWER do not preserve load-to-store ordering
    ★ Insert the necessary synchronisation between every mutable load and store
    ★ What is the performance cost?
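
Load-to-store ordering is what the classic load-buffering (LB) litmus test probes. In this Java sketch (our own; names hypothetical), each thread loads one variable and then stores to the other, so r1 == 1 && r2 == 1 is only possible if some store overtakes the load before it. With `volatile` fields the JMM forbids that outcome; with plain `int` fields Java permits it, much as ARM and POWER hardware may reorder such plain accesses.

```java
// Load-buffering litmus test (a sketch, not the talk's code).
// volatile a, b: the JMM forbids r1 == 1 && r2 == 1.
// Plain int a, b: the outcome is permitted.
public class LoadBuffering {
    static volatile int a, b;
    static int r1, r2;

    /** Runs the test repeatedly; returns how often r1 == 1 && r2 == 1 was observed. */
    static int countBothOne(int iterations) {
        int bothOne = 0;
        for (int i = 0; i < iterations; i++) {
            a = 0;
            b = 0;
            Thread t1 = new Thread(() -> { r1 = a; b = 1; });  // load a, then store b
            Thread t2 = new Thread(() -> { r2 = b; a = 1; });  // load b, then store a
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            if (r1 == 1 && r2 == 1) bothOne++;
        }
        return bothOne;
    }

    public static void main(String[] args) {
        System.out.println("r1 == 1 && r2 == 1 seen " + countBothOne(1000) + " times");
    }
}
```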

  97. Performance
    Free on x86, 0.6% overhead on AArch64 (ARMv8), 2.6% overhead on POWER

  99. Summary
    • OCaml memory model
    ★ Balances comprehensibility (Local DRF theorem) and performance (free on x86, 0.6% overhead on ARMv8, 2.6% on POWER)
    ★ Allows common compiler optimisations
    ★ Compilation and optimisations proved sound
    • Proposed as the memory model for OCaml
    ★ Also suitable for other safe languages (Swift, WebAssembly, JavaScript)