Slide 1

Slide 1 text

“KC” Sivaramakrishnan Concurrent and Parallel Programming with OCaml 5

Slide 2

Slide 2 text

• Building functional systems using OCaml • We work on ‣ OCaml platform: Compiler, Build system (dune), package manager (opam), documentation tools (odoc), editor support (LSP, merlin), etc. ‣ OCaml community: ocaml.org, CI for package repository, managing community infrastructure, running conferences and events ‣ OCaml consulting & training: helping commercial users with OCaml needs ‣ Research: SpaceOS (satellite IaaS), formal verification, blockchain forensics

Slide 3

Slide 3 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml

Slide 4

Slide 4 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml • Started in 2014 as “Multicore OCaml” project ‣ OCaml 5.0 released in Dec 2022 ‣ 5.1 — Sep 2023; 5.2 — May 2024; 5.3 — Nov 2024 (expected)

Slide 5

Slide 5 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml • Started in 2014 as “Multicore OCaml” project ‣ OCaml 5.0 released in Dec 2022 ‣ 5.1 — Sep 2023; 5.2 — May 2024; 5.3 — Nov 2024 (expected) • This talk ‣ Concurrency ‣ Parallelism ‣ Experience porting from multi-process to multi-core

Slide 6

Slide 6 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml programming language

Slide 7

Slide 7 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml programming language Overlapped A B A C B Time

Slide 8

Slide 8 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml programming language Overlapped A B A C B Time Simultaneous A B C Time

Slide 9

Slide 9 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml programming language Overlapped A B A C B Time Simultaneous A B C Time Effect Handlers

Slide 10

Slide 10 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml programming language Overlapped A B A C B Time Simultaneous A B C Time Effect Handlers Domains

Slide 11

Slide 11 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml programming language Overlapped A B A C B Time Simultaneous A B C Time “Retrofitting Effect Handlers onto OCaml”, PLDI 2021 Effect Handlers Domains

Slide 12

Slide 12 text

OCaml 5 • Native-support for concurrency and parallelism to OCaml programming language Overlapped A B A C B Time Simultaneous A B C Time “Retrofitting Effect Handlers onto OCaml”, PLDI 2021 Effect Handlers Domains “Retrofitting Parallelism onto OCaml”, ICFP 2020

Slide 13

Slide 13 text

Concurrency Overlapped A B A C B Time

Slide 14

Slide 14 text

• Computations may be suspended and resumed later Concurrent Programming

Slide 15

Slide 15 text

• Computations may be suspended and resumed later • Many languages provide concurrent programming mechanisms as primitives ✦ async/await: JavaScript, Python, Rust, C# 5.0, F#, Swift, … ✦ generators: Python, JavaScript, … ✦ coroutines: C++, Kotlin, Lua, … ✦ futures & promises: JavaScript, Swift, … ✦ Lightweight threads/processes: Haskell, Go, Erlang Concurrent Programming

Slide 16

Slide 16 text

• Computations may be suspended and resumed later • Many languages provide concurrent programming mechanisms as primitives ✦ async/await: JavaScript, Python, Rust, C# 5.0, F#, Swift, … ✦ generators: Python, JavaScript, … ✦ coroutines: C++, Kotlin, Lua, … ✦ futures & promises: JavaScript, Swift, … ✦ Lightweight threads/processes: Haskell, Go, Erlang • Often include many different primitives in the same language! ✦ JavaScript has async/await, generators, promises, and callbacks Concurrent Programming

Slide 17

Slide 17 text

Concurrent Programming in OCaml 4 • No primitive support for concurrent programming

Slide 18

Slide 18 text

Concurrent Programming in OCaml 4 • No primitive support for concurrent programming • Lwt and Async - concurrent programming libraries in OCaml ‣ Callback-oriented programming with monadic syntax

Slide 19

Slide 19 text

Concurrent Programming in OCaml 4 • No primitive support for concurrent programming • Lwt and Async - concurrent programming libraries in OCaml ‣ Callback-oriented programming with monadic syntax • Suffers the pitfalls of callback-oriented programming ‣ Incomprehensible (“callback hell”), no backtraces, poor performance, function colouring

Slide 20

Slide 20 text

Concurrent Programming in OCaml 4 • No primitive support for concurrent programming • Lwt and Async - concurrent programming libraries in OCaml ‣ Callback-oriented programming with monadic syntax • Suffers the pitfalls of callback-oriented programming ‣ Incomprehensible (“callback hell”), no backtraces, poor performance, function colouring Synchronous Asynchronous Normal calls Special calling convention

Slide 21

Slide 21 text

Concurrent Programming in OCaml 4 • No primitive support for concurrent programming • Lwt and Async - concurrent programming libraries in OCaml ‣ Callback-oriented programming with monadic syntax • Suffers the pitfalls of callback-oriented programming ‣ Incomprehensible (“callback hell”), no backtraces, poor performance, function colouring • Don’t want a zoo of primitives, but need expressivity! ‣ Add the smallest primitive that captures many concurrent programming patterns Synchronous Asynchronous Normal calls Special calling convention

Slide 22

Slide 22 text

Effect handlers • A mechanism for programming with user-defined effects

Slide 23

Slide 23 text

Effect handlers • A mechanism for programming with user-defined effects • Modular and composable basis of non-local control-flow mechanisms

Slide 24

Slide 24 text

Effect handlers • A mechanism for programming with user-defined effects • Modular and composable basis of non-local control-flow mechanisms ✦ Exceptions, generators, lightweight threads, promises, asynchronous IO, coroutines as libraries

Slide 25

Slide 25 text

Effect handlers • A mechanism for programming with user-defined effects • Modular and composable basis of non-local control-flow mechanisms ✦ Exceptions, generators, lightweight threads, promises, asynchronous IO, coroutines as libraries • Effect handlers ~= first-class, restartable exceptions

Slide 26

Slide 26 text

Effect handlers • A mechanism for programming with user-defined effects • Modular and composable basis of non-local control-flow mechanisms ✦ Exceptions, generators, lightweight threads, promises, asynchronous IO, coroutines as libraries • Effect handlers ~= first-class, restartable exceptions ✦ Structured programming with delimited continuations

Slide 27

Slide 27 text

Effect handlers • A mechanism for programming with user-defined effects • Modular and composable basis of non-local control-flow mechanisms ✦ Exceptions, generators, lightweight threads, promises, asynchronous IO, coroutines as libraries • Effect handlers ~= first-class, restartable exceptions ✦ Structured programming with delimited continuations https://github.com/ocaml-multicore/effects-examples • Direct-style asynchronous I/O • Generators • Resumable parsers • Probabilistic Programming • Reactive UIs • …

Slide 28

Slide 28 text

Effect handlers type _ eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 "

Slide 29

Slide 29 text

Effect handlers type _ eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " effect declaration

Slide 30

Slide 30 text

Effect handlers type _ eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " computation effect declaration

Slide 31

Slide 31 text

Effect handlers type _ eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " computation handler effect declaration

Slide 32

Slide 32 text

Effect handlers type _ eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " computation handler suspends current computation effect declaration

Slide 33

Slide 33 text

Effect handlers type _ eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " computation handler delimited continuation suspends current computation effect declaration

Slide 34

Slide 34 text

Effect handlers type _ eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " computation handler delimited continuation suspends current computation resume suspended computation effect declaration
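The slide code uses the dedicated `effect` pattern syntax and abbreviates `Effect.t` as `eff`; that syntax is not available on every OCaml 5 release. As a minimal sketch, assuming only the OCaml 5 standard library, the same program can be written against the `Effect.Deep` API. The `out` buffer and the `emit` helper are additions for illustration (not part of the slides) so that the order of the printed pieces can be inspected.

```ocaml
open Effect
open Effect.Deep

(* Effect declaration: performing E yields a string. *)
type _ Effect.t += E : string Effect.t

(* Illustration-only helpers: collect output instead of printing directly. *)
let out = Buffer.create 16
let emit s = Buffer.add_string out s

(* The computation: suspends at [perform E] until a handler resumes it. *)
let comp () =
  emit "0 ";
  emit (perform E);
  emit "3 "

(* The handler: receives the delimited continuation [k] and resumes it. *)
let main () =
  try_with comp ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | E -> Some (fun (k : (a, _) continuation) ->
            emit "1 ";
            continue k "2 ";   (* resume the suspended computation *)
            emit "4 ")
        | _ -> None) }

let () =
  main ();
  print_string (Buffer.contents out)
```

The pieces come out in the order 0, 1, 2, 3, 4: control moves from the computation to the handler at `perform`, back into the computation at `continue`, and returns to the handler once the computation finishes.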

Slide 35

Slide 35 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " pc main sp Stepping through the example

Slide 36

Slide 36 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " pc main sp Stepping through the example

Slide 37

Slide 37 text

Fiber: A piece of stack + effect handler type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp pc main sp parent Stepping through the example

Slide 38

Slide 38 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp parent 0 Stepping through the example

Slide 39

Slide 39 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp k 0 Stepping through the example

Slide 40

Slide 40 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp k 0 Stepping through the example

Slide 41

Slide 41 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp k 0 Stepping through the example

Slide 42

Slide 42 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp k 0 1 Stepping through the example

Slide 43

Slide 43 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp k 0 1 Stepping through the example

Slide 44

Slide 44 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp k parent 0 1 Stepping through the example

Slide 45

Slide 45 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " comp comp pc main sp k parent 0 1 2 Stepping through the example

Slide 46

Slide 46 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " pc main sp k 0 1 2 3 Stepping through the example

Slide 47

Slide 47 text

type 'a eff += E : string eff let comp () = print_string "0 "; print_string (perform E); print_string "3 " let main () = try comp () with effect E, k -> print_string "1 "; continue k "2 "; print_string "4 " pc main sp k 0 1 2 3 4 Stepping through the example
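Generators, listed earlier among the patterns that effect handlers can express as libraries, follow from the same suspend-and-resume steps. The sketch below is an assumption-laden illustration rather than code from the talk: `generate` and its local `Yield` effect are hypothetical names, and it uses the `Effect.Deep` API from the standard library. It turns any push-style `iter` function into a pull-style `next` function.

```ocaml
open Effect
open Effect.Deep

(* [generate iter] returns a function that yields the elements visited
   by [iter] one at a time, then [None] once [iter] has finished. *)
let generate (type elt) (iter : (elt -> unit) -> unit) : unit -> elt option =
  let module M = struct
    type _ Effect.t += Yield : elt -> unit Effect.t
  end in
  let next = ref (fun () -> None) in
  let start () =
    match_with iter (fun x -> perform (M.Yield x))
      { retc = (fun () -> next := (fun () -> None); None);
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | M.Yield x -> Some (fun (k : (a, elt option) continuation) ->
              (* Save the suspended traversal; resume it on the next pull. *)
              next := (fun () -> continue k ());
              Some x)
          | _ -> None) }
  in
  next := start;
  fun () -> !next ()

let () =
  let next = generate (fun f -> List.iter f [10; 20; 30]) in
  let rec drain () =
    match next () with
    | Some x -> Printf.printf "%d\n" x; drain ()
    | None -> ()
  in
  drain ()
```

Each continuation is resumed at most once, which matches the one-shot continuations OCaml 5 provides.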

Slide 48

Slide 48 text

type _ eff += Fork : (unit -> unit) -> unit eff | Yield : unit eff Lightweight threading

Slide 49

Slide 49 text

type _ eff += Fork : (unit -> unit) -> unit eff | Yield : unit eff let run main = ... (* assume queue of continuations *) let run_next () = match dequeue () with | Some k -> continue k () | None -> () in let rec spawn f = match f () with | () -> run_next () (* value case *) | effect Yield, k -> enqueue k; run_next () | effect (Fork f), k -> enqueue k; spawn f in spawn main Lightweight threading

Slide 50

Slide 50 text

type _ eff += Fork : (unit -> unit) -> unit eff | Yield : unit eff let run main = ... (* assume queue of continuations *) let run_next () = match dequeue () with | Some k -> continue k () | None -> () in let rec spawn f = match f () with | () -> run_next () (* value case *) | effect Yield, k -> enqueue k; run_next () | effect (Fork f), k -> enqueue k; spawn f in spawn main let fork f = perform (Fork f) let yield () = perform Yield Lightweight threading

Slide 51

Slide 51 text

let main () = fork (fun _ -> print_endline "1.a"; yield (); print_endline "1.b"); fork (fun _ -> print_endline "2.a"; yield (); print_endline "2.b") ;; run main Lightweight threading

Slide 52

Slide 52 text

let main () = fork (fun _ -> print_endline "1.a"; yield (); print_endline "1.b"); fork (fun _ -> print_endline "2.a"; yield (); print_endline "2.b") ;; run main 1.a 2.a 1.b 2.b Lightweight threading

Slide 53

Slide 53 text

let main () = fork (fun _ -> print_endline "1.a"; yield (); print_endline "1.b"); fork (fun _ -> print_endline "2.a"; yield (); print_endline "2.b") ;; run main 1.a 2.a 1.b 2.b • Direct-style (no monads) • User-code need not be aware of effects • No Async vs Sync distinction Lightweight threading

Slide 54

Slide 54 text

let main () = fork (fun _ -> print_endline "1.a"; yield (); print_endline "1.b"); fork (fun _ -> print_endline "2.a"; yield (); print_endline "2.b") ;; run main 1.a 2.a 1.b 2.b • Direct-style (no monads) • User-code need not be aware of effects • No Async vs Sync distinction Ability to specialise scheduler unlike GHC Haskell / Go Lightweight threading
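The scheduler slides leave the run queue elided ("assume queue of continuations"). A self-contained sketch, assuming a plain FIFO `Queue` from the standard library and the `Effect.Deep` API in place of the `effect` pattern syntax, could look as follows. The `trace` list and the `say` helper are hypothetical additions used only to record the interleaving.

```ocaml
open Effect
open Effect.Deep

type _ Effect.t +=
  | Fork : (unit -> unit) -> unit Effect.t
  | Yield : unit Effect.t

let fork f = perform (Fork f)
let yield () = perform Yield

let run main =
  (* FIFO queue of suspended threads, stored as thunks that resume them. *)
  let run_q : (unit -> unit) Queue.t = Queue.create () in
  let enqueue k = Queue.push (fun () -> continue k ()) run_q in
  let run_next () =
    match Queue.take_opt run_q with
    | Some resume -> resume ()
    | None -> ()
  in
  let rec spawn f =
    match_with f ()
      { retc = (fun () -> run_next ());   (* value case: thread finished *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield -> Some (fun (k : (a, unit) continuation) ->
              enqueue k; run_next ())
          | Fork f' -> Some (fun (k : (a, unit) continuation) ->
              enqueue k; spawn f')
          | _ -> None) }
  in
  spawn main

(* Illustration-only: record each message as well as printing it. *)
let trace = ref []
let say s = trace := s :: !trace; print_endline s

let main () =
  fork (fun () -> say "1.a"; yield (); say "1.b");
  fork (fun () -> say "2.a"; yield (); say "2.b")

let () = run main
```

Because scheduling policy lives entirely in `run`, swapping the FIFO queue for a priority queue or a work-stealing deque changes the scheduler without touching `main`, which is the specialisation point made on the slide.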

Slide 55

Slide 55 text

https://github.com/ocaml-multicore/eio • eio: effects-based direct-style I/O ✦ Multiple backends: epoll, select, io_uring (new async io in Linux kernel) Lightweight threading

Slide 56

Slide 56 text

• eio: effects-based direct-style I/O ✦ Multiple backends: epoll, select, io_uring (new async io in Linux kernel) [Benchmark chart: 100 open connections, 60 seconds, with io_uring; compares OCaml eio, Rust Hyper, OCaml (Http/af + Lwt), Go NetHttp, and OCaml (cohttp + Lwt)] https://github.com/ocaml-multicore/eio Lightweight threading

Slide 57

Slide 57 text

Representing Stack & Continuations • Program stack is a stack of runtime-managed dynamically growing fibers ‣ No pointers into the OCaml stack ➔ reallocate fibers on stack overflow

Slide 58

Slide 58 text

Representing Stack & Continuations • Program stack is a stack of runtime-managed dynamically growing fibers ‣ No pointers into the OCaml stack ➔ reallocate fibers on stack overflow • Stack switching is fast!! ‣ One shot continuations ➔ No copying of frames ‣ No callee-saved registers in OCaml ➔ No registers to save and restore at switches ‣ A few tens of instructions; 5 to 10 ns per stack switch

Slide 59

Slide 59 text

Representing Stack & Continuations • Program stack is a stack of runtime-managed dynamically growing fibers ‣ No pointers into the OCaml stack ➔ reallocate fibers on stack overflow • Stack switching is fast!! ‣ One shot continuations ➔ No copying of frames ‣ No callee-saved registers in OCaml ➔ No registers to save and restore at switches ‣ A few tens of instructions; 5 to 10 ns per stack switch • Need stack overflow checks in OCaml function prologue ‣ Branch predictor almost always predicts them correctly

Slide 60

Slide 60 text

Representing Stack & Continuations • No stack overflow checks in C code ‣ Need to perform C calls on system stack!

Slide 61

Slide 61 text

Representing Stack & Continuations • No stack overflow checks in C code ‣ Need to perform C calls on system stack! C frames OCaml Frames C frames OCaml Frames OCaml 4.xx Stack grows down Main entry External call Callback

Slide 62

Slide 62 text

Representing Stack & Continuations • No stack overflow checks in C code ‣ Need to perform C calls on system stack! C frames C frames Fiber 1 (Many OCaml Frames) Fiber 2 C frames Fiber 3 Main entry Effect handler External Call Callback System Stack OCaml 5.xx C frames OCaml Frames C frames OCaml Frames OCaml 4.xx Stack grows down Main entry External call Callback Made fast enough to not be noticeable!

Slide 63

Slide 63 text

Summary: Effect Handlers • Effect handlers bring simple, fast, backwards compatible native concurrency to OCaml

Slide 64

Slide 64 text

Summary: Effect Handlers • Effect handlers bring simple, fast, backwards compatible native concurrency to OCaml • Support for ‣ Integration with GDB (DWARF backtraces) ‣ frame-pointers (perf, eBPF)

Slide 65

Slide 65 text

Summary: Effect Handlers • Effect handlers bring simple, fast, backwards compatible native concurrency to OCaml • Support for ‣ Integration with GDB (DWARF backtraces) ‣ frame-pointers (perf, eBPF) • No static type system for effects ‣ Unhandled effects are runtime errors (just like exceptions)!

Slide 66

Slide 66 text

Parallelism Simultaneous A B C Time

Slide 67

Slide 67 text

Domains • A unit of parallelism • Heavyweight — maps onto an OS thread ‣ Aim to have 1 domain per physical core

Slide 68

Slide 68 text

Domains • A unit of parallelism • Heavyweight — maps onto an OS thread ‣ Aim to have 1 domain per physical core • Stdlib exposes ‣ Spawn & join, Mutex, Condition, domain-local storage ‣ Atomic references

Slide 69

Slide 69 text

Domains • A unit of parallelism • Heavyweight — maps onto an OS thread ‣ Aim to have 1 domain per physical core • Stdlib exposes ‣ Spawn & join, Mutex, Condition, domain-local storage ‣ Atomic references • Relaxed memory model ‣ Data-race-free programs have sequential consistency

Slide 70

Slide 70 text

Domains • A unit of parallelism • Heavyweight — maps onto an OS thread ‣ Aim to have 1 domain per physical core • Stdlib exposes ‣ Spawn & join, Mutex, Condition, domain-local storage ‣ Atomic references • Relaxed memory model ‣ Data-race-free programs have sequential consistency ‣ Programs with data races are type/memory safe! - Unlike C++, unsafe Rust - Important when porting sequential code to be made parallel
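As a small sketch of the Stdlib primitives listed above, the following spawns a few domains that increment a shared `Atomic` counter and then joins them. The function name `count_in_parallel` and its parameters are hypothetical; only `Domain.spawn`, `Domain.join`, and the `Atomic` module are taken from the standard library.

```ocaml
(* Spawn [domains] workers, each incrementing a shared atomic counter
   [per_domain] times, then wait for all of them and read the total. *)
let count_in_parallel ~domains ~per_domain =
  let counter = Atomic.make 0 in
  let worker () =
    for _ = 1 to per_domain do
      Atomic.incr counter
    done
  in
  let handles = List.init domains (fun _ -> Domain.spawn worker) in
  List.iter Domain.join handles;
  Atomic.get counter

let () =
  Printf.printf "total = %d\n" (count_in_parallel ~domains:4 ~per_domain:10_000)
```

Since every update goes through `Atomic`, the program is data-race-free and the total is always `domains * per_domain`. Replacing the counter with a plain `int ref` would introduce a data race: the program would stay memory safe, as the slide notes, but the total could come out lower.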

Slide 71

Slide 71 text

OCaml 4 GC • Generational, mark-and-sweep, incremental GC Incremental and non-moving Minor Heap Major Heap • Small (2 MB default) • Bump pointer allocation • Survivors copied to major heap

Slide 72

Slide 72 text

OCaml 4 GC • Generational, mark-and-sweep, incremental GC Incremental and non-moving Minor Heap Major Heap • Small (2 MB default) • Bump pointer allocation • Survivors copied to major heap Mutator Start of major cycle Idle

Slide 73

Slide 73 text

OCaml 4 GC • Generational, mark-and-sweep, incremental GC Incremental and non-moving Minor Heap Major Heap • Small (2 MB default) • Bump pointer allocation • Survivors copied to major heap Mutator Start of major cycle Idle Mark Roots mark roots

Slide 74

Slide 74 text

OCaml 4 GC • Generational, mark-and-sweep, incremental GC Incremental and non-moving Minor Heap Major Heap • Small (2 MB default) • Bump pointer allocation • Survivors copied to major heap Mark mark main Mutator Start of major cycle Idle Mark Roots mark roots

Slide 75

Slide 75 text

OCaml 4 GC • Generational, mark-and-sweep, incremental GC Incremental and non-moving Minor Heap Major Heap • Small (2 MB default) • Bump pointer allocation • Survivors copied to major heap Mark mark main Sweep sweep Mutator Start of major cycle Idle Mark Roots mark roots

Slide 76

Slide 76 text

OCaml 4 GC • Generational, mark-and-sweep, incremental GC Incremental and non-moving Minor Heap Major Heap • Small (2 MB default) • Bump pointer allocation • Survivors copied to major heap Mark mark main Sweep sweep End of major cycle Mutator Start of major cycle Idle Mark Roots mark roots

Slide 77

Slide 77 text

OCaml 4 GC • Generational, mark-and-sweep, incremental GC Incremental and non-moving Minor Heap Major Heap • Small (2 MB default) • Bump pointer allocation • Survivors copied to major heap Mark mark main Sweep sweep End of major cycle Mutator Start of major cycle Idle Mark Roots mark roots • Fast local allocations • Max GC latency < 10 ms, 99th percentile latency < 1 ms

Slide 78

Slide 78 text

OCaml 5 minor GC • Private minor heap arenas per domain ‣ Fast allocations without synchronization Major Heap Dom 0 Dom 1 Minor Heap Arena (2 MB) Minor Heap Arena (2 MB) Allocation Pointer

Slide 79

Slide 79 text

OCaml 5 minor GC • Private minor heap arenas per domain ‣ Fast allocations without synchronization • No restrictions on pointers between minor heap arenas and major heap Major Heap Dom 0 Dom 1 Minor Heap Arena (2 MB) Minor Heap Arena (2 MB) Allocation Pointer

Slide 80

Slide 80 text

OCaml 5 minor GC Major Heap Dom 0 Dom 1 Minor Heap Arena (2 MB) Minor Heap Arena (2 MB) Allocation Pointer • Stop-the-world parallel collection for minor heaps ‣ 2 barriers / minor gc; (some) work sharing between gc threads

Slide 81

Slide 81 text

OCaml 5 minor GC Major Heap Dom 0 Dom 1 Minor Heap Arena (2 MB) Minor Heap Arena (2 MB) Allocation Pointer • Stop-the-world parallel collection for minor heaps ‣ 2 barriers / minor gc; (some) work sharing between gc threads • On 24 cores, w/ default heap size (2MB / arena), < 10 ms pause for completing minor GC
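The minor heap size quoted above can be checked from inside a program. This is a minimal sketch using the standard `Gc` module; `minor_heap_bytes` is a hypothetical helper name, and the value printed depends on the platform and on any `OCAMLRUNPARAM` settings, so no particular output is assumed.

```ocaml
(* [Gc.get ()] reports the minor heap size in words; convert it to bytes. *)
let minor_heap_bytes () =
  let words = (Gc.get ()).Gc.minor_heap_size in
  words * (Sys.word_size / 8)

let () =
  Printf.printf "minor heap: %d KiB\n" (minor_heap_bytes () / 1024)
```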

Slide 82

Slide 82 text

OCaml 5 major GC • Mostly concurrent mark-and-sweep GC Sweep Mark Mark Roots Mutator Sweep Mark Mark Roots Start of major cycle End of major cycle mark and sweep phases may overlap Domain 0 Domain 1

Slide 83

Slide 83 text

OCaml 5 major GC • Mostly concurrent mark-and-sweep GC • 3 barriers / cycle (when not using ephemerons) ‣ 1 each at the end of mark, finalise_first, finalise_last phases Sweep Mark Mark Roots Mutator Sweep Mark Mark Roots Start of major cycle End of major cycle mark and sweep phases may overlap Domain 0 Domain 1

Slide 84

Slide 84 text

OCaml 5 major GC • Mostly concurrent mark-and-sweep GC • 3 barriers / cycle (when not using ephemerons) ‣ 1 each at the end of mark, finalise_first, finalise_last phases • On 24 cores, < 5 ms pauses at barriers ‣ Only to agree that the phase has ended Sweep Mark Mark Roots Mutator Sweep Mark Mark Roots Start of major cycle End of major cycle mark and sweep phases may overlap Domain 0 Domain 1

Slide 85

Slide 85 text

Scalability

Slide 86

Slide 86 text

Backwards compatibility • Both effect handlers and GC designed for backwards compatibility ‣ Performance, tooling support, features (almost all of them)

Slide 87

Slide 87 text

Backwards compatibility • Both effect handlers and GC designed for backwards compatibility ‣ Performance, tooling support, features (almost all of them) • Performance ‣ OCaml 5 is designed to run sequential programs as well as OCaml 4 ‣ Any significant performance regression (5%+) is a bug; please report it!

Slide 88

Slide 88 text

Backwards compatibility • Feature set ‣ All of the language including finalisers, weak references, ephemerons, systhreads supported - Compaction is manual, no naked pointers ‣ Programs with data races are type and memory safe! ‣ Racy use of Stdlib may yield surprising results, but will not crash! - think Queue, Hashtbl, Lazy, Unix, etc.

Slide 89

Slide 89 text

Backwards compatibility • Feature set ‣ All of the language including finalisers, weak references, ephemerons, systhreads supported - Compaction is manual, no naked pointers ‣ Programs with data races are type and memory safe! ‣ Racy use of Stdlib may yield surprising results, but will not crash! - think Queue, Hashtbl, Lazy, Unix, etc. • Existing tools continue to work ‣ GDB, perf, eBPF, statmemprof

Slide 90

Slide 90 text

Porting Applications to OCaml 5 Based on work done by Thomas Leonard @ Tarides https://roscidus.com/blog/blog/2024/07/22/performance-2/

Slide 91

Slide 91 text

Solver service • ocaml-ci — CI for OCaml projects ‣ Free to use for the OCaml community ‣ Build and run tests on a matrix of platforms on every commit - OCaml compilers (4.02 — 5.2), architectures (32- and 64-bit x86, ARM, PPC64, s390x), OSes (Alpine, Debian, Fedora, FreeBSD, macOS, OpenSUSE and Ubuntu, in multiple versions)

Slide 92

Slide 92 text

Solver service • ocaml-ci: CI for OCaml projects ‣ Free to use for the OCaml community ‣ Build and run tests on a matrix of platforms on every commit - OCaml compilers (4.02 to 5.2), architectures (32- and 64-bit x86, ARM, PPC64, s390x), OSes (Alpine, Debian, Fedora, FreeBSD, macOS, OpenSUSE and Ubuntu, in multiple versions) • Select compatible versions of its dependencies ‣ ~1s per solve ‣ 132 solves per commit!

Slide 93

Slide 93 text

Solver service • ocaml-ci: CI for OCaml projects ‣ Free to use for the OCaml community ‣ Build and run tests on a matrix of platforms on every commit - OCaml compilers (4.02 to 5.2), architectures (32- and 64-bit x86, ARM, PPC64, s390x), OSes (Alpine, Debian, Fedora, FreeBSD, macOS, OpenSUSE and Ubuntu, in multiple versions) • Select compatible versions of its dependencies ‣ ~1s per solve ‣ 132 solves per commit! • Solves are done by solver-service ‣ 160-core ARM machine ‣ Lwt-based; sub-process based parallelism for solves

Slide 94

Slide 94 text

Solver service • ocaml-ci: CI for OCaml projects ‣ Free to use for the OCaml community ‣ Build and run tests on a matrix of platforms on every commit - OCaml compilers (4.02 to 5.2), architectures (32- and 64-bit x86, ARM, PPC64, s390x), OSes (Alpine, Debian, Fedora, FreeBSD, macOS, OpenSUSE and Ubuntu, in multiple versions) • Select compatible versions of its dependencies ‣ ~1s per solve ‣ 132 solves per commit! • Solves are done by solver-service ‣ 160-core ARM machine ‣ Lwt-based; sub-process based parallelism for solves • Port it to OCaml 5 to take advantage of better concurrency and shared-memory parallelism

Slide 95

Slide 95 text

Solver service in OCaml 5 • Used Eio to port from multi-process parallel to shared-memory parallel ‣ Support for asynchronous IO (incl io_uring!) and parallelism ‣ Structured concurrency and switches for resource management

Slide 96

Slide 96 text

Solver service in OCaml 5 • Used Eio to port from multi-process parallel to shared-memory parallel ‣ Support for asynchronous IO (incl io_uring!) and parallelism ‣ Structured concurrency and switches for resource management • Outcome ‣ Simple code, more stable (switches), removal of lots of communication logic ‣ No function colouring! - Reclaim the use of try…with, for and while loops!

Slide 97

Slide 97 text

Solver service in OCaml 5 • Used Eio to port from multi-process parallel to shared-memory parallel ‣ Support for asynchronous IO (incl io_uring!) and parallelism ‣ Structured concurrency and switches for resource management • Outcome ‣ Simple code, more stable (switches), removal of lots of communication logic ‣ No function colouring! - Reclaim the use of try…with, for and while loops! • Used TSan to ensure that data races are removed

Slide 98

Slide 98 text

ThreadSanitizer (since 5.2) • Detect data races dynamically • Part of the LLVM project — C++, Go, Swift

Slide 99

Slide 99 text

ThreadSanitizer (since 5.2) • Detect data races dynamically • Part of the LLVM project — C++, Go, Swift 1 let a = ref 0 and b = ref 0 2 3 let d1 () = 4 a := 1; 5 !b 6 7 let d2 () = 8 b := 1; 9 !a 10 11 let () = 12 let h = Domain.spawn d2 in 13 let r1 = d1 () in 14 let r2 = Domain.join h in 15 assert (not (r1 = 0 && r2 = 0))

Slide 100

Slide 100 text

ThreadSanitizer (since 5.2) • Detect data races dynamically • Part of the LLVM project — C++, Go, Swift 1 let a = ref 0 and b = ref 0 2 3 let d1 () = 4 a := 1; 5 !b 6 7 let d2 () = 8 b := 1; 9 !a 10 11 let () = 12 let h = Domain.spawn d2 in 13 let r1 = d1 () in 14 let r2 = Domain.join h in 15 assert (not (r1 = 0 && r2 = 0)) ================== WARNING: ThreadSanitizer: data race (pid=3808831) Write of size 8 at 0x8febe0 by thread T1 (mutexes: write M90 #0 camlSimple_race.d2_274 simple_race.ml:8 (simple_race.ex #1 camlDomain.body_706 stdlib/domain.ml:211 (simple_race.e #2 caml_start_program (simple_race.exe+0x47cf37) #3 caml_callback_exn runtime/callback.c:197 (simple_race.e #4 domain_thread_func runtime/domain.c:1167 (simple_race.e Previous read of size 8 at 0x8febe0 by main thread (mutexes: #0 camlSimple_race.d1_271 simple_race.ml:5 (simple_race.ex #1 camlSimple_race.entry simple_race.ml:13 (simple_race.ex #2 caml_program (simple_race.exe+0x41ffb9) #3 caml_start_program (simple_race.exe+0x47cf37) [...]
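One way to remove the race reported above, sketched under the assumption that `a` and `b` are only meant to act as shared flags, is to make them `Atomic.t` values. The wrapper `run_once` is a hypothetical name introduced here so the experiment can be repeated. OCaml's atomic operations are sequentially consistent, so the outcome where both reads return 0 is ruled out and the assertion always holds.

```ocaml
(* Same shape as the racy program, but with atomics instead of plain refs. *)
let run_once () =
  let a = Atomic.make 0 and b = Atomic.make 0 in
  let d1 () = Atomic.set a 1; Atomic.get b in
  let d2 () = Atomic.set b 1; Atomic.get a in
  let h = Domain.spawn d2 in
  let r1 = d1 () in
  let r2 = Domain.join h in
  (r1, r2)

let () =
  for _ = 1 to 100 do
    let r1, r2 = run_once () in
    (* At least one domain must observe the other's write. *)
    assert (not (r1 = 0 && r2 = 0))
  done
```

Protecting both accesses with a `Mutex` would be another option; atomics are used here only because the shared state is a pair of integers.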

Slide 101

Slide 101 text

Eio solver service performance • … was underwhelming … initially

Slide 103

Slide 103 text

Performance analysis • perf (incl. call graph) and eBPF work ‣ Frame-pointers across effect handlers!

Slide 104

Slide 104 text

Performance analysis • perf (incl. call graph) and eBPF work ‣ Frame-pointers across effect handlers! • Runtime Events ‣ Every OCaml 5 program has tracing support built-in ‣ Events are written to a shared ring buffer that can be read by an external process

Slide 105

Slide 105 text

Performance analysis • perf (incl. call graph) and eBPF work ‣ Frame-pointers across effect handlers! • Runtime Events ‣ Every OCaml 5 program has tracing support built-in ‣ Events are written to a shared ring buffer that can be read by an external process $ olly trace foo.trace foo.exe

Slide 106

Slide 106 text

Performance analysis • perf (incl. call graph) and eBPF work ‣ Frame-pointers across effect handlers! • Runtime Events ‣ Every OCaml 5 program has tracing support built-in ‣ Events are written to a shared ring buffer that can be read by an external process $ olly trace foo.trace foo.exe https://perfetto.dev/

Slide 107

Slide 107 text

Problems identified • Switch from sched_other to sched_rr • git log for each solve to find earliest commit ‣ 50ms penalty for STW subprocess spawn ‣ Avoid by implementing it in OCaml

Slide 108

Slide 108 text

Problems identified • Switch from sched_other to sched_rr • git log for each solve to find earliest commit ‣ 50ms penalty for STW subprocess spawn ‣ Avoid by implementing it in OCaml Still some work to do

Slide 109

Slide 109 text

Porting hack_parallel to domain parallelism • hack_parallel: an optimised off-heap multi-process hash table ‣ Used by Hack, Flow, Pyre ‣ Infer uses multi-process parallelism but not hack_parallel (?) Based on work done by Olivier Nicole @ Tarides
 https://hackmd.io/@l9pOcjkYQpyZ9sK5nuS6mw/HyyL1AG8R

Slide 110

Slide 110 text

Porting hack_parallel to domain parallelism • hack_parallel: an optimised off-heap multi-process hash table ‣ Used by Hack, Flow, Pyre ‣ Infer uses multi-process parallelism but not hack_parallel (?) • Experiments ‣ Pyre builds and runs very easily - Not successful building Hack ‣ 2 days of work to replace hack_parallel with a parallelism-safe hash table from the Kcas library ‣ All tests pass (except 1 Lwt-based one which is expected to fail with parallelism) Based on work done by Olivier Nicole @ Tarides
 https://hackmd.io/@l9pOcjkYQpyZ9sK5nuS6mw/HyyL1AG8R

Slide 111

Slide 111 text

Porting hack_parallel to domain parallelism • hack_parallel: an optimised off-heap multi-process hash table ‣ Used by Hack, Flow, Pyre ‣ Infer uses multi-process parallelism but not hack_parallel (?) • Experiments ‣ Pyre builds and runs very easily - Not successful building Hack ‣ 2 days of work to replace hack_parallel with a parallelism-safe hash table from the Kcas library ‣ All tests pass (except 1 Lwt-based one which is expected to fail with parallelism) • Very very very early performance numbers ‣ Domain parallel version ~10% slower running Pyre testsuite ‣ Need better benchmarks! Based on work done by Olivier Nicole @ Tarides
 https://hackmd.io/@l9pOcjkYQpyZ9sK5nuS6mw/HyyL1AG8R

Slide 112

Slide 112 text

Takeaways for introducing shared-memory parallelism • Use Eio for concurrency and parallelism in OCaml 5 ‣ Makes your asynchronous IO program more reliable

Slide 113

Slide 113 text

Takeaways for introducing shared-memory parallelism • Use Eio for concurrency and parallelism in OCaml 5 ‣ Makes your asynchronous IO program more reliable • Other libraries ‣ Saturn: Verified multicore safe data structures ‣ Kcas: Software transactional memory for OCaml

Slide 114

Slide 114 text

Takeaways for introducing shared-memory parallelism • Use Eio for concurrency and parallelism in OCaml 5 ‣ Makes your asynchronous IO program more reliable • Other libraries ‣ Saturn: Verified multicore safe data structures ‣ Kcas: Software transactional memory for OCaml • Use TSan to remove data races ‣ Data races will not lead to crashes

Slide 115

Slide 115 text

Takeaways for introducing shared-memory parallelism • Use Eio for concurrency and parallelism in OCaml 5 ‣ Makes your asynchronous IO program more reliable • Other libraries ‣ Saturn: Verified multicore safe data structures ‣ Kcas: Software transactional memory for OCaml • Use TSan to remove data races ‣ Data races will not lead to crashes • Expect that the initial performance may be underwhelming ‣ Existing external tools such as perf, eBPF based profiling, statmemprof continue to work ‣ New tools are available on OCaml 5 enabled through runtime events: Olly, eio-trace, etc.