Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Concurrent & Multicore OCaml: A deep dive

Concurrent & Multicore OCaml: A deep dive

Talk was given at Facebook, Menlo Park.

KC Sivaramakrishnan

January 28, 2016
Tweet

More Decks by KC Sivaramakrishnan

Other Decks in Programming

Transcript

  1. Concurrent & Multicore OCaml: A deep dive KC Sivaramakrishnan1 &

    Stephen Dolan1 Leo White2, Jeremy Yallop1,3, Armaël Guéneau4, Anil Madhavapeddy1,3 1 2 3 4
  2. Concurrency ≠ Parallelism • Concurrency • Programming technique • Overlapped

    execution of processes • Parallelism • (Extreme) Performance hack • Simultaneous execution of computations
  3. Concurrency ≠ Parallelism • Concurrency • Programming technique • Overlapped

    execution of processes • Parallelism • (Extreme) Performance hack • Simultaneous execution of computations Concurrency ∩ Parallelism ➔ Scalable Concurrency
  4. Concurrency ≠ Parallelism • Concurrency • Programming technique • Overlapped

    execution of processes • Parallelism • (Extreme) Performance hack • Simultaneous execution of computations Concurrency ∩ Parallelism ➔ Scalable Concurrency (Fibers) (Domains)
  5. Schedulers • Multiplexing fibers over domain(s) • Bake scheduler into

    the runtime system (GHC) • Allow programmers to describe schedulers! • Parallel search —> LIFO work-stealing • Web-server —> FIFO runqueue • Data parallel —> Gang scheduling
  6. Schedulers • Multiplexing fibers over domain(s) • Bake scheduler into

    the runtime system (GHC) • Allow programmers to describe schedulers! • Parallel search —> LIFO work-stealing • Web-server —> FIFO runqueue • Data parallel —> Gang scheduling • Algebraic Effects and Handlers
  7. • Programming and reasoning about computational effects in a pure

    setting. • Cf. Monads Algebraic effects & handlers
  8. • Programming and reasoning about computational effects in a pure

    setting. • Cf. Monads • Eff — http://www.eff-lang.org/ Algebraic effects & handlers
  9. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1
  10. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1
  11. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4
  12. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) let r = try f () with effect (Foo i) k -> continue k (i + 1)
  13. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) let r = try f () with effect (Foo i) k -> continue k (i + 1)
  14. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) let r = try f () with effect (Foo i) k -> continue k (i + 1)
  15. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) 4 let r = try f () with effect (Foo i) k -> continue k (i + 1)
  16. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) 4 let r = try f () with effect (Foo i) k -> continue k (i + 1) val r : int = 5
  17. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) 4 let r = try f () with effect (Foo i) k -> continue k (i + 1) val r : int = 5 fiber — lightweight stack
  18. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS Implementation
  19. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. Implementation
  20. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers Implementation
  21. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers Implementation handle / continue handler sp call chain reference
  22. Implementation handle / continue handle / continue sp handler call

    chain reference • Fibers: Heap allocated, dynamically resized stacks • ~10s of bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers
  23. perform sp handle / continue Implementation handler call chain reference

    • Fibers: Heap allocated, dynamically resized stacks • ~10s of bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers
  24. Native-code fibers — Vanilla OCaml start program C call OCaml

    callback C call OCaml callback C OCaml C OCaml C OCaml
  25. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C
  26. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C call C C
  27. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C call C C 1. Stack overflow checks for OCaml functions • Simple static analysis eliminates many checks
  28. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C call C C 1. Stack overflow checks for OCaml functions • Simple static analysis eliminates many checks 2. FFI calls are more expensive due to stack switching • Specialise for calls which {allocate / pass arguments on stack / do neither}
  29. Performance : Vanilla OCaml 0 0.25 0.5 0.75 1 almabench

    alt-ergo-parameter_smallest_divisor alt-ergo-carte_autorisee_3 alt-ergo-parameter_relabel alt-ergo-OBF__ggjj_2 alt-ergo-parameter_def alt-ergo-parameter_def alt-ergo-OBF__yyll_1 alt-ergo-bbvv_351 alt-ergo-controler_carte_13 alt-ergo-div2_sub alt-ergo-parameter_def alt-ergo-ccgg_2055 alt-ergo-parameter_def alt-ergo-parameter_inverse_in_place alt-ergo-ccgg_1759 alt-ergo-parameter_def alt-ergo-induction_step alt-ergo-ccgg_1618 alt-ergo-ccgg_219 alt-ergo-advance_automaton_25 alt-ergo-nsec_sum_higher_than_1s alt-ergo-fill_assert_39_Alt-Ergo bdd chameneos-async chameneos-lwt cohttp-lwt core_micro cpdf-merge cpdf-reformat cpdf-squeeze cpdf-transform frama-c-deflate frama-c-idct js_of_ocaml jsontrip-sample kb kb-no-exc lexifi-g2pp menhir-fancy menhir-sql menhir-standard minilight numal-durand-kerner-aberth numal-fft numal-k-means numal-levinson-durbin numal-lu-decomposition numal-naive-multilayer numal-qr-decomposition numal-rnd_access numal-simple_access patdiff sauvola-contrast sequence sequence-cps setrip setrip-smallbuf thread-ring-async-pipe thread-ring-lwt-mvar thread-ring-lwt-stream thread-sleep-async thread-sleep-lwt valet-async valet-lwt ydump-sample 4.02.2+effects 4.02.2+vanilla Normalised time (lower is better)
  30. Performance : Vanilla OCaml 0 0.25 0.5 0.75 1 almabench

    alt-ergo-parameter_smallest_divisor alt-ergo-carte_autorisee_3 alt-ergo-parameter_relabel alt-ergo-OBF__ggjj_2 alt-ergo-parameter_def alt-ergo-parameter_def alt-ergo-OBF__yyll_1 alt-ergo-bbvv_351 alt-ergo-controler_carte_13 alt-ergo-div2_sub alt-ergo-parameter_def alt-ergo-ccgg_2055 alt-ergo-parameter_def alt-ergo-parameter_inverse_in_place alt-ergo-ccgg_1759 alt-ergo-parameter_def alt-ergo-induction_step alt-ergo-ccgg_1618 alt-ergo-ccgg_219 alt-ergo-advance_automaton_25 alt-ergo-nsec_sum_higher_than_1s alt-ergo-fill_assert_39_Alt-Ergo bdd chameneos-async chameneos-lwt cohttp-lwt core_micro cpdf-merge cpdf-reformat cpdf-squeeze cpdf-transform frama-c-deflate frama-c-idct js_of_ocaml jsontrip-sample kb kb-no-exc lexifi-g2pp menhir-fancy menhir-sql menhir-standard minilight numal-durand-kerner-aberth numal-fft numal-k-means numal-levinson-durbin numal-lu-decomposition numal-naive-multilayer numal-qr-decomposition numal-rnd_access numal-simple_access patdiff sauvola-contrast sequence sequence-cps setrip setrip-smallbuf thread-ring-async-pipe thread-ring-lwt-mvar thread-ring-lwt-stream thread-sleep-async thread-sleep-lwt valet-async valet-lwt ydump-sample 4.02.2+effects 4.02.2+vanilla Normalised time (lower is better) 4.02.2+effects ~5.4% slower
  31. Performance : Chameneos-Redux Time (S) 0 0.45 0.9 1.35 1.8

    Iterations (X100,000) 1 2 3 4 5 6 7 8 9 10 Lwt Concurrency Monad GHC Fibers
  32. Generator from Iterator1 [1] https://github.com/kayceesrk/ocaml15-eff/blob/master/generator.ml let rec iter f =

    function | Leaf -> () | Node (l, x, r) -> iter f l; f x; iter f r type 'a t = | Leaf | Node of 'a t * 'a * 'a t
  33. Generator from Iterator1 [1] https://github.com/kayceesrk/ocaml15-eff/blob/master/generator.ml (* val to_gen : 'a

    t -> (unit -> 'a option) *) let to_gen (type a) (t : a t) = let module M = struct effect Next : a -> unit end in let open M in let step = ref (fun () -> assert false) in let first_step () = try iter (fun x -> perform (Next x)) t; None with effect (Next v) k -> step := continue k; Some v in step := first_step; fun () -> !step () let rec iter f = function | Leaf -> () | Node (l, x, r) -> iter f l; f x; iter f r type 'a t = | Leaf | Node of 'a t * 'a * 'a t
  34. Performance : Generator Time (S) 0 1 2 3 4

    Binary tree depth 15 16 17 18 19 20 21 22 23 24 25 Iterator Fiber Generator H/W Generator
  35. Javascript backend • js_of_ocaml • OCaml bytecode —> Javascript •

    js_of_ocaml compiler pass • Whole-program selective CPS transformation
  36. Javascript backend • js_of_ocaml • OCaml bytecode —> Javascript •

    js_of_ocaml compiler pass • Whole-program selective CPS transformation • Work-in-progress! • Runs “hello-effects-world”!