Reagents: Lock-free programming for the masses

Efficient concurrent programming libraries are essential for taking advantage of fine-grained parallelism on multicore hardware. In this post, I will introduce reagents, a composable, lock-free concurrency library for expressing fine-grained parallel programs on Multicore OCaml. Reagents offer a high-level DSL for experts to specify efficient concurrency libraries, but also allow the consumers of those libraries to extend them further without knowing the details of the underlying implementation.

KC Sivaramakrishnan

August 02, 2016

Transcript

  1. Reagents: lock-free programming for the masses. “KC” Sivaramakrishnan, OCaml Labs, University of Cambridge
  9. Multicore OCaml, by layer (concurrency and parallelism):
    • Compiler: Fibers for concurrency (12M fibers/s on 1 core, 30M fibers/s on 4 cores); Domains for parallelism
    • Language + Stdlib: Effects (concurrency); Domain API (parallelism)
    • Libraries: cooperative threading libraries (concurrency); Reagents: lock-free programming (parallelism)
  12. JVM: java.util.concurrent; .NET: System.Collections.Concurrent. Rich, but not composable.
    • Synchronization: reentrant locks, semaphores, R/W locks, reentrant R/W locks, condition variables, countdown latches, cyclic barriers, phasers, exchangers
    • Data structures: queues (nonblocking; blocking, array & list; synchronous; priority, nonblocking; priority, blocking), deques, sets, maps (hash & skiplist)
  13. How to build composable lock-free programs?

  17. Progress guarantees, from strongest to weakest:
    • wait-free: under contention, each thread makes progress
    • lock-free: under contention, at least one thread makes progress
    • obstruction-free: a single thread running in isolation makes progress
  19. Compare-and-swap (CAS)

        module CAS : sig
          val cas : 'a ref -> expect:'a -> update:'a -> bool
        end = struct
          (* Specification only: the processor performs this test-and-update
             atomically. It compares by physical equality (==), since the
             hardware compares machine words, not structures. *)
          let cas r ~expect ~update =
            if !r == expect then (r := update; true) else false
        end

    • Implemented atomically by processors
    • x86: CMPXCHG and friends
    • ARM: LDREX, STREX, etc.
    • PowerPC: lwarx, stwcx, etc.
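
OCaml (4.12 and later) exposes this primitive in the standard library as `Atomic.compare_and_set`, which likewise compares by physical equality. As a minimal sketch (the name `incr_counter` is illustrative, not from the slides), the usual CAS retry loop looks like this. It is lock-free because, with a strong CAS, a thread's CAS can only fail if some other thread has just changed the location.

```ocaml
(* Lock-free increment: read, compute, and CAS the new value in.
   If another thread updated the counter in between, the CAS fails
   and we retry against the fresh value. *)
let rec incr_counter (c : int Atomic.t) : unit =
  let cur = Atomic.get c in
  if not (Atomic.compare_and_set c cur (cur + 1)) then incr_counter c

let () =
  let c = Atomic.make 0 in
  for _ = 1 to 10 do incr_counter c done;
  Printf.printf "%d\n" (Atomic.get c) (* prints 10 *)
```

The standard library's `Atomic.incr` already does this; the loop is spelled out only to show the read, compute, CAS, retry pattern.
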
  20. [Figure, slides 20 to 26: a Treiber stack holding 3 and 2 under Head. A CAS attempt to push 7 fails because 5 has been pushed onto Head in the meantime, so the push of 7 is retried.]

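  27. Treiber stack, push:

        module type TREIBER_STACK = sig
          type 'a t
          val push : 'a t -> 'a -> unit
          ...
        end

        module Treiber_stack : TREIBER_STACK = struct
          type 'a t = 'a list ref
          (* Read the head, then try to CAS in a new cons cell.
             backoff is not defined on the slide; assumed to pause briefly. *)
          let rec push s t =
            let cur = !s in
            if CAS.cas s ~expect:cur ~update:(t :: cur) then ()
            else (backoff (); push s t)
        end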
  28. Treiber stack, try_pop:

        module type TREIBER_STACK = sig
          type 'a t
          val push : 'a t -> 'a -> unit
          val try_pop : 'a t -> 'a option
        end

        module Treiber_stack : TREIBER_STACK = struct
          type 'a t = 'a list ref
          let rec push s t = ...
          (* Empty stack: return None; otherwise CAS the head to the tail. *)
          let rec try_pop s =
            match !s with
            | [] -> None
            | (x :: xs) as cur ->
              if CAS.cas s ~expect:cur ~update:xs then Some x
              else (backoff (); try_pop s)
        end
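
The same stack can be made runnable with OCaml's `Atomic` module in place of the slides' `CAS` module. This is a sketch: the module name `Treiber_atomic` is hypothetical, and the backoff step is omitted (a production version would pause before retrying).

```ocaml
module Treiber_atomic : sig
  type 'a t
  val create : unit -> 'a t
  val push : 'a t -> 'a -> unit
  val try_pop : 'a t -> 'a option
end = struct
  type 'a t = 'a list Atomic.t

  let create () = Atomic.make []

  (* Read the head, build a new cons cell, and CAS it in; retry on failure. *)
  let rec push s x =
    let cur = Atomic.get s in
    if not (Atomic.compare_and_set s cur (x :: cur)) then push s x

  (* Empty stack: None. Otherwise CAS the head to the tail; retry on failure. *)
  let rec try_pop s =
    match Atomic.get s with
    | [] -> None
    | x :: xs as cur ->
      if Atomic.compare_and_set s cur xs then Some x else try_pop s
end

let () =
  let s = Treiber_atomic.create () in
  Treiber_atomic.push s 2;
  Treiber_atomic.push s 3;
  assert (Treiber_atomic.try_pop s = Some 3);
  assert (Treiber_atomic.try_pop s = Some 2);
  assert (Treiber_atomic.try_pop s = None)
```

Comparing by physical equality against the list that was read also sidesteps the classic ABA problem: the garbage collector cannot recycle `cur` while this thread still holds a reference to it.
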
  30. The problem: concurrency libraries are indispensable, but hard to build and extend.

        let v = Treiber_stack.pop s1 in
        Treiber_stack.push s2 v

    is not atomic: between the two calls, another thread can observe v in neither stack.
  31. Scalable concurrent algorithms can be built and extended using abstraction and composition. With reagents,

        Treiber_stack.pop s1 >>> Treiber_stack.push s2

    is atomic.
  34. Reagents (PLDI 2012) combine three forms of composition, each associated with earlier work, and are still lock-free:
    • Sequential >>>: software transactional memory
    • Parallel <*>: join calculus
    • Selective <+>: Concurrent ML
  35. Design

  36. Lambda: the ultimate abstraction

        val f : 'a -> 'b
        val g : 'b -> 'c
  37. Lambda: the ultimate abstraction. Composing f : 'a -> 'b with g : 'b -> 'c gives

        (compose g f) : 'a -> 'c
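
For reference, `compose` is a one-line function in OCaml; this is the plain-function wiring that reagents generalize. The example functions are arbitrary.

```ocaml
(* compose g f feeds the output of f into g. *)
let compose (g : 'b -> 'c) (f : 'a -> 'b) : 'a -> 'c = fun x -> g (f x)

let () =
  let f = string_of_int in        (* int -> string *)
  let g = String.length in        (* string -> int *)
  let h = compose g f in          (* int -> int *)
  Printf.printf "%d\n" (h 12345)  (* prints 5 *)
```
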
  40. Lambda abstraction: a function f from 'a to 'b. Reagent abstraction: a reagent R from 'a to 'b, of type ('a,'b) Reagent.t, executed with

        val run : ('a,'b) Reagent.t -> 'a -> 'b
  41. Thread interaction

        module type Reagents = sig
          type ('a,'b) t
          (* shared memory *)
          module Ref : Ref.S with type ('a,'b) reagent = ('a,'b) t
          (* communication channels *)
          module Channel : Channel.S with type ('a,'b) reagent = ('a,'b) t
          ...
        end
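  44. Channels: mk_chan creates a pair of dual endpoints. Swapping on c : ('a,'b) endpoint offers an 'a and receives a 'b; swapping on the dual endpoint offers a 'b and receives an 'a.

        module type Channel = sig
          type ('a,'b) endpoint
          type ('a,'b) reagent
          val mk_chan : unit -> ('a,'b) endpoint * ('b,'a) endpoint
          val swap : ('a,'b) endpoint -> ('a,'b) reagent
        end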
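  47. Message passing uses swap; shared state uses upd. Given r : 'a ref, upd r ~f is a reagent from 'b to 'c: f receives the current contents of r and the reagent's input, and returns either None (block) or Some (new contents, output).

        type 'a ref
        val upd : 'a ref -> f:('a -> 'b -> ('a * 'c) option) -> ('b, 'c) Reagent.t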
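
Ignoring composition, the effect of `upd` on a single location can be sketched as a CAS retry loop. This is a simplified model, not the library's implementation: the reference is an `Atomic.t`, the reagent is an ordinary function, `upd_once` is a hypothetical name, and instead of blocking it returns `None` whenever `f` does.

```ocaml
(* Hypothetical single-location model of upd: read the current value,
   let f compute (new value, result), and install it with a CAS.
   f returning None means the reagent would block; here we just report it. *)
let rec upd_once (r : 'a Atomic.t) ~(f : 'a -> 'b -> ('a * 'c) option) (b : 'b)
    : 'c option =
  let cur = Atomic.get r in
  match f cur b with
  | None -> None (* would block *)
  | Some (next, c) ->
    if Atomic.compare_and_set r cur next then Some c
    else upd_once r ~f b (* lost a race: retry *)

(* The slides' Treiber stack operations, expressed through upd_once. *)
let push r x = upd_once r ~f:(fun xs y -> Some (y :: xs, ())) x

let pop r =
  upd_once r ~f:(fun l () -> match l with [] -> None | x :: xs -> Some (xs, x)) ()
```
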
  48. Combinators over the two primitives, swap (message passing) and upd f (shared state):
    • Disjunction: if R and S are both reagents from 'a to 'b, then R <+> S is a reagent from 'a to 'b.
    • Conjunction: if R is a reagent from 'a to 'b and S is a reagent from 'a to 'c, then R <*> S is a reagent from 'a to ('b * 'c).
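
To make the types of `>>>`, `<+>` and `<*>` concrete, here is a deliberately naive, single-threaded model in which a reagent is a function `'a -> 'b option` and `None` means "would block". It captures only the typing and the succeed-or-block behaviour; it has none of the atomicity, retrying or lock-freedom of the real library, and the module name `Toy` is hypothetical.

```ocaml
module Toy = struct
  (* A reagent from 'a to 'b; None means it would block. *)
  type ('a, 'b) t = 'a -> 'b option

  (* Sequential composition: feed r's output into s. *)
  let ( >>> ) (r : ('a, 'b) t) (s : ('b, 'c) t) : ('a, 'c) t =
   fun a -> match r a with None -> None | Some b -> s b

  (* Disjunction, left-biased in this model: try r, fall back to s. *)
  let ( <+> ) (r : ('a, 'b) t) (s : ('a, 'b) t) : ('a, 'b) t =
   fun a -> match r a with Some b -> Some b | None -> s a

  (* Conjunction: both must succeed on the same input. *)
  let ( <*> ) (r : ('a, 'b) t) (s : ('a, 'c) t) : ('a, 'b * 'c) t =
   fun a -> match (r a, s a) with Some b, Some c -> Some (b, c) | _ -> None

  (* Unlike the real run, this returns None instead of blocking. *)
  let run (r : ('a, 'b) t) (a : 'a) : 'b option = r a
end

let () =
  let open Toy in
  let half : (int, int) t = fun n -> if n mod 2 = 0 then Some (n / 2) else None in
  let neg : (int, int) t = fun n -> Some (-n) in
  assert (run (half >>> neg) 10 = Some (-5));
  assert (run (half <+> neg) 3 = Some (-3));
  assert (run (half <*> neg) 4 = Some (2, -4));
  assert (run (half <*> neg) 3 = None)
```
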
  55. Treiber stack as reagents

        module type TREIBER_STACK = sig
          type 'a t
          val create : unit -> 'a t
          val push : 'a t -> ('a, unit) Reagent.t
          val pop : 'a t -> (unit, 'a) Reagent.t
          ...
        end

        module Treiber_stack : TREIBER_STACK = struct
          type 'a t = 'a list Ref.ref
          let create () = Ref.ref []
          (* The element to push is the reagent's input x. *)
          let push r = Ref.upd r ~f:(fun xs x -> Some (x :: xs, ()))
          let pop r =
            Ref.upd r ~f:(fun l () ->
              match l with
              | [] -> None (* block *)
              | x :: xs -> Some (xs, x))
          ...
        end
  58. Composability
    • Transfer elements atomically: Treiber_stack.pop s1 >>> Treiber_stack.push s2
    • Consume elements atomically: Treiber_stack.pop s1 <*> Treiber_stack.pop s2
    • Consume elements from either: Treiber_stack.pop s1 <+> Treiber_stack.pop s2
  62. Composability: transform an arbitrary blocking reagent into a non-blocking reagent.

        val lift : ('a -> 'b option) -> ('a,'b) t
        val constant : 'a -> ('b,'a) t

        let attempt (r : ('a,'b) t) : ('a,'b option) t =
          (r >>> lift (fun x -> Some (Some x))) <+> (constant None)

        let try_pop stack = attempt (pop stack)
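
The behaviour of `attempt` can be checked in the same kind of naive, single-threaded model used for illustration elsewhere: a reagent is `'a -> 'b option`, with `None` meaning "would block". The definitions of `>>>`, `<+>`, `lift` and `constant` below are hypothetical stand-ins for the library's, and the stack is a plain `ref` mutated directly, so nothing here is atomic.

```ocaml
(* Naive model: a reagent maps 'a to 'b; None = would block. *)
type ('a, 'b) t = 'a -> 'b option

let ( >>> ) (r : ('a, 'b) t) (s : ('b, 'c) t) : ('a, 'c) t =
 fun a -> match r a with None -> None | Some b -> s b

(* Left-biased choice in this model: try r, fall back to s. *)
let ( <+> ) (r : ('a, 'b) t) (s : ('a, 'b) t) : ('a, 'b) t =
 fun a -> match r a with Some b -> Some b | None -> s a

let lift (f : 'a -> 'b option) : ('a, 'b) t = f
let constant (x : 'a) : ('b, 'a) t = fun _ -> Some x

(* As on the slide: success is wrapped in Some; if r would block,
   the right-hand branch answers None instead of blocking. *)
let attempt (r : ('a, 'b) t) : ('a, 'b option) t =
  (r >>> lift (fun x -> Some (Some x))) <+> constant None

(* A blocking pop on a plain ref (illustration only, not atomic). *)
let pop (s : 'a list ref) : (unit, 'a) t =
 fun () -> match !s with [] -> None | x :: xs -> s := xs; Some x

let try_pop s = attempt (pop s)

let () =
  let s = ref [ 1 ] in
  assert (try_pop s () = Some (Some 1));
  assert (try_pop s () = Some None) (* empty stack: no longer blocks *)
```
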
  65. Dining philosophers
    • Philosophers alternate between thinking and eating.
    • A philosopher can only eat after obtaining both forks.
    • No philosopher starves.

        type fork = {drop : (unit,unit) endpoint; take : (unit,unit) endpoint}

        let mk_fork () =
          let drop, take = mk_chan () in
          {drop; take}

        let drop f = swap f.drop
        let take f = swap f.take

        let eat l_fork r_fork =
          (* take both forks in one atomic step *)
          ignore (run (take l_fork <*> take r_fork) ());
          (* ... eat ... *)
          (* spawn is not defined on the slide; assumed to start a new thread *)
          spawn @@ run (drop l_fork);
          spawn @@ run (drop r_fork)
  66. Implementation

  69. Two-phase execution
    • Phase 1: accumulate CASes.
    • Phase 2: attempt a k-CAS.
    • Failures are either permanent or transient.
    • HTM ready: the k-CAS of phase 2 can be executed with hardware transactional memory.
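
The two phases can be sketched in a few dozen lines. In this simplified, single-threaded model (all names hypothetical, not the library's API), running a reagent only records the CASes it would perform, and `run` then commits them all or none. Returning `None` from phase 1 roughly plays the role of a permanent failure (the reagent must block), and a failed commit that of a transient failure (another thread interfered, so a real implementation would retry). The commit below is a plain check-then-write loop, not a real k-CAS, so it is only correct with no concurrent writers.

```ocaml
(* One recorded CAS: location, expected value, new value. The GADT hides
   the element type so CASes on different locations fit in one list. *)
type cas = Cas : 'a Atomic.t * 'a * 'a -> cas

(* Phase 1: a reagent returns the CASes it needs plus its result,
   or None if it must block. It does not write to memory. *)
type ('a, 'b) t = 'a -> (cas list * 'b) option

let ( >>> ) (r : ('a, 'b) t) (s : ('b, 'c) t) : ('a, 'c) t =
 fun a ->
  match r a with
  | None -> None
  | Some (cs1, b) -> (
      match s b with None -> None | Some (cs2, c) -> Some (cs1 @ cs2, c))

(* Stack operations only read the head in phase 1; the write is deferred. *)
let push (st : 'x list Atomic.t) : ('x, unit) t =
 fun x ->
  let cur = Atomic.get st in
  Some ([ Cas (st, cur, x :: cur) ], ())

let pop (st : 'x list Atomic.t) : (unit, 'x) t =
 fun () ->
  match Atomic.get st with
  | [] -> None (* nothing to pop: block *)
  | x :: xs as cur -> Some ([ Cas (st, cur, xs) ], x)

(* Phase 2: commit only if every location still holds its expected value.
   NOT atomic: a real implementation needs a k-CAS (or HTM) here. *)
let commit (cs : cas list) : bool =
  let valid =
    List.for_all (function Cas (loc, expect, _) -> Atomic.get loc == expect) cs
  in
  if valid then List.iter (function Cas (loc, _, update) -> Atomic.set loc update) cs;
  valid

(* A real run would block on None and retry when the commit fails. *)
let run (r : ('a, 'b) t) (a : 'a) : 'b option =
  match r a with
  | None -> None
  | Some (cs, b) -> if commit cs then Some b else None

let () =
  let s1 = Atomic.make [ 1; 2 ] and s2 = Atomic.make [] in
  (* Transfer the top of s1 to s2: both CASes commit together. *)
  assert (run (pop s1 >>> push s2) () = Some ());
  assert (Atomic.get s1 = [ 2 ] && Atomic.get s2 = [ 1 ]);
  (* Transient failure: s1 changes between phase 1 and phase 2, so the
     whole reaction is rejected and neither stack is modified. *)
  match (pop s1 >>> push s2) () with
  | Some (cs, ()) ->
    Atomic.set s1 [ 42; 2 ];
    assert (not (commit cs));
    assert (Atomic.get s2 = [ 1 ])
  | None -> assert false
```
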
  74. Status: https://github.com/ocamllabs/reagents
    • Synchronization: locks, reentrant locks, semaphores, R/W locks, reentrant R/W locks, condition variables, countdown latches, cyclic barriers, phasers, exchangers
    • Data structures: queues (nonblocking; blocking, array & list; synchronous; priority, nonblocking; priority, blocking), stacks (Treiber; elimination backoff), counters, deques, sets, maps (hash & skiplist)
  75. STM vs Reagents
    • STM is more ambitious (atomic { … }); Reagents are conservative.
    • Reagents don’t allow multiple writes to the same memory location.
    • Reagents are lock-free; STMs are typically obstruction-free.