Reagents: Lock-free programming for the masses

Efficient concurrent programming libraries are essential for taking advantage of fine-grained parallelism on multicore hardware. In this post, I will introduce reagents, a composable, lock-free concurrency library for expressing fine-grained parallel programs on Multicore OCaml. Reagents offer a high-level DSL for experts to specify efficient concurrency libraries, and also allow consumers of those libraries to extend them further without knowing the details of the underlying implementation.

KC Sivaramakrishnan

August 02, 2016

Transcript

  1. Reagents: lock-free programming for the masses “KC” Sivaramakrishnan OCaml Labs

    University of Cambridge
  2. Multicore OCaml 2 Concurrency Parallelism Compiler Language + Stdlib Libraries

  3. Multicore OCaml 2 Concurrency Parallelism Compiler Language + Stdlib Libraries

  4. Multicore OCaml 2 Concurrency Parallelism Compiler Fibers Language + Stdlib

    Libraries
  5. Multicore OCaml 2 Concurrency Parallelism Compiler Fibers Language + Stdlib

    • 12M fibers/s on 1 core • 30M fibers/s on 4 cores Libraries
  6. Multicore OCaml 2 Domains Concurrency Parallelism Compiler Fibers Language +

    Stdlib • 12M fibers/s on 1 core • 30M fibers/s on 4 cores Libraries
  7. Multicore OCaml 2 Effects Domains Concurrency Parallelism Compiler Fibers Language

    + Stdlib Domain API • 12M fibers/s on 1 core • 30M fibers/s on 4 cores Libraries
  8. Multicore OCaml 2 Effects Cooperative threading libraries Domains Concurrency Parallelism

    Compiler Fibers Language + Stdlib Domain API • 12M fibers/s on 1 core • 30M fibers/s on 4 cores Libraries
  9. Multicore OCaml 2 Effects Cooperative threading libraries Reagents: lock-free programming

    Domains Concurrency Parallelism Compiler Fibers Language + Stdlib Domain API • 12M fibers/s on 1 core • 30M fibers/s on 4 cores Libraries
  10. JVM: java.util.concurrent

    .NET: System.Collections.Concurrent
  11. JVM: java.util.concurrent; .NET: System.Collections.Concurrent

    Synchronization: reentrant locks, semaphores, R/W locks, reentrant R/W locks, condition variables, countdown latches, cyclic barriers, phasers, exchangers
    Data structures: queues (nonblocking; blocking, array & list; synchronous; priority, nonblocking; priority, blocking), deques, sets, maps (hash & skiplist)
  12. JVM: java.util.concurrent; .NET: System.Collections.Concurrent

    Synchronization: reentrant locks, semaphores, R/W locks, reentrant R/W locks, condition variables, countdown latches, cyclic barriers, phasers, exchangers
    Data structures: queues (nonblocking; blocking, array & list; synchronous; priority, nonblocking; priority, blocking), deques, sets, maps (hash & skiplist)
    Not composable
  13. How to build composable lock-free programs? 4

  14. lock-free

  15. lock-free: under contention, at least 1 thread makes progress

  16. lock-free: under contention, at least 1 thread makes progress

    obstruction-free: a single thread in isolation makes progress
  17. lock-free: under contention, at least 1 thread makes progress

    wait-free: under contention, each thread makes progress
    obstruction-free: a single thread in isolation makes progress
  18. Compare-and-swap (CAS)

    module CAS : sig
      val cas : 'a ref -> expect:'a -> update:'a -> bool
    end = struct
      (* atomically... *)
      let cas r ~expect ~update =
        if !r = expect then (r := update; true)
        else false
    end
  19. Compare-and-swap (CAS)

    module CAS : sig
      val cas : 'a ref -> expect:'a -> update:'a -> bool
    end = struct
      (* atomically... *)
      let cas r ~expect ~update =
        if !r = expect then (r := update; true)
        else false
    end

    • Implemented atomically by processors
    • x86: CMPXCHG and friends
    • arm: LDREX, STREX, etc.
    • ppc: lwarx, stwcx, etc.
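The `CAS` module on the slide is a specification, not something to run on several cores. As a minimal sketch of the same read, compute, CAS, retry pattern, the block below uses `Atomic.compare_and_set` from the OCaml standard library (available since OCaml 4.12), which compares by physical equality. The names `counter` and `incr_cas` are illustrative and do not come from the talk.

```ocaml
(* A counter cell from the standard library's Atomic module. *)
let counter : int Atomic.t = Atomic.make 0

(* Read the current value, compute the update, and retry if another
   thread changed the cell between the read and the CAS. *)
let rec incr_cas (c : int Atomic.t) : unit =
  let seen = Atomic.get c in
  if Atomic.compare_and_set c seen (seen + 1) then ()
  else incr_cas c

let () =
  for _ = 1 to 1000 do incr_cas counter done;
  Printf.printf "counter = %d\n" (Atomic.get counter)
(* prints "counter = 1000" *)
```

A CAS whose expected value is stale simply returns `false` and leaves the cell untouched, which is what makes the retry loop safe.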
  20. 3 2 Head 7

  21. 3 2 Head 7 7

  22. 3 2 Head 7 7 CAS attempt

  23. 3 2 Head 7 5 7 CAS attempt

  24. 3 2 Head 7 5 CAS fail 7

  25. 3 2 Head 7 5 7

  26. 3 2 Head 7 5 8

  27. module type TREIBER_STACK = sig
        type 'a t
        val push : 'a t -> 'a -> unit
        ...
      end

      module Treiber_stack : TREIBER_STACK = struct
        type 'a t = 'a list ref
        let rec push s t =
          let cur = !s in
          if CAS.cas s ~expect:cur ~update:(t::cur) then ()
          else (backoff (); push s t)
      end
  28. module type TREIBER_STACK = sig
        type 'a t
        val push : 'a t -> 'a -> unit
        val try_pop : 'a t -> 'a option
      end

      module Treiber_stack : TREIBER_STACK = struct
        type 'a t = 'a list ref
        let rec push s t = ...
        let rec try_pop s =
          match !s with
          | [] -> None
          | (x::xs) as cur ->
            if CAS.cas s ~expect:cur ~update:xs then Some x
            else (backoff (); try_pop s)
      end
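The slide's Treiber stack relies on the specification-level `CAS` module and an unspecified `backoff`. A runnable sketch under the assumption that the standard library's `Atomic` cell stands in for the `ref` plus `CAS.cas` pair could look like this; backoff is left out to keep the block short, and `create` is added for illustration.

```ocaml
module Treiber_stack : sig
  type 'a t
  val create : unit -> 'a t
  val push : 'a t -> 'a -> unit
  val try_pop : 'a t -> 'a option
end = struct
  (* The head of an immutable list, held in an atomic cell. *)
  type 'a t = 'a list Atomic.t

  let create () = Atomic.make []

  (* Retry until the head we read is still the head when we CAS. *)
  let rec push s x =
    let cur = Atomic.get s in
    if Atomic.compare_and_set s cur (x :: cur) then ()
    else push s x

  (* Returns None on an empty stack instead of blocking. *)
  let rec try_pop s =
    match Atomic.get s with
    | [] -> None
    | x :: rest as cur ->
      if Atomic.compare_and_set s cur rest then Some x
      else try_pop s
end
```

Because `Atomic.compare_and_set` compares physically, the CAS succeeds only if the head is the very list value that was read, so a concurrent push or pop in between forces a retry.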
  29. let v = Treiber_stack.pop s1 in Treiber_stack.push s2 v

    is not atomic
  30. The Problem: Concurrency libraries are indispensable, but hard to build and extend

    let v = Treiber_stack.pop s1 in Treiber_stack.push s2 v
    is not atomic
  31. Reagents: Scalable concurrent algorithms can be built and extended using abstraction and composition

    Treiber_stack.pop s1 >>> Treiber_stack.push s2
    is atomic
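To make the non-atomicity concrete, here is a small single-threaded sketch. It assumes `Atomic`-based stacks, and the `observe` callback is a hypothetical hook standing in for another thread that happens to run between the pop and the push.

```ocaml
let rec push (s : 'a list Atomic.t) (x : 'a) : unit =
  let cur = Atomic.get s in
  if Atomic.compare_and_set s cur (x :: cur) then () else push s x

let rec try_pop (s : 'a list Atomic.t) : 'a option =
  match Atomic.get s with
  | [] -> None
  | x :: rest as cur ->
    if Atomic.compare_and_set s cur rest then Some x else try_pop s

(* Two individually atomic steps; [observe] runs in the gap between them. *)
let transfer ~(observe : unit -> unit) src dst =
  match try_pop src with
  | None -> ()
  | Some v -> observe (); push dst v

let s1 = Atomic.make [7]
let s2 : int list Atomic.t = Atomic.make []
let seen_in_neither = ref false

let () =
  transfer s1 s2 ~observe:(fun () ->
    (* At this point the element has left s1 but not yet reached s2. *)
    seen_in_neither := (Atomic.get s1 = [] && Atomic.get s2 = []))
```

After the run, `!seen_in_neither` is `true`: an observer in the gap sees the element in neither stack, which is exactly the intermediate state an atomic transfer must hide.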
  32. PLDI 2012

  33. PLDI 2012

    Sequential (>>>): software transactional memory
    Parallel (<*>): Join Calculus
    Selective (<+>): Concurrent ML
  34. PLDI 2012

    Sequential (>>>): software transactional memory
    Parallel (<*>): Join Calculus
    Selective (<+>): Concurrent ML
    Still lock-free!
  35. Design 14

  36. Lambda: the ultimate abstraction

    val f : 'a -> 'b
    val g : 'b -> 'c
  37. Lambda: the ultimate abstraction

    (compose g f) : 'a -> 'c
  38. Lambda abstraction: f : 'a -> 'b

  39. Lambda abstraction: f : 'a -> 'b

    Reagent abstraction: R : ('a,'b) Reagent.t
  40. Lambda abstraction: f : 'a -> 'b

    Reagent abstraction: R : ('a,'b) Reagent.t
    val run : ('a,'b) Reagent.t -> 'a -> 'b
  41. Thread Interaction

    module type Reagents = sig
      type ('a,'b) t
      (* shared memory *)
      module Ref : Ref.S with type ('a,'b) reagent = ('a,'b) t
      (* communication channels *)
      module Channel : Channel.S with type ('a,'b) reagent = ('a,'b) t
      ...
    end
  42. module type Channel = sig
        type ('a,'b) endpoint
        type ('a,'b) reagent
        val mk_chan : unit -> ('a,'b) endpoint * ('b,'a) endpoint
        val swap : ('a,'b) endpoint -> ('a,'b) reagent
      end
  43. module type Channel = sig
        type ('a,'b) endpoint
        type ('a,'b) reagent
        val mk_chan : unit -> ('a,'b) endpoint * ('b,'a) endpoint
        val swap : ('a,'b) endpoint -> ('a,'b) reagent
      end

      c : ('a,'b) endpoint
      swap c : ('a,'b) reagent (input 'a, output 'b)
  44. module type Channel = sig
        type ('a,'b) endpoint
        type ('a,'b) reagent
        val mk_chan : unit -> ('a,'b) endpoint * ('b,'a) endpoint
        val swap : ('a,'b) endpoint -> ('a,'b) reagent
      end

      c : ('a,'b) endpoint
      swap c : ('a,'b) reagent (input 'a, output 'b)
      swap on the dual endpoint : ('b,'a) reagent (input 'b, output 'a)
  45. c : ('a,'b) endpoint

    swap c : ('a,'b) reagent (input 'a, output 'b)
  46. Message passing: swap

    type 'a ref
    val upd : 'a ref -> f:('a -> 'b -> ('a * 'c) option) -> ('b, 'c) Reagent.t
  47. Message passing: swap

    upd f on r (r holds 'a; input 'b, output 'c)
    type 'a ref
    val upd : 'a ref -> f:('a -> 'b -> ('a * 'c) option) -> ('b, 'c) Reagent.t
  48. swap upd f Message passing Shared state 22

  49. swap upd f 'a 'b R 'a 'b S Message

    passing Shared state 22
  50. swap upd f R S <+> 'a 'b Message passing

    Shared state 22
  51. swap upd f R S <+> Message passing Shared state

    Disjunction 23
  52. swap upd f R S <+> 'a 'b R 'a

    'c S Message passing Shared state Disjunction 23
  53. swap upd f R S <+> R S <*> 'a

    ('b * 'c) Message passing Shared state Disjunction 23
  54. swap upd f R S <+> R S <*> Message

    passing Shared state Disjunction Conjunction 24
  55. module type TREIBER_STACK = sig
        type 'a t
        val create : unit -> 'a t
        val push : 'a t -> ('a, unit) Reagent.t
        val pop : 'a t -> (unit, 'a) Reagent.t
        ...
      end

      module Treiber_stack : TREIBER_STACK = struct
        type 'a t = 'a list Ref.ref
        let create () = Ref.ref []
        let push r = Ref.upd r (fun xs x -> Some (x::xs,()))
        let pop r = Ref.upd r (fun l () ->
          match l with
          | [] -> None (* block *)
          | x::xs -> Some (xs,x))
        ...
      end
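To show how an `upd`-style update can sit on top of a CAS loop, the sketch below defines a hypothetical stand-alone `upd` over a standard-library `Atomic` cell. It is not the reagents library's `Ref.upd`: it runs immediately instead of building a reagent, and it returns `None` where a real reagent would block.

```ocaml
(* Hypothetical helper: apply [f] to the current contents and the input.
   None means the update cannot proceed (a real reagent would block). *)
let rec upd (r : 'a Atomic.t) ~(f : 'a -> 'b -> ('a * 'c) option)
    (input : 'b) : 'c option =
  let cur = Atomic.get r in
  match f cur input with
  | None -> None
  | Some (next, result) ->
    (* Install the new contents only if nobody changed the cell meanwhile. *)
    if Atomic.compare_and_set r cur next then Some result
    else upd r ~f input

(* The stack operations from the slide, expressed with this helper. *)
let create () : 'a list Atomic.t = Atomic.make []

let push s x = upd s ~f:(fun xs x -> Some (x :: xs, ())) x

let pop s =
  upd s ~f:(fun xs () ->
      match xs with
      | [] -> None
      | x :: rest -> Some (rest, x)) ()
```

With this helper, `push s 1` returns `Some ()`, and `pop` on an empty stack returns `None` rather than waiting for a push.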
  56. Composability

    Transfer elements atomically: Treiber_stack.pop s1 >>> Treiber_stack.push s2
  57. Composability

    Transfer elements atomically: Treiber_stack.pop s1 >>> Treiber_stack.push s2
    Consume elements atomically: Treiber_stack.pop s1 <*> Treiber_stack.pop s2
  58. Composability

    Transfer elements atomically: Treiber_stack.pop s1 >>> Treiber_stack.push s2
    Consume elements atomically: Treiber_stack.pop s1 <*> Treiber_stack.pop s2
    Consume elements from either: Treiber_stack.pop s1 <+> Treiber_stack.pop s2
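The all-or-nothing behaviour of the three combinators can be illustrated with a deliberately coarse toy model. This is an assumption-laden sketch, not how the reagents library is implemented: the entire shared state is one immutable record in a single `Atomic` cell, a reagent is a pure function from input and state to output and new state, and `run` returns `None` instead of blocking. The names `state`, `shared`, `pop1`, `pop2`, and `push2` are all illustrative.

```ocaml
type state = { s1 : int list; s2 : int list }

(* A toy reagent: None means "cannot react in this state". *)
type ('a, 'b) t = 'a -> state -> ('b * state) option

(* Sequential composition: feed r's output and state into s. *)
let ( >>> ) (r : ('a, 'b) t) (s : ('b, 'c) t) : ('a, 'c) t =
  fun a st ->
    match r a st with
    | None -> None
    | Some (b, st') -> s b st'

(* Conjunction: both must react, results are paired. *)
let ( <*> ) (r : ('a, 'b) t) (s : ('a, 'c) t) : ('a, 'b * 'c) t =
  fun a st ->
    match r a st with
    | None -> None
    | Some (b, st') ->
      (match s a st' with
       | None -> None
       | Some (c, st'') -> Some ((b, c), st''))

(* Disjunction: try r, fall back to s. *)
let ( <+> ) (r : ('a, 'b) t) (s : ('a, 'b) t) : ('a, 'b) t =
  fun a st ->
    match r a st with
    | Some _ as res -> res
    | None -> s a st

let pop1 : (unit, int) t = fun () st ->
  match st.s1 with [] -> None | x :: rest -> Some (x, { st with s1 = rest })
let pop2 : (unit, int) t = fun () st ->
  match st.s2 with [] -> None | x :: rest -> Some (x, { st with s2 = rest })
let push2 : (int, unit) t = fun x st -> Some ((), { st with s2 = x :: st.s2 })

let shared = Atomic.make { s1 = [1; 2]; s2 = [] }

(* Publish the new state with one CAS, only if the whole composite reacted. *)
let rec run r a =
  let st = Atomic.get shared in
  match r a st with
  | None -> None
  | Some (b, st') ->
    if Atomic.compare_and_set shared st st' then Some b else run r a
```

For example, `run (pop1 >>> push2) ()` moves one element from `s1` to `s2` in a single state change, and `run (pop1 <*> pop2) ()` leaves the state untouched when either stack is empty. Real reagents achieve the same effect without funnelling every update through one cell.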
  59. Composability

    Transform an arbitrary blocking reagent into a non-blocking reagent
  60. Composability

    Transform an arbitrary blocking reagent into a non-blocking reagent
    val lift : ('a -> 'b option) -> ('a,'b) t
    val constant : 'a -> ('b,'a) t
  61. Composability

    Transform an arbitrary blocking reagent into a non-blocking reagent
    val lift : ('a -> 'b option) -> ('a,'b) t
    val constant : 'a -> ('b,'a) t
    let attempt (r : ('a,'b) t) : ('a,'b option) t =
      (r >>> lift (fun x -> Some (Some x))) <+> (constant None)
  62. Composability

    Transform an arbitrary blocking reagent into a non-blocking reagent
    val lift : ('a -> 'b option) -> ('a,'b) t
    val constant : 'a -> ('b,'a) t
    let attempt (r : ('a,'b) t) : ('a,'b option) t =
      (r >>> lift (fun x -> Some (Some x))) <+> (constant None)
    let try_pop stack = attempt (pop stack)
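To see why `attempt` never blocks, it helps to evaluate it in a toy model. The sketch below is an assumption for illustration only, not the library's representation: a reagent is a pure function from an input and the current stack to an output and a new stack, with `None` meaning "would block". The definition of `attempt` is copied from the slide; everything else is hypothetical scaffolding.

```ocaml
type state = int list

(* A toy reagent over a single stack; None stands for blocking. *)
type ('a, 'b) t = 'a -> state -> ('b * state) option

let ( >>> ) r s = fun a st ->
  match r a st with None -> None | Some (b, st') -> s b st'

let ( <+> ) r s = fun a st ->
  match r a st with Some _ as res -> res | None -> s a st

(* lift turns a partial function into a reagent that leaves the state alone. *)
let lift (f : 'a -> 'b option) : ('a, 'b) t =
  fun a st -> match f a with None -> None | Some b -> Some (b, st)

(* constant ignores its input and always reacts. *)
let constant (v : 'a) : ('b, 'a) t = fun _ st -> Some (v, st)

(* As on the slide: wrap the result in Some, or fall back to None. *)
let attempt (r : ('a, 'b) t) : ('a, 'b option) t =
  (r >>> lift (fun x -> Some (Some x))) <+> (constant None)

let pop : (unit, int) t = fun () st ->
  match st with [] -> None | x :: rest -> Some (x, rest)

let try_pop = attempt pop
```

In this model `pop () []` is `None` (it would block), while `try_pop () []` is `Some (None, [])`: the right-hand branch of `<+>` always reacts, so the composite always produces an answer.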
  63. • Philosophers alternate between thinking and eating
      • A philosopher can only eat after obtaining both forks
      • No philosopher starves
  64. • Philosophers alternate between thinking and eating
      • A philosopher can only eat after obtaining both forks
      • No philosopher starves

      type fork = {drop : (unit,unit) endpoint; take : (unit,unit) endpoint}
      let mk_fork () =
        let drop, take = mk_chan () in
        {drop; take}
      let drop f = swap f.drop
      let take f = swap f.take
  65. • Philosophers alternate between thinking and eating
      • A philosopher can only eat after obtaining both forks
      • No philosopher starves

      type fork = {drop : (unit,unit) endpoint; take : (unit,unit) endpoint}
      let mk_fork () =
        let drop, take = mk_chan () in
        {drop; take}
      let drop f = swap f.drop
      let take f = swap f.take
      let eat l_fork r_fork =
        run (take l_fork <*> take r_fork) ();
        (* ... eat ... *)
        spawn @@ run (drop l_fork);
        spawn @@ run (drop r_fork)
  66. Implementation 29

  67. Phase 1 Phase 2 30

  68. Phase 1 Phase 2 Accumulate CASes 30

  69. Phase 1 Phase 2 Accumulate CASes Attempt k-CAS 30

  70. Accumulate CASes Attempt k-CAS 31

  71. Accumulate CASes Attempt k-CAS Permanent failure 31

  72. Accumulate CASes Attempt k-CAS Permanent failure Transient failure 31

  73. Accumulate CASes Attempt k-CAS Permanent failure Transient failure HTM Ready

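The two phases can be sketched as "record the CASes first, then commit them together". The block below is a single-threaded model under loud assumptions: `commit` checks and then writes in two separate passes, so it is NOT atomic across threads the way a real k-CAS is. The `cas` and `outcome` types and the `transfer` example are hypothetical. The mapping of the slide's two failure modes (permanent: the reagent cannot react in the current state; transient: a value changed before commit, so retrying may succeed) is my reading of the slide.

```ocaml
(* One pending CAS on a cell of any type, packed as an existential. *)
type cas = Cas : 'a Atomic.t * 'a * 'a -> cas

type outcome = Committed | Transient_failure | Permanent_failure

(* Phase 2, modelled sequentially: check every expected value, then
   apply every update. A real k-CAS does this as one atomic step. *)
let commit (log : cas list) : outcome =
  let holds = function Cas (r, expect, _) -> Atomic.get r == expect in
  let apply = function Cas (r, _, update) -> Atomic.set r update in
  if List.for_all holds log then (List.iter apply log; Committed)
  else Transient_failure

(* Phase 1 for "pop src >>> push dst": build the log without writing. *)
let transfer (src : 'a list Atomic.t) (dst : 'a list Atomic.t) : outcome =
  match Atomic.get src, Atomic.get dst with
  | [], _ -> Permanent_failure (* pop cannot react on an empty stack *)
  | (x :: rest as cur_src), cur_dst ->
    commit [ Cas (src, cur_src, rest); Cas (dst, cur_dst, x :: cur_dst) ]
```

Keeping the writes in a log until the end is what lets a composite reagent either take effect everywhere or nowhere.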
  74. Status: https://github.com/ocamllabs/reagents

    Synchronization: locks, reentrant locks, semaphores, R/W locks, reentrant R/W locks, condition variables, countdown latches, cyclic barriers, phasers, exchangers
    Data structures: queues (nonblocking; blocking, array & list; synchronous; priority, nonblocking; priority, blocking), stacks (Treiber; elimination backoff), counters, deques, sets, maps (hash & skiplist)
  75. STM vs Reagents

    • STM is more ambitious: atomic { … }. Reagents are conservative.
    • Reagents don’t allow multiple writes to the same memory location.
    • Reagents are lock-free. STMs are typically obstruction-free.