Slide 1

Slide 1 text

What Came First? The Ordering of Events in Systems @kavya719

Slide 2

Slide 2 text

kavya

Slide 3

Slide 3 text

the design of concurrent systems

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Slack architecture on AWS

Slide 6

Slide 6 text

concurrent actors
systems with multiple independent actors:
nodes in a distributed system; threads in a multithreaded program.

Slide 7

Slide 7 text

threads
user-space or system threads

Slide 8

Slide 8 text

user-space or system threads — threads

    var tasks []Task

    func main() {
        for {
            if len(tasks) > 0 {
                task := dequeue(tasks)
                process(task)
            }
        }
    }

[diagram: R/W markers on the reads and writes of tasks]

Slide 9

Slide 9 text

multiple threads:

    // Shared variable
    var tasks []Task

    func worker() {
        for len(tasks) > 0 {
            task := dequeue(tasks)
            process(task)
        }
    }

    func main() {
        // Spawn fixed-pool of worker threads.
        startWorkers(3, worker)
        // Populate task queue.
        for _, t := range hellaTasks {
            tasks = append(tasks, t)
        }
    }

data race: “when two+ threads concurrently access a shared memory location, and at least one access is a write.”

[diagram: g1, g2 with R/W markers on their accesses to tasks]
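The race above is easiest to see in runnable form. The sketch below distills the same bug class to a shared counter (the queue version can additionally panic mid-run); `raceDemo` and the iteration counts are illustrative, not from the talk. `go run -race` flags the unsynchronized accesses.

```go
package main

import (
	"fmt"
	"sync"
)

// Two goroutines do unsynchronized read-modify-write on a shared
// counter. This is a data race (the race detector reports it), and
// increments can be lost.
var count int

func increment(n int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < n; i++ {
		count++ // racy: load, add, store with no synchronization
	}
}

// raceDemo returns the final counter value after two racing goroutines.
func raceDemo(n int) int {
	count = 0
	var wg sync.WaitGroup
	wg.Add(2)
	go increment(n, &wg)
	go increment(n, &wg)
	wg.Wait()
	return count
}

func main() {
	// Often less than 200000 because of lost updates.
	fmt.Println("final count:", raceDemo(100000))
}
```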

Slide 10

Slide 10 text

…multiple threads provide concurrency, but may introduce data races.

Slide 11

Slide 11 text

nodes
• processes i.e. logical nodes (but the term can also refer to machines i.e. physical nodes).
• communicate by message-passing i.e. connected by an unreliable network, no shared memory.
• are sequential.
• no global clock.

Slide 12

Slide 12 text

distributed key-value store.
three nodes: one master (M) and two replicas (R).

[diagram: userX sends ADD apple crepe, userY sends ADD blueberry crepe; cart goes from [ ] to [ apple crepe, blueberry crepe ]]

Slide 13

Slide 13 text

distributed key-value store.
three nodes with three equal replicas (N1, N2, N3).
read_quorum = write_quorum = 1. eventually consistent.

[diagram: starting from cart: [ ], userX’s ADD apple crepe yields cart: [ apple crepe ] on one replica, while userY’s ADD blueberry crepe yields cart: [ blueberry crepe ] on another]

Slide 14

Slide 14 text

…multiple nodes accepting writes provide availability, but may introduce conflicts.

Slide 15

Slide 15 text

given that we want concurrent systems, we need to deal with data races and conflict resolution.

Slide 16

Slide 16 text

riak: distributed key-value store channels: Go concurrency primitive stepping back: similarity,
 meta-lessons

Slide 17

Slide 17 text

riak a distributed datastore

Slide 18

Slide 18 text

riak
• Distributed key-value database:
    // A data item =
    {"uuid1234": {"name":"ada"}}
• v1.0 released in 2011. Based on Amazon’s Dynamo.
• Eventually consistent: uses optimistic replication i.e. replicas can temporarily diverge, will eventually converge.
• Highly available: data partitioned and replicated, decentralized, sloppy quorum.
together: an AP system (CAP theorem)

Slide 19

Slide 19 text

[diagram: N1, N2, N3. Concurrent ADD apple crepe and ADD blueberry crepe produce conflicting carts, needing conflict resolution; a later UPDATE to date crepe of cart: [ apple crepe ] is a causal update.]

Slide 20

Slide 20 text

how do we determine causal vs. concurrent updates?

Slide 21

Slide 21 text

A: apple, B: blueberry, D: date. concurrent events?

[diagram: events A, B, C, D across N1, N2, N3. A: userX, { cart: [ A ] }; B: userY, { cart: [ B ] }; C: userX, { cart: [ A ] }; D: userX, { cart: [ D ] }]

Slide 22

Slide 22 text

[diagram: events A, B, C, D across N1, N2, N3] concurrent events?

Slide 23

Slide 23 text

A B C D N1 N2 N3 A, C: not concurrent — same sequential actor

Slide 24

Slide 24 text

A B C D N1 N2 N3 A, C: not concurrent — same sequential actor. C, D: not concurrent — fetch/update pair

Slide 25

Slide 25 text

happens-before
orders events across actors (threads or nodes); establishes causality and concurrency.
X ≺ Y IF one of:
— same actor
— are a synchronization pair
— X ≺ E ≺ Y
IF X not ≺ Y and Y not ≺ X, concurrent!
Formulated in Lamport’s “Time, Clocks, and the Ordering of Events in a Distributed System” paper (1978).

Slide 26

Slide 26 text

causality and concurrency
A ≺ C (same actor). C ≺ D (synchronization pair). So, A ≺ D (transitivity).
[diagram: A, B, C, D across N1, N2, N3]

Slide 27

Slide 27 text

causality and concurrency
…but B not ≺ D, and D not ≺ B. So, B, D concurrent!
[diagram: A, B, C, D across N1, N2, N3]

Slide 28

Slide 28 text

[diagram: A { cart: [ A ] }, B { cart: [ B ] }, C { cart: [ A ] }, D { cart: [ D ] } across N1, N2, N3]
A ≺ D, so D should update A.
B, D concurrent, so B, D need resolution.

Slide 29

Slide 29 text

how do we implement happens-before?

Slide 30

Slide 30 text

vector clocks: a means to establish happens-before edges.
[diagram: each node ni keeps a vector of counters (n1, n2, n3), all starting at (0, 0, 0); a local event at n1 increments its own entry, giving (1, 0, 0)]

Slide 31

Slide 31 text

vector clocks: a means to establish happens-before edges.
[diagram: n1 ticks through (1, 0, 0) and (2, 0, 0); n3 ticks to (0, 0, 1)]

Slide 32

Slide 32 text

vector clocks: a means to establish happens-before edges.
[diagram: n1 at (1, 0, 0) then (2, 0, 0); n2 at (0, 1, 0); n3 at (0, 0, 1)]

Slide 33

Slide 33 text

vector clocks: a means to establish happens-before edges.
[diagram: on receiving a message, n2 merges the sender’s clock with its own: max((2, 0, 0), (0, 1, 0)) gives (2, 1, 0)]

Slide 34

Slide 34 text

vector clocks: a means to establish happens-before edges.
[diagram: as before, merge by max((2, 0, 0), (0, 1, 0))]
happens-before comparison: X ≺ Y iff VCx < VCy

Slide 35

Slide 35 text

[diagram: A, B, C, D across N1, N2, N3 with vector clocks]
VC at A: (1, 0, 0). VC at D: (2, 1, 0). (1, 0, 0) < (2, 1, 0), so A ≺ D.

Slide 36

Slide 36 text

[diagram: the same events with vector clocks]
VC at B: (0, 0, 1). VC at D: (2, 1, 0). Neither clock is less than the other, so B, D concurrent.
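The clock rules used on the last few slides can be sketched in a few lines of Go; the `VC` type and function names below are my own, not from the talk.

```go
package main

import "fmt"

// VC is a vector clock: one logical-clock entry per node.
type VC []int

// merge returns the elementwise max of two clocks (applied on receive).
func merge(a, b VC) VC {
	out := make(VC, len(a))
	for i := range a {
		out[i] = a[i]
		if b[i] > out[i] {
			out[i] = b[i]
		}
	}
	return out
}

// happensBefore reports X ≺ Y: every entry of x is <= the matching
// entry of y, and at least one is strictly less.
func happensBefore(x, y VC) bool {
	strict := false
	for i := range x {
		if x[i] > y[i] {
			return false
		}
		if x[i] < y[i] {
			strict = true
		}
	}
	return strict
}

// concurrent: neither clock happens before the other.
func concurrent(x, y VC) bool {
	return !happensBefore(x, y) && !happensBefore(y, x)
}

func main() {
	a := VC{1, 0, 0} // VC at A
	b := VC{0, 0, 1} // VC at B
	d := VC{2, 1, 0} // VC at D
	fmt.Println(happensBefore(a, d))             // A ≺ D
	fmt.Println(concurrent(b, d))                // B, D concurrent
	fmt.Println(merge(VC{2, 0, 0}, VC{0, 1, 0})) // the slide's max()
}
```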

Slide 37

Slide 37 text

causality tracking in riak
Riak stores a vector clock with each version of the data (a more precise form, the “dotted version vector”).
GET, PUT operations on a key pass around a causal context object that contains the vector clocks.
Therefore, able to detect conflicts.
[diagram: n1, n2 merging clocks: max((2, 0, 0), (0, 1, 0)) gives (2, 1, 0)]

Slide 38

Slide 38 text

causality tracking in riak (as on the previous slide)
…what about resolving those conflicts?

Slide 39

Slide 39 text

conflict resolution in riak
Behavior is configurable. Assuming vector clock analysis is enabled:
• last-write-wins i.e. the version with the higher timestamp is picked.
• merge, iff the underlying data type is a CRDT.
• return conflicting versions to the application: riak stores “siblings”, i.e. conflicting versions, which are returned to the application for resolution.
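For the third option, the application supplies the merge. Below is a hypothetical resolver for the cart example (the function name and policy are mine, not Riak’s API): take the set union of the siblings so no concurrently-added item is lost. Deletions would need more machinery, e.g. tombstones.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveSiblings is one application-level policy for conflicting
// cart versions returned by the store: merge by set union, sorted
// for deterministic output.
func resolveSiblings(siblings ...[]string) []string {
	seen := map[string]bool{}
	for _, s := range siblings {
		for _, item := range s {
			seen[item] = true
		}
	}
	merged := make([]string, 0, len(seen))
	for item := range seen {
		merged = append(merged, item)
	}
	sort.Strings(merged)
	return merged
}

func main() {
	b := []string{"blueberry crepe"} // sibling B
	d := []string{"date crepe"}      // sibling D
	fmt.Println(resolveSiblings(b, d))
}
```

Writing the merged value back (with the combined causal context) creates the causal update shown on the next slide.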

Slide 40

Slide 40 text

return conflicting versions to application:
B: { cart: [ “blueberry crepe” ] } with clock (0, 0, 1)
D: { cart: [ “date crepe” ] } with clock (2, 1, 0)
Riak stores both versions; the next operation returns both to the application; the application must resolve the conflict, e.g. { cart: [ “blueberry crepe”, “date crepe” ] } with merged clock (2, 1, 1), which creates a causal update.

Slide 41

Slide 41 text

…what about resolving those conflicts? riak doesn’t (default behavior). instead, it exposes the happens-before graph to the application for conflict resolution.

Slide 42

Slide 42 text

riak: uses vector clocks to track causality and conflicts. exposes happens-before graph to the user for conflict resolution.

Slide 43

Slide 43 text

channels Go concurrency primitive

Slide 44

Slide 44 text

multiple threads:

    // Shared variable
    var tasks []Task

    func worker() {
        for len(tasks) > 0 {
            task := dequeue(tasks)
            process(task)
        }
    }

    func main() {
        // Spawn fixed-pool of worker threads.
        startWorkers(3, worker)
        // Populate task queue.
        for _, t := range hellaTasks {
            tasks = append(tasks, t)
        }
    }

data race: “when two+ threads concurrently access a shared memory location, and at least one access is a write.”

[diagram: g1, g2 with R/W markers on their accesses to tasks]

Slide 45

Slide 45 text

memory model
specifies when one event happens before another.
X ≺ Y IF one of:
— same thread
— are a synchronization pair
— X ≺ E ≺ Y
IF X not ≺ Y and Y not ≺ X, concurrent!
synchronization pairs: unlock/lock on a mutex, send/recv on a channel, spawn/first event of a thread, etc.
[example: X is x = 1, Y is print(x)]

Slide 46

Slide 46 text

goroutines
The unit of concurrent execution: goroutines, user-space threads.
use as you would threads:
    > go handle_request(r)
Go memory model specified in terms of goroutines:
within a goroutine: reads + writes are ordered.
with multiple goroutines: shared data must be synchronized…else data races!

Slide 47

Slide 47 text

synchronization
The synchronization primitives are:
mutexes, condition variables, …
    > import "sync"
    > mu.Lock()
atomics
    > import "sync/atomic"
    > atomic.AddUint64(&myInt, 1)
channels
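A minimal sketch of the first two families before the talk turns to channels; the function names and counts here are illustrative. Both protect a shared counter: one with a `sync.Mutex`, one with `sync/atomic`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addWithMutex guards a plain int with a sync.Mutex.
func addWithMutex(n int) int {
	var mu sync.Mutex
	var count int
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // unlock/lock form a synchronization pair
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

// addWithAtomic does the same increment with sync/atomic.
func addWithAtomic(n int) uint64 {
	var count uint64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddUint64(&count, 1)
		}()
	}
	wg.Wait()
	return count
}

func main() {
	// Unlike the racy version, both always count every increment.
	fmt.Println(addWithMutex(1000), addWithAtomic(1000))
}
```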

Slide 48

Slide 48 text

channels
“Do not communicate by sharing memory; instead, share memory by communicating.”
standard type in Go — chan. safe for concurrent use.
mechanism for goroutines to communicate, and synchronize.
Conceptually similar to Unix pipes:
    > ch := make(chan int)     // Initialize.
    > go func() { ch <- 1 }()  // Send.
    > <-ch                     // Receive; blocks until sent.

Slide 49

Slide 49 text

    // Shared variable
    var tasks []Task

    func worker() {
        for len(tasks) > 0 {
            task := dequeue(tasks)
            process(task)
        }
    }

    func main() {
        // Spawn fixed-pool of workers.
        startWorkers(3, worker)
        // Populate task queue.
        for _, t := range hellaTasks {
            tasks = append(tasks, t)
        }
    }

want:
main: give tasks to workers.
worker: get a task, process it, repeat.

Slide 50

Slide 50 text

    var taskCh = make(chan Task, n)
    var resultCh = make(chan Result)

    func worker() {
        for {
            // Get a task.
            t := <-taskCh
            r := process(t)
            // Send the result.
            resultCh <- r
        }
    }

    func main() {
        // Spawn fixed-pool of workers.
        startWorkers(3, worker)
        // Populate task queue.
        for _, t := range hellaTasks {
            taskCh <- t
        }
        // Wait for and amalgamate results.
        var results []Result
        for r := range resultCh {
            results = append(results, r)
        }
    }
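The slide elides shutdown: as written, `range resultCh` never terminates and nothing closes the channels. A complete runnable version under those assumptions (with an illustrative `process`) closes `taskCh` to stop the workers, and closes `resultCh` once all workers finish:

```go
package main

import (
	"fmt"
	"sync"
)

type Task int
type Result int

// process is a stand-in for real work.
func process(t Task) Result { return Result(t * 2) }

func worker(taskCh <-chan Task, resultCh chan<- Result, wg *sync.WaitGroup) {
	defer wg.Done()
	for t := range taskCh { // loop ends when taskCh is closed
		resultCh <- process(t)
	}
}

// runPool mirrors the slide: main hands tasks to a fixed pool over
// taskCh and amalgamates results from resultCh.
func runPool(tasks []Task, nWorkers int) []Result {
	taskCh := make(chan Task, len(tasks))
	resultCh := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go worker(taskCh, resultCh, &wg)
	}

	for _, t := range tasks {
		taskCh <- t
	}
	close(taskCh) // no more tasks: workers drain and exit

	go func() { // close resultCh once every worker is done
		wg.Wait()
		close(resultCh)
	}()

	var results []Result
	for r := range resultCh {
		results = append(results, r)
	}
	return results
}

func main() {
	results := runPool([]Task{1, 2, 3, 4, 5}, 3)
	fmt.Println(len(results), "results")
}
```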

Slide 51

Slide 51 text

mutex?
[same code as before: shared var tasks []Task, worker and main, with a mutex (mu) guarding each access to tasks]
…but workers can exit early.

Slide 52

Slide 52 text

want:
worker: wait for task, process it, repeat.
main: send tasks.
channel semantics (as used): send task happens before worker runs.
[diagram: main’s “send task” paired with worker’s “wait for task” / “recv task”, then “process”]
…channels allow us to express happens-before constraints.

Slide 53

Slide 53 text

channels: allow, and force, the user to express happens-before constraints.

Slide 54

Slide 54 text

stepping back…

Slide 55

Slide 55 text

similarities
first principle: happens-before.
riak (distributed key-value store) and channels (Go concurrency primitive) both surface happens-before to the user.

Slide 56

Slide 56 text

meta-lessons

Slide 57

Slide 57 text

new technologies cleverly decompose into old ideas

Slide 58

Slide 58 text

the “right” boundaries for abstractions are flexible.

Slide 59

Slide 59 text

@kavya719 ≺ happens-before riak channels https://speakerdeck.com/kavya719/what-came-first

Slide 60

Slide 60 text

riak: a note (or two)…
nodes in Riak:
> virtual nodes (“vnodes”)
> key-space partitioning by consistent hashing, 1 vnode per partition.
> sequential because they are Erlang processes, which use message queues.
replicas:
> N, R, W, etc. configurable by key.
> on network partition, defaults to sloppy quorum w/ hinted-handoff.
conflict-resolution:
> by read-repair, active anti-entropy.

Slide 61

Slide 61 text

riak: dotted version vectors

problem with standard vector clocks: false concurrency.

userX: PUT “cart”:”A”, {} —> (1, 0); “A”
userY: PUT “cart”:”B”, {} —> (2, 0); [“A”, “B”]
userX: PUT “cart”:”C”, {(1, 0); “A”} —> (1, 0) !< (2, 0) —> (3, 0); [“A”, “B”, “C”]

This is false concurrency; it leads to “sibling explosion”.

dotted version vectors: a fine-grained mechanism to detect causal updates.
decompose each vector clock into its set of discrete events, so:

userX: PUT “cart”:”A”, {} —> (1, 0); “A”
userY: PUT “cart”:”B”, {} —> (2, 0); [(1, 0)->”A”, (2, 0)->”B”]
userX: PUT “cart”:”C”, {} —> (3, 0); [(2, 0)->”B”, (3, 0)->”C”]

Slide 62

Slide 62 text

riak: CRDTs
Conflict-free / Convergent / Commutative Replicated Data Type
> data structure with the property: replicas can be updated concurrently without coordination, and it’s mathematically possible to always resolve conflicts.
> two types: op-based (commutative) and state-based (convergent).
> examples: G-Set (Grow-Only Set), G-Counter, PN-Counter
> Riak DT is state-based CRDTs.
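As a concrete instance, a state-based G-Counter can be sketched in a few lines of Go (the types and names are mine, not Riak’s): each replica increments only its own slot, and merge is the elementwise max, which is commutative, associative, and idempotent, so replicas converge in any merge order.

```go
package main

import "fmt"

// GCounter is a state-based grow-only counter CRDT.
type GCounter struct {
	id     int   // this replica's index
	counts []int // one slot per replica
}

func NewGCounter(id, nReplicas int) *GCounter {
	return &GCounter{id: id, counts: make([]int, nReplicas)}
}

// Inc records a local increment in this replica's own slot.
func (g *GCounter) Inc() { g.counts[g.id]++ }

// Value is the sum over all replicas' slots.
func (g *GCounter) Value() int {
	v := 0
	for _, c := range g.counts {
		v += c
	}
	return v
}

// Merge takes the elementwise max of the two states.
func (g *GCounter) Merge(other *GCounter) {
	for i, c := range other.counts {
		if c > g.counts[i] {
			g.counts[i] = c
		}
	}
}

func main() {
	a, b := NewGCounter(0, 2), NewGCounter(1, 2)
	a.Inc()
	a.Inc() // two increments at replica a
	b.Inc() // one at replica b, concurrently
	a.Merge(b)
	b.Merge(a)
	fmt.Println(a.Value(), b.Value()) // replicas converge
}
```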

Slide 63

Slide 63 text

channels: implementation
> ch := make(chan int, 3)
[diagram: the hchan struct: buf (ring buffer), sendq (waiting senders), recvq (waiting receivers), lock (mutex)]

Slide 64

Slide 64 text

[diagram: g1 performs ch <- t1, ch <- t2, ch <- t3, filling buf; on ch <- t4 the buffer is full, so g1 is queued on sendq]

Slide 65

Slide 65 text

[diagram: hchan state as g1 sends (ch <- t1) and g2 receives (<-ch)]

Slide 66

Slide 66 text

[diagram: hchan state as g2’s receive (<-ch) proceeds]

Slide 67

Slide 67 text

[diagram: g2’s receive (<-ch) frees a buffer slot, allowing g1’s pending send ch <- t4 to complete]

Slide 68

Slide 68 text

1. send happens-before corresponding receive.

    // Shared variable
    var count = 0
    var ch = make(chan bool, 1)

    func setCount() {
        count++        // A (write)
        ch <- true     // B (send)
    }

    func printCount() {
        <-ch           // C (recv)
        print(count)   // D (read)
    }

    go setCount()
    go printCount()

B ≺ C. So, A ≺ D.
[diagram: A, B on g1; C, D on g2; a happens-before edge from send to recv]
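The snippet can be made runnable; `observeCount` and `demo` are my restructuring (the observed value goes back on a channel instead of being printed) so the guarantee is checkable. A ≺ B (same goroutine), B ≺ C (send happens-before receive), C ≺ D, so the reader always observes the write.

```go
package main

import "fmt"

// count and ch mirror the slide's snippet.
var count = 0
var ch = make(chan bool, 1)

func setCount() {
	count++    // A: write
	ch <- true // B: send
}

func observeCount(out chan<- int) {
	<-ch         // C: receive; B happens-before C
	out <- count // D: read; guaranteed to see A's write
}

// demo runs the two goroutines and returns what the reader observed.
func demo() int {
	count = 0
	out := make(chan int)
	go setCount()
	go observeCount(out)
	return <-out
}

func main() {
	fmt.Println(demo()) // always 1, by A ≺ B ≺ C ≺ D
}
```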

Slide 69

Slide 69 text

2. nth receive on a channel of size C happens-before (n+C)th send completes.

    var maxOutstanding = 3
    var taskCh = make(chan int, maxOutstanding)

    func worker() {
        for {
            t := <-taskCh
            processAndStore(t)
        }
    }

    func main() {
        go worker()
        tasks := generateHellaTasks()
        for _, t := range tasks {
            taskCh <- t
        }
    }
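Rule 2’s mechanism, that a channel of capacity C admits at most C sends beyond the completed receives, can be observed directly with a non-blocking send; `trySend` is an illustrative helper, not from the talk.

```go
package main

import "fmt"

// trySend attempts a non-blocking send, returning false if the
// channel's buffer is full, i.e. the send would have to wait for a
// receive: exactly the happens-before edge in rule 2.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan int, 3) // capacity C = 3

	fmt.Println(trySend(ch, 1)) // true: buffer has room
	fmt.Println(trySend(ch, 2)) // true
	fmt.Println(trySend(ch, 3)) // true
	fmt.Println(trySend(ch, 4)) // false: the (C+1)th send must wait

	<-ch                        // the 1st receive frees a slot…
	fmt.Println(trySend(ch, 4)) // …so the (C+1)th send completes
}
```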

Slide 70

Slide 70 text

1. send happens-before corresponding receive.
If channel empty: receiver goroutine paused; resumed after a channel send occurs.
If channel not empty: receiver gets the first unreceived element i.e. the buffer is a FIFO queue. Sends must have completed due to the mutex.

Slide 71

Slide 71 text

2. nth receive on a channel of size C happens-before (n+C)th send completes.
Fixed-size, circular buffer. e.g. with C = 3: “2nd receive happens-before 5th send.”
send #3 can occur. send #4 can occur after receive #1. send #5 can occur after receive #2.

Slide 72

Slide 72 text

2. nth receive on a channel of size C happens-before (n+C)th send completes.
If channel full: sender goroutine paused; resumed after a channel recv occurs.
If channel not empty: receiver gets the first unreceived element i.e. the buffer is a FIFO queue. Send of that element must have completed due to the channel mutex.