Slide 1

Slide 1 text

Practical Data Synchronization & CRDTs Dmitry Ivanov @idajantis 2016

Slide 2

Slide 2 text

2

Slide 3

Slide 3 text

NavCloud 3

Slide 4

Slide 4 text

Who We Are
"Fool" stack developers hacking on:
• Backend services
• Client libraries
• Infrastructure && DevOps

Slide 5

Slide 5 text

Backend stack 5

Slide 6

Slide 6 text

Client Libraries 6

Slide 7

Slide 7 text

NavCloud Nature
• Unstable connections
• Limited data plans & bandwidth
• Seamless edit/view in offline mode
• Concurrent changes with potential conflicts
• No guarantee on updates order
• No data loss
• Data convergence to expected value

Slide 8

Slide 8 text

How to Deal with this Nature? 8

Slide 9

Slide 9 text

Bad programmers worry about the code. Good programmers worry about data structures — Linus Torvalds 9

Slide 10

Slide 10 text

CRDT 10

Slide 11

Slide 11 text

CRDT DT: Data Type CRDT is a data type with its own algebra 11

Slide 12

Slide 12 text

CRDT R: Replicated CRDT is a family of data structures which has been designed to be distributed 12

Slide 13

Slide 13 text

CRDT C: Conflict Free Resolving conflicts is done automatically

Slide 14

Slide 14 text

How? 14

Slide 15

Slide 15 text

Merge 15

Slide 16

Slide 16 text

What is Merge?
• A binary operation on two CRDTs
• Commutative: x • y = y • x
• Associative: ( x • y ) • z = x • ( y • z )
• Idempotent: x • x = x

Slide 17

Slide 17 text

How Does it Help?
In Distributed Systems:
• Order is not guaranteed:
  • No problem: Merge is Commutative and Associative
• Events can be delivered more than once:
  • No problem: Merge is Idempotent

Slide 18

Slide 18 text

What Does it Bring in Practice?
• Local updates
• Local merge of received data
• All local merges converge

Slide 19

Slide 19 text

Examples 19

Slide 20

Slide 20 text

G-Counter 20

Slide 21

Slide 21 text

G-Counter Merge: Max of corresponding elements: A:6 B:3 C:9 TotalValue: Sum of all elements: 6 + 3 + 9 = 18 21
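
A minimal Scala sketch of the G-Counter described on this slide. The two input replicas (`left`, `right`) and the name `GCounterSketch` are assumptions chosen so that the merge produces the slide's A:6 B:3 C:9; the original diagram is not part of this transcript.

```scala
object GCounterSketch {
  // State: one non-decreasing count per replica id.
  final case class GCounter(counts: Map[String, Long] = Map.empty) {
    // A replica only ever increments its own entry.
    def increment(replica: String, by: Long = 1L): GCounter = {
      require(by >= 0, "a G-Counter can only grow")
      GCounter(counts.updated(replica, counts.getOrElse(replica, 0L) + by))
    }

    // Merge: max of corresponding elements.
    def merge(other: GCounter): GCounter =
      GCounter((counts.keySet ++ other.counts.keySet).map { k =>
        k -> math.max(counts.getOrElse(k, 0L), other.counts.getOrElse(k, 0L))
      }.toMap)

    // Total value: sum of all elements.
    def value: Long = counts.values.sum
  }

  // Hypothetical replica states (not taken from the slides).
  val left   = GCounter(Map("A" -> 6L, "B" -> 2L, "C" -> 9L))
  val right  = GCounter(Map("A" -> 4L, "B" -> 3L, "C" -> 7L))
  val merged = left.merge(right) // A:6 B:3 C:9, value 18
}
```

Because `max` is commutative, associative and idempotent, the order and number of merges does not change the result.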

Slide 22

Slide 22 text

Max Function
• A binary operation on two CRDTs
• Commutative: x max y = y max x
• Associative: ( x max y ) max z = x max ( y max z )
• Idempotent: x max x = x

Slide 23

Slide 23 text

G-Set 23

Slide 24

Slide 24 text

Union Function
• A binary operation on two CRDTs
• Commutative: x ∪ y = y ∪ x
• Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
• Idempotent: x ∪ x = x

Slide 25

Slide 25 text

G-Set Merge: Union of sets: { x, y, z, a, b, c } Total Value: The same as the merge result 25
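
A small Scala sketch of a G-Set, with the three merge laws checked on concrete values. The class and value names are illustrative assumptions, and the input sets are picked to reproduce the slide's union { x, y, z, a, b, c }.

```scala
object GSetSketch {
  // Grow-only set: elements can be added but never removed.
  final case class GSet[E](elements: Set[E] = Set.empty[E]) {
    def add(e: E): GSet[E] = GSet(elements + e)
    // Merge: union of sets.
    def merge(other: GSet[E]): GSet[E] = GSet(elements union other.elements)
    // Total value: the same as the merged state.
    def lookup: Set[E] = elements
  }

  // Hypothetical replica states (not taken from the slides).
  val x = GSet(Set("x", "y", "z"))
  val y = GSet(Set("a", "b", "c"))
  val z = GSet(Set("x", "a"))

  val merged = x.merge(y)

  // The three merge laws, checked on these values only (not a proof).
  val commutative = x.merge(y) == y.merge(x)
  val associative = x.merge(y).merge(z) == x.merge(y.merge(z))
  val idempotent  = x.merge(x) == x
}
```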

Slide 26

Slide 26 text

CRDT in NavCloud 26

Slide 27

Slide 27 text

Favorite Locations Synchronization

Slide 28

Slide 28 text

Naive Approach? 28

Slide 29

Slide 29 text

Last Write Wins 29

Slide 30

Slide 30 text

Problems
• Unstable connections
  • Actual update time < Sent time
• Network latency
  • Sent time < Received time
• Unreliable clocks

Slide 31

Slide 31 text

Stale update may win! 31

Slide 32

Slide 32 text

So What? 32

Slide 33

Slide 33 text

CRDT 33

Slide 34

Slide 34 text

NavCloud Nature vs CRDT
• Unstable connections ✔
• Limited data plans & bandwidth ✔
• Seamless edit/view in offline mode ✔
• Concurrent changes with potential conflicts ✔
• No guarantee on updates order ✔
• No data loss ✔
• Data convergence to expected value ✔

Slide 35

Slide 35 text

Same Data Model Everywhere • Server • Clients • Data store 35

Slide 36

Slide 36 text

Merging Conflicts in Riak 36

Slide 37

Slide 37 text

The data consistency is determined by 'the weakest link' in your pipeline 37

Slide 38

Slide 38 text

Implementing a CRDT Set
What do we want?
• Support for addition and removal operations.
• Optimized for element mutations.
• Footprint as compact as possible.

Slide 39

Slide 39 text

2-Phase-Set
Supports additions and removals.
• G-Set for added elements
• G-Set for removed elements aka Tombstones

Slide 40

Slide 40 text

2-Phase-Set 40

Slide 41

Slide 41 text

2-Phase-Set Merge: [ Add { "cat", "dog", "ape" }; Rem { "ape" } ] Lookup: { "cat", "dog" } 41

Slide 42

Slide 42 text

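2-Phase-Set
Lookup:
    // Visible elements: everything added that has no tombstone.
    def lookup: Set[E] = addSet.diff(removeSet).lookup
Merge:
    // Merge the two underlying G-Sets independently.
    def merge(anotherSet: TwoPSet[E]): TwoPSet[E] =
      new TwoPSet(
        addSet.merge(anotherSet.addSet),
        removeSet.merge(anotherSet.removeSet))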

Slide 43

Slide 43 text

2-Phase-Set
Doesn't work for us:
• Removed element can't be added again
• Immutable elements: no updates possible
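
The first limitation can be seen in a self-contained Scala sketch of a 2-Phase-Set. It uses plain `Set`s instead of the talk's G-Set class, and the replica contents are assumptions chosen to match the earlier merge example (cat, dog, ape with "ape" removed).

```scala
object TwoPSetSketch {
  final case class TwoPSet[E](
      added: Set[E] = Set.empty[E],
      removed: Set[E] = Set.empty[E]) {
    def add(e: E): TwoPSet[E] = copy(added = added + e)
    // Only elements that were added can be tombstoned.
    def remove(e: E): TwoPSet[E] =
      if (added(e)) copy(removed = removed + e) else this
    def lookup: Set[E] = added diff removed
    def merge(other: TwoPSet[E]): TwoPSet[E] =
      TwoPSet(added union other.added, removed union other.removed)
  }

  // Hypothetical replicas (not taken from the slides).
  val a = TwoPSet[String]().add("cat").add("ape")
  val b = TwoPSet[String]().add("dog").add("ape").remove("ape")

  val merged  = a.merge(b)        // lookup: cat, dog
  val reAdded = merged.add("ape") // the tombstone still hides "ape"
}
```

Since tombstones are never discarded, adding "ape" again has no visible effect.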

Slide 44

Slide 44 text

LWW-Element-Set
Supports additions and removals, with timestamps.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Each element has a timestamp
• Supports re-adding removed element using a higher timestamp

Slide 45

Slide 45 text

LWW-Element-Set 45

Slide 46

Slide 46 text

LWW-Element-Set Merge Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") } Rem { (1, "cat"), (3, "cat") } 46

Slide 47

Slide 47 text

LWW-Element-Set Merge Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") } Rem { (1, "cat"), (3, "cat") } Lookup { "cat", "dog", "ape" } 47

Slide 48

Slide 48 text

LWW-Element-Set
Lookup:
    def lookup: Set[E] =
      addSet.lookup.filter { addElem =>
        // Keep an add unless a strictly newer remove exists for the same value.
        !removeSet.exists { removeElem =>
          removeElem.value == addElem.value &&
          removeElem.timestamp > addElem.timestamp
        }
      }.map(_.value)
Merge:
    def merge(anotherSet: LWWSet[E]): LWWSet[E] =
      new LWWSet(
        addSet.merge(anotherSet.addSet),
        removeSet.merge(anotherSet.removeSet))

Slide 49

Slide 49 text

LWW-Element-Set
Doesn't work for us:
• Immutable elements: no updates possible.
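
For reference, a runnable Scala sketch of an LWW-Element-Set using the Add/Rem values from the merge example. The split of operations between the two replicas and the names `Stamped` and `LwwSet` are assumptions; only the merged Add and Rem sets come from the slides.

```scala
object LwwSetSketch {
  final case class Stamped[E](timestamp: Long, value: E)

  final case class LwwSet[E](
      added: Set[Stamped[E]] = Set.empty[Stamped[E]],
      removed: Set[Stamped[E]] = Set.empty[Stamped[E]]) {
    def add(ts: Long, e: E): LwwSet[E] = copy(added = added + Stamped(ts, e))
    def remove(ts: Long, e: E): LwwSet[E] = copy(removed = removed + Stamped(ts, e))

    // An add survives unless a strictly newer remove exists for the same value
    // (equal timestamps favour the add).
    def lookup: Set[E] =
      added.filter { a =>
        !removed.exists(r => r.value == a.value && r.timestamp > a.timestamp)
      }.map(_.value)

    def merge(other: LwwSet[E]): LwwSet[E] =
      LwwSet(added union other.added, removed union other.removed)
  }

  // Hypothetical replicas whose merge yields the Add/Rem sets of the example.
  val a = LwwSet[String]().add(1, "cat").add(1, "dog").remove(3, "cat")
  val b = LwwSet[String]().add(1, "cat").remove(1, "cat").add(1, "ape").add(5, "cat")

  val merged = a.merge(b) // lookup: cat, dog, ape
}
```

"cat" is visible after the merge because its add at timestamp 5 is newer than the latest remove at timestamp 3.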

Slide 50

Slide 50 text

OR-Set
OR - Observed / Removed
Supports additions and removals, with tags.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Unique tag is associated with each element
• Supports re-adding removed elements

Slide 51

Slide 51 text

OR-Set 51

Slide 52

Slide 52 text

OR-Set Merge Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") } Rem { (#a, "cat") } 52

Slide 53

Slide 53 text

OR-Set Merge Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") } Rem { (#a, "cat") } Lookup { "cat", "dog", "ape" } 53

Slide 54

Slide 54 text

OR-Set
Lookup:
An element E is in the set iff the AddSet holds a tag for it that is not in the RemoveSet.
    def lookup: Set[E] =
      addSet.filter { addElem =>
        // Drop an add only if a tombstone carries the same value and the same tag.
        !removeSet.exists { remElem =>
          addElem.value == remElem.value && remElem.tag == addElem.tag
        }
      }.map(_.value)

Slide 55

Slide 55 text

OR-Set
Merge:
    def merge(anotherSet: ORSet[E]): ORSet[E] =
      new ORSet(
        addSet.merge(anotherSet.addSet),
        removeSet.merge(anotherSet.removeSet))

Slide 56

Slide 56 text

OR-Set
Doesn't work for us:
• Immutable elements: no updates possible.
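
A self-contained Scala sketch of the OR-Set behaviour shown in the merge example. Tags are passed in explicitly to keep the example deterministic (a real implementation would generate unique ones), and the replica histories are assumptions.

```scala
object OrSetSketch {
  final case class Tagged[E](tag: String, value: E)

  final case class OrSet[E](
      added: Set[Tagged[E]] = Set.empty[Tagged[E]],
      removed: Set[Tagged[E]] = Set.empty[Tagged[E]]) {
    def add(tag: String, e: E): OrSet[E] = copy(added = added + Tagged(tag, e))
    // Remove tombstones only the tags this replica has observed for the value.
    def remove(e: E): OrSet[E] = copy(removed = removed ++ added.filter(_.value == e))
    // A value is visible if at least one of its tags has no tombstone.
    def lookup: Set[E] = (added diff removed).map(_.value)
    def merge(other: OrSet[E]): OrSet[E] =
      OrSet(added union other.added, removed union other.removed)
  }

  // Hypothetical history: both replicas start from `base`, then diverge.
  val base = OrSet[String]().add("#a", "cat").add("#b", "dog")
  val r1   = base.remove("cat")                     // observed only tag #a
  val r2   = base.add("#c", "cat").add("#d", "ape") // concurrent re-add of "cat"

  val merged = r1.merge(r2) // lookup: cat, dog, ape
}
```

The concurrent add under tag `#c` was never observed by the removing replica, so "cat" stays in the set.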

Slide 57

Slide 57 text

OUR-Set
Our take on Observed-Updated-Removed Set
• Each element has a unique identifier
• Element can be changed if identifier remains the same
• Each element has a timestamp
• Timestamp is updated on each element mutation
Identity (immutable unique id) vs Value (mutable)

Slide 58

Slide 58 text

OUR-Set
Contains a single underlying set of elements with metadata:
• Each element has a unique id field (e.g. a UUID)
• Each element has a "removed" boolean flag
• Each element has a timestamp
• Set can only contain one element with a particular id

Slide 59

Slide 59 text

OUR-Set 59

Slide 60

Slide 60 text

OUR-Set Merge { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }

Slide 61

Slide 61 text

OUR-Set Merge: { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") } Lookup { "tiger", "ape" }

Slide 62

Slide 62 text

OUR-Set
Merge:
    def merge(anotherSet: OURSet[E]): OURSet[E] =
      OURSet[E](
        (elements ++ anotherSet.elements)
          .groupBy(_.id)                                  // one group per element id
          .map(group => group._2.maxBy(_.timestamp))      // newest version wins
          .toSet)
Lookup:
    def lookup(ourSet: OURSet[E]): Set[E] =
      ourSet.elements
        .filter(!_.removed)                               // hide tombstones
        .map(_.value)
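
A runnable Scala sketch of the same idea, with element values mirroring the merge example above. The `Element` case class, the field layout and the split between `client` and `server` states are assumptions made for illustration.

```scala
object OurSetSketch {
  // Identity (id) is immutable; value, timestamp and removed flag can change.
  final case class Element[V](id: String, timestamp: Long, value: V, removed: Boolean = false)

  final case class OurSet[V](elements: Set[Element[V]] = Set.empty[Element[V]]) {
    // Per id, keep the version with the highest timestamp.
    // Note: equal timestamps would need a deterministic tie-break.
    def merge(other: OurSet[V]): OurSet[V] =
      OurSet(
        (elements ++ other.elements)
          .groupBy(_.id)
          .map { case (_, versions) => versions.maxBy(_.timestamp) }
          .toSet)

    def lookup: Set[V] = elements.filterNot(_.removed).map(_.value)
  }

  // Hypothetical replica states (not taken from the slides).
  val client = OurSet(Set(
    Element("id1", 5L, "tiger"),
    Element("id2", 1L, "dog"),
    Element("id3", 1L, "ape")))
  val server = OurSet(Set(
    Element("id1", 1L, "cat"),
    Element("id2", 2L, "dog", removed = true)))

  val merged = client.merge(server)
}
```

Here the update of `id1` (timestamp 5) beats the older value, and the removal of `id2` (timestamp 2) beats the older add.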

Slide 63

Slide 63 text

Implementation NavCloud CRDT Model: Favorites

Slide 64

Slide 64 text

CRDT Model: Favorites
FavoriteState element:
• ID (to uniquely identify a favorite)
• Timestamp (to indicate the last change time)
• Removed flag (to indicate if favorite has been removed)
• Favorite data: ( Name, Location, ... )

Slide 65

Slide 65 text

Convergence in case of equal timestamps
Compare function checks all the fields in order of priority:
• Timestamp
• Removed flag (Add or Delete bias)
• .. rest attributes ..
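
One way to express such a compare function in Scala is a tuple-based `Ordering`. This is a sketch under assumptions: `FavoriteState` is reduced to a single data field (`name`), and a delete bias is chosen (`removed = true` wins on equal timestamps).

```scala
object TieBreakSketch {
  final case class FavoriteState(id: String, timestamp: Long, removed: Boolean, name: String)

  // Fields compared in order of priority: timestamp, removed flag, remaining data.
  // Ordering.Boolean puts false before true, so a removal wins a timestamp tie
  // (delete bias). Use `!f.removed` instead for an add bias.
  val byPriority: Ordering[FavoriteState] =
    Ordering.by((f: FavoriteState) => (f.timestamp, f.removed, f.name))

  // Deterministic winner regardless of argument order.
  def winner(a: FavoriteState, b: FavoriteState): FavoriteState = byPriority.max(a, b)

  // Hypothetical conflicting versions with the same timestamp.
  val kept    = FavoriteState("id1", 5L, removed = false, "Home")
  val deleted = FavoriteState("id1", 5L, removed = true, "Home")
}
```

Because every field takes part in the comparison, two replicas merging the same pair of versions always pick the same winner.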

Slide 66

Slide 66 text

Using CRDT everywhere
• Use the same algorithm everywhere
As simple as calling the merge function

Slide 67

Slide 67 text

Using CRDT everywhere
Client <-> Server <-> Database
    def update(fromClient: OURSet[E]): OURSet[E] = {
      val fromDatabase = database.fetch(...)
      val newSet = fromDatabase.merge(fromClient)
      database.store(..., newSet)
      newSet
    }

Slide 68

Slide 68 text

68

Slide 69

Slide 69 text

Considerations & Limitations

Slide 70

Slide 70 text

"What about garbage?"
• CRDTs tend to grow because of tombstones.
• Deleted Element in the Set == Tombstone.
• A potentially unbounded growth.

Slide 71

Slide 71 text

Prune deleted elements But when? Requirement: All nodes holding a CRDT Set replica should have seen a deleted element before it can be pruned. Otherwise deleted elements can be resurrected. 71

Slide 72

Slide 72 text

Time-To-Live for tombstones
Prune tombstones once TTL exceeded.
    if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
        crdtSet.remove(tombstone)
    }
Requirement: all nodes holding a CRDT set should apply the same TTL rule independently.
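
A minimal Scala sketch of the TTL rule as a pure function. The current time is passed in as a parameter so the rule is deterministic and testable; the `Element` shape and millisecond units are assumptions.

```scala
object TombstoneTtlSketch {
  final case class Element(id: String, timestamp: Long, removed: Boolean)

  // Drop tombstones whose age exceeds the TTL; live elements are never pruned.
  def prune(elements: Set[Element], nowMillis: Long, ttlMillis: Long): Set[Element] =
    elements.filterNot(e => e.removed && nowMillis - e.timestamp > ttlMillis)

  // Hypothetical state: one old tombstone, one fresh tombstone, one live element.
  val state = Set(
    Element("id1", 1000L, removed = true),
    Element("id2", 9000L, removed = true),
    Element("id3", 1000L, removed = false))

  val pruned = prune(state, nowMillis = 10000L, ttlMillis = 5000L)
}
```

Every replica has to run the same rule with the same TTL, otherwise a pruned element can come back from a replica that still holds an older version of it.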

Slide 73

Slide 73 text

Prune deleted elements Problem Synchronization between all replicas is needed for correctness.

Slide 74

Slide 74 text

Distributed transactions

Slide 75

Slide 75 text

- Academia, help! 75

Slide 76

Slide 76 text

76

Slide 77

Slide 77 text

Optimized OR-Set Introduces replica awareness

Slide 78

Slide 78 text

Optimized OR-Set
Additional metadata is added to every transferred state:
    { (replica_id -> seq_nr) }
where:
- replica_id: a unique & stable replica identifier.
- seq_nr: a monotonically growing local counter (incremented after each op).

Slide 79

Slide 79 text

Optimized OR-Set
Each local state maintains a map:
    { replica_A: 1, replica_B: 1, replica_C: 3 }
If a received state has a seq_nr lower than the corresponding local value -> ignore.
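
The "ignore stale state" rule can be sketched in Scala as follows. This covers only the seq_nr check described on the slide, not a full Optimized OR-Set; the function names `isStale` and `observe` are assumptions.

```scala
object ReplicaAwarenessSketch {
  // replica_id -> highest seq_nr seen from that replica.
  type VersionMap = Map[String, Long]

  // A received state is stale if its seq_nr is lower than the local value.
  def isStale(seen: VersionMap, replicaId: String, seqNr: Long): Boolean =
    seqNr < seen.getOrElse(replicaId, 0L)

  // Record a received state, ignoring it when it is stale.
  def observe(seen: VersionMap, replicaId: String, seqNr: Long): VersionMap =
    if (isStale(seen, replicaId, seqNr)) seen else seen.updated(replicaId, seqNr)

  // The local map from the slide.
  val seen: VersionMap = Map("replica_A" -> 1L, "replica_B" -> 1L, "replica_C" -> 3L)
}
```

For example, a state from `replica_C` with seq_nr 2 is ignored, while one with seq_nr 4 advances the local entry to 4.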

Slide 80

Slide 80 text

Optimized OR-Set No Tombstones, yay! ☺ (Slightly) more complicated API: stable replica_id needed. ☹

Slide 81

Slide 81 text

Update & Reply with a Diff Client modifies and sends only updated elements (Diff). Before: Server responds with a full merge result. 81

Slide 82

Slide 82 text

Update & Reply with a Diff We introduced a 'Scoped Diff': Server responds only with the elements which have won against those sent by the client. 82
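
A rough Scala sketch of how such a scoped diff could be computed on the server. It is an illustration, not NavCloud's implementation: the `Element` shape, the `update` signature and the rule that the server version wins on equal timestamps are all assumptions.

```scala
object ScopedDiffSketch {
  final case class Element(id: String, timestamp: Long, value: String, removed: Boolean = false)

  // Merges the client's diff into the server state and returns
  // (new server state, scoped diff to send back to the client).
  def update(server: Map[String, Element], clientDiff: Set[Element]): (Map[String, Element], Set[Element]) = {
    val merged = clientDiff.foldLeft(server) { (acc, incoming) =>
      acc.get(incoming.id) match {
        // Assumption: on equal timestamps the server version is kept.
        case Some(existing) if existing.timestamp >= incoming.timestamp => acc
        case _ => acc.updated(incoming.id, incoming)
      }
    }
    // Reply only with server elements that won against what the client sent.
    val scoped = clientDiff.flatMap(sent => merged.get(sent.id).filter(_ != sent))
    (merged, scoped)
  }

  // Hypothetical data.
  val server = Map(
    "id1" -> Element("id1", 5L, "tiger"),
    "id2" -> Element("id2", 1L, "dog"))
  val clientDiff = Set(
    Element("id1", 3L, "cat"),                 // loses to the server version
    Element("id2", 2L, "dog", removed = true), // wins
    Element("id3", 1L, "ape"))                 // new element

  val (newServer, reply) = update(server, clientDiff)
}
```

The client only receives `id1`, the one element where its own version lost.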

Slide 83

Slide 83 text

Server -> Client Diff 83

Slide 84

Slide 84 text

- Academia, help?.. 84

Slide 85

Slide 85 text

85

Slide 86

Slide 86 text

δ-CRDT Builds on replica awareness Introduces a Causal Context: map of (replica_id -> seq_nr). Introduces a Dot Store: CRDT state (No tombstones). 86

Slide 87

Slide 87 text

δ-CRDT A formalized way to compute a minimal δ-CRDT instance against a target replica.

Slide 88

Slide 88 text

δ-CRDT Adrian Colyer (The Morning Paper) wrote a great paper review: blog.acolyer.org/2016/04/25/delta-state-replicated-data-types 88

Slide 89

Slide 89 text

Trouble With Time 89

Slide 90

Slide 90 text

There is no such thing as reliable time*.

Slide 91

Slide 91 text

Tracking time is actually tracking causality. (Jonas Bonér, "Life Beyond the Illusion of Present")

Slide 92

Slide 92 text

Causality & Ordering of events. 92

Slide 93

Slide 93 text

Time can be just good enough. 93

Slide 94

Slide 94 text

Ordering updates within a single node Timestamp field as a logical clock. Absolute value is not important, but it should always grow monotonically. 94

Slide 95

Slide 95 text

Ordering updates within a single node
"+1 Strategy" (aka ensure monotonicity):
    Long resolveNewTimestamp(ElementState state) {
        return Math.max(
            retrieveTimestamp(),
            state.lastModified() + 1
        );
    }
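
The same strategy as a pure Scala function, with the clock reading passed in as a parameter (an assumption made so the behaviour can be checked without a real clock):

```scala
object MonotonicTimestampSketch {
  // New timestamp: the clock value, unless that would not move past the
  // element's last modification, in which case use lastModified + 1.
  def resolveNewTimestamp(clockMillis: Long, lastModified: Long): Long =
    math.max(clockMillis, lastModified + 1)

  val clockAhead  = resolveNewTimestamp(clockMillis = 100L, lastModified = 50L)  // 100
  val clockEqual  = resolveNewTimestamp(clockMillis = 100L, lastModified = 100L) // 101
  val clockBehind = resolveNewTimestamp(clockMillis = 90L, lastModified = 100L)  // 101
}
```

Even when the local clock jumps backwards, each mutation of an element gets a strictly larger timestamp than the previous one.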

Slide 96

Slide 96 text

Ordering updates from different nodes
If GPS clock is available -> use it (mainly Navigation Devices case).
Prefer the server time to a client's local time.

Slide 97

Slide 97 text

Edge case Multiple Clients modify the same element (concurrently || without a reliable clock).

Slide 98

Slide 98 text

One "merge" to rule them all 98

Slide 99

Slide 99 text

Clients & Server MUST have same 'merge' behaviour. == Given the same input, their 'merge' functions emit the same results.

Slide 100

Slide 100 text

Divergence may lead to endless synchronization loops!

Slide 101

Slide 101 text

Lazy (data) loading
OURSet Element
• Metadata: UUID, timestamp, "removed" flag
• Data:

Slide 102

Slide 102 text

Lazy (data) loading
New OURSet Element
• Metadata: UUID, timestamp, "removed" flag, + tag / hash
• (Optional) Data:
Flexible synchronization strategy: Eager || Lazy Fetch

Slide 103

Slide 103 text

What have we learned?
• Academia is not as scary as it sometimes seems to pragmatic devs.
• We need better and simpler abstractions to develop Offline-friendly apps.
• CRDTs give great value, but there are some caveats.
• Things like Lasp (lasp-lang.org) could also be the answer (?).

Slide 104

Slide 104 text

Show me the code github.com/ajantis/{scala | java}-crdt

Slide 105

Slide 105 text

Thanks! Slides: http://bit.ly/2fBlroS Dmitry Ivanov @idajantis