Slide 1

Slide 1 text

Practical Data Synchronization with CRDTs
Dmitry Ivanov @idajantis
St. Louis, 2016

Slide 2

Slide 2 text

Disclaimer
I'm NOT:
• A distributed systems expert.
• A hardcore academia guy.
Just a curious engineer hacking on real-world problems.

Slide 3

Slide 3 text

NavCloud

Slide 4

Slide 4 text

Who We Are
"Fool" stack developers hacking on:
• Backend services
• Mobile || SDKs
• Infrastructure && AWS && DevOps

Slide 5

Slide 5 text

Server Development Stack

Slide 6

Slide 6 text

Client Libraries

Slide 7

Slide 7 text

NavCloud Nature
• Unstable connections
• Limited bandwidth
• Seamless edit/view in offline mode
• Concurrent changes with potential conflicts
• No guarantee on the order of updates
• No data loss
• Data convergence to the expected value

Slide 8

Slide 8 text

How to Deal with this Nature?

Slide 9

Slide 9 text

Bad programmers worry about the code. Good programmers worry about data structures.
- Linus Torvalds

Slide 10

Slide 10 text

CRDT

Slide 11

Slide 11 text

CRDT
DT: Data Type
A CRDT is a data type with its own algebra.

Slide 12

Slide 12 text

CRDT
R: Replicated
CRDTs are a family of data structures designed to be distributed.

Slide 13

Slide 13 text

CRDT
C: Conflict-Free
Conflicts are resolved automatically.

Slide 14

Slide 14 text

How?

Slide 15

Slide 15 text

Merge

Slide 16

Slide 16 text

What is Merge?
• A binary operation on two CRDTs
• Commutative: x • y = y • x
• Associative: ( x • y ) • z = x • ( y • z )
• Idempotent: x • x = x

Slide 17

Slide 17 text

How Does it Help?
In distributed systems:
• Order is not guaranteed.
  No problem: merge is commutative and associative.
• Events can be delivered more than once.
  No problem: merge is idempotent.
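The three merge laws can be checked on sample values. This is a minimal Scala sketch: the `Crdt` type class, the `MergeLaws` object and the `maxInt` instance are hypothetical names introduced for illustration and are not taken from the talk's code.

```scala
// Hypothetical type class: a type A together with its merge operation.
trait Crdt[A] {
  def merge(x: A, y: A): A
}

object MergeLaws {
  // Each check tests one law on the given sample values only.
  def commutative[A](x: A, y: A)(implicit c: Crdt[A]): Boolean =
    c.merge(x, y) == c.merge(y, x)

  def associative[A](x: A, y: A, z: A)(implicit c: Crdt[A]): Boolean =
    c.merge(c.merge(x, y), z) == c.merge(x, c.merge(y, z))

  def idempotent[A](x: A)(implicit c: Crdt[A]): Boolean =
    c.merge(x, x) == x

  // Folds a non-empty sequence of received states into a single state.
  def replay[A](updates: Seq[A])(implicit c: Crdt[A]): A =
    updates.reduce((x, y) => c.merge(x, y))

  // Example instance (an assumption): integers merged with max.
  implicit val maxInt: Crdt[Int] = new Crdt[Int] {
    def merge(x: Int, y: Int): Int = math.max(x, y)
  }
}
```

Replaying the same updates in a different order, or with duplicates, gives the same state: with `maxInt`, both `Seq(3, 7, 5)` and `Seq(5, 5, 7, 3, 7)` replay to 7.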

Slide 18

Slide 18 text

What Does it Bring in Practice?
• Local updates
• Local merge of received data
• All local merges converge

Slide 19

Slide 19 text

Examples

Slide 20

Slide 20 text

G-Counter

Slide 21

Slide 21 text

G-Counter
Merge: max of corresponding elements: A:6 B:3 C:9
Total value: sum of all elements: 6 + 3 + 9 = 18
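A minimal Scala sketch of a G-Counter with this merge and total value. The `GCounter` class and the two input states in `GCounterDemo` are assumptions, chosen so that the merge reproduces the result on the slide.

```scala
// Grow-only counter: one non-negative slot per replica id.
final case class GCounter(slots: Map[String, Long] = Map.empty) {

  // A replica only ever increments its own slot.
  def increment(replica: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(slots.updated(replica, slots.getOrElse(replica, 0L) + by))
  }

  // Merge: element-wise max over the union of replica ids.
  def merge(other: GCounter): GCounter = {
    val ids = slots.keySet ++ other.slots.keySet
    GCounter(ids.map { id =>
      id -> math.max(slots.getOrElse(id, 0L), other.slots.getOrElse(id, 0L))
    }.toMap)
  }

  // Total value: sum of all slots.
  def value: Long = slots.values.sum
}

object GCounterDemo {
  // Hypothetical replica states; the slide only shows the merged result.
  val left   = GCounter(Map("A" -> 6L, "B" -> 2L, "C" -> 9L))
  val right  = GCounter(Map("A" -> 4L, "B" -> 3L, "C" -> 9L))
  val merged = left.merge(right) // slots A:6, B:3, C:9; value 18
}
```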

Slide 22

Slide 22 text

Max Function
• A binary operation on two CRDTs
• Commutative: x max y = y max x
• Associative: ( x max y ) max z = x max ( y max z )
• Idempotent: x max x = x

Slide 23

Slide 23 text

G-Set

Slide 24

Slide 24 text

G-Set
Merge: union of sets: { x, y, z, a, b, c }
Total value: the same as the merge result
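A minimal Scala sketch of a G-Set. The `GSet` class and the two input sets in `GSetDemo` are assumptions; the slide only shows the merged result.

```scala
// Grow-only set: elements can be added but never removed.
final case class GSet[E](elements: Set[E] = Set.empty[E]) {
  def add(e: E): GSet[E] = GSet(elements + e)

  // Merge: plain set union.
  def merge(other: GSet[E]): GSet[E] = GSet(elements.union(other.elements))

  // The value of a G-Set is simply its contents.
  def lookup: Set[E] = elements
}

object GSetDemo {
  // Hypothetical replica states.
  val left   = GSet(Set("x", "y", "z"))
  val right  = GSet(Set("a", "b", "c"))
  val merged = left.merge(right)
}
```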

Slide 25

Slide 25 text

Union Function
• A binary operation on two CRDTs
• Commutative: x ∪ y = y ∪ x
• Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
• Idempotent: x ∪ x = x

Slide 26

Slide 26 text

CRDT in NavCloud

Slide 27

Slide 27 text

Favorite Locations Synchronization

Slide 28

Slide 28 text

Naive Approach?

Slide 29

Slide 29 text

Last Write Wins

Slide 30

Slide 30 text

Problems
• Unstable connections
  • Actual update time < Sent time
• Network latency
  • Sent time < Received time
• Unreliable clocks

Slide 31

Slide 31 text

A stale update may win!
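A small Scala sketch of how this can happen with plain last-write-wins. The `Write` type, the device names and the timestamps are invented for illustration; the only assumption is that one device's clock runs ahead.

```scala
final case class Write(value: String, timestamp: Long)

object LastWriteWins {
  // Keeps whichever write carries the larger timestamp.
  def resolve(a: Write, b: Write): Write =
    if (b.timestamp > a.timestamp) b else a
}

object StaleUpdateDemo {
  // Real order of events: the phone edits first, the tablet edits later.
  // The phone's clock runs ahead, so its older edit gets the larger timestamp.
  val phoneEdit  = Write("Home (old address)", timestamp = 400L) // made first
  val tabletEdit = Write("Home (new address)", timestamp = 200L) // made later

  // Last-write-wins picks the phone's edit, although it is the stale one.
  val winner = LastWriteWins.resolve(phoneEdit, tabletEdit)
}
```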

Slide 32

Slide 32 text

So What?

Slide 33

Slide 33 text

CRDT

Slide 34

Slide 34 text

NavCloud Nature vs CRDT
• Unstable connections ✔
• Limited bandwidth ✔
• Seamless edit/view in offline mode ✔
• Concurrent changes with potential conflicts ✔
• No guarantee on the order of updates ✔
• No data loss ✔
• Data convergence to the expected value ✔

Slide 35

Slide 35 text

Same Data Model Everywhere
• Server
• Clients
• Data store

Slide 36

Slide 36 text

Merging Conflicts in Riak

Slide 37

Slide 37 text

Implementing a CRDT Set
What do we want?
• Support for addition and removal operations.
• Optimized for element mutations.
• Footprint as compact as possible.

Slide 38

Slide 38 text

2-Phase-Set
Supports additions and removals.
• G-Set for added elements
• G-Set for removed elements, aka tombstones

Slide 39

Slide 39 text

2-Phase-Set

Slide 40

Slide 40 text

2-Phase-Set
Merge: [ Add { "cat", "dog", "ape" }; Rem { "ape" } ]
Lookup: { "cat", "dog" }

Slide 41

Slide 41 text

2-Phase-Set
Lookup
    def lookup: Set[E] = addSet.diff(removeSet).lookup
Merge
    def merge(anotherSet: TwoPSet[E]): TwoPSet[E] =
      new TwoPSet(
        union(addSet, anotherSet.addSet),
        union(removeSet, anotherSet.removeSet))

Slide 42

Slide 42 text

2-Phase-Set
Doesn't work for us:
• A removed element can't be added again
• Immutable elements: no updates possible
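A self-contained Scala sketch of a 2-Phase-Set that makes the first limitation visible. Unlike the slide code, it stores plain Scala sets rather than G-Set objects; the `add` and `remove` methods and the demo values are assumptions.

```scala
final case class TwoPSet[E](
    addSet: Set[E] = Set.empty[E],
    removeSet: Set[E] = Set.empty[E]) {

  def add(e: E): TwoPSet[E] = copy(addSet = addSet + e)

  // Only elements that have been added can be removed (tombstoned).
  def remove(e: E): TwoPSet[E] =
    if (addSet.contains(e)) copy(removeSet = removeSet + e) else this

  // An element is visible if it was added and never removed.
  def lookup: Set[E] = addSet.diff(removeSet)

  def merge(other: TwoPSet[E]): TwoPSet[E] =
    TwoPSet(addSet.union(other.addSet), removeSet.union(other.removeSet))
}

object TwoPSetDemo {
  val pets    = TwoPSet[String]().add("cat").add("dog").add("ape").remove("ape")
  // Re-adding "ape" has no visible effect: its tombstone never goes away.
  val reAdded = pets.add("ape")
}
```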

Slide 43

Slide 43 text

LWW-Element-Set
Supports additions and removals, with timestamps.
• G-Set for added elements
• G-Set for removed elements, aka tombstones
• Each element has a timestamp
• Supports re-adding a removed element using a higher timestamp

Slide 44

Slide 44 text

LWW-Element-Set

Slide 45

Slide 45 text

LWW-Element-Set
Merge
Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
Rem { (1, "cat"), (3, "cat") }

Slide 46

Slide 46 text

LWW-Element-Set
Merge
Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
Rem { (1, "cat"), (3, "cat") }
Lookup { "cat", "dog", "ape" }

Slide 47

Slide 47 text

LWW-Element-Set
Lookup
    def lookup: Set[E] =
      addSet.lookup.filter { addElem =>
        !removeSet.exists { removeElem =>
          removeElem.value == addElem.value &&
            removeElem.timestamp > addElem.timestamp
        }
      }.map(_.value)
Merge
    def merge(anotherSet: LWWSet[E]): LWWSet[E] =
      new LWWSet(
        union(addSet, anotherSet.addSet),
        union(removeSet, anotherSet.removeSet))
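A self-contained Scala sketch of the same lookup and merge rules, using plain Scala sets in place of G-Set objects. The `Stamped` type, the `add` and `remove` methods and the split of operations between the two replicas are assumptions; merged, they give the Add and Rem sets shown on the previous slides.

```scala
final case class Stamped[E](timestamp: Long, value: E)

final case class LWWSet[E](
    addSet: Set[Stamped[E]] = Set.empty[Stamped[E]],
    removeSet: Set[Stamped[E]] = Set.empty[Stamped[E]]) {

  def add(ts: Long, e: E): LWWSet[E] = copy(addSet = addSet + Stamped(ts, e))
  def remove(ts: Long, e: E): LWWSet[E] = copy(removeSet = removeSet + Stamped(ts, e))

  // An add survives unless a removal of the same value has a strictly
  // higher timestamp, so equal timestamps favour the add.
  def lookup: Set[E] =
    addSet.filter { a =>
      !removeSet.exists(r => r.value == a.value && r.timestamp > a.timestamp)
    }.map(_.value)

  def merge(other: LWWSet[E]): LWWSet[E] =
    LWWSet(addSet.union(other.addSet), removeSet.union(other.removeSet))
}

object LWWSetDemo {
  // Hypothetical split of the operations across two replicas.
  val replica1 = LWWSet[String]().add(1, "cat").add(1, "dog").remove(3, "cat")
  val replica2 = LWWSet[String]().add(1, "ape").add(5, "cat").remove(1, "cat")
  val merged   = replica1.merge(replica2)
}
```

Here "cat" is visible again after the merge because its add at timestamp 5 is newer than both removals.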

Slide 48

Slide 48 text

LWW-Element-Set
Doesn't work for us:
• Immutable elements: no updates possible.

Slide 49

Slide 49 text

OR-Set
OR - Observed / Removed
Supports additions and removals, with tags.
• G-Set for added elements
• G-Set for removed elements, aka tombstones
• A unique tag is associated with each element
• Supports re-adding removed elements

Slide 50

Slide 50 text

OR-Set

Slide 51

Slide 51 text

OR-Set
Merge
Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
Rem { (#a, "cat") }

Slide 52

Slide 52 text

OR-Set
Merge
Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
Rem { (#a, "cat") }
Lookup { "cat", "dog", "ape" }

Slide 53

Slide 53 text

OR-Set
Lookup
An element E exists iff it has a tag in the AddSet that is not in the RemoveSet.
    def lookup(): Set[E] =
      addSet.filter { addElem =>
        !removeSet.exists { remElem =>
          addElem.value == remElem.value &&
            remElem.tag == addElem.tag
        }
      }.map(_.value)

Slide 54

Slide 54 text

OR-Set
Merge
    def merge(anotherSet: ORSet[E]): ORSet[E] =
      new ORSet(
        union(addSet, anotherSet.addSet),
        union(removeSet, anotherSet.removeSet))
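A self-contained Scala sketch of an OR-Set along these lines. The `Tagged` type, the UUID-based default tag and the rule that `remove` tombstones every tag observed locally for the value are assumptions about details the slides do not spell out. The demo passes fixed tags so that it matches the sets on the previous slides.

```scala
import java.util.UUID

final case class Tagged[E](tag: String, value: E)

final case class ORSet[E](
    addSet: Set[Tagged[E]] = Set.empty[Tagged[E]],
    removeSet: Set[Tagged[E]] = Set.empty[Tagged[E]]) {

  // Every add gets a fresh unique tag unless one is supplied explicitly.
  def add(e: E, tag: String = UUID.randomUUID().toString): ORSet[E] =
    copy(addSet = addSet + Tagged(tag, e))

  // Removal tombstones only the tags this replica has observed for the value.
  def remove(e: E): ORSet[E] =
    copy(removeSet = removeSet ++ addSet.filter(_.value == e))

  // A value is visible iff at least one of its tags has not been removed.
  def lookup: Set[E] = addSet.diff(removeSet).map(_.value)

  def merge(other: ORSet[E]): ORSet[E] =
    ORSet(addSet.union(other.addSet), removeSet.union(other.removeSet))
}

object ORSetDemo {
  val replica1 = ORSet[String]().add("cat", "#a").add("dog", "#b").remove("cat")
  val replica2 = ORSet[String]().add("cat", "#c").add("ape", "#d")
  // The concurrent add of "cat" under tag #c survives the removal of #a.
  val merged   = replica1.merge(replica2)
}
```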

Slide 55

Slide 55 text

OR-Set
Doesn't work for us:
• Immutable elements: no updates possible.

Slide 56

Slide 56 text

OUR-Set
Our take on an Observed-Updated-Removed Set
• Each element has a unique identifier
• An element can be changed as long as its identifier remains the same
• Each element has a timestamp
• The timestamp is updated on each element mutation
Identity (immutable unique id) vs Value (mutable)

Slide 57

Slide 57 text

OUR-Set
Contains a single underlying set of elements with metadata:
• Each element has a unique id field (e.g. a UUID)
• Each element has a "removed" boolean flag
• Each element has a timestamp
• The set can only contain one element with a particular id

Slide 58

Slide 58 text

OUR-Set

Slide 59

Slide 59 text

OUR-Set
Merge
{ (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }

Slide 60

Slide 60 text

OUR-Set
Merge
{ (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }
Lookup { "tiger", "ape" }

Slide 61

Slide 61 text

OUR-Set
Merge
    def merge(anotherSet: OURSet[E]): OURSet[E] =
      OURSet[E](
        (elements ++ anotherSet.elements)
          .groupBy(_.id)
          .map(group => group._2.maxBy(_.timestamp))
          .toSet)
Lookup
    def lookup(ourSet: OURSet[E]): Set[E] =
      ourSet.elements
        .filter(!_.removed)
        .map(_.value)
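A self-contained Scala sketch of the same merge and lookup. The `Element` type and the two replica states are assumptions, chosen so that the merge reproduces the result on the previous slides. This is not the code from the author's repositories.

```scala
final case class Element[E](id: String, timestamp: Long, value: E, removed: Boolean = false)

final case class OURSet[E](elements: Set[Element[E]] = Set.empty[Element[E]]) {

  // Per id, keep the version with the highest timestamp.
  // Equal timestamps are not handled here; they need a deterministic tie-break.
  def merge(other: OURSet[E]): OURSet[E] = {
    val winners: Set[Element[E]] =
      (elements ++ other.elements)
        .groupBy(_.id)
        .values
        .map(_.maxBy(_.timestamp))
        .toSet
    OURSet[E](winners)
  }

  // Removed elements stay in the set as tombstones but are not visible.
  def lookup: Set[E] = elements.filterNot(_.removed).map(_.value)
}

object OURSetDemo {
  // Hypothetical replica states.
  val replica1 = OURSet(Set(
    Element("id1", 1L, "cat"),
    Element("id2", 1L, "dog"),
    Element("id3", 1L, "ape")))
  val replica2 = OURSet(Set(
    Element("id1", 5L, "tiger"),                // id1 updated in place
    Element("id2", 2L, "dog", removed = true))) // id2 removed
  val merged = replica1.merge(replica2)
}
```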

Slide 62

Slide 62 text

Implementation
NavCloud CRDT Model: Favorites

Slide 63

Slide 63 text

CRDT Model: Favorites
FavoriteState element:
• ID (to uniquely identify a favorite)
• Timestamp (to indicate the last change time)
• Removed flag (to indicate if the favorite has been removed)
• Favorite data: ( Name, Location, ... )

Slide 64

Slide 64 text

Convergence in case of equal timestamps
The compare function checks all the fields in order of priority:
• Timestamp
• Removed flag (Add or Delete bias)
• .. rest of the attributes ..
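One way to express such a compare function in Scala is an `Ordering` over a tuple of the fields in priority order. This is a sketch under assumptions: the field set of `FavoriteState` is reduced to a single `name` attribute, and a delete bias is chosen (with `false < true`, the removed version wins a timestamp tie).

```scala
final case class FavoriteState(id: String, timestamp: Long, removed: Boolean, name: String)

object FavoriteOrdering {
  // Total order: timestamp first, then the removed flag (delete bias),
  // then the remaining attributes as a last-resort tie-break.
  val byPriority: Ordering[FavoriteState] =
    Ordering.by((f: FavoriteState) => (f.timestamp, f.removed, f.name))

  // Picks the same winner regardless of argument order.
  def winner(a: FavoriteState, b: FavoriteState): FavoriteState =
    byPriority.max(a, b)
}
```

Because the order is total over all compared fields, two replicas that see the same pair of versions pick the same one, even when the timestamps are equal.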

Slide 65

Slide 65 text

Using CRDT everywhere
• Use the same algorithm everywhere
As simple as calling the merge function

Slide 66

Slide 66 text

Using CRDT everywhere
Client <-> Server <-> Database
    def update(fromClient: OURSet[E]): OURSet[E] = {
      val fromDatabase = database.fetch(...)
      val newSet = fromDatabase.merge(fromClient)
      database.store(..., newSet)
      newSet
    }

Slide 67

Slide 67 text


Slide 68

Slide 68 text

Considerations & Limitations

Slide 69

Slide 69 text

"What about garbage?"
• CRDTs tend to grow because of tombstones.
• Deleted element in the set == tombstone.
• Potentially unbounded growth.

Slide 70

Slide 70 text

Prune deleted elements
But when?
Requirement: all nodes holding a replica of the CRDT set must have seen a deleted element before it can be pruned. Otherwise, deleted elements can be resurrected.

Slide 71

Slide 71 text

Time-To-Live for tombstones
Prune tombstones once their TTL is exceeded.
    if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
      crdtSet.remove(tombstone)
    }
Requirement: all nodes holding a CRDT set should apply the same TTL rule independently.
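A minimal Scala sketch of the TTL rule as a pure function. The `Entry` type and the millisecond units are assumptions. The current time is passed in as a parameter so that the rule is easy to test and every node can apply it the same way.

```scala
final case class Entry(id: String, timestamp: Long, removed: Boolean)

object TombstonePruning {
  // Drops tombstones older than the TTL; live elements are always kept.
  def prune(entries: Set[Entry], nowMillis: Long, ttlMillis: Long): Set[Entry] =
    entries.filterNot(e => e.removed && nowMillis - e.timestamp > ttlMillis)
}
```

The TTL has to be long enough for every replica to have seen the deletion. A replica that stays offline for longer than the TTL can still resurrect the element.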

Slide 72

Slide 72 text

Send and reply with a Diff
The client modifies and sends only the updated elements (Diff).
Before: the server responded with a full merge result.

Slide 73

Slide 73 text

Send and reply with a Diff
We introduced a 'Scoped Diff':
The server responds only with the elements which have won against those sent by the client.
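One possible reading of the scoped diff, sketched in Scala. The `Item` type, the map-based server state and the `sync` function are hypothetical, and the tie handling (the server version is kept on equal timestamps) is a simplification.

```scala
final case class Item(id: String, timestamp: Long, value: String)

object ScopedDiff {
  type State = Map[String, Item]

  // The higher timestamp wins; on a tie the first argument is kept.
  def mergeItem(a: Item, b: Item): Item =
    if (b.timestamp > a.timestamp) b else a

  // Merges the client's diff into the server state and returns the new
  // state plus a reply holding only the server versions that beat what
  // the client sent.
  def sync(server: State, clientDiff: Seq[Item]): (State, Seq[Item]) = {
    val merged = clientDiff.foldLeft(server) { (acc, item) =>
      acc.updated(item.id, acc.get(item.id).fold(item)(mergeItem(_, item)))
    }
    val reply = clientDiff.flatMap(sent => merged.get(sent.id).filter(_ != sent))
    (merged, reply)
  }
}
```

Elements for which the client's version won, or which were new to the server, are left out of the reply because the client already has them.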

Slide 74

Slide 74 text

Server -> Client Diff

Slide 75

Slide 75 text

Trouble With Time

Slide 76

Slide 76 text

There is no such thing as reliable time*.

Slide 77

Slide 77 text

Tracking time is actually tracking causality.
- Jonas Bonér, "Life Beyond the Illusion of Present"

Slide 78

Slide 78 text

Causality & Ordering of events.

Slide 79

Slide 79 text

Time can be just good enough.

Slide 80

Slide 80 text

Ordering updates within a single node
Timestamp field as a logical clock.
The absolute value is not important, but it should always grow monotonically.

Slide 81

Slide 81 text

Ordering updates within a single node
"+1 Strategy" (aka ensure monotonicity):
    Long resolveNewTimestamp(ElementState state) {
      return Math.max(retrieveTimestamp(), state.lastModified() + 1);
    }
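The same "+1" rule as a pure Scala function, with the current clock reading passed in explicitly. The name `LogicalTimestamp.next` is hypothetical.

```scala
object LogicalTimestamp {
  // Takes the clock reading when it is ahead of the element's last change,
  // otherwise steps one past the last change so the value still grows.
  def next(lastModified: Long, now: Long): Long =
    math.max(now, lastModified + 1)
}
```

With a healthy clock the wall-clock time is used as is. If the clock stalls or jumps backwards, the timestamp still increases by one on every mutation.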

Slide 82

Slide 82 text

Ordering updates from different nodes
If a GPS clock is available, use it (mainly the navigation devices case).
Prefer the server time to a client's local time.

Slide 83

Slide 83 text

Edge case
Multiple clients modify the same element (concurrently || without a reliable clock).

Slide 84

Slide 84 text

One "merge" to rule them all

Slide 85

Slide 85 text

Clients & Server MUST have the same 'merge' behaviour.
==
Given the same input, their 'merge' functions emit the same results.

Slide 86

Slide 86 text

Divergence may lead to endless synchronization loops!

Slide 87

Slide 87 text

Lazy (data) loading
OURSet Element
• Metadata: UUID, timestamp, "removed" flag
• Data:

Slide 88

Slide 88 text

Lazy (data) loading
New OURSet Element
• Metadata: UUID, timestamp, "removed" flag, + tag / hash
• (Optional) Data:
Flexible synchronization strategy
Eager || Lazy Fetch
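A speculative Scala sketch of how a lazy fetch could use the extra hash: compare metadata only and list the ids whose payload is missing or out of date locally. The `Meta` type and `idsToFetch` are hypothetical, and the slides do not say how the tag / hash is actually used.

```scala
final case class Meta(id: String, timestamp: Long, removed: Boolean, hash: String)

object LazyFetch {
  // Ids of live remote elements whose payload is unknown locally or whose
  // hash differs from the local one. Removed elements need no payload.
  def idsToFetch(local: Map[String, Meta], remote: Seq[Meta]): Seq[String] =
    remote
      .filter(m => !m.removed && local.get(m.id).forall(_.hash != m.hash))
      .map(_.id)
}
```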

Slide 89

Slide 89 text

What have we learned?
• Academia is not as scary as it sometimes seems to pragmatic devs.
• We need better and simpler abstractions to develop offline-friendly apps.
• CRDTs give great value, but there are some caveats.
• Things like Lasp (lasp-lang.org) could also be the answer (?).

Slide 90

Slide 90 text

Show me the code
github.com/ajantis/{scala | java}-crdt

Slide 91

Slide 91 text

Want to know more? (Part 1)
• CRDTs for fun and eventual profit. Noel Welsh, 2013.
• Readings in conflict-free replicated data types. Christopher Meiklejohn, 2015.
• A comprehensive study of Convergent and Commutative Replicated Data Types. Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011.

Slide 92

Slide 92 text

Want to know more? (Part 2)
• Lasp: A language for distributed, coordination-free programming. Meiklejohn & Van Roy, 2015.
• Swarm.js+React: real-time, offline-ready Holy Grail web apps. Victor Grishchenko.

Slide 93

Slide 93 text

Thanks!
Dmitry Ivanov @idajantis