Slide 1

Slide 1 text

Practical Demystification of CRDTs
Nami Nasserazad (@namiazad, nami.me)
Dmitry Ivanov (@idajantis)
Didier Liauw
Curry On, Rome, 2016

Slide 2

Slide 2 text

Disclaimer
We are NOT:
• Distributed systems experts.
• Hardcore academia guys.
Just curious engineers hacking on real world problems.

Slide 3

Slide 3 text

Who We Are
"Fool" stack developers hacking on:
• Backend services
• Mobile || SDKs
• Infrastructure && AWS && DevOps

Slide 4

Slide 4 text

NavCloud

Slide 5

Slide 5 text

Server Development Stack

Slide 6

Slide 6 text

Client Libraries

Slide 7

Slide 7 text

NavCloud Nature
• Unstable connections
• Limited bandwidth
• Seamless edit/view in offline mode
• Concurrent changes with potential conflicts
• No guarantee on updates order
• No data loss
• Data convergence to expected value

Slide 8

Slide 8 text

How to Deal with this Nature?

Slide 9

Slide 9 text

Bad programmers worry about the code. Good programmers worry about data structures
- Linus Torvalds

Slide 10

Slide 10 text

CRDT

Slide 11

Slide 11 text

CRDT
DT: Data Type
CRDT is a data type with its own algebra

Slide 12

Slide 12 text

CRDT
R: Replicated
CRDTs are a family of data structures designed to be distributed

Slide 13

Slide 13 text

CRDT
C: Conflict Free
Resolving conflicts is done automatically

Slide 14

Slide 14 text

How?

Slide 15

Slide 15 text

Merge

Slide 16

Slide 16 text

What is Merge?
• A binary operation on two CRDTs
• Commutative: x • y = y • x
• Associative: ( x • y ) • z = x • ( y • z )
• Idempotent: x • x = x
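
The three laws can be spot-checked on sample values. The sketch below is only an illustration: `MergeLaws` and `holdsFor` are hypothetical names that do not come from the talk, and passing on a handful of samples is evidence, not a proof.

```scala
// Hypothetical helper (not from the talk): checks the three merge laws
// on a finite list of sample values.
object MergeLaws {
  def holdsFor[A](samples: List[A])(merge: (A, A) => A): Boolean = {
    // x • y == y • x
    val commutative =
      samples.forall(x => samples.forall(y => merge(x, y) == merge(y, x)))
    // (x • y) • z == x • (y • z)
    val associative = samples.forall { x =>
      samples.forall { y =>
        samples.forall(z => merge(merge(x, y), z) == merge(x, merge(y, z)))
      }
    }
    // x • x == x
    val idempotent = samples.forall(x => merge(x, x) == x)
    commutative && associative && idempotent
  }
}
```

For example, `MergeLaws.holdsFor(List(1, 5, 9))((a, b) => math.max(a, b))` holds, while the same call with `a + b` fails because addition is not idempotent.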

Slide 17

Slide 17 text

How Does it Help?
In Distributed Systems:
• Order is not guaranteed:
  • No problem: Merge is Commutative and Associative
• Events can be delivered more than once:
  • No problem: Merge is Idempotent
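
A small sketch of why this matters, using `max` as the merge. `DeliveryDemo` and `replay` are hypothetical names chosen for this illustration: the same states, delivered reordered and duplicated, fold to the same result.

```scala
// Hypothetical demo (not from the talk).
object DeliveryDemo {
  // Merge for a single value: keep the larger one.
  def merge(x: Int, y: Int): Int = math.max(x, y)

  // Merge incoming states in whatever order, and however often, they arrive.
  def replay(initial: Int, delivered: Seq[Int]): Int =
    delivered.foldLeft(initial)(merge)
}
```

`DeliveryDemo.replay(0, Seq(3, 6, 4))` and `DeliveryDemo.replay(0, Seq(4, 4, 6, 3, 6))` give the same value.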

Slide 18

Slide 18 text

What Does it Bring in Practice?
• Local updates
• Local merge of received data
• All local merges converge

Slide 19

Slide 19 text

Examples

Slide 20

Slide 20 text

G-Counter

Slide 21

Slide 21 text

G-Counter
Merge: Max of corresponding elements: A:6 B:3 C:9
Total Value: Sum of all elements: 6 + 3 + 9 = 18
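
A minimal G-Counter sketch under stated assumptions: replica ids are plain strings, and the class and method names are chosen for this illustration rather than taken from the authors' library. The two input states in the usage note are invented so that the merge reproduces the A:6 B:3 C:9 result on the slide.

```scala
// Grow-only counter: one slot per replica; a replica only increments its own slot.
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  def increment(replica: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(replica, counts.getOrElse(replica, 0L) + by))
  }

  // Merge: element-wise maximum over the union of replica ids.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
    }.toMap)

  // Total value: sum of all slots.
  def value: Long = counts.values.sum
}
```

Merging the assumed states `GCounter(Map("A" -> 6L, "B" -> 2L, "C" -> 9L))` and `GCounter(Map("A" -> 4L, "B" -> 3L, "C" -> 7L))` yields slots A:6, B:3, C:9 and a total of 18, in either merge order.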

Slide 22

Slide 22 text

Max Function
• A binary operation on two CRDTs
• Commutative: x max y = y max x
• Associative: ( x max y ) max z = x max ( y max z )
• Idempotent: x max x = x

Slide 23

Slide 23 text

G-Set

Slide 24

Slide 24 text

G-Set
Merge: Union of sets: { x, y, z, a, b, c }
Total Value: The same as the merge result
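
A G-Set can be sketched as a thin wrapper around an immutable set. The names below are illustrative assumptions, not the authors' API.

```scala
// Grow-only set: elements can be added but never removed.
final case class GSet[E](elements: Set[E] = Set.empty[E]) {
  def add(e: E): GSet[E] = GSet(elements + e)

  // Merge: plain set union.
  def merge(other: GSet[E]): GSet[E] = GSet(elements.union(other.elements))

  // The visible value is the set itself.
  def lookup: Set[E] = elements
}
```

Merging `GSet(Set("x", "y", "z"))` with `GSet(Set("a", "b", "c"))` gives the six-element set shown on the slide.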

Slide 25

Slide 25 text

Union Function
• A binary operation on two CRDTs
• Commutative: x ∪ y = y ∪ x
• Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
• Idempotent: x ∪ x = x

Slide 26

Slide 26 text

CRDT in NavCloud

Slide 27

Slide 27 text

Favorite Locations Synchronisation

Slide 28

Slide 28 text

Naive Approach?

Slide 29

Slide 29 text

Last Write Wins

Slide 30

Slide 30 text

Problems
• Unstable connections
  • Actual update time < Sent time
• Network latency
  • Sent time < Received time
• Unreliable clocks

Slide 31

Slide 31 text

Stale update may win!
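
A sketch of how that can happen with naive last-write-wins. `Write` and `LastWriteWins` are hypothetical names, and the timestamps are invented to model a device whose clock runs ahead.

```scala
// Hypothetical record of a write: the value plus the wall-clock time
// the writing device stamped on it.
final case class Write(value: String, stampedAt: Long)

object LastWriteWins {
  // Naive rule: the write with the larger timestamp wins.
  def resolve(a: Write, b: Write): Write = if (b.stampedAt > a.stampedAt) b else a
}
```

If the older edit carries `stampedAt = 1600` because its device clock is fast, and the genuinely newer edit carries `stampedAt = 1000`, `resolve` keeps the older edit.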

Slide 32

Slide 32 text

So What?

Slide 33

Slide 33 text

CRDT

Slide 34

Slide 34 text

NavCloud Nature vs CRDT
• Unstable connections ✔
• Limited bandwidth ✔
• Seamless edit/view in offline mode ✔
• Concurrent changes with potential conflicts ✔
• No guarantee on updates order ✔
• No data loss ✔
• Data convergence to expected value ✔

Slide 35

Slide 35 text

Same Data Model Everywhere
• Server
• Clients
• Data store

Slide 36

Slide 36 text

Merging Conflicts in Riak

Slide 37

Slide 37 text

Implementing a CRDT Set
What do we want?
• Support for addition and removal operations.
• Optimized for element mutations.
• Footprint as compact as possible.

Slide 38

Slide 38 text

2-Phase-Set
Supports additions and removals.
• G-Set for added elements
• G-Set for removed elements aka Tombstones

Slide 39

Slide 39 text

2-Phase-Set

Slide 40

Slide 40 text

2-Phase-Set
Merge: [ Add { "cat", "dog", "ape" }; Rem { "ape" } ]
Lookup: { "cat", "dog" }

Slide 41

Slide 41 text

2-Phase-Set
Lookup:
  // Visible elements: everything added that has no tombstone.
  def lookup: Set[E] = addSet.diff(removeSet).lookup
Merge:
  // Union both underlying G-Sets independently.
  def merge(anotherSet: TwoPSet[E]): TwoPSet[E] =
    new TwoPSet(
      union(addSet, anotherSet.addSet),
      union(removeSet, anotherSet.removeSet))

Slide 42

Slide 42 text

2-Phase-Set
Doesn't work for us:
• Removed element can't be added again
• Immutable elements: no updates possible
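
A self-contained sketch of a 2P-Set that makes the first limitation visible. It uses plain Scala sets in place of G-Sets and illustrative names; it is not the authors' implementation.

```scala
// Two-phase set: one grow-only set of additions, one of tombstones.
final case class TwoPSet[E](
    addSet: Set[E] = Set.empty[E],
    removeSet: Set[E] = Set.empty[E]) {

  def add(e: E): TwoPSet[E] = copy(addSet = addSet + e)

  // Simplification: a tombstone is recorded even if the element was never added.
  def remove(e: E): TwoPSet[E] = copy(removeSet = removeSet + e)

  // Visible elements: added and not tombstoned.
  def lookup: Set[E] = addSet.diff(removeSet)

  def merge(other: TwoPSet[E]): TwoPSet[E] =
    TwoPSet(addSet.union(other.addSet), removeSet.union(other.removeSet))
}
```

After merging a replica holding "cat" and "ape" with one that holds "dog" and has removed "ape", the lookup is { "cat", "dog" }. Calling `add("ape")` afterwards changes nothing, because the tombstone is permanent.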

Slide 43

Slide 43 text

LWW-Element-Set
Supports additions and removals, with timestamps.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Each element has a timestamp
• Supports re-adding removed element using a higher timestamp

Slide 44

Slide 44 text

LWW-Element-Set

Slide 45

Slide 45 text

LWW-Element-Set
Merge
Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
Rem { (1, "cat"), (3, "cat") }

Slide 46

Slide 46 text

LWW-Element-Set
Merge
Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
Rem { (1, "cat"), (3, "cat") }
Lookup { "cat", "dog", "ape" }

Slide 47

Slide 47 text

LWW-Element-Set
Lookup:
  // Keep an added element unless a removal of the same value has a newer timestamp.
  def lookup: Set[E] =
    addSet.lookup.filter { addElem =>
      !removeSet.exists { removeElem =>
        removeElem.value == addElem.value &&
          removeElem.timestamp > addElem.timestamp
      }
    }.map(_.value)
Merge:
  def merge(anotherSet: LWWSet[E]): LWWSet[E] =
    new LWWSet(
      union(addSet, anotherSet.addSet),
      union(removeSet, anotherSet.removeSet))

Slide 48

Slide 48 text

LWW-Element-Set
Doesn't work for us:
• Immutable elements: no updates possible.
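
A runnable sketch of an LWW-Element-Set, assuming `Long` timestamps and plain Scala sets. `Stamped` and the method names are hypothetical, and the split of the slide's data across two replicas in the usage note is invented for illustration.

```scala
// Hypothetical wrapper: a value plus the timestamp of the add or remove.
final case class Stamped[E](timestamp: Long, value: E)

final case class LWWSet[E](
    addSet: Set[Stamped[E]] = Set.empty[Stamped[E]],
    removeSet: Set[Stamped[E]] = Set.empty[Stamped[E]]) {

  def add(timestamp: Long, value: E): LWWSet[E] =
    copy(addSet = addSet + Stamped(timestamp, value))

  def remove(timestamp: Long, value: E): LWWSet[E] =
    copy(removeSet = removeSet + Stamped(timestamp, value))

  // A value is visible if some add of it has no remove with a strictly
  // newer timestamp (so an add wins a timestamp tie).
  def lookup: Set[E] =
    addSet.filter { a =>
      !removeSet.exists(r => r.value == a.value && r.timestamp > a.timestamp)
    }.map(_.value)

  def merge(other: LWWSet[E]): LWWSet[E] =
    LWWSet(addSet.union(other.addSet), removeSet.union(other.removeSet))
}
```

With one replica built as `add(1, "cat").add(1, "dog").remove(3, "cat")` and another as `add(1, "ape").remove(1, "cat").add(5, "cat")`, the merge contains the Add and Rem sets from the slide, and "cat" is visible again because its add at 5 is newer than the remove at 3.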

Slide 49

Slide 49 text

OR-Set
OR - Observed / Removed
Supports additions and removals, with tags.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Unique tag is associated with each element
• Supports re-adding removed elements

Slide 50

Slide 50 text

OR-Set

Slide 51

Slide 51 text

OR-Set
Merge
Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
Rem { (#a, "cat") }

Slide 52

Slide 52 text

OR-Set
Merge
Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
Rem { (#a, "cat") }
Lookup { "cat", "dog", "ape" }

Slide 53

Slide 53 text

OR-Set
Lookup: E exists iff it has a tag in the AddSet that is not in the RemoveSet.
  def lookup(): Set[E] =
    addSet.filter { addElem =>
      // Drop the entry only if this exact (tag, value) pair was tombstoned.
      !removeSet.exists { remElem =>
        addElem.value == remElem.value && remElem.tag == addElem.tag
      }
    }.map(_.value)

Slide 54

Slide 54 text

OR-Set
Merge:
  def merge(anotherSet: ORSet[E]): ORSet[E] =
    new ORSet(
      union(addSet, anotherSet.addSet),
      union(removeSet, anotherSet.removeSet))

Slide 55

Slide 55 text

OR-Set
Doesn't work for us:
• Immutable elements: no updates possible.
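
A runnable OR-Set sketch under stated assumptions: tags are caller-supplied strings (for example UUIDs), plain Scala sets stand in for G-Sets, and `Tagged` and the method names are hypothetical. The replica split in the usage note is invented to reproduce the slide's merge result.

```scala
// Hypothetical wrapper: a value plus the unique tag of the add that created it.
final case class Tagged[E](tag: String, value: E)

final case class ORSet[E](
    addSet: Set[Tagged[E]] = Set.empty[Tagged[E]],
    removeSet: Set[Tagged[E]] = Set.empty[Tagged[E]]) {

  // The caller supplies a tag that must be unique across all replicas.
  def add(tag: String, value: E): ORSet[E] =
    copy(addSet = addSet + Tagged(tag, value))

  // Tombstone only the tags this replica has observed for the value.
  def remove(value: E): ORSet[E] =
    copy(removeSet = removeSet ++ addSet.filter(_.value == value))

  // A value is visible if at least one of its tags is not tombstoned.
  def lookup: Set[E] = addSet.diff(removeSet).map(_.value)

  def merge(other: ORSet[E]): ORSet[E] =
    ORSet(addSet.union(other.addSet), removeSet.union(other.removeSet))
}
```

If one replica does `add("#a", "cat").add("#b", "dog").remove("cat")` while another does `add("#c", "cat").add("#d", "ape")`, the merged lookup is { "cat", "dog", "ape" }: the removal only covered tag #a, which is the only "cat" tag the first replica had seen.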

Slide 56

Slide 56 text

OUR-Set
Our take on Observed-Updated-Removed Set
• Each element has a unique identifier
• Element can be changed if identifier remains the same
• Each element has a timestamp
• Timestamp is updated on each element mutation
Identity (immutable unique id) vs Value (mutable)

Slide 57

Slide 57 text

OUR-Set
Contains a single underlying set of elements with metadata:
• Each element has a unique id field (e.g. a UUID)
• Each element has a "removed" boolean flag
• Each element has a timestamp
• Set can only contain one element with a particular id

Slide 58

Slide 58 text

OUR-Set

Slide 59

Slide 59 text

OUR-Set
Merge { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }

Slide 60

Slide 60 text

OUR-Set
Merge: { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }
Lookup { "tiger", "ape" }

Slide 61

Slide 61 text

OUR-Set
Merge:
  // Per id, keep the copy with the highest timestamp.
  def merge(anotherSet: OURSet[E]): OURSet[E] =
    OURSet[E](
      (elements ++ anotherSet.elements)
        .groupBy(_.id)
        .map(group => group._2.maxBy(_.timestamp))
        .toSet)
Lookup:
  // Visible values: elements not flagged as removed.
  def lookup(ourSet: OURSet[E]): Set[E] =
    ourSet.elements
      .filter(!_.removed)
      .map(_.value)
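
A self-contained sketch of the same idea with concrete types. `Element`, its field layout, and the two input replicas in the usage note are assumptions made for this illustration; the authors' library may model elements differently.

```scala
// Hypothetical element: immutable id, mutable value, timestamp, removed flag.
final case class Element[V](
    id: String,
    timestamp: Long,
    value: V,
    removed: Boolean = false)

final case class OURSet[V](elements: Set[Element[V]] = Set.empty[Element[V]]) {
  // Per id, keep the copy with the highest timestamp.
  def merge(other: OURSet[V]): OURSet[V] =
    OURSet(
      (elements ++ other.elements)
        .groupBy(_.id)
        .values
        .map(_.maxBy(_.timestamp))
        .toSet)

  // Visible values: everything not flagged as removed.
  def lookup: Set[V] = elements.filterNot(_.removed).map(_.value)
}
```

Merging a replica holding (id1, 1, "cat"), (id2, 1, "dog"), (id3, 1, "ape") with one holding (id1, 5, "tiger") and (id2, 2, "dog", removed) gives the merged set on the slide. Note that `maxBy(_.timestamp)` alone does not pick a deterministic winner when two different copies share a timestamp, so a full implementation needs a tie-breaking rule.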

Slide 62

Slide 62 text

Implementation
NavCloud CRDT Model: Favorites

Slide 63

Slide 63 text

CRDT Model: Favorites
FavoriteState element:
• ID (to uniquely identify a favorite)
• Timestamp (to indicate the last change time)
• Removed flag (to indicate if favorite has been removed)
• Favorite data: ( Name, Location, ... )

Slide 64

Slide 64 text

Convergence in case of equal timestamps
Compare function checks all the fields in order of priority:
• Timestamp
• Removed flag (Add or Delete bias)
• .. rest attributes ..
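
One way to sketch such a compare function is a total ordering over the fields in priority order. The reduced `FavoriteState` below (a single `name` attribute) and the choice of delete bias are assumptions for this example; the slide leaves the bias open.

```scala
// Reduced, hypothetical FavoriteState: only `name` stands in for the favorite data.
final case class FavoriteState(
    id: String,
    timestamp: Long,
    removed: Boolean,
    name: String)

object FavoriteState {
  // Priority: timestamp, then removed flag (false < true, so a removal wins
  // a timestamp tie: delete bias), then the remaining attributes.
  implicit val byPriority: Ordering[FavoriteState] =
    Ordering.by((f: FavoriteState) => (f.timestamp, f.removed, f.name))

  // The winner does not depend on argument order.
  def winner(a: FavoriteState, b: FavoriteState): FavoriteState =
    byPriority.max(a, b)
}
```

Because every field takes part in the comparison, two replicas that see the same pair of states pick the same winner even when the timestamps are equal.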

Slide 65

Slide 65 text

Using CRDT everywhere
• Use the same algorithm everywhere
As simple as calling the merge function

Slide 66

Slide 66 text

Using CRDT everywhere
Client <-> Server <-> Database
  def update(fromClient: OURSet[FavoriteState]): OURSet[FavoriteState] = {
    val fromDatabase = database.fetch(...)
    val newSet = fromDatabase.merge(fromClient)
    database.push(newSet)
    newSet
  }

Slide 67

Slide 67 text

Considerations & Limitations

Slide 68

Slide 68 text

"What about garbage?"
• CRDTs tend to grow because of tombstones.
• Deleted Favorite in the Set == Tombstone.
• A potentially unbounded growth.

Slide 69

Slide 69 text

Prune deleted elements
But when?
Requirement: All nodes holding a CRDT Set replica should have seen a deleted element before it can be pruned. Otherwise deleted elements can be resurrected.

Slide 70

Slide 70 text

Time-To-Live for tombstones
Prune tombstones once TTL exceeded.
  if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
    crdtSet.remove(tombstone)
  }
Requirement: all nodes holding a CRDT set should apply the same TTL rule independently.
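
A sketch of TTL pruning as a pure function, assuming timestamps are epoch milliseconds. `Entry`, `Pruning` and `prune` are hypothetical names; the current time is passed in so the rule is easy to test.

```scala
// Hypothetical element metadata: only the fields pruning needs.
final case class Entry(id: String, timestamp: Long, removed: Boolean)

object Pruning {
  // Drop tombstones older than ttlMillis; live elements are always kept.
  def prune(elements: Set[Entry], nowMillis: Long, ttlMillis: Long): Set[Entry] =
    elements.filterNot(e => e.removed && nowMillis - e.timestamp > ttlMillis)
}
```

The TTL has to be longer than the longest time a replica may stay offline; otherwise that replica can bring back an element whose tombstone has already been pruned elsewhere.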

Slide 71

Slide 71 text

Send and reply with a Diff
Client modifies and sends only updated elements (Diff).
Before: Server responds with a full merged result.

Slide 72

Slide 72 text

Send and reply with a Diff
We introduced a 'Scoped Diff':
Server responds only with the elements which have won against those sent by the client.
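
One possible reading of the scoped diff, sketched with hypothetical names (`Item`, `ScopedDiff`, `reply`): for each element the client sent, the server returns its own copy only if that copy has a higher timestamp. The real protocol may differ, for example in how it breaks timestamp ties.

```scala
// Hypothetical element: id, timestamp and a value.
final case class Item(id: String, timestamp: Long, value: String)

object ScopedDiff {
  // For every element the client sent, return the server's copy if it
  // beats the client's; elements the client did not send are not returned.
  def reply(server: Map[String, Item], fromClient: Set[Item]): Set[Item] =
    fromClient.flatMap { sent =>
      server.get(sent.id).filter(_.timestamp > sent.timestamp)
    }
}
```

In this reading an empty reply means every element the client sent was accepted as the newest version.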

Slide 73

Slide 73 text

Trouble With Time

Slide 74

Slide 74 text

There is no such thing as reliable time*.

Slide 75

Slide 75 text

Tracking time is actually tracking causality.
- Jonas Bonér, "Life Beyond the Illusion of Present"

Slide 76

Slide 76 text

Causality & Ordering of events.

Slide 77

Slide 77 text

Time can be just good enough.

Slide 78

Slide 78 text

Ordering updates within a single node
Timestamp field as a logical clock.
Absolute value is not important, but it should always grow monotonically.

Slide 79

Slide 79 text

Ordering updates within a single node
"+1 Strategy":
  Long resolveNewTimestamp(ElementState state) {
    return Math.max(retrieveTimestamp(), state.lastModified() + 1);
  }
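
The same rule as a pure Scala function, with the clock reading passed in as a parameter. `LogicalClock` and `next` are hypothetical names for this sketch.

```scala
object LogicalClock {
  // Next timestamp: the wall clock if it is ahead of the last change,
  // otherwise one past the last change, so timestamps never go backwards.
  def next(wallClockMillis: Long, lastModified: Long): Long =
    math.max(wallClockMillis, lastModified + 1)
}
```

If the local clock reads 1000 but the element was last modified at 1500, the new timestamp is 1501; if the clock reads 2000, the new timestamp is 2000.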

Slide 80

Slide 80 text

Ordering updates from different nodes
If GPS clock is available -> use it (mainly Navigation Devices case).
Prefer the server time to a client's local time.

Slide 81

Slide 81 text

Edge case
Multiple Clients modify the same element (concurrently || without a reliable clock).

Slide 82

Slide 82 text

One "merge" to rule them all

Slide 83

Slide 83 text

Clients & Server MUST have same 'merge' behaviour.
==
Given the same input, their 'merge' functions emit the same results.

Slide 84

Slide 84 text

Divergence may lead to endless synchronisation loops!

Slide 85

Slide 85 text

What have we learned?
• Academia is not as scary as it sometimes seems to pragmatic devs.
• Check for the existing && simple(r) solutions.
• Understand their limitations.
• Analyze and monitor the real usage.
• Never settle: constantly search how to tune & improve.

Slide 86

Slide 86 text

"Show me the code"
github.com/ajantis/scala-crdt
github.com/ajantis/java-crdt

Slide 87

Slide 87 text

Want to know more? (Part 1)
• CRDTs for fun and eventual profit, Noel Welsh, 2013.
• Readings in conflict-free replicated data types, Christopher Meiklejohn, 2015.
• A comprehensive study of Convergent and Commutative Replicated Data Types, Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011.

Slide 88

Slide 88 text

Want to know more? (Part 2)
• Lasp: A language for distributed, coordination-free programming, Meiklejohn & Van Roy, 2015.
• Swarm.js+React - real-time, offline-ready Holy Grail web apps, Victor Grishchenko.

Slide 89

Slide 89 text

Any questions? :)
@namiazad @idajantis