Practical Demystification of CRDTs (LambdaDays 2016)

Dmitry Ivanov
February 20, 2016

Prepared & presented together with Nami Nasserazad (https://twitter.com/namiazad) and Didier Liauw.

Abstract:

In a connected world, synchronising mutable information between devices with different clock precision can be a difficult problem. A piece of data may have many out-of-sync replicas, but all of them should eventually reach a consistent state. For example, TomTom users with personal navigation devices, smartphones and MyDrive website accounts expect their navigation information to be synchronised properly even when the network connection is occasionally absent.

Conflict-free Replicated Data Types (CRDTs) provide robust data structures for achieving proper synchronisation in an unreliable network of devices. They allow conflicts to be resolved locally, at the data type level, while guaranteeing eventual consistency between replicas.

In this talk, in addition to an introduction to CRDTs, our main focus will be on a special type of CRDT set called OUR-Set (Observed, Updated, Removed), which we created to extend known CRDT sets with update functionality. We will explain the advantages of this data structure for solving many synchronisation problems, as well as its limitations. We will also show what a basic implementation of the OUR-Set CRDT looks like in Scala, and its counterpart in Java, and enumerate a set of subtle considerations which should be taken into account.

Transcript

  1. Practical Demystification of CRDTs
    Nami Nasserazad (@namiazad, nami.me), Dmitry Ivanov (@idajantis), Didier Liauw
    17-18 February 2016, Kraków
  2. Disclaimer
    We are NOT:
    • Distributed systems experts.
    • Hardcore academia guys.
    Just curious engineers hacking on real world problems.
  3. Who are we?
    "Fool" stack developers hacking on:
    • Backend services
    • Mobile || SDKs
    • Infrastructure && AWS && DevOps
  4. NavCloud

  5. Server Development Stack

  6. Client Libraries

  7. NavCloud Nature
    • Unstable connections
    • Limited bandwidth
    • Seamless edit/view in offline mode
    • Concurrent changes with potential conflicts
    • No guarantee on updates order
    • No data loss
    • Data convergence to expected value
  8. How to Deal with this Nature?
  9. "Bad programmers worry about the code. Good programmers worry about data structures." (Linus Torvalds)
  10. CRDT
  11. CRDT
    DT: Data Type
    CRDT is a data type with its own algebra
  12. CRDT
    R: Replicated
    CRDT is a family of data structures which has been designed to be distributed
  13. CRDT
    C: Conflict Free
    Resolving conflicts is done automatically

  14. How?
  15. Merge
  16. What is Merge?
    • A binary operation on two CRDTs
    • Commutative: x • y = y • x
    • Associative: ( x • y ) • z = x • ( y • z )
    • Idempotent: x • x = x
  17. How Does it Help?
    In Distributed Systems:
    • Order is not guaranteed:
      • No problem: Merge is Commutative and Associative
    • Events can be delivered more than once:
      • No problem: Merge is Idempotent
  18. What Does it Bring in Practice?
    • Local updates
    • Local merge of received data
    • All local merges converge
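The three merge laws above can be illustrated with a small sketch. It assumes set union as the merge operation; the `MergeLaws` object, the `mergeAll` helper and the sample values are hypothetical names chosen for illustration, not code from the talk. Two replicas receive the same updates, one of them reordered and duplicated, and still end up with the same state.

```scala
object MergeLaws {
  // Hypothetical helper: folds every received state into a replica's local state.
  def mergeAll[A](local: A, received: Seq[A])(merge: (A, A) => A): A =
    received.foldLeft(local)(merge)

  // Set union as the merge: commutative, associative and idempotent.
  val union: (Set[String], Set[String]) => Set[String] = (a, b) => a ++ b

  val updates: Seq[Set[String]] = Seq(Set("home"), Set("work"), Set("gym", "home"))

  // Replica 1 receives the updates once, in order.
  val replica1: Set[String] = mergeAll(Set.empty[String], updates)(union)

  // Replica 2 receives them reversed and then delivered a second time.
  val replica2: Set[String] = mergeAll(Set.empty[String], updates.reverse ++ updates)(union)
}
```

Both replicas converge on the same set even though neither delivery order nor delivery count was controlled.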
  19. Examples
  20. G-Counter
  21. G-Counter
    Merge: max of corresponding elements: A:6 B:3 C:9
    Total value: sum of all elements: 6 + 3 + 9 = 18
  22. Max Function
    • A binary operation on two CRDTs
    • Commutative: x max y = y max x
    • Associative: ( x max y ) max z = x max ( y max z )
    • Idempotent: x max x = x
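A G-Counter along these lines could be sketched as follows. This is a minimal illustration, assuming one counter slot per replica keyed by a string id; the `GCounter` class and its method names are assumptions, not the authors' implementation.

```scala
// Minimal G-Counter sketch: one non-decreasing count per replica id.
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  // A replica only ever increments its own slot.
  def increment(replicaId: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(replicaId, counts.getOrElse(replicaId, 0L) + by))
  }

  // Merge: element-wise maximum of the two maps.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
    }.toMap)

  // Total value: sum of all per-replica counts.
  def value: Long = counts.values.sum
}
```

Merging a replica holding A:6, B:1, C:9 with one holding A:4, B:3 gives A:6, B:3, C:9 and a total of 18, matching the numbers on the slide.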
  23. G-Set
  24. G-Set
    Merge: union of sets: { x, y, z, a, b, c }
    Total value: the same as the merge result
  25. Union Function
    • A binary operation on two CRDTs
    • Commutative: x ∪ y = y ∪ x
    • Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
    • Idempotent: x ∪ x = x
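For comparison, a grow-only set is about as small as a CRDT gets. The sketch below assumes an immutable Scala `Set` underneath; the `GSet` class and its methods are illustrative names.

```scala
// Minimal G-Set sketch: elements can be added but never removed.
final case class GSet[E](elements: Set[E] = Set.empty[E]) {
  def add(e: E): GSet[E] = GSet(elements + e)

  // Merge: plain set union.
  def merge(other: GSet[E]): GSet[E] = GSet(elements ++ other.elements)

  // The value of a G-Set is the underlying set itself.
  def lookup: Set[E] = elements
}
```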
  26. CRDT in NavCloud
  27. Favourite Locations Synchronisation
  28. Naive Approach?
  29. Last Write Wins
  30. Problems
    • Unstable connections
    • Actual update time < Sent time
    • Network latency
    • Sent time < Received time
    • Unreliable clocks
  31. Stale update may win!
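How a stale update can win under plain last-write-wins is easy to reproduce. The sketch below is an illustration under assumed numbers: the `LwwRegister` class, the device names and the 600-second clock skew are all hypothetical.

```scala
// Naive last-write-wins register: the value with the higher timestamp wins.
final case class LwwRegister(value: String, timestamp: Long) {
  def merge(other: LwwRegister): LwwRegister =
    if (other.timestamp > timestamp) other else this
}

object StaleUpdateDemo {
  // Device A edits first (real time 1000), but its clock runs 600 seconds fast.
  val fromDeviceA = LwwRegister("Office (old name)", timestamp = 1000L + 600L)

  // Device B makes the genuinely later edit (real time 1060) with a correct clock.
  val fromDeviceB = LwwRegister("Office (new name)", timestamp = 1060L)

  // The older edit carries the larger timestamp, so it wins the merge.
  val winner: LwwRegister = fromDeviceA.merge(fromDeviceB)
}
```

Here `winner` holds the old name: the merge itself is deterministic, but the timestamps it relies on do not reflect the real order of edits.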

  32. So What?
  33. CRDT
  34. NavCloud Nature vs CRDT
    • Unstable connections ✔
    • Limited bandwidth ✔
    • Seamless edit/view in offline mode ✔
    • Concurrent changes with potential conflicts ✔
    • No guarantee on updates order ✔
    • No data loss ✔
    • Data convergence to expected value ✔
  35. Same Data Model Everywhere
    • Server
    • Clients
    • Data store
  36. CRDT Set Implementations
    Let's do our homework :)
  37. 2-Phase-Set
    Stores additions and removals.
    • G-Set for added elements
    • G-Set for removed elements, aka Tombstones
  38. 2-Phase-Set
  39. 2-Phase-Set
    Merge: [ Add { "cat", "dog", "ape" }; Rem { "ape" } ]
    Lookup: { "cat", "dog" }
  40. 2-Phase-Set
    Lookup:
      def lookup: Set[E] = addSet.diff(removeSet).lookup
    Merge:
      def merge(anotherSet: TwoPSet[E]): TwoPSet[E] =
        new TwoPSet(union(addSet, anotherSet.addSet), union(removeSet, anotherSet.removeSet))
  41. 2-Phase-Set
    Doesn't work for us:
    • Removed element can't be added again
    • Immutable elements: no updates possible
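A self-contained version of the 2-Phase-Set might look like the sketch below. It assumes plain immutable Scala sets in place of the G-Set type used on the slide; the `add` and `remove` methods are illustrative additions.

```scala
// Minimal 2-Phase-Set sketch built on two grow-only sets.
final case class TwoPSet[E](addSet: Set[E] = Set.empty[E], removeSet: Set[E] = Set.empty[E]) {
  def add(e: E): TwoPSet[E] = copy(addSet = addSet + e)

  // Removal only records a tombstone; nothing is ever deleted from addSet.
  def remove(e: E): TwoPSet[E] = copy(removeSet = removeSet + e)

  // Visible elements: everything added that has no tombstone.
  def lookup: Set[E] = addSet.diff(removeSet)

  // Merge: union of both underlying sets.
  def merge(other: TwoPSet[E]): TwoPSet[E] =
    TwoPSet(addSet ++ other.addSet, removeSet ++ other.removeSet)
}
```

Because the tombstone for "ape" never goes away, adding "ape" again has no visible effect, which is the first limitation listed above.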
  42. LWW-Element-Set
    Stores additions and removals, with timestamps.
    • G-Set for added elements
    • G-Set for removed elements, aka Tombstones
    • Each element has a timestamp
    • Supports re-adding a removed element using a higher timestamp
  43. LWW-Element-Set
  44. LWW-Element-Set
    Merge: Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") } Rem { (1, "cat"), (3, "cat") }
  45. LWW-Element-Set
    Merge: Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") } Rem { (1, "cat"), (3, "cat") }
    Lookup: { "cat", "dog", "ape" }
  46. LWW-Element-Set
    Lookup:
      def lookup: Set[E] =
        addSet.lookup.filter { addElem =>
          !removeSet.exists { removeElem =>
            removeElem.value == addElem.value && removeElem.timestamp > addElem.timestamp
          }
        }.map(_.value)
    Merge:
      def merge(anotherSet: LWWSet[E]): LWWSet[E] =
        new LWWSet(union(addSet, anotherSet.addSet), union(removeSet, anotherSet.removeSet))
  47. LWW-Element-Set
    Doesn't work for us:
    • Immutable elements: no updates possible.
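The same lookup and merge rules can be made runnable with a small sketch. It assumes plain immutable sets and a hypothetical `Stamped` wrapper pairing a timestamp with a value; ties between an addition and a removal go to the addition here, which is one possible bias and not necessarily the one used in NavCloud.

```scala
// A value paired with the timestamp of the operation that recorded it.
final case class Stamped[E](timestamp: Long, value: E)

final case class LWWSet[E](addSet: Set[Stamped[E]] = Set.empty[Stamped[E]],
                           removeSet: Set[Stamped[E]] = Set.empty[Stamped[E]]) {
  def add(e: E, ts: Long): LWWSet[E] = copy(addSet = addSet + Stamped(ts, e))
  def remove(e: E, ts: Long): LWWSet[E] = copy(removeSet = removeSet + Stamped(ts, e))

  // An addition survives unless a removal of the same value has a strictly higher timestamp.
  def lookup: Set[E] =
    addSet.filter { a =>
      !removeSet.exists(r => r.value == a.value && r.timestamp > a.timestamp)
    }.map(_.value)

  // Merge: union of both underlying sets.
  def merge(other: LWWSet[E]): LWWSet[E] =
    LWWSet(addSet ++ other.addSet, removeSet ++ other.removeSet)
}
```

With the data from the slide, "cat" is visible after the merge because its addition at timestamp 5 is newer than its latest removal at timestamp 3.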
  48. OR-Set
    OR - Observed / Removed
    Stores additions and removals, with tags.
    • G-Set for added elements
    • G-Set for removed elements, aka Tombstones
    • Unique tag is associated with each insertion or deletion
    • Supports re-adding removed elements
  49. OR-Set
  50. OR-Set
    Merge: Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") } Rem { (#a, "cat") }
  51. OR-Set
    Merge: Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") } Rem { (#a, "cat") }
    Lookup: { "cat", "dog", "ape" }
  52. OR-Set
    Lookup: an element E exists iff it has a tag in the AddSet that is not in the RemoveSet.
      def lookup: Set[E] =
        addSet.filter { addElem =>
          !removeSet.exists { remElem =>
            addElem.value == remElem.value && remElem.tag == addElem.tag
          }
        }.map(_.value)
  53. OR-Set
    Merge:
      def merge(anotherSet: ORSet[E]): ORSet[E] =
        new ORSet(union(addSet, anotherSet.addSet), union(removeSet, anotherSet.removeSet))
  54. OR-Set
    Doesn't work for us:
    • Immutable elements: no updates possible.
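An OR-Set following the slides could be sketched as below. The `Tagged` wrapper, the use of random UUIDs as tags and the way `remove` tombstones every locally observed tag are assumptions made for illustration.

```scala
import java.util.UUID

// A value paired with the unique tag of the insertion that produced it.
final case class Tagged[E](tag: String, value: E)

final case class ORSet[E](addSet: Set[Tagged[E]] = Set.empty[Tagged[E]],
                          removeSet: Set[Tagged[E]] = Set.empty[Tagged[E]]) {
  // Every insertion gets a fresh tag (a random UUID unless one is supplied).
  def add(e: E, tag: String = UUID.randomUUID().toString): ORSet[E] =
    copy(addSet = addSet + Tagged(tag, e))

  // Removal tombstones only the tags this replica has observed for the element.
  def remove(e: E): ORSet[E] =
    copy(removeSet = removeSet ++ addSet.filter(_.value == e))

  // An element exists iff at least one of its add-tags has no tombstone.
  def lookup: Set[E] = addSet.diff(removeSet).map(_.value)

  // Merge: union of both underlying sets.
  def merge(other: ORSet[E]): ORSet[E] =
    ORSet(addSet ++ other.addSet, removeSet ++ other.removeSet)
}
```

In the slide's example the removal only covers tag #a, so the concurrent insertion of "cat" under tag #c survives the merge.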
  55. OUR-Set
    Our take on Observed-Updated-Removed Set
    • Each element has a unique identifier
    • Element can be changed if identifier remains the same
    • Each element has a timestamp
    • Timestamp is updated on each element mutation
    Identity (immutable unique id) vs Value (mutable)
  56. OUR-Set
    Contains a single underlying set of elements with metadata:
    • Each element has a unique id field (e.g. a UUID)
    • Each element has a "removed" boolean flag
    • Each element has a timestamp
    • Set can only contain one element with a particular id
  57. OUR-Set
  58. OUR-Set
    Merge: { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }
  59. OUR-Set
    Merge: { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }
    Lookup: { "tiger", "ape" }
  60. OUR-Set
    Merge:
      def merge(anotherSet: OURSet[E]): OURSet[E] =
        OURSet[E](
          (elements ++ anotherSet.elements)
            .groupBy(_.id)
            .map(group => group._2.maxBy(_.timestamp))
            .toSet)
    Lookup:
      def lookup(ourSet: OURSet[E]): Set[E] =
        ourSet.elements.filter(!_.removed).map(_.value)
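Put together, an OUR-Set along the lines of the slides might look like the following sketch. The `Element` case class and the `put` and `remove` helpers are assumptions added to make the example runnable; ties between equal timestamps are deliberately not handled here.

```scala
// One element state: immutable identity plus mutable value and metadata.
final case class Element[E](id: String, timestamp: Long, value: E, removed: Boolean = false)

final case class OURSet[E](elements: Set[Element[E]] = Set.empty[Element[E]]) {
  // Add or update: merged in like any other state, so the newer timestamp wins.
  def put(id: String, timestamp: Long, value: E): OURSet[E] =
    merge(OURSet(Set(Element(id, timestamp, value))))

  // Remove: the element stays in the set as a tombstone with the removed flag set.
  def remove(id: String, timestamp: Long): OURSet[E] =
    elements.find(_.id == id) match {
      case Some(e) => merge(OURSet(Set(e.copy(timestamp = timestamp, removed = true))))
      case None    => this
    }

  // Merge: per id, keep the state with the highest timestamp (no tie-breaking in this sketch).
  def merge(other: OURSet[E]): OURSet[E] =
    OURSet((elements ++ other.elements)
      .groupBy(_.id)
      .map { case (_, states) => states.maxBy(_.timestamp) }
      .toSet)

  def lookup: Set[E] = elements.filterNot(_.removed).map(_.value)
}
```

Updating id1 from "cat" to "tiger" at timestamp 5 and removing id2 at timestamp 2 reproduces the merged state and lookup shown on the slides.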
  61. Implementation
    NavCloud CRDT Model: Favorites
  62. CRDT Model: Favorites
    FavoriteState element:
    • ID (to uniquely identify a favorite)
    • Timestamp (to indicate the last change time)
    • Removed flag (to indicate if favorite has been removed)
    • Favorite data: ( Name, Location, ... )
  63. Convergence in case of equal timestamps
    Compare function checks all the fields in order of priority:
    • Timestamp
    • Removed flag (Add or Delete bias)
    • ... rest of the attributes ...
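One way to express such a compare function in Scala is a total ordering over a tuple of fields. The sketch below is an assumption-laden illustration: the `FavoriteState` fields are reduced to a single `name` attribute, and a delete bias is chosen arbitrarily (a removed state beats a live one at the same timestamp).

```scala
// Hypothetical favourite state; the field set is reduced for illustration.
final case class FavoriteState(id: String, timestamp: Long, removed: Boolean, name: String)

object FavoriteState {
  // Priority order: timestamp, then removed flag (false < true, so deletes win ties),
  // then the remaining attributes.
  implicit val byPriority: Ordering[FavoriteState] =
    Ordering.by((f: FavoriteState) => (f.timestamp, f.removed, f.name))

  // The winner does not depend on the argument order.
  def winner(a: FavoriteState, b: FavoriteState): FavoriteState = byPriority.max(a, b)
}
```

Because every field takes part in the comparison, two different states never compare as equal, so every replica picks the same winner.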
  64. Using CRDT everywhere
    • Use the same algorithm everywhere
    As simple as calling the merge function
  65. Using CRDT everywhere
    Client <-> Server <-> Database
      def update(fromClient: OURSet[FavoriteState]): OURSet[FavoriteState] = {
        val fromDatabase = database.fetch(...)
        val newSet = fromDatabase.merge(fromClient)
        database.push(newSet)
        newSet
      }
  66. Considerations & Limitations
  67. "What About Garbage?"
    • CRDTs tend to grow because of tombstones.
    • Deleted Favorite in the Set == Tombstone.
    • A potentially unbounded growth.
  68. Case
    MyDrive beta-test user with ~3000 deleted favorites and 5 non-deleted ones. => 1 MB Favorites.json
  69. Prune deleted favorites
    But when?
    Requirement: all nodes holding a Favorites set should have seen a deleted element before it can be pruned. Otherwise deleted elements can be resurrected.
  70. Solution #1: Client-awareness & LastSyncTime
    Capturing the time of the last sync between a client and the service.
      if (clients.forall(_.lastSyncTimestamp > tombstone.timestamp)) {
        crdtSet.remove(tombstone)
      }
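The last-sync rule can be sketched as a pure function over a set of tombstones. The `Client` and `Tombstone` types and the `LastSyncPruning` object are hypothetical; the guard that keeps every tombstone when no clients are known is an extra assumption, added because `forall` on an empty collection is always true.

```scala
final case class Client(id: String, lastSyncTimestamp: Long)
final case class Tombstone(id: String, timestamp: Long)

object LastSyncPruning {
  // Drops a tombstone only when every known client has synced after it was created.
  // With no known clients nothing is pruned (assumption, see lead-in).
  def prune(tombstones: Set[Tombstone], clients: Seq[Client]): Set[Tombstone] =
    tombstones.filterNot { t =>
      clients.nonEmpty && clients.forall(_.lastSyncTimestamp > t.timestamp)
    }
}
```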
  71. Solution #2: Time-To-Live for tombstones
    Prune tombstones once TTL exceeded.
      if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
        crdtSet.drop(tombstone)
      }
    Requirement: all nodes holding a CRDT set should apply the same TTL rule independently.
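A TTL rule of this kind can be written with `java.time` instead of the `DateTime` arithmetic on the slide. In the sketch below, `ExpiringTombstone` and `TtlPruning` are hypothetical names, and the current time is passed in explicitly so the rule stays deterministic and testable.

```scala
import java.time.{Duration, Instant}

final case class ExpiringTombstone(id: String, removedAt: Instant)

object TtlPruning {
  // Drops tombstones whose age at `now` is strictly greater than the time-to-live.
  def prune(tombstones: Set[ExpiringTombstone],
            now: Instant,
            timeToLive: Duration): Set[ExpiringTombstone] =
    tombstones.filterNot { t =>
      Duration.between(t.removedAt, now).compareTo(timeToLive) > 0
    }
}
```

Every node would need to call this with the same `timeToLive` for the sets to keep converging.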
  72. Solution #3: Send only diff upon any update.
    Client has a set of [ A, B, C ]; Server has a set of [ A, B'', C ].
    Client modifies and sends only updated favorites: [ A', B' ]
    Before: Server responds with a full merged set [ A', B'', C ].
  73. Solution #3: Send only diff upon any update.
    We introduced a scoped diff:
    Server responds with a diff set [ B'' ], as the B' update from the client has lost to B'' on the server.
    The A' element is skipped as it has won on the server.
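The scoped diff can be sketched as a fold over the client's updates. Everything here is an assumption made for illustration: the `State` type, the map-based server store, the concrete timestamps, and the rule that the client wins when timestamps are equal.

```scala
final case class State(id: String, timestamp: Long, value: String)

object ScopedDiff {
  // Returns the new server store and the diff to send back.
  // Only ids that the client actually sent are considered for the diff.
  def update(server: Map[String, State],
             fromClient: Seq[State]): (Map[String, State], Seq[State]) =
    fromClient.foldLeft((server, Seq.empty[State])) { case ((store, diff), incoming) =>
      store.get(incoming.id) match {
        // The server copy is newer: keep it and report it back to the client.
        case Some(existing) if existing.timestamp > incoming.timestamp =>
          (store, diff :+ existing)
        // The client update wins, or the element is new: store it, send nothing back.
        case _ =>
          (store.updated(incoming.id, incoming), diff)
      }
    }
}
```

With a server holding A, B'' and C, a client sending A' and B' gets back only B'', while the server ends up with A', B'' and C.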
  74. Trouble With Time
  75. There is no such thing as reliable time.
  76. "Tracking time is actually tracking causality." (Jonas Bonér, "Life Beyond the Illusion of Present")
    Causality & ordering events.
  77. Time can be just good enough.
  78. Ordering updates within a single node
    Timestamp field as a logical clock.
    Actual value is not important, but it should always grow monotonically.
  79. Ordering updates within a single node
    "+1 Strategy":
      Long resolveNewTimestamp(ElementState<E> state) {
        return Math.max(retrieveTimestamp(), state.lastModified() + 1);
      }
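The same "+1 strategy" in Scala, reduced to a pure function for illustration. The `LogicalClock` object and its parameter names are hypothetical; the current wall-clock reading is passed in as an argument here, where the Java version calls `retrieveTimestamp()`.

```scala
object LogicalClock {
  // Never move backwards: use the wall clock when it is ahead,
  // otherwise step one past the element's last modification.
  def resolveNewTimestamp(wallClockMillis: Long, lastModified: Long): Long =
    math.max(wallClockMillis, lastModified + 1)
}
```

If the local clock jumps back, or the element was last stamped by a node with a faster clock, the new timestamp is still strictly greater than the previous one.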
  80. Ordering updates from different nodes
    If GPS clock is available -> use it (mainly PND case).
    Prefer the server time to a client's local time.
  81. Edge case
    Multiple clients modify the same element (concurrently || without a reliable clock).
  82. One "merge" to rule them all
  83. Clients & Server MUST have same 'merge' behaviour.
    == Given the same input, their 'merge' functions emit the same results.
  84. Divergence may lead to endless synchronisation loops!

  85. "So what?" :)
    • Academia is not as scary as it might seem to pragmatic devs.
    • Look for the best && simplest solutions.
    • Understand their limitations.
    • Analyze and monitor real usage.
    • Never settle: constantly search how to tune & improve.
  86. "Show me the code"
    Scala samples: https://github.com/ajantis/scala-crdt
    Java samples: https://github.com/ajantis/java-crdt
  87. Homework for the curious minds (Part 1)
    • CRDTs for fun and eventual profit, Noel Welsh, 2013.
    • Readings in conflict-free replicated data types, Christopher Meiklejohn, 2015.
    • A comprehensive study of Convergent and Commutative Replicated Data Types, Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011.
  88. Homework for the curious minds (Part 2)
    • Lasp: A language for distributed, coordination-free programming, Meiklejohn & Van Roy, 2015.
    • Swarm.js+React: real-time, offline-ready Holy Grail web apps, Victor Grishchenko, 2014.
  89. Any questions? :)