Practical Data Synchronization using CRDTs QConSF 2016

Presented @ QConSF 2016: https://qconsf.com/sf2016/presentation/practical-data-synchronization-using-CRDTs

Abstract:

In a connected world, synchronising mutable information between different devices with different clock precision can be a difficult problem. A piece of data may have many out-of-sync replicas, but all of them should eventually reach a consistent state. For example, TomTom users, who have personal navigation devices, smartphones and MyDrive website accounts, expect their navigation information to be synchronised properly even in the occasional absence of a network connection. Conflict-free Replicated Data Types (CRDTs) provide robust data structures to achieve proper synchronisation in an unreliable network of devices. They enable conflict resolution to be done locally at the data type level while guaranteeing eventual consistency between replicas.

In addition to an introduction to common CRDT types, the main focus is on the special subtype of CRDT-Set called OUR-Set (Observed, Updated, Removed), which we created to extend known CRDT sets with update functionality.

I will demonstrate basic implementations of various CRDTs in Scala and enumerate subtle considerations which should be taken into account. I will also explain the advantages of these data structures for solving many synchronisation problems, as well as their limitations.

Dmitry Ivanov

November 08, 2016

Transcript

  1. Practical Data Synchronization & CRDTs
    Dmitry Ivanov @idajantis
    2016

  2. 2

  3. NavCloud

  4. Who We Are
    "Fool" stack developers hacking on:
    • Backend services
    • Client libraries
    • Infrastructure && DevOps

  5. Backend stack

  6. Client Libraries

  7. NavCloud Nature
    • Unstable connections
    • Limited data plans & bandwidth
    • Seamless edit/view in offline mode
    • Concurrent changes with potential conflicts
    • No guarantee on update order
    • No data loss
    • Data convergence to expected value

  8. How to Deal with this Nature?

  9. Bad programmers worry about the code. Good programmers worry
    about data structures
    — Linus Torvalds

  10. CRDT

  11. CRDT
    DT: Data Type
    CRDT is a data type with its own algebra

  12. CRDT
    R: Replicated
    CRDT is a family of data structures which
    has been designed to be distributed

  13. CRDT
    C: Conflict Free
    Resolving conflicts is done automatically

  14. How?

  15. Merge

  16. What is Merge?
    • A binary operation on two CRDTs
    • Commutative: x • y = y • x
    • Associative: ( x • y ) • z = x • ( y • z )
    • Idempotent: x • x = x

  17. How Does it Help?
    In Distributed Systems:
    • Order is not guaranteed:
      • No problem: Merge is Commutative and Associative
    • Events can be delivered more than once:
      • No problem: Merge is Idempotent
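
The three merge laws from slides 16 and 17 can be written down as executable checks. This is a minimal sketch, not code from the talk: the names `Crdt`, `MergeLaws` and `maxInt` are made up for illustration, and integer max stands in for a real CRDT merge.

```scala
// Hypothetical sketch: a merge operation for some state type A.
trait Crdt[A] {
  def merge(x: A, y: A): A
}

object MergeLaws {
  // Integer max: about the simplest merge that satisfies all three laws.
  val maxInt: Crdt[Int] = new Crdt[Int] {
    def merge(x: Int, y: Int): Int = math.max(x, y)
  }

  // Order of the two arguments does not matter.
  def commutative[A](c: Crdt[A], x: A, y: A): Boolean =
    c.merge(x, y) == c.merge(y, x)

  // Grouping of successive merges does not matter.
  def associative[A](c: Crdt[A], x: A, y: A, z: A): Boolean =
    c.merge(c.merge(x, y), z) == c.merge(x, c.merge(y, z))

  // Merging the same state twice changes nothing.
  def idempotent[A](c: Crdt[A], x: A): Boolean =
    c.merge(x, x) == x
}
```

Checks like these fit naturally into property-based tests, so that every merge function in a code base is exercised against the same three laws.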

  18. What Does it Bring in Practice?
    • Local updates
    • Local merge of received data
    • All local merges converge

  19. Examples

  20. G-Counter

  21. G-Counter
    Merge: Max of corresponding elements: A:6 B:3 C:9
    TotalValue: Sum of all elements: 6 + 3 + 9 = 18
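
A G-Counter along the lines of slide 21 might look like the sketch below. The class and method names are illustrative and not taken from the talk's repository; the only behaviour assumed is what the slide states: merge is the per-replica maximum and the total is the sum of all slots.

```scala
// Sketch of a state-based grow-only counter (names are hypothetical).
final case class GCounter(counts: Map[String, Long] = Map.empty) {

  // A replica only ever increments its own slot.
  def increment(replicaId: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(replicaId, counts.getOrElse(replicaId, 0L) + by))
  }

  // Merge: maximum of corresponding elements.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
    }.toMap)

  // Total value: sum of all elements.
  def total: Long = counts.values.sum
}
```

For example, merging the assumed states `A:6 B:2 C:9` and `A:4 B:3 C:7` gives `A:6 B:3 C:9`, and `total` is 18, matching the numbers on the slide.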

  22. Max Function
    • A binary operation on two CRDTs
    • Commutative: x max y = y max x
    • Associative: ( x max y ) max z = x max ( y max z )
    • Idempotent: x max x = x

  23. G-Set

  24. Union Function
    • A binary operation on two CRDTs
    • Commutative: x ∪ y = y ∪ x
    • Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
    • Idempotent: x ∪ x = x

  25. G-Set
    Merge: Union of sets: { x, y, z, a, b, c }
    Total Value: The same as the merge result
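
The G-Set of slide 25 is small enough to sketch in a few lines. The names below are illustrative only; the one assumption is that merge is plain set union, as the slide says.

```scala
// Sketch of a grow-only set: elements can be added but never removed.
final case class GSet[E](elements: Set[E] = Set.empty[E]) {
  def add(e: E): GSet[E] = GSet(elements + e)

  // Merge: union of both replicas.
  def merge(other: GSet[E]): GSet[E] = GSet(elements ++ other.elements)

  // The visible value is the same as the merged state.
  def lookup: Set[E] = elements
}
```

Merging `GSet(Set("x", "y", "z"))` with `GSet(Set("a", "b", "c"))` yields all six elements, in whichever order the two replicas are merged.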

  26. CRDT in NavCloud

  27. Favorite Locations
    Synchronization

  28. Naive Approach?

  29. Last Write Wins

  30. Problems
    • Unstable connections
      • Actual update time < Sent time
    • Network latency
      • Sent time < Received time
    • Unreliable clocks

  31. Stale update may win!
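
How a stale update can win under Last Write Wins is easy to reproduce. The sketch below is an illustration, not the NavCloud code: `LwwRegister`, the device names and the timestamps are all invented, and the scenario assumes one device's clock runs ahead of the other's.

```scala
// Sketch of a last-write-wins register (hypothetical names).
final case class LwwRegister[A](value: A, timestamp: Long) {
  // Keep whichever write carries the larger timestamp.
  // Ties keep `this`, which is not commutative; a real merge
  // needs a deterministic tie-break.
  def merge(other: LwwRegister[A]): LwwRegister[A] =
    if (other.timestamp > timestamp) other else this
}

object StaleUpdateDemo {
  // The phone's clock runs ahead, so its earlier edit gets the larger timestamp.
  val fromPhone = LwwRegister("Old office", timestamp = 700L)
  // The laptop edits later in real time, but its correct clock reads lower.
  val fromLaptop = LwwRegister("New office", timestamp = 200L)

  // The older value survives because only timestamps are compared.
  val merged = fromLaptop.merge(fromPhone)
}
```

The merge itself behaves exactly as specified; the result is wrong only because the timestamps do not reflect the real order of the edits.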

  32. So What?

  33. CRDT

  34. NavCloud Nature vs CRDT
    • Unstable connections ✔
    • Limited data plans & bandwidth ✔
    • Seamless edit/view in offline mode ✔
    • Concurrent changes with potential conflicts ✔
    • No guarantee on update order ✔
    • No data loss ✔
    • Data convergence to expected value ✔

  35. Same Data Model Everywhere
    • Server
    • Clients
    • Data store

  36. Merging Conflicts in Riak

  37. The data consistency is determined
    by 'the weakest link' in your pipeline

  38. Implementing a CRDT Set
    What do we want?
    • Support for addition and removal operations.
    • Optimized for element mutations.
    • Footprint as compact as possible.

  39. 2-Phase-Set
    Supports additions and removals.
    • G-Set for added elements
    • G-Set for removed elements aka Tombstones

  40. 2-Phase-Set

  41. 2-Phase-Set
    Merge: [ Add { "cat", "dog", "ape" }; Rem { "ape" } ]
    Lookup: { "cat", "dog" }

  42. 2-Phase-Set
    Lookup
    // Visible elements: everything added that has not been removed.
    def lookup: Set[E] = addSet.diff(removeSet).lookup
    Merge
    // Merge both underlying G-Sets independently.
    def merge(anotherSet: TwoPSet[E]): TwoPSet[E] =
      new TwoPSet(addSet.merge(anotherSet.addSet),
                  removeSet.merge(anotherSet.removeSet))

  43. 2-Phase-Set
    Doesn't work for us:
    • Removed element can't be added again
    • Immutable elements: no updates possible
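
The first limitation on slide 43 shows up directly in a self-contained version of the 2-Phase-Set. This is a sketch that uses plain Scala sets in place of the talk's G-Set type; the field names `added` and `removed` are assumptions made for the example.

```scala
// Sketch of a 2-Phase-Set built on two grow-only sets.
final case class TwoPSet[E](added: Set[E] = Set.empty[E],
                            removed: Set[E] = Set.empty[E]) {
  def add(e: E): TwoPSet[E] = copy(added = added + e)

  // Removal only records a tombstone; nothing is ever deleted.
  def remove(e: E): TwoPSet[E] = copy(removed = removed + e)

  def lookup: Set[E] = added.diff(removed)

  def merge(other: TwoPSet[E]): TwoPSet[E] =
    TwoPSet(added ++ other.added, removed ++ other.removed)
}
```

Once "ape" is in `removed`, calling `add("ape")` again has no visible effect: the tombstone is permanent, so `lookup` keeps filtering the element out.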

  44. LWW-Element-Set
    Supports additions and removals, with timestamps.
    • G-Set for added elements
    • G-Set for removed elements aka Tombstones
    • Each element has a timestamp
    • Supports re-adding removed element using a higher timestamp

  45. LWW-Element-Set

  46. LWW-Element-Set
    Merge
    Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
    Rem { (1, "cat"), (3, "cat") }

  47. LWW-Element-Set
    Merge
    Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
    Rem { (1, "cat"), (3, "cat") }
    Lookup
    { "cat", "dog", "ape" }

  48. LWW-Element-Set
    Lookup
    // An add survives unless a removal of the same value has a later timestamp.
    def lookup: Set[E] = addSet.lookup.filter { addElem =>
      !removeSet.exists { removeElem =>
        removeElem.value == addElem.value && removeElem.timestamp > addElem.timestamp
      }
    }.map(_.value)
    Merge
    def merge(anotherSet: LWWSet[E]): LWWSet[E] =
      new LWWSet(addSet.merge(anotherSet.addSet),
                 removeSet.merge(anotherSet.removeSet))

  49. LWW-Element-Set
    Doesn't work for us:
    • Immutable elements: no updates possible.

  50. OR-Set
    OR - Observed / Removed
    Supports additions and removals, with tags.
    • G-Set for added elements
    • G-Set for removed elements aka Tombstones
    • Unique tag is associated with each element
    • Supports re-adding removed elements

  51. OR-Set

  52. OR-Set
    Merge
    Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
    Rem { (#a, "cat") }

  53. OR-Set
    Merge
    Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
    Rem { (#a, "cat") }
    Lookup
    { "cat", "dog", "ape" }

  54. OR-Set
    Lookup
    An element E exists iff the AddSet holds a tag for it that is not in the RemoveSet.
    def lookup: Set[E] =
      addSet.filter { addElem =>
        !removeSet.exists { remElem =>
          addElem.value == remElem.value &&
            remElem.tag == addElem.tag }
      }
      .map(_.value)

  55. OR-Set
    Merge
    def merge(anotherSet: ORSet[E]): ORSet[E] =
      new ORSet(addSet.merge(anotherSet.addSet),
                removeSet.merge(anotherSet.removeSet))

  56. OR-Set
    Doesn't work for us:
    • Immutable elements: no updates possible.

  57. OUR-Set
    Our take on Observed-Updated-Removed Set
    • Each element has a unique identifier
    • Element can be changed if identifier remains the same
    • Each element has a timestamp
    • Timestamp is updated on each element mutation
    Identity (immutable unique id) vs Value (mutable)

  58. OUR-Set
    Contains a single underlying set of elements with metadata:
    • Each element has a unique id field (e.g. a UUID)
    • Each element has a "removed" boolean flag
    • Each element has a timestamp
    • Set can only contain one element with a particular id

  59. OUR-Set

  60. OUR-Set
    Merge
    { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }

  61. OUR-Set
    Merge:
    { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }
    Lookup
    { "tiger", "ape" }

  62. OUR-Set
    Merge
    // For every id, keep the version with the highest timestamp.
    def merge(anotherSet: OURSet[E]): OURSet[E] =
      OURSet[E]((elements ++ anotherSet.elements)
        .groupBy(_.id)
        .map(group => group._2.maxBy(_.timestamp))
        .toSet)
    Lookup
    // Visible values: everything not flagged as removed.
    def lookup(ourSet: OURSet[E]): Set[E] =
      ourSet.elements.filter(!_.removed)
        .map(_.value)
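
Put together, the OUR-Set merge and lookup can be run end to end. The sketch below is self-contained and only loosely follows the slides: `Element` and `OurSet` are illustrative names, ids are plain strings rather than UUIDs, and timestamp ties are left to `maxBy` (the talk resolves them with a full compare function, covered later in the deck).

```scala
// One versioned element: immutable identity, mutable value.
final case class Element[E](id: String, timestamp: Long, value: E,
                            removed: Boolean = false)

// Sketch of an Observed-Updated-Removed set (hypothetical names).
final case class OurSet[E](elements: Set[Element[E]] = Set.empty[Element[E]]) {

  // Merge: union both replicas, then keep the newest version per id.
  def merge(other: OurSet[E]): OurSet[E] =
    OurSet(
      (elements ++ other.elements)
        .groupBy(_.id)
        .map { case (_, sameId) => sameId.maxBy(_.timestamp) }
        .toSet
    )

  // Lookup: values of all elements that are not tombstones.
  def lookup: Set[E] = elements.filterNot(_.removed).map(_.value)
}
```

With one replica holding `(id1, 1, "cat"), (id2, 1, "dog"), (id3, 1, "ape")` and another holding `(id1, 5, "tiger")` and `(id2, 2, "dog", removed)` (inputs assumed here, since the slide only shows the result), the merge produces the state on slide 61 and `lookup` returns `{ "tiger", "ape" }`.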

  63. Implementation
    NavCloud CRDT Model: Favorites

  64. CRDT Model: Favorites
    FavoriteState element:
    • ID (to uniquely identify a favorite)
    • Timestamp (to indicate the last change time)
    • Removed flag (to indicate if favorite has been removed)
    • Favorite data: ( Name, Location, ... )

  65. Convergence in case of equal timestamps
    Compare function checks all the fields in order of priority:
    • Timestamp
    • Removed flag (Add or Delete bias)
    • .. rest attributes ..
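
The compare function from slide 65 can be approximated with a tuple ordering. This is a sketch under assumptions: `FavoriteState` is reduced to a single `name` attribute standing in for "rest attributes", and a delete bias is chosen arbitrarily (a removed element beats a live one at equal timestamps); the talk does not say which bias NavCloud uses.

```scala
// Simplified favorite: only `name` represents the remaining attributes.
final case class FavoriteState(id: String, timestamp: Long,
                               removed: Boolean, name: String)

object FavoriteState {
  // Fields are compared in priority order: timestamp, removed flag, rest.
  // Boolean ordering puts false before true, which gives a delete bias.
  implicit val byPriority: Ordering[FavoriteState] =
    Ordering.by((f: FavoriteState) => (f.timestamp, f.removed, f.name))

  // The winner is the same regardless of argument order.
  def winner(a: FavoriteState, b: FavoriteState): FavoriteState =
    byPriority.max(a, b)
}
```

Because every field takes part in the comparison, two different versions of the same favorite never compare as equal, so all replicas pick the same winner.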

  66. Using CRDT everywhere
    • Use the same algorithm everywhere
    As simple as calling the merge function

  67. Using CRDT everywhere
    Client <-> Server <-> Database
    def update(fromClient: OURSet[E]): OURSet[E] = {
      val fromDatabase = database.fetch(...)
      val newSet = fromDatabase.merge(fromClient)
      database.store(..., newSet)
      newSet
    }

  68. 68

  69. Considerations & Limitations

  70. "What about garbage?"
    • CRDTs tend to grow because of tombstones.
    • Deleted Element in the Set == Tombstone.
    • A potentially unbounded growth.

  71. Prune deleted elements
    But when?
    Requirement:
    All nodes holding a CRDT Set replica should have seen a deleted
    element before it can be pruned.
    Otherwise deleted elements can be resurrected.

  72. Time-To-Live for tombstones
    Prune tombstones once TTL exceeded.
    if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
      crdtSet.remove(tombstone)
    }
    Requirement: all nodes holding a CRDT set should apply the same
    TTL rule independently.
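
The TTL rule on slide 72 can be sketched as a pure function over the set. Everything here is an assumption made for illustration: timestamps are epoch milliseconds passed in explicitly, and `Tombstoned` and `TombstonePruning` are invented names.

```scala
// Minimal element shape needed for pruning (hypothetical).
final case class Tombstoned(id: String, timestampMillis: Long, removed: Boolean)

object TombstonePruning {
  // Drop tombstones older than the TTL; live elements are never pruned.
  def prune(elements: Set[Tombstoned], nowMillis: Long, ttlMillis: Long): Set[Tombstoned] =
    elements.filterNot(e => e.removed && nowMillis - e.timestampMillis > ttlMillis)
}
```

As the slide notes, this is only safe when every node applies the same rule; a replica that stayed offline for longer than the TTL can still bring a pruned element back.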

  73. Prune deleted elements
    Problem
    Synchronization between all replicas is needed for correctness.

  74. Distributed
    transactions

  75. - Academia, help!

  76. 76

  77. Optimized OR-Set
    Introduces replica awareness

  78. Optimized OR-Set
    Additional metadata is added to every transferred state.
    { (replica_id -> seq_nr) }
    where:
    - replica_id: a unique & stable replica identifier.
    - seq_nr: a local counter that grows monotonically (after each op).

  79. Optimized OR-Set
    Each local state maintains a map:
    { replica_A: 1, replica_B: 1, replica_C: 3 }
    If a received state has a seq_nr lower than the corresponding local
    value -> ignore.
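
The "ignore if lower" rule from slide 79 amounts to a lookup in the per-replica map. The sketch below covers only that check, not the full Optimized OR-Set; `ReplicaVersions`, `isStale` and `observe` are invented names, and an unknown replica is assumed to start at 0.

```scala
object ReplicaVersions {
  // replica_id -> highest seq_nr seen so far from that replica.
  type Versions = Map[String, Long]

  // A received state is stale when its seq_nr is lower than the local value.
  def isStale(local: Versions, replicaId: String, seqNr: Long): Boolean =
    seqNr < local.getOrElse(replicaId, 0L)

  // Record the seq_nr of a state that was accepted; stale states change nothing.
  def observe(local: Versions, replicaId: String, seqNr: Long): Versions =
    if (isStale(local, replicaId, seqNr)) local
    else local.updated(replicaId, seqNr)
}
```

With the map from the slide, a state from `replica_C` with `seq_nr` 2 is ignored, while one with `seq_nr` 4 is accepted and moves the local entry for `replica_C` to 4.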

  80. Optimized OR-Set
    No Tombstones, yay! ☺
    (Slightly) more complicated API: stable replica_id needed. ☹

  81. Update & Reply with a Diff
    Client modifies and sends only updated elements (Diff).
    Before: Server responds with a full merge result.

  82. Update & Reply with a Diff
    We introduced a 'Scoped Diff':
    Server responds only with the elements which have won against
    those sent by the client.
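
One possible reading of the 'Scoped Diff' on slide 82 is sketched below. It is a guess at the shape of the idea, not the NavCloud implementation: `Entry` and `ScopedDiff.update` are invented, the winner is picked by timestamp only, and the server copy is assumed to win ties.

```scala
// Simplified element: id, timestamp and a string payload (hypothetical).
final case class Entry(id: String, timestamp: Long, value: String)

object ScopedDiff {
  // Merges the client's changed elements into the server state and returns
  // (new server state, elements to send back to the client).
  def update(server: Map[String, Entry],
             clientDiff: Seq[Entry]): (Map[String, Entry], Seq[Entry]) = {
    val merged = clientDiff.foldLeft(server) { (state, incoming) =>
      state.get(incoming.id) match {
        // The server copy is at least as new: keep it.
        case Some(existing) if existing.timestamp >= incoming.timestamp => state
        // Otherwise the client's element wins.
        case _ => state.updated(incoming.id, incoming)
      }
    }
    // Reply only with server elements that beat what the client sent.
    val reply = clientDiff.flatMap(c => merged.get(c.id).filter(_ != c))
    (merged, reply)
  }
}
```

The reply stays empty whenever all of the client's elements win, which is the common case for a single active device and keeps responses small on limited data plans.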

  83. Server -> Client Diff

  84. - Academia, help?..

  85. 85

  86. δ-CRDT
    Builds on replica awareness
    Introduces a Causal Context:
    map of (replica_id -> seq_nr).
    Introduces a Dot Store: CRDT state (No tombstones).

  87. δ-CRDT
    A formalized way to compute minimal δ-CRDT instances
    against a target replica.

  88. δ-CRDT
    Adrian Colyer (The Morning Paper) wrote a great paper review:
    blog.acolyer.org/2016/04/25/delta-state-replicated-data-types

  89. Trouble With Time

  90. There is no such thing as reliable time*.

  91. Tracking time is actually
    tracking causality.
    — Jonas Bonér, "Life Beyond the Illusion of Present"

  92. Causality & Ordering of events.

  93. Time can be just good enough.

  94. Ordering updates within a single node
    Timestamp field as a logical clock.
    Absolute value is not important,
    but it should always grow monotonically.

  95. Ordering updates within a single node
    "+1 Strategy" (aka ensure monotonicity):
    Long resolveNewTimestamp(ElementState state) {
      return Math.max(retrieveTimestamp(),
                      state.lastModified() + 1);
    }
    View Slide

  96. Ordering updates from different nodes
    If GPS clock is available -> use it (mainly Navigation Devices case).
    Prefer the server time to a client's local time.

  97. Edge case
    Multiple Clients modify the same element
    (concurrently || without a reliable clock).

  98. One "merge" to rule them all

  99. Clients & Server MUST have the same 'merge'
    behaviour.
    ==
    Given the same input, their 'merge' functions
    emit the same results.

  100. Divergence may lead to endless synchronization loops!

  101. Lazy (data) loading
    OURSet Element
    • Metadata: UUID, timestamp, "removed" flag
    • Data:

  102. Lazy (data) loading
    New OURSet Element
    • Metadata: UUID, timestamp, "removed" flag, + tag / hash
    • (Optional) Data:
    Flexible synchronization strategy
    Eager || Lazy Fetch

  103. What have we learned?
    • Academia is not as scary as it sometimes seems to pragmatic devs.
    • We need better and simpler abstractions to develop
    offline-friendly apps.
    • CRDTs give great value, but there are some caveats.
    • Things like Lasp (lasp-lang.org) could also be the answer (?).

  104. Show me the code
    github.com/ajantis/{scala | java}-crdt

  105. Thanks!
    Slides: http://bit.ly/2fBlroS
    Dmitry Ivanov @idajantis