Practical Demystification of CRDTs (LambdaDays 2016)

Dmitry Ivanov
February 20, 2016

Prepared & presented together with Nami Nasserazad (https://twitter.com/namiazad) and Didier Liauw.

Abstract:

In a connected world, synchronising mutable information between devices with different clock precision can be a difficult problem. A piece of data may have many out-of-sync replicas, but all of them should eventually reach a consistent state. For example, TomTom users with personal navigation devices, smartphones and MyDrive website accounts expect their navigation information to be synchronised properly even when a network connection is occasionally absent.

Conflict-free Replicated Data Types (CRDTs) provide robust data structures for achieving proper synchronisation in an unreliable network of devices. They allow conflicts to be resolved locally, at the data type level, while guaranteeing eventual consistency between replicas.

In this talk, in addition to an introduction to CRDTs, our main focus will be on a special type of CRDT set called OUR-set (Observed, Updated, Removed), which we created to extend known CRDT sets with update functionality. We will explain the advantages of this data structure for solving many synchronisation problems, as well as its limitations. We will also show what a basic implementation of the OUR-set CRDT looks like in Scala and in Java, and enumerate a set of subtle considerations that should be taken into account.

Transcript

  1. Practical Demystification of CRDTs
    Nami Nasserazad (@namiazad, nami.me)
    Dmitry Ivanov (@idajantis)
    Didier Liauw
    17-18 February 2016, Kraków

  2. Disclaimer
    We are NOT:
    • Distributed systems experts.
    • Hardcore academia guys.
    Just curious engineers hacking on real
    world problems.

  3. Who are we?
    "Fool" stack developers hacking on:
    • Backend services
    • Mobile || SDKs
    • Infrastructure && AWS && DevOps

  4. Server Development Stack

  5. Client Libraries

  6. NavCloud Nature
    • Unstable connections
    • Limited bandwidth
    • Seamless edit/view in offline mode
    • Concurrent changes with potential conflicts
    • No guaranteed order of updates
    • No data loss
    • Data convergence to expected value

  7. How to Deal with this Nature?

  8. Bad programmers worry about the
    code. Good programmers worry
    about data structures
    — Linus Torvalds

  9. CRDT
    DT: Data Type
    CRDT is a data type with its own algebra

  10. CRDT
    R: Replicated
    CRDTs are a family of data structures designed to be distributed

  11. CRDT
    C: Conflict Free
    Resolving conflicts is done automatically

  12. What is Merge?
    • A binary operation on two CRDTs
    • Commutative: x • y = y • x
    • Associative: ( x • y ) • z = x • ( y • z )
    • Idempotent: x • x = x
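
The three merge laws listed above can be spot-checked mechanically. The sketch below is only an illustration: `MergeLaws` and `holds` are hypothetical names, and checking a handful of sample values shows a violation when there is one but does not prove the laws in general.

```scala
// Hypothetical helper (not from the talk): checks the three merge laws
// for a candidate merge function over a small set of sample values.
object MergeLaws {
  def holds[A](samples: Seq[A])(merge: (A, A) => A): Boolean = {
    // x • y = y • x
    val commutative =
      samples.forall(x => samples.forall(y => merge(x, y) == merge(y, x)))
    // ( x • y ) • z = x • ( y • z )
    val associative = samples.forall { x =>
      samples.forall { y =>
        samples.forall(z => merge(merge(x, y), z) == merge(x, merge(y, z)))
      }
    }
    // x • x = x
    val idempotent = samples.forall(x => merge(x, x) == x)
    commutative && associative && idempotent
  }
}
```

For example, `MergeLaws.holds(Seq(1, 5, 9))((x, y) => math.max(x, y))` is true, while the same call with `(x, y) => x + y` is false because addition is not idempotent.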

  13. How Does it Help?
    In Distributed Systems:
    • Order is not guaranteed:
      • No problem: Merge is Commutative and Associative
    • Events can be delivered more than once:
      • No problem: Merge is Idempotent

  14. What Does it Bring in Practice?
    • Local updates
    • Local merge of received data
    • All local merges converge

  15. G-Counter
    Merge: Max of corresponding elements: A:6 B:3 C:9
    TotalValue: Sum of all elements: 6 + 3 + 9 = 18
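
A minimal sketch of a state-based G-Counter matching the merge and total-value rules above. The names `GCounter`, `counts`, `increment` and `value` are illustrative assumptions, not taken from the talk's repositories.

```scala
// Sketch of a grow-only counter: one slot per replica id.
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  // Each replica increments only its own slot, and only upwards.
  def increment(replicaId: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(replicaId, counts.getOrElse(replicaId, 0L) + by))
  }

  // Total value: sum of all slots.
  def value: Long = counts.values.sum

  // Merge: maximum of corresponding slots (a missing slot counts as 0).
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
    }.toMap)
}
```

With hypothetical replica states `A:6 B:2 C:9` and `A:4 B:3 C:7`, the merge yields `A:6 B:3 C:9` and a total value of 18, as on the slide.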

  16. Max Function
    • A binary operation on two CRDTs
    • Commutative: x max y = y max x
    • Associative: ( x max y ) max z = x max ( y max z )
    • Idempotent: x max x = x

  17. G-Set
    Merge: Union of sets: { x, y, z, a, b, c }
    Total Value: The same as the merge result
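
A G-Set can be sketched in a few lines, assuming a plain immutable `Set` as the backing store. `GSet`, `add` and `lookup` are illustrative names.

```scala
// Sketch of a grow-only set: elements can be added but never removed.
final case class GSet[E](elements: Set[E] = Set.empty[E]) {
  def add(e: E): GSet[E] = GSet(elements + e)

  // The observable value is simply the set itself.
  def lookup: Set[E] = elements

  // Merge: union of both replicas.
  def merge(other: GSet[E]): GSet[E] = GSet(elements union other.elements)
}
```

Merging a replica holding `x, y, z` with one holding `a, b, c` gives the six-element set shown on the slide.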

  18. Union Function
    • A binary operation on two CRDTs
    • Commutative: x ∪ y = y ∪ x
    • Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
    • Idempotent: x ∪ x = x

  19. CRDT in NavCloud

  20. Favourite Locations Synchronisation

  21. Naive Approach?

  22. Last Write Wins

  23. Problems
    • Unstable connections
      • Actual update time < Sent time
    • Network latency
      • Sent time < Received time
    • Unreliable clocks

  24. Stale update may win!
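
To make the failure mode concrete, here is a small sketch of naive last-write-wins resolution. `Write`, `wallClock` and `LastWriteWins` are hypothetical names, and the timestamps are made up for illustration.

```scala
// `wallClock` is whatever the writing device believed the time to be.
final case class Write(value: String, wallClock: Long)

object LastWriteWins {
  // Keeps the write with the larger timestamp; on a tie the first argument is kept.
  def resolve(a: Write, b: Write): Write =
    if (b.wallClock > a.wallClock) b else a
}
```

Suppose a device whose clock runs ahead writes `Write("Home", 2000L)`, and a correctly synchronised device later writes `Write("Work", 1500L)`. Resolution keeps "Home" in either argument order, so the older edit wins purely because of clock skew.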

  25. NavCloud Nature vs CRDT
    • Unstable connections ✔
    • Limited bandwidth ✔
    • Seamless edit/view in offline mode ✔
    • Concurrent changes with potential conflicts ✔
    • No guaranteed order of updates ✔
    • No data loss ✔
    • Data convergence to expected value ✔

  26. Same Data Model Everywhere
    • Server
    • Clients
    • Data store

  27. CRDT Set Implementations
    Let's do our homework :)

  28. 2-Phase-Set
    Stores additions and removals.
    • G-Set for added elements
    • G-Set for removed elements aka Tombstones

  29. 2-Phase-Set

  30. 2-Phase-Set
    Merge: [ Add { "cat", "dog", "ape" }; Rem { "ape" } ]
    Lookup: { "cat", "dog" }

  31. 2-Phase-Set
    Lookup
    def lookup: Set[E] = addSet.diff(removeSet).lookup
    Merge
    // merge both underlying G-Sets: additions with additions, tombstones with tombstones
    def merge(anotherSet: TwoPSet[E]): TwoPSet[E] =
      new TwoPSet(union(addSet, anotherSet.addSet),
                  union(removeSet, anotherSet.removeSet))
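
A self-contained sketch of the same idea, assuming plain immutable `Set`s in place of the G-Set type used on the slide. The field names `added` and `removed` are illustrative.

```scala
// Sketch of a 2-Phase-Set: two grow-only sets, one of them holding tombstones.
final case class TwoPSet[E](
    added: Set[E] = Set.empty[E],
    removed: Set[E] = Set.empty[E]) {

  def add(e: E): TwoPSet[E] = copy(added = added + e)

  // Removal only records a tombstone; nothing is ever deleted from `added`.
  def remove(e: E): TwoPSet[E] = copy(removed = removed + e)

  // Visible elements: everything added that has no tombstone.
  def lookup: Set[E] = added diff removed

  def merge(other: TwoPSet[E]): TwoPSet[E] =
    TwoPSet(added union other.added, removed union other.removed)
}
```

Because a tombstone is permanent, calling `add("ape")` after `remove("ape")` leaves `lookup` unchanged, which is the limitation that makes this set unsuitable when elements must be re-added.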

  32. 2-Phase-Set
    Doesn't work for us:
    • Removed element can't be added again
    • Immutable elements: no updates possible

  33. LWW-Element-Set
    Stores additions and removals, with timestamps.
    • G-Set for added elements
    • G-Set for removed elements aka Tombstones
    • Each element has a timestamp
    • Supports re-adding a removed element using a higher timestamp

  34. LWW-Element-Set

  35. LWW-Element-Set
    Merge
    Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
    Rem { (1, "cat"), (3, "cat") }

  36. LWW-Element-Set
    Merge
    Add { (1, "cat"), (5, "cat"), (1, "dog"), (1, "ape") }
    Rem { (1, "cat"), (3, "cat") }
    Lookup
    { "cat", "dog", "ape" }

  37. LWW-Element-Set
    Lookup
    def lookup: Set[E] = addSet.lookup.filter { addElem =>
      // hide an addition if a strictly later removal of the same value exists
      !removeSet.exists { removeElem =>
        removeElem.value == addElem.value && removeElem.timestamp > addElem.timestamp
      }
    }.map(_.value)
    Merge
    def merge(anotherSet: LWWSet[E]): LWWSet[E] =
      new LWWSet(union(addSet, anotherSet.addSet),
                 union(removeSet, anotherSet.removeSet))
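
The lookup and merge rules can be tried out with a self-contained sketch that uses plain immutable `Set`s instead of the slide's G-Set type. `Stamped`, `added` and `removed` are illustrative names. As in the slide's code, a removal hides an addition only when its timestamp is strictly greater, so equal timestamps favour the addition.

```scala
final case class Stamped[E](timestamp: Long, value: E)

// Sketch of an LWW-Element-Set backed by two grow-only sets of stamped values.
final case class LWWSet[E](
    added: Set[Stamped[E]] = Set.empty[Stamped[E]],
    removed: Set[Stamped[E]] = Set.empty[Stamped[E]]) {

  def add(ts: Long, e: E): LWWSet[E] = copy(added = added + Stamped(ts, e))
  def remove(ts: Long, e: E): LWWSet[E] = copy(removed = removed + Stamped(ts, e))

  // A value is visible if at least one addition has no strictly later removal.
  def lookup: Set[E] =
    added.filter { a =>
      !removed.exists(r => r.value == a.value && r.timestamp > a.timestamp)
    }.map(_.value)

  def merge(other: LWWSet[E]): LWWSet[E] =
    LWWSet(added union other.added, removed union other.removed)
}
```

With hypothetical replicas whose merge produces the Add and Rem sets shown on the slides, "cat" stays visible because its addition at timestamp 5 is later than its last removal at timestamp 3.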

  38. LWW-Element-Set
    Doesn't work for us:
    • Immutable elements: no updates possible.

  39. OR-Set
    OR - Observed / Removed
    Stores additions and removals, with tags.
    • G-Set for added elements
    • G-Set for removed elements aka Tombstones
    • A unique tag is associated with each insertion or deletion
    • Supports re-adding removed elements

  40. OR-Set
    Merge
    Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
    Rem { (#a, "cat") }

  41. OR-Set
    Merge
    Add { (#a, "cat"), (#c, "cat"), (#b, "dog"), (#d, "ape") }
    Rem { (#a, "cat") }
    Lookup
    { "cat", "dog", "ape" }

  42. OR-Set
    Lookup
    An element E exists iff AddSet holds a tag for it that is not in RemoveSet.
    def lookup: Set[E] =
      addSet.filter { addElem =>
        !removeSet.exists { remElem =>
          addElem.value == remElem.value && remElem.tag == addElem.tag
        }
      }.map(_.value)

  43. OR-Set
    Merge
    def merge(anotherSet: ORSet[E]): ORSet[E] =
      new ORSet(union(addSet, anotherSet.addSet),
                union(removeSet, anotherSet.removeSet))
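
A self-contained sketch of an OR-Set along the same lines, using plain immutable `Set`s. `Tagged`, `added` and `removed` are illustrative names, and generating tags with `UUID.randomUUID()` is an assumption. In this variant a removal tombstones exactly the tags the replica has observed for that value, which matches the `Rem { (#a, "cat") }` example above.

```scala
import java.util.UUID

final case class Tagged[E](tag: String, value: E)

final case class ORSet[E](
    added: Set[Tagged[E]] = Set.empty[Tagged[E]],
    removed: Set[Tagged[E]] = Set.empty[Tagged[E]]) {

  // Every insertion gets a fresh unique tag unless one is supplied explicitly.
  def add(e: E, tag: String = UUID.randomUUID().toString): ORSet[E] =
    copy(added = added + Tagged(tag, e))

  // Tombstones only the tags this replica has observed for `e`.
  def remove(e: E): ORSet[E] =
    copy(removed = removed ++ added.filter(_.value == e))

  // A value is visible if it has at least one tag without a tombstone.
  def lookup: Set[E] = (added diff removed).map(_.value)

  def merge(other: ORSet[E]): ORSet[E] =
    ORSet(added union other.added, removed union other.removed)
}
```

A concurrent add of "cat" under a different tag survives the removal, and re-adding after a removal works because the new insertion carries a new tag.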

  44. OR-Set
    Doesn't work for us:
    • Immutable elements: no updates possible.

  45. OUR-Set
    Our take on Observed-Updated-Removed Set
    • Each element has a unique identifier
    • Element can be changed if identifier remains the same
    • Each element has a timestamp
    • Timestamp is updated on each element mutation
    Identity (immutable unique id) vs Value (mutable)

  46. OUR-Set
    Contains a single underlying set of elements with metadata:
    • Each element has a unique id field (e.g. a UUID)
    • Each element has a "removed" boolean flag
    • Each element has a timestamp
    • Set can only contain one element with a particular id

  47. OUR-Set
    Merge
    { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }

  48. OUR-Set
    Merge
    { (id1, 5, "tiger"), (id2, 2, "dog", removed), (id3, 1, "ape") }
    Lookup
    { "tiger", "ape" }

  49. OUR-Set
    Merge
    def merge(anotherSet: OURSet[E]): OURSet[E] =
      OURSet[E]((elements ++ anotherSet.elements)
        .groupBy(_.id)
        // per id, keep the state with the highest timestamp
        .map(group => group._2.maxBy(_.timestamp))
        .toSet)
    Lookup
    def lookup(ourSet: OURSet[E]): Set[E] =
      ourSet.elements.filter(!_.removed)
        .map(_.value)
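
A self-contained sketch of the OUR-Set idea: one element state per id, and the highest timestamp wins. `Element` and `put` are illustrative names rather than the talk's API. Equal timestamps are deliberately not handled here, since `maxBy` alone does not break ties deterministically.

```scala
final case class Element[V](id: String, timestamp: Long, value: V, removed: Boolean = false)

final case class OURSet[V](elements: Set[Element[V]] = Set.empty[Element[V]]) {
  // Add, update and remove are all "offer a newer state for this id".
  def put(e: Element[V]): OURSet[V] = merge(OURSet(Set(e)))

  // Visible values: states not flagged as removed.
  def lookup: Set[V] = elements.filterNot(_.removed).map(_.value)

  // Per id, keep the state with the highest timestamp.
  def merge(other: OURSet[V]): OURSet[V] = {
    val newest: Set[Element[V]] =
      (elements ++ other.elements)
        .groupBy(_.id)
        .values
        .map(_.maxBy(_.timestamp))
        .toSet
    OURSet(newest)
  }
}
```

With a hypothetical client holding `cat`, `dog` and `ape` at timestamp 1, and a server holding an update of `id1` to "tiger" at timestamp 5 plus a removal of `id2` at timestamp 2, the merge gives the set on the slide and `lookup` returns "tiger" and "ape".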

  50. Implementation
    NavCloud CRDT Model: Favorites

  51. CRDT Model: Favorites
    FavoriteState element:
    • ID (to uniquely identify a favorite)
    • Timestamp (to indicate the last change time)
    • Removed flag (to indicate if favorite has been removed)
    • Favorite data: ( Name, Location, ... )

  52. Convergence in case of equal timestamps
    Compare function checks all the fields in order of priority:
    • Timestamp
    • Removed flag (Add or Delete bias)
    • ... remaining attributes ...
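
One way to express such a priority comparison is a total `Ordering` over the element state. This is a sketch under assumptions: the `FavoriteState` fields are reduced to a single `name` attribute, `FavoriteOrdering` and `winner` are hypothetical names, and a delete bias is chosen arbitrarily.

```scala
final case class FavoriteState(id: String, timestamp: Long, removed: Boolean, name: String)

object FavoriteOrdering {
  // Priority: timestamp, then the removed flag (false < true, so a removal
  // wins a tie: delete bias), then the remaining attributes.
  val byPriority: Ordering[FavoriteState] =
    Ordering.by((f: FavoriteState) => (f.timestamp, f.removed, f.name))

  // The larger state under the ordering wins, regardless of argument order.
  def winner(a: FavoriteState, b: FavoriteState): FavoriteState =
    byPriority.max(a, b)
}
```

Since every field takes part in the comparison, two distinct states never compare as equal, so all replicas pick the same winner whatever order the states arrive in.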

  53. Using CRDT everywhere
    • Use the same algorithm everywhere
    As simple as calling the merge function

  54. Using CRDT everywhere
    Client <-> Server <-> Database
    def update(fromClient: OURSet[FavoriteState]): OURSet[FavoriteState] = {
      val fromDatabase = database.fetch(...)
      val newSet = fromDatabase.merge(fromClient)
      database.push(newSet)
      newSet
    }

  55. Considerations & Limitations

  56. "What About Garbage?"
    • CRDTs tend to grow because of tombstones.
    • Deleted Favorite in the Set == Tombstone.
    • Potentially unbounded growth.

  57. Case
    MyDrive beta-test user with ~3000 deleted favorites and 5 non-deleted ones.
    => 1 MB Favorites.json

  58. Prune deleted favorites
    But when?
    Requirement: all nodes holding a Favorites set should have seen a
    deleted element before it can be pruned.
    Otherwise deleted elements can be resurrected.

  59. Solution #1: Client-awareness & LastSyncTime
    Capture the time of the last sync between each client and the service.
    if (clients.forall(_.lastSyncTimestamp > tombstone.timestamp)) {
      crdtSet.remove(tombstone)
    }
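
The rule can be sketched as a pure function over the set. `Client`, `Favorite` and `LastSyncPruning` are hypothetical names, and how the service tracks its clients is assumed rather than taken from the talk.

```scala
final case class Client(id: String, lastSyncTimestamp: Long)
final case class Favorite(id: String, timestamp: Long, removed: Boolean)

object LastSyncPruning {
  // Drops a tombstone only when every known client has synced after it was
  // written. With no known clients, every tombstone qualifies for pruning.
  def prune(favorites: Set[Favorite], clients: Set[Client]): Set[Favorite] =
    favorites.filterNot { f =>
      f.removed && clients.forall(_.lastSyncTimestamp > f.timestamp)
    }
}
```

A tombstone written at time 50 is kept while any client last synced at time 40, because that client has not yet seen the deletion and could otherwise resurrect the element.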

  60. Solution #2: Time-To-Live for tombstones
    Prune tombstones once their TTL is exceeded.
    if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
      crdtSet.drop(tombstone)
    }
    Requirement: all nodes holding a CRDT set should apply the same TTL rule independently.
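
A sketch of the TTL rule using `java.time` instead of the `DateTime` type on the slide. `Tombstone`, `removedAt` and `TtlPruning` are hypothetical names. Passing `now` in explicitly keeps the function deterministic and easy to test.

```scala
import java.time.{Duration, Instant}

final case class Tombstone(id: String, removedAt: Instant)

object TtlPruning {
  // A tombstone has expired once its age is strictly greater than the TTL.
  def expired(t: Tombstone, now: Instant, ttl: Duration): Boolean =
    Duration.between(t.removedAt, now).compareTo(ttl) > 0

  // Keeps only the tombstones that have not yet expired.
  def prune(tombstones: Set[Tombstone], now: Instant, ttl: Duration): Set[Tombstone] =
    tombstones.filterNot(expired(_, now, ttl))
}
```

With a 30-day TTL, a tombstone that is 31 days old is dropped while one that is a day old is kept. The TTL has to be long enough for every device to sync at least once within it, otherwise a late device can bring a pruned element back.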

  61. Solution #3: Send only diff upon any update.
    Client has a set of [ A, B, C ]; Server has a set of [ A, B'', C ].
    Client modifies and sends only updated favorites: [ A', B' ]
    Before: Server responds with a full merged set [ A', B'', C ].

  62. Solution #3: Send only diff upon any update.
    We introduced a scoped diff:
    Server responds with a diff set [ B'' ], because the client's B' update lost to B'' on the server.
    The A' element is skipped because it won on the server.
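
The scoped diff can be sketched as two small functions. `Fav`, `ScopedDiff`, `merge` and `response` are hypothetical names, and the timestamps below are made up so that A' is newer than A and B'' is newer than B'.

```scala
final case class Fav(id: String, timestamp: Long, value: String)

object ScopedDiff {
  // Applies client updates to the server state; per id the newer state wins.
  // Ties keep the server's state in this sketch.
  def merge(server: Map[String, Fav], updates: Seq[Fav]): Map[String, Fav] =
    updates.foldLeft(server) { (acc, u) =>
      acc.get(u.id) match {
        case Some(existing) if existing.timestamp >= u.timestamp => acc
        case _ => acc.updated(u.id, u)
      }
    }

  // Replies only with ids the client sent whose merged state differs from what
  // it sent, i.e. the updates that lost on the server.
  def response(merged: Map[String, Fav], updates: Seq[Fav]): Seq[Fav] =
    updates.flatMap(u => merged.get(u.id).filter(_ != u))
}
```

For the example above the server ends up with [ A', B'', C ] but answers with [ B'' ] only: A' won, so the client already has it, and C was not part of the request.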

  63. Trouble With Time

  64. There is no such thing as reliable time.

  65. Tracking time is actually tracking causality.
    — Jonas Bonér, "Life Beyond the Illusion of Present"
    Causality & Ordering events.

  66. Time can be just good enough.

  67. Ordering updates within a single node
    Timestamp field as a logical clock.
    Actual value is not important, but it should always grow
    monotonically.

  68. Ordering updates within a single node
    "+1 Strategy":
    Long resolveNewTimestamp(ElementState state) {
      return Math.max(retrieveTimestamp(),
                      state.lastModified() + 1);
    }
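
The same strategy as a pure Scala function, to make the arithmetic easy to check. `LogicalClock` and `next` are hypothetical names, and the wall-clock reading is passed in as a parameter rather than fetched.

```scala
object LogicalClock {
  // New timestamp: the wall clock if it moved past the last value,
  // otherwise the last value plus one.
  def next(wallClockMillis: Long, lastModified: Long): Long =
    math.max(wallClockMillis, lastModified + 1)
}
```

If the element was last modified at 500 and the device clock now reads 400 (it was reset or drifted backwards), the new timestamp is 501 rather than 400, so updates on a single node stay strictly ordered.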

  69. Ordering updates from different nodes
    If a GPS clock is available -> use it (mainly the PND case).
    Prefer the server time to a client's local time.

  70. Edge case
    Multiple clients modify the same element
    (concurrently || without a reliable clock).

  71. One "merge" to rule them all

  72. Clients & Server MUST have the same 'merge' behaviour.
    ==
    Given the same input, their 'merge' functions emit the same results.

  73. Divergence may lead to endless synchronisation loops!

  74. "So what?" :)
    • Academia is not as scary as it might seem to pragmatic devs.
    • Look for the best && simplest solutions.
    • Understand their limitations.
    • Analyze and monitor real usage.
    • Never settle: constantly search for ways to tune & improve.

  75. "Show me the code"
    Scala samples
    https://github.com/ajantis/scala-crdt
    Java samples
    https://github.com/ajantis/java-crdt

  76. Homework for the curious minds (Part 1)
    • CRDTs for fun and eventual profit, Noel Welsh, 2013.
    • Readings in conflict-free replicated data types, Christopher Meiklejohn, 2015.
    • A comprehensive study of Convergent and Commutative Replicated Data Types, Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011.

  77. Homework for the curious minds (Part 2)
    • Lasp: A language for distributed, coordination-free programming, Meiklejohn & Van Roy, 2015.
    • Swarm.js+React — real-time, offline-ready Holy Grail web apps, Victor Grishchenko, 2014.

  78. Any questions? :)