Practical Demystification of CRDTs

This talk was presented at Amsterdam.Scala meetup (http://www.meetup.com/amsterdam-scala/events/224469203/) on 27th August 2015 by Nami Nasserazad (https://twitter.com/namiazad), Didier Liauw and Dmitry Ivanov (https://twitter.com/idajantis)

In a connected world, synchronising mutable information between different devices with different clock precision can be a difficult problem. A piece of data may have many out-of-sync replicas, but all of them should eventually reach a consistent state. For example, TomTom (https://tomtom.com) users with personal navigation devices, smartphones and MyDrive (https://mydrive.tomtom.com/en_gb/) website accounts expect their navigation information to be synchronised properly even in the occasional absence of a network connection.
Conflict-free Replicated Data Types (CRDTs) provide robust data structures to achieve proper synchronisation in an unreliable network of devices. They allow conflicts to be resolved locally at the data type level while guaranteeing eventual consistency between replicas.
In this talk, in addition to an introduction to CRDTs, our main focus is on a special type of CRDT set called OUR-set (Observed, Updated, Removed), which we created to extend known CRDT sets with update functionality. We will explain the advantages of this data structure for solving many synchronisation problems as well as its limitations. We also show what a basic implementation of the OUR-set CRDT looks like in Scala, along with its counterpart in Java, and enumerate a set of subtle considerations which should be taken into account.

Dmitry Ivanov

August 27, 2015

Transcript

  1. PRACTICAL DEMYSTIFICATION OF CRDTS
    AMSTERDAM.SCALA MEETUP, 27TH AUGUST 2015
    Nami Nasserazad <@nami4552>, Didier Liauw,
    Dmitry Ivanov <@idajantis>

  2. DISCLAIMER & WARNING
    We are neither distributed systems experts,
    nor hardcore academia guys.
    There is no Scala-only specific stuff in the talk.

  3. WHO ARE WE?
    Full stack developers*
    Server
    Mobile / SDKs for
    different platforms
    Infrastructure / AWS
    * - sorry for a buzz-word :)

  4. WHAT IS NAVCLOUD?
    A cloud-based storage service that allows users to
    seamlessly synchronize trip information
    (destinations, favorite locations, community
    points of interest, routes etc.) between devices
    as well as the MyDrive website.
    NavCloud aims to be scalable and reactive
    while ensuring privacy and security.

  5. DEVELOPMENT STACK
    Scala
    Akka
    Spray
    RabbitMQ
    Riak
    AWS

  6. SDKS
    Java stateless SDK
    Encryption/Decryption
    Android and iOS stateful SDK
    Seamlessly working in offline mode
    Re-establishing push notification channel if
    connection drops
    Refreshing session upon token expiration
    Resumable and bandwidth optimized
    download/upload for large contents

  7. CHARACTERISTICS
    Devices are not always available
    Edit/View should work in offline mode: No
    Strong Consistency
    Data should converge to a correct
    eventual state
    Order is not guaranteed
    Bandwidth is limited: Only changes should be
    transmitted

  8. WHAT IS CRDT?
    DT
    Data Type
    “ Bad programmers worry about the
    code.
    Good programmers worry about
    data structures and their
    relationships ”

  9. WHAT IS CRDT?
    R
    Replicated
    CRDT is a family of data structures which has
    been designed to be distributed

  10. WHAT IS CRDT?
    C
    Conflict Free
    Resolving conflicts is done automatically

  11. WHAT DOES IT BRING IN PRACTICE?
    local updates without needing remote
    synchronization
    local merge upon receiving data from other
    nodes
    guaranteeing that all local merges converge

  12. HOW?
    MERGE

  13. WHAT IS MERGE?
    Binary operation on two CRDTs
    Commutative: x • y = y • x
    Associative: ( x • y ) • z = x • ( y • z )
    Idempotent: x • x = x

  14. HOW DOES IT HELP?
    In Distributed Systems:
    Order is not guaranteed: No problem:
    Merge is Commutative and Associative
    Events can be delivered more than one
    time: No problem: Merge is Idempotent
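
    The point above can be illustrated with a minimal sketch. It assumes the
    simplest possible state, a single integer merged with max; the names
    MergeOrderDemo, merge and updates are made up for this example and are not
    from the talk.

```scala
// Illustrative only: the "state" is a single integer and merge is `max`.
object MergeOrderDemo {
  def merge(x: Int, y: Int): Int = math.max(x, y)

  val updates: List[Int] = List(3, 9, 6)

  // The same updates delivered in order, reversed, and with every message duplicated.
  val inOrder: Int    = updates.foldLeft(0)(merge)
  val reordered: Int  = updates.reverse.foldLeft(0)(merge)
  val duplicated: Int = (updates ++ updates).foldLeft(0)(merge)
}
```

    All three folds end in the same state, because reordering is absorbed by
    commutativity and associativity, and redelivery by idempotence.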

  15. EXAMPLE
    G-COUNTER

  16. G-COUNTER
    Each node has a counter
    Each node should only increase its own
    counter
    G-Counter Data Type: An array of counters
    where each element belongs to a node

  17. G-COUNTER
    Machine A: A:6 B:0 C:0
    Machine B: A:0 B:3 C:0
    Machine C: A:0 B:0 C:9
    Merge: Max on corresponding elements: A:6 B:3
    C:9
    Total value: Sum of all elements: 6 + 3 + 9 = 18
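
    A rough Scala sketch of the example above. It assumes a Map keyed by node
    name instead of a fixed array (a missing key counts as 0); GCounter,
    increment and value are illustrative names, not the NavCloud code.

```scala
// Sketch of a state-based G-Counter. A Map keyed by node name stands in for
// the "array of counters"; a missing key is treated as 0.
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  // A node may only increase its own slot.
  def increment(node: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    copy(counts = counts.updated(node, counts.getOrElse(node, 0L) + by))
  }

  // Merge: max on corresponding elements.
  def merge(that: GCounter): GCounter =
    GCounter((counts.keySet ++ that.counts.keySet).map { node =>
      node -> math.max(counts.getOrElse(node, 0L), that.counts.getOrElse(node, 0L))
    }.toMap)

  // Total value: sum of all elements.
  def value: Long = counts.values.sum
}

object GCounterExample {
  val a: GCounter = GCounter().increment("A", 6)
  val b: GCounter = GCounter().increment("B", 3)
  val c: GCounter = GCounter().increment("C", 9)
  val merged: GCounter = a.merge(b).merge(c)
}
```

    Here GCounterExample.merged.value is 6 + 3 + 9 = 18, whatever order the
    three replicas are merged in.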

  18. MAX FUNCTION
    Binary operation on two CRDTs
    Commutative: x max y = y max x
    Associative: ( x max y ) max z = x max ( y max z )
    Idempotent: x max x = x

  19. EXAMPLE
    G-SET

  20. G-SET
    Each node has a set
    Each node should only add elements to its own set
    G-Set Data Type: An array of sets where each
    set belongs to a node

  21. G-SET
    Machine A: A:{x, y} B:{} C:{}
    Machine B: A:{} B:{z} C:{}
    Machine C: A:{} B:{} C:{a, b, c}
    Merge: Union on corresponding sets: A:{x, y} B:
    {z} C:{a, b, c}
    Total value: Union of all sets: {x, y, z, a, b, c}
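
    The same example as a Scala sketch, following the per-node layout described
    on the slides. The Map keyed by node name and the names GSet, add and value
    are assumptions made for illustration.

```scala
// Sketch of the per-node G-Set from the slides: one grow-only set per node.
final case class GSet[A](perNode: Map[String, Set[A]] = Map.empty[String, Set[A]]) {
  // A node only ever adds to its own set; there is no remove.
  def add(node: String, elem: A): GSet[A] =
    copy(perNode = perNode.updated(node, perNode.getOrElse(node, Set.empty[A]) + elem))

  // Merge: union on corresponding sets.
  def merge(that: GSet[A]): GSet[A] =
    GSet((perNode.keySet ++ that.perNode.keySet).map { node =>
      node -> (perNode.getOrElse(node, Set.empty[A]) ++ that.perNode.getOrElse(node, Set.empty[A]))
    }.toMap)

  // Total value: union of all sets.
  def value: Set[A] = perNode.values.flatten.toSet
}

object GSetExample {
  val a: GSet[String] = GSet[String]().add("A", "x").add("A", "y")
  val b: GSet[String] = GSet[String]().add("B", "z")
  val c: GSet[String] = GSet[String]().add("C", "a").add("C", "b").add("C", "c")
  val merged: GSet[String] = a.merge(b).merge(c)
}
```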

  22. UNION FUNCTION
    Binary operation on two CRDTs
    Commutative: x ∪ y = y ∪ x
    Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
    Idempotent: x ∪ x = x

  23. HOW DID CRDT HELP IN NAVCLOUD?

  24. SYNCHRONIZING FAVORITES
    SET OF FAVORITES
    Name
    Latitude/Longitude
    ...

  25. SYNCHRONIZING FAVORITES
    USE CASES

  26. SYNCHRONIZING FAVORITES
    USE CASES
    Users can add, delete or modify
    Replicas are spread over multiple devices:
    Client devices might not be connected (yet)
    Modifications have to be done without
    synchronization with remote replicas

  27. SYNCHRONIZING FAVORITES
    NAIVE APPROACH AND PROBLEMS
    Whenever a client connects to the server, its
    local state is sent to the server
    Synchronization is done on the server by
    using a Last Write Wins strategy
    This can result in inconsistencies, due to:
    Time when updates are sent to the server
    differs from the update time
    Network latency
    Unreliable clocks on the server

  28. SYNCHRONIZING FAVORITES
    CRDTS MATCH VERY WELL WITH OUR SITUATION: CLIENTS
    Synchronization can be done locally:
    Changes are made instantly
    No connection is needed for proper
    synchronization
    Synchronization is decentralized
    Order is not important
    Latency, failed requests etc. have no effect
    Implementation becomes easier, because it
    is contained within a single component

  29. SYNCHRONIZING FAVORITES
    CRDTS MATCH VERY WELL WITH OUR SITUATION: SERVER
    We use CRDTs everywhere, from the clients
    to even the individual database nodes
    We use Riak which supports CRDTs by
    storing siblings and allows users to resolve
    them themselves
    Which means that our database is fully
    partition tolerant

  30. CRDT
    WHICH ONE: 2P-SET
    2 Sets:
    Add Set
    Remove Set: Also known as tombstone set
    Merge: Take the union of the add-sets and
    remove-sets
    Lookup: Contains an element if it is in the
    add-set and not in the remove-set
    Doesn't work for us:
    1. Once removed you cannot add again
    2. Mutating values (updates) is not possible

  31. CRDT
    WHICH ONE: 2P-SET
                  A                 B
    Add-Set       {"cat", "dog"}    {"cat", "ape"}
    Remove-Set    {"cat"}           {}
    Merge         Add-Set:    {"cat", "dog", "ape"}
                  Remove-Set: {"cat"}
    Lookup        {"dog", "ape"}
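
    The table can be reproduced with a short sketch. TwoPSet and its method
    names are chosen for this example; the guard in remove (only tombstone
    elements that were added) follows the usual 2P-Set description and is an
    assumption here.

```scala
// Sketch of a 2P-Set: an add-set plus a tombstone (remove) set.
final case class TwoPSet[A](added: Set[A] = Set.empty[A], removed: Set[A] = Set.empty[A]) {
  def add(e: A): TwoPSet[A] = copy(added = added + e)

  // Only elements that were added can be tombstoned.
  def remove(e: A): TwoPSet[A] = if (added(e)) copy(removed = removed + e) else this

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(that: TwoPSet[A]): TwoPSet[A] =
    TwoPSet(added ++ that.added, removed ++ that.removed)

  // Lookup: in the add-set and not in the remove-set.
  def lookup: Set[A] = added -- removed
}

object TwoPSetExample {
  val a: TwoPSet[String] = TwoPSet(Set("cat", "dog"), Set("cat"))
  val b: TwoPSet[String] = TwoPSet(Set("cat", "ape"))
  val merged: TwoPSet[String] = a.merge(b)
  // Limitation 1: adding "cat" again has no effect, the tombstone always wins.
  val readded: TwoPSet[String] = merged.add("cat")
}
```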

  32. CRDT
    WHICH ONE: LWW-ELEMENT-SET
    Attaches a timestamp to each element
    You can add again by adding the element with
    a higher timestamp than the one in the
    remove-set
    Merge: Take the union of the add-sets and
    remove-sets
    Lookup: Contains the element if it is in the
    add-set and not in the remove-set with a higher
    timestamp
    Still doesn't work for us, because mutating is
    not possible

  33. CRDT
    WHICH ONE: LWW-ELEMENT-SET
                  A                         B
    Add-Set       {(1,"cat"), (1,"dog")}    {(5,"cat"), (1,"ape")}
    Remove-Set    {(3,"cat")}               {(1,"cat")}
    Merge         Add-Set:    {(1,"cat"), (5,"cat"), (1,"dog"), (1,"ape")}
                  Remove-Set: {(1,"cat"), (3,"cat")}
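
    A sketch of the same example in Scala. LwwSet and its members are
    illustrative names. Letting the add win when add and remove timestamps are
    equal is one reading of "not in remove-set with a higher timestamp" and
    should be treated as an assumption.

```scala
// Sketch of an LWW-Element-Set: every add and remove carries a timestamp.
final case class LwwSet[A](
    adds: Set[(Long, A)] = Set.empty[(Long, A)],
    removes: Set[(Long, A)] = Set.empty[(Long, A)]) {

  def add(ts: Long, e: A): LwwSet[A] = copy(adds = adds + ((ts, e)))
  def remove(ts: Long, e: A): LwwSet[A] = copy(removes = removes + ((ts, e)))

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(that: LwwSet[A]): LwwSet[A] =
    LwwSet(adds ++ that.adds, removes ++ that.removes)

  // Highest timestamp recorded for an element in one of the two sets, if any.
  private def latest(entries: Set[(Long, A)], e: A): Option[Long] =
    entries.filter(_._2 == e).map(_._1).maxOption

  // Present if added and not removed with a higher timestamp (ties favour the add).
  def contains(e: A): Boolean = (latest(adds, e), latest(removes, e)) match {
    case (Some(addedAt), Some(removedAt)) => addedAt >= removedAt
    case (Some(_), None)                  => true
    case _                                => false
  }

  def lookup: Set[A] = adds.map(_._2).filter(contains)
}

object LwwSetExample {
  val a: LwwSet[String] = LwwSet(Set((1L, "cat"), (1L, "dog")), Set((3L, "cat")))
  val b: LwwSet[String] = LwwSet(Set((5L, "cat"), (1L, "ape")), Set((1L, "cat")))
  val merged: LwwSet[String] = a.merge(b)
}
```

    Under these assumptions "cat" is visible again after the merge, because its
    latest add (timestamp 5) is newer than its latest remove (timestamp 3).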

  34. CRDT
    WHICH ONE: OR-SET
    You can add again
    Store elements with a unique identifier
    Deleting an element adds every (id, element) pair
    for that element in the add-set to the remove-set
    Merge: Take the union of the add-sets and
    remove-sets
    Lookup: Contains the element if there is an
    element in the add-set with an identifier that is
    not in the remove-set
    Doesn't work for us, because it doesn't support
    updates

  35. CRDT
    WHICH ONE: OR-SET
                  A                           B
    Add-Set       {(#a,"cat"), (#b,"dog")}    {(#c,"cat"), (#d,"ape")}
    Remove-Set    {(#a,"cat")}                {(#a,"cat")}
    Merge         Add-Set:    {(#a,"cat"), (#c,"cat"), (#b,"dog"), (#d,"ape")}
                  Remove-Set: {(#a,"cat")}
    Lookup        {"cat", "dog", "ape"}
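
    A sketch of the OR-Set example. OrSet and its members are illustrative
    names; using a random UUID as the default unique identifier is an
    assumption, any globally unique tag would do.

```scala
import java.util.UUID

// Sketch of an OR-Set: every add is tagged with a unique identifier.
final case class OrSet[A](
    adds: Set[(String, A)] = Set.empty[(String, A)],
    removes: Set[(String, A)] = Set.empty[(String, A)]) {

  // Re-adding an element creates a fresh (id, element) pair.
  def add(e: A, id: String = UUID.randomUUID().toString): OrSet[A] =
    copy(adds = adds + ((id, e)))

  // Removing tombstones every (id, element) pair observed locally for the element.
  def remove(e: A): OrSet[A] = copy(removes = removes ++ adds.filter(_._2 == e))

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(that: OrSet[A]): OrSet[A] =
    OrSet(adds ++ that.adds, removes ++ that.removes)

  // Lookup: elements with at least one pair that has not been tombstoned.
  def lookup: Set[A] = (adds -- removes).map(_._2)
}

object OrSetExample {
  val a: OrSet[String] = OrSet(Set(("#a", "cat"), ("#b", "dog")), Set(("#a", "cat")))
  val b: OrSet[String] = OrSet(Set(("#c", "cat"), ("#d", "ape")), Set(("#a", "cat")))
  val merged: OrSet[String] = a.merge(b)
}
```

    "cat" survives the merge because the pair (#c,"cat") was never observed by
    the replica that removed (#a,"cat").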

  36. CRDT
    OUR-SET
    Combination of all the sets
    Store elements with a unique identifier
    This identifier is actually used to identify an
    element
    Element can be changed if identifier
    remains the same
    Updates are possible!
    Store elements with a timestamp
    Is updated on any change

  37. CRDT
    OUR-SET
    Single Set
    No add-set and remove-set
    Replaced by a removed flag
    Set can only contain one element with a
    particular id
    Element with highest timestamp wins
    Merge: Take union of the two sets and for every
    element with the same identifier take only the
    highest one
    Lookup: Contains the element if there is an
    element with the same id and the removed flag is
    not set

  38. CRDT
    OUR-SET
                  A                           B
    Set           {(#a,1,"cat",removed),      {(#a,5,"tiger"),
                   (#b,2,"dog",removed)}       (#c,1,"ape"),
                                               (#b,1,"dog")}
    Merge         Set: {(#a,5,"tiger"), (#b,2,"dog",removed), (#c,1,"ape")}
    Lookup        {"tiger", "ape"}
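
    A minimal sketch of how an OUR-set along these lines could look. It is not
    the NavCloud implementation: Entry, OurSet, put and of are illustrative
    names, and the tie-break (on equal timestamps a removed entry wins) is an
    assumption. A production version needs a total order over all fields so
    that merge stays commutative when timestamps and flags are both equal.

```scala
// One element of the set: identified by `id`, versioned by `timestamp`.
final case class Entry[A](id: String, timestamp: Long, value: A, removed: Boolean = false)

// Sketch of an OUR-set: a single set holding at most one entry per id.
final case class OurSet[A](entries: Map[String, Entry[A]] = Map.empty[String, Entry[A]]) {
  // Highest timestamp wins; on a tie a removed entry beats a live one (assumption).
  private def wins(a: Entry[A], b: Entry[A]): Boolean =
    a.timestamp > b.timestamp || (a.timestamp == b.timestamp && a.removed && !b.removed)

  // Merge: union of both sets, keeping only the winning entry for each id.
  def merge(that: OurSet[A]): OurSet[A] =
    OurSet(that.entries.values.foldLeft(entries) { (acc, e) =>
      acc.get(e.id) match {
        case Some(current) if !wins(e, current) => acc
        case _                                  => acc.updated(e.id, e)
      }
    })

  // Add, update and remove are all "merge in one newer entry".
  def put(e: Entry[A]): OurSet[A] = merge(OurSet(Map(e.id -> e)))

  // Lookup: values of entries whose removed flag is not set.
  def lookup: Set[A] = entries.values.filterNot(_.removed).map(_.value).toSet
}

object OurSet {
  def of[A](es: Entry[A]*): OurSet[A] = es.foldLeft(OurSet[A]())((acc, e) => acc.put(e))
}

object OurSetExample {
  val a: OurSet[String] =
    OurSet.of(Entry("#a", 1, "cat", removed = true), Entry("#b", 2, "dog", removed = true))
  val b: OurSet[String] =
    OurSet.of(Entry("#a", 5, "tiger"), Entry("#c", 1, "ape"), Entry("#b", 1, "dog"))
  val merged: OurSet[String] = a.merge(b)
}
```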

  39. IMPLEMENTATION
    CRDT MODEL: FAVORITE
    ID
    to uniquely identify a favorite
    Timestamp
    to indicate when the last change was made
    Removed Flag
    to indicate that the favorite has been
    removed
    Name
    Latitude/Longitude
    ...

  40. IMPLEMENTATION
    METHODS
    Add a compare function, which compares all
    the fields in order of priority:
    Timestamp
    Removed flag
    ...
    Add an equals and hash function
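
    In Scala, these methods could be sketched roughly as follows. The field
    names and the choice of name as the next comparison field are assumptions;
    the slides only fix timestamp and removed flag as the first two priorities.
    A case class supplies equals and hashCode automatically.

```scala
// Hypothetical Favorite model; a case class provides equals and hashCode.
final case class Favorite(
    id: String,
    timestamp: Long,
    removed: Boolean,
    name: String,
    latitude: Double = 0.0,
    longitude: Double = 0.0)

object Favorite {
  // Compare in order of priority: timestamp, then removed flag, then further fields.
  // Boolean ordering puts false before true, so a removal wins a timestamp tie.
  // Only `name` is included as a further field here; a full version would
  // continue through the remaining fields to obtain a total order.
  implicit val byPriority: Ordering[Favorite] =
    Ordering.by((f: Favorite) => (f.timestamp, f.removed, f.name))
}

object FavoriteExample {
  val older: Favorite   = Favorite("#a", 1L, removed = false, "Home")
  val newer: Favorite   = Favorite("#a", 5L, removed = false, "Office")
  val deleted: Favorite = Favorite("#a", 5L, removed = true, "Office")

  // Merging two versions of the same favorite keeps the greater one.
  val winner: Favorite = Ordering[Favorite].max(older, newer)
}
```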

  41. IMPLEMENTATION
    USING THE CRDT
    Use the same algorithm everywhere
    As simple as calling the merge function

  42. IMPLEMENTATION
    USING THE CRDT
    // synchronize doesn't return anything because the client is already synced.
    // This is purely for the server and the database.
    def synchronize(fromClient: CRDTSet, database: CRDTcomponent): Unit = {
      val changedSet = fromClient
      val currentSet = database.crdtset
      val newSet = currentSet.merge(changedSet)
      database.push(newSet) // This is fire and forget
    }

  43. CONSIDERATIONS & LIMITATIONS

  44. "WHAT ABOUT GARBAGE?"
    CRDTs tend to grow because of tombstones
    Deleted Favorite in the Set == Tombstone
    A potentially unbounded growth.
    Case: MyDrive user with ~3000 deleted
    favorites and 5 non-deleted ones. -> 1 MB
    Favorites.json

  45. "WHAT ABOUT GARBAGE?"
    Solution #1: Prune deleted favorites
    But when?
    Requirement: all nodes holding a Favorites set should
    have seen a deleted element before it can be pruned.
    Otherwise deleted elements can be resurrected.

  46. "WHAT ABOUT GARBAGE?"
    Client-awareness: capturing a timestamp of the last sync
    between a client and the service.
    // Prune only once every known client has synced after the deletion.
    if (clients.forall(_.lastSyncTimestamp > deletedFavorite.lastUpdatedTimestamp)) {
      favorites.drop(deletedFavorite)
    }

  47. "WHAT ABOUT GARBAGE?"
    Solution #2: Sending only diff upon any update.
    Client has a set of [A', B', C']; Server has a set of [A'', B''',
    C'].
    Client modifies and sends [B'']
    Before: responding with a full merged set [A'', B''', C'].
    We introduced a scoped diff:
    Now: responding with a diff set [B'''] as B'' update from
    the client has lost to B''' on the server.
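
    One way to sketch the scoped diff, under several assumptions: the primes
    are modelled as increasing timestamps, the server wins ties, and the diff
    is scoped to the ids the client sent. Item, ScopedDiff and sync are
    hypothetical names, not the NavCloud API.

```scala
// Hypothetical versioned element; the primes on the slide map to timestamps.
final case class Item(id: String, timestamp: Long, value: String)

object ScopedDiff {
  // Merges the client's changes into the server state and returns
  // (new server state, diff to send back to the client).
  def sync(server: Map[String, Item], fromClient: Seq[Item]): (Map[String, Item], Seq[Item]) = {
    val merged = fromClient.foldLeft(server) { (acc, c) =>
      acc.get(c.id) match {
        case Some(s) if s.timestamp >= c.timestamp => acc // the server version wins
        case _                                     => acc.updated(c.id, c)
      }
    }
    // Respond only with elements where the client's update lost to the server.
    val diff = fromClient.flatMap(c => merged.get(c.id).filter(_ != c))
    (merged, diff)
  }
}

object ScopedDiffExample {
  val server: Map[String, Item] = Map(
    "A" -> Item("A", 2, "A''"),
    "B" -> Item("B", 3, "B'''"),
    "C" -> Item("C", 1, "C'"))

  // The client modifies B and sends only B''.
  val (merged, diff) = ScopedDiff.sync(server, Seq(Item("B", 2, "B''")))
}
```

    In this run the server state is unchanged and the response contains only
    B''', instead of the full merged set.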

  48. TROUBLE WITH TIME
    There is no such thing as reliable time.
    (c) Jonas Bonér, "Life Beyond the Illusion of Present"
    Important: Causality and events Ordering.
    “ Tracking time is actually tracking
    causality. ”

  49. TROUBLE WITH TIME
    A time that is just good enough.
    Ordering updates between different nodes:
    If GPS clock is available -> use it (PND case).
    Prefer the server time to a client local time.
    WARN: conflicts may happen if two or more devices are
    modifying the same Favorite element concurrently.

  50. TROUBLE WITH TIME
    Ordering updates within a node boundary:
    Timestamp field as a logical clock.
    Timestamp should always grow monotonically.
    "+1 Strategy"
    def getFavoriteTimestamp(favorite: Favorite): Long = {
      Math.max(client.retrieveServerTimestamp(), favorite.lastModified + 1)
    }

  51. ONE 'MERGE' TO RULE THEM ALL
    Client and server should behave the same way
    when merging Favorites CRDT states.
    ==
    When given the same input,
    their merge functions should emit the same
    results.
    WARN: divergence can lead to endless
    synchronisation loops!

  52. ONE 'MERGE' TO RULE THEM ALL
    Sharing common CRDT-related code (Classes,
    merge/diff/equals/compare logic) FTW.
    Case #1: Scala.JS client with Web Server in Scala.
    "Towards Browser and Server Utopia with Scala.JS",
    Richard Dallaway
    Case #2: Using a TCK library to verify client
    compatibility.

  53. RIAK & CRDTS
    Data Agnostic vs Data Awareness
    Counters
    Sets
    Maps
    Flags *
    Registers *
    * - Embedded within a Map only

  54. RIAK & CRDTS
    Pros
    Simplicity. No 'Read -> Merge -> Write' code
    is needed on the server.
    Composability: Most of the data can be
    modelled by combining supported primitive
    types.
    Proven and tested*.
    * - Basho is serious about testing their stuff:
    "Distributed data structures with Coq", Christopher Meiklejohn.

  55. RIAK & CRDTS
    Cons
    No fine-grained merge: lack of merge
    strategy control on the server.
    Clients complexity: clients have to carry a
    Data Type context (a-la 'causal context' with
    Vector Clocks).
    Riak 2.0+ only. *
    * - For those who are still on Riak 1.4. ^_^

  56. CONCLUSIONS
    Academia sometimes is not as scary as it seems to
    pragmatic devs.
    Look for the best & simplest solutions.
    Understand your solution limitations.
    Analyse and monitor real usage.
    Always search how to tune & improve your solutions

  57. USEFUL REFERENCES
    "CRDTs for fun and eventual profit", Noel Welsh, 2013
    "Readings in conflict-free replicated data types",
    Christopher Meiklejohn
    "A comprehensive study of Convergent and
    Commutative Replicated Data Types", Marc Shapiro,
    Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011
    "Lasp: A language for distributed, coordination-free
    programming", Meiklejohn & Van Roy, 2015

  58. Image credit: Dex Media

  59. WE ARE HIRING!
    Interested in hacking on this stuff?
    http://www.tomtom.jobs/
