Practical Demystification of CRDTs

This talk was presented at the Amsterdam.Scala meetup (http://www.meetup.com/amsterdam-scala/events/224469203/) on 27th August 2015 by Nami Nasserazad (https://twitter.com/namiazad), Didier Liauw and Dmitry Ivanov (https://twitter.com/idajantis).

In a connected world, synchronising mutable information between devices with different clock precision can be a difficult problem. A piece of data may have many out-of-sync replicas, but all of them should eventually reach a consistent state. For example, TomTom (https://tomtom.com) users, who have personal navigation devices, smartphones and MyDrive (https://mydrive.tomtom.com/en_gb/) website accounts, expect their navigation information to be synchronised properly even when the network connection is occasionally absent.
Conflict-free Replicated Data Types (CRDTs) provide robust data structures to achieve proper synchronisation in an unreliable network of devices. They allow conflict resolution to be done locally, at the data type level, while guaranteeing eventual consistency between replicas.
In this talk, in addition to an introduction to CRDTs, our main focus is on a special type of CRDT-set called OUR-set (Observed, Updated, Removed), which we created to extend known CRDT-sets with update functionality. We will explain the advantages of this data structure for solving many synchronisation problems, as well as its limitations. We also show what a basic implementation of the OUR-set CRDT looks like in Scala, along with its counterpart in Java, and enumerate a set of subtle considerations which should be taken into account.

Dmitry Ivanov

August 27, 2015

Transcript

  1. PRACTICAL DEMYSTIFICATION OF CRDTS

    AMSTERDAM.SCALA MEETUP, 27TH AUGUST 2015 Nami Nasserazad <@nami4552>, Didier Liauw <didier.liauw@tomtom.com>, Dmitry Ivanov <@idajantis>
  2. DISCLAIMER & WARNING We are neither distributed systems experts, nor

    hardcore academia guys. There is no Scala-only specific stuff in the talk.
  3. WHO ARE WE? Full stack developers* Server Mobile / SDKs

    for different platforms Infrastructure / AWS * - sorry for a buzz-word :)
  4. WHAT IS NAVCLOUD? A cloud based storage service to allow

    users to seamlessly synchronize trip information (destination, favorite locations, community points of interest, routes etc.) between devices as well as MyDrive website. NavCloud aims to be scalable and reactive while ensuring privacy and security.
  5. DEVELOPMENT STACK Scala Akka Spray RabbitMQ Riak AWS

  6. SDKS Java stateless SDK Encryption/Decryption Android and iOS stateful SDK

    Seamlessly working in offline mode Re-establishing push notification channel if connection drops Refreshing session upon token expiration Resumable and bandwidth optimized download/upload for large contents
  7. CHARACTERISTICS Devices are not always available Edit/View should work in

    offline mode: No Strong Consistency Data should be converged to a correct eventual state Order is not guaranteed Bandwidth is limited: Only changes should be transmitted
  8. WHAT IS CRDT? DT Data Type “ Bad programmers worry

    about the code. Good programmers worry about data structures and their relationships ”
  9. WHAT IS CRDT? R Replicated CRDT is a family of

    data structures which has been designed to be distributed
  10. WHAT IS CRDT? C Conflict Free Resolving conflicts is done

    automatically
  11. WHAT DOES IT BRING IN PRACTICE? local updates without needing

    remote synchronization local merge upon receiving data from other nodes guaranteeing that all local merges converge
  12. HOW? MERGE MERGE MERGE MERGE MERGE MERGE

  13. WHAT IS MERGE? Binary operation on two CRDTs Commutative: x

    • y = y • x Associative: ( x • y ) • z = x • ( y • z ) Idempotent: x • x = x
  14. HOW DOES IT HELP? In Distributed Systems: Order is not

    guaranteed: No problem: Merge is Commutative and Associative Events can be delivered more than one time: No problem: Merge is Idempotent
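
The three merge laws from the slides above can be spot-checked in code. The following is a minimal sketch, not something from the talk: `MergeLaws` and `holds` are hypothetical names, and checking a finite sample only gives evidence that a merge function is well-behaved, not a proof.

```scala
object MergeLaws {
  // Spot-checks commutativity, associativity and idempotence of `merge`
  // on every combination of the given samples (hypothetical helper).
  def holds[A](samples: Seq[A])(merge: (A, A) => A): Boolean = {
    val commutative = samples.forall(x => samples.forall(y =>
      merge(x, y) == merge(y, x)))
    val associative = samples.forall(x => samples.forall(y => samples.forall(z =>
      merge(merge(x, y), z) == merge(x, merge(y, z)))))
    val idempotent = samples.forall(x => merge(x, x) == x)
    commutative && associative && idempotent
  }
}
```

Max and set union pass this check, while plain addition fails it because `x + x` is not `x`; that is why a counter CRDT merges with max rather than by summing incoming states.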
  15. EXAMPLE G-COUNTER

  16. G-COUNTER Each node has a counter Each node should only

    increase its own counter G-Counter Data Type: An array of counters where each element belongs to a node
  17. G-COUNTER Machine A: A:6 B:0 C:0 Machine B: A:0 B:3

    C:0 Machine C: A:0 B:0 C:9 Merge: Max on corresponding elements: A:6 B:3 C:9 Total value: Sum of all elements: 6 + 3 + 9 = 18
  18. MAX FUNCTION Binary operation on two CRDTs Commutative: x max

    y = y max x Associative: ( x max y ) max z = x max ( y max z ) Idempotent: x max x = x
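
A G-Counter along the lines of slides 16 to 18 might look like the sketch below. It assumes node ids are plain strings and uses a map instead of an array so that unknown nodes default to zero; the class and method names are illustrative and not taken from the NavCloud code.

```scala
// State-based G-Counter: one slot per node (hypothetical sketch).
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  // A node may only increase its own slot.
  def increment(node: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(node, counts.getOrElse(node, 0L) + by))
  }

  // Merge: max on corresponding elements, over the union of node ids.
  def merge(that: GCounter): GCounter =
    GCounter((counts.keySet ++ that.counts.keySet).map { n =>
      n -> math.max(counts.getOrElse(n, 0L), that.counts.getOrElse(n, 0L))
    }.toMap)

  // Total value: sum of all slots.
  def value: Long = counts.values.sum
}
```

With machine A at 6, B at 3 and C at 9, merging the three states in any order gives a total of 18, as on slide 17.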
  19. EXAMPLE G-SET

  20. G-SET Each node has a set Each node should add

    element to its own set G-Set Data Type: An array of sets where each set belongs to a node
  21. G-SET Machine A: A:{x, y} B:{} C:{} Machine B: A:{}

    B:{z} C:{} Machine C: A:{} B:{} C:{a, b, c} Merge: Union on corresponding sets: A:{x, y} B: {z} C:{a, b, c} Total value: Union of all sets: {x, y, z, a, b, c}
  22. UNION FUNCTION Binary operation on two CRDTs Commutative: x ∪

    y = y ∪ x Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z ) Idempotent: x ∪ x = x
  23. HOW DID CRDT HELP IN NAVCLOUD?

  24. SYNCHRONIZING FAVORITES SET OF FAVORITES Name Latitude/Longitude ...

  25. SYNCHRONIZING FAVORITES USE CASES

  26. SYNCHRONIZING FAVORITES USE CASES Users can add, delete or modify

    Replicas are spread over multiple devices: Client devices might not be connected (yet) Modifications have to be done without synchronization with remote replicas
  27. SYNCHRONIZING FAVORITES NAIVE APPROACH AND PROBLEMS Whenever clients make connections

    to the server local state is sent to the server Synchronization is done on the server by using a Last Write Wins strategy This can result in inconsistencies, due to: Time when updates are sent to the server differs from the update time Network latency Unreliable clocks on the server
  28. SYNCHRONIZING FAVORITES CRDTS MATCH VERY WELL WITH OUR SITUATION: CLIENTS

    Synchronization can be done locally: Changes are made instantly No connection is needed for proper synchronization Synchronization is decentralized Order is not important Latency, failed requests etc. have no effect Implementation becomes easier, because it is contained within a single component
  29. SYNCHRONIZING FAVORITES CRDTS MATCH VERY WELL WITH OUR SITUATION: SERVER

    We use CRDTs everywhere, from the clients to even the individual database nodes We use Riak which supports CRDTs by storing siblings and allows users to resolve them themselves Which means that our database is fully partition tolerant
  30. CRDT WHICH ONE: 2P-SET 2 Sets: Add Set Remove Set:

    Also known as tombstone set Merge: Take the union of the add-sets and remove-sets Lookup: Contains an element if it is in the add-set and not in the remove-set Doesn't work for us: 1. Once removed you cannot add again 2. Mutating values (updates) is not possible
  31. CRDT WHICH ONE: 2P-SET

    A: Add-Set {"cat", "dog"}, Remove-Set {"cat"}
    B: Add-Set {"cat", "ape"}, Remove-Set {}
    Merge: Add-Set {"cat", "dog", "ape"}, Remove-Set {"cat"}
    Lookup: {"dog", "ape"}
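
The 2P-Set rules from slide 30 can be sketched as follows. `TwoPSet` and its method names are hypothetical, and the choice to ignore a remove of an element that was never added is an assumption of this sketch.

```scala
// Two-phase set: an add-set plus a tombstone (remove) set.
final case class TwoPSet[A](added: Set[A] = Set.empty[A],
                            removed: Set[A] = Set.empty[A]) {
  def add(e: A): TwoPSet[A] = copy(added = added + e)

  // Assumption: only elements already in the add-set can be removed.
  def remove(e: A): TwoPSet[A] =
    if (added(e)) copy(removed = removed + e) else this

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(that: TwoPSet[A]): TwoPSet[A] =
    TwoPSet(added ++ that.added, removed ++ that.removed)

  // Lookup: in the add-set and not in the remove-set.
  def lookup: Set[A] = added -- removed
}
```

Replaying the example from slide 31 gives a lookup of {"dog", "ape"}, and adding "cat" again afterwards has no visible effect, which is the first limitation named on slide 30.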
  32. CRDT WHICH ONE: LWW-ELEMENT-SET Attaches a timestamp to each element

    You can add again by adding the element with a higher timestamp than the one in the remove-set Merge: Take the union of the add-sets and remove-sets Lookup: Contains the element if it is in add-set and not in remove-set with a higher timestamp Still doesn't work for us, because mutating is not possible
  33. CRDT WHICH ONE: LWW-ELEMENT-SET

    A: Add-Set {(1,"cat"), (1,"dog")}, Remove-Set {(3,"cat")}
    B: Add-Set {(5,"cat"), (1,"ape")}, Remove-Set {(1,"cat")}
    Merge: Add-Set {(1,"cat"), (5,"cat"), (1,"dog"), (1,"ape")}, Remove-Set {(1,"cat"), (3,"cat")}
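
A sketch of the LWW-Element-Set from slide 32, with timestamps as plain `Long` values. The names are hypothetical, and the tie-break (a remove wins when its timestamp equals the newest add) is an assumption, since the slides do not specify it.

```scala
// Last-write-wins element set: (timestamp, element) pairs in both sets.
final case class LwwSet[A](adds: Set[(Long, A)] = Set.empty[(Long, A)],
                           removes: Set[(Long, A)] = Set.empty[(Long, A)]) {
  def add(ts: Long, e: A): LwwSet[A] = copy(adds = adds + (ts -> e))
  def remove(ts: Long, e: A): LwwSet[A] = copy(removes = removes + (ts -> e))

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(that: LwwSet[A]): LwwSet[A] =
    LwwSet(adds ++ that.adds, removes ++ that.removes)

  // Newest timestamp recorded for element `e`, if any.
  private def latest(entries: Set[(Long, A)], e: A): Option[Long] =
    entries.collect { case (ts, `e`) => ts }.reduceOption(_ max _)

  // Present if the newest add is strictly newer than the newest remove.
  // Assumption: on equal timestamps the remove wins.
  def lookup: Set[A] = adds.map(_._2).filter { e =>
    latest(adds, e).exists(a => latest(removes, e).forall(a > _))
  }
}
```

For the replicas on slide 33, the merged state contains "cat" because the add at timestamp 5 is newer than the remove at timestamp 3, so under these assumptions the lookup is {"cat", "dog", "ape"}.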
  34. CRDT WHICH ONE: OR-SET You can add again Store elements

    with a unique identifier Deleting an element adds it to the remove-set for all the (element,id) in the add-set Merge: Take the union of the add-sets and remove-sets Lookup: Contains the element if there is an element in the add-set with an identifier that is not in the remove-set Doesn't work for us, because it doesn't support updates
  35. CRDT WHICH ONE: OR-SET

    A: Add-Set {(#a,"cat"), (#b,"dog")}, Remove-Set {(#a,"cat")}
    B: Add-Set {(#c,"cat"), (#d,"ape")}, Remove-Set {(#a,"cat")}
    Merge: Add-Set {(#a,"cat"), (#c,"cat"), (#b,"dog"), (#d,"ape")}, Remove-Set {(#a,"cat")}
    Lookup: {"cat", "dog", "ape"}
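
The OR-Set behaviour from slide 34 can be sketched like this. In this sketch the caller supplies the unique identifier (in practice it could be a UUID); that, and all names, are assumptions made for illustration.

```scala
// Observed-remove set: every add carries a unique id.
final case class OrSet[A](adds: Set[(String, A)] = Set.empty[(String, A)],
                          removes: Set[(String, A)] = Set.empty[(String, A)]) {
  def add(id: String, e: A): OrSet[A] = copy(adds = adds + (id -> e))

  // Removing tombstones every (id, element) pair observed locally for `e`.
  def remove(e: A): OrSet[A] =
    copy(removes = removes ++ adds.filter(_._2 == e))

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(that: OrSet[A]): OrSet[A] =
    OrSet(adds ++ that.adds, removes ++ that.removes)

  // Lookup: elements with at least one id that has not been tombstoned.
  def lookup: Set[A] = (adds -- removes).map(_._2)
}
```

Replica A removes "cat" (#a) while replica B concurrently adds "cat" (#c); after the merge "cat" is still present because #c was never observed by the remove.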
  36. CRDT OUR-SET Combination of all the sets Store elements with

    a unique identifier This identifier is actually used to identify an element Element can be changed if identifier remains the same Updates are possible! Store elements with a timestamp Is updated on any change
  37. CRDT OUR-SET Single Set No add-set and removed sets Replaced

    by a removed flag Set can only contain one element with a particular id Element with highest timestamp wins Merge: Take union of the two sets and for every element with the same identifier take only the highest one Lookup: Contains the element if there is an element with the same id and the removed flag is not set
  38. CRDT OUR-SET

    A: Set {(#a,1,"cat",removed), (#b,2,"dog",removed)}
    B: Set {(#a,5,"tiger"), (#c,1,"ape"), (#b,1,"dog")}
    Merge: Set {(#a,5,"tiger"), (#b,2,"dog",removed), (#c,1,"ape")}
    Lookup: {"tiger", "ape"}
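
A minimal sketch of the OUR-set idea from slides 36 to 38, assuming string values and `Long` timestamps. `OurEntry` and `OurSet` are hypothetical names, not the NavCloud classes. The ordering compares timestamp first, then the removed flag, then the value, so that two replicas always pick the same winner for an id.

```scala
// One element of the set: identity, logical timestamp, payload, tombstone flag.
final case class OurEntry(id: String, timestamp: Long, value: String,
                          removed: Boolean = false)

object OurEntry {
  // Total order used to pick a winner per id (assumed priority of fields).
  val ordering: Ordering[OurEntry] =
    Ordering.by((e: OurEntry) => (e.timestamp, e.removed, e.value))
}

final case class OurSet(entries: Map[String, OurEntry] = Map.empty[String, OurEntry]) {
  // Adding, updating and removing are all "put an entry with a newer timestamp".
  def put(e: OurEntry): OurSet = merge(OurSet(Map(e.id -> e)))

  // Merge: union of both sets, keeping only the highest entry per id.
  def merge(that: OurSet): OurSet =
    OurSet(that.entries.foldLeft(entries) { case (acc, (id, e)) =>
      acc.updated(id, acc.get(id).fold(e)(OurEntry.ordering.max(_, e)))
    })

  // Lookup: values of entries whose removed flag is not set.
  def lookup: Set[String] =
    entries.values.filterNot(_.removed).map(_.value).toSet
}
```

Replaying slide 38: #a is updated to "tiger" at timestamp 5 and beats the removal at timestamp 1, while the removal of #b at timestamp 2 beats the older live copy, so the lookup is {"tiger", "ape"}.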
  39. IMPLEMENTATION CRDT MODEL: FAVORITE ID to uniquely identify a favorite

    Timestamp to indicate when the last change was made Removed Flag to indicate that the favorite has been removed Name Latitude/Longitude ...
  40. IMPLEMENTATION METHODS Add a compare function, which compares all the

    fields in order of priority: Timestamp Removed flag ... Add an equals and hash function
  41. IMPLEMENTATION USING THE CRDT Use the same algorithm everywhere As

    simple as calling the merge function
  42. IMPLEMENTATION USING THE CRDT

    // Synchronize doesn't return anything because client is already synced
    // This is purely for the server and database
    def synchronize(fromClient: CRDTSet, database: CRDTcomponent): Unit = {
      val changedSet = fromClient
      val currentSet = database.crdtset
      val newSet = currentSet.merge(changedSet)
      database.push(newSet) // This is fire and forget
    }
  43. CONSIDERATIONS & LIMITATIONS

  44. "WHAT ABOUT GARBAGE?" CRDTs tend to grow because of tombstones

    Deleted Favorite in the Set == Tombstone Potentially unbounded growth. Case: MyDrive user with ~3000 deleted favorites and 5 non-deleted ones. -> 1 MB Favorites.json
  45. "WHAT ABOUT GARBAGE?" Solution #1: Prune deleted favorites But when?

    Requirement: all nodes holding a Favorites set should have seen a deleted element before it can be pruned. Otherwise deleted elements can be resurrected.
  46. "WHAT ABOUT GARBAGE?" Client-awareness: capturing a timestamp of the last

    sync between a client and the service.

    if (clients.forall(_.lastSyncTimestamp > deletedFavorite.lastUpdatedTimestamp)) {
      favorites.drop(deletedFavorite)
    }
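
The pruning rule can be written as a pure function over the favorites set. This is only a sketch: `Client`, `Fav` and `Pruning.prune` are hypothetical names, and it assumes the service knows every client that holds a replica.

```scala
final case class Client(id: String, lastSyncTimestamp: Long)
final case class Fav(id: String, lastUpdatedTimestamp: Long, removed: Boolean)

object Pruning {
  // Drops a tombstone only if every known client synced after the deletion.
  // Caveat: with an empty client list `forall` is true, so all tombstones go.
  def prune(favorites: Set[Fav], clients: Seq[Client]): Set[Fav] =
    favorites.filterNot { f =>
      f.removed && clients.forall(_.lastSyncTimestamp > f.lastUpdatedTimestamp)
    }
}
```

A tombstone newer than the slowest client's last sync is kept, because that client could otherwise send its stale live copy back and resurrect the favorite.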
  47. "WHAT ABOUT GARBAGE?" Solution #2: Sending only diff upon any

    update. Client has a set of [A', B', C']; Server has a set of [A'', B''', C']. Client modifies and sends [B''] Before: responding with a full merged set [A'', B''', C']. We introduced a scoped diff: Now: responding with a diff set [B'''] as B'' update from the client has lost to B''' on the server.
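
One way to read the scoped diff is: merge what the client sent, then answer only with those entries where the server copy differs from the client's copy. The sketch below assumes the number of primes on the slide corresponds to a timestamp and that the server copy wins on equal timestamps; `Versioned` and `ScopedDiff` are hypothetical names.

```scala
final case class Versioned(id: String, timestamp: Long, value: String)

object ScopedDiff {
  // Returns (new server state, entries to send back to the client).
  def sync(server: Map[String, Versioned],
           fromClient: Seq[Versioned]): (Map[String, Versioned], Seq[Versioned]) = {
    val merged = fromClient.foldLeft(server) { (acc, c) =>
      acc.get(c.id) match {
        case Some(s) if s.timestamp >= c.timestamp => acc // server copy wins
        case _                                     => acc.updated(c.id, c)
      }
    }
    // Only ids the client touched, and only where it lost the merge.
    val diff = fromClient.flatMap(c => merged.get(c.id).filter(_ != c))
    (merged, diff)
  }
}
```

If the client sends B at version 2 while the server holds version 3, the response contains just the server's B; if the client's B wins, the response is empty.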
  48. TROUBLE WITH TIME There is no such thing as reliable

    time. (c) Jonas Bonér, "Life Beyond the Illusion of Present" Important: Causality and events Ordering. “ Tracking time is actually tracking causality. ”
  49. TROUBLE WITH TIME A time that is just good enough.

    Ordering updates between different nodes: If GPS clock is available -> use it (PND case). Prefer the server time to a client local time. WARN: conflicts may happen if two or more devices are modifying the same Favorite element concurrently.
  50. TROUBLE WITH TIME Ordering updates within a node boundary: Timestamp

    field as a logical clock. Timestamp should always grow monotonically. "+1 Strategy"

    def getFavoriteTimestamp(favorite: Favorite): Long = {
      Math.max(client.retrieveServerTimestamp(), favorite.lastModified + 1)
    }
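
The "+1 strategy" is easier to see as a pure function. In this sketch `serverNow` stands in for `client.retrieveServerTimestamp()`, and `FavoriteClock` is a hypothetical name.

```scala
object FavoriteClock {
  // Never goes backwards: at least one tick past the previous modification,
  // even if the server clock is behind or has not advanced.
  def next(serverNow: Long, lastModified: Long): Long =
    math.max(serverNow, lastModified + 1)
}
```

When the server time is ahead it is used as is; when it equals or trails the last modification, the timestamp still advances by one, so consecutive local edits remain strictly ordered.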
  51. ONE 'MERGE' TO RULE THEM ALL Client and server should

    behave the same way when merging Favorites CRDT states. == When given the same input, their merge functions should emit the same results. WARN: divergence can lead to endless synchronisation loops!
  52. ONE 'MERGE' TO RULE THEM ALL Sharing common CRDT-related code

    (Classes, merge/diff/equals/compare logic) FTW. Case #1: Scala.JS client with Web Server in Scala ("Towards Browser and Server Utopia with Scala.JS", Richard Dallaway). Case #2: Using a TCK library to verify client compatibility.
  53. RIAK & CRDTS Data Agnostic vs Data Awareness Counters Sets

    Maps Flags * Registers * * - Embedded within a Map only
  54. RIAK & CRDTS Pros Simplicity. No 'Read -> Merge ->

    Write' code is needed on the server. Composability: Most of the data can be modelled by combining supported primitive types. Proven and tested*. * - Basho is serious about testing their stuff: "Distributed data structures with Coq", Christopher Meiklejohn.
  55. RIAK & CRDTS Cons No fine-grained merge: lack of merge

    strategy control on the server. Client complexity: clients have to carry a Data Type context (a-la 'causal context' with Vector Clocks). Riak 2.0+ only.* * - For those who are still on Riak 1.4. ^_^
  56. CONCLUSIONS Academia sometimes is not as scary as it seems

    to pragmatic devs. Look for the best & simplest solutions. Understand the limitations of your solution. Analyse and monitor real usage. Always look for ways to tune & improve your solutions.
  57. USEFUL REFERENCES

    "CRDTs for fun and eventual profit" - Noel Welsh, 2013.
    "Readings in conflict-free replicated data types", Christopher Meiklejohn
    "A comprehensive study of Convergent and Commutative Replicated Data Types" - Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011
    "Lasp: A language for distributed, coordination-free programming" - Meiklejohn & Van Roy, 2015
  58. Image credit: Dex Media

  59. WE ARE HIRING! Interested in hacking on this stuff? http://www.tomtom.jobs/