Building a collaborative authoring experience with CRDTs

Do you intend to provide a collaborative and distributed authoring experience to your users?

Join our speaker Alexandre Piveteau to learn the fundamentals, key properties, and use cases of CRDTs, data structures that automatically resolve synchronization conflicts.

Discover Markdown Party, Alexandre's peer-to-peer collaborative web text editor, which is highly available, eventually consistent, and enables its users to work concurrently on a tree of documents.

Learn how to adapt and design CRDTs that match your application's domain-specific rules.

Alexandre Piveteau

November 07, 2022

Transcript

  3. REQUIREMENTS
     Functional
     • Web client
     • Unique project-specific URL for sharing
     • Tree of Markdown text documents
     Non-functional
     • Edits possible when the network is partitioned
     • Distributed protocol
     • Deleting unnecessary meta-data
  4. EVENTUAL CONVERGENCE
     Initial text: Thanks …
     Local edits
     • Site 1: Thanks Alice,
     • Site 2: Thanks Bob,
  7. EVENTUAL CONVERGENCE
     Initial text: Thanks …
     Local edits
     • Site 1: Thanks Alice,
     • Site 2: Thanks Bob,
     After merge
     • Site 1: Thanks Alice, Bob,
     • Site 2: Thanks Alice, Bob,
  8. EVENTUAL CONVERGENCE
     Initial text: Thanks …
     Local edits
     • Site 1: Thanks Alice,
     • Site 2: Thanks Bob,
     After merge
     • Site 1: Thanks Alice, Bob,
     • Site 2: Thanks Bob, Alice,
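The two merge outcomes on the slides above can be reproduced with a small sketch. It assumes each edit is tagged with a site identifier; the function names `merge_own_first` and `merge_by_site` are hypothetical and only illustrate why a deterministic tie-break is needed for convergence.

```python
from typing import List, Tuple

Edit = Tuple[int, str]  # (site id, inserted text); hypothetical representation


def merge_own_first(local: Edit, remote: Edit) -> str:
    """Each site keeps its own edit first: the result depends on the site."""
    return "Thanks " + local[1] + remote[1]


def merge_by_site(edits: List[Edit]) -> str:
    """Every site orders concurrent edits by site id: same result everywhere."""
    return "Thanks " + "".join(text for _, text in sorted(edits))


alice: Edit = (1, "Alice, ")
bob: Edit = (2, "Bob, ")

# Diverging replicas: each site sees a different final text.
site1_naive = merge_own_first(alice, bob)
site2_naive = merge_own_first(bob, alice)

# Converging replicas: the arrival order of the edits no longer matters.
site1 = merge_by_site([alice, bob])
site2 = merge_by_site([bob, alice])
```

With the deterministic rule, both sites end up with the same text even though they received the edits in a different order.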
  9. PRESERVING USER INTENT
     Concurrent edits
     • Site 1: Thanks Alice, Bob, – del. Alice@[7:13]
     • Site 2: Thanks Alice, Bob, – del. Alice@[7:13]
     Possible results
     • Expected: “Thanks Bob,”
     • Unexpected: “Thanks” – range [7:13] removed twice
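As a rough illustration of the double-removal problem, the sketch below compares deleting by index range with deleting by character identity. The `(site, seqno)` identifiers and the helper names are assumptions made for the example, not taken from the talk.

```python
from typing import List, Set, Tuple

CharId = Tuple[int, int]  # (site, seqno); hypothetical identifier scheme


def delete_by_range(text: str, start: int, end: int) -> str:
    """Index-based deletion: applying it twice removes unrelated text."""
    return text[:start] + text[end:]


def render(chars: List[Tuple[CharId, str]], removed: Set[CharId]) -> str:
    """Identity-based deletion: characters whose id is removed are hidden."""
    return "".join(ch for cid, ch in chars if cid not in removed)


text = "Thanks Alice, Bob,"

# Both sites delete [7:13]; replaying both operations by index removes too much.
once = delete_by_range(text, 7, 13)
twice = delete_by_range(once, 7, 13)

# With identifiers, the two concurrent deletions target the same characters,
# so merging them is a set union and the second one changes nothing.
chars = [((0, i), ch) for i, ch in enumerate(text)]
target = {cid for cid, _ in chars[7:13]}
removed: Set[CharId] = set()
removed |= target  # deletion from site 1
removed |= target  # same deletion from site 2
```

Here `twice` has lost "Bob," as well, while `render(chars, removed)` still matches the result of a single deletion.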
  10. CAP THEOREM¹
      “Any distributed data store can provide only two of the following three guarantees:
      • Consistency: all participants see the most recent data;
      • Availability: every request receives a response;
      • Partition-tolerance: the system continues to operate despite an arbitrary number of messages being lost.”
      ¹ See https://en.wikipedia.org/wiki/CAP_theorem
  11. OPERATIONAL TRANSFORMATION
      Operational transformation
      • Server transforms operations before forwarding them
      • Centralized server ⇒ Single point of failure
      • Notoriously difficult to implement
      Figure 4: Centralized OT
  12. CONFLICT-FREE REPLICATED DATA TYPES
      Characterizing CRDTs
      • Exist as CvRDTs and CmRDTs
      • CvRDTs have a function m(s₁, s₂) with the 3 following properties:
        • Associativity – m(m(a,b),c) = m(a,m(b,c))
        • Commutativity – m(a,b) = m(b,a)
        • Idempotence – m(a,a) = a
      • All states sᵢ form a semi-lattice
      • CRDTs are composable
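The three merge properties listed above can be checked mechanically on a small example. The sketch below uses a grow-only set whose merge function is set union; `check_laws` is a hypothetical helper that brute-forces the laws over a handful of sample states.

```python
from itertools import product
from typing import Callable, FrozenSet, Sequence


def merge(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    """Merge function m(s1, s2) of a grow-only set: plain set union."""
    return a | b


def check_laws(m: Callable, states: Sequence) -> bool:
    """Brute-force check of associativity, commutativity and idempotence."""
    associative = all(
        m(m(a, b), c) == m(a, m(b, c)) for a, b, c in product(states, repeat=3)
    )
    commutative = all(m(a, b) == m(b, a) for a, b in product(states, repeat=2))
    idempotent = all(m(a, a) == a for a in states)
    return associative and commutative and idempotent


states = [
    frozenset(),
    frozenset({"x"}),
    frozenset({"y"}),
    frozenset({"x", "y"}),
]

union_is_valid = check_laws(merge, states)
# String concatenation is associative but neither commutative nor idempotent.
concat_is_valid = check_laws(lambda a, b: a + b, ["a", "b"])
```

Set union passes all three checks, whereas concatenation does not, which is why it cannot serve as a CvRDT merge function on its own.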
  13. REPLICATED EVENT LOG
      • Operations are totally ordered
      • State is computed by applying operations
      • Deltas are stored to revert operations
      Figure 6: “Accordion-like” movement to insert operations
  14. REPLICATED EVENT LOG
      Totally causally ordered log
      Each event has a globally unique Lamport (site, seqno) timestamp.
      • Timestamps define a total order for all events
      • All events respect causal ordering
      • The log is just a set (GSet) of timestamped events ⇒ it’s a CvRDT!
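A minimal sketch of such a log follows, assuming that events are compared by sequence number first and by site identifier as a tie-break (the slide only states that the timestamp is a `(site, seqno)` pair). The `Event` and `Site` classes are hypothetical names for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True, order=True)
class Event:
    # Field order gives the comparison order: seqno first, then site.
    # (seqno, site) is unique, so the payload never decides the ordering.
    seqno: int
    site: int
    payload: str = ""


class Site:
    def __init__(self, site_id: int) -> None:
        self.site_id = site_id
        self.clock = 0
        self.log: FrozenSet[Event] = frozenset()

    def emit(self, payload: str) -> Event:
        """Create a local event with the next Lamport sequence number."""
        self.clock += 1
        event = Event(self.clock, self.site_id, payload)
        self.log |= {event}
        return event

    def merge(self, other_log: FrozenSet[Event]) -> None:
        """GSet merge (union), then advance the clock past everything seen."""
        self.log |= other_log
        self.clock = max([self.clock] + [e.seqno for e in other_log])

    def ordered(self) -> List[Event]:
        """Total order over all known events."""
        return sorted(self.log)


a, b = Site(1), Site(2)
a.emit("x")
b.emit("y")
a.merge(b.log)          # a has now seen b's event
later = a.emit("z")     # causally after both earlier events
b.merge(a.log)
```

After the exchange, both sites hold the same set of events and therefore compute the same total order, with `later` sorted after the events it causally depends on.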
  16. REPLICATION ALGORITHM
      Log replication
      1. A sends the list of all known participants Iᵢ.
      2. B sends, for each participant Iᵢ, the next expected sequence number Sᵢ and the number Nᵢ ≥ 0 of expected events.
      3. While Nᵢ > 0, A sends the events (Iᵢ, Sⱼ) with Sⱼ > Sᵢ and Sⱼ minimal, decrements Nᵢ and sets Sᵢ = Sⱼ.
      4. Replication stops when A or B disconnects.
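The steps above can be sketched as a single in-memory exchange. This is only an illustration under stated assumptions: sequence numbers start at 1, Sᵢ is interpreted as the highest sequence number B already holds for participant Iᵢ (so that "Sⱼ > Sᵢ" selects new events), and A stops instead of waiting when it has nothing left to send. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, int, str]  # (participant, seqno, payload)


def known_participants(log: Set[Event]) -> List[str]:
    """Step 1: A lists every participant it has events for."""
    return sorted({site for site, _, _ in log})


def make_request(
    b_log: Set[Event], participants: List[str], credit: int
) -> Dict[str, Tuple[int, int]]:
    """Step 2: B answers with (S_i, N_i) for each participant."""
    request = {}
    for site in participants:
        last_seen = max((seq for s, seq, _ in b_log if s == site), default=0)
        request[site] = (last_seen, credit)
    return request


def serve(a_log: Set[Event], request: Dict[str, Tuple[int, int]]) -> List[Event]:
    """Step 3: A sends events in increasing seqno order while credit remains."""
    sent = []
    for site, (s_i, n_i) in request.items():
        while n_i > 0:
            newer = [e for e in a_log if e[0] == site and e[1] > s_i]
            if not newer:
                break  # a real peer would wait for new events here
            event = min(newer, key=lambda e: e[1])
            sent.append(event)
            n_i -= 1
            s_i = event[1]
    return sent


a_log = {("a", 1, "x"), ("a", 2, "y"), ("b", 1, "z")}
b_log = {("a", 1, "x")}

request = make_request(b_log, known_participants(a_log), credit=10)
sent = serve(a_log, request)
b_log |= set(sent)
```

The credit Nᵢ acts as flow control: B decides how many events it is willing to receive per participant before it has to ask again.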
  17. TREES – OPERATIONS
      Type              Arguments
      Create text file  –
      Create folder     –
      Move              File: file op. ref.; To Parent: file op. ref.
      Set name          File: file op. ref.; Name: text
      Delete            File: file op. ref.
      Table 1: Operations supported by replicated trees
  18. TREES – CONCURRENT UPDATES
      Figure 7: Conflicting concurrent Moves
      Locally valid operations may produce graphs that don’t respect tree invariants.
  19. TREES – CONCURRENT UPDATES
      Figure 7: Invalid operations are ignored
      Locally valid operations may produce graphs that don’t respect tree invariants.
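One way to picture "invalid operations are ignored" is to replay Move operations in the log's total order and skip any move that would create a cycle. The sketch below assumes a tree stored as a child-to-parent map; `apply_moves` and `is_ancestor` are hypothetical helpers, not the talk's implementation.

```python
from typing import Dict, List, Optional, Tuple

Parents = Dict[str, Optional[str]]  # child -> parent, None for the root


def is_ancestor(parents: Parents, ancestor: str, node: Optional[str]) -> bool:
    """True if `ancestor` is `node` itself or lies on its path to the root."""
    while node is not None:
        if node == ancestor:
            return True
        node = parents.get(node)
    return False


def apply_moves(parents: Parents, moves: List[Tuple[str, str]]) -> Parents:
    """Replay (file, new_parent) moves in order, ignoring invalid ones."""
    result = dict(parents)
    for file, new_parent in moves:
        if is_ancestor(result, file, new_parent):
            continue  # moving a node under its own subtree would form a cycle
        result[file] = new_parent
    return result


tree: Parents = {"root": None, "a": "root", "b": "root"}

# Site 1 moves a under b while site 2 concurrently moves b under a.
# Each move is valid locally, but together they would form a cycle.
ordered_moves = [("a", "b"), ("b", "a")]
final = apply_moves(tree, ordered_moves)
```

Because every site replays the same totally ordered log, every site skips the same move and ends up with the same valid tree.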
  20. REPLICATED GROWABLE ARRAY
      RGAs – operations
      Type    Arguments
      Insert  Prev: text op. ref.; Char: letter
      Remove  Id: text op. ref.
      Table 2: Operations on RGA
  21. REPLICATED GROWABLE ARRAY
      RGAs – operations
      Type    Arguments
      Insert  Prev: text op. ref.; Char: letter
      Remove  Id: text op. ref.
      Table 2: Operations on RGA
      Figure 9: RGA structure
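The two operations in Table 2 can be sketched with a simplified RGA. The example assumes operation identifiers are `(seqno, site)` Lamport pairs and that operations are delivered in causal order; the `RGA` class and its method names are hypothetical.

```python
from typing import List, Optional, Tuple

OpId = Tuple[int, int]  # (seqno, site); assumed Lamport-style identifier


class RGA:
    def __init__(self) -> None:
        # Each entry is [op id, character, removed flag (tombstone)].
        self.items: List[list] = []

    def _index(self, op_id: OpId) -> int:
        return next(i for i, item in enumerate(self.items) if item[0] == op_id)

    def insert(self, op_id: OpId, prev: Optional[OpId], char: str) -> None:
        """Insert `char` after the element created by `prev` (None = start)."""
        idx = 0 if prev is None else self._index(prev) + 1
        # Skip concurrent inserts with a larger id so all replicas agree.
        while idx < len(self.items) and self.items[idx][0] > op_id:
            idx += 1
        self.items.insert(idx, [op_id, char, False])

    def remove(self, target: OpId) -> None:
        """Mark the element as removed; it stays as a position anchor."""
        self.items[self._index(target)][2] = True

    def text(self) -> str:
        return "".join(ch for _, ch, removed in self.items if not removed)


# Both replicas know "a"; sites 1 and 2 then insert concurrently after it.
x, y = RGA(), RGA()
for replica in (x, y):
    replica.insert((1, 1), None, "a")

x.insert((2, 1), (1, 1), "b")
x.insert((2, 2), (1, 1), "c")
y.insert((2, 2), (1, 1), "c")  # same operations, opposite arrival order
y.insert((2, 1), (1, 1), "b")
before_removal = (x.text(), y.text())

for replica in (x, y):
    replica.remove((2, 2))
```

Removed characters are kept as tombstones so that later inserts referencing them through `Prev` can still find their position.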
  22. HYBRID LOGICAL CLOCKS
      We can improve concurrent updates using HLCs:
      • Specialization of Lamport timestamps
      • Physical time is used to order concurrent events
      • Happens-before relationship preserved, as with Lamport timestamps
      • Implemented at the event log level
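A hybrid logical clock can be sketched as a pair (l, c), where l tracks the largest physical time seen so far and c is a counter that breaks ties. The update rules below follow the commonly described HLC algorithm; the class name and the injected `physical_time` callable are assumptions for the example, and a site identifier would still be needed as a final tie-break for a total order.

```python
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (logical time l, counter c)


class HybridLogicalClock:
    def __init__(self, physical_time: Callable[[], int]) -> None:
        self._pt = physical_time
        self.l = 0
        self.c = 0

    def now(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        previous = self.l
        self.l = max(previous, self._pt())
        self.c = self.c + 1 if self.l == previous else 0
        return (self.l, self.c)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Timestamp the reception of a message carrying `remote`."""
        r_l, r_c = remote
        previous = self.l
        self.l = max(previous, r_l, self._pt())
        if self.l == previous and self.l == r_l:
            self.c = max(self.c, r_c) + 1
        elif self.l == previous:
            self.c += 1
        elif self.l == r_l:
            self.c = r_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


wall = [10]  # mutable stand-in for the wall clock
clock = HybridLogicalClock(lambda: wall[0])

first = clock.now()
second = clock.now()              # same physical time: the counter grows
received = clock.receive((12, 3)) # remote clock is ahead of ours
wall[0] = 11
after = clock.now()               # never goes back below the remote time
```

Timestamps compare as tuples, so every event is ordered after the events that happened before it, even when the local wall clock lags behind a remote one.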
  23. COMPRESSION USING RUN-LENGTH ENCODING
      Site  Seq. no  Letter
      1     1        'a'
      1     2        'l'
      1     3        'e'
      1     4        'x'
      Table 3: COMPRESSION USING RLE
  24. COMPRESSION USING RUN-LENGTH ENCODING
      Site  ∆ seq. no  Letter
      1     (+1)       'a'
      1     (+1)       'l'
      1     (+1)       'e'
      1     (+1)       'x'
      Table 3: COMPRESSION USING RLE
  25. COMPRESSION USING RUN-LENGTH ENCODING
      Site  First seq. no  Text
      1     1              "alex"
      Table 3: COMPRESSION USING RLE
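The progression shown in the three tables can be reproduced with a short sketch: consecutive operations from the same site with consecutive sequence numbers collapse into a single run. The `compress` and `expand` helpers are hypothetical and are not Automerge's actual encoding.

```python
from typing import List, Tuple

Op = Tuple[int, int, str]   # (site, seqno, letter)
Run = Tuple[int, int, str]  # (site, first seqno, text)


def compress(ops: List[Op]) -> List[Run]:
    """Collapse operations with consecutive seqnos from one site into runs."""
    runs: List[Run] = []
    for site, seq, letter in ops:
        if runs:
            last_site, first_seq, text = runs[-1]
            if last_site == site and first_seq + len(text) == seq:
                runs[-1] = (site, first_seq, text + letter)
                continue
        runs.append((site, seq, letter))
    return runs


def expand(runs: List[Run]) -> List[Op]:
    """Inverse of compress: recover one operation per letter."""
    return [
        (site, first_seq + offset, letter)
        for site, first_seq, text in runs
        for offset, letter in enumerate(text)
    ]


ops = [(1, 1, "a"), (1, 2, "l"), (1, 3, "e"), (1, 4, "x")]
runs = compress(ops)
```

The encoding is lossless: `expand(runs)` gives back the original list of operations.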
  26. COMPRESSION USING RLE – DETAILS
      • Implemented in Automerge²
      • Not compatible with HLC
      • Especially relevant for text
        • Fixed-size operations
        • Most operations are consecutive
      ² Martin Kleppmann, github.com/automerge
  27. SUMMARY
      • A replicated log can implement any CRDT
      • Invariants can be enforced by skipping ops
      • Hybrid logical clocks
      • Lamport timestamps run-length encoding
      • Many paths to explore!
  28. THANK YOU!
      Me: Alexandre Piveteau ([email protected])
      Live demo: https://markdown.party
      GitHub: https://github.com/markdown-party
      More about CRDTs: https://crdt.tech