CRDTs: The Hard Parts

Slides from a talk given at the Hydra distributed computing conference on 6 July 2020.
https://martin.kleppmann.com/2020/07/06/crdt-hard-parts-hydra.html
https://hydraconf.com/2020/msk/talks/3mkcfa5h151ekfvfqau4qk/

Abstract:

Conflict-free Replicated Data Types (CRDTs) are an increasingly popular family of algorithms for optimistic replication. They allow data to be concurrently updated on several replicas, even while those replicas are offline, and provide a robust way of merging those updates back into a consistent state. CRDTs are used in geo-replicated databases, multi-user collaboration software, distributed processing frameworks, and various other systems.

However, while the basic principles of CRDTs are now quite well known, many challenging problems are lurking below the surface. It turns out that CRDTs are easy to implement badly. Many published algorithms have anomalies that cause them to behave strangely in some situations. Simple implementations often have terrible performance, and making the performance good is challenging.

In this talk Martin goes beyond the introductory material on CRDTs, and discusses some of the hard-won lessons from years of research on making CRDTs work in practice.

Bio:

Dr Martin Kleppmann is a researcher in distributed systems at the University of Cambridge, and author of the acclaimed "Designing Data-Intensive Applications" (O'Reilly Media, 2017). He mainly works on collaboration software, CRDTs, and formal verification of distributed algorithms. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure.

Martin Kleppmann

July 06, 2020

Transcript

  35. List (sequence, array) CRDTs: WOOT, Treedoc, Logoot,
    RGA, Causal Trees, LSEQ, …
    Emulating “move” as “delete-and-reinsert”:
    concurrent moves of the same item
    → duplication

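The duplication described on this slide can be reproduced with a toy model. The sketch below is an illustration only and is not one of the listed algorithms: the `Replica` class, its methods, and the numeric sort keys are hypothetical stand-ins for a real list CRDT's element IDs and positions, and the "water plants" item is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replicated list: inserted elements plus a set of tombstones."""
    name: str
    inserted: dict = field(default_factory=dict)  # element id -> (sort key, value)
    deleted: set = field(default_factory=set)     # ids of deleted elements
    counter: int = 0

    def insert(self, sort_key, value):
        self.counter += 1
        elem_id = (self.counter, self.name)       # unique per replica
        self.inserted[elem_id] = (sort_key, value)
        return elem_id

    def delete(self, elem_id):
        self.deleted.add(elem_id)

    def move(self, elem_id, new_sort_key):
        # "Move" emulated as delete-and-reinsert: the copy gets a fresh id.
        _, value = self.inserted[elem_id]
        self.delete(elem_id)
        return self.insert(new_sort_key, value)

    def merge(self, other):
        self.inserted.update(other.inserted)
        self.deleted |= other.deleted
        self.counter = max(self.counter, other.counter)

    def values(self):
        live = [(key, eid, value)
                for eid, (key, value) in self.inserted.items()
                if eid not in self.deleted]
        return [value for _, _, value in sorted(live)]


a = Replica("a")
milk = a.insert(1.0, "buy milk")
plants = a.insert(2.0, "water plants")
joe = a.insert(3.0, "phone joe")
b = Replica("b")
b.merge(a)

a.move(joe, 0.5)  # replica a moves "phone joe" to the head of the list
b.move(joe, 1.5)  # replica b concurrently moves it to just after "buy milk"

a.merge(b)
b.merge(a)
print(a.values())
```

Both replicas converge, but to a list that contains "phone joe" twice: each move deleted the same original element and inserted its own new copy, and the merge keeps both copies.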

  36. Concurrent move of the same item to different
    positions — what should happen?

  37. Converge to one of the destination positions
    (pick one arbitrarily but deterministically)

  38. “pick one arbitrarily” = last-writer-wins register!
    One replica: pos[“phone joe”] := “head of the list”
    Another replica: pos[“phone joe”] := “after buy milk”
    After merge: pos[“phone joe”] == “head of the list”

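A last-writer-wins register can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the slides: the `LWWRegister` class below, its integer timestamps, and the tie-break on replica ID are all hypothetical choices, and the timestamps are picked so that the outcome matches the one shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: int   # assumed logical clock value of the write
    replica_id: str  # assumed tie-breaker for equal timestamps

    def merge(self, other):
        # Keep the write with the highest (timestamp, replica_id) pair.
        # The choice is arbitrary but deterministic, so every replica
        # that sees both writes picks the same winner.
        return max(self, other, key=lambda r: (r.timestamp, r.replica_id))


head = LWWRegister("head of the list", timestamp=5, replica_id="a")
after_milk = LWWRegister("after buy milk", timestamp=4, replica_id="b")

merged = head.merge(after_milk)
print(merged.value)
```

Because `merge` is a maximum over a total order, it is commutative, associative, and idempotent, which is what lets the replicas merge in any order.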

  39. List CRDT with move operation
    pos[“phone joe”] := “after buy milk”
    need one register per list item:
    state = AWSet({ (v1, LWWRegister(p1)), (v2, LWWRegister(p2)), … })
    need a stable way of referencing list positions:
    Treedoc: path through binary tree
    Logoot: list of (integer, replicaID) pairs
    RGA: s4vector
    Causal Trees: logical timestamp
    etc…
    Composition of any list CRDT + AWSet + LWWRegister = another CRDT

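The composition on this slide can be sketched as a map from item ID to a value plus a last-writer-wins position register. The code below is a simplified illustration under stated assumptions: `MovableList` and `merge_registers` are hypothetical names, a `Fraction` stands in for the stable position identifier that a real list CRDT would supply, and item removal (the part the AWSet is responsible for) is left out.

```python
from fractions import Fraction


def merge_registers(r1, r2):
    # A register is (timestamp, replica_id, position); the write with
    # the highest (timestamp, replica_id) wins.
    return max(r1, r2, key=lambda r: (r[0], r[1]))


class MovableList:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.clock = 0
        self.items = {}  # item id -> (value, position register)

    def _tick(self):
        self.clock += 1
        return self.clock

    def insert(self, value, position):
        ts = self._tick()
        item_id = (ts, self.replica_id)
        self.items[item_id] = (value, (ts, self.replica_id, position))
        return item_id

    def move(self, item_id, position):
        # A move only overwrites the item's position register;
        # the item keeps its identity, so it cannot be duplicated.
        value, _ = self.items[item_id]
        self.items[item_id] = (value, (self._tick(), self.replica_id, position))

    def merge(self, other):
        for item_id, (value, reg) in other.items.items():
            if item_id in self.items:
                reg = merge_registers(self.items[item_id][1], reg)
            self.items[item_id] = (value, reg)
        self.clock = max(self.clock, other.clock)

    def values(self):
        ordered = sorted(self.items.items(),
                         key=lambda kv: (kv[1][1][2], kv[0]))
        return [value for _, (value, _) in ordered]


a = MovableList("a")
a.insert("buy milk", Fraction(1))
a.insert("water plants", Fraction(2))
joe = a.insert("phone joe", Fraction(3))
b = MovableList("b")
b.merge(a)

a.move(joe, Fraction(1, 2))  # to the head of the list
b.move(joe, Fraction(3, 2))  # to just after "buy milk"

a.merge(b)
b.merge(a)
print(a.values())
```

In this run both moves carry timestamp 4, so the replica ID breaks the tie and replica b's position wins on both sides; "phone joe" appears exactly once, after "buy milk".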

  40. Moving ranges of elements

  42. Desired outcome

  43. Actual outcome
    Fixing this: an open problem!

  46. Concurrent moves of same node

  50. Moving A into B, and B into A

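One way to see the problem with moving A into B while B is concurrently moved into A: applied naively, the two moves make A and B each other's parent and detach both from the root. The sketch below shows one possible safeguard and is a simplified assumption, not the algorithm from the move-operation paper listed under related publications: every replica replays all known moves in timestamp order and skips any move that would create a cycle. The names `is_ancestor` and `apply_moves` and the `(counter, replica_id)` timestamps are hypothetical.

```python
def is_ancestor(parent_of, ancestor, node):
    """Return True if `ancestor` lies on the path from `node` up to the root."""
    while node in parent_of:
        node = parent_of[node]
        if node == ancestor:
            return True
    return False


def apply_moves(initial_parent_of, moves):
    """Replay moves in timestamp order, skipping any that would form a cycle.

    Each move is a tuple (timestamp, child, new_parent).
    """
    parent_of = dict(initial_parent_of)
    for _timestamp, child, new_parent in sorted(moves):
        if child == new_parent or is_ancestor(parent_of, child, new_parent):
            continue  # new_parent is inside child's subtree: skip this move
        parent_of[child] = new_parent
    return parent_of


# A and B start out as siblings under the root.
tree = {"A": "root", "B": "root"}
move_a_into_b = ((1, "r1"), "A", "B")
move_b_into_a = ((1, "r2"), "B", "A")

# The two replicas receive the moves in different orders.
replica_1 = apply_moves(tree, [move_a_into_b, move_b_into_a])
replica_2 = apply_moves(tree, [move_b_into_a, move_a_into_b])
print(replica_1)
```

Because both replicas sort the moves the same way, they agree on which move takes effect (here A ends up under B, and the move of B into A is skipped), and the tree stays connected to the root.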

  67. Compressing CRDT metadata in Automerge

                                                     File size            File size (gzipped)
    Full document history, JSON format               146,406,415 bytes    6,132,895 bytes
    Full document history, custom binary format          695,298 bytes      302,067 bytes
    Document history with cursor movements omitted       570,992 bytes      214,889 bytes
    CRDT document with editing history omitted           228,153 bytes      114,821 bytes
    CRDT document with tombstones removed                154,418 bytes       63,249 bytes
    Baseline: plain text with no CRDT metadata           104,852 bytes       27,569 bytes

    Size differences annotated between adjacent rows, from the bottom of the table upwards: 48%, 48%, 150%, 22%, 211×.

    Benchmark data: keystroke-by-keystroke editing trace of a text file (LaTeX source of
    a research paper) containing 182,315 single-character insertions, 77,463
    single-character deletions, and 102,049 cursor movements.

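The percentage annotations on this slide can be checked against the uncompressed file sizes. The snippet below assumes that each annotation compares one representation with the next smaller one; that reading is an assumption, and under it the final step (tombstones removed versus plain text) works out to roughly 47% rather than the 48% shown.

```python
# Uncompressed file sizes in bytes, copied from the slide.
sizes = [
    ("Full document history, JSON format", 146_406_415),
    ("Full document history, custom binary format", 695_298),
    ("Document history with cursor movements omitted", 570_992),
    ("CRDT document with editing history omitted", 228_153),
    ("CRDT document with tombstones removed", 154_418),
    ("Baseline: plain text with no CRDT metadata", 104_852),
]

# Ratio of each representation to the next smaller one.
ratios = [upper / lower
          for (_, upper), (_, lower) in zip(sizes, sizes[1:])]

for (name, _), ratio in zip(sizes, ratios):
    overhead_percent = (ratio - 1) * 100
    print(f"{name}: {ratio:.2f}x the next row "
          f"({overhead_percent:.0f}% larger)")
```

Read this way, the custom binary format is about 211 times smaller than the JSON encoding, and the remaining steps each remove a further 22% to 150% of overhead relative to the row below.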

  75. Text editing CRDTs:
    Logoot: Stéphane Weiss, Pascal Urso, and Pascal Molli: “Logoot: A Scalable
    Optimistic Replication Algorithm for Collaborative Editing on P2P
    Networks,” ICDCS 2009.
    LSEQ: Brice Nédelec, Pascal Molli, Achour Mostefaoui, and Emmanuel
    Desmontils: “LSEQ: an Adaptive Structure for Sequences in Distributed
    Collaborative Editing,” DocEng 2013.
    RGA: Hyun-Gul Roh, Myeongjae Jeon, Jin-Soo Kim, and Joonwon Lee:
    “Replicated abstract data types: Building blocks for collaborative
    applications,” Journal of Parallel and Distributed Computing, 71(3):354–368,
    2011.
    Treedoc: Nuno Preguiça, Joan Manuel Marques, Marc Shapiro, and Mihai Letia: “A
    Commutative Replicated Data Type for Cooperative Editing,” ICDCS 2009.
    WOOT: Gérald Oster, Pascal Urso, Pascal Molli, and Abdessamad Imine: “Data
    consistency for P2P collaborative editing,” CSCW 2006.
    A_strong: Hagit Attiya, Sebastian Burckhardt, Alexey Gotsman, Adam Morrison,
    Hongseok Yang, and Marek Zawirski: “Specification and Complexity of
    Collaborative Text Editing,” PODC 2016.

  76. More details in these related publications:
    Interleaving anomaly: Martin Kleppmann, Victor B. F. Gomes, Dominic P. Mulligan,
    and Alastair R. Beresford: “Interleaving anomalies in collaborative text editors”.
    PaPoC 2019.
    Proof of no interleaving in RGA: Martin Kleppmann, Victor B F Gomes, Dominic P
    Mulligan, and Alastair R Beresford: “OpSets: Sequential Specifications for
    Replicated Datatypes,” https://arxiv.org/abs/1805.04263, May 2018.
    Moving list items: Martin Kleppmann: “Moving Elements in List CRDTs”. PaPoC 2020.
    Move operation in CRDT trees: Martin Kleppmann, Dominic P. Mulligan, Victor B. F.
    Gomes, and Alastair R. Beresford: “A highly-available move operation for
    replicated trees and distributed filesystems”. Preprint,
    https://martin.kleppmann.com/papers/move-op.pdf
    Reducing metadata overhead: Martin Kleppmann: “Experiment: columnar data
    encoding for Automerge”, 2019.
    https://github.com/automerge/automerge-perf/blob/master/columnar/README.md
    Local-first software: Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and
    Mark McGranaghan: “Local-first software: You own your data, in spite of the
    cloud”. Onward! 2019. https://www.inkandswitch.com/local-first.html

  77. Thanks!
    • Martin’s email: [email protected]
    • Martin on Twitter: https://twitter.com/martinkl
    • Martin’s book: https://dataintensive.net/
    • CRDT resources: https://crdt.tech/
    • Automerge: https://github.com/automerge/automerge
    Thank you to these organisations for supporting this work!
