Certified Mergeable Replicated Data Types

Replicated data types (RDTs) are data structures that permit concurrent modification of multiple, potentially geo-distributed replicas without coordination between them. RDTs are designed so that conflicting operations are eventually and deterministically reconciled, ensuring convergence. Constructing correct RDTs remains a difficult endeavour due to the complexity of reasoning about independently evolving states of the replicas. Because the focus has been on the correctness of RDTs (and rightly so), existing approaches to RDTs are less efficient than their sequential counterparts in terms of time and space complexity. This is unfortunate since RDTs are often used in a local-first setting where local operations far outweigh remote communication.

In this paper, we present Peepul, a pragmatic approach to building and verifying efficient RDTs. To make reasoning about correctness easier, we cast RDTs in the mould of a distributed version control system, and equip them with a three-way merge function for reconciling conflicting versions. Further, we go beyond just verifying convergence, and provide a methodology to verify arbitrarily complex specifications. We develop a replication-aware simulation relation based technique to relate RDT specifications to their efficient purely functional implementations. We have developed Peepul as an F* library that discharges proof obligations to an SMT solver. The verified efficient RDTs are extracted as OCaml code and used in Irmin, a Git-like distributed database.

KC Sivaramakrishnan

April 27, 2022

Transcript

  1. Certified Mergeable Replicated Data Types. “KC” Sivaramakrishnan, joint work with Vimala Soundarapandian, Adharsh Kamath and Kartik Nagar
  6. INTERNET ≠ • Serializability • Linearizability • Weak Consistency & Isolation
  7. Even simple data structures attract enormous complexity when made distributed

  9. Sequential Counter • Written in idiomatic style • Composable

          module Counter : sig
            type t
            val read : t -> int
            val add : t -> int -> t
            val sub : t -> int -> t
          end = struct
            type t = int
            let read x = x
            let add x d = x + d
            let sub x d = x - d
          end

          type counter_list = Counter.t list
  19. Replicated Counter • Idea: Apply the local operations at all replicas
      [figure: four replicas connected over the Internet, each starting at 0; one replica applies +2 locally and another applies +3; once both operations have been applied everywhere, all four replicas read 5]
  27. Addition and multiplication do not commute

          module Counter : sig
            type t
            val read : t -> int
            val add : t -> int -> t
            val sub : t -> int -> t
            val mult : t -> int -> t
          end = struct
            type t = int
            let read x = x
            let add x d = x + d
            let sub x d = x - d
            let mult x n = x * n
          end

      [figure: two replicas start at 7; one applies +1 (8) and the other *3 (21); each then applies the other's operation, giving 8 * 3 = 24 and 21 + 1 = 22] Diverges
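
The divergence on this slide can be reproduced in a few lines. This is a minimal sketch, not code from the talk: the `op` type and the `apply` and `replay` helpers are hypothetical names introduced only for the illustration.

```ocaml
(* Sketch: replaying remote operations in arrival order diverges
   when the operations do not commute. *)
type op = Add of int | Mult of int

let apply x = function
  | Add d -> x + d
  | Mult n -> x * n

(* Replay a log of operations, left to right, starting from [init]. *)
let replay init ops = List.fold_left apply init ops

(* Replica A applies its local +1 first, then the remote *3. *)
let replica_a = replay 7 [ Add 1; Mult 3 ]

(* Replica B applies its local *3 first, then the remote +1. *)
let replica_b = replay 7 [ Mult 3; Add 1 ]

let () =
  Printf.printf "replica A = %d, replica B = %d\n" replica_a replica_b
```

Both replicas have seen the same two operations, yet they end in different states (24 and 22), which is the "Diverges" outcome on the slide.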
  36. • Idea: Capture the effect of multiplication through the commutative addition operation • CRDTs

          module Counter : sig
            type t
            val read : t -> int
            val add : t -> int -> t
            val sub : t -> int -> t
            val mult : t -> int -> t
          end = struct
            type t = int
            let read x = x
            let add x d = x + d
            let sub x d = x - d
            let mult x n = x * n
          end

      [figure: two replicas start at 7; one applies +1 (8) and the other *3 (21); the multiplication is propagated as +14, so the replicas reach 8 + 14 = 22 and 21 + 1 = 22] Converges
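
One way to read the idea on this slide: each replica turns its local operation into an additive delta computed against its own state, and only the delta is sent to peers. The sketch below assumes that reading; `effect_of` and `apply_delta` are hypothetical names, not part of any CRDT library.

```ocaml
(* Sketch: propagate the additive effect of an operation, not the
   operation itself. Addition commutes, so delivery order is irrelevant. *)
type op = Add of int | Sub of int | Mult of int

(* The delta an operation causes when applied to [state]. *)
let effect_of state = function
  | Add d -> d
  | Sub d -> -d
  | Mult n -> (state * n) - state

let apply_delta state delta = state + delta

let a0 = 7
let b0 = 7

let delta_a = effect_of a0 (Add 1)   (* +1 *)
let delta_b = effect_of b0 (Mult 3)  (* 7 * 3 - 7 = +14 *)

(* Each replica applies its own delta, then the remote one. *)
let a_final = apply_delta (apply_delta a0 delta_a) delta_b
let b_final = apply_delta (apply_delta b0 delta_b) delta_a

let () = Printf.printf "A = %d, B = %d\n" a_final b_final
```

Both replicas reach 22, matching the "Converges" outcome on the slide.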
  39. Convergent Replicated Data Types (CRDT)
      • CRDT is guaranteed to ensure strong eventual consistency (SEC)
        ★ G-counters, PN-counters, OR-Sets, Graphs, Ropes, docs, sheets
        ★ Simple interface for the clients of CRDTs
      • Need to reengineer every datatype to ensure SEC (commutativity)
        ★ Do not mirror sequential counterparts => implementation & proof burden
        ★ Do not compose!
          ✦ counter set is not a composition of counter and set CRDTs
  41. Can we program & reason about replicated data types as an extension of their sequential counterparts? MRDT
  50. • 3-way merge function makes the counter suitable for distribution
      • Does not appeal to individual operations => independently extend data-type

          module Counter : sig
            type t
            val read : t -> int
            val add : t -> int -> t
            val sub : t -> int -> t
            val mult : t -> int -> t
            val merge : lca:t -> v1:t -> v2:t -> t
          end = struct
            type t = int
            let read x = x
            let add x d = x + d
            let sub x d = x - d
            let mult x n = x * n
            let merge ~lca ~v1 ~v2 = lca + (v1 - lca) + (v2 - lca)
          end

      [figure: two replicas start at 7; one applies +1 (8) and the other *3 (21); after merging, both read 22] 22 = 7 + (8 - 7) + (21 - 7)
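
The merge arithmetic on this slide can be checked directly. The sketch below reuses the slide's `Counter` module (without `sub`, which the example does not need) and adds `of_int`, a hypothetical constructor that exists only so the example can build a starting value. The second merge assumes that, once one branch has merged the other, the lowest common ancestor of the two branches is the already-merged version (21).

```ocaml
module Counter : sig
  type t
  val of_int : int -> t (* hypothetical helper, not on the slide *)
  val read : t -> int
  val add : t -> int -> t
  val mult : t -> int -> t
  val merge : lca:t -> v1:t -> v2:t -> t
end = struct
  type t = int
  let of_int x = x
  let read x = x
  let add x d = x + d
  let mult x n = x * n
  (* Start from the common ancestor and add each branch's difference. *)
  let merge ~lca ~v1 ~v2 = lca + (v1 - lca) + (v2 - lca)
end

let lca = Counter.of_int 7
let v1 = Counter.add lca 1   (* branch 1: 7 + 1 = 8 *)
let v2 = Counter.mult lca 3  (* branch 2: 7 * 3 = 21 *)

(* Branch 1 merges branch 2: 7 + (8 - 7) + (21 - 7). *)
let m1 = Counter.merge ~lca ~v1 ~v2

(* Branch 2 then merges branch 1, with v2 as the common ancestor:
   21 + (21 - 21) + (22 - 21). *)
let m2 = Counter.merge ~lca:v2 ~v1:v2 ~v2:m1

let () = Printf.printf "m1 = %d, m2 = %d\n" (Counter.read m1) (Counter.read m2)
```

Note that `merge` never inspects which operations produced `v1` and `v2`, which is why `mult` could be added without touching it.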
  56. Systems ➞ PL
      • CRDTs need to take care of systems-level concerns such as message loss, duplication and reordering
      • 3-way merge is oblivious to these
        ✦ By leaving those concerns to MRDT middleware
      [figure: the counter execution again (start at 7; +1 gives 8, *3 gives 21; both replicas merge to 22), with a further merge marked "??"] 22 = 21 + (21 - 21) + (22 - 21)
  58. Does the 3-way merge idea generalise? Sort of

  66. Observed-Removed Set
      • OR-set: add-wins when there is a concurrent add and remove of the same element
      • Convergence is not sufficient; Intent is not preserved

          let merge ~lca ~v1 ~v2 =
               (lca ∩ v1 ∩ v2)  (* unmodified elements *)
             ∪ (v1 - lca)       (* added in v1 *)
             ∪ (v2 - lca)       (* added in v2 *)

      Kaki et al. “Mergeable Replicated Data Types”, OOPSLA 2019
      [figure: LCA {1}; one replica performs add(1), giving {1}; the other performs rem(1), giving { }; both merge to { }]
      { } ∪ ({1} - {1}) ∪ ({ } - {1}) = { } ∪ { } ∪ { } = { } (expected {1})
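
The failing case on this slide can be replayed with the standard library's sets. This is a sketch: the slide's merge is written with ∩, ∪ and set difference, and the version below simply transcribes it with `Set.Make (Int)`; the names `IntSet` and `merged` are chosen for the example.

```ocaml
module IntSet = Set.Make (Int)

(* Transcription of the slide's three-way set merge. *)
let merge ~lca ~v1 ~v2 =
  IntSet.union
    (IntSet.inter lca (IntSet.inter v1 v2)) (* unmodified elements *)
    (IntSet.union
       (IntSet.diff v1 lca)                 (* added in v1 *)
       (IntSet.diff v2 lca))                (* added in v2 *)

let lca = IntSet.singleton 1
let v1 = IntSet.add 1 lca     (* add(1): the set is still {1} *)
let v2 = IntSet.remove 1 lca  (* rem(1): the set is now { } *)

let merged = merge ~lca ~v1 ~v2

let () =
  Printf.printf "merged = {%s}\n"
    (String.concat "; " (List.map string_of_int (IntSet.elements merged)))
```

The re-added element leaves no trace in `v1` (it is indistinguishable from the LCA), so the merge cannot tell that an add happened concurrently with the remove, and the result is empty rather than the expected {1}.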
  68. Concretising Intent
      • Intent is a woolly term
        ★ How can we formalise the intent of operations on a data structure?
      • We need
        ★ A formal language to specify the intent of an RDT
        ★ Mechanization to bridge the air gap between specification and implementation due to distributed system complexity
      [figure: merge diagram with versions l, v1, v2 and v]
  74. Peepul: Certified MRDTs
      • An F* library for implementing and proving MRDTs
        ★ https://github.com/prismlab/peepul
      • Specification language is event-based
        ★ Burckhardt et al. “Replicated Data Types: Specification, Verification and Optimality”, POPL 2014
      • Replication-aware simulation to connect specification with implementation
      • Composition of MRDTs and their proofs!
      • Extracted RDTs are compatible with Irmin, a Git-like distributed database
  82. Fixing OR-Set
      • Discriminate duplicate additions by associating a unique id
      • MRDT implementation
      Unique Lamport Timestamps
      [figure: LCA { (a,1) }; one replica performs add(a), giving { (a,1); (a,2) }; the other performs rem(a), giving { }; both merge to { (a,2) }]
      { } ∪ ( { (a,1); (a,2) } - { (a,1) } ) ∪ ( { } - { (a,1) } ) = { } ∪ { (a,2) } ∪ { } = { (a,2) }
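
The fix on this slide can be sketched by storing (element, id) pairs and reusing the same three-way set merge. This is an illustration only, not the Peepul implementation: the ids are passed in by the caller here, whereas the slide obtains them from unique Lamport timestamps, and the names `Tagged`, `add`, `remove` and `mem` are chosen for the example.

```ocaml
(* Sets of (element, unique id) pairs. *)
module Tagged = Set.Make (struct
  type t = string * int
  let compare = compare
end)

(* Same three-way merge as before, now over tagged elements. *)
let merge ~lca ~v1 ~v2 =
  Tagged.union
    (Tagged.inter lca (Tagged.inter v1 v2))
    (Tagged.union (Tagged.diff v1 lca) (Tagged.diff v2 lca))

(* Every add records a fresh id supplied by the caller. *)
let add x id s = Tagged.add (x, id) s

(* Remove deletes every pair for the element that is locally visible. *)
let remove x s = Tagged.filter (fun (y, _) -> y <> x) s

let mem x s = Tagged.exists (fun (y, _) -> y = x) s

let lca = add "a" 1 Tagged.empty  (* { (a,1) } *)
let v1 = add "a" 2 lca            (* { (a,1); (a,2) } *)
let v2 = remove "a" lca           (* { } *)

let merged = merge ~lca ~v1 ~v2

let () = Printf.printf "a is in the merged set: %b\n" (mem "a" merged)
```

The concurrent add survives because the pair (a,2) was never observed by the remove, which is the add-wins behaviour the OR-set is meant to have.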
  86. Specifying OR-Set • Abstract state = { a }
      [figure: abstract execution with the events add(a), add(a), rem(a) and rd, related by four vis edges, shown next to the concrete execution: { (a,1) }; add(a) gives { (a,1); (a,2) }; rem(a) gives { }; both merge to { (a,2) }]
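
An event-based specification in the style of Burckhardt et al. can be sketched as a function over events and a visibility relation. The sketch below is an assumption-laden illustration: the record types, the `spec_read` name and the exact vis edges (first add visible to both the second add and the remove) are a reading of the figure, not definitions from Peepul.

```ocaml
type op = Add of string | Rem of string

type event = { id : int; op : op }

(* Abstract state: the events seen so far plus the visibility relation,
   stored as (earlier id, later id) pairs. *)
type abstract = { events : event list; vis : (int * int) list }

let visible st e f = List.mem (e.id, f.id) st.vis

(* Add-wins reading: x is in the set iff some add(x) event is not
   visible to any rem(x) event. *)
let spec_read st =
  st.events
  |> List.filter_map (fun e ->
         match e.op with
         | Add x ->
             let removed =
               List.exists (fun f -> f.op = Rem x && visible st e f) st.events
             in
             if removed then None else Some x
         | Rem _ -> None)
  |> List.sort_uniq compare

let e1 = { id = 1; op = Add "a" } (* the add present in the LCA *)
let e2 = { id = 2; op = Add "a" } (* concurrent add on one branch *)
let e3 = { id = 3; op = Rem "a" } (* concurrent remove on the other *)

(* The first add is visible to both later events; the second add and
   the remove are concurrent, so neither sees the other. *)
let st = { events = [ e1; e2; e3 ]; vis = [ (1, 2); (1, 3) ] }

let () = Printf.printf "read = [%s]\n" (String.concat "; " (spec_read st))
```

Under these assumptions the read returns the single element a, matching the "= { a }" on the slide: the remove cancels only the add it observed.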
  87. Simulation Relation • Connects the abstract execution with the concrete state • For the OR-set, [formula not captured in the transcript]
  92. Verifying Operations
      1. Show that the simulation holds for operations [formula not captured in the transcript; annotations: operation definition, simulation relation, defined once-and-for-all, to prove]
      2. Show that the simulation holds for merge [formula not captured in the transcript; annotations: merge definition, assume, defined once-and-for-all, to prove]
  96. Verifying Operations
      3. Show that the specification and the implementation agree on the return values of operations
      4. Convergence
         ✦ Permits the different replicas to converge to states that are observationally equal but not structurally equal
           ✤ Example: differently balanced BSTs
  99. Space-efficient OR-Set
      • Recall that the OR-set has duplicates
      • How can we remove them?
      • Idea
        ★ On addition, replace existing element’s timestamp with the new timestamp
        ★ On merge, pick the larger timestamp
      Correctness argument is tricky
      [figure: { (a,1) }; add(a) gives { (a,1); (a,2) }; rem(a) gives { }; both merge to { (a,2) }]
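
The two rules on this slide can be sketched with a map from element to timestamp, so that each element has at most one entry. This is one plausible reading of the idea and not Peepul's verified implementation: timestamps are supplied by the caller, and the merge keeps pairs that are unchanged in all three versions or new in either branch, then resolves duplicates by taking the larger timestamp.

```ocaml
module SMap = Map.Make (String)

(* State: element -> timestamp of its most recent add. *)
type t = int SMap.t

(* On addition, any older timestamp for the element is replaced. *)
let add x ts (s : t) : t = SMap.add x ts s

let remove x (s : t) : t = SMap.remove x s

let merge ~lca ~v1 ~v2 =
  let has m x ts = SMap.find_opt x m = Some ts in
  (* Pairs present, with the same timestamp, in all three versions. *)
  let unchanged = SMap.filter (fun x ts -> has v1 x ts && has v2 x ts) lca in
  (* Pairs that a branch has but the LCA does not. *)
  let fresh v = SMap.filter (fun x ts -> not (has lca x ts)) v in
  (* When both sides carry the element, keep the larger timestamp. *)
  let pick_larger _ a b = Some (max a b) in
  SMap.union pick_larger unchanged
    (SMap.union pick_larger (fresh v1) (fresh v2))

let lca = add "a" 1 SMap.empty
let v1 = add "a" 2 lca       (* re-add: timestamp 1 is replaced by 2 *)
let v2 = remove "a" lca      (* concurrent remove *)
let merged = merge ~lca ~v1 ~v2

(* Two concurrent re-adds: the larger timestamp wins. *)
let v2_add = add "a" 3 lca
let merged_adds = merge ~lca ~v1 ~v2:v2_add
```

Why this preserves the OR-set specification in every execution is exactly the "tricky" correctness argument the slide refers to; the sketch only shows that the two slide examples behave as expected.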
  102. Space-efficient OR-Set
       [figure: two executions. Original OR-set: { (a,1) }; add(a) gives { (a,1); (a,2) }; rem(a) gives { }; both merge to { (a,2) }. Space-efficient OR-set: { (a,1) }; add(a) gives { (a,2) }; rem(a) gives { }; both merge to { (a,2) }]
       Simulation relation is more intricate, as one would expect
  103. Verification effort

  104. Composing CRDTs is HARD!

  105. Composing IRC-style chat
       • Build IRC-style group chat
         ★ Send and read messages in channels
         ★ For simplicity, channels and messages cannot be deleted
       • Represent application state as a grow-only map with string (channel name) keys and mergeable-log as values
       • Goal:
         ★ map and log proved correct separately
         ★ Use the proof of underlying RDTs to prove chat application correctness
  109. Generic Map MRDT
       • Specification [formula not captured in the transcript]
       • Project filters the abstract state of the map on the key k and returns an abstract state of the underlying data type
         ★ Provided by the user once for a generic MRDT
       [figure: events set (“general”, append (“hello”)), set (“compiler”, append (“error”)) and set (“general”, append (“world”)), related by vis edges; get (“general”, rd) returns [“world”; “hello”]]
  114. Generic Map MRDT
       Implementation [code not captured in the transcript]
       • Get applies given operation on the value at key k and returns the value
       • Set is Get + update the map with the new state
       • Merge uses the merge of the underlying value type!
       Simulation Relation [formula not captured in the transcript]
       • Simulation relation appeals to the value type’s simulation relation!
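
The shape of the generic map described on this slide can be sketched as an OCaml functor over any value type that has an initial state and a three-way merge. Everything named here (`MRDT`, `Make_map`, `value`, the `Counter` instance) is hypothetical and only illustrates the composition; the slide's own code is in F* and was not captured. The example instantiates the map with the counter rather than the mergeable log to stay short, and it assumes a grow-only map, so every key in the LCA is also present in both branches.

```ocaml
(* What the map needs from its value type. *)
module type MRDT = sig
  type t
  val init : t
  val merge : lca:t -> v1:t -> v2:t -> t
end

module StrMap = Map.Make (String)

module Make_map (V : MRDT) = struct
  type t = V.t StrMap.t

  (* A missing key stands for the value type's initial state. *)
  let value k m = Option.value (StrMap.find_opt k m) ~default:V.init

  (* Get: run an operation on the value at key k, return its result. *)
  let get k rd m = rd (value k m)

  (* Set: run an update on the value at key k, store the new state. *)
  let set k upd m = StrMap.add k (upd (value k m)) m

  (* Merge: key by key, delegating to the value type's own merge. *)
  let merge ~lca ~v1 ~v2 : t =
    let keys = StrMap.union (fun _ a _ -> Some a) v1 v2 in
    StrMap.mapi
      (fun k _ ->
        V.merge ~lca:(value k lca) ~v1:(value k v1) ~v2:(value k v2))
      keys
end

module Counter = struct
  type t = int
  let init = 0
  let merge ~lca ~v1 ~v2 = lca + (v1 - lca) + (v2 - lca)
end

module CMap = Make_map (Counter)

let lca = CMap.set "general" (fun c -> c + 1) StrMap.empty
let v1 = CMap.set "general" (fun c -> c + 2) lca   (* general = 3 *)
let v2 = CMap.set "compiler" (fun c -> c + 5) lca  (* compiler = 5 *)
let merged = CMap.merge ~lca ~v1 ~v2

let () =
  Printf.printf "general = %d, compiler = %d\n"
    (CMap.get "general" (fun c -> c) merged)
    (CMap.get "compiler" (fun c -> c) merged)
```

The map's merge contains no counter-specific logic, which mirrors the slide's point that the map and its value type can be written, and proved, separately.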
  115. Composing IRC-style chat • Program state is constructed by instantiating generic map with mergeable log ★ The proof of correctness of the chat application directly follows from the composition!
  118. Mergeable Queues • Replicated queue with at-least-once dequeue semantics ★ First verified queue RDT! • Our aim is to have O(1) enqueue and dequeue and O(n) merge
  121. Mergeable Queues
       • Implementation
         ★ Uses two-list functional queue implementation
           ✦ amortised O(1) enqueue and dequeue operations
         ★ Merge uses longest common contiguous subsequence algorithm: O(n)
       • Specification
         1. Any element popped in either A or B does not remain in M
         2. Any element pushed into either A or B appears in M
         3. An element that remains untouched in LCA, A, B remains in M
         4. Order of pairs of elements in LCA, A, B must be preserved in M, if those elements are present in M.
       Implementation far removed from the specification!
       [figure: merge of versions A and B, with common ancestor LCA, into M]
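
The two-list functional queue mentioned under "Implementation" is a standard structure and can be sketched as follows. This covers only the sequential enqueue and dequeue; the O(n) merge from the talk is not reproduced here, and the field and function names are chosen for the example.

```ocaml
(* Elements are dequeued from [front] and enqueued onto [back];
   [back] is kept in reverse order. *)
type 'a queue = { front : 'a list; back : 'a list }

let empty = { front = []; back = [] }

(* O(1): cons onto the back list. *)
let enqueue x q = { q with back = x :: q.back }

(* Amortised O(1): [back] is reversed only when [front] runs out,
   so each element is moved at most once. *)
let dequeue q =
  match q.front with
  | x :: front -> Some (x, { q with front })
  | [] -> (
      match List.rev q.back with
      | [] -> None
      | x :: front -> Some (x, { front; back = [] }))

(* Queue contents in dequeue order. *)
let to_list q = q.front @ List.rev q.back

let q = empty |> enqueue 1 |> enqueue 2 |> enqueue 3

let () =
  match dequeue q with
  | Some (x, rest) ->
      Printf.printf "dequeued %d, %d left\n" x (List.length (to_list rest))
  | None -> print_endline "empty"
```

The same logical queue can be split between `front` and `back` in different ways, which is one reason the implementation ends up far removed from a specification stated over element order.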
  122. Verification effort

  127. Summary
       • Programming and proving with RDTs is complicated due to concurrency and the lack of suitable programming abstractions
       • MRDTs simplify RDTs by implementing them as extensions of sequential data types
         ★ Reasoning about correctness is still hard
       • Peepul is an F* library for certified MRDTs
         ★ Replication-aware simulation for proving complex MRDTs
         ★ Complex MRDTs can be constructed and proved using simpler MRDTs
       • F* allows us to strike a balance between automated and interactive proofs
         ★ Extract to OCaml and run on Irmin!
  128. Backup Slides

  129. Queue Performance