An audiovisual précis of 'A comprehensive study of Convergent and Commutative Replicated Data Types'

Presented at Papers We Love NYC #4: http://www.meetup.com/papers-we-love/events/175964662/

Chas Emerick

May 15, 2014

Transcript

  1. Chas Emerick @cemerick @QuiltProject Papers We Love NYC #4 May 15, 2014

    For your edification and entertainment, an audiovisual précis of: A comprehensive study of Convergent and Commutative Replicated Data Types by Shapiro, Preguiça, Baquero, and Zawirski (2011)
  2. @papers_we_love NYC #4, 2014-05-15 @cemerick / @QuiltProject Preface

    • “Who is this guy?” • Not going to follow the order of topics from the paper exactly • Key topics will be introduced along with section numbers from the paper (e.g. §4.3) • I'll be drawing upon materials used in the authors' presentations related to this paper (see references for links)
  3. Itinerary

    • The paper – Motivating problem – Theoretically- and algebraically-sound solution – Specification of practical data type designs – Challenges – Related work, further reading • Impact outside of academia • Questions & discussion
  4. Actors within distributed systems exchange and share state

    [Diagram: replicas A and B over time t, exchanging states x_a, x_b, x_c]
  5. The problem

    Conflicting concurrent modifications [Diagram: replicas A and B over time t apply concurrent updates Δa and Δb to x; the result of x Δa ∥ x Δb is undetermined (???)]
  6. The problem

    Conflicting concurrent modifications require consensus [Diagram: timeline of replicas A and B applying updates Δa and Δb to x]
  7. The problem

    Conflicting concurrent modifications require consensus, which is partition-intolerant and affects availability [Diagram: timeline of replicas A and B applying updates Δa and Δb to x]
  8. Linearizability✝

    • Transactional databases, Redis, consensus services (Paxos/Raft) • Global consensus → consistency • Total order of all events • Very expensive & constrains availability [Diagram: timeline of replicas A and B applying updates Δa and Δb to x] ✝As well as strict serializability, which has even stronger guarantees.
  9. Thus, eventual consistency

    • No guarantee of ordering of events • Maximal availability & performance • How to reconcile results from concurrent operations? [Diagram: replicas A and B over time t apply concurrent updates Δa and Δb to x; the result of x Δa ∥ x Δb is undetermined (???)]
  10. How to reconcile results of concurrent operations?

    • “Background” (deferred) consensus – Post-hoc resolution or rollback of conflicting updates • This is what we do today, all the time! – Resolving CouchDB conflicts and Riak siblings within applications – Merging (semi-)textual content via diffs • Very difficult to implement correctly, and no guiding formalisms to indicate correctness or warn against problems
  11. Proposed solution

    Conflict-free Replicated Data Types (CRDTs) deterministically reconcile concurrent updates such that no conflicts arise – Performance, availability, scale of eventual consistency + reliable reconciliation as if you were using a consensus mechanism – Provably sound – Limitations: • No consensus → limitations on what can be stored, replicated, and reconciled (i.e. no global invariants) • Unbounded growth → “garbage collection”
  12. Chocolate and vanilla CRDTs

    • What is replicated? – Entire state of the datatype? State-based, a.k.a. convergent replicated data type, a.k.a. CvRDT – Individual operations (+ arguments)? Operation-based, a.k.a. commutative replicated data type, a.k.a. CmRDT – Options correspond to the two strategies for implementing optimistic replication✝ • These are formally equivalent §2.4 – Strategy: understand state-based constructions, move on to operation-based as optimization ✝http://research.microsoft.com/apps/pubs/default.aspx?id=66979
  13. State-based (CvRDT) §2.2.1, §2.3.1

    All possible states within a CvRDT form a semilattice – Partially-ordered set established by a least upper bound (join, ⊔) or greatest lower bound (meet, ⊓) – Both operations are definitionally commutative, associative, and idempotent – Each application of join or meet yields a monotonically increasing or decreasing value [Diagram: semilattice of set states over time t: ∅, {a}, {b}, {a,b}]
  14. State-based (CvRDT)

    • join and meet are formally equivalent, join presumed throughout the literature • Update locally, propagate the resulting state to other replicas, where all states must converge (the 'v' in “CvRDT”) • Requires weakest eventual consistency guarantees to yield convergence among all replicas, since join is associative, commutative, and idempotent – “infinitely often” transmission of state – Insensitive to reordered/dropped/repeated messages – Very expensive worst case, but easier to reason about
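A minimal sketch of why the join's algebraic properties make state-based replication insensitive to reordered and repeated messages. It assumes the simplest possible semilattice, a grow-only set whose join is set union; the names `join`, `deliveries`, and `results` are illustrative and do not come from the paper.

```python
from functools import reduce
from itertools import permutations


def join(a: frozenset, b: frozenset) -> frozenset:
    """Least upper bound of two grow-only set states: plain set union."""
    return a | b


empty = frozenset()
s_a = frozenset({"a"})
s_b = frozenset({"b"})

# A replica may receive the same states in any order, any number of times.
deliveries = [s_a, s_b, s_a, empty, s_b]

# Fold every possible delivery order into a replica that starts empty.
results = {reduce(join, order, empty) for order in permutations(deliveries)}
```

Every delivery order folds to the same state, so `results` holds a single element: the set containing both `a` and `b`.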
  15. Language for specifying asynchronous replication

    • More than boxes, arrows • Better than (most) pseudocode: explicit about preconditions, where things happen, (a)synchrony, etc
  16. Language for specifying asynchronous replication
  17. A “Portfolio of basic CRDTs” §3

    • Counters • Registers • Sets • Sequences • CRDTs compose and retain their characteristics – Sets → maps, multimaps, graphs
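As a hedged illustration of composition, the sketch below builds a counter that supports decrements out of two grow-only counters, in the spirit of the counters in §3 of the paper. The class and method names (`GCounter`, `PNCounter`, `join`) are this example's own, and each replica is assumed to increment only its own slot.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, replica: str, n: int = 1) -> None:
        self.counts[replica] = self.counts.get(replica, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def join(self, other: "GCounter") -> "GCounter":
        # Element-wise maximum is the least upper bound of two count vectors.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in keys})


@dataclass
class PNCounter:
    """Increments and decrements tracked by two composed G-Counters."""
    p: GCounter = field(default_factory=GCounter)
    n: GCounter = field(default_factory=GCounter)

    def increment(self, replica: str, n: int = 1) -> None:
        self.p.increment(replica, n)

    def decrement(self, replica: str, n: int = 1) -> None:
        self.n.increment(replica, n)

    def value(self) -> int:
        return self.p.value() - self.n.value()

    def join(self, other: "PNCounter") -> "PNCounter":
        # Joining component-wise keeps the composite a semilattice.
        return PNCounter(self.p.join(other.p), self.n.join(other.n))


a, b = PNCounter(), PNCounter()
a.increment("A", 3)   # replica A counts up
b.decrement("B", 1)   # replica B concurrently counts down
merged = a.join(b)
```

Because the composite join is just the two component joins, it inherits commutativity, associativity, and idempotence from them.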
  18. Designing a register CRDT✝

    • Sequential specification: – R.set(v) → R.get() == v • Join relation over R.set(v_a) || R.set(v_b) – Linearizable? – Error state? – Last writer wins? ✝Framework from http://bit.ly/shapiro-msr-talk
  19. LWW (last writer wins) register §3.2.1

    • Ensures only a single value in register • Semilattice is ordered by timestamps [Diagram: replicas A and B, both initially ∅; concurrent assignments of x at t1 and y at t2 converge to y at t2]
  20. Mapping LWW-register to its semilattice

    [Diagram: replicas A and B, both initially ∅; concurrent assignments of x at t1 and y at t2 converge to y at t2] [Diagram: semilattice over time t: ∅ → [t1,x] → [t2,y]]
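The LWW register's join can be sketched in a few lines. This assumes a state is either `None` (the empty register, ∅) or a `(timestamp, value)` pair, and that timestamp ties are broken by comparing values; the tie-break rule is an assumption of this example, not something the slides specify.

```python
from typing import Optional, Tuple

State = Optional[Tuple[int, str]]  # None plays the role of the empty register


def join(a: State, b: State) -> State:
    """Keep whichever assignment carries the larger timestamp."""
    if a is None:
        return b
    if b is None:
        return a
    # Tuples compare by timestamp first, then by value (assumed tie-break).
    return max(a, b)


t1_x = (1, "x")  # one replica assigns x at t1
t2_y = (2, "y")  # another replica assigns y at t2

at_a = join(join(None, t1_x), t2_y)  # sees x first, then y
at_b = join(join(None, t2_y), t1_x)  # sees y first, then x
```

Both replicas end at `(2, "y")` regardless of delivery order; the earlier write of `x` is silently discarded, which is the price of keeping a single value.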
  21. MV (multi-value) register §3.2.2

    • Assignments carry causal history (e.g. version vector) which defines semilattice's partial order • join retains all values assigned concurrently; some client can later assign a single value [Diagram: replicas A and B concurrently assign x (Δa) and y (Δb); both values are retained as x Δa ∥ y Δb until a client assigns z with history [Δa,Δb,Δc]]
  22. Mapping MV-register to its semilattice

    [Diagram: semilattice over time t: ∅ → [Δa,x] and [Δb,y] → #{[Δa,x] [Δb,y]} → [[Δa,Δb,Δc],z]] [Diagram: replicas A and B concurrently assign x (Δa) and y (Δb); both values are retained as x Δa ∥ y Δb until a client assigns z with history [Δa,Δb,Δc]]
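A rough sketch of the MV register's join, assuming each assignment is a `(version_vector, value)` pair and a version vector is a plain dict from replica name to counter. The helper names `dominates`, `join`, and `values` are hypothetical; the paper's own specification differs in detail.

```python
from typing import Dict, List, Tuple

VersionVector = Dict[str, int]
Assignment = Tuple[VersionVector, str]


def dominates(u: VersionVector, v: VersionVector) -> bool:
    """True if u is causally newer than v: >= everywhere and > somewhere."""
    keys = u.keys() | v.keys()
    return (all(u.get(k, 0) >= v.get(k, 0) for k in keys)
            and any(u.get(k, 0) > v.get(k, 0) for k in keys))


def join(a: List[Assignment], b: List[Assignment]) -> List[Assignment]:
    """Union the assignments, then drop any that another one supersedes."""
    candidates = a + [p for p in b if p not in a]
    return [(vv, val) for vv, val in candidates
            if not any(dominates(other, vv) for other, _ in candidates)]


def values(state: List[Assignment]) -> List[str]:
    return sorted(val for _, val in state)


at_a = [({"A": 1}, "x")]          # replica A assigns x
at_b = [({"B": 1}, "y")]          # replica B concurrently assigns y
siblings = join(at_a, at_b)       # neither dominates: both are kept

# A client that has seen both values assigns z with a newer history.
resolved = join(siblings, [({"A": 1, "B": 1, "C": 1}, "z")])
```

Here `siblings` holds both `x` and `y`, while `resolved` holds only `z`, since its version vector dominates both earlier ones.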
  23. Designing a set CRDT✝

    • Sequential specification: – S.add(e) → S.contains(e) == true – S.remove(e) → S.contains(e) == false • Join relation over S.add(e) || S.remove(e) – Linearizable? – Disallow removals? – Error state? – Last writer wins? – Add wins? – Remove wins? ✝Framework from http://bit.ly/shapiro-msr-talk
  24. Sets §3.3

    • Counterintuitive convergent characterizations – G-Set (“grow-only”): can add, cannot remove – 2P-Set (“two phase”): once removed, cannot add an element back • Composition of two G-sets – LWW-Set – PN-Set (positive & negative counters track membership): addition may not yield membership
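To make the “counterintuitive” part concrete, here is a small sketch of a 2P-Set as a composition of two G-Sets, one for additions and one for removals. The `TwoPSet` type and function names are this example's own.

```python
from typing import NamedTuple


class TwoPSet(NamedTuple):
    added: frozenset = frozenset()    # G-Set of every element ever added
    removed: frozenset = frozenset()  # G-Set of every element ever removed


def add(s: TwoPSet, e) -> TwoPSet:
    return TwoPSet(s.added | {e}, s.removed)


def remove(s: TwoPSet, e) -> TwoPSet:
    # Only elements currently in the set can be removed.
    return TwoPSet(s.added, s.removed | {e}) if contains(s, e) else s


def contains(s: TwoPSet, e) -> bool:
    return e in s.added and e not in s.removed


def join(a: TwoPSet, b: TwoPSet) -> TwoPSet:
    # Component-wise union: the join of each underlying G-Set.
    return TwoPSet(a.added | b.added, a.removed | b.removed)


s = add(TwoPSet(), "e")
s = remove(s, "e")
s_readded = add(s, "e")  # the removal record is permanent, so this has no visible effect
```

After the removal, `contains(s_readded, "e")` is still false: because the removed set only grows, an element can never return.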
  25. Observed-Remove Set §3.3.5

    • Set CRDT with ≈intuitive semantics – Given S.add(e) || S.remove(e), add wins • Strategy – tag each element uniquely (per actor or per operation), e_τ – operation removing e must include set of all previously-unremoved τ for e – Set.contains(e) == true iff an e_τ exists where τ has not been implicated in a removal of e • Tags are not exposed in userland API
  26. Observed-Remove Set Progression & Semilattice

    [Diagram: replicas A, B, and C over time t, showing tagged states (e_a; e_b; e_a,b; e_a,-a; e_a,-a,b; e_b,a,-a) and each replica's visible set ({e} or ∅)] • A and B concurrently add e_a and e_b • A removes e_a; this has no effect on e's membership in C's view of the set because of its knowledge of e_b • e_b is eventually replicated to A, yielding consistency
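The tagging strategy can be sketched as follows, assuming a state-based encoding with one set of `(element, tag)` additions and one set of tombstones. Tags are passed in explicitly here to keep the example deterministic; a real implementation would generate them internally and hide them from callers. All names are illustrative.

```python
from typing import NamedTuple


class ORSet(NamedTuple):
    adds: frozenset = frozenset()     # (element, tag) pairs ever added
    removes: frozenset = frozenset()  # (element, tag) pairs observed and removed


def add(s: ORSet, e, tag) -> ORSet:
    return ORSet(s.adds | {(e, tag)}, s.removes)


def remove(s: ORSet, e) -> ORSet:
    # A removal implicates only the tags this replica has observed for e.
    observed = {(x, t) for (x, t) in s.adds if x == e}
    return ORSet(s.adds, s.removes | observed)


def contains(s: ORSet, e) -> bool:
    # e is present iff some tag for e has not been implicated in a removal.
    return any(x == e for (x, _) in s.adds - s.removes)


def join(a: ORSet, b: ORSet) -> ORSet:
    return ORSet(a.adds | b.adds, a.removes | b.removes)


replica_a = add(ORSet(), "e", "a")   # A adds e, tagged a
replica_b = add(ORSet(), "e", "b")   # B concurrently adds e, tagged b
replica_a = remove(replica_a, "e")   # A removes e, having observed only tag a
converged = join(replica_a, replica_b)
```

Replica A alone no longer contains `e`, but the converged state does, because tag `b` was never observed by the removal: the concurrent add wins.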
  27. Graphs §3.4

    • Two sets, vertices + edges • Many different possible constructions given the local invariants one might want to preserve between edges and vertices • Global invariants cannot be guaranteed because of concurrent operations – e.g. cannot prevent cycles
  28. Sequences §3.5.2

    • Set of (identifier, value) where identifiers are selected from a dense, totally-ordered set • Explored deeply in papers on Logoot and Treedoc CRDTs
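A toy sketch of the idea, assuming identifiers are `(Fraction, replica_name)` pairs: rationals are dense, so a new identifier always fits between two neighbours, and the replica name breaks ties between concurrent inserts at the same position. This is not how Logoot or Treedoc allocate identifiers; it only illustrates the "dense, totally-ordered" requirement, and removals are omitted.

```python
from fractions import Fraction


def between(lo: Fraction, hi: Fraction, replica: str):
    """Pick an identifier strictly between two neighbouring positions."""
    return ((lo + hi) / 2, replica)


def read(state) -> str:
    """Render the sequence by sorting elements on their identifiers."""
    return "".join(value for _, value in sorted(state))


# Shared starting state: "H" at position 1/4 and "!" at position 3/4.
base = frozenset({((Fraction(1, 4), "A"), "H"), ((Fraction(3, 4), "A"), "!")})

# Replicas A and B concurrently insert between the same two neighbours.
at_a = base | {(between(Fraction(1, 4), Fraction(3, 4), "A"), "i")}
at_b = base | {(between(Fraction(1, 4), Fraction(3, 4), "B"), "o")}

# The state is a grow-only set of pairs, so join is set union.
merged = at_a | at_b
```

Both replicas converge on the same element order because the order depends only on the identifiers, never on arrival order.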
  29. Operation-based CmRDT §2.2.2, §2.3.2

    • Requires “reliable broadcast channel” – Operations delivered to each replica in causal order <_d – All concurrent operations that are unordered with respect to <_d must commute (the 'm' in CmRDT) • Far more efficient than worst-case state-based specification
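A minimal sketch of the commutativity requirement, assuming an operation-based counter whose operations are `("inc", n)` or `("dec", n)` tuples. The names `apply` and `deliver` are hypothetical. It also shows why the delivery channel matters: unlike a state-based join, applying an operation is not idempotent.

```python
from itertools import permutations


def apply(state: int, op) -> int:
    """Apply a single replicated operation to the local counter value."""
    kind, amount = op
    return state + amount if kind == "inc" else state - amount


def deliver(ops, state: int = 0) -> int:
    """Apply operations in the order the channel delivers them."""
    for op in ops:
        state = apply(state, op)
    return state


concurrent_ops = [("inc", 5), ("dec", 2), ("inc", 1)]

# Concurrent operations commute: every delivery order gives the same value.
outcomes = {deliver(order) for order in permutations(concurrent_ops)}

# A duplicated delivery changes the result, so each operation must be
# delivered exactly once.
duplicated = deliver(concurrent_ops + [("inc", 5)])
```

All six delivery orders yield 4, while the duplicated delivery yields 9, which is why the channel has to guarantee that each operation arrives once and only once.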
  30. Operation-based CmRDT: tradeoffs

    • More complex, more difficult to reason about • More challenging to implement – Causal relationships between operations must be identified + maintained – Generally requires tracking “group membership”
  31. Garbage Collection §4

    • “Garbage”: additional overhead that accumulates in order to satisfy CRDT semantics – “tombstones” (e.g. remove tags in an OR-Set) – Unbalanced trees of identifiers in sequences • Optimistically collecting garbage and rolling back as necessary is an option in some cases • Others appear to require various levels of consensus to achieve • “Garbage” is not always waste – The right kinds of tombstones are what make consistent snapshots possible
  32. Prior & related work

    Lots of prior work had portions of CRDTs' semantics, before “CRDT” was identified as a concept: – Wuu and Bernstein, 'Efficient solutions to the replicated log and dictionary problems' (1984!) – Operational transforms – Any Dynamo-style system uses registers for values • LWW-registers: S3 • MV-registers: CouchDB conflicts, Riak siblings
  33. Prior & related work

    “Consistency as Logical Monotonicity” (CALM theorem) – s/semilattices/monotonic logic • Stricter semantics than semilattices; no way to characterize non-monotonic operations (remove, etc) without consensus – Implemented at the language level by Bloom • Nearly all data structures are monotonic or lattices • Allows for static analysis that identifies parts of your program that aren't monotonic (require synchronization/consensus mechanism to ensure safety)
  34. Resources

    • Meetup page for this talk: http://bit.ly/pwl-nyc-4 • Shapiro et al. paper: http://bit.ly/shapiro-crdt-pdf • Shapiro talk @ MSR: http://bit.ly/shapiro-msr-talk • Chris Meiklejohn's 'Readings in Distributed Systems': http://bit.ly/cmeik-dist-sys-readings • CRDTs offered in v2.0 of Riak: http://bit.ly/riak-crdts • Bloom, a Ruby DSL for “disorderly programming”, an implementation of CALM: http://www.bloom-lang.net • The Quilt Project: http://quilt.org