Chas Emerick on A comprehensive study of Convergent & Commutative Replicated Data Types by Marc Shapiro, et al.

Conflict-free Replicated Data Types (CRDTs) are a formalism for providing practical data and programming primitives for use in distributed systems applications without necessitating expensive (and sometimes impractical) consensus mechanisms. Their key characteristic is that they provide conflict-free "merging" of distributed concurrent updates given only the weak guarantees of eventual consistency.
While this paper did not coin the term 'CRDT', it was the first to provide a comprehensive treatment of their definition, semantics, and possible construction separate from and beyond previous implementations of distributable datatypes that happened to provide CRDT-like semantics.

In the paper, the authors:

• Construct a modern taxonomy of data types that can be characterized as having desirable conflict-resolution properties given multiple distributed, concurrent actors manipulating shared data without coordination or consensus (a.k.a. "eventual consistency").

• Describe a set of formal properties necessary to implement these data types, defining both local programming interfaces and distributed replication semantics and requirements.

Related topics: Eventual consistency, consensus, CAP theorem, (semi-)lattices, Bloom(L), operational transforms, data replication

Papers_We_Love

May 15, 2014

Transcript

  1. Chas Emerick @cemerick @QuiltProject, Papers We Love NYC #4, May 15, 2014
    for your edification and entertainment, an audiovisual précis of: A comprehensive study of Convergent and Commutative Replicated Data Types by Shapiro, Preguiça, Baquero, and Zawirski (2011)
  2. @papers_we_love NYC #4, 2014-05-15 @cemerick / @QuiltProject Preface
    • “Who is this guy?”
    • Not going to follow the order of topics from the paper exactly
    • Key topics will be introduced along with section numbers from the paper (e.g. §4.3)
    • I'll be drawing upon materials used in the authors' presentations related to this paper (see references for links)
  3. Itinerary
    • The paper
      – Motivating problem
      – Theoretically- and algebraically-sound solution
      – Specification of practical data type designs
      – Challenges
      – Related work, further reading
    • Impact outside of academia
    • Questions & discussion
  4. Actors within distributed systems exchange and share state
    [Diagram: replicas A and B on a timeline t, passing successive states of x (x_a, x_b, x_c) back and forth]
  5. The problem: Conflicting concurrent modifications
    [Diagram: A applies Δa and B applies Δb to x concurrently; once the updates are exchanged, x_Δa ∥ x_Δb and the resulting value is “???”]
  6. The problem: Conflicting concurrent modifications require consensus
    [Diagram: the same timeline for A and B, with both replicas arriving at x_Δab]
  7. The problem: Conflicting concurrent modifications require consensus, which is partition-intolerant and affects availability
    [Diagram: the same timeline for A and B, with both replicas arriving at x_Δab]
  8. Linearizability✝
    • Transactional databases, Redis, consensus services (Paxos/Raft)
    • Global consensus → consistency
    • Total order of all events
    • Very expensive & constrains availability
    [Diagram: the consensus timeline for A and B, with both replicas arriving at x_Δab]
    ✝As well as strict serializability, which has even stronger guarantees.
  9. Thus, eventual consistency
    • No guarantee of ordering of events
    • Maximal availability & performance
    • How to reconcile results from concurrent operations?
    [Diagram: the conflicting timeline for A and B, ending in x_Δa ∥ x_Δb and “???”]
  10. How to reconcile results of concurrent operations?
    • “Background” (deferred) consensus
      – Post-hoc resolution or rollback of conflicting updates
    • This is what we do today, all the time!
      – Resolving CouchDB conflicts and Riak siblings within applications
      – Merging (semi-)textual content via diffs
    • Very difficult to implement correctly, and no guiding formalisms to indicate correctness or warn against problems
  11. Proposed solution
    Conflict-free Replicated Data Types (CRDTs) deterministically reconcile concurrent updates such that no conflicts arise
      – Performance, availability, scale of eventual consistency + reliable reconciliation as if you were using a consensus mechanism
      – Provably sound
      – Limitations:
        • No consensus → limitations on what can be stored, replicated, and reconciled (i.e. no global invariants)
        • Unbounded growth → “garbage collection”
  12. Chocolate and vanilla CRDTs
    • What is replicated?
      – Entire state of the datatype? State-based, a.k.a. convergent replicated data type, a.k.a. CvRDT
      – Individual operations (+ arguments)? Operation-based, a.k.a. commutative replicated data type, a.k.a. CmRDT
      – Options correspond to the two strategies for implementing optimistic replication✝
    • These are formally equivalent (§2.4)
      – Strategy: understand state-based constructions, move on to operation-based as optimization
    ✝http://research.microsoft.com/apps/pubs/default.aspx?id=66979
  13. State-based (CvRDT) §2.2.1, §2.3.1
    All possible states within a CvRDT form a semilattice
      – Partially-ordered set established by a least upper bound (join, ⊔; order ≤) or greatest lower bound (meet, ⊓; order ≥)
      – Both relations are definitionally commutative, associative, and idempotent
      – Each application of join or meet yields a monotonically increasing or decreasing value
    [Diagram: semilattice of set states over time t: ∅, {a}, {b}, {a,b}]
  14. State-based (CvRDT)
    • join and meet are formally equivalent; join is presumed throughout the literature
    • Update locally, propagate results to other replicas, where it must converge (the 'v' in “CvRDT”)
    • Requires only the weakest eventual consistency guarantees to yield convergence among all replicas, since join is associative, commutative, and idempotent
      – “infinitely often” transmission of state
      – Insensitive to reordered/dropped/repeated messages
      – Very expensive worst case, but easier to reason about
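A minimal Python sketch of the state-based recipe above, using a grow-only counter in the spirit of the counters from the paper's portfolio (§3). The class and method names (`GCounter`, `increment`, `merge`) are illustrative assumptions, not taken from the slides. The join is an element-wise maximum over per-replica slots, which is what makes repeated or reordered merges harmless.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter: one non-decreasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # Local update: a replica only ever raises its own slot.
        if n < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Join: element-wise max, so merging is commutative,
        # associative, and idempotent.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("A"), GCounter("B")
a.increment(2)
b.increment(3)
a.merge(b)
a.merge(b)  # a repeated message changes nothing
b.merge(a)
```

After the exchange both replicas report the same value, regardless of how often or in which order the states were shipped.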
  15. Language for specifying asynchronous replication
    • More than boxes, arrows
    • Better than (most) pseudocode: explicit about preconditions, where things happen, (a)synchrony, etc
  16. A “Portfolio of basic CRDTs” §3
    • Counters
    • Registers
    • Sets
    • Sequences
    • CRDTs compose and retain their characteristics
      – Sets → maps, multimaps, graphs
  17. Designing a register CRDT✝
    • Sequential specification:
      – R.set(v) → R.get() == v
    • Join relation over R.set(v_a) || R.set(v_b)
      – Linearizable?
      – Error state?
      – Last writer wins?
    ✝Framework from http://bit.ly/shapiro-msr-talk
  18. LWW (last writer wins) register §3.2.1
    • Ensures only a single value in register
    • Semilattice is ordered by timestamps
    [Diagram: A writes x at t1 and B writes y at t2; after the updates Δt1 and Δt2 are exchanged, both replicas hold y_t2]
  19. Mapping LWW-register to its semilattice
    [Diagram: the LWW timeline for A and B alongside its semilattice over time t: ∅ → [t1,x] → [t2,y]]
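The LWW mapping above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the `LWWRegister` name, the caller-supplied logical clock, and the use of the replica id as a tie-breaker are all choices made here, not details from the slides. The join simply keeps whichever (timestamp, value) pair is greater.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class LWWRegister:
    # (clock, replica id): the replica id only breaks ties between
    # equal clocks, so that timestamps are totally ordered (assumption).
    timestamp: Tuple[int, str] = (0, "")
    value: Any = None

    def set(self, value: Any, clock: int, replica_id: str) -> "LWWRegister":
        # A write produces a new state stamped with the caller's clock.
        return LWWRegister((clock, replica_id), value)

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Join: the state with the larger timestamp wins.
        return self if self.timestamp >= other.timestamp else other


x = LWWRegister().set("x", 1, "A")  # replica A writes x at t1
y = LWWRegister().set("y", 2, "B")  # replica B writes y at t2
merged_at_a = x.merge(y)
merged_at_b = y.merge(x)
```

Both replicas end up holding `y`, matching the slide: the earlier write is silently discarded, which is the price of keeping a single value.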
  20. MV (multi-value) register §3.2.2
    • Assignments carry causal history (e.g. version vector) which defines semilattice's partial order
    • join retains all values assigned concurrently; some client can later assign a single value
    [Diagram: A assigns x (Δa) while B concurrently assigns y (Δb); after the exchange both hold x_Δa ∥ y_Δb; a client then assigns z with history [Δa,Δb,Δc], which both replicas adopt]
  21. Mapping MV-register to its semilattice
    [Diagram: the MV-register timeline alongside its semilattice over time t: ∅ → [Δa,x] and [Δb,y] → #{[Δa,x] [Δb,y]} → [[Δa,Δb,Δc],z]]
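A rough Python sketch of the MV-register behaviour shown above, assuming version vectors are plain dicts from replica id to counter and a register state is a list of (version vector, value) pairs. The function names (`dominates`, `assign`, `merge`) are hypothetical, and the paper's own specification differs in detail.

```python
from typing import Any, Dict, List, Tuple

VersionVector = Dict[str, int]
State = List[Tuple[VersionVector, Any]]


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if a is at least as recent as b in every slot."""
    return all(a.get(k, 0) >= n for k, n in b.items())


def assign(state: State, replica_id: str, value: Any) -> State:
    """Local write: supersede everything currently visible."""
    vv: VersionVector = {}
    for seen, _ in state:
        for k, n in seen.items():
            vv[k] = max(vv.get(k, 0), n)
    vv[replica_id] = vv.get(replica_id, 0) + 1
    return [(vv, value)]


def merge(s1: State, s2: State) -> State:
    """Join: keep every pair not strictly dominated by the other side."""
    def superseded(vv: VersionVector, others: State) -> bool:
        return any(dominates(o, vv) and o != vv for o, _ in others)

    kept = [p for p in s1 if not superseded(p[0], s2)]
    kept += [p for p in s2 if not superseded(p[0], s1) and p not in kept]
    return kept


at_a = assign([], "A", "x")          # A assigns x
at_b = assign([], "B", "y")          # B concurrently assigns y
both = merge(at_a, at_b)             # concurrent values are both retained
resolved = assign(both, "C", "z")    # a client later assigns a single value
final = merge(both, resolved)
```

Here `both` carries two values, and `final` carries only `z`, because its version vector dominates the two concurrent assignments.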
  22. Designing a set CRDT✝
    • Sequential specification:
      – S.add(e) → S.contains(e) == true
      – S.remove(e) → S.contains(e) == false
    • Join relation over S.add(e) || S.remove(e)
      – Linearizable?
      – Disallow removals?
      – Error state?
      – Last writer wins?
      – Add wins?
      – Remove wins?
    ✝Framework from http://bit.ly/shapiro-msr-talk
  23. Sets §3.3
    • Counterintuitive convergent characterizations
      – G-Set (“grow-only”): can add, cannot remove
      – 2P-Set (“two phase”): once removed, cannot add an element back
        • Composition of two G-sets
      – LWW-Set
      – PN-Set (positive & negative counters track membership): addition may not yield membership
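To make the “composition of two G-sets” point concrete, here is a small Python sketch of a 2P-Set built from two grow-only sets, one for additions and one for removals. The `TwoPSet` class and its method names are assumptions made for illustration.

```python
class TwoPSet:
    """Two-phase set: a G-Set of additions plus a G-Set of removals."""

    def __init__(self) -> None:
        self.added = set()    # grow-only
        self.removed = set()  # grow-only ("tombstones")

    def add(self, element) -> None:
        self.added.add(element)

    def remove(self, element) -> None:
        # Precondition: only an element that has been added can be removed.
        if element in self.added:
            self.removed.add(element)

    def contains(self, element) -> bool:
        return element in self.added and element not in self.removed

    def merge(self, other: "TwoPSet") -> None:
        # Join: union each underlying G-Set independently.
        self.added |= other.added
        self.removed |= other.removed


s = TwoPSet()
s.add("e")
s.remove("e")
s.add("e")  # has no visible effect: the removal is permanent

r = TwoPSet()
r.add("e")
r.merge(s)  # r learns of the removal and drops "e" as well
```

The counterintuitive part is visible in the last `add`: once an element's tombstone exists, no later addition can bring it back.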
  24. Observed-Remove Set §3.3.5
    • Set CRDT with ≈intuitive semantics
      – Given S.add(e) || S.remove(e), add wins
    • Strategy
      – tag each element uniquely (per actor or per operation), e_τ
      – operation removing e must include set of all previously-unremoved τ for e
      – Set.contains(e) == true iff an e_τ exists where τ has not been implicated in a removal of e
    • Tags are not exposed in userland API
  25. Observed-Remove Set Progression & Semilattice
    [Diagram: replicas A, B, and C over time t, showing each replica's tagged internal state (e_a, e_b, e_{a,b}, e_{a,-a}, e_{a,-a,b}, e_{b,a,-a}) and the set it exposes ({e} or ∅)]
    • A and B concurrently add e_a and e_b
    • A removes e_a; this has no effect on e's membership in C's view of the set because of its knowledge of e_b
    • e_b is eventually replicated to A, yielding consistency
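The tagging strategy can be sketched as a state-based OR-Set in Python. This is a simplified illustration, not the paper's specification: the `ORSet` class is hypothetical, tags are random UUIDs generated per add operation, and removed tags are kept forever as tombstones.

```python
import uuid


class ORSet:
    """Observed-remove set: every add is tagged; a remove only covers observed tags."""

    def __init__(self) -> None:
        self.adds = set()     # (element, tag) pairs
        self.removes = set()  # (element, tag) pairs that have been removed

    def add(self, element) -> None:
        # A fresh tag per add; tags never leak through the public API.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element) -> None:
        # Remove exactly the tags this replica has observed for the element.
        self.removes |= {(e, tag) for (e, tag) in self.adds if e == element}

    def contains(self, element) -> bool:
        return any(e == element for (e, _) in self.adds - self.removes)

    def merge(self, other: "ORSet") -> None:
        self.adds |= other.adds
        self.removes |= other.removes


a, b = ORSet(), ORSet()
a.add("e")     # A adds e with tag a
b.add("e")     # B concurrently adds e with tag b
a.remove("e")  # A removes only the tag it has seen
visible_at_a_before_sync = a.contains("e")
a.merge(b)
b.merge(a)
```

After the merge both replicas contain `e`: A's removal never observed B's tag, so the concurrent add wins.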
  26. Graphs §3.4
    • Two sets, vertices + edges
    • Many different possible constructions given the local invariants one might want to preserve between edges and vertices
    • Global invariants cannot be guaranteed because of concurrent operations
      – e.g. cannot prevent cycles
  27. Sequences §3.5.2
    • Set of (identifier, value) where identifiers are selected from a dense, totally-ordered set
    • Explored deeply in papers on Logoot and Treedoc CRDTs
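A toy Python sketch of the “dense, totally-ordered identifiers” idea. It assumes rational positions from `fractions.Fraction` paired with a replica id for uniqueness, and only supports insertion. Logoot and Treedoc use more elaborate identifier schemes, so treat the names here (`insert_between`, `read`, `BEGIN`, `END`) as illustrative only.

```python
from fractions import Fraction

BEGIN, END = Fraction(0), Fraction(1)  # sentinel positions (assumption)


def insert_between(seq: set, left: Fraction, right: Fraction,
                   value: str, replica_id: str) -> set:
    """Return seq plus value at a fresh position strictly between left and right."""
    # Density: there is always a rational midpoint between two positions.
    identifier = ((left + right) / 2, replica_id)
    return seq | {(identifier, value)}


def read(seq: set) -> list:
    """The visible sequence is the values sorted by identifier."""
    return [value for _, value in sorted(seq)]


base = insert_between(set(), BEGIN, END, "a", "A")                # position 1/2
at_a = insert_between(base, Fraction(1, 2), END, "b", "A")        # A appends b
at_b = insert_between(base, Fraction(1, 2), END, "c", "B")        # B appends c
merged = at_a | at_b                                              # join is set union
```

Both concurrent inserts land on position 3/4; the replica id breaks the tie, so every replica reads the merged sequence in the same order.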
  28. Operation-based CmRDT §2.2.2, §2.3.2
    • Requires “reliable broadcast channel”
      – Operations delivered to each replica in causal order <_d
      – All concurrent operations that are unordered with respect to <_d must commute (the 'm' in CmRDT)
    • Far more efficient than worst-case state-based specification
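As a small illustration of the commutativity requirement, the Python sketch below models an operation-based counter whose operations are signed deltas. The `OpCounter` class is hypothetical, and the delivery channel is simulated by simply trying every delivery order.

```python
from itertools import permutations


class OpCounter:
    """Operation-based counter: replicas exchange operations, not state."""

    def __init__(self) -> None:
        self.value = 0

    def apply(self, delta: int) -> None:
        # Integer addition commutes, so concurrent deltas can arrive in any order.
        self.value += delta


ops = [+1, +1, -1, +5]

# Deliver the same operations in every possible order.
results = set()
for order in permutations(ops):
    replica = OpCounter()
    for delta in order:
        replica.apply(delta)
    results.add(replica.value)

# Unlike a state-based merge, applying an operation twice is NOT harmless.
duplicated = OpCounter()
for delta in ops + [+5]:
    duplicated.apply(delta)
```

Every ordering converges on the same value, but the replica that received a duplicate diverges. That is why this style shifts the burden onto the broadcast channel, where the state-based style relied on an idempotent join.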
  29. Operation-based CmRDT: tradeoffs
    • More complex, more difficult to reason about
    • More challenging to implement
      – Causal relationships between operations must be identified + maintained
      – Generally requires tracking “group membership”
  30. Garbage Collection §4
    • “Garbage”: additional overhead that accumulates in order to satisfy CRDT semantics
      – “tombstones” (e.g. remove tags in an OR-Set)
      – Unbalanced trees of identifiers in sequences
    • Optimistically collecting garbage and rolling back as necessary is an option in some cases
    • Others appear to require various levels of consensus to achieve
    • “Garbage” is not always waste
      – The right kind of tombstones are what make consistent snapshots possible
  31. Prior & related work
    Lots of prior work had portions of CRDTs' semantics, before “CRDT” was identified as a concept:
      – Wuu and Bernstein, 'Efficient solutions to the replicated log and dictionary problems' (1984!)
      – Operational transforms
      – Any Dynamo-style system uses registers for values
        • LWW-registers: S3
        • MV-registers: CouchDB conflicts, Riak siblings
  32. Prior & related work
    “Consistency as Logical Monotonicity” (CALM theorem)
      – s/semilattices/monotonic logic
        • Stricter semantics than semilattices; no way to characterize non-monotonic operations (remove, etc) without consensus
      – Implemented at the language level by Bloom
        • Nearly all data structures are monotonic or lattices
        • Allows for static analysis that identifies parts of your program that aren't monotonic (require synchronization/consensus mechanism to ensure safety)
  33. Resources
    • Meetup page for this talk: http://bit.ly/pwl-nyc-4
    • Shapiro et al. paper: http://bit.ly/shapiro-crdt-pdf
    • Shapiro talk @ MSR: http://bit.ly/shapiro-msr-talk
    • Chris Meiklejohn's 'Readings in Distributed Systems': http://bit.ly/cmeik-dist-sys-readings
    • CRDTs offered in v2.0 of Riak: http://bit.ly/riak-crdts
    • Bloom, a Ruby DSL for “disorderly programming”, an implementation of CALM: http://www.bloom-lang.net
    • The Quilt Project: http://quilt.org