Eventually Consistent Computations with CRDTs

RICON 2014, Christopher Meiklejohn

Christopher Meiklejohn

October 29, 2014


Transcript

  1. Eventually consistent computation with CRDTs

    Christopher Meiklejohn, Basho Technologies, Inc. and the SyncFree Consortium
    October 28, 2014
    Meiklejohn (Basho / SyncFree) CRDT computation October 28, 2014 1 / 42
  2. Outline

    1. Motivation: Motivation; The Goal
    2. Derflow and DerflowL: Derflow; DerflowL
    3. Distributed Computation: Computation; Aggregation of computation results
    4. Conclusion: Related Work; Goals; Current Status; Future Work; Thanks
  3. Motivation

    - Riak is a Dynamo-inspired key-value store
    - Querying by key is the main mechanism for data storage and retrieval
    - Three mechanisms presently exist for more expressive data access: a MapReduce-like system, secondary indexing, and integration with Apache Solr
    - Each additional mechanism has drawbacks:
      - Mechanisms are not fault-tolerant
      - Structure is rigid: need to know the schema when storing data
      - Not possible to compose queries
  4. The Goal

    Building large scale distributed applications with strong properties

    - Users want to be able to compute with their data in an efficient and composable manner while guaranteeing strong properties
    - Provide a framework for building eventually consistent materialized views that have strong convergence properties:
      - Data types which have strong convergence properties
      - Deterministic language to compose data types and preserve these strong convergence properties
  5. Derflow Overview

    - Published at ACM SIGPLAN Erlang Workshop ’14
    - Distributed deterministic dataflow programming
    - Relies on a single-assignment variable store
    - Built on top of Riak Core
    - Programs execute locally; operate on a remote replicated data store
    - Built on top of ets, Erlang Term Storage
  6. Core Derflow Semantics

    - declare(): declare a new dataflow variable
    - bind(x_i, v_i): bind a dataflow variable to a value
    - read(x_i): read a dataflow variable
    - spawn(...): introduce concurrency
  7. Core Derflow Semantics: simple example

        {ok, Id} = derflow:declare(),
        {ok, _} = derflow:bind(Id, 1),
        {ok, Value1, _} = derflow:read(Id),
        %% A second bind with a different value is rejected.
        error = derflow:bind(Id, 2),
        {ok, Value2, _} = derflow:read(Id),
        {ok, Value1, Value2}.
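The single-assignment behaviour in the example can be modelled without a cluster. The following is a minimal sketch, not the derflow implementation: the module name `sa_store_sketch`, the map-based store, and the choice to accept a rebind of the same value are all assumptions made for illustration.

```erlang
-module(sa_store_sketch).
-export([new/0, declare/1, bind/3, read/2, demo/0]).

%% Hypothetical in-memory model of a single-assignment variable store.
%% The store is a map holding the next free identifier and the variables.
new() -> #{next => 0, vars => #{}}.

%% Allocate a fresh, unbound variable and return its identifier.
declare(#{next := N, vars := Vars} = Store) ->
    {ok, N, Store#{next := N + 1, vars := Vars#{N => unbound}}}.

%% The first bind wins; binding a different value afterwards is an error.
bind(Id, Value, #{vars := Vars} = Store) ->
    case maps:get(Id, Vars) of
        unbound        -> {ok, Store#{vars := Vars#{Id := {bound, Value}}}};
        {bound, Value} -> {ok, Store};   % assumption: same value is accepted
        {bound, _}     -> error
    end.

%% A real dataflow read would block on an unbound variable; this sketch
%% simply reports that the variable is unbound.
read(Id, #{vars := Vars}) ->
    case maps:get(Id, Vars) of
        {bound, Value} -> {ok, Value};
        unbound        -> unbound
    end.

%% Mirrors the slide: bind 1, fail to bind 2, still read 1.
demo() ->
    {ok, Id, S0} = declare(new()),
    {ok, S1} = bind(Id, 1, S0),
    error = bind(Id, 2, S1),
    {ok, 1} = read(Id, S1),
    ok.
```

Threading the store explicitly keeps the sketch purely functional; the slides describe a replicated store that programs reach over the network instead.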
  8. Concurrent map example

        concurrent_map(S1, M, F, S2) ->
            case derflow:consume(S1) of
                {nil, _} ->
                    %% End of input: terminate the output stream.
                    derflow:bind(S2, nil);
                {Value, Next} ->
                    {id, NextOutput} = derflow:extend(S2),
                    %% Bind the current output cell in a separate process.
                    spawn(derflow, bind, [S2, M, F, Value]),
                    %% Recurse with all four arguments (the slide drops M).
                    concurrent_map(Next, M, F, NextOutput)
            end.
  9. Derflow Distribution Model

    - Programs run locally on the client.
    - Variables are hashed and replicated across the cluster.
    - Programs perform a roundtrip to access each variable at an available replica.
    - The default replication factor is n = 3.

    Figure: Ring with 32 partitions and 3 nodes
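To make the placement idea concrete, here is a rough sketch of hashing a variable onto a 32-partition ring and picking n = 3 consecutive partitions as its replicas. It is a simplification under stated assumptions: `erlang:phash2/2` stands in for the ring's real consistent hash, the round-robin ownership in `owner/1` is invented for illustration, and the module name `ring_sketch` is hypothetical.

```erlang
-module(ring_sketch).
-export([partition/1, preflist/1, owner/1, demo/0]).

-define(PARTITIONS, 32).   % ring size from the slide's figure
-define(NODES, 3).         % cluster size from the slide's figure
-define(N, 3).             % default replication factor

%% Map a variable identifier to a partition in 0..31.
%% erlang:phash2/2 is only a stand-in for a consistent hash function.
partition(Id) -> erlang:phash2(Id, ?PARTITIONS).

%% The N consecutive partitions, starting at the variable's own partition
%% and wrapping around the ring, that hold its replicas.
preflist(Id) ->
    First = partition(Id),
    [(First + I) rem ?PARTITIONS || I <- lists:seq(0, ?N - 1)].

%% Assumed round-robin assignment of partitions to nodes 0..2.
owner(Partition) -> Partition rem ?NODES.

%% Pair each replica partition with the node assumed to own it.
demo() ->
    [{P, owner(P)} || P <- preflist(some_variable_id)].
```

A client would try the partitions in this list in turn until it reaches an available replica.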
  10. DerflowL

    - Generalizes the model from single-assignment variables to lattices
    - Provides new program registration mechanisms for locality
    - Provides new distribution models
    - Provides a threshold read primitive
    - Provides a mechanism for reading results of programs from the cluster
    - Composition of programs
    - Extraction into a core language model separate from persistence
    - QuickCheck model for verification of language semantics
  11. Generalizing to lattices

    - Single-assignment is a special case of a lattice: unbound to bound.
    - Generalize this to lattices on which inflations are performed.
    - State-based CRDTs only.
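One way to see the generalization is to write the single-assignment variable and a grow-only set as two lattices with a join. This is a sketch under assumptions: the names `sa_join/2`, `gset_join/2` and `is_inflation/2` are hypothetical, and modelling a conflicting bind as a `conflict` top element is an illustrative choice rather than something the slides state.

```erlang
-module(lattice_sketch).
-export([sa_join/2, gset_join/2, is_inflation/2, demo/0]).

%% Single-assignment as a lattice: unbound is the bottom element.
sa_join(unbound, X) -> X;
sa_join(X, unbound) -> X;
sa_join(X, X)       -> X;
sa_join(_, _)       -> conflict.   % two different values cannot be joined

%% Grow-only set (G-Set): the join is set union.
gset_join(A, B) -> ordsets:union(A, B).

%% New is an inflation of Old when it contains everything Old contained.
is_inflation(Old, New) -> ordsets:is_subset(Old, New).

demo() ->
    {bound, 1} = sa_join(unbound, {bound, 1}),
    conflict   = sa_join({bound, 1}, {bound, 2}),
    S0 = ordsets:from_list([a]),
    S1 = gset_join(S0, ordsets:from_list([b])),
    true  = is_inflation(S0, S1),   % joining only ever moves upward
    false = is_inflation(S1, S0),   % shrinking the set is not an inflation
    S1.
```

Because a join never moves a value downward in the order, any bind expressed as a join is an inflation by construction.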
  12. Lattice Example

        {ok, ObjectSetStream} = derflow:declare(),
        {ok, ObjectSetId} = derflow:declare(riak_dt_gset),
        ObjectSetFun = fun(X) ->
            %% Read the current set, add X, and bind the inflated set back.
            {ok, Set0, _} = derflow:read(ObjectSetId),
            {ok, Set} = riak_dt_gset:update({add, X}, undefined, Set0),
            {ok, _} = derflow:bind(ObjectSetId, Set),
            Set
        end,
        derflow:thread(?MODULE, consumer, [ObjectStream, ObjectSetFun, ObjectSetStream]),
  13. Threshold Read

    - Similar to the threshold read provided by LVars.
    - A partial function which is undefined until the threshold is met.
    - Returns the value supplied to the read, regardless of the current state.
  14. Threshold Read Example

        spawn(fun() -> Me ! derflow:read(GSetId, [1,2,3,4]) end),
        {ok, _} = derflow:bind(GSetId, [1, 2, 3, 4]),
        {ok, _} = derflow:bind(GSetId, [1, 2, 3, 4, 5]),
        %% The read returns the threshold [1, 2, 3, 4], not the larger current state.
        GSet2 = receive
            {ok, [1, 2, 3, 4], _} = V1 -> V1
        end,
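The blocking behaviour of a threshold read can be sketched with a single process that owns a grow-only set and parks readers until their threshold is contained in the state. Everything here is an illustrative assumption (the module `threshold_read_sketch`, the message shapes, and the use of `ordsets`); it is not how derflow implements the primitive.

```erlang
-module(threshold_read_sketch).
-export([start/0, bind/2, read/2, demo/0]).

%% Start a process that owns one grow-only set variable.
start() -> spawn(fun() -> loop(ordsets:new(), []) end).

%% Merge new elements into the variable (always an inflation).
bind(Pid, Elems) ->
    Pid ! {bind, ordsets:from_list(Elems)},
    ok.

%% Block until the variable's state contains every element of Threshold.
read(Pid, Threshold) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref, ordsets:from_list(Threshold)},
    receive {Ref, Value} -> {ok, Value} end.

loop(State, Waiting) ->
    receive
        {bind, Set} ->
            New = ordsets:union(State, Set),
            loop(New, notify(New, Waiting));
        {read, From, Ref, Threshold} ->
            loop(State, notify(State, [{From, Ref, Threshold} | Waiting]))
    end.

%% Reply to every reader whose threshold is met; keep the others waiting.
%% The reply is the threshold itself, never the (possibly larger) state.
notify(State, Waiting) ->
    lists:filter(
      fun({From, Ref, Threshold}) ->
              case ordsets:is_subset(Threshold, State) of
                  true  -> From ! {Ref, Threshold}, false;
                  false -> true
              end
      end, Waiting).

demo() ->
    Var = start(),
    Me = self(),
    spawn(fun() -> Me ! {reader, read(Var, [1, 2, 3, 4])} end),
    bind(Var, [1, 2]),
    bind(Var, [1, 2, 3, 4, 5]),
    receive {reader, Result} -> Result end.
```

Returning the threshold rather than the state is what keeps the read deterministic: the reader sees the same answer whether it is scheduled before or after the later binds.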
  15. Extraction of Language Semantics

    - Extraction of language semantics into a core library
    - Library abstracts over the distribution and data storage layer
    - Allows for distribution over ETS, Riak, and the SyncFree reference platform
    - Important for testing out different models:
      - SyncFree reference platform has no replication; variables are distributed
      - Riak has replication and persistence
      - ETS, as implemented, is replicated, but not persisted
    - QuickChecked model
  16. DerflowL Programs

    - Allows composition of CRDTs
    - Enforces inflationary updates between input and output CRDT
    - Output CRDT: partial order, inflationary updates
    - Mechanism for creating a view over the database
    - Treat programs as values
    - Sequential composition of programs
  17. Pure-δ State-Based CRDT

    - Program output is a Pure-δ State-Based CRDT
    - Update function can produce a value in the partial order, without existing state
      - Example: G-counter needs to reference current state
    - Value is merged into current state
    - Contains additional causal information in history:
      - Values observed in computation of function
      - Value contributing to result of function
    - Query returns current state
  18. Computation functions

    Figure: Stream of input CRDTs
  19. Computation functions

    Figure: Stream of input CRDTs with pure transformation applied
  20. Computation functions

    Figure: Stream of input CRDTs with pure transformation applied; merged with current state
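The three figures describe a pipeline: take each input CRDT state from a stream, apply a pure transformation, and merge the result into the current output state. A minimal sketch of that pipeline over grow-only sets follows; the module name `view_sketch` and the element-wise `F` are assumptions for illustration only.

```erlang
-module(view_sketch).
-export([step/3, run/2, demo/0]).

%% Apply the pure function F to every element of one input state and
%% merge (set union) the transformed elements into the output state.
step(F, Input, Output) ->
    ordsets:union(Output, ordsets:from_list([F(X) || X <- Input])).

%% Fold a whole stream of input states into a single output state.
run(F, Inputs) ->
    lists:foldl(fun(In, Out) -> step(F, In, Out) end, ordsets:new(), Inputs).

demo() ->
    Double = fun(X) -> X * 2 end,
    %% Successive states of a growing input set.
    InOrder = run(Double, [[1], [1, 2], [1, 2, 3]]),
    %% The same states delivered out of order and with a duplicate.
    Shuffled = run(Double, [[1, 2, 3], [1], [1, 2], [1]]),
    InOrder = Shuffled,   % merge makes the result order-insensitive
    InOrder.
```

Because the merge is associative, commutative and idempotent, reordered or repeated inputs converge to the same output, which is the property a restartable computation needs.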
  21. Dynamo System Model

    - Data partitioned across replica sets
    - Disjoint replica sets
    - Replicas will eventually be equivalent
    - Replicas written to and read from using quorums
  22. Replica progress

    Figure: Values are written to quorums of replicas; with some divergence
  23. Replica progress

    Figure: Values are written to quorums of replicas; with some divergence
  24. Replica progress

    Figure: Values are written to quorums of replicas; with some divergence; quorum operations increase fault-tolerance and recency
  25. Replica progress

    Figure: Values are written to quorums of replicas; with some divergence; quorum operations increase fault-tolerance and recency; apply read-repair
  26. Anti-entropy

    - On reads, apply read-repair mechanisms
    - Use version vectors to track divergence and repair via anti-entropy
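As a rough illustration of how a version vector exposes divergence, the sketch below compares two vectors and merges them by pointwise maximum. The representation (a map from actor to counter) and the module name `vv_sketch` are assumptions; Riak's own version vector code is not reproduced here.

```erlang
-module(vv_sketch).
-export([descends/2, merge/2, relation/2, demo/0]).

%% A descends B when A has seen at least as many events as B for every
%% actor that appears in B. Missing actors count as zero.
descends(A, B) ->
    maps:fold(fun(Actor, Count, Acc) ->
                      Acc andalso maps:get(Actor, A, 0) >= Count
              end, true, B).

%% Pointwise maximum of the two vectors.
merge(A, B) ->
    maps:fold(fun(Actor, Count, Acc) ->
                      Acc#{Actor => max(Count, maps:get(Actor, Acc, 0))}
              end, A, B).

%% Classify how replica A relates to replica B.
relation(A, B) ->
    case {descends(A, B), descends(B, A)} of
        {true, true}   -> equal;
        {true, false}  -> newer;
        {false, true}  -> older;
        {false, false} -> concurrent   % divergence: a repair is needed
    end.

demo() ->
    A = #{n1 => 2, n2 => 1},
    B = #{n1 => 1, n2 => 2},
    concurrent = relation(A, B),
    Merged = merge(A, B),
    newer = relation(Merged, A),
    Merged.
```

In a repair pass, replicas reported as `older` or `concurrent` would receive the merged state; with CRDT values the merge of the data itself is the lattice join.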
  27. Sequential composition

    - Pure δ-CRDTs also allow composition
    - Analyze dependencies; inform evaluation strategy
    - Program rewriting in terms of existing programs
  28. Aggregation of results

    - Contact a covering set of replicas across nodes
      - This can be increased to improve fault-tolerance
    - Merge replicas
    - Sum between disjoint replica sets
    - Computation in replica sets can be restarted
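The two-level aggregation can be sketched as follows: replicas inside one replica set hold copies of the same data, so their results are merged, and the merged results of the disjoint replica sets are then combined. The sketch assumes grow-only sets and uses set union for both steps; the module name `aggregate_sketch` and the sample data are hypothetical, and the slides' "sum" may denote a different associative, commutative operator.

```erlang
-module(aggregate_sketch).
-export([merge_replica_set/1, aggregate/1, demo/0]).

%% Merge the results returned by the contacted replicas of ONE replica
%% set. Divergent replicas are reconciled by the lattice join (union).
merge_replica_set(Replicas) ->
    ordsets:union(Replicas).

%% Combine the merged results of all disjoint replica sets. Union is used
%% here as one example of an associative, commutative combining operator.
aggregate(ReplicaSets) ->
    ordsets:union([merge_replica_set(Rs) || Rs <- ReplicaSets]).

demo() ->
    %% r = 1: one replica contacted per set, both slightly behind.
    Stale = aggregate([[[a]], [[x]]]),
    %% r = 2: a second replica per set supplies the missing writes.
    Fresh = aggregate([[[a], [a, b]], [[x], [x, y]]]),
    true = ordsets:is_subset(Stale, Fresh),
    {Stale, Fresh}.
```

Contacting more replicas per set can only add information to the merged result, which is why a larger covering set improves both fault-tolerance and recency.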
  29. Aggregation of results

    Figure: Disjoint replica sets
  30. Aggregation of results

    Figure: Covering set of replicas from each disjoint set
  31. Aggregation of results

    Figure: Aggregation of results using associative, commutative sum
  32. Aggregation of results

    Figure: Fault-tolerant r = 2 covering set
  33. Aggregation of results

    Figure: Fault-tolerant r = 3 covering set
  34. Aggregation of results

    Figure: Fault-tolerant r = 3 covering set; merge of state
  35. Aggregation of results

    Figure: Fault-tolerant r = 3 covering set; merge of state, aggregation
  36. Related Work

    - Spark: lineage-tracking, requires coordination
    - Optimize for requesting more replicas up front; track causality
  37. Goals

    - Derflow allows for creation of programs which create "views"
    - Dynamo-style quorum read/write
    - Harvest vs. yield tradeoff
    - Can be repaired via anti-entropy and read-repair
    - Proactively maintained
    - Sequential composition
  38. Current Status

    - Prototype implementation of programming model
    - Allows for creation of programs which create "views"
    - These views have the following properties:
      - Dynamo-style quorum read/write
      - Harvest vs. yield tradeoff
      - Can be repaired via anti-entropy and read-repair
      - Proactively maintained
    - Sequential composition
      - Mapping between CRDTs is explicit
    - Test suite and example applications
  39. Future Work

    - Hygiene analysis for programs to ensure determinism
    - More implicit composition of CRDTs (SyncFree)
    - Higher-level language extension
  40. Thanks

    Carlos Baquero, Peter Van Roy, Marc Shapiro, Nuno Preguiça, Annette Bieniusa, Manuel Bravo
  41. The SyncFree Project

    - Derflow on GitHub: https://github.com/cmeiklejohn/derflow
    - SyncFree on GitHub: https://github.com/syncfree
    - Project overview: https://syncfree.lip6.fr
    - Research publications available: https://syncfree.lip6.fr/index.php/publications