Slide 1

Slide 1 text

Eventually consistent computation with CRDTs
Christopher Meiklejohn
Basho Technologies, Inc and the SyncFree Consortium
October 28, 2014
Meiklejohn (Basho / SyncFree) CRDT computation October 28, 2014 1 / 42

Slide 2

Slide 2 text

Outline
1 Motivation
  Motivation
  The Goal
2 Derflow and DerflowL
  Derflow
  DerflowL
3 Distributed Computation
  Computation
  Aggregation of computation results
4 Conclusion
  Related Work
  Goals
  Current Status
  Future Work
  Thanks

Slide 3

Slide 3 text

Motivation
Riak is a Dynamo-inspired key-value store
Querying by key is the main mechanism for data storage and retrieval
Three mechanisms presently exist for more expressive data access: a MapReduce-like system, secondary indexing, and integration with Apache Solr
Each additional mechanism has drawbacks:
  Mechanisms are not fault-tolerant
  Structure is rigid: need to know the schema when storing data
  Not possible to compose queries

Slide 4

Slide 4 text

The Goal
Building large-scale distributed applications with strong properties
Users want to be able to compute with their data in an efficient and composable manner while guaranteeing strong properties
Provide a framework for building eventually consistent materialized views that have strong convergence properties:
  Data types which have strong convergence properties
  Deterministic language to compose data types and preserve these strong convergence properties

Slide 5

Slide 5 text

Derflow Overview
Published at ACM SIGPLAN Erlang Workshop ’14
Distributed deterministic dataflow programming
Relies on a single-assignment variable store
Built on top of Riak Core
Programs execute locally; operate on a remote replicated data store
Built on top of ets, Erlang Term Storage

Slide 6

Slide 6 text

Language Extensions
Streams
Laziness (non-strict evaluation)

Slide 7

Slide 7 text

Core Derflow Semantics
declare(): declare a new dataflow variable
bind(x_i, v_i): bind a dataflow variable to a value
read(x_i): read a dataflow variable
spawn(...): introduce concurrency

Slide 8

Slide 8 text

Core Derflow Semantics
Simple example

    %% Declare a variable and bind it once.
    {ok, Id} = derflow:declare(),
    {ok, _} = derflow:bind(Id, 1),
    {ok, Value1, _} = derflow:read(Id),
    %% A second bind to a different value is rejected.
    error = derflow:bind(Id, 2),
    {ok, Value2, _} = derflow:read(Id),
    {ok, Value1, Value2}.
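
The single-assignment behaviour in the example above can be sketched outside Erlang. The following is a minimal, illustrative Python model, not the Derflow API: the class name `SingleAssignmentStore`, its return values, and the choice to accept a re-bind to an identical value are all assumptions made for this sketch. A real dataflow store would also block a read on an unbound variable rather than raise.

```python
import itertools


class SingleAssignmentStore:
    """Toy in-memory store of single-assignment (dataflow) variables.

    Hypothetical illustration only; not the derflow implementation.
    """

    _UNBOUND = object()  # sentinel: declared but not yet bound

    def __init__(self):
        self._ids = itertools.count()
        self._values = {}

    def declare(self):
        """Create a fresh, unbound variable and return its identifier."""
        var_id = next(self._ids)
        self._values[var_id] = self._UNBOUND
        return var_id

    def bind(self, var_id, value):
        """Bind once; a later bind to a different value is rejected."""
        current = self._values[var_id]
        if current is self._UNBOUND or current == value:
            self._values[var_id] = value
            return "ok"
        return "error"

    def read(self, var_id):
        """Return the bound value (this sketch raises instead of blocking)."""
        value = self._values[var_id]
        if value is self._UNBOUND:
            raise LookupError("variable is unbound")
        return value


store = SingleAssignmentStore()
x = store.declare()
first_bind = store.bind(x, 1)
value1 = store.read(x)
second_bind = store.bind(x, 2)  # rejected: x is already bound to 1
value2 = store.read(x)
```

Both reads observe the same value, which is the property that makes programs over such variables deterministic.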

Slide 9

Slide 9 text

Concurrent map example

    concurrent_map(S1, M, F, S2) ->
        case derflow:consume(S1) of
            %% End of the input stream: terminate the output stream.
            {nil, _} ->
                derflow:bind(S2, nil);
            {Value, Next} ->
                %% Extend the output stream, then bind the current
                %% element to M:F(Value) in a separate process.
                {id, NextOutput} = derflow:extend(S2),
                spawn(derflow, bind, [S2, M, F, Value]),
                %% Continue with the rest of the input stream.
                concurrent_map(Next, M, F, NextOutput)
        end.
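
The shape of the concurrent map (one task per input element, each output slot written exactly once, results in input order) can be sketched in Python with futures standing in for dataflow variables. This is an analogy under that assumption, not a translation of the Derflow stream API; the helper name `concurrent_map` is reused only for readability.

```python
from concurrent.futures import ThreadPoolExecutor


def concurrent_map(inputs, fn):
    """Apply fn to every input concurrently, preserving input order.

    Each future plays the role of a single-assignment variable: it is
    written once by its task and read once when results are collected.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, value) for value in inputs]
        # Collecting in submission order keeps the output deterministic
        # regardless of which task happens to finish first.
        return [future.result() for future in futures]


doubled = concurrent_map([1, 2, 3], lambda v: v * 2)
```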

Slide 10

Slide 10 text

Derflow Distribution Model
Programs run locally on the client.
Variables are hashed and replicated across the cluster.
Programs perform a roundtrip to access each variable at an available replica.
The replication factor defaults to n = 3.
Figure: Ring with 32 partitions and 3 nodes
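
To make the placement rule concrete, here is a simplified Python sketch of hashing a variable onto a 32-partition ring and picking n = 3 consecutive partitions as its replicas. It is a deliberately reduced model: the node names, the modulo-based partition choice, and the round-robin ownership are assumptions for illustration and do not reproduce Riak Core's actual ring arithmetic or claim algorithm.

```python
import hashlib

PARTITIONS = 32                       # ring size from the slide's figure
NODES = ["node1", "node2", "node3"]   # hypothetical node names
N = 3                                 # replication factor


def partition_for(var_id):
    """Hash a variable identifier to a partition index on the ring."""
    digest = hashlib.sha1(var_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % PARTITIONS


def preference_list(var_id, n=N):
    """Return the n consecutive partitions that hold the variable."""
    first = partition_for(var_id)
    return [(first + i) % PARTITIONS for i in range(n)]


def owner(partition):
    """Assign partitions to nodes round-robin (a simplification)."""
    return NODES[partition % len(NODES)]


prefs = preference_list("x")
owners = [owner(p) for p in prefs]
```

A client program would contact any available replica in `prefs` for each variable access, which is why every access costs a roundtrip.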

Slide 11

Slide 11 text

DerflowL
Generalizes the model from single-assignment variables to lattices
Provides new program registration mechanisms for locality
Provides new distribution models
Provides a threshold read primitive
Provides a mechanism for reading results of programs from the cluster
Composition of programs
Extraction into a core language model separate from persistence
QuickCheck model for verification of language semantics

Slide 12

Slide 12 text

Generalizing to lattices
Single assignment is a special case of a lattice: unbound to bound.
Generalize this to lattices on which inflations are performed.
State-based CRDTs only.
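
The step from single assignment to lattices can be illustrated with two join functions. The sketch below is an assumption-laden Python model, not DerflowL code: `UNBOUND`, `join_single`, `join_gset`, and `is_inflation` are hypothetical names. It treats a single-assignment variable as a two-level lattice (unbound below every value) and a grow-only set as a lattice ordered by subset.

```python
UNBOUND = object()  # bottom element of the single-assignment lattice


def join_single(a, b):
    """Join for single assignment: unbound sits below every value.

    Two different bound values have no join in this lattice, which is
    why a conflicting bind is an error.
    """
    if a is UNBOUND:
        return b
    if b is UNBOUND or a == b:
        return a
    raise ValueError("conflicting bindings")


def join_gset(a, b):
    """Join for a grow-only set: set union."""
    return a | b


def is_inflation(old, new):
    """A G-Set update is an inflation if it never removes elements."""
    return old <= new


bound = join_single(UNBOUND, 1)
grown = join_gset(frozenset({1}), frozenset({2}))
```

Under this reading, `bind` on a lattice variable accepts a new state only when it is an inflation of the current one.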

Slide 13

Slide 13 text

Lattice Example

    %% ObjectStream is assumed to be declared earlier in the program.
    {ok, ObjectSetStream} = derflow:declare(),
    %% Declare a lattice variable typed as a grow-only set.
    {ok, ObjectSetId} = derflow:declare(riak_dt_gset),
    ObjectSetFun = fun(X) ->
        {ok, Set0, _} = derflow:read(ObjectSetId),
        {ok, Set} = riak_dt_gset:update({add, X}, undefined, Set0),
        %% Binding the grown set is an inflation of the current state.
        {ok, _} = derflow:bind(ObjectSetId, Set),
        Set
    end,
    derflow:thread(?MODULE, consumer,
                   [ObjectStream, ObjectSetFun, ObjectSetStream]),

Slide 14

Slide 14 text

Threshold Read
Similar to the threshold read provided by LVars.
Partial function which is undefined until threshold met.
Returns value supplied to the read, regardless of state.

Slide 15

Slide 15 text

Threshold Read Example

    %% Me and GSetId are assumed to be bound earlier in the program.
    %% The reader blocks until the set contains at least [1,2,3,4].
    spawn(fun() -> Me ! derflow:read(GSetId, [1,2,3,4]) end),
    {ok, _} = derflow:bind(GSetId, [1, 2, 3, 4]),
    {ok, _} = derflow:bind(GSetId, [1, 2, 3, 4, 5]),
    %% The read returns the threshold, not the later state.
    GSet2 = receive
        {ok, [1, 2, 3, 4], _} = V1 -> V1
    end,
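
The blocking behaviour of a threshold read can be modelled with a condition variable. The following Python sketch is illustrative only: `GSetVariable` and `threshold_read` are hypothetical names, the state is a plain grow-only set, and a timeout is added so the example cannot hang. It shows the two properties stated on the slides: the read is undefined (blocks) until the threshold is met, and it returns the threshold rather than the current state.

```python
import threading


class GSetVariable:
    """Grow-only set variable with a blocking threshold read (sketch)."""

    def __init__(self):
        self._state = frozenset()
        self._cond = threading.Condition()

    def bind(self, new_state):
        """Merge new_state into the current state (join = union)."""
        with self._cond:
            self._state = self._state | frozenset(new_state)
            self._cond.notify_all()

    def threshold_read(self, threshold, timeout=None):
        """Block until the state is at or above threshold, then return
        the threshold itself, regardless of how far the state has grown."""
        threshold = frozenset(threshold)
        with self._cond:
            met = self._cond.wait_for(
                lambda: threshold <= self._state, timeout)
            if not met:
                raise TimeoutError("threshold not reached")
            return threshold


var = GSetVariable()
results = []
reader = threading.Thread(
    target=lambda: results.append(
        var.threshold_read({1, 2, 3, 4}, timeout=5)))
reader.start()
var.bind({1, 2})              # below the threshold: reader keeps waiting
var.bind({1, 2, 3, 4})        # threshold met
var.bind({1, 2, 3, 4, 5})     # later growth does not change the result
reader.join()
```

Because the reader always gets the threshold back, its result does not depend on how the scheduler interleaves the binds.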

Slide 16

Slide 16 text

Extraction of Language Semantics
Extraction of language semantics into a core library
Library abstracts over the distribution and data storage layer
Allows for distribution over ETS, Riak, and the SyncFree reference platform
Important for testing out different models:
  SyncFree reference platform has no replication; variables are distributed
  Riak has replication and persistence
  ETS, as implemented, is replicated, but not persisted
QuickChecked model

Slide 17

Slide 17 text

DerflowL Programs
Allows composition of CRDTs
Enforces inflationary updates between input and output CRDT
Output CRDT: partial order, inflationary updates
Mechanism for creating a view over the database
Treat programs as values
Sequential composition of programs

Slide 18

Slide 18 text

Pure-δ State-Based CRDT
Program output is a Pure-δ State-Based CRDT
Update function can produce a value in the partial order without the existing state
  Example: G-counter needs to reference current state
Value is merged into current state
Contains additional causal information in history:
  Values observed in computation of function
  Value contributing to result of function
Query returns current state
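
One way to read the distinction on this slide is to compare an update that can be computed without looking at the current state against one that cannot. The Python sketch below is an interpretation under that assumption, not the author's definition: all function names are hypothetical, and the causal history mentioned on the slide is not modelled. Adding to a grow-only set yields a delta from the element alone, whereas incrementing a G-counter has to read the replica's current count.

```python
def gset_add_delta(element):
    """Delta for a grow-only set add: needs no existing state."""
    return frozenset({element})


def gset_merge(state, delta):
    """Merge a delta (or a full state) into the current state."""
    return state | delta


def gcounter_increment(state, replica):
    """A G-counter increment must read the replica's current count."""
    updated = dict(state)
    updated[replica] = updated.get(replica, 0) + 1
    return updated


def gcounter_merge(a, b):
    """Merge two G-counters by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


view = frozenset()
view = gset_merge(view, gset_add_delta("a"))
view = gset_merge(view, gset_add_delta("b"))
view = gset_merge(view, gset_add_delta("a"))  # duplicate delta: no effect

counter = gcounter_increment({}, "r1")
counter = gcounter_increment(counter, "r1")
merged_counter = gcounter_merge(counter, {"r2": 5})
```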

Slide 19

Slide 19 text

Computation functions
Figure: Stream of input CRDTs

Slide 20

Slide 20 text

Computation functions
Figure: Stream of input CRDTs with pure transformation applied

Slide 21

Slide 21 text

Computation functions
Figure: Stream of input CRDTs with pure transformation applied; merged with current state
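
The three figures describe a pipeline: a stream of input CRDT states arrives, a pure transformation is applied to each, and the result is merged into the current output state. A minimal Python sketch of that pipeline follows, assuming grow-only sets on both sides and an element-wise transformation; `run_view` is a hypothetical name and the history list exists only to make the inflationary property observable.

```python
def run_view(input_states, transform):
    """Fold a stream of G-Set states into an output G-Set.

    Each input state is transformed element-wise by a pure function and
    merged (set union) into the output, so the output only ever grows.
    """
    output = frozenset()
    history = []
    for state in input_states:
        transformed = frozenset(transform(x) for x in state)
        output = output | transformed
        history.append(output)
    return output, history


inputs = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]
out, history = run_view(inputs, lambda x: x * 2)
# Delivering the same states in a different order gives the same view.
out_reordered, _ = run_view(list(reversed(inputs)), lambda x: x * 2)
```

Because the merge is a join, duplicated or reordered inputs leave the final view unchanged.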

Slide 22

Slide 22 text

Dynamo System Model
Data partitioned across replica sets
Disjoint replica sets
Replicas will eventually be equivalent
Replicas written to and read from using quorums

Slide 23

Slide 23 text

Replica progress
Figure: Values are written to quorums of replicas; with some divergence

Slide 24

Slide 24 text

Replica progress
Figure: Values are written to quorums of replicas; with some divergence

Slide 25

Slide 25 text

Replica progress
Figure: Values are written to quorums of replicas; with some divergence; quorum operations increase fault-tolerance and recency

Slide 26

Slide 26 text

Replica progress
Figure: Values are written to quorums of replicas; with some divergence; quorum operations increase fault-tolerance and recency; apply read-repair

Slide 27

Slide 27 text

Anti-entropy
On reads, apply read-repair mechanisms
Use version vectors to track divergence and repair via anti-entropy
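
Read repair over state-based CRDTs can be sketched in a few lines. The Python below is a simplified model under stated assumptions: replicas are plain dictionaries holding a grow-only set, `quorum_read` is a hypothetical helper that contacts the first r replicas, and divergence is detected by comparing states directly rather than with the version vectors the slide mentions.

```python
def quorum_read(replicas, r):
    """Read from r replicas, merge their states, and repair stale ones.

    Replicas hold grow-only sets, so the merged value is the union of
    the contacted states; any contacted replica that is behind the
    merged value is overwritten with it (read repair).
    """
    contacted = replicas[:r]
    merged = frozenset().union(*(rep["state"] for rep in contacted))
    for rep in contacted:
        if rep["state"] != merged:
            rep["state"] = merged
    return merged


replicas = [
    {"state": frozenset({1, 2})},
    {"state": frozenset({1})},      # stale replica
    {"state": frozenset({1, 3})},   # diverged, not contacted at r = 2
]
value_r2 = quorum_read(replicas, 2)
value_r3 = quorum_read(replicas, 3)
```

A larger r sees more of the divergence in one read; replicas outside the read quorum are left for background anti-entropy.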

Slide 28

Slide 28 text

Sequential composition
Pure δ-CRDTs also allow composition
Analyze dependencies; inform evaluation strategy
Program rewriting in terms of existing programs

Slide 29

Slide 29 text

Aggregation of results
Contact covering set of replicas across nodes
This can be increased to improve fault-tolerance
Merge replicas
Sum between disjoint replica sets
Computation in replica sets can be restarted
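
The aggregation steps listed above (contact a covering set, merge within each replica set, sum across disjoint sets) can be sketched as follows. This Python model makes several assumptions for illustration: each replica's result is a grow-only set of matching keys, the covering set is the first r replicas of every replica set, and the aggregate is a count. `aggregate_count` is a hypothetical name.

```python
def aggregate_count(replica_sets, r):
    """Count results across disjoint replica sets.

    Within one replica set the contacted replicas hold copies of the
    same data, so their states are merged (union). Across replica sets
    the data is disjoint, so the per-set counts can simply be summed.
    """
    total = 0
    for replicas in replica_sets:
        merged = frozenset()
        for state in replicas[:r]:
            merged = merged | state
        total += len(merged)
    return total


replica_sets = [
    # Replica set for partition A; the second replica is behind.
    [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"a", "b"})],
    # Replica set for partition B; the first and third are behind.
    [frozenset({"c"}), frozenset({"c", "d"}), frozenset({"c"})],
]
count_r1 = aggregate_count(replica_sets, 1)
count_r2 = aggregate_count(replica_sets, 2)
count_r3 = aggregate_count(replica_sets, 3)
```

With r = 1 the result misses an update held only by a replica that was not contacted; raising r trades more requests for a more complete and more fault-tolerant answer.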

Slide 30

Slide 30 text

Aggregation of results
Figure: Disjoint replica sets

Slide 31

Slide 31 text

Aggregation of results
Figure: Covering set of replicas from each disjoint set

Slide 32

Slide 32 text

Aggregation of results
Figure: Aggregation of results using associative, commutative sum

Slide 33

Slide 33 text

Aggregation of results
Figure: Fault-tolerant r = 2 covering set

Slide 34

Slide 34 text

Aggregation of results
Figure: Fault-tolerant r = 3 covering set

Slide 35

Slide 35 text

Aggregation of results
Figure: Fault-tolerant r = 3 covering set; merge of state

Slide 36

Slide 36 text

Aggregation of results
Figure: Fault-tolerant r = 3 covering set; merge of state, aggregation

Slide 37

Slide 37 text

Related Work
Spark: lineage tracking, requires coordination
Optimize for requesting more replicas up front; track causality

Slide 38

Slide 38 text

Goals
Derflow allows for creation of programs which create "views"
Dynamo-style quorum read/write
Harvest vs. yield tradeoff
Can be repaired via anti-entropy and read-repair
Proactively maintained
Sequential composition

Slide 39

Slide 39 text

Current Status
Prototype implementation of programming model
Allows for creation of programs which create "views"
These views have the following properties:
  Dynamo-style quorum read/write
  Harvest vs. yield tradeoff
  Can be repaired via anti-entropy and read-repair
  Proactively maintained
  Sequential composition
Mapping between CRDTs is explicit
Test suite and example applications

Slide 40

Slide 40 text

Future Work
Hygiene analysis for programs to ensure determinism
More implicit composition of CRDTs (SyncFree)
Higher-level language extension

Slide 41

Slide 41 text

Thanks
Carlos Baquero
Peter Van Roy
Marc Shapiro
Nuno Preguiça
Annette Bieniusa
Manuel Bravo

Slide 42

Slide 42 text

The SyncFree Project
Derflow on GitHub: https://github.com/cmeiklejohn/derflow
SyncFree on GitHub: https://github.com/syncfree
Project overview: https://syncfree.lip6.fr
Research publications available: https://syncfree.lip6.fr/index.php/publications