  Derflow
  DerflowL
3 Distributed Computation
  Computation
  Aggregation of computation results
4 Conclusion
  Related Work
  Goals
  Current Status
  Future Work
  Thanks
Meiklejohn (Basho / SyncFree) CRDT computation October 28, 2014
main mechanism for data storage and retrieval
Three mechanisms currently exist for more expressive data access: a MapReduce-like system, secondary indexing, and integration with Apache Solr
Each additional mechanism has drawbacks
Mechanisms are not fault-tolerant
Structure is rigid: need to know schema when storing data
Not possible to perform composition of queries
Users want to be able to compute with their data in an efficient and composable manner while guaranteeing strong properties
Provide a framework for building eventually consistent materialized views that have strong convergence properties
Data types which have strong convergence properties
Deterministic language to compose data types and preserve these strong convergence properties
deterministic dataflow programming
Relies on a single-assignment variable store
Built on top of Riak Core
Programs execute locally; operate on a remote replicated data store
Built on top of ETS (Erlang Term Storage)
, v_i): bind a dataflow variable to a value
read(x_i): read a dataflow variable
spawn(...): introduce concurrency
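The bind/read/spawn primitives above can be illustrated with a minimal in-memory sketch. This is not Derflow's implementation or API: the class name `SingleAssignmentStore`, the `declare` helper, and the use of Python threads for `spawn` are assumptions made only for this example.

```python
import threading


class SingleAssignmentStore:
    """Toy single-assignment variable store (hypothetical, not Derflow's API)."""

    def __init__(self):
        self._values = {}
        self._cond = threading.Condition()
        self._next_id = 0

    def declare(self):
        # Hypothetical helper: hand out a fresh, still-unbound variable id.
        with self._cond:
            var = self._next_id
            self._next_id += 1
            return var

    def bind(self, var, value):
        # A variable may be bound once; rebinding to a different value fails.
        with self._cond:
            if var in self._values and self._values[var] != value:
                raise ValueError("variable already bound to a different value")
            self._values[var] = value
            self._cond.notify_all()

    def read(self, var):
        # Block until the variable is bound, then return its value.
        with self._cond:
            self._cond.wait_for(lambda: var in self._values)
            return self._values[var]


def spawn(fn, *args):
    # Introduce concurrency by running fn in its own thread.
    thread = threading.Thread(target=fn, args=args)
    thread.start()
    return thread


def run_example():
    store = SingleAssignmentStore()
    x = store.declare()
    y = store.declare()
    # The worker blocks on x until the main thread binds it.
    worker = spawn(lambda: store.bind(y, store.read(x) + 1))
    store.bind(x, 41)
    worker.join()
    return store.read(y)
```

Because every read blocks until its variable is bound and a bound value never changes, `run_example()` gives the same result regardless of thread scheduling, which is the determinism property the slides rely on.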
are hashed and replicated across the cluster.
Programs perform a roundtrip to access each variable at an available replica.
The default replication factor is n = 3.
Figure: Ring with 32 partitions and 3 nodes
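A rough sketch of how a variable could be hashed onto a ring with 32 partitions, 3 nodes, and n = 3 follows. The node names, the modulo-based partition choice, and the round-robin partition ownership are simplifying assumptions for illustration; they are not taken from Riak Core's actual ring code.

```python
import hashlib

NUM_PARTITIONS = 32                      # ring size from the figure
NODES = ["node_a", "node_b", "node_c"]   # hypothetical node names
N = 3                                    # default replication factor


def partition_for(key):
    # Assumption: hash the key with SHA-1 and reduce it onto the ring.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def owner(partition):
    # Assumption: partitions are claimed round-robin by the nodes.
    return NODES[partition % len(NODES)]


def preference_list(key, n=N):
    # The n consecutive partitions starting at the key's partition hold replicas.
    first = partition_for(key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n)]
    return [(p, owner(p)) for p in partitions]
```

A program reading a variable would contact any available entry of `preference_list(variable_id)`. Note that with 32 partitions and 3 nodes the round-robin claim does not divide evenly, so a preference list that wraps around the ring can land on fewer than three distinct nodes.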
new program registration mechanisms for locality
Provides new distribution models
Provides a threshold read primitive
Provides a mechanism for reading results of programs from the cluster
Composition of programs
Extraction into a core language model separate from persistence
QuickCheck model for verification of language semantics
Unbound to bound.
Generalize this to lattices on which inflations are performed.
State-based CRDTs only.
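The generalization from "unbound to bound" to lattice inflations can be sketched as a variable whose bind joins the new value into the current state, so the state only ever moves up the lattice. The `LatticeVariable` class and its constructor arguments are hypothetical names for this example, not part of the system described.

```python
class LatticeVariable:
    """Variable whose state is a lattice element (hypothetical sketch)."""

    def __init__(self, bottom, join):
        self.value = bottom   # least element of the lattice
        self._join = join     # least-upper-bound function

    def bind(self, new_value):
        # Joining guarantees the new state is an inflation of the old one.
        self.value = self._join(self.value, new_value)
        return self.value


# Grow-only set: join is set union, bottom is the empty set.
gset = LatticeVariable(frozenset(), lambda a, b: a | b)
gset.bind(frozenset({1}))
gset.bind(frozenset({2}))
gset.bind(frozenset({1}))   # re-binding an old value changes nothing

# Max register over integers: join is max, bottom is 0 (assumed non-negative).
maxreg = LatticeVariable(0, max)
maxreg.bind(5)
maxreg.bind(3)              # a smaller value cannot move the state down
```

The single-assignment case is the two-element lattice where the only inflation is from unbound to bound; state-based CRDTs supply larger lattices with the same join-based update.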
Partial function that is undefined until the threshold is met.
Returns the value supplied to the read, regardless of the current state.
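A threshold read as described above might look like the following sketch over a grow-only set, where "threshold met" is taken to mean that the threshold is a subset of the current state. The class and method names and the optional timeout are assumptions for this example.

```python
import threading


class GSetVariable:
    """Grow-only set variable with a blocking threshold read (hypothetical)."""

    def __init__(self):
        self._state = frozenset()
        self._cond = threading.Condition()

    def bind(self, elements):
        # Inflate the state by joining in the new elements.
        with self._cond:
            self._state = self._state | frozenset(elements)
            self._cond.notify_all()

    def threshold_read(self, threshold, timeout=None):
        threshold = frozenset(threshold)
        with self._cond:
            # Undefined (blocked) until the state is at or above the threshold.
            met = self._cond.wait_for(lambda: threshold <= self._state, timeout)
            if not met:
                raise TimeoutError("threshold not reached")
            # Return the threshold itself, not the possibly larger state.
            return threshold


def run_example():
    var = GSetVariable()
    writer = threading.Thread(
        target=lambda: (var.bind({1}), var.bind({2, 3}))
    )
    writer.start()
    result = var.threshold_read({1, 2})
    writer.join()
    return result
```

Returning the threshold rather than the current state is what keeps the read deterministic: the caller sees the same value whether the read completes right as the threshold is reached or long after the state has grown past it.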
core library
Library abstracts over the distribution and data storage layer
Allows for distribution over ETS, Riak, and the SyncFree reference platform
Important for testing out different models:
SyncFree reference platform has no replication; variables are distributed
Riak has replication and persistence
ETS, as implemented, is replicated but not persisted
QuickChecked model
and output CRDT
Output CRDT: partial order, inflationary updates
Mechanism for creating a view over the database
Treat programs as values
Sequential composition of programs
Update function can produce a value in partial order, without existing state
Example: G-counter needs to reference current state
Value is merged into current state
Contains additional causal information in history:
Values observed in computation of function
Value contributing to result of function
Query returns current state
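The G-counter example above can be made concrete with a small state-based sketch: the increment has to read the replica's current entry, and the produced value is merged into the current state with an element-wise maximum. The class below is an illustrative assumption, not the data type implementation used by the system.

```python
class GCounter:
    """State-based grow-only counter (illustrative sketch)."""

    def __init__(self, counts=None):
        # Map from replica id to that replica's local increment count.
        self.counts = dict(counts or {})

    def increment(self, replica_id, amount=1):
        # The update needs the current state to compute the next value.
        updated = dict(self.counts)
        updated[replica_id] = updated.get(replica_id, 0) + amount
        return GCounter(updated)

    def merge(self, other):
        # Join: element-wise maximum over all replica entries.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    def value(self):
        # Query: the counter's value is the sum of all entries.
        return sum(self.counts.values())


a = GCounter().increment("r1").increment("r1")
b = GCounter().increment("r2")
merged = a.merge(b)
```

The merge is commutative and idempotent, so merging in either order, or merging the same state twice, yields the same counter.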
sets
Replicas will eventually be equivalent
Replicas are written to and read from using quorums
replicas; with some divergence; quorum operations increase fault-tolerance and recency
replicas; with some divergence; quorum operations increase fault-tolerance and recency; apply read-repair
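A quorum read with read-repair over divergent replicas could be sketched as follows. Replicas are modeled as plain dictionaries holding grow-only sets, and the function simply contacts the first r replicas; both choices are simplifying assumptions for this example, not the actual protocol implementation.

```python
def quorum_read(replicas, key, r=2):
    """Read key from r replicas, merge the results, and repair stale replicas."""
    # Assumption: the first r replicas are the ones that respond.
    contacted = replicas[:r]

    # Merge the divergent states; for grow-only sets the join is set union.
    merged = frozenset()
    for replica in contacted:
        merged |= replica.get(key, frozenset())

    # Read-repair: write the merged state back to any replica that lags.
    for replica in contacted:
        if replica.get(key, frozenset()) != merged:
            replica[key] = merged

    return merged


replicas = [
    {"k": frozenset({1})},   # saw only the first write
    {"k": frozenset({2})},   # saw only the second write
    {},                      # saw nothing yet
]
result = quorum_read(replicas, "k", r=2)
```

After the call, the two contacted replicas agree on the merged state, while the third stays behind until another read, or anti-entropy, reaches it.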
This can be increased to improve fault-tolerance
Merge replicas
Sum between disjoint replica sets
Computation in replica sets can be restarted
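The two-level aggregation above (merge within a replica set, sum across disjoint replica sets) can be sketched with plain integers. This assumes each replica reports a monotonically growing count for its replica set's share of the data, so that taking the maximum is a valid merge; the function names are hypothetical.

```python
def merge_replica_set(replica_results):
    # Replicas of the same set compute over the same data, so the largest
    # (most up-to-date) count subsumes the others. Assumes monotonic counts.
    return max(replica_results)


def aggregate(replica_sets):
    # Disjoint replica sets cover disjoint data, so their results add up.
    return sum(merge_replica_set(results) for results in replica_sets)


# Two replica sets with three replicas each; one replica per set lags behind.
total = aggregate([[3, 3, 2], [5, 4, 5]])
```

Merging before summing is what prevents double counting: adding all six replica results directly would count each replica set's data up to three times.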
Dynamo-style quorum read/write
Harvest vs. yield tradeoff
Can be repaired via anti-entropy and read-repair
Proactively maintained
Sequential composition
of programs which create "views"
These views have the following properties:
Dynamo-style quorum read/write
Harvest vs. yield tradeoff
Can be repaired via anti-entropy and read-repair
Proactively maintained
Sequential composition
Mapping between CRDTs is explicit
Test suite and example applications