SyncFree: Large Scale Computation without Synchronization

SyncFree
Large-Scale Computation Without Synchronization
Annette Bieniusa and Christopher Meiklejohn
University of Kaiserslautern and Basho Technologies, Inc.
October 28, 2014

Outline
Motivation
CRDTs - a success story
Challenges
  Atomic updates
  Divergence control
  Computations
  Optimizations
  Programming models
Provably correct!
Up and running!
Conclusion - Where to go from here?

Motivation

CRDTs - a success story

Replicated Data Types
Typically, key-value stores operate on opaque objects
Problem: identifying and resolving concurrent operations
  Even worse in multi-DC settings
Semantic resolution
  Needs to be provided by the application server / client
Dynamo; deterministic, but not intuitive

CRDTs in Riak 2.0
Conflict-Free/Confluent/Commutative/Convergent Replicated Data Types
Library riak_dt implements different types of state-based CRDTs:
  Counters (G-Counter, PN-Counter)
  Sets (G-Set, 2P-Set, OR-Set, ORSWOT)
  Flags
  Registers (LWW-Register, MV-Register)
  Maps
Subset exposed in Riak 2.0
Concurrent updates are merged following principled techniques
All problems solved?!
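
To make the state-based counters listed above concrete, here is a minimal Python sketch of a G-Counter and a PN-Counter. It is not the riak_dt API (which is written in Erlang); the class and method names are assumptions chosen for illustration. The point it shows is that merging by entry-wise maximum is commutative, associative, and idempotent, so replicas converge regardless of the order in which they exchange state.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one non-negative entry per replica id (illustrative)."""
    counts: dict = field(default_factory=dict)

    def increment(self, replica, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Entry-wise maximum over the union of replica ids.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in keys})


@dataclass
class PNCounter:
    """Counter with decrements, built from two G-Counters (illustrative)."""
    p: GCounter = field(default_factory=GCounter)  # sum of increments
    n: GCounter = field(default_factory=GCounter)  # sum of decrements

    def increment(self, replica, amount=1):
        self.p.increment(replica, amount)

    def decrement(self, replica, amount=1):
        self.n.increment(replica, amount)

    def value(self):
        return self.p.value() - self.n.value()

    def merge(self, other):
        return PNCounter(self.p.merge(other.p), self.n.merge(other.n))


# Two replicas update concurrently, then exchange state in either order.
a, b = PNCounter(), PNCounter()
a.increment("a", 5)
b.increment("b", 2)
b.decrement("b", 1)
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

Both merge orders yield the same state, and merging a state with itself changes nothing.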

The SyncFree Project
EU project on large-scale computation without synchronization
Project consortium of academic and industry partners

Challenges

Atomic updates

Purchasing items
Scenario: Virtual wallet
User can exchange (virtual) currency for vouchers, game items, ...
Operation should be atomic
  No money lost!
  No voucher used twice!
How can we achieve this under eventual consistency?

Technology: CRDT Composition
Compose CRDTs that are to be updated together
Ad hoc solutions are error-prone
Map CRDT allows composing CRDT objects via embedding
Guarantees atomic update
But: Deep embedding can lead to large objects
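
Composition via embedding can be sketched as follows. This is a simplified, grow-only map written for illustration; all names (`CRDTMap`, `buy_voucher`, the field names) are assumptions, and unlike a full Map CRDT it does not support removing fields. Because both wallet fields live in one object and are shipped as one state, a replica that receives the update sees the payment and the voucher together or not at all.

```python
class GSet:
    """Grow-only set; merge is set union."""
    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        return GSet(self.items | {item})

    def merge(self, other):
        return GSet(self.items | other.items)


class GCounter:
    """Grow-only counter; merge is entry-wise maximum."""
    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica, amount=1):
        counts = dict(self.counts)
        counts[replica] = counts.get(replica, 0) + amount
        return GCounter(counts)

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in keys})


class CRDTMap:
    """Hypothetical grow-only map from field name to an embedded CRDT."""
    def __init__(self, fields=None):
        self.fields = dict(fields or {})

    def update(self, **changes):
        # All changed fields land in the same new state.
        return CRDTMap({**self.fields, **changes})

    def merge(self, other):
        merged = dict(self.fields)
        for name, value in other.fields.items():
            # Merge field-wise, delegating to the embedded CRDT.
            merged[name] = merged[name].merge(value) if name in merged else value
        return CRDTMap(merged)


def buy_voucher(wallet, replica, price, voucher_id):
    """Record the payment and the voucher in one update of the wallet object."""
    return wallet.update(
        spent=wallet.fields["spent"].increment(replica, price),
        vouchers=wallet.fields["vouchers"].add(voucher_id),
    )


empty = CRDTMap({"spent": GCounter(), "vouchers": GSet()})
at_a = buy_voucher(empty, "a", 10, "voucher-1")
at_b = buy_voucher(empty, "b", 5, "voucher-2")
merged = at_a.merge(at_b)
```

The downside named on the slide is visible here too: every embedded object travels with the map, so deep embedding makes the shipped state large.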

Technology: Transactions
Transactions with weak, yet helpful guarantees such as causal consistency
Snapshot reads allow for consistent observation of objects
Allows for atomic and dynamic combinations of updates across many objects
Needs careful engineering to keep metadata well-behaved while being fault-tolerant

Divergence control

Handing out limited resources
Scenario: (Shared) Virtual wallet
User should not spend more money than she has in her account
Checking and reducing the balance would require global synchronization operations such as 2PC
Impossible under network partitions!

Bounded divergence
Scenario: Ad counting
Advertisement should be displayed a limited number of times to users in a certain area / country
Keeping track of how often it is displayed requires counters that can deal with high contention
Estimated count of delivered ads should not diverge too much from the actual number
But the exact number is not necessary

Technology: Bounded CRDTs
Idea: Extend replicas of the shared data item with leases / reservations / escrow
Proactively distribute them among the replicas
Fast, local operations possible when a reservation is locally available
Allocation of leases in the background using strongly consistent operations
Precise on bound
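
The reservation idea can be illustrated with a small single-process Python sketch. It is not the project's bounded CRDT design: the class `EscrowCounter`, its methods, and the replica names are assumptions, and the per-replica rights are kept in one dictionary instead of being replicated. What it shows is the split between fast local spending within a reservation and an explicit transfer step that stands in for the coordinated background reallocation.

```python
class EscrowCounter:
    """Shared balance split into per-replica reservations (illustrative)."""

    def __init__(self, initial_rights):
        # rights[replica] = amount this replica may spend without coordination
        self.rights = dict(initial_rights)

    def value(self):
        return sum(self.rights.values())

    def try_spend(self, replica, amount):
        """Local operation: succeeds only within the replica's own reservation."""
        if self.rights.get(replica, 0) < amount:
            return False
        self.rights[replica] -= amount
        return True

    def transfer(self, src, dst, amount):
        """Stand-in for the background reallocation, which needs coordination."""
        if self.rights.get(src, 0) < amount:
            return False
        self.rights[src] -= amount
        self.rights[dst] = self.rights.get(dst, 0) + amount
        return True


wallet = EscrowCounter({"dc1": 60, "dc2": 40})
ok_local = wallet.try_spend("dc1", 50)      # within dc1's reservation
rejected = wallet.try_spend("dc1", 20)      # dc1 has only 10 left
moved = wallet.transfer("dc2", "dc1", 15)   # reallocate rights to dc1
ok_after = wallet.try_spend("dc1", 20)      # now succeeds locally
```

Since no replica can spend more than it holds, the total never drops below zero, which is the "precise on bound" property in miniature.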

Technology: Adaptive CRDTs
Orthogonal technique
Applied as an optimization on top of bounded CRDTs
Adaptive CRDTs restrict divergence by reducing the number of replicas
Adapting replication schemes probabilistically, over time, according to usage patterns, ...
Changing the number of replicas or moving them requires coordination
Reduces divergence, impacts availability

Computations

Computation
Scenario: Leaderboard
Database of users playing a game; compute the top 10 by score
Matchmaking between cohorts by rank
Aggregate data from all replicas of all objects across multiple DCs
Current approaches are ad hoc

Technology: Deterministic Dataflow
Idea: Connect CRDTs together in a mechanism which preserves their strong properties
Eventual consistency applied to computations
Different evaluation strategies (previously discussed in Chris Meiklejohn's talk)
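
One way to read "preserves their strong properties" is that a dataflow stage applied to a grow-only set yields another grow-only set, and that it does not matter whether replicas merge before or after applying the stage. The sketch below illustrates this for a filter-and-map stage in Python; the function names, the score data, and the threshold are assumptions made up for the example and do not reflect the project's actual dataflow runtime.

```python
def merge(a, b):
    """G-Set merge: set union."""
    return a | b


def high_scorers(scores, threshold):
    """Dataflow stage: filter by score, then map to the user name.

    The output is again a grow-only set: adding input elements can only
    add output elements, never remove them.
    """
    return frozenset(user for (user, score) in scores if score >= threshold)


replica_1 = frozenset({("ann", 120), ("bob", 40)})
replica_2 = frozenset({("cy", 95), ("bob", 40), ("dee", 300)})

# Merge the inputs first, then compute ...
derived_after_merge = high_scorers(merge(replica_1, replica_2), 90)
# ... or compute at each replica first, then merge the outputs.
merged_derived = merge(high_scorers(replica_1, 90), high_scorers(replica_2, 90))
```

Both paths give the same derived set, so the computed result converges whenever the inputs do.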

Optimizations

Reducing bandwidth
Carlos Baquero's talk tomorrow
Forward only operations and replay them at the other replicas (POLog)
State-based optimizations (δ-CRDT)
Keep metadata size small and well-behaved
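
The state-based optimization can be sketched for a G-Counter: instead of shipping the full state after each update, a delta-mutator returns only the entry that changed, and the receiver joins that delta into its state with the ordinary merge. The Python below is an illustrative sketch; the function names and replica ids are assumptions.

```python
def join(a, b):
    """G-Counter merge: entry-wise maximum over the union of replica ids."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def increment_delta(state, replica, amount=1):
    """Delta-mutator: returns only the changed entry, not the full state."""
    return {replica: state.get(replica, 0) + amount}


state_a = {"a": 7, "b": 3, "c": 9}
state_b = {"a": 7, "b": 3, "c": 9}

delta = increment_delta(state_a, "a", 2)  # a single entry instead of three
state_a = join(state_a, delta)            # apply locally
state_b = join(state_b, delta)            # ship only the delta to the other replica
```

Because the delta is joined with the same merge function, applying it twice is harmless, so duplicated deliveries do not corrupt the state.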

Programming models

How to make use of CRDTs
Need some way to employ CRDTs in applications
State of the art: Simple operational interface for updates and queries
But CRDT semantics can lead to a much more powerful programming methodology
Account for (static) analyses and tools (correct by construction)

Deterministic dataflow programming
Methodology for programming with CRDTs
Fault-tolerant, replicated application code
Applications should be correct under any execution

Provably correct!

Theoretical Models
Abstracting from real-world mess
Supporting programmers in reasoning about distributed systems
Analyses to test and/or verify the correctness of applications
Verifying applications on top of architectures with replication is challenging
Models on different levels - from core libraries to full applications

Example: Observed-Remove Set
Specification: Remove operation deletes only those elements from the set that have been observed at the replica issuing the remove op
When the element is concurrently added (again), it will remain in the set
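
The specification above can be sketched in Python with the classic tag-based construction: every add attaches a fresh unique tag, and a remove tombstones only the tags visible at the removing replica. This is an illustrative tombstone-based variant, not the optimized ORSWOT from riak_dt; the class and attribute names are assumptions.

```python
import uuid


class ORSet:
    """Observed-remove set using unique tags and tombstones (illustrative)."""

    def __init__(self):
        self.adds = set()      # pairs of (element, unique tag)
        self.removed = set()   # tags that have been removed

    def add(self, element):
        # Each add gets a fresh tag, so a re-add is distinguishable.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags observed at this replica.
        self.removed |= {tag for (e, tag) in self.adds if e == element}

    def contains(self, element):
        return any(e == element and tag not in self.removed
                   for (e, tag) in self.adds)

    def merge(self, other):
        result = ORSet()
        result.adds = self.adds | other.adds
        result.removed = self.removed | other.removed
        return result


a, b = ORSet(), ORSet()
a.add("x")
b = b.merge(a)      # b observes the first add
b.remove("x")       # b removes what it has observed
a.add("x")          # concurrently, a adds "x" again with a fresh tag
merged = a.merge(b)
```

After the merge, the first add is tombstoned but the concurrent re-add is not, so "x" remains in the set, as the specification requires.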

A Formal Model for CRDTs
Semantics for CRDTs based on causal history
Building on states, operations and merges
System abstraction: Reliable messages between replicating nodes
All specifications encoded in theorem prover Isabelle/HOL and proved correct

A Formal Model for Applications
Build specifications for applications and use cases (TLA+ = specification language based on the Temporal Logic of Actions, with a model checker)
  Virtual wallet
  Ad counter
  Leaderboard
Encode application invariants in the specification
  No loss of money
  Positive balance
Verify invariants hold by using a model checker
Verifies both individual CRDTs and interaction between CRDTs
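
The idea of checking an invariant over all reachable states can be illustrated without TLA+ by a brute-force exploration in plain Python. This is only an analogy to what a model checker does, on a toy wallet model invented for the example: the state is a tuple of per-replica spending rights, and each action is a local spend that is rejected when the replica lacks rights. All names and numbers are assumptions.

```python
from itertools import product


def step(state, action):
    """Apply one local spend; state is a tuple of per-replica rights."""
    replica, amount = action
    if state[replica] < amount:
        return state                 # spend rejected locally, state unchanged
    rights = list(state)
    rights[replica] -= amount
    return tuple(rights)


def reachable(initial, actions, depth):
    """Breadth-first exploration of all states reachable within `depth` steps."""
    seen = {initial}
    frontier = {initial}
    for _ in range(depth):
        frontier = {step(s, a) for s in frontier for a in actions} - seen
        seen |= frontier
    return seen


initial = (3, 2)                                          # rights at two replicas
actions = list(product(range(2), (1, 2)))                 # (replica, amount) pairs
states = reachable(initial, actions, depth=6)

# Invariants from the wallet use case, checked on every reachable state.
positive_balance = all(min(s) >= 0 for s in states)
no_money_created = all(sum(s) <= sum(initial) for s in states)
```

A real specification would also model message delivery and merges between replicas; the sketch only shows the shape of "enumerate states, assert the invariant on each".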

Up and running!

The SyncFree Platform
Experimental platform
Development, verification, and evaluation of algorithms for update propagation
Network architecture: Replicated DCs + app servers to which clients connect
Core features: Fault-tolerance, scalability, modularity
Written in Erlang with riak_core

Riak
Build prototypes which can operate on top of Riak
Compare new approaches to existing ad hoc approaches
Leverage industrial use cases to drive academic research

Real-world Evaluation
Existing use cases are drawn from actual use at Rovio, Trifork, ...
Future evaluation (1-2 years) will be based on:
  Ease of development of correct applications
  Understandability of developed programs
  Testing applications at scale with real traffic

Outlook
Extending to the mobile world: Support for offline mode in clients
Reduce bandwidth by employing partial replication, sharding, δ-mutations, ...
Moving out of the cloud: Gossiping updates in P2P scenarios

Conclusion - Where to go from here?

The SyncFree Project
SyncFree on GitHub: https://github.com/syncfree
Project overview: https://syncfree.lip6.fr
Research publications available: https://syncfree.lip6.fr/index.php/publications
