Slide 1

Slide 1 text

SyncFree Large-Scale Computation Without Synchronization Annette Bieniusa and Christopher Meiklejohn University of Kaiserslautern and Basho Technologies, Inc. October 28, 2014

Slide 2

Slide 2 text

Outline Motivation CRDTs - a success story Challenges Atomic updates Divergence control Computations Optimizations Programming models Provably correct! Up and running! Conclusion - Where to go from here?

Slide 3

Slide 3 text

Motivation

Slide 4

Slide 4 text

CRDTs - a success story

Slide 5

Slide 5 text

Replicated Data Types Typically key-value stores operate with opaque objects Problem: identifying and resolving concurrent operations Even worse in multi-DC settings Semantic resolution Needs to be provided by the application server / client Dynamo; deterministic, but not intuitive

Slide 6

Slide 6 text

CRDTs in Riak 2.0 Conflict-Free/Confluent/Commutative/Convergent Replicated Data Types Library riak_dt implements different types of state-based CRDTs: Counters (G-Counter, PN-Counter) Sets (G-Set, 2P-Set, OR-Set, ORSWOT) Flags Registers (LWW-Register, MV-Register) Maps Subset exposed in Riak 2.0 Concurrent updates are merged following principled techniques All problems solved?!
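To make "merged following principled techniques" concrete, here is a minimal sketch of a state-based G-Counter and PN-Counter. It is an illustration in Python, not the riak_dt implementation (which is written in Erlang); the class names, method names, and replica ids are assumptions chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one non-negative entry per replica id."""
    counts: dict = field(default_factory=dict)

    def increment(self, replica, amount=1):
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Entry-wise maximum: commutative, associative, and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )


@dataclass
class PNCounter:
    """Counter with decrements, built from two grow-only counters."""
    incs: GCounter = field(default_factory=GCounter)
    decs: GCounter = field(default_factory=GCounter)

    def increment(self, replica, amount=1):
        self.incs.increment(replica, amount)

    def decrement(self, replica, amount=1):
        self.decs.increment(replica, amount)

    def value(self):
        return self.incs.value() - self.decs.value()

    def merge(self, other):
        return PNCounter(self.incs.merge(other.incs), self.decs.merge(other.decs))


# Two replicas update concurrently, then exchange state in either order.
a, b = PNCounter(), PNCounter()
a.increment("a", 3)
b.increment("b", 2)
b.decrement("b", 1)
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

Both merge orders yield the same state, and merging a state that has already been incorporated changes nothing, which is why replicas converge without coordination.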

Slide 7

Slide 7 text

The SyncFree Project EU Project on large-scale computation without synchronization Project consortium of academic and industry partners

Slide 8

Slide 8 text

Challenges

Slide 9

Slide 9 text

Atomic updates

Slide 10

Slide 10 text

Purchasing items Scenario: Virtual wallet User can exchange (virtual) currency for vouchers, game items, ... Operation should be atomic No money lost! No voucher used twice! How can we achieve this under eventual consistency?

Slide 11

Slide 11 text

Technology: CRDT Composition Compose CRDTs that are to be updated together Ad-hoc solutions are error-prone Map CRDT allows composing CRDT objects via embedding Guarantees atomic update But: Deep embedding can lead to large objects
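The composition idea can be sketched for the virtual wallet scenario: the spending counter and the voucher set live in one object, so a purchase updates both in a single state and the merge handles both fields together. This is a simplified assumption-laden sketch, not the Riak Map type; the field names "spent" and "vouchers", the helper names, and the replica ids are hypothetical.

```python
def merge_counter(a, b):
    # Grow-only counter merge: entry-wise maximum per replica.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def merge_wallet(a, b):
    # The composed object merges field by field, each with its own CRDT merge.
    return {
        "spent": merge_counter(a["spent"], b["spent"]),
        "vouchers": a["vouchers"] | b["vouchers"],  # grow-only set: union
    }


def buy_voucher(wallet, replica, price, voucher_id):
    # One update produces one new state containing BOTH changes, so any
    # replica that receives this state sees the payment and the voucher.
    spent = dict(wallet["spent"])
    spent[replica] = spent.get(replica, 0) + price
    return {"spent": spent, "vouchers": wallet["vouchers"] | {voucher_id}}


empty = {"spent": {}, "vouchers": frozenset()}
at_r1 = buy_voucher(empty, "r1", 5, "v-1")
at_r2 = buy_voucher(empty, "r2", 3, "v-2")
merged = merge_wallet(at_r1, at_r2)
total_spent = sum(merged["spent"].values())
```

The sketch also shows the drawback named on the slide: every embedded field travels with the object, so deep embedding makes the shipped state grow.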

Slide 12

Slide 12 text

Technology: Transactions Transactions with weak, yet helpful guarantees such as causal consistency Snapshot reads allow for consistent observation of objects Allows for atomic and dynamic combinations of updates across many objects Needs careful engineering to have well-behaved metadata while being fault-tolerant

Slide 13

Slide 13 text

Divergence control

Slide 14

Slide 14 text

Handing out limited resources Scenario: (Shared) Virtual wallet User should not spend more money than she has on her account Balance checking and reducing would require global synchronization operations such as 2PC Impossible under network partitions!

Slide 15

Slide 15 text

Bounded divergence Scenario: Ad counting Advertisement should be displayed a limited number of times to users in a certain area / country Keeping track of how often it is displayed requires counters to deal with high contention Estimated count of delivered ads should not diverge too much from actual number But exact number is not necessary

Slide 16

Slide 16 text

Technology: Bounded CRDTs Idea: Extend replicas of the shared data item with leases / reservations / escrow Pro-actively distribute them among the replicas Fast, local operations possible when reservation is locally available Allocation of leases in the background using strongly consistent operations Precise on bound
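The reservation idea can be sketched as a toy, single-process model: each replica holds a share of "rights", consumes them locally without coordination, and must obtain more through a transfer when it runs out. The class and method names and the data-centre ids are assumptions for illustration; in a real deployment the transfer step would need the strongly consistent background operation the slide mentions, which this sketch models as a plain method call.

```python
class BoundedCounter:
    """Toy model of a counter whose total can never drop below zero."""

    def __init__(self, initial_rights):
        # rights[r] = how many units replica r may still consume locally.
        self.rights = dict(initial_rights)

    def try_decrement(self, replica, amount=1):
        # Fast path: purely local check against the replica's own reservation.
        if self.rights.get(replica, 0) < amount:
            return False
        self.rights[replica] -= amount
        return True

    def transfer(self, src, dst, amount):
        # Stand-in for the coordinated background reallocation of leases.
        if self.rights.get(src, 0) < amount:
            return False
        self.rights[src] -= amount
        self.rights[dst] = self.rights.get(dst, 0) + amount
        return True

    def value(self):
        return sum(self.rights.values())


wallet = BoundedCounter({"dc1": 6, "dc2": 4})
ok_local = wallet.try_decrement("dc1", 5)            # enough local rights
refused = wallet.try_decrement("dc1", 2)             # only 1 right left at dc1
moved = wallet.transfer("dc2", "dc1", 3)             # rebalance reservations
ok_after_transfer = wallet.try_decrement("dc1", 2)   # succeeds after transfer
```

Because no replica can spend more than it holds, the sum of all decrements never exceeds the initial total, which is the "precise on bound" property.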

Slide 17

Slide 17 text

Technology: Adaptive CRDTs Orthogonal technique Applied as optimization on top of bounded CRDTs Adaptive CRDTs restrict divergence by reducing the number of replicas Adapting replication schemes probabilistically, over time, according to usage patterns, ... Changing the number of replicas or moving requires coordination Reduces divergence, impacts availability

Slide 18

Slide 18 text

Computations

Slide 19

Slide 19 text

Computation Scenario: Leaderboard Database of users playing a game; compute the top 10 by score Matchmaking between cohorts by rank Aggregate data from all replicas of all objects across multiple DCs Current approaches are ad hoc

Slide 20

Slide 20 text

Technology: Deterministic Dataflow Idea: Connect CRDTs together in a mechanism which preserves their strong properties Eventual consistency applied to computations Different evaluation strategies (previously discussed in Chris Meiklejohn’s talk)
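For the leaderboard scenario, the core idea can be sketched as a deterministic function applied to a convergent input: if the merged score table is the same at every replica, so is the top-k derived from it. This sketch assumes the leaderboard tracks each user's best score (so merging is a per-user maximum); the function names, user names, and scores are made up for the example and do not reflect the project's dataflow API.

```python
def merge_scores(a, b):
    # Per-user maximum: commutative, associative, and idempotent.
    return {u: max(a.get(u, 0), b.get(u, 0)) for u in a.keys() | b.keys()}


def top_k(scores, k):
    # Sort by descending score, then by name so ties break deterministically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Score tables held at three data centres.
dc1 = {"ann": 120, "bob": 90}
dc2 = {"bob": 150, "cat": 100}
dc3 = {"ann": 80, "dan": 95}

# Replicas may combine states in different orders.
left = merge_scores(merge_scores(dc1, dc2), dc3)
right = merge_scores(dc3, merge_scores(dc2, dc1))
board = top_k(left, 3)
```

The output stage needs no extra conflict resolution: it inherits convergence from its input as long as the function itself is deterministic.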

Slide 21

Slide 21 text

Optimizations

Slide 22

Slide 22 text

Reducing bandwidth Carlos Baquero’s talk tomorrow Forward only operations and replay them at the other replicas (POLog) State-based optimizations (δ-CRDT) Keep metadata size small and well-behaved
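The state-based optimization can be illustrated with a grow-only counter: instead of shipping the full state after each update, the replica ships a small delta that is merged with the same join. This is a minimal sketch of the general idea under assumed names (`merge`, `increment_delta`, replica ids), not the δ-CRDT algorithms presented in the talk referenced above.

```python
def merge(a, b):
    # Same join as the full-state counter: entry-wise maximum.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def increment_delta(state, replica, amount=1):
    """Return only the entry that changed, not the full state."""
    return {replica: state.get(replica, 0) + amount}


r1 = {"r1": 4, "r2": 7, "r3": 2}
r2 = dict(r1)

delta = increment_delta(r1, "r1")  # a one-entry state fragment
r1 = merge(r1, delta)              # apply locally
r2 = merge(r2, delta)              # ship the delta, not the whole map
r2 = merge(r2, delta)              # duplicate delivery is harmless
```

The delta has one entry regardless of how many replicas the counter tracks, and because it is merged with an idempotent join, retransmission does not corrupt the count.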

Slide 23

Slide 23 text

Programming models

Slide 24

Slide 24 text

How to make use of CRDTs Need some way to employ CRDTs in applications State of the art: Simple operational interface for updates and queries But CRDT semantics can lead to much more powerful programming methodology Account for (static) analyses and tools (correct by construction)

Slide 25

Slide 25 text

Deterministic dataflow programming Methodology for programming with CRDTs Fault-tolerant, replicated application code Applications should be correct under any execution

Slide 26

Slide 26 text

Provably correct!

Slide 27

Slide 27 text

Theoretical Models Abstracting from real-world mess Supporting programmers to reason about distributed systems Analyses to test and/or verify the correctness of applications Verifying applications on top of architectures with replication is challenging Models on different levels - from core libraries to full applications

Slide 28

Slide 28 text

Example: Observed-Remove Set Specification: Remove operation deletes only elements from the set that have been observed at the replica issuing the remove op When concurrently adding the element (again), it will remain in the set
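The specification can be sketched with the textbook tag-based construction: every add creates a unique tag, and a remove tombstones only the tags visible at the issuing replica. This is a naive illustration with assumed names; it keeps tombstones forever and is not the optimized ORSWOT variant, nor the model used in the project's proofs.

```python
import uuid


class ORSet:
    def __init__(self):
        self.adds = set()     # (element, unique tag) pairs
        self.removes = set()  # tombstoned (element, tag) pairs

    def add(self, element):
        # A fresh tag makes every add distinguishable from earlier ones.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags this replica has observed.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        merged = ORSet()
        merged.adds = self.adds | other.adds
        merged.removes = self.removes | other.removes
        return merged

    def value(self):
        return {e for (e, _tag) in self.adds - self.removes}


a = ORSet()
a.add("x")
b = a.merge(ORSet())   # replica b starts from a's state

a.remove("x")          # a removes the tag it has observed
b.add("x")             # concurrently, b adds "x" again with a new tag

merged = a.merge(b)
```

After the merge, "x" is still present: the remove covered only the first tag, while the concurrent add introduced a tag the remover never observed.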

Slide 29

Slide 29 text

Example: Observed-Remove Set

Slide 30

Slide 30 text

A Formal Model for CRDTs Semantics for CRDTs based on causal history Building on states, operations and merges System abstraction: Reliable messages between replicating nodes All specifications encoded in theorem prover Isabelle/HOL and proved correct

Slide 31

Slide 31 text

A Formal Model for Applications Build specifications for applications and use cases (TLA+ = specification language based on the Temporal Logic of Actions, checked with a model checker) Virtual wallet Ad counter Leaderboard Encode application invariants in the specification No loss of money Positive balance Verify invariants hold by using a model checker Verifies both individual CRDTs and interaction between CRDTs

Slide 32

Slide 32 text

Up and running!

Slide 33

Slide 33 text

The SyncFree Platform Experimental platform Development, verification, and evaluation of algorithms for update propagation Network architecture: Replicated DCs + app servers to which clients connect Core features: Fault-tolerance, scalability, modularity Written in Erlang with riak_core

Slide 34

Slide 34 text

Riak Build prototypes which can operate on top of Riak Compare new approaches to existing ad hoc approaches Leverage industrial use cases to drive academic research

Slide 35

Slide 35 text

Real-world Evaluation Existing use cases are drawn from actual deployments at Rovio, Trifork, ... Future evaluation will be based on (1-2 years): Ease of development of correct applications Understandability of developed programs Testing applications at scale with real traffic

Slide 36

Slide 36 text

Outlook Extending to the mobile world: Support for offline mode in clients Reduce bandwidth by employing partial replication, sharding, δ-mutations, ... Moving out of the cloud: Gossiping updates in P2P scenarios

Slide 37

Slide 37 text

Conclusion - Where to go from here?

Slide 38

Slide 38 text

The SyncFree Project SyncFree on GitHub: https://github.com/syncfree Project overview: https://syncfree.lip6.fr Research publications available: https://syncfree.lip6.fr/index.php/publications