PiterPy 2017: Collaborative CRDT-based Systems with Python for fun and profit

Groupware System with Python for fun and profit
Collaborative systems and Conflict-Free Replicated Data Types
Max Klymyshyn, Head of Software Architecture at Takeoff Technologies
November 4, 2017

Work
- Head of Software Architecture, Takeoff Technologies, 2017
- CTO @ ZAKAZ.UA and CartFresh, 2012
- Team Lead @ oDesk (now Upwork), 2010
- Project Coordinator @ 42 Coffee Cups, 2009

Community
- co-organizer of PapersWeLove Kyiv
- co-founder of KyivJS
- co-founder of LvivJS
- co-organizer of PyCon Ukraine
- co-organizer of PiterPy
- co-organizer of Hotcode
- judge at UA Web Challenge

Takeoff Technologies
eGrocery: Grocery Delivery startup with robotic warehouse

Distributed systems & Groupware systems
- Cloud-computing
- Peer-to-peer networks
- Collaborative editing
- Calendars, reminders, notes
- Messengers

Distributed setting
- Multiple devices per single user
- Collaborative work on the same entity from different devices
- Offline mode

Why Python?
- Site or client within a web-based distributed system (websockets/pubsub -> python)
- State sharing within concurrently executed programs
- P2P systems

Figure: Python eventually is part of communication flow
(diagram of Client 1, Client n and a Python client exchanging message 1 through message n+1)

Consistency

Strong Consistency
- the "strong consistency" approach serialises updates in a global total order (i.e. sequentially, with no overlapping operations)
- performance and scalability bottleneck: low throughput / high latency per transaction (because of consensus)
- RAFT/PAXOS/Zab
- conflicts with availability and partition-tolerance

Weak Consistency
- no read guarantees after write
- Read-Your-Writes hack

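Eventual Consistency
- ∀ i, j : f ∈ c_i ⇒ f ∈ c_j (eventual delivery: an update delivered at one correct replica is eventually delivered at all correct replicas)
- Convergence and Termination properties
- RIAK, MONGO, Cassandra, Couch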

Strong Eventual Consistency
- Same as for Eventual Consistency, and
- ∀ i, j : c_i = c_j ⇒ S_i ≡ S_j: correct replicas that have delivered the same updates have equivalent state
- CALM: consistency as logical monotonicity

ACID 2.0
- CAP theorem solution
- ACID 1.0: atomicity, consistency, isolation, durability
- ACID 2.0: associative, commutative, idempotent, distributed

Distributed Setting

Three basic concurrency problems
- divergence: ∪_{i=1}^{n} o^1_i ≠ ∪_{j=1}^{m} o^2_j
- causality-violations: o_1 → o_3
- intention-violations: o_1 || o_2
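The two relations on this slide, o_1 → o_3 ("happened before") and o_1 || o_2 ("concurrent"), can be made concrete with vector clocks. This is a minimal sketch, assuming each operation carries a dict mapping a site id to a counter; the function names `happened_before` and `concurrent` and the sample clocks are illustrative and do not come from the talk.

```python
def happened_before(a, b):
    """True if vector clock `a` causally precedes vector clock `b` (a -> b)."""
    sites = set(a) | set(b)
    no_greater = all(a.get(s, 0) <= b.get(s, 0) for s in sites)
    strictly_less = any(a.get(s, 0) < b.get(s, 0) for s in sites)
    return no_greater and strictly_less


def concurrent(a, b):
    """True if neither clock precedes the other (a || b)."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# o1 and o2 are issued independently on sites s1 and s2 (hypothetical clocks).
o1 = {"s1": 1}
o2 = {"s2": 1}
# o3 is issued on s1 after o1.
o3 = {"s1": 2}

print(happened_before(o1, o3))  # True
print(concurrent(o1, o2))       # True
print(concurrent(o1, o3))       # False
```

A replica that can tell these two cases apart can delay o_3 until o_1 has arrived, and knows that o_1 and o_2 need some merge rule because neither one saw the other.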

Naive approaches
- Locking and Single Active Participant
- Transactions
- Tentative Transactions
- Versioning/tagging
- Reversible Execution

Conflict-Free Replicated Data Types
CRDTs

Network limitations
- Delays
- Packet loss
- Out of order delivery
- Duplicates/retransmission
- Head-of-Line blocking

Conflict-free replicated data types (CRDT)
- 2011, Shapiro et al.
- Semilattice properties (⊔ or LUB, Least Upper Bound):
  - commutativity: ∀ x, y : x ⊔ y = y ⊔ x
  - associativity: x ⊔ (y ⊔ z) = (x ⊔ y) ⊔ z
  - idempotency: x ⊔ x = x
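The three semilattice laws can be checked mechanically for a candidate merge function. The sketch below assumes a payload shaped like the counter dicts used later in the talk (site id mapped to a count) and takes the per-site maximum as the join; the sample payloads are arbitrary values chosen for illustration.

```python
from itertools import product


def merge(a, b):
    """Join of two counter payloads: per-site maximum."""
    return {site: max(a.get(site, 0), b.get(site, 0)) for site in set(a) | set(b)}


# A few sample payloads (site id -> count); the values are arbitrary.
samples = [{}, {"s1": 1}, {"s1": 2, "s2": 1}, {"s2": 3}, {"s1": 1, "s3": 4}]

commutative = all(merge(x, y) == merge(y, x) for x, y in product(samples, repeat=2))
associative = all(
    merge(x, merge(y, z)) == merge(merge(x, y), z)
    for x, y, z in product(samples, repeat=3)
)
idempotent = all(merge(x, x) == x for x in samples)

print(commutative, associative, idempotent)  # True True True
```

A check over samples is not a proof, but it is a cheap regression test when writing a custom merge.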

Strong Eventual Consistency
C = [c_1, ..., c_n], ∀ i, j : c_i = c_j ⇒ s_i ≡ s_j

Counter
1 + 1

+ not idempotent
1 + 1 ≠ 1

∪: sets are good
1 ∪ 1 = 1

Other properties of sets
(1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3)
1 ∪ 2 = 2 ∪ 1
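The difference between + and ∪ shows up as soon as the network retransmits or reorders an update. A small sketch, assuming a made-up delivery trace in which update 1 arrives twice and out of order:

```python
# The same three updates, delivered with a duplicate and out of order.
updates = [1, 2, 3]
delivered = [3, 1, 1, 2]  # hypothetical trace: reordered, 1 retransmitted

# Addition double-counts the retransmitted update.
sum_expected = sum(updates)
sum_actual = sum(delivered)

# Set union absorbs the duplicate and ignores the order.
set_expected = set()
for u in updates:
    set_expected |= {u}
set_actual = set()
for u in delivered:
    set_actual |= {u}

print(sum_expected, sum_actual)    # 6 7
print(set_expected == set_actual)  # True
```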

CRDT: In plain Python

    messages = [
        {"tag": 1, "site": "1", "payload": {"key": "val"}},
        {"tag": 1, "site": "1", "payload": {"key": "val"}},
        {"tag": 1, "site": "2", "payload": {"key1": "val1"}},
        {"tag": 2, "site": "1", "payload": {"key2": "val2"}},
        {"tag": 0, "site": "2", "payload": {"key0": "val0"}}]

    # idempotency: duplicates collapse onto the same (tag, site) key
    {(m["tag"], m["site"]): m for m in messages}

    # partial order: sort the deduplicated (tag, site) keys
    sorted({(m["tag"], m["site"]): m for m in messages})

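The plain-Python slide deduplicates messages by (tag, site) and then sorts the keys. A sketch of why that is enough for replicas to agree: whatever order the messages arrive in, the replayed log is the same. The `replay` helper is a hypothetical name introduced here; the messages are the ones from the slide.

```python
from itertools import permutations

messages = [
    {"tag": 1, "site": "1", "payload": {"key": "val"}},
    {"tag": 1, "site": "1", "payload": {"key": "val"}},  # duplicate delivery
    {"tag": 1, "site": "2", "payload": {"key1": "val1"}},
    {"tag": 2, "site": "1", "payload": {"key2": "val2"}},
    {"tag": 0, "site": "2", "payload": {"key0": "val0"}},
]


def replay(received):
    """Deduplicate by (tag, site), then order the messages by that key."""
    unique = {(m["tag"], m["site"]): m for m in received}
    return [unique[key] for key in sorted(unique)]


# Every possible arrival order (120 of them) yields the same replayed log.
results = [replay(order) for order in permutations(messages)]
converged = all(r == results[0] for r in results)

print(converged)  # True
print([(m["tag"], m["site"]) for m in results[0]])
# [(0, '2'), (1, '1'), (1, '2'), (2, '1')]
```
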
CRDT (diagram labels)
- Transport
- c_i
- CRDT State (convergent)
- Conflict Resolution Semantics
- Application access API
- S_i

CRDT: Replication strategies
- pubsub + periodic full state update (STOMP etc.)
- Server Sent Events (SSE) in, Websockets out
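The first strategy, pubsub plus a periodic full state update, can be simulated in-process. This is a sketch under stated assumptions: the pubsub channel is modelled as a loop that drops roughly half of the messages, the payload is a growth-only counter dict, and `increment_and_publish` and `full_state_sync` are hypothetical helper names, not part of any messaging library.

```python
import random


def merge(a, b):
    """Per-site maximum of two counter payloads."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}


rng = random.Random(0)  # seeded so the run is repeatable
replicas = {"s1": {}, "s2": {}, "s3": {}}


def increment_and_publish(site):
    """Bump the local entry and push it over a lossy pubsub channel."""
    state = replicas[site]
    state[site] = state.get(site, 0) + 1
    update = {site: state[site]}
    for other in replicas:
        if other != site and rng.random() < 0.5:  # about half the messages are lost
            replicas[other] = merge(replicas[other], update)


def full_state_sync():
    """Periodic repair: every replica merges every replica's full state."""
    snapshot = {name: dict(state) for name, state in replicas.items()}
    for name in replicas:
        for state in snapshot.values():
            replicas[name] = merge(replicas[name], state)


for _ in range(10):
    increment_and_publish(rng.choice(sorted(replicas)))

full_state_sync()
totals = {name: sum(state.values()) for name, state in replicas.items()}
print(totals)  # every replica reports 10 after the sync
```

The lossy channel keeps latency low in the common case, while the full state exchange bounds how long a lost message can keep replicas apart.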

CRDT: GCounter
Growth-only counter: Interface

    class GCounter(object):
        def __init__(self, site_id, counters=None):
            pass

        def increment(self):
            pass

        def query(self):
            pass

        def merge(self, counter):
            pass

CRDT: GCounter payload
Counter State on init

    counters = {
        1: 0,
        2: 0,
        ...
        N: 0}

CRDT: GCounter
Growth-only Counter

    class GCounter(object):
        def __init__(self, site_id, counters=None):
            # make sure this site has an entry in a payload that was passed in
            if counters is not None and site_id not in counters:
                counters[site_id] = 0
            self.site_id = site_id
            self.counters = counters or {site_id: 0}

        def increment(self):
            # a site only ever bumps its own entry
            self.counters[self.site_id] += 1

        def query(self):
            return sum(self.counters.values())

        def merge(self, replica):
            # replica is another site's counters dict; keep the per-site maximum
            for site_id in replica.keys():
                self.counters[site_id] = max(
                    replica[site_id], self.counters.get(site_id, 0))

CRDT: GCounter-test
Increment only counter

    def pncounter_state(prefix, *args):
        states = []
        for n, pncounter in enumerate(args):
            states.append("sid:%s=%d" % (pncounter.site_id, pncounter.query()))
        print("[%s] %s" % (prefix, ", ".join(states)))

    def test_pncounter():
        c1, c2, c3 = PNCounter("s1"), PNCounter("s2"), PNCounter("s3")
        pncounter_state("INITIAL", c1, c2, c3)
        c1.increment()
        c1.increment()
        pncounter_state("C1++ * 2", c1, c2, c3)
        c1.decrement()
        pncounter_state("C1--", c1, c2, c3)
        c2.merge(c1.counters)
        c3.merge(c1.counters)
        pncounter_state("MERGE C2 <- C1, C3 <- C1", c1, c2, c3)

CRDT: GCounter example output

    [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
    [C1++] sid:s1=1, sid:s2=0, sid:s3=0
    [MERGE C2 <- C1, C3 <- C1] sid:s1=1, sid:s2=1, sid:s3=1

    [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
    [C1++ & C3++] sid:s1=1, sid:s2=0, sid:s3=1
    [MERGE C2 <- C1] sid:s1=1, sid:s2=1, sid:s3=1
    [MERGE C1 <- C3, C3 <- C1] sid:s1=2, sid:s2=1, sid:s3=2

CRDT: PNCounter
Positive-Negative Counter

    DEFAULT = lambda: {"p": 0, "n": 0}

    class PNCounter(object):
        def __init__(self, site_id, counters=None):
            if counters is not None and site_id not in counters:
                counters[site_id] = DEFAULT()
            self.site_id = site_id
            self.counters = counters or {site_id: DEFAULT()}

        def increment(self):
            # increments accumulate in the "p" field
            self.counters[self.site_id]["p"] += 1

        def decrement(self):
            # decrements accumulate in the "n" field, which also only grows
            self.counters[self.site_id]["n"] += 1

        def query(self):
            p = [r["p"] for r in self.counters.values()]
            n = [r["n"] for r in self.counters.values()]
            return sum(p) - sum(n)
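The PNCounter slide does not show a `merge` method, although the test code in the talk calls `c2.merge(c1.counters)`. The sketch below is one way such a merge could look, assuming the same per-site maximum used by the GCounter is applied to the "p" and "n" fields separately; that assumption is consistent with the printed test output but is not confirmed by the slides. `merge_pn` and `value` are hypothetical standalone helpers working on the `counters` dicts.

```python
def merge_pn(local, remote):
    """Merge two PN-Counter payloads: per-site, per-field maximum."""
    merged = {}
    for site in set(local) | set(remote):
        mine = local.get(site, {"p": 0, "n": 0})
        theirs = remote.get(site, {"p": 0, "n": 0})
        merged[site] = {"p": max(mine["p"], theirs["p"]),
                        "n": max(mine["n"], theirs["n"])}
    return merged


def value(counters):
    """Counter value: all increments minus all decrements."""
    return (sum(c["p"] for c in counters.values())
            - sum(c["n"] for c in counters.values()))


# s1 was incremented twice and decremented once; s2 is untouched.
c1 = {"s1": {"p": 2, "n": 1}}
c2 = {"s2": {"p": 0, "n": 0}}

c2 = merge_pn(c2, c1)
print(value(c1), value(c2))    # 1 1
print(merge_pn(c2, c1) == c2)  # True: merging again changes nothing
```

Keeping two growth-only fields is what lets a decrementing counter reuse the max-based merge: neither field ever shrinks, so the maximum is always the most recent value.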

CRDT: PNCounter Test Output

    [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
    [C1++ * 2] sid:s1=2, sid:s2=0, sid:s3=0
    [C1--] sid:s1=1, sid:s2=0, sid:s3=0
    [MERGE C2 <- C1, C3 <- C1] sid:s1=1, sid:s2=1, sid:s3=1

    [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
    [C1++ & C3--] sid:s1=1, sid:s2=0, sid:s3=-1
    [MERGE C2 <- C1] sid:s1=1, sid:s2=1, sid:s3=-1
    [MERGE C1 <- C3, C3 <- C1] sid:s1=0, sid:s2=1, sid:s3=0

CRDT: MVRegister
Multi-Value Register

    class MVRegister(object):
        def __init__(self, site_id, register=None):
            self.site_id, self.register = site_id, register or {}

        def set(self, value):
            # a local write replaces everything this replica knew before
            self.register = {self.site_id: value}

        def query(self):
            return set(self.register.values())

        def merge(self, replica):
            # take the other sites' values, never overwrite our own entry
            self.register.update({k: v for k, v in replica.items()
                                  if k != self.site_id})

CRDT: MVRegister Test

    def mvregister_state(prefix, *args):
        states = []
        for n, mvregister in enumerate(args):
            states.append("sid:%s=%r" % (mvregister.site_id, mvregister.query()))
        print("[%s] %s" % (prefix, ", ".join(states)))

    def test_mvregister():
        c1, c2, c3 = MVRegister("s1"), MVRegister("s2"), MVRegister("s3")
        mvregister_state("INITIAL", c1, c2, c3)
        c1.set("v1")
        mvregister_state("C1 := v1", c1, c2, c3)
        c3.set("v3")
        mvregister_state("C3 := v3", c1, c2, c3)
        c2.merge(c1.register)
        c3.merge(c1.register)
        mvregister_state("MERGE C2 <- C1, C3 <- C1", c1, c2, c3)

CRDT: MVRegister Output

    [INITIAL] sid:s1=set([]), sid:s2=set([]), sid:s3=set([])
    [C1 := v1] sid:s1=set(['v1']), sid:s2=set([]), sid:s3=set([])
    [C3 := v3] sid:s1=set(['v1']), sid:s2=set([]), sid:s3=set(['v3'])
    [MERGE C2 <- C1, C3 <- C1] sid:s1=set(['v1']), sid:s2=set(['v1']), sid:s3=set(['v1', 'v3'])

CRDT: MVRegister Figure
Figure: MVRegister Concurrent Operation
(timeline of replicas S1, S2, S3: S1 calls set("v1") and S3 calls set("v3"); after merging, S2 holds "v1" and S3 holds "v1", "v3")

JSON CRDT
A Conflict-Free Replicated JSON Datatype (August 15, 2017)

    Replica p                    Replica q
    {"key": "A"}                 {"key": "A"}
    doc.get("key") := "B";       doc.get("key") := "C";
    {"key": "B"}                 {"key": "C"}
                (network communication)
    {"key": {"B", "C"}}          {"key": {"B", "C"}}

Figure: Concurrent assignment to the register at doc.get("key") by replicas p and q.

JSON CRDT
Concurrent mutation on same key with different types

    {}                                           {}
    doc.get("a") := {};                          doc.get("a") := [];
    {"a": {}}                                    {"a": []}
    doc.get("a").get("x") := "y";                doc.get("a").idx(0).insertAfter("z");
    {"a": {"x": "y"}}                            {"a": ["z"]}
                          (network communication)
    {mapT("a"): {"x": "y"}, listT("a"): ["z"]}   {mapT("a"): {"x": "y"}, listT("a"): ["z"]}

Figure: Concurrently assigning values of different types to the same map key.

CRDT Shopping Cart
2017: redesign of Shopping Cart API

CRDTs: CvRDT, CmRDT
state-based and op-based replication
Figure: State-based Convergent Replicated Data Type or CvRDT
Figure: Op-based Commutative Replicated Data Type or CmRDT
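The counters in this talk are state-based: replicas ship their whole `counters` dict and merge it. For contrast, here is a minimal op-based sketch in which replicas ship individual operations instead. Op-based designs normally rely on the transport to deliver each operation exactly once; the `seen` set of operation ids used here is an assumption standing in for that guarantee, and `OpCounter` is a hypothetical name.

```python
import uuid


class OpCounter:
    """Op-based (CmRDT-style) counter: replicas exchange operations."""

    def __init__(self):
        self.value = 0
        self.seen = set()  # ids of operations already applied

    def increment(self):
        op = {"id": uuid.uuid4().hex, "delta": 1}
        self.apply(op)
        return op  # to be broadcast to the other replicas

    def apply(self, op):
        if op["id"] in self.seen:  # drop retransmissions
            return
        self.seen.add(op["id"])
        self.value += op["delta"]


a, b = OpCounter(), OpCounter()
op1 = a.increment()
op2 = b.increment()

# Operations may arrive in any order, and more than once.
for op in (op2, op1, op1):
    a.apply(op)
    b.apply(op)

print(a.value, b.value)  # 2 2
```

The increments commute, so arrival order does not matter; the trade-off against the state-based form is smaller messages in exchange for stricter delivery requirements.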

CRDT: Types
Registers, Counters, Sets
- Register: LWW or Multi-Value (Dynamo or Couchdb-like)
- Counter (growth-only) and Counter w/decrementing
- G-Set: growth-only set
- 2P-Set: remove only once set (G-Set + Tombstones set)
- LWW-Element-Set: vector clocks
- OR-Set: unique-tagged elements and list of tags within Tombstones set
- WOOT, LOGOOT, Treedoc, RGA, LSEQ for ordered lists
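To illustrate the OR-Set entry in the list above, here is a minimal state-based sketch: every add gets a unique tag, and a remove tombstones only the tags the removing replica has already observed. The class and attribute names are illustrative assumptions, and tombstones are never garbage-collected in this version.

```python
import uuid


class ORSet:
    """Observed-Remove Set sketch: adds carry unique tags, removes tombstone them."""

    def __init__(self):
        self.added = set()    # (element, tag) pairs
        self.removed = set()  # tombstoned (element, tag) pairs

    def add(self, element):
        self.added.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags this replica has observed so far.
        self.removed |= {pair for pair in self.added if pair[0] == element}

    def query(self):
        return {element for element, tag in self.added - self.removed}

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed


a, b = ORSet(), ORSet()
a.add("milk")
b.merge(a)

a.remove("milk")  # a removes the "milk" it has observed...
b.add("milk")     # ...while b concurrently adds it again with a fresh tag

a.merge(b)
b.merge(a)
print(a.query(), b.query())  # {'milk'} {'milk'}
```

The concurrent add survives because its tag was never observed by the remove. In a 2P-Set, by contrast, the tombstone is keyed by the element itself, so a removed element can never be added back.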

Tools
- CRDT Toolbox
- CRDT-py
- Link.crdt
- Roshi by Soundcloud
- Riak 2.0: Counters, Flags, Sets, Registers, Maps
- Redis Labs CRDT
- Y-js: framework for offline-first p2p shared editing on structured data
- Swarm (and forever-in-pre-alpha tool)
- replikativ.io: p2p distributed system framework
- GUN framework: p2p distributed framework

Own CRDT Data Type
+ Not too much code
+ Relatively easy to extend
+ Works in a predictable way (if you keep semilattice props)
+ Very stable within any network quality setting
+ Offline mode for free
- Hard to change serializability paradigm to distributed setting
- Hard to verify custom conflict resolution
- Hard to build garbage collection
- Need to use additional techniques to work with high-throughput dataset (δ-mutation)
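The last drawback mentions δ-mutation. Assuming this refers to delta-state replication, the idea is that a mutation returns only the part of the state it changed, and that small delta is merged with the same join as a full state. The sketch below applies this to a growth-only counter payload; `increment_delta` is a hypothetical helper and the thousand-site payload is made-up data.

```python
def merge(a, b):
    """Per-site maximum, the same join a state-based growth-only counter uses."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}


def increment_delta(state, site_id):
    """Return (new_state, delta); the delta holds only the entry that changed."""
    delta = {site_id: state.get(site_id, 0) + 1}
    return merge(state, delta), delta


# Two replicas hold counts for a thousand sites (values are made up).
s1 = {"s%d" % i: i for i in range(1, 1001)}
s2 = dict(s1)

s1, delta = increment_delta(s1, "s1")
s2 = merge(s2, delta)  # ship 1 entry instead of 1000
s2 = merge(s2, delta)  # a retransmitted delta is harmless

print(len(delta), s1 == s2)  # 1 True
```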

Who is using it?
- Facebook
- TomTom
- League of Legends
- SoundCloud
- Bet365
- RIAK Distributed Database

Thanks
@maxmaxmaxmax
