

Max Klymyshyn
November 04, 2017

PiterPy 2017: Collaborative CRDT-based Systems with Python for fun and profit

A detailed overview of CRDT design and practical examples of simple CRDT data types in Python. Some examples taken from https://arxiv.org/abs/1608.03960 and https://hal.inria.fr/inria-00609399v1/document



Transcript

  1. Groupware System with Python for fun and profit
    Collaborative systems and Conflict-Free Replicated Data Types
    Max Klymyshyn, Head of Software Architecture at Takeoff Technologies
    November 4, 2017
  2. Work
    Head of Software Architecture, Takeoff Technologies, 2017
    CTO @ ZAKAZ.UA and CartFresh, 2012
    Team Lead @ oDesk (now Upwork), 2010
    Project Coordinator @ 42 Coffee Cups, 2009
  3. Community
    co-organizer of PapersWeLove Kyiv
    co-founder of KyivJS
    co-founder of LvivJS
    co-organizer of PyCon Ukraine
    co-organizer of PiterPy
    co-organizer of Hotcode
    judge at UA Web Challenge
  4. Distributed setting
    Multiple devices per single user
    Collaborative work on the same entity from different devices
    Offline mode
  5. Why Python?
    Site or client within a web-based distributed system (websockets/pubsub -> python)
    State sharing between concurrently executing programs
    P2P systems
  6. [Figure: Client 1, Client n and a Python client exchanging message1 ... messagen, messagen+1]
    Figure: Python eventually is part of communication flow
  7. Strong Consistency
    The “strong consistency” approach serialises updates in a global total order (i.e. sequentially, with no overlapping operations)
    Performance and scalability bottleneck: low throughput and high latency per transaction (because of consensus)
    RAFT/PAXOS/Zab
    Conflicts with availability and partition-tolerance
  8. Eventual Consistency
    ∀i, j : f ∈ c_i ⇒ f ∈ c_j (an update f delivered at replica i is eventually delivered at every replica j)
    Convergence and Termination properties
    RIAK, MONGO, Cassandra, Couch
  9. Strong Eventual Consistency
    Same as for Eventual Consistency, and
    ∀i, j : c_i = c_j ⇒ S_i ≡ S_j: correct replicas that have delivered the same updates have equivalent state
    CALM: consistency as logical monotonicity
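The condition c_i = c_j ⇒ S_i ≡ S_j can be illustrated with a toy replica. This is a minimal sketch, assuming a grow-only set whose updates are applied with set union; the `GSetReplica` class and the update identifiers are hypothetical and not taken from the talk.

```python
import random


class GSetReplica(object):
    """Grow-only set replica; applying an update is a set union."""

    def __init__(self):
        self.delivered = set()    # c_i: identifiers of delivered updates
        self.state = frozenset()  # S_i: current state

    def deliver(self, update_id, element):
        self.delivered.add(update_id)
        self.state = self.state | {element}


updates = [("u1", "a"), ("u2", "b"), ("u3", "c")]  # hypothetical updates

r1, r2 = GSetReplica(), GSetReplica()
for update in updates:
    r1.deliver(*update)

# r2 receives the same updates reordered, with one duplicate delivery.
rng = random.Random(42)
noisy = updates + [updates[0]]
rng.shuffle(noisy)
for update in noisy:
    r2.deliver(*update)

print(r1.delivered == r2.delivered, r1.state == r2.state)  # True True
```

Both replicas end with the same delivered set and the same state regardless of delivery order, because union is commutative, associative and idempotent.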
  10. ACID 2.0
    CAP theorem solution
    ACID 1.0: atomicity, consistency, isolation, durability
    ACID 2.0: associative, commutative, idempotent, distributed
  11. Three basic concurrency problems
    divergence: $\bigcup_{i=1}^{n} o^{1}_{i} \neq \bigcup_{j=1}^{m} o^{2}_{j}$
    causality-violations: $o_1 \rightarrow o_3$
    intention-violations: $o_1 \parallel o_2$
  12. Conflict-free replicated data types (CRDT)
    2011, Shapiro et al.
    Semilattice properties (⊔ or LUB – Least Upper Bound):
    commutativity – ∀x, y : x ⊔ y = y ⊔ x
    associativity – x ⊔ (y ⊔ z) = (x ⊔ y) ⊔ z
    idempotency – x ⊔ x = x
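The three semilattice laws can be checked mechanically for a candidate merge function. A minimal sketch, assuming the state is a dict of per-site counters and the join ⊔ is a per-key maximum; the `join` helper and the sample states are illustrative and not from the talk.

```python
from itertools import product


def join(a, b):
    """Least upper bound of two per-site counter maps: per-key maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


# Hypothetical sample states, used only to exercise the laws.
states = [{"s1": 1}, {"s1": 2, "s2": 1}, {"s2": 3, "s3": 1}, {}]

commutative = all(join(x, y) == join(y, x) for x, y in product(states, repeat=2))
associative = all(
    join(x, join(y, z)) == join(join(x, y), z)
    for x, y, z in product(states, repeat=3)
)
idempotent = all(join(x, x) == x for x in states)

print(commutative, associative, idempotent)  # True True True
```

A check like this over sample states is not a proof, but it catches merge functions that break the laws, for example one that sums counters instead of taking the maximum.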
  13. Other properties of sets
    (1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3)
    1 ∪ 2 = 2 ∪ 1
  14. CRDT: In plain Python
        messages = [
            {"tag": 1, "site": "1", "payload": {"key": "val"}},
            {"tag": 1, "site": "1", "payload": {"key": "val"}},
            {"tag": 1, "site": "2", "payload": {"key1": "val1"}},
            {"tag": 2, "site": "1", "payload": {"key2": "val2"}},
            {"tag": 0, "site": "2", "payload": {"key0": "val0"}}]

        # idempotency: duplicates collapse onto the same (tag, site) key
        {(m["tag"], m["site"]): m for m in messages}

        # partial order: sort the deduplicated keys
        sorted({(m["tag"], m["site"]): m for m in messages})
  15. CRDT: Replication strategies
    pubsub + periodic full state update (STOMP etc.)
    Server Sent Events (SSE) in, Websockets out
  16. CRDT: GCounter
    Growth-only counter: Interface
        class GCounter(object):
            def __init__(self, site_id, counters=None):
                pass

            def increment(self):
                pass

            def query(self):
                pass

            def merge(self, counter):
                pass
  17. CRDT: GCounter
    Growth-only Counter
        class GCounter(object):
            def __init__(self, site_id, counters=None):
                if counters is not None and site_id not in counters:
                    counters[site_id] = 0
                self.site_id = site_id
                self.counters = counters or {site_id: 0}

            def increment(self):
                self.counters[self.site_id] += 1

            def query(self):
                return sum(self.counters.values())

            def merge(self, replica):
                # per-site maximum keeps merge idempotent
                for site_id in replica.keys():
                    self.counters[site_id] = max(
                        replica[site_id], self.counters.get(site_id, 0))
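A short usage sketch of the counter above, showing that concurrent increments converge and that re-delivering the same state is harmless. The class is restated so the example runs on its own; copying the incoming dict in the constructor is an assumption made here to avoid aliasing between replicas, not something the slide does.

```python
class GCounter(object):
    """Grow-only counter, restated from the slide so the example is standalone."""

    def __init__(self, site_id, counters=None):
        self.site_id = site_id
        self.counters = dict(counters or {})  # copy: an assumption, see lead-in
        self.counters.setdefault(site_id, 0)

    def increment(self):
        self.counters[self.site_id] += 1

    def query(self):
        return sum(self.counters.values())

    def merge(self, replica):
        # Per-site maximum: merging the same state twice changes nothing.
        for site_id, count in replica.items():
            self.counters[site_id] = max(count, self.counters.get(site_id, 0))


c1, c2 = GCounter("s1"), GCounter("s2")
c1.increment()
c1.increment()
c2.increment()          # concurrent with the increments on c1

c1.merge(c2.counters)
c2.merge(c1.counters)
c2.merge(c1.counters)   # duplicate delivery is harmless

print(c1.query(), c2.query())  # 3 3
```

Each site only ever writes its own entry, which is why a per-site maximum is enough: summing the entries on merge would count the same increments twice.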
  18. CRDT: GCounter-test
    Increment only counter
        def pncounter_state(prefix, *args):
            states = []
            for n, pncounter in enumerate(args):
                states.append("sid:%s=%d" % (pncounter.site_id, pncounter.query()))
            print("[%s] %s" % (prefix, ", ".join(states)))

        def test_pncounter():
            c1, c2, c3 = PNCounter("s1"), PNCounter("s2"), PNCounter("s3")
            pncounter_state("INITIAL", c1, c2, c3)
            c1.increment()
            c1.increment()
            pncounter_state("C1++ * 2", c1, c2, c3)
            c1.decrement()
            pncounter_state("C1--", c1, c2, c3)
            c2.merge(c1.counters), c3.merge(c1.counters)
            pncounter_state("MERGE C2 <- C1, C3 <- C1", c1, c2, c3)
  19. CRDT: GCounter example output
        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++] sid:s1=1, sid:s2=0, sid:s3=0
        [MERGE C2 <- C1, C3 <- C1] sid:s1=1, sid:s2=1, sid:s3=1
        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++ & C3++] sid:s1=1, sid:s2=0, sid:s3=1
        [MERGE C2 <- C1] sid:s1=1, sid:s2=1, sid:s3=1
        [MERGE C1 <- C3, C3 <- C1] sid:s1=2, sid:s2=1, sid:s3=2
  20. CRDT: PNCounter
    Positive-Negative Counter
        DEFAULT = lambda: {"p": 0, "n": 0}

        class PNCounter(object):
            def __init__(self, site_id, counters=None):
                if counters is not None and site_id not in counters:
                    counters[site_id] = DEFAULT()
                self.site_id = site_id
                self.counters = counters or {site_id: DEFAULT()}

            def increment(self):
                self.counters[self.site_id]["p"] += 1

            def decrement(self):
                self.counters[self.site_id]["n"] += 1

            def query(self):
                p = [r["p"] for r in self.counters.values()]
                n = [r["n"] for r in self.counters.values()]
                return sum(p) - sum(n)
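The PNCounter slide does not show a `merge` method, although the test code calls one. The following is a minimal sketch of one plausible merge, assuming a per-site maximum taken separately over the "p" and "n" fields (the usual construction of a PN-Counter from two grow-only counters). The `default` helper and the copying constructor are choices made for this example, not the author's code.

```python
def default():
    """Fresh per-site entry: positive and negative tallies."""
    return {"p": 0, "n": 0}


class PNCounter(object):
    def __init__(self, site_id, counters=None):
        self.site_id = site_id
        # Copy incoming entries so replicas never share mutable state.
        self.counters = {k: dict(v) for k, v in (counters or {}).items()}
        self.counters.setdefault(site_id, default())

    def increment(self):
        self.counters[self.site_id]["p"] += 1

    def decrement(self):
        self.counters[self.site_id]["n"] += 1

    def query(self):
        return sum(r["p"] - r["n"] for r in self.counters.values())

    def merge(self, replica):
        # Assumed merge: both tallies only grow, so take the maximum of each.
        for site_id, remote in replica.items():
            local = self.counters.setdefault(site_id, default())
            local["p"] = max(local["p"], remote["p"])
            local["n"] = max(local["n"], remote["n"])


c1, c2, c3 = PNCounter("s1"), PNCounter("s2"), PNCounter("s3")
c1.increment()
c3.decrement()
c2.merge(c1.counters)
c1.merge(c3.counters)
c3.merge(c1.counters)

print(c1.query(), c2.query(), c3.query())  # 0 1 0
```

With this merge, the scenario reproduces the final line of the PNCounter test output in the transcript: s1=0, s2=1, s3=0.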
  21. CRDT: PNCounter Test Output
        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++ * 2] sid:s1=2, sid:s2=0, sid:s3=0
        [C1--] sid:s1=1, sid:s2=0, sid:s3=0
        [MERGE C2 <- C1, C3 <- C1] sid:s1=1, sid:s2=1, sid:s3=1
        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++ & C3--] sid:s1=1, sid:s2=0, sid:s3=-1
        [MERGE C2 <- C1] sid:s1=1, sid:s2=1, sid:s3=-1
        [MERGE C1 <- C3, C3 <- C1] sid:s1=0, sid:s2=1, sid:s3=0
  22. CRDT: MVRegister
    Multi-Value Register
        class MVRegister(object):
            def __init__(self, site_id, register=None):
                self.site_id, self.register = site_id, register or {}

            def set(self, value):
                self.register = {self.site_id: value}

            def query(self):
                return set(self.register.values())

            def merge(self, replica):
                self.register.update({k: v for k, v in replica.items()
                                      if k != self.site_id})
  23. CRDT: MVRegister Test
        def mvregister_state(prefix, *args):
            states = []
            for n, mvregister in enumerate(args):
                states.append("sid:%s=%r" % (mvregister.site_id, mvregister.query()))
            print("[%s] %s" % (prefix, ", ".join(states)))

        def test_mvregister():
            c1, c2, c3 = MVRegister("s1"), MVRegister("s2"), MVRegister("s3")
            mvregister_state("INITIAL", c1, c2, c3)
            c1.set("v1")
            mvregister_state("C1 := v1", c1, c2, c3)
            c3.set("v3")
            mvregister_state("C3 := v3", c1, c2, c3)
            c2.merge(c1.register), c3.merge(c1.register)
            mvregister_state("MERGE C2 <- C1, C3 <- C1", c1, c2, c3)
  24. CRDT: MVRegister Output
        [INITIAL] sid:s1=set([]), sid:s2=set([]), sid:s3=set([])
        [C1 := v1] sid:s1=set(['v1']), sid:s2=set([]), sid:s3=set([])
        [C3 := v3] sid:s1=set(['v1']), sid:s2=set([]), sid:s3=set(['v3'])
        [MERGE C2 <- C1, C3 <- C1] sid:s1=set(['v1']), sid:s2=set(['v1']), sid:s3=set(['v1', 'v3'])
  25. CRDT: MVRegister Figure
    [Figure: replicas S1, S2, S3; operations set(“v1”) and set(“v3”); resulting values “v1” and “v1”, “v3”]
    Figure: MVRegister Concurrent Operation
  26. JSON CRDT
    A Conflict-Free Replicated JSON Datatype (August 15, 2017)
    Replica p: {“key”: “A”} -> doc.get(“key”) := “B”; -> {“key”: “B”} -> (network communication) -> {“key”: {“B”, “C”}}
    Replica q: {“key”: “A”} -> doc.get(“key”) := “C”; -> {“key”: “C”} -> (network communication) -> {“key”: {“B”, “C”}}
    Figure: Concurrent assignment to the register at doc.get(“key”) by replicas p and q.
  27. JSON CRDT
    Concurrent mutation on same key with different types
    Replica p: {} -> doc.get(“a”) := {}; -> {“a”: {}} -> doc.get(“a”).get(“x”) := “y”; -> {“a”: {“x”: “y”}} -> (network communication) -> {mapT(“a”): {“x”: “y”}, listT(“a”): [“z”]}
    Replica q: {} -> doc.get(“a”) := []; -> {“a”: []} -> doc.get(“a”).idx(0).insertAfter(“z”); -> {“a”: [“z”]} -> (network communication) -> {mapT(“a”): {“x”: “y”}, listT(“a”): [“z”]}
    Figure: Concurrently assigning values of different types to the same map key.
  28. CRDTs: CvRDT, CmRDT
    State-based and op-based replication
    Figure: State-based Convergent Replicated Data Type or CvRDT
    Figure: Op-based Commutative Replicated Data Type or CmRDT
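The counters earlier in the transcript are state-based (CvRDT): replicas ship their whole state and merge it. For contrast, here is a minimal sketch of an op-based (CmRDT) counter, in which replicas exchange operations instead. The `OpCounter` class, its method names, and the use of `uuid` operation identifiers to filter duplicate deliveries are assumptions for illustration, not part of the talk.

```python
import uuid


class OpCounter(object):
    """Op-based counter: replicas exchange operations, not state."""

    def __init__(self):
        self.value = 0
        self.seen = set()  # ids of operations already applied

    def prepare(self, amount):
        """Build an operation; it is applied everywhere via apply_op."""
        return {"id": uuid.uuid4().hex, "amount": amount}

    def apply_op(self, op):
        # Addition commutes, so order is irrelevant; duplicates are filtered
        # by id because adding is not idempotent on its own.
        if op["id"] in self.seen:
            return
        self.seen.add(op["id"])
        self.value += op["amount"]


a, b = OpCounter(), OpCounter()
op1 = a.prepare(5)
op2 = b.prepare(-2)

for op in (op1, op2):
    a.apply_op(op)
for op in (op2, op1, op2):   # reordered, with a duplicate
    b.apply_op(op)

print(a.value, b.value)  # 3 3
```

The trade-off is visible in the code: the op-based variant sends small messages but must guarantee each operation is applied exactly once, whereas a state-based merge tolerates duplicates by construction.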
  29. CRDT: Types
    Registers, Counters, Sets
    Register: LWW or Multi-Value (Dynamo or Couchdb-like)
    Counter (growth-only) and Counter w/decrementing
    G-Set – growth-only set
    2P-Set – remove only once set (G-Set + Tombstones set)
    LWW-Element-Set – vector clocks
    OR-Set – unique-tagged elements and list of tags within Tombstones set
    WOOT, LOGOOT, Treedoc, RGA, LSEQ for ordered lists
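Of the set types listed, the OR-Set is the least obvious from a one-line description. A minimal state-based sketch follows, assuming each add is tagged with a random `uuid` and a remove tombstones only the tags observed locally. The `ORSet` class and its method names are hypothetical.

```python
import uuid


class ORSet(object):
    """Observed-remove set: every add carries a unique tag."""

    def __init__(self):
        self.added = {}    # element -> set of unique tags
        self.removed = {}  # element -> set of tombstoned tags

    def add(self, element):
        self.added.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Tombstone only the tags this replica has observed so far.
        observed = self.added.get(element, set())
        self.removed.setdefault(element, set()).update(observed)

    def query(self):
        return {e for e, tags in self.added.items()
                if tags - self.removed.get(e, set())}

    def merge(self, other):
        for e, tags in other.added.items():
            self.added.setdefault(e, set()).update(tags)
        for e, tags in other.removed.items():
            self.removed.setdefault(e, set()).update(tags)


a, b = ORSet(), ORSet()
a.add("x")
b.merge(a)
b.remove("x")   # removes the tag b has seen
a.add("x")      # concurrent re-add with a fresh tag

a.merge(b)
b.merge(a)
print(a.query(), b.query())  # {'x'} {'x'}
```

The concurrent add survives because its tag was never observed by the remove. In a 2P-Set the same sequence would leave "x" permanently removed, since an element cannot be re-added once it is in the tombstone set.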
  30. Tools
    CRDT Toolbox
    CRDT-py
    Link.crdt
    Roshi by Soundcloud
    Riak 2.0: Counters, Flags, Sets, Registers, Maps
    Redis Labs CRDT
    Y-js – framework for offline-first p2p shared editing on structured data
    Swarm (and forever-in-pre-alpha tool)
    replikativ.io – p2p distributed system framework
    GUN framework: p2p distributed framework
  31. Own CRDT Data Type
    + Not too much code
    + Relatively easy to extend
    + Works in a predictable way (if you keep the semilattice properties)
    + Very stable under any network quality
    + Offline mode for free
    - Hard to move from the serializability paradigm to a distributed setting
    - Hard to verify custom conflict resolution
    - Hard to build garbage collection
    - Need additional techniques to work with high-throughput datasets (δ-mutation)
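The δ-mutation technique mentioned in the last point can be sketched on a grow-only counter: instead of shipping the full state after every change, a mutator returns only the part of the state it changed, and receivers join that delta with the ordinary merge. This is a minimal illustration under those assumptions; the `DeltaGCounter` class is hypothetical and omits the bookkeeping a real delta-state implementation needs to track which deltas each peer has received.

```python
class DeltaGCounter(object):
    """Grow-only counter whose mutator returns a delta instead of full state."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counters = {site_id: 0}

    def increment(self):
        # Apply locally, then return only the entry that changed.
        self.counters[self.site_id] += 1
        return {self.site_id: self.counters[self.site_id]}

    def query(self):
        return sum(self.counters.values())

    def merge(self, state_or_delta):
        # The same per-site maximum joins full states and deltas alike.
        for site_id, count in state_or_delta.items():
            self.counters[site_id] = max(count, self.counters.get(site_id, 0))


a, b = DeltaGCounter("s1"), DeltaGCounter("s2")
deltas = [a.increment() for _ in range(3)]  # [{"s1": 1}, {"s1": 2}, {"s1": 3}]

# Deltas may arrive reordered or duplicated; the join absorbs both.
for delta in [deltas[2], deltas[0], deltas[2]]:
    b.merge(delta)

print(b.query(), len(deltas[-1]))  # 3 1
```

Each delta here carries a single entry however many sites the full state tracks, which is where the bandwidth saving comes from.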
  32. Who is using it?
    Facebook
    TomTom
    League of Legends
    SoundCloud
    Bet365
    RIAK Distributed Database