

Max Klymyshyn
November 04, 2017

PiterPy 2017: Collaborative CRDT-based Systems with Python for fun and profit

A detailed overview of CRDT design and practical examples of simple CRDT data types in Python. Some examples taken from https://arxiv.org/abs/1608.03960 and https://hal.inria.fr/inria-00609399v1/document


Transcript

  1. Groupware System with Python for fun and profit

    Collaborative systems and Conflict-Free Replicated Data Types
    Max Klymyshyn, Head of Software Architecture at Takeoff Technologies
    November 4, 2017
  2. Work

    Head of Software Architecture, Takeoff Technologies, 2017
    CTO @ ZAKAZ.UA and CartFresh, 2012
    Team Lead @ oDesk (now Upwork), 2010
    Project Coordinator @ 42 Coffee Cups, 2009
  3. Community

    co-organizer of PapersWeLove Kyiv
    co-founder of KyivJS
    co-founder of LvivJS
    co-organizer of PyCon Ukraine
    co-organizer of PiterPy
    co-organizer of Hotcode
    judge at UA Web Challenge
  4. Distributed setting

    Multiple devices per single user
    Collaborative work on the same entity from different devices
    Offline mode
  5. Why Python?

    Site or client within a web-based distributed system (websockets/pubsub -> Python)
    State sharing between concurrently executing programs
    P2P systems
  6. Figure: Python eventually is part of communication flow

    Diagram participants: Client 1, Client n, Python Client
    Messages exchanged: message1 ... messagen, messagen+1
  7. Strong Consistency

    The “strong consistency” approach serialises updates in a global total order (i.e. sequentially, with no overlapping operations)
    Performance and scalability bottleneck: low throughput and high latency per transaction (because of consensus)
    Raft/Paxos/Zab
    Conflicts with availability and partition tolerance
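  8. Eventual Consistency

    Eventual delivery: $\forall i, j : f \in c_i \Rightarrow f \in c_j$ (an update delivered at one replica is eventually delivered at every replica)
    Convergence and Termination properties
    Riak, MongoDB, Cassandra, Couch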
  9. Strong Eventual Consistency

    ... Same as for Eventual Consistency, and
    $\forall i, j : c_i = c_j \Rightarrow S_i \equiv S_j$: correct replicas that have delivered the same updates have equivalent state
    CALM: consistency as logical monotonicity
  10. ACID 2.0

    CAP theorem solution
    ACID 1.0: atomicity, consistency, isolation, durability
    ACID 2.0: associative, commutative, idempotent, distributed
  11. Three basic concurrency problems

    divergence: $\bigcup_{i=1}^{n} o^{1}_{i} \neq \bigcup_{j=1}^{m} o^{2}_{j}$
    causality-violations: $o_1 \rightarrow o_3$
    intention-violations: $o_1 \parallel o_2$
  12. Conflict-free replicated data types (CRDT)

    2011, Shapiro et al.
    Semilattice properties ($\sqcup$ or LUB – Least Upper Bound):
    commutativity – $\forall x, y : x \sqcup y = y \sqcup x$
    associativity – $x \sqcup (y \sqcup z) = (x \sqcup y) \sqcup z$
    idempotency – $x \sqcup x = x$
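
The three semilattice laws can be checked mechanically for a candidate merge function. The sketch below is not from the slides: it assumes Python 3, assumes a counter payload shaped as a `site -> count` dict, and uses hypothetical names (`join`, `add_join`, `is_semilattice_join`, `samples`). It brute-forces the laws over a few sample payloads, so it can refute a merge function but not prove one correct.

```python
from itertools import product


def join(a, b):
    """Per-site maximum of two counter payloads (a least upper bound)."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}


def add_join(a, b):
    """Per-site sum: looks plausible, but is not idempotent."""
    return {s: a.get(s, 0) + b.get(s, 0) for s in a.keys() | b.keys()}


def is_semilattice_join(op, samples):
    """Brute-force check of the three laws over every triple of samples."""
    for x, y, z in product(samples, repeat=3):
        if op(x, y) != op(y, x):                 # commutativity
            return False
        if op(x, op(y, z)) != op(op(x, y), z):   # associativity
            return False
        if op(x, x) != x:                        # idempotency
            return False
    return True


# Hypothetical sample payloads: site id -> count
samples = [{}, {"s1": 1}, {"s1": 2, "s2": 1}, {"s2": 3}]

print(is_semilattice_join(join, samples))      # True
print(is_semilattice_join(add_join, samples))  # False
```

Summation fails only the idempotency law, which is why re-delivering the same state to a sum-based merge would double count.
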
  13. Other properties of sets

    $(1 \cup 2) \cup 3 = 1 \cup (2 \cup 3)$
    $1 \cup 2 = 2 \cup 1$
  14. CRDT: In plain Python

        messages = [
            {"tag": 1, "site": "1", "payload": {"key": "val"}},
            {"tag": 1, "site": "1", "payload": {"key": "val"}},
            {"tag": 1, "site": "2", "payload": {"key1": "val1"}},
            {"tag": 2, "site": "1", "payload": {"key2": "val2"}},
            {"tag": 0, "site": "2", "payload": {"key0": "val0"}}]

        # idempotency: duplicates collapse onto the same (tag, site) key
        {(m["tag"], m["site"]): m for m in messages}

        # partial order: keys sorted by (tag, site)
        sorted({(m["tag"], m["site"]): m for m in messages})
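
To see why the two expressions on this slide matter, here is a small sketch that folds the same messages into a state dict. The `deliver` helper is hypothetical, and it assumes that two messages with the same `(tag, site)` always carry the same payload. Under that assumption a replica reaches the same state however the messages are ordered or repeated.

```python
def deliver(messages):
    """Fold a batch of messages into a state dict, ignoring duplicates and order."""
    # Duplicates share a (tag, site) key, so the dict keeps one copy of each.
    unique = {(m["tag"], m["site"]): m for m in messages}
    state = {}
    # Sorting the keys gives every replica the same application order.
    for key in sorted(unique):
        state.update(unique[key]["payload"])
    return state


messages = [
    {"tag": 1, "site": "1", "payload": {"key": "val"}},
    {"tag": 1, "site": "1", "payload": {"key": "val"}},   # duplicate delivery
    {"tag": 1, "site": "2", "payload": {"key1": "val1"}},
    {"tag": 2, "site": "1", "payload": {"key2": "val2"}},
    {"tag": 0, "site": "2", "payload": {"key0": "val0"}},
]

in_order = deliver(messages)
reordered = deliver(list(reversed(messages)) + messages)  # reordered and re-sent
print(in_order == reordered)  # True
```
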
  15. CRDT: Replication strategies

    pubsub + periodic full state update (STOMP etc.)
    Server Sent Events (SSE) in, Websockets out
  16. CRDT: GCounter

    Growth-only counter: Interface

        class GCounter(object):
            def __init__(self, site_id, counters):
                pass

            def increment(self):
                pass

            def query(self):
                pass

            def merge(self, counter):
                pass
  17. CRDT: GCounter

    Growth-only Counter

        class GCounter(object):
            def __init__(self, site_id, counters=None):
                # make sure this site has an entry in a supplied state
                if counters is not None and site_id not in counters:
                    counters[site_id] = 0
                self.site_id = site_id
                self.counters = counters or {site_id: 0}

            def increment(self):
                # a site only ever increments its own entry
                self.counters[self.site_id] += 1

            def query(self):
                return sum(self.counters.values())

            def merge(self, replica):
                # replica is another site's counters dict; keep the per-site maximum
                for site_id in replica.keys():
                    self.counters[site_id] = max(
                        replica[site_id], self.counters.get(site_id, 0))
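
As a quick illustration of why the per-site maximum is the right merge, the sketch below restates the counter compactly and exchanges state between two replicas, repeating one merge on purpose. It differs from the slide in one assumed detail: the constructor copies the incoming dict instead of mutating it.

```python
class GCounter(object):
    """Compact restatement of the grow-only counter (copies its input dict)."""

    def __init__(self, site_id, counters=None):
        self.site_id = site_id
        self.counters = dict(counters or {})
        self.counters.setdefault(site_id, 0)

    def increment(self):
        self.counters[self.site_id] += 1

    def query(self):
        return sum(self.counters.values())

    def merge(self, replica):
        for site_id, count in replica.items():
            self.counters[site_id] = max(count, self.counters.get(site_id, 0))


a, b = GCounter("s1"), GCounter("s2")
a.increment()
a.increment()          # s1 has counted 2
b.increment()          # s2 has counted 1

a.merge(b.counters)
b.merge(a.counters)
b.merge(a.counters)    # a duplicate delivery must not change anything

print(a.query(), b.query())  # 3 3
```
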
  18. CRDT: GCounter-test

    Increment only counter

        def pncounter_state(prefix, *args):
            states = []
            for n, pncounter in enumerate(args):
                states.append("sid:%s=%d" % (pncounter.site_id, pncounter.query()))
            print("[%s] %s" % (prefix, ", ".join(states)))

        def test_pncounter():
            c1, c2, c3 = PNCounter("s1"), PNCounter("s2"), PNCounter("s3")
            pncounter_state("INITIAL", c1, c2, c3)
            c1.increment()
            c1.increment()
            pncounter_state("C1++ * 2", c1, c2, c3)
            c1.decrement()
            pncounter_state("C1--", c1, c2, c3)
            c2.merge(c1.counters), c3.merge(c1.counters)
            pncounter_state("MERGE C2 <- C1, C3 <- C1", c1, c2, c3)
  19. CRDT: GCounter example output

        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++] sid:s1=1, sid:s2=0, sid:s3=0
        [MERGE C2 <- C1, C3 <- C1] sid:s1=1, sid:s2=1, sid:s3=1
        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++ & C3++] sid:s1=1, sid:s2=0, sid:s3=1
        [MERGE C2 <- C1] sid:s1=1, sid:s2=1, sid:s3=1
        [MERGE C1 <- C3, C3 <- C1] sid:s1=2, sid:s2=1, sid:s3=2
  20. CRDT: PNCounter

    Positive-Negative Counter

        DEFAULT = lambda: {"p": 0, "n": 0}

        class PNCounter(object):
            def __init__(self, site_id, counters=None):
                if counters is not None and site_id not in counters:
                    counters[site_id] = DEFAULT()
                self.site_id = site_id
                self.counters = counters or {site_id: DEFAULT()}

            def increment(self):
                self.counters[self.site_id]["p"] += 1

            def decrement(self):
                self.counters[self.site_id]["n"] += 1

            def query(self):
                p = [r["p"] for r in self.counters.values()]
                n = [r["n"] for r in self.counters.values()]
                return sum(p) - sum(n)
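
The class as transcribed has no `merge` method, although the test code calls `merge(...)` on each counter. A plausible merge, assuming the usual construction of a PN-Counter as two grow-only counters, takes the per-site maximum of `"p"` and of `"n"` separately. The sketch below is an assumption, not the author's code. The `zero` helper is hypothetical, and the constructor copies its input so replicas never share inner dicts.

```python
def zero():
    return {"p": 0, "n": 0}


class PNCounter(object):
    def __init__(self, site_id, counters=None):
        self.site_id = site_id
        # Copy incoming state so replicas never share inner dicts.
        self.counters = {s: dict(c) for s, c in (counters or {}).items()}
        self.counters.setdefault(site_id, zero())

    def increment(self):
        self.counters[self.site_id]["p"] += 1

    def decrement(self):
        self.counters[self.site_id]["n"] += 1

    def query(self):
        return sum(c["p"] - c["n"] for c in self.counters.values())

    def merge(self, replica):
        # Assumed merge rule: per-site, per-field maximum (two G-Counters).
        for site_id, remote in replica.items():
            local = self.counters.get(site_id, zero())
            self.counters[site_id] = {
                "p": max(local["p"], remote["p"]),
                "n": max(local["n"], remote["n"]),
            }


c1, c2, c3 = PNCounter("s1"), PNCounter("s2"), PNCounter("s3")
c1.increment()
c3.decrement()
c2.merge(c1.counters)
c1.merge(c3.counters)
c3.merge(c1.counters)
print(c1.query(), c2.query(), c3.query())  # 0 1 0
```

Building fresh inner dicts in `merge` matters: storing a reference to another replica's `{"p": ..., "n": ...}` dict would let a later increment on one replica silently change the other.
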
  21. CRDT: PNCounter Test Output

        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++ * 2] sid:s1=2, sid:s2=0, sid:s3=0
        [C1--] sid:s1=1, sid:s2=0, sid:s3=0
        [MERGE C2 <- C1, C3 <- C1] sid:s1=1, sid:s2=1, sid:s3=1
        [INITIAL] sid:s1=0, sid:s2=0, sid:s3=0
        [C1++ & C3--] sid:s1=1, sid:s2=0, sid:s3=-1
        [MERGE C2 <- C1] sid:s1=1, sid:s2=1, sid:s3=-1
        [MERGE C1 <- C3, C3 <- C1] sid:s1=0, sid:s2=1, sid:s3=0
  22. CRDT: MVRegister

    Multi-Value Register

        class MVRegister(object):
            def __init__(self, site_id, register=None):
                self.site_id, self.register = site_id, register or {}

            def set(self, value):
                # a local assignment replaces every value seen so far
                self.register = {self.site_id: value}

            def query(self):
                return set(self.register.values())

            def merge(self, replica):
                # take other sites' values; never overwrite our own entry
                self.register.update({k: v for k, v in replica.items()
                                      if k != self.site_id})
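
One limitation of keying values by site alone is that an old state delivered again can bring back a value that a later `set` already overwrote. A commonly described refinement tags each value with a version vector and drops any value whose vector is strictly dominated. The sketch below is an illustration under that assumption, not the code from the talk. `VersionedMVRegister` and `dominates` are hypothetical names, Python 3 is assumed, and values must be hashable.

```python
def dominates(a, b):
    """True if version vector a is strictly newer than b (both as item tuples)."""
    a, b = dict(a), dict(b)
    sites = a.keys() | b.keys()
    return (all(a.get(s, 0) >= b.get(s, 0) for s in sites)
            and any(a.get(s, 0) > b.get(s, 0) for s in sites))


class VersionedMVRegister(object):
    def __init__(self, site_id):
        self.site_id = site_id
        self.entries = set()   # {(value, ((site, counter), ...)), ...}

    def set(self, value):
        # New vector: pointwise max of everything seen, then bump our own entry.
        clock = {}
        for _, vv in self.entries:
            for site, n in vv:
                clock[site] = max(clock.get(site, 0), n)
        clock[self.site_id] = clock.get(self.site_id, 0) + 1
        self.entries = {(value, tuple(sorted(clock.items())))}

    def query(self):
        return {value for value, _ in self.entries}

    def merge(self, remote_entries):
        # Keep only entries that no other entry strictly dominates.
        union = self.entries | set(remote_entries)
        self.entries = {(v, vv) for v, vv in union
                        if not any(dominates(other, vv) for _, other in union)}


a, b, c = (VersionedMVRegister(s) for s in ("s1", "s2", "s3"))
a.set("v1")
b.merge(a.entries)
b.set("v2")                # overwrites v1, which b has already seen
b.merge(a.entries)         # re-delivery of the older state
print(sorted(b.query()))   # ['v2']
c.set("v3")                # concurrent with v2
b.merge(c.entries)
print(sorted(b.query()))   # ['v2', 'v3']
```
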
  23. CRDT: MVRegister Test

        def mvregister_state(prefix, *args):
            states = []
            for n, mvregister in enumerate(args):
                states.append("sid:%s=%r" % (mvregister.site_id, mvregister.query()))
            print("[%s] %s" % (prefix, ", ".join(states)))

        def test_mvregister():
            c1, c2, c3 = MVRegister("s1"), MVRegister("s2"), MVRegister("s3")
            mvregister_state("INITIAL", c1, c2, c3)
            c1.set("v1")
            mvregister_state("C1 := v1", c1, c2, c3)
            c3.set("v3")
            mvregister_state("C3 := v3", c1, c2, c3)
            c2.merge(c1.register), c3.merge(c1.register)
            mvregister_state("MERGE C2 <- C1, C3 <- C1", c1, c2, c3)
  24. CRDT: MVRegister Output

        [INITIAL] sid:s1=set([]), sid:s2=set([]), sid:s3=set([])
        [C1 := v1] sid:s1=set(['v1']), sid:s2=set([]), sid:s3=set([])
        [C3 := v3] sid:s1=set(['v1']), sid:s2=set([]), sid:s3=set(['v3'])
        [MERGE C2 <- C1, C3 <- C1] sid:s1=set(['v1']), sid:s2=set(['v1']), sid:s3=set(['v1', 'v3'])
  25. CRDT: MVRegister Figure

    Figure: MVRegister Concurrent Operation
    (Diagram of sites S1, S2, S3: S1 performs set(“v1”) and S3 performs set(“v3”); after merging, S2 holds “v1” and S3 holds “v1”, “v3”.)
  26. JSON CRDT

    A Conflict-Free Replicated JSON Datatype (August 15, 2017)

    Replica p: {“key”: “A”} -> doc.get(“key”) := “B”; -> {“key”: “B”} -> network communication -> {“key”: {“B”, “C”}}
    Replica q: {“key”: “A”} -> doc.get(“key”) := “C”; -> {“key”: “C”} -> network communication -> {“key”: {“B”, “C”}}

    Figure: Concurrent assignment to the register at doc.get(“key”) by replicas p and q.
  27. JSON CRDT

    Concurrent mutation on same key with different types

    One replica: {} -> doc.get(“a”) := {}; -> {“a”: {}} -> doc.get(“a”).get(“x”) := “y”; -> {“a”: {“x”: “y”}} -> network communication -> {mapT(“a”): {“x”: “y”}, listT(“a”): [“z”]}
    Other replica: {} -> doc.get(“a”) := []; -> {“a”: []} -> doc.get(“a”).idx(0).insertAfter(“z”); -> {“a”: [“z”]} -> network communication -> {mapT(“a”): {“x”: “y”}, listT(“a”): [“z”]}

    Figure: Concurrently assigning values of different types to the same map key.
  28. CRDTs: CvRDT, CmRDT

    state-based and op-based replication
    Figure: State-based Convergent Replicated Data Type or CvRDT
    Figure: Op-based Commutative Replicated Data Type or CmRDT
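
The counters shown earlier in the deck are state-based: replicas ship their whole payload and merge it. For contrast, here is a minimal sketch of an op-based counter, where replicas broadcast operations instead. It is an illustration, not code from the talk. `OpCounter` and its fields are hypothetical, and the `seen` set is an added assumption that makes re-delivered operations harmless, since op-based designs otherwise rely on the transport to deliver each operation exactly once.

```python
class OpCounter(object):
    """Op-based (CmRDT) counter: replicas exchange operations, not state."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.value = 0
        self.seq = 0        # local sequence number, used to build unique op ids
        self.seen = set()   # ids of operations already applied

    def increment(self, delta=1):
        """Apply the change locally and return the operation to broadcast."""
        self.seq += 1
        op = {"id": (self.site_id, self.seq), "delta": delta}
        self.apply(op)
        return op

    def apply(self, op):
        if op["id"] in self.seen:   # ignore re-delivered operations
            return
        self.seen.add(op["id"])
        self.value += op["delta"]   # additions commute, so order is irrelevant


a, b = OpCounter("s1"), OpCounter("s2")
op1 = a.increment()
op2 = a.increment(-1)
op3 = b.increment(5)

a.apply(op3)
for op in (op2, op1, op2):   # out of order, with one duplicate
    b.apply(op)

print(a.value, b.value)  # 5 5
```
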
  29. CRDT: Types

    Registers, Counters, Sets
    Register: LWW or Multi-Value (Dynamo or Couchdb-like)
    Counter (growth-only) and Counter w/decrementing
    G-Set – growth-only set
    2P-Set – remove only once set (G-Set + Tombstones set)
    LWW-Element-Set – vector clocks
    OR-Set – unique-tagged elements and list of tags within Tombstones set
    WOOT, LOGOOT, Treedoc, RGA, LSEQ for ordered lists
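
Two of the set types listed above are small enough to sketch. The code below is an illustration under stated assumptions, not code from the talk. `TwoPSet` pairs a grow-only set with a tombstone set, and `ORSet` tags every add with a unique id (here from `uuid.uuid4`) and tombstones only the tags it has observed. Class and attribute names are hypothetical.

```python
import uuid


class TwoPSet(object):
    """2P-Set: an added set plus a tombstone set; a removed item stays removed."""

    def __init__(self):
        self.added, self.removed = set(), set()

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        if item in self.added:
            self.removed.add(item)

    def query(self):
        return self.added - self.removed

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed


class ORSet(object):
    """OR-Set: every add gets a unique tag; remove tombstones observed tags only."""

    def __init__(self):
        self.adds = set()         # {(item, tag), ...}
        self.tombstones = set()   # {(item, tag), ...}

    def add(self, item):
        self.adds.add((item, uuid.uuid4().hex))

    def remove(self, item):
        self.tombstones |= {(i, t) for i, t in self.adds if i == item}

    def query(self):
        return {item for item, _ in self.adds - self.tombstones}

    def merge(self, other):
        self.adds |= other.adds
        self.tombstones |= other.tombstones


s = TwoPSet()
s.add("x")
s.remove("x")
s.add("x")                    # has no effect: the tombstone wins forever
print(s.query())              # set()

a, b = ORSet(), ORSet()
a.add("x")
b.merge(a)
b.remove("x")                 # tombstones the one tag b has seen
a.add("x")                    # concurrent re-add with a fresh tag
a.merge(b)
b.merge(a)
print(a.query(), b.query())   # {'x'} {'x'}
```

The second half shows the practical difference: in the OR-Set a concurrent add survives a remove, because the remove only covers tags that existed at the removing replica.
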
  30. Tools

    CRDT Toolbox
    CRDT-py
    Link.crdt
    Roshi by Soundcloud
    Riak 2.0: Counters, Flags, Sets, Registers, Maps
    Redis Labs CRDT
    Y-js – framework for offline-first p2p shared editing on structured data
    Swarm (and forever-in-pre-alpha tool)
    replikativ.io – p2p distributed system framework
    GUN framework: p2p distributed framework
  31. Own CRDT Data Type

    + Not too much code
    + Relatively easy to extend
    + Works in a predictable way (if you keep the semilattice properties)
    + Very stable under any network quality
    + Offline mode for free
    - Hard to move from the serializability paradigm to a distributed setting
    - Hard to verify custom conflict resolution
    - Hard to build garbage collection
    - Need to use additional techniques to work with high-throughput datasets (δ-mutation)
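
The δ-mutation idea mentioned in the last point can be sketched for a grow-only counter: instead of shipping the full state after every update, a replica ships only the small fragment that changed, and the receiver merges it with the same join it uses for full states. The code below is a simplified illustration, not code from the talk. `DeltaGCounter` and `join` are hypothetical names and Python 3 is assumed.

```python
def join(a, b):
    """Per-site maximum; works for full states and for deltas alike."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}


class DeltaGCounter(object):
    def __init__(self, site_id):
        self.site_id = site_id
        self.counters = {}

    def increment(self):
        """Apply the increment locally and return the delta to ship."""
        delta = {self.site_id: self.counters.get(self.site_id, 0) + 1}
        self.counters = join(self.counters, delta)
        return delta

    def merge(self, state_or_delta):
        self.counters = join(self.counters, state_or_delta)

    def query(self):
        return sum(self.counters.values())


a, b = DeltaGCounter("s1"), DeltaGCounter("s2")
deltas = [a.increment() for _ in range(3)]   # [{'s1': 1}, {'s1': 2}, {'s1': 3}]
for d in reversed(deltas):                   # deltas may arrive out of order
    b.merge(d)
b.merge(deltas[0])                           # or more than once
print(a.query(), b.query())                  # 3 3
```

Because a delta is itself a valid (tiny) state, the semilattice properties that make full-state merging safe carry over unchanged.
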
  32. Who is using it?

    Facebook
    TomTom
    League of Legends
    SoundCloud
    Bet365
    Riak Distributed Database