RICON 2014 CRDTs

Designing for Partition Tolerance with CRDTs

Carlos Baquero

October 29, 2014

Transcript

  1. Partitions

    System design would be easier under these premises: the network is reliable, and latency is zero (The 8 Fallacies of Distributed Computing, P. Deutsch). If your system is large and geo-distributed, both are often false. Network partitions and long latencies are bound to happen (An informal survey of real-world comm. failures, P. Bailis, K. Kingsbury).
  3. Partitions: Panic or embrace them

    Limit strong consistency to local, small-scale entities. Use eventual consistency across large-scale entities (AP under CAP). (Life beyond Distributed Transactions, Pat Helland) CC BY: Scott Beale / Laughing Squid
  4. Partitions: Availability and Divergence

    Ok, I have embraced partitions and availability, now what? Updates take time to travel to remote locations (light is slow). Globally, data diverges. Responses are immediate, from local data (reads and writes). Local entity data mutates immediately, by changes due to IO with local users, and eventually, by remote changes that trickle in from net IO. Consistency is recovered once replicas have seen the same set of mutations (EC).
  7. Consistently not Available

    A CP system (e.g. Paxos) can emulate a sequential entity. Invariant: no wolf with lamb without shepherd dog.

    [Animated diagram, slides 7 to 15, labels: User, Log, Vision. The user adds Lamb, then Dog, then Wolf. After each step the extracted labels show the same sequence in the log and in every vision: Lamb; then Lamb, Dog; then Lamb, Dog, Wolf. On slide 15 the user issues another Lamb.]
  16. Available for Inconsistencies

    Consistency guarantees are more relaxed in AP systems. Invariant: no wolf with lamb without shepherd dog.

    [Animated diagram, slides 16 to 24, labels: User, Log, Vision. The user adds Lamb, then Dog, then Wolf, and the visions are updated at different times. On slide 24 one vision holds Lamb and Wolf without Dog, marked "Broken Invariant".]

  25. Available for Inconsistencies

    Eventual consistency reached: every vision shows Lamb, Dog, Wolf. Converged, but invariants were broken during the execution.
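The difference between converging and preserving the invariant can be replayed in a few lines. This is a minimal sketch, not code from the talk: the helper names `violates` and `replay` are hypothetical, and each view is modelled as a plain set that receives the same three updates in a different order.

```python
def violates(view):
    """Invariant from the slides: no wolf with lamb without shepherd dog."""
    return "wolf" in view and "lamb" in view and "dog" not in view


def replay(deliveries):
    """Apply deliveries in order.

    Returns the final view and whether any intermediate view
    broke the invariant.
    """
    view, broken = set(), False
    for animal in deliveries:
        view.add(animal)
        broken = broken or violates(view)
    return view, broken


# Same set of updates, delivered in two different orders.
in_order_view, in_order_broken = replay(["lamb", "dog", "wolf"])
reordered_view, reordered_broken = replay(["lamb", "wolf", "dog"])
```

Both replays end in the same view, but only the reordered one passes through a state that holds the wolf and the lamb without the dog.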
  26. No Ordering Guarantees?

    Eventually delivering all operations is not enough. Some amount of ordering brings some consistency and helps preserve some types of invariant. The question is: which ordering guarantees still preserve availability?
  28. Causal Consistency

    AP systems are compatible with causal consistency. In fact: "No consistency semantics stronger than real time causal consistency can be implemented using a one-way convergent and always available distributed storage implementation." (Consistency, Availability, and Convergence. P. Mahajan, L. Alvisi, M. Dahlin) CC BY: Julie Falk
  29. Causal Consistency

    (Op A) must be present together with all causally preceding operations. [Diagram: a set of operations labelled Op, including Op A and Op B.]

  30. Causal Consistency

    (Op A) and (Op B) are concurrent. Both have causal consistency. [Diagram: the same operations, including Op A and Op B.]
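One common way to tell whether two operations are causally related or concurrent is to tag each with a vector clock. The slides do not say which mechanism is used, so the sketch below is an assumption for illustration: clocks are dictionaries from replica name to counter, and the function names are hypothetical.

```python
def leq(a, b):
    """True if clock a is pointwise less than or equal to clock b."""
    replicas = set(a) | set(b)
    return all(a.get(r, 0) <= b.get(r, 0) for r in replicas)


def happens_before(a, b):
    """a causally precedes b: a <= b pointwise and the clocks differ."""
    return leq(a, b) and not leq(b, a)


def concurrent(a, b):
    """Neither clock dominates the other."""
    return not leq(a, b) and not leq(b, a)


# An earlier operation seen by both replicas, then one new
# operation at each replica issued without seeing the other.
seen_by_both = {"r1": 1}
op_a = {"r1": 2}
op_b = {"r1": 1, "r2": 1}
```

Here `seen_by_both` precedes both `op_a` and `op_b`, while `op_a` and `op_b` are concurrent with each other.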
  31. AP Datatype design

    Key design principle: know if two operations are concurrent (∥) or causally related (→). Operations from the same replica are always related. Operations from different replicas can be related or concurrent. Consider operations add(wolf) and rmv(wolf). If add(wolf) → rmv(wolf), necessarily wolf ∉ S. If add(wolf) ∥ rmv(wolf), there are options: add-wins, wolf ∈ S; remove-wins, wolf ∉ S. Predictability: for related operations, sequential semantics is preserved; for concurrent operations, the decision is deterministic.
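The add(wolf) and rmv(wolf) cases can be written down as a small decision table. This is only an illustrative sketch: the function `wolf_in_set` and the string labels for the causal relation and the policy are hypothetical names, not part of the talk.

```python
def wolf_in_set(relation, policy):
    """Decide whether wolf is in S after add(wolf) and rmv(wolf).

    relation: "add->rmv", "rmv->add" or "concurrent"
    policy:   "add-wins" or "remove-wins" (only used when concurrent)
    """
    if relation == "add->rmv":
        return False  # sequential semantics: the remove saw the add
    if relation == "rmv->add":
        return True   # sequential semantics: the add came last
    if relation == "concurrent":
        # The only case where the datatype designer has a choice.
        return policy == "add-wins"
    raise ValueError(f"unknown relation: {relation}")
```

The policy argument is ignored for causally related operations, which is the predictability point on the slide: the design choice only applies to concurrent operations.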
  35. Datatypes for AP Systems

    AP operation is easier with rich datatypes. Datatypes provide semantic context on the operations. Conflict-free Replicated Datatypes (CRDTs): Registers, Sets, Maps, Timelines . . . They are always available, offer causal consistency, and converge deterministically. Two flavours: operation-based, where operations are sent to all other replicas; and state-based, where state is changed, gossiped and merged.
  38. Operation-Based Model

    Operations are transformed at the source. The result is a commutative downstream message. Reliable causal broadcast disseminates the message. Delivered messages transform the remote state. Queries operate on the state. (Conflict-Free Replicated Data Types. Shapiro, Preguiça, Baquero, Zawirski)
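As an illustration of these steps, here is a minimal sketch of an operation-based observed-remove set. It assumes the broadcast layer delivers messages causally and exactly once; the class `OpORSet` and the method names `prepare_add`, `prepare_rmv` and `effect` are hypothetical, chosen to mirror "transformed at source" and "delivered messages transform the state".

```python
import itertools


class OpORSet:
    """Operation-based add-wins set (sketch, assumes causal delivery)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = itertools.count(1)
        self.entries = set()  # pairs of (element, unique tag)

    def prepare_add(self, element):
        # At source: attach a fresh unique tag to the add.
        tag = (self.replica_id, next(self.counter))
        return ("add", element, frozenset({tag}))

    def prepare_rmv(self, element):
        # At source: turn the remove into "remove the tags I have observed".
        observed = frozenset(t for e, t in self.entries if e == element)
        return ("rmv", element, observed)

    def effect(self, message):
        # Downstream (and locally): apply the commutative message.
        kind, element, tags = message
        pairs = {(element, t) for t in tags}
        if kind == "add":
            self.entries |= pairs
        else:
            self.entries -= pairs

    def value(self):
        return {e for e, _ in self.entries}


a, b = OpORSet("a"), OpORSet("b")
m1 = a.prepare_add("wolf")
for replica in (a, b):
    replica.effect(m1)

# Concurrent operations: a re-adds wolf while b removes it.
m2 = a.prepare_add("wolf")
a.effect(m2)
m3 = b.prepare_rmv("wolf")
b.effect(m3)

# Cross delivery, in a different order at each replica.
a.effect(m3)
b.effect(m2)
```

Because the remove only names the tags it observed, the concurrent add survives at both replicas regardless of delivery order, giving add-wins behaviour.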
  39. Pure Operation-Based Model

    Reliable tagged causal broadcast disseminates the operation. Delivered messages are added to a Partially Ordered Log (POLog). Queries operate over the POLog. Add-wins OR-Set: {v | (t, add_v) ∈ POLog ∧ ∄ (t′, rmv_v) ∈ POLog · t → t′} (Making Operation-Based CRDTs Operation-Based. Baquero, Almeida, Shoker)
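The add-wins query can be evaluated directly over a log of tagged operations. The sketch below assumes the tags are vector clocks stored as dictionaries, which the slide does not specify; `precedes` and `add_wins_value` are hypothetical names.

```python
def precedes(t1, t2):
    """True if tag t1 causally precedes tag t2 (vector-clock order)."""
    replicas = set(t1) | set(t2)
    no_greater = all(t1.get(r, 0) <= t2.get(r, 0) for r in replicas)
    some_less = any(t1.get(r, 0) < t2.get(r, 0) for r in replicas)
    return no_greater and some_less


def add_wins_value(polog):
    """Elements with an add that no remove of the same element follows.

    polog is a list of (tag, kind, element) with kind "add" or "rmv".
    """
    return {
        v
        for (t, kind, v) in polog
        if kind == "add"
        and not any(
            kind2 == "rmv" and v2 == v and precedes(t, t2)
            for (t2, kind2, v2) in polog
        )
    }


# rmv(A) issued without having seen add(A): concurrent, so the add wins.
polog_concurrent = [
    ({"x": 1}, "add", "A"),
    ({"y": 1}, "rmv", "A"),
    ({"z": 1}, "add", "C"),
]

# rmv(A) issued after seeing add(A): sequential semantics, A is removed.
polog_related = [
    ({"x": 1}, "add", "A"),
    ({"x": 1, "y": 1}, "rmv", "A"),
    ({"z": 1}, "add", "C"),
]
```

The first log evaluates to {A, C} and the second to {C}; the only difference is whether the remove's tag dominates the add's tag.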
  41. Pure Operation-Based

    [Diagram, labels: User, POLog, Vision. The POLogs hold add(A), rmv(A) and add(C); every vision shows {A,C}.]
  42. Pure Operation-Based Model

    POLogs are not so practical if they keep growing. Reliable tagged causal broadcast disseminates the operation. Delivered messages are added to a partially ordered log. Stable messages in the POLog are compacted into a sequential datatype. Queries operate over the datatype plus the POLog. This opens the possibility of highly efficient sequential datatypes.
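A simplified sketch of this compaction for an add-wins set follows. It makes several assumptions that are not on the slide: tags are vector clocks, delivery is causal, removes are applied on delivery instead of being stored, and the caller supplies a correct stability clock (every operation delivered later is causally after it). The class and method names are hypothetical.

```python
def precedes(t1, t2):
    """True if tag t1 causally precedes tag t2 (vector-clock order)."""
    replicas = set(t1) | set(t2)
    no_greater = all(t1.get(r, 0) <= t2.get(r, 0) for r in replicas)
    some_less = any(t1.get(r, 0) < t2.get(r, 0) for r in replicas)
    return no_greater and some_less


class CompactingAddWinsSet:
    def __init__(self):
        self.base = set()  # compacted, plain sequential set
        self.polog = []    # (tag, element) for adds that are not yet stable

    def deliver(self, tag, kind, element):
        if kind == "rmv":
            # Stable adds are in the remove's causal past by assumption.
            self.base.discard(element)
            # Drop logged adds that the remove has seen; keep concurrent ones.
            self.polog = [
                (t, e) for t, e in self.polog
                if not (e == element and precedes(t, tag))
            ]
        else:
            self.polog.append((tag, element))

    def stabilize(self, stable):
        """Move adds whose tag is <= the stability clock into the base set."""
        pending = []
        for t, e in self.polog:
            if all(t[r] <= stable.get(r, 0) for r in t):
                self.base.add(e)  # the tag is no longer needed
            else:
                pending.append((t, e))
        self.polog = pending

    def value(self):
        # Queries operate over the datatype plus the POLog.
        return self.base | {e for _, e in self.polog}


s = CompactingAddWinsSet()
s.deliver({"x": 1}, "add", "A")
s.deliver({"z": 1}, "add", "C")
s.stabilize({"x": 1, "z": 1})
compacted_base = set(s.base)

# Later, a remove and a concurrent re-add of A arrive.
s.deliver({"x": 1, "y": 1, "z": 1}, "rmv", "A")
s.deliver({"x": 2, "z": 1}, "add", "A")
```

After stabilization the two adds live in the plain set with no tags; only the later, still unstable add(A) occupies the log.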
  45. Compacting Pure Operation-Based

    [Animated diagram, slides 45 and 46, labels: User, POLog, Set, Vision. Each replica holds a compacted Set {A,C} alongside POLog entries add(A) and rmv(A); every vision shows {A,C}.]
  47. State-Based Model

    Operations transform the local state. The local state is gossiped to one or more replicas. Received state is merged into the local state. Queries operate on the state. No tight control on participating replicas is needed: elasticity. (Conflict-Free Replicated Data Types. Shapiro, Preguiça, Baquero, Zawirski)
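A minimal sketch of a state-based add-wins set shows the mutate, gossip and merge cycle. The representation (entries tagged with a per-replica dot, plus a set of all dots seen) is an assumption for illustration, and the class `StateORSet` and its method names are hypothetical.

```python
class StateORSet:
    """State-based add-wins set (sketch): tagged entries plus seen dots."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.entries = set()  # (element, dot), dot = (replica, counter)
        self.context = set()  # every dot this replica has seen

    def _next_dot(self):
        own = (c for r, c in self.context if r == self.replica_id)
        return (self.replica_id, 1 + max(own, default=0))

    def add(self, element):
        dot = self._next_dot()
        self.entries = {(e, d) for e, d in self.entries if e != element}
        self.entries.add((element, dot))
        self.context.add(dot)

    def rmv(self, element):
        # The removed dots stay in the context, which records the removal.
        self.entries = {(e, d) for e, d in self.entries if e != element}

    def merge(self, other):
        # Keep an entry if both sides have it, or if the side that
        # lacks it has never seen its dot (so it cannot have removed it).
        kept = self.entries & other.entries
        kept |= {(e, d) for e, d in self.entries if d not in other.context}
        kept |= {(e, d) for e, d in other.entries if d not in self.context}
        self.entries = kept
        self.context |= other.context

    def value(self):
        return {e for e, _ in self.entries}


black, blue = StateORSet("black"), StateORSet("blue")
black.add("A")     # creates (A, black1)
blue.merge(black)  # gossip black -> blue
blue.add("C")      # creates (C, blue1)
black.rmv("A")     # concurrent with the next line
blue.add("A")      # creates (A, blue2)

blue.merge(black)  # gossip in both directions
black.merge(blue)
```

Both replicas end with {A, C}: the removal of (A, black1) is honoured, while the concurrent (A, blue2) survives. Merging is idempotent, so repeating the gossip changes nothing.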
  48. State-Based

    [Animated diagram, slides 48 and 49, labels: User, State, Vision. States are built from tagged entries such as (A,blue2) and (C,blue1) plus the removed tag ¬{black1}. On slide 48 the visions are {C}, {A,C} and {C}; on slide 49, once (A,blue2) has reached every state, all visions show {A,C}.]
  50. Delta-based State-Based Model

    Operations read the local state and create a delta. The delta is merged into the local state and the local buffer. The local buffer is gossiped to one or more replicas and then reset. Received state is merged into the local state and buffer. Queries operate on the state. State can grow with small impact on the dissemination. (Efficient State-based CRDTs by Delta-Mutation. Almeida, Shoker, Baquero)
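The buffer handling can be sketched with a grow-only counter, chosen here only because its join (pointwise maximum) is short; the talk's own examples use sets. The names `join`, `DeltaCounter`, `flush` and `receive` are hypothetical, and forwarding only deltas that changed the local state is an added assumption.

```python
def join(a, b):
    """Merge two counter states: pointwise maximum per replica."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


class DeltaCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.state = {}   # full state: replica -> count
        self.buffer = {}  # deltas accumulated since the last gossip

    def increment(self):
        # Read the local state and create a delta with only our entry.
        own = self.state.get(self.replica_id, 0) + 1
        delta = {self.replica_id: own}
        self.state = join(self.state, delta)
        self.buffer = join(self.buffer, delta)

    def flush(self):
        # Gossip the buffer (not the full state), then reset it.
        out, self.buffer = self.buffer, {}
        return out

    def receive(self, delta):
        merged = join(self.state, delta)
        if merged != self.state:
            # Assumption: only forward deltas that told us something new.
            self.buffer = join(self.buffer, delta)
        self.state = merged

    def value(self):
        return sum(self.state.values())


a, b = DeltaCounter("a"), DeltaCounter("b")
a.increment()
a.increment()
first_delta = a.flush()
b.receive(first_delta)
b.increment()
a.receive(b.flush())
```

The message from `a` carries a single entry however large the full state becomes, which is the point about state growing with small impact on dissemination.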
  51. δ State-Based

    [Animated diagram, slides 51 to 63, labels: User, State, Delta, Vision. All replicas start with (A,black1) and vision {A}. The user then issues add(C) and rmv(A), producing (C,blue1) and ¬{black1}, and the visions diverge ({A}, {A,C}, {C}) while the deltas propagate. A later add(A) produces (A,blue2). Once the deltas have been exchanged, every vision shows {A,C}.]
  64. Recap

    Causal consistency does not impact partition-tolerance. Sequential semantics should be preserved. Design options arise only over concurrently executed operations. Pure operation-based CRDTs can allow very compact states. Deltas make big state more acceptable. There is an inherent tradeoff between operation-based and state-based.
  68. Conclusion

    Causal consistency simplifies development. Deltas transmit less state while preserving causal consistency. Operation-based has high potential but requires stronger middleware.