Just-Right Consistency - Closing the CAP Gap

Percona 2017

Christopher Meiklejohn

April 26, 2017

Transcript

  1. Just-Right Consistency

    Closing the CAP Gap
    Christopher S. Meiklejohn (@cmeik),

    Peter Lash
    LIGHT ONE


  2. Outline:
    Closing the CAP Gap
    • Just-Right Consistency

    Available as possible, and consistent when
    necessary

  3. Outline:
    Closing the CAP Gap
    • Just-Right Consistency

    Available as possible, and consistent when
    necessary
    • AntidoteDB

    The first database that provides transactions with
    strong semantics, targeted at the JRC approach

  4. Outline:
    Closing the CAP Gap
    • Just-Right Consistency

    Available as possible, and consistent when
    necessary
    • AntidoteDB

    The first database that provides transactions with
    strong semantics, targeted at the JRC approach
    • Moving forward

    Antidote’s path forward from research to
    company and product

  5. Motivation
    Cloud Databases

  6. [Photo: http://vignette3.wikia.nocookie.net/the-titans-rp-and-information/images/f/f5/Blank-World-map2.gif/revision/latest/scale-to-width-down/1280?cb=20141016203452]


  7. Centralized database.
    [Diagram: world map with a single replica, A]

  8. Clients read and write
    against the primary copy.
    [Diagram: world map with a single replica, A]

  9. Geo-replicated for both
    fault-tolerance and high-availability.
    [Diagram: world map with replicas A, B, and C]

  10. Clients read and write locally
    for low-latency.
    [Diagram: world map with replicas A, B, and C]

  11. What happens if C
    can’t communicate with other replicas?
    [Diagram: world map with replicas A, B, and C]

  12. Choice 1: Consistent-Under-Partition (CP)
    • Synchronize each operation

    Maintains “single system image”
    [Diagram: world map with replicas A, B, and C]

  13. Choice 1: Consistent-Under-Partition (CP)
    • Synchronize each operation

    Maintains “single system image”
    • Spanner/F1, serializability model

    Coordination is expensive; Spanner typically has to
    wait 100 ms to commit an update transaction
    [Diagram: world map with replicas A, B, and C]

  14. Choice 1: Consistent-Under-Partition (CP)
    • Synchronize each operation

    Maintains “single system image”
    • Spanner/F1, serializability model

    Coordination is expensive; Spanner typically has to
    wait 100 ms to commit an update transaction
    Over-conservative, but easy to program!
    [Diagram: world map with replicas A, B, and C]

  15. Choice 2: Available-Under-Partition (AP)
    • Riak, Cassandra, Dynamo

    Operations issued against local copy, and across the cluster in
    parallel
    [Diagram: world map with replicas A, B, and C]

  16. Choice 2: Available-Under-Partition (AP)
    • Riak, Cassandra, Dynamo

    Operations issued against local copy, and across the cluster in
    parallel
    • Local operation only, asynchronous propagation

    Stale reads and write conflicts will occur without
    synchronization
    [Diagram: world map with replicas A, B, and C]

  17. Choice 2: Available-Under-Partition (AP)
    • Riak, Cassandra, Dynamo

    Operations issued against local copy, and across the cluster in
    parallel
    • Local operation only, asynchronous propagation

    Stale reads and write conflicts will occur without
    synchronization
    Available, but difficult to program!
    [Diagram: world map with replicas A, B, and C]

  18. CAP Theorem
    CP vs. AP
    [Diagram: world map with replicas A, B, and C]

  19. CAP Theorem
    CP: high cost
    AP
    [Diagram: world map with replicas A, B, and C]

  20. CAP Theorem
    CP: high cost, low availability
    AP
    [Diagram: world map with replicas A, B, and C]

  21. CAP Theorem
    CP: high cost, low availability, synchronization
    AP
    [Diagram: world map with replicas A, B, and C]

  22. CAP Theorem
    CP: high cost, low availability, synchronization
    AP: low cost
    [Diagram: world map with replicas A, B, and C]

  23. CAP Theorem
    CP: high cost, low availability, synchronization
    AP: low cost, high availability
    [Diagram: world map with replicas A, B, and C]

  24. CAP Theorem
    CP: high cost, low availability, synchronization
    AP: low cost, high availability, anomalies
    [Diagram: world map with replicas A, B, and C]

  25. CAP Theorem
    CP: high cost, low availability, synchronization
    AP: low cost, high availability, anomalies
    False dichotomy!
    [Diagram: world map with replicas A, B, and C]

  26. CAP Theorem
    CP: high cost, low availability, synchronization
    AP: low cost, high availability, anomalies
    False dichotomy!
    [Diagram: world map with replicas A, B, and C]
    • No “one-size-fits-all” consistency model

    Choosing a single model will either be over-conservative or
    risk anomalies

  27. CAP Theorem
    CP: high cost, low availability, synchronization
    AP: low cost, high availability, anomalies
    False dichotomy!
    [Diagram: world map with replicas A, B, and C]
    • No “one-size-fits-all” consistency model

    Choosing a single model will either be over-conservative or
    risk anomalies
    • Application-level invariants

    Instead, tailor consistency choices to the
    application-level invariants of each operation

  28. Just Right Consistency
    • Preserve sequential patterns

    Applications written sequentially that are correct should maintain
    correctness under concurrency

  29. Just Right Consistency
    • Preserve sequential patterns

    Applications written sequentially that are correct should maintain
    correctness under concurrency
    • AP-compatible invariants

    Strongest AP model; invariants that only require “one way”
    communications

  30. Just Right Consistency
    • Preserve sequential patterns

    Applications written sequentially that are correct should maintain
    correctness under concurrency
    • AP-compatible invariants

    Strongest AP model; invariants that only require “one way”
    communications
    • CAP-sensitive invariants

    Transactions that require coordination; “two way” communication
    invariants

  31. Just Right Consistency
    • Preserve sequential patterns

    Applications written sequentially that are correct should maintain
    correctness under concurrency
    • AP-compatible invariants

    Strongest AP model; invariants that only require “one way”
    communications
    • CAP-sensitive invariants

    Transactions that require coordination; “two way” communication
    invariants
    • Tools for analysis and verification

    Identify and verify application has sufficient synchronization to
    ensure application invariants

  32. Example
    Fælles Medicinkort

  33. Fælles Medicinkort
    • FMK [production] / FMKe [synthetic workload]

    Danish National Joint Medicine Card; operating 24x7
    since 2013 for 6 million Danish citizens

  34. Fælles Medicinkort
    • FMK [production] / FMKe [synthetic workload]

    Danish National Joint Medicine Card; operating 24x7
    since 2013 for 6 million Danish citizens
    • Lifecycle management for prescriptions

    Involves patient, pharmacy, and doctor management
    around active prescriptions in Denmark

  35. Fælles Medicinkort
    • FMK [production] / FMKe [synthetic workload]

    Danish National Joint Medicine Card; operating 24x7
    since 2013 for 6 million Danish citizens
    • Lifecycle management for prescriptions

    Involves patient, pharmacy, and doctor management
    around active prescriptions in Denmark
    • Assumed correct in isolation

    “Correct-Individually”, C in ACID, each operation
    ensures application-level invariants

  36. Fælles Medicinkort
    • FMK [production] / FMKe [synthetic workload]

    Danish National Joint Medicine Card; operating 24x7
    since 2013 for 6 million Danish citizens
    • Lifecycle management for prescriptions

    Involves patient, pharmacy, and doctor management
    around active prescriptions in Denmark
    • Assumed correct in isolation

    “Correct-Individually”, C in ACID, each operation
    ensures application-level invariants
    • create-prescription

    Create prescription for patient, doctor, pharmacy

    • update-prescription-medication

    Add or increase medication to prescription

    • process-prescription

    Deliver a medication by a pharmacy

    • get-*-prescriptions

    Query functions to return information about prescriptions


  37. FMKe Invariants
    • Relative order [referential integrity]

    Create a prescription and reference it by a
    patient

  38. FMKe Invariants
    • Relative order [referential integrity]

    Create a prescription and reference it by a
    patient
    • Joint update [atomicity]

    Create prescription, then update doctor,
    patient, and pharmacy

  39. FMKe Invariants
    • Relative order [referential integrity]

    Create a prescription and reference it by a
    patient
    • Joint update [atomicity]

    Create prescription, then update doctor,
    patient, and pharmacy
    • Precondition check [if, then]

    Medication should not be over delivered

  40. Invariants
    AP-compatible

  41. AP-compatible
    • No synchronization

    Updates occur locally without blocking, no
    synchronization in the critical path

  42. AP-compatible
    • No synchronization

    Updates occur locally without blocking, no
    synchronization in the critical path
    • Asynchronous operation

    Updates are fast, available, and exploit
    concurrency

  43. AP-compatible
    • No synchronization

    Updates occur locally without blocking, no
    synchronization in the critical path
    • Asynchronous operation

    Updates are fast, available, and exploit
    concurrency
    • Compatible invariants

    Relative order and joint update invariants can
    be preserved

  44. AP-compatible
    Data Model

  45. RA
    RB


  46. RA
    RB
    1
    set(1)


  47. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)


  48. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)
    2
    3
    Concurrent assignments

    don’t commute!


  49. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)
    2
    3
    Concurrent assignments

    don’t commute!
    Assignment requires CP.
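
The divergence on this slide can be reproduced with a minimal sketch. The function name and the replay model are illustrative assumptions, not code from the talk: each replica applies its own write first and the remote write second.

```python
def apply_in_order(initial, writes):
    """Apply a sequence of set(value) operations to a plain register."""
    value = initial
    for new_value in writes:
        value = new_value  # plain assignment overwrites whatever was there
    return value


# Both replicas start from set(1). RA issues set(2) while RB issues set(3).
# Each replica applies its local write first, then the remote one.
replica_a = apply_in_order(1, [2, 3])
replica_b = apply_in_order(1, [3, 2])

# The final states differ, so the replicas have diverged.
diverged = replica_a != replica_b
```

Because the result depends on delivery order, plain assignment only converges if the replicas agree on one order, which is the coordination the slide refers to.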


  50. Can we find a suitable data model
    for AP systems?

  51. Can we find a suitable data model
    for AP systems?
    Can we make non-commutative
    updates commutative?

  52. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)
    ?
    ?
    How do we deterministically pick

    a value to keep?


  53. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)
    ?
    ?
    How do we deterministically pick

    a value to keep?
    Do we use a timestamp?

    (like Cassandra, and drop a value?)


  54. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)
    ?
    ?
    How do we deterministically pick

    a value to keep?
    Do we use a timestamp?

    (like Cassandra, and drop a value?)
    Timestamps make concurrent
    operations commute

    but fail to capture intent.
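
A last-writer-wins register is one way to picture the timestamp approach. The sketch below is an assumption-laden illustration: the class name, the integer timestamps, and the replica-id tie-breaker are invented here and are not Cassandra's or AntidoteDB's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: int
    timestamp: int    # assumed to come from the writer's clock
    replica_id: str   # hypothetical tie-breaker for equal timestamps

    def merge(self, other):
        # Keep the write with the larger (timestamp, replica_id) pair.
        mine = (self.timestamp, self.replica_id)
        theirs = (other.timestamp, other.replica_id)
        return self if mine >= theirs else other


write_a = LWWRegister(value=2, timestamp=10, replica_id="RA")  # set(2) at RA
write_b = LWWRegister(value=3, timestamp=9, replica_id="RB")   # set(3) at RB

merged_at_a = write_a.merge(write_b)
merged_at_b = write_b.merge(write_a)
```

Both replicas end up with the same register, so the merge commutes, but set(3) is silently discarded regardless of what the writer at RB intended.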


  55. Can we be smarter about
    the merge function?

  56. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)
    3
    3
    max(2,3)
    max(2,3)
    Deterministic
    conflict resolution
    function.


  57. RA
    RB
    1
    set(1)
    3
    2
    set(2)
    set(3)
    3
    3
    max(2,3)
    max(2,3)
    Deterministic
    conflict resolution
    function.
    CRDTs
    generalize
    this framework.
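
The max-based merge from this slide can be checked with a short sketch. The helper name `merge` and the exhaustive replay over delivery orders are assumptions made for illustration.

```python
from itertools import permutations


def merge(local, remote):
    """Deterministic conflict resolution: keep the larger value."""
    return max(local, remote)


# max is commutative, associative, and idempotent, so every delivery
# order of the same updates should lead to the same final state.
updates = [1, 2, 3]
outcomes = set()
for order in permutations(updates):
    state = order[0]
    for update in order[1:]:
        state = merge(state, update)
    outcomes.add(state)
```

Only one outcome is reachable, which is the convergence property that plain assignment lacks.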


  58. Conflict-Free Replicated Data Types
    • Replicated abstract data types

    Extension of sequential data type that
    encapsulates deterministic merge
    function

  59. Conflict-Free Replicated Data Types
    • Replicated abstract data types

    Extension of sequential data type that
    encapsulates deterministic merge
    function
    • Many existing designs

    Sets, counters, registers, flags, maps
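
As one example of such a design, a grow-only counter can be sketched as follows. This is a textbook state-based formulation written for illustration; the class and method names are assumptions and do not reflect AntidoteDB's counter implementation.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-replica max makes merge commutative, associative, and idempotent.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


counter_a = GCounter("RA")
counter_b = GCounter("RB")
counter_a.increment(2)   # concurrent with the increment at RB
counter_b.increment(3)
counter_a.merge(counter_b)
counter_b.merge(counter_a)
counter_a.merge(counter_b)  # merging again changes nothing
```

The sequential data type (an integer that only grows) is extended with a merge function, so concurrent increments at different replicas are all preserved.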

  60. AP-compatible
    Relative Order

  61. RA
    RB


  62. RA
    RB
    Maintain program order
    implication invariant.


  63. RA
    RB
    Maintain program order
    implication invariant.
    For instance, P => Q.


  64. RA
    RB
    Q
    true(Q)
    Make Q true.


  65. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Make P true.


  66. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Program order implies ordering relationship.


  67. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Ordering is respected at other replicas.


  68. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Out of order propagation violates invariant!


  69. RA
    RB
    Q
    true(Q)
    P
    true(P)
    P is true,
    Q is NOT true!


  70. Let’s look at a
    concrete example.

  71. RA
    RB


  72. RA
    RB
    Q
    true(Q)
    Change default administrator password.


  73. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Enable administrator login.


  74. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Replica A is secure.


  75. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Replica B is secure.


  76. RA
    RB
    Q
    true(Q)
    P
    true(P)
    Reordering allows default password
    to be used to login!


  77. Causal Consistency
    • Respect causality

    Ensure updates are delivered in the causal order

    [Lamport 78]

  78. Causal Consistency
    • Respect causality

    Ensure updates are delivered in the causal order

    [Lamport 78]
    • Strongest available model

    Always able to return some compatible version
    for an object

  79. Causal Consistency
    • Respect causality

    Ensure updates are delivered in the causal order

    [Lamport 78]
    • Strongest available model

    Always able to return some compatible version
    for an object
    • Referential integrity

    Causal consistency is sufficient for providing
    referential integrity in an AP database
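
Causal delivery can be pictured as a receiver that holds back an update until everything it depends on is visible. The sketch below uses explicit dependency sets, and its class and method names are assumptions for illustration. It is not AntidoteDB's protocol, which the talk does not describe at this level.

```python
class Replica:
    """Makes an update visible only after all of its dependencies are visible."""

    def __init__(self):
        self.applied = []   # update ids, in the order they became visible
        self.pending = []   # received updates still waiting on dependencies

    def receive(self, update_id, depends_on):
        self.pending.append((update_id, frozenset(depends_on)))
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                update_id, dependencies = item
                if dependencies <= set(self.applied):
                    self.applied.append(update_id)
                    self.pending.remove(item)
                    progress = True


# Q = "change default administrator password", P = "enable administrator login".
# P was issued after Q at replica A, so P depends on Q.
replica_b = Replica()
replica_b.receive("P", depends_on=["Q"])   # arrives first, out of order
visible_after_p = list(replica_b.applied)
replica_b.receive("Q", depends_on=[])
```

Replica B never exposes a state where P holds without Q, even though the network delivered P first.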

  80. Causal consistency
    means…
    …relative order invariants
    are preserved transparently!

  81. AP-compatible
    Joint Update

  82. RA
    RB
    C1
    Client performing reads.


  83. RA
    RB
    C1
    Rx
    create Rx
    Create prescription.


  84. RA
    RB
    C1
    Rx
    create Rx
    Dr
    update Dr(Rx)
    Add reference in doctor record.


  85. RA
    RB
    C1
    Rx
    create Rx
    Dr
    update Dr(Rx)
    Pt
    update Pt(Rx)
    Add reference in patient record.


  86. RA
    RB
    C1
    Rx
    create Rx
    Dr
    update Dr(Rx)
    Pt
    update Pt(Rx)
    Ph
    update Ph(Rx)
    Add reference in pharmacy record.


  87. RA
    RB
    C1
    Rx
    create Rx
    Dr
    update Dr(Rx)
    Pt
    update Pt(Rx)
    Ph
    update Ph(Rx)
    Updates are causally consistent.


  88. RA
    RB
    C1
    Rx
    create Rx
    Dr
    update Dr(Rx)
    Pt
    update Pt(Rx)
    Ph
    update Ph(Rx)
    Client can read inconsistent state.


  89. RA
    RB
    C1
    Rx
    create Rx
    Dr
    update Dr(Rx)
    Pt
    update Pt(Rx)
    Ph
    update Ph(Rx)
    Client is missing update to pharmacy.


  90. Can we ensure updates are
    All-or-Nothing?

  91. RA
    RB
    C1
    T1
    create Rx
    update Dr(Rx)
    update Pt(Rx)
    update Ph(Rx)
    Group updates into an atomic transaction.


  92. RA
    RB
    C1
    T1
    create Rx
    update Dr(Rx)
    update Pt(Rx)
    update Ph(Rx)
    Updates reflect “All-Or-Nothing” property
    through snapshots.


  93. RA
    RB
    C1
    T1
    create Rx
    update Dr(Rx)
    update Pt(Rx)
    update Ph(Rx)
    T2
    Transactions are delivered in causal order.


  94. RA
    RB
    C1
    T1
    create Rx
    update Dr(Rx)
    update Pt(Rx)
    update Ph(Rx)
    T2
    Therefore, snapshots are causally consistent.
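
The all-or-nothing behaviour can be illustrated with a toy single-replica store in which a transaction buffers its writes and installs them in one step. All names below are hypothetical; this is a sketch of the idea, not of AntidoteDB's transaction manager.

```python
class Store:
    """Toy store whose readers only ever see committed state."""

    def __init__(self):
        self._committed = {}

    def snapshot(self):
        return dict(self._committed)

    def apply(self, writes):
        merged = dict(self._committed)
        merged.update(writes)
        self._committed = merged  # all writes become visible together


class Transaction:
    def __init__(self, store):
        self._store = store
        self._writes = {}

    def update(self, key, value):
        self._writes[key] = value  # buffered, not yet visible to readers

    def commit(self):
        self._store.apply(self._writes)


store = Store()
txn = Transaction(store)
txn.update("Rx", "rx-1")       # create Rx
txn.update("Dr", ["rx-1"])     # update Dr(Rx)
txn.update("Pt", ["rx-1"])     # update Pt(Rx)
seen_mid_transaction = store.snapshot()
txn.update("Ph", ["rx-1"])     # update Ph(Rx)
txn.commit()
seen_after_commit = store.snapshot()
```

A reader sees either none of the four updates or all of them, so the "missing pharmacy update" state from the earlier slides cannot be observed.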


  95. AP-compatible transactions
    provide the “A” in ACID

  96. Transactional
    Causal Consistency
    Strongest model
    that is available (AP)

  97. Invariants
    CAP-sensitive

  98. What about preventing over
    delivery of prescriptions?

  99. RA(2)
    RB(2) ?
    ?
    RC(2) ?
    Three replicas each with
    two available medications.


  100. RA(2)
    RB(2) 1
    1
    1
    pp(1)
    RC(2) 1
    Replica A checks precondition
    and delivers medication.


  101. RA(2)
    RB(2) 1
    1
    1
    pp(1)
    RC(2) 1
    Correct outcome

    where one medication remains.


  102. Is this safe
    with concurrent operations?

  103. RA(2)
    RB(2) ?
    ?
    RC(2) ?
    Three replicas each with
    two available medications.


  104. RA(2)
    RB(2) 4
    4
    1
    pp(1)
    RC(2) 4
    4
    add(3)
    Replica A checks precondition
    and delivers medication.


  105. RA(2)
    RB(2) 4
    4
    1
    pp(1)
    RC(2) 4
    4
    add(3)
    Replica C adds three medications

    to the prescription.


  106. RA(2)
    RB(2) 4
    4
    1
    pp(1)
    RC(2) 4
    4
    add(3)
    Correct outcome
    with four remaining medications.


  107. RA(2)
    RB(2) 4
    4
    1
    pp(1)
    RC(2) 4
    4
    add(3)
    Correct outcome
    with four remaining medications.
    Precondition is stable under
    concurrent addition.


  108. Is this safe
    with concurrent deliveries?

  109. RA(2)
    RB(2) ?
    ?
    RC(2) ?
    Three replicas each with
    two available medications.


  110. RA(2)
    RB(2) -1
    -1
    1
    pp(1)
    RC(2) -1
    0
    pp(2)
    Replica A checks precondition
    and delivers medication.


  111. RA(2)
    RB(2) -1
    -1
    1
    pp(1)
    RC(2) -1
    0
    pp(2)
    Replica C concurrently checks precondition

    and delivers two medications.


  112. RA(2)
    RB(2) -1
    -1
    1
    pp(1)
    RC(2) -1
    0
    pp(2)
    Incorrect outcome
    violating non-negative invariant.


  113. RA(2)
    RB(2) -1
    -1
    1
    pp(1)
    RC(2) -1
    0
    pp(2)
    Incorrect outcome
    violating non-negative invariant.
    Precondition is NOT stable
    under concurrent fulfillment.


  114. RA(2)
    RB(2) -1
    -1
    1
    pp(1)
    RC(2) -1
    0
    pp(2)
    Incorrect outcome
    violating non-negative invariant.
    Precondition is NOT stable
    under concurrent fulfillment.
    • Forbid concurrency

    Prevent operations from proceeding without synchronization to
    enforce invariant
    • Allow concurrency and remove invariant

    Allow operation to proceed, knowing that the invariant may be
    violated under concurrent operations
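
The over-delivery scenario on this slide can be replayed with a minimal sketch. The function name is a stand-in for the talk's pp (process-prescription) operation, and modelling its effect as a replicated delta is an assumption made here.

```python
def process_prescription(local_remaining, amount):
    """Check the precondition against local state; return the delta to replicate."""
    if local_remaining - amount < 0:
        raise ValueError("would over-deliver")
    return -amount


initial = 2  # every replica starts with two available medications

# Both checks pass because each replica only sees its own local state.
delta_a = process_prescription(initial, 1)  # RA delivers one: pp(1)
delta_c = process_prescription(initial, 2)  # RC concurrently delivers two: pp(2)

# Once both deltas have propagated, every replica converges to the same value.
final = initial + delta_a + delta_c
```

Each operation is correct on its own and the replicas converge, yet the converged value is negative, which is exactly the invariant violation the slide describes.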


  115. How do we know when it’s
    safe?

  116. CISE Analysis

  117. RA
    RB I?
    I?
    ?
    Upre?
    RC I?
    ?
    Vpre?
    Analyze possible pairs
    of concurrent operations…


  118. RA
    RB I?
    I?
    ?
    Upre?
    RC I?
    ?
    Vpre?
    …to identify operations where
    the invariant can be violated.


  119. CISE Analysis
    • Individually correct

    Individual operations never violate the
    invariant

  120. CISE Analysis
    • Individually correct

    Individual operations never violate the
    invariant
    • Convergence

    Concurrent effects commute

  121. CISE Analysis
    • Individually correct

    Individual operations never violate the
    invariant
    • Convergence

    Concurrent effects commute
    • Precondition stability

    Preconditions are stable under every pair
    of concurrent operations

  122. CISE Analysis
    • Individually correct

    Individual operations never violate the
    invariant
    • Convergence

    Concurrent effects commute
    • Precondition stability

    Preconditions are stable under every pair
    of concurrent operations
    If satisfied, invariant is
    guaranteed with concurrency.
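
The precondition-stability rule alone can be illustrated by a bounded brute-force search over a toy model of the medication stock. This is not the CISE tool or its proof method; the operation table, the bounds, and all names are assumptions chosen for the example.

```python
# Toy model: the state is the remaining stock, with invariant stock >= 0.
OPERATIONS = {
    "add": {
        "pre": lambda stock, n: n >= 0,
        "effect": lambda stock, n: stock + n,
    },
    "deliver": {
        "pre": lambda stock, n: n >= 0 and stock - n >= 0,
        "effect": lambda stock, n: stock - n,
    },
}


def unstable_pairs(max_stock=3, max_arg=3):
    """Return pairs (u, v) where v's effect can invalidate u's precondition."""
    found = set()
    for stock in range(max_stock + 1):
        for u_name, u in OPERATIONS.items():
            for v_name, v in OPERATIONS.items():
                for a in range(max_arg + 1):
                    for b in range(max_arg + 1):
                        # Both preconditions hold in the common starting state.
                        if u["pre"](stock, a) and v["pre"](stock, b):
                            after_v = v["effect"](stock, b)
                            if not u["pre"](after_v, a):
                                found.add((u_name, v_name))
    return found


pairs_needing_synchronization = unstable_pairs()
```

In this model only two concurrent deliveries are flagged, matching the earlier slides: addition is stable, while concurrent deliveries need synchronization.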


  123. Database
    AntidoteDB

  124. AntidoteDB
    • Open-source Erlang database

    Developed in Erlang, on top of the Riak Core
    distributed systems framework

  125. AntidoteDB
    • Open-source Erlang database

    Developed in Erlang, on top of the Riak Core
    distributed systems framework
    • Transactional Causal Consistency

    Only industrial-grade database providing both
    causal consistency and all-or-nothing transactions

  126. AntidoteDB
    • Open-source Erlang database

    Developed in Erlang, on top of the Riak Core
    distributed systems framework
    • Transactional Causal Consistency

    Only industrial-grade database providing both
    causal consistency and all-or-nothing transactions
    • Alpha release available

    Currently under development, but an alpha
    release of the product is available on GitHub

  127. A
    B
    N1
    N2
    TxnMgr
    Materializer
    Log
    InterDC-Repl
    Each data center…


  128. A
    B
    N1
    N2
    TxnMgr
    Materializer
    Log
    InterDC-Repl
    …contains multiple nodes…


  129. A
    B
    N1
    N2
    TxnMgr
    Materializer
    Log
    InterDC-Repl
    …each operating a transaction manager, materializers, log.


  130. A
    B
    N1
    N2
    TxnMgr
    Materializer
    Log
    InterDC-Repl
    Strong consistency inside of the data center…


  131. A
    B
    N1
    N2
    TxnMgr
    Materializer
    Log
    InterDC-Repl
    …with a causal consistency protocol running in the wide area.


  132. Data Model
    • Register: Last-Writer Wins, Multi-Value
    • Set: Grow-Only, Add-Wins, Remove-Wins
    • Map
    • Counter: Unlimited, Restricted ≥ 0
    • Graph: Directed, Monotonic DAG, Edit graph
    • Sequence

  133. Object API
    %% ClientIdentifier is assumed to be bound by the caller.
    User1 = {michel, antidote_crdt_mvreg, user_bucket},
    {ok, Time2} = antidote:update_objects(ignore, [],
        [{User1, assign,
          {["Michel", "[email protected]"],
           ClientIdentifier}}]),
    %% Read at a snapshot no older than the update's commit time.
    {ok, Result, _Time3} = antidote:read_objects(
        Time2, [], [User1]).

  134. Object API
    %% ClientIdentifier is assumed to be bound by the caller.
    User1 = {michel, antidote_crdt_mvreg, user_bucket},
    {ok, Time2} = antidote:update_objects(ignore, [],
        [{User1, assign,
          {["Michel", "[email protected]"],
           ClientIdentifier}}]),
    %% Read at a snapshot no older than the update's commit time.
    {ok, Result, _Time3} = antidote:read_objects(
        Time2, [], [User1]).
    Identify an object by object identifier.

  135. Object API
    %% ClientIdentifier is assumed to be bound by the caller.
    User1 = {michel, antidote_crdt_mvreg, user_bucket},
    {ok, Time2} = antidote:update_objects(ignore, [],
        [{User1, assign,
          {["Michel", "[email protected]"],
           ClientIdentifier}}]),
    %% Read at a snapshot no older than the update's commit time.
    {ok, Result, _Time3} = antidote:read_objects(
        Time2, [], [User1]).
    Use the update API to assign a value to this register.

  136. Object API
    %% ClientIdentifier is assumed to be bound by the caller.
    User1 = {michel, antidote_crdt_mvreg, user_bucket},
    {ok, Time2} = antidote:update_objects(ignore, [],
        [{User1, assign,
          {["Michel", "[email protected]"],
           ClientIdentifier}}]),
    %% Read at a snapshot no older than the update's commit time.
    {ok, Result, _Time3} = antidote:read_objects(
        Time2, [], [User1]).
    Read the object, providing a minimum snapshot time.

  137. Object API
    %% ClientIdentifier is assumed to be bound by the caller.
    User1 = {michel, antidote_crdt_mvreg, user_bucket},
    {ok, Time2} = antidote:update_objects(ignore, [],
        [{User1, assign,
          {["Michel", "[email protected]"],
           ClientIdentifier}}]),
    %% Read at a snapshot no older than the update's commit time.
    {ok, Result, _Time3} = antidote:read_objects(
        Time2, [], [User1]).
    Read the object, providing a minimum snapshot time.
    Simple, operation-based API.
    (think Redis, Riak CRDTs)

  138. Object API
    %% ClientIdentifier is assumed to be bound by the caller.
    User1 = {michel, antidote_crdt_mvreg, user_bucket},
    {ok, Time2} = antidote:update_objects(ignore, [],
        [{User1, assign,
          {["Michel", "[email protected]"],
           ClientIdentifier}}]),
    %% Read at a snapshot no older than the update's commit time.
    {ok, Result, _Time3} = antidote:read_objects(
        Time2, [], [User1]).
    Read the object, providing a minimum snapshot time.
    Simple, operation-based API.
    (think Redis, Riak CRDTs)
    Causal dependencies are
    automatically captured by
    execution order.

  139. Transaction API
    {ok, TxId} = antidote:start_transaction(Timestamp, []),
    {ok, _} = antidote:read_objects([Set], TxId),
    ok = antidote:update_objects([{Set, add, "Java"}], TxId),
    {ok, _} = antidote:commit_transaction(TxId).


  140. Transaction API
    Start a transaction with the transaction API,
    with a given snapshot time and return a transaction identifier.
    {ok, TxId} = antidote:start_transaction(Timestamp, []),
    {ok, _} = antidote:read_objects([Set], TxId),
    ok = antidote:update_objects([{Set, add, "Java"}], TxId),
    {ok, _} = antidote:commit_transaction(TxId).


  141. Transaction API
    Read objects using the interactive transaction API.
    {ok, TxId} = antidote:start_transaction(Timestamp, []),
    {ok, _} = antidote:read_objects([Set], TxId),
    ok = antidote:update_objects([{Set, add, "Java"}], TxId),
    {ok, _} = antidote:commit_transaction(TxId).

  142. Transaction API
    Update objects using the interactive transaction API.
    {ok, TxId} = antidote:start_transaction(Timestamp, []),
    {ok, _} = antidote:read_objects([Set], TxId),
    ok = antidote:update_objects([{Set, add, "Java"}], TxId),
    {ok, _} = antidote:commit_transaction(TxId).

  143. Transaction API
    Once finished updating, commit the transaction.
    {ok, TxId} = antidote:start_transaction(Timestamp, []),
    {ok, _} = antidote:read_objects([Set], TxId),
    ok = antidote:update_objects([{Set, add, "Java"}], TxId),
    {ok, _} = antidote:commit_transaction(TxId).

  144. Transaction API
    Once finished updating, commit the transaction.
    Transactions read
    causally consistent snapshots
    and updates are
    applied atomically.
    {ok, TxId} = antidote:start_transaction(Timestamp, []),
    {ok, _} = antidote:read_objects([Set], TxId),
    ok = antidote:update_objects([{Set, add, "Java"}], TxId),
    {ok, _} = antidote:commit_transaction(TxId).

  145. Scalability
    [Bar chart: throughput in Kops/s (axis 100 to 800) per read(update) ratio 99(1), 90(10), 75(25), 50(50), for DCs × Servers configurations 1 x 5, 1 x 10, 1 x 25, 2 x 25, 3 x 25]
    LWW registers
    100k keys/partition
    power law distribution

  146. Cure vs. SOA
    [Bar chart: throughput in Kops/s (axis 0 to 1100) of Eiger, GR, Cure, and EC per read(update) ratio 99(1), 90(10), 75(25), 50(50)]
    3 DCs × 25 Servers
    LWW registers

  147. Cure vs. EC
    [Bar chart: throughput in Kops/s (axis 100 to 1200) of Cure and EC at 1KB and 10KB per read(update) ratio 99(1), 90(10), 75(25), 50(50)]
    3 DCs x 25 Servers
    CRDT sets

  148. Future Features
    • Intra-DC replication

    Antidote provides no replication within the
    datacenter and assumes only geo-
    replication at the moment

  149. Future Features
    • Intra-DC replication

    Antidote provides no replication within the
    datacenter and assumes only geo-
    replication at the moment
    • ACID transactions

    For Antidote to provide all of JRC, it needs
    ACID transaction support: no research
    needed, only implementation

  150. Moving Forward
    • Research prototype

    Originally a research prototype to build a database
    requiring reduced synchronization (SyncFree FP7)
    with Basho, Rovio, and Trifork

  151. Moving Forward
    • Research prototype

    Originally a research prototype to build a database
    requiring reduced synchronization (SyncFree FP7)
    with Basho, Rovio, and Trifork
    • Research ahead

    LightKone (H2020) will investigate moving AntidoteDB
    close to the edge to provide DDN services

  152. Moving Forward
    • Research prototype

    Originally a research prototype to build a database
    requiring reduced synchronization (SyncFree FP7)
    with Basho, Rovio, and Trifork
    • Research ahead

    LightKone (H2020) will investigate moving AntidoteDB
    close to the edge to provide DDN services
    • Industrialization

    Obtaining seed funding to start a company to
    industrialize AntidoteDB

  153. Resources
    • https://github.com/SyncFree/antidote

    AntidoteDB

  154. Resources
    • https://github.com/SyncFree/antidote

    AntidoteDB
    • http://syncfree.github.io/antidote/

    Documentation for AntidoteDB

  155. Resources
    • https://github.com/SyncFree/antidote

    AntidoteDB
    • http://syncfree.github.io/antidote/

    Documentation for AntidoteDB
    • www.antidotedb.com

    Website

  156. Resources
    • https://github.com/SyncFree/antidote

    AntidoteDB
    • http://syncfree.github.io/antidote/

    Documentation for AntidoteDB
    • www.antidotedb.com

    Website
    • docker pull antidotedb/antidote

    Try out Antidote!

  157. Thanks!
    More questions?
    Come visit us at the
    Evolution bar!
