CAP Theorem: not what we thought it was, not what we are looking for

Shlomi Noach
December 18, 2019

The CAP theorem is often used to classify distributed systems, and the "two out of three" rule is often quoted. But the CAP theorem is widely misunderstood. What are the exact terms of the CAP Theorem? How does it differ from Brewer's original CAP Conjecture? Where does CAP fall short of meeting practical engineering expectations?

Transcript

  1. 1.

    CAP Theorem: not what we thought it was, not what we are looking for
    Shlomi Noach, GitHub
    DevOpsDays TLV 2019
  2. 2.

    About me
    @github/database-infrastructure
    Author of orchestrator, gh-ost, freno, ccql and others.
    Blog at http://openark.org
    github.com/shlomi-noach
    @ShlomiNoach
  3. 3.

    GitHub
    Built for developers
    Largest open source hosting
    40M+ developers
    2.9M+ organizations
    100M+ repositories
    Supplier of octocat T-Shirts and stickers
  4. 4.

    Incentive
    An important part of our work is to keep the service available, so that users/customers have access to their data and can operate on it.
  5. 5.

    CAP Conjecture
    Suggested by Eric Brewer, 1998-1999
    Terms:
    • [strong] Consistency (C)
    • [high] Availability (A)
    • Partition tolerance (P)
    Strong Consistency, High Availability, Partition-resilience: pick at most 2.
  6. 6.

    CAP Theorem
    Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
    Seth Gilbert, Nancy Lynch
  7. 7.

    CAP Theorem
    A mathematical proof.
    Uses a different definition of Availability than the CAP Conjecture’s definition.
    Neither Availability definition contains the other (or is stronger than the other).
  8. 8.

    CAP Theorem
    In math, a theorem only holds as long as its terms & conditions are met.
    The CAP Theorem does not prove the CAP Conjecture.
  9. 9.
  10. 10.

    CAP Theorem: Consistency (C)
    Once a write is successful on a node, any read on any node must reflect that write or any later write.
    aka Atomic consistency, aka Strong consistency, aka Linear consistency, aka Linearizability.
  11. 11.

    CAP Theorem: Availability (A)
    …every request received by a non-failing node in the system must result in a response.
    There is no constraint on the actual amount of time. Though it is not specified, it is implied that the response must be valid and non-error.
    Contrast with Brewer’s definition of High availability: data is considered highly available if a given consumer of the data can always reach some replica.
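
The two Availability definitions can be contrasted with a small sketch. The `Node` record and both predicate names are hypothetical, introduced only for illustration, and each definition is reduced to a boolean check over a snapshot of the cluster, which is a simplification of both texts. It shows one case where the conjecture's definition is satisfied while the theorem's is not.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    failed: bool     # the node has crashed
    responds: bool   # the node answers the requests it receives
    reachable: bool  # the consumer can reach this node over the network


def theorem_available(nodes):
    """Theorem's definition (simplified): every non-failing node must respond."""
    return all(n.responds for n in nodes if not n.failed)


def conjecture_available(nodes):
    """Conjecture's definition (simplified): the consumer can reach some live replica."""
    return any(n.reachable and not n.failed for n in nodes)


# One replica serves the consumer; another is alive but never answers.
cluster = [
    Node("n1", failed=False, responds=True, reachable=True),
    Node("n2", failed=False, responds=False, reachable=False),
]
print(theorem_available(cluster))     # False
print(conjecture_available(cluster))  # True
```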
  12. 12.

    CAP Theorem: Partition Tolerance (P)
    The system is able to operate under network partitioning.
    Partition tolerance is treated as a given condition, since network partitioning can and does take place regardless of a system’s design.
  13. 13.

    CAP Theorem
    A distributed data store [web service] cannot provide more than two out of the three properties.
    Better illustrated as:
    • If the network is good, you may achieve both Availability and Consistency.
    • If the network is partitioned, you must choose between Availability and Consistency.
  14. 15.

    Proof of CAP theorem
    Simplified, not by much.
    Given two nodes replicating from each other.
    [Diagram: nodes n1 and n2]
  15. 16.

    Proof of CAP theorem
    We partition the network between the two nodes for an infinite amount of time.
    [Diagram: nodes n1 and n2]
  16. 17.

    Proof of CAP theorem
    We write data to one node. If the system is Available, the write completes in a finite amount of time.
    [Diagram: nodes n1 and n2]
  17. 18.

    Proof of CAP theorem
    We read data from the other node. If the system is Available, that read completes in a finite amount of time.
    [Diagram: nodes n1 and n2]
  18. 19.

    Proof of CAP theorem
    During that finite time the network was partitioned, hence our read could not reflect changes made by the write. QED
    [Diagram: nodes n1 and n2]
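
The proof steps can be replayed as a toy simulation. This is a minimal sketch, assuming a hypothetical `Cluster` class with two single-value replicas that always answer (Availability) and replicate only while the link is healthy. It is not taken from the paper; it only mirrors the scenario on the slides.

```python
class Node:
    """A replica holding a single value."""

    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two nodes replicating from each other over one network link."""

    def __init__(self):
        self.n1 = Node("n1")
        self.n2 = Node("n2")
        self.partitioned = False

    def write(self, node, value):
        # An Available node must answer, so it accepts the write locally.
        node.value = value
        other = self.n2 if node is self.n1 else self.n1
        if not self.partitioned:
            other.value = value  # replication needs a healthy link
        return "ok"

    def read(self, node):
        # An Available node must answer with whatever it currently holds.
        return node.value


cluster = Cluster()
cluster.write(cluster.n1, "v1")   # healthy network: both nodes hold v1
cluster.partitioned = True        # the partition never heals
cluster.write(cluster.n1, "v2")   # completes in finite time
stale = cluster.read(cluster.n2)  # completes in finite time
print(stale)                      # v1: the read does not reflect the write
```

Making `read` refuse to answer during the partition would restore Consistency, at the cost of Availability as the theorem defines it.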
  19. 21.

    The proof is mathematical
    Reading data after writing is only possible when the write time is finite.
    Proves impossibility by counterexample: we’ve shown a (single) scenario where we can’t have both Availability and Consistency.
  20. 22.

    CAP Theorem
    CAP does not say “you cannot have both A and C at the same time” *
    It says: “you cannot design a system where you will have both A and C together at all times” *
    * Assuming P
  21. 24.

    Vacuous truth
    A statement that asserts that all members of the empty set have a certain property. [wikipedia]
  22. 28.
  23. 29.

    CAP Availability
    It is a vacuous truth that if all the nodes in my system are crashed, then my system is Available.
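
Python's built-in `all` behaves the same way over an empty collection, which makes the point easy to check. In this sketch the dictionary layout and the `cap_available` name are assumptions made for illustration; the predicate encodes "every non-failing node responds".

```python
def cap_available(nodes):
    """True when every non-failing node responds to requests."""
    return all(node["responds"] for node in nodes if not node["failed"])


# Six servers, all crashed: there are no non-failing nodes left to check.
all_crashed = [{"failed": True, "responds": False} for _ in range(6)]
print(cap_available(all_crashed))  # True
```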
  24. 30.

    Proposal for making my service available
    [Diagram: database servers]
    Shut down all database servers?
  25. 31.

    Proposal for making my service available
    [Diagram: database servers]
    Shut down all database servers?
    The CAP Theorem’s Availability definition goes against the practical definition of availability.
  26. 35.
  27. 36.

    One day, Anna's attention is drawn to what seems to be an unfolding crisis.
    oh, dear!
  28. 39.
  29. 51.
  30. 55.

    For eternity. We now have an infinite network partitioning, and will never again have consistency.
  31. 58.

  32. 59.

  33. 60.

  34. 61.
  35. 62.
  36. 63.

    Here's my corporate credit card. I'd like you to go to the store downtown, buy a new router, take it to our Virginia DataCenter and replace the old router.
  37. 64.
  38. 66.

    Infinite network partitioning
    Is not a practical situation.
    Is not something we plan for.
    Is not something that concerns us as we engineer our systems.
  39. 67.

    Can we trade “infinite” with “long enough”?
    This would break the proof. If a network partition is capped (pun intended) at t seconds, we can stall any read by t+α, conveniently delaying beyond the network partitioning, allowing for consistency.
    Remember: there is no cap (pun again) on query response time.
    [Diagram: nodes n1 and n2]
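
The stalling trick can be checked with a few lines of arithmetic on a simulated clock. This is a sketch under stated assumptions: `T` is the cap on partition length, `ALPHA` is taken to be enough slack for replication to catch up once the link heals, and both function names are hypothetical.

```python
T = 10.0     # assumed upper bound on partition duration, in seconds
ALPHA = 0.5  # assumed slack for replication to catch up


def replicated_at(write_time, partition_start, partition_end):
    """Time at which a write made on n1 becomes visible on n2."""
    if partition_start <= write_time < partition_end:
        return partition_end + ALPHA  # shipped once the partition heals
    return write_time + ALPHA         # shipped right away


def stalled_read_sees_write(read_time, write_time, partition_start, partition_end):
    """n2 holds every read and answers only at read_time + T + ALPHA."""
    answer_time = read_time + T + ALPHA
    return replicated_at(write_time, partition_start, partition_end) <= answer_time


# Partition from 0 to 10 (the cap). The write lands at 1, the read arrives at 2.
print(stalled_read_sees_write(read_time=2, write_time=1,
                              partition_start=0, partition_end=10))  # True
```

Every read is slow, but the theorem's Availability puts no bound on response time, so the system still counts as Available.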
  40. 68.

    Can we rewrite CAP Theorem in terms of finite timeouts?
    Yes:
    • in a system where a network partition can last more than t
    • and where we require queries to respond within time t/2
    • we can illustrate a proof similar to the CAP Theorem’s, where given such partitioning we cannot achieve both availability and consistency.
    • Let’s name it “Time Limited CAP”
    [Diagram: nodes n1 and n2]
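
One way to read the t/2 requirement is as a timeline: a write and the read that follows it together take at most t, so a partition longer than t outlasts both. The function below is a hedged reconstruction of that argument, not something stated on the slide, and its name and parameters are hypothetical. It assumes the partition starts at time 0, just before the write is issued.

```python
def stale_read_unavoidable(t, partition_length):
    """True if no amount of stalling lets n2 return the value written to n1."""
    deadline = t / 2                   # every query must answer within t/2
    write_done = deadline              # latest moment n1 may acknowledge the write
    read_done = write_done + deadline  # latest moment n2 may answer the read
    # If the partition is still in place at read_done, n2 never saw the write.
    return partition_length > read_done


print(stale_read_unavoidable(t=10, partition_length=11))  # True
print(stale_read_unavoidable(t=10, partition_length=8))   # False
```

In the second call both nodes can stall to their deadlines and the partition heals before the read must answer, so consistency is still reachable in that case.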
  41. 69.

    Questioning “time limited CAP”
    Is t a given constraint? Why would we necessarily require queries to complete in t/2?
    Is there system logic / customer logic to the above, or have we tailored this to fit a theorem we had?
  42. 70.

    Questioning “time limited CAP”
    Opinion: this chase is a fallacy.
    We seem to be trying to prove a theorem while we disagree with its terms, namely Availability.
  43. 71.

    CAP Conjecture
    Availability: data is considered highly available if a given consumer of the data can always reach some replica.
    Can we work with that?
  44. 72.
  45. 73.

    n > 2
    Availability: consumer always has access to data via at least one replica.
    Some node will have our data (we will likely need to detect which one).
    [Diagram: five nodes]
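
Detecting which node has the data could look like the sketch below. It assumes each replica tracks a monotonically increasing `version` and that the consumer knows which replicas it can reach; the field names and the `freshest_reachable` helper are hypothetical.

```python
def freshest_reachable(replicas):
    """Pick the reachable replica with the highest applied version."""
    reachable = [r for r in replicas if r["reachable"]]
    if not reachable:
        return None  # no replica can be reached: the data is not available
    return max(reachable, key=lambda r: r["version"])


replicas = [
    {"name": "n1", "version": 42, "reachable": False},
    {"name": "n2", "version": 41, "reachable": True},
    {"name": "n3", "version": 42, "reachable": True},
    {"name": "n4", "version": 40, "reachable": True},
    {"name": "n5", "version": 39, "reachable": False},
]
print(freshest_reachable(replicas)["name"])  # n3
```

Note that the freshest reachable replica is not necessarily the freshest replica overall, which is one reason a single reachable replica is a weak guarantee.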
  46. 78.

    CAP Conjecture, capacity
    What if we agreed to quorum?
    q-Availability: assuming the number of replicas is n, the consumer always has access to data via at least ⌊n/2 + 1⌋ replicas.
    e.g.
    - in a 5-replicas cluster, quorum would be 3
    - in a 7-replicas cluster, quorum would be 4
    [Diagram: five nodes]
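
The quorum size follows directly from the formula. A minimal sketch, with `quorum` as a hypothetical helper name:

```python
def quorum(n):
    """Smallest majority of n replicas: floor(n/2 + 1)."""
    return n // 2 + 1


for n in (3, 5, 7):
    print(n, quorum(n))
# 3 2
# 5 3
# 7 4
```

For an even replica count the majority is more than half, for example `quorum(4)` is 3.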
  47. 80.

    Consensus algorithms: Paxos, Raft
    Depending on implementation, can guarantee:
    • A write is readily Available to read on quorum nodes
    • A write is made durable on quorum nodes
    [Diagram: five nodes]
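
These guarantees rest on a simple property of majorities: any two quorums share at least one node, so a write made durable on one quorum is visible to any other quorum. The sketch below checks this by brute force for a five-node cluster. It illustrates the counting argument only and is not an implementation of Paxos or Raft; the node names are placeholders.

```python
from itertools import combinations

nodes = ["n1", "n2", "n3", "n4", "n5"]
q = len(nodes) // 2 + 1  # quorum size, 3 for five nodes

# Every possible quorum of exactly q nodes.
quorums = [set(c) for c in combinations(nodes, q)]

# Any two quorums must have a non-empty intersection.
overlap = all(a & b for a in quorums for b in quorums)
print(len(quorums), overlap)  # 10 True
```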
  48. 86.

    CAP: recap
    Assuming P, you cannot design a system where you will have both A and C together at all times.
    Proven by giving a [single] counterexample.
  49. 89.

    CAP 9’s?
    “In practice, many applications are best described in terms of reduced consistency or availability… … there is a Weak CAP Principle which we have yet to characterize precisely…”
  50. 90.

    CAP is a subset
    Availability and Consistency are but two (important) properties of modern distributed systems.
    Geo distributions, transactions, latency, durability, consistent distributed snapshots, DR, failovers, backup options, mean time to recover, observability, operability… (some properties correlate)
    Which are important to your service?
  51. 91.

    The tradeoff exists
    We understand this as engineers.
    Multiple other tradeoffs exist.
    I believe CAP is not the model we should be looking for.
  52. 92.

    Further reading
    Harvest, Yield, and Scalable Tolerant Systems, Armando Fox, Eric A. Brewer
    https://pdfs.semanticscholar.org/5015/8bc1a8a67295ab7bce0550886a9859000dc2.pdf
    Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, Seth Gilbert, Nancy Lynch
    https://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf
    A Critique of the CAP Theorem, Martin Kleppmann
    https://arxiv.org/pdf/1509.05393.pdf
    CAP Twelve Years Later: How the "Rules" Have Changed, Eric Brewer
    https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
    Please stop calling databases CP or AP, Martin Kleppmann
    https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
  53. 93.

    Further reading
    NewSQL database systems are failing to guarantee consistency, and I blame Spanner, Daniel Abadi
    http://dbmsmusings.blogspot.com/2018/09/newsql-database-systems-are-failing-to.html
    Problems with CAP, and Yahoo’s little known NoSQL system, Daniel Abadi
    http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
    PACELC theorem, Wikipedia
    https://en.wikipedia.org/wiki/PACELC_theorem
    “A Critique of the CAP Theorem”, Julia Evans
    https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/
    CAP Theorem, FoundationDB
    https://apple.github.io/foundationdb/cap-theorem.html