CAP Theorem:  not what we thought it was, not what we are looking for

CAP Theorem: not what we thought it was, not what we are looking for
Shlomi Noach, GitHub
DevOpsDays TLV 2019

About me
@github/database-infrastructure
Author of orchestrator, gh-ost, freno, ccql and others.
Blog at http://openark.org
github.com/shlomi-noach
@ShlomiNoach

GitHub
Built for developers
Largest open source hosting
40M+ developers, 2.9M+ organizations, 100M+ repositories
Supplier of octocat T-Shirts and stickers

Incentive
An important part of our work is to keep the service available, so that users and customers can access their data and operate on it.

CAP Conjecture
Suggested by Eric Brewer, 1998-1999.
Terms:
• [strong] Consistency
• [high] Availability
• Partition tolerance
Strong Consistency, High Availability, Partition-resilience: pick at most 2.

CAP Theorem
"Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services", Seth Gilbert, Nancy Lynch

CAP Theorem
A mathematical proof.
Uses a different definition of Availability than the CAP Conjecture does.
Neither Availability definition contains the other (that is, neither is stronger than the other).

CAP Theorem
In math, a theorem only holds as long as its terms and conditions are met.
The CAP Theorem does not prove the CAP Conjecture.

CAP Theorem
Terms, according to Gilbert & Lynch:
- Consistency
- Availability
- Partition tolerance

CAP Theorem: Consistency
Once a write is successful on a node, any read on any node must reflect that write or any later write.
aka Atomic consistency, aka Strong consistency, aka Linear consistency, aka Linearizability.

CAP Theorem: Availability
“…every request received by a non-failing node in the system must result in a response.”
There is no constraint on the actual amount of time.
Though it is not specified, it is implied that the response must be valid and non-error.
Contrast with Brewer’s definition of High Availability: data is considered highly available if a given consumer of the data can always reach some replica.

CAP Theorem: Partition Tolerance
The system is able to operate under network partitioning.
Partition tolerance is treated as a given condition, since network partitioning can and does take place regardless of a system’s design.

CAP Theorem
A distributed data store [web service] cannot provide more than two out of the three properties.
Better illustrated as:
• If the network is good, you may achieve both Availability and Consistency.
• If the network is partitioned, you must choose between Availability and Consistency.

CAP Theorem
The discussion is mostly about CP vs. AP systems.

Proof of CAP theorem
Simplified, but not by much.
Given two nodes, n1 and n2, replicating from each other.

Proof of CAP theorem
We partition the network between the two nodes for an infinite amount of time.

Proof of CAP theorem
We write data to one node.
If the system is Available, the write completes in a finite amount of time.

Proof of CAP theorem
We read data from the other node.
If the system is Available, that read completes in a finite amount of time.

Proof of CAP theorem
During that finite time the network was partitioned, hence our read could not reflect the changes made by the write. QED
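
The steps of the proof can be played out as a toy simulation. This is only a sketch: the `Node` class, its `partitioned` flag and the single-value store are hypothetical names made up for illustration, not something from the paper or from any real database.

```python
class Node:
    """A replica holding a single value and a link to its peer (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the link to the peer is down

    def write(self, value):
        # An Available node must answer, so it accepts the write locally.
        self.value = value
        # Replication only succeeds while the network link is up.
        if not self.partitioned:
            self.peer.value = value
        return "ok"

    def read(self):
        # An Available node must answer with whatever it has.
        return self.value


n1, n2 = Node("n1"), Node("n2")
n1.peer, n2.peer = n2, n1

n1.write("v1")                           # healthy network: the write replicates
n1.partitioned = n2.partitioned = True   # partition the network
n1.write("v2")                           # Available: the write completes on n1
stale = n2.read()                        # Available: the read completes on n2
print(stale)  # prints v1, not v2: the read does not reflect the latest write
```

Both requests got a response, so the system stayed Available, and the price was Consistency. Refusing the read on n2 instead would keep Consistency and give up Availability.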

The proof is mathematical
It follows solid mathematical principles.

The proof is mathematical
Reading data after writing is only possible when the write time is finite.
It proves impossibility by counterexample: we’ve shown a (single) scenario where we can’t have both Availability and Consistency.

CAP Theorem
CAP does not say “you cannot have both A and C at the same time”. *
It says: “you cannot design a system where you will have both A and C together at all times”. *
* Assuming P

All purple dragons can fly

Vacuous truth
A statement that asserts that all members of the empty set have a certain property. [wikipedia]

Vacuous truth
All purple dragons can fly.

Vacuous truth
All purple dragons can fly faster than the speed of light.

Vacuous truth
All purple dragons can answer questions.
All purple dragons cannot answer questions.

CAP Availability
“…every request received by a non-failing node in the system must result in a response.”

CAP Availability
It is vacuously true that if all the nodes in my system have crashed, then my system is Available.
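
The same vacuous truth shows up in code. A minimal sketch, assuming a made-up representation of nodes as dicts with hypothetical `failed` and `responds` flags: the Availability check quantifies over non-failing nodes only, and Python's `all()` over an empty sequence is `True`.

```python
def is_available(nodes):
    """CAP Theorem Availability: every non-failing node answers requests."""
    non_failing = [n for n in nodes if not n["failed"]]
    # all() over an empty list is True, so zero non-failing nodes passes.
    return all(n["responds"] for n in non_failing)


healthy = [{"failed": False, "responds": True}, {"failed": False, "responds": True}]
one_mute = [{"failed": False, "responds": False}, {"failed": False, "responds": True}]
all_crashed = [{"failed": True, "responds": False}, {"failed": True, "responds": False}]

print(is_available(healthy))      # True
print(is_available(one_mute))     # False
print(is_available(all_crashed))  # True, vacuously
```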

Proposal for making my service available
Shut down all database servers?
The CAP Theorem's definition of Availability goes against the practical definition of availability.

Infinite network partitioning

Infinite network partitioning, illustrated

Meet Anna and Ben, SREs at example.com

One day, Anna's attention is drawn to what seems to be an unfolding crisis.
Oh, dear!

Ben, are you seeing what I’m seeing?

The two investigate.
It seems like our Virginia router is malfunctioning.

Confirmed with the datacenter; they saw some smoke, and then
it burned. It's literally melted.

So our Virginia DC is now network isolated.

You know what that means?

Oh, wow! Oh, wow!

The router is never coming back. This is an infinite
network partitioning!

We will never have consistency ever again!

There's just no point in anything. CAP Theorem proves this.

For eternity!

We may as well enjoy our time! We can live a life of adventure!

We can go to Paris!

We can go to Rome!

We can go to DevOpsDays TLV!

Woohoo!

Enter Carmen, CTO of example.com

Hey there! What’s going on? Seems like we have an outage?

Our Virginia router burst into flames. It is gone.

For eternity. We now have an infinite network partitioning, and will never again have consistency.

There's just no point in anything. CAP Theorem proves this.

Can we expense travel to DevOpsDays TLV?

Here's my corporate credit card. I'd like you to go to the store downtown, buy a new router, take it to our Virginia datacenter and replace the old router.

Yeah, that also works!

Infinite network partitioning
• Is not a practical situation.
• Is not something we plan for.
• Is not something that concerns us as we engineer our systems.

Can we trade “infinite” with “long enough”?
This would break the proof.
If a network partition is capped (pun intended) at t seconds, we can stall any read for t+α seconds, conveniently delaying the response beyond the end of the partition and allowing for consistency.
Remember: there is no cap (pun again) on query response time.
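
A sketch of the stalling trick, in abstract clock ticks. The function name and parameters are hypothetical, and it assumes the partition starts at tick 0 (so the bound t is also the tick at which it ends) and that replication catches up as soon as the partition heals.

```python
def read_with_stall(write_time, partition_end, read_time, alpha=1):
    """Return (response_time, value) for a read on n2 of a value written on n1.

    Assumes the partition starts at tick 0 and ends at `partition_end` (the
    bound t), after which replication catches up immediately.
    """
    # n2 delays its response until after the partition is known to be over.
    response_time = max(read_time, partition_end + alpha)
    # By then, any write completed on n1 before the response has replicated.
    value = "new" if write_time <= response_time else "old"
    return response_time, value


response_time, value = read_with_stall(write_time=2, partition_end=10, read_time=3)
print(response_time, value)  # 11 new
```

The read still completes in finite time, which is all the theorem's Availability asks for, and it reflects the write.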

Can we rewrite CAP Theorem in terms of finite timeouts?
Yes:
• In a system where a network partition can last more than t,
• and where we require queries to respond within time t/2,
• we can illustrate a proof similar to the CAP Theorem’s, where given such a partition we cannot achieve both Availability and Consistency.
• Let’s name it “Time Limited CAP”.
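
One possible reading of why t/2 is the deadline, as a sketch: a write followed by a read, each taking up to t/2, together fit inside a window of t, which is still shorter than the partition. The function and its parameters are hypothetical, and the timeline (partition starts at tick 0, write issued at tick 0, read issued when the write completes) is an assumption.

```python
def time_limited_cap_violation(t, partition_length):
    """True if a write on n1 followed by a read on n2 must both complete
    while the network is still partitioned (assumed timeline, see above)."""
    deadline = t / 2
    write_done = deadline               # latest tick the write may complete
    read_done = write_done + deadline   # latest tick the read may complete
    return read_done < partition_length


print(time_limited_cap_violation(t=10, partition_length=11))  # True
print(time_limited_cap_violation(t=10, partition_length=10))  # False
```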

Questioning “time limited CAP”
Is t a given constraint?
Why would we necessarily require queries to complete in t/2?
Is there system logic or customer logic behind the above, or have we tailored this to fit a theorem we had?

Questioning “time limited CAP”
Opinion: this chase is a fallacy.
We seem to be trying to prove a theorem while we disagree with its terms, namely Availability.

CAP Conjecture
Availability: data is considered highly available if a given consumer of the data can always reach some replica.
Can we work with that?

n == 2?

n > 2
Availability: the consumer always has access to data via at least one replica.
Some node will have our data (we will likely need to detect which one).

CAP is short of CAPacity

CAP is short of CAPacity™

CAP is short of CAPacity™®

CAP is short of CAPacity™® PATENTED

CAP Conjecture, capacity
What if we agreed to quorum?
q-Availability: assuming the number of replicas is n, the consumer always has access to data via at least ⌊n/2 + 1⌋ replicas.
e.g.
- in a 5-replica cluster, quorum would be 3
- in a 7-replica cluster, quorum would be 4
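
The quorum arithmetic, plus the property that makes it useful, can be checked by brute force. A minimal sketch with hypothetical function names: any two quorums of the same cluster share at least one replica, so a reader reaching a quorum always meets some replica that took the latest quorum write.

```python
from itertools import combinations


def quorum(n):
    """Smallest majority of n replicas: floor(n/2 + 1)."""
    return n // 2 + 1


def quorums_always_intersect(n):
    """Brute-force check that every pair of quorum-sized replica sets overlaps."""
    replicas = range(n)
    q = quorum(n)
    return all(
        set(a) & set(b)
        for a in combinations(replicas, q)
        for b in combinations(replicas, q)
    )


print(quorum(5), quorum(7))         # 3 4
print(quorums_always_intersect(5))  # True
```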

Consensus algorithms: Paxos, Raft
Depending on implementation, can guarantee:
• A write is readily Available to read on quorum nodes
• A write is made durable on quorum nodes

Do consensus algorithms contradict CAP?

CAP network partitioning, n > 2
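
With more than two nodes, a partition splits the cluster into groups, and at most one group can still hold a quorum. A small sketch, with a hypothetical helper name and group sizes chosen for illustration:

```python
def sides_with_quorum(n, partition_sizes):
    """For each side of a partition, report whether it still holds a quorum."""
    q = n // 2 + 1
    assert sum(partition_sizes) == n, "sides must account for every node"
    return [size >= q for size in partition_sizes]


print(sides_with_quorum(5, [3, 2]))     # [True, False]
print(sides_with_quorum(5, [2, 2, 1]))  # [False, False, False]
```

In the first split, the 3-node side can keep serving quorum reads and writes while the 2-node side cannot. In the second, no side can.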

Reasonable confidence

CAP: recap
Assuming P, you cannot design a system where you will have both A and C together at all times.
Proven by giving a [single] counterexample.

Absolutism

High Availability
Commonly measured in 9’s.
We agree that absolute availability is unrealistic.
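
For a sense of scale, 9's translate into a yearly downtime budget. A minimal sketch, assuming a 365-day year and reading "k nines" as an availability of 1 - 10^-k; the function name is made up for illustration.

```python
def downtime_minutes_per_year(nines):
    """Allowed downtime per 365-day year at `nines` nines of availability."""
    unavailability = 10 ** (-nines)   # e.g. 3 nines is 99.9%, so 0.001
    minutes_per_year = 365 * 24 * 60  # 525,600
    return unavailability * minutes_per_year


for nines in (2, 3, 4, 5):
    print(nines, round(downtime_minutes_per_year(nines), 1))
```

Three nines leave roughly 526 minutes (under nine hours) a year. Five nines leave a little over five minutes.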

CAP 9’s?
“In practice, many applications are best described in terms of reduced consistency or availability… … there is a Weak CAP Principle which we have yet to characterize precisely…”

CAP is a subset
Availability and Consistency are but two (important) properties of modern distributed systems.
Geo distribution, transactions, latency, durability, consistent distributed snapshots, DR, failovers, backup options, mean time to recover, observability, operability… (some properties correlate)
Which are important to your service?

The tradeoff exists
We understand this as engineers.
Multiple other tradeoffs exist.
I believe CAP is not the model we should be looking for.

Further reading
• Harvest, Yield, and Scalable Tolerant Systems, Armando Fox, Eric A. Brewer
  https://pdfs.semanticscholar.org/5015/8bc1a8a67295ab7bce0550886a9859000dc2.pdf
• Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, Seth Gilbert, Nancy Lynch
  https://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf
• A Critique of the CAP Theorem, Martin Kleppmann
  https://arxiv.org/pdf/1509.05393.pdf
• CAP Twelve Years Later: How the "Rules" Have Changed, Eric Brewer
  https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
• Please stop calling databases CP or AP, Martin Kleppmann
  https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html

Further reading
• NewSQL database systems are failing to guarantee consistency, and I blame Spanner, Daniel Abadi
  http://dbmsmusings.blogspot.com/2018/09/newsql-database-systems-are-failing-to.html
• Problems with CAP, and Yahoo’s little known NoSQL system, Daniel Abadi
  http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
• PACELC theorem, Wikipedia
  https://en.wikipedia.org/wiki/PACELC_theorem
• "A Critique of the CAP Theorem", Julia Evans
  https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/
• CAP Theorem, FoundationDB
  https://apple.github.io/foundationdb/cap-theorem.html

Questions? github.com/shlomi-noach @ShlomiNoach Thank you!
