MySQL and the CAP theorem: relevance & misconceptions

168ccec72eee0530b818d44f3fedaacf?s=47 Shlomi Noach
February 02, 2019

MySQL and the CAP theorem: relevance & misconceptions

The CAP theorem is often used to describe the tradeoffs of available versus consistent systems. But the CAP theorem is often misunderstood and misrepresented. In this session we investigate the CAP properties of various MySQL replication setups, and show that it is impossible to claim a setup is AP or that it is CP. We will see how the properties can be non binary, and change across time, geography and workload. We will also observe where CAP's terms can be made irrelevant. This session takes the practical engineer's perspective and shows how we can gain the upper hand with reasonable tradeoffs.

168ccec72eee0530b818d44f3fedaacf?s=128

Shlomi Noach

February 02, 2019
Tweet

Transcript

  1. MySQL and the CAP theorem: relevance & misconceptions Dissecting, affirming

    and refuting CAP assumptions in real production systems. Shlomi Noach GitHub FOSDEM 2019
  2. About me @github/database-infrastructure Author of orchestrator, gh-ost, freno, ccql and

    others. Blog at http://openark.org 
 github.com/shlomi-noach
 @ShlomiNoach
  3. GitHub
 Built for developers Largest open source hosting 100M+ repositories,

    31M+ developers, 
 2.1M+ organizations Supplier of octocat T-Shirts and stickers
  4. Incentive

  5. Incentive People frequently cite CAP theorem by way of: •

    Modelling a system (as AP/CP). • Projecting that it is impossible to achieve some desired setup.
  6. CAP Conjecture Suggested by Eric Brewer, 1998-1999 Terms: • [atomic]

    Consistency • [high] Availability • Partition tolerance
  7. CAP Theorem Mathematically proved (hence “theorem”) by Seth Gilbert &

    Nancy Lynch, MIT. Using different terms than those coined by Brewer. In Math, a theorem only holds as long as its terms & conditions are met.
  8. CAP Theorem Terms, according to Gilbert & Lynch: - Consistency

    - Availability - Partition tolerance
  9. CAP Theorem: Consistency Once a write is successful on a

    node, any read on any node must reflect that write or any later write. aka Atomic Consistency, aka Linear consistency, aka Linearizability.
  10. CAP Theorem: Availability Every non-crashed node must respond to requests

    in a finite amount of time. There is no constraint on the actual amount of time. Though it is not specified, it is implied that response must be valid, non-error. Contrast with Brewer’s definiton of High availability: consumer always has access to data via at least one replica.
  11. CAP Theorem: Partition Tolerance The system is able to operate

    on network partitioning. Partition tolerance is considered as a given condition, since network partitioning can and does take place regardless of a system’s design.
  12. CAP Theorem A distributed data store [web service] cannot provide

    more than two out of the three properties. Better illustrated as: • If the network is good, you may achieve both Availibility and Consistency. • If the network is partitioned, you must choose between Availability and Consistency.
  13. Proof of CAP theorem Simplified, not by much. Given two

    nodes replicating from each other ! ! n1 n2
  14. Proof of CAP theorem We partition the network between the

    two nodes to an infinite amount of time. ! ! n1 n2
  15. Proof of CAP theorem We write data to one node.

    If the system is Available, the write compeltes in a finite amount of time. ! ! n1 n2
  16. Proof of CAP theorem We read data from the other

    node. If the system is Available that read completes in a finite mount of time. ! ! n1 n2
  17. Proof of CAP theorem During that finite time the network

    was partitioned, hence our read could not reflect changes made by the write. QED ! ! n1 n2
  18. CAP is mathematical As engineers we have different production objectives.

  19. CAP is binary The theorem has nothing to say of

    99.95% Availability with 99.99% Consistency Tradeoffs and partial availability & consistency are discussed in both Brewer’s Conjecture paper and Gilbert & Lynch’s Theorem paper, but have no part in the proof.
  20. CAP does not discuss Disaster Recovery Failover Latency (see PACELC)

    Durability, node recovery SLA, uptime, nines Scalability, capacity
  21. CAP & MySQL in production

  22. Standard MySQL replication Single master, multiple replicas, asynchronous replication !

    ! ! ! ! !
  23. Standard MySQL replication:
 CAP properties Not Consistent: asynchronous replication, replication

    lag. Not Available: • Replica nodes are unable to serve writes (return with error). • If master crashes, no server to be able to take writes. • CAP does not discuss failovers/SLA. ! ! ! ! ! !
  24. Standard MySQL replication:
 consistency in practice Read from master? Scalability/capacity

    ! ! ! ! ! !
  25. Standard MySQL replication:
 consistency in practice ProxySQL GTID casual reads:

    • Applies to same connection only. • If master within hostgroup, can failback to master • Not CAP Consistent. A proxy isn’t a known component in CAP Theorem. 
 https://proxysql.com/blog/proxysql-gtid-causal-reads ! ! ! ! ! ! "
  26. Standard MySQL replication:
 consistency in practice ! ! ! !

    ! ! freno to monitor replication lag: • Suggests when replicas are up to date • HAProxy routes queries to replicas, offloading traffic from master • Not CAP Consistent. Not part of CAP. 
 https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/ "
  27. Semi-sync replication Single master, multiple replicas, semi-sync to all/subset of

    replicas. ! ! ! ! ! !
  28. Semi-sync replication:
 CAP properties Still not Consistent. • Though we

    get durability. Still not Available. • semi-sync write Availability depends on configuration; on identities of semi-sync replicas. ! ! ! ! ! !
  29. Hybrid cluster Is it CA? Is it CP? Depends on

    geographic location, time, workload. ! ! ! ! ! !
  30. Vitess SQL layes: vttablet, proxy to backend MySQL standard replication

    backend shards Suggested configuration: • semi-sync • only send traffic to master " " " ! ! ! ! ! !
  31. Vitess: CAP properties Given suggested configuration: • Not Consistent •

    Not Available " " " ! ! ! ! ! !
  32. But! With failover, we can have the system highly available.

    With reads-on-master or another proxy solution, we can make the system highly consistent. " " " ! ! ! ! ! !
  33. CAP != E=MC²

  34. InnoDB Cluster, single writer Single writer node, paxos consensus !

    ! ! < 8.0.14
  35. InnoDB Cluster, single writer: CAP properties Not Available: if network

    partitioned severely, quorum can be lost and no leader to accept writes. Not Consistent; changes are: • propagated to majority of nodes, not all. • made durable, not applied. ! ! ! < 8.0.14
  36. InnoDB Cluster, single writer: availability, consistency in practice MySQL Router

    as Proxy, route traffic to leader. ProxySQL for per-connection consistent reads. ! ! ! " < 8.0.14
  37. InnoDB Cluster, multi writer Multi writer nodes, paxos consensus !

    ! ! < 8.0.14
  38. InnoDB Cluster, multi writer: CAP properties Not Available, as before.

    • Debatable: are write conflicts compatible with Availability? Not consistent, as before. ! ! ! < 8.0.14
  39. Galera: CAP properties wsrep_sync_wait=0 • Not Available, not Consistent wsrep_sync_wait

    != 0 (e.g. =3) • Not Available • Consistent up to timeout on isolated nodes: i.e. not Consistent. ! ! !
  40. InnoDB Cluster, >= 8.0.14 group_replication_consistency= BEFORE_AND_AFTER Supports consistent reads, up

    to timeout on isolated node. Not Available, not Consistent. STONITH for isolated nodes. ! ! !
  41. Is there any AP setup?

  42. AP setups Disjoint masters ! ! ! ! ! !

    ! ! ! ! ! Writable master-master, async replication Writable circular [n > 2] master topology, async replication
  43. Takeaways Math and Engineering are not required to agree. Practical

    engneering makes tradeoffs. CAP Theorem does not represent complex production systems, nor does it meet production requirements. CAP Theorem may not even apply to your system. It might not be the model we should be looking at. Even as CAP says you have to choose A over C in case of P, it does not mean you cannot achieve high availability and high consistency.
  44. PACELC An extention to CAP: - if Partitioned, trade Availability

    for Consistency. - Else, trade Latency for Consistency More realistic, still does not meet production needs.
  45. Harvest, Yield, and Scalable Tolerant Systems, Armando Fox, Eric A.

    Brewer
 https://pdfs.semanticscholar.org/5015/8bc1a8a67295ab7bce0550886a9859000dc2.pdf Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, Seth Gilbert, Nancy Lynch
 https://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf A Critique of the CAP Theorem, Martin Kleppmann
 https://arxiv.org/pdf/1509.05393.pdf CAP Twelve Years Later: How the "Rules" Have Changed, Eric Brewer
 https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed Please stop calling databases CP or AP, Martin Kleppmann
 https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html Further reading
  46. NewSQL database systems are failing to guarantee consistency, and I

    blame Spanner, Daniel Abadi
 http://dbmsmusings.blogspot.com/2018/09/newsql-database-systems-are-failing-to.html Problems with CAP, and Yahoo’s little known NoSQL system, Daniel Abadi
 http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html PACELC theorem, Wikipedia
 https://en.wikipedia.org/wiki/PACELC_theorem Further reading
  47. CAP theorem is mostly misunderstood, Sugu Sougoumarane
 http://ssougou.blogspot.com/2017/09/cap-theorem-is-mostly-misunderstood.html "A Critique

    of the CAP Theorem”, Julia Evans
 https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/ CAP Theorem, FoundationDB
 https://apple.github.io/foundationdb/cap-theorem.html Further reading
  48. Questions? github.com/shlomi-noach @ShlomiNoach Thank you!