Put Your Thinking CAP On

Tomer Gabel
November 17, 2015

A talk given at the Vilnius Ruby meetup group in Vilnius, Lithuania; originally developed by Yoav Abrahami, and based on the works of Kyle "Aphyr" Kingsbury:

Consistency, availability and partition tolerance: these seemingly innocuous concepts have been giving engineers and researchers of distributed systems headaches for over 15 years. But despite how important they are to the design and architecture of modern software, they are still poorly understood by many engineers.

This session covers the definition and practical ramifications of the CAP theorem; you may think that this has nothing to do with you because you "don't work on distributed systems", or possibly that it doesn't matter because you "run over a local network." Yet even traditional enterprise CRUD applications must obey the laws of physics, which are exactly what the CAP theorem describes. Know the rules of the game and they'll serve you well, or ignore them at your own peril...

Transcript

  1. Credits
     • Originally a talk by Yoav Abrahami (Wix)
     • Based on “Call Me Maybe” by Kyle “Aphyr” Kingsbury
  2. By Example
     • I want this book!
       – I add it to the cart
       – Then continue browsing
     • There’s only one copy in stock!
  3. By Example
     • I want this book!
       – I add it to the cart
       – Then continue browsing
     • There’s only one copy in stock!
     • … and someone else just bought it.
  4. Consistency: Defined
     • In a consistent system, all participants see the same value at the same time
     • “Do you have this book in stock?”
  5. Consistency: Defined
     • If our book store is an inconsistent system:
       – Two customers may buy the book
       – But there’s only one item in inventory!
     • We’ve just violated a business constraint.
  6. Availability: Defined
     • An available system:
       – Is reachable
       – Responds to requests (within SLA)
     • Availability does not guarantee success!
       – The operation may fail
       – “This book is no longer available”
  7. Availability: Defined
     • What if the system is unavailable?
       – I complete the checkout
       – And click on “Pay”
       – And wait
       – And wait some more
       – And…
     • Did I purchase the book or not?!
  8. Partition Tolerance: Defined
     • Partition: one or more nodes are unreachable
     • No practical system runs on a single node
     • So all systems are susceptible!
  9. “The Network is Reliable”
     • Drops, delays, duplicates and reordering all happen in IP networks
     • To a client, delays and drops are indistinguishable
     • Perfect failure detection is provably impossible¹!
     ¹ “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch and Paterson
  10. Partition Tolerance: Reified
      • External causes:
        – Bad network config
        – Faulty equipment
        – Scheduled maintenance
      • Even software causes partitions:
        – Bad network config
        – GC pauses
        – Overloaded servers
      • Plenty of war stories: Netflix, Twilio, GitHub, Wix :-)
      • Some hard numbers¹:
        – 5.2 failed devices/day
        – 59K lost packets/day
        – Adding redundancy only improves things by 40%
      ¹ “Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications”, Gill et al.
  11. In Pictures
      • Let’s consider a simple system:
        – Service A writes values
        – Service B reads values
        – Values are replicated between nodes
      • These are “ideal” systems: bug-free, predictable
      (Diagram: A writes to Node 1, B reads from Node 2; both nodes hold V0)
  12. In Pictures
      • “Sunny day scenario”:
        – A writes a new value V1
        – The value is replicated to node 2
        – B reads the new value
      (Diagram: V1 propagates from Node 1 to Node 2; B reads V1)
  13. In Pictures
      • What happens if the network drops?
        – A writes a new value V1
        – Replication fails
        – B still sees the old value
        – The system is inconsistent
      (Diagram: Node 1 holds V1, Node 2 still holds V0; B reads V0)
  14. In Pictures
      • A possible mitigation is synchronous replication:
        – A writes a new value V1
        – Cannot replicate, so the write is rejected
        – Both A and B still see V0
        – The system is logically unavailable
      (Diagram: Node 1 rejects V1; both nodes remain at V0)
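To make the trade-off in slides 12–14 concrete, here is a minimal sketch (not part of the original slides; the Node and Partitioned names are invented) that simulates the two-node system: asynchronous replication keeps accepting writes but serves stale reads during a partition, while synchronous replication stays consistent but rejects writes.

```python
# Illustrative sketch only: two replicas, a writer A (Node 1) and a reader B (Node 2).

class Node:
    def __init__(self, value="V0"):
        self.value = value

class Partitioned(Exception):
    pass

def replicate(source, target, partitioned):
    if partitioned:
        raise Partitioned("link between nodes is down")
    target.value = source.value

def write_async(node1, node2, value, partitioned):
    node1.value = value                  # accept the write locally
    try:
        replicate(node1, node2, partitioned)
    except Partitioned:
        pass                             # keep serving: availability over consistency

def write_sync(node1, node2, value, partitioned):
    old = node1.value
    node1.value = value
    try:
        replicate(node1, node2, partitioned)
    except Partitioned:
        node1.value = old                # reject the write: consistency over availability
        raise

if __name__ == "__main__":
    n1, n2 = Node(), Node()
    write_async(n1, n2, "V1", partitioned=True)
    print(n1.value, n2.value)            # V1 V0 -> B reads a stale value (slide 13)

    n1, n2 = Node(), Node()
    try:
        write_sync(n1, n2, "V1", partitioned=True)
    except Partitioned:
        print("write rejected:", n1.value, n2.value)  # V0 V0 -> logically unavailable (slide 14)
```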
  15. The network is not reliable
      • Distributed systems must handle partitions
      • Any modern system runs on more than one node…
      • … and is therefore distributed
      • Ergo, you have to choose:
        – Consistency over availability
        – Availability over consistency
  16. Granularity
      • Real systems comprise many operations
        – “Add book to cart”
        – “Pay for the book”
      • Each has different properties
      • It’s a spectrum, not a binary choice!
      (Diagram: a spectrum from Availability to Consistency, with Shopping Cart near the availability end and Checkout near the consistency end)
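One way to read this slide: the consistency level can be chosen per operation rather than per system. Below is a toy sketch (entirely invented for illustration; Store, add_to_cart and checkout are not from the talk) of a three-replica store where adding to the cart settles for a single ack while checkout demands a majority.

```python
# Toy illustration of per-operation consistency requirements.
import random

class Store:
    def __init__(self, replicas=3):
        self.replicas = replicas

    def write(self, key, value, min_acks):
        # Pretend each replica independently acks with 80% probability.
        acks = sum(1 for _ in range(self.replicas) if random.random() < 0.8)
        if acks < min_acks:
            raise RuntimeError(f"only {acks} acks, needed {min_acks} for {key!r}")
        return acks

store = Store()

def add_to_cart(user, book):
    # Availability end of the spectrum: one ack is good enough.
    store.write(f"cart:{user}", book, min_acks=1)

def checkout(user, book):
    # Consistency end of the spectrum: require a majority of replicas.
    store.write(f"order:{user}", book, min_acks=2)

add_to_cart("alice", "Thinking CAP")
try:
    checkout("alice", "Thinking CAP")
    print("order confirmed")
except RuntimeError as e:
    print("checkout failed, retry:", e)
```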
  17. MongoDB
      • A document-oriented database
      • Availability/scale via replica sets:
        – Clients write to a master node
        – The master replicates writes to n replicas
      • User-selectable consistency guarantees
  18. MongoDB
      • When a partition occurs:
        – If the master is in the minority, it is demoted
        – The majority promotes a new master…
        – … selected by the highest optime
  19. MongoDB
      • The cluster “heals” after the partition is resolved:
        – The “old” master rejoins the cluster
        – Acknowledged minority writes are reverted!
  20. MongoDB
      • Let’s experiment!
      • Set up a 5-node MongoDB cluster
      • 5 clients write to the cluster
      • We then partition the cluster
      • … and restore it to see what happens
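The shape of such an experiment, in the spirit of Aphyr's Jepsen tests, looks roughly like the sketch below (a hedged Python/pymongo approximation, not the actual Jepsen harness; the database and collection names are invented): issue numbered writes while the cluster is partitioned and healed, then compare what was acknowledged with what actually survived.

```python
# Rough sketch of the experiment's structure. Assumes a local replica set
# named rs0 and the pymongo driver.
from pymongo import MongoClient
from pymongo.errors import PyMongoError

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client.jepsen.writes          # invented database/collection names

acknowledged = set()
for i in range(6000):                # writes continue through the partition
    try:
        coll.insert_one({"_id": i})
        acknowledged.add(i)          # the server claimed this write succeeded
    except PyMongoError:
        pass                         # failed or unacknowledged write

# After the partition heals, see which acknowledged writes actually survived.
survivors = {doc["_id"] for doc in coll.find({}, {"_id": 1})}
lost = acknowledged - survivors
print(f"{len(acknowledged)} acked, {len(survivors)} survived, {len(lost)} lost")
```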
  21. MongoDB
      • With write concern unacknowledged:
        – The server does not ack writes (except at the TCP level)
        – The default prior to November 2012
      • Results:
        – 6000 writes
        – 5700 acknowledged
        – 3319 survivors
        – 42% data loss!
  22. MongoDB
      • With write concern acknowledged:
        – The server acknowledges writes (after store)
        – The default guarantee
      • Results:
        – 6000 writes
        – 5900 acknowledged
        – 3692 survivors
        – 37% data loss!
  23. MongoDB
      • With write concern replica acknowledged:
        – The client specifies a minimum number of replicas
        – The server acks after writing to those replicas
      • Results:
        – 6000 writes
        – 5695 acknowledged
        – 3768 survivors
        – 33% data loss!
  24. MongoDB
      • With write concern majority:
        – Requires acks from a majority of the cluster’s n nodes (more than n/2)
        – Also called “quorum”
      • Results:
        – 6000 writes
        – 5700 acknowledged
        – 5701 survivors
        – No data loss
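For reference, the four write concerns in slides 21–24 map onto the w parameter in the pymongo driver; a hedged sketch (database/collection names invented, and w=3 stands in for "replica acknowledged" on a 5-node cluster):

```python
# Hedged sketch: expressing the four write concerns from the slides in pymongo.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.bookstore

unacked  = db.get_collection("books", write_concern=WriteConcern(w=0))           # fire and forget
acked    = db.get_collection("books", write_concern=WriteConcern(w=1))           # master only
replicas = db.get_collection("books", write_concern=WriteConcern(w=3))           # acks from 3 nodes
majority = db.get_collection("books", write_concern=WriteConcern(w="majority"))  # quorum

# Only the quorum write survives minority-master rollbacks in the experiment above.
majority.insert_one({"title": "Put Your Thinking CAP On", "stock": 1})
```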
  25. So what can we do?
      1. Keep calm and carry on
         – As Aphyr puts it, “not all applications need consistency”
         – Have a reliable backup strategy
         – … and make sure you drill restores!
      2. Use write concern majority
         – And take the performance hit
  26. The prime suspects
      • Aphyr’s Jepsen tests include:
        – MySQL
        – PostgreSQL
        – Redis
        – ElasticSearch
        – Cassandra
        – Kafka
        – RabbitMQ
        – … and more
      • If you’re considering them, go read his posts
      • In fact, go read his posts regardless
      http://aphyr.com/tags/jepsen
  27. WE’RE DONE HERE!
      Thank you for listening
      [email protected]
      @tomerg
      http://il.linkedin.com/in/tomergabel
      Aphyr’s “Call Me Maybe” blog posts: http://aphyr.com/tags/jepsen