Put Your Thinking CAP On

Tomer Gabel
November 17, 2015

A talk given at the Vilnius Ruby meetup group in Vilnius, Lithuania; originally developed by Yoav Abrahami, and based on the works of Kyle "Aphyr" Kingsbury:

Consistency, availability and partition tolerance: these seemingly innocuous concepts have been giving engineers and researchers of distributed systems headaches for over 15 years. But despite how important they are to the design and architecture of modern software, they are still poorly understood by many engineers.

This session covers the definition and practical ramifications of the CAP theorem; you may think that this has nothing to do with you because you "don't work on distributed systems", or possibly that it doesn't matter because you "run over a local network." Yet even traditional enterprise CRUD applications must obey the laws of physics, which are exactly what the CAP theorem describes. Know the rules of the game and they'll serve you well, or ignore them at your own peril...

Transcript

  1. Credits
     • Originally a talk by Yoav Abrahami (Wix)
     • Based on “Call Me Maybe” by Kyle “Aphyr” Kingsbury
  2. By Example
     • I want this book!
       – I add it to the cart
       – Then continue browsing
     • There’s only one copy in stock!
  3. By Example
     • I want this book!
       – I add it to the cart
       – Then continue browsing
     • There’s only one copy in stock!
     • … and someone else just bought it.
  4. Consistency: Defined
     • In a consistent system, all participants see the same value at the same time
     • “Do you have this book in stock?”
  5. Consistency: Defined
     • If our book store is an inconsistent system:
       – Two customers may buy the book
       – But there’s only one item in inventory!
     • We’ve just violated a business constraint.
  6. Availability: Defined
     • An available system:
       – Is reachable
       – Responds to requests (within SLA)
     • Availability does not guarantee success!
       – The operation may fail
       – “This book is no longer available”
  7. Availability: Defined
     • What if the system is unavailable?
       – I complete the checkout
       – And click on “Pay”
       – And wait
       – And wait some more
       – And…
     • Did I purchase the book or not?!
  8. Partition Tolerance: Defined
     • Partition: one or more nodes are unreachable
     • No practical system runs on a single node
     • So all systems are susceptible!
  9. “The Network is Reliable”
     • Drops, delays, duplicates and reordering all happen in IP networks
     • To a client, delays and drops are indistinguishable
     • Perfect failure detection is provably impossible¹!
     ¹ “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch and Paterson
  10. Partition Tolerance: Reified
      • External causes:
        – Bad network config
        – Faulty equipment
        – Scheduled maintenance
      • Even software causes partitions:
        – Bad network config
        – GC pauses
        – Overloaded servers
      • Plenty of war stories: Netflix, Twilio, GitHub, Wix :-)
      • Some hard numbers¹:
        – 5.2 failed devices/day
        – 59K lost packets/day
        – Adding redundancy only improves things by 40%
      ¹ “Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications”, Gill et al.
  11. In Pictures
      • Let’s consider a simple system:
        – Service A writes values
        – Service B reads values
        – Values are replicated between nodes
      • These are “ideal” systems: bug-free, predictable
      (Diagram: A writes to Node 1, B reads from Node 2; both nodes hold V0)
  12. In Pictures
      • “Sunny day scenario”:
        – A writes a new value V1
        – The value is replicated to node 2
        – B reads the new value
      (Diagram: V1 propagates from Node 1 to Node 2; B reads V1)
  13. In Pictures
      • What happens if the network drops?
        – A writes a new value V1
        – Replication fails
        – B still sees the old value
        – The system is inconsistent
      (Diagram: Node 1 holds V1, Node 2 still holds V0; B reads V0)
  14. In Pictures
      • A possible mitigation is synchronous replication:
        – A writes a new value V1
        – Cannot replicate, so the write is rejected
        – Both A and B still see V0
        – The system is logically unavailable
      (Diagram: Node 1 rejects V1; both nodes remain at V0)
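To make the trade-off in slides 12–14 concrete, here is a minimal sketch (not part of the original slides; the Node and Partitioned names are invented) that simulates the two-node system: asynchronous replication keeps accepting writes but serves stale reads during a partition, while synchronous replication stays consistent but rejects writes.

```python
# Illustrative sketch only: two replicas, a writer A (Node 1) and a reader B (Node 2).

class Node:
    def __init__(self, value="V0"):
        self.value = value

class Partitioned(Exception):
    pass

def replicate(source, target, partitioned):
    if partitioned:
        raise Partitioned("link between nodes is down")
    target.value = source.value

def write_async(node1, node2, value, partitioned):
    node1.value = value                  # accept the write locally
    try:
        replicate(node1, node2, partitioned)
    except Partitioned:
        pass                             # keep serving: availability over consistency

def write_sync(node1, node2, value, partitioned):
    old = node1.value
    node1.value = value
    try:
        replicate(node1, node2, partitioned)
    except Partitioned:
        node1.value = old                # reject the write: consistency over availability
        raise

if __name__ == "__main__":
    n1, n2 = Node(), Node()
    write_async(n1, n2, "V1", partitioned=True)
    print(n1.value, n2.value)            # V1 V0 -> B reads a stale value (slide 13)

    n1, n2 = Node(), Node()
    try:
        write_sync(n1, n2, "V1", partitioned=True)
    except Partitioned:
        print("write rejected:", n1.value, n2.value)  # V0 V0 -> logically unavailable (slide 14)
```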
  15. The network is not reliable
      • Distributed systems must handle partitions
      • Any modern system runs on more than one node…
      • … and is therefore distributed
      • Ergo, you have to choose:
        – Consistency over availability
        – Availability over consistency
  16. Granularity
      • Real systems comprise many operations
        – “Add book to cart”
        – “Pay for the book”
      • Each has different properties
      • It’s a spectrum, not a binary choice!
      (Diagram: a spectrum from Availability to Consistency, with Shopping Cart near the availability end and Checkout near the consistency end)
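One way to read this slide: the consistency level can be chosen per operation rather than per system. Below is a toy sketch (entirely invented for illustration; Store, add_to_cart and checkout are not from the talk) of a three-replica store where adding to the cart settles for a single ack while checkout demands a majority.

```python
# Toy illustration of per-operation consistency requirements.
import random

class Store:
    def __init__(self, replicas=3):
        self.replicas = replicas

    def write(self, key, value, min_acks):
        # Pretend each replica independently acks with 80% probability.
        acks = sum(1 for _ in range(self.replicas) if random.random() < 0.8)
        if acks < min_acks:
            raise RuntimeError(f"only {acks} acks, needed {min_acks} for {key!r}")
        return acks

store = Store()

def add_to_cart(user, book):
    # Availability end of the spectrum: one ack is good enough.
    store.write(f"cart:{user}", book, min_acks=1)

def checkout(user, book):
    # Consistency end of the spectrum: require a majority of replicas.
    store.write(f"order:{user}", book, min_acks=2)

add_to_cart("alice", "Thinking CAP")
try:
    checkout("alice", "Thinking CAP")
    print("order confirmed")
except RuntimeError as e:
    print("checkout failed, retry:", e)
```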
  17. MongoDB
      • A document-oriented database
      • Availability/scale via replica sets:
        – Clients write to a master node
        – The master replicates writes to n replicas
      • User-selectable consistency guarantees
  18. MongoDB
      • When a partition occurs:
        – If the master is in the minority, it is demoted
        – The majority promotes a new master…
        – … selected by the highest optime
  19. MongoDB
      • The cluster “heals” after the partition is resolved:
        – The “old” master rejoins the cluster
        – Acknowledged minority writes are reverted!
  20. MongoDB
      • Let’s experiment!
      • Set up a 5-node MongoDB cluster
      • 5 clients write to the cluster
      • We then partition the cluster
      • … and restore it to see what happens
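The shape of such an experiment, in the spirit of Aphyr's Jepsen tests, looks roughly like the sketch below (a hedged Python/pymongo approximation, not the actual Jepsen harness; the database and collection names are invented): issue numbered writes while the cluster is partitioned and healed, then compare what was acknowledged with what actually survived.

```python
# Rough sketch of the experiment's structure. Assumes a local replica set
# named rs0 and the pymongo driver.
from pymongo import MongoClient
from pymongo.errors import PyMongoError

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client.jepsen.writes          # invented database/collection names

acknowledged = set()
for i in range(6000):                # writes continue through the partition
    try:
        coll.insert_one({"_id": i})
        acknowledged.add(i)          # the server claimed this write succeeded
    except PyMongoError:
        pass                         # failed or unacknowledged write

# After the partition heals, see which acknowledged writes actually survived.
survivors = {doc["_id"] for doc in coll.find({}, {"_id": 1})}
lost = acknowledged - survivors
print(f"{len(acknowledged)} acked, {len(survivors)} survived, {len(lost)} lost")
```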
  21. MongoDB
      • With write concern unacknowledged:
        – The server does not ack writes (except at the TCP level)
        – The default prior to November 2012
      • Results:
        – 6000 writes
        – 5700 acknowledged
        – 3319 survivors
        – 42% data loss!
  22. MongoDB
      • With write concern acknowledged:
        – The server acknowledges writes (after store)
        – The default guarantee
      • Results:
        – 6000 writes
        – 5900 acknowledged
        – 3692 survivors
        – 37% data loss!
  23. MongoDB
      • With write concern replica acknowledged:
        – The client specifies a minimum number of replicas
        – The server acks after writing to those replicas
      • Results:
        – 6000 writes
        – 5695 acknowledged
        – 3768 survivors
        – 33% data loss!
  24. MongoDB
      • With write concern majority:
        – Requires acks from a majority of the cluster’s n nodes (more than n/2)
        – Also called “quorum”
      • Results:
        – 6000 writes
        – 5700 acknowledged
        – 5701 survivors
        – No data loss
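For reference, the four write concerns in slides 21–24 map onto the w parameter in the pymongo driver; a hedged sketch (database/collection names invented, and w=3 stands in for "replica acknowledged" on a 5-node cluster):

```python
# Hedged sketch: expressing the four write concerns from the slides in pymongo.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.bookstore

unacked  = db.get_collection("books", write_concern=WriteConcern(w=0))           # fire and forget
acked    = db.get_collection("books", write_concern=WriteConcern(w=1))           # master only
replicas = db.get_collection("books", write_concern=WriteConcern(w=3))           # acks from 3 nodes
majority = db.get_collection("books", write_concern=WriteConcern(w="majority"))  # quorum

# Only the quorum write survives minority-master rollbacks in the experiment above.
majority.insert_one({"title": "Put Your Thinking CAP On", "stock": 1})
```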
  25. So what can we do?
      1. Keep calm and carry on
         – As Aphyr puts it, “not all applications need consistency”
         – Have a reliable backup strategy
         – … and make sure you drill restores!
      2. Use write concern majority
         – And take the performance hit
  26. The prime suspects
      • Aphyr’s Jepsen tests include:
        – MySQL
        – PostgreSQL
        – Redis
        – ElasticSearch
        – Cassandra
        – Kafka
        – RabbitMQ
        – … and more
      • If you’re considering them, go read his posts
      • In fact, go read his posts regardless
      http://aphyr.com/tags/jepsen
  27. WE’RE DONE HERE!
      Thank you for listening
      [email protected]
      @tomerg
      http://il.linkedin.com/in/tomergabel
      Aphyr’s “Call Me Maybe” blog posts: http://aphyr.com/tags/jepsen