Fear No More: Embrace Eventual Consistency

A number of years ago, Eric Brewer, father of the CAP theorem, coined the term "BASE" for an architectural style of loosely coupled distributed systems: "Basically Available, Soft-state, and Eventually-consistent". He clearly meant it as a counterpoint to the "ACID" properties of traditional database systems. BASE systems choose to remain available for operations at the cost of strict synchronization. While developers are very comfortable with the convenience of ACID, eventual consistency can be frightening, unfamiliar territory.

This talk will dive into the design of eventually consistent systems, touching on theory and practice. We'll see why eventual consistency (EC) doesn't mean "inconsistent" but is actually a different kind of consistency, with different tradeoffs. These new skills should help developers know when to embrace eventually consistent solutions instead of fearing them.

This talk was given at QCon San Francisco by Sean Cribbs.

Basho Technologies

November 08, 2012

Transcript

  1. Fear No More: Embrace Eventual Consistency Sean Cribbs @seancribbs

  2. Distributed Systems Experts

  3. None
  4. FEAR Photo from Wikimedia

  5. ACID vs. BASE

  6. ACID vs. BASE
    ACID: • Strong consistency • Isolation • Focus on “commit” • Nested transactions • Conservative (pessimistic)
    BASE: • Weak consistency • Availability first • Best effort • Approximate answer • Aggressive (optimistic)
    Fox, Gribble, Chawathe, Brewer, Gauthier - Cluster-Based Scalable Network Services (SOSP97)
  7. ACID vs. BASE ACID: “Inconsistency is the worst thing that could happen.” BASE: “Being unavailable is the worst thing that could happen.”
  8. Why BASE / EC?

  9. Why BASE / EC? • “Omniscience” is expensive and slow.

  10. Why BASE / EC? • “Omniscience” is expensive and slow. • Availability is often correlated to revenue.
  11. Why BASE / EC? • “Omniscience” is expensive and slow. • Availability is often correlated to revenue. • Failures happen all the time.
  12. Why BASE / EC? • “Omniscience” is expensive and slow. • Availability is often correlated to revenue. • Failures happen all the time. “Any sufficiently large system is in a constant state of partial failure.” Justin Sheehy, Basho CTO
  13. Why BASE / EC? • “Omniscience” is expensive and slow. • Availability is often correlated to revenue. • Failures happen all the time. • You’re probably doing it already.
  14. Safety & Liveness Leslie Lamport 1977

  15. Safety

  16. Safety •“Bad things don’t happen” •Point-in-time identifiable

  17. Safety •“Bad things don’t happen” •Point-in-time identifiable •mutual exclusion •partial correctness •first-come, first-served
  18. Liveness

  19. Liveness •“Good things eventually happen” •Always in future

  20. Liveness •“Good things eventually happen” •Always in future •starvation freedom •termination •guaranteed service
  21. ACID vs. BASE
    ACID: • Strong consistency • Isolation • Focus on “commit” • Nested transactions • Conservative (pessimistic)
    BASE: • Weak consistency • Availability first • Best effort • Approximate answer • Aggressive (optimistic)
    Fox, Gribble, Chawathe, Brewer, Gauthier - Cluster-Based Scalable Network Services (SOSP97)
  22. Eventual consistency is not safe “...it’s easy to satisfy liveness without being useful... If all replicas return the value 42 in response to every request, the system is eventually consistent.” Peter Bailis, http://www.bailis.org/blog/safety-and-liveness-eventual-consistency-is-not-safe/
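
The point of the quote on slide 22 can be made concrete with a deliberately useless replica. This is only an illustrative sketch: the class `Constant42Replica` and the helper `converged` are hypothetical names invented here, not anything from the talk or from Bailis's post.

```python
class Constant42Replica:
    """A replica that ignores every write and answers 42 to every read."""

    def write(self, key, value):
        # The write is silently discarded. A durability guarantee
        # (a safety property) is exactly what would forbid this.
        pass

    def read(self, key):
        return 42


def converged(replicas, key):
    """True when every replica returns the same value for key."""
    return len({replica.read(key) for replica in replicas}) == 1


replicas = [Constant42Replica() for _ in range(3)]
replicas[0].write("cart", 7)

# All replicas agree, so the liveness property "replicas eventually
# converge" holds, yet the value that was written is gone.
print(converged(replicas, "cart"), replicas[0].read("cart"))
```

The replicas always agree, which is why convergence alone is not enough and the safety properties on the later slides matter.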
  23. Liveness of BASE • Convergence - “eventual delivery” • Responsiveness - “eventual service” • Resilience - “eventual recovery” • Consensus-free - “eventual progress”
  24. Safety of BASE • Durability - “accepted writes are not lost” • Integrity - “data is not corrupted” • Authenticity - “data is not forged”
  25. Real BASE Systems Photo from Wikimedia

  26. Domain Name Service • Federated, hierarchical database • How qconsf.com becomes 77.66.16.106 • Layered system with caching
  27. Domain Name Service • Federated, hierarchical database • How qconsf.com becomes 77.66.16.106 • Layered system with caching Diagrams from Wikimedia
  28. Domain Name Service • Federated, hierarchical database • How qconsf.com becomes 77.66.16.106 • Layered system with caching Diagrams from Wikimedia
  29. DNS Liveness • Convergence - caches eventually expire • Consensus-free - local authority over subtree updates • Responsiveness - intermediaries can cache results and reply quicker • Resilience - authority servers can be replicated/load-balanced
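
The "caches eventually expire" bullet on slide 29 can be sketched as a small time-to-live (TTL) cache sitting in front of an authoritative source. This is a toy model under stated assumptions, not a DNS implementation: `TTLCache`, `resolve`, the dictionary standing in for the authority, and the 300-second TTL are all hypothetical.

```python
import time


class TTLCache:
    """Toy resolver cache: each entry is dropped once its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, so expiry can be simulated
        self._entries = {}       # name -> (value, expires_at)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: forget the entry so the next lookup goes upstream.
            del self._entries[name]
            return None
        return value


def resolve(name, cache, authority, ttl=300):
    """Answer from the cache when possible, otherwise ask the authority."""
    cached = cache.get(name)
    if cached is not None:
        return cached            # may be stale, but only until the TTL runs out
    value = authority[name]      # the authority is a plain dict in this sketch
    cache.put(name, value, ttl)
    return value
```

A cached answer can lag behind the authority, but never by more than the TTL, which is the sense in which the system converges without any coordination between the authority and its caches.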
  30. DNS Safety • Authenticity - forgery prevented by DNSSEC

  31. BitTorrent • Peer-to-peer cooperative large-file transfer • Dynamic membership and block discovery through the “tracker” node • Epidemic effect http://computer.howstuffworks.com/bittorrent2.htm
  32. BitTorrent Liveness • Convergence - all peers that remain connected eventually become seeds • Resilience - loss of one peer doesn’t impede progress • Responsiveness - closer, faster peers tend to be preferred
  33. BitTorrent Safety • Integrity - each block is checksummed to prevent corruption
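
The integrity bullet on slide 33 can be illustrated with per-piece hashing. In the original BitTorrent protocol the torrent metadata carries a SHA-1 hash for each fixed-size piece; the sketch below mimics that idea with Python's `hashlib`. The function names `piece_hashes` and `verify_piece` and the tiny piece length are assumptions made for illustration only.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Split data into fixed-size pieces and return one SHA-1 digest per piece."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(index, piece, expected_hashes):
    """Accept a downloaded piece only if its digest matches the published one."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


original = b"abcdefghij"
hashes = piece_hashes(original, piece_length=4)   # pieces: abcd, efgh, ij

print(verify_piece(0, b"abcd", hashes))   # intact piece from some peer
print(verify_piece(1, b"efgX", hashes))   # corrupted piece, to be re-fetched
```

Because every piece is checked independently, a peer can accept data from untrusted sources in any order and simply discard and re-request whatever fails verification.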
  34. The Web • Sparsely-connected graph of hypertext documents identified by URIs • Rich caching semantics: expiration, validation, control • Fluid evolution through uniform interface • Layered system (federated)
  35. Web: Liveness • Consensus-free - local documents can be changed, moved, removed without coordination • Convergence - caching semantics prevent unbounded staleness, redirection • Responsiveness - many parties can proxy, cache • Resilience - failure of one server doesn’t stop the system
  36. Web: Safety • Privacy & Authenticity - HTTPS/SSL/TLS • Integrity - POST responses don’t pollute caches
  37. Dynamo • Key-value store: distributed, replicated, partitioned • Client requests can go to any node • Low-latency at high percentiles • Many clones: Riak, Cassandra, Voldemort
  38. Dynamo: Liveness • Convergence - read-repair, hash-tree exchanges, vector-clocks • Resilience - hinted-handoff, sloppy quorums • Responsiveness - replication • Consensus-free - loose coordination, concurrent updates
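
Slide 38 lists vector clocks among the convergence mechanisms. A minimal sketch of how they tell "newer" from "concurrent" follows, with clocks modeled as plain dictionaries from node name to counter. The function names are hypothetical, and this is not the API of Riak or any other Dynamo-style store.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify clock a relative to clock b."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"      # neither saw the other: a conflict to resolve


def merge(a, b):
    """Pointwise maximum: the clock of a value that reconciles a and b."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


base = increment({}, "node_a")          # first write, coordinated by node_a
left = increment(base, "node_b")        # update accepted by node_b
right = increment(base, "node_c")       # concurrent update accepted by node_c

print(compare(left, base))              # left supersedes base
print(compare(left, right))             # siblings: neither supersedes the other
print(compare(merge(left, right), left))
```

During read-repair, a replica holding a value whose clock is "before" another can simply be overwritten, while "concurrent" values have to be kept as siblings or merged by the application.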
  39. Dynamo: Safety • Authenticity - won’t serve data you didn’t store • Durability - confirmed writes are not lost
  40. ACID vs. BASE

  41. Photo by Associated Press CONFLICT

  42. Photo by Associated Press SPECTRUM

  43. Embrace Eventual Consistency hugs!