Everything is Okay: Database Edition

Dylan Vassallo

September 03, 2014

Transcript

  1. "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport, 1987)
  2. The network is reliable. Latency is zero. Bandwidth is infinite. The network is secure. Topology doesn't change. There is one administrator. Transport cost is zero. The network is homogeneous. Everything is awesome.
  4. Consistency: all nodes see the same data at the same time. Availability: every request receives a response. Partition tolerance: the system keeps operating despite arbitrary message loss between nodes.
  7. A typical first year for a new Google cluster: 40-80 machines with 50% packet loss, dozens of DNS blips, 12 router restarts, 8 network maintenance events, 3 router failures, 1 network rewiring... and that's just LAN incidents.
  8. "Despite your best efforts, your system will experience enough faults that it will have to make a choice between reducing yield (i.e., stop answering requests) and reducing harvest (i.e., giving answers based on incomplete data). This decision should be based on business requirements."
  9. CP: When faced with a network partition, stop answering requests. AP: When faced with a network partition, answer requests using incomplete data.
  10. "There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system. In order to build a real-world system, an expert needs to use numerous ideas scattered in the literature and make several relatively small protocol extensions. The cumulative effort will be substantial and the final system will be based on an unproven protocol."
  11. "The fault-tolerance computing community has not paid enough attention to testing, a key ingredient for building fault-tolerant systems."
  12. "While many systems use Paxos solely for locking, master election, or replication of metadata and configurations, we believe that Megastore is the largest system deployed that uses Paxos to replicate primary user data across datacenters on every write."
  13. "The original Paxos algorithm is ill-suited for high-latency network links because it demands multiple rounds of communication."
  14. "Megastore is perhaps the first large-scale storage system to implement Paxos-based replication across datacenters while satisfying the scalability and performance requirements of scalable web applications in the cloud."
  15. MongoDB (CP and AP???): "MongoDB is neither AP nor CP. The defaults can cause significant loss of acknowledged writes. The strongest consistency offered has bugs which cause false acknowledgements..."
  16. Kafka (CA???): "Kafka’s replication claimed to be CA, but in the presence of a partition, threw away an arbitrarily large volume of committed writes."
  17. RabbitMQ (CP or AP): "...in the presence of partitions, RabbitMQ clustering will not only deliver duplicate messages, but will also drop huge volumes of acknowledged messages on the floor."
  18. Elasticsearch (CP): "Elasticsearch appears to lose writes...during asymmetric partitions, symmetric partitions, overlapping partitions, disjoint partitions, and even partitions which only isolate a single node once. Its convergence times are slow and the cluster can repeatably deadlock, forcing an administrator to intervene before recovery...If you are an Elasticsearch user (as I am): good luck."
  19. Join a monastery. Pay Google. Hope your site never gets popular. Choose CP or AP, read aphyr.com, and hope for the best.
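
Slide 2's first fallacy ("the network is reliable") and slide 11's point about testing can be made concrete with a small fault-injection harness. The sketch below is hypothetical: `LossyLink`, `send_with_retries`, and the drop rate are illustrative assumptions, not any real library. It drops requests and acknowledgements at random using a seeded generator, so a failing run can be replayed exactly. It also shows one way duplicate deliveries, like those mentioned on slide 17, can arise: a lost acknowledgement followed by a naive retry.

```python
import random
from typing import List


class LossyLink:
    """Hypothetical in-process link that loses requests and acknowledgements.

    Each send can fail in two places: the request is dropped before delivery,
    or it is delivered but the acknowledgement is dropped on the way back.
    """

    def __init__(self, drop_rate: float, seed: int) -> None:
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)  # seeded so a failing run can be replayed
        self.delivered: List[str] = []  # what the receiver actually saw

    def send(self, msg: str) -> bool:
        """Return True only if the sender receives an acknowledgement."""
        if self.rng.random() < self.drop_rate:
            return False  # request lost: the receiver never sees msg
        self.delivered.append(msg)
        if self.rng.random() < self.drop_rate:
            return False  # ack lost: the receiver has msg, the sender can't tell
        return True


def send_with_retries(link: LossyLink, msg: str, max_attempts: int) -> bool:
    """Naive at-least-once delivery: resend until acknowledged or out of attempts."""
    for _ in range(max_attempts):
        if link.send(msg):
            return True
    return False


if __name__ == "__main__":
    link = LossyLink(drop_rate=0.3, seed=42)
    acked = send_with_retries(link, "m1", max_attempts=5)
    # A lost ack followed by a retry shows up as a duplicate in link.delivered.
    print(acked, link.delivered)
```

Retrying without idempotent receivers trades lost messages for duplicates; the seed makes whichever outcome a test hits reproducible.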
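
Slides 4 and 9 can be illustrated with a toy five-node cluster that a partition splits 3/2. In this sketch the `has_quorum` and `write` helpers and the node names are hypothetical. A CP policy accepts a write only on a side that can reach a strict majority, so the minority side stops answering. An AP policy accepts writes on both sides, so the two sides end up holding different data. Majority quorums are one common way to get CP behaviour, not the only one, and this is not how any particular database on these slides is implemented.

```python
from typing import Dict, List, Set


def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if a side of a partition can reach a strict majority of nodes."""
    return reachable > cluster_size // 2


def write(side: Set[str], cluster: List[str], store: Dict[str, Dict[str, str]],
          key: str, value: str, mode: str) -> bool:
    """Apply a write on one side of a partition; return True if it was accepted."""
    if mode == "CP" and not has_quorum(len(side), len(cluster)):
        return False  # minority side refuses: lower availability, no divergence
    for node in side:
        store[node][key] = value  # only nodes on this side see the write
    return True


if __name__ == "__main__":
    cluster = ["a", "b", "c", "d", "e"]
    left, right = {"a", "b", "c"}, {"d", "e"}
    for mode in ("CP", "AP"):
        store: Dict[str, Dict[str, str]] = {node: {} for node in cluster}
        ok_left = write(left, cluster, store, "x", "1", mode)
        ok_right = write(right, cluster, store, "x", "2", mode)
        # CP prints: CP True False 1 None   (minority refused the write)
        # AP prints: AP True True 1 2       (both sides accepted; data diverged)
        print(mode, ok_left, ok_right, store["a"].get("x"), store["d"].get("x"))
```

With an even cluster split down the middle (2 of 4), `has_quorum` is false on both sides, so a CP system of this shape refuses writes everywhere until the partition heals.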
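
The harvest/yield choice on slide 8 can be expressed as two ratios: yield is the fraction of requests that get an answer, and harvest is the fraction of the data reflected in an answer. The sketch below assumes a hypothetical service whose every query fans out to all shards; `during_partition`, the shard counts, and the all-or-nothing CP policy are illustrative assumptions.

```python
from typing import Optional, Tuple


def yield_ratio(answered: int, offered: int) -> float:
    """Fraction of offered requests that received an answer."""
    return answered / offered if offered else 1.0


def harvest_ratio(shards_answered: int, shards_total: int) -> float:
    """Fraction of the full data set reflected in a single answer."""
    return shards_answered / shards_total if shards_total else 1.0


def during_partition(mode: str, requests: int, shards_total: int,
                     shards_down: int) -> Tuple[float, Optional[float]]:
    """Return (yield, harvest) for a hypothetical query that fans out to every shard.

    "CP" refuses every request while any shard is unreachable (yield drops).
    "AP" answers every request from the reachable shards (harvest drops).
    Harvest is None when no request was answered at all.
    """
    if shards_down == 0:
        return yield_ratio(requests, requests), 1.0
    if mode == "CP":
        return yield_ratio(0, requests), None
    reachable = shards_total - shards_down
    return yield_ratio(requests, requests), harvest_ratio(reachable, shards_total)


if __name__ == "__main__":
    # Hypothetical: 1,000 queries over a 10-shard index, 2 shards unreachable.
    print(during_partition("AP", 1000, 10, 2))  # (1.0, 0.8)
    print(during_partition("CP", 1000, 10, 2))  # (0.0, None)
```

In this scenario the AP policy keeps yield at 1.0 but cuts harvest to 0.8, while the CP policy keeps every answer complete by answering nothing; which loss is acceptable is the business decision the slide refers to.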
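
Slide 13's point about "multiple rounds of communication" becomes clearer with back-of-the-envelope arithmetic. Basic Paxos needs two round trips to a majority of acceptors per decided value (prepare/promise, then accept/accepted); a stable leader that has already completed the prepare phase, as in Multi-Paxos, can commit with one. The RTT figures below are made up, the helper names are hypothetical, and the model counts only network delay. It is a sketch of the general technique, not Megastore's actual write path.

```python
from typing import Sequence


def quorum_rtt_ms(rtts_ms: Sequence[float]) -> float:
    """Time until a majority of acceptors has replied to one broadcast.

    `rtts_ms` is the proposer's round-trip time to every acceptor, with 0.0
    for itself if the proposer is also an acceptor.
    """
    majority = len(rtts_ms) // 2 + 1
    return sorted(rtts_ms)[majority - 1]


def commit_latency_ms(rtts_ms: Sequence[float], rounds: int) -> float:
    """Network-only estimate: ignores disk writes, processing, retries, and contention."""
    return rounds * quorum_rtt_ms(rtts_ms)


if __name__ == "__main__":
    # Hypothetical RTTs from a proposer to five replicas in different datacenters.
    rtts = [0.0, 20.0, 45.0, 90.0, 150.0]
    # Basic Paxos: prepare/promise, then accept/accepted.
    print(commit_latency_ms(rtts, rounds=2))  # 90.0
    # Stable leader that has already run the prepare phase: accept/accepted only.
    print(commit_latency_ms(rtts, rounds=1))  # 45.0
```

Because a commit waits for the majority-th fastest reply, cross-datacenter latency multiplies with every extra round, which is why removing a round matters so much on high-latency links.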
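
The failures quoted on slides 15-18 come down to one check: did every write the system acknowledged survive the partition? Below is a minimal sketch of that bookkeeping, in the spirit of the tests published on aphyr.com but not Jepsen's actual code. It assumes each client write adds one unique integer to a set, that the test records which writes were attempted and which were acknowledged, and that a final read after the partition heals is compared against both. `check_writes` and its category names are hypothetical.

```python
from typing import Dict, Set


def check_writes(attempted: Set[int], acknowledged: Set[int],
                 final_read: Set[int]) -> Dict[str, Set[int]]:
    """Classify writes after a partition test in which each write adds one unique integer."""
    return {
        # Acknowledged to the client but absent afterwards: the failure slides 15-18 describe.
        "lost": acknowledged - final_read,
        # Present but never attempted by any client: should be impossible.
        "unexpected": final_read - attempted,
        # Never acknowledged (e.g. the client timed out) yet present: allowed,
        # because the client could not know whether the write took effect.
        "recovered": (final_read & attempted) - acknowledged,
    }


if __name__ == "__main__":
    attempted = set(range(1, 11))
    acknowledged = {1, 2, 3, 4, 5, 6, 7}
    final_read = {1, 2, 3, 5, 8}
    print(check_writes(attempted, acknowledged, final_read))
```

For the example data, writes 4, 6 and 7 were acknowledged but lost, and write 8 was never acknowledged yet survived, which a correct system is allowed to do. A non-empty "lost" set is exactly the "loss of acknowledged writes" quoted on slide 15.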