Databases, Part 1: The Theory of Scale

This talk will compare and contrast the different approaches to modern database design. Starting with history, we will look at how databases have evolved, and what use cases have driven their development. Ending with theory, we will look at the tradeoffs between different database designs.

Myles Megyesi

January 04, 2013

Transcript

  1. The CAP Theorem

    It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
    • Consistency
    • Availability
    • Partition tolerance
  2. Consistency

    All nodes see the same data at the same time.

    Scenario: Two customers on Amazon are trying to buy a book at the same time, but there is only one book left. How do we guarantee one success and one failure?
    • DB Transaction:
      ◦ decrement the quantity of the book
      ◦ assign the book to the customer
  3. Availability

    Simply put: the server responds
    • Lots of requests
    • Lots of data

    Solutions:
    • Vertical scaling
      ◦ Mecha server (68 GB cache)
    • Horizontal scaling (increase number of servers)
      ◦ read slaves
      ◦ shards
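
As one way to picture the "shards" bullet, the sketch below routes each key to one of several stores by hashing it. The names `shard_for` and `ShardedStore` are hypothetical, and plain dictionaries stand in for separate database servers; the talk does not say which sharding scheme it has in mind.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedStore:
    """Toy key-value store; each dict stands in for one database server."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

store = ShardedStore(4)
store.put("book:1", {"title": "Example", "quantity": 1})
print(store.get("book:1"))
```

Each server holds only part of the data and sees only part of the traffic. A known cost of simple modulo routing is that changing the number of shards moves most keys to a different shard.
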
  4. Relational databases (C camp)

    • Consistent (via atomic operations)
    • Available (via vertical scaling or some horizontal scale)
    • Not partition tolerant
  5. Dealing with CAP: allow partitions

    Guarantee Consistency/Availability
    • Master/Slave
      ◦ Wait until master comes back online
    • Sharding
      ◦ Wait until shard(s) comes back online
    • On every write, wait until data is consistent before serving more requests
      ◦ Extreme latency cost
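
The "wait until data is consistent" bullet can be illustrated with a toy store that refuses a write unless every replica can take it. This is a simplified sketch under stated assumptions: `Replica`, `SyncReplicatedStore`, and `Unavailable` are made-up names, the replicas are in-process objects rather than networked servers, and a real system would block or retry rather than fail immediately.

```python
class Unavailable(Exception):
    """Raised when a write cannot be applied to every replica."""

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True  # flipped to False to simulate a partition

class SyncReplicatedStore:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Check every replica first so a failed write changes nothing anywhere.
        down = [r.name for r in self.replicas if not r.reachable]
        if down:
            raise Unavailable("cannot reach: " + ", ".join(down))
        # All replicas are reachable: apply the write everywhere before returning.
        for replica in self.replicas:
            replica.data[key] = value

replicas = [Replica("master"), Replica("slave-1")]
store = SyncReplicatedStore(replicas)
store.write("book:1", 1)

replicas[1].reachable = False  # simulate a network partition
try:
    store.write("book:1", 0)
except Unavailable as exc:
    print("write rejected:", exc)
```

The replicas never disagree, but the store stops accepting writes whenever any one of them is unreachable, and every successful write pays for a round trip to all replicas. That is the latency cost the slide refers to.
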
  6. Dealing with CAP: allow inconsistency

    Guarantee Availability/Partition Tolerance
    • At some point, all writes will be propagated throughout the system
    • Keep accepting writes when a partition is down, figure out the conflicts when the system is whole again
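
The slide does not say how conflicts get figured out once the partition heals. One common policy is last-write-wins, sketched below as an assumption rather than the talk's recommendation. Each replica is modeled as a dictionary mapping a key to a `(timestamp, value)` pair; the function name, the keys, and the timestamps are illustrative.

```python
def merge_last_write_wins(a, b):
    """Merge two replica states, keeping the newest write for each key.

    Each state maps key -> (timestamp, value). Timestamp ties are broken by
    comparing the values, so both sides arrive at the same merged result.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged

# Both sides kept accepting writes while they could not reach each other.
virginia = {"cart:42": (10, "book"), "name:42": (3, "Ann")}
london = {"cart:42": (12, "book,pen")}

merged = merge_last_write_wins(virginia, london)
print(merged)
```

Merging in either direction gives the same state, so both sides converge once they exchange data. The price is that the older of two conflicting writes is silently dropped, and the result is only as trustworthy as the clocks that produced the timestamps.
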
  7. Eventual Consistency

    The storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.

    Can write to any node, not just master
  8. Availability

    • All nodes are equal
      ◦ No single point of failure
    • Horizontal Scaling
      ◦ Add as many nodes as needed
      ◦ In different regions
  9. Partition Tolerance

    [Diagram: groups of nodes in three regions: Portland, Virginia, and London]
  10. Consistency Revisited

    Scenario: Two customers on Amazon are trying to buy a book at the same time, but there is only one book left. How do we guarantee one success and one failure?