
Amazon's Dynamo

Muhammet
November 17, 2012


Transcript

  1. Dynamo: Amazon's Highly Available Key-value Store • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels • Amazon.com • Muhammet Orazov
  2. System Architecture • Partitioning • Replication • Data Versioning • Get and Put Execution • Temporary Failures • Permanent Failures • Membership and Failure Detection
  3. Partitioning • Scale incrementally • Consistent hashing (ring) • Departure/arrival only affects neighbours • Drawback? ◦ Non-uniform load ◦ Heterogeneity of nodes
  4. Partitioning • Scale incrementally • Consistent hashing (ring) • Departure/arrival only affects neighbours • Drawback? ◦ Non-uniform load ◦ Heterogeneity of nodes • Solution: virtual nodes
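As a rough illustration of consistent hashing with virtual nodes, here is a minimal Python sketch. The class name `HashRing`, the default of 64 virtual nodes per physical node, and the `weight` parameter (a simple way to give a more capable machine more ring positions) are assumptions made for this example, not details from the deck.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position on the ring (MD5 used as an example hash)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring where each physical node owns several virtual nodes."""

    def __init__(self, vnodes_per_node: int = 64):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []  # sorted ring positions of all virtual nodes
        self._owner = {}      # ring position -> physical node name

    def add_node(self, node: str, weight: int = 1) -> None:
        # A larger weight places more virtual nodes, so the node takes more load.
        for i in range(self.vnodes_per_node * weight):
            pos = ring_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def coordinator(self, key: str) -> str:
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing()
for name in ("A", "B", "C"):
    ring.add_node(name)

keys = [f"key{i}" for i in range(1000)]
before = {k: ring.coordinator(k) for k in keys}
ring.remove_node("C")
moved = [k for k in keys if ring.coordinator(k) != before[k]]
# Only keys that C used to own are reassigned; everything else stays put.
```

Because C's load is spread over many small ring segments, its keys are redistributed across the remaining nodes rather than dumped on a single neighbour.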
  5. Replication • Replicate on N nodes • Preference list (next N successor nodes on the ring) • Drawback? ◦ with virtual nodes, the next N ring positions may map to fewer than N physical nodes
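One way to avoid landing several replicas on the same machine is to walk the ring clockwise and skip positions whose physical node has already been chosen. The sketch below assumes a ring represented as a sorted list of `(position, physical_node)` pairs; the function name and the toy ring are made up for illustration.

```python
import bisect


def preference_list(ring, key_pos, n):
    """Return up to n distinct physical nodes, walking clockwise from key_pos.

    ring is a list of (position, physical_node) pairs sorted by position.
    """
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, key_pos)
    chosen = []
    for step in range(len(ring)):
        _, node = ring[(start + step) % len(ring)]
        if node not in chosen:  # skip virtual nodes of machines already picked
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen


# Toy ring: physical node A owns three virtual nodes.
toy_ring = [(10, "A"), (20, "A"), (30, "B"), (40, "A"), (50, "C")]
replicas = preference_list(toy_ring, key_pos=5, n=3)
# Taking the next three positions naively would give A, A, B (two machines);
# skipping repeats yields three distinct machines instead.
```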
  6. Data Versioning • Eventual consistency • Conflict resolution • Vector clocks • Drawback? ◦ vector clock grows • Solution ◦ in practice bounded by about N entries ◦ clock truncation scheme
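To make the vector clock idea concrete, the following sketch represents a clock as a dict from node name to counter and keeps only the versions that no other version supersedes. The helper names `descends` and `reconcile` and the node names Sx, Sy, Sz are illustrative choices, not taken from the slides.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions):
    """Drop versions that are strict ancestors of another version.

    versions is a list of (clock, value) pairs. Whatever remains is either
    a single winner or a set of concurrent siblings the client must merge.
    """
    survivors = []
    for i, (clock, value) in enumerate(versions):
        superseded = any(
            j != i and other != clock and descends(other, clock)
            for j, (other, _) in enumerate(versions)
        )
        if not superseded:
            survivors.append((clock, value))
    return survivors


d1 = ({"Sx": 1}, "v1")
d2 = ({"Sx": 2}, "v2")            # overwrites d1 on the same node
d3 = ({"Sx": 2, "Sy": 1}, "v3")   # written via Sy on top of d2
d4 = ({"Sx": 2, "Sz": 1}, "v4")   # written via Sz on top of d2, concurrent with d3
siblings = reconcile([d1, d2, d3, d4])
# d1 and d2 are ancestors and disappear; d3 and d4 conflict and both survive.
```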
  7. Get and Put Execution • Maintaining consistency • Quorum-like system ◦ R + W > N ◦ R, W: minimum number of nodes that must participate in a read and a write
  8. Get and Put Execution • Maintaining consistency • Quorum-like system ◦ R + W > N ◦ R, W: minimum number of nodes that must participate in a read and a write • Drawback?
  9. Get and Put Execution • Maintaining consistency • Quorum-like system ◦ R + W > N ◦ R, W: minimum number of nodes that must participate in a read and a write • Drawback? ◦ latency is dominated by the slowest of the R (or W) replicas
  10. Get and Put Execution • Maintaining consistency • Quorum-like system ◦ R + W > N ◦ R, W: minimum number of nodes that must participate in a read and a write • Drawback? ◦ latency is dominated by the slowest of the R (or W) replicas • Solution ◦ R (or W) configured to be less than N
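The condition R + W > N can be checked by brute force on small clusters: it is exactly the condition under which every possible read set shares at least one replica with every possible write set. The sketch below also shows why a smaller R helps latency, since a read completes once the R-th fastest replica has answered. Function names and the latency figures are invented for the example.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every r-replica read set intersects every w-replica write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def quorum_latency(replica_latencies_ms, r: int):
    """A read returns once r replicas have replied: the r-th fastest latency."""
    return sorted(replica_latencies_ms)[r - 1]


strict = quorums_always_overlap(3, 2, 2)   # R + W = 4 > 3
weak = quorums_always_overlap(3, 1, 2)     # R + W = 3, not greater than 3

latencies = [5, 9, 120]                    # made-up values, one slow replica
wait_r2 = quorum_latency(latencies, 2)     # the slow replica is not waited for
wait_r3 = quorum_latency(latencies, 3)     # the slow replica dominates
```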
  11. Temporary Failures • Sloppy quorum ◦ first N healthy nodes • Hinted handoff ◦ checked periodically • Highest level of availability ◦ set W to 1
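A simplified model of sloppy quorum with hinted handoff might look like the following. The data layout (plain dicts for each node's store and for the hints) and both function names are assumptions for illustration; a real system would run `deliver_hints` periodically in the background.

```python
def sloppy_write(preference_order, healthy, n, key, value, stores, hints):
    """Write to the first n healthy nodes in ring order.

    preference_order lists nodes clockwise from the key; its first n entries
    are the intended owners. A stand-in node records a hint naming the
    unavailable owner it is covering for.
    """
    intended = preference_order[:n]
    down_owners = [node for node in intended if node not in healthy]
    targets = [node for node in preference_order if node in healthy][:n]
    for node in targets:
        stores.setdefault(node, {})[key] = value
        if node not in intended and down_owners:
            hints.setdefault(node, {})[key] = down_owners.pop(0)
    return targets


def deliver_hints(healthy, stores, hints):
    """Hand hinted replicas back to owners that have recovered."""
    for holder, pending in hints.items():
        for key, owner in list(pending.items()):
            if owner in healthy:
                # Move the replica to its owner and drop the local copy.
                stores.setdefault(owner, {})[key] = stores[holder].pop(key)
                del pending[key]


stores, hints = {}, {}
order = ["A", "B", "C", "D"]
written_to = sloppy_write(order, {"B", "C", "D"}, 3, "cart", "v1", stores, hints)
hint_on_d = dict(hints["D"])                      # D holds the replica meant for A
deliver_hints({"A", "B", "C", "D"}, stores, hints)  # A is back
```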
  12. Permanent Failures • Replica synchronization (anti-entropy) • Merkle trees ◦ fast inconsistency detection ◦ minimize amount of data transfer
  13. Permanent Failures • Replica synchronization (anti-entropy) • Merkle trees ◦ fast inconsistency detection ◦ minimize amount of data transfer • Drawback?
  14. Permanent Failures • Replica synchronization (anti-entropy) • Merkle trees ◦ fast inconsistency detection ◦ minimize amount of data transfer • Drawback? ◦ key ranges change when nodes join and leave ◦ Merkle trees need to be recalculated
  15. Permanent Failures • Replica synchronization (anti-entropy) • Merkle trees ◦ fast inconsistency detection ◦ minimize amount of data transfer • Drawback? ◦ key ranges change when nodes join and leave ◦ Merkle trees need to be recalculated • Solution: partition refinement (paper)
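The way Merkle trees limit data transfer can be sketched as follows: two replicas compare root hashes first and descend only into subtrees whose hashes differ. This example assumes a key range split into a power-of-two number of buckets and uses SHA-256; both choices, and the function names, are illustrative.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return a list of levels: levels[0] holds leaf hashes, levels[-1] the root.

    The number of leaves must be a power of two in this simplified version.
    """
    levels = [[digest(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(
            [digest(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)]
        )
    return levels


def differing_leaves(a, b, level=None, index=0):
    """Indices of leaves that differ, visiting only mismatching subtrees."""
    if level is None:
        level = len(a) - 1          # start at the root
    if a[level][index] == b[level][index]:
        return []                   # identical subtree: nothing to transfer
    if level == 0:
        return [index]
    return (differing_leaves(a, b, level - 1, 2 * index)
            + differing_leaves(a, b, level - 1, 2 * index + 1))


replica_a = [b"k0=v0", b"k1=v1", b"k2=v2", b"k3=v3"]
replica_b = [b"k0=v0", b"k1=v1", b"k2=stale", b"k3=v3"]
tree_a, tree_b = build_tree(replica_a), build_tree(replica_b)
out_of_sync = differing_leaves(tree_a, tree_b)
# Only the bucket holding k2 needs to be exchanged between the replicas.
```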
  16. Membership and Failure Detection • Ring membership ◦ gossip-based protocol • External discovery ◦ logical partitions ◦ seed nodes • Failure detection ◦ gossip-based protocol
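Gossip-based membership can be modelled, very loosely, as nodes repeatedly picking a random peer and merging their views of the ring. In this sketch each view maps a node name to a `(version, status)` pair and the entry with the higher version wins; that representation, the function names, and the in-memory `views` dict standing in for the network are all assumptions for illustration.

```python
import random


def merge_views(mine: dict, theirs: dict) -> dict:
    """Combine two membership views, keeping the newer entry for each node."""
    merged = dict(mine)
    for node, entry in theirs.items():
        if node not in merged or entry[0] > merged[node][0]:
            merged[node] = entry
    return merged


def gossip_round(views: dict, rng: random.Random) -> None:
    """Every node contacts one random peer and both adopt the merged view."""
    for node in list(views):
        peer = rng.choice([other for other in views if other != node])
        merged = merge_views(views[node], views[peer])
        views[node] = dict(merged)
        views[peer] = dict(merged)


view_a = {"A": (1, "up"), "B": (1, "up")}
view_b = {"B": (2, "left"), "C": (1, "up")}
combined = merge_views(view_a, view_b)
# combined knows about A, B and C, and uses B's newer "left" entry.

# Each node starts out knowing only itself; repeated rounds spread the views.
views = {name: {name: (1, "up")} for name in ("A", "B", "C", "D")}
rng = random.Random(0)
for _ in range(5):
    gossip_round(views, rng)
```

Seed nodes fit into this picture as peers that every node is guaranteed to contact eventually, which prevents the ring from splitting into groups that never exchange views.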