Amazon's Dynamo

Muhammet
November 17, 2012

Transcript

  1. Dynamo: Amazon's Highly Available Key-value Store
    Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
    Amazon.com
    Muhammet Orazov
  2. System Architecture
    • Partitioning
    • Replication
    • Data Versioning
    • Get and Put Execution
    • Temporary Failures
    • Permanent Failures
    • Membership and Failure Detection
  4. Partitioning
    • Scale incrementally
    • Consistent hashing (ring)
    • Departure/arrival only affects neighbours
    • Drawback?
      ◦ Non-uniform load
      ◦ Heterogeneity of nodes
    • Solution: virtual nodes
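
The partitioning idea above can be sketched as a small consistent-hash ring in which every physical node owns several positions ("virtual nodes"). This is a minimal illustration, not Dynamo's implementation: the `HashRing` class, the `tokens_per_node` parameter, and the `node#i` token naming are assumptions made for the example. MD5 is used because the paper describes hashing keys with MD5 into a 128-bit space.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on the ring (MD5, 128-bit space)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring; each physical node owns many tokens."""

    def __init__(self, tokens_per_node=64):
        self.tokens_per_node = tokens_per_node
        self._positions = []   # sorted token positions on the ring
        self._owner = {}       # token position -> physical node name

    def add_node(self, node):
        # Each virtual node is one token derived from the node name.
        for i in range(self.tokens_per_node):
            pos = ring_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def coordinator(self, key):
        """Return the node owning the first token clockwise from the key."""
        if not self._positions:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]
```

With this sketch, adding a node only moves keys onto the new node, and removing it restores the previous assignment; keys owned by the other nodes never move between them. A more capable machine could be given a larger `tokens_per_node` to reflect heterogeneity.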
  5. Replication
    • Replicate on N nodes
    • Preference list (next N successor nodes)
    • Drawback?
      ◦ with virtual nodes, the next N positions may map to the same physical node
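
One way to avoid the drawback above is to skip ring positions whose physical node is already in the list, which is how the paper describes building the preference list. The sketch below assumes a hypothetical `tokens` input: a list of `(position, physical_node)` pairs sorted by position.

```python
import bisect


def preference_list(tokens, key_position, n):
    """Walk clockwise from the key and collect the first n distinct physical nodes.

    tokens: list of (position, physical_node) pairs sorted by position
    (a hypothetical representation chosen for this example).
    """
    positions = [pos for pos, _ in tokens]
    # First token strictly after the key's position, wrapping around the ring.
    start = bisect.bisect_right(positions, key_position)
    chosen = []
    for step in range(len(tokens)):
        _, node = tokens[(start + step) % len(tokens)]
        if node not in chosen:          # skip repeats of the same machine
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen


tokens = [(10, "A"), (20, "A"), (30, "B"), (40, "A"), (50, "C")]
print(preference_list(tokens, 5, 3))    # ['A', 'B', 'C']
```

Taking the next three positions naively would give A, A, B, so only two machines would hold the data; skipping repeats yields three distinct machines.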
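  6. Data Versioning
    • Eventual consistency
    • Conflict resolution
    • Vector clocks
    • Drawback?
      ◦ the vector clock grows as more nodes coordinate writes
    • Solution
      ◦ in practice the clock stays around size N, since writes are usually coordinated by the top N nodes of the preference list
      ◦ truncation scheme: drop the oldest entry once the clock exceeds a size threshold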
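
The versioning rules above can be illustrated with vector clocks stored as plain dictionaries mapping a node name to a counter. This is a sketch under assumptions: the function names and the separate `last_updated` timestamp map used for truncation are invented for the example. The node names Sx, Sy, Sz follow the example in the paper.

```python
def increment(clock, node):
    """Return a copy of the clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a is at least as new as clock b for every node in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions: one supersedes the other, or they conflict."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict"


def merge(a, b):
    """Element-wise maximum, used after the client reconciles a conflict."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def truncate(clock, last_updated, max_entries):
    """Keep only the max_entries most recently updated (node, counter) pairs."""
    keep = sorted(clock, key=lambda node: last_updated[node], reverse=True)
    return {node: clock[node] for node in keep[:max_entries]}


d2 = increment(increment({}, "Sx"), "Sx")   # {'Sx': 2}
d3 = increment(d2, "Sy")                    # written through Sy
d4 = increment(d2, "Sz")                    # concurrently written through Sz
print(compare(d3, d4))                      # conflict
```

Both `d3` and `d4` descend from `d2`, but neither descends from the other, so a read returns both versions and the client reconciles them. Truncation trades some accuracy in ancestry tracking for a bounded clock size.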
  10. Get and Put Execution
    • Maintaining consistency
    • Quorum-like system
      ◦ R + W > N
      ◦ R, W: minimum number of nodes that must participate in a read and in a write
    • Drawback?
      ◦ latency is dominated by the slowest of the R (or W) replicas
    • Solution
      ◦ R and W configured to be less than N
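
The quorum condition and the latency drawback above can be checked with a few lines of code. The sketch below is illustrative only: `quorums_overlap` verifies by brute force that every read set intersects every write set, and `quorum_latency` models the time to collect k responses as the k-th fastest replica. Both function names and the sample latencies are made up for the example.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True if every set of r replicas shares a member with every set of w replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def quorum_latency(replica_latencies_ms, k):
    """Time until k replicas have answered: the k-th fastest response."""
    return sorted(replica_latencies_ms)[k - 1]


print(quorums_overlap(3, 2, 2))          # True:  R + W > N
print(quorums_overlap(3, 1, 2))          # False: R + W = N
print(quorum_latency([5, 120, 8], 2))    # 8
print(quorum_latency([5, 120, 8], 3))    # 120
```

With N = 3 and R = W = 2, a read always sees at least one replica that took part in the latest write, and a single slow replica (120 ms here) does not hold up the request. Requiring all three responses would make that replica set the latency.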
  11. Temporary Failures
    • Sloppy quorum
      ◦ reads and writes use the first N healthy nodes
    • Hinted handoff
      ◦ hinted replicas are checked periodically and handed back once the intended node recovers
    • Highest level of availability
      ◦ set W to 1
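
A sloppy quorum with hinted handoff, as outlined above, might look like the following sketch. It assumes a hypothetical `ring_order` list (nodes in clockwise order starting at the key's coordinator) and a `healthy` set; hints are modeled as plain tuples rather than a separate local database.

```python
def write_targets(ring_order, healthy, n):
    """Pick the first n healthy nodes for a write.

    Returns (node, hint) pairs; hint names the down node the replica was
    intended for, or None when the node is in the original top-n list.
    """
    intended = ring_order[:n]
    down = [node for node in intended if node not in healthy]
    targets = [(node, None) for node in intended if node in healthy]
    # Healthy nodes further along the ring stand in for the down ones.
    stand_ins = (node for node in ring_order[n:] if node in healthy)
    for missing, stand_in in zip(down, stand_ins):
        targets.append((stand_in, missing))
    return targets


def deliver_hints(hinted, healthy):
    """Periodic scan: split hinted replicas into deliverable and still pending.

    hinted: list of (intended_node, key, value) tuples held by a stand-in.
    """
    delivered = [h for h in hinted if h[0] in healthy]
    pending = [h for h in hinted if h[0] not in healthy]
    return delivered, pending


ring_order = ["A", "B", "C", "D", "E"]
print(write_targets(ring_order, {"B", "C", "D", "E"}, 3))
# [('B', None), ('C', None), ('D', 'A')]
```

This mirrors the paper's example: with N = 3 and A down, the replica meant for A goes to D together with a hint, and D hands it back once A is reachable again.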
  15. Permanent Failures
    • Replica synchronization (anti-entropy)
    • Merkle trees
      ◦ fast inconsistency detection
      ◦ minimize amount of data transfer
    • Drawback?
      ◦ key ranges change when nodes join and leave
      ◦ Merkle trees need to be recalculated
    • Solution: partition refinement (paper)
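
To make the Merkle-tree comparison above concrete, the sketch below builds a hash tree over the items of one key range and walks two trees from the root, descending only where hashes differ. It is an illustration under assumptions: SHA-256, the `key=value` leaf encoding, and the requirement that both replicas hold the same number of leaves are choices made for the example, not details taken from the paper.

```python
import hashlib


def _digest(data):
    return hashlib.sha256(data).digest()


def build_tree(items):
    """Build a Merkle tree over (key, value) string pairs sorted by key.

    Returns a list of levels: levels[0] holds leaf hashes, levels[-1] the root.
    """
    level = [_digest(f"{k}={v}".encode("utf-8")) for k, v in items]
    levels = [level]
    while len(level) > 1:
        # Pair up neighbours; an unpaired last node is hashed on its own.
        level = [
            _digest(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Return indexes of leaves that differ, visiting only mismatching subtrees."""
    if len(a[0]) != len(b[0]):
        raise ValueError("this sketch assumes both trees have the same leaf count")
    if not a[0]:
        return []
    suspects = [0]                       # start at the root
    for depth in range(len(a) - 1, 0, -1):
        children = []
        for i in suspects:
            if a[depth][i] != b[depth][i]:
                children += [c for c in (2 * i, 2 * i + 1) if c < len(a[depth - 1])]
        suspects = children
    return [i for i in suspects if a[0][i] != b[0][i]]


replica_1 = [("k0", "a"), ("k1", "b"), ("k2", "c"), ("k3", "d"), ("k4", "e")]
replica_2 = [("k0", "a"), ("k1", "b"), ("k2", "c"), ("k3", "STALE"), ("k4", "e")]
print(differing_leaves(build_tree(replica_1), build_tree(replica_2)))   # [3]
```

When the roots match, the comparison stops immediately and no data is exchanged; otherwise only the keys under mismatching branches need to be transferred.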
  16. Membership and Failure Detection
    • Ring membership
      ◦ gossip-based protocol
    • External discovery
      ◦ logical partitions
      ◦ seed nodes
    • Failure detection
      ◦ gossip-based protocol
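
Gossip-based membership, as listed above, can be sketched as nodes repeatedly reconciling their views with a randomly chosen peer. The representation is an assumption made for the example: each view maps a node name to a `(version, status)` pair, and the higher version wins. For simplicity every node can contact every other node directly, which plays the role that seed nodes have in avoiding logical partitions.

```python
import random


def reconcile(view_a, view_b):
    """Merge two membership views, keeping the higher-versioned entry per node."""
    merged = dict(view_a)
    for node, (version, status) in view_b.items():
        if node not in merged or merged[node][0] < version:
            merged[node] = (version, status)
    return merged


def gossip_round(views, rng):
    """Each node contacts one random peer; both adopt the reconciled view."""
    for node in list(views):
        peer = rng.choice([p for p in views if p != node])
        merged = reconcile(views[node], views[peer])
        views[node] = merged
        views[peer] = dict(merged)


def converged(views):
    """True once all nodes hold identical membership views."""
    all_views = list(views.values())
    return all(v == all_views[0] for v in all_views)


# Every node starts out knowing only about itself.
views = {name: {name: (1, "up")} for name in ("A", "B", "C", "D", "E")}
rng = random.Random(0)
rounds = 0
while not converged(views):
    gossip_round(views, rng)
    rounds += 1
print(rounds, sorted(views["A"]))
```

Knowledge only ever grows in this sketch, so repeated rounds drive all views toward the same membership list. The number of rounds depends on the random peer choices.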