
Dynamo

Dimos Raptis
October 23, 2018


During the last decade, Internet companies have achieved unprecedented growth. As a result, their systems needed to cope with workloads they had never experienced before. Existing data storage technologies were not capable of supporting these workloads because of inherent limitations, such as single points of failure and vertical-only scaling.
In an effort to solve these problems, companies started re-evaluating the architecture of existing systems and building a new generation of systems focused on a more distributed, scalable and highly available architecture. Of course, this came with a whole new set of technical challenges that needed to be addressed.
In this talk, we will look at a set of core techniques Amazon used to build their key-value store, referred to as Dynamo in the related paper. We will explain what problems they had to address and how these techniques helped solve them.
Given that this was amongst the seminal papers in the space of distributed systems, we will also visit some examples of open-source systems that leveraged some of these techniques.


Transcript

  1. Problems
     • Writes could still scale only vertically
     • Read-friendly (but write-unfriendly) architecture
     • High availability
     • Reduced latency
     • Needless overhead for simple K-V operations
     • Far from auto-scaling
  2. Simplified architecture
     [Diagram: the key space is partitioned into the ranges A-F, G-K, L-Q and R-Z, each assigned to a different server; the application routes get("Bob") / put("Bob", 15) to the server owning the corresponding range.]
  3. Techniques
     • Partitioning → Consistent Hashing
     • High availability for writes → Vector clocks with conflict resolution on read
     • Handling failures (temporary) → Hinted handoff
     • Recovering from failures (permanent) → Anti-entropy with Merkle trees
     • Membership & failure detection → Gossip-based protocol
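Since vector clocks get no dedicated slide below, here is a minimal sketch of the idea in Python. The node names and helper functions are illustrative, not Dynamo's actual interface: each replica bumps its own counter on a write, and two versions conflict when neither clock descends from the other, in which case both versions are returned to the reader for conflict resolution.

    # Minimal vector clock sketch (illustrative names, not Dynamo's API).
    def increment(clock, node):
        """Return a copy of the clock with this node's counter bumped."""
        new_clock = dict(clock)
        new_clock[node] = new_clock.get(node, 0) + 1
        return new_clock

    def descends(a, b):
        """True if clock a has seen every event that clock b has."""
        return all(a.get(node, 0) >= count for node, count in b.items())

    def conflict(a, b):
        """Versions conflict when neither clock descends from the other."""
        return not descends(a, b) and not descends(b, a)

    v1 = increment({}, "node_a")   # write handled by node_a
    v2 = increment(v1, "node_b")   # later write, causally after v1
    v3 = increment(v1, "node_c")   # concurrent write, also after v1
    print(conflict(v1, v2))        # False: v2 supersedes v1
    print(conflict(v2, v3))        # True: concurrent -> resolve on read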
  4. Partitioning - Consistent Hashing
     Naive approach: chosen server → Hash(k) mod N. Note how adding a single server (N=3 → N=4) remaps almost every key:

     Key      Hash          Chosen server (N=3)   Chosen server (N=4)
     Bob      1633428562    1                     2
     Alice    7594634739    0                     3
     Nick     5000799125    2                     1
     George   9787173343    1                     3

     Consistent hashing: each server sᵢ is assigned a location on a ring [0, L], and the chosen server is the next server on the ring from hash(k) mod L.
     [Ring diagram: S1 (90), S5 (140), S2 (180), S3 (270), S4 (360)]
     Example: k = "Bob", Hash(k) = 360018072, Hash(k) mod 360 = 72 → server S1
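A small sketch of that ring lookup, with the server positions taken from the slide's example; the bisect-based lookup is an illustrative implementation choice, not what Dynamo itself does.

    import bisect

    L = 360  # ring size from the slide
    ring = [(90, "s1"), (140, "s5"), (180, "s2"), (270, "s3"), (360, "s4")]
    ring.sort()
    positions = [pos for pos, _ in ring]

    def choose_server(key_hash):
        point = key_hash % L
        # Walk clockwise to the next server; wrap around past position L.
        idx = bisect.bisect_left(positions, point) % len(ring)
        return ring[idx][1]

    print(choose_server(360018072))  # 360018072 mod 360 = 72 -> "s1" (at 90)

Unlike the mod-N table above, adding or removing a server here only remaps the keys between that server and its ring neighbour.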
  5. Quorums
     [Ring diagram: S1 (90), S5 (140), S2 (180), S3 (270), S4 (360)]
     N: replication factor
     R: read quorum
     W: write quorum
     Consistency condition: R + W > N
     Example: N = 5, R = 3, W = 3
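Why R + W > N matters: any set of R replicas must then intersect any set of W replicas in at least one node, so a read always contacts at least one replica holding the latest acknowledged write. A quick sketch of the worst-case overlap:

    def min_overlap(n, r, w):
        """Smallest possible intersection of a read set and a write set."""
        # Worst case: the two sets are as disjoint as n replicas allow.
        return max(0, r + w - n)

    print(min_overlap(5, 3, 3))  # 1: the slide's N=5, R=3, W=3 always overlap
    print(min_overlap(5, 2, 3))  # 0: a read may miss the latest write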
  6. Handling failures
     • Temporary failures → Hinted handoff
     • Permanent failures → Sync using Merkle trees: O(log N) data transfer in the worst case, instead of O(N)
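A toy version of the Merkle-tree comparison, assuming for simplicity that both replicas hold the same keys in the same layout (the tree shape and function names are illustrative): each replica hashes its key range into a binary tree, and the diff recurses only into subtrees whose hashes differ, so replicas exchange hashes proportional to the tree depth rather than shipping every entry.

    import hashlib

    def h(text):
        return hashlib.sha256(text.encode()).hexdigest()

    def build_tree(items):
        """items: sorted (key, value) pairs -> (hash, leaf, left, right)."""
        if len(items) == 1:
            key, value = items[0]
            return (h(key + value), items[0], None, None)
        mid = len(items) // 2
        left, right = build_tree(items[:mid]), build_tree(items[mid:])
        return (h(left[0] + right[0]), None, left, right)

    def diff(a, b):
        """Yield the leaf pairs where the two replicas disagree."""
        if a[0] == b[0]:
            return                    # identical subtree: skip entirely
        if a[2] is None:
            yield a[1], b[1]          # divergent leaf: needs syncing
            return
        yield from diff(a[2], b[2])
        yield from diff(a[3], b[3])

    r1 = build_tree([("alice", "1"), ("bob", "2"), ("carol", "3"), ("dave", "4")])
    r2 = build_tree([("alice", "1"), ("bob", "9"), ("carol", "3"), ("dave", "4")])
    print(list(diff(r1, r2)))  # only the "bob" entry differs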
  7. Membership & Failure detection
     • Nodes ping random nodes periodically
     • At every epoch, each node transmits its membership list to b nodes
     • If a node is reported as unavailable for more than M epochs, it is announced dead
     • Given no additional changes, the protocol converges in O(log_b N) steps (here b = 2)
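A toy simulation of that dissemination step (the epoch loop and parameters are illustrative, not the real protocol): each informed node forwards a piece of membership information to b random peers per epoch, and the number of epochs until all N nodes are informed grows roughly as log_b N.

    import random

    def epochs_to_converge(n, b=2):
        informed = {0}                 # node 0 observes a membership change
        epochs = 0
        while len(informed) < n:
            for node in list(informed):
                # Each informed node gossips to b randomly chosen peers.
                informed.update(random.sample(range(n), b))
            epochs += 1
        return epochs

    print(epochs_to_converge(1000))  # typically a little over log2(1000) ~ 10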
  8. Dynamo in the wild

     Technique                               Apache Cassandra       Riak
     Consistent Hashing with virtual nodes   ✓                      ✓
     Vector clocks                           ✗ (last-write-wins)    ✓
     Hinted handoff                          ✓                      ✓
     Anti-entropy with Merkle trees          ✓ (manual repair)      ✓
     Gossip-based protocol                   ✓                      ✓