Hermes Reliable Replication Protocol - Poster

Poster of Hermes: A Fast, Fault-tolerant and Linearizable Replication Protocol

Antonios Katsarakis

April 30, 2020

Transcript

  1. Hermes: A Fast, Fault-tolerant and Linearizable Replication Protocol

    A. Katsarakis, V. Gavrielatos, S. Katebzadeh, A. Joshi*, A. Dragojevic†, B. Grot, V. Nagarajan
    University of Edinburgh, *Intel, †Microsoft Research
    hermes-protocol.com

    Motivation

    Distributed Datastores
    • Read/write API
    • Backbone of modern online services

    Reliable Replication Protocols
    • Keep replicas strongly consistent despite faults
    • Define actions to execute reads and writes, which determines the datastore's performance

    • Available: data replication for fault tolerance
    • Consistent: programmability through strongly consistent replicas
    • Performant: exploit replicas for low latency and high throughput

    State-of-the-Art Protocols

    Exploit failure-free operation for performance
    • Local reads from all replicas
    • Poor write throughput and latency

    ZAB (Multi-Paxos), leader-based
    • Reduces an RTT from traditional Paxos
    • All writes serialize on leader = low concurrency

    CRAQ (Chain Replication), head-to-tail chain
    • Writes flow concurrently in the chain
    • Must traverse the length of chain = slow

    Writes can block local reads, hurting performance even at low write ratios

    Hermes Protocol Overview

    Broadcast-based, invalidating reliable protocol inspired by multiprocessor cache coherence
    • Fast local reads from all replicas
    • High-performance writes
      • Fast (1 RTT)
      • Decentralized
      • Fully concurrent
      • Need never abort

    Linearizability
    • Reads are served locally when key is Valid
    • Writes commit after invalidating all replicas of a key

    Fault tolerance
    • Any replica after a fault can replay writes to unblock
    • Logical Timestamp (TS)
    • Broadcast + Invalidations + early value propagation + TS

    [Figure: write(A=3). Messages: Invalidation (3,TS), Ack, Validation; write completion. States of A: Valid (V) or Invalid (I). Legend: Local Read, Write, Unicast, Mcast to Replicas.]

    Results

    Setup: 5 nodes (replicas), 56 Gbit RDMA NICs, 1M keys uniformly accessed

    [Figure: Throughput (million requests / sec) vs. % write ratio. Annotations: high-perf. writes + local reads; conc. writes + local reads; local reads; 4x; 40% @ 5% write ratio.]

    [Figure: Write Latency (normalized to Hermes). Annotation: 6x.]

    State-of-art write performance
    Linearizability & Fault-tolerance with High-Performance
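
The write and read rules stated on the poster (reads are served locally only when the key is Valid, writes commit after invalidating all replicas, and any replica can replay a write it was invalidated for) can be illustrated with a small in-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and method names are hypothetical, messages are plain synchronous method calls with no loss or reordering, the timestamp layout `(version, node_id)` is an assumption, and membership changes are not modeled.

```python
from dataclasses import dataclass

VALID, INVALID = "Valid", "Invalid"


@dataclass
class Entry:
    value: object = None
    ts: tuple = (0, 0)   # assumed layout: (version, node_id), compared lexicographically
    state: str = VALID


class Replica:
    """Hypothetical replica; peers are called directly instead of over a network."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.peers = []
        self.store = {}

    def _entry(self, key):
        return self.store.setdefault(key, Entry())

    def read(self, key):
        # Reads are served locally, but only while the key is Valid.
        e = self._entry(key)
        if e.state != VALID:
            raise RuntimeError("key is Invalid; the read must wait")
        return e.value

    def on_invalidation(self, key, value, ts):
        # The invalidation carries the value and timestamp (early value propagation).
        e = self._entry(key)
        if ts > e.ts:
            e.value, e.ts, e.state = value, ts, INVALID
        return True  # Ack

    def on_validation(self, key, ts):
        # Only the write with the matching timestamp may re-validate the key.
        e = self._entry(key)
        if e.ts == ts:
            e.state = VALID

    def write(self, key, value):
        # Any replica can coordinate a write; it picks a higher logical timestamp.
        e = self._entry(key)
        self._broadcast(key, value, (e.ts[0] + 1, self.node_id))

    def replay(self, key):
        # A replica stuck in Invalid re-broadcasts the write it already holds.
        e = self._entry(key)
        if e.state == INVALID:
            self._broadcast(key, e.value, e.ts)

    def _broadcast(self, key, value, ts):
        self.on_invalidation(key, value, ts)
        acks = [p.on_invalidation(key, value, ts) for p in self.peers]
        if all(acks):  # commit point: every replica has been invalidated
            for r in [self] + self.peers:
                r.on_validation(key, ts)


def make_cluster(n):
    replicas = [Replica(i) for i in range(n)]
    for r in replicas:
        r.peers = [p for p in replicas if p is not r]
    return replicas


nodes = make_cluster(5)
nodes[0].write("A", 3)
print([n.read("A") for n in nodes])  # every replica answers from its local copy
```

Because each invalidation already carries the value and timestamp, `replay` needs no extra state beyond what the replica stored when it was invalidated; that is the property the sketch is meant to highlight.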