Slide 1


Cloud Native Disaster Recovery
Raffaele Spazzoli, Architect at Red Hat, Lead at TAG Storage
Alex Chircop, CEO at StorageOS, Co-chair of TAG Storage

Slide 2


Cloud Native Disaster Recovery

Concern | Traditional DR | Cloud Native DR
Type of deployment | Active/passive, rarely active/active | Active/active
Disaster detection and recovery trigger | Human | Autonomous
Disaster recovery procedure execution | Mix of manual and automated tasks | Automated
Recovery Time Objective (RTO) | From close to zero to hours | Close to zero
Recovery Point Objective (RPO) | From zero to hours | Exactly zero for strongly consistent deployments; theoretically unbounded, practically close to zero for eventually consistent deployments
DR process owner | Often the storage team | Application team
Capabilities needed for DR | From storage (backup/restore, volume replication) | From networking (east-west communication, global load balancer)

The information in this table reflects generally accepted attributes and measurements for disaster recovery architectures.
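To make the "autonomous detection, automated execution" column concrete, here is a minimal sketch, not taken from the talk, of a watchdog that probes each failure domain and triggers failover without a human in the loop. The endpoint URLs, probe interval, and three-miss threshold are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probe reports whether a failure domain answers its health endpoint in time.
func probe(endpoint string) bool {
	client := http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(endpoint)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	// Hypothetical health endpoints, one per failure domain.
	domains := map[string]string{
		"dc-east": "https://dc-east.example.com/healthz",
		"dc-west": "https://dc-west.example.com/healthz",
	}
	failures := map[string]int{}

	for range time.Tick(10 * time.Second) {
		for name, endpoint := range domains {
			if probe(endpoint) {
				failures[name] = 0
				continue
			}
			failures[name]++
			// Three consecutive misses count as a disaster here: trigger the
			// automated failover, e.g. by updating the global load balancer.
			if failures[name] == 3 {
				fmt.Printf("failure domain %s unreachable, failing over\n", name)
			}
		}
	}
}
```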

Slide 3


CNDR - Reference Architecture

Traditional DR strategies are still possible in the cloud; here we focus on a new approach.

Slide 4


Availability and Consistency - Some Definitions

High Availability (HA): the property of a system that allows it to continue performing normally in the presence of failures. What happens when a component in a failure domain is lost?

Consistency: the property of a distributed stateful workload by which all of the instances of the workload “observe” the same state.

Disaster Recovery (DR): the strategy for recovering from the complete loss of a datacenter. What happens when an entire failure domain is lost?

Failure Domain: an area which may fail due to a single event. Examples: nodes, racks, Kubernetes clusters, network zones and datacenters.

Slide 5


CAP Theorem

Product | CAP choice (either Availability or Consistency)
DynamoDB | Availability
Cassandra | Availability
CockroachDB | Consistency
MongoDB | Consistency

PACELC corollary: in the absence of a network partition, one can only optimize either for latency or consistency.
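To make the CAP choice concrete, here is a minimal sketch, not tied to any of the products above, contrasting a consistency-first write, which requires a strict majority of replicas, with an availability-first write, which accepts any reachable replica and reconciles later. The replica names and partition scenario are made up for illustration.

```go
package main

import "fmt"

// replica models one copy of the state; reachable is false when a
// network partition cuts it off from the coordinator.
type replica struct {
	name      string
	reachable bool
}

// cpWrite favours consistency: it refuses the write unless a strict
// majority of replicas can acknowledge it.
func cpWrite(replicas []replica) error {
	acks := 0
	for _, r := range replicas {
		if r.reachable {
			acks++
		}
	}
	if acks*2 <= len(replicas) {
		return fmt.Errorf("no majority (%d/%d): rejecting write to stay consistent", acks, len(replicas))
	}
	return nil
}

// apWrite favours availability: any reachable replica accepts the write,
// and divergent copies must be reconciled later (eventual consistency).
func apWrite(replicas []replica) error {
	for _, r := range replicas {
		if r.reachable {
			return nil
		}
	}
	return fmt.Errorf("no replica reachable")
}

func main() {
	// A partition has isolated the coordinator from two of the three domains.
	partitioned := []replica{
		{"dc-a", true}, {"dc-b", false}, {"dc-c", false},
	}
	fmt.Println("consistency-first write:", cpWrite(partitioned))  // rejected: no majority
	fmt.Println("availability-first write:", apWrite(partitioned)) // accepted: nil error
}
```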

Slide 6


Consensus Protocols

Consensus protocols allow for the coordination of distributed processes by agreeing on actions to be taken.

Shared-state consensus protocols: protocols in which all participants perform the same action. They are implemented around the concepts of leader election and strict majority: Paxos, Raft.

Unshared-state consensus protocols: protocols in which all participants perform different actions. They require the acknowledgment of all participants and are vulnerable to network partitioning: 2PC, 3PC.

Reliable replicated data store: building on consensus protocols and the concept of sharing a log of operations, it is possible to build a reliable replicated data store. Apache BookKeeper is an example (for the log abstraction, append-only use case).
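The sketch below illustrates the commit rules that separate the two families: strict-majority protocols such as Paxos or Raft make progress as long as a quorum responds, while 2PC-style protocols need every participant's vote. This is a simplified model of the voting rule only, not an implementation of either protocol.

```go
package main

import "fmt"

// canCommitQuorum models shared-state protocols such as Raft or Paxos:
// progress only needs a strict majority of participants.
func canCommitQuorum(acks, participants int) bool {
	return acks*2 > participants
}

// canCommit2PC models unshared-state protocols such as 2PC/3PC:
// every participant must vote yes, so one unreachable node blocks commit.
func canCommit2PC(acks, participants int) bool {
	return acks == participants
}

func main() {
	const participants = 5
	for acks := participants; acks >= 3; acks-- {
		fmt.Printf("%d/%d acks -> quorum commit: %v, 2PC commit: %v\n",
			acks, participants,
			canCommitQuorum(acks, participants),
			canCommit2PC(acks, participants))
	}
}
```

With five participants, the quorum rule keeps committing with up to two unreachable nodes, while the all-participants rule stalls as soon as a single vote is missing, which is exactly its vulnerability to network partitioning.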

Slide 7


Anatomy of a Stateful Application

Partitions: a way to increase the overall throughput of the workload. This is achieved by breaking the state space into partitions or shards.

Replicas: a way to increase the availability of a stateful workload.

Putting it all together: stateful workload logical tiers.
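A minimal sketch of how the two tiers compose: keys are hashed into shards for throughput, and each shard is placed on several replicas spread across failure domains for availability. The hash function, shard and replica counts, and zone names are illustrative assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	shards   = 4 // partitions of the key space, for throughput
	replicas = 3 // copies of each shard, for availability
)

// shardFor hashes a key into one of the partitions.
func shardFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % shards
}

// placement spreads the replicas of a shard across distinct failure domains.
func placement(shard int, domains []string) []string {
	out := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		out = append(out, domains[(shard+i)%len(domains)])
	}
	return out
}

func main() {
	domains := []string{"zone-a", "zone-b", "zone-c"}
	for _, key := range []string{"order-17", "user-42", "cart-99"} {
		s := shardFor(key)
		fmt.Printf("key %q -> shard %d -> replicas on %v\n", key, s, placement(s, domains))
	}
}
```

Placing each shard's replicas in distinct failure domains is what allows the workload to keep serving when a whole domain is lost.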

Slide 8


Examples of Consensus Protocol Choices

Product | Replica consensus protocol | Shard consensus protocol
etcd | Raft | N/A (no support for shards)
Consul | Raft | N/A (no support for shards)
ZooKeeper | Atomic Broadcast (a derivative of Paxos) | N/A (no support for shards)
Elasticsearch | Paxos | N/A (no support for transactions)
Cassandra | Paxos | Supported, but details are not available
MongoDB | Paxos | Homegrown protocol
CockroachDB | Raft | 2PC
YugabyteDB | Raft | 2PC
TiKV | Raft | Percolator
Spanner | Raft | 2PC + high-precision time service
Kafka | A custom derivative of PacificA | Custom implementation of 2PC

Slide 9


Strongly-Consistent vs Eventually-Consistent CNDR

Concern | Strongly-Consistent | Eventually-Consistent
RPO | Zero | Theoretically unbounded, practically close to zero. Temporary inconsistency can happen. Note: eventual consistency does not mean eventual correctness.
RTO | A few seconds | A few seconds
Latency | Strong sensitivity to latency between failure domains: single-transaction latency will be >= 2 x the worst latency between failure domains | No sensitivity to latency between failure domains
Throughput | Theoretically scales linearly with the number of instances; in practice it depends on the workload type and the maximum throughput available between failure domains | Theoretically scales linearly with the number of instances; in practice it depends on the workload type
Minimum required failure domains | Three | Two
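A worked example of the latency bound in the table, using assumed one-way inter-domain latencies (the figures and domain names are illustrative, not measurements from the talk): with the farthest peer 25 ms away, the table's rule of thumb puts every strongly consistent commit at 50 ms or more, while eventually consistent writes complete locally and pay for the distance as replication lag (i.e., RPO) instead.

```go
package main

import "fmt"

// Assumed one-way latencies (in milliseconds) from the coordinating
// failure domain to its peers; illustrative values only.
var oneWayMs = map[string]float64{
	"dc-a -> dc-b": 12,
	"dc-a -> dc-c": 25,
}

func main() {
	// Strongly consistent: per the table's bound, a commit costs at least
	// one round trip to the worst-placed peer, i.e. >= 2 x worst latency.
	worst := 0.0
	for _, ms := range oneWayMs {
		if ms > worst {
			worst = ms
		}
	}
	fmt.Printf("strongly consistent commit latency >= %.0f ms\n", 2*worst)

	// Eventually consistent: writes are acknowledged locally, so latency is
	// unaffected; the asynchronous replication lag shows up as a small RPO.
	fmt.Println("eventually consistent commit latency: local only; RPO ~ replication lag")
}
```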

Slide 10


CNDR - Strong Consistency - Kubernetes Reference Architecture

Slide 11


CNDR - Eventual Consistency - Kubernetes Reference Architecture

Slide 12


References

TAG Storage: Cloud Native Disaster Recovery

Demos and reference implementations:
- Geographically Distributed Stateful Workloads Part One: Cluster Preparation
- Geographically Distributed Stateful Workloads Part Two: CockroachDB
- Geographically Distributed Stateful Workloads - Part 3: Keycloak

Slide 13


Thank you

Slide 14


Short Demo

Slide 15


Demo Scenario

Slide 16


DR Simulation