Slide 1

Improving Availability for Stateful Applications

Michelle Au
Software Engineer, Google

Slide 2

Agenda
- Persistent storage options
- Building highly available stateful applications
  - Failure domain spreading
  - Demo
  - Pod downtime and recovery

Slide 3

Persistent Storage Options

Slide 4

Supported Storage Systems
- In-tree drivers: https://kubernetes.io/docs/concepts/storage/#types-of-volumes
  - Over 15!
- CSI drivers: https://kubernetes-csi.github.io/docs/drivers.html
  - Over 35!
- Wide range of characteristics
  - Local vs remote, cloud vs appliance vs software-defined, distributed vs hyper-converged, etc.

Slide 5

Storage Characteristics
- Accessibility: at what granularity does your app have to be co-located with storage?
- Availability: at what granularity is storage still available during an outage?
- Durability: under what conditions could my data be lost?
- Access mode: how many nodes can access the volume concurrently?
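In Kubernetes, the access mode is requested on the PersistentVolumeClaim. A minimal sketch, with a hypothetical name and size:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data          # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce        # single-node access; ReadWriteMany allows many nodes concurrently
  resources:
    requests:
      storage: 10Gi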

Slide 6

Storage Characteristics
- Performance: read/write/mixed IOPS and throughput
- Cost: including operation and maintenance

Slides 7-11

Examples

| Example               | Accessibility | Availability  | Durability   | Access Mode | Performance | Cost  |
| Local disk            | Single node   | Single node   | Single disk* | Single node | Best        | $     |
| Cloud disk            | Single zone   | Single zone   | 3x           | Single node | Better      | $$    |
| Replicated cloud disk | Multi zone    | Multi zone    | 3x           | Single node | Good        | $$$   |
| Single NFS            | Global        | Single server | Varies       | Multi node  | Good        | $$$   |
| Scaleout/HA Filer     | Global        | Global        | Varies       | Multi node  | Varies      | $$$$  |

* Most cloud local disks are not durable beyond the VM

Slide 12

Building Highly-Available Stateful Applications

Slide 13

Pod Anti-Affinity

Spread replicas across failure domains:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: failure-domain.beta.kubernetes.io/zone
      labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - my-app
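Note: failure-domain.beta.kubernetes.io/zone was the zone label at the time of this talk; on newer clusters the equivalent key is topology.kubernetes.io/zone.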

Slide 14

12 Factor Model
- All replicas share the same data
  - Example: Content Management Systems (CMS)
- Need high availability at the storage layer
  - Multi-writer
  - Globally accessible and available
  - Example: Scaleout/HA filer
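A minimal sketch of this model, assuming a hypothetical ReadWriteMany-capable StorageClass named "filer" and a placeholder image; every replica mounts the same volume, and the anti-affinity rule from the earlier slide spreads replicas across zones:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content              # hypothetical name
spec:
  accessModes:
  - ReadWriteMany                   # multi-writer: all replicas mount the same volume
  resources:
    requests:
      storage: 100Gi
  storageClassName: filer           # assumption: an RWX-capable class (e.g., an HA filer)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cms
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:            # one replica per zone
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: failure-domain.beta.kubernetes.io/zone
            labelSelector:
              matchLabels:
                app: my-app
      containers:
      - name: cms
        image: example.com/my-cms:latest   # placeholder image
        volumeMounts:
        - name: content
          mountPath: /var/www
      volumes:
      - name: content
        persistentVolumeClaim:
          claimName: shared-content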

Slides 15-16

Diagram: a Deployment with three replica Pods, one per zone (Zone A, Zone B, Zone C), all sharing a single PVC.

Slide 17

Distributed Model
- Shard and replicate data between pods
  - Example: Cassandra, MongoDB
- Do not need high availability at the storage layer
  - Single writer
  - Non-global accessibility and availability
  - Example: Local disks, cloud disks
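A minimal sketch of this model, with placeholder names and a hypothetical zonal-disk StorageClass "ssd": each replica gets its own PVC from volumeClaimTemplates, and anti-affinity spreads the replicas across zones:

apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None                   # headless Service required by the StatefulSet
  selector:
    app: my-db
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: my-db
  template:
    metadata:
      labels:
        app: my-db
    spec:
      affinity:
        podAntiAffinity:            # one replica per zone
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: failure-domain.beta.kubernetes.io/zone
            labelSelector:
              matchLabels:
                app: my-db
      containers:
      - name: db
        image: example.com/my-db:latest    # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/db
  volumeClaimTemplates:             # each Pod gets its own PVC: data-db-0, data-db-1, ...
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce               # single-node access is enough in this model
      resources:
        requests:
          storage: 100Gi
      storageClassName: ssd         # assumption: a zonal disk class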

Slides 18-19

Diagram: a StatefulSet with Pod-0/PVC-0 in Zone A, Pod-1/PVC-1 in Zone B, and Pod-2/PVC-2 in Zone C; each Pod has its own PVC.

Slide 20

Volume Topology
- Scheduler understands volume accessibility constraints
  - No user configuration needed
  - Storage driver provides topology
- Auto-scale replicas and dynamically provision volumes across zones (except local)
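Topology-aware provisioning hinges on the StorageClass's volume binding mode: with WaitForFirstConsumer, volume creation is delayed until a Pod is scheduled, so the disk is provisioned in that Pod's zone. A sketch using the GCE PD in-tree provisioner as an example:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware              # hypothetical name
provisioner: kubernetes.io/gce-pd   # example; any topology-aware driver works
volumeBindingMode: WaitForFirstConsumer   # provision in the zone the scheduler picks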

Slide 21

Demo

Diagram: a cluster spanning Zone A, Zone B, and Zone C.

Slide 22

Downtime

Downtime = time to detect the failure + time to replace the Pod
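As a rough worked example under common defaults (all of these are configurable and vary by cluster): the control plane marks a node NotReady after about 40 seconds without heartbeats, and Pods get a default 300-second toleration for the resulting not-ready/unreachable taints, so a replacement Deployment Pod may not start for 5-6 minutes. The toleration can be tightened per Pod:

tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30    # evict after 30s instead of the 300s default
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30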

Slide 23

StatefulSet Caveat
- Stateful applications may require exactly-once semantics
  - Two containers cannot write to the same volume
- During a split brain, the replacement Pod cannot be started
  - Node fencing can help
- StatefulSet pod recovery can be long
  - Minutes: automated
  - Hours: manual
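Manual recovery means confirming the node is really gone and then force-deleting the Pod so the StatefulSet controller can create a replacement; this is only safe if you are certain the old Pod can no longer write. A sketch, assuming a hypothetical StatefulSet Pod db-0 stuck on a dead node:

# Only after confirming (e.g., in the cloud console) that the node is dead:
kubectl delete pod db-0 --force --grace-period=0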

Slide 24

Summary
- Kubernetes features for high availability
  - Volume topology, pod anti-affinity, node taints
- Stateful application models with pod anti-affinity
  - Deployment vs StatefulSet
  - Storage redundancy vs application redundancy
- Design for redundancy and account for downtime

Slide 25

Additional Resources
- Deployments and StatefulSets
- Pod anti-affinity
- Even pod spreading design proposal
- Volume topology blog post
- Node taints and tolerations
- Node fencing discussions

Slide 26

Get Involved
- Kubernetes Special Interest Groups (SIGs)
  - sig-storage, sig-apps, sig-node, sig-scheduling
  - Community meetings, Slack
- Me
  - GitHub/Slack: msau42
  - Twitter: _msau42_

Slide 27

Questions?
