Slide 1

Slide 1 text

The Magic of Kubernetes Self-Healing Capabilities
Saad Ali
Senior Software Engineer, Google
May 22, 2019

Slide 2

Slide 2 text

Problem
● Kubernetes manages clusters ranging from a single node up to 1000s of nodes
● Failure is inevitable
● Humans can’t keep up!

Slide 3

Slide 3 text

Kubernetes Self Healing This is where Kubernetes really shines!

Slide 4

Slide 4 text

Agenda
● How Kubernetes Self Healing Works
● Examples of Self Healing in Kubernetes
● Areas for Improvement

Slide 5

Slide 5 text

How Kubernetes Self Healing Works
● Declarative API: Declare intended state.
● Controllers: Observe and rectify.

Slide 6

Slide 6 text

Imperative APIs Node A Node B

Slide 7

Slide 7 text

Imperative APIs Node A Node B Container A

Slide 8

Slide 8 text

How Kubernetes Self Healing Works
Kubernetes APIs are declarative rather than imperative.

Slide 9

Slide 9 text

Declarative APIs
Imperative API - Manual
● You: provide the exact set of instructions to drive to the desired state
● System: executes instructions
● You: monitor the system, and provide further instructions if it deviates
Declarative API - Automatic
● You: define the desired state
● System: works to drive towards that state
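The contrast between the two API styles can be sketched with a toy model in Python. Everything here is hypothetical (the class names and the `start_container`, `declare`, `tick`, and `crash` methods); it is not the Kubernetes API, only an illustration of who is responsible for noticing a deviation.

```python
class ImperativeSystem:
    """Executes instructions and remembers nothing about intent."""

    def __init__(self):
        self.running = 0

    def start_container(self):
        self.running += 1

    def crash(self):
        self.running = 0  # stays at 0 until a human issues a new instruction


class DeclarativeSystem:
    """Stores the desired state and keeps working toward it."""

    def __init__(self):
        self.desired_replicas = 0
        self.running = 0

    def declare(self, replicas):
        self.desired_replicas = replicas

    def crash(self):
        self.running = 0

    def tick(self):
        # One step toward the declared state; the system calls this repeatedly.
        if self.running < self.desired_replicas:
            self.running += 1
        elif self.running > self.desired_replicas:
            self.running -= 1


imperative = ImperativeSystem()
imperative.start_container()
imperative.crash()            # nothing brings the container back

declarative = DeclarativeSystem()
declarative.declare(replicas=1)
declarative.tick()
declarative.crash()
declarative.tick()            # recovers without any new instruction
```

After the crash, the imperative model stays at zero running containers, while the declarative model returns to the declared replica count on its next tick.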

Slide 10

Slide 10 text

Declarative APIs - Creating a pod
The Kubernetes way!
● You: create an API object that is persisted on the kube API server until deletion
● System: all components work in parallel to drive to that state
[Diagram: Master: API Server; Node A; Node B]
kubectl create -f replica.yaml

Slide 11

Slide 11 text

Declarative APIs - Creating a pod
The Kubernetes way!
● You: create an API object that is persisted on the kube API server until deletion
● System: all components work in parallel to drive to that state
[Diagram: Master: API Server; Node A; Node B]
kubectl create -f replica.yaml
    apiVersion: apps/v1
    kind: ReplicaSet
    metadata:
      name: frontend
    spec:
      replicas: 1
      template:
        metadata: ...
        spec:
          ...
          containers:
          - name: nginx
            image:

Slide 12

Slide 12 text

Declarative APIs - Creating a pod
The Kubernetes way!
● You: create an API object that is persisted on the kube API server until deletion
● System: all components work in parallel to drive to that state
[Diagram: Master: API Server; Node A; Node B; Pod A definition]

Slide 13

Slide 13 text

Declarative APIs - Creating a pod
The Kubernetes way!
● You: create an API object that is persisted on the kube API server until deletion
● System: all components work in parallel to drive to that state
[Diagram: Master: API Server; Node A; Node B; Pod A definition; Pod A]

Slide 14

Slide 14 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler

Slide 15

Slide 15 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler kubectl create pod

Slide 16

Slide 16 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A

Slide 17

Slide 17 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A Node: B

Slide 18

Slide 18 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A Node: B Pod A

Slide 19

Slide 19 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A Node: B Pod A kubectl delete pod A

Slide 20

Slide 20 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A

Slide 21

Slide 21 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler
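The create-then-delete walkthrough in these slides can be sketched as a toy simulation. This is a simplification under stated assumptions: the dictionary `api` stands in for the API server's store, `scheduler_pass` and `kubelet_pass` are hypothetical names, polling passes replace real watches, and the "fewest pods" placement policy is invented for illustration.

```python
def scheduler_pass(api, nodes):
    """Scheduler: bind every pod that has no node yet (toy policy: fewest pods)."""
    for pod in api.values():
        if pod["node"] is None:
            load = {n: sum(1 for p in api.values() if p["node"] == n) for n in nodes}
            pod["node"] = min(nodes, key=lambda n: load[n])


def kubelet_pass(api, node, running):
    """Kubelet on `node`: run exactly the pods the API binds to this node."""
    wanted = {name for name, pod in api.items() if pod["node"] == node}
    running.intersection_update(wanted)  # stop pods deleted from the API
    running.update(wanted)               # start pods newly bound here


api = {}                                   # stands in for the API server's store
nodes = ["node-a", "node-b"]
running = {node: set() for node in nodes}  # what each node actually runs


def run_all_components():
    scheduler_pass(api, nodes)
    for node in nodes:
        kubelet_pass(api, node, running[node])


api["pod-a"] = {"node": None}   # like `kubectl create`: only writes the object
run_all_components()
bound_node = api["pod-a"]["node"]
started = "pod-a" in running[bound_node]

del api["pod-a"]                # like `kubectl delete`: only removes the object
run_all_components()
```

No component is told what to do directly: the client only writes to or deletes from the store, and each component works out its own part from what it sees there.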

Slide 22

Slide 22 text

Benefits of Declarative API
Level triggered instead of edge triggered -- no “missing events” issues.
No single point of failure.
Simple master components.
Automatic recovery!
Resulting in a simpler, more robust system that can easily recover from failure of components.
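The "missing events" point can be illustrated with a small sketch. The function names and the event tuples are hypothetical; the only claim is the general one from the slide: an edge-triggered consumer depends on receiving every change notification, while a level-triggered consumer only needs the current state.

```python
def edge_triggered(running, delivered_events):
    """Apply each change notification; a lost notification is lost for good."""
    for kind, name in delivered_events:
        if kind == "ADDED":
            running.add(name)
        elif kind == "DELETED":
            running.discard(name)


def level_triggered(running, desired):
    """Ignore history; compare the current desired state with what is running."""
    running.intersection_update(desired)
    running.update(desired)


desired = {"pod-a", "pod-b"}
all_events = [("ADDED", "pod-a"), ("ADDED", "pod-b")]
delivered = all_events[:1]  # the second event is dropped, e.g. during a restart

edge_state = set()
edge_triggered(edge_state, delivered)   # never learns about pod-b

level_state = set()
level_triggered(level_state, desired)   # converges regardless of lost events
```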

Slide 23

Slide 23 text

Controllers
● In memory cache
  ○ Desired State
  ○ Actual State
● Reconciler loop
● Populator -- adds and removes from desired state.
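The controller structure listed above can be sketched as a small class. This is a minimal illustration, not the code of any real Kubernetes controller: the `Controller` class, its method names, and the three callables passed to it are all hypothetical.

```python
class Controller:
    def __init__(self, list_api_objects, start, stop):
        # In-memory cache: two sets of object names.
        self.desired = set()
        self.actual = set()
        self._list_api_objects = list_api_objects
        self._start = start
        self._stop = stop

    def populate(self):
        """Populator: make the desired state mirror what the API currently holds."""
        self.desired = set(self._list_api_objects())

    def reconcile(self):
        """Reconciler: act on the difference between desired and actual state."""
        for name in sorted(self.desired - self.actual):
            self._start(name)
            self.actual.add(name)
        for name in sorted(self.actual - self.desired):
            self._stop(name)
            self.actual.discard(name)


api_objects = {"vol-1"}
log = []
controller = Controller(
    list_api_objects=lambda: api_objects,
    start=lambda name: log.append(("start", name)),
    stop=lambda name: log.append(("stop", name)),
)

controller.populate()
controller.reconcile()      # starts vol-1

api_objects.clear()         # the object is deleted from the API
controller.populate()
controller.reconcile()      # stops vol-1
```

In a running system, `populate` and `reconcile` would be called repeatedly in a loop; two manual rounds are enough to show the division of labor.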

Slide 24

Slide 24 text

Example of Automatic Recovery Master: API Server Node A Node B Pod A definition Pod A

Slide 25

Slide 25 text

Example of Automatic Recovery Master: API Server Node A Node B Pod A definition Pod A

Slide 26

Slide 26 text

Example of Automatic Recovery Master: API Server Node A Node B Master: Scheduler Master: Node Controller Master: Replica Controller

Slide 27

Slide 27 text

Example of Automatic Recovery Master: API Server Node A Node B Pod A definition Pod A Pod A

Slide 28

Slide 28 text

Challenges
What if the actual state cache drifts from the real world?

Slide 29

Slide 29 text

Actual State Drift
● Should have a way to observe and rectify Actual State Cache
● Not always easy to implement
  ○ Example: Orphaned volume mounts.
  ○ Room for improvement.
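The orphaned-mount example can be sketched as follows. The names `reconcile`, `resync`, and `observe_real_world` are hypothetical, and the sketch assumes the real world can be observed cheaply, which the slide notes is not always easy in practice.

```python
def reconcile(desired, actual_cache, unmount):
    """Unmount whatever the cache says is mounted but is no longer desired."""
    for name in sorted(actual_cache - desired):
        unmount(name)
        actual_cache.discard(name)


def resync(actual_cache, observe_real_world):
    """Rectify the cache by replacing it with what is really there."""
    observed = set(observe_real_world())
    actual_cache.clear()
    actual_cache.update(observed)


real_mounts = {"vol-old"}   # left behind, e.g. mounted before the cache was lost
desired = set()
actual_cache = set()        # the controller does not know about vol-old

reconcile(desired, actual_cache, real_mounts.discard)
orphan_survived = "vol-old" in real_mounts   # reconciling the stale cache does nothing

resync(actual_cache, lambda: real_mounts)    # observe the real world first
reconcile(desired, actual_cache, real_mounts.discard)
```

Without the resync step the reconciler has nothing to act on, because the orphan appears in neither the desired state nor the cached actual state.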

Slide 30

Slide 30 text

Challenges
Eventual consistency can take a long time.

Slide 31

Slide 31 text

Node Shutdown
● Detection
  ○ 5 minutes
● Force detach
  ○ 6 minutes
● Attach volume
  ○ Seconds to minutes
● Starting new pod
  ○ Seconds to minutes
10+ minutes to detect a shutdown node and move its pods elsewhere.
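The "10+ minutes" figure can be checked with a short calculation, assuming the four stages run strictly one after another (which the slide's total suggests). The 5 and 6 minute values come from the slide; the 30 second values below are hypothetical placeholders for the two "seconds to minutes" stages.

```python
from datetime import timedelta

DETECTION = timedelta(minutes=5)     # from the slide
FORCE_DETACH = timedelta(minutes=6)  # from the slide


def recovery_time(attach, start_pod):
    """Total time if the four stages run strictly one after another."""
    return DETECTION + FORCE_DETACH + attach + start_pod


# Lower bound: the two variable stages take no time at all.
lower_bound = recovery_time(timedelta(0), timedelta(0))

# Hypothetical values for the two "seconds to minutes" stages.
example = recovery_time(attach=timedelta(seconds=30), start_pod=timedelta(seconds=30))
```

The fixed stages alone add up to 11 minutes, so the variable stages only push the total further past the 10 minute mark.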

Slide 32

Slide 32 text

Node Shutdown
Kubernetes gives you automatic recovery, NOT high availability!

Slide 33

Slide 33 text

Thank you! Questions?