Slide 1

Slide 1 text

The Magic of Kubernetes Self-Healing Capabilities
Saad Ali
Senior Software Engineer, Google
May 22, 2019
github.com/saad-ali
twitter.com/the_saad_ali

Slide 2

Slide 2 text

Problem
● Kubernetes manages clusters ranging from a single node up to 1000s of nodes
● Failure is inevitable
● Humans can’t keep up!

Slide 3

Slide 3 text

Kubernetes Self Healing This is where Kubernetes really shines!

Slide 4

Slide 4 text

Agenda
● How Kubernetes Self Healing Works
● Examples of Self Healing in Kubernetes
● Areas for Improvement

Slide 5

Slide 5 text

How Kubernetes Self Healing Works
● Declarative API: Declare intended state.
● Controllers: Observe and rectify.

Slide 6

Slide 6 text

Imperative APIs Node A Node B

Slide 7

Slide 7 text

Imperative APIs Node A Node B Container A

Slide 8

Slide 8 text

How Kubernetes Self Healing Works
Kubernetes APIs are declarative rather than imperative.

Slide 9

Slide 9 text

Declarative APIs
Imperative API - Manual
● You: provide exact set of instructions to drive to desired state
● System: executes instructions
● You: monitor system, and provide further instructions if it deviates.
Declarative API - Automatic
● You: define desired state
● System: works to drive towards that state
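
A minimal sketch of the contrast on this slide, assuming a toy model in which a cluster is just a dict mapping container names to nodes. The names `imperative_run` and `declarative_reconcile` are hypothetical and are not Kubernetes APIs.

```python
# Hypothetical toy model: a "cluster" is a dict of container name -> node.

def imperative_run(cluster, name, node):
    """Imperative: the caller issues one exact instruction; nothing is remembered."""
    cluster[name] = node

def declarative_reconcile(desired, cluster, pick_node):
    """Declarative: drive the cluster toward the recorded desired state."""
    for name in desired:
        if name not in cluster:
            cluster[name] = pick_node()
    for name in list(cluster):
        if name not in desired:
            del cluster[name]

# Imperative: after a failure the caller has to notice and reissue the command.
cluster = {}
imperative_run(cluster, "container-a", "node-a")
cluster.clear()                      # node-a dies; the container is gone
imperative_result = dict(cluster)    # stays empty until a human intervenes

# Declarative: the same failure is repaired by the next reconcile pass.
desired = {"container-a"}
cluster = {}
declarative_reconcile(desired, cluster, lambda: "node-a")
cluster.clear()                      # node-a dies
declarative_reconcile(desired, cluster, lambda: "node-b")
declarative_result = dict(cluster)
```

In the imperative case the system has no record of what was wanted, so recovery depends on the operator. In the declarative case the desired state outlives the failure, which is what lets the system repair itself.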

Slide 10

Slide 10 text

Declarative APIs - Creating a pod Master: API Server The Kubernetes way! ● You: create API object that is persisted on kube API server until deletion ● System: all components work in parallel to drive to that state Node A Node B kubectl create -f replica.yaml

Slide 11

Slide 11 text

Declarative APIs - Creating a pod
The Kubernetes way!
● You: create API object that is persisted on kube API server until deletion
● System: all components work in parallel to drive to that state
Master: API Server Node A Node B
kubectl create -f replica.yaml

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 1
  template:
    metadata:
      ...
    spec:
      ...
      containers:
      - name: nginx
        image: internal.mycorp.com:5000/mycontainer:1.7.9

Slide 12

Slide 12 text

Declarative APIs - Creating a pod The Kubernetes way! ● You: create API object that is persisted on kube API server until deletion ● System: all components work in parallel to drive to that state Master: API Server Node A Node B Pod A definition

Slide 13

Slide 13 text

Declarative APIs - Creating a pod The Kubernetes way! ● You: create API object that is persisted on kube API server until deletion ● System: all components work in parallel to drive to that state Master: API Server Node A Node B Pod A definition Pod A

Slide 14

Slide 14 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler

Slide 15

Slide 15 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler kubectl create pod

Slide 16

Slide 16 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A

Slide 17

Slide 17 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A Node: B

Slide 18

Slide 18 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A Node: B Pod A

Slide 19

Slide 19 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A Node: B Pod A kubectl delete pod A

Slide 20

Slide 20 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler Pod A

Slide 21

Slide 21 text

Declarative APIs - Creating a pod All components watch the Kubernetes API, and figure out what they need to do. Master: API Server Node A Node B Master: Scheduler
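
The create and delete walkthrough on the preceding slides can be sketched as a toy simulation. This is an illustration under stated assumptions, not how the real components are implemented: `ApiServer`, `scheduler_pass`, and `kubelet_pass` are hypothetical names, and polling a shared dict stands in for watching the Kubernetes API.

```python
class ApiServer:
    """Hypothetical stand-in for the API server: it only stores pod objects."""
    def __init__(self):
        self.pods = {}   # pod name -> {"node": assigned node or None}

def scheduler_pass(api, nodes):
    """Scheduler: assign a node to every pod that does not have one yet."""
    for pod in api.pods.values():
        if pod["node"] is None:
            pod["node"] = nodes[0]   # placeholder policy, not real scheduling

def kubelet_pass(api, node, running):
    """Node agent: run the pods assigned to this node and stop all others."""
    wanted = {name for name, pod in api.pods.items() if pod["node"] == node}
    running |= wanted    # start pods that should be running here
    running &= wanted    # stop pods whose API object is gone

api = ApiServer()
running_on_b = set()

api.pods["pod-a"] = {"node": None}          # user creates the pod object
scheduler_pass(api, ["node-b"])             # scheduler sets Node: B
kubelet_pass(api, "node-b", running_on_b)   # node B starts Pod A
after_create = set(running_on_b)

del api.pods["pod-a"]                       # user deletes the pod object
scheduler_pass(api, ["node-b"])
kubelet_pass(api, "node-b", running_on_b)   # node B stops Pod A
after_delete = set(running_on_b)
```

No component tells another what to do. Each one reads the shared state and works out its own next step, which matches the "all components watch the Kubernetes API" description on the slides.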

Slide 22

Slide 22 text

Benefits of Declarative API
Level triggered instead of edge triggered -- no “missing events” issues.
No single point of failure.
Simple master components.
Automatic recovery!
Resulting in a simpler, more robust system that can easily recover from failure of components.
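
A small sketch of why level triggering avoids the "missing events" problem, assuming a toy replica counter. The function names are hypothetical, and the dropped notification is simulated with a list of booleans.

```python
def edge_triggered(deltas, delivered):
    """Edge triggered: apply each change notification that actually arrives."""
    replicas = 0
    for delta, arrived in zip(deltas, delivered):
        if arrived:
            replicas += delta
    return replicas

def level_triggered(desired_replicas, current_replicas):
    """Level triggered: look at the whole desired state and close the gap."""
    return current_replicas + (desired_replicas - current_replicas)

deltas = [+1, +1, +1]              # three "scale up by one" notifications
delivered = [True, False, True]    # the second notification is lost

edge_result = edge_triggered(deltas, delivered)   # drifts: one event was missed
level_result = level_triggered(3, edge_result)    # converges regardless of history
```

The edge-triggered consumer ends up permanently wrong after one lost notification, while the level-triggered one only needs the current desired value to recover.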

Slide 23

Slide 23 text

Controllers
● In memory cache
  ○ Desired State
  ○ Actual State
● Reconciler loop
● Populator -- adds and removes from desired state.
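
The three pieces named on this slide can be sketched as one small class. This is a simplified illustration, not the structure of any particular Kubernetes controller: the `Controller` class, its method names, and the `log` list are assumptions made for the example.

```python
class Controller:
    """Toy controller with the parts listed on the slide."""

    def __init__(self):
        self.desired = set()   # in-memory cache: desired state
        self.actual = set()    # in-memory cache: actual state
        self.log = []          # actions taken, kept for inspection

    def populate(self, api_objects):
        """Populator: add to and remove from desired state to mirror the API."""
        self.desired = set(api_objects)

    def reconcile(self):
        """Reconciler loop body: act on the difference between the two caches."""
        for name in sorted(self.desired - self.actual):
            self.log.append(("start", name))
            self.actual.add(name)
        for name in sorted(self.actual - self.desired):
            self.log.append(("stop", name))
            self.actual.remove(name)

controller = Controller()
controller.populate({"pod-a", "pod-b"})
controller.reconcile()
controller.populate({"pod-b"})     # pod-a was deleted from the API
controller.reconcile()
```

In a real controller `reconcile` would run repeatedly and each action could fail. Because the loop compares full state on every pass, a failed action is simply retried the next time around.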

Slide 24

Slide 24 text

Example of Automatic Recovery Master: API Server Node A Node B Pod A definition Pod A

Slide 25

Slide 25 text

Example of Automatic Recovery Master: API Server Node A Node B Pod A definition Pod A

Slide 26

Slide 26 text

Example of Automatic Recovery Master: API Server Node A Node B Master: Scheduler Master: Node Controller Master: Replica Controller

Slide 27

Slide 27 text

Example of Automatic Recovery Master: API Server Node A Node B Pod A definition Pod A Pod A
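
A rough sketch of the recovery sequence shown on these slides, assuming that Node A fails while one replica is desired. The three functions are hypothetical simplifications of the scheduler, node controller, and replica controller named in the diagram. In particular, the real node controller evicts pods rather than silently dropping them.

```python
def node_controller(pods, healthy_nodes):
    """Drop pods bound to nodes that are no longer healthy (simplified)."""
    return [p for p in pods if p["node"] is None or p["node"] in healthy_nodes]

def replica_controller(pods, replicas):
    """Create unscheduled pods until the desired replica count is met."""
    missing = max(0, replicas - len(pods))
    return pods + [{"node": None} for _ in range(missing)]

def scheduler(pods, healthy_nodes):
    """Bind every unscheduled pod to a healthy node (placeholder policy)."""
    target = sorted(healthy_nodes)[0]
    for pod in pods:
        if pod["node"] is None:
            pod["node"] = target
    return pods

pods = [{"node": "node-a"}]        # Pod A is running on Node A
healthy = {"node-b"}               # Node A has failed

pods = node_controller(pods, healthy)    # Pod A on the dead node is removed
pods = replica_controller(pods, 1)       # a replacement pod object is created
pods = scheduler(pods, healthy)          # the replacement lands on Node B
```

Each controller only fixes the part of the state it owns, and the combined effect is that Pod A reappears on Node B without any human instruction.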

Slide 28

Slide 28 text

Challenges
What if the actual state cache drifts from the real world?

Slide 29

Slide 29 text

Actual State Drift
● Should have a way to observe and rectify Actual State Cache
● Not always easy to implement
  ○ Example: Orphaned volume mounts.
  ○ Room for improvement.
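
One way to "observe and rectify" the actual state cache is to rebuild it periodically from what is really present. The sketch below assumes a hypothetical `observe_real_world` callback that lists the volumes currently mounted. It shows the idea only and is not how Kubernetes handles orphaned mounts.

```python
def resync(actual_cache, observe_real_world):
    """Compare the cached actual state with a fresh observation.

    Returns the corrected cache, the items that exist but were not cached
    (orphans), and the cached items that no longer exist (stale entries).
    """
    observed = set(observe_real_world())
    orphaned = observed - actual_cache
    stale = actual_cache - observed
    return observed, orphaned, stale

# The cache believes only vol-1 and vol-3 are mounted.
cache = {"vol-1", "vol-3"}

# Hypothetical observation: vol-2 was mounted behind the cache's back,
# and vol-3 has disappeared.
def list_mounts():
    return ["vol-1", "vol-2"]

cache, orphaned, stale = resync(cache, list_mounts)
```

The hard part in practice is the observation itself: listing real mounts reliably is what the slide calls "not always easy to implement".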

Slide 30

Slide 30 text

Challenges
Eventual consistency can take a long time.

Slide 31

Slide 31 text

Node Shutdown
● Detection
  ○ 5 minutes
● Force detach
  ○ 6 minutes
● Attach volume
  ○ Seconds to minutes
● Starting new pod
  ○ Seconds to minutes
10+ minutes to detect a shut-down node and move its pods to another node.
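
The "10+ minutes" figure follows from adding the two fixed delays on the slide. A short sketch of that arithmetic, using only the numbers quoted above (the variable names are made up for the example):

```python
# Fixed durations quoted on the slide, in minutes.
fixed_steps = {"detection": 5, "force detach": 6}

# Steps listed as "seconds to minutes" have no fixed figure, so they are
# counted but not added to the total.
variable_steps = ["attach volume", "start new pod"]

lower_bound_minutes = sum(fixed_steps.values())
print(f"at least {lower_bound_minutes} minutes, plus {len(variable_steps)} variable steps")
```

The sum is only a lower bound, because the attach and pod start steps add their own time on top.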

Slide 32

Slide 32 text

Node Shutdown
Kubernetes gives you automatic recovery, NOT high availability!!

Slide 33

Slide 33 text

Thank you! Questions? github.com/saad-ali twitter.com/the_saad_ali