Slide 1

Kubernetes Controllers
Are they loops or events?
Tim Hockin @thockin
v1

Slide 2

Background on “reconciliation”: https://speakerdeck.com/thockin/kubernetes-what-is-reconciliation

Slide 3

Background on “edge vs. level”: https://speakerdeck.com/thockin/edge-vs-level-triggered-logic

Slide 4

Usually when we talk about controllers we refer to them as a “loop”

Slide 5

Imagine a controller for Pods (aka the kubelet). It has two jobs: 1) actuate the pod API, and 2) report status on pods

Slide 6

What you’d expect looks something like:

Slide 7

[Diagram: kubelet, on a Node running pods a, b, c, asks the Kubernetes API: “Get all pods”]

Slide 8

[Diagram: the API returns { name: a, ... }, { name: b, ... }, { name: c, ... }]

Slide 9

[Diagram: kubelet runs its control logic:
for each pod p {
  if p is running { verify p config } else { start p }
  gather status
}]

Slide 10

[Diagram: kubelet reports back to the API: “Set status” for pods a, b, c]

Slide 11

...then repeat (aka “a poll loop”)
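That poll loop can be sketched roughly like this (a minimal sketch; `Pod`, `listPods`, and `reconcile` are illustrative stand-ins, not kubelet's actual code):

```go
package main

import "fmt"

// Pod is a stand-in for the real Pod API object.
type Pod struct {
	Name    string
	Running bool
}

// listPods stands in for "Get all pods" against the API server.
func listPods() []Pod {
	return []Pod{{Name: "a", Running: true}, {Name: "b"}, {Name: "c", Running: true}}
}

// reconcile is one pass of the loop: start what isn't running,
// verify what is, and gather status.
func reconcile(pods []Pod) {
	for _, p := range pods {
		if p.Running {
			fmt.Println("verify config of", p.Name)
		} else {
			fmt.Println("start", p.Name)
		}
		// ...gather status for p and "Set status" back to the API...
	}
}

func main() {
	// A real controller repeats this forever, e.g.:
	//   for { reconcile(listPods()); time.Sleep(period) }
	reconcile(listPods())
}
```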

Slide 12

Here’s where it matters

Slide 13

[Diagram: a user runs “kubectl delete pod b”; the Node still runs pods a, b, c]

Slide 14

[Diagram: the API no longer has pod b; the Node still runs pods a, b, c]

Slide 15

[Diagram: kubelet polls again: “Get all pods”]

Slide 16

[Diagram: the API returns only { name: a, ... } and { name: c, ... }]

Slide 17

[Diagram: kubelet: I have “b” but API doesn’t - delete it!]

Slide 18

[Diagram: kubelet stops pod b and reports “Set status” for a and c]

Slide 19

This is correct level-triggered reconciliation: read the desired state, make it so

Slide 20

Some controllers are implemented this way, but it’s inefficient at scale

Slide 21

Imagine thousands of controllers (kubelet, kube-proxy, dns, ingress, storage...) polling continuously

Slide 22

We need to achieve the same behavior more efficiently

Slide 23

We could poll less often, but then it takes a long (and variable) time to react - not a great UX

Slide 24

Enter the “list-watch” model

Slide 25

[Diagram: kubelet asks the API: “Get all pods”]

Slide 26

[Diagram: the API returns { name: a, ... }, { name: b, ... }, { name: c, ... }]

Slide 27

[Diagram: kubelet stores the result in a local cache: { name: a, ... }, { name: b, ... }, { name: c, ... }]

Slide 28

[Diagram: kubelet opens “Watch all pods” against the API; the cache is retained]

Slide 29

[Diagram: kubelet reconciles from the cache:
for each pod p {
  if p is running { verify p config } else { start p }
  gather status
}]

Slide 30

[Diagram: kubelet reports “Set status” for a, b, c; the cache is unchanged]
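The list-then-watch sequence above can be sketched as follows (a sketch with assumed names; real controllers get this behavior from client-go's informer machinery rather than hand-rolling it):

```go
package main

import "fmt"

type Pod struct{ Name string }

// Event is one item delivered on the open watch stream.
type Event struct {
	Type string // "ADDED", "MODIFIED", or "DELETED"
	Pod  Pod
}

// Cache is the controller's local copy of the API's desired state.
type Cache map[string]Pod

// fill seeds the cache from the initial LIST ("Get all pods").
func (c Cache) fill(pods []Pod) {
	for _, p := range pods {
		c[p.Name] = p
	}
}

// apply keeps the cache current using one WATCH event.
func (c Cache) apply(e Event) {
	switch e.Type {
	case "ADDED", "MODIFIED":
		c[e.Pod.Name] = e.Pod
	case "DELETED":
		delete(c, e.Pod.Name)
	}
}

func main() {
	cache := Cache{}
	cache.fill([]Pod{{"a"}, {"b"}, {"c"}}) // 1) LIST once
	// 2) WATCH: every subsequent change arrives as an event.
	cache.apply(Event{Type: "ADDED", Pod: Pod{"d"}})
	fmt.Println(len(cache)) // 4
}
```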

Slide 31

We trade memory (the cache) for other resources (API server CPU in particular)

Slide 32

There’s no point in polling my own cache, so what happens next?

Slide 33

Remember that watch we did earlier? That’s an open stream for events.

Slide 34

[Diagram: a user runs “kubectl delete pod b”; kubelet’s cache still holds a, b, c]

Slide 35

[Diagram: the API no longer has pod b; the cache still holds a, b, c]

Slide 36

[Diagram: the watch delivers an event: Delete { name: b, ... }]

Slide 37

[Diagram: kubelet removes b from the cache, which now holds a and c]

Slide 38

[Diagram: kubelet: API said to delete pod “b”. The Node still runs pods a, b, c]

Slide 39

[Diagram: kubelet stops pod b; the Node now runs only a and c]

Slide 40

“But you said edge-triggered is bad!”

Slide 41

It is! But this isn’t edge-triggered.

Slide 42

The cache is updated by events (edges), but we are still reconciling state
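One way to picture this (a minimal sketch with illustrative names, not kubelet's real code): the watch event only updates the cache, and reconciliation then re-derives all actions from the full cached state, never from the event itself.

```go
package main

import "fmt"

type Pod struct{ Name string }

// Event is one item from the watch stream (the "edge").
type Event struct {
	Type string // "ADDED", "MODIFIED", or "DELETED"
	Pod  Pod
}

// apply folds one event into the desired-state cache.
func apply(desired map[string]Pod, e Event) {
	switch e.Type {
	case "ADDED", "MODIFIED":
		desired[e.Pod.Name] = e.Pod
	case "DELETED":
		delete(desired, e.Pod.Name)
	}
}

// reconcile is level-triggered: it compares full desired state against
// what's actually running, ignoring which event woke it up.
func reconcile(desired map[string]Pod, running map[string]bool) {
	for name := range desired {
		if !running[name] {
			running[name] = true // start a missing pod
		}
	}
	for name := range running {
		if _, ok := desired[name]; !ok {
			delete(running, name) // stop an unwanted pod
		}
	}
}

// onEvent is the whole pattern: the edge only updates the cache, then
// the same level-triggered reconcile runs again.
func onEvent(desired map[string]Pod, running map[string]bool, e Event) {
	apply(desired, e)
	reconcile(desired, running)
}

func main() {
	desired := map[string]Pod{"a": {"a"}, "b": {"b"}, "c": {"c"}}
	running := map[string]bool{"a": true, "b": true, "c": true}
	onEvent(desired, running, Event{Type: "DELETED", Pod: Pod{"b"}})
	fmt.Println(running) // map[a:true c:true]
}
```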

Slide 43

“???”

Slide 44

The controller can be restarted at any time and the cache will be reconstructed - we can’t “miss an edge”*
* modulo bugs, read on

Slide 45

Even if you miss an event, you can still recover the state

Slide 46

Ultimately it’s all just software, and software has bugs. Controllers should re-list periodically to get full state...
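A common defensive pattern (sketched here with illustrative names; client-go's actual resync/relist behavior differs in detail) is to rebuild the cache from a fresh LIST on a timer, repairing anything a missed or misapplied event left behind:

```go
package main

import "fmt"

type Pod struct{ Name string }

// relist discards the cache and rebuilds it from a full LIST, so any
// state corrupted by a missed or buggy event is repaired.
func relist(list func() []Pod) map[string]Pod {
	fresh := make(map[string]Pod)
	for _, p := range list() {
		fresh[p.Name] = p
	}
	return fresh
}

func main() {
	// Pretend the watch silently dropped b's delete and c's add:
	cache := map[string]Pod{"a": {"a"}, "b": {"b"}}
	listPods := func() []Pod { return []Pod{{"a"}, {"c"}} } // the API's truth
	// A real controller would do this on a timer, e.g. every few minutes.
	cache = relist(listPods)
	fmt.Println(len(cache)) // 2
}
```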

Slide 47

...but we’ve put a lot of energy into making sure that our list-watch is reliable.