Kubernetes Controllers
Are they loops or events?
Tim Hockin (@thockin)
v1
Background on “reconciliation”: https://speakerdeck.com/thockin/kubernetes-what-is-reconciliation
Background on “edge vs. level”: https://speakerdeck.com/thockin/edge-vs-level-triggered-logic
Usually when we talk about controllers we refer to them as a “loop”
Imagine a controller for Pods (aka kubelet). It has 2 jobs:
1) Actuate the pod API
2) Report status on pods
What you’d expect looks something like:
[Diagram: kubelet on a Node running pods a, b, c, talking to the Kubernetes API]
1) kubelet → API: Get all pods
2) API → kubelet: { name: a, ... }, { name: b, ... }, { name: c, ... }
3) kubelet, for each pod p:
   if p is running { verify p config } else { start p }
   gather status
4) kubelet → API: Set status
...then repeat (aka “a poll loop”)
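One pass of that poll loop might look like the following sketch. This is illustrative Go, not kubelet source; `Pod`, `pollOnce`, and the `running` map are hypothetical stand-ins for the real machinery.

```go
package main

import "fmt"

// Pod is a stand-in for the real Pod API object.
type Pod struct{ Name string }

// pollOnce performs one iteration of the naive poll loop:
// fetch the full desired state, actuate it, and gather status.
func pollOnce(listPods func() []Pod, running map[string]bool) []string {
	var statuses []string
	for _, p := range listPods() { // "Get all pods"
		if running[p.Name] {
			// pod is already running: verify its config (elided)
		} else {
			running[p.Name] = true // start the pod
		}
		statuses = append(statuses, p.Name+": Running") // gather status
	}
	return statuses // reported back via "Set status"
}

func main() {
	running := map[string]bool{"a": true}
	apiList := func() []Pod { return []Pod{{"a"}, {"b"}, {"c"}} }
	// In a real controller this repeats forever, with a sleep in between.
	fmt.Println(pollOnce(apiList, running))
	fmt.Println(running["b"], running["c"]) // both started: true true
}
```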
Here’s where it matters
[Diagram: the same poll loop, after a user runs `kubectl delete pod b`]
1) kubectl → API: delete pod b (b is gone from the API, but still running on the Node)
2) kubelet → API: Get all pods
3) API → kubelet: { name: a, ... }, { name: c, ... }
4) kubelet: I have “b” but the API doesn’t - delete it!
5) kubelet → API: Set status
This is correct level-triggered reconciliation: read desired state, make it so
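Both halves of that reconciliation (start what's missing, delete what's orphaned) can be sketched together. Again this is illustrative Go with hypothetical names, not kubelet source:

```go
package main

import "fmt"

// reconcile makes the local "running" set match the desired list:
// start pods the API has that we don't, delete pods we have that it doesn't.
func reconcile(desired []string, running map[string]bool) {
	want := map[string]bool{}
	for _, name := range desired {
		want[name] = true
		if !running[name] {
			running[name] = true // start it
		}
	}
	for name := range running {
		if !want[name] {
			// "I have it but the API doesn't - delete it!"
			delete(running, name)
		}
	}
}

func main() {
	running := map[string]bool{"a": true, "b": true, "c": true}
	reconcile([]string{"a", "c"}, running) // the API no longer lists "b"
	fmt.Println(running) // map[a:true c:true]
}
```

Note that the logic never looks at *what changed*, only at the full desired and actual states; that is what makes it level-triggered.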
Some controllers are implemented this way, but it’s inefficient at scale
Imagine thousands of controllers (kubelet, kube-proxy, dns, ingress, storage...) polling continuously
We need to achieve the same behavior more efficiently
We could poll less often, but then it takes a long (and variable) time to react - not a great UX
Enter the “list-watch” model
[Diagram: the kubelet adds a local cache]
1) kubelet lists all pods; the result seeds a local cache: { name: a, ... }, { name: b, ... }, { name: c, ... }
2) kubelet → API: Watch all pods
3) kubelet reconciles from the cache, for each pod p:
   if p is running { verify p config } else { start p }
   gather status
4) kubelet → API: Set status
We trade memory (the cache) for other resources (API server CPU in particular)
There’s no point in polling my own cache, so what happens next?
Remember that watch we did earlier? That’s an open stream for events.
[Diagram: the list-watch model, after a user runs `kubectl delete pod b`]
1) kubectl → API: delete pod b (b is gone from the API, but still running on the Node)
2) API → kubelet, on the watch stream: Delete: { name: b, ... }
3) kubelet updates the cache: { name: a, ... }, { name: c, ... }
4) kubelet: the API said to delete pod “b” - so it stops pod b on the Node
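That step of folding watch events into the cache can be sketched as follows. This is a hand-rolled illustration, not the real client-go informer machinery; `Event` and `applyEvent` are hypothetical, though ADDED/MODIFIED/DELETED match the Kubernetes watch event types.

```go
package main

import "fmt"

// Event is one item on the watch stream.
type Event struct {
	Type string // "ADDED", "MODIFIED", or "DELETED"
	Name string
}

// applyEvent folds one watch event into the local cache.
// After each update, the controller reconciles against the
// cache instead of re-polling the API server.
func applyEvent(cache map[string]bool, e Event) {
	switch e.Type {
	case "ADDED", "MODIFIED":
		cache[e.Name] = true
	case "DELETED":
		delete(cache, e.Name)
	}
}

func main() {
	// Cache seeded by the initial LIST.
	cache := map[string]bool{"a": true, "b": true, "c": true}
	applyEvent(cache, Event{"DELETED", "b"}) // kubectl delete pod b
	fmt.Println(cache) // map[a:true c:true]
}
```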
“But you said edge-triggered is bad!”
It is! But this isn’t edge-triggered.
The cache is updated by events (edges) but we are still reconciling state
“???”
The controller can be restarted at any time and the cache will be reconstructed - we can’t “miss an edge*”
* modulo bugs, read on
Even if you miss an event, you can still recover the state
Ultimately it’s all just software, and software has bugs. Controllers should re-list periodically to get full state...
...but we’ve put a lot of energy into making sure that our list-watch is reliable.
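That defensive re-list can be sketched as rebuilding the cache wholesale from a fresh LIST, which heals any event the watch may have dropped. As before, this is an illustrative sketch with hypothetical names:

```go
package main

import "fmt"

// relist rebuilds the cache from a full LIST, recovering from any
// watch events that were missed due to bugs or dropped connections.
func relist(listPods func() []string) map[string]bool {
	fresh := map[string]bool{}
	for _, name := range listPods() {
		fresh[name] = true
	}
	return fresh // replaces the old cache; reconcile runs against it
}

func main() {
	// Suppose the watch missed the deletion of "b": the stale cache
	// would still hold it, but a periodic re-list corrects that.
	cache := relist(func() []string { return []string{"a", "c"} })
	fmt.Println(cache) // map[a:true c:true]
}
```

Because reconciliation is level-triggered, swapping in the fresh cache is safe: the next pass simply reads the new state and makes it so.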