
Kubernetes Controllers - are they loops or events?

Tim Hockin
February 20, 2021


Transcript

  1. Kubernetes Controllers: Are they loops or events? Tim Hockin @thockin (v1)
  2. Background on “reconciliation”: https://speakerdeck.com/thockin/kubernetes-what-is-reconciliation

  3. Background on “edge vs. level”: https://speakerdeck.com/thockin/edge-vs-level-triggered-logic

  4. Usually when we talk about controllers we refer to them as a “loop”

  5. Imagine a controller for Pods (aka kubelet). It has 2 jobs: 1) Actuate the pod API 2) Report status on pods
  6. What you’d expect looks something like:

  7. (diagram: on the node, kubelet asks the Kubernetes API: Get all pods)

  8. (diagram: the API returns { name: a, ... } { name: b, ... } { name: c, ... })

  9. (diagram: kubelet runs) for each pod p { if p is running { verify p config } else { start p } gather status }

  10. (diagram: kubelet sets status for pods a, b, c)
  11. ...then repeat (aka “a poll loop”)
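The poll loop on the slides above can be sketched as a toy Go program. This is not kubelet code; `fakeAPI`, `Pod`, and `reconcileOnce` are invented stand-ins for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a minimal stand-in for the real Pod API object.
type Pod struct {
	Name    string
	Running bool
}

// fakeAPI stands in for the Kubernetes API server.
type fakeAPI struct{ pods map[string]Pod }

func (a *fakeAPI) GetAllPods() []Pod {
	out := []Pod{}
	for _, p := range a.pods {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// reconcileOnce is one iteration of the poll loop from the slides:
// read desired state, then for each pod verify or start it, and gather status.
func reconcileOnce(api *fakeAPI, node map[string]Pod) []string {
	var status []string
	for _, p := range api.GetAllPods() {
		local, ok := node[p.Name]
		if ok && local.Running {
			// verify p config (elided in this sketch)
		} else {
			node[p.Name] = Pod{Name: p.Name, Running: true} // start p
		}
		status = append(status, p.Name+"=Running")
	}
	return status
}

func main() {
	api := &fakeAPI{pods: map[string]Pod{"a": {Name: "a"}, "b": {Name: "b"}, "c": {Name: "c"}}}
	node := map[string]Pod{}
	// One pass; a real poll loop would sleep and repeat.
	fmt.Println(reconcileOnce(api, node)) // [a=Running b=Running c=Running]
}
```

In the real kubelet the repeat is continuous; here a single pass shows the shape of each iteration.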

  12. Here’s where it matters

  13. (diagram: kubectl delete pod b)

  14. (diagram: pod b is gone from the API, but the kubelet still runs a, b, c)

  15. (diagram: kubelet asks again: Get all pods)

  16. (diagram: the API returns { name: a, ... } { name: c, ... })

  17. (diagram: kubelet: I have “b” but API doesn’t - delete it!)

  18. (diagram: kubelet deletes b and sets status for a and c)

  19. This is correct level-triggered reconciliation: read desired state, make it so
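The delete-on-missing behavior from these slides can be sketched in toy Go. `syncPods` is an invented name; it compares the desired set from the API against what the node is running, with no event needed:

```go
package main

import (
	"fmt"
	"sort"
)

// syncPods brings the node's local pods in line with the desired set from
// the API: start anything missing, and delete anything the API no longer
// lists ("I have b but API doesn't - delete it!").
func syncPods(desired []string, node map[string]bool) (started, deleted []string) {
	want := map[string]bool{}
	for _, name := range desired {
		want[name] = true
		if !node[name] {
			node[name] = true // start p
			started = append(started, name)
		}
	}
	for name := range node {
		if !want[name] {
			delete(node, name) // API doesn't have it - delete it
			deleted = append(deleted, name)
		}
	}
	sort.Strings(started)
	sort.Strings(deleted)
	return started, deleted
}

func main() {
	node := map[string]bool{"a": true, "b": true, "c": true}
	// After `kubectl delete pod b`, the next poll simply sees desired = [a c].
	started, deleted := syncPods([]string{"a", "c"}, node)
	fmt.Println(started, deleted) // [] [b]
}
```

Note that the kubelet never saw a "delete" message; it only compared desired state to actual state, which is what makes this level-triggered.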
  20. Some controllers are implemented this way, but it’s inefficient at scale

  21. Imagine thousands of controllers (kubelet, kube-proxy, dns, ingress, storage...) polling continuously
  22. We need to achieve the same behavior more efficiently

  23. We could poll less often, but then it takes a long (and variable) time to react - not a great UX
  24. Enter the “list-watch” model

  25. (diagram: kubelet asks the Kubernetes API: Get all pods)

  26. (diagram: the API returns { name: a, ... } { name: b, ... } { name: c, ... })

  27. (diagram: kubelet stores the result - Cache: { name: a, ... } { name: b, ... } { name: c, ... })

  28. (diagram: kubelet opens a watch: Watch all pods; the cache is retained)

  29. (diagram: kubelet reconciles from its cache) for each pod p { if p is running { verify p config } else { start p } gather status }

  30. (diagram: kubelet sets status for a, b, c)
  31. We trade memory (the cache) for other resources (API server CPU in particular)

  32. There’s no point in polling my own cache, so what happens next?
  33. Remember that watch we did earlier? That’s an open stream for events.
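The list-watch pattern can be sketched in toy Go: seed a cache from an initial list, then keep it fresh by applying watch events. `Event` and `Cache` here are simplified stand-ins for what client-go's informer machinery does:

```go
package main

import "fmt"

// Event mirrors what arrives on the watch stream: an event type plus the object.
type Event struct {
	Type string // "Added", "Modified", or "Deleted"
	Name string
}

// Cache is the controller's local copy of the API state, built from an
// initial list and kept up to date by watch events.
type Cache map[string]bool

// Apply folds one watch event into the cache.
func (c Cache) Apply(e Event) {
	switch e.Type {
	case "Added", "Modified":
		c[e.Name] = true
	case "Deleted":
		delete(c, e.Name)
	}
}

func main() {
	// 1) List: seed the cache with everything the API knows about.
	cache := Cache{}
	for _, name := range []string{"a", "b", "c"} {
		cache.Apply(Event{Type: "Added", Name: name})
	}
	// 2) Watch: events arrive on the open stream and update the cache.
	cache.Apply(Event{Type: "Deleted", Name: "b"})
	fmt.Println(len(cache), cache["b"]) // 2 false
}
```

After the initial list, the controller never polls the API again; the open stream delivers the deltas.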
  34. (diagram: kubectl delete pod b; the kubelet’s cache still holds a, b, c)

  35. (diagram: pod b is gone from the API; the kubelet and its cache are unchanged)

  36. (diagram: the watch delivers Delete: { name: b, ... })

  37. (diagram: the cache drops b - Cache: { name: a, ... } { name: c, ... })

  38. (diagram: kubelet: API said to delete pod “b”.)

  39. (diagram: kubelet deletes b; only a and c remain on the node)
  40. “But you said edge-triggered is bad!”

  41. It is! But this isn’t edge-triggered.

  42. The cache is updated by events (edges) but we are still reconciling state
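This distinction can be made concrete in a toy Go sketch (invented names, not client-go): the event handler folds the edge into the cache, but reconciliation only ever reads the full cached state, never the event itself:

```go
package main

import (
	"fmt"
	"sort"
)

// handleEvent applies the edge to the cache, then reconciles against the
// *entire* cached state (the level). reconcile never inspects the event, so
// rebuilding the cache from a fresh list would yield the same outcome.
func handleEvent(cache map[string]bool, name string, deleted bool, node map[string]bool) {
	if deleted {
		delete(cache, name)
	} else {
		cache[name] = true
	}
	reconcile(cache, node)
}

// reconcile is pure level-triggered logic: make the node match the cache.
func reconcile(cache, node map[string]bool) {
	for name := range cache {
		node[name] = true // start/verify
	}
	for name := range node {
		if !cache[name] {
			delete(node, name) // not desired - delete it
		}
	}
}

func main() {
	cache := map[string]bool{"a": true, "b": true}
	node := map[string]bool{"a": true, "b": true}
	handleEvent(cache, "b", true, node) // the Delete: b event arrives
	names := []string{}
	for n := range node {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names) // [a]
}
```

The event is only a trigger and a cache update; the decision of what to do comes from comparing whole states.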
  43. “???”

  44. The controller can be restarted at any time and the cache will be reconstructed - we can’t “miss an edge*” (* modulo bugs, read on)

  45. Even if you miss an event, you can still recover the state

  46. Ultimately it’s all just software, and software has bugs. Controllers should re-list periodically to get full state...

  47. ...but we’ve put a lot of energy into making sure that our list-watch is reliable.
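The periodic re-list safety net can be sketched in the same toy Go style (`resync` is an invented name; client-go's informers do something analogous with a resync/re-list): even if a watch event was dropped, replacing the cache with a fresh list converges it back to the truth.

```go
package main

import "fmt"

// resync replaces the cache wholesale with a fresh list from the API.
// Any state drift caused by a missed event disappears at the next re-list.
func resync(listFromAPI []string) map[string]bool {
	fresh := map[string]bool{}
	for _, name := range listFromAPI {
		fresh[name] = true
	}
	return fresh
}

func main() {
	// Suppose the watch dropped the "Deleted b" event, so the cache is stale.
	cache := map[string]bool{"a": true, "b": true, "c": true}
	// A periodic re-list repairs it; the API now only has a and c.
	cache = resync([]string{"a", "c"})
	fmt.Println(len(cache), cache["b"]) // 2 false
}
```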