Slide 1

Writing a Custom Kubernetes Operator
Aaron Levy, CoreOS
GitHub / Slack: @aaronlevy

Slide 2


Why would you write an Operator?

Slide 3

Why would you write an Operator?

Without modifying the core Kubernetes codebase:
● Extend the functionality of existing objects
● Add new concepts / functionality to the cluster
● Automate administration of cluster applications
● Replace existing cluster components

Slide 4

Why does CoreOS write Operators?
● Extend the functionality of existing objects
  ○ Coordinating Container Linux updates across nodes
● Add new concepts / functionality to the cluster
  ○ etcd-operator - easily launch/manage etcd clusters
● Automate administration of cluster applications
  ○ Automated updates of Tectonic installations
● Replace existing cluster components
  ○ Well, we don't do this - but it's possible!

Slide 5


What is the "controller pattern"?

Slide 6

What is the "controller pattern"?

Kubernetes controllers are based on an active reconciliation process:
1. Watch both the desired state and the current state.
2. Move the current state to be more like the desired state.

for {
    desired := getDesiredState()
    current := getCurrentState()
    makeChanges(desired, current)
}
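A minimal, self-contained Go sketch of that loop (not from the original deck; the state type and helper bodies are placeholders for whatever your controller actually reads and writes):

package main

import (
    "fmt"
    "time"
)

type state struct{ replicas int }

// Placeholders: a real controller would read these from the API server.
func getDesiredState() state { return state{replicas: 3} }
func getCurrentState() state { return state{replicas: 2} }

func makeChanges(desired, current state) {
    if desired != current {
        fmt.Printf("reconciling: want %+v, have %+v\n", desired, current)
        // create/update/delete objects here
    }
}

func main() {
    for {
        makeChanges(getDesiredState(), getCurrentState())
        // Real controllers react to watch events rather than polling on a timer.
        time.Sleep(10 * time.Second)
    }
}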

Slide 7

The controller pattern is used in:

Kube-Controller-Manager
● Deployment
● DaemonSet
● Node
● Service
● Endpoint
● ...

Cluster applications
● kube-dns
● ingress
● flannel
● etcd-operator
● prometheus-operator
● ...

Slide 8


Let's build an example

Slide 9

Let's build an example

We will build a "Node Reboot Operator" that allows an administrator to:
● Trigger a node reboot via kubectl
● Perform "rolling" reboots across the cluster

Slide 10

Example: Node Reboot Operator
● Reboot Agent (runs on all nodes)
● Reboot Controller (single instance)
● Coordination via annotations on the Node object (collected as constants in the sketch below):
  ○ reboot-needed
  ○ reboot-now
  ○ reboot-in-progress
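The three annotation keys as they might appear as Go constants in the agent/controller code (a sketch; the deck uses the bare key names throughout):

const (
    // Bare key names, as used in these slides. In practice you would probably
    // prefix them, e.g. "reboot-operator.example.com/reboot-needed" (assumption).
    annotationRebootNeeded     = "reboot-needed"
    annotationRebootNow        = "reboot-now"
    annotationRebootInProgress = "reboot-in-progress"
)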

Slide 11

Reboot Agent - behavior
● DaemonSet with a pod running on every node
● Watches the "reboot-now" annotation on its own Node object; when set:
  ○ Remove the "reboot-now" annotation
  ○ Add the "reboot-in-progress" annotation
  ○ Reboot the node

Slide 12

Reboot Controller - behavior
● Deployment (replicas=1)
● Watches for nodes with the "reboot-needed" annotation
● Counts "unavailable" nodes
  ○ not Ready, or reboot-in-progress
● While unavailable < MaxUnavailable:
  ○ Remove the "reboot-needed" annotation
  ○ Set the "reboot-now" annotation on a node

Slides 13-18

[Diagram sequence: three nodes, each running a Reboot Agent, plus a single Reboot Controller, with Max Unavailable = 1. The controller picks one node marked "Needs Reboot" at a time and sets "Reboot Now" on it; while that node reboots, Unavailable = 1, so no other node is touched. When the node returns to Ready, Unavailable drops back to 0 and the next node is processed.]

Slide 19


Writing a custom operator

Slide 20

Writing a custom operator

Functional example: github.com/aaronlevy/kube-controller-demo
● Follows some upstream patterns (and the Go client) for these controller patterns - but there is no single "right way".

Slide 21

Writing a custom operator
● https://github.com/kubernetes/client-go
● Versioning is important (see the README.md for more info)
  ○ This presentation assumes v3.0.0-beta.0

Slide 22

Writing a custom operator

Building blocks:
● Creating a Kubernetes client to interact with the API
● Downward API for "self" introspection
● Using an Informer for an object cache and event handling
● Communicating desired/actual state via annotations

Slide 23


Reboot Agent

Slide 24

Reboot Agent - Determining "self"

Downward API:

apiVersion: v1
kind: Pod
metadata:
  name: reboot-agent
spec:
  containers:
  - name: reboot-agent
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

As an env var:

node := os.Getenv("NODE_NAME")
client.Core().Nodes().Get(node)

Or as a flag:

- name: reboot-agent
  command: ["./agent", "--node=$(NODE_NAME)"]

Slide 25

Kubernetes client

kubecfg := flag.String("kubeconfig", "", "Path to kubeconfig file")
flag.Parse()

var config *rest.Config
if *kubecfg == "" {
    config, _ = rest.InClusterConfig()
} else {
    config, _ = clientcmd.BuildConfigFromFlags("", *kubecfg)
}

client, err := kubernetes.NewForConfig(config)
if err != nil {
    log.Fatal(err)
}

Slide 26

Kubernetes client

for {
    node, _ := client.Core().Nodes().Get(nodeName)
    fmt.Printf("Node has %d labels\n", len(node.Labels))

    nodes, _ := client.Core().Nodes().List(v1.ListOptions{})
    fmt.Printf("There are %d nodes\n", len(nodes.Items))

    time.Sleep(10 * time.Second)
}

Slide 27

Using the client

Repeatedly retrieving from the API can become expensive.
● Get() / List() multiple times in code: need a cache.
● Want to be notified of object changes: need watches.

Use an Informer!

Slide 28

Creating an Informer

store, controller := cache.NewInformer(
    &cache.ListWatch{},
    &v1.Node{},
    resyncPeriod,
    cache.ResourceEventHandlerFuncs{},
)

Slide 29

Creating an Informer (ListWatch)

fs := fields.OneTermEqualSelector("metadata.name", nodeName)

cache.ListWatch{
    ListFunc: func(lo api.ListOptions) (runtime.Object, error) {
        lo.FieldSelector = fs.String()
        return client.Core().Nodes().List(lo)
    },
    WatchFunc: func(lo api.ListOptions) (watch.Interface, error) {
        lo.FieldSelector = fs.String()
        return client.Core().Nodes().Watch(lo)
    },
}
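client-go also ships a convenience constructor for this. A sketch; the exact accessor for the typed client's REST interface (RESTClient() here) varies slightly between client-go versions:

lw := cache.NewListWatchFromClient(
    client.Core().RESTClient(),                             // typed client's REST interface
    "nodes",                                                // resource name
    v1.NamespaceAll,                                        // nodes are cluster-scoped
    fields.OneTermEqualSelector("metadata.name", nodeName), // same field selector as above
)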

Slide 30

Creating an Informer (HandlerFuncs)

cache.ResourceEventHandlerFuncs{
    AddFunc:    func(obj interface{}) {},
    UpdateFunc: func(old, new interface{}) {},
    DeleteFunc: func(obj interface{}) {},
},

// Example handler function
func printNodeName(obj interface{}) {
    node := obj.(*v1.Node)
    fmt.Printf("Node is named %s\n", node.Name)
}
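One wrinkle worth knowing for DeleteFunc (not covered in the deck): if the informer missed the actual delete event, it hands you a cache.DeletedFinalStateUnknown tombstone instead of the object. A sketch of a tolerant delete handler:

func handleDelete(obj interface{}) {
    node, ok := obj.(*v1.Node)
    if !ok {
        // The watch missed the delete; we get a tombstone wrapper instead.
        tombstone, isTombstone := obj.(cache.DeletedFinalStateUnknown)
        if !isTombstone {
            fmt.Printf("unexpected object type: %T\n", obj)
            return
        }
        node, ok = tombstone.Obj.(*v1.Node)
        if !ok {
            fmt.Printf("tombstone contained unexpected object: %T\n", tombstone.Obj)
            return
        }
    }
    fmt.Printf("Node %s was deleted\n", node.Name)
}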

Slide 31

Creating an Informer

Resync Period
● UpdateFunc is triggered for all objects at this interval
  ○ Re-queues objects from the cache
● Use it to re-sync the full state
  ○ You may have missed updates, or prior actions may have failed

Slide 32

Creating a (pseudo) Informer

store, controller := cache.NewInformer(...)

controller.Run()

store.List()
store.Get(node)

Slide 33

Let's build our operator!

We have all the information we need for the agent & controller:
● Determine "self" via the Downward API
● Create a Kubernetes client
● Use an Informer to watch updates to objects
● Use the client to update state via annotations

Slide 34


Pseudo Reboot Agent

Slide 35

Pseudo Reboot Agent

Desired behavior:
● When the agent sees the "reboot-now" annotation on its local node: issue a reboot.
● We don't care when nodes are added/removed, only whether our own node's state has changed to include this annotation.

Slide 36

Pseudo Reboot Agent

self := os.Getenv("NODE_NAME")
client := createKubeClient()

updateFn := func(old, new interface{}) {
    node := new.(*v1.Node)
    if _, ok := node.Annotations["reboot-now"]; ok {
        delete(node.Annotations, "reboot-now")
        client.Core().Nodes().Update(node)
        reboot()
    }
}

store, controller := newRebootInfmr(self, client, updateFn)
controller.Run()
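The reboot() call above is left undefined in the deck. A hypothetical sketch, assuming the agent pod is privileged enough to reach the host's init system (the real kube-controller-demo agent may do this differently, e.g. via systemd's D-Bus API):

func reboot() {
    // Assumes "os/exec" and "log" are imported and that the container can
    // actually reach the host's systemd (privileged pod, host namespaces, etc.).
    if err := exec.Command("systemctl", "reboot").Run(); err != nil {
        log.Printf("failed to trigger reboot: %v", err)
    }
}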

Slide 37


Pseudo Reboot Controller

Slide 38

Pseudo Reboot Controller

Desired behavior - when a node is updated we should:
● Determine whether the node needs a reboot, via the "reboot-needed" annotation.
● Determine whether it is safe to reboot the node, by checking the number of unavailable nodes.
● If it is safe, add the "reboot-now" annotation.

Slide 39

Pseudo Reboot Controller

store, controller := cache.NewInformer(
    &cache.ListWatch{...},
    &v1.Node{},
    resyncPeriod,
    cache.ResourceEventHandlerFuncs{
        // AddFunc: handler,
        UpdateFunc: func(_, new interface{}) { handler(new) },
        // DeleteFunc: handler,
    },
)

Slide 40

Pseudo Reboot Controller

func handler(obj interface{}) {
    node := obj.(*v1.Node)

    if _, ok := node.Annotations["reboot-needed"]; !ok {
        return // no reboot needed
    }

    if getUnavailable() >= MaxUnavailable {
        return // should not reboot more nodes
    }

    delete(node.Annotations, "reboot-needed")
    node.Annotations["reboot-now"] = "true"
    client.Core().Nodes().Update(node)
}

Slide 41

Pseudo Reboot Controller

func getUnavailable() int {
    var count int
    nodes := store.List() // retrieve from cache
    for _, n := range nodes {
        node := n.(*v1.Node)
        if rebooting(node) || notReady(node) {
            count++
        }
    }
    return count
}
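The rebooting() and notReady() helpers are not shown in the deck. A hedged sketch, using the annotation names from earlier slides and the standard NodeReady condition:

func rebooting(n *v1.Node) bool {
    _, ok := n.Annotations["reboot-in-progress"]
    return ok
}

func notReady(n *v1.Node) bool {
    for _, c := range n.Status.Conditions {
        if c.Type == v1.NodeReady {
            return c.Status != v1.ConditionTrue
        }
    }
    return true // no Ready condition reported: treat the node as unavailable
}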

Slide 42


We're done!

Slide 43


But wait! There's more!

Slide 44


But wait! There's more! … our reboot operator has bugs

Slide 45

Bug: Don't mutate cache objects

node := store.Get(nodeName)

// Mutating cache object!
node.Annotations["reboot-now"] = "true"

// If it fails, your cache is now incorrect
client.Core().Nodes().Update(node)

Slide 46

Fix: Make a copy of the cache object

node := store.Get(nodeName)

copyObj, _ := api.Scheme.DeepCopy(node)
nodeCopy := copyObj.(*v1.Node)

// Now safe to modify the copy
delete(nodeCopy.Annotations, "reboot-needed")
client.Core().Nodes().Update(nodeCopy)
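In newer client-go releases, typed objects also carry a generated DeepCopy() method, which replaces the Scheme-based copy above (a version assumption; it is not available in the v3.0.0-beta.0 assumed by this deck):

nodeCopy := node.DeepCopy()
delete(nodeCopy.Annotations, "reboot-needed")
client.Core().Nodes().Update(nodeCopy)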

Slide 47

Bug: Operating on an unsynced cache

store, controller := cache.NewInformer(...)
controller.Run()

// Our cache might not have had time to populate
if len(store.List()) == 0 {
    log.Fatal("No nodes exist")
}

Slide 48

Fix: Wait for the cache to sync

store, controller := cache.NewInformer(...)
controller.Run()

if !cache.WaitForCacheSync(stopCh, controller.HasSynced) {
    log.Errorf("timed out waiting for cache to sync")
}

// Now start processing
worker.Run()

Slide 49

Bug: Update() could remove fields

// Taints are a new field in v1.6.x
kubectl taint nodes foo dedicated=special:NoSchedule

node := nodeStore.Get(nodeName)
node.Annotations["foo"] = "bar"

// If using client-go v2.0.0, we drop the taint field
client.Core().Nodes().Update(node)

Slide 50

Fix: Use Patch()

patch, err := strategicpatch.CreateTwoWayMergePatch(
    oldData, newData, v1.Node{})

// Use the patch to update only the changed fields
client.Core().Nodes().Patch(
    nodeName, api.StrategicMergePatchType, patch)
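The slide elides how oldData and newData are produced. A sketch using the deck's own conventions (the Scheme-based deep copy from Slide 46; exact import paths for encoding/json and the strategicpatch package depend on the client-go version you vendor):

// Marshal the object before and after applying only the intended change.
oldData, _ := json.Marshal(node) // node as read from the cache

copyObj, _ := api.Scheme.DeepCopy(node)
nodeCopy := copyObj.(*v1.Node)
nodeCopy.Annotations["reboot-now"] = "true"
newData, _ := json.Marshal(nodeCopy)

patch, err := strategicpatch.CreateTwoWayMergePatch(oldData, newData, v1.Node{})
if err != nil {
    log.Fatal(err)
}

// Only the fields that differ are sent to the API server.
client.Core().Nodes().Patch(nodeName, api.StrategicMergePatchType, patch)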

Slide 51

Helpful tools (that were not covered)
● Leader Election
● Work Queues
● Third Party Resources
● Shared Informers
● Events

Slide 52

Leader Election
● Not yet in the client-go library
  ○ https://github.com/kubernetes/client-go/issues/28
  ○ But it can be vendored from Kubernetes directly
● For our reboot-controller:
  ○ Safely run multiple replicas / hot spares.

Slide 53

Work Queues
● Added in github.com/aaronlevy/kube-controller-demo
● Parallelize processing
● Rate-limiting (or other) queue types
● The reboot-controller should:
  ○ Use work queues to collapse multiple node updates into a single "reboot node" work item (see the sketch after this list).
  ○ Wait for the cache to sync before processing events
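A hedged sketch of wiring the informer into a rate-limited work queue (k8s.io/client-go/util/workqueue; the package has been stable, but check the exact import path for the release you vendor). Because queued keys are de-duplicated, a burst of node updates collapses into a single work item; processNode is a hypothetical stand-in for the reconcile logic:

queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

enqueue := func(obj interface{}) {
    if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
        queue.Add(key) // duplicate keys are collapsed while queued
    }
}
// In the informer's ResourceEventHandlerFuncs:
//   AddFunc: enqueue, UpdateFunc: func(_, new interface{}) { enqueue(new) }

go func() {
    for {
        key, quit := queue.Get()
        if quit {
            return
        }
        if err := processNode(key.(string)); err != nil {
            queue.AddRateLimited(key) // retry later, with backoff
        } else {
            queue.Forget(key) // clear rate-limit history on success
        }
        queue.Done(key)
    }
}()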

Slide 54

ThirdPartyResources
● Needs a presentation unto itself
  ○ client-go examples/
  ○ https://github.com/metral/memhog-operator
● Dynamically create new API types
● Can be used as the data model for custom controllers
● The reboot-controller could implement a reboot-groups TPR
  ○ Master reboot group vs. worker node reboot group

Slide 55

Shared Informers
● Multiple informers using the same internal cache
● Lower memory overhead by not caching multiple times
● Behavior is slightly different:
  ○ The cache is at least as fresh as the event - but could be "more" fresh.
  ○ e.g. "add" then "delete" - when you see the "add" event, the object may no longer be in the cache.
● Common in kube-controller-manager - but in practice I haven't used them (usually single-task controllers). See the sketch below.
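A sketch of the shared-informer factory as it looks in newer client-go releases (k8s.io/client-go/informers); this API is not in the v3.0.0-beta.0 assumed earlier in the deck:

factory := informers.NewSharedInformerFactory(client, 30*time.Second)
nodeInformer := factory.Core().V1().Nodes().Informer()

nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    UpdateFunc: func(old, new interface{}) { /* handle node update */ },
})

stopCh := make(chan struct{})
factory.Start(stopCh) // starts every informer requested from this factory
cache.WaitForCacheSync(stopCh, nodeInformer.HasSynced)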

Slide 56

Events
● Easily surface information associated with an API object.
  ○ kubectl describe node foo
  ○ kubectl get events
● The reboot-controller could emit events such as (see the sketch below):
  ○ "Marking node foo for reboot"
  ○ "Max unavailable reached, skipping reboot"
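A sketch of emitting events with k8s.io/client-go/tools/record. Constructor signatures (NewRecorder in particular) changed between client-go releases, so treat this as the general shape from newer versions rather than exact v3.0.0-beta.0 code:

broadcaster := record.NewBroadcaster()
broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
    Interface: client.CoreV1().Events(""), // typedcorev1 = k8s.io/client-go/kubernetes/typed/core/v1
})
recorder := broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "reboot-controller"})

// Events attach to the node, so they show up in `kubectl describe node foo`.
recorder.Eventf(node, v1.EventTypeNormal, "RebootScheduled", "Marking node %s for reboot", node.Name)
recorder.Event(node, v1.EventTypeWarning, "RebootDeferred", "Max unavailable reached, skipping reboot")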

Slide 57

Helpful Resources

github.com/aaronlevy/kube-controller-demo
● Functional example from this presentation
● Links to the resources below (and some that didn't fit)

github.com/kubernetes/community
● contributors/devel/controllers.md
● contributors/design-proposals/principles.md#control-logic

github.com/kubernetes/client-go (examples directory)
github.com/kubernetes/kubernetes (pkg/controller)
github.com/metral/memhog-operator
github.com/kbst/memcached (Python operator)

Slide 58

Thank You! Questions?

[email protected]
GitHub / Slack: @aaronlevy
Twitter: @aaronjlevy