
Writing a Custom Kubernetes Operator

Presentation video: https://www.youtube.com/watch?v=U2Bm-Fs3b7Q

Much of the functionality in a Kubernetes cluster is managed by a reconciliation pattern within "controllers". The node, service, or deployment controllers (just to name a few) watch for changes to objects, then act on those changes to drive your cluster to a desired state. This same pattern can be used to implement custom logic, which can be used to extend the functionality of your cluster without ever needing to modify Kubernetes itself.

This talk will cover how to implement your own custom controller, from contacting the Kubernetes API to using existing libraries to easily watch, react, and update components in your cluster. By building on existing functionality and following a few best practices, you can quickly and easily implement your own custom controller.

Aaron Levy

June 06, 2017


Transcript

  1. Why would you write an Operator?

     Without modifying the core Kubernetes codebase, you can:
     • Extend functionality of existing objects
     • Add new concepts / functionality to the cluster
     • Automate administration of cluster applications
     • Replace existing cluster components
  2. Why does CoreOS write Operators?

     • Extend functionality of existing objects
       ◦ Coordinating Container Linux updates across nodes
     • Add new concepts / functionality to the cluster
       ◦ etcd-operator - easily launch/manage etcd clusters
     • Automate administration of cluster applications
       ◦ Automated updates of Tectonic installations
     • Replace existing cluster components
       ◦ Well, we don't do this - but it's possible!
  3. What is the "controller pattern"?

     Kubernetes controllers are based on an active reconciliation process:
     1. Watch both the desired state and the current state.
     2. Move the actual state to be more like the desired state.

         for {
             desired := getDesiredState()
             current := getCurrentState()
             makeChanges(desired, current)
         }
  4. Controller pattern used in:

     Kube-Controller-Manager
     • Deployment • DaemonSet • Node • Service • Endpoint • ...

     Cluster applications
     • kube-dns • ingress • flannel • etcd-operator • prometheus-operator • ...
  5. Let's build an example

     We will build a "Node Reboot Operator" that allows an administrator to:
     • Trigger a node reboot via kubectl
     • Support "rolling" reboots across the cluster
  6. Example Node Reboot Operator

     • Reboot Agent (runs on all nodes)
     • Reboot Controller (single instance)
     • Coordination via annotations on the node object (keys sketched below)
       ◦ reboot-needed
       ◦ reboot-now
       ◦ reboot-in-progress
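
     As a sketch (not from the deck), the three annotation keys could live
     in a shared Go package used by both agent and controller; the constant
     names and the idea of a domain prefix are assumptions:

         // Annotation keys shared by the reboot agent and controller.
         // The keys come from the slides; a real controller would usually
         // namespace them with a domain prefix, e.g. "example.com/reboot-needed".
         const (
             annoRebootNeeded     = "reboot-needed"
             annoRebootNow        = "reboot-now"
             annoRebootInProgress = "reboot-in-progress"
         )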
  7. Reboot Agent - behavior

     • DaemonSet with a pod running on all nodes
     • Watches for the "reboot-now" annotation on its own node object; when set:
       ◦ Remove the "reboot-now" annotation
       ◦ Add the "reboot-in-progress" annotation
       ◦ Reboot the node
  8. Reboot Controller - behavior

     • Deployment (replicas=1)
     • Watches for nodes with the "reboot-needed" annotation
     • Counts "unavailable" nodes
       ◦ not ready, or reboot-in-progress
     • While unavailable < MaxUnavailable:
       ◦ Remove the "reboot-needed" annotation
       ◦ Set the "reboot-now" annotation on a node
  9.-14. (Diagram sequence) Rolling reboot walkthrough across Nodes 1-3
     with Max Unavailable = 1: the controller sees a node marked "Needs
     Reboot" while Unavailable = 0 and sets "Reboot Now" on it
     (Unavailable becomes 1); the agent reboots that node. Once the node
     reports Ready again (Unavailable back to 0), the controller repeats
     the process for the next node that needs a reboot.
  15. Writing a custom operator

     Functional example: github.com/aaronlevy/kube-controller-demo
     • Follows some upstream patterns (and the Go client) for these
       controller patterns - but there is no singular "right way".
  16. Writing a custom operator

     • https://github.com/kubernetes/client-go
     • Versioning is important (see README.md for more info)
       ◦ This presentation assumes v3.0.0-beta.0
  17. Writing a custom operator

     Building blocks:
     • Creating a Kubernetes client to interact with the API
     • Downward API for "self" introspection
     • Using an Informer for object caching & event handling
     • Communicating desired/actual state via annotations
  18. Reboot Agent - Determining "self"

     Downward API:

         apiVersion: v1
         kind: Pod
         metadata:
           name: reboot-agent
         spec:
           containers:
           - name: reboot-agent
             env:
             - name: NODE_NAME
               valueFrom:
                 fieldRef:
                   fieldPath: spec.nodeName

     As an env var:

         node := os.Getenv("NODE_NAME")
         client.Core().Nodes().Get(node)

     Or as a flag:

         - name: reboot-agent
           command: ["./agent", "--node=$(NODE_NAME)"]
  19. Kubernetes client

         kubecfg := flag.String("kubeconfig", "", "Path to kubeconfig")
         flag.Parse()

         var config *rest.Config
         var err error
         if *kubecfg == "" {
             config, err = rest.InClusterConfig()
         } else {
             config, err = clientcmd.BuildConfigFromFlags("", *kubecfg)
         }
         // handle err ...

         client, err := kubernetes.NewForConfig(config)
  20. Kubernetes client

         for {
             node, _ := client.Core().Nodes().Get(nodeName)
             fmt.Printf("Node has %d labels\n", len(node.Labels))

             nodes, _ := client.Core().Nodes().List(v1.ListOptions{})
             fmt.Printf("There are %d nodes\n", len(nodes.Items))

             time.Sleep(10 * time.Second)
         }
  21. Using the client

     Repeatedly retrieving objects from the API can become expensive:
     • Calling Get() / List() multiple times in code: you need a cache
     • Wanting to be notified of object changes: you need watches

     Use an Informer!
  22. Creating an Informer (ListWatch)

         fs := fields.OneTermEqualSelector("metadata.name", nodeName)

         cache.ListWatch{
             ListFunc: func(lo api.ListOptions) (runtime.Object, error) {
                 lo.FieldSelector = fs.String()
                 return client.Core().Nodes().List(lo)
             },
             WatchFunc: func(lo api.ListOptions) (watch.Interface, error) {
                 lo.FieldSelector = fs.String()
                 return client.Core().Nodes().Watch(lo)
             },
         }
  23. Creating an Informer (HandlerFuncs)

         cache.ResourceEventHandlerFuncs{
             AddFunc:    func(obj interface{}) {},
             UpdateFunc: func(old, new interface{}) {},
             DeleteFunc: func(obj interface{}) {},
         }

         // Example handler function
         func printNodeName(obj interface{}) {
             node := obj.(*v1.Node)
             fmt.Printf("Node is named %s\n", node.Name)
         }
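
     Putting slides 22 and 23 together - a sketch of what a helper like the
     newRebootInfmr used later on slide 27 might look like; the
     listNodes/watchNodes names and the 30-second resync period are
     assumptions:

         store, controller := cache.NewInformer(
             &cache.ListWatch{
                 ListFunc:  listNodes,  // ListFunc from slide 22 (hypothetical name)
                 WatchFunc: watchNodes, // WatchFunc from slide 22 (hypothetical name)
             },
             &v1.Node{},     // the object type we expect to receive
             30*time.Second, // resyncPeriod (value assumed)
             cache.ResourceEventHandlerFuncs{
                 UpdateFunc: updateFn, // handler from slide 27
             },
         )
         go controller.Run(stopCh) // Run blocks, so start it in a goroutine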
  24. Creating an Informer (Resync Period)

     • UpdateFunc is triggered for all objects at this interval
       ◦ Re-queues objects from the cache
     • Use it to re-sync the full state
       ◦ You may have missed updates, or prior actions may have failed
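
     One practical consequence (a sketch, not from the deck): on a resync,
     UpdateFunc fires even though the object did not actually change, so
     comparing ResourceVersions distinguishes a periodic resync from a real
     update:

         updateFn := func(old, new interface{}) {
             oldNode := old.(*v1.Node)
             newNode := new.(*v1.Node)
             if oldNode.ResourceVersion == newNode.ResourceVersion {
                 // Triggered by the resync period, not a real change:
                 // a good point to retry work that previously failed.
             }
             // ... normal update handling
         }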
  25. Let's build our operator!

     We have all the information we need for the agent & controller:
     • Determining "self" via the Downward API
     • Creating a Kubernetes client
     • Using an Informer to watch updates to objects
     • Using the client to update state via annotations
  26. Pseudo Reboot Agent

     Desired behavior:
     • When the agent sees the "reboot-now" annotation on its local node:
       issue a reboot.
     • We don't care when nodes are added/removed, only whether our own
       node's state has changed to include this annotation.
  27. Pseudo Reboot Agent

         self := os.Getenv("NODE_NAME")
         client := createKubeClient()

         updateFn := func(old, new interface{}) {
             node := new.(*v1.Node)
             if _, ok := node.Annotations["reboot-now"]; ok {
                 delete(node.Annotations, "reboot-now")
                 node.Annotations["reboot-in-progress"] = "true" // per slide 7
                 client.Core().Nodes().Update(node)
                 reboot()
             }
         }

         _, controller := newRebootInfmr(self, client, updateFn)
         controller.Run()
  28. Pseudo Reboot Controller

     Desired behavior - when a node is updated, we should:
     • Determine if the node needs a reboot via the "reboot-needed" annotation.
     • Determine if it is safe to reboot the node by checking the number
       of unavailable nodes.
     • If it is safe, add the "reboot-now" annotation.
  29. Pseudo Reboot Controller

         store, controller := cache.NewInformer(
             &cache.ListWatch{...},
             &v1.Node{},
             resyncPeriod,
             cache.ResourceEventHandlerFuncs{
                 // AddFunc: handler,
                 UpdateFunc: func(_, new interface{}) { handler(new) },
                 // DeleteFunc: handler,
             },
         )
  30. Pseudo Reboot Controller

         func handler(obj interface{}) {
             node := obj.(*v1.Node)
             if _, ok := node.Annotations["reboot-needed"]; !ok {
                 return // no reboot needed
             }
             if getUnavailable() >= MaxUnavailable {
                 return // should not reboot more nodes
             }
             delete(node.Annotations, "reboot-needed")
             node.Annotations["reboot-now"] = "true"
             client.Core().Nodes().Update(node)
         }
  31. Pseudo Reboot Controller

         func getUnavailable() int {
             var count int
             nodes := store.List() // retrieve from cache
             for _, n := range nodes {
                 if rebooting(n) || notReady(n) {
                     count++
                 }
             }
             return count
         }
  32. Bug: Don't mutate cache objects

         node := store.Get(nodeName)

         // Mutating cache object!
         node.Annotations["reboot-now"] = "true"

         // If the update fails, your cache is now incorrect
         client.Core().Nodes().Update(node)
  33. Fix: Make a copy of the cache object

         node := store.Get(nodeName)
         copyObj, _ := api.Scheme.DeepCopy(node)
         nodeCopy := copyObj.(*v1.Node)

         // Now safe to modify the copy
         delete(nodeCopy.Annotations, "reboot-needed")
         client.Core().Nodes().Update(nodeCopy)
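
     For reference, client-go releases newer than the v3.0.0-beta.0 assumed
     by this deck generate typed deep-copy methods, which makes the copy
     step simpler (a sketch, assuming those newer APIs):

         // Newer client-go: every API type gets a generated DeepCopy()
         nodeCopy := node.DeepCopy() // returns *v1.Node
         delete(nodeCopy.Annotations, "reboot-needed")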
  34. Bug: Operating on an unsynced cache

         store, controller := cache.NewInformer(...)
         go controller.Run(stopCh)

         // Our cache might not have had time to populate
         if len(store.List()) == 0 {
             log.Fatal("No nodes exist")
         }
  35. Fix: Wait for cache to sync

         store, controller := cache.NewInformer(...)
         go controller.Run(stopCh)

         if !cache.WaitForCacheSync(stopCh, controller.HasSynced) {
             log.Errorf("timed out waiting for cache to sync")
         }

         // Now start processing
         worker.Run()
  36. Bug: Update() could remove fields

         // Taints are a new field in v1.6.x
         kubectl taint nodes foo dedicated=special:NoSchedule

         node := nodeStore.Get(nodeName)
         node.Annotations["foo"] = "bar"

         // If using client-go v2.0.0, we drop the taint field
         client.Core().Nodes().Update(node)
  37. Fix: Use Patch()

         // oldNode: the object as it was; newNode: the modified copy
         oldData, _ := json.Marshal(oldNode)
         newData, _ := json.Marshal(newNode)

         patch, err := strategicpatch.CreateTwoWayMergePatch(
             oldData, newData, v1.Node{})

         // Use patch to update only the changed fields
         client.Core().Nodes().Patch(
             nodeName, api.StrategicMergePatchType, patch)
  38. Helpful tools (that were not covered)

     • Leader Election
     • Work Queues
     • Third Party Resources
     • Shared Informers
     • Events
  39. Leader Election

     • Not yet in the client-go library
       ◦ https://github.com/kubernetes/client-go/issues/28
       ◦ But it can be vendored from kubernetes directly
     • For our reboot-controller:
       ◦ Safely run multiple replicas / hot-spares (see the sketch below).
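
     For reference, later client-go releases did gain a leaderelection
     package; a sketch of how the reboot controller could use it. This is
     the newer API, not the v3.0.0-beta.0 assumed by this deck, and the
     lock name/namespace, timings, and the podName/runController helpers
     are assumptions:

         lock := &resourcelock.LeaseLock{
             LeaseMeta: metav1.ObjectMeta{
                 Name:      "reboot-controller", // lock name (assumed)
                 Namespace: "kube-system",       // lock namespace (assumed)
             },
             Client:     client.CoordinationV1(),
             LockConfig: resourcelock.ResourceLockConfig{Identity: podName},
         }
         leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
             Lock:          lock,
             LeaseDuration: 15 * time.Second,
             RenewDeadline: 10 * time.Second,
             RetryPeriod:   2 * time.Second,
             Callbacks: leaderelection.LeaderCallbacks{
                 OnStartedLeading: func(ctx context.Context) {
                     runController(ctx) // only the leader runs the control loop
                 },
                 OnStoppedLeading: func() {
                     os.Exit(0) // lost the lease: exit, let the Deployment restart us
                 },
             },
         })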
  40. Work Queues

     • Added in github.com/aaronlevy/kube-controller-demo
     • Parallelize processing
     • Rate limiting (or other) queue types
     • The reboot-controller should (see the sketch below):
       ◦ Use work queues to collapse multiple node updates into a single
         "reboot node" work item.
       ◦ Wait for cache sync before processing events.
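
     A sketch of the work-queue pattern using client-go's util/workqueue
     package (the rebootNode handler name is an assumption). Adding the
     same node key twice before it is processed collapses into a single
     pending work item:

         queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

         // In the informer event handlers: enqueue a key, not the object.
         key, _ := cache.MetaNamespaceKeyFunc(obj)
         queue.Add(key)

         // In a worker goroutine (started after WaitForCacheSync):
         for {
             key, quit := queue.Get()
             if quit {
                 return
             }
             if err := rebootNode(key.(string)); err != nil {
                 queue.AddRateLimited(key) // retry with backoff
             } else {
                 queue.Forget(key) // clear rate-limit history on success
             }
             queue.Done(key) // mark this item as processed
         }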
  41. ThirdPartyResources

     • Needs a presentation unto itself
       ◦ client-go examples/ directory
       ◦ https://github.com/metral/memhog-operator
     • Dynamically create new API types
     • Can be used as the data model for custom controllers
     • The reboot-controller could implement a reboot-groups TPR
       ◦ Master reboot group vs worker node reboot group
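
     A sketch of what a reboot-group TPR manifest could look like (the
     resource name and description are illustrative, not from the deck;
     note that TPRs were later replaced by CustomResourceDefinitions):

         apiVersion: extensions/v1beta1
         kind: ThirdPartyResource
         metadata:
           name: reboot-group.example.com  # becomes kind RebootGroup (name assumed)
         description: "A group of nodes that are rebooted as a unit"
         versions:
         - name: v1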
  42. Shared Informers

     • Multiple informers using the same internal cache
     • Lower memory overhead by not caching multiple times
     • Behavior is slightly different:
       ◦ The cache is at least as fresh as the event - but could be "more" fresh.
       ◦ e.g. "add" then "delete": when you see the "add" event, the
         object may no longer be in the cache.
     • Common in kube-controller-manager - but in practice I haven't used
       them (my controllers are usually single-task).
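
     A sketch using the SharedInformerFactory from client-go releases newer
     than the one assumed by this deck; two sets of handlers share one
     underlying node cache (the handler variable names are assumptions):

         factory := informers.NewSharedInformerFactory(client, 30*time.Second)
         nodeInformer := factory.Core().V1().Nodes().Informer()

         // Both handler sets are backed by the same shared cache.
         nodeInformer.AddEventHandler(rebootControllerHandlers) // hypothetical
         nodeInformer.AddEventHandler(metricsHandlers)          // hypothetical

         factory.Start(stopCh)
         factory.WaitForCacheSync(stopCh)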
  43. Events

     • Easily surface information associated with an API object:
       ◦ kubectl describe node foo
       ◦ kubectl get events
     • The reboot-controller could emit events:
       ◦ "Marking node foo for reboot"
       ◦ "Max unavailable reached, skipping reboot"
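
     A sketch of emitting those events with client-go's tools/record
     package (again the newer client-go API; the component name and event
     reasons are assumptions):

         broadcaster := record.NewBroadcaster()
         broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
             Interface: client.CoreV1().Events(""),
         })
         recorder := broadcaster.NewRecorder(scheme.Scheme,
             v1.EventSource{Component: "reboot-controller"})

         // Surfaces in `kubectl describe node foo` and `kubectl get events`
         recorder.Eventf(node, v1.EventTypeNormal, "RebootScheduled",
             "Marking node %s for reboot", node.Name)
         recorder.Event(node, v1.EventTypeWarning, "RebootDeferred",
             "Max unavailable reached, skipping reboot")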
  44. Helpful Resources

     github.com/aaronlevy/kube-controller-demo
     • Functional example from this presentation
     • Links to the resources below (and some that didn't fit)

     github.com/kubernetes/community
     • contributors/devel/controllers.md
     • contributors/design-proposals/principles.md#control-logic

     github.com/kubernetes/client-go (examples directory)
     github.com/kubernetes/kubernetes (pkg/controller)
     github.com/metral/memhog-operator
     github.com/kbst/memcached (python operator)