
What We Can Do with Kubernetes Custom Controllers

Zespre Chang

July 31, 2024

Transcript

  1. Zespre Chang, NTUST, 2024-08-04 What We Can Do With Kubernetes

    Custom Controllers? KCD Taipei 2024 COSCUP 2024
  2. About Me • Zespre (Chih-Hsin) Chang • Email: [email protected]

Blog: https://blog.zespre.com • Roles • Senior Software Engineer at SUSE • Consultant at FreeBSD Foundation • Projects I’m currently working on • Harvester HCI • OpenStack on FreeBSD • KubeVirtBMC
  3. Overview So you want to write a Kubernetes custom controller

• What pushes you to do this (the WHY) • To enable something that doesn’t work or exist in the current K8s cluster? • To create a better wheel than the existing one? • To be the coolest kid in town • What do you want it to achieve (the WHAT) • To be the operator for some other application (configuration/life-cycle management)? • To be a cloud-native application? • What libraries/tools/frameworks do you want to leverage (the HOW) • client-go/controller-runtime • kubebuilder • Operator SDK • Lasso/Wrangler
  4. Overview Operators vs. cloud-native applications • Operator • Ever watched

The Matrix Trilogy? • You are the operator • Ensure the installation/removal workflow works for the target app • Reflect config changes to the target app • Cloud-native application • Built from the ground up • Might need to revamp the architecture (keep in mind that this will run on a cluster)
  5. Outline • Concepts • Custom Resource Definition • Control Loop

& Operator • Design Tricks • Case Studies • KubeVirtBMC • Harvester’s Upgrade Controller • Harvester’s Managed DHCP Service
  6. Custom Resource Definition (CRD) Data schema • API group, version,

and kind (GVK); see the sketch below • Main blocks (apply to almost all K8s resources) • `.spec` defines the desired state • `.status` represents the observed state (~= actual state) • Objects instantiated from the CRD template are called custom resources (denoted as CRs hereafter) • Store all the important bits for controllers to reconcile
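
    A quick illustration of the GVK triple, using the `kitchen.com/v1` `Order` example that appears later in this deck (the group/version/kind are the example's, not a real API):

        package main

        import (
            "fmt"

            "k8s.io/apimachinery/pkg/runtime/schema"
        )

        func main() {
            // A GVK uniquely identifies a resource type served by the API server.
            gvk := schema.GroupVersionKind{
                Group:   "kitchen.com", // API group
                Version: "v1",          // API version
                Kind:    "Order",       // kind
            }
            fmt.Println(gvk) // kitchen.com/v1, Kind=Order
        }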
  7. Control Loop, Controller, and Operator Confusing terms demystified • A

control loop (reconciler) • Reads the state of resources from etcd via the API server • Changes the state of resources and takes necessary actions in the external world • Updates the status of resources in etcd via the API server • A controller implements a control loop (see the sketch below) • An operator embeds operational knowledge and consists of several controllers
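
    The shape of the loop is easiest to see in code. A minimal sketch in the controller-runtime style (one of the libraries listed earlier); the `kitchenv1.Order` type and its import path are assumptions borrowed from the kitchen example later in this deck:

        import (
            "context"

            ctrl "sigs.k8s.io/controller-runtime"
            "sigs.k8s.io/controller-runtime/pkg/client"

            kitchenv1 "github.com/starbops/kitchen/pkg/apis/kitchen.com/v1" // hypothetical
        )

        type OrderReconciler struct {
            client.Client // reads/writes go through the API server (backed by etcd)
        }

        func (r *OrderReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
            // 1. Read the state of the resource.
            var order kitchenv1.Order
            if err := r.Get(ctx, req.NamespacedName, &order); err != nil {
                return ctrl.Result{}, client.IgnoreNotFound(err)
            }

            // 2. Compare .spec (desired) with the real world and act to converge.
            //    ...external side effects go here...

            // 3. Record what was observed back into .status.
            order.Status.Condition = "Ready"
            return ctrl.Result{}, r.Status().Update(ctx, &order)
        }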
  8. Dealing with Conflicts Optimistic concurrency • Write conflicts happen all

the time in the distributed world • Locking mechanisms introduce performance issues • Why is it called optimistic? • The API server is optimistic • “It’s all their (clients’) fault” • Part of the controller logic is optimistic • YOU are the one who is pessimistic (framework to the rescue!) • Implementation detail • Every Kubernetes object has a `resourceVersion` • Write operations are rejected on `resourceVersion` conflicts • Clients, i.e., controllers, are obliged to retry (see the sketch below) • Intermediate business logic must be retriable (optimistic)
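
    With client-go, the retry obligation is commonly met with `retry.RetryOnConflict`, which re-runs a read-modify-write closure whenever the API server answers with a conflict; a minimal sketch on a ConfigMap:

        import (
            "context"

            metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
            "k8s.io/client-go/kubernetes"
            "k8s.io/client-go/util/retry"
        )

        func setProcessed(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
            return retry.RetryOnConflict(retry.DefaultRetry, func() error {
                // Re-read the object on every attempt so the resourceVersion is fresh.
                cm, err := cs.CoreV1().ConfigMaps(ns).Get(ctx, name, metav1.GetOptions{})
                if err != nil {
                    return err
                }
                if cm.Data == nil {
                    cm.Data = map[string]string{}
                }
                cm.Data["processed"] = "true"
                // A 409 Conflict here makes RetryOnConflict run the closure again.
                _, err = cs.CoreV1().ConfigMaps(ns).Update(ctx, cm, metav1.UpdateOptions{})
                return err
            })
        }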
  9. Admission Webhooks Event & object • HTTP callbacks that handle

objects before they are persisted • Mutator • Invoked first • Patches the requested objects • Typical use cases: add “default values” and relevant “labels” • Validator • Think of it as a special case of mutator: makes no alteration to the objects • Rejects the request if it does not fulfill the criteria • Typical use cases: check that “the requirements have been met” or “it is safe to delete” • A good way to hijack objects before other entities process them • Handy, but beware • Better be lightweight with no side effects • Responsibility: do not take over the controllers’ job (see the sketch below)
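
    On the wire, both mutators and validators are HTTPS endpoints that receive and return an `AdmissionReview`. A minimal validating-handler sketch (TLS setup and the `ValidatingWebhookConfiguration` that registers it are omitted; the deletion criterion is an arbitrary example):

        import (
            "encoding/json"
            "net/http"

            admissionv1 "k8s.io/api/admission/v1"
            metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        )

        func serveValidate(w http.ResponseWriter, r *http.Request) {
            var review admissionv1.AdmissionReview
            if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }

            resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
            // Example criterion: reject DELETE requests for objects we consider protected.
            if review.Request.Operation == admissionv1.Delete {
                resp.Allowed = false
                resp.Result = &metav1.Status{Message: "object is protected; deletion rejected"}
            }

            review.Response = resp
            _ = json.NewEncoder(w).Encode(review)
        }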
  10. Design Rule of thumb • Declarative idempotency • Describe the

desired state • Always get the same result no matter how many times it runs (see the sketch below) • Orthogonal • Each set of conditions represents a different aspect of the object • State machine
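
    Idempotency in practice often reduces to small “ensure” helpers that converge on the desired state instead of failing on re-runs; a minimal sketch:

        import (
            "context"

            corev1 "k8s.io/api/core/v1"
            apierrors "k8s.io/apimachinery/pkg/api/errors"
            metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
            "k8s.io/client-go/kubernetes"
        )

        // ensureConfigMap is safe to call any number of times: the first call
        // creates the object, every later call finds it already there and succeeds.
        func ensureConfigMap(ctx context.Context, cs kubernetes.Interface, desired *corev1.ConfigMap) error {
            _, err := cs.CoreV1().ConfigMaps(desired.Namespace).Create(ctx, desired, metav1.CreateOptions{})
            if apierrors.IsAlreadyExists(err) {
                return nil
            }
            return err
        }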
  11. Design Tricks • Abuse the annotations (`.metadata.annotations`) • Easy, no

need to fiddle with the CRD schema, no conversions 😉 • Flexible, you can store whatever you want: data, states, signals, etc. 😉 (see the sketch below) • Chaotic, hard to maintain as the number grows 😵💫 • Burdensome, not obvious how to use at first glance 😵💫 • Use at your own risk 😈
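
    For illustration, a tiny helper that stashes a signal in an annotation; the key is hypothetical:

        import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

        // processedKey is a made-up annotation key for this example.
        const processedKey = "kitchen.com/processed"

        // markProcessed works on any Kubernetes object through the metav1.Object interface.
        func markProcessed(obj metav1.Object) {
            annotations := obj.GetAnnotations()
            if annotations == nil {
                annotations = map[string]string{}
            }
            annotations[processedKey] = "true"
            obj.SetAnnotations(annotations)
        }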
  12. Modeling CRDs • Take Wrangler as an example • Used

    to generate controller templates, clientsets, etc.

        // +genclient
        // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
        type Order struct {
            metav1.TypeMeta   `json:",inline"`
            metav1.ObjectMeta `json:"metadata,omitempty"`

            Spec   OrderSpec   `json:"spec,omitempty"`
            Status OrderStatus `json:"status,omitempty"`
        }

        type OrderSpec struct {
            MealConfig `json:"mealConfig"`

            Sets int    `json:"sets"`
            Chef string `json:"chef,omitempty"`
        }

        type OrderStatus struct {
            Condition string `json:"condition"`
        }
  13. Scaffold One step away… • Take Wrangler as an example

• Generate • Controllers • Clientsets

        controllergen.Run(args.Options{
            OutputPackage: "github.com/starbops/kitchen/pkg/generated",
            Boilerplate:   "scripts/boilerplate.go.txt",
            Groups: map[string]args.Group{
                "kitchen.com": {
                    Types: []interface{}{
                        "./pkg/apis/kitchen.com/v1",
                    },
                    GenerateTypes:   true,
                    GenerateClients: true,
                },
            },
        })
  14. Reconciling Logic Writing event handlers • Take Wrangler as an

example • Writing a controller • Scheme • Create a pluggable scheme • Add types into the scheme

        // create the factory
        kitchenFactory, err := ctlkitchen.NewFactoryFromConfig(restConfig)
        if err != nil {
            return nil, err
        }

        // instantiate controllers from the factory for the Order and Meal resource types
        orders := kitchenFactory.Kitchen().V1().Order()
        meals := kitchenFactory.Kitchen().V1().Meal()

        // implement the event handler for the Order resource type
        onOrderChange := func(id string, obj *v1.Order) (*v1.Order, error) {
            // OnChange handlers are also invoked on deletion, with a nil object
            if obj == nil {
                return nil, nil
            }
            meal := &v1.Meal{
                // fill in Meal details
            }
            // create new Meal resources using the k8s client
            for i := 0; i < obj.Spec.Sets; i++ {
                if _, err := meals.Create(meal); err != nil {
                    return obj, err
                }
            }
            // avoid acting on cached objects
            objCopy := obj.DeepCopy()
            objCopy.Status.Condition = "Processed"
            // update the Order resource using the k8s client
            return orders.Update(objCopy)
        }

        // register the event handler
        orders.OnChange(context.Background(), "order-controller", onOrderChange)

        // kick off everything
        kitchenFactory.Start(threadiness)
  15. Debugging Controllers • Run the controller binary locally • Attach

to a live cluster with adequate permissions (see the sketch below) • Observe the logs from your console • Step through the code execution with a debugger
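
    Running locally against a live cluster usually just means building the client config from a kubeconfig file instead of the in-cluster service account; a minimal sketch:

        import (
            "k8s.io/client-go/rest"
            "k8s.io/client-go/tools/clientcmd"
        )

        // restConfigForDebugging prefers the in-cluster config (when running as a Pod)
        // and falls back to a kubeconfig file when run from a workstation.
        func restConfigForDebugging(kubeconfigPath string) (*rest.Config, error) {
            if cfg, err := rest.InClusterConfig(); err == nil {
                return cfg, nil
            }
            return clientcmd.BuildConfigFromFlags("", kubeconfigPath)
        }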
  16. Testing Controllers Setup and teardown • Before suite

    • Start the test cluster with envtest (only an API server and etcd) • Add the types to the default client-go K8s scheme • Create a client for test CRUD operations • Hook up your custom controller/event handler • Start the controller manager • After suite • Stop the controller manager • Stop the test cluster (see the sketch below)
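
    A minimal before/after-suite sketch with envtest; the CRD directory path is an assumption, adjust it to your project layout:

        import (
            "k8s.io/client-go/rest"
            "sigs.k8s.io/controller-runtime/pkg/envtest"
        )

        var testEnv *envtest.Environment

        // startTestEnv spins up a local API server and etcd; no kubelet, no nodes.
        func startTestEnv() (*rest.Config, error) {
            testEnv = &envtest.Environment{
                CRDDirectoryPaths: []string{"config/crd/bases"}, // assumed location of the CRD YAMLs
            }
            return testEnv.Start()
        }

        // stopTestEnv tears the test cluster down in the after-suite hook.
        func stopTestEnv() error {
            return testEnv.Stop()
        }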
  17. Packaging & Shipping • Package the built binaries into a

container image • Write manifests to run the application/controller on Kubernetes • Deployment/DaemonSet/StatefulSet • Ingress/Service/Endpoint • RBAC • … etc. • Derive Helm charts for distributing the application and easy installation
  18. KubeVirtBMC https://github.com/starbops/kubevirtbmc • Initially a SUSE HackWeek project of mine

• Formerly called KubeBMC, but that name proved confusing • Built with kubebuilder • Aims to provide a subset of IPMI & Redfish functionality for KubeVirt VMs • Power management • Boot device management • Image mount • How-tos: https://github.com/starbops/kubevirtbmc/wiki#the-slow-start-guide • Proposed to join the KubeVirt organization • Contributions are welcome 🤗
  19. What’s a BMC? A bit of background, just a bit…

• A tiny computer that (almost) always watches the server’s back • Up and running as long as the power cord is plugged in • Provides a specific set of services, including power and boot device management, over the network (so-called out-of-band management) • “Redfish is the new IPMI”
  20. Inner Workings of KubeVirtBMC An operator-like design • Work like

OpenStack VirtualBMC but in a cloud/Kubernetes-native fashion • Distributed and horizontally scalable • Leverages K8s APIs • Business logic • `virtbmc-controller` (manager) • Reconciles VirtualMachineBMC, VirtualMachine, and Service objects • Deploys a virtbmc Pod and Service for each VM • `virtbmc` (agent) • Runs an IPMI simulator • Decodes IPMI commands and translates them to K8s API calls (see the sketch below)
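
    For flavor, a hedged sketch of what “translate IPMI to K8s API calls” could look like, flipping a KubeVirt VM’s `spec.running` field through the dynamic client; this is an illustration, not necessarily how KubeVirtBMC implements it:

        import (
            "context"

            metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
            "k8s.io/apimachinery/pkg/runtime/schema"
            "k8s.io/apimachinery/pkg/types"
            "k8s.io/client-go/dynamic"
        )

        // powerOn handles an IPMI "chassis power on" by starting the backing VM.
        func powerOn(ctx context.Context, dyn dynamic.Interface, namespace, name string) error {
            gvr := schema.GroupVersionResource{
                Group:    "kubevirt.io",
                Version:  "v1",
                Resource: "virtualmachines",
            }
            patch := []byte(`{"spec":{"running":true}}`)
            _, err := dyn.Resource(gvr).Namespace(namespace).Patch(
                ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
            return err
        }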
  21. Harvester https://github.com/harvester/harvester • An open-source HCI project/product • Turn-key solution

• No vendor lock-in • Key features • Container & VM workloads • Rancher integration • Based on an immutable OS • Tech stack • KubeVirt • Longhorn • Rancher • RKE2 • Roadmap (subject to change) • RWX volumes • 3rd-party storage • VXLAN (you heard that right)
  22. Upgrade Controller A one-click and zero-downtime design • Designed with a

    state machine in mind (see the sketch below) • Introduces a new CRD - Upgrade (`upgrades.harvesterhci.io/v1beta1`) • Workflow control point • Upgrade-relevant metadata • A bunch of control loops reconcile different kinds of (custom) resources • Upgrade • Pod/Job/Secret/Node • CRDs from other projects, including Plan, ManagedChart, etc.
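
    A hedged sketch of the state-machine idea with hypothetical phases (the real Upgrade CRD defines its own states and conditions):

        // Upgrade mirrors the shape of the custom resource; the phases are made up.
        type Upgrade struct {
            Status UpgradeStatus
        }

        type UpgradeStatus struct {
            Phase string
        }

        // advance moves the upgrade one step per reconcile; each transition is
        // persisted in .status, so the workflow survives controller restarts.
        func advance(u *Upgrade) {
            switch u.Status.Phase {
            case "": // freshly created
                u.Status.Phase = "ImagePreloading"
            case "ImagePreloading":
                u.Status.Phase = "NodesUpgrading"
            case "NodesUpgrading":
                u.Status.Phase = "Succeeded"
            }
        }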
  23. Upgrade Controller Drawbacks • Blind men and an elephant (user’s

perspective) • An upgrade involves almost all kinds of objects • Logs are scattered across different pods • Hard to troubleshoot if something goes wrong • A slight move in one part can affect the whole system • It never rains but it pours
  24. vm-dhcp-controller https://github.com/harvester/vm-dhcp-controller • A cloud-native DHCP server for KubeVirt-powered VMs

• Designed with separate control and data planes • Topology-aware • Currently only works with Harvester • Tries to be provider-agnostic, but apparently that’s not entirely possible • Built with the Wrangler framework
  25. Inner Workings of vm-dhcp-controller An operator-like design • Work like

traditional DHCP servers but in a cloud/Kubernetes-native fashion • Business logic • VM DHCP controller (manager) • Reconciles IPPool, VirtualMachineNetworkConfig, and VirtualMachine objects • Watches Pod events, then triggers IPPool resyncs • Deploys agents according to IPPool objects and decides where to deploy based on the following criteria • Target network to serve (net-attach-def) • Nodes that comprise the target network (node affinity) • Handles IP address allocation/deallocation (IPAM; see the sketch below) • VM DHCP agent • Syncs IPPool objects from the API server to keep the in-memory DHCP leases consistent • Runs a native DHCP serving process
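
    As an illustration of the IPAM part, a hypothetical in-memory allocator (the real implementation differs):

        import (
            "errors"
            "sync"
        )

        // IPAllocator hands out addresses from a pool. Allocation is idempotent per
        // MAC address, so replayed reconciles do not leak addresses.
        type IPAllocator struct {
            mu        sync.Mutex
            free      []string          // unallocated addresses
            allocated map[string]string // MAC -> IP
        }

        func (a *IPAllocator) Allocate(mac string) (string, error) {
            a.mu.Lock()
            defer a.mu.Unlock()
            if ip, ok := a.allocated[mac]; ok {
                return ip, nil // the same MAC always gets the same lease
            }
            if len(a.free) == 0 {
                return "", errors.New("pool exhausted")
            }
            ip := a.free[0]
            a.free = a.free[1:]
            a.allocated[mac] = ip
            return ip, nil
        }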
  26. [Architecture diagram] Control plane: a vm controller watches VirtualMachine

    objects and creates vmnetcfg CRs; the ippool and vmnetcfg controllers watch/update the IPPool and vmnetcfg CRs, backed by an in-memory IPAM pool cache and a MAC cache, and an ippool syncer keeps the IPPool status updated. Data plane: the agent runs a dhcp server with a lease store and answers DHCP requests on the serving NIC. Example from the diagram: a VM with MAC aa:bb:cc:dd:ee:ff on network prod-net (CIDR 192.168.48.0/24, gateway 192.168.48.1) is allocated 192.168.48.99/24.
  27. Design Decisions Why this but not that? • Should I

design/model it like a giant state machine? • Yes, you can! But… • Bear in mind that keeping things orthogonal could save your ass
  28. Design Decisions Why this but not that? • Why not

run an existing solution like ISC Kea or Dnsmasq and then maintain its configuration? • It could be done that way… • At the cost of customizability, scalability, and the possibility of not-playing-well-with-containers • It becomes a totally different set of problems to solve • It’s boring
  29. Design Decisions What tools/frameworks to use? • kubebuilder • With

a command line interface • Generates near-ready-to-ship projects • You are (only) responsible for implementing the control loops • Recommended for beginners • Wrangler • Feels like actually writing “controllers” • Handy and feature-rich
  30. References • “Programming Kubernetes” by Michael Hausenblas and Stefan Schimanski

    • The Mechanics of Kubernetes • Events, the DNA of Kubernetes • What the heck are Conditions in Kubernetes controllers? • Groups and Versions and Kinds, oh my! • Introduction to the Informer Pattern