Slide 1

Slide 1 text

Controller: Extending your K8s cluster
Terin Stock
Ross Guarino

Slide 2

Slide 2 text

Introductions
Terin Stock @terinjokes
Ross Guarino @0xRLG

Slide 3

Slide 3 text

Kubernetes at Cloudflare

Slide 4

Slide 4 text

Pre-Kubernetes at Cloudflare Salt JIRA Laptop Mesos Code Review Emails

Slide 5

Slide 5 text

Pre-Kubernetes at Cloudflare
Low Cohesion & High Coupling

Slide 6

Slide 6 text

Keeping it Simple
● Automate processes
  ○ Make the correct way the easiest way
● Abstract implementation
  ○ Move the decision making elsewhere
● Remove duplicate state
  ○ Say it only once

Slide 7

Slide 7 text

Kubernetes Out of the box
We run all of our own physical infrastructure, so "Kubernetes the Salt way" is the only option.

Slide 8

Slide 8 text

Mind the Gap

Slide 9

Slide 9 text

Without a Cloud Provider...
● No cloud load balancer
  ○ Leifur: a load balancer for bare metal
● Hardware scales much more slowly
  ○ Pyli: automate user and namespace creation and RBAC provisioning
● Existing telemetry
  ○ Rule Loader: configure Prometheus from ConfigMaps

Slide 10

Slide 10 text

Can’t believe it’s not Serverless!

Slide 11

Slide 11 text

Can’t believe it’s not Serverless!
Controllers are:
● Simple
● Reliable
● Event driven
● Easy to write

Slide 12

Slide 12 text

Example Problem
We want a namespace for every developer on the Kubernetes cluster.
Possible Solutions:
● Offload it to the IT department
● Make it an onboarding task the new hire does in their first week
● Write a standalone service

Slide 13

Slide 13 text

Or, Writing a Controller
We can write a controller that maintains the relationship between a User and a Namespace.

Slide 14

Slide 14 text

4 Steps to Writing a Controller
1. Define your Custom Resource Definition
2. Generate client code
3. Listen for events
4. Handle events in the queue

Slide 15

Slide 15 text

Define a Custom Resource

Slide 16

Slide 16 text

Creating a Custom Resource for Users
    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    metadata:
      name: users.example.com
    spec:
      group: example.com
      version: v1
      scope: Cluster
      names:
        plural: users
        singular: user
        kind: User
        shortNames:
        - usr

Slide 17

Slide 17 text

Validating Objects
● Resources can be checked against an OpenAPI v3 schema on admission

Slide 18

Slide 18 text

K8s Code Gen

Slide 19

Slide 19 text

Generating Client Code
github.com/kubernetes/code-generator
● Client Code
● Informers
● Listers
● DeepCopy

Slide 20

Slide 20 text

pkg/apis/example.com/v1/types.go
    package v1
    import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    // User is the custom resource served for the users.example.com CRD.
    // +genclient
    // +genclient:noStatus
    // +k8s:deepcopy-gen=true
    // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
    type User struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`
        Spec UserSpec `json:"spec"`
    }
    // UserSpec holds the user-supplied fields of a User.
    // +k8s:deepcopy-gen=true
    type UserSpec struct {
        DisplayName string `json:"display_name"`
    }

Slide 21

Slide 21 text

Installing the Generator
Add the code generator to your Gopkg.toml file:
    required = ["k8s.io/code-generator/cmd/client-gen"]
Then fetch it:
    $ dep ensure

Slide 22

Slide 22 text

Running the generator
    $ ./vendor/k8s.io/code-generator/generate-groups.sh \
        all \
        example.com/pkg/client \
        example.com/pkg/apis \
        example.com:v1

Slide 23

Slide 23 text

Before generation:
    pkg/
    └── apis
        └── example.com
            └── v1
                ├── docs.go
                ├── register.go
                └── types.go
After generation:
    pkg/
    ├── apis
    │   └── example.com
    │       └── v1
    │           ├── docs.go
    │           ├── register.go
    │           ├── types.go
    │           └── zz_generated.deepcopy.go
    └── client
        ├── clientset
        │   └── [...]
        ├── informers
        │   └── [...]
        └── listers
            └── [...]

Slide 24

Slide 24 text

Listening for Events

Slide 25

Slide 25 text

Informers
● React to changes in resources
● Reduce the burden on the API server
● Populate a read-only cache (Lister)
● Prevent polling

Slide 26

Slide 26 text

Listers
● Read-only cache populated by Informers
● Reduce the burden on the API server
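To make the Informer and Lister roles concrete, here is a minimal standard-library sketch. It is not the client-go implementation: the `event` and `informer` types and the `newInformer` constructor are hypothetical stand-ins. It only illustrates the idea that events update a local cache, a handler is notified with the object's key, and reads are served from the cache instead of the API server.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a watch notification.
type event struct {
	Type string // "add", "update", or "delete"
	Name string
	Spec string
}

// informer keeps a local cache in sync with a stream of events
// and tells a handler which key changed.
type informer struct {
	cache   map[string]string
	handler func(key string)
}

func newInformer(handler func(key string)) *informer {
	return &informer{cache: make(map[string]string), handler: handler}
}

// handle applies one event to the cache, then notifies the handler.
func (i *informer) handle(ev event) {
	if ev.Type == "delete" {
		delete(i.cache, ev.Name)
	} else {
		i.cache[ev.Name] = ev.Spec
	}
	i.handler(ev.Name)
}

// Get plays the Lister role: it reads only from the local cache.
func (i *informer) Get(name string) (string, bool) {
	spec, ok := i.cache[name]
	return spec, ok
}

func main() {
	inf := newInformer(func(key string) { fmt.Println("changed:", key) })
	inf.handle(event{Type: "add", Name: "alice", Spec: "Alice"})
	spec, ok := inf.Get("alice")
	fmt.Println(spec, ok)
}
```

The handler receives only the key; the worker looks the object up in the cache later, which is why the cache can stay read-only for consumers.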

Slide 27

Slide 27 text

Work Queues
Simple, Intelligent Workqueue:
● Stingy
● Fair
● Multiple Consumers and Producers

Slide 28

Slide 28 text

What goes on the queue?
1. Queues use equality to detect duplicate keys
2. The simpler the objects, the better
3. Usually .metadata.name works well
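The point about duplicate keys can be sketched with a toy queue. This is a simplified illustration using only the standard library, not the client-go workqueue; `dedupQueue` and its methods are hypothetical names. It shows why a plain string key works well: equal keys added while one is still pending collapse into a single work item.

```go
package main

import "fmt"

// dedupQueue is a toy FIFO queue that ignores a key that is already pending.
type dedupQueue struct {
	pending map[string]bool // keys currently waiting
	order   []string        // FIFO order of the pending keys
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: make(map[string]bool)}
}

// Add enqueues key unless an equal key is already waiting.
func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest pending key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many keys are waiting.
func (q *dedupQueue) Len() int { return len(q.order) }

func main() {
	q := newDedupQueue()
	for _, name := range []string{"alice", "bob", "alice", "alice"} {
		q.Add(name)
	}
	fmt.Println(q.Len()) // prints 2
}
```

Three events for "alice" cost one unit of work, which is safe because the worker reads the latest state from the cache rather than from the event.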

Slide 29

Slide 29 text

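    queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
    informers := informers.NewSharedInformerFactory(clientSet, 30*time.Second)
    // enqueueUser puts the object's key, not the object itself, on the queue.
    func enqueueUser(queue workqueue.Interface, obj interface{}) {
        key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
        if err != nil {
            return // no usable key, nothing to enqueue
        }
        queue.Add(key)
    }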

Slide 30

Slide 30 text

    // The V1().Users().Informer() accessor chain is assumed from the
    // generated informers for group example.com, version v1, kind User.
    informers.Example().V1().Users().Informer().AddEventHandler(
        &cache.ResourceEventHandlerFuncs{
            AddFunc: func(obj interface{}) {
                enqueueUser(queue, obj)
            },
            UpdateFunc: func(_, obj interface{}) {
                enqueueUser(queue, obj)
            },
            DeleteFunc: func(obj interface{}) {
                enqueueUser(queue, obj)
            },
        })

Slide 31

Slide 31 text

Watching Children
● Update the child’s metadata.ownerReferences to reflect the relationship
● In our case, we want to be notified if a namespace we care about changes
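One way to act on a child's change is to resolve its owner and enqueue the owner's key. The sketch below assumes simplified local types (`ownerReference`, `object`) and a hypothetical `ownerKey` helper rather than the real Kubernetes metadata types; it only illustrates the lookup.

```go
package main

import "fmt"

// ownerReference mirrors the few fields of metadata.ownerReferences used here.
type ownerReference struct {
	Kind       string
	Name       string
	Controller bool
}

// object is a stand-in for any Kubernetes object with metadata.
type object struct {
	Name            string
	OwnerReferences []ownerReference
}

// ownerKey returns the name of the User that controls child, if there is one.
func ownerKey(child object) (string, bool) {
	for _, ref := range child.OwnerReferences {
		if ref.Kind == "User" && ref.Controller {
			return ref.Name, true
		}
	}
	return "", false
}

func main() {
	ns := object{
		Name: "alice",
		OwnerReferences: []ownerReference{
			{Kind: "User", Name: "alice", Controller: true},
		},
	}
	if key, ok := ownerKey(ns); ok {
		fmt.Println("enqueue", key)
	}
}
```

Namespaces without a User owner are ignored, so the controller only reacts to children it created.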

Slide 32

Slide 32 text

Handling Events

Slide 33

Slide 33 text

Worker goroutine
● Pops items off of the queue and calls a work function until instructed to stop
● Cannot block forever on one item
Work Functions:
● Handle deletion
● Are idempotent
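Idempotency is easiest to see in a small example. The following sketch uses a hypothetical in-memory `cluster` type in place of the API server; `syncNamespace` is an illustrative name, not part of any library. Running the work function twice for the same key leaves the same state as running it once.

```go
package main

import "fmt"

// cluster is an in-memory stand-in for the API server's namespace list.
type cluster struct {
	namespaces map[string]bool
	creates    int // number of create calls actually issued
}

// syncNamespace makes sure a namespace named after the user exists.
// Calling it again for the same user changes nothing.
func syncNamespace(c *cluster, user string) {
	if c.namespaces[user] {
		return // desired state already reached
	}
	c.namespaces[user] = true
	c.creates++
}

func main() {
	c := &cluster{namespaces: map[string]bool{}}
	syncNamespace(c, "alice")
	syncNamespace(c, "alice") // a retry or duplicate event
	fmt.Println(len(c.namespaces), c.creates) // prints 1 1
}
```

Because the function compares desired and actual state instead of replaying an event, retries and duplicate notifications are harmless.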

Slide 34

Slide 34 text

    // processWorkItem handles one item and reports whether the worker should continue.
    func processWorkItem(
        q workqueue.RateLimitingInterface,
        workFn func(context.Context, string) error,
    ) bool {
        // Get the next item, or the signal to quit.
        key, quit := q.Get()
        if quit {
            return false
        }
        // Tell the queue we're done processing this item.
        defer q.Done(key)
        // Bound the time spent on a single item.
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        if err := workFn(ctx, key.(string)); err != nil {
            q.AddRateLimited(key) // Retry at a later time.
            return true
        }
        q.Forget(key) // Mark the work as successful.
        return true
    }

Slide 35

Slide 35 text

Tips & Tricks

Slide 36

Slide 36 text

Don’t Handle Transient Errors
● Don’t panic, just return
● Return errors, but don’t retry in your worker functions
● Let the queue retry them
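Letting the queue retry works because a rate-limited queue spaces out repeated failures. As a rough sketch of that idea, assuming a per-item exponential backoff with a cap (the `backoff` function and the `baseDelay` and `maxDelay` values are illustrative, not the client-go defaults):

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative values only; a real queue chooses its own.
const (
	baseDelay = 5 * time.Millisecond
	maxDelay  = 1 * time.Second
)

// backoff returns the wait before the next retry of an item that has
// already failed `failures` times: base doubled per failure, capped at limit.
func backoff(base, limit time.Duration, failures int) time.Duration {
	d := base
	for i := 0; i < failures; i++ {
		if d >= limit/2 {
			return limit // doubling again would reach or pass the cap
		}
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

func main() {
	for failures := 0; failures < 5; failures++ {
		fmt.Println(failures, backoff(baseDelay, maxDelay, failures))
	}
}
```

The work function stays simple: it returns the error, and the queue decides when the key comes back.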

Slide 37

Slide 37 text

Handling Deletion
● Do you have external state?
● Do you need to guarantee you witness a deletion?

Slide 38

Slide 38 text

Don’t Handle OnDelete Differently
Avoid duplicating and complicating your code. Consider this a best-effort optimization opportunity for later.

Slide 39

Slide 39 text

No? Use Kubernetes Garbage Collection
“[The garbage collector will delete] objects that once had an owner, but no longer have an owner.”
There’s no code to write, since we’ve already set up ownerReferences for notifications.

Slide 40

Slide 40 text

Yes? Use Finalizers for deletion
● Don’t rely on noticing the deletion event
● Use a finalizer to handle deletions

Slide 41

Slide 41 text

How do finalizers work?
When you delete a resource that has Finalizers, Kubernetes waits until all existing Finalizers are removed, then finally deletes the resource.

Slide 42

Slide 42 text

On resource deletion, Kubernetes waits for each Finalizer to complete before removing the resource.
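The finalizer flow can be modelled in a few lines. This is a toy, in-memory sketch with hypothetical `resource` and `store` types, not Kubernetes code: a delete request only marks the object, and the object disappears once the last finalizer has been removed.

```go
package main

import "fmt"

// resource is a stand-in for an object with metadata.finalizers.
type resource struct {
	Name       string
	Deleting   bool // stands in for a non-zero metadata.deletionTimestamp
	Finalizers []string
}

// store is a toy API server holding resources by name.
type store struct {
	objects map[string]*resource
}

// Delete marks the resource for deletion; it is removed only if no finalizers remain.
func (s *store) Delete(name string) {
	if obj, ok := s.objects[name]; ok {
		obj.Deleting = true
		s.reap(name)
	}
}

// RemoveFinalizer drops one finalizer and completes a pending deletion if it was the last.
func (s *store) RemoveFinalizer(name, finalizer string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	kept := obj.Finalizers[:0]
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	s.reap(name)
}

// reap removes the resource once deletion was requested and no finalizers are left.
func (s *store) reap(name string) {
	if obj := s.objects[name]; obj.Deleting && len(obj.Finalizers) == 0 {
		delete(s.objects, name)
	}
}

// Exists reports whether the resource is still stored.
func (s *store) Exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

func main() {
	s := &store{objects: map[string]*resource{
		"alice": {Name: "alice", Finalizers: []string{"example.com"}},
	}}
	s.Delete("alice")
	fmt.Println(s.Exists("alice")) // prints true: the finalizer blocks removal
	s.RemoveFinalizer("alice", "example.com")
	fmt.Println(s.Exists("alice")) // prints false
}
```

The controller gets as many chances as it needs to clean up external state, because the object stays visible until the controller removes its own finalizer.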

Slide 43

Slide 43 text

    func syncUser(
        key string,
        client exampleclient.Interface,
        userLister listers.UserLister,
        k8sClient kubernetes.Interface,
        nsLister corelisters.NamespaceLister,
    ) error {
        // Get the User from the cache.
        cached, err := userLister.Get(key)
        if err != nil {
            return err
        }
        // A non-zero DeletionTimestamp means deletion has been requested.
        // hasFinalizer is a hypothetical helper that scans cached.Finalizers.
        if !cached.DeletionTimestamp.IsZero() && hasFinalizer(cached, "example.com") {
            // HANDLE DELETE
            // Remove example.com from Finalizer list
            return nil
        }
        // HANDLE UPDATE/CREATE
        return nil
    }

Slide 44

Slide 44 text

TAK (THANK YOU), QUESTIONS?