Slide 1

Slide 1 text

Keeping up with k8s cluster upgrades
@tasdikrahman | www.tasdikrahman.com
Kubernetes Bangalore Meetup, September 2022

Slide 2

Slide 2 text

Keeping up with k8s cluster upgrades

Slide 3

Slide 3 text

Keeping up with k8s cluster upgrades
Trying to keep up

Slide 4

Slide 4 text

Why should you upgrade, anyway?

Slide 5

Slide 5 text

Why should you upgrade, anyway?
● API deprecations

Slide 6

Slide 6 text

Why should you upgrade, anyway?
● API deprecations
● New APIs serve you better

Slide 7

Slide 7 text

Why should you upgrade, anyway?
● API deprecations
● New APIs serve you better
● Security patches for CVEs

Slide 8

Slide 8 text

Why should you upgrade, anyway?
● API deprecations
● New APIs serve you better
● Security patches for CVEs
● Version deprecations by the provider

Slide 9

Slide 9 text

Why should you upgrade, anyway?
● API deprecations
● New APIs serve you better
● Security patches for CVEs
● Version deprecations by the provider
● Incremental changes introduced

Slide 10

Slide 10 text

Why should you upgrade, anyway?
● API deprecations
● New APIs serve you better
● Security patches for CVEs
● Version deprecations by the provider
● Incremental changes introduced
Bottom line: upgrade to prevent infrastructure rot

Slide 11

Slide 11 text

Is this specific to k8s?

Slide 12

Slide 12 text

Is this specific to k8s?
● Compute layer upgrades are not new

Slide 13

Slide 13 text

Is this specific to k8s?
● Compute layer upgrades are not new
● Inventory of what version of OS/language stack is running on compute instances

Slide 14

Slide 14 text

Is this specific to k8s?
● Compute layer upgrades are not new
● Inventory of what version of OS/language stack is running on compute instances
● Handling the end of support for OS versions and language stacks

Slide 15

Slide 15 text

Is this specific to k8s?
● Compute layer upgrades are not new
● Inventory of what version of OS/language stack is running on compute instances
● Handling the end of support for OS versions and language stacks
● Applying CVE fix patches on these machines

Slide 16

Slide 16 text

Is this specific to k8s?
● Compute layer upgrades are not new
● Inventory of what version of OS/language stack is running on compute instances
● Handling the end of support for OS versions and language stacks
● Applying CVE fix patches on these machines
● Managed via golden AMIs, blue-green replacements, or in-place upgrades

Slide 17

Slide 17 text

Release pace for some managed providers so far

Slide 18

Slide 18 text

Release pace for providers - EKS
Source: https://endoflife.date/amazon-eks

Slide 19

Slide 19 text

Release pace for providers - GKE
Source: https://endoflife.date/google-kubernetes-engine

Slide 20

Slide 20 text

Release pace for other providers may follow a similar cadence

Slide 21

Slide 21 text

Reason for such a release velocity

Slide 22

Slide 22 text

Upstream has roughly 4 releases/year

Slide 23

Slide 23 text

A new version every 3 months

Slide 24

Slide 24 text

KEP 2572 sig-release https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md#summary

Slide 25

Slide 25 text

KEP 2572 sig-release https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md#summary

Slide 26

Slide 26 text

K8s release cadence - discussion https://github.com/kubernetes/sig-release/discussions/1290

Slide 27

Slide 27 text

What happens when you don’t upgrade?

Slide 28

Slide 28 text

Forced upgrade of your control plane

Slide 29

Slide 29 text

Outcome? Possible outage

Slide 30

Slide 30 text

Security incidents

Slide 31

Slide 31 text

Optimising for high velocity k8s upgrades

Slide 32

Slide 32 text

Managing cluster state via Terraform, CloudFormation, or any other tool that tracks the state for you

Slide 33

Slide 33 text

Self-hosting your own k8s cluster?

Slide 34

Slide 34 text

Self-hosting your own k8s cluster?
Of course not! Avoid it unless you have a very special case

Slide 35

Slide 35 text

Having a staging setup

Slide 36

Slide 36 text

Having a staging setup
Keeping staging as close to prod as possible

Slide 37

Slide 37 text

Have staging run version x for a period y before upgrading prod to version x
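
The staging soak rule on this slide can be expressed as a simple gate in upgrade automation. The sketch below is one possible shape, not something from the talk: the function name `prod_upgrade_allowed`, the 14-day period, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def prod_upgrade_allowed(staging_upgraded_at, soak_period, now=None):
    """Return True once staging has run the target version for the full soak period."""
    now = now or datetime.now(timezone.utc)
    return now - staging_upgraded_at >= soak_period


# Hypothetical values: staging moved to the target version on 1 September 2022
# and the team agreed on a 14-day soak period.
staging_upgraded_at = datetime(2022, 9, 1, tzinfo=timezone.utc)
soak_period = timedelta(days=14)

# 9 days in: too early to touch prod.
print(prod_upgrade_allowed(staging_upgraded_at, soak_period,
                           now=datetime(2022, 9, 10, tzinfo=timezone.utc)))  # False
# 14 days in: the soak period has elapsed.
print(prod_upgrade_allowed(staging_upgraded_at, soak_period,
                           now=datetime(2022, 9, 15, tzinfo=timezone.utc)))  # True
```

A gate like this only encodes the waiting period; whether staging was actually healthy during that time still has to come from monitoring.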

Slide 38

Slide 38 text

Offload to managed compute nodes

Slide 39

Slide 39 text

Smaller clusters as a tradeoff

Slide 40

Slide 40 text

Offloading components as add-ons

Slide 41

Slide 41 text

Removing human bottlenecks

Slide 42

Slide 42 text

Automation

Slide 43

Slide 43 text

https://github.com/aws/containers-roadmap/issues/600

Slide 44

Slide 44 text

Helps you scale upgrades once cluster numbers increase

Slide 45

Slide 45 text

Helps you scale upgrades once cluster numbers increase Is necessary to

Slide 46

Slide 46 text

Reduces toil, offloading repetition

Slide 47

Slide 47 text

Pre upgrade checks

Slide 48

Slide 48 text

Pods running fine, API deprecations in the next version, affected workloads
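
As an illustration of the API deprecation and affected workload checks, here is a minimal sketch that scans manifests (already parsed into dicts, for example from rendered charts in git) for API versions that are no longer served in the target release. The table holds only a few well-known removals and is not a complete list; the function name and the sample manifests are hypothetical.

```python
# Minor version in which each (apiVersion, kind) stops being served.
# A small hand-picked subset of upstream removals, not a complete list.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
}


def affected_workloads(manifests, target_version):
    """Return (namespace, name, apiVersion, kind) for objects whose API is gone in target_version."""
    hits = []
    for manifest in manifests:
        key = (manifest["apiVersion"], manifest["kind"])
        removed = REMOVED_IN.get(key)
        if removed is not None and target_version >= removed:
            meta = manifest.get("metadata", {})
            hits.append((meta.get("namespace", "default"), meta.get("name"), *key))
    return hits


# Hypothetical manifests, shaped like parsed YAML.
manifests = [
    {"apiVersion": "batch/v1beta1", "kind": "CronJob",
     "metadata": {"name": "nightly-report", "namespace": "billing"}},
    {"apiVersion": "apps/v1", "kind": "Deployment",
     "metadata": {"name": "api", "namespace": "billing"}},
]

for hit in affected_workloads(manifests, target_version=(1, 25)):
    print(hit)  # ('billing', 'nightly-report', 'batch/v1beta1', 'CronJob')
```

Scanning the manifests as they are stored, rather than objects read back from the cluster, matters here because the API server can return an object in a newer version than the one it was written in.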

Slide 49

Slide 49 text

Post upgrade checks

Slide 50

Slide 50 text

Node maintenance operations

Slide 51

Slide 51 text

Opinionated take: avoid shell scripts; use one of the k8s client libraries if you are writing something from scratch

Slide 52

Slide 52 text

Leverage existing automation

Slide 53

Slide 53 text

GKE
● Stable and rapid release channels for standard deployments
● Avoid static channels if you can
● If you can, evaluate autopilot mode
EKS
● Managed node groups
● Cluster add-ons
  ○ CoreDNS
  ○ kube-proxy, etc.

Slide 54

Slide 54 text

Documentation

Slide 55

Slide 55 text

Document everything

Slide 56

Slide 56 text

Have a version compatibility matrix mapped for your components
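
One way to keep such a matrix useful is to store it as data that upgrade tooling can query. The sketch below assumes a simple layout (component, then installed version, then an inclusive range of Kubernetes minor versions). The component names and ranges are placeholders, not real compatibility data; the real values would come from each component's own release notes.

```python
# Hypothetical matrix: component -> component version -> inclusive
# (lowest, highest) Kubernetes minor version it is mapped as compatible with.
# Names and ranges are placeholders for illustration only.
COMPATIBILITY = {
    "ingress-controller": {"2.1": ((1, 19), (1, 22)), "3.0": ((1, 22), (1, 25))},
    "metrics-agent": {"0.9": ((1, 20), (1, 24))},
}


def blockers(installed, target):
    """Return components whose installed version is not mapped as compatible with target."""
    blocked = []
    for component, version in installed.items():
        supported = COMPATIBILITY.get(component, {}).get(version)
        # An unmapped component or version counts as a blocker until someone checks it.
        if supported is None or not (supported[0] <= target <= supported[1]):
            blocked.append(component)
    return sorted(blocked)


installed = {"ingress-controller": "2.1", "metrics-agent": "0.9"}
print(blockers(installed, (1, 22)))  # []
print(blockers(installed, (1, 23)))  # ['ingress-controller']
```

Treating anything missing from the matrix as a blocker is a deliberate choice in this sketch: it forces the matrix to be updated before the upgrade rather than after.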

Slide 57

Slide 57 text

Upgrades shouldn’t be an afterthought

Slide 58

Slide 58 text

References
● https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md
● https://github.com/kubernetes/sig-release/discussions/1290
● https://github.com/deliveryhero/k8s-cluster-upgrade-tool
● https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades
● https://github.com/aws/containers-roadmap/issues/600

Slide 59

Slide 59 text

@tasdikrahman | www.tasdikrahman.com