Keeping up with Kubernetes cluster upgrades

Keeping up with k8s cluster upgrades
@tasdikrahman | www.tasdikrahman.com
Kubernetes Bangalore Meetup, September 2022

Keeping up with k8s cluster upgrades: trying to keep up

Why should you upgrade, anyway?
• API deprecations
• New APIs serve you better
• Security patches for CVEs
• Version deprecations by the provider
• Incremental changes introduced

Bottom line: upgrade to prevent infrastructure rot

Is this specific to k8s?
• Compute layer upgrades are not new
• Inventory of which OS and language stack versions are running on compute instances
• Handling deprecation of support for OS versions and language stacks
• Applying CVE fix patches on these machines
• Managed via golden AMIs, blue-green replacements, or in-place upgrades

Release pace for some managed providers so far

Release pace for providers - EKS
Source: https://endoflife.date/amazon-eks

Release pace for providers - GKE
Source: https://endoflife.date/google-kubernetes-engine

Release pace for other providers may follow a similar cadence

Reason for such a release velocity

Upstream had roughly 4 releases/year, a new version every 3 months

KEP 2572 (sig-release) moved the cadence to 3 releases/year, starting with v1.22
https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md#summary

K8s release cadence discussion
https://github.com/kubernetes/sig-release/discussions/1290

What happens when you don’t upgrade?

Forced upgrade of your control plane by the provider

Outcome? Possible outage

Security incidents

Optimising for high velocity k8s upgrades

Manage state via Terraform, CloudFormation, or another tool that tracks the state for you

Self-hosting your own k8s cluster? Of course not! Avoid it unless you have a very special case

Have a staging setup, as close to prod as possible

Have staging run version x for y time before upgrading prod to version x
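
The "x version for y time" rule can be encoded as a small gate in upgrade automation. This is a minimal sketch, not the speaker's tooling: the function name, its parameters, and the way the staging version and upgrade timestamp are obtained are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def prod_upgrade_allowed(
    staging_version: str,
    staging_upgraded_at: datetime,
    target_version: str,
    min_soak: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True once staging has run target_version for at least min_soak.

    All names here are hypothetical; the inputs would come from wherever
    the cluster inventory is tracked.
    """
    now = now or datetime.now(timezone.utc)
    if staging_version != target_version:
        # Staging is not yet on the version we want to roll out to prod.
        return False
    return now - staging_upgraded_at >= min_soak
```

For example, with min_soak=timedelta(days=14), prod stays blocked until staging has been on the target version for two full weeks.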

Offload to managed compute nodes

Smaller clusters as a tradeoff

Offload components as add-ons

Removing human bottlenecks

Automation

https://github.com/aws/containers-roadmap/issues/600

Helps you scale upgrades once cluster numbers increase

Is necessary to

Reduces toil, offloading repetition

Pre upgrade checks

Pods running fine, API deprecations for the next version, affected workloads
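
A rough sketch of what two of these pre-upgrade checks might look like. It works on plain dictionaries shaped like Kubernetes objects so that it runs without a cluster; in practice the pods and manifests would come from a k8s client. The function names are hypothetical, and the REMOVED_IN table is a small illustrative subset of upstream API removals, not a complete list.

```python
from typing import Dict, List, Tuple

# Illustrative, incomplete table: (apiVersion, kind) -> first minor release
# in which upstream stopped serving that API version.
REMOVED_IN: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
}

# Phase alone is a coarse signal; a fuller check would also look at readiness.
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pods: List[dict]) -> List[str]:
    """Return namespace/name of pods whose phase is not Running or Succeeded."""
    return [
        f'{p["metadata"]["namespace"]}/{p["metadata"]["name"]}'
        for p in pods
        if p.get("status", {}).get("phase") not in HEALTHY_PHASES
    ]


def affected_workloads(manifests: List[dict], target: Tuple[int, int]) -> List[str]:
    """Return manifests that use an API no longer served in the target version."""
    hits = []
    for m in manifests:
        removed = REMOVED_IN.get((m.get("apiVersion"), m.get("kind")))
        if removed is not None and target >= removed:
            hits.append(f'{m["kind"]}/{m["metadata"]["name"]} ({m["apiVersion"]})')
    return hits
```

An upgrade run could refuse to start while either function returns a non-empty list, and print the list as the set of workloads to fix first.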

Post upgrade checks

Node maintenance operations

Opinionated take: avoid shell scripts; use one of the k8s clients if writing something from scratch
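
To illustrate the point about clients, here is a minimal sketch using the official Kubernetes Python client (an assumption; any of the k8s clients would do). It reads kubelet versions from typed fields rather than parsing command output. The helper names are hypothetical, and the import is kept inside load_core_v1 so the other helpers can be exercised with a stand-in object.

```python
from typing import Dict


def load_core_v1():
    """Build a CoreV1Api client from the local kubeconfig.

    Assumes the official `kubernetes` Python package is installed.
    """
    from kubernetes import client, config

    config.load_kube_config()
    return client.CoreV1Api()


def kubelet_versions(core_v1) -> Dict[str, str]:
    """Map node name to kubelet version, read from the node status."""
    return {
        node.metadata.name: node.status.node_info.kubelet_version
        for node in core_v1.list_node().items
    }


def nodes_not_on(core_v1, expected_prefix: str) -> Dict[str, str]:
    """Return nodes whose kubelet version does not start with expected_prefix."""
    return {
        name: version
        for name, version in kubelet_versions(core_v1).items()
        if not version.startswith(expected_prefix)
    }
```

Called as nodes_not_on(load_core_v1(), "v1.23."), this could serve as a post-upgrade check that every node has actually moved to the new version.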

Leverage existing automation

GKE
• Stable and rapid release channels for standard deployments
• Avoid static channels if you can
• If you can, evaluate autopilot mode

EKS
• Managed node groups
• Cluster addons
  ◦ CoreDNS
  ◦ kube-proxy etc.

Documentation

Document everything

Have a version compatibility matrix mapped for components
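
One possible way to keep such a matrix machine-readable, sketched under loud assumptions: the component names and version numbers below are placeholders rather than real compatibility data, and the helper names are made up for this example.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse(v: str) -> Version:
    """Turn 'v1.8.6' or '1.8.6' into (1, 8, 6); plain numeric versions only."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


# Hypothetical matrix: k8s minor -> component -> minimum supported version.
# Names and numbers are placeholders, not real compatibility data.
MATRIX: Dict[str, Dict[str, str]] = {
    "1.23": {"example-ingress": "2.1.0", "example-autoscaler": "1.23.0"},
    "1.24": {"example-ingress": "2.3.0", "example-autoscaler": "1.24.0"},
}


def blockers(target_k8s: str, installed: Dict[str, str]) -> List[str]:
    """Return components that must be upgraded before moving to target_k8s."""
    required = MATRIX[target_k8s]
    out = []
    for component, minimum in sorted(required.items()):
        current = installed.get(component)
        if current is None:
            continue  # component is not deployed in this cluster
        if parse(current) < parse(minimum):
            out.append(f"{component}: {current} -> >= {minimum}")
    return out
```

Keeping the matrix as data next to the automation means the same file documents the compatibility constraints and enforces them.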

Upgrades shouldn’t be an afterthought

References
• https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md
• https://github.com/kubernetes/sig-release/discussions/1290
• https://github.com/deliveryhero/k8s-cluster-upgrade-tool
• https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades
• https://github.com/aws/containers-roadmap/issues/600

@tasdikrahman | www.tasdikrahman.com
