Keeping up with Kubernetes cluster upgrades

Tasdik Rahman
September 24, 2022

Presented as part of the talk lineup at the Bangalore Kubernetes meetup, September 2022

https://www.meetup.com/kubernetes-openshift-india-meetup/events/288277755/

Transcript

  1. Keeping up with k8s cluster upgrades

    Kubernetes Bangalore Meetup, September 2022
    @tasdikrahman | www.tasdikrahman.com
  3. Keeping up with k8s cluster upgrades: trying to keep up

  4. Why should you upgrade, anyway?

  10. Why should you upgrade, anyway?

    • API deprecations
    • New APIs serve you better
    • Security patches for CVEs
    • Version deprecations by the provider
    • Incremental changes introduced

    Bottom line: upgrade to prevent infrastructure rot
  11. Is this specific to k8s?

  16. Is this specific to k8s?

    • Compute layer upgrades are not new
    • Inventory of which OS and language stack versions are running on compute instances
    • Handling end of support for OS versions and language stacks
    • Applying CVE patches on these machines
    • Managed via golden AMIs, blue-green replacements, or in-place upgrades
  17. Release pace for some managed providers so far

  18. Release pace for providers: EKS. Source: https://endoflife.date/amazon-eks

  19. Release pace for providers: GKE. Source: https://endoflife.date/google-kubernetes-engine

  20. Release pace for other providers may follow a similar cadence

  21. Reason for such a release velocity

  22. Upstream historically had roughly 4 releases/year

  23. A new version every 3 months, moving to roughly every 4 months (3 releases/year) under KEP 2572

  24. KEP 2572 sig-release https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md#summary

  26. K8s release cadence - discussion https://github.com/kubernetes/sig-release/discussions/1290

  27. What happens when you don’t upgrade?

  28. Forced upgrade of your control plane

  29. Outcome? Possible outage

  30. Security incidents

  31. Optimising for high velocity k8s upgrades

  32. Manage cluster state via Terraform, CloudFormation, or a similar tool that tracks state for you
  34. Self-hosting your own k8s cluster? Of course not! Avoid it unless you have a very special case
  36. Having a staging setup: keep staging as close to prod as possible

  37. Have staging run version x for a period y before upgrading prod to version x
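
The staging-first rule above can be expressed as a small gate in upgrade automation. The following is a minimal sketch, not the speaker's tooling: the function name `prod_upgrade_allowed` and its parameters are hypothetical, and the soak period is whatever "y" a team settles on.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def prod_upgrade_allowed(
    target_version: str,
    staging_version: str,
    staging_upgraded_at: datetime,
    min_soak: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True once staging has run target_version for at least min_soak."""
    now = now or datetime.now(timezone.utc)
    if staging_version != target_version:
        # Staging is not on the version we want to roll out to prod.
        return False
    return now - staging_upgraded_at >= min_soak
```

A pipeline could call this before starting a prod upgrade and stop early when it returns False.
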
  38. Offload to managed compute nodes

  39. Smaller clusters as a tradeoff

  40. Offloading components as add-ons

  41. Removing human bottlenecks

  42. Automation

  43. https://github.com/aws/containers-roadmap/issues/600

  45. Helps you scale upgrades once cluster numbers increase

    Is necessary to
  46. Reduces toil, offloading repetition

  47. Pre upgrade checks

  48. Pods running fine, API deprecations for the next version, affected workloads
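
A rough sketch of what such pre-upgrade checks might look like, working on plain dictionaries (for example manifests parsed from source control, or pod objects in the JSON shape the API returns). The helper names are hypothetical, and the `REMOVED_IN` table is only an illustrative subset of upstream API removals; a real check should be driven by the official deprecation guide.

```python
from typing import Dict, List, Tuple

# (apiVersion, kind) -> first Kubernetes minor version that no longer serves it.
# Illustrative subset only, not a complete list.
REMOVED_IN: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"): (1, 25),
}


def parse_minor(version: str) -> Tuple[int, int]:
    """Turn '1.25' or 'v1.25.3' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def unhealthy_pods(pods: List[dict]) -> List[str]:
    """Names of pods whose phase is neither Running nor Succeeded."""
    return [
        f"{p['metadata'].get('namespace', 'default')}/{p['metadata']['name']}"
        for p in pods
        if p.get("status", {}).get("phase") not in ("Running", "Succeeded")
    ]


def affected_workloads(objects: List[dict], target_version: str) -> List[str]:
    """Objects whose apiVersion is no longer served at target_version."""
    target = parse_minor(target_version)
    hits = []
    for obj in objects:
        key = (obj.get("apiVersion", ""), obj.get("kind", ""))
        removed = REMOVED_IN.get(key)
        if removed is not None and target >= removed:
            hits.append(f"{key[1]}/{obj['metadata']['name']} uses {key[0]}")
    return hits
```

An empty result from both functions would be the signal to proceed; anything else lists what needs fixing first.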

  49. Post upgrade checks
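
One possible post-upgrade check, sketched under assumptions: confirm every node reports a kubelet on the target minor version and is Ready. The input is a list of node objects in the JSON shape the API returns (`status.nodeInfo.kubeletVersion`, `status.conditions`); the function names are hypothetical.

```python
import re
from typing import List


def minor_of(version: str) -> str:
    """Extract 'major.minor' from strings such as 'v1.24.7-eks-fb459a0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def nodes_needing_attention(nodes: List[dict], target_version: str) -> List[str]:
    """List nodes that are on the wrong minor version or are not Ready."""
    target = minor_of(target_version)
    problems = []
    for node in nodes:
        name = node["metadata"]["name"]
        kubelet = node["status"]["nodeInfo"]["kubeletVersion"]
        conditions = node["status"].get("conditions", [])
        ready = any(c["type"] == "Ready" and c["status"] == "True" for c in conditions)
        if minor_of(kubelet) != target:
            problems.append(f"{name}: kubelet {kubelet}, expected {target}.x")
        elif not ready:
            problems.append(f"{name}: not Ready")
    return problems
```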

  50. Node maintenance operations

  51. Opinionated take: avoid shell scripts; use one of the k8s client libraries if writing something from scratch
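
As an illustration of the client-library approach, here is a minimal sketch of cordoning nodes through the API instead of shelling out to `kubectl cordon`. It assumes the official Python client (the third-party `kubernetes` package), whose `CoreV1Api.patch_node` call is used; `cordon_nodes`, `DryRunApi`, and `build_api` are hypothetical names introduced for this example.

```python
from typing import Any, List, Tuple


def cordon_node(api: Any, node_name: str) -> None:
    """Mark a node unschedulable, the API equivalent of `kubectl cordon`."""
    api.patch_node(node_name, {"spec": {"unschedulable": True}})


def cordon_nodes(api: Any, node_names: List[str]) -> List[str]:
    """Cordon each node in order and return the names that were patched."""
    cordoned = []
    for name in node_names:
        cordon_node(api, name)
        cordoned.append(name)
    return cordoned


class DryRunApi:
    """Stand-in that records patch calls instead of contacting a cluster."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, dict]] = []

    def patch_node(self, name: str, body: dict) -> None:
        self.calls.append((name, body))


def build_api() -> Any:
    """Create a real CoreV1Api from the local kubeconfig.

    The import is deferred so the helpers above work without the
    `kubernetes` package installed.
    """
    from kubernetes import client, config

    config.load_kube_config()
    return client.CoreV1Api()
```

Passing the API object in makes the same code usable for a dry run (`DryRunApi()`) and for a real cluster (`build_api()`), which is harder to arrange with a shell script.
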
  52. Leverage existing automation

  53. GKE
    • Stable and rapid release channels for standard deployments
    • Avoid static channels if you can
    • If you can, evaluate Autopilot mode

    EKS
    • Managed node groups
    • Cluster add-ons
      ◦ CoreDNS
      ◦ kube-proxy, etc.
  54. Documentation

  55. Document everything

  56. Have a version compatibility matrix mapped for components
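
A compatibility matrix can also live next to the automation as data. The sketch below is one possible shape, assuming a mapping from Kubernetes minor version to minimum component versions; the version numbers in `COMPATIBILITY` are placeholders for illustration and not vendor guidance, and `blockers` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical matrix: Kubernetes minor version -> minimum component versions.
# The numbers are placeholders; fill them in from each component's own docs.
COMPATIBILITY: Dict[str, Dict[str, str]] = {
    "1.23": {"coredns": "1.8.7", "kube-proxy": "1.23.0"},
    "1.24": {"coredns": "1.8.7", "kube-proxy": "1.24.0"},
}


def as_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.24.0' or 'v1.24.0' into (1, 24, 0) for comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def blockers(target: str, installed: Dict[str, str]) -> List[str]:
    """Components that must be upgraded before moving to the target version."""
    required = COMPATIBILITY.get(target)
    if required is None:
        return [f"no matrix entry for Kubernetes {target}"]
    issues = []
    for component, minimum in required.items():
        current = installed.get(component)
        if current is None:
            issues.append(f"{component}: not installed, need >= {minimum}")
        elif as_tuple(current) < as_tuple(minimum):
            issues.append(f"{component}: {current} < {minimum}")
    return issues
```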

  57. Upgrades shouldn’t be an afterthought

  58. References

    • https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md
    • https://github.com/kubernetes/sig-release/discussions/1290
    • https://github.com/deliveryhero/k8s-cluster-upgrade-tool
    • https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades
    • https://github.com/aws/containers-roadmap/issues/600
  59. @tasdikrahman | www.tasdikrahman.com