Keeping up with Kubernetes cluster upgrades

Tasdik Rahman
September 24, 2022

Presented as part of the talk lineup at the Kubernetes Bangalore Meetup, September 2022.

https://www.meetup.com/kubernetes-openshift-india-meetup/events/288277755/

Transcript

  1. Keeping up with k8s cluster upgrades
    @tasdikrahman | www.tasdikrahman.com
    Kubernetes Bangalore Meetup September 2022

  2. Keeping up with k8s cluster upgrades

  3. Keeping up with k8s cluster upgrades
    Trying to keep up

  4. Why should you upgrade, anyway?

  5. Why should you upgrade, anyway?
    API deprecations

  6. Why should you upgrade, anyway?
    API deprecations
    New APIs serve you better

  7. Why should you upgrade, anyway?
    API deprecations
    New APIs serve you better
    Security patches for CVEs

  8. Why should you upgrade, anyway?
    API deprecations
    New APIs serve you better
    Security patches for CVEs
    Version deprecations by the provider

  9. Why should you upgrade, anyway?
    API deprecations
    New APIs serve you better
    Security patches for CVEs
    Version deprecations by the provider
    Incremental changes introduced

  10. Why should you upgrade, anyway?
    API deprecations
    New APIs serve you better
    Security patches for CVEs
    Version deprecations by the provider
    Incremental changes introduced
    Bottom line: upgrade to prevent infrastructure rot

  11. Is this specific to k8s?

  12. Is this specific to k8s?
    ● Compute layer upgrades are not new

  13. Is this specific to k8s?
    ● Compute layer upgrades are not new
    ● Inventory of what version of OS/language stack is running on compute instances

  14. Is this specific to k8s?
    ● Compute layer upgrades are not new
    ● Inventory of what version of OS/language stack is running on compute instances
    ● Handling end of support for OS versions and language stacks

  15. Is this specific to k8s?
    ● Compute layer upgrades are not new
    ● Inventory of what version of OS/language stack is running on compute instances
    ● Handling end of support for OS versions and language stacks
    ● Applying CVE patches to these machines

  16. Is this specific to k8s?
    ● Compute layer upgrades are not new
    ● Inventory of what version of OS/language stack is running on compute instances
    ● Handling end of support for OS versions and language stacks
    ● Applying CVE patches to these machines
    ● Managed via golden AMIs, blue-green replacements, or in-place upgrades

  17. Release pace for some managed providers so far

  18. Release pace for providers - EKS
    Source: https://endoflife.date/amazon-eks

  19. Release pace for providers - GKE
    Source: https://endoflife.date/google-kubernetes-engine

  20. Release pace for other providers may follow a similar cadence

  21. Reason for such a release velocity

  22. Upstream historically shipped roughly 4 releases/year

  23. A new version every 3 months

  24. KEP 2572 sig-release: release cadence reduced to 3 releases/year
    https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md#summary

  25. KEP 2572 sig-release
    https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md#summary

  26. K8s release cadence - discussion
    https://github.com/kubernetes/sig-release/discussions/1290

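As a rough illustration of the cadence arithmetic in this part of the talk, the sketch below converts a releases-per-year figure into the gap between minor versions and estimates how far a cluster drifts if it is left alone. The function names are hypothetical, and the model assumes evenly spaced releases, which upstream does not guarantee. It is shown for 4 releases/year (the figure on the slide) and 3 releases/year (the cadence KEP 2572 settled on).

```python
def months_between_releases(releases_per_year: int) -> float:
    """Average gap between minor releases, in months."""
    if releases_per_year <= 0:
        raise ValueError("releases_per_year must be positive")
    return 12 / releases_per_year


def minor_versions_behind(months_without_upgrade: float, releases_per_year: int) -> int:
    """Minor releases shipped upstream while a cluster stayed on one version."""
    gap = months_between_releases(releases_per_year)
    return int(months_without_upgrade // gap)


if __name__ == "__main__":
    for cadence in (4, 3):
        print(
            f"{cadence} releases/year: one every "
            f"{months_between_releases(cadence):g} months, "
            f"{minor_versions_behind(12, cadence)} versions behind after a year"
        )
```

Either way, skipping upgrades for a year leaves a cluster several minor versions behind, which is the pressure the following slides describe.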

  27. What happens when you don’t upgrade?

  28. Forced upgrade of your control plane by the provider

  29. Outcome? Possible outage

  30. Security incidents

  31. Optimising for high-velocity k8s upgrades

  32. Manage cluster state via Terraform, CloudFormation, or a similar infrastructure-as-code tool

  33. Self-hosting your own k8s cluster?

  34. Self-hosting your own k8s cluster?
    Of course not!
    Avoid it unless you have a very special case

  35. Having a staging setup

  36. Having a staging setup
    Keep staging as close to prod as possible

  37. Have staging run version x for y time before upgrading prod to version x

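The "soak in staging first" rule above can be expressed as a small gate. This is a minimal sketch, not the speaker's tooling: the function name, its parameters, and the 14-day soak period in the usage example are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def prod_upgrade_allowed(
    staging_version: str,
    staging_upgraded_at: datetime,
    target_version: str,
    soak: timedelta,
    now: datetime,
) -> bool:
    """Allow prod to move to target_version only after staging has soaked on it."""
    if staging_version != target_version:
        # Staging is not running the version prod wants to move to.
        return False
    return now - staging_upgraded_at >= soak


if __name__ == "__main__":
    upgraded = datetime(2022, 9, 1)
    # Hypothetical policy: staging must run the version for 14 days.
    print(prod_upgrade_allowed("1.23", upgraded, "1.23",
                               timedelta(days=14), datetime(2022, 9, 20)))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the gate easy to test and to replay against past upgrade dates.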

  38. Offload to managed compute nodes

  39. Smaller clusters as a tradeoff

  40. Offloading components as add-ons

  41. Removing human bottlenecks

  42. Automation

  43. https://github.com/aws/containers-roadmap/issues/600

  44. Helps you scale upgrades once cluster numbers increase

  45. Helps you scale upgrades once cluster numbers increase
    Is necessary to

  46. Reduces toil, offloading repetition

  47. Pre-upgrade checks

  48. Pods running fine, API deprecations for the next version, affected workloads

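The pre-upgrade checks listed above could look roughly like the sketch below. It works on plain dictionaries shaped like Kubernetes objects, on the assumption that a real tool would fetch them through one of the k8s clients. The function names are hypothetical, and the `REMOVED_IN` table is only a small illustrative subset of upstream API removals, not a complete list.

```python
# Illustrative subset of (apiVersion, kind) pairs and the Kubernetes minor
# version in which each stopped being served. Consult the upstream deprecated
# API migration guide for the complete, current list.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
}


def affected_workloads(manifests, target):
    """Return 'Kind/name' for objects whose API is no longer served at target."""
    hits = []
    for m in manifests:
        removed = REMOVED_IN.get((m.get("apiVersion"), m.get("kind")))
        # Versions are (major, minor) tuples, so tuple comparison is enough.
        if removed is not None and target >= removed:
            hits.append(f'{m["kind"]}/{m["metadata"]["name"]}')
    return hits


def unhealthy_pods(pods):
    """Return names of pods that are neither Running nor Succeeded."""
    return [
        p["metadata"]["name"]
        for p in pods
        if p["status"]["phase"] not in ("Running", "Succeeded")
    ]


if __name__ == "__main__":
    manifests = [
        {"apiVersion": "batch/v1beta1", "kind": "CronJob",
         "metadata": {"name": "nightly"}},
        {"apiVersion": "apps/v1", "kind": "Deployment",
         "metadata": {"name": "web"}},
    ]
    print(affected_workloads(manifests, (1, 25)))
```

Running the same two checks before and after an upgrade gives a simple baseline to compare against.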

  49. Post-upgrade checks

  50. Node maintenance operations

  51. Opinionated take: avoid shell scripts; use one of the k8s clients if writing something from scratch

  52. Leverage existing automation

  53. GKE
    ● Stable and rapid release channels for standard deployments
    ● Avoid static channels if you can
    ● If you can, evaluate Autopilot mode
    EKS
    ● Managed node groups
    ● Cluster add-ons
    ○ CoreDNS
    ○ kube-proxy, etc.

  54. Documentation

  55. Document everything

  56. Maintain a version compatibility matrix for cluster components

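One possible shape for such a compatibility matrix is a mapping from Kubernetes minor version to the minimum component versions validated against it. In this sketch every component name and version number is made up for illustration, and `components_to_upgrade` is a hypothetical helper rather than part of any existing tool.

```python
# Hypothetical matrix: Kubernetes minor version -> minimum component versions
# validated against it. All names and numbers below are invented examples.
COMPAT = {
    "1.22": {"ingress-controller": (1, 0), "metrics-agent": (2, 3)},
    "1.23": {"ingress-controller": (1, 2), "metrics-agent": (2, 3)},
}


def components_to_upgrade(installed, target_k8s):
    """Return components whose installed version is below the validated minimum."""
    required = COMPAT[target_k8s]
    return sorted(
        name
        for name, minimum in required.items()
        # A component missing from `installed` is treated as version (0, 0).
        if installed.get(name, (0, 0)) < minimum
    )


if __name__ == "__main__":
    installed = {"ingress-controller": (1, 0), "metrics-agent": (2, 4)}
    print(components_to_upgrade(installed, "1.23"))
```

Keeping the matrix as data next to the cluster definitions means it can be checked automatically before each upgrade instead of being looked up by hand.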

  57. Upgrades shouldn’t be an afterthought

  58. References
    ● https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/2572-release-cadence/README.md
    ● https://github.com/kubernetes/sig-release/discussions/1290
    ● https://github.com/deliveryhero/k8s-cluster-upgrade-tool
    ● https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades
    ● https://github.com/aws/containers-roadmap/issues/600

  59. @tasdikrahman | www.tasdikrahman.com
