Kubernetes Cluster Upgrade / Mercari Meetup for Microservices Platform

Daisuke Fujita

July 19, 2018

Transcript

  1. 2018/07/19 Mercari Meetup for Microservices Platform @dtan4 Kubernetes Cluster Upgrade

  2. 2 About me @dtan4 (Daisuke Fujita) SRE @ Mercari, Inc.

    Microservices Platform Team
  3. 3 Topics • Why do we have to upgrade Kubernetes clusters?

    • How to upgrade Kubernetes clusters safely? • Case: Upgrading GKE clusters in Mercari
  4. Why do we have to upgrade Kubernetes clusters?

  5. 5 Kubernetes Release Versioning • Minor releases happen approximately every 3 months

    v1.11.0 2018-06-28 05:06:28 +0900 JST
    v1.10.0 2018-03-27 01:41:58 +0900 JST
    v1.9.0  2017-12-16 05:53:13 +0900 JST
    v1.8.0  2017-09-29 07:13:57 +0900 JST
    v1.7.0  2017-06-30 07:53:16 +0900 JST
    https://github.com/kubernetes/community/blob/6e31a546a50f6f57f3e9e1422ade0599f5e3e77d/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew
  6. 6 Kubernetes Release Versioning • 3 minor versions are supported

    v1.11.0 2018-06-28 05:06:28 +0900 JST
    v1.10.0 2018-03-27 01:41:58 +0900 JST
    v1.9.0  2017-12-16 05:53:13 +0900 JST
    v1.8.0  2017-09-29 07:13:57 +0900 JST
    v1.7.0  2017-06-30 07:53:16 +0900 JST
    https://github.com/kubernetes/community/blob/6e31a546a50f6f57f3e9e1422ade0599f5e3e77d/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew
  7. 7 Security fixes

  8. 8 Kubernetes new features • Pod priority and preemption •

    CRD versioning • HPA with custom metrics • ...
  9. 9 3rd party integrations

  10. How to upgrade Kubernetes clusters safely?

  11. 11 kubectl cordon <NODE> • Marks the node as unschedulable for Pods ◦

    e.g., kubectl cordon node-02 [Diagram: node-02 becomes SchedulingDisabled; a new Pod can no longer be scheduled onto it, while existing Pods keep running]
  12. 12 kubectl drain <NODE> • kubectl cordon + evict Pods

    on the node gracefully ◦ e.g., kubectl drain node-02 [Diagram: node-02 becomes SchedulingDisabled and its Pods are evicted and rescheduled onto node-01 and node-03]
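Taken together, the two commands look like this in practice (the node name is illustrative; the drain flags are the ones usually needed on real clusters):

```shell
# Stop new Pods from being scheduled onto the node
kubectl cordon node-02

# Evict the node's Pods gracefully; --ignore-daemonsets skips
# DaemonSet-managed Pods (they cannot be evicted), and
# --delete-local-data discards emptyDir volumes on the node
kubectl drain node-02 --ignore-daemonsets --delete-local-data

# After maintenance, make the node schedulable again
kubectl uncordon node-02
```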
  13. 13 Availability? • All ‘C’ Pods are running on the

    same node [Diagram: node-01 runs A, A, B; node-03 runs B, A; all three ‘C’ Pods run on node-02]
  14. 14 Availability? • kubectl drain node-02 ◦ All ‘C’ Pods

    will be terminated simultaneously [Diagram: node-02 is SchedulingDisabled; all three ‘C’ Pods are evicted at once, so no ‘C’ Pod is available until replacements come up on other nodes]
  15. 15 PodDisruptionBudget • PodDisruptionBudget (PDB) limits the number of Pods

    of a replicated application that are down simultaneously from voluntary disruptions (e.g., drain, autoscaling) • Works well with a Deployment or ReplicaSet • Specify minAvailable or maxUnavailable as a number of Pods or a percentage • The application must be running on multiple Pods https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
  16. 16 PodDisruptionBudget example

    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget
    metadata:
      name: app-c
      namespace: app-c-dev
    spec:
      minAvailable: 50%
      selector:
        matchLabels:
          app: c
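Assuming the manifest above is saved as pdb.yaml, applying and checking it looks like this (a sketch; it needs a running cluster):

```shell
# Create the PDB, then check how many Pods may be evicted right now
# (the ALLOWED DISRUPTIONS column)
kubectl apply -f pdb.yaml
kubectl get pdb app-c --namespace app-c-dev
```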
  17. 17 PDB + kubectl drain • PDB for C: minAvailable:

    50% (ALLOWED DISRUPTIONS: 1) [Diagram: before the drain, all three ‘C’ Pods are running on node-02]
  18. 18 PDB + kubectl drain • PDB for C: minAvailable:

    50% (ALLOWED DISRUPTIONS: 1) • kubectl drain node-02 keeps at least two ‘C’ Pods available [Diagram: the drain starts; one ‘C’ Pod is evicted from node-02 while two ‘C’ Pods remain running]
  19. 19 PDB + kubectl drain • PDB for C: minAvailable:

    50% (ALLOWED DISRUPTIONS: 1) • kubectl drain node-02 keeps at least two ‘C’ Pods available [Diagram: the evicted ‘C’ Pod is rescheduled onto another node before the next eviction proceeds]
  20. 20 PDB + kubectl drain • PDB for C: minAvailable:

    50% (ALLOWED DISRUPTIONS: 1) • kubectl drain node-02 keeps at least two ‘C’ Pods available [Diagram: the drain completes with node-02 empty; at least two ‘C’ Pods stayed available throughout]
  21. 21 GKE cluster upgrade 1. Upgrade GKE master 2. Upgrade

    GKE nodes ◦ Rolling update existing nodes ◦ Node pools migration
  22. 22 Upgrade GKE master • Zonal (single-master) cluster causes

    control plane downtime ◦ ~2 minutes ◦ kubectl stops working ◦ CronJobs will not be kicked • Regional (HA masters) clusters can be upgraded with zero downtime ◦ Existing zonal -> regional migration is not supported...
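On GKE, the master upgrade itself is a single gcloud call; the cluster name and zone below are placeholders:

```shell
# Upgrade only the master to the target version. On a zonal cluster,
# expect the API server to be unreachable for a few minutes.
gcloud container clusters upgrade my-cluster \
  --zone asia-northeast1-a \
  --master \
  --cluster-version 1.11
```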
  23. 23 GKE node upgrade: rolling update • Upgrade existing nodes

    one by one • This is the behavior of “Automatic node upgrades” • Make sure that cluster capacity is enough even when 1 node is down [Diagram: node-01 is upgraded from v1.10 to v1.11 while node-02 and node-03 stay on v1.10]
  24. 24 GKE node upgrade: node pools migration • Create new

    node pools with the new Kubernetes version • Migrate all Pods from old nodes to new nodes ◦ kubectl cordon ◦ kubectl drain • Delete old node pools • Make sure that enough GCP resource capacity is prepared https://cloudplatform.googleblog.com/2018/06/Kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime.html
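The migration steps above can be sketched as a script. Cluster, zone, and pool names are hypothetical; GKE labels each node with cloud.google.com/gke-nodepool, which makes it easy to select the old nodes:

```shell
# 1. Create a node pool running the new version (node pools are created
#    at the master's version, which is why the master is upgraded first)
gcloud container node-pools create nodepool-1-11 \
  --cluster my-cluster --zone asia-northeast1-a --num-nodes 3

# 2. Cordon all old nodes so no new Pods are scheduled onto them
for node in $(kubectl get nodes -o name \
    -l cloud.google.com/gke-nodepool=nodepool-1-10); do
  kubectl cordon "$node"
done

# 3. Drain the old nodes one by one; PDBs keep each eviction gradual
for node in $(kubectl get nodes -o name \
    -l cloud.google.com/gke-nodepool=nodepool-1-10); do
  kubectl drain "$node" --ignore-daemonsets --delete-local-data
done

# 4. Delete the old node pool
gcloud container node-pools delete nodepool-1-10 \
  --cluster my-cluster --zone asia-northeast1-a
```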
  25. 25 GKE node upgrade: node pools migration

    [Diagram: node pool nodepool-1-10 (v1.10) with nodes node-1-10-a, node-1-10-b, node-1-10-c]
  26. 26 GKE node upgrade: node pools migration

    [Diagram: a new node pool nodepool-1-11 (v1.11) with nodes node-1-11-a, node-1-11-b, node-1-11-c is created alongside nodepool-1-10]
  27. 27 GKE node upgrade: node pools migration

    [Diagram: all v1.10 nodes are cordoned (SchedulingDisabled)]
  28. 28 GKE node upgrade: node pools migration

    [Diagram: Pods are drained from the cordoned v1.10 nodes]
  29. 29 GKE node upgrade: node pools migration

    [Diagram: the drained Pods are rescheduled onto the v1.11 nodes]
  30. 30 GKE node upgrade: node pools migration

    [Diagram: only nodepool-1-11 (v1.11) with nodes node-1-11-a, node-1-11-b, node-1-11-c remains after the old pool is deleted]
  31. Case: GKE cluster upgrade in Mercari

  32. 32 Case: GKE cluster upgrade in Mercari 1. Codenize GKE

    resources 2. Create PDBs for all microservices 3. Upgrade! a. Upgrade GKE master (Terraform) b. Create GKE node pools (Terraform) c. Migrate all Pods (script) d. Delete old GKE node pools (Terraform)
  33. 33 Codenize GKE resources • Existing GKE clusters were created

    by gcloud command • We want to manage GKE (and all other cloud) resources on GitHub! ◦ git log as operation history ◦ review changes by other team members ◦ modify resources from CI without manual operation
  34. 34 Codenize GKE resources • Codenize GKE resources by dtan4/terraforming-gke

  35. 35 Create PDBs for all microservices • Most of our

    microservices didn’t have PDBs • SREs define a default minAvailable • Confirm each microservice’s availability with its developers
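Stamping out a default PDB per microservice can be scripted; this is a hypothetical sketch using a heredoc, with the 50% minAvailable default from the earlier example:

```shell
# Hypothetical helper: apply a default 50% minAvailable PDB
# for one app in one namespace
create_default_pdb() {
  local app="$1" namespace="$2"
  kubectl apply --namespace "${namespace}" -f - <<EOF
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: ${app}
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: ${app}
EOF
}

create_default_pdb app-c app-c-dev
```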
  36. 36 Upgrade GKE master • Modify Terraform code • Review

    by team members • Apply on CircleCI
  37. 37 Upgrade GKE master

  38. 38 Create & Delete GKE node pools • Copy-and-paste &

    modify Terraform code • Review by team members • Apply on CircleCI
  39. 39 Future work • Enable “Automatic node upgrades” ◦ ensure

    all microservices are resilient to disruptions ◦ reduce upgrade steps • Have microservice developers create PDBs themselves