Kubernetes Cluster Upgrade / Mercari Meetup for Microservices Platform

Daisuke Fujita

July 19, 2018

Transcript

  1. 3.

    Topics
    • Why do we have to upgrade Kubernetes clusters?
    • How can we upgrade Kubernetes clusters safely?
    • Case: Upgrading GKE clusters at Mercari
  2. 5.

    Kubernetes Release Versioning
    • Minor releases happen approximately every 3 months

    v1.11.0  2018-06-28 05:06:28 +0900 JST
    v1.10.0  2018-03-27 01:41:58 +0900 JST
    v1.9.0   2017-12-16 05:53:13 +0900 JST
    v1.8.0   2017-09-29 07:13:57 +0900 JST
    v1.7.0   2017-06-30 07:53:16 +0900 JST

    https://github.com/kubernetes/community/blob/6e31a546a50f6f57f3e9e1422ade0599f5e3e77d/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew
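
The "approximately every 3 months" cadence can be checked against the release dates listed on this slide. The following is a minimal sketch that uses only the dates shown above (time of day dropped); the helper name `release_gaps` is made up for illustration.

```python
from datetime import date

# Release dates copied from the slide (time of day and time zone dropped).
RELEASES = {
    "v1.7.0": date(2017, 6, 30),
    "v1.8.0": date(2017, 9, 29),
    "v1.9.0": date(2017, 12, 16),
    "v1.10.0": date(2018, 3, 27),
    "v1.11.0": date(2018, 6, 28),
}


def release_gaps(releases):
    """Return the number of days between consecutive releases, oldest first."""
    ordered = sorted(releases.values())
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


gaps = release_gaps(RELEASES)
average = sum(gaps) / len(gaps)
print(gaps)     # days between v1.7 -> v1.8 -> v1.9 -> v1.10 -> v1.11
print(average)  # mean gap in days
```

For these five releases the gaps are 91, 78, 101, and 93 days, which averages to about 91 days.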
  3. 6.

    Kubernetes Release Versioning
    • 3 minor versions are supported

    v1.11.0  2018-06-28 05:06:28 +0900 JST
    v1.10.0  2018-03-27 01:41:58 +0900 JST
    v1.9.0   2017-12-16 05:53:13 +0900 JST
    v1.8.0   2017-09-29 07:13:57 +0900 JST
    v1.7.0   2017-06-30 07:53:16 +0900 JST

    https://github.com/kubernetes/community/blob/6e31a546a50f6f57f3e9e1422ade0599f5e3e77d/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew
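
To make the three-version support window concrete, here is a small sketch that decides whether a version is still inside it. It assumes the window is simply "the latest minor release and the two before it"; the function names are hypothetical and do not come from any Kubernetes tooling.

```python
def supported_minors(latest_minor, window=3):
    """List the supported 1.x minor versions, newest first."""
    return [f"1.{minor}" for minor in range(latest_minor, latest_minor - window, -1)]


def is_supported(version, latest_minor, window=3):
    """Check whether a version such as 'v1.9.0' falls inside the support window."""
    major, minor = (int(part) for part in version.lstrip("v").split(".")[:2])
    return major == 1 and latest_minor - window < minor <= latest_minor


print(supported_minors(11))
print(is_supported("v1.9.0", 11))
print(is_supported("v1.8.0", 11))
```

With v1.11 as the latest release, v1.9 is the oldest supported minor version, so a cluster still on v1.8 has already fallen out of the window.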
  4. 8.

    Kubernetes new features
    • Pod priority and preemption
    • CRD versioning
    • HPA with custom metrics
    • ...
  5. 11.

    kubectl cordon <NODE>
    • Mark a node as unschedulable for Pods
      ◦ e.g., kubectl cordon node-02
    [Diagram: node-01, node-02, and node-03 run Pods; node-02 is SchedulingDisabled, so a new Pod cannot be scheduled onto it]
  6. 12.

    kubectl drain <NODE>
    • kubectl cordon + gracefully evict the Pods on the node
      ◦ e.g., kubectl drain node-02
    [Diagram: node-02 is SchedulingDisabled; its Pods are evicted to the other nodes, and a new Pod cannot be scheduled onto it]
  7. 13.

    Availability?
    • All ‘C’ Pods are running on the same node
    [Diagram: node-01 runs A, A, B; node-02 runs C, C, C; node-03 runs B, A]
  8. 14.

    Availability?
    • kubectl drain node-02
      ◦ All ‘C’ Pods will be terminated simultaneously
    [Diagram: node-02 is SchedulingDisabled; all of its ‘C’ Pods are evicted at once]
  9. 15.

    PodDisruptionBudget
    • A PodDisruptionBudget (PDB) limits the number of Pods of a replicated application that are down simultaneously from voluntary disruptions (e.g., drain, autoscaling)
    • Works well with a Deployment or ReplicaSet
    • Specify minAvailable or maxUnavailable as a number of Pods or a percentage
    • The application must be running on multiple Pods
    https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
  10. 16.

    PodDisruptionBudget example

    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget
    metadata:
      name: app-c
      namespace: app-c-dev
    spec:
      minAvailable: 50%
      selector:
        matchLabels:
          app: c
  11. 17.

    PDB + kubectl drain
    • PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
    [Diagram: node-01 runs A, A, B; node-02 runs C, C, C; node-03 runs B, A]
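
The ALLOWED DISRUPTIONS: 1 figure follows from the three ‘C’ Pods and minAvailable: 50%. The sketch below reproduces that arithmetic. It assumes a percentage is rounded up to a whole number of Pods, as the Kubernetes documentation describes; the function names are illustrative, not part of any Kubernetes client library.

```python
import math


def desired_healthy(min_available, expected_pods):
    """Resolve minAvailable (an int or a string like '50%') to a Pod count.

    Percentages are rounded up to the next whole Pod.
    """
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        return math.ceil(expected_pods * percent / 100)
    return int(min_available)


def allowed_disruptions(min_available, expected_pods, healthy_pods):
    """Number of Pods that may be evicted right now without violating the PDB."""
    return max(0, healthy_pods - desired_healthy(min_available, expected_pods))


print(desired_healthy("50%", 3))         # Pods that must stay available
print(allowed_disruptions("50%", 3, 3))  # all three 'C' Pods healthy
print(allowed_disruptions("50%", 3, 2))  # one 'C' Pod already evicted
```

With three replicas, 50% rounds up to two Pods that must stay available, which leaves room for exactly one eviction at a time.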
  12. 18.

    PDB + kubectl drain
    • PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
    • kubectl drain node-02 keeps at least two ‘C’ Pods available
    [Diagram: node-02 is SchedulingDisabled; its ‘C’ Pods are being moved to other nodes one at a time]
  13. 19.

    PDB + kubectl drain
    • PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
    • kubectl drain node-02 keeps at least two ‘C’ Pods available
    [Diagram: node-02 is SchedulingDisabled; eviction of its ‘C’ Pods continues one at a time]
  14. 20.

    PDB + kubectl drain
    • PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
    • kubectl drain node-02 keeps at least two ‘C’ Pods available
    [Diagram: node-02 is SchedulingDisabled and no longer runs any ‘C’ Pods]
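
The one-at-a-time eviction shown across these slides can be modeled with a toy simulation. This is a simplified sketch, not how kubectl drain is implemented: it assumes each evicted Pod is replaced on another node, and that the replacement becomes ready before the next eviction is attempted. The function name `simulate_drain` is made up for illustration.

```python
import math


def simulate_drain(pods_on_node, total_pods, min_available_percent):
    """Return the count of healthy Pods after each step of a drain.

    An eviction is allowed only if it leaves at least the PDB's minimum
    number of Pods healthy; otherwise the drain waits for a replacement.
    """
    desired = math.ceil(total_pods * min_available_percent / 100)
    healthy = total_pods
    remaining = pods_on_node
    timeline = [healthy]
    while remaining > 0:
        if healthy - 1 >= desired:
            healthy -= 1      # eviction allowed by the PDB
            remaining -= 1
        elif healthy < total_pods:
            healthy += 1      # eviction refused; a replacement becomes ready
        else:
            raise RuntimeError("PDB never allows an eviction; the drain would block")
        timeline.append(healthy)
    return timeline


timeline = simulate_drain(pods_on_node=3, total_pods=3, min_available_percent=50)
print(timeline)
print(min(timeline))
```

In this model the number of healthy ‘C’ Pods alternates between three and two and never drops below two, matching the "keeps at least two ‘C’ Pods available" bullet.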
  15. 21.

    GKE cluster upgrade
    1. Upgrade GKE master
    2. Upgrade GKE nodes
      ◦ Rolling update of existing nodes
      ◦ Node pools migration
  16. 22.

    Upgrade GKE master
    • Upgrading a zonal (single-master) cluster causes control plane downtime
      ◦ ~2 minutes
      ◦ kubectl stops working
      ◦ CronJobs will not be triggered
    • A regional (HA masters) cluster can be upgraded with zero downtime
      ◦ Migrating an existing zonal cluster to a regional one is not supported...
  17. 23.

    GKE node upgrade: rolling update
    • Upgrade existing nodes one by one
    • This is the behavior of “Automatic node upgrades”
    • Make sure that cluster capacity is sufficient even when 1 node is down
    [Diagram: node-01, node-02, and node-03 all start at v1.10; node-01 is upgraded to v1.11 first]
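
The capacity check in the last bullet can be sketched as a worst-case calculation: remove the largest node and see whether the total requested resources still fit. This is a rough illustration only. It ignores per-Pod bin-packing, so a passing check does not guarantee that every Pod is schedulable. The function name and the CPU figures are hypothetical.

```python
def survives_one_node_down(node_capacities, total_requested):
    """True if the total requests still fit after losing the largest node."""
    remaining = sum(node_capacities) - max(node_capacities)
    return remaining >= total_requested


# Three nodes with 4 CPUs each (illustrative numbers).
nodes = [4, 4, 4]
print(survives_one_node_down(nodes, total_requested=7))  # fits on the other two nodes
print(survives_one_node_down(nodes, total_requested=9))  # does not fit
```

If the check fails, adding a node before starting the rolling update is one way to create the missing headroom.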
  18. 24.

    GKE node upgrade: node pools migration
    • Create new node pools with the new Kubernetes version
    • Migrate all Pods on old nodes to new nodes
      ◦ kubectl cordon
      ◦ kubectl drain
    • Delete old node pools
    • Make sure that enough GCP resource capacity is prepared
    https://cloudplatform.googleblog.com/2018/06/Kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime.html
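
The cordon and drain steps above can be sketched as a command plan. The snippet only builds the kubectl command lines and prints them; it does not run anything. It assumes every old node is cordoned before the first drain starts, so that evicted Pods cannot land on another old node, and it assumes `--ignore-daemonsets` is wanted because DaemonSet-managed Pods otherwise stop a drain. The node names are illustrative and the function name `migration_plan` is hypothetical.

```python
def migration_plan(old_nodes):
    """Build kubectl commands: cordon every old node, then drain them one by one."""
    cordons = [["kubectl", "cordon", node] for node in old_nodes]
    drains = [["kubectl", "drain", node, "--ignore-daemonsets"] for node in old_nodes]
    return cordons + drains


# Illustrative names for the nodes of the old node pool.
plan = migration_plan(["node-1-10-a", "node-1-10-b", "node-1-10-c"])
for command in plan:
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review before execution, for example by passing each list to `subprocess.run` only after a dry-run printout has been checked.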
  19. 26.

    GKE node upgrade: node pools migration
    [Diagram: nodepool-1-10 (v1.10) with nodes node-1-10-a, node-1-10-b, node-1-10-c; nodepool-1-11 (v1.11) with nodes node-1-11-a, node-1-11-b, node-1-11-c]
  20. 27.

    GKE node upgrade: node pools migration
    [Diagram: nodepool-1-10 (v1.10) with nodes node-1-10-a, node-1-10-b, node-1-10-c, all SchedulingDisabled; nodepool-1-11 (v1.11) with nodes node-1-11-a, node-1-11-b, node-1-11-c]
  21. 28.

    GKE node upgrade: node pools migration
    [Diagram: nodepool-1-10 (v1.10) with nodes node-1-10-a, node-1-10-b, node-1-10-c, all SchedulingDisabled; nodepool-1-11 (v1.11) with nodes node-1-11-a, node-1-11-b, node-1-11-c]
  22. 29.

    GKE node upgrade: node pools migration
    [Diagram: nodepool-1-10 (v1.10) with nodes node-1-10-a, node-1-10-b, node-1-10-c, all SchedulingDisabled; nodepool-1-11 (v1.11) with nodes node-1-11-a, node-1-11-b, node-1-11-c]
  23. 32.

    Case: GKE cluster upgrade in Mercari
    1. Codify GKE resources
    2. Create PDBs for all microservices
    3. Upgrade!
      a. Upgrade GKE master (Terraform)
      b. Create GKE node pools (Terraform)
      c. Migrate all Pods (script)
      d. Delete old GKE node pools (Terraform)
  24. 33.

    Codify GKE resources
    • Existing GKE clusters were created with the gcloud command
    • We want to manage GKE (and all other cloud) resources on GitHub!
      ◦ git log as operation history
      ◦ Changes reviewed by other team members
      ◦ Resources modified from CI without manual operations
  25. 35.

    Create PDBs for all microservices
    • Most of our microservices didn’t have PDBs
    • SREs define a default minAvailable
    • Confirm each microservice’s availability with its developers
  26. 36.

    Upgrade GKE master
    • Modify Terraform code
    • Review by team members
    • Apply on CircleCI
  27. 38.

    Create & Delete GKE node pools
    • Copy-and-paste & modify Terraform code
    • Review by team members
    • Apply on CircleCI
  28. 39.

    Future work
    • Enable “Automatic node upgrades”
      ◦ Ensure all microservices are resilient to disruptions
      ◦ Reduce upgrade steps
    • Have microservice developers create PDBs themselves
  29. 40.