Slide 1

Kubernetes Cluster Upgrade
@dtan4
2018/07/19 Mercari Meetup for Microservices Platform

Slide 2

About me
@dtan4 (Daisuke Fujita)
SRE @ Mercari, Inc.
Microservices Platform Team

Slide 3

Topics
● Why do we have to upgrade a Kubernetes cluster?
● How do we upgrade a Kubernetes cluster safely?
● Case: GKE cluster upgrade at Mercari

Slide 4

Why do we have to upgrade a Kubernetes cluster?

Slide 5

Kubernetes Release Versioning
● Minor releases happen approximately every 3 months
  v1.11.0  2018-06-28 05:06:28 +0900 JST
  v1.10.0  2018-03-27 01:41:58 +0900 JST
  v1.9.0   2017-12-16 05:53:13 +0900 JST
  v1.8.0   2017-09-29 07:13:57 +0900 JST
  v1.7.0   2017-06-30 07:53:16 +0900 JST
https://github.com/kubernetes/community/blob/6e31a546a50f6f57f3e9e1422ade0599f5e3e77d/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew

Slide 6

Kubernetes Release Versioning
● The most recent 3 minor versions are supported
  supported (as of this talk): v1.11.0, v1.10.0, v1.9.0
  no longer supported: v1.8.0, v1.7.0
https://github.com/kubernetes/community/blob/6e31a546a50f6f57f3e9e1422ade0599f5e3e77d/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew

Slide 7

Security fixes
● Only supported versions receive security patches

Slide 8

New Kubernetes features
● Pod priority and preemption
● CRD versioning
● HPA with custom metrics
● ...

Slide 9

3rd party integrations

Slide 10

How do we upgrade a Kubernetes cluster safely?

Slide 11

kubectl cordon
● Marks a node as Pod-unschedulable
  ○ e.g., kubectl cordon node-02
(Diagram: node-02 becomes SchedulingDisabled; new Pods can no longer be scheduled there, but its existing Pods keep running)
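
A minimal sketch (the node name comes from the diagram above) of cordoning a node and undoing it later:

# mark node-02 unschedulable; running Pods are not touched
$ kubectl cordon node-02
# the node now reports "Ready,SchedulingDisabled"
$ kubectl get nodes
# re-enable scheduling once maintenance is done
$ kubectl uncordon node-02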

Slide 12

kubectl drain
● kubectl cordon + evicts the Pods on the node gracefully
  ○ e.g., kubectl drain node-02
(Diagram: node-02 becomes SchedulingDisabled and its Pods are evicted and rescheduled onto node-01 and node-03)
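
A hedged sketch of a typical drain invocation; the flags are standard kubectl drain options, and the values are assumptions:

# --ignore-daemonsets: DaemonSet-managed Pods cannot be evicted, so skip them
# --delete-local-data: also evict Pods using emptyDir volumes (their data is lost)
# --grace-period=60:   give each Pod up to 60 seconds to shut down
$ kubectl drain node-02 --ignore-daemonsets --delete-local-data --grace-period=60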

Slide 13

Availability?
● All 'C' Pods are running on the same node
(Diagram: node-01 runs A, A, B; node-02 runs all three C Pods; node-03 runs B, A)

Slide 14

Availability?
● kubectl drain node-02
  ○ All 'C' Pods will be terminated simultaneously
(Diagram: node-02 is SchedulingDisabled and all three C Pods are evicted at once, leaving service C with no running Pods until they are rescheduled)

Slide 15

PodDisruptionBudget
● A PodDisruptionBudget (PDB) limits the number of Pods of a replicated application that are down simultaneously due to voluntary disruptions (e.g., drain, autoscaling)
● Works well with a Deployment or ReplicaSet
● Specify minAvailable or maxUnavailable as a number of Pods or a percentage
● The application must be running on multiple Pods
https://kubernetes.io/docs/concepts/workloads/pods/disruptions/

Slide 16

PodDisruptionBudget example

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: app-c
  namespace: app-c-dev
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: c
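
A hedged usage sketch, assuming the manifest above is saved as pdb.yaml; the column layout is what kubectl get pdb prints, with illustrative values for the three 'C' Pods from the diagrams:

$ kubectl apply -f pdb.yaml
$ kubectl get pdb -n app-c-dev
# NAME    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# app-c   50%             N/A               1                     1m

With 3 Pods and minAvailable: 50%, ceil(3 x 0.5) = 2 Pods must stay available, so only 1 voluntary disruption is allowed at a time.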

Slide 17

PDB + kubectl drain
● PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
(Diagram: starting state, with all three C Pods on node-02)

Slide 18

PDB + kubectl drain
● PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
● kubectl drain node-02 keeps at least two 'C' Pods available
(Diagram: one C Pod is evicted from node-02 while two C Pods keep running)

Slide 19

PDB + kubectl drain
● PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
● kubectl drain node-02 keeps at least two 'C' Pods available
(Diagram: the evicted C Pod is rescheduled onto another node, so the next C Pod on node-02 can be evicted)

Slide 20

PDB + kubectl drain
● PDB for C: minAvailable: 50% (ALLOWED DISRUPTIONS: 1)
● kubectl drain node-02 keeps at least two 'C' Pods available
(Diagram: eviction proceeds one Pod at a time until node-02 is empty, always keeping two C Pods running)
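
When evicting a Pod would violate the budget, the eviction API rejects the request and kubectl drain retries. A rough illustration, with the node name taken from the diagrams and the output summarized as comments:

$ kubectl drain node-02 --ignore-daemonsets
# evictions that would violate the PDB fail with:
#   "Cannot evict pod as it would violate the pod's disruption budget."
# kubectl drain keeps retrying until replacement Pods are Ready and the budget allows the next eviction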

Slide 21

GKE cluster upgrade
1. Upgrade the GKE master
2. Upgrade the GKE nodes
   ○ Rolling update of the existing nodes
   ○ Node pool migration

Slide 22

Upgrade GKE master
● A zonal (single-master) cluster has control plane downtime during the upgrade
  ○ ~2 minutes
  ○ kubectl stops working
  ○ CronJobs will not be kicked
● A regional (HA masters) cluster can be upgraded with zero downtime
  ○ Migrating an existing zonal cluster to regional is not supported...
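
A hedged sketch of the corresponding command; the cluster name and target version are assumptions:

# upgrade only the control plane to the requested version
$ gcloud container clusters upgrade my-cluster --master --cluster-version 1.11.0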

Slide 23

GKE node upgrade: rolling update
● Upgrades the existing nodes one by one
● This is the behavior of "Automatic node upgrades"
● Make sure the cluster has enough capacity even when 1 node is down
(Diagram: node-01 is recreated with v1.11 while node-02 and node-03 are still on v1.10)
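
The same rolling update can also be triggered manually; a hedged sketch with assumed cluster and pool names (without --cluster-version, the nodes are upgraded to the master's version):

$ gcloud container clusters upgrade my-cluster --node-pool default-pool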

Slide 24

GKE node upgrade: node pool migration
● Create a new node pool with the new Kubernetes version
● Migrate all Pods from the old nodes to the new nodes
  ○ kubectl cordon
  ○ kubectl drain
● Delete the old node pool
● Make sure enough GCP resource capacity is available beforehand
https://cloudplatform.googleblog.com/2018/06/Kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime.html
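
A condensed, hedged sketch of the whole migration, assuming cluster name my-cluster and the pool names from the next slides; cloud.google.com/gke-nodepool is the standard GKE node label:

# 1. create the new pool (new pools come up at the master's current version, so upgrade the master first)
$ gcloud container node-pools create nodepool-1-11 --cluster my-cluster --num-nodes 3

# 2. cordon every old node first, so evicted Pods can only land on the new pool
$ for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=nodepool-1-10 -o name); do
    kubectl cordon "$node"
  done

# 3. drain the old nodes one by one; PDBs throttle the evictions
$ for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=nodepool-1-10 -o name); do
    kubectl drain "$node" --ignore-daemonsets --delete-local-data
  done

# 4. delete the old pool once it is empty
$ gcloud container node-pools delete nodepool-1-10 --cluster my-cluster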

Slide 25

GKE node upgrade: node pool migration
(Diagram: node pool nodepool-1-10 (v1.10) with nodes node-1-10-a, node-1-10-b, node-1-10-c)

Slide 26

GKE node upgrade: node pool migration
(Diagram: a new node pool nodepool-1-11 (v1.11) with three nodes is created alongside nodepool-1-10)

Slide 27

GKE node upgrade: node pool migration
(Diagram: all nodes in nodepool-1-10 are cordoned and marked SchedulingDisabled)

Slide 28

GKE node upgrade: node pool migration
(Diagram: Pods are drained from the cordoned v1.10 nodes and rescheduled onto the v1.11 nodes)

Slide 29

GKE node upgrade: node pool migration
(Diagram: draining continues until the v1.10 nodes are empty)

Slide 30

GKE node upgrade: node pool migration
(Diagram: only nodepool-1-11 (v1.11) remains after the old pool is deleted)

Slide 31

Case: GKE cluster upgrade at Mercari

Slide 32

Case: GKE cluster upgrade at Mercari
1. Codenize GKE resources
2. Create PDBs for all microservices
3. Upgrade!
   a. Upgrade the GKE master (Terraform)
   b. Create new GKE node pools (Terraform)
   c. Migrate all Pods (script)
   d. Delete the old GKE node pools (Terraform)
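
From the operator's point of view, steps 3a-3d chain together roughly as follows. This is a hedged sketch: the script name is hypothetical, and in practice each terraform apply runs from CI after review:

# a. bump the master version in the Terraform code, then:
$ terraform plan && terraform apply
# b. add the new node pool resource to the Terraform code, then:
$ terraform plan && terraform apply
# c. cordon + drain all old nodes (see the migration sketch earlier)
$ ./migrate-node-pool.sh nodepool-1-10   # hypothetical script name
# d. remove the old node pool from the Terraform code, then:
$ terraform plan && terraform apply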

Slide 33

Codenize GKE resources
● The existing GKE clusters were created with the gcloud command
● We want to manage GKE (and all other cloud) resources on GitHub!
  ○ git log as the operation history
  ○ changes reviewed by other team members
  ○ resources modified from CI, without manual operation

Slide 34

Codenize GKE resources
● Codenized the GKE resources with dtan4/terraforming-gke

Slide 35

Create PDBs for all microservices
● Most of our microservices didn't have PDBs
● SREs defined a default minAvailable
● Confirmed each microservice's availability requirements with its developers
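
A hedged sketch of stamping out such a default with kubectl's built-in generator; the name, namespace, selector, and the 50% value are assumptions modeled on the earlier example:

$ kubectl create poddisruptionbudget app-c \
    --namespace app-c-dev \
    --selector app=c \
    --min-available 50%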

Slide 36

Upgrade GKE master
● Modify the Terraform code
● Review by team members
● Apply on CircleCI

Slide 37

Upgrade GKE master

Slide 38

Create & Delete GKE node pools
● Copy, paste, and modify the Terraform code
● Review by team members
● Apply on CircleCI

Slide 39

Future work
● Enable "Automatic node upgrades"
  ○ Ensure all microservices are resilient to disruptions
  ○ Reduce upgrade steps
● Have microservice developers create PDBs themselves
