Slide 1

Slide 1 text

Kyle Bai 白凱仁 Advanced Kubernetes

Slide 2

Slide 2 text

@k2r2bai About Me 白凱仁 (Kyle Bai) • Software Engineer @ inwinSTACK. • OSS Contributor. • Certified Kubernetes Administrator/Application Developer. • Co-organizer of Cloud Native Taiwan User Group. • Interested in emerging technologies. @kairen https://k2r2bai.com

Slide 3

Slide 3 text

@k2r2bai Agenda. Today I would like to talk about: • Safely Upgrading Kubernetes Clusters • Upgrading an existing cluster using kubeadm • Highly available Kubernetes clusters • Image Registry - Harbor • Content trust for container images • Monitoring Kubernetes / Registry • Logging Kubernetes / Registry • Troubleshooting Skills / Tools

Slide 4

Slide 4 text

Safely Upgrading Kubernetes Clusters

Slide 5

Slide 5 text

@k2r2bai Scope of cluster upgrades • Kubernetes binaries • Kubernetes control plane components • Container runtime • Etcd cluster • Cluster networking (CNI plugins) • Base OS ....

Slide 6

Slide 6 text

@k2r2bai Scope of cluster upgrades • Kubernetes binaries - kubelet • Kubernetes control plane components - apiserver, scheduler, controller manager • Container runtime - Docker • Etcd cluster - etcd data • Cluster networking (CNI plugins) - calico, flannel, ... • Base OS - Linux kernel ....

Slide 7

Slide 7 text

@k2r2bai Things you should know before you begin 1. Back up the etcd data before upgrading the cluster • etcd Operator • etcdctl snapshot + cron + restore ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db ETCDCTL_API=3 etcdctl snapshot restore snapshot.db [FLAGS] https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md https://github.com/coreos/etcd-operator
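As a minimal sketch of the snapshot + cron approach: a tiny helper script run nightly from cron. The endpoint and certificate paths are kubeadm defaults and are assumptions here; adjust them to your own etcd setup.

# /usr/local/bin/etcd-backup.sh (hypothetical helper script)
#!/bin/sh
ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save /var/backups/etcd-$(date +%F).db

# Nightly crontab entry on the master (edit with: sudo crontab -e)
0 2 * * * /usr/local/bin/etcd-backup.sh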

Slide 8

Slide 8 text

@k2r2bai Things you should know before you begin 1. Back up the etcd data before upgrading the cluster 2. When upgrading, move one minor version at a time • Kubernetes releases a new minor version roughly every three months • Good: My cluster is on v1.10, I want to upgrade to v1.11. • Bad: My cluster is on v1.10, I want to upgrade to v1.13. • Good: My cluster is on v1.10, I upgrade to v1.11 and then upgrade to v1.12.

Slide 9

Slide 9 text

@k2r2bai Things you should know before you begin 1. Back up the etcd data before upgrading the cluster 2. When upgrading, move one minor version at a time 3. Read the release notes to understand what changes in each version • Known Issues • Action Required • Deprecations and removals https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md

Slide 10

Slide 10 text

@k2r2bai Things you should know before you begin 1. Back up the etcd data before upgrading the cluster 2. When upgrading, move one minor version at a time 3. Read the release notes to understand what changes in each version 4. Use tooling or managed cloud services to drive the upgrade process • Kubeadm • Kops • Kubespray - Ansible • Cluster API • GKE, EKS, AKS, .... https://github.com/kubernetes-sigs/cluster-api

Slide 11

Slide 11 text

@k2r2bai Things you should know before you begin 5. Understand the API changes in the target Kubernetes version • APIs evolve between releases; for example, v1.16 removes extensions/v1beta1 (see the quick checks below) https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/
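As a quick check before planning the jump, you can ask the current cluster which API groups and versions it still serves; these are standard kubectl commands and only need a working kubeconfig:

$ kubectl api-versions | grep -E 'apps|extensions'            # group/versions the API server serves
$ kubectl api-resources --api-group=extensions                # resources still exposed under extensions
$ kubectl get deployments.v1beta1.extensions --all-namespaces # Deployments still readable via the old group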

Slide 12

Slide 12 text

@k2r2bai Things you should know before you begin 5. Understand the API changes in the target Kubernetes version 6. Make sure applications are deployed through higher-level APIs such as Deployment • Back each application with multiple replicas (Pods) • Use probes to report application health so traffic is withheld from unready Pods • Use the Pod preStop hook to tighten lifecycle management (a minimal sketch follows) https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/
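A minimal sketch of point 6, using a hypothetical nginx-based app named web; the readinessProbe and preStop fields are the parts that matter:

$ kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # multiple Pods backing the application
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        readinessProbe:        # withhold traffic until the Pod answers
          httpGet:
            path: /
            port: 80
        lifecycle:
          preStop:             # give connections time to drain on shutdown
            exec:
              command: ["sh", "-c", "sleep 5"]
EOF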

Slide 13

Slide 13 text

@k2r2bai Things you should know before you begin 5. Understand the API changes in the target Kubernetes version 6. Make sure applications are deployed through higher-level APIs such as Deployment 7. Upgrade the masters before the nodes

Slide 14

Slide 14 text

@k2r2bai Foreseeable problems and solutions • Stale data in etcd

Slide 15

Slide 15 text

@k2r2bai Foreseeable problems and solutions • Stale data in etcd • Don't remove API versions (#52185). • Storage migration system [1][2]. • Don't deploy an EOL version [3]. Right now, v1.12 - v1.14 is a good choice. • Read release notes. 1. https://github.com/kubernetes/community/pull/2524 2. https://github.com/kubernetes-sigs/kube-storage-version-migrator 3. https://kubernetes.io/docs/reference/using-api/deprecation-policy/

Slide 16

Slide 16 text

@k2r2bai Foreseeable problems and solutions • Clients are on versions that are too old (deprecated) • Applications and services still depend on the extensions/v1beta1 API • In-house custom controllers were built against old API versions and libraries

Slide 17

Slide 17 text

@k2r2bai Foreseeable problems and solutions • Clients are on versions that are too old (deprecated) • Applications and services still depend on the extensions/v1beta1 API • Migrate their API objects to the newer versions before upgrading the cluster (see the sketch below) • In-house custom controllers were built against old API versions and libraries • Update the code to the new APIs and bump the dependent libraries to compatible versions
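One hedged way to do that migration for Deployments; legacy-deployment.yaml is a hypothetical file name, and kubectl convert is only usable while your kubectl still ships that subcommand (it was removed in later releases):

# Find objects still served through the deprecated group:
$ kubectl get deployments.v1beta1.extensions --all-namespaces

# Rewrite apiVersion to apps/v1 (which also requires an explicit spec.selector),
# either by hand or with kubectl convert, then re-apply:
$ kubectl convert -f legacy-deployment.yaml --output-version apps/v1 > new-deployment.yaml
$ kubectl apply -f new-deployment.yaml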

Slide 18

Slide 18 text

@k2r2bai Foreseeable problems and solutions • Policy breaks after upgrade (Webhook, RBAC) • New versions such as batch/v2 and test_batch/v1 show up

Slide 19

Slide 19 text

@k2r2bai Foreseeable problems and solutions • Policy breaks after upgrade (Webhook, RBAC) • New versions such as batch/v2 and test_batch/v1 show up • Before upgrading the cluster, update the policies to cover all currently supported versions

Slide 20

Slide 20 text

@k2r2bai Unforeseeable problems and suggestions • Test the target version in as realistic an environment as possible before upgrading • Use the older version of Sonobuoy to test the new cluster (a sample run follows) • Keep configuration files consistent, and check whether any setting breaks because of version changes • Check whether addons crash on the new version's behaviour • Use a multi-stage upgrade to keep APIs compatible.
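A rough Sonobuoy run against the current kubeconfig looks like the following; flags differ slightly between Sonobuoy releases, so treat this as illustrative:

$ sonobuoy run --wait            # run the conformance plugin and block until it finishes
$ results=$(sonobuoy retrieve)   # download the results tarball
$ sonobuoy results $results      # print a pass/fail summary
$ sonobuoy delete --wait         # clean up the sonobuoy namespace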

Slide 21

Slide 21 text

@k2r2bai Summary • Before upgrade. • Backup etcd data. • Read release notes. • Upgrade clients. • Upgrade configurations.

Slide 22

Slide 22 text

@k2r2bai Summary • Before upgrade. • During upgrade. • Upgrade master before node. • Don't use new APIs until HA master upgrade is done. • No in-place kubelet upgrade.

Slide 23

Slide 23 text

@k2r2bai Summary • Before upgrade. • During upgrade. • After upgrade. • Check cluster addons. • Check cluster networking(CNI plugins). • Make sure that nodes are Ready.

Slide 24

Slide 24 text

Upgrading an existing cluster using kubeadm

Slide 25

Slide 25 text

@k2r2bai Steps of a cluster upgrade 1. Master • Upgrade kube-apiserver, controller manager, scheduler • Upgrade addons, e.g. kube-proxy, CoreDNS • Upgrade the kubelet binary and its configuration • (optional) Update the RBAC rules for node bootstrap tokens

Slide 26

Slide 26 text

@k2r2bai Steps of a cluster upgrade 1. Master 2. Nodes • On the master, use kubectl drain to evict Pods onto other nodes • Use PodDisruptionBudgets to limit how workload controllers move Pods (a minimal example follows) • Upgrade the kubelet binary and its configuration • On the master, run kubectl uncordon so Pods can be scheduled onto the node again
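A minimal PodDisruptionBudget sketch for the drain step, assuming a hypothetical app labelled app: web; policy/v1beta1 is the PDB API version served by v1.14/v1.15 clusters:

$ kubectl apply -f - <<EOF
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # drain/eviction must always leave at least 2 Pods running
  selector:
    matchLabels:
      app: web
EOF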

Slide 27

Slide 27 text

@k2r2bai Let's use kubeadm to upgrade a cluster! v1.14.5 -> v1.15.2

Slide 28

Slide 28 text

@k2r2bai Upgrading master nodes First, log in to the master (k8s-m1) and upgrade kubeadm: $ sudo apt-get update && sudo apt-get install -y kubeadm=1.15.2-00 && \ sudo apt-mark hold kubeadm $ kubeadm version kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:20:51Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Slide 29

Slide 29 text

@k2r2bai Upgrading master nodes On the master, upgrade the control plane components: $ sudo kubeadm upgrade plan $ sudo kubeadm upgrade apply v1.15.2 (optional) On any additional master nodes, upgrade the control plane components: $ sudo kubeadm upgrade node

Slide 30

Slide 30 text

@k2r2bai Upgrading master nodes On the master, upgrade kubelet and kubectl: $ sudo apt-get update && sudo apt-get install -y kubelet=1.15.2-00 kubectl=1.15.2-00 && \ sudo apt-mark hold kubelet kubectl Restart kubelet: $ sudo systemctl restart kubelet

Slide 31

Slide 31 text

@k2r2bai Upgrading worker nodes First, on the master, run the drain command to move the node's workload onto other nodes: $ kubectl drain $NODE --ignore-daemonsets WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-p4hs7, kube-system/kube-proxy-nlznx evicting pod "nginx-68487cdddf-4cctq" evicting pod "calico-kube-controllers-6c795cc467-pc46s" ... node/k8s-n1 evicted

Slide 32

Slide 32 text

@k2r2bai Upgrading worker nodes Log in to the nodes (k8s-n1, k8s-n2) and upgrade kubeadm: $ sudo apt-get update && sudo apt-get install -y kubeadm=1.15.2-00 && \ sudo apt-mark hold kubeadm $ kubeadm version kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:20:51Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Slide 33

Slide 33 text

@k2r2bai Upgrading worker nodes On the node, upgrade the node configuration: $ sudo kubeadm upgrade node On the node, upgrade kubelet: $ sudo apt-get update && sudo apt-get install -y kubelet=1.15.2-00 && \ sudo apt-mark hold kubelet Restart kubelet: $ sudo systemctl restart kubelet

Slide 34

Slide 34 text

@k2r2bai Check the cluster On the master, run uncordon to bring the worker node back to a schedulable state: $ kubectl uncordon $NODE On the master, check the cluster version and status with kubectl: $ kubectl version Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Slide 35

Slide 35 text

@k2r2bai Check the cluster On the master node, check node and component status: $ kubectl get no $ kubectl get cs

Slide 36

Slide 36 text

@k2r2bai How does it work?

Slide 37

Slide 37 text

@k2r2bai How does it work?

Slide 38

Slide 38 text

@k2r2bai kubeadm upgrade apply • Checks whether the current cluster is in an upgradeable state • The API server is reachable • All nodes are in the Ready state • The control plane is healthy • Enforces the version skew policy [1] • Makes sure the control plane component images are on the machine, pulling newer images if they are missing 1. https://kubernetes.io/docs/setup/release/version-skew-policy/

Slide 39

Slide 39 text

@k2r2bai kubeadm upgrade apply (cont.) • Upgrades the control plane components, or rolls back to the original version if something goes wrong • Updates the kube-dns and kube-proxy manifests and makes sure the required RBAC rules are created • Creates a new API server certificate and key, backing up the old files if they will expire within 180 days

Slide 40

Slide 40 text

@k2r2bai kubeadm upgrade node (master) • Fetches the kubeadm ClusterConfiguration from the current cluster • (optional) Backs up the kube-apiserver certificate • Upgrades the static Pod manifests of the control plane components • Upgrades the kubelet configuration of the current node

Slide 41

Slide 41 text

@k2r2bai kubeadm upgrade node (worker) • Fetches the kubeadm ClusterConfiguration from the current cluster • Upgrades the kubelet configuration of the current node

Slide 42

Slide 42 text

Highly available Kubernetes cluster

Slide 43

Slide 43 text

@k2r2bai Why we need Kubernetes High Availability(HA)? • No SPOF(single point of failure). • Load balancing workload for API servers. • Failover clustering for Kubernetes state data(Etcd). • Running in multiple zones(across failure domains). • Zero-downtime Upgrade. https://github.com/bradfitz/homelab

Slide 44

Slide 44 text

@k2r2bai Stacked etcd topology https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

Slide 45

Slide 45 text

@k2r2bai External etcd topology https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

Slide 46

Slide 46 text

@k2r2bai Self-Hosted Kubernetes https://kccna18.sched.com/event/GrWQ

Slide 47

Slide 47 text

@k2r2bai Kubernetes Active-Active Components

Slide 48

Slide 48 text

@k2r2bai Kubernetes Active-Passive Components https://kccna18.sched.com/event/GrWQ

Slide 49

Slide 49 text

@k2r2bai Quorum with etcd https://kccna18.sched.com/event/GrWQ

Slide 50

Slide 50 text

@k2r2bai Cluster size and failure tolerance: quorum = (N / 2) + 1, so a 3-member etcd cluster tolerates 1 failure and a 5-member cluster tolerates 2 https://kccna18.sched.com/event/GrWQ

Slide 51

Slide 51 text

@k2r2bai History of HA in kubeadm https://kccnceu19.sched.com/event/MPj5/deep-dive-cluster-lifecycle-sig-kubeadm-fabrizio-pandini-lubomir-i-ivanov-vmware

Slide 52

Slide 52 text

@k2r2bai Set up a highly available cluster with kubeadm
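As a rough sketch of bootstrapping a stacked-etcd HA control plane with kubeadm v1.15: LOAD_BALANCER_IP, the token, the CA hash, the certificate key and the Pod CIDR are placeholders or CNI-dependent assumptions, taken from your own environment and from the kubeadm init output:

# On the first master, point the API at the load-balanced endpoint:
$ sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_IP:6443" \
    --upload-certs --pod-network-cidr 10.244.0.0/16

# On the remaining masters, join as control-plane nodes:
$ sudo kubeadm join LOAD_BALANCER_IP:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>

# Workers join as usual:
$ sudo kubeadm join LOAD_BALANCER_IP:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>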

Slide 53

Slide 53 text

@k2r2bai "Multi-Master is NOT ENOUGH!! Eliminating every single point of failure in each layer of the stack" Karan Goel, Meaghan Kjelland @ Google https://kccna18.sched.com/event/GrWQ

Slide 54

Slide 54 text

@k2r2bai SPOF in each layer of the stack: Application, Virtual Machines, Physical Machines, Network Partitions, Storage, Cooling Systems, Electric & Power https://kccna18.sched.com/event/GrWQ

Slide 55

Slide 55 text

Image Registry - Harbor

Slide 56

Slide 56 text

@k2r2bai Harbor Harbor is a CNCF incubating project: a container registry that extends Docker Distribution with image storage, signing, and vulnerability scanning, plus security, identity/authentication, and a web-based management UI. • LDAP/Active Directory, OIDC • Clair - container image vulnerability scanning • Notary - container image signing (content trust) • Storage backends such as S3 and cloud storage • Image replication • User management, access control and activity auditing

Slide 57

Slide 57 text

@k2r2bai Clair Clair is CoreOS's open source container image security scanning project. It provides an API-driven analysis service that compares image contents against public CVE (Common Vulnerabilities and Exposures) databases and sends administrators useful, actionable information about latent vulnerabilities in containers. Components: Clair, CVE Updater, REST API, PostgreSQL (CRUD). CVE data sources: • Debian Security Bug Tracker • Ubuntu CVE Tracker • Red Hat Security Data • Oracle Linux Security Data • Alpine SecDB • NIST NVD

Slide 58

Slide 58 text

@k2r2bai Notary Notary is a CNCF incubating project that was split out as a standalone project when Docker refactored its security modules. Notary is a platform for establishing content trust: it aims to ensure that servers and clients interact over mutually trusted connections and that content published on the Internet is distributed securely. For container workloads it supports signing images and verifying image integrity (content trust).

Slide 59

Slide 59 text

@k2r2bai Create machines using Vagrant with VirtualBox First, clone the following repo with Git (skip if you already have it): $ git clone https://github.com/inwinstack/k8s-course Change into the directory and run: $ cd k8s-course/harbor $ vagrant up $ vagrant status $ vagrant ssh k8s-harbor

Slide 60

Slide 60 text

@k2r2bai Harbor installation Log in to the VM and download Docker Compose: $ curl -L https://github.com/docker/compose/releases/download/1.24.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose $ chmod +x /usr/local/bin/docker-compose Download the Harbor offline installer: $ wget https://storage.googleapis.com/harbor-releases/release-1.8.0/harbor-offline-installer-v1.8.2-rc2.tgz $ tar xvf harbor-offline-installer-v1.8.2-rc2.tgz $ cd harbor && cp -rp /vagrant/config/harbor.yml ./ $ docker load < harbor.v1.8.2.tar.gz

Slide 61

Slide 61 text

@k2r2bai Harbor installation Copy the self-signed certificates: $ mkdir -p /data/cert/ $ cp -rp /vagrant/config/certs/* /data/cert/ $ chmod 777 -R /data/cert/ # just for demo Generate the configs and compose files: $ ./prepare

Slide 62

Slide 62 text

@k2r2bai Harbor installation Deploy Harbor with docker-compose: $ docker-compose up -d Deploy Clair and Notary: $ sudo ./install.sh --with-notary --with-clair

Slide 63

Slide 63 text

@k2r2bai Use Docker with Harbor Copy the CA certificate into the Docker certs directory so that HTTPS connections are trusted: $ mkdir -p /etc/docker/certs.d/192.16.35.99 $ cp /vagrant/config/certs/ca.crt /etc/docker/certs.d/192.16.35.99/ Test pushing an image to Harbor: $ docker login 192.16.35.99 $ docker pull alpine:3.7 $ docker tag alpine:3.7 192.16.35.99/library/alpine:3.7 $ docker push 192.16.35.99/library/alpine:3.7 Access the portal: https://192.16.35.99 https://github.com/goharbor/harbor/blob/master/docs/user_guide.md

Slide 64

Slide 64 text

@k2r2bai Use Docker with Harbor

Slide 65

Slide 65 text

@k2r2bai Use Harbor on Kubernetes On every K8s node, copy the CA certificate into the Docker certs directory so that HTTPS connections are trusted: $ mkdir -p /etc/docker/certs.d/192.16.35.99 $ cp /vagrant/harbor/ca.crt /etc/docker/certs.d/192.16.35.99/ On the master node, run the following to create the pull Secret and a Pod: $ /vagrant/harbor/run.sh $ kubectl apply -f /vagrant/harbor $ kubectl get po
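For reference, the pull-Secret step that run.sh presumably performs looks roughly like this; the secret name, credentials and email below are assumptions (Harbor's stock admin account is shown) and should be replaced with your own:

$ kubectl create secret docker-registry harbor-registry \
    --docker-server=192.16.35.99 \
    --docker-username=admin \
    --docker-password='Harbor12345' \
    --docker-email=admin@example.com

# Either reference it via imagePullSecrets in the Pod spec,
# or attach it to the default ServiceAccount:
$ kubectl patch serviceaccount default \
    -p '{"imagePullSecrets": [{"name": "harbor-registry"}]}'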

Slide 66

Slide 66 text

@k2r2bai Content trust for images

Slide 67

Slide 67 text

@k2r2bai Content trust for images

Slide 68

Slide 68 text

@k2r2bai Content trust for images On the k8s-harbor VM, install the CA certificate for Notary's HTTPS endpoint: $ mkdir -p $HOME/.docker/tls/192.16.35.99:4443/ $ cp /vagrant/config/certs/ca.crt $HOME/.docker/tls/192.16.35.99:4443/ Push a signed image to Harbor: $ export DOCKER_CONTENT_TRUST=1 $ export DOCKER_CONTENT_TRUST_SERVER=https://192.16.35.99:4443 $ docker tag alpine:3.7 192.16.35.99/trust/alpine:3.7 $ docker push 192.16.35.99/trust/alpine:3.7

Slide 69

Slide 69 text

@k2r2bai Content trust for images On k8s-harbor, pull an unsigned image: $ docker rmi 192.16.35.99/library/alpine $ docker pull 192.16.35.99/library/alpine Error: remote trust data does not exist for 192.16.35.99/library/alpine: 192.16.35.99:4443 does not have trust data for 192.16.35.99/library/alpine

Slide 70

Slide 70 text

@k2r2bai Portieris Portieris is a Kubernetes admission controller developed by IBM. It uses Notary to verify whether a container image's content is trusted, and blocks the image if it is not. It also supports: • Whitelisting images • Failing closed • Namespace or cluster-wide policies https://github.com/IBM/portieris

Slide 71

Slide 71 text

Monitoring Kubernetes / Registry

Slide 72

Slide 72 text

@k2r2bai https://landscape.cncf.io

Slide 73

Slide 73 text

@k2r2bai

Slide 74

Slide 74 text

@k2r2bai

Slide 75

Slide 75 text

@k2r2bai Prometheus Prometheus is a project that has graduated from CNCF incubation. It mainly provides a monitoring and alerting framework plus a TSDB (time series database), and was inspired by Google's Borgmon monitoring system. • Pull mode • A multi-dimensional data model • Alert system (with Alertmanager) • Metrics collection & storage • Metrics, not logging, not tracing • Dashboarding / graphing / trending

Slide 76

Slide 76 text

@k2r2bai Long-lived jobs vs. short-lived jobs

Slide 77

Slide 77 text

@k2r2bai

Slide 78

Slide 78 text

@k2r2bai

Slide 79

Slide 79 text

@k2r2bai

Slide 80

Slide 80 text

@k2r2bai

Slide 81

Slide 81 text

@k2r2bai Grafana Grafana is a cross-platform, open source analytics and visualisation tool. It queries the collected data and presents it visually, and also provides real-time alert notifications.

Slide 82

Slide 82 text

@k2r2bai Things to monitor on Kubernetes • Nodes • Pods • Control Plane • kubelet, kube-apiserver, controller manager, scheduler • APM • Request per second • Error rates • App specific metrics
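Once the Prometheus stack from the next slides is running, you can sanity-check these signals through Prometheus' HTTP API; the prometheus-k8s Service name is the prometheus-operator default and is an assumption here:

$ kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &

# Are all scrape targets up?
$ curl -s 'http://localhost:9090/api/v1/query?query=up'

# Per-namespace CPU usage over the last 5 minutes:
$ curl -s --data-urlencode \
    'query=sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)' \
    http://localhost:9090/api/v1/query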

Slide 83

Slide 83 text

@k2r2bai Architecture

Slide 84

Slide 84 text

@k2r2bai Let's go to deploy Kubernetes monitoring system

Slide 85

Slide 85 text

@k2r2bai Prometheus deployment First, clone the following repo with Git (skip if you already have it): $ git clone https://github.com/inwinstack/k8s-course Change into the directory and run the following (make sure a kubeconfig is available in the current environment): $ cd k8s-course/addons/monitoring $ kubectl apply -f namespace.yml $ kubectl apply -f operator

Slide 86

Slide 86 text

@k2r2bai Prometheus deployment Check that the Prometheus Operator is running: $ kubectl -n monitoring get po Once the Operator is up, create and configure Prometheus and Grafana: $ kubectl apply -f service-discovery $ kubectl apply -f prometheus -f alertmanater -f prometheus-adapter $ kubectl apply -f node-exporter -f kube-state-metrics $ kubectl apply -f servicemonitor $ kubectl apply -f grafana

Slide 87

Slide 87 text

@k2r2bai Prometheus deployment Access Grafana through a port-forward: $ kubectl -n monitoring port-forward svc/grafana 3000:3000 --address 0.0.0.0 Then browse to HOST:3000

Slide 88

Slide 88 text

Logging Kubernetes / Registry

Slide 89

Slide 89 text

@k2r2bai

Slide 90

Slide 90 text

@k2r2bai Fluentd Fluentd is a CNCF incubating project that plays a role similar to Logstash. It standardizes the log collection pipeline and makes it simple to define inputs, filters, and outputs for collecting logs from different sources.

Slide 91

Slide 91 text

@k2r2bai Elasticsearch Elasticsearch is a text search engine built on the Lucene library. It provides a distributed, multi-tenant full-text search engine with an HTTP web interface and schema-free JSON documents. • Built on top of Lucene • Document-oriented - it stores complex entities as structured JSON documents and indexes all fields by default. • Full-text search • Schema free • RESTful API

Slide 92

Slide 92 text

@k2r2bai Kibana Kibana is an open source log analytics and visualisation platform designed to work with Elasticsearch. Administrators can use Kibana to query, filter, and visualise the log data stored in Elasticsearch indices.

Slide 93

Slide 93 text

@k2r2bai Things to log on Kubernetes ALL THE THINGS!

Slide 94

Slide 94 text

@k2r2bai Architecture

Slide 95

Slide 95 text

@k2r2bai Let's go to deploy Kubernetes logging system

Slide 96

Slide 96 text

@k2r2bai EFK deployment First, clone the following repo with Git (skip if you already have it): $ git clone https://github.com/inwinstack/k8s-course Change into the directory and run the following (make sure a kubeconfig is available in the current environment): $ cd k8s-course/addons/logging $ kubectl apply -f ./

Slide 97

Slide 97 text

@k2r2bai EFK deployment Access Kibana through a port-forward: $ kubectl -n kube-system port-forward svc/kibana-logging 5601:5601 --address 0.0.0.0 Then browse to HOST:5601

Slide 98

Slide 98 text

Troubleshooting Skills / Tools

Slide 99

Slide 99 text

@k2r2bai Make good use of kubectl $ kubectl -n <namespace> get pods -o wide $ kubectl -n <namespace> describe pod/<pod_name> • Events of the API object • Why the container process exited • Why health checks failed • OOMKilled • Corrupted or unpullable images

Slide 100

Slide 100 text

@k2r2bai Make good use of kubectl $ kubectl -n <namespace> logs <pod_name> [-c container_name] $ kubectl -n <namespace> logs <pod_name> [-c container_name] --previous • Check the logs to see whether the application itself crashed • Use --previous to view logs from the Pod's previous container run

Slide 101

Slide 101 text

@k2r2bai Looking at system logs, systemd, journalctl $ cat /var/log/kubernetes/xxx # kube-proxy, apiserver, ..., etc $ systemctl status kubelet $ journalctl -xeu kubelet • Check the status of the Kubernetes components managed by the OS • Confirm whether a Kubernetes component is the root cause

Slide 102

Slide 102 text

@k2r2bai Make good use of net-utils $ ping <host> $ tcpdump -i <interface> -nn [condition] $ nslookup <name> $ iptables -nL $ ipvsadm -L • Check whether cross-node networking works • Check whether KubeDNS can resolve names • Check whether the iptables chains and ipvs rules are correct

Slide 103

Slide 103 text

@k2r2bai Learning the status of Pod • Pending • Waiting / ContainerCreating • ImageInspectError • ImagePullBackOff • CrashLoopBackOff • Error • Terminating • Unknown

Slide 104

Slide 104 text

@k2r2bai Learning the status of Pods • Pending - are there enough resources; is the host port already taken; • Waiting / ContainerCreating - the image is too large or the network too slow; • ImagePullBackOff - is the registry reachable; check the registry-qps and registry-burst settings; • ImageInspectError - is the container filesystem healthy; is the image corrupted; a dockerd error; • CrashLoopBackOff - is the process running correctly; are any mounts failing; • Error - do the referenced ConfigMaps/Secrets/PVs exist; • Terminating - if it takes > 30s, check the grace period; • Unknown - is the node Ready;
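Two quick, standard kubectl queries that help surface Pods stuck in the states above:

# Pods that are neither Running nor Succeeded, cluster-wide:
$ kubectl get pods --all-namespaces \
    --field-selector=status.phase!=Running,status.phase!=Succeeded

# Recent events sorted by creation time, often the fastest pointer to the cause:
$ kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp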

Slide 105

Slide 105 text

@k2r2bai Utils Container https://blog.pichuang.com.tw/20190715-troubleshooting-from-container-to-any/ https://github.com/nicolaka/netshoot

Slide 106

Slide 106 text

@k2r2bai Troubleshooting Tools • Container • ctop - Top-like interface for container metrics • dive - A tool for exploring a docker image. • Kubernetes • krew - Package manager for kubectl plugins. • stern - Multi pod and container log tailing for Kubernetes. • ksniff - Ease sniffing on Kubernetes pods using tcpdump and Wireshark. • Weave Scope - Monitoring, visualisation & management for Docker & Kubernetes. • k9s - Terminal UI to interact with your Kubernetes clusters. • Skaffold, Telepresence - Local Kubernetes development made easy.

Slide 107

Slide 107 text

@k2r2bai Call for some help • Kubernetes troubleshooting docs - https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/ • Kubernetes official forum - https://discuss.kubernetes.io/ • Kubernetes GitHub issues - https://github.com/kubernetes/kubernetes/issues • stackoverflow - https://stackoverflow.com/questions/tagged/kubernetes • Slack - https://slack.k8s.io/ • CNTUG Telegram - https://t.me/cntug

Slide 108

Slide 108 text

Additional

Slide 109

Slide 109 text

@k2r2bai Knative Knative extends Kubernetes to provide the missing building blocks that developers need to create modern, source-centric, container-based, cloud-native applications. “Developed in close partnership with Pivotal, IBM, Red Hat, and SAP, Knative pushes Kubernetes-based computing forward by providing the building blocks you need to build and deploy modern, container-based serverless applications.”

Slide 110

Slide 110 text

@k2r2bai Knative + Istio = Power The Knative framework is built on top of Kubernetes and Istio, which provide an application runtime (container based) and advanced network routing, respectively.

Slide 111

Slide 111 text

@k2r2bai KubeEdge • KubeEdge is an open source system extending native containerized application orchestration and device management to hosts at the edge. • It is built upon Kubernetes and provides core infrastructure support for networking, application deployment, and metadata synchronization between cloud and edge. https://kubeedge.io/

Slide 112

Slide 112 text

@k2r2bai

Slide 113

Slide 113 text

@k2r2bai Rook Rook is an open source cloud-native storage orchestrator for Kubernetes, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

Slide 114

Slide 114 text

@k2r2bai Kubernetes Failure Stories A compiled list of links to public failure stories related to Kubernetes. https://github.com/hjacobs/kubernetes-failure-stories