Advanced Kubernetes For UMC

Kyle Bai
August 15, 2019


UMC Day 3 course content


Transcript

  1. Kyle Bai 白凱仁 <k2r2.bai@gmail.com> Advanced Kubernetes

  2. @k2r2bai About Me 白凱仁 (Kyle Bai) • Software Engineer @ inwinSTACK.

    • OSS Contributor. • Certified Kubernetes Administrator/Application Developer. • Co-organizer of Cloud Native Taiwan User Group. • Interested in emerging technologies. @kairen https://k2r2bai.com
  3. @k2r2bai • Safely Upgrading Kubernetes Clusters • Upgrading the existing

    cluster using kubeadm • Highly available Kubernetes cluster • Image Registry - Harbor • Content trust for container images • Monitoring Kubernetes / Registry • Logging Kubernetes / Registry • Troubleshooting Skills / Tools Agenda: Today I would like to talk about these topics.
  4. Safely Upgrading Kubernetes Clusters

  5. @k2r2bai Scope of cluster upgrades • Kubernetes binaries • Kubernetes

    control plane components • Container Runtime • etcd cluster • Cluster networking (CNI plugins) • Base OS ....
  6. @k2r2bai Scope of cluster upgrades • Kubernetes binaries - kubelet

    • Kubernetes control plane components - apiserver, scheduler, controller manager • Container Runtime - Docker • etcd cluster - etcd data • Cluster networking (CNI plugins) - calico, flannel, ... • Base OS - Linux Kernel ....
  7. @k2r2bai You should know these before you begin 1. Back up the etcd data before upgrading the

    cluster • etcd Operator • etcdctl snapshot + cron + restore (see the backup sketch after this slide) ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db ETCDCTL_API=3 etcdctl snapshot restore snapshot.db [FLAGS] https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md https://github.com/coreos/etcd-operator
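
    A minimal backup sketch for the etcdctl + cron option above, assuming etcdctl v3 is installed on a master and the etcd client certificates live in the kubeadm default location (/etc/kubernetes/pki/etcd); the backup path, endpoint, and retention are illustrative, not part of the original slides. Save it as /usr/local/bin/etcd-backup.sh:

    #!/bin/bash
    # Snapshot etcd and keep only the 7 most recent copies.
    set -euo pipefail
    BACKUP_DIR=/var/backups/etcd            # assumed backup location
    mkdir -p "$BACKUP_DIR"
    ETCDCTL_API=3 etcdctl \
      --endpoints https://127.0.0.1:2379 \
      --cacert /etc/kubernetes/pki/etcd/ca.crt \
      --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
      --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
      snapshot save "$BACKUP_DIR/snapshot-$(date +%Y%m%d%H%M).db"
    ls -1t "$BACKUP_DIR"/snapshot-*.db | tail -n +8 | xargs -r rm --

    Run it hourly from cron, for example:

    $ echo '0 * * * * root /usr/local/bin/etcd-backup.sh' | sudo tee /etc/cron.d/etcd-backup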
  8. @k2r2bai You should know these before you begin 1. Back up the etcd data before upgrading the

    cluster 2. Always upgrade one minor version at a time • Kubernetes releases a new minor version roughly every three months • Good: My cluster is at v1.10, I want to upgrade to v1.11. • Bad: My cluster is at v1.10, I want to upgrade to v1.13. • Good: My cluster is at v1.10, I upgrade to v1.11 and then upgrade to v1.12.
  9. @k2r2bai You should know these before you begin 1. Back up the etcd data before upgrading the

    cluster 2. Always upgrade one minor version at a time 3. Read the Release Notes to understand what changes in each version • Known Issues • Action Required • Deprecations and removals https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md
  10. @k2r2bai You should know these before you begin 1. Back up the etcd data before upgrading the

    cluster 2. Always upgrade one minor version at a time 3. Read the Release Notes to understand what changes in each version 4. Make good use of tools or public cloud services to drive the cluster upgrade • Kubeadm • Kops • Kubespray - Ansible • Cluster API • GKE, EKS, AKS, .... https://github.com/kubernetes-sigs/cluster-api
  11. @k2r2bai 5. Understand the API changes in the target Kubernetes version • APIs change as releases evolve, e.g.

    v1.16 will remove extensions/v1beta1 You should know these before you begin https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/
  12. @k2r2bai 5. Understand the API changes in the target Kubernetes version 6. Make sure applications are built on higher-level APIs

    such as Deployment • The application is backed by multiple replicas (Pods) • Use probes to verify application health so that traffic is not routed to unready Pods • Use the Pod preStop hook to strengthen lifecycle management (see the sketch after this slide) You should know these before you begin https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/
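
    A minimal sketch of point 6, assuming a simple HTTP application; the image, probe path, replica count, and sleep duration below are illustrative only. Save as web-deployment.yaml and apply it:

    # web-deployment.yaml (illustrative)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3                      # multiple Pods back the application
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx:1.17          # illustrative image
            readinessProbe:            # keep traffic away until the Pod is ready
              httpGet:
                path: /
                port: 80
            lifecycle:
              preStop:                 # give in-flight requests time to drain
                exec:
                  command: ["sh", "-c", "sleep 10"]

    $ kubectl apply -f web-deployment.yaml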
  13. @k2r2bai 5. Understand the API changes in the target Kubernetes version 6. Make sure applications are built on higher-level APIs

    such as Deployment 7. Upgrade the Masters before upgrading the Nodes You should know these before you begin
  14. @k2r2bai Foreseeable problems and solutions • Stale data in etcd

  15. @k2r2bai Foreseeable problems and solutions • Stale data in etcd • Don't

    remove API versions (#52185). • Storage version migration system [1][2]. • Don't deploy an EOL version [3]. Right now, v1.12 - v1.14 is a good choice. • Read release notes. 1. https://github.com/kubernetes/community/pull/2524 2. https://github.com/kubernetes-sigs/kube-storage-version-migrator 3. https://kubernetes.io/docs/reference/using-api/deprecation-policy/
  16. @k2r2bai Foreseeable problems and solutions • Clients are using outdated (deprecated) versions • Applications and services still depend

    on the extensions/v1beta1 API • Custom Controllers were developed against outdated API versions and libraries
  17. @k2r2bai Foreseeable problems and solutions • Clients are using outdated (deprecated) versions • Applications and services still depend

    on the extensions/v1beta1 API • Migrate the application and service API objects to the new versions before upgrading the cluster (see the sketch after this slide) • Custom Controllers were developed against outdated API versions and libraries • Update the code to the new APIs and bump the dependent libraries to compatible versions
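
    One way to find and migrate objects that still live in the old API group, as a minimal sketch; the file names are illustrative, and kubectl convert was still bundled with kubectl at the time of these slides but has since been deprecated:

    # list Deployments through the apps/v1 group to confirm they are served there
    $ kubectl get deployments.v1.apps --all-namespaces

    # rewrite a manifest that still declares extensions/v1beta1, then re-apply it
    $ kubectl convert -f nginx-deployment.yaml --output-version apps/v1 > nginx-deployment-apps-v1.yaml
    $ kubectl apply -f nginx-deployment-apps-v1.yaml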
  18. @k2r2bai Foreseeable problems and solutions • Policies break after upgrade (Webhook,

    RBAC) • New API versions appear, e.g. batch/v2 and test_batch/v1
  19. @k2r2bai Foreseeable problems and solutions • Policies break after upgrade (Webhook,

    RBAC) • New API versions appear, e.g. batch/v2 and test_batch/v1 • Before upgrading the cluster, update the Policies to use the currently supported versions
  20. @k2r2bai Unforeseeable problems and suggestions • Before upgrading, test the target version in an environment as close to production as possible • Use the current (older) version of Sonobuoy

    to test the new-version cluster (see the sketch after this slide) • Keep configuration files consistent and check whether any settings break because of version changes • Check whether Addons break because of new-version behaviour • Use a multi-stage upgrade to preserve API compatibility.
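
    A minimal Sonobuoy sketch for the testing advice above, assuming the sonobuoy CLI is installed and KUBECONFIG points at the test cluster; the flags shown are the common conformance workflow and are not taken from the slides:

    # run the conformance tests and wait for completion
    $ sonobuoy run --wait

    # download and inspect the results, then clean up
    $ results=$(sonobuoy retrieve)
    $ sonobuoy results "$results"
    $ sonobuoy delete --wait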
  21. @k2r2bai Summary • Before upgrade. • Back up etcd data. •

    Read release notes. • Upgrade clients. • Upgrade configurations.
  22. @k2r2bai Summary • Before upgrade. • During upgrade. • Upgrade

    masters before nodes. • Don't use new APIs until the HA master upgrade is done. • No in-place kubelet upgrade.
  23. @k2r2bai Summary • Before upgrade. • During upgrade. • After

    upgrade. • Check cluster addons. • Check cluster networking (CNI plugins). • Make sure that nodes are Ready.
  24. Upgrading the existing cluster using kubeadm

  25. @k2r2bai Steps of cluster upgrades 1. Master • Upgrade kube-apiserver,

    controller manager, scheduler • Upgrade Addons, e.g. kube-proxy, CoreDNS • Upgrade the kubelet binary and configuration files • (optional) Upgrade the RBAC rules for Node bootstrap tokens
  26. @k2r2bai Steps of cluster upgrades 1. Master 2. Nodes •

    On the Master, run kubectl drain to evict Pods to other nodes • Use a PodDisruptionBudget to limit how workload controllers move Pods (see the sketch after this slide) • Upgrade the kubelet binary and configuration files • On the Master, run kubectl uncordon so the node can be scheduled with Pods again
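
    A minimal PodDisruptionBudget sketch for the drain step above, assuming the workload is labelled app=web; the label and minAvailable value are illustrative. Save as web-pdb.yaml and apply it:

    # web-pdb.yaml (illustrative)
    apiVersion: policy/v1beta1          # the PDB API version available at the time of these slides
    kind: PodDisruptionBudget
    metadata:
      name: web-pdb
    spec:
      minAvailable: 2                   # kubectl drain is blocked if it would leave fewer than 2 Pods
      selector:
        matchLabels:
          app: web

    $ kubectl apply -f web-pdb.yaml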
  27. @k2r2bai Let's use Kubeadm for upgrading a cluster! v1.14.5 ->

    v1.15.2
  28. @k2r2bai Upgrading master nodes First, go to the Master (k8s-m1) and upgrade kubeadm: $ sudo

    apt-get update && sudo apt-get install -y kubeadm=1.15.2-00 && \ sudo apt-mark hold kubeadm $ kubeadm version kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:20:51Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  29. @k2r2bai Upgrading master nodes Upgrade the control plane components on the Master:

    $ sudo kubeadm upgrade plan $ sudo kubeadm upgrade apply v1.15.2 (optional) Upgrade the control plane components on the other Master nodes: $ sudo kubeadm upgrade node
  30. @k2r2bai Upgrading master nodes Upgrade kubelet and kubectl on the Master:

    $ sudo apt-get update && sudo apt-get install -y kubelet=1.15.2-00 kubectl=1.15.2-00 && \ sudo apt-mark hold kubelet kubectl Restart kubelet: $ sudo systemctl restart kubelet
  31. @k2r2bai Upgrading worker nodes First, run the drain command on the Master to move the Node's

    workload to other nodes: $ kubectl drain $NODE --ignore-daemonsets WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-p4hs7, kube-system/kube-proxy-nlznx evicting pod "nginx-68487cdddf-4cctq" evicting pod "calico-kube-controllers-6c795cc467-pc46s" ... node/k8s-n1 evicted
  32. @k2r2bai Upgrading worker nodes Go to the Nodes (k8s-n1, k8s-n2) and upgrade kubeadm: $

    sudo apt-get update && sudo apt-get install -y kubeadm=1.15.2-00 && \ sudo apt-mark hold kubeadm $ kubeadm version kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:20:51Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  33. @k2r2bai Upgrading worker nodes Upgrade the node configuration on the Node: $ sudo kubeadm

    upgrade node Upgrade kubelet on the Node: $ sudo apt-get update && sudo apt-get install -y kubelet=1.15.2-00 && \ sudo apt-mark hold kubelet Restart kubelet: $ sudo systemctl restart kubelet
  34. @k2r2bai Check cluster On the Master, run uncordon to return the Worker node

    to a schedulable state: $ kubectl uncordon $NODE Check the cluster version and status with kubectl on the Master: $ kubectl version Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  35. @k2r2bai Check cluster View the node status on the Master node: $ kubectl get

    no $ kubectl get cs
  36. @k2r2bai How does it work?

  37. @k2r2bai How does it work?

  38. @k2r2bai kubeadm upgrade apply • Checks whether the current cluster is in an upgradeable state • The API server

    is reachable • All nodes are in the Ready state • The control plane is healthy • Runs according to the version skew policy [1] • Makes sure the control plane component images are already on the machine, and pulls the newer images if they are not (see the sketch after this slide) 1. https://kubernetes.io/docs/setup/release/version-skew-policy/
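
    A small sketch of checking and pre-pulling those images yourself, assuming kubeadm v1.15 is already installed; pre-pulling is optional, since kubeadm upgrade apply pulls any missing images on its own:

    # list the control plane images the target version needs
    $ sudo kubeadm config images list --kubernetes-version v1.15.2

    # pull them ahead of time to shorten the upgrade window
    $ sudo kubeadm config images pull --kubernetes-version v1.15.2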
  39. @k2r2bai kubeadm upgrade apply (cont.) • Upgrades the control plane components, and rolls back to the original version if something goes wrong •

    Updates the kube-dns and kube-proxy manifests and makes sure the required RBAC rules are created • Creates new API server certificate and key files, backing up the old files if they would expire within 180 days
  40. @k2r2bai kubeadm upgrade node (master) • Fetches the kubeadm ClusterConfiguration from the current cluster •

    (optional) Backs up the kube-apiserver certificate • Upgrades the static Pod manifests of the control plane components • Upgrades the kubelet configuration of the current node
  41. @k2r2bai kubeadm upgrade node (worker) • Fetches the kubeadm ClusterConfiguration from the current cluster •

    Upgrades the kubelet configuration of the current node
  42. Highly available Kubernetes cluster

  43. @k2r2bai Why do we need Kubernetes High Availability (HA)? • No SPOF (single

    point of failure). • Load balancing workload for API servers. • Failover clustering for Kubernetes state data(Etcd). • Running in multiple zones(across failure domains). • Zero-downtime Upgrade. https://github.com/bradfitz/homelab
  44. @k2r2bai Stacked etcd topology https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

  45. @k2r2bai External etcd topology https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

  46. @k2r2bai Self-Hosted Kubernetes https://kccna18.sched.com/event/GrWQ

  47. @k2r2bai Kubernetes Active-Active Components

  48. @k2r2bai Kubernetes Active-Passive Components https://kccna18.sched.com/event/GrWQ

  49. @k2r2bai Quorum with etcd https://kccna18.sched.com/event/GrWQ

  50. @k2r2bai Cluster size and Failure tolerance https://kccna18.sched.com/event/GrWQ Quorum = (N / 2) + 1
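
    A worked example of that quorum formula (integer division): a 3-member etcd cluster needs 3/2 + 1 = 2 members for quorum, so it tolerates 1 failure; 5 members need 3 and tolerate 2; 7 members need 4 and tolerate 3. Even sizes add no tolerance: 4 members still need 3 for quorum and also tolerate only 1 failure.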
  51. @k2r2bai History of HA in kubeadm https://kccnceu19.sched.com/event/MPj5/deep-dive-cluster-lifecycle-sig-kubeadm-fabrizio-pandini-lubomir-i-ivanov-vmware

  52. @k2r2bai Set up a highly available cluster with kubeadm
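
    A minimal stacked-etcd HA sketch with kubeadm v1.15, assuming a load balancer already answers on LOAD_BALANCER_DNS:6443; the endpoint name and node counts are illustrative, and the join values come from the kubeadm init output:

    # on the first control plane node
    $ sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:6443" --upload-certs

    # on each additional control plane node, using the values printed by kubeadm init
    $ sudo kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash> \
        --control-plane --certificate-key <key>

    # on each worker node
    $ sudo kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash>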

  53. @k2r2bai "Multi-Master is NOT ENOUGH!! Eliminating every single point of

    failure in each layer of the stack" Karan Goel, Meaghan Kjelland @ Google https://kccna18.sched.com/event/GrWQ
  54. @k2r2bai SPOF in each layer of the stack Virtual Machines

    Application Network Partitions Physical Machines Storage Cooling Systems Electric & Power https://kccna18.sched.com/event/GrWQ
  55. Image Registry - Harbor

  56. @k2r2bai Harbor Harbor is a CNCF Incubating project: a container registry built on Docker Distribution with extended

    features, providing image storage, signing, and vulnerability scanning, plus security, identity management, and a web-based management UI. • LDAP/Active Directory, OIDC • Clair - container image vulnerability scanning • Notary - container image signing (content trust) • S3, Cloud Storage, and other storage backends • Image replication • User management, access control, and activity auditing
  57. @k2r2bai Clair Clair is an open source container image vulnerability scanner from CoreOS. It provides an API-driven analysis service that checks image contents against public CVE (Common

    Vulnerabilities and Exposures) databases and sends useful, actionable information about potential container vulnerabilities to administrators. (Architecture: Clair, CVE Updater, REST API, PostgreSQL, CVE data sources) • Debian Security Bug Tracker • Ubuntu CVE Tracker • Red Hat Security Data • Oracle Linux Security Data • Alpine SecDB • NIST NVD
  58. @k2r2bai Notary Notary is a CNCF Incubating project that was split out as a standalone project when Docker refactored its security modules. Notary

    is a platform for establishing content trust. Its goal is to ensure that servers and clients interact over mutually trusted connections and that content published on the Internet is distributed securely. For containers, it supports signing images and verifying image integrity (content trust).
  59. @k2r2bai Create machine using Vagrant with Virtualbox First clone the following

    repo with Git (skip if already downloaded): $ git clone https://github.com/inwinstack/k8s-course Change into the directory and run the following: $ cd k8s-course/harbor $ vagrant up $ vagrant status $ vagrant ssh k8s-harbor
  60. @k2r2bai Harbor installation Enter the VM and download Docker Compose: $ curl

    -L https://github.com/docker/compose/releases/download/1.24.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose $ chmod +x /usr/local/bin/docker-compose Download the Harbor offline installer: $ wget https://storage.googleapis.com/harbor-releases/release-1.8.0/harbor-offline-installer-v1.8.2-rc2.tgz $ tar xvf harbor-offline-installer-v1.8.2-rc2.tgz $ cd harbor && cp -rp /vagrant/config/harbor.yml ./ $ docker load < harbor.v1.8.2.tar.gz
  61. @k2r2bai Harbor installation Copy the self-signed certificates: $ mkdir -p /data/cert/

    $ cp -rp /vagrant/config/certs/* /data/cert/ $ chmod 777 -R /data/cert/ # just for demo Generate the configs and compose file: $ ./prepare
  62. @k2r2bai Harbor installation Deploy Harbor with docker-compose: $ docker-compose up

    -d Deploy Clair and Notary: $ sudo ./install.sh --with-notary --with-clair
  63. @k2r2bai Use Docker with Harbor Copy the CA certificate into the Docker certs

    directory so that HTTPS connections are trusted: $ mkdir -p /etc/docker/certs.d/192.16.35.99 $ cp /vagrant/config/certs/ca.crt /etc/docker/certs.d/192.16.35.99/ Test pushing an image to Harbor: $ docker login 192.16.35.99 $ docker pull alpine:3.7 $ docker tag alpine:3.7 192.16.35.99/library/alpine:3.7 $ docker push 192.16.35.99/library/alpine:3.7 Access the portal: https://192.16.35.99 https://github.com/goharbor/harbor/blob/master/docs/user_guide.md
  64. @k2r2bai Use Docker with Harbor

  65. @k2r2bai Use Harbor on Kubernetes On every K8s node, copy the CA certificate into the

    Docker certs directory so that HTTPS connections are trusted: $ mkdir -p /etc/docker/certs.d/192.16.35.99 $ cp /vagrant/harbor/ca.crt /etc/docker/certs.d/192.16.35.99/ On the Master node, run the following to create the pull secret and a Pod (a pull-secret sketch follows this slide): $ /vagrant/harbor/run.sh $ kubectl apply -f /vagrant/harbor $ kubectl get po
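
    A sketch of what the pull-secret step looks like if done by hand; run.sh in the course repo wraps a step like this, so the secret name, credentials, and file name below are illustrative only:

    # create a docker-registry secret for the private Harbor instance
    $ kubectl create secret docker-registry harbor-secret \
        --docker-server=192.16.35.99 \
        --docker-username=admin \
        --docker-password=<harbor-password>

    # pod-from-harbor.yaml (illustrative): reference the secret from a Pod spec
    apiVersion: v1
    kind: Pod
    metadata:
      name: alpine-from-harbor
    spec:
      imagePullSecrets:
      - name: harbor-secret
      containers:
      - name: alpine
        image: 192.16.35.99/library/alpine:3.7
        command: ["sleep", "3600"]

    $ kubectl apply -f pod-from-harbor.yaml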
  66. @k2r2bai Content trust for images

  67. @k2r2bai Content trust for images

  68. @k2r2bai Content trust for images On the k8s-harbor VM, set up the Notary

    HTTPS CA certificate: $ mkdir -p $HOME/.docker/tls/192.16.35.99:4443/ $ cp /vagrant/config/certs/ca.crt $HOME/.docker/tls/192.16.35.99:4443/ Push a signed image to Harbor: $ export DOCKER_CONTENT_TRUST=1 $ export DOCKER_CONTENT_TRUST_SERVER=https://192.16.35.99:4443 $ docker tag alpine:3.7 192.16.35.99/trust/alpine:3.7 $ docker push 192.16.35.99/trust/alpine:3.7
  69. @k2r2bai Content trust for images On k8s-harbor, pull an unsigned image

    : $ docker rmi 192.16.35.99/library/alpine $ docker pull 192.16.35.99/library/alpine Error: remote trust data does not exist for 192.16.35.99/library/alpine: 192.16.35.99:4443 does not have trust data for 192.16.35.99/library/alpine
  70. @k2r2bai Portieris Portieris is a Kubernetes Admission Controller developed by IBM. The controller

    uses Notary to verify whether a container image's content is trusted, and blocks the image if it is not. It also supports: • Whitelist Images • Fail Closed • Namespace or Cluster Wide Policies https://github.com/IBM/portieris
  71. Monitoring Kubernetes / Registry

  72. @k2r2bai https://landscape.cncf.io

  73. @k2r2bai

  74. @k2r2bai

  75. @k2r2bai Prometheus is a project that has graduated from CNCF incubation. It provides a systems monitoring and alerting framework together with a TSDB (Time Series Database). Prometheus was inspired by Google's

    Borgmon monitoring system. • Pull mode • A multi-dimensional data model • Alert System (with AlertManager) • Metrics Collection & Storage • Metrics, not logging, not tracing • Dashboarding / Graphing / Trending Prometheus
  76. @k2r2bai Long-lived jobs / Short-lived jobs

  77. @k2r2bai

  78. @k2r2bai

  79. @k2r2bai

  80. @k2r2bai

  81. @k2r2bai Grafana is a cross-platform, open source analytics and visualization tool. It queries the collected data, presents it visually, and also provides real-time alerting. Grafana

  82. @k2r2bai Things to monitor on Kubernetes • Nodes • Pods

    • Control Plane • kubelet, kube-apiserver, controller manager, scheduler • APM • Request per second • Error rates • App specific metrics
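
    Example queries for the "Request per second" and "Error rates" bullets above, sent through the Prometheus HTTP API; the metric name http_requests_total, its status label, and the service address are assumptions about the application, not part of the slides:

    # requests per second over the last 5 minutes
    $ curl -G 'http://prometheus.monitoring:9090/api/v1/query' \
        --data-urlencode 'query=sum(rate(http_requests_total[5m]))'

    # share of 5xx responses among all responses
    $ curl -G 'http://prometheus.monitoring:9090/api/v1/query' \
        --data-urlencode 'query=sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))'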
  83. @k2r2bai Architecture

  84. @k2r2bai Let's go to deploy Kubernetes monitoring system

  85. @k2r2bai Prometheus deployment First clone the following repo with Git (skip if already downloaded): $ git clone

    https://github.com/inwinstack/k8s-course Change into the directory and run the following (make sure a kubeconfig is available in the current environment): $ cd k8s-course/addons/monitoring $ kubectl apply -f namespace.yml $ kubectl apply -f operator
  86. @k2r2bai Prometheus deployment Check whether the Prometheus operator is already running: $ kubectl -n

    monitoring get po Once the Operator is running, create and configure Prometheus and Grafana: $ kubectl apply -f service-discovery $ kubectl apply -f prometheus -f alertmanater -f prometheus-adapter $ kubectl apply -f node-exporter -f kube-state-metrics $ kubectl apply -f servicemonitor $ kubectl apply -f grafana
  87. @k2r2bai Prometheus deployment Access Grafana through port-forward: $ kubectl -n

    monitoring port-forward svc/grafana 3000:3000 --address 0.0.0.0 Then browse to HOST:3000
  88. Logging Kubernetes / Registry

  89. @k2r2bai

  90. @k2r2bai Fluentd Fluentd is a CNCF incubating project that plays a role similar to Logstash. It standardizes the log

    collection pipeline and simplifies defining Inputs, Filters, and Outputs to collect logs from different sources.
  91. @k2r2bai Elasticsearch Elasticsearch is a text search engine based on the Lucene library. It provides a distributed, multi-tenant full-text search engine with an HTTP web interface and schema-free JSON

    documents. • Built on top of Lucene • Document-oriented - It stores complex entities as structured JSON documents and indexes all fields by default. • Full-text search • Schema Free • RESTful API
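
    A small sketch of that RESTful API, assuming Elasticsearch is reachable on localhost:9200; the index name and document are illustrative:

    # index a JSON document, then search it back by field
    $ curl -X POST 'http://localhost:9200/logs/_doc' -H 'Content-Type: application/json' \
        -d '{"message": "kubelet restarted", "level": "info"}'
    $ curl 'http://localhost:9200/logs/_search?q=message:kubelet'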
  92. @k2r2bai Kibana Kibana is an open source log analytics and visualization platform designed mainly to work with Elasticsearch. Administrators can use Kibana

    to query, filter, and visualize the log data stored in Elasticsearch indices.
  93. @k2r2bai Things to log on Kubernetes ALL THE THINGS!

  94. @k2r2bai Architecture

  95. @k2r2bai Let's go to deploy Kubernetes logging system

  96. @k2r2bai EFK deployment First clone the following repo with Git (skip if already downloaded): $ git clone

    https://github.com/inwinstack/k8s-course Change into the directory and run the following (make sure a kubeconfig is available in the current environment): $ cd k8s-course/addons/logging $ kubectl apply -f ./
  97. @k2r2bai EFK deployment Access Kibana through port-forward: $ kubectl -n

    kube-system port-forward svc/kibana-logging 5601:5601 --address 0.0.0.0 Then browse to HOST:5601
  98. Troubleshooting Skills / Tools

  99. @k2r2bai Make good use of kubectl $ kubectl -n <namespace>

    get pods <pod-name> -o wide $ kubectl -n <namespace> describe pod/<name> • Events of the API object • Reasons the container process exited • Reasons health checks failed • OOMKilled • Image corrupted or cannot be pulled
  100. @k2r2bai Make good use of kubectl $ kubectl -n <namespace>

    logs <name> [-c container_name] $ kubectl -n <namespace> logs <name> [-c container_name] --previous • Use the logs to determine whether the application itself crashed • Use --previous to see the logs from the Pod's previous run
  101. @k2r2bai Looking at system logs, systemd, journalctl $ cat /var/log/kubernetes/xxx

    # kube-proxy, apiserver, ..., etc $ systemctl status kubelet $ journalctl -xeu kubelet • Check the status of the Kubernetes components managed by the OS • Confirm whether a Kubernetes component is the root cause
  102. @k2r2bai Make good use of net-utils $ ping <ip> $

    tcpdump -i <eth> -nn [condition] $ nslookup <service_name> $ iptables -nL $ ipvsadm -L • Check whether cross-node networking works • Check whether KubeDNS can resolve names • Check whether the iptables chains and ipvs rules are correct
  103. @k2r2bai Learning the status of Pod • Pending • Waiting

    / ContainerCreating • ImageInspectError • ImagePullBackOff • CrashLoopBackOff • Error • Terminating • Unknown
  104. @k2r2bai Learning the status of Pod • Pending - resources

    sufficient; is the host port already in use; • Waiting / ContainerCreating - image too large or network too slow; • ImagePullBackOff - is the registry reachable; check the registry-qps and registry-burst settings; • ImageInspectError - is the container FS healthy; is the image corrupted; dockerd error; • CrashLoopBackOff - is the process running correctly; are the mounts wrong; • Error - do the ConfigMap/Secrets/PV exist; • Terminating - if it takes > 30s, check the grace period; • Unknown - is the node Ready;
  105. @k2r2bai Utils Container https://blog.pichuang.com.tw/20190715-troubleshooting-from-container-to-any/ https://github.com/nicolaka/netshoot
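
    A minimal way to launch such a utils container, based on the netshoot README linked above rather than on the slides; the Pod name is illustrative:

    # start a throwaway troubleshooting Pod with common network tools
    $ kubectl run tmp-shell --rm -it --image nicolaka/netshoot -- /bin/bash

    # or attach netshoot to an existing Docker container's network namespace
    $ docker run -it --rm --network container:<container_name> nicolaka/netshoot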

  106. @k2r2bai Troubleshooting Tools • Container • ctop - Top-like interface

    for container metrics • dive - A tool for exploring a docker image. • Kubernetes • krew - Package manager for kubectl plugins. • stern - Multi pod and container log tailing for Kubernetes. • ksniff - Ease sniffing on Kubernetes pods using tcpdump and Wireshark. • Weave Scope - Monitoring, visualisation & management for Docker & Kubernetes. • k9s - Terminal UI to interact with your Kubernetes clusters. • Skaffold, Telepresence - Local Kubernetes development made easy.
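
    A couple of hedged usage examples for the kubectl-side tools above, assuming krew and stern are already installed; the plugin, pod, and namespace names are illustrative:

    # install a kubectl plugin through krew, then capture traffic from a pod with ksniff
    $ kubectl krew install sniff
    $ kubectl sniff <pod-name> -n <namespace> -o capture.pcap

    # tail logs from every pod whose name matches a pattern
    $ stern nginx -n default --since 10m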
  107. @k2r2bai Call for some help • Kubernetes troubleshooting docs - https://kubernetes.io/docs/tasks/debug-application-cluster/

    debug-cluster/ • Kubernetes official forum - https://discuss.kubernetes.io/ • Kubernetes GitHub issues - https://github.com/kubernetes/kubernetes/issues • Stack Overflow - https://stackoverflow.com/questions/tagged/kubernetes • Slack - https://slack.k8s.io/ • CNTUG Telegram - https://t.me/cntug
  108. Additional

  109. @k2r2bai Knative Knative extends Kubernetes to provide the missing building

    blocks that developers need to create modern, source-centric, container-based, cloud-native applications. “Developed in close partnership with Pivotal, IBM, Red Hat, and SAP, Knative pushes Kubernetes-based computing forward by providing the building blocks you need to build and deploy modern, container-based serverless applications.”
  110. @k2r2bai Knative + Istio = Power The Knative framework is

    built on top of Kubernetes and Istio, which provide an application runtime (container based) and advanced network routing respectively.
  111. @k2r2bai KubeEdge • KubeEdge is an open source system extending

    native containerized application orchestration and device management to hosts at the Edge. • It is built upon Kubernetes and provides core infrastructure support for networking, application deployment, and metadata synchronization between cloud and edge. https://kubeedge.io/
  112. @k2r2bai

  113. @k2r2bai Rook Rook is an open source cloud-native storage orchestrator

    for Kubernetes, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.
  114. @k2r2bai Kubernetes Failure Stories A compiled list of links to

    public failure stories related to Kubernetes. https://github.com/hjacobs/kubernetes-failure-stories