Building Your Own On-Premises Kubernetes (K8s) Universe @ CNTUG 2024/11 Meetup #63

Building Your Own On-Premises Kubernetes Universe. Johnny Sung

Full stack developer Johnny Sung (宋岡諺)
https://fb.com/j796160836 https://blog.jks.coffee/
https://www.slideshare.net/j796160836 https://github.com/j796160836

Outline • K8s basic concepts • K8s components • On-premises setup in practice • GPU cards?

High Availability (HA) https://soco-st.com/18158

https://blog.whmcs.com/133514/demystifying-high-availability-for-whmcs

CAP theorem • Consistency • Availability • Partition tolerance
https://zh.wikipedia.org/zh-tw/CAP%E5%AE%9A%E7%90%86 https://medium.com/nerd-for-tech/understand-cap-theorem-751f0672890e

https://medium.com/how-gipi-learn/%E5%B0%8A%E9%87%8D-%E9%9C%80%E8%A6%81%E9%9D%A0%E5%B0%88%E6%A5%AD%E5%8E%BB%E8%B4%8F%E5%9B%9E%E4%BE%86-8fdecf676fe5

https://medium.com/%E5%BE%8C%E7%AB%AF%E6%96%B0%E6%89%8B%E6%9D%91/cap%E5%AE%9A%E7%90%86101-3fdd10e0b9a

Achieving high availability mostly means changing application code. Today we look only at the infrastructure side. https://soco-st.com/18158

https://javascript.plainenglish.io/what-is-a-server-explanation-for-young-developers-2511d8b313b7

https://ithelp.ithome.com.tw/articles/10250841 Virtual Machine (VM) vs Docker

https://upload.wikimedia.org/wikipedia/commons/6/67/Kubernetes_logo.svg

https://www.cncf.io/blog/2024/06/06/unveiling-the-10-year-kubernetes-anniversary-logo/

Kubernetes from a developer's point of view. Could this be you? https://soco-st.com/20498

https://soco-st.com/20498 I know! It's just files!

What is that? Can you eat it? https://soco-st.com/20498

Think back to the Docker days

Created by hanis tusiyani from Noun Project https://thenounproject.com/icon/server-7086299/ https://thenounproject.com/icon/data-center-7086329/ https://www.pngwing.com/en/free-png-ztqam
docker run command: starts a single service at a time
    docker run -v ./www:/usr/share/nginx/html:ro -p 80:80 -d nginx

docker run command: starts a single service at a time
    docker run -v ./www:/usr/share/nginx/html:ro -p 80:80 -d nginx
docker-compose.yml: starts multiple services at once
    version: "3"
    services:
      nginx:
        image: nginx
        volumes:
          - ./www:/usr/share/nginx/html:ro
        ports:
          - 80:80

Created by hanis tusiyani from Noun Project
docker run command: starts a single service at a time
docker-compose.yml: starts multiple services at once
Kubernetes (• deployment.yml • services.yml • rbac.yml • config-map.yml • ….): deploys multiple services across multiple hosts

Basic components of a web service

Basic components of a web service: Pod, Container
https://thenounproject.com/icon/ram-7094983/ https://thenounproject.com/icon/hard-disk-7094988/ https://thenounproject.com/icon/network-5355161/ https://thenounproject.com/icon/history-5019532/ https://thenounproject.com/icon/central-processing-unit-7095000/ https://thenounproject.com/icon/form-6622708/ https://thenounproject.com/icon/approval-6293848/

Pod • The smallest deployable unit in Kubernetes • "Usually" contains only one container (sidecar scenarios excepted) https://medium.com/@ajeetrai707/kubernetes-pods-an-introduction-650b6f93874d

Basic components of a web service: Pod, Container, Service
Created by Mada Creative

Service
• Defines how a Pod or a group of Pods is exposed. There are three main types:
  • ClusterIP
  • NodePort
  • LoadBalancer
    apiVersion: v1
    kind: Service
    metadata:
      name: my-service
      namespace: my-namespace
    spec:
      selector:
        app: my-deployment
      ports:
        - protocol: TCP
          port: 3000
          targetPort: 3000
          nodePort: 31200
      type: NodePort
(NodePort range: 30000-32767)
On-premises K8s has no LoadBalancer available by default
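
The NodePort range on the slide is easy to get wrong when picking a port by hand. A minimal sketch of a pre-flight check, assuming the default range of 30000-32767 quoted above (the function and constant names are made up for illustration):

```python
# Default NodePort range as stated on the slide (30000-32767).
NODE_PORT_MIN = 30000
NODE_PORT_MAX = 32767


def is_valid_node_port(port: int) -> bool:
    """Return True if `port` falls inside the default NodePort range."""
    return NODE_PORT_MIN <= port <= NODE_PORT_MAX


if __name__ == "__main__":
    # 31200 is the nodePort used in the Service manifest above.
    for candidate in (31200, 3000, 32768):
        print(candidate, is_valid_node_port(candidate))
```

A cluster administrator can configure a different range, so treat the two constants as assumptions rather than fixed values.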

Basic components of a web service: Pod, Container, ReplicaSet, Deployment, Service
Icons by Muhammad Naufal Subhiansyah from Noun Project; created by Mada Creative

Deployment
• Defines how a Pod is deployed
  • Replicas: how many copies
  • Configuration: ConfigMap, Secret
  • Resource limits (CPU, memory)
  • VolumeMounts (using a PersistentVolumeClaim, PVC)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: my-deployment
      name: my-deployment
      namespace: my-namespace
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-deployment
      template:
        metadata:
          labels:
            app: my-deployment
        spec:
          containers:
            - image: my_image:1.0
              # container and port names must be lowercase DNS-style labels, so no underscore
              name: my-image
              resources:
                requests:
                  memory: 64Mi
                  cpu: 250m
                limits:
                  memory: 128Mi
                  cpu: 500m
              ports:
                - containerPort: 3000
                  name: my-image
              volumeMounts:
                - name: my-pvc
                  mountPath: /mydata
                - name: my-pvc
                  mountPath: /data/output
          volumes:
            - name: my-pvc
              persistentVolumeClaim:
                claimName: my-pvc

Basic components of a web service: Service, PersistentVolumeClaim (PVC), PersistentVolume (PV). A PVC maps 1:1 to a PV.
Icons by Muhammad Naufal Subhiansyah from Noun Project; created by Mada Creative

Basic components of a web service, and there's more...: Service, PersistentVolumeClaim (PVC), PersistentVolume (PV), StorageClass, Provisioner. A PVC maps 1:1 to a PV.
Icons by Muhammad Naufal Subhiansyah from Noun Project; created by Mada Creative; created by Andika Cahya Fitriani from the Noun Project

Storage • PersistentVolumeClaim (PVC): a request for space • PersistentVolume (PV) • StorageClass
Hard Disk Drive by Ahmad Roaayala from Noun Project (CC BY 3.0) https://thenounproject.com/browse/icons/term/hard-disk-drive/

PersistentVolumeClaim (PVC)
• In plain words: a request for (disk) storage space
• Created by developers (think of it as a storage request form)
• Units: MB => Mi (Mebibytes), GB => Gi (Gibibytes)
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-pvc
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
Hard Disk Drive by Ahmad Roaayala from Noun Project (CC BY 3.0) https://thenounproject.com/browse/icons/term/hard-disk-drive/
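
The MB => Mi and GB => Gi note matters because the binary suffixes are powers of 1024 while the decimal ones are powers of 1000. A small sketch of the difference, assuming integer quantities only (the helper name is hypothetical, and this is not the parser Kubernetes itself uses):

```python
# Binary suffixes are powers of 1024; decimal suffixes are powers of 1000.
BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a simple integer quantity such as '10Gi' or '10G' to bytes."""
    for suffixes in (BINARY, DECIMAL):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: a plain number of bytes


if __name__ == "__main__":
    print(quantity_to_bytes("10Gi"))  # the PVC request above
    print(quantity_to_bytes("10G"))   # decimal gigabytes, noticeably smaller
```

The gap between `10Gi` and `10G` is about 7%, which is enough to matter when a volume is sized tightly.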

docker-compose
    version: "3"
    services:
      nginx:
        image: nginx
        volumes:
          - ./www:/usr/share/nginx/html:ro
        ports:
          - 80:80
• Service deployment • Disk • Network

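The corresponding Kubernetes components
• Service deployment → Deployment / Pod
• Disk → PersistentVolumeClaim (PVC) / ConfigMap / Secret (a persistent storage claim is automatically matched 1:1 with a PersistentVolume (PV))
• Network → Service / Ingress (on-premises K8s has no LoadBalancer available by default)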

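Kustomize
Kustomize is a configuration management tool for Kubernetes that simplifies deployment by customizing resource configuration. It focuses on modifying and managing Kubernetes manifest files declaratively, with no need to generate configuration dynamically. You create a "base" configuration and then apply customized overlays for different environments (such as development, testing, and production). Kustomize can merge or replace parts of YAML files, which makes configuration more modular and reusable. It is now part of Kubernetes and can be used directly through the kubectl command-line tool. https://zlaval.medium.com/kustomize-template-free-kubernetes-application-management-3d70ca9d2e05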
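
The base-plus-overlay idea can be pictured as a recursive dictionary merge. This is a deliberately simplified sketch: real Kustomize patches handle lists, names, and patch strategies that this code ignores, and the `production_overlay` content is a made-up example.

```python
import copy


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new dict: `base` with `overlay` merged on top, recursively."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = apply_overlay(merged[key], value)
        else:
            # Scalars and lists in the overlay simply replace the base value.
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "kind": "Deployment",
    "metadata": {"name": "my-deployment"},
    "spec": {"replicas": 1},
}
production_overlay = {"spec": {"replicas": 3}}  # hypothetical overlay

if __name__ == "__main__":
    print(apply_overlay(base, production_overlay))
```

The base stays untouched, so the same base can feed a development overlay and a production overlay side by side.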

Kustomize file structure: deployment.yml, services.yml, config-map.yml, … plus kustomization.yaml
https://thenounproject.com/icon/file-6897025/ https://thenounproject.com/icon/puzzle-6850847/

When the YAMLs keep piling up... you need to hire more YAML engineers 👷

When the YAMLs keep piling up... you need Helm

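Helm
Helm is a package manager for Kubernetes that lets developers and operations teams package, configure, and deploy services. Helm uses configuration bundles called "Charts" to describe a set of related Kubernetes resources, which can be pre-configured and reused. With Helm, users can easily install, upgrade, and manage Kubernetes applications, with support for versioning and rollback, which makes deployment and maintenance more convenient and efficient. https://helm.sh/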
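
The "pre-configured and reusable" part of a chart comes down to templates filled in from a values file. The sketch below is only an analogy using Python's `string.Template`; real Helm charts use Go templates, and the template text and value names here are hypothetical.

```python
from string import Template

# Hypothetical template; a real chart would use Go template syntax instead.
DEPLOYMENT_TEMPLATE = Template(
    "kind: Deployment\n"
    "metadata:\n"
    "  name: $name\n"
    "spec:\n"
    "  replicas: $replicas\n"
)


def render(values: dict) -> str:
    """Fill the template from a values mapping, like a chart rendered with its values file."""
    return DEPLOYMENT_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(render({"name": "my-deployment", "replicas": 2}))
```

Changing only the values mapping produces a different manifest from the same template, which is the reuse the paragraph above describes.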

Helm file structure: values.yml plus templates (• deployment.yml • services.yml • rbac.yml • config-map.yml • ….) make up a Chart
Created by Mas Mirza from Noun Project https://thenounproject.com/icon/file-6897025/ https://thenounproject.com/icon/puzzle-6850847/

But I'm not that familiar with Helm commands... 🥸

https://github.com/JohnnyWorks-TW/vue-helm-cli-helper Try the little Helm Chart helper I wrote 😎

Kubernetes from an operations engineer's point of view. Maybe that's still you? https://soco-st.com/18158

The many choices in K8s • Operating system (OS) • K8s distro • Container Runtime • CNI (Container Network Interface) • CRI (Container Runtime Interface)

The many choices in K8s • Operating system (OS) • Ubuntu? Red Hat?

The many choices in K8s • Container Runtime • docker? containerd? cri-o?

The many choices in K8s • K8s distro • Community editions • kubeadm? Rancher? • Commercial editions • OpenShift? VMware Tanzu?

The many choices in K8s • CNI (Container Network Interface) • Flannel? Calico? Cilium?

Put them all together... https://soco-st.com/18158

Let me give you a default choice!
• Operating system (OS): Ubuntu
• K8s distro: kubeadm
• Container Runtime: docker
• CNI (Container Network Interface): flannel
• CRI (Container Runtime Interface): cri-dockerd
https://soco-st.com/21673 https://en.m.wikipedia.org/wiki/File:UbuntuCoF.svg https://www.docker.com/company/newsroom/media-resources/

Choosing the K8s operating system • Pick one that many people use • Debian family: Ubuntu, Debian, … • Red Hat family: RHEL (Red Hat Enterprise Linux), Rocky Linux, Fedora, …

Choosing the K8s distro • From personal experience: use the standard kubeadm commands (some call this Vanilla Kubernetes) • Paid commercial editions are worth considering

Choosing the Container Runtime • Pick one that many people use • docker • containerd • cri-o • cri-o seems to have problems together with the NVIDIA GPU driver

Choosing the CRI • The CRI has to match the container runtime • docker → cri-dockerd

https://kubernetes.io/blog/2017/11/containerd-container-runtime-options-kubernetes/

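CNI comparison
Flannel
• A simple, easy-to-use networking solution
• Uses VXLAN or host-gw mode to connect the network
• Does not support Network Policies, so fine-grained traffic control is not possible
Calico
• Uses BGP (Border Gateway Protocol) for routing and provides efficient layer-3 networking
Cilium
• Uses Linux eBPF to process network traffic directly in the kernel
• Supports network policies at L3/L4/L7
https://www.civo.com/blog/calico-vs-flannel-vs-cilium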

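CNI comparison
Feature | Flannel | Calico | Cilium
Main purpose | Simple network connectivity | High-performance networking and security policy | Modern networking and security policy
Encapsulation | VXLAN/host-gw | BGP/IPIP/VXLAN | eBPF
Performance | Medium | High | High
Network policy support | No | Yes (advanced policies) | Yes (L3-L7)
Resource consumption | Low | Medium | Medium
Learning curve | Low | Medium-high | High
Typical use | Small clusters, simple scenarios | Large clusters, hybrid environments | Microservices, high security requirements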

Choosing the CNI • Simple and easy to start with: Flannel • Full-featured in one go: Cilium • By default a CNI does not block traffic between namespaces

And that's not all https://soco-st.com/18158

Components commonly installed on K8s
• Storage: StorageClass with nfs-subdir-external-provisioner
• K8s extension: Metrics Server
• Deployment: ArgoCD
• Monitoring: Prometheus + Grafana

https://blog.jks.coffee/on-premise-self-host-kubernetes-k8s-setup-redhat https://blog.jks.coffee/on-premise-self-host-kubernetes-k8s-setup-ubuntu

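Rough steps
• <on every machine> Turn off swap
• <on every machine> Install Docker
• <on every machine> Install kubelet, kubeadm, kubectl
• <on every machine> Install cri-dockerd
• <on every machine> Set up /etc/hosts
• Set up the control plane node
• Set up the worker nodes
• <on the control plane> Install the Helm package manager
• <on the control plane> Install the Flannel CNI
• <on the control plane> Test and check the cluster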

Introducing the K8s components https://shopee.tw/product/4216795/204703757

https://kubernetes.io/docs/concepts/overview/components/

https://mrdevops.hashnode.dev/kubernetes-architecture

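K8s control plane components
• kube-apiserver: the core; serves the Kubernetes HTTP API
• etcd: key-value store, consistent and highly available
• kube-scheduler: the scheduler; assigns Pods to suitable nodes
• kube-controller-manager: runs a loop that watches the cluster state and keeps adjusting it toward the desired state
• cloud-controller-manager: talks to cloud vendor components; the integration layer with the underlying cloud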
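
The "watch the state, adjust toward the goal" loop of kube-controller-manager can be sketched as a single reconcile pass. This is a toy model, not how the real controllers are implemented; the function name and the `pod-N` naming scheme are assumptions for illustration.

```python
def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the pod list after one pass that moves actual state toward desired."""
    pods = list(running_pods)
    next_id = 0
    while len(pods) < desired_replicas:
        # Too few pods: create one under a name that is not in use yet.
        name = f"pod-{next_id}"
        if name not in pods:
            pods.append(name)
        next_id += 1
    # Too many pods: keep only the first `desired_replicas`.
    return pods[:desired_replicas]


if __name__ == "__main__":
    print(reconcile(3, ["pod-0"]))            # scale up
    print(reconcile(1, ["a", "b", "c"]))      # scale down
```

A real controller runs this comparison continuously, which is why a deleted Pod belonging to a Deployment comes back on its own.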

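K8s components on every node
• kubelet: the main service; makes sure everything on the node is running properly
• kube-proxy: maintains network rules to implement Services
• Container runtime: the software that runs containers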

Key K8s components • kubelet • etcd • CoreDNS • Network CNI • Container Runtime (CRI)
https://github.com/coredns https://github.com/etcd-io/etcd

K8s components • StorageClass

https://www.cncf.io/

And that is just the core K8s components

Add the surrounding systems and there are even more. And more…

https://www.onlogic.com/blog/what-is-a-gpu-a-beginners-guide/ About GPUs

Want to play with on-premises LLMs?

First, you need an NVIDIA card (just kidding)

https://mises.org/mises-daily/understanding-price-money

GPU-related pieces • NVIDIA driver • NVIDIA CUDA • GPU Operator

Key GPU Operator components • Device Plugin • GPU Feature Discovery (GFD) • DCGM • DCGM Exporter • …

https://info.nvidia.com/how-to-use-gpus-on-kubernetes-webinar.html

Rough steps for GPU on K8s
• Install the NVIDIA driver (the .run version)
• Install NVIDIA CUDA
• Install the NVIDIA Container Toolkit
• Run the command that patches the config to bind containerd
• Install Kubernetes
• Install the GPU Operator

https://realfood.tesco.com/recipes/rainbow-cake.html How do you slice a GPU?

https://aws.amazon.com/tw/blogs/containers/gpu-sharing-on-amazon-eks-with-nvidia-time-slicing-and-accelerated-ec2-instances/ https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/

There appear to be five ways to partition a GPU, but there are really only two https://soco-st.com/18158

Time slicing • Works by time-division multiplexing • vRAM is not limited
MPS (Multi-Process Service) • Shares are allocated in a multi-thread fashion • Each share gets a fixed amount of vRAM

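MIG (Multi-Instance GPU) • Partitions the GPU at the hardware level • Only on specific models (for example A100, H100) • Blackwell or Hopper™ series
vGPU (virtual GPU) • NVIDIA's supported GPU virtualization • Requires a software license
https://www.nvidia.com/en-us/technologies/multi-instance-gpu/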

GPU mode comparison • Time slicing: memory is not limited, so processes crowd each other out • MPS: even split in software • MIG: hardware-level partitioning https://cloud.google.com/kubernetes-engine/docs/concepts/timesharing-gpus

https://www.youtube.com/watch?v=Q2GuTUO170w

What are the pitfalls of building on-premises? Plenty... 😎

What are the pitfalls of building on-premises? Plenty... 🥹

https://www.reddit.com/r/Helldivers/comments/1eir0ha/free_mines_are_a_bad_idea/

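Pitfalls of offline on-premises builds • In testing, an offline build hangs • For offline builds, loading images the traditional way with docker load is recommended
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/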

Installing cri-dockerd • No ready-made package (.rpm, .deb) turns up • Red Hat's Golang is not that new....

https://github.com/Mirantis/cri-dockerd/releases/tag/v0.3.16 https://wiki.ubuntu.com/Releases

https://github.com/Mirantis/cri-dockerd/releases/tag/v0.3.16 https://wiki.ubuntu.com/Releases noble?

https://blog.jks.coffee/on-premise-self-host-kubernetes-k8s-setup-redhat

Every machine needs a name
• Set up the host mappings: vi /etc/hosts
    192.168.1.100 k8s-ctrl
    192.168.1.101 k8s-node1
    192.168.1.102 k8s-node2

About hostnames
• Valid characters: lowercase a-z, digits 0-9, and the hyphen -
• Up to 63 characters
• No uppercase letters, no underscore _
"1 to 63 characters long and the entire hostname, including the dots, can be at most 253 characters long. Valid characters for hostnames are ASCII(7) letters from a to z, the digits from 0 to 9, and the hyphen (−). A hostname may not start with a hyphen." https://www.linuxcampus.net/documentation/man-html/htmlman7/hostname.7.html
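
The hostname rules above are simple enough to check before naming a node. A minimal sketch that encodes only the rules quoted on this slide (lowercase letters, digits, hyphen, 1 to 63 characters per label, 253 characters in total, no leading hyphen); the function name is made up:

```python
import re

# One label: 1 to 63 characters of a-z, 0-9, or hyphen, not starting with a hyphen.
LABEL_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def is_valid_hostname(hostname: str) -> bool:
    """Check a hostname against the rules quoted from hostname(7) on the slide."""
    if not hostname or len(hostname) > 253:
        return False
    # Every dot-separated label has to pass on its own.
    return all(LABEL_RE.fullmatch(label) for label in hostname.split("."))


if __name__ == "__main__":
    for name in ("k8s-ctrl", "K8s-Ctrl", "k8s_node1", "-node"):
        print(name, is_valid_hostname(name))
```

Uppercase is rejected here on purpose, following the slide's advice rather than the letter of the man page.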

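Turning off swap
Command / action | Duration | Scope | Effect
sysctl -w vm.swappiness=0 | Temporary | Only lowers the priority of using swap | Swap is still there but is basically never used
sudo swapoff -a | Temporary | Stops all current swap partitions | Swap is disabled; memory pressure goes up
Edit /etc/fstab | Permanent 👑 | Swap is no longer enabled after boot | Swap is permanently disabled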
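
The permanent option in the table means commenting out the swap line in /etc/fstab. A minimal sketch of that edit as a pure text transformation, assuming the standard whitespace-separated fstab layout with the filesystem type in the third field (the function name and the sample entries are hypothetical, and nothing here touches the real file):

```python
def disable_swap_entries(fstab_text: str) -> str:
    """Comment out every active fstab entry whose filesystem type is swap."""
    result = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        is_active = bool(fields) and not fields[0].startswith("#")
        if is_active and len(fields) >= 3 and fields[2] == "swap":
            result.append("# " + line)
        else:
            result.append(line)
    text = "\n".join(result)
    # Keep the trailing newline if the input had one.
    return text + "\n" if fstab_text.endswith("\n") else text


SAMPLE = (
    "/dev/mapper/rl-root / xfs defaults 0 0\n"
    "/dev/mapper/rl-swap none swap defaults 0 0\n"
)

if __name__ == "__main__":
    print(disable_swap_entries(SAMPLE), end="")
```

Running it twice gives the same result, since lines that already start with `#` are left alone.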

Swap parameters
• Red Hat / Rocky Linux ship with swap parameters in GRUB2 by default
View the grub2 parameters: grubby --info DEFAULT
    index=0
    kernel="/boot/vmlinuz-5.14.0-503.14.1.el9_5.x86_64"
    args="ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/rl_rk8--ctrl-swap rd.lvm.lv=rl_rk8-ctrl/root rd.lvm.lv=rl_rk8-ctrl/swap"
    root="/dev/mapper/rl_rk8--ctrl-root"
    initrd="/boot/initramfs-5.14.0-503.14.1.el9_5.x86_64.img"
    title="Rocky Linux (5.14.0-503.14.1.el9_5.x86_64) 9.5 (Blue Onyx)"
    id="11732e333bc94575b1636210b0a72f03-5.14.0-503.14.1.el9_5.x86_64"
The swap-related arguments (resume=... and rd.lvm.lv=.../swap) affect the boot process

Swap parameters
• Red Hat / Rocky Linux ship with swap parameters in GRUB2 by default
Remove the grub2 parameters, which affect the boot process:
    grubby --update-kernel=ALL --remove-args="resume=/dev/mapper/rl_rk8--ctrl-swap rd.lvm.lv=rl_rk8-ctrl/swap"

https://www.digitimes.com.tw/tech/dt/n/shwnws.asp?id=0000378513_J7B63ZJC00KCSI4YOD46W

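GPU Compute Mode
• 0: Default (compute shared mode). The default; several programs can run at the same time. Time slicing uses this mode.
• 1: Exclusive Thread (deprecated). Behaves the same as Exclusive Process.
• 2: Prohibited. No compute program may run on the card.
• 3: Exclusive Process. Exclusive mode; the card runs only one program at a time. Multi-Process Service (MPS) uses this mode.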

GPU Compute Mode
Compute Mode: The compute mode flag indicates whether individual or multiple compute applications may run on the GPU. "Default" means multiple contexts are allowed per device. "Exclusive Process" means only one context is allowed per device, usable from multiple threads at a time. "Prohibited" means no contexts are allowed per device (no compute apps). "EXCLUSIVE_PROCESS" was added in CUDA 4.0. Prior CUDA releases supported only one exclusive mode, which is equivalent to "EXCLUSIVE_THREAD" in CUDA 4.0 and beyond. For all CUDA-capable products.

GPU Compute Mode
• Set it with the nvidia-smi command
    nvidia-smi -i 0 -c DEFAULT
    nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
-i: which card, index starts from zero
-c: the compute mode
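
When several cards need the same mode, the two commands above are easy to generate. A small sketch that builds the argument list without running anything; the mode names and numbers follow the compute mode list in this deck, while the function and dictionary names are made up:

```python
# Compute mode names and their numeric IDs, as listed on the earlier slide.
COMPUTE_MODES = {
    "DEFAULT": 0,
    "EXCLUSIVE_THREAD": 1,  # deprecated
    "PROHIBITED": 2,
    "EXCLUSIVE_PROCESS": 3,
}


def compute_mode_command(gpu_index: int, mode: str) -> list[str]:
    """Build (but do not run) the nvidia-smi argument list for one GPU."""
    if mode not in COMPUTE_MODES:
        raise ValueError(f"unknown compute mode: {mode}")
    if gpu_index < 0:
        raise ValueError("GPU index starts at zero")
    return ["nvidia-smi", "-i", str(gpu_index), "-c", mode]


if __name__ == "__main__":
    # Print the commands for a hypothetical two-card machine.
    for index in range(2):
        print(" ".join(compute_mode_command(index, "EXCLUSIVE_PROCESS")))
```

The list form can be handed to `subprocess.run` on a machine that actually has the driver installed; changing the mode normally needs root.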

https://www.youtube.com/watch?v=Di1hgIQhiG0

What can a GPU do? • Play games (just kidding) • Run LLMs • Run AIGC

Playing with open-source LLMs • Gemma: an open-source LLM built with the same research and technology used to create the Gemini models • Ollama https://ollama.com/ • Open WebUI https://openwebui.com/

AI showdown

Selection criteria • Runs on the GPU card • Traditional Chinese • Good general-purpose ability

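Meet the contenders
• Gemma-7B: an open-source LLM built with the same research and technology used to create the Gemini models. https://huggingface.co/google/gemma-7b
• TAIDE-LX-8B: the TAIDE project, run by the National Applied Research Laboratories (財團法人國家實驗研究院, NARLabs), is dedicated to developing generative AI dialogue engine models that fit Taiwan's language and culture. Built on Llama3. https://huggingface.co/chienweichang/Llama3-TAIDE-LX-8B-Chat-Alpha1-GGUF
• MR Breeze-7B: MediaTek Research (聯發創新基地) built the open-source MediaTek Research Breeze-7B model on top of Mistral-7B. https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0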

Q https://soco-st.com/18158
