
20230615 Kubernetes Scalable Workloads

Phil Huang

June 15, 2023

  1. Phil Huang
     • Open source community: Cloud Native Taiwan User Group member, CNCF Ambassador
     • Work: Senior Cloud Solution Architect, Microsoft Azure
     • Blog: blog.pichuang.com.tw

  2. Cloud Native Taiwan User Group (CNTUG) Initiatives
     • Glossary
       - Goal: complete the minimum required translation quantity before KubeCon NA 2023
       - Help needed: join Slack, commit code, and submit your PR
     • Kubernetes Community Days Taiwan 2023
       - Goal: continuously hold an annual community-driven event to enhance Taiwan's international visibility
       - Help needed: register online and attend the event in person on 7/29 - 7/30
     • Sustainability Technical Advisory Group (TAG)
       - Goal: host the Cloud Native Sustainability event in collaboration with CNCF Global Week during October
       - Help needed: seeking 1 - 2 speakers willing to share a sustainability-related session
     • CNTUG Volunteers for CNCF Global Engagement
       - Goal: enhance the visibility of the Taiwan community's participation in the international community
       - Help needed: actively participate in local community events, including giving talks, sharing articles, and engaging in international events
     Ref: https://www.cncf.io/all-cncf/
  3. What is a Kubernetes SIG (Special Interest Group)?
     • SIGs are persistent open groups that focus on a part of the project.
     • Autoscaling SIG
       - Purpose: covers development and maintenance of components for automated scaling in Kubernetes
       - Subprojects: <NEW!> balancer, <NEW!> MPA, addon-resizer, cluster-autoscaler (CA), horizontal-pod-autoscaler (HPA), vertical-pod-autoscaler (VPA)
       - Target: primarily app developers who build on Kubernetes
     • Scalability SIG
       - Purpose: coordinates and contributes to general system-wide scalability and performance improvements by driving large architectural changes and finding bottlenecks
       - Subprojects: kubernetes-scalability-and-performance-tests-and-validation, kubernetes-scalability-bottlenecks-detection, kubernetes-scalability-definition, kubernetes-scalability-governance, kubernetes-scalability-test-frameworks
       - Target: technical architects responsible for determining the overall technical architecture of Kubernetes
  4. Scalability is NOT a Single Number: the Scalability Envelope
     Ref: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
     1. It has boundaries.
     2. As you move farther along one dimension (e.g. #Pods/Node), your cross-section with respect to the other dimensions gets smaller.
     3. It is decomposable into smaller envelopes:
        • #Nodes vs. #Pods/Node
        • #Namespaces vs. #Services/Namespace
        • ...
  5. No More Than 150,000 #Pods: Thresholds for Cluster Scope
     Ref: https://kubernetes.io/docs/setup/best-practices/cluster-large/

     |             | Consideration for Large Cluster | SIG-Scalability Thresholds | Azure Kubernetes Service (Azure CNI) | Red Hat OpenShift Container Platform v4 |
     | Updated     | 2017/8/4                        | 2020/2/25                  | 2023/3/29                            | 2023/5/23                               |
     | k8s version | v1.27                           | -                          | v1.26                                | v4.13 (k8s v1.26)                       |
     | #Pods       | 150,000                         | 150,000                    | -                                    | 150,000                                 |
     | #Pods/Node  | < 110                           | min(110, 10 * #cores)      | default 30, max 250                  | default 250, max 500                    |
     | #Nodes      | 5,000                           | 5,000                      | 5,000                                | 2,000                                   |
     | #Containers | 300,000                         | -                          | -                                    | -                                       |
     | #Services   | 10,000                          | 10,000                     | -                                    | -                                       |

     • Personal estimation recommendations:
       - No more than 150,000 #Pods
       - No more than 10,000 #Services
       - Choose either 110 or 250 as the benchmark for #Pods/Node
       - 100 #Nodes can be used as an estimation target if there are no specific requirements
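The #Pods/Node benchmark above is ultimately enforced per node by the kubelet's `maxPods` setting. A minimal sketch of a KubeletConfiguration raising it from the 110 default to the 250 benchmark (the value is illustrative; check that your node CIDR has enough Pod IPs before raising it):

```yaml
# KubeletConfiguration fragment: raise the per-node Pod ceiling.
# maxPods defaults to 110; 250 matches the higher benchmark above.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
```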
  6. Kubernetes Network Planning: a /14 Pod CIDR is a Reasonable Design

     | Decision sequence | Parameter  | Criteria                          | Smallest subnet prefix (usable IPs) | Default subnet prefix (usable IPs) |
     | 1                 | #Nodes     | 100 #Nodes                        | /25 (126)                           | Depends                            |
     | 2                 | #Pods/Node | 250                               | /24 (254)                           | /24 (254)                          |
     | 3                 | #Pods      | min(150,000, #Nodes * #Pods/Node) | /14 (262,142)                       | /14 (262,142)                      |
     | 4                 | #Services  | No more than 10,000 #Services     | /18 (16,382)                        | /16 (65,534)                       |

     • In reality, the boundaries that are relatively easier for you to grasp:
       1. Underlay and overlay network planning
       2. VM or BM instance size (CPU/Memory/Disk)
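For kubeadm-based clusters, the sizing above maps directly onto the ClusterConfiguration networking fields. A sketch under illustrative assumptions (the concrete CIDR ranges are examples; pick ranges that do not overlap your underlay network):

```yaml
# kubeadm ClusterConfiguration fragment applying the table above:
# /14 Pod CIDR (262,142 usable IPs), /16 Service CIDR (65,534 usable IPs),
# and a /24 allocated to each node (254 usable Pod IPs, covering 250 Pods/Node).
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.128.0.0/14      # example range
  serviceSubnet: 10.96.0.0/16   # example range
controllerManager:
  extraArgs:
    node-cidr-mask-size: "24"   # per-node Pod subnet
```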
  7. Kubernetes Scalability is NOT Equivalent to #Nodes: #Nodes vs. #Pods/Node
     [Chart: scalability envelope across dimensions (#Nodes, #Namespaces, Pod churn, #Pods/Node, #Services, #Secrets, #Backends/Service, #Net LBs, #Ingresses). Assuming #Pods = 150,000, the trade-off curve runs from 5,000 Nodes x 30 Pods/Node to 600 Nodes x 250 Pods/Node; the API server starts getting overloaded past the #Nodes boundary, the kubelet starts getting overloaded past the #Pods/Node boundary, and 100 Nodes x 110 Pods/Node sits inside the envelope with safety capacity.]
     • Typical misconception: in reality, #Nodes is MORE IMPORTANT than #Pods/Node, primarily due to cost considerations.
  8. Sizing of the Control Plane AFFECTS How Many #Nodes It Can Handle

     | Scale estimation         | VMware Tanzu Kubernetes Grid (CPU / Mem GB) | Red Hat OpenShift Container Platform (CPU / Mem GB) | etcd hardware recommendations (CPU / Mem GB) |
     | Up to 10 worker nodes    | 2 / 8                                       |                                                     |                                              |
     | Up to 25 worker nodes    |                                             | 4 / 16                                              |                                              |
     | Up to 50 worker nodes    |                                             |                                                     | 2 / 8                                        |
     | Up to 100 worker nodes   | 4 / 16                                      | 8 / 32                                              |                                              |
     | Up to 250 worker nodes   | 8 / 32                                      | 16 / 96                                             | 4 / 16                                       |
     | Up to 500 worker nodes   | 16 / 64                                     |                                                     |                                              |
     | Up to 1,000 worker nodes |                                             |                                                     | 8 / 32                                       |
     | Up to 3,000 worker nodes |                                             |                                                     | 16 / 64                                      |

     • Why is 100 Nodes the recommended estimation benchmark?
       - When making initial scale estimations, it is preferable to focus on sizing the control plane nodes for better control and management
       - The sizing of the control plane should be bigger than the benchmark, not smaller
  9. Kubernetes Scalability and Performance SLI/SLOs
     Self-built measurement solutions are suggested; the official definitions below are the reference.
     • SLI: measured as the 99th percentile over the last 5 minutes
     • SLO: 99th percentile per cluster-day <= ? s
     Ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md#steady-state-slisslos

     | Status   | Service Level Indicator (SLI) | Service Level Objective (SLO) |
     | Official | Latency of processing mutating API calls for single objects, for every (resource, verb) pair, measured as 99th percentile over the last 5 minutes | In a default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day <= 1s |
     | Official | Latency of processing non-streaming read-only API calls, for every (resource, scope) pair, measured as 99th percentile over the last 5 minutes | In a default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day (a) <= 1s if scope=resource, (b) <= 30s otherwise (scope=namespace or scope=cluster) |
     | Official | Startup latency of schedulable stateless pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over the last 5 minutes | In a default Kubernetes installation, 99th percentile per cluster-day <= 5s |
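As one sketch of a self-built measurement, the mutating-call SLI can be approximated with a Prometheus recording rule over the apiserver's request-duration histogram. This assumes Prometheus already scrapes the kube-apiserver's `apiserver_request_duration_seconds` metric; the rule and record names are hypothetical:

```yaml
# Recording rule sketch: 99th percentile latency of mutating API calls
# per (resource, verb) over the last 5 minutes, matching the SLI above.
groups:
  - name: apiserver-slis
    rules:
      - record: apiserver:request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(apiserver_request_duration_seconds_bucket{
              verb=~"POST|PUT|PATCH|DELETE"}[5m])) by (resource, verb, le))
```

Note this approximates the 5-minute SLI; the per-cluster-day SLO would be evaluated over this recorded series.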
  10. Know Your Capacity Before Discussing Autoscaling
      Ref: https://learnk8s.io/kubernetes-instance-calculator
      • #Pods/Node: defines how many Pods can be placed on 1 instance.
        - If you are using a cloud provider service, you may be primarily concerned with resources.requests.*, because of the ease of expansion.
        - If you are using on-premises resources, are cost sensitive, or don't know how to size, you may be primarily concerned with resources.limits.*, to enforce an upper bound.
      • VM (or BM) instance size: defines how many resources are available on 1 instance.
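The two knobs above look like this in a workload spec. A minimal sketch (the Deployment name and image are illustrative); setting requests equal to limits gives the Pod the Guaranteed QoS class, which also matches the critical-service advice on the HPA slide:

```yaml
# Deployment fragment: requests drive scheduling/packing, limits cap usage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo                    # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels: {app: demo}
  template:
    metadata:
      labels: {app: demo}
    spec:
      containers:
        - name: app
          image: nginx:1.25     # illustrative image
          resources:
            requests: {cpu: 500m, memory: 512Mi}  # what the scheduler packs by
            limits:   {cpu: 500m, memory: 512Mi}  # hard upper bound per container
```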
  11. One Large Node vs. Multiple Small Nodes
      Cluster Autoscaler (CA) is a tool that automatically adjusts the size of a Kubernetes cluster.
      Ref: https://learnk8s.io/kubernetes-node-size, https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
      • Is it possible to use Cluster Autoscaler (CA)?
        - Yes: prefer multiple small Nodes (e.g. 2x Standard D16s_v5, 16 CPU / 64 GB each)
          • Most scenarios are VM environments or cloud providers
          • Easy dynamic adjustment of #Nodes
          • The utilization rate will be higher
        - No: prefer one large Node (e.g. Standard D32s_v5, 32 CPU / 128 GB)
          • Most scenarios are BM environments, on-premises, or hardware-specific (e.g. GPU/SGX)
          • You don't want to adjust #Nodes
      • Depending on the situation, I personally recommend one large Node over multiple small Nodes if you are in Taiwan.
  12. HPA Is a Little Different from What You Think
      Horizontal Pod Autoscaler (HPA) is a tool that automatically adjusts the number of replicas of an application.
      • How I see and use HPA:
        1. For metrics, HPA is based on resources.requests.*, not resources.limits.*
        2. For critical services, set resources.requests.* == resources.limits.*
        3. For a stabilization window, use autoscaling/v2 (marked stable in v1.23) instead of autoscaling/v1, because v2 supports the behavior field
        4. To avoid flip-flop issues, targetAverageValue (or targetAverageUtilization) should be 50 <= N < 90
      • Known limitations:
        - Does not work with DaemonSets
        - If the cluster is out of capacity, HPA cannot help: new Pods will stay "Pending". This can be solved through Cluster Autoscaler (CA)
        - Do not combine HPA with VPA or other overlapping metrics, which may have unknown side effects
      Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
           https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#horizontalpodautoscaler-v1-autoscaling
           https://developer.aliyun.com/article/1061326
      Source: kubecost
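The HPA guidance above can be sketched as a single autoscaling/v2 manifest: a utilization target inside the 50-90 band and an explicit scale-down stabilization window via the behavior field. The target Deployment name "demo" is hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo                       # hypothetical target workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # percentage of resources.requests.cpu
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # damp flip-flop scale-downs
```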
  13. Balancer (KubeCon + CloudNativeCon Europe 2023)
      • Stage: Alpha
      • Balancer allows you to:
        - Balance Pod distribution across similar deployments
        - Autoscale Pods from similar deployments together
      • Use cases:
        1. Evenly distribute Pods across zones
        2. Maintain the Pod ratio on spot VMs
        3. Run workloads on different machine types
      Ref: https://www.youtube.com/watch?v=VgTWf4mjyG8&list=PLj6h78yzYM2PyrvCoOii4rAopBswfz1p7&index=292
  14. Multi-Dimensional Pod Autoscaler (MPA) (KubeCon + CloudNativeCon Europe 2023)
      • Stage: Alpha
      • MPA allows coordination between HPA and VPA
      • Use cases:
        1. Bursty application workloads require frequent updates of both resource dimensions to adapt to load spikes without overprovisioning
        2. Advanced approaches require MPA to find a combined optimal vertical and horizontal scaling
      Ref: https://youtu.be/VgTWf4mjyG8?t=1147
  15. Join Kubernetes Community Days Taiwan 2023 at COSCUP 2023, July 29 - 30
      CNCF's mission is to make cloud native computing ubiquitous.