Slide 1

Slide 1 text

KUBERNETES API PRIORITY AND FAIRNESS KUBERNETES MEETUP TOKYO #63 AYA IGARASHI @LADICLE

Slide 2

Slide 2 text

WHO AM I? AYA IGARASHI, @LADICLE SWE, CLOUDNATIX INC.

Slide 3

Slide 3 text

KUBERNETES API PRIORITY AND FAIRNESS
WHAT IS APF?
APF is a mechanism to protect API servers against CPU and memory overloads.
KEP-1040: https://github.com/kubernetes/enhancements/issues/1040 (2019/05)
Alpha: v1.18 (2020/03)
Beta: v1.20 (2020/12)
GA: v1.29 (2023/12)

Slide 4

Slide 4 text

WHAT'S THE PROBLEM? NOISY NEIGHBOR BLOCKS CRITICAL REQUESTS Create a new Pod Get 100 Pods! Get 100 Pods! Get 600 Secrets! Get 800 Pods! Mr. Priority max-requests-inflight: 4 max-mutating-requests-inflight: 2

Slide 5

Slide 5 text

WHAT'S THE PROBLEM? NOISY NEIGHBOR BLOCKS CRITICAL REQUESTS Get Secret Get 100 Pods! Get 100 Pods! Get 600 Secrets! Get 800 Pods! Mr. Priority max-requests-inflight: 4 max-mutating-requests-inflight: 2

Slide 6

Slide 6 text

HOW TO SOLVE THE PROBLEM? CREATE LANES BASED ON REQUEST PRIORITY Get Secret Get 100 Pods! Get 100 Pods! Get 600 Secrets! Get 800 Pods! Mr. Priority
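The difference between one shared limit and per-priority lanes can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not how the API server is implemented: the lane names ("bulk", "critical"), the arrival order, and the per-lane limits are all hypothetical.

```python
from collections import Counter

def admit_shared(requests, limit):
    """Admit requests in arrival order until one shared in-flight limit is hit."""
    return requests[:limit]

def admit_per_lane(requests, lane_limits):
    """Admit requests in arrival order, counting in-flight slots per lane."""
    used = Counter()
    admitted = []
    for name, lane in requests:
        if used[lane] < lane_limits.get(lane, 0):
            used[lane] += 1
            admitted.append((name, lane))
    return admitted

# Hypothetical arrival order: four bulk reads arrive before the critical request.
arrivals = [
    ("Get 100 Pods", "bulk"),
    ("Get 600 Secrets", "bulk"),
    ("Get 800 Pods", "bulk"),
    ("Get 100 Pods", "bulk"),
    ("Get Secret (Mr. Priority)", "critical"),
]

shared = admit_shared(arrivals, limit=4)                       # one limit for everyone
lanes = admit_per_lane(arrivals, {"bulk": 3, "critical": 1})   # hypothetical lane sizes

print("shared limit admits critical request:", arrivals[-1] in shared)
print("per-lane limits admit critical request:", arrivals[-1] in lanes)
```

With the shared limit, the four bulk reads fill every slot and the critical request has to wait. With lanes, one bulk read waits instead.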

Slide 7

Slide 7 text

APF Objects

Slide 8

Slide 8 text

RESOURCE OVERVIEW
TWO RESOURCES, FLOW SCHEMA AND PRIORITY LEVEL CONFIGURATION, MAKE UP APF
FlowSchema: API requests are mapped to a flow schema based on their properties.
PriorityLevelConfiguration: This resource configures the proportion of requests allowed for the priority level.

Slide 9

Slide 9 text

RESOURCE OVERVIEW
TWO RESOURCES, FLOW SCHEMA AND PRIORITY LEVEL CONFIGURATION, MAKE UP APF
FlowSchema: API requests are mapped to a flow schema based on their properties.
PriorityLevelConfiguration: PLC configures the proportion of requests allowed for the priority level.

Slide 10

Slide 10 text

REQUEST HANDLING FLOW catch-all exempt workload-high catch-all exempt global-default queues Request from kube-scheduler queues kube-scheduler global-default queues FlowSchema PriorityLevelConfiguration

Slide 11

Slide 11 text

REQUEST HANDLING FLOW catch-all exempt workload-high catch-all exempt global-default queues queues kube-scheduler global-default queues system:kube-scheduler Precedence: 800 All Precedence: 9900 All Precedence: 10000 system:masters Precedence: 1 The match with the smallest precedence wins

Slide 12

Slide 12 text

REQUEST HANDLING FLOW catch-all exempt workload-high catch-all exempt global-default queues queues kube-scheduler global-default queues In-flight Queue overflow: return 429 error 1. Dispatch 2. Enqueue 3. Reject
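The three outcomes on this slide (dispatch, enqueue, reject) can be sketched as a small decision function. This is a simplified model under stated assumptions: a single queue per priority level, one seat per request, and hypothetical names (`PriorityLevel`, `concurrency_limit`, `queue_length_limit`). It does not model how the API server spreads flows over several queues.

```python
from collections import deque

class PriorityLevel:
    """Simplified model: one queue, one seat per request (both are assumptions)."""

    def __init__(self, concurrency_limit, queue_length_limit):
        self.concurrency_limit = concurrency_limit
        self.queue_length_limit = queue_length_limit
        self.in_flight = 0
        self.queue = deque()

    def handle(self, request):
        # 1. Dispatch: there is still room under the concurrency limit.
        if self.in_flight < self.concurrency_limit:
            self.in_flight += 1
            return "dispatch"
        # 2. Enqueue: the level is busy, but the queue has space.
        if len(self.queue) < self.queue_length_limit:
            self.queue.append(request)
            return "enqueue"
        # 3. Reject: the queue overflows, so the caller gets HTTP 429.
        return "reject (429)"

pl = PriorityLevel(concurrency_limit=2, queue_length_limit=1)
outcomes = [pl.handle(f"req-{i}") for i in range(4)]
print(outcomes)
```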

Slide 13

Slide 13 text

REQUEST HANDLING FLOW catch-all exempt workload-high catch-all exempt global-default queues queues kube-scheduler global-default queues Request from normal user In-flight (e.g., 38 < 49)

Slide 14

Slide 14 text

REQUEST HANDLING FLOW catch-all exempt workload-high catch-all exempt global-default queues queues kube-scheduler global-default queues Request from system:masters Always Be Dispatched Immediately system:masters Precedence: 1

Slide 15

Slide 15 text

REQUEST HANDLING FLOW catch-all exempt workload-high catch-all exempt global-default queues queues kube-scheduler global-default queues In-flight 1. Dispatch 2. Reject Fallback to All Precedence: 9900 All Precedence: 10000

Slide 16

Slide 16 text

FLOW SCHEMA
1. A FlowSchema matches a request iff at least one of its rules matches the request.
2. Among the matching FlowSchemas, the one with the smallest precedence value is chosen.
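The two matching rules can be sketched as follows. The precedence values and names come from the request handling slides; everything else is an assumption made for illustration: rules are modelled as plain predicates over the requester's identities, and `FlowSchema`, `has`, and `select_flow_schema` are hypothetical helpers, not the Kubernetes API types. Tie-breaking between equal precedence values is not covered here.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class FlowSchema:
    name: str
    precedence: int                            # smaller value wins
    priority_level: str
    rules: List[Callable[[Set[str]], bool]]    # simplified rule: predicate over identities

def has(identity):
    """Rule that matches when the requester carries the given user or group name."""
    return lambda identities: identity in identities

def match_all(identities):
    """Rule that matches every request."""
    return True

def select_flow_schema(schemas, identities):
    # Rule 1: a schema matches iff at least one of its rules matches.
    matching = [fs for fs in schemas if any(rule(identities) for rule in fs.rules)]
    # Rule 2: among the matches, pick the smallest precedence value.
    return min(matching, key=lambda fs: fs.precedence)

SCHEMAS = [
    FlowSchema("catch-all", 10000, "catch-all", [match_all]),
    FlowSchema("global-default", 9900, "global-default", [match_all]),
    FlowSchema("kube-scheduler", 800, "workload-high", [has("system:kube-scheduler")]),
    FlowSchema("exempt", 1, "exempt", [has("system:masters")]),
]

for who in ({"system:kube-scheduler"}, {"alice"}, {"admin", "system:masters"}):
    fs = select_flow_schema(SCHEMAS, who)
    print(sorted(who), "->", fs.name, "/", fs.priority_level)
```

Because catch-all matches everything, the list of matches is never empty in this sketch.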

Slide 17

Slide 17 text

PRIORITY LEVEL CONFIGURATION

Slide 18

Slide 18 text

Request Isolations

Slide 19

Slide 19 text

HOW DOES THE APF PROTECT THE K8S? SERVER CONCURRENCY LIMIT max-requests-inflight max-mutating-requests-inflight Server Current Limit

Slide 20

Slide 20 text

HOW DOES THE APF PROTECT THE K8S? TWO LEVELS OF REQUEST ISOLATION Request Isolation Concurrency Isolation PL3 Queue 1 Queue 2 Queue 3 PL1 PL2 User A Namespace B PL

Slide 21

Slide 21 text

Concurrency Limit

Slide 22

Slide 22 text

CONCURRENCY LIMIT
THE LIMIT OF A PRIORITYLEVELCONFIGURATION IS NOT SPECIFIED AS AN ABSOLUTE NUMBER.
ServerCL = max-requests-inflight + max-mutating-requests-inflight
Default: 400 + 200 = 600
SumNCS = Sum[Limited PLC k] NCS(k)
NominalCL (NCL) = ceil(ServerCL * NCS / SumNCS)
LendableCL (LCL) = round(NCL * LP / 100)
BorrowCL (BCL) = round(NCL * BLP / 100)
where NCS = nominalConcurrencyShares, LP = lendablePercent, BLP = borrowingLimitPercent
Diagram labels: NCL, LCL, BCL, Borrow, Lend, MinCL, MaxCL
(Subject to change)
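The formulas above can be tried out with a short Python sketch. Several things here are assumptions rather than slide content: the relations MinCL = NCL - LCL and MaxCL = NCL + BCL are taken from the Kubernetes flow control documentation, the "other" priority level and all percent values are hypothetical, and Python's `round` rounds halves to even, which may differ from the API server for exact .5 cases (the sample values avoid them).

```python
import math

def concurrency_limits(server_cl, shares, lendable_pct, borrowing_pct):
    """Compute NCL, MinCL and MaxCL per priority level from concurrency shares."""
    sum_ncs = sum(shares.values())
    limits = {}
    for name, ncs in shares.items():
        ncl = math.ceil(server_cl * ncs / sum_ncs)     # NominalCL
        lcl = round(ncl * lendable_pct[name] / 100)    # LendableCL
        bcl = round(ncl * borrowing_pct[name] / 100)   # BorrowCL
        # Assumed from the Kubernetes docs: a level can lend down to MinCL
        # and borrow up to MaxCL.
        limits[name] = {"NCL": ncl, "MinCL": ncl - lcl, "MaxCL": ncl + bcl}
    return limits

server_cl = 400 + 200  # default max-requests-inflight + max-mutating-requests-inflight

# "other" stands in for all remaining priority levels (hypothetical share).
shares = {"workload-high": 40, "global-default": 20, "other": 140}
lendable = {"workload-high": 50, "global-default": 50, "other": 0}       # hypothetical
borrowing = {"workload-high": 100, "global-default": 100, "other": 100}  # hypothetical

limits = concurrency_limits(server_cl, shares, lendable, borrowing)
for name, values in limits.items():
    print(name, values)
```

Because the limits are derived from shares, adding a new priority level raises SumNCS and shrinks every other level's NominalCL.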

Slide 23

Slide 23 text

CONCURRENCY LIMIT
CURRENT CONCURRENCY LIMITS ARE ADJUSTED EVERY 10 SECONDS
EnvelopeSeatDemand (ESD) = AvgSeatDemand + StDevSeatDemand
SmoothSeatDemand (SmoothSD) = max(ESD, A * SmoothSD + (1 - A) * ESD), where the SmoothSD on the right-hand side is the previous period's value
NOTE: A = 0.977 means that the half-life of exponential decay is about 5 mins.
MinCurrentCL (MinCCL) = max(MinCL, min(NominalCL, HighSeatDemand))
LowerBoundSum (LBSum) = Sum[Limited PLC k] MinCCL(k)
CurrentCL = MinCCL * RemainingServerCL / LBSum, if LBSum >= RemainingServerCL
CurrentCL = min(MaxCL, max(MinCCL, FairProp * max(MinCCL, SmoothSD))), otherwise
Total CCL is approximately equal to ServerCL.
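The half-life note can be checked with a little arithmetic, and the smoothing rule is easy to sketch. The snippet below assumes the smoothing step runs once per 10-second adjustment period, as the slide title suggests. The function names are hypothetical and the demand values are made up.

```python
import math

A = 0.977             # smoothing coefficient from the slide
PERIOD_SECONDS = 10   # adjustment period from the slide

def half_life_seconds(a=A, period=PERIOD_SECONDS):
    """Time for a past demand peak to decay to half: solve a**n = 0.5 for n periods."""
    return period * math.log(0.5) / math.log(a)

def smooth_seat_demand(prev_smooth, avg, stdev, a=A):
    """One smoothing step: jump up to the envelope at once, decay towards it slowly."""
    envelope = avg + stdev
    return max(envelope, a * prev_smooth + (1 - a) * envelope)

print(f"half-life: {half_life_seconds():.0f} s")   # roughly five minutes

rising = smooth_seat_demand(prev_smooth=10, avg=50, stdev=5)    # demand went up
falling = smooth_seat_demand(prev_smooth=100, avg=10, stdev=0)  # demand went down
print(rising, falling)
```

The `max` makes the smoothed demand follow increases immediately, while decreases only take effect gradually.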

Slide 24

Slide 24 text

REQUEST EXECUTING COST A REQUEST CAN TAKE UP MORE THAN ONE SEAT Create a Pod Current Limit Seats Request Request List 800 Pods!

Slide 25

Slide 25 text

MONITORING APF LOTS OF METRICS RELATED TO APF ARE PROVIDED

Slide 26

Slide 26 text

MONITORING APF
METRICS CAN HELP YOU DETERMINE IF YOUR SETTINGS ARE UNSUITABLE
apiserver_flowcontrol_current_executing_requests
apiserver_flowcontrol_current_executing_seats
apiserver_flowcontrol_current_limit_seats
apiserver_flowcontrol_current_inqueue_requests
Diagram labels: Request, Request, Queue
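One way to read these metrics is as a seat utilization per priority level: executing seats divided by the current limit. The sketch below parses a few lines in Prometheus text format and computes that ratio. The sample values are made up, and the label names (`priority_level`, `flow_schema`) are an assumption about how the metrics are labelled, so check them against your own `/metrics` output.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; values and label sets are illustrative only.
SAMPLE = """
apiserver_flowcontrol_current_executing_seats{flow_schema="kube-scheduler",priority_level="workload-high"} 10
apiserver_flowcontrol_current_executing_seats{flow_schema="kube-controller-manager",priority_level="workload-high"} 5
apiserver_flowcontrol_current_executing_seats{flow_schema="global-default",priority_level="global-default"} 30
apiserver_flowcontrol_current_limit_seats{priority_level="workload-high"} 60
apiserver_flowcontrol_current_limit_seats{priority_level="global-default"} 30
"""

LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

def seat_utilization(text):
    """Return executing seats / limit seats for each priority level."""
    executing = defaultdict(float)
    limit = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything that is not a labelled sample
        name, labels, value = m.group(1), dict(LABEL.findall(m.group(2))), float(m.group(3))
        level = labels.get("priority_level")
        if name == "apiserver_flowcontrol_current_executing_seats":
            executing[level] += value   # sum over flow schemas
        elif name == "apiserver_flowcontrol_current_limit_seats":
            limit[level] = value
    return {level: executing[level] / cap for level, cap in limit.items() if cap > 0}

util = seat_utilization(SAMPLE)
print(util)
```

A level that stays close to 1.0 is a candidate for queuing and 429 responses.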

Slide 27

Slide 27 text

MONITORING APF
INVESTIGATE THE CAUSE OF 429 ERRORS BY APF
Concurrency-limit, Queue-full, Time-out, Cancel
apiserver_flowcontrol_request_wait_duration_seconds
apiserver_flowcontrol_rejected_requests_total

Slide 28

Slide 28 text

Tweak APF Settings

Slide 29

Slide 29 text

PREVENT REQUEST DROPPING (429 ERR)
CAUTION: MISCONFIGURED APF SETTINGS CAN RESULT IN WORKLOAD DISRUPTIONS
NOTE: Consider fixing the noisy workload first.
Isolate important/noisy requests: Add a new FlowSchema (FS) and PriorityLevelConfiguration (PLC). Then, reduce the capacity of the existing PLC by that amount.
Allocate more capacity to requests: Increase the PLC's concurrency limit, or map the FS to a higher-capacity PLC. Earning more seats means taking seats away from others.

Slide 30

Slide 30 text

DEFAULT APF CONFIGURATIONS
API Server maintains two kinds of default objects
https://github.com/kubernetes/apiserver/blob/release-1.29/pkg/apis/flowcontrol/bootstrap/default.go
Mandatory Objects: FS & PLC: exempt & catch-all
Suggested Objects (modifiable): FS: system-nodes, system-leader-election, etc. PLC: system, leader-election, workload-high, etc.
NOTE: If the object has the apf.kubernetes.io/autoupdate-spec annotation and its value is true, the API server periodically maintains the object.

Slide 31

Slide 31 text

CASE STUDY: AWS EKS SOME FLOW SCHEMAS FOR EKS ADDED; NO CHANGES IN PRIORITY LEVEL CONFIGURATIONS Global Default: NCS=20 Workload High: NCS=40 Global Default: Precedence=9900

Slide 32

Slide 32 text

RECAP
APF is a mechanism to protect API servers against overload
Before tweaking APF settings, consider fixing your workload first
When getting frequent 429 errors due to flow control, review APF-related metrics

Slide 33

Slide 33 text

REFERENCES
KEP-1040 Priority and Fairness (Feb 11) https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness
Official Document: Flow Control (v1.29) https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
Kubernetes API Reference (v1.29 FlowSchema/PriorityLevelConfiguration v1) https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/
Codebase (b041969) https://github.com/kubernetes/apiserver/tree/release-1.29/pkg/util/flowcontrol
API Priority and Fairness by Containers from the Couch - YouTube https://www.youtube.com/watch?v=YnPPHBawhE0
Shuffle Sharding: Massive and Magical Fault Isolation https://aws.amazon.com/blogs/architecture/shuffle-sharding-massive-and-magical-fault-isolation/