Kubernetes: API Priority and Fairness

Aya (Igarashi) Ozawa

February 13, 2024

Transcript

  1. KUBERNETES API PRIORITY AND FAIRNESS: WHAT IS APF?

    APF is a mechanism to protect API servers against CPU and memory overloads.
    KEP-1040: https://github.com/kubernetes/enhancements/issues/1040
    Timeline: KEP 2019/05, Alpha v1.18 (2020/03), Beta v1.20 (2020/10), GA v1.29 (2023/12)
  2. WHAT'S THE PROBLEM? NOISY NEIGHBOR BLOCKS CRITICAL REQUESTS

    [Diagram: Mr. Priority and the requests "Create a new Pod", "Get 100 Pods!" (x2), "Get 600 Secrets!", "Get 800 Pods!"]
    max-requests-inflight: 4
    max-mutating-requests-inflight: 2
  3. WHAT'S THE PROBLEM? NOISY NEIGHBOR BLOCKS CRITICAL REQUESTS

    [Diagram: Mr. Priority and the requests "Get Secret", "Get 100 Pods!" (x2), "Get 600 Secrets!", "Get 800 Pods!"]
    max-requests-inflight: 4
    max-mutating-requests-inflight: 2
  4. HOW TO SOLVE THE PROBLEM? CREATE LANES BASED ON REQUEST PRIORITY

    [Diagram: Mr. Priority and the requests "Get Secret", "Get 100 Pods!" (x2), "Get 600 Secrets!", "Get 800 Pods!", now separated into lanes]
  5. RESOURCE OVERVIEW: TWO RESOURCES, FLOW SCHEMA AND PRIORITY LEVEL CONFIGURATION, MAKE UP APF

    FlowSchema: API requests are mapped to a flow schema based on their properties.
    PriorityLevelConfiguration: This resource configures the proportion of requests allowed for the priority level.
  6. RESOURCE OVERVIEW: TWO RESOURCES, FLOW SCHEMA AND PRIORITY LEVEL CONFIGURATION, MAKE UP APF

    FlowSchema: API requests are mapped to a flow schema based on their properties.
    PriorityLevelConfiguration: The PLC configures the proportion of requests allowed for the priority level.
  7. REQUEST HANDLING FLOW

    [Diagram: a request from kube-scheduler. FlowSchema column: exempt, kube-scheduler, global-default, catch-all. PriorityLevelConfiguration column: exempt, workload-high, global-default, catch-all, with queues.]
  8. REQUEST HANDLING FLOW

    [Diagram: FlowSchemas (exempt, kube-scheduler, global-default, catch-all) and PriorityLevelConfigurations (exempt, workload-high, global-default, catch-all) with queues.]
    system:masters, Precedence: 1
    system:kube-scheduler, Precedence: 800
    All, Precedence: 9900
    All, Precedence: 10000
    The first match with the smallest precedence wins.
  9. REQUEST HANDLING FLOW

    [Diagram: FlowSchemas (exempt, kube-scheduler, global-default, catch-all) and PriorityLevelConfigurations (exempt, workload-high, global-default, catch-all) with queues.]
    Outcomes: 1. Dispatch, 2. Enqueue, 3. Reject
    Labels: In-flight; Queue overflow: return 429 error
  10. REQUEST HANDLING FLOW

    [Diagram: a request from a normal user. FlowSchemas (exempt, kube-scheduler, global-default, catch-all) and PriorityLevelConfigurations (exempt, workload-high, global-default, catch-all) with queues.]
    Label: In-flight (e.g., 38 < 49)
  11. REQUEST HANDLING FLOW

    [Diagram: a request from system:masters. FlowSchemas (exempt, kube-scheduler, global-default, catch-all) and PriorityLevelConfigurations (exempt, workload-high, global-default, catch-all) with queues.]
    system:masters, Precedence: 1: always dispatched immediately.
  12. REQUEST HANDLING FLOW

    [Diagram: FlowSchemas (exempt, kube-scheduler, global-default, catch-all) and PriorityLevelConfigurations (exempt, workload-high, global-default, catch-all) with queues.]
    Outcomes: 1. Dispatch, 2. Reject
    Labels: In-flight; Fallback to All, Precedence: 9900 and All, Precedence: 10000
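The three outcomes named in the request handling flow (dispatch, enqueue, reject) can be sketched in Python. This is only an illustration under simplifying assumptions: the `PriorityLevel` class, its single FIFO queue, and the one-seat-per-request rule are hypothetical and are not how the API server implements queuing.

```python
from collections import deque


class PriorityLevel:
    """Hypothetical, simplified priority level with a single FIFO queue."""

    def __init__(self, concurrency_limit: int, queue_length_limit: int):
        self.concurrency_limit = concurrency_limit
        self.queue_length_limit = queue_length_limit
        self.in_flight = 0
        self.queue = deque()

    def handle(self, request: str) -> str:
        # 1. Dispatch: a seat is free, so the request runs immediately.
        if self.in_flight < self.concurrency_limit:
            self.in_flight += 1
            return "dispatched"
        # 2. Enqueue: no seat is free, but the queue still has room.
        if len(self.queue) < self.queue_length_limit:
            self.queue.append(request)
            return "enqueued"
        # 3. Reject: the queue is full, so the client gets HTTP 429.
        return "rejected (429)"

    def finish_one(self) -> None:
        # A running request completed; its seat goes to the next queued request.
        self.in_flight -= 1
        if self.queue:
            self.queue.popleft()
            self.in_flight += 1


level = PriorityLevel(concurrency_limit=2, queue_length_limit=1)
outcomes = [level.handle(f"req-{i}") for i in range(4)]
```

With two seats and room for one queued request, the first two requests are dispatched, the third waits, and the fourth is rejected.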
  13. FLOW SCHEMA

    1. A FlowSchema matches a request if and only if at least one of its rules matches the request.
    2. Among the matching FlowSchemas, the one with the smallest precedence value is chosen.
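The two matching rules can be sketched as follows. This is a minimal model, not the API server's code: the `FlowSchema` dataclass and `pick_flow_schema` helper are hypothetical, and matching on a flat set of user and group names is a simplification. The schema names and precedence values are the ones shown on the request handling slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowSchema:
    # Hypothetical, simplified model: each entry in `subjects` stands in for
    # one rule. An empty set means "match everyone".
    name: str
    precedence: int
    subjects: set = field(default_factory=set)

    def matches(self, request_subjects: set) -> bool:
        # Rule 1: the schema matches if at least one of its rules matches.
        return not self.subjects or bool(self.subjects & request_subjects)


def pick_flow_schema(schemas, request_subjects) -> Optional[FlowSchema]:
    # Rule 2: among the matching schemas, the smallest precedence value wins.
    matching = [s for s in schemas if s.matches(request_subjects)]
    return min(matching, key=lambda s: s.precedence, default=None)


schemas = [
    FlowSchema("catch-all", 10000),
    FlowSchema("global-default", 9900),
    FlowSchema("kube-scheduler", 800, {"system:kube-scheduler"}),
    FlowSchema("exempt", 1, {"system:masters"}),
]

chosen_for_scheduler = pick_flow_schema(schemas, {"system:kube-scheduler"})
chosen_for_normal_user = pick_flow_schema(schemas, {"alice"})
chosen_for_admin = pick_flow_schema(schemas, {"bob", "system:masters"})
```

A normal user matches only the two "All" schemas and lands in global-default (9900) rather than catch-all (10000).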
  14. HOW DOES APF PROTECT K8S? SERVER CONCURRENCY LIMIT

    [Diagram: max-requests-inflight and max-mutating-requests-inflight make up the Server Current Limit.]
  15. HOW DOES APF PROTECT K8S? TWO LEVELS OF REQUEST ISOLATION

    [Diagram: Request Isolation (User A and Namespace B are spread over Queue 1, Queue 2, Queue 3 inside a PL) and Concurrency Isolation (between PL1, PL2, PL3).]
  16. CONCURRENCY LIMIT: A PRIORITYLEVELCONFIGURATION'S LIMIT IS NOT SPECIFIED AS AN ABSOLUTE NUMBER

    ServerCL = max-requests-inflight + max-mutating-requests-inflight
    Default: 400 + 200 = 600
    SumNCS = SUM[LimitedPLC k] NCS(k)
    NominalCL (NCL) = ceil(ServerCL * NCS / SumNCS)
    LendableCL (LCL) = round(NCL * LP / 100)
    BorrowCL (BCL) = round(NCL * BLP / 100)
    [Diagram: NCL, with LCL (Lend) below it down to MinCL and BCL (Borrow) above it up to MaxCL.]
    (Subject to change)
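As a worked example of these formulas, the sketch below computes the limits for one priority level. Several things here are assumptions: the three priority levels and their shares are hypothetical (only NCS=20 and NCS=40 appear elsewhere in the deck), the lendable and borrowing percentages are made up, `round()` is taken to mean round-half-up, and MinCL = NCL - LCL and MaxCL = NCL + BCL are read off the slide's diagram.

```python
import math


def round_half_up(x: float) -> int:
    # Python's built-in round() rounds halves to even; rounding halves up is
    # an assumption about what the slide's round() means.
    return math.floor(x + 0.5)


def concurrency_limits(server_cl, ncs, sum_ncs, lendable_pct, borrowing_pct):
    ncl = math.ceil(server_cl * ncs / sum_ncs)      # NominalCL
    lcl = round_half_up(ncl * lendable_pct / 100)   # LendableCL
    bcl = round_half_up(ncl * borrowing_pct / 100)  # BorrowCL
    return {"NCL": ncl, "LCL": lcl, "BCL": bcl,
            "MinCL": ncl - lcl, "MaxCL": ncl + bcl}


# Default max-requests-inflight + max-mutating-requests-inflight.
server_cl = 400 + 200

# Hypothetical shares for three limited priority levels.
shares = {"global-default": 20, "workload-high": 40, "other": 40}
sum_ncs = sum(shares.values())

limits = concurrency_limits(server_cl, shares["global-default"], sum_ncs,
                            lendable_pct=50, borrowing_pct=25)
```

With these inputs the priority level gets a nominal limit of 120 seats, can lend 60 of them, and can borrow up to 30 more.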
  17. CONCURRENCY LIMIT: CURRENT CONCURRENCY LIMITS ARE ADJUSTED EVERY 10 SECONDS

    MinCurrentCL (MinCCL) = max(MinCL, min(NominalCL, HighSeatDemand))
    LowerBoundSum (LBSum) = Sum[Limited PLC k] MinCCL(k)
    CurrentCL = MinCCL * RemainingServerCL / LBSum, if LBSum >= RemainingServerCL
    CurrentCL = min(MaxCL, max(MinCCL, FairProp * max(MinCCL, SmoothSD))), otherwise
    Total CCL is approximately equal to ServerCL.
    EnvelopeSeatDemand (ESD) = AvgSeatDemand + StDevSeatDemand
    SmoothSeatDemand (SSD) = max(ESD, A * SmoothSD + (1 - A) * ESD)
    NOTE: A = 0.977 means that the half-life of exponential decay is about 5 mins.
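The half-life note can be checked numerically. The sketch below takes A = 0.977 and the 10-second adjustment period from the slide; the `next_smooth_seat_demand` helper and the burst scenario are hypothetical and only illustrate the SmoothSeatDemand update rule.

```python
import math

A = 0.977             # smoothing coefficient from the slide
ADJUST_PERIOD_S = 10  # limits are adjusted every 10 seconds

# Number of adjustment periods after which A**n has dropped to 0.5.
half_life_periods = math.log(0.5) / math.log(A)
half_life_seconds = half_life_periods * ADJUST_PERIOD_S


def next_smooth_seat_demand(prev_smooth: float, avg: float, stdev: float) -> float:
    esd = avg + stdev  # EnvelopeSeatDemand
    # Jumps up to the envelope immediately, decays exponentially otherwise.
    return max(esd, A * prev_smooth + (1 - A) * esd)


# Hypothetical burst: the envelope jumps to 100 seats, then demand falls to 0.
smooth = next_smooth_seat_demand(0.0, avg=90.0, stdev=10.0)
history = [smooth]
for _ in range(30):  # 30 periods = 300 seconds of zero demand
    smooth = next_smooth_seat_demand(smooth, avg=0.0, stdev=0.0)
    history.append(smooth)
```

The computed half-life is just under 300 seconds, so after 30 idle periods the smoothed demand has fallen to roughly half of the burst value.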
  18. REQUEST EXECUTING COST: A REQUEST CAN TAKE UP MORE THAN ONE SEAT

    [Diagram: within the Current Limit Seats, the request "Create a Pod" occupies one seat while the request "List 800 Pods!" occupies several.]
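A rough sketch of how a seat estimate could scale with the size of a list request is shown below. The constants `OBJECTS_PER_SEAT` and `MAX_SEATS` and the `estimate_seats` function are assumptions for illustration; the API server's actual work estimator has its own parameters and takes more factors into account.

```python
import math

# Assumed parameters for illustration only.
OBJECTS_PER_SEAT = 100
MAX_SEATS = 10


def estimate_seats(verb: str, expected_objects: int = 1) -> int:
    if verb != "list":
        return 1  # e.g. creating a single Pod occupies one seat
    seats = math.ceil(expected_objects / OBJECTS_PER_SEAT)
    return max(1, min(seats, MAX_SEATS))  # clamp to [1, MAX_SEATS]


create_pod_seats = estimate_seats("create")
list_800_pods_seats = estimate_seats("list", expected_objects=800)
```

Under these assumptions, listing 800 Pods occupies eight seats while creating a Pod occupies one, so a few large lists can use up a priority level's limit quickly.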
  19. MONITORING APF: METRICS CAN HELP YOU DETERMINE IF YOUR SETTINGS ARE UNSUITABLE

    apiserver_flowcontrol_current_executing_requests
    apiserver_flowcontrol_current_executing_seats
    apiserver_flowcontrol_current_limit_seats
    apiserver_flowcontrol_current_inqueue_requests
    [Diagram: executing requests and a request queue.]
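One way to use these metrics is to compare executing seats with the seat limit per priority level. The sketch below parses a small sample in Prometheus text format. The sample values are made up, the `priority_level` and `flow_schema` label names are assumptions about how the metrics are labeled, and `seat_utilization` is a hypothetical helper rather than part of any client library.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text format; the values are made up.
SAMPLE = """\
apiserver_flowcontrol_current_executing_seats{flow_schema="service-accounts",priority_level="workload-low"} 30
apiserver_flowcontrol_current_executing_seats{flow_schema="kube-scheduler",priority_level="workload-high"} 4
apiserver_flowcontrol_current_limit_seats{priority_level="workload-low"} 40
apiserver_flowcontrol_current_limit_seats{priority_level="workload-high"} 80
"""

LINE = re.compile(r'^(\w+)\{(.*)\}\s+([0-9.eE+-]+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def seat_utilization(text: str) -> dict:
    executing = defaultdict(float)
    limit = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip comments, blank lines, and unlabeled samples
        name = m.group(1)
        labels = dict(LABEL.findall(m.group(2)))
        value = float(m.group(3))
        level = labels.get("priority_level")
        if name == "apiserver_flowcontrol_current_executing_seats":
            executing[level] += value  # sum over flow schemas
        elif name == "apiserver_flowcontrol_current_limit_seats":
            limit[level] = value
    return {lvl: executing[lvl] / cap for lvl, cap in limit.items() if cap > 0}


utilization = seat_utilization(SAMPLE)
```

A priority level that sits near 1.0 for long stretches is a candidate for closer inspection.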
  20. MONITORING APF: INVESTIGATE THE CAUSE OF 429 ERRORS BY APF

    Rejection reasons: Concurrency-limit, Queue-full, Time-out, Cancel
    apiserver_flowcontrol_request_wait_duration_seconds
    apiserver_flowcontrol_rejected_requests_total
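To see which rejection reason dominates, the increase of the rejected-requests counter between two scrapes can be grouped by reason. The snapshots below are made up, and keying them by (priority level, reason) is an assumption about the counter's labels; `rejections_by_reason` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical snapshots of apiserver_flowcontrol_rejected_requests_total,
# keyed by (priority_level, reason). The numbers are made up.
before = {
    ("workload-low", "queue-full"): 120,
    ("workload-low", "time-out"): 5,
    ("global-default", "queue-full"): 2,
}
after = {
    ("workload-low", "queue-full"): 180,
    ("workload-low", "time-out"): 5,
    ("global-default", "queue-full"): 2,
    ("global-default", "concurrency-limit"): 7,
}


def rejections_by_reason(before: dict, after: dict) -> Counter:
    deltas = Counter()
    for (level, reason), value in after.items():
        increase = value - before.get((level, reason), 0)
        if increase > 0:
            deltas[reason] += increase
    return deltas


new_rejections = rejections_by_reason(before, after)
```

In this sample almost all new rejections are queue-full, which points at the queue settings or the load of one priority level rather than at time-outs.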
  21. PREVENT REQUEST DROPPING (429 ERR)

    CAUTION: MISCONFIGURED APF SETTINGS CAN RESULT IN WORKLOAD DISRUPTIONS
    Isolate important/noisy requests: Add a new FlowSchema (FS) and PriorityLevelConfiguration (PLC). Then, reduce the capacity of the existing PLC by that amount.
    Allocate more capacity to requests: Increase the PLC's concurrency limit, or map the FS to a higher capacity PLC.
    NOTE: Consider fixing the noisy workload first. Earning more seats for one priority level means starving the others of seats.
    [Diagram: two ServerCL bars, one per approach.]
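The trade-off in the note can be made concrete with the NominalCL formula, ceil(ServerCL * NCS / SumNCS). In the sketch below the priority level names and share values are hypothetical, and `nominal_limits` is an illustrative helper.

```python
import math


def nominal_limits(server_cl: int, shares: dict) -> dict:
    # NominalCL = ceil(ServerCL * NCS / SumNCS) for each priority level.
    total = sum(shares.values())
    return {name: math.ceil(server_cl * ncs / total) for name, ncs in shares.items()}


server_cl = 600

# Hypothetical starting point with two limited priority levels.
before = nominal_limits(server_cl, {"workload-high": 40, "global-default": 20})

# Adding a new level without touching the others shrinks every existing level.
after_add = nominal_limits(
    server_cl, {"workload-high": 40, "global-default": 20, "important": 20})

# Carving the new level's shares out of one existing level leaves the rest alone.
after_carve = nominal_limits(
    server_cl, {"workload-high": 20, "global-default": 20, "important": 20})
```

ServerCL stays fixed, so every seat given to the new level comes from an existing one. Reducing the existing PLC by the same amount keeps unrelated levels, here global-default, at their previous limit.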
  22. DEFAULT APF CONFIGURATIONS

    The API server maintains two kinds of default objects.
    https://github.com/kubernetes/apiserver/blob/release-1.29/pkg/apis/flowcontrol/bootstrap/default.go
    Mandatory Objects: FS & PLC: exempt & catch-all
    Suggested Objects (modifiable): FS: system-nodes, system-leader-election, etc. PLC: system, leader-election, workload-high, etc.
    NOTE: If the object has the apf.kubernetes.io/autoupdate-spec annotation and its value is true, the API server periodically maintains the object.
  23. CASE STUDY: AWS EKS

    SOME FLOW SCHEMAS FOR EKS ADDED; NO CHANGES IN PRIORITY LEVEL CONFIGURATIONS
    Global Default: NCS=20
    Workload High: NCS=40
    Global Default: Precedence=9900
  24. RECAP

    APF is a mechanism to protect API servers against overload.
    Before tweaking APF settings, consider fixing your workload first.
    When getting frequent 429 errors due to flow control, review APF-related metrics.
  25. REFERENCES

    KEP-1040 Priority and Fairness (Feb 11): https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness
    Official Document: Flow Control (v1.29): https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
    Kubernetes API Reference (v1.29 FlowSchema/PriorityLevelConfiguration v1): https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/
    Codebase (b041969): https://github.com/kubernetes/apiserver/tree/release-1.29/pkg/util/flowcontrol
    API Priority and Fairness by Containers from the Couch - YouTube: https://www.youtube.com/watch?v=YnPPHBawhE0
    Shuffle Sharding: Massive and Magical Fault Isolation: https://aws.amazon.com/blogs/architecture/shuffle-sharding-massive-and-magical-fault-isolation/