
Balancing Kubernetes performance, resilience & cost by using ML-based optimization – a real-world case

Stefano Doni
January 12, 2024

Properly tuning Kubernetes microservice applications is a daunting task even for experienced Performance Engineers and SREs, often resulting in companies facing reliability and performance issues, as well as unexpected costs.

In this session, we first explain Kubernetes resource management and autoscaling mechanisms, and why properly setting pod resources and autoscaling policies is critical to avoid over-provisioning and unexpected impacts on the bottom line.

We discuss a real-world case of a digital provider of accounting & invoicing services for SMB clients. We demonstrate how ML-based optimization techniques allowed the SRE team and service architects to automatically tune the pod configuration and dramatically reduce the associated Kubernetes cost. We also describe the results of incorporating resilience-related objectives into the optimization goal.
Finally, we propose a general approach to tuning pods and autoscaling policies for Kubernetes applications.

Presented at CMG IMPACT 2022 conference:
https://www.cmg.org/2021/11/balancing-kubernetes-performance-resilience-cost/


Transcript

  1. © 2021 Akamas • All Rights Reserved • Confidential
     Balancing Kubernetes performance, resilience & cost by using ML-based optimization – a real-world case
     CMG IMPACT 2022
     Stefano Doni (Akamas)
  2. Agenda
     • Kubernetes benefits & challenges
     • The Kubernetes optimization challenge
     • ML-powered optimization: a real-world case
     • Key takeaways
     Stefano Doni, CTO at Akamas. 15+ years in performance engineering. 2015 CMG Best Paper Award winner.
  3. 90%+ of new apps run in Kubernetes containers
     Source: CNCF Survey 2020 (https://www.cncf.io/wp-content/uploads/2020/12/CNCF_Survey_Report_2020.pdf)
  4. Key Kubernetes benefits …
     Source: Portworx Adoption Survey 2021 (https://www.purestorage.com/content/dam/pdf/en/analyst-reports/ar-portworx-pure-storage-2021-kubernetes-adoption-survey.pdf)
  5. … with some performance & reliability issues …
     Kubernetes Failure Stories (https://k8s.af)
     • Performance issues (high latency, CPU throttling): Airbnb, Buffer, Omio, Zalando, Civis Analytics, Target, Adevinta, Algolia
     • Stability issues (OOM kills): Airbnb, Blue Matador, Zalando, Datadog, NU.nl, Yahoo, Nordstrom
     https://youtu.be/QXApVwRBeys
     https://www.youtube.com/watch?v=4CT0cI62YHk
  6. … and growing Kubernetes-related costs
     Source: Kubernetes FinOps Report, June 2021 (https://www.cncf.io/wp-content/uploads/2021/06/FINOPS_Kubernetes_Report.pdf)
     "For enterprises and startups alike, cloud and Kubernetes-related bills are going up. Over the past year, 68% of respondents reported Kubernetes costs on the uptick; just 12% have lowered their Kubernetes expenses, while 20% have managed to keep costs more or less constant. Among those whose spend increased, half saw it jump more than 20% during the year."
  7. Fact #1: Resource requests determine K8s cluster capacity
     • Requests are the resources a container is guaranteed to get
     • Cluster capacity is based on pod resource requests - there is no overcommitment!
     • Resource requests are not the same as utilization: a cluster can be full even if utilization is 1%
     Figure: a node with 4 CPUs and 8 GB of memory hosting Pod A (requests: 2 cores, 2 GB of memory) and Pod B; capacity is counted against requests, not actual usage. The resource requests come from the pod manifest:

     apiVersion: v1
     kind: Pod
     metadata:
       name: pod-a
     spec:
       containers:
       - name: app
         image: nginx:1.1
         resources:
           requests:
             memory: "2Gi"
             cpu: "2"
  8. Fact #2: Resource limits may strongly impact application performance and stability
     • A container can consume more resources than it has requested
     • Resource limits let you specify the maximum resources a container can use (e.g. CPU = 2)
     • When a container hits its resource limits, bad things can happen:
       ◦ hitting the CPU limit: K8s throttles the container's CPU -> application performance slowdown
       ◦ hitting the memory limit: K8s kills the container -> application stability issues
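As a minimal sketch of the facts above (the pod name, image and values are illustrative, not taken from the case study), requests and limits sit side by side in the pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.1
    resources:
      requests:            # guaranteed; counted against cluster capacity
        cpu: "1"
        memory: "2Gi"
      limits:              # hard caps: CPU usage is throttled,
        cpu: "2"           # exceeding the memory limit gets the container OOM-killed
        memory: "2Gi"
```

Note that in this sketch the memory limit equals the memory request, a commonly recommended pattern to avoid node-level memory overcommit surprises.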
  9. Fact #3: CPU limits may disrupt service performance, even when CPU used << limit
     • Limits restrict CPU speed even at very low CPU usage (~30%) (Source: Akamas Research)
     • 22x faster services with no CPU limits (Buffer Inc.)
     https://erickhun.com/posts/kubernetes-faster-services-no-cpu-limits
     (chart: service latency over time, showing CPU slowdown with a CPU limit vs. no CPU limit)
  10. Fact #4: Setting resource requests and limits is required to ensure Kubernetes stability
     "While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow" (Google, Kubernetes best practices)
     https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
  11. Fact #5: Kubernetes autoscalers do not address service reliability and cost efficiency
     SLO: response time (p90) < 16ms
     1. With VPA off, the container looks over-provisioned (CPU util < 50%)
     2. Turning VPA on reduces resource requests (& limits)
     3. VPA causes the service reliability SLO to fail
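The failure mode above follows from how the Vertical Pod Autoscaler works: it resizes requests from observed utilization, with no notion of a latency SLO. A minimal sketch of a VPA object (names are hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app              # hypothetical target workload
  updatePolicy:
    updateMode: "Auto"     # VPA rewrites pod requests based on observed usage
```

Because the recommendation is driven purely by resource usage, a service that looks under-utilized (e.g. CPU below 50%) gets its requests shrunk even when that extra headroom is exactly what keeps p90 latency within the SLO.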
  12. A real-world case
     European leader in accounting, payroll & business management software
     1.7M users, 400M invoices / year
     • Digital service apps, with frequent updates dictated by business & regulatory compliance
     • Runs on Azure Kubernetes Service (AKS) and AWS Elastic Kubernetes Service (EKS)
     • Focus of the optimization: a B2B stateless authorization service
  13. Optimization goals & constraints
     Target: the B2B stateless authorization service on Azure Kubernetes Service (AKS), costed via Azure Container Instances pricing
     GOAL: MINIMIZE application_cost
     CONSTRAINTS:
       transaction_throughput > baseline - 10% AND
       transaction_error_rate < baseline + 10% AND
       transaction_response_time < baseline + 10%
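A goal of this shape might be encoded along the following lines in an optimization study definition. This is a hypothetical schema for illustration only, not the actual file format of Akamas or any other tool:

```yaml
# Hypothetical study definition; all field names are illustrative.
study:
  goal:
    objective: minimize
    metric: application_cost        # derived from Azure Container Instances pricing
  constraints:                      # mirror the slide's SLO guardrails
    - "transaction_throughput > baseline - 10%"
    - "transaction_error_rate < baseline + 10%"
    - "transaction_response_time < baseline + 10%"
```

The key design point is that cost is the objective while throughput, error rate and response time act only as guardrails relative to the measured baseline, so the optimizer may trade raw performance for cost as long as it stays within those bounds.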
  14. Optimization parameters
     9 tunable parameters on the B2B stateless authorization service
  15. High-level optimization architecture & process
     An automated workflow drives the target system (the B2B stateless authorization service):
     1. Apply configuration
     2. Apply workload
     3. Collect KPIs
     4. Score vs goal
     5. AI-powered optimization (produces the next configuration to try)
  16. Optimization results & decision support
     (charts comparing candidate configurations, e.g. Configuration #17 and Configuration #19, on cost, response time, throughput and replicas)
  17. Load testing scenario & script
     The load scenario was designed to replicate the daily behavior in a 30m time window:
     • ramp-up [3m]
     • steady state with 150 users and ~1200 requests/s [~12m], corresponding to the productive hours of the day
     • steady state with 8 users and ~65 requests/s [~14m], corresponding to out-of-working hours
     The testing script was also designed to respect the user distribution (from a dataset composed of 20K unique credentials) and the API call distribution (as calculated from production log analysis), with each API call delayed from the previous one by a random pause (think time) between 250ms and 750ms.
  18. Baseline configuration
     POD: requests = 1.5 cores / 3.42 GB; limits = 2 cores / 4.39 GB
     JVM: min heap = 0.51 GB; max heap = 4 GB
  19. Autoscaling settings
     • Scalers: CPU and memory
     • Triggering thresholds: CPU 70% average; memory 90% average
     • Evaluation periods: scale out 1 minute; scale in 5 minutes
     With requests of 1.5 cores / 3.42 GB per pod, a scale-out triggers at 1.05 cores (70%) and 3.1 GB (90%) of average usage.
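The deck does not show the autoscaler manifest; as a sketch, the policies above could be expressed with a standard HorizontalPodAutoscaler v2 (object names and the maxReplicas cap are assumptions, and the evaluation periods are mapped onto stabilization windows):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: auth-service-hpa       # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: auth-service         # hypothetical workload
  minReplicas: 1
  maxReplicas: 4               # assumption: cap not stated in the deck
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # 70% of CPU requests (1.05 cores)
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90   # 90% of memory requests (~3.1 GB)
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # "out: 1 minute"
    scaleDown:
      stabilizationWindowSeconds: 300   # "in: 5 minutes"
```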
  20. Baseline configuration - behaviour
     1. A response time peak breaches the service reliability SLO, due to high CPU usage and throttling during the scale-out (JVM startup)
     2. When load drops, the number of replicas does not scale down despite low resource (CPU) use - this clearly impacts the cloud bill
  21. AI-driven results: best (lowest cost) configuration
     Configuration #34 (found after 19h): -49.1% cost vs baseline
  22. Best vs baseline confs - behaviour
     1. Autoscaling is not triggered - the full load is sustained by 1 replica
     2. Response time is always within the SLO - no peaks
  23. Best vs baseline confs - analysis
     BASELINE:           pod requests = 1.5 cores / 3.42 GB; limits = 2 cores / 4.39 GB; JVM heap = 0.51-4 GB
     BEST (LOWEST COST): pod requests = 2.77 cores / 5.08 GB; limits = 3.67 cores / 5.16 GB; JVM heap = 4.36-4.76 GB
     A larger pod (higher fixed cost but less scaling), with container & runtime configurations aligned.
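Rendered as a pod spec, the best configuration would look roughly like this. The pod/container names, image, and the way the JVM flags are passed are assumptions; heap sizes are approximated in MiB:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: auth-service              # hypothetical name
spec:
  containers:
  - name: app
    image: auth-service:latest    # hypothetical image
    env:
    - name: JAVA_TOOL_OPTIONS     # one common way to pass JVM flags
      value: "-Xms4464m -Xmx4874m"   # ~4.36 GB min / ~4.76 GB max heap
    resources:
      requests:
        cpu: "2770m"              # 2.77 cores
        memory: "5202Mi"          # ~5.08 GB
      limits:
        cpu: "3670m"              # 3.67 cores
        memory: "5284Mi"          # ~5.16 GB
```

Note how the min heap is raised close to the max heap and the max heap fits under the memory limit: the "container & runtime aligned conf" the slide mentions means the JVM sizing and the pod sizing are tuned together rather than independently.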
  24. AI-driven results: best reliability configuration
     Configuration #14 (found after 8h): -15.9% cost vs baseline, with high reliability
  25. AI-driven results: high reliability conf - behaviour
     1. The response time peak is 2x lower
     2. After the peak, replicas are scaled back to 1
  26. High resilience vs baseline confs - analysis
     BASELINE:        pod requests = 1.5 cores / 3.42 GB; limits = 2 cores / 4.39 GB; JVM heap = 0.51-4 GB
     HIGH RESILIENCE: pod requests = 1.17 cores / 5.6 GB; limits = 3.7 cores / 5.69 GB; JVM heap = 1.94-3.45 GB
     Higher memory requests and lower CPU requests (but higher limits) than the baseline.
  27. Key takeaways
     The complexity of modern applications and delivery practices requires a new approach:
     • Full stack: the interplay between the different application layers and technologies requires tuning the full-stack configuration, to make sure that both the optimization goals and the SLOs are met
     • ML-powered: the vastness of the configuration space can only be effectively explored by leveraging automated ML-based methods, capable of converging to an optimal configuration within hours
     • Continuous: the dynamic nature of applications under varying workloads and releases requires continuous performance tuning, in addition to utilization-based scaling
  28. Contacts
     [email protected] • @AkamasLabs • akamas.io
     Italy HQ: Via Schiaffino 11, Milan 20158, +39-02-4951-7001
     USA East: 211 Congress Street, Boston, MA 02110, +1-617-936-0212
     USA West: 12130 Millennium Drive, Los Angeles, CA 90094, +1-323-524-0524
     Singapore: 5 Temasek Blvd, Singapore 038985