
Balancing Kubernetes performance, resilience & cost by using ML-based optimization – a real-world case

Stefano Doni
January 12, 2024

Properly tuning Kubernetes microservice applications is a daunting task even for experienced Performance Engineers and SREs, often resulting in companies facing reliability and performance issues, as well as unexpected costs.

In this session, we first explain Kubernetes resource management and autoscaling mechanisms, and why properly setting pod resources and autoscaling policies is critical to avoid over-provisioning and unexpected impacts on the bottom line.

We discuss a real-world case of a digital provider of accounting & invoicing services for SMB clients. We demonstrate how ML-based optimization techniques allowed the SRE team and service architects to automatically tune the pod configuration and dramatically reduce the associated Kubernetes cost. We also describe the results of incorporating resilience-related objectives into the optimization goal.
Finally, we propose a general approach to tuning pods and autoscaling policies for Kubernetes applications.

Presented at CMG IMPACT 2022 conference:
https://www.cmg.org/2021/11/balancing-kubernetes-performance-resilience-cost/


Transcript

  1. © 2021 Akamas • All Rights Reserved • Confidential
     Balancing Kubernetes performance, resilience & cost by using ML-based optimization – a real-world case
     CMG IMPACT 2022
     Stefano Doni (Akamas)
  2. Agenda
     • Kubernetes benefits & challenges
     • The Kubernetes optimization challenge
     • ML-powered optimization: a real-world case
     • Key takeaways
     Stefano Doni, CTO at Akamas. 15+ years in performance engineering. 2015 CMG Best Paper Award winner.
  3. 90%+ of new apps run in Kubernetes containers
     Source: CNCF Survey 2020 (https://www.cncf.io/wp-content/uploads/2020/12/CNCF_Survey_Report_2020.pdf)
  4. Key Kubernetes benefits …
     Source: Portworx Adoption Survey 2021 (https://www.purestorage.com/content/dam/pdf/en/analyst-reports/ar-portworx-pure-storage-2021-kubernetes-adoption-survey.pdf)
  5. … with some performance & reliability issues …
     Kubernetes Failure Stories (https://k8s.af)
     • Performance issues (high latency, CPU throttling): Airbnb, Buffer, Omio, Zalando, Civis Analytics, Target, Adevinta, Algolia
     • Stability issues (OOM kills): Airbnb, Blue Matador, Zalando, Datadog, NU.nl, Yahoo, Nordstrom
     https://youtu.be/QXApVwRBeys
     https://www.youtube.com/watch?v=4CT0cI62YHk
  6. … and growing Kubernetes-related costs
     Source: Kubernetes FinOps Report, June 2021 (https://www.cncf.io/wp-content/uploads/2021/06/FINOPS_Kubernetes_Report.pdf)
     "For enterprises and startups alike, cloud and Kubernetes-related bills are going up. Over the past year, 68% of respondents reported Kubernetes costs on the uptick; just 12% have lowered their Kubernetes expenses, while 20% have managed to keep costs more or less constant. Among those whose spend increased, half saw it jump more than 20% during the year."
  7. Fact #1: Resource requests determine K8s cluster capacity
     • Requests are the resources a container is guaranteed to get
     • Cluster capacity is based on pod resource requests - there is no overcommitment!
     • Resource requests are not the same as utilization: a cluster can be full even if utilization is 1%
     Figure: a node with 4 CPUs and 8 GB of memory hosting Pod A (requests: 2 cores, 2 GB of memory) and Pod B; capacity is counted against requests, not actual usage. The resource requests come from the pod manifest:

     apiVersion: v1
     kind: Pod
     metadata:
       name: pod-a
     spec:
       containers:
       - name: app
         image: nginx:1.1
         resources:
           requests:
             memory: "2Gi"
             cpu: "2"
  8. Fact #2: Resource limits may strongly impact application performance and stability
     • A container can consume more resources than it has requested
     • Resource limits let you specify the maximum resources a container can use (e.g. CPU = 2)
     • When a container hits its resource limits, bad things can happen:
       ◦ hitting the CPU limit: K8s throttles the container's CPU -> application performance slowdown
       ◦ hitting the memory limit: K8s kills the container -> application stability issues
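As a minimal sketch of the facts above (the pod name, image and values are illustrative, not taken from the case study), requests and limits sit side by side in the pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.1
    resources:
      requests:            # guaranteed; counted against cluster capacity
        cpu: "1"
        memory: "2Gi"
      limits:              # hard caps: CPU usage is throttled,
        cpu: "2"           # exceeding the memory limit gets the container OOM-killed
        memory: "2Gi"
```

Note that in this sketch the memory limit equals the memory request, a commonly recommended pattern to avoid node-level memory overcommit surprises.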
  9. Fact #3: CPU limits may disrupt service performance, even when CPU used << limit
     • Limits restrict CPU speed even at very low CPU usage (~30%) (Source: Akamas Research)
     • 22x faster services with no CPU limits (Buffer Inc.)
     https://erickhun.com/posts/kubernetes-faster-services-no-cpu-limits
     (chart: service latency over time, showing CPU slowdown with a CPU limit vs. no CPU limit)
  10. Fact #4: Setting resource requests and limits is required to ensure Kubernetes stability
     "While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow" (Google, Kubernetes best practices)
     https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
  11. Fact #5: Kubernetes autoscalers do not address service reliability and cost efficiency
     SLO: response time (p90) < 16ms
     1. With VPA off, the container looks over-provisioned (CPU util < 50%)
     2. Turning VPA on reduces resource requests (& limits)
     3. VPA causes the service reliability SLO to fail
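The failure mode above follows from how the Vertical Pod Autoscaler works: it resizes requests from observed utilization, with no notion of a latency SLO. A minimal sketch of a VPA object (names are hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app              # hypothetical target workload
  updatePolicy:
    updateMode: "Auto"     # VPA rewrites pod requests based on observed usage
```

Because the recommendation is driven purely by resource usage, a service that looks under-utilized (e.g. CPU below 50%) gets its requests shrunk even when that extra headroom is exactly what keeps p90 latency within the SLO.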
  12. A real-world case
     European leader in accounting, payroll & business management software
     1.7M users, 400M invoices / year
     • Digital service apps, with frequent updates dictated by business & regulatory compliance
     • Runs on Azure Kubernetes Service (AKS) and AWS Elastic Kubernetes Service (EKS)
     • Focus of the optimization: a B2B stateless authorization service
  13. Optimization goals & constraints
     Target: the B2B stateless authorization service on Azure Kubernetes Service (AKS), costed via Azure Container Instances pricing
     GOAL: MINIMIZE application_cost
     CONSTRAINTS:
       transaction_throughput > baseline - 10% AND
       transaction_error_rate < baseline + 10% AND
       transaction_response_time < baseline + 10%
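A goal of this shape might be encoded along the following lines in an optimization study definition. This is a hypothetical schema for illustration only, not the actual file format of Akamas or any other tool:

```yaml
# Hypothetical study definition; all field names are illustrative.
study:
  goal:
    objective: minimize
    metric: application_cost        # derived from Azure Container Instances pricing
  constraints:                      # mirror the slide's SLO guardrails
    - "transaction_throughput > baseline - 10%"
    - "transaction_error_rate < baseline + 10%"
    - "transaction_response_time < baseline + 10%"
```

The key design point is that cost is the objective while throughput, error rate and response time act only as guardrails relative to the measured baseline, so the optimizer may trade raw performance for cost as long as it stays within those bounds.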
  14. Optimization parameters
     9 tunable parameters on the B2B stateless authorization service
  15. High-level optimization architecture & process
     An automated workflow drives the target system (the B2B stateless authorization service):
     1. Apply configuration
     2. Apply workload
     3. Collect KPIs
     4. Score vs goal
     5. AI-powered optimization (produces the next configuration to try)
  16. Optimization results & decision support
     (charts comparing candidate configurations, e.g. Configuration #17 and Configuration #19, on cost, response time, throughput and replicas)
  17. Load testing scenario & script
     The load scenario was designed to replicate the daily behavior in a 30m time window:
     • ramp-up [3m]
     • steady state with 150 users and ~1200 requests/s [~12m], corresponding to the productive hours of the day
     • steady state with 8 users and ~65 requests/s [~14m], corresponding to out-of-working hours
     The testing script was also designed to respect the user distribution (from a dataset composed of 20K unique credentials) and the API call distribution (as calculated from production log analysis), with each API call delayed from the previous one by a random pause (think time) between 250ms and 750ms.
  18. Baseline configuration
     POD: requests = 1.5 cores / 3.42 GB; limits = 2 cores / 4.39 GB
     JVM: min heap = 0.51 GB; max heap = 4 GB
  19. Autoscaling settings
     • Scalers: CPU and memory
     • Triggering thresholds: CPU 70% average; memory 90% average
     • Evaluation periods: scale out 1 minute; scale in 5 minutes
     With requests of 1.5 cores / 3.42 GB per pod, a scale-out triggers at 1.05 cores (70%) and 3.1 GB (90%) of average usage.
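The deck does not show the autoscaler manifest; as a sketch, the policies above could be expressed with a standard HorizontalPodAutoscaler v2 (object names and the maxReplicas cap are assumptions, and the evaluation periods are mapped onto stabilization windows):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: auth-service-hpa       # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: auth-service         # hypothetical workload
  minReplicas: 1
  maxReplicas: 4               # assumption: cap not stated in the deck
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # 70% of CPU requests (1.05 cores)
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90   # 90% of memory requests (~3.1 GB)
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # "out: 1 minute"
    scaleDown:
      stabilizationWindowSeconds: 300   # "in: 5 minutes"
```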
  20. Baseline configuration - behaviour
     1. A response time peak breaches the service reliability SLO, due to high CPU usage and throttling during the scale-out (JVM startup)
     2. When load drops, the number of replicas does not scale down despite low resource (CPU) use - this clearly impacts the cloud bill
  21. AI-driven results: best (lowest cost) configuration
     Configuration #34 (found after 19h): -49.1% cost vs baseline
  22. Best vs baseline confs - behaviour
     1. Autoscaling is not triggered - the full load is sustained by 1 replica
     2. Response time is always within the SLO - no peaks
  23. Best vs baseline confs - analysis
     BASELINE:           pod requests = 1.5 cores / 3.42 GB; limits = 2 cores / 4.39 GB; JVM heap = 0.51-4 GB
     BEST (LOWEST COST): pod requests = 2.77 cores / 5.08 GB; limits = 3.67 cores / 5.16 GB; JVM heap = 4.36-4.76 GB
     A larger pod (higher fixed cost but less scaling), with container & runtime configurations aligned.
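Rendered as a pod spec, the best configuration would look roughly like this. The pod/container names, image, and the way the JVM flags are passed are assumptions; heap sizes are approximated in MiB:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: auth-service              # hypothetical name
spec:
  containers:
  - name: app
    image: auth-service:latest    # hypothetical image
    env:
    - name: JAVA_TOOL_OPTIONS     # one common way to pass JVM flags
      value: "-Xms4464m -Xmx4874m"   # ~4.36 GB min / ~4.76 GB max heap
    resources:
      requests:
        cpu: "2770m"              # 2.77 cores
        memory: "5202Mi"          # ~5.08 GB
      limits:
        cpu: "3670m"              # 3.67 cores
        memory: "5284Mi"          # ~5.16 GB
```

Note how the min heap is raised close to the max heap and the max heap fits under the memory limit: the "container & runtime aligned conf" the slide mentions means the JVM sizing and the pod sizing are tuned together rather than independently.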
  24. AI-driven results: best reliability configuration
     Configuration #14 (found after 8h): -15.9% cost vs baseline, with high reliability
  25. AI-driven results: high reliability conf - behaviour
     1. The response time peak is 2x lower
     2. After the peak, replicas are scaled back to 1
  26. High resilience vs baseline confs - analysis
     BASELINE:        pod requests = 1.5 cores / 3.42 GB; limits = 2 cores / 4.39 GB; JVM heap = 0.51-4 GB
     HIGH RESILIENCE: pod requests = 1.17 cores / 5.6 GB; limits = 3.7 cores / 5.69 GB; JVM heap = 1.94-3.45 GB
     Higher memory requests and lower CPU requests (but higher limits) than the baseline.
  27. Key takeaways
     The complexity of modern applications and delivery practices requires a new approach:
     • Full stack: the interplay between the different application layers and technologies requires tuning the full-stack configuration, to make sure that both the optimization goals and the SLOs are met
     • ML-powered: the vastness of the configuration space can only be effectively explored by leveraging automated ML-based methods, capable of converging to an optimal configuration within hours
     • Continuous: the dynamic nature of applications under varying workloads and releases requires continuous performance tuning, in addition to utilization-based scaling
  28. Contacts
     [email protected] • @AkamasLabs • akamas.io
     Italy HQ: Via Schiaffino 11, Milan 20158, +39-02-4951-7001
     USA East: 211 Congress Street, Boston, MA 02110, +1-617-936-0212
     USA West: 12130 Millennium Drive, Los Angeles, CA 90094, +1-323-524-0524
     Singapore: 5 Temasek Blvd, Singapore 038985