Beyond default settings: Optimizing Java on K8s with AI-driven performance tuning

© 2026 Akamas • All Rights Reserved • Confidential
Stefano Doni, Akamas Co-founder & CTO
DevNexus 2026, Atlanta

A trip to Java

Common misconceptions: JVM tunes itself

Common misconceptions: heap size is enough

Common misconceptions: the JVM is enough

The Java on K8s optimization stack
[Diagram: stack layers Applications, Pods, Clusters, Cloud infrastructure; optimization areas Application QoS, JVM, Pod Horizontal Scaling, Pod Vertical Scaling, Node Scaling, Cloud Instance, Cloud Pricing]

Top Java performance on K8s challenges
https://akamas.io/resources/the-state-of-java-on-kubernetes-2026-why-defaults-are-killing-your-performance

Problem #1: Performance & (auto) scaling

The Java + HPA autoscaling challenge
Java apps run slower and use significantly more CPU during the initial startup or warm-up phase.
This can cause:
• Apps violating SLOs
• HPA over-scaling and replica saturation
• CPU spikes, causing noisy neighbours or even node stability issues
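
The warm-up effect can be observed with a small experiment. The sketch below is an illustration under assumptions, not a benchmark: the `WarmupDemo` class, its workload, and the batch sizes are made up for this example, and the measured times depend on the JVM, the hardware, and the flags in use. On a typical HotSpot JVM the first batches tend to take longer than the later ones, once the JIT compiler has compiled the hot method.

```java
import java.util.Arrays;

// Hypothetical demo: times repeated batches of identical work so that early
// batches (interpreted or lightly compiled) can be compared with later ones.
final class WarmupDemo {

    // Arbitrary CPU-bound work; the result is returned so it cannot be optimized away.
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (acc ^ i) % 7 + Integer.bitCount(i);
        }
        return acc;
    }

    // Runs `batches` batches of `callsPerBatch` calls and returns the elapsed nanoseconds per batch.
    static long[] timeBatches(int batches, int callsPerBatch, int n) {
        long[] nanos = new long[batches];
        long sink = 0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += work(n);
            }
            nanos[b] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println(sink); // keeps `sink` observable
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 200, 2_000);
        System.out.println("per-batch time (ns): " + Arrays.toString(nanos));
    }
}
```

Running it with different JIT settings (for example with compilation stopped at a lower tier) is one way to see how the warm-up curve changes.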

HPA going crazy with Java Spring Petclinic

Java warmup performance & CPU spikes

JVM JIT compiler 101
• The JIT compiler compiles bytecode to native code for frequently executed methods ("hotspots")
• The JVM provides two compilers: C1 (client) and C2 (server)
• JIT compilers use CPU/memory to do their work
• Trade-off: code speed vs resource usage vs app runtime
• Key JVM configs
  ◦ -XX:TieredStopAtLevel=N
  ◦ -XX:CompileThresholdScaling=N
  ◦ -XX:+/-TieredCompilation
  ◦ -XX:CICompilerCount=N
Compilation levels:
  0: Interpreter
  1: C1, no profiling
  2: C1, limited profiling
  3: C1, full profiling
  4: C2
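
To check which JIT settings a running JVM actually uses, the flags listed above can be read at runtime. This is a minimal sketch, assuming a HotSpot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the `JitFlagsReport` class and its `flagValue` helper are names invented for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the effective values of the JIT flags named on the slide.
final class JitFlagsReport {

    static final String[] FLAGS = {
        "TieredCompilation", "TieredStopAtLevel",
        "CompileThresholdScaling", "CICompilerCount"
    };

    // Returns the current value of a HotSpot flag, or "n/a" if this JVM does not expose it.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return hotspot == null ? "n/a" : hotspot.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return "n/a"; // unknown flag or bean not available
        }
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.println("-XX:" + flag + " = " + flagValue(flag));
        }
        // Total time spent in JIT compilation so far, when the JVM reports it.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " total compile time (ms): "
                + jit.getTotalCompilationTime());
        }
    }
}
```

Comparing the reported compile time shortly after startup and again later gives a rough idea of how much JIT work happens during warm-up.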

K8s HPA scaling 101
• The Horizontal Pod Autoscaler (HPA) adjusts the number of pod replicas
• The scaling decision is based on a metric and a threshold
  ◦ Example: CPU utilization > 50% of CPU requests
• Traffic is forwarded to new replicas once they are ready (pod probes)
• It's reactive: it can be slow to react to sudden peaks
• Key HPA configs
  ◦ HPA scaling metric & threshold
  ◦ Pod requests & limits
  ◦ Pod readiness/liveness/startup probes
[Diagram: Deployment + HPA managing a set of pod replicas]
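
The scaling decision can be sketched with the replica formula from the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (10% by default). The `HpaMath` class below is a simplified illustration only. It ignores not-ready pods, missing metrics, stabilization windows, and min/max replica bounds, all of which the real controller handles.

```java
// Hypothetical sketch of the core HPA replica calculation.
final class HpaMath {

    // Default tolerance: no scaling when current/target is within 10% of 1.0.
    static final double TOLERANCE = 0.10;

    // Utilization values are percentages of the pod CPU requests.
    static int desiredReplicas(int currentReplicas, double currentUtilPct, double targetUtilPct) {
        double ratio = currentUtilPct / targetUtilPct;
        if (Math.abs(ratio - 1.0) <= TOLERANCE) {
            return currentReplicas;
        }
        return (int) Math.ceil(currentReplicas * ratio);
    }

    public static void main(String[] args) {
        // Warm-up example: 4 replicas at 90% CPU against a 50% target.
        System.out.println(desiredReplicas(4, 90, 50)); // prints 8
        // Steady state close to the target: no change.
        System.out.println(desiredReplicas(4, 52, 50)); // prints 4
    }
}
```

Because utilization is measured against CPU requests, a CPU spike during JVM warm-up feeds directly into this ratio, which is how new replicas can trigger further scale-outs.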

Problem #2: Resource (cost) efficiency

A mental model for full-stack K8s efficiency
The 3 K8s efficiency metrics:
• Cluster scaling efficiency
• Workload scaling efficiency
• Application runtime efficiency
[Chart: resources (CPU, mem) over time, with levels for Allocatable, Requests, App demand, and Used]

heap sizing is a long-standing problem

Why is heap size tuning important?
• The JVM tends to use all of the memory it has been configured with
• Sizing based on K8s container memory usage is going to miss a lot of savings
• Experiment with JVM max heap size to see how much you can save - while monitoring app performance!
[Chart: "JVM uses all of the available memory": JVM heap used, JVM max heap, and app response time; labels 2 GiB, 1.2 GiB, 40% Mem used]
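
To see the gap between what the heap holds and what the JVM has been allowed to take, the standard `MemoryMXBean` can be queried from inside the application. This is a minimal sketch; the `HeapReport` class and its `usedFraction` helper are invented for this example, and a single reading is only a snapshot (used heap fluctuates between garbage collections).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports used, committed, and max heap of the running JVM.
final class HeapReport {

    static final long MIB = 1024L * 1024L;

    // Fraction of the max heap currently used; -1 when the max is undefined.
    static double usedFraction(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? -1.0 : (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        String max = heap.getMax() < 0 ? "undefined" : String.valueOf(heap.getMax() / MIB);
        System.out.println("heap used (MiB):      " + heap.getUsed() / MIB);
        System.out.println("heap committed (MiB): " + heap.getCommitted() / MIB);
        System.out.println("heap max (MiB):       " + max);
        System.out.println("used / max:           " + usedFraction(heap.getUsed(), heap.getMax()));
    }
}
```

Container-level memory metrics reflect what the JVM has committed, not what the application needs, which is why experiments with the max heap size are suggested above.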

How do people really set heap size?
https://akamas.io/resources/the-state-of-java-on-kubernetes-2026-why-defaults-are-killing-your-performance

JVM ergonomics in K8s: heap memory sizing
Source: Microsoft
• MaxRAMPercentage default is very conservative: increase it to use all the requested memory of your pod
• Watch out for out of memory kills by K8s - the JVM allocates off-heap memory in addition to the heap
• Do not trust JVM ergonomics: it's best to explicitly set JVM flags to avoid surprises
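
The effect of `-XX:MaxRAMPercentage` can be worked through with simple arithmetic. The sketch below assumes the JDK default of 25% and a container large enough that this percentage applies (very small containers follow other rules). The `HeapSizing` class, the 2048 MiB container, and the 75% alternative are illustrative values chosen for this example, not recommendations.

```java
// Hypothetical sketch: max heap and off-heap headroom for a given MaxRAMPercentage.
final class HeapSizing {

    // Default value of -XX:MaxRAMPercentage in current JDKs.
    static final double DEFAULT_MAX_RAM_PERCENTAGE = 25.0;

    // Max heap the JVM would pick for a container of the given size.
    static long maxHeapMiB(long containerMemoryMiB, double maxRamPercentage) {
        return (long) (containerMemoryMiB * maxRamPercentage / 100.0);
    }

    // Memory left for off-heap use (metaspace, thread stacks, code cache, direct buffers).
    static long offHeapHeadroomMiB(long containerMemoryMiB, double maxRamPercentage) {
        return containerMemoryMiB - maxHeapMiB(containerMemoryMiB, maxRamPercentage);
    }

    public static void main(String[] args) {
        long containerMiB = 2048; // assumed pod memory limit
        System.out.println(maxHeapMiB(containerMiB, DEFAULT_MAX_RAM_PERCENTAGE)); // prints 512
        System.out.println(maxHeapMiB(containerMiB, 75.0));                       // prints 1536
        System.out.println(offHeapHeadroomMiB(containerMiB, 75.0));               // prints 512
        // Max heap actually selected by the JVM running this program:
        System.out.println(Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

The headroom figure is the part to watch: if off-heap usage exceeds it, the container can be OOM-killed even though the heap itself is within its limit.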

OpenJDK garbage collectors
Collector name: Best for
• Serial: Memory footprint
• Parallel: Throughput
• G1: Balanced throughput - performance
• Shenandoah: Low latency
• ZGC: Low latency
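
Each collector in the table is selected with its own HotSpot flag. The mapping below is a small reference sketch; the `GcFlags` class and its `Collector` enum are names invented for this example, the "best for" text is copied from the table, and Shenandoah is not included in every JDK build.

```java
// Hypothetical reference: the selection flag for each collector in the table.
final class GcFlags {

    enum Collector {
        SERIAL("-XX:+UseSerialGC", "Memory footprint"),
        PARALLEL("-XX:+UseParallelGC", "Throughput"),
        G1("-XX:+UseG1GC", "Balanced throughput - performance"),
        SHENANDOAH("-XX:+UseShenandoahGC", "Low latency"),
        ZGC("-XX:+UseZGC", "Low latency");

        final String flag;
        final String bestFor;

        Collector(String flag, String bestFor) {
            this.flag = flag;
            this.bestFor = bestFor;
        }
    }

    public static void main(String[] args) {
        for (Collector c : Collector.values()) {
            System.out.println(c + ": " + c.flag + " (" + c.bestFor + ")");
        }
    }
}
```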

Choose your GC carefully, trade-offs apply
• Serial is 10% slower, but very efficient on memory and CPU
• Parallel is 22% faster, while also very efficient on memory (-31%)
• ZGC and Shenandoah are significantly slower and use more resources
https://shipilev.net/jvm/anatomy-quarks/21-heap-uncommit
https://akamas.io/resources/right-app-gc-maximum-performance

JVM default ergonomics in K8s: GC
[Chart: default collector by number of CPUs and memory (MB): Serial GC vs G1 GC, with a boundary at 1791 MB]
• Default GC selection is based on hard-coded thresholds defined decades ago
• You may end up paying the cost of a suboptimal GC, and you may not even know it!
• Other good collectors like Parallel GC are not considered
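
The default selection can be sketched as a simple threshold rule. This is an approximation under assumptions: it treats the 1791 MB label on the chart as the largest memory size that still gets Serial GC and uses a 2-CPU minimum for G1. The `GcErgonomics` class is invented for this example and is not the JVM's actual selection code. The `main` method also prints the collectors the running JVM really uses, via the standard `GarbageCollectorMXBean` API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch of the default GC choice as a CPU/memory threshold rule.
final class GcErgonomics {

    static final int MIN_CPUS_FOR_G1 = 2;            // assumption
    static final long MAX_MEMORY_MB_FOR_SERIAL = 1791; // assumption, read off the chart

    // Returns the collector the ergonomics are expected to pick when no GC flag is set.
    static String expectedDefaultGc(int cpus, long memoryMb) {
        boolean serverClass = cpus >= MIN_CPUS_FOR_G1 && memoryMb > MAX_MEMORY_MB_FOR_SERIAL;
        return serverClass ? "G1" : "Serial";
    }

    public static void main(String[] args) {
        System.out.println(expectedDefaultGc(1, 4096)); // prints Serial
        System.out.println(expectedDefaultGc(2, 1024)); // prints Serial
        System.out.println(expectedDefaultGc(2, 2048)); // prints G1
        // Collectors actually active in this JVM:
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("active collector: " + gc.getName());
        }
    }
}
```

Under this rule a pod with a 1-CPU limit gets Serial GC regardless of memory, which is the kind of silent choice the slide warns about.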

on K8s: lessons learned
• JVM configuration drives your application performance and efficiency, not your code
• Default values may be far from optimal
• Don't trust JVM ergonomics on K8s, always choose your heap and GC
• JVM resource management is deeply interdependent with K8s pod settings and HPA behaviour
• Proper JVM and K8s configurations can fix the biggest performance & efficiency issues

How performance tuning is done today
Today's approach: manual, slow, requires full-stack skills, doesn't scale, reactive
[Diagram: Developer, SRE / DevOps, and a K8s/JVM app; steps: performance problem, analyzes and recommends new config, validates new config vs requirements]

What if we could automate that?
New approach: automated, requires low skills & effort, scales to big environments, proactive
[Diagram: Developers and SRE / DevOps, many K8s/JVM apps, and a JVM & K8s automated optimization platform]

How to automate tuning?

AI-powered optimization architecture
[Diagram labels: Application Telemetry, AI Optimization Engine, Full Stack Performance Models, Reinforcement Learning, Tuning Profiles, DEV, Define the goals, Optimization Opportunities, Informed Decisions, GitOps Pipelines, Open a PR, Human In the Loop, Review and Merge, Tuned App]

Key optimization capabilities
• Goal-driven (cost & performance) + constraints
• Full-stack, application-aware
• Fast convergence
• Automated optimization with human-in-the-loop controls
• Explainable & deterministic
• Integrates with observability
• Safe, high confidence of changes (must be deployed in prod, can't learn from failures)
• UX

Efficiency optimization: +28% throughput & meeting SLOs
• Baseline configuration: peak throughput matching SLO 74 TPS
• Best configuration: peak throughput matching SLO 95 TPS (+28%)
• SLO breaking at 100 ms

Takeaways
• Default JVM settings on Kubernetes are often suboptimal for performance and cost
• Application performance & efficiency is primarily driven by JVM and K8s configuration
• Manual approaches don't work anymore in the new cloud-native world
• AI-driven performance tuning automates optimization for better cost-performance trade-offs
