
Maximizing the Launch Reliability: Ensuring Stable Application Lift-off and Orbit on Kubernetes

@KubeCon + CloudNativeCon North America 2025

Launch reliability is crucial for applications on Kubernetes that frequently restart. However, applications often struggle to achieve optimal performance immediately after starting, failing to handle the load and experiencing startup failures. This lack of launch reliability can result in service downtime during rollouts and make the horizontal pod autoscaler ineffective. To prevent these issues, it's essential to tune the application and apply various practices within manifests.

This session will cover best practices for maximizing the launch reliability of applications on Kubernetes, including application tuning, appropriate resource allocation, health check settings, and techniques for automated warm-up. It will also explore the practical use of Kubernetes' recent in-place pod resize feature, particularly for CPU bursting at startup. By the end of the session, you will gain actionable insights to enhance application stability and achieve flexible scaling.

hhiroshell

November 12, 2025

Transcript

  1. KubeCon + CloudNativeCon North America 2025 2 About Me •

    Senior Platform Engineer @ LY Corporation • Contributing to CNCF Platform Engineering Community Group • Author of books on Kubernetes • DIY keyboard enthusiast 2 Hiroshi Hayakawa | @hhiroshell
  2. KubeCon + CloudNativeCon North America 2025 3 📝 Agenda 1.

    Introduction 2. Practices for Stable Application Launch a. Tune your application (container and application level) b. Make your application "truly" initialized before accepting user traffic 3. Recent Kubernetes Features for Launch Reliability 4. Conclusion 3
  3. KubeCon + CloudNativeCon North America 2025 4 📝 Agenda 1.

    Introduction 👈 2. Practices for Stable Application Launch a. Tune your application (container and application level) b. Make your application "truly" initialized before accepting user traffic 3. Recent Kubernetes Features for Launch Reliability 4. Conclusion 4
  4. KubeCon + CloudNativeCon North America 2025 🚀 Overview of Application

    Launch in Kubernetes 5 5 Pre-start process Containers start up 1. Scheduling • A new pod is registered with the kube-apiserver • The scheduler decides which node to place the pod 2. Preparation • The kubelet on the Node makes environments for containers in the pod 3. Lift off • The application process starts and performs a series of initialization procedures 4. In orbit • The application enters stable operation
  5. KubeCon + CloudNativeCon North America 2025 🚀 Overview of Application

    Launch in Kubernetes 6 6 Pre-start process Containers start up 1. Scheduling • A new pod is registered with the kube-apiserver • The scheduler decides which node to place the pod 2. Preparation • The kubelet on the Node makes environments for containers in the pod 3. Lift off • The application process starts and performs a series of initialization procedures ⚠ Launch failure Due to initialization issues or external factors, the application cannot reach orbit and crashes.
  6. KubeCon + CloudNativeCon North America 2025 7 Why Launch Reliability

    Matters • Hard to foresee ◦ Slight performance degradation or high traffic volume can cause launch failures • Can happen at any time ◦ In Kubernetes, applications are frequently restarted due to rolling updates, rescheduling, etc. 7 "It all looked so easy when you did it on paper -- where valves never froze, gyros never drifted, and rocket motors did not blow up in your face.” — Milton W. Rosen, rocket engineer, 1956.
  7. KubeCon + CloudNativeCon North America 2025 8 Why Is Application

    Launch Difficult? • There are various differences from applications in orbit ◦ Frequent GC caused by initialization process ◦ Cold cache ◦ Incomplete thread pool / connection pool initialization ◦ Incomplete class loading ◦ Insufficient JIT compilation ◦ ...etc 8
  8. KubeCon + CloudNativeCon North America 2025 9 Why Is Application

    Launch Difficult? • There are various differences from applications in orbit ◦ Frequent GC caused by initialization process ◦ Cold cache ◦ Incomplete thread pool / connection pool initialization ◦ Incomplete class loading ◦ Insufficient JIT compilation ◦ ...etc 👉 Applications immediately after launch generally have lower performance 👉 Let's get rid of these hindrances! 9
  9. KubeCon + CloudNativeCon North America 2025 10 📝 Agenda 1.

    Introduction 2. Practices for Stable Application Launch a. Tune your application (container and application level) 👈 b. Make your application "truly" initialized before accepting user traffic 3. Recent Kubernetes Features for Launch Reliability 4. Conclusion 10
  10. KubeCon + CloudNativeCon North America 2025 11 Application Tuning for

    Startup • Good tuning helps overcome some startup hindrances: ◦ Frequent GC caused by initialization process ◦ Incomplete class loading ◦ Insufficient JIT compilation 11
  11. KubeCon + CloudNativeCon North America 2025 12 Application Tuning for

    Startup • Good tuning helps overcome some startup hindrances: ◦ Frequent GC caused by initialization process ◦ Incomplete class loading ◦ Insufficient JIT compilation • Tune applications in two levels: ◦ Container level = resource requests / limits ◦ Application level = language runtime, framework, your code 12
  12. KubeCon + CloudNativeCon North America 2025 13 A Quick review

    of Container Level Resource Control
    • .spec.containers[].resources.requests.[cpu|memory] ◦ the quantity of resources guaranteed to be available to the container
    • .spec.containers[].resources.limits.[cpu|memory] ◦ the cap on resource consumption beyond requests
    [Diagram: resources.limits sits above resources.requests, and the container's actual resource consumption fluctuates between them]
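    For reference, a minimal container spec setting both fields might look like the sketch below; the image name and values are illustrative assumptions, not from the slides.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-app                                  # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/sample-app:1.0    # hypothetical image
      resources:
        requests:            # guaranteed to the container; used for scheduling
          cpu: "500m"
          memory: "1Gi"
        limits:              # consumption beyond requests is capped here
          cpu: "2"
          memory: "1Gi"
```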
  13. KubeCon + CloudNativeCon North America 2025 14 Tuning for Stable

    Launch - Container Level ✅ Allocate higher CPU / memory limits to allow resource bursting at startup • Whether to raise requests as well depends on several trade-offs ◦ requests = limits -> reduces resource utilization efficiency ◦ requests < limits -> need to consider the QoS Class for stability [Chart: resource consumption spikes toward the resource limits right after startup, then settles]
  14. KubeCon + CloudNativeCon North America 2025 15 QoS (Quality of

    Service) Class • Pod eviction priority is influenced by the QoS Class • Guaranteed: ◦ All containers have CPU and memory limits and requests set, and limits equal requests ◦ Setting only limits automatically sets requests to the same value, so specifying only CPU and memory limits also results in the Guaranteed class • Burstable: ◦ Applies when neither the Guaranteed nor BestEffort conditions are met • BestEffort: ◦ No container has any limits or requests set [Scale: Guaranteed is less likely to be evicted, BestEffort is more likely to be evicted]
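    Putting the two previous slides together, a hedged sketch of a Burstable container that leaves CPU headroom for the startup burst (values are illustrative):

```yaml
# requests < limits for CPU gives headroom for the launch burst,
# but the pod drops from Guaranteed to Burstable QoS.
resources:
  requests:
    cpu: "1"          # what the scheduler reserves on the node
    memory: "2Gi"
  limits:
    cpu: "4"          # extra headroom, used mainly right after startup
    memory: "2Gi"     # memory requests == limits keeps usage within what was reserved
```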
  15. KubeCon + CloudNativeCon North America 2025 16 In our production

    application… • Increasing CPU limits may eliminate latency degradation during rolling updates [Charts: with CPU limits = 4, latency degrades at each rolling update; with CPU limits = 6, rolling updates show no degradation]
  16. KubeCon + CloudNativeCon North America 2025 17 Tuning for Stable

    Launch - Application Level ✅ Consider the effects of cgroups(*) on the behavior of language runtimes, libraries, and frameworks *) cgroups = resource limits (roughly speaking)
    • Java ◦ CPU: The return value of Runtime.availableProcessors(), as well as the sizing of ForkJoin pools and thread pools, changes according to the cgroups limits. Libraries and frameworks that depend on these values or behaviors are also affected. ◦ Memory: The heap size is automatically determined according to the memory limit (ergonomics). This also affects the selection of the GC algorithm.
    • Go ◦ CPU: <= 1.24: GOMAXPROCS is set based on the number of logical CPUs on the host machine and is not affected by cgroups. 1.25: the default value of GOMAXPROCS is adjusted according to the CPU quota defined by cgroups v2. ◦ Memory: Not affected by cgroups; the behavior of the GC can be manually tuned using the GOMEMLIMIT parameter.
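    For Go applications on older runtimes (or where you prefer to be explicit), one common workaround, shown here as an assumption rather than something taken from the slides, is to pass the limits to the runtime via the downward API:

```yaml
# Sketch: expose the container's CPU / memory limits to a Go application.
env:
  - name: GOMAXPROCS
    valueFrom:
      resourceFieldRef:
        resource: limits.cpu        # rounded up to a whole number of CPUs
  - name: GOMEMLIMIT
    valueFrom:
      resourceFieldRef:
        resource: limits.memory     # plain byte count, which GOMEMLIMIT accepts
```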
  17. KubeCon + CloudNativeCon North America 2025 18 Tuning for Stable

    Launch - Application Level ✅ Raise the maximum heap size in JVM applications • By default, the maximum heap is only around 20-30% of the memory limit, which often means much of the requested / limited memory is left unused • A flag like -XX:MaxRAMPercentage=50.0 sets the heap size as a percentage of the memory limit [Diagram: the heap plus other JVM memory areas sit well below resources.requests / resources.limits]
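    A minimal sketch of how the flag can be injected without rebuilding the image, assuming a JVM that honors JAVA_TOOL_OPTIONS:

```yaml
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=50.0"   # heap may grow to ~50% of the memory limit
resources:
  limits:
    memory: "2Gi"                        # max heap becomes roughly 1Gi with the flag above
```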
  18. KubeCon + CloudNativeCon North America 2025 19 One More Tuning

    Tip for JVM Applications • The JVM's GC algorithm is automatically selected based on cgroups values • Be aware that increasing limits may unintentionally trigger a change of GC algorithm ◦ Memory < 1.8 GB -> SerialGC ◦ CPU: 2+ cores and Memory > 1.8 GB -> G1GC
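    If your limits sit near these thresholds, one mitigation (my assumption, not stated on the slide) is to pin the collector explicitly so a resource change cannot switch it silently:

```yaml
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:+UseG1GC"   # keep G1 regardless of the detected CPU / memory
```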
  19. KubeCon + CloudNativeCon North America 2025 20 📝 Agenda 1.

    Introduction 2. Practices for Stable Application Launch a. Tune your application (container and application level) b. Make your application "truly" initialized before accepting user traffic 3. Recent Kubernetes Features for Launch Reliability 4. Conclusion 20
  20. KubeCon + CloudNativeCon North America 2025 Rethinking "initialization" 21 21

    1. Runtime Bootstrapping • Initialize fundamental runtime components such as memory management and thread scheduling 2. Core System Setup • Load standard libraries and initialize core system services 3. Framework / Container Setup • Start the web server • Resolve dependencies • Register request handlers • Begin listening on ports 4. Application Initialization • Initialize the data access layer • Load caches • Start background jobs 5. Continuous Optimization • JIT Compilation
  21. KubeCon + CloudNativeCon North America 2025 Rethinking "initialization" 22 22

    1. Runtime Bootstrapping • Initialize fundamental runtime components such as memory management and thread scheduling 2. Core System Setup • Load standard libraries and initialize core system services 3. Framework / Container Setup • Start the web server • Resolve dependencies • Register request handlers • Begin listening on ports 4. Application Initialization • Initialize the data access layer • Load caches • Start background jobs 5. Continuous Optimization • JIT Compilation ⚠ Live traffic may come in from here
  22. KubeCon + CloudNativeCon North America 2025 Rethinking "initialization" 23 23

    1. Runtime Bootstrapping • Initialize fundamental runtime components such as memory management and thread scheduling 2. Core System Setup • Load standard libraries and initialize core system services 3. Framework / Container Setup • Start the web server • Resolve dependencies • Register request handlers • Begin listening on ports 4. Application Initialization • Initialize the data access layer • Load caches • Start background jobs 5. Continuous Optimization • JIT Compilation ✅ Shift to here ⚠ Live traffic may come in from here
  23. KubeCon + CloudNativeCon North America 2025 Pod Startup Flow 24

    24 [Timeline] InitContainers run sequentially → Containers start and run the ENTRYPOINT command → startup probe → readiness probe → liveness probe … (the postStart lifecycle hook starts as well) 🚀
  24. KubeCon + CloudNativeCon North America 2025 Pod Startup Flow 25

    25 [Timeline] InitContainers run sequentially → Containers start and run the ENTRYPOINT command → startup probe → readiness probe → liveness probe … (the postStart lifecycle hook starts as well) Service In: the Pod becomes "READY" and requests come into the pod. Make containers truly ready by this point 🚀
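    The elements of this flow map onto the pod spec roughly as follows; this is a sketch with hypothetical image names, endpoints, and timings:

```yaml
spec:
  initContainers:
    - name: init-db-schema                       # runs to completion before app containers start
      image: registry.example.com/init:1.0       # hypothetical
  containers:
    - name: app
      image: registry.example.com/sample-app:1.0
      lifecycle:
        postStart:                               # runs alongside the ENTRYPOINT, right after start
          exec:
            command: ["/bin/sh", "-c", "/opt/warmup.sh"]   # hypothetical helper script
      startupProbe:                              # readiness / liveness are held back until this succeeds
        httpGet: {path: /healthz/started, port: 8080}
        failureThreshold: 30
        periodSeconds: 5
      readinessProbe:                            # success marks the pod READY and lets Service traffic in
        httpGet: {path: /healthz/ready, port: 8080}
        periodSeconds: 5
      livenessProbe:                             # failure restarts the container once in orbit
        httpGet: {path: /healthz/live, port: 8080}
        periodSeconds: 10
```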
  25. KubeCon + CloudNativeCon North America 2025 26 Ensure applications are

    truly initialized ✅ Perform application-level initialization before the readiness probe succeeds • Delay the readiness probe or block with a startup probe to allow time for application-level initialization (see the sketch below) • Ensure the application performs enough initialization, for example: ◦ Fill the DB connection pool with idle connections ◦ Activate threads for request handling • Utilize the postStart hook or startup probe to support initialization ◦ e.g. send warmup traffic from inside the pod 26
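    As a sketch of the first bullet (paths and timings are assumptions), either delay the readiness probe or gate everything behind a startup probe that only succeeds once application-level initialization has finished:

```yaml
# Option A: simply give the application time before the first readiness check.
readinessProbe:
  httpGet: {path: /healthz/ready, port: 8080}
  initialDelaySeconds: 30
# Option B: block with a startup probe that reports success only after
# the connection pool is filled and worker threads are running.
startupProbe:
  httpGet: {path: /healthz/warmed-up, port: 8080}
  failureThreshold: 60
  periodSeconds: 2
```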
  26. KubeCon + CloudNativeCon North America 2025 Example: Fill the DB

    Connection Pool 27 # Maintain a minimum number of idle connections before the app goes live
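    The slide's configuration is not reproduced in the transcript; a minimal sketch, assuming a Spring Boot application with HikariCP, might look like this:

```yaml
# application.yaml -- keep idle connections open before the app goes live
spring:
  datasource:
    hikari:
      minimum-idle: 10        # idle connections maintained from startup
      maximum-pool-size: 20
```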
  27. KubeCon + CloudNativeCon North America 2025 Example: Activating threads for

    Request Handling 28 # Increase the minimum thread count to reduce thread-creation overhead
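    Again the slide's code is omitted in the transcript; a sketch assuming Spring Boot with embedded Tomcat:

```yaml
# application.yaml -- pre-create worker threads instead of growing the pool under live traffic
server:
  tomcat:
    threads:
      min-spare: 50     # worker threads kept ready from startup
      max: 200
```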
  28. KubeCon + CloudNativeCon North America 2025 29 For Further Optimization...

    • For applications requiring further performance optimization, sending traffic through automatic warm-up can be effective ◦ Example: JIT compilation 29
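    One way to generate that traffic from inside the pod is a postStart hook; the endpoint, the request count, and the availability of wget in the image are all assumptions:

```yaml
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        # hit a representative endpoint repeatedly so hot code paths get JIT-compiled
        - for i in $(seq 1 200); do wget -q -O /dev/null http://localhost:8080/api/warmup || true; done
```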
  29. KubeCon + CloudNativeCon North America 2025 30 Considerations for automatic

    warmup • Startup procedures may become more complex, potentially increasing the risk of startup failures • Startup time may increase • Dependent components may experience unexpected load 30
  30. KubeCon + CloudNativeCon North America 2025 31 📝 Agenda 1.

    Introduction 2. Practices for Stable Application Launch a. Tune your application (container and application level) b. Make your application "truly" initialized before accepting user traffic 3. Recent Kubernetes Features related to Launch Reliability 👈 4. Conclusion 31
  31. KubeCon + CloudNativeCon North America 2025 32 In-place Pod Resize

    (Beta at >= Kubernetes v1.33) • A feature that allows changing resource limits / requests without restarting containers 🤔 Could it be used for resource bursting during application startup? [Chart: resource consumption over time, with the resource limits lowered once the startup burst is over]
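    As a sketch of what this could look like for startup bursting, lower the limits once the launch is done by patching the pod's resize subresource, e.g. `kubectl patch pod sample-app --subresource resize --patch-file resize.yaml` with a recent kubectl (pod name and values are illustrative):

```yaml
# resize.yaml -- lower the CPU limit after the startup burst is over
spec:
  containers:
    - name: app
      resources:
        limits:
          cpu: "2"       # was "6" while the application was lifting off
        requests:
          cpu: "1"
```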
  32. KubeCon + CloudNativeCon North America 2025 33 Concerns about Memory

    Limit Reduction • Reducing memory limits to lower than actual usage without restart can trigger OOMKill (cgroups v2) ◦ Best-effort checking by kubelet will be performed, but it's not guaranteed to be safe: https://github.com/kubernetes/kubernetes/pull/133012 • 👉 Strategies to mitigate risks: ◦ Avoid reducing memory limits when possible ◦ Wait for a certain period after startup before reducing limits 33
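    A related mitigation (my assumption, not from the slide) is to declare per-resource resize policies, so CPU can be resized in place while memory changes still go through a container restart:

```yaml
containers:
  - name: app
    resizePolicy:
      - resourceName: cpu
        restartPolicy: NotRequired        # CPU is resized in place
      - resourceName: memory
        restartPolicy: RestartContainer   # memory changes restart the container instead of risking OOMKill
```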
  33. KubeCon + CloudNativeCon North America 2025 34 Runtime and cgroups

    Relationship Matters • Runtimes, libraries, and frameworks need to dynamically adapt to cgroup changes • Dynamic adaptation capabilities in the Java and Go runtimes are still evolving
    • Java ◦ CPU: ❌ No dynamic update — detects cgroup CPU limits only at startup. ◦ Memory: ❌ No dynamic update — reads memory limits only at startup for heap sizing.
    • Go ◦ CPU: ✅ Auto-adjusts GOMAXPROCS when the cgroup CPU quota changes (since Go 1.25). ◦ Memory: ⚠ Manual only — supports GOMEMLIMIT, but no built-in auto-update. However, a custom implementation is possible (e.g., polling cgroup values).
  34. KubeCon + CloudNativeCon North America 2025 35 Concerns from Operational

    Perspective • A resize is triggered via the Pod's resize subresource, so you need to issue a resize for each replica individually 👉 Automation is needed for production use 👉 The integration of VPA and in-place resizing seems to be under development • https://kubernetes.io/docs/concepts/workloads/autoscaling/#in-place-pod-vertical-scaling • https://github.com/kubernetes/autoscaler/issues/4016 35
  35. KubeCon + CloudNativeCon North America 2025 36 Dedicated Autoscaling Feature

    for Startup? • In some cases, we might want to adjust resource allocation only for the startup phase, while handling regular autoscaling with the HPA • Resource consumption patterns during startup and steady state often have significantly different characteristics [Chart: resource consumption during startup vs. steady state]
  36. KubeCon + CloudNativeCon North America 2025 37 📝 Agenda 1.

    Introduction 2. Practices for Stable Application Launch a. Tune your application (container and application level) b. Make your application "truly" initialized before accepting user traffic 3. Recent Kubernetes Features related to Launch Reliability 4. Conclusion 👈 37
  37. KubeCon + CloudNativeCon North America 2025 38 Conclusion • 🚀

    Application launch reliability is critical for production Kubernetes workloads • ✅ Key practices for achieving stable launch: ◦ Tune application resources at both container and runtime levels ◦ Ensure complete initialization before accepting live traffic • 🔄 Kubernetes ecosystem is evolving to address launch challenges ◦ In-place Pod Resize introduces new optimization possibilities ◦ Consider operational implications and work around current limitations • 💡 Remember: Reliable launches lead to stable orbits! 38