
Nuances on Kubernetes - RubyConf Taiwan 2025

It is not so difficult to “kubernetize” your application, but what’s next? There are so many subtle things to consider to make your application performant and reliable.

Unfortunately, it is easy to get it wrong: a small change in resource limits can slow your application down, a misconfigured container liveness probe will make your app crash even faster under heavy load, and even following containerization best practices can make your application less performant. Let’s see why!


Andrey Novikov

August 10, 2025

Transcript

  1. About me
     Andrey Novikov, back-end engineer at Evil Martians.
     Ruby, Go, PostgreSQL, Docker, k8s…
     Living in Osaka, Japan for 3 years.
     Loves to ride mopeds, motorcycles, and cars all over Japan.
  2. Martian Open Source
     Yabeda: Ruby application instrumentation framework
     Lefthook: git hooks manager
     AnyCable: polyglot replacement for ActionCable server
     PostCSS: a tool for transforming CSS with JavaScript
     Imgproxy: fast and secure standalone server for resizing and converting remote images
     Logux: client-server communication framework based on Optimistic UI, CRDT, and log
     Overmind: process manager for Procfile-based applications and tmux
     And many others at evilmartians.com/oss
  3. Why Kubernetes?
     Easy to add components:
       libvips makes Ruby crash? Add imgproxy as an internal service! (See “Imgproxy is Amazing”)
       ActionCable is slow? Add AnyCable (with Redis) to the setup!
       Need to process tons of webhooks? Write a webhook processor in Rust!
     Easy to replicate: create one more staging or pull request preview app.
     Easy to scale: Horizontal Pod Autoscaler with KEDA.
     Less vendor lock-in: migrate between cloud providers with ease.
  4. Why should developers know Kubernetes?
     TL;DR: to deploy new features faster without waiting for admins/devops:
     to tweak production settings themselves
     to be able to add a new component or microservice with minimal help from devops
     to understand how the application works in production
     to understand devops-speak 🙃
  5. What is Kubernetes…
     A cluster operating system for deploying applications.
     Abstracts the application from the underlying hardware/clouds*.
     Declarative configuration and a built-in control loop.
     Uses (Docker) containers to run applications…
     …and its own abstractions for their orchestration and launch.
     * https://buttondown.email/nelhage/archive/two-reasons-kubernetes-is-so-complex/
  6. Pod 🫛
     The main application unit in Kubernetes. Like an atom: complex but not separable ⚛️
     A logically indivisible group of containers (though usually only one is the main one)
     that runs together on one node and shares localhost and an internal cluster IP address.
     One pod is like a separate server from the application’s point of view.
     Unit of component scaling: replicate more of a kind to get more throughput.
     Documentation: kubernetes.io/docs/concepts/workloads/pods
     Image: kubernetes.io/docs/tutorials/kubernetes-basics/explore/explore-intro
  7. Service
     Abstraction for a logical grouping of pods.
     Service discovery: there’s an internal cluster DNS name.
     Balances incoming traffic between pods: basic round robin, but with pod status checks.
     Allows scaling applications horizontally, though actual scaling is done by a ReplicaSet.
     Documentation: kubernetes.io/docs/concepts/services-networking/service
     Image: kubernetes.io/docs/tutorials/kubernetes-basics/expose/expose-intro
  8. Kubernetes health probes
     Three of them, for each individual container inside each pod:
     liveness: the container is killed and restarted if it doesn’t respond to “are you alive?”
     readiness: the pod is excluded from traffic balancing if it doesn’t respond to “ready to get more work?” (not needed for non-web components, e.g. Sidekiq)
     startup: allows delaying the start of liveness and readiness checks for long-starting applications
     Both liveness and readiness are executed in parallel throughout the pod’s lifetime.
     https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
  9. Easy (and wrong)
     The same probes for a web application:

       containers:
         - name: app
           livenessProbe:
             httpGet:
               path: /health
               port: 80
             timeoutSeconds: 3
             periodSeconds: 10
           readinessProbe:
             httpGet:
               path: /health
               port: 80
             timeoutSeconds: 3
             periodSeconds: 10

     Picture: behance.net/gallery/35173039/Stickers-for-another-one-IT-conference-DUMP2016
  10. Request queues
      Requests wait for a free worker/backend in Nginx or the application server.
      Image: https://railsautoscale.com/request-queue-time/
  11. Load comes 🏋️
      1. Slow requests hit one pod and get stuck in its queue
      2. The container in that pod doesn’t respond to the liveness probe in time 🥀
      3. Kubernetes kills the container 💀
      4. And immediately starts it again, but this takes some time… ⌚
      5. During the restart, even more requests come to the other pods
      6. GOTO 1 🤡
      An incorrectly configured liveness probe under load will kill the application, pod after pod!
  12. What to do? Send the liveness probe through a bypass!
      What to run on a different port? E.g. the Puma control app:
      https://til.magmalabs.io/posts/283cb0bd01-separate-health-check-endpoint-using-puma
      Or write a custom plugin for your needs, like the one (for Prometheus) in yabeda-puma-plugin at lib/puma/plugin/yabeda_prometheus.rb; a sketch of such a plugin follows below.
      Don’t use the metrics endpoint for health probes! It’s too heavy.

        # config/puma.rb
        activate_control_app "tcp://0.0.0.0:8080", no_token: true

        containers:
          - name: app
            livenessProbe:
              httpGet:
                path: /stats
                port: 8080  # ← another port!
              timeoutSeconds: 3
              periodSeconds: 10
            readinessProbe:
              httpGet:
                path: /health
                port: 80
              timeoutSeconds: 3
              periodSeconds: 10
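      For reference, a minimal sketch of such a custom Puma plugin: it serves a tiny liveness endpoint on a separate port via Puma’s plugin API. The file name, port, and response are illustrative assumptions, not from the talk; the real yabeda-puma-plugin does more (it exports metrics), so treat this only as a starting point.

        # Hypothetical lib/puma/plugin/liveness.rb, loaded with `plugin :liveness` in config/puma.rb
        require "puma/plugin"
        require "socket"

        Puma::Plugin.create do
          def start(launcher)
            in_background do
              server = TCPServer.new("0.0.0.0", 8080) # port is an assumption; match your probe config
              loop do
                client = server.accept
                client.gets # read and discard the request line
                client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK"
                client.close
              end
            end
          end
        end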
  13. Healthchecks: summary
      1. Liveness goes through the “back door”: set up a listener on a separate port where only the probe will go. Don’t test dependencies (database, Redis, etc.) in a liveness probe! Kubernetes should not kill your pods under load!
      2. Readiness goes through the “front door”, with client requests: let an “overloaded” pod exit from load balancing and “cool down”. Watch for unready pods in your monitoring!
  14. Healthchecks for everything
      Every component of your application should have its own liveness check, even if it’s not a web application. Sidekiq too! See the sidekiq_alive gem.
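      The same kind of tiny listener shown in the Puma plugin sketch above works for non-web processes too; you just start it yourself from the worker process (sidekiq_alive sets this up for you and also checks that jobs are actually being processed). The port and placement below are illustrative assumptions:

        # Sketch: start a bare-bones liveness listener next to a worker process.
        require "socket"

        Thread.new do
          server = TCPServer.new("0.0.0.0", 7433) # match the port in the container's livenessProbe
          loop do
            client = server.accept
            client.gets # read and ignore the request
            client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK"
            client.close
          end
        end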
  15. Use timeouts and circuit breakers
      Don’t let your application dependencies fail your readiness probe. Detect failures early and short-circuit before they take your whole application down. Use the stoplight gem.
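      A minimal sketch of wrapping a flaky dependency with stoplight, so a broken Redis opens the circuit instead of hanging every request. The block-style API shown is the stoplight 3.x one; the thresholds, timeouts, and the Redis check itself are illustrative assumptions.

        require "stoplight"
        require "redis"

        # Fail fast instead of waiting on a dead dependency
        redis = Redis.new(connect_timeout: 1, read_timeout: 1)

        light = Stoplight("redis-ping") { redis.ping }
                  .with_threshold(3)              # open the circuit after 3 consecutive failures
                  .with_cool_off_time(60)         # retry after 60 seconds
                  .with_fallback { |_error| nil } # what to return while the circuit is open

        light.run # => "PONG", or nil while Redis is considered down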
  16. Monitor request queues!
      Request queue wait time is the main metric that shows the application is “at the limit”. Put it in your monitoring.
      If it’s noticeably greater than 0, you need to scale up (Kubernetes has the Horizontal Pod Autoscaler).
      If it’s always strictly 0, you can think about scaling down.
      Use yabeda-puma-plugin to monitor request queues: its puma_backlog metric contains the number of requests waiting to be processed by workers (see the puma.rb sketch below).
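      Enabling it is a couple of lines in config/puma.rb. The plugin names below are inferred from the gem and the lib/puma/plugin/yabeda_prometheus.rb file mentioned earlier; double-check them against the gem’s README for your version.

        # config/puma.rb
        activate_control_app "tcp://0.0.0.0:8080", no_token: true

        plugin :yabeda             # collect Puma metrics, including puma_backlog
        plugin :yabeda_prometheus  # expose them in the Prometheus text format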
  17. And now you can autoscale!
      Use the Kubernetes Horizontal Pod Autoscaler with KEDA.
      React to saturation metrics showing that the application is overloaded:
        request queue wait time for HTTP requests
        queue latency for Sidekiq (using yabeda-sidekiq)
      Monitor USE metrics when it comes to performance! [1]
        Utilization: number of free workers
        Saturation: time waiting for a free worker (p95)
        Errors: percentage of errors when processing requests
      1. USE method: https://www.brendangregg.com/usemethod.html
  18. Requests and limits 101
      For each container in a Pod (requests and limits for the pod are the sums of its containers’ values):
      requests: for the k8s scheduler
      limits: for the OS kernel on cluster nodes
      cpu is measured in CPU cores (m is millicpu: thousandths of a processor core)
      memory is measured in bytes and binary multiples (mebibytes, gibibytes)
      https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
  19. Requests 101
      Instructions for the k8s scheduler to help it distribute pods across nodes effectively:
      Not used at runtime*
      Doesn’t consider actual consumption*, only node capacity and the requests of other pods
      * Used when setting up OOM handling, but more on that later
      Image: Ilya Cherepanov

        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
  20. Limits 101
      Not used at scheduling time; enforced by the OS kernel on the nodes.
      cpu configures CPU throttling; you can limit to less than 1 core.
      https://www.datadoghq.com/blog/kubernetes-cpu-requests-limits/
      memory configures the Out of Memory killer: memory is not a compressible resource, so the container will be killed if it exceeds the limit.

        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 400m
            memory: 512Mi
  21. Requests × Limits = QoS
      With various combinations of requests and limits, both container performance and lifetime can change significantly:
      Guaranteed (requests = limits): CPU is guaranteed to be provided; killed by the OOM killer last.
      Burstable (requests ≠ limits): can use more CPU than requested; killed after BestEffort, in order of “greediness”.
      BestEffort (no requests and limits): CPU is provided last; first to be killed by the OOM killer.
      Always specify both requests and limits!
      Image: Ilya Cherepanov
  22. So, what’s with milliCPU again?
      In the case of limits, it configures CPU throttling: the fraction of processor time that can be used within each 100 ms period.
      SREs, set up the CPUThrottlingHigh alert!
      Don’t use fractional CPU limits for containers where response speed is important!
      More: https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/

        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 400m
            memory: 512Mi
  23. Utilizing cores
      A process in a language with a GIL (Ruby, Python) cannot utilize a processor core completely: 30-50% at most.
      Use both processes and threads (see the puma.rb sketch below):
        run 2-4 worker processes per container (copy-on-write works 🐮)
        with 3-5 threads in each process
      ➕ Worker utilization improves thanks to better distribution of requests to idle workers.
      More: https://www.speedshop.co/2017/10/12/appserver.html
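      A minimal config/puma.rb sketch for the “2-4 processes × 3-5 threads per container” advice. The environment variable names and defaults are conventional, not from the talk; tune the numbers to your workload.

        # config/puma.rb
        workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))            # forked worker processes per container
        threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
        threads threads_count, threads_count                        # min, max threads per worker

        preload_app!  # load the app before forking so copy-on-write memory sharing kicks in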
  24. Is many processes per container better?
      It is counter-intuitive, but it is more performant! 🤯
      Nate Berkopec (@nateberkopec): “Which web app server is more performant: 1. 1 container with 8 forked child processes 2. 8 containers with 1 process each, round robin/random load balancer in front. It’s #1 and it’s not even close! This is extremely relevant if you have to use random/RR load balancers.”
      x.com/nateberkopec/status/1938336171559649415
      Benchmarks by Nate Berkopec
  25. Thank you!
      @Envek @[email protected] @envek.bsky.social github.com/Envek
      @evilmartians @[email protected] @evilmartians.com evilmartians.com
      Our awesome blog: evilmartians.com/chronicles!
      Slides are available at: envek.github.io/rubyconftw-nuances-on-kubernetes