Nuances of running applications in Kubernetes
Andrey Novikov, Evil Martians
RubyConf Taiwan 2025
August 10, 2025
Slide 2
About me
Andrey Novikov
Back-end engineer at Evil Martians
Ruby, Go, PostgreSQL, Docker, k8s…
Living in Osaka, Japan for 3 years
Love to ride mopeds, motorcycles, and cars around Japan
Slide 3
Evil Martians (邪惡的火星人)
evilmartians.com
Slide 4
Martian Open Source
Yabeda: Ruby application instrumentation framework
Lefthook: git hooks manager
AnyCable: Polyglot replacement for ActionCable server
PostCSS: A tool for transforming CSS with JavaScript
Imgproxy: Fast and secure standalone server for resizing and converting remote images
Logux: Client-server communication framework based on Optimistic UI, CRDT, and log
Overmind: Process manager for Procfile-based applications and tmux
And many others at evilmartians.com/oss
Slide 6
Why Kubernetes?
Flexibility and control over the application architecture
Slide 7
Why Kubernetes?
Easy to add components
libvips makes Ruby crash? Add imgproxy as an internal service! (See "Imgproxy is Amazing")
ActionCable is slow? Add AnyCable (with Redis) to the setup!
Need to process tons of webhooks? Write a webhook processor in Rust!
Easy to replicate
Create one more staging environment or pull request preview app
Easy to scale
Horizontal pod autoscaler with KEDA
Less vendor lock-in
Migrate between cloud providers with ease
Slide 8
Kubernetes 101
Versatility comes with complexity
Slide 9
Why should developers know Kubernetes?
TL;DR: To deploy new features faster without waiting for admins/devops.
to tweak production settings themselves
to be able to add a new component or microservice with minimal help from devops
to understand how the application works in production
to understand devops-speak 🙃
Slide 10
What is Kubernetes…
Cluster operating system for deploying applications
Abstracts the application from underlying hardware/clouds*
Declarative configuration and built-in control loop
Uses (Docker-)containers to run applications…
…and its own abstractions for their orchestration and launch
* https://buttondown.email/nelhage/archive/two-reasons-kubernetes-is-so-complex/
Slide 11
…and what it consists of
Image: Kirill Kouznetsov
Slide 12
Pod 🫛
The main application unit in Kubernetes
like an atom: complex but not separable ⚛️
A logically indivisible group of containers
but usually only one is the main one
That runs together on one node
Shares localhost and internal cluster IP address
One pod is like a separate server
from the application’s point of view
Unit of component scaling
Replicate more of a kind to get more throughput
Documentation: kubernetes.io/docs/concepts/workloads/pods
Image: kubernetes.io/docs/tutorials/kubernetes-basics/explore/explore-intro
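To make the bullets above concrete, here is a minimal Pod manifest sketch (image name and labels are hypothetical); in practice you would usually create Pods indirectly through a Deployment:

apiVersion: v1
kind: Pod
metadata:
  name: rails-app
  labels:
    app: rails-app           # label that a Service selector can match later
spec:
  containers:
  - name: app                # the main application container
    image: registry.example.com/rails-app:1.2.3   # hypothetical image
    ports:
    - containerPort: 3000
  - name: nginx              # optional sidecar: shares localhost with the app container
    image: nginx:1.27
    ports:
    - containerPort: 80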
Slide 13
Service
Abstraction for logical grouping of pods
Service discovery
there’s an internal cluster DNS name
Balances incoming traffic between pods
It is basic round-robin, but only pods passing readiness checks receive traffic
Allows scaling applications horizontally
Though the actual scaling is done by a ReplicaSet
Documentation: kubernetes.io/docs/concepts/services-networking/service
Image: kubernetes.io/docs/tutorials/kubernetes-basics/expose/expose-intro
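A minimal Service sketch grouping the pods from the previous slide (names are assumptions); inside the cluster it becomes resolvable by DNS, e.g. rails-app.default.svc.cluster.local:

apiVersion: v1
kind: Service
metadata:
  name: rails-app
spec:
  selector:
    app: rails-app           # matches the Pod label
  ports:
  - port: 80                 # port of the Service inside the cluster
    targetPort: 3000         # container port on the pods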
Slide 14
Let’s get to the nuances!
Are you ready?
Slide 15
Healthchecks
Why so many and what’s the difference?
Slide 16
Kubernetes health probes
Three of them for each individual container inside each pod:
liveness
Container is killed and restarted if it doesn’t respond to "are you alive?"
readiness
Pod is excluded from traffic balancing if it doesn’t respond to "ready to get more work?"
Not needed for non-web components (e.g. Sidekiq)
startup
Allows delaying the start of liveness and readiness checks for long-starting applications
Both liveness and readiness are executed in parallel throughout the pod’s lifetime.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
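For completeness, a startup probe sketch for a slow-booting Rails app (path and port are assumptions); liveness and readiness probes do not start until it has succeeded:

startupProbe:
  httpGet:
    path: /health
    port: 80
  periodSeconds: 5
  failureThreshold: 30       # allows up to 30 × 5 s = 150 s for the app to boot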
Slide 17
Easy (and wrong)
Same probes for a web application:
containers:
- name: app
  livenessProbe:
    httpGet:
      path: /health
      port: 80
    timeoutSeconds: 3
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /health
      port: 80
    timeoutSeconds: 3
    periodSeconds: 10
Picture: behance.net/gallery/35173039/Stickers-for-another-one-IT-conference-DUMP2016
Slide 18
Request queues
Requests wait for a free worker/backend in Nginx or app server.
Image: https://railsautoscale.com/request-queue-time/
Slide 19
Load comes 🏋️
1. Slow requests hit one pod and get stuck in the queue
2. Container in a pod doesn’t respond to liveness in time 🥀
3. Kubernetes kills the container 💀
4. And immediately starts it again, but this takes some time… ⌚
5. During restart, more requests come to other pods.
6. GOTO 1 🤡
An incorrectly configured liveness probe under load will kill the application, pod after pod!
What to do?
Send the liveness probe through a bypass! What can you run on a separate port?
E.g. the Puma control app: https://til.magmalabs.io/posts/283cb0bd01-separate-health-check-endpoint-using-puma
Or write a custom plugin for your needs, like the one in yabeda-puma-plugin at lib/puma/plugin/yabeda_prometheus.rb
Don’t use the metrics endpoint for health probes! It’s too heavy.
containers:
- name: app
  livenessProbe:
    httpGet:
      path: /stats
      port: 8080  # ← another port!
    timeoutSeconds: 3
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /health
      port: 80
    timeoutSeconds: 3
    periodSeconds: 10
activate_control_app "tcp://0.0.0.0:8080", no_token: true
yabeda-puma-plugin for Prometheus
Slide 22
Healthchecks: summary
1. Liveness goes through the “back door”
Set up a listener on a separate port where only the probe will go.
Don't test dependencies (database, Redis, etc.) in a liveness probe!
Kubernetes should not kill your pods under load!
2. Readiness goes through the “front door”, with client requests
Let the “overloaded” pod drop out of load balancing and “cool down”.
Watch for unready pods in your monitoring!
Slide 23
Healthchecks for everything
Every component of your application should have its own liveness check, even if it’s not a web application.
Sidekiq too!
sidekiq_alive gem
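A possible liveness probe for a Sidekiq container, assuming sidekiq_alive’s small HTTP server on its default port 7433 (check your configuration); no readiness probe is needed since Sidekiq receives no HTTP traffic:

containers:
- name: sidekiq
  livenessProbe:
    httpGet:
      path: /                # answered by sidekiq_alive inside the Sidekiq process
      port: 7433             # sidekiq_alive default port (assumption)
    initialDelaySeconds: 30
    periodSeconds: 30
    timeoutSeconds: 5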
Slide 24
Use timeouts and circuit breakers
Don't let your application's dependencies fail your readiness probe.
Detect failures early and short-circuit before they take over your application.
Use stoplight:
stoplight gem
Slide 25
Monitor request queues!
Request queue wait time is the main metric that shows the application is "at the limit". Put it in your monitoring.
If it is noticeably greater than zero, you need to scale up (Kubernetes has the Horizontal Pod Autoscaler).
If it is always exactly zero, you can think about scaling down.
Use yabeda-puma-plugin to monitor request queues.
It provides the puma_backlog metric: the number of requests waiting to be processed by workers.
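As an illustration, an alert on that metric might look like this (a sketch assuming prometheus-operator and that puma_backlog is already scraped; names and thresholds are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: request-queue-alerts
spec:
  groups:
  - name: request-queue
    rules:
    - alert: RequestQueueNotEmpty
      expr: sum(puma_backlog) by (pod) > 0   # requests keep waiting for a free worker
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Requests are queueing in {{ $labels.pod }}, consider scaling up"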
Slide 26
And now you can autoscale!
Use Kubernetes Horizontal Pod Autoscaler with KEDA
React to saturation metrics that show the application is overloaded:
request queue wait time for HTTP requests
queue latency for Sidekiq (using yabeda-sidekiq)
Monitor USE metrics [1] when it comes to performance:
Utilization: number of free workers
Saturation: time waiting for a free worker (p95)
Errors: percentage of errors when processing requests
[1] USE method: https://www.brendangregg.com/usemethod.html
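A KEDA ScaledObject sketch that scales a Deployment on request queue wait time (the Prometheus address, metric name, and threshold are assumptions; you could equally scale on the puma_backlog metric mentioned earlier):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rails-app
spec:
  scaleTargetRef:
    name: rails-app                      # Deployment to scale (hypothetical name)
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      # p95 time requests spend waiting for a free Puma worker (hypothetical metric name)
      query: histogram_quantile(0.95, sum(rate(puma_request_wait_seconds_bucket[2m])) by (le))
      threshold: "0.1"                   # scale out when p95 wait exceeds 100 ms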
Slide 27
Requests and limits
How to (not) properly ask for resources
Slide 28
Requests and limits 101
For each container in a Pod:
requests and limits for a pod are the sum of the values of its containers
requests: for k8s scheduler
limits: for OS kernel on cluster nodes
cpu is measured in CPU cores
(m is millicpu: one-thousandth of a processor core)
memory is measured in bytes
and multiples (mebibytes, gibibytes)
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
Slide 29
Requests 101
Instructions for the k8s scheduler to help it effectively distribute pods across nodes:
Not used at runtime*
Doesn’t consider actual consumption*
only node capacity and requests of other pods
* Used when setting up OOM, but more on that later
Image: Ilya Cherepanov
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
Slide 30
Limits 101
Not used at scheduling time
Enforced by OS kernel on nodes
cpu configures CPU throttling
You can set a limit of less than one core.
https://www.datadoghq.com/blog/kubernetes-cpu-requests-limits/
memory configures Out of Memory Killer
Memory is not a compressible resource, so the container will be killed if it exceeds the limit
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 400m
    memory: 512Mi
Slide 31
Requests × Limits = QoS
Different combinations of requests and limits significantly change both container performance and lifetime:
Guaranteed — requests = limits
CPU is guaranteed to be provided
killed by OOM last
Burstable — requests ≠ limits
can use CPU more than requested
killed after BestEffort in order of "greediness"
BestEffort — no requests and limits
CPU is provided last
First to be killed by OOM killer
Always specify both requests and limits!
Image: Ilya Cherepanov
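For example, to land in the Guaranteed class, set requests equal to limits for every resource of every container (values here are only illustrative):

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m                # equal to the request → Guaranteed QoS
    memory: 512Mi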
Slide 32
Threads, processes, and containers
Counter-intuitive findings about Ruby performance with limited CPU
Slide 33
So, what’s with milliCPU again?
For limits, it configures CPU throttling: the fraction of processor time that can be used within each 100 ms period.
SRE, set up the CPUThrottlingHigh alert!
Don't use fractional CPU limits for containers where response speed is important!
More: https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    memory: 512Mi
    # cpu: 400m  ← removed: no fractional CPU limit
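A sketch of such an alert on the cAdvisor throttling counters (roughly what the kubernetes-mixin CPUThrottlingHigh rule does; the 25% threshold is illustrative):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-throttling-alerts
spec:
  groups:
  - name: cpu-throttling
    rules:
    - alert: CPUThrottlingHigh
      # fraction of 100 ms CFS periods in which the container was throttled
      expr: |
        sum(increase(container_cpu_cfs_throttled_periods_total[5m])) by (container, pod, namespace)
          /
        sum(increase(container_cpu_cfs_periods_total[5m])) by (container, pod, namespace)
          > 0.25
      for: 15m
      labels:
        severity: info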
Slide 34
Utilizing cores
A process in a language with a GIL (Ruby, Python) cannot utilize a processor core completely!
Maximum 30-50%.
Use both processes and threads:
Run 2-4 worker processes per container (copy-on-write works 🐮)
3-5 threads in each process
➕ Worker utilization will be improved by better distribution of requests to idle workers (see the sketch below).
More: https://www.speedshop.co/2017/10/12/appserver.html
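In a Deployment this usually ends up as plain environment variables, assuming the standard Rails puma.rb that reads WEB_CONCURRENCY and RAILS_MAX_THREADS (values follow the rule of thumb above):

containers:
- name: app
  env:
  - name: WEB_CONCURRENCY        # Puma worker processes per container
    value: "3"
  - name: RAILS_MAX_THREADS      # threads per worker process
    value: "4"
  resources:
    requests:
      cpu: "1"                   # ~3 GIL-bound workers at 30-50% each fit roughly one core
      memory: 768Mi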
Slide 35
Are many processes per container better?
It is counter-intuitive, but it is more performant! 🤯
Nate Berkopec (@nateberkopec)
Which web app server is more performant:
1. 1 container with 8 forked child processes
2. 8 containers with 1 process each, round robin/random load balancer in front
It's #1 and it's not even close! This is extremely relevant if you have to use random/RR load balancers.
x.com/nateberkopec/status/1938336171559649415
Benchmarks by Nate Berkopec