Slide 1

Nuances of running applications in Kubernetes
Andrey Novikov, Evil Martians
RubyConf Taiwan 2025, August 10, 2025

Slide 2

About me
Andrey Novikov, back-end engineer at Evil Martians
Ruby, Go, PostgreSQL, Docker, k8s…
Living in Osaka, Japan for 3 years
Love to ride mopeds, motorcycles, and cars around Japan

Slide 3

邪惡的火星人 evilmartians.com

Slide 4

Martian Open Source
Yabeda: Ruby application instrumentation framework
Lefthook: git hooks manager
AnyCable: polyglot replacement for ActionCable server
PostCSS: a tool for transforming CSS with JavaScript
Imgproxy: fast and secure standalone server for resizing and converting remote images
Logux: client-server communication framework based on Optimistic UI, CRDT, and log
Overmind: process manager for Procfile-based applications and tmux
And many others at evilmartians.com/oss

Slide 5

No content

Slide 6

Why Kubernetes?
Flexibility and control over the application architecture

Slide 7

Why Kubernetes?
Easy to add components:
  libvips makes Ruby crash? Add imgproxy as an internal service! (See "Imgproxy is Amazing")
  ActionCable is slow? Add AnyCable (with Redis) to the setup!
  Need to process tons of webhooks? Write a webhook processor in Rust!
Easy to replicate: create one more staging or pull request preview app
Easy to scale: Horizontal Pod Autoscaler with KEDA
Less vendor lock-in: migrate between cloud providers with ease

Slide 8

Kubernetes 101
Versatility comes with complexity

Slide 9

Why should developers know Kubernetes?
TL;DR: to deploy new features faster without waiting for admins/devops,
to tweak production settings themselves,
to be able to add a new component or microservice with minimal help from devops,
to understand how the application works in production,
and to understand devops-speak 🙃

Slide 10

What is Kubernetes…
A cluster operating system for deploying applications
Abstracts the application from the underlying hardware/clouds*
Declarative configuration and a built-in control loop
Uses (Docker-)containers to run applications…
…and its own abstractions for their orchestration and launch
* https://buttondown.email/nelhage/archive/two-reasons-kubernetes-is-so-complex/
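
For illustration, a minimal sketch of what "declarative configuration" means: you describe the desired state and the control loop keeps reality matching it. The image name and ports here are hypothetical, not from the talk.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3                      # desired state: three identical pods
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest   # hypothetical image
          ports:
            - containerPort: 3000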

Slide 11

…and what it consists of
Image: Kirill Kouznetsov

Slide 12

Pod 🫛
The main application unit in Kubernetes; like an atom: complex, but not separable ⚛️
A logically indivisible group of containers (usually only one of them is the main one)
that run together on one node and share localhost and an internal cluster IP address.
One pod is like a separate server from the application’s point of view.
It is the unit of component scaling: replicate more of a kind to get more throughput.
Documentation: kubernetes.io/docs/concepts/workloads/pods
Image: kubernetes.io/docs/tutorials/kubernetes-basics/explore/explore-intro
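
A minimal Pod manifest as a sketch (the image name is a placeholder); in practice you rarely create bare Pods, they are usually managed by a Deployment/ReplicaSet:

apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    app: app
spec:
  containers:
    - name: app                                   # the main container
      image: registry.example.com/app:latest      # hypothetical image
      ports:
        - containerPort: 3000   # containers in the pod share localhost and the pod IP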

Slide 13

Service
An abstraction for logically grouping pods.
Service discovery: there’s an internal cluster DNS name.
Balances incoming traffic between pods: basic round-robin, but with pod status checks.
Allows scaling applications horizontally (though the actual scaling is done by a ReplicaSet).
Documentation: kubernetes.io/docs/concepts/services-networking/service
Image: kubernetes.io/docs/tutorials/kubernetes-basics/expose/expose-intro
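
A sketch of a Service that groups the pods above by label and gives them a stable in-cluster DNS name (app.<namespace>.svc.cluster.local); the port numbers are illustrative:

apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector:
    app: app            # groups all pods carrying this label
  ports:
    - port: 80          # port behind the Service's cluster IP / DNS name
      targetPort: 3000  # port the container actually listens on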

Slide 14

Let’s get to the nuances! Are you ready?

Slide 15

Healthchecks
Why so many, and what’s the difference?

Slide 16

Kubernetes health probes
There are three of them, for each individual container inside each pod:
liveness: the container is killed and restarted if it doesn’t respond to "are you alive?"
readiness: the pod is excluded from traffic balancing if it doesn’t respond to "ready to get more work?" (not needed for non-web components, e.g. Sidekiq)
startup: allows delaying the start of liveness and readiness checks for slow-starting applications
Both liveness and readiness probes run in parallel throughout the pod’s lifetime.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
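
The startup probe is mentioned above but not shown later, so here is a hedged sketch (values are illustrative): liveness and readiness checks only begin once the startup probe has succeeded.

containers:
  - name: app
    startupProbe:
      httpGet:
        path: /health
        port: 80
      periodSeconds: 5
      failureThreshold: 60   # allows up to 5 minutes of slow boot before the container is restarted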

Slide 17

Easy (and wrong)
Same probes for a web application:

containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /health
        port: 80
      timeoutSeconds: 3
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /health
        port: 80
      timeoutSeconds: 3
      periodSeconds: 10

Picture: behance.net/gallery/35173039/Stickers-for-another-one-IT-conference-DUMP2016

Slide 18

Request queues
Requests wait for a free worker/backend in Nginx or the app server.
Image: https://railsautoscale.com/request-queue-time/

Slide 19

Load comes 🏋️
1. Slow requests hit one pod and get stuck in the queue
2. The container in that pod doesn’t respond to the liveness probe in time 🥀
3. Kubernetes kills the container 💀
4. And immediately starts it again, but this takes some time… ⌚
5. During the restart, even more requests come to the other pods
6. GOTO 1 🤡
An incorrectly configured liveness probe will kill the application under load, pod after pod!

Slide 20

Picture: https://knowyourmeme.com/photos/1901279-death-knocking-on-doors

Slide 21

What to do? Send the liveness probe through a bypass!
What should listen on the other port? E.g. the Puma control app: https://til.magmalabs.io/posts/283cb0bd01-separate-health-check-endpoint-using-puma
Or write a custom plugin for your needs, like the one in yabeda-puma-plugin at lib/puma/plugin/yabeda_prometheus.rb
Don’t use the metrics endpoint for health probes: it’s too heavy.

containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /stats
        port: 8080  # ← another port!
      timeoutSeconds: 3
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /health
        port: 80
      timeoutSeconds: 3
      periodSeconds: 10

Puma config (or yabeda-puma-plugin for Prometheus):
activate_control_app "tcp://0.0.0.0:8080", no_token: true

Slide 22

Healthchecks: summary
1. Liveness goes through the "back door"
   Set up a listener on a separate port where only the probe will go.
   Don’t test dependencies (database, Redis, etc.) in a liveness probe!
   Kubernetes should not kill your pods under load!
2. Readiness goes through the "front door", together with client requests
   Let an "overloaded" pod drop out of load balancing and "cool down".
   Watch for unready pods in your monitoring!

Slide 23

Healthchecks for everything
Every component of your application should have its own liveness check, even if it’s not a web application.
Sidekiq too! See the sidekiq_alive gem and the probe sketch below.
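
A sketch of wiring this into the Sidekiq container, assuming sidekiq_alive’s built-in HTTP endpoint on its default port (7433 at the time of writing; configurable, so adjust to your setup):

containers:
  - name: sidekiq
    livenessProbe:
      httpGet:
        path: /          # sidekiq_alive answers 200 while jobs are being processed
        port: 7433       # assumed sidekiq_alive default port
      initialDelaySeconds: 30
      periodSeconds: 30
      timeoutSeconds: 5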

Slide 24

Use timeouts and circuit breakers
Don’t let your application dependencies fail your readiness probe.
Detect failures early and short-circuit before they take your whole application down with them.
Use the stoplight gem.

Slide 25

Monitor request queues!
Request queue wait time is the main metric that shows the application is at its limit. Put it in your monitoring.
If it’s noticeably greater than 0, you need to scale up (Kubernetes has the Horizontal Pod Autoscaler).
If it’s always strictly 0, you can think about scaling down.
Use yabeda-puma-plugin to monitor request queues: its puma_backlog metric contains the number of requests waiting to be processed by workers (see the alert sketch below).
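
As a sketch (assuming you scrape these metrics with Prometheus and use the prometheus-operator PrometheusRule CRD; the rule name and thresholds are assumptions), an alert on a persistently non-empty Puma backlog might look like:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-request-queue
spec:
  groups:
    - name: app.request-queue
      rules:
        - alert: PumaBacklogNotEmpty
          expr: sum by (pod) (puma_backlog) > 0   # requests waiting for a free worker
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Requests are queueing in Puma; consider scaling up"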

Slide 26

And now you can autoscale!
Use the Kubernetes Horizontal Pod Autoscaler with KEDA (see the sketch below).
React to saturation metrics showing that the application is overloaded:
  request queue wait time for HTTP requests
  queue latency for Sidekiq (using yabeda-sidekiq)
Monitor USE metrics [1] when it comes to performance!
  Utilization: number of free workers
  Saturation: time spent waiting for a free worker (p95)
  Errors: percentage of errors when processing requests
[1] USE method: https://www.brendangregg.com/usemethod.html
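
A hedged sketch of a KEDA ScaledObject scaling a web Deployment on the Puma backlog metric from the previous slide; the Deployment name, Prometheus address, and threshold are assumptions for illustration:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-web
spec:
  scaleTargetRef:
    name: app-web                  # Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed in-cluster Prometheus
        query: sum(puma_backlog)   # saturation: requests waiting for a worker
        threshold: "1"             # scale out while the backlog stays non-empty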

Slide 27

Requests and limits
How to (not) properly ask for resources

Slide 28

Requests and limits 101
Set for each container in a Pod; requests and limits for the pod are the sum of the values of its containers.
requests: for the k8s scheduler
limits: for the OS kernel on cluster nodes
cpu is measured in CPU cores (m is millicpu: thousandths of a processor core)
memory is measured in bytes and its multiples (mebibytes, gibibytes)
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Slide 29

Requests 101
Instructions for the k8s scheduler to help it effectively distribute pods to nodes:
Not used at runtime*
Doesn’t consider actual consumption*, only node capacity and the requests of other pods
* Used when setting up OOM, but more on that later
Image: Ilya Cherepanov

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Slide 30

Limits 101
Not used at scheduling time; enforced by the OS kernel on the nodes.
cpu configures CPU throttling (you can limit to less than 1 core): https://www.datadoghq.com/blog/kubernetes-cpu-requests-limits/
memory configures the Out of Memory Killer: memory is not a compressible resource, so the container will be killed if it exceeds the limit.

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 400m
    memory: 512Mi

Slide 31

Requests × Limits = QoS
Depending on the combination of requests and limits, both container performance and lifetime can change significantly:
Guaranteed (requests = limits): CPU is guaranteed to be provided; killed by the OOM killer last
Burstable (requests ≠ limits): can use more CPU than requested; killed after BestEffort, in order of "greediness"
BestEffort (no requests and limits): CPU is provided last; first to be killed by the OOM killer
Always specify both requests and limits!
Image: Ilya Cherepanov
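
For example, a container gets the Guaranteed class only when every resource has requests equal to limits (values here are illustrative):

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m      # equal to the request for both cpu and memory → Guaranteed QoS
    memory: 512Mi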

Slide 32

Threads, processes, and containers
Counter-intuitive findings about Ruby performance with limited CPU

Slide 33

So, what’s with milliCPU again?
For limits, it configures CPU throttling: the fraction of processor time that can be used within each 100 ms period.
SREs, set up the CPUThrottlingHigh alert! (a sketch follows below)
Don’t use fractional CPU limits for containers where response speed is important!
More: https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 400m
    memory: 512Mi
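
A hedged sketch of such a Prometheus alert using cAdvisor’s CFS throttling counters; the 25% threshold and duration are illustrative (the kubernetes-mixin project ships a similar CPUThrottlingHigh rule):

- alert: CPUThrottlingHigh
  expr: |
    sum(increase(container_cpu_cfs_throttled_periods_total[5m])) by (container, pod, namespace)
      /
    sum(increase(container_cpu_cfs_periods_total[5m])) by (container, pod, namespace)
      > 0.25   # more than 25% of CPU periods were throttled
  for: 15m
  labels:
    severity: warning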

Slide 34

Utilizing cores
A process in a language with a GIL (Ruby, Python) cannot utilize a processor core completely: 30-50% at most.
Use both processes and threads:
  run 2-4 worker processes per container (copy-on-write works 🐮)
  with 3-5 threads in each process ➕
Worker utilization also improves thanks to better distribution of requests to idle workers.
More: https://www.speedshop.co/2017/10/12/appserver.html
A sketch of configuring this through environment variables follows below.
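
A sketch of expressing this in the pod spec via environment variables that Puma configs commonly read (WEB_CONCURRENCY and RAILS_MAX_THREADS are assumed to match your config/puma.rb; adjust the names and values to your setup):

containers:
  - name: app
    env:
      - name: WEB_CONCURRENCY      # Puma worker processes per container
        value: "2"
      - name: RAILS_MAX_THREADS    # threads per worker process
        value: "3"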

Slide 35

Are many processes per container better?
It is counter-intuitive, but it is more performant! 🤯
Nate Berkopec (@nateberkopec): "Which web app server is more performant: 1. 1 container with 8 forked child processes, or 2. 8 containers with 1 process each, round robin/random load balancer in front? It’s #1 and it’s not even close! This is extremely relevant if you have to use random/RR load balancers."
x.com/nateberkopec/status/1938336171559649415
Benchmarks by Nate Berkopec

Slide 36

Thank you!
@Envek: github.com/Envek, @[email protected], @envek.bsky.social
@evilmartians: evilmartians.com, @[email protected], @evilmartians.com
Our awesome blog: evilmartians.com/chronicles
Slides are available at: envek.github.io/rubyconftw-nuances-on-kubernetes