Hi there ● Gentaro Terada (@hibariya) ● Works at ESM, Inc. ● A college student (engineering) ● Likes Ruby, the Internet, and programming ● https://hibariya.org
Motivations / Why Kubernetes Moving from a PaaS built on AWS EC2... ● to retire an environment that is approaching its EOL ● to scale quickly ● to take advantage of containers' mobility ● to save running costs ● to make the staging env more similar to the production one ● to make the development env more similar to the production one
The things we have done this time ☑ Retire an environment that is approaching its EOL ☑ Scale quickly ☑ Take advantage of containers’ mobility ☑ Save running costs ☑ Make the staging env more similar to the production one ☐ Make the development env more similar to the production one (WIP)
HPA + Cluster Autoscaler Let’s say we get a sudden traffic increase... 1. The CPU usage of a service rises 2. HPA increases the number of pods 3. If the CPU usage does not settle down, HPA keeps adding pods and the cluster eventually runs out of CPU/memory resources 4. The Cluster Autoscaler then increases the number of nodes
HPA + Cluster Autoscaler All you have to do is declare the desired scaling behavior concisely. The cluster will then add/remove nodes automatically to satisfy the pods' resource requests. HPA: Horizontal Pod Autoscaler (K8s) / Settings for cluster autoscaling (Terraform) https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
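For illustration, here is a minimal HPA sketch for a hypothetical Deployment named rails-app; the name, replica counts, and CPU threshold are assumptions, not the actual settings from this talk:

```yaml
# Hypothetical HPA for a Rails Deployment named "rails-app".
# When average CPU utilization across pods stays above 70%, the HPA
# adds replicas (up to maxReplicas); the GKE cluster autoscaler then
# adds nodes whenever the new pods cannot be scheduled on existing ones.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: rails-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rails-app
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
```

With something like this in place, scaling both pods and nodes happens without manual intervention, which is the "concise declaration" point above.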
Take advantage of containers’ mobility For example, when you are going to upgrade Ruby... ● you do not have to update Chef cookbooks ● you do not always have to stop the whole service ● you can run tests on a container that is very similar to the production one (or uses the same base image)
Stopped using Heroku as a "staging" ● It was far from similar to our production env ○ Running on totally different infrastructure ○ Using different gems ● This time, a very similar and much cheaper environment could be prepared with preemptible machines ○ Of course, it is not free; there is a trade-off (preemptible instances can be shut down at any time)
Options ● Use NGINX Ingress ● Use an API gateway such as Ambassador ○ did not try this time ● Proxy traffic between GLBC and Rails with NGINX ○ not sure whether this is a correct approach ○ did not try this time
Using NGINX Ingress on GKE: Pros ● You can use NGINX to control the behavior ● Applying configuration changes seems faster ● The development environment will be more similar to the production one
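As a sketch of the first point, an Ingress handled by the NGINX Ingress controller that tweaks NGINX behavior through an annotation; the host, service name, and body-size value are hypothetical, and older clusters (as of this talk) would use the extensions/v1beta1 API with the kubernetes.io/ingress.class annotation instead:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rails-app
  annotations:
    # Example of controlling NGINX behavior directly via an annotation
    # (hypothetical value).
    nginx.ingress.kubernetes.io/proxy-body-size: "20m"
spec:
  # Route this Ingress to the NGINX Ingress controller instead of GLBC.
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rails-app
                port:
                  number: 80
```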
Using NGINX Ingress on GKE: Cons ● Cannot use Cloud CDN ○ Cloud CDN requires the L7 (HTTP(S)) load balancer ○ NGINX Ingress is exposed through an L4 load balancer ○ -> We made CloudFront handle traffic for /assets/* this time ● Cannot use Google-managed DV certificates with automatic renewal ○ The same reason as above ○ -> We set up our own cert-manager instead
OOMKilled with 9 (SIGKILL) If a container exceeds its memory limit, it is killed with SIGKILL immediately. In-flight HTTP requests might be aborted (broken pipe) because of this behavior. https://github.com/kubernetes/kubernetes/issues/40157
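For context, the limit in question is the container-level memory limit, roughly as below; the names and sizes are illustrative. Once usage crosses limits.memory, the kernel OOM killer sends SIGKILL with no grace period:

```yaml
# Illustrative container resources inside a Rails pod spec.
# Exceeding limits.memory gets the process SIGKILLed (OOMKilled),
# so in-flight HTTP requests are dropped.
containers:
  - name: rails-app
    image: example/rails-app:latest   # hypothetical image
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        memory: 1536Mi
```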
How to terminate a container gracefully? We disabled the pod OOM killer and created our own liveness probe in Go that marks a container as “unhealthy” when it exceeds its memory limit. (Gopher image: https://github.com/golang-samples/gopher-vector)
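The talk does not show the probe itself. Below is a hedged sketch of how such a probe could be wired into the pod spec, assuming "disabling the OOM killer" here means not setting a hard memory limit, and that memcheck is a hypothetical stand-in for the Go probe that exits non-zero once the container's RSS exceeds a soft threshold:

```yaml
# Sketch only: no hard memory limit (so no cgroup OOM SIGKILL), plus a
# liveness probe that fails once usage crosses a soft threshold.
# "memcheck" and its flags are hypothetical stand-ins for the Go probe.
containers:
  - name: rails-app
    image: example/rails-app:latest   # hypothetical image
    resources:
      requests:
        memory: 1Gi
      # intentionally no limits.memory here
    livenessProbe:
      exec:
        command: ["memcheck", "--max-rss", "1536Mi"]
      periodSeconds: 30
      failureThreshold: 3
```

When the probe fails, the kubelet restarts the container through the normal termination path (SIGTERM first, SIGKILL only after the grace period), so the app server can drain in-flight requests instead of being killed mid-request.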
Motivation for adopting Puma ● A single-process server seems to fit Docker well ● Puma has been the default HTTP server since Rails 5.0 ● Hopefully, it reduces our app's request queueing time
A problem: memory usage grew rapidly It reached roughly 1.5-2.0x the usage we saw with Unicorn. This set us off trying the following: ● Reduce the number of malloc arenas by setting MALLOC_ARENA_MAX ○ Worked well for our app ○ Although it is not expensive, it is not free: a space-time tradeoff (AFAIK) ● Change the memory allocator, e.g. to jemalloc ○ Worked like a charm; adopted ○ Memory usage seemed to grow more slowly ○ This time we chose jemalloc 3.6 https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html https://bugs.ruby-lang.org/issues/14718
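A sketch of how either knob can be applied to the Rails container via environment variables, assuming libjemalloc is already installed in the image; the container name, image, and the .so path are assumptions and depend on the base image and jemalloc version:

```yaml
# Illustrative environment for the Rails container.
containers:
  - name: rails-app
    image: example/rails-app:latest   # hypothetical image
    env:
      # Option 1: cap glibc malloc arenas (only relevant without jemalloc).
      - name: MALLOC_ARENA_MAX
        value: "2"
      # Option 2: preload jemalloc instead of glibc malloc.
      # The library path is an assumption (Debian/Ubuntu, jemalloc 3.x).
      - name: LD_PRELOAD
        value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
```

Alternatively, Ruby itself can be built against jemalloc (configure --with-jemalloc) in the image; which route was taken is not specified in the talk.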
Conclusion K8s on GKE made infrastructure management easier and made our application more robust. Although there were several challenges to struggle with, a well-managed K8s setup seems like a good choice for ordinary Rails apps today.