AWS EC2...
• to terminate an environment which is reaching its EOL
• to scale quickly
• to take advantage of containers' portability
• to save running costs
• to make the staging env more similar to the production one
• to make the development env more similar to the production one
☑ Terminate an environment which is reaching its EOL
☑ Scale quickly
☑ Take advantage of containers' portability
☑ Save running costs
☑ Make staging env more similar to the production one
☐ Make development env more similar to the production one (WIP)
Traffic increase...
1. When the CPU usage of a service becomes higher,
2. the HPA increases the number of pods.
3. If the CPU usage does not settle down, the HPA keeps increasing the number of pods and eventually runs out of CPU/memory resources on the existing nodes.
4. Then the cluster autoscaler increases the number of nodes.
The settings for cluster autoscaling are written in Terraform, which lets us code it in a concise manner. With autoscaling enabled, the cluster will add/remove nodes automatically to satisfy the pods' resource requests.
• HPA: Horizontal Pod Autoscaler (K8s): https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
• Cluster autoscaler (GKE): https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
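To make the HPA side of this concrete, a minimal manifest could look like the sketch below; the Deployment name, replica bounds, and 50% CPU target are placeholders, not the actual settings from the talk.

```yaml
# Minimal HPA sketch (autoscaling/v1); names and numbers are hypothetical.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: rails-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rails-app                    # the Deployment being scaled
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 50   # scale out when average CPU exceeds 50% of the request
```

Once the pods' combined resource requests no longer fit on the existing nodes, newly created pods stay Pending and the cluster autoscaler reacts by adding a node, which is step 4 above.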
If you are going to upgrade Ruby, then...
• you do not have to update Chef cookbooks
• you do not always have to stop the whole service
• you can run tests on a container that is very similar to the production one (or uses the same base image)
The old staging environment was far from similar to our production env:
  ◦ Running on totally different infrastructure
  ◦ Using different gems (the app uses different gems on Heroku)
• This time, a very similar and much cheaper environment could be prepared with preemptible machines
  ◦ Of course, it is not free; it comes with a trade-off
• Cannot use Cloud CDN
  ◦ Cloud CDN requires an L7 LB
  ◦ NGINX Ingress uses an L4 LB
  ◦ -> We made CloudFront handle traffic for /assets/* this time
• Cannot use Google-managed DV certificate renewal
  ◦ The same reason as above
  ◦ -> We set up our own cert-manager instead
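As a rough idea of what "setting up our own cert-manager" involves, an ACME issuer is declared roughly as below. This sketch uses the current cert-manager.io/v1 API with made-up names and email; the exact manifests depend heavily on the cert-manager version in use, and the talk's actual configuration is not shown in the slides.

```yaml
# Illustrative cert-manager ACME issuer; names and email are placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx   # solve HTTP-01 challenges through the NGINX Ingress
```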
If a container exceeds its memory limit, it will be killed with SIGKILL immediately. In-flight HTTP requests might be aborted (broken pipe) due to this behavior. https://github.com/kubernetes/kubernetes/issues/40157
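The limit in question is the per-container memory limit in the pod spec; a sketch with made-up values:

```yaml
# Fragment of a pod spec; image name and sizes are hypothetical.
containers:
  - name: rails-app
    image: example/rails-app:latest
    resources:
      requests:
        memory: "512Mi"   # used for scheduling and cluster-autoscaling decisions
      limits:
        memory: "1Gi"     # exceeding this gets the container OOM-killed (SIGKILL)
```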
So we created our own liveness probe in Go that marks a container as "unhealthy" when it exceeds its memory limit.
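The slides do not show how the in-house probe works internally; purely to illustrate how such a check is wired into a pod, a liveness probe that execs a hypothetical memory-checking binary could look like this (the `memcheck` command and its flag are assumptions, not the actual tool):

```yaml
# Illustration only: wiring a custom memory check as a liveness probe.
livenessProbe:
  exec:
    command: ["/usr/local/bin/memcheck", "--threshold-bytes=900000000"]  # hypothetical binary
  initialDelaySeconds: 30
  periodSeconds: 15
  failureThreshold: 2   # restart the container after two consecutive failures
```

When a liveness probe fails, the kubelet restarts the container through the normal termination flow (SIGTERM first, SIGKILL only after the grace period), so the app server gets a chance to finish in-flight requests, unlike the immediate OOM kill described above.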
Memory usage grew to about 1.5-2.0x compared with Unicorn. This problem prompted us to try the following:
• Reduce the number of malloc arenas by setting MALLOC_ARENA_MAX
  ◦ Worked well for our app
  ◦ Although it is not expensive, it is not free: a space-time trade-off (AFAIK)
• Change the memory allocator, e.g. to jemalloc
  ◦ Worked like a charm; adopted
  ◦ Memory growth seemed relatively slower
  ◦ We chose jemalloc 3.6 this time
https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
https://bugs.ruby-lang.org/issues/14718
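How these tweaks are applied depends on the image and is not shown in the slides. One common pattern, assumed here purely for illustration, is to set them as container environment variables, preloading a jemalloc library that is already installed in the base image:

```yaml
# Sketch: applying the two tweaks via container env vars; paths and values are assumptions.
containers:
  - name: rails-app
    image: example/rails-app:latest
    env:
      - name: MALLOC_ARENA_MAX     # option 1: limit glibc malloc arenas
        value: "2"
      - name: LD_PRELOAD           # option 2: preload jemalloc (path depends on the base image)
        value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
```

The two settings are alternatives rather than a pair: MALLOC_ARENA_MAX tunes glibc's malloc, while LD_PRELOAD replaces it with jemalloc entirely (the option that was adopted).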
The migration made our application more robust. Although there were several challenges to struggle with, well-managed K8s seems like one good choice for ordinary Rails apps today.