Slide 1

Idobata on GKE: Moving an Ordinary Rails App
2019/08/21 ESM Real Lounge

Slide 2

Hi there
● Gentaro Terada (@hibariya)
● Works at ESM, Inc.
● A college student (engineering)
● Likes Ruby, the Internet, and programming
● https://hibariya.org

Slide 3

Today we will talk about: what was it like to move a Rails app from EC2 to GKE?
● Benefits
● Challenges

Slide 4

Motivations / Why Kubernetes
Moving from a PaaS built on AWS EC2...
● to terminate an environment that was reaching its EOL
● to scale quickly
● to take advantage of containers' portability
● to save running costs
● to make the staging env more similar to the production one
● to make the development env more similar to the production one

Slide 5

Why GKE
● Well managed (monitoring, networking, logging, etc.)
● Plenty of information sources available
● Relatively new K8s versions

Slide 6

IaC: Terraform + Kubernetes
Terraform manages the cloud resources that are not taken care of by K8s.

Slide 7

The components of a Rails app (Idobata)
● Web (RoR)
● SSE server (Go)
● Background jobs (Que)
● PostgreSQL (Cloud SQL)
● Redis (Cloud Memorystore)
● Memcached

Slide 8

The components of a Rails app (Idobata)
[Architecture diagram] HTTP traffic comes in through Cloud Load Balancing (L4) to the NGINX Ingress Controller inside the GKE cluster (Kubernetes Engine), with AWS CloudFront as the CDN. The cluster runs the Rails web pods, Eventd (SSE server), the Que background job workers, and Memcached, and reaches PostgreSQL (Cloud SQL) and Redis (Cloud Memorystore) over VPC network peering.

Slide 9

The things we have done this time
☑ Terminate an environment that was reaching its EOL
☑ Scale quickly
☑ Take advantage of containers' portability
☑ Save running costs
☑ Make the staging env more similar to the production one
☐ Make the development env more similar to the production one (WIP)

Slide 10

Scale Quickly

Slide 11

HPA + Cluster Autoscaler
Let's say we get a sudden traffic increase...
1. The CPU usage of a service rises
2. HPA increases the number of pods
3. If the CPU usage does not settle down, HPA keeps increasing the number of pods and eventually exhausts the cluster's CPU/memory resources
4. The cluster autoscaler then increases the number of nodes

Slide 12

HPA + Cluster Autoscaler
All you have to do is declare it concisely: an HPA (Horizontal Pod Autoscaler) manifest on the K8s side and the cluster autoscaling settings on the Terraform side. The cluster then adds/removes nodes automatically to satisfy the pods' resource requests.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
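For reference, a minimal sketch of what the K8s side could look like, assuming a Deployment named web; the replica counts and CPU threshold are illustrative, not the actual production values. The node-pool autoscaling itself is enabled on the Terraform/GKE side.

```yaml
# Illustrative HPA manifest (autoscaling/v1). The target Deployment name
# "web" and the replica/CPU numbers are assumptions, not the real settings.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```

When pending pods cannot be scheduled because their resource requests exceed what the existing nodes can offer, the GKE cluster autoscaler adds nodes within the node pool's configured min/max bounds, and removes them again once they are underutilized.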

Slide 13

Take Advantage of Containers' Portability

Slide 14

Take advantage of containers' portability
For example, when you are going to upgrade Ruby:
● you do not have to update Chef cookbooks
● you do not always have to stop the whole service
● you can run tests in a container that is very similar to production (or uses the same base image)

Slide 15

Make Staging Env More Similar to Production One

Slide 16

Stopped using Heroku as the "staging"
● It was far from similar to our production env
  ○ Running on totally different infrastructure
  ○ Using different gems (the app needs different gems on Heroku)
● This time, a very similar and much cheaper environment could be prepared with preemptible machines
  ○ Of course, it is not free of trade-offs (preemptible VMs can be shut down at any time)

Slide 17

Challenges

Slide 18

Keep the Existing NGINX Rate-Limiting Spec

Slide 19

Options
● Use NGINX Ingress
● Use an API gateway such as Ambassador
  ○ did not try this time
● Proxy traffic between GLBC and Rails with NGINX
  ○ not sure whether it is the right approach
  ○ did not try this time

Slide 20

Using NGINX Ingress on GKE: Pros
● You can use NGINX to control the behavior (e.g., rate limiting; see the sketch below)
● Applying configuration changes seems faster
● The development environment becomes more similar to production
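As an example of controlling the behavior with NGINX, the NGINX Ingress Controller exposes rate limiting through annotations. A minimal sketch, assuming a Service named web; the host and the limits are illustrative, not the actual Idobata settings (the Ingress apiVersion shown is the one current at the time of the talk).

```yaml
# Illustrative Ingress with NGINX rate limiting. The host, Service name,
# and the limits are assumptions, not the real configuration.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # Limit each client IP to 10 requests per second,
    # allowing bursts of up to 3x that rate.
    nginx.ingress.kubernetes.io/limit-rps: "10"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "3"
spec:
  rules:
    - host: idobata.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: web
              servicePort: 80
```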

Slide 21

Using NGINX Ingress on GKE: Cons
● Cannot use Cloud CDN
  ○ Cloud CDN requires the L7 LB
  ○ NGINX Ingress uses an L4 LB
  ○ -> We made CloudFront handle the traffic for /assets/* this time
● Cannot use Google-managed DV certificate renewal
  ○ For the same reason as above
  ○ -> We run our own cert-manager instead (see the sketch below)
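Since the Google-managed certificates are tied to the L7 LB, cert-manager takes over issuance and renewal in this setup. A minimal sketch, assuming a ClusterIssuer named letsencrypt already exists; the names and host are illustrative, and the apiVersion depends on the installed cert-manager release.

```yaml
# Illustrative cert-manager Certificate. The issuer name, secret name,
# and host are assumptions, not the actual configuration.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: idobata-tls
spec:
  secretName: idobata-tls
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  dnsNames:
    - idobata.example.com
```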

Slide 22

[Architecture diagram, repeated from slide 8: AWS CloudFront (CDN) and Cloud Load Balancing (L4) in front of the NGINX Ingress Controller and the Rails / Eventd / Que / Memcached workloads in the GKE cluster, with VPC peering to Cloud SQL (PostgreSQL) and Cloud Memorystore (Redis).]

Slide 23

Managing Pod Memory Resources

Slide 24

OOMKilled with 9 (SIGKILL)
If a container exceeds its memory limit, it is killed with SIGKILL immediately. In-flight HTTP requests might be aborted (broken pipe) because of this behavior.
https://github.com/kubernetes/kubernetes/issues/40157
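For context, the limit in question is the one declared on the container; exceeding limits.memory is what triggers the OOM kill. A minimal sketch of the resources section (the values are illustrative):

```yaml
# Illustrative container resources; the values are assumptions.
# Exceeding limits.memory gets the container OOMKilled (SIGKILL).
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 1Gi
```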

Slide 25

How to terminate a container gracefully?
We disabled the OOM kill for the pod and created our own liveness probe in Go that marks a container as "unhealthy" when it exceeds its memory limit, so that the kubelet restarts it gracefully (see the sketch below).
(Gopher image: https://github.com/golang-samples/gopher-vector)
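A minimal sketch of how such a probe could be wired into the pod spec, assuming the custom Go checker is shipped in the image as a hypothetical /usr/local/bin/memcheck binary; the threshold and timings are illustrative. When the probe keeps failing, the kubelet restarts the container through the normal termination path (SIGTERM first, SIGKILL only after the grace period) instead of an abrupt OOM kill.

```yaml
# Illustrative liveness probe. "memcheck" is a hypothetical stand-in for
# the custom Go checker; the threshold and timings are assumptions.
livenessProbe:
  exec:
    command: ["/usr/local/bin/memcheck", "--limit=1Gi"]
  initialDelaySeconds: 30
  periodSeconds: 15
  failureThreshold: 3
```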

Slide 26

Moving from Unicorn to Puma

Slide 27

Motivation
● A single-process model seems to match Docker well
● Puma has been the default HTTP server since Rails 5.0
● Hopefully, it reduces the request queueing time of our app

Slide 28

A problem: memory usage grew rapidly
It came up to roughly 1.5-2.0x the usage under Unicorn. This problem set us off trying the following (see the sketch below):
● Reduce the number of malloc arenas by setting MALLOC_ARENA_MAX
  ○ Worked well for our app
  ○ Although it is not expensive, it is not free: a space-time trade-off (AFAIK)
● Switch to another memory allocator such as jemalloc
  ○ Worked like a charm; adopted
  ○ Memory usage seemed to grow relatively more slowly
  ○ This time we chose jemalloc 3.6
https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
https://bugs.ruby-lang.org/issues/14718
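Both knobs end up as plain environment variables on the container. A minimal sketch of the env section of the Deployment, assuming jemalloc is installed in the image; the LD_PRELOAD path depends on the base image and is an assumption here.

```yaml
# Illustrative environment settings; the jemalloc library path depends
# on the base image and is an assumption.
env:
  # Cap the number of glibc malloc arenas (space-time trade-off).
  - name: MALLOC_ARENA_MAX
    value: "2"
  # Or replace the allocator entirely with jemalloc (what we adopted).
  - name: LD_PRELOAD
    value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
```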

Slide 29

end

Slide 30

Conclusion
K8s on GKE made infrastructure management easier and made our application more robust. Although there were several challenges we had to struggle with, well-managed K8s seems like a good choice for ordinary Rails apps today.