Idobata on GKE - Moving an Ordinary Rails App

A talk about moving Idobata to GKE from a PaaS built on AWS EC2.
ESM Real Lounge: https://blog.agile.esm.co.jp/entry/2019/08/14/173316
ESM Agile Div: https://agile.esm.co.jp/en/

Hibariya Hi

August 21, 2019

Transcript

  1. Hi there • Gentaro Terada (@hibariya) • Works at ESM,

    Inc. • A college student (engineering) • Likes Ruby, Internet, and Programming • https://hibariya.org
  2. Today we will talk about: How did moving a Rails

    app from EC2 to GKE go? • Benefits • Challenges
  3. Motivations / Why Kubernetes Moving from a PaaS built on

    AWS EC2... • to terminate an environment which is reaching its EOL • to scale quickly • to take advantage of containers' mobility • to save running costs • to make staging env more similar to production one • to make development env more similar to production one
  4. Why GKE • Well managed (monitoring, networking, logging, etc.) •

    A large amount of available information • Relatively new K8s versions
  5. The components of a Rails app (Idobata) • Web (RoR)

    • SSE Server (Go) • Background Job (Que) • PostgreSQL (Cloud SQL) • Redis (Cloud Memorystore) • Memcached
  6. The components of a Rails app (Idobata), as an architecture diagram:

    HTTP traffic comes in through AWS CloudFront (the CDN) and an L4 Cloud Load Balancer, and the NGINX Ingress Controller inside the Kubernetes Engine (GKE) cluster routes it to the Rails web pods and Eventd (the SSE server). The cluster also runs the Que background job workers and Memcached. PostgreSQL (Cloud SQL) and Redis (Cloud Memorystore) are reached over VPC network peering.
  7. The things we have done this time ☑ Terminate an

    environment which is reaching its EOL ☑ Scale quickly ☑ Take advantage of containers’ mobility ☑ Save running costs ☑ Make staging env more similar to production one ☐ Make development env more similar to production one (WIP)
  8. HPA + Cluster Autoscaler Let’s say we got a sudden

    traffic increase... 1. When the CPU usage of a service becomes higher 2. Then HPA increases the number of pods 3. If the CPU usage does not settle down, HPA keeps increasing the number of pods and eventually runs out of CPU/memory resources 4. Then the cluster increases the number of nodes
  9. HPA + Cluster Autoscaler All you have to do is

    declare it concisely (see the sketch below). The cluster will add/remove nodes automatically to satisfy the pods’ resource requests. HPA: Horizontal Pod Autoscaler (K8s) Settings for cluster autoscaling (Terraform) https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
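    As a minimal sketch (not taken from the deck; the Deployment name `web` and the thresholds are assumptions), an HPA targeting CPU usage can be declared like this:

```yaml
# Hypothetical HPA for the Rails web Deployment; names and values are illustrative.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70   # scale out when average CPU usage exceeds 70%
```

    The cluster autoscaler then adds nodes whenever newly created pods cannot be scheduled on the current capacity, and removes nodes that are no longer needed.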
  10. Take advantage of containers’ mobility For example: when you are

    going to upgrade Ruby... • you do not have to update Chef cookbooks • you do not always have to stop the whole service • you can run tests on a container that is very similar to production (or uses the same base image)
  11. Stopped using Heroku as a "staging" • It was far

    from similar to our production env ◦ Running on totally different infrastructure ◦ Using different gems ("The app uses different gems on Heroku") • This time, a very similar and much cheaper environment could be prepared with preemptible machines ◦ Of course, it is not free; there is a trade-off
  12. Options • Use NGINX Ingress • Use an API gateway such

    as Ambassador ◦ did not try this time • Proxy traffic between GLBC and Rails with NGINX ◦ not sure if it is correct ◦ did not try this time
  13. Using NGINX Ingress on GKE: Pros • You can use

    NGINX to control the behavior • Applying configuration seems faster • The development environment will be more similar to production
  14. Using NGINX Ingress on GKE: Cons • Cannot use Cloud

    CDN ◦ Cloud CDN requires an L7 LB ◦ NGINX Ingress uses an L4 LB ◦ -> We made CloudFront handle traffic for /assets/* this time • Cannot use Google-managed DV certificate renewal ◦ The same reason as above ◦ -> We set up our own cert-manager instead
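    A rough sketch of how such an Ingress can be wired to the NGINX controller and cert-manager (this is not the deck's actual manifest; the host, Service, and issuer names are placeholders, and the syntax is the current networking.k8s.io/v1 form rather than the 2019-era one):

```yaml
# Illustrative Ingress using the NGINX Ingress Controller with a cert-manager certificate.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # placeholder issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - idobata.example          # placeholder hostname
      secretName: web-tls          # cert-manager stores the issued certificate here
  rules:
    - host: idobata.example
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web          # placeholder Service name
                port:
                  number: 80
```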
  15. (The same architecture diagram as slide 6, highlighting where the

    NGINX Ingress Controller sits: HTTP traffic passes through the L4 Cloud Load Balancing and the NGINX Ingress Controller inside the GKE cluster, while AWS CloudFront acts as the CDN in front of the Rails app.)
  16. OOMKilled with 9 (SIGKILL) If a container exceeds its memory

    limit, it will be killed with SIGKILL immediately. HTTP requests might be aborted (broken pipe) due to this behavior. https://github.com/kubernetes/kubernetes/issues/40157
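    For context, the limit in question is the ordinary memory limit on the container spec; a minimal, illustrative fragment (the names and values are made up, not Idobata's settings):

```yaml
# Hypothetical container resources for the Rails web container.
containers:
  - name: web                                   # placeholder container name
    image: gcr.io/example/idobata-web:latest    # placeholder image
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        memory: 1Gi    # exceeding this gets the container killed with SIGKILL, with no graceful shutdown
```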
  17. How to terminate a container gracefully? Disabled the pod OOM Killer

    and created our own liveness probe in Go that marks a container as “unhealthy” when it exceeds its memory limit. https://github.com/golang-samples/gopher-vector
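    The deck does not show how the probe is wired up; assuming it exposes an HTTP endpoint (it could equally be an exec probe), the pod spec might look roughly like this, with the path, port, and timings as pure placeholders:

```yaml
# Hypothetical wiring for the custom memory-watching liveness probe.
# When the probe starts failing, the kubelet restarts the container via SIGTERM,
# giving the app a chance to finish in-flight requests instead of being OOMKilled.
livenessProbe:
  httpGet:
    path: /memory_check   # placeholder endpoint served by the Go probe
    port: 8081            # placeholder port
  initialDelaySeconds: 30
  periodSeconds: 15
  failureThreshold: 2
```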
  18. Motivation (switching to Puma) • A single-process model seems to

    match Docker well • Puma has been the default HTTP server since Rails 5.0 • Hopefully, it reduces our app’s request queueing time
  19. A problem: memory usage grew rapidly And reached roughly

    1.5-2.0x the usage of Unicorn. This problem set us off trying the following: • Reduce the number of arenas by setting MALLOC_ARENA_MAX ◦ Worked well for our app ◦ Although it is not expensive, it is not free: a space-time trade-off (AFAIK) • Switch the memory allocator, e.g. to jemalloc ◦ Worked like a charm; adopted ◦ Memory usage seemed to grow relatively more slowly ◦ This time we chose jemalloc 3.6 https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html https://bugs.ruby-lang.org/issues/14718
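    One common way to apply either tweak in a containerized setup is through environment variables on the container; the deck does not show how it was done, so the values and the jemalloc library path below are illustrative only:

```yaml
# Hypothetical env vars on the Rails container.
env:
  - name: MALLOC_ARENA_MAX     # option 1: cap the number of glibc malloc arenas
    value: "2"
  - name: LD_PRELOAD           # option 2: preload jemalloc if it is installed in the image
    value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.1   # placeholder path (Debian/Ubuntu libjemalloc1, i.e. jemalloc 3.6)
```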
  20. end

  21. Conclusion K8s on GKE made infrastructure management easier and made our

    application more robust. Although there were several challenges we had to struggle with, well-managed K8s seems like a good choice for ordinary Rails apps today.