
Idobata on GKE - Moving an Ordinary Rails App

A talk about how moving to GKE from a PaaS built on AWS EC2 went.
ESM Real Lounge: https://blog.agile.esm.co.jp/entry/2019/08/14/173316
ESM Agile Div: https://agile.esm.co.jp/en/

Hibariya Hi

August 21, 2019

Transcript

  1. Idobata on GKE
    Moving an Ordinary Rails App
    2019/08/21 ESM Real Lounge

  2. Hi there
    ● Gentaro Terada (@hibariya)
    ● Works at ESM, Inc.
    ● A college student (engineering)
    ● Likes Ruby, Internet, and Programming
    ● https://hibariya.org

  3. Today we will talk about:
    How did moving a Rails app from EC2 to GKE go?
    ● Benefits
    ● Challenges

  4. Motivations / Why Kubernetes
    Moving from a PaaS built on AWS EC2...
    ● to terminate an environment that is reaching its EOL
    ● to scale quickly
    ● to take advantage of containers' portability
    ● to reduce running costs
    ● to make the staging env more similar to the production one
    ● to make the development env more similar to the production one

  5. Why GKE
    ● Well managed (monitoring, networking, logging, etc.)
    ● Plenty of information sources available
    ● Relatively newer K8s releases

  6. IaC: Terraform + Kubernetes
    Terraform manages the cloud resources that are not
    taken care of by K8s.

  7. The components of a Rails app (Idobata)
    ● Web (RoR)
    ● SSE Server (Go)
    ● Background Job (Que)
    ● PostgreSQL (Cloud SQL)
    ● Redis (Cloud Memorystore)
    ● Memcached

  8. The components of a Rails app (Idobata)
    [Architecture diagram: HTTP traffic enters through Cloud Load Balancing (L4)
    and reaches the NGINX Ingress Controller in the GKE cluster, which routes it
    to the Rails web pods and the Eventd SSE server; Que background job workers
    and Memcached run in the same cluster. PostgreSQL (Cloud SQL) and Redis
    (Cloud Memorystore) are connected via VPC network peering, and AWS CloudFront
    serves as the CDN.]

  9. The things we have done this time
    ☑ Terminate an environment that is reaching its EOL
    ☑ Scale quickly
    ☑ Take advantage of containers' portability
    ☑ Reduce running costs
    ☑ Make the staging env more similar to the production one
    ☐ Make the development env more similar to the production one (WIP)

  10. Scale Quickly

  11. HPA + Cluster Autoscaler
    Let's say we get a sudden traffic increase...
    1. The CPU usage of a service becomes high
    2. Then HPA increases the number of pods
    3. If the CPU usage does not settle down, HPA keeps
    increasing the number of pods and eventually exhausts
    the cluster's CPU/memory resources
    4. Then the cluster autoscaler increases the number of nodes

  12. HPA + Cluster Autoscaler
    All you have to do is declare it concisely. The cluster
    adds/removes nodes automatically to satisfy the pods'
    resource requests.
    (Slide shows two screenshots: the Horizontal Pod Autoscaler settings in K8s
    and the cluster autoscaling settings in Terraform.)
    https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
    https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
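
    As a rough sketch of what "concise" means here, an HPA for the Rails web
    Deployment could look like the following (the Deployment name and the
    thresholds are hypothetical, not taken from Idobata's actual manifests).
    On the Terraform side, this pairs with node-pool autoscaling settings
    (minimum/maximum node counts).

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: web                          # hypothetical name
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web                        # the Rails web Deployment
      minReplicas: 2                     # illustrative bounds
      maxReplicas: 10
      targetCPUUtilizationPercentage: 70 # scale out when average CPU exceeds 70%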

  13. Take Advantage of Containers' Portability

  14. Take advantage of containers' portability
    For example, when you are going to upgrade Ruby...
    ● you do not have to update Chef cookbooks
    ● you do not always have to stop the whole service
    ● you can run tests in a container that is very similar to
    production (or uses the same base image)

  15. Make Staging Env More Similar to
    Production One

  16. Stopped using Heroku as "staging"
    ● It was far from similar to our production env
    ○ Running on totally different infrastructure
    ○ Using different gems
    ● This time, a very similar and much cheaper environment
    could be prepared with preemptible machines
    ○ Of course, it is not free; there is a trade-off
    (Screenshot: the app uses different gems on Heroku.)

  17. Challenges

  18. Keep the Existing NGINX Rate-Limiting Spec

  19. Options
    ● Use NGINX Ingress
    ● Use an API gateway such as Ambassador
    ○ did not try this time
    ● Proxy traffic between GLBC and Rails with NGINX
    ○ not sure whether this is the right approach
    ○ did not try this time

  20. Using NGINX Ingress on GKE: Pros
    ● You can use NGINX features to control the behavior,
    e.g. rate limiting (see the sketch below)
    ● Applying configuration changes seems faster
    ● The development environment becomes more similar to
    production
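
    For example, rate limiting can be expressed with nginx-ingress annotations.
    A minimal sketch, assuming a hypothetical host and Service name and
    illustrative limits (not Idobata's actual settings):

    apiVersion: networking.k8s.io/v1beta1   # extensions/v1beta1 on older clusters
    kind: Ingress
    metadata:
      name: idobata
      annotations:
        kubernetes.io/ingress.class: nginx
        # allowed requests per second per client IP, plus a burst allowance
        nginx.ingress.kubernetes.io/limit-rps: "10"
        nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
    spec:
      rules:
        - host: idobata.example.com         # hypothetical host
          http:
            paths:
              - path: /
                backend:
                  serviceName: web          # hypothetical Service name
                  servicePort: 80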

  21. Using NGINX Ingress on GKE: Cons
    ● Cannot use Cloud CDN
    ○ Cloud CDN requires the L7 LB
    ○ NGINX Ingress uses an L4 LB
    ○ -> We made CloudFront handle traffic for /assets/* this time
    ● Cannot use Google-managed DV certificate renewal
    ○ Same reason as above
    ○ -> We set up cert-manager ourselves instead (see the sketch below)
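
    With cert-manager, renewal is described declaratively. A rough sketch,
    assuming a hypothetical host and an existing ClusterIssuer named
    letsencrypt (names and API version may differ from the actual setup):

    apiVersion: cert-manager.io/v1alpha2    # cert-manager API version circa 2019
    kind: Certificate
    metadata:
      name: idobata-tls
    spec:
      secretName: idobata-tls               # the Ingress references this Secret for TLS
      dnsNames:
        - idobata.example.com               # hypothetical host
      issuerRef:
        name: letsencrypt                   # hypothetical ClusterIssuer
        kind: ClusterIssuer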

  22. [Same architecture diagram as slide 8, shown again to point out the
    L4 Cloud Load Balancing in front of the NGINX Ingress Controller and
    AWS CloudFront acting as the CDN for HTTP traffic.]

  23. Managing Pod Memory Resource

  24. OOMKilled with 9 (SIGKILL)
    If a container exceeds its memory limit, it will be killed with
    SIGKILL immediately. HTTP requests might be aborted
    (broken pipe) due to this behavior.
    https://github.com/kubernetes/kubernetes/issues/40157
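
    For context, the limit in question is the per-container memory limit
    (values here are illustrative, not Idobata's):

    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "512Mi"   # exceeding this gets the container SIGKILLed by the OOM killer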

  25. How to terminate a container gradually?
    We disabled the pod OOM killer and created our own liveness
    probe in Go that marks a container as "unhealthy" when it
    exceeds its memory limit.
    (Illustration: https://github.com/golang-samples/gopher-vector)
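
    A sketch of how such a probe can be wired into a Deployment. The probe
    command and threshold are hypothetical (the real probe from the talk is a
    custom Go program), and dropping the hard memory limit is just one way to
    avoid the immediate SIGKILL:

    containers:
      - name: web
        image: idobata/web                 # hypothetical image
        resources:
          requests:
            memory: "512Mi"
          # no hard memory limit here, so the kernel OOM killer does not
          # SIGKILL the process the moment usage crosses the threshold
        livenessProbe:
          exec:
            # hypothetical helper: exits non-zero once RSS exceeds a soft limit,
            # so kubelet restarts the container gracefully instead
            command: ["/memcheck", "--limit=512Mi"]
          periodSeconds: 30
          failureThreshold: 3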

  26. Moving from Unicorn to Puma

  27. Motivation
    ● A single-process model seems to match Docker well
    ● Puma has been the default HTTP server since Rails 5.0
    ● Hopefully it reduces our app's request queueing time

  28. A problem: memory usage grew rapidly
    It reached roughly 1.5-2.0x the usage under Unicorn.
    This led us to try the following:
    ● Reduce the number of malloc arenas by setting
    MALLOC_ARENA_MAX
    ○ Worked well for our app
    ○ Not free, though: it is a space-time trade-off (AFAIK)
    ● Switch the memory allocator, e.g. to jemalloc
    ○ Worked like a charm; adopted (see the sketch below)
    ○ Memory growth seemed relatively slower
    ○ We chose jemalloc 3.6 this time
    https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
    https://bugs.ruby-lang.org/issues/14718
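
    In Deployment terms, both knobs are just environment variables on the
    Rails containers. A sketch, assuming jemalloc is installed in the image
    (paths and values are illustrative, not Idobata's actual settings):

    # Option 1: keep glibc malloc but cap the number of arenas
    env:
      - name: MALLOC_ARENA_MAX
        value: "2"

    # Option 2 (the one adopted): preload jemalloc 3.6 installed in the image
    env:
      - name: LD_PRELOAD
        value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.1   # illustrative path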

  29. end

  30. Conclusion
    K8s on GKE made infrastructure management easier and
    made our application more robust. Although there were
    several challenges to struggle with, well-managed K8s
    seems like a good choice for ordinary Rails apps today.
