
Istio and the Service Mesh Architecture

DevOps BKK 2018


Manatsawin Hanmongkolchai

September 08, 2018


  1. Istio and the Service Mesh Architecture DevOps BKK 2018

  2. About me • Manatsawin Hanmongkolchai • Junior Architect at Wongnai

  3. How I sold Istio to my team

  4. How Wongnai monitors microservices

  5. Microservice monitoring • In-service metrics, e.g. controller time

  6. Microservice monitoring • AWS X-Ray SDK

  7. Microservice monitoring • Sentry

  8. Microservice monitoring • ELB Error Rate

  9. Microservice monitoring These must be integrated into your service AWS

  10. Microservice monitoring The problem in the microservice world • Services can be written in many languages. Not all tools support every language

  11. Microservice monitoring The problem in the microservice world • People in a rush skip implementing proper monitoring

  12. Meet Istio

  13. Service mesh • Istio handles inter-service connections • Sidecar

  14. How does the Istio sidecar work? Istio uses an admission controller to install 2 containers in your pod

  15. How does the Istio sidecar work? 1. Init container to set up the transparent proxy iptables rules (as root) 2. Envoy running alongside your app as the transparent proxy

  16. What Istio can do for you Monitoring • Network calls • Tracing

  17. Network monitoring Istio provides insight into your network in layer

  18. Total requests 4xx 5xx

  19. Request count of service Response time

  20. Service network monitoring Measured client side Request count Success rate Resp. time Speed (for TCP) Measured server side

  21. Who calls me?

  22. Distributed Tracing • All incoming/outgoing HTTP calls are traced to Jaeger • Your app needs to propagate the OpenTracing headers from the incoming call to its outgoing calls so that calls are tracked correctly

  23. Distributed Tracing • The easiest way is to just integrate Zipkin OpenTracing into your app

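As a rough illustration of the header propagation mentioned on slide 22, here is a minimal sketch for a service that does not use a tracing library. The header list is the B3-style set that Istio's documentation names for trace propagation; the function name `propagate_trace_headers` and the sample values are made up for this example.

```python
# Headers that Istio's documentation lists for trace propagation.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def propagate_trace_headers(incoming_headers):
    """Return the tracing headers to copy onto an outgoing request.

    Hypothetical helper: header names are matched case-insensitively,
    and headers that are absent from the incoming call are skipped.
    """
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


if __name__ == "__main__":
    incoming = {  # made-up values for illustration
        "Host": "frontend",
        "X-Request-Id": "req-1",
        "X-B3-TraceId": "80f198ee56343ba864fe8b2a57d3eff7",
        "X-B3-SpanId": "e457b5a2e4d86bd1",
        "X-B3-Sampled": "1",
    }
    print(propagate_trace_headers(incoming))
```

The returned dict would be merged into the headers of every outgoing HTTP call made while handling the incoming request. A tracing library, as slide 23 suggests, does the same thing automatically.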
  24. Distributed Tracing

  25. Distributed Tracing

  26. What Istio can do for you • Traffic Management ◦ Routing ▪ Traffic Shifting ▪ Mirror ◦ Fault Injection ◦ Circuit Breaker

  27. Routing • A Kubernetes service operates at layer 4 (diagram: Req ×6, Cluster IP, Backend ×3)

  28. Routing • Istio operates at layer 7 and can do per-call load balancing (diagram: Req ×6, Envoy, Backend ×3)

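The difference between slides 27 and 28 can be sketched with a toy simulation. This is a simplified model, not how kube-proxy or Envoy are implemented: the backend names are hypothetical, and round-robin stands in for whatever selection policy is actually configured.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["backend-a", "backend-b", "backend-c"]  # hypothetical names


def per_connection(requests_per_connection, connections):
    """Layer 4 model: a backend is picked once per connection, then reused."""
    picker = cycle(BACKENDS)
    hits = Counter()
    for _ in range(connections):
        backend = next(picker)  # chosen when the connection opens
        hits[backend] += requests_per_connection
    return hits


def per_request(requests_per_connection, connections):
    """Layer 7 model: a backend is picked again for every request."""
    picker = cycle(BACKENDS)
    hits = Counter()
    for _ in range(connections * requests_per_connection):
        hits[next(picker)] += 1
    return hits


if __name__ == "__main__":
    # One keep-alive connection carrying six requests.
    print("layer 4:", dict(per_connection(6, 1)))
    print("layer 7:", dict(per_request(6, 1)))
```

With a single long-lived connection, the layer 4 model sends all six requests to one backend, while the layer 7 model spreads them evenly across the three.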
  29. Split traffic • Split traffic between versions of a service (e.g. 1% to the new version)

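A 1% split like the one on slide 29 is expressed in Istio as weighted routes in a VirtualService. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The function name, the service name `backend`, and the subset names `v1` and `v2` are assumptions for illustration; the subsets would also need to be defined in a matching DestinationRule, which is not shown.

```python
import json


def traffic_split(host, stable_subset, canary_subset, canary_percent):
    """Build a VirtualService manifest that sends canary_percent of
    requests to canary_subset and the rest to stable_subset."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": stable_subset},
                     "weight": 100 - canary_percent},
                    {"destination": {"host": host, "subset": canary_subset},
                     "weight": canary_percent},
                ],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical service and subset names.
    print(json.dumps(traffic_split("backend", "v1", "v2", 1), indent=2))
```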
  30. Mirror traffic • Test in production by cloning traffic (diagram: Req, Envoy, Live version, Test version)

  31. Fault Injection • Intentionally make a service worse • Why? Let’s hear a story

  32. Fault Injection Site Reliability Engineering: How Google Runs Production Systems landing.google.com/sre/book/

  33. #WongnaiIsHiring • Wongnai is looking for our first Site Reliability Engineer • careers.wongnai.com

  34. Chubby

  35. Fault Injection Over time, we found that the failures of the global instance of Chubby consistently generated service outages.

  36. Fault Injection As it turns out, true global Chubby outages are so infrequent that service owners began to add dependencies to Chubby assuming that it would never go down.

  37. Fault Injection The solution to this Chubby scenario is interesting: SRE makes sure that global Chubby meets, but does not significantly exceed, its service level objective.

  38. Fault Injection In any given quarter, if a true failure has not dropped availability below the target, a controlled outage will be synthesized by intentionally taking down the system.

  39. Fault Injection • Slow down services ◦ Delay 80% of requests for 5 seconds • Make errors ◦ Return a 500 error code for 80% of requests

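The two faults on slide 39 map onto the `fault` section of a VirtualService HTTP route. The sketch below builds that manifest as a Python dict. It uses the `percentage.value` form of the field found in current Istio API references; the Istio release in use at the time of the talk may have spelled this as an integer `percent` instead, so treat the exact field names as an assumption to check against your version. The function name and the service name `backend` are hypothetical.

```python
import json


def fault_injection(host, delay_percent=None, delay="5s",
                    abort_percent=None, http_status=500):
    """Build a VirtualService that injects delays and/or errors for host."""
    fault = {}
    if delay_percent is not None:
        # Hold this share of requests for a fixed time before forwarding.
        fault["delay"] = {
            "percentage": {"value": float(delay_percent)},
            "fixedDelay": delay,
        }
    if abort_percent is not None:
        # Answer this share of requests with an error, without forwarding.
        fault["abort"] = {
            "percentage": {"value": float(abort_percent)},
            "httpStatus": http_status,
        }
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-fault"},
        "spec": {
            "hosts": [host],
            "http": [{
                "fault": fault,
                "route": [{"destination": {"host": host}}],
            }],
        },
    }


if __name__ == "__main__":
    # Delay 80% of requests by 5 seconds.
    print(json.dumps(fault_injection("backend", delay_percent=80), indent=2))
    # Return HTTP 500 for 80% of requests.
    print(json.dumps(fault_injection("backend", abort_percent=80), indent=2))
```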
  40. Circuit Breaker Remove a backend from service if it returns too many errors in a row (diagram: Frontend, Backend, Work Queue, 503, Timeout, F5)

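The rule on slide 40 can be modelled in a few lines. This is a conceptual sketch, not Envoy's implementation: in Istio the behaviour is configured through the `outlierDetection` section of a DestinationRule, and Envoy also puts an ejected host back into rotation after an ejection period, which this model leaves out. The class name and threshold are made up for the example.

```python
class ConsecutiveErrorEjector:
    """Toy model: eject a backend after N server errors in a row."""

    def __init__(self, backends, max_consecutive_errors=5):
        self.max_consecutive_errors = max_consecutive_errors
        self.error_streak = {backend: 0 for backend in backends}
        self.ejected = set()

    def healthy_backends(self):
        """Backends that are still eligible to receive traffic."""
        return [b for b in self.error_streak if b not in self.ejected]

    def record(self, backend, status_code):
        """Update the error streak for backend after one response."""
        if status_code >= 500:
            self.error_streak[backend] += 1
            if self.error_streak[backend] >= self.max_consecutive_errors:
                self.ejected.add(backend)
        else:
            # Any successful response resets the streak.
            self.error_streak[backend] = 0


if __name__ == "__main__":
    ejector = ConsecutiveErrorEjector(["backend-a", "backend-b"], 3)
    for status in (503, 503, 503):
        ejector.record("backend-a", status)
    print(ejector.healthy_backends())
```

Because the streak resets on success, a backend that fails only occasionally stays in rotation, while one that is overloaded or stuck is removed quickly.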
  41. Summary Istio provides visibility and configurability to your network. This is traditionally done by adding a library, but in a microservice world you need a cross-language solution

  42. The catch Here’s what we found while moving to Istio • While it requires zero code changes, your service must already be a well-behaved cloud application

  43. The catch • Do not connect directly to pod IPs (e.g. no service discovery; just use the cluster IP and avoid headless services)

  44. The catch • Do not mix port types in the cluster (e.g. don’t run an HTTP server on port 6379 while another pod runs a TCP service on the same port)

  45. The catch • Set the Host header to the destination. Don’t connect to the gateway and set the Host header to cooking. ◦ This case is really hard to debug...

  46. The catch • External services (i.e. outside Kubernetes) that are in the captured IP range must have a ServiceEntry defined ◦ ServiceEntry is cluster-wide

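For reference, a ServiceEntry like the one slide 46 calls for might look like the sketch below, built as a Python dict and printed as JSON for `kubectl apply -f -`. The function name, the host `redis.example.internal`, and the port naming scheme are assumptions for illustration, not values from the talk.

```python
import json


def service_entry(name, host, port, protocol):
    """Build a ServiceEntry manifest for a service that lives outside the mesh."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "ServiceEntry",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "location": "MESH_EXTERNAL",
            "ports": [{
                "number": port,
                "name": f"{protocol.lower()}-{port}",
                "protocol": protocol,
            }],
            "resolution": "DNS",
        },
    }


if __name__ == "__main__":
    # Hypothetical external Redis reachable from inside the cluster network.
    entry = service_entry("external-redis", "redis.example.internal", 6379, "TCP")
    print(json.dumps(entry, indent=2))
```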
  47. Slides on speakerdeck.com/whs