Implementation of service mesh using Envoy and Central Dogma

Ryo Fukumuro
LINE / LINE Development Department 1, Software Engineer
https://linedevday.linecorp.com/jp/2019/sessions/S1-01


LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay > Implementation of Service Mesh Using Envoy and Central Dogma > Ryo Fukumuro > LINE / LINE Development Department 1, Software Engineer
  2. Microservices Challenges > Observability • Unified metrics, logging, tracing, etc. > Fault Tolerance • Circuit breakers, timeouts, retries, etc.
  3. Sidecar Proxy Pattern > Intercept all communication between services with a sidecar proxy (Service A / Service B / Service C, each paired with a sidecar proxy) > Load balancing > Metrics > Tracing > Retries > Circuit breaking > Etc.
  4. Control Plane: Core Component of Service Mesh > Central component for managing proxy (data plane) settings (the Service A / B / C sidecar proxies all connect to the control plane)
  5. Data Plane Implementation: Envoy > L7 proxy > L3/L4 proxy > HTTP/2 support > gRPC support > Thrift support (alpha) > Health checking > Load balancing > Tracing (Zipkin, Jaeger, etc.) > Unified metrics > External authorization > Automatic retries > Circuit breaking > Rate limiting > Request shadowing > Outlier detection > Lua scripting > Rich API (dynamic configuration) > And more • Redis proxy, Mongo proxy, …
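Several of the features above meet in a single listener/cluster definition. The following is a minimal static-config sketch in Envoy's v3 schema (the talk predates v3); the names, ports, and addresses are illustrative, not taken from the talk:

```yaml
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http          # prefix for the unified metrics
          route_config:
            virtual_hosts:
            - name: service_b
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route:
                  cluster: service_b
                  retry_policy:              # automatic retries
                    retry_on: "5xx"
                    num_retries: 2
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: service_b
    type: STRICT_DNS
    connect_timeout: 1s
    outlier_detection:                       # eject persistently failing hosts
      consecutive_5xx: 5
    load_assignment:
      cluster_name: service_b
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: service-b.example.com, port_value: 8080 }
```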
  6. Lish: An Experimental Control Plane for Envoy at LINE > Designed for bare metal • (We may eventually move to an in-house managed k8s cluster.) > Simple config generation pipeline > Speaks Envoy's xDS protocol (gRPC interface provided by Envoy) > Access log aggregator > Built with Scala, Armeria, Central Dogma, Monix, etc.
  7. Central Dogma > Repository service for configuration > Version controlled > Change notification > Fine-grained access control > Mirroring from an external Git repository > Built-in online editor
  8. Lish Architecture > GHE repository (code review) mirrored to a Central Dogma repository > Consul for service discovery > Lish generates Envoy config and distributes it to each service's Envoy over bidirectional streaming gRPC • Clusters • Routes • Listeners • Runtime values • Etc. > Each host runs the service, its Envoy sidecar, and a Consul agent > * PMC: in-house service registry
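The "Distribute Envoy Config (bidirectional streaming gRPC)" step corresponds to Envoy's dynamic bootstrap: each sidecar is pointed at the control plane's xDS endpoint and streams clusters, routes, and listeners from it. A hedged sketch of such a bootstrap (the `lish` cluster name and address are hypothetical, not from the talk):

```yaml
node:
  id: service-a-sidecar      # identifies this proxy to the control plane
  cluster: service-a
dynamic_resources:
  ads_config:                # aggregated xDS over a single gRPC stream
    api_type: GRPC
    grpc_services:
    - envoy_grpc: { cluster_name: lish }
  cds_config: { ads: {} }    # clusters come from the ADS stream
  lds_config: { ads: {} }    # listeners come from the ADS stream
static_resources:
  clusters:
  - name: lish               # the only statically defined cluster: the control plane itself
    type: STRICT_DNS
    connect_timeout: 1s
    http2_protocol_options: {}   # xDS is gRPC, which requires HTTP/2
    load_assignment:
      cluster_name: lish
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: lish.example.com, port_value: 8080 }
```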
  9. Lish Config Resources > YAML format > Minimal abstraction • Raw Envoy config (+ templating to reduce boilerplate) • + Domain-specific service discovery
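To make "raw Envoy config + templating + domain-specific discovery" concrete, here is a purely hypothetical resource in that spirit; this is not Lish's actual schema, and the template syntax and `discovery` block are invented for illustration:

```yaml
# Hypothetical Lish-style resource: mostly a raw Envoy cluster,
# with a template variable and a discovery shorthand layered on top.
cluster:
  name: service-b
  connect_timeout: "{{ default_connect_timeout }}"   # template variable shared across resources
  discovery:
    type: consul        # domain-specific: expanded into endpoints via Consul
    service: service-b  # instead of hand-writing a load_assignment block
```

The design point is that the abstraction stays thin: anything the template layer does not cover can still be expressed as plain Envoy config.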
  10. Raw Envoy Resources > Raw Envoy config format

  11. Consul Based Endpoint Discovery
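The slide's screenshot is not reproduced in the transcript. As background: Consul discovers endpoints from service definitions registered with each node's agent, in the standard Consul format shown below (the service name, port, and health-check URL are illustrative, not from the talk):

```json
{
  "service": {
    "name": "service-b",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s"
    }
  }
}
```

A control plane can then query Consul's catalog for the healthy instances of `service-b` and push them to each Envoy as cluster endpoints.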

  12. CI/CD for Config Files > Drone CI • Matrix build (stage × Envoy version) > Continuous testing • Launch Envoy + Lish and run check scripts (send health-check requests) • Will support more complex scenarios > Semi-automated delivery • ChatOps (Slack) • beta -> staging -> canary -> release
  13. Global Dashboard

  14. Global Dashboard

  15. Service to Service Dashboard

  16. Outcome > Fine-grained traffic control • Brings LB management back to developers • Makes server operations easier (e.g., cluster migration with traffic splitting) > Better observability • Easier to identify the root cause of an outage (relatively!) • Easy to write cross-component alert rules thanks to unified metrics
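The "cluster migration with traffic splitting" point maps naturally onto Envoy's weighted clusters: a route can spread traffic across an old and a new cluster by weight, and the split can be shifted gradually. A sketch of the relevant route fragment (cluster names and weights are illustrative):

```yaml
routes:
- match: { prefix: "/" }
  route:
    weighted_clusters:
      clusters:
      - name: service-b-old    # existing cluster, still taking most traffic
        weight: 90
      - name: service-b-new    # migration target, warming up with 10%
        weight: 10
```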
  17. Gotchas > Need to understand how Envoy works, especially its load balancing • Case study: we experienced an outage where requests to a service were unintentionally routed to nodes in maintenance mode, because panic mode was triggered by adding too many new nodes at once. > A wrong configuration could break the mesh entirely • It is very important to provide a way to test configs in a staging/canary environment
  18. Thank You