J-Fall 2019: What's a service mesh and why do I need one?

You’ve been building a cloud-native, microservice-based architecture: continuous delivery pipelines, cloud-based deployments, and Kubernetes-managed Docker containers. You are ready to scale beyond your wildest dreams. Then, taking a step back, you notice that your services contain more than just the business logic you intended to write. Proper communication is key in a distributed system, but do you really need these extra libraries that increase the size of your microservice? Should the responsibility for reliable communication live within your application, or can it be abstracted to a higher level? In this session, we will look at the concept of a service mesh and how it helps you put responsibilities at the right layer. After this presentation you may be able to answer the question of whether you really need a service mesh.

Jeroen Reijn

October 31, 2019

Transcript

  1. What’s a Service Mesh and why do I need one?

    Jeroen Reijn #jfall
  2. About me: • (Java) Programmer and architect • Big fan of the DevOps culture • Enjoys building cloud native solutions • Community member and emeritus committer at Apache Jeroen Reijn @jreijn /jeroenreijn
  3. Monolith? Microservices? Kubernetes? Cloud?

  4. Service mesh, ... istio, … service mesh

  5. Have you heard about a service mesh before? +

  6. None
  7. None
  8. So what is a ‘Service Mesh’ and what problem does it solve?

  9. “A service mesh is a dedicated infrastructure layer for handling service-to-service communication”
  10. Why a dedicated layer?

  11. Microservices Distributed systems Network communication

  12. None
  13. Reliable communication is complex

  14. Evolution networking

  15. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic]

  16. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control]

  17. The evolution of networking [diagram labels: Computer A; Computer B; Networking Stack; Service A; Service B; Business Logic; Flow control]
  18. None
  19. The 8 Fallacies of Distributed Computing

    1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn’t change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous

    Composed by Peter Deutsch and his fellow engineers at Sun Microsystems
  20. Critical functions for microservices: fast, reliable & safe microservices

    • Routing: dynamic discovery, load balancing
    • Resiliency: circuit breaking, retries, rate limiting
    • Observability: metrics, logging, tracing
    • Security: policy enforcement
  21. Routing - Service discovery [diagram labels: Service A; Service B; Service C; Service D; Registry client; Registry-aware HTTP client; Service Registry]
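
The registry-based discovery on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the API of any real registry: the class names `ServiceRegistry` and `RegistryAwareClient`, the round-robin selection, and the addresses are all assumptions made for the example.

```python
class ServiceRegistry:
    """Hypothetical in-memory registry mapping service names to addresses."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        # Each service instance announces itself when it starts.
        self._instances.setdefault(service, []).append(address)

    def deregister(self, service, address):
        # Called when an instance shuts down or is considered unhealthy.
        addresses = self._instances.get(service, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, service):
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._instances.get(service, []))


class RegistryAwareClient:
    """Hypothetical client that resolves a service name on every call."""

    def __init__(self, registry):
        self._registry = registry
        self._counters = {}

    def resolve(self, service):
        instances = self._registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        # Simple round-robin load balancing over the current instances.
        index = self._counters.get(service, 0)
        self._counters[service] = index + 1
        return instances[index % len(instances)]


registry = ServiceRegistry()
registry.register("service-b", "10.0.0.1:8080")
registry.register("service-b", "10.0.0.2:8080")
client = RegistryAwareClient(registry)
print(client.resolve("service-b"))
```

Note that every service needs this client logic linked in, which is the kind of duplication the later slides on libraries and proxies address.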
  22. Resilience

  23. Resilience - Cascading failure [diagram labels: Service 1; Service 2; Service 3; Service 4]

  24. The Circuit Breaker pattern: “A service client should invoke a remote service via a ‘proxy’ that functions in a similar fashion to an electrical circuit breaker” https://microservices.io/patterns/reliability/circuit-breaker.html
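  25. Circuit breaker [state diagram: Closed stays Closed on success or on a failure under the threshold; Closed to Open when the failure threshold is exceeded (set breaker); Open to Half Open on a reset attempt after a timeout; Half Open to Closed on success (reset breaker); Half Open to Open when the failure threshold is exceeded (set breaker)]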
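
The three states on this slide map to a small state machine. The sketch below is one possible reading of the diagram, not the implementation used by Hystrix, resilience4j, or Envoy: the names `CircuitBreaker` and `CircuitOpenError`, counting consecutive failures, and the injectable `clock` parameter are assumptions chosen to keep the example short.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable for testing (assumption)
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of calling the struggling remote service.
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: allow one trial call through.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Success resets the breaker (Half Open to Closed, or stays Closed).
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, sets the breaker.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each remote invocation, for example `breaker.call(fetch_orders)` with a hypothetical `fetch_orders` function, and handle `CircuitOpenError` with a fallback.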
  26. Observability of your services: golden triangle of monitoring • Metrics • Logs • Traces

  27. Security of microservices • OAuth / JWT Tokens • Mutual TLS / certificates
  28. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]

  29. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; ???; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]

  30. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; Library; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]
  31. Libraries: resilience4j, Hystrix

  32. Drawbacks of libraries • Glue linking the libraries: expensive • Limiting tools, runtimes, languages • Versioning hell • Teams should not forget to add them
  33. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; Library; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]

  34. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; Library; ???; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]

  35. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; ???; Proxy; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]
  36. OSI Model

    Level 7 Application: Spring, Vert.x, WFSwarm
    Level 6 Presentation: JSON, XML
    Level 5 Session: HTTP 1/2, gRPC
    Level 4 Transport: TCP
    Level 1-3 Network (IP) / Data link / Physical

    [arrow labels: From here; To here]
  37. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; Proxy; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]
  38. Responsibility shift Development team(s) Platform team(s)

  39. The evolution of networking

  40. First generation service mesh [diagram labels: Computer A; Computer B; Service A; Service B; Service C; Service D; Proxy]

  41. Second generation service mesh - Pods and sidecars • Container platforms • Kubernetes • Mesos [diagram labels: Node; Pod; Container; Proxy]

  42. The evolution of networking [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; Sidecar Proxy; Circuit Breaker; Service Discovery; Logs, metrics, traces; Security]
  43. Complex micro-service architectures: 450+ microservices

  44. Controlling the service mesh [diagram labels: Computer A; Computer B; Service A; Service B; Networking Stack; Business Logic; Flow control; Sidecar proxy; Control plane]
  45. The service mesh control plane Control plane

  46. Proxy based Service meshes

  47. Istio • An open platform to connect, monitor, and secure microservices • Introduced by Google, Lyft, IBM and others • Manages authentication, authorization, and encryption of communication between microservices • Logging, monitoring, and keeping services operational • Traffic management and policy control
  48. Istio - Architecture B

  49. Envoy Proxy • Dynamic service discovery • Load balancing • TLS termination • HTTP/2 and gRPC proxies • Circuit breakers • Health checks • Staged rollouts with %-based traffic split • Fault injection • Rich metrics
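
To make the idea of a %-based traffic split concrete, here is a minimal Python sketch of how a proxy could pick a destination from percentage weights. It illustrates the concept only and is not how Envoy implements it: the function names `choose_destination` and `route` and the version names `reviews-v1` and `reviews-v2` are hypothetical.

```python
import random


def choose_destination(weights, roll):
    """Pick a destination for one request.

    weights: list of (destination, percent) pairs whose percents sum to 100.
    roll: integer in the range 0..99, normally drawn at random per request.
    """
    if sum(percent for _, percent in weights) != 100:
        raise ValueError("weights must sum to 100")
    upper = 0
    for destination, percent in weights:
        upper += percent
        # Each destination owns a contiguous slice of the 0..99 range.
        if roll < upper:
            return destination
    raise ValueError("roll must be in the range 0..99")


def route(weights, rng=random):
    # One random roll per request gives the configured split on average.
    return choose_destination(weights, rng.randrange(100))


# Staged rollout: send roughly 10% of requests to the new version.
split = [("reviews-v1", 90), ("reviews-v2", 10)]
print(route(split))
```

Shifting the weights step by step (90/10, 50/50, 0/100) gives a staged rollout without redeploying or changing the calling services.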
  50. Istio - Proxy configuration YAML

  51. Istio - Discovery and Load-balancing

  52. Istio - Tracing • Automatic tracing of requests • Asynchronous span reporting • Multiple backends • Zipkin • Jaeger
  53. Istio - Telemetry

  54. Istio - Advanced routing

  55. Istio - Security / Two way TLS

  56. Istio Security - RBAC • Role based access control • Based on rules and for instance HTTP methods • ServiceRole (rule) • ServiceRoleBinding (assign role to set of nodes)

  57. Istio gives you: • Telemetry • Security • Mutual TLS • Role based access control • Resilience • Circuit-breaker • Retry • Advanced routing
  58. Demo

  59. Overhead • Definitely not ‘free’, more parts in the system • Proxies are used for both inbound and outbound requests • A lot of effort going on to reduce overhead

  60. Debugging • Debugging Envoy and Pilot (configuration) • Networking Issues • TLS issues • Envoy bouncing requests • …

  61. Security • Many new parts of the system • Control plane components • Proxies • Envoys are everywhere • Role based access control
  62. Istio: What you (want to) get • Telemetry • Security • Circuit-breaker • Retry • Advanced routing. What you (don’t want to) get • Overhead • Debugging • Security complexity
  63. So we saw Istio… But are all service meshes equal?

  64. Comparing Service Meshes Source: https://kubedex.com/istio-vs-linkerd-vs-linkerd2-vs-consul/ (Sept 2018)

  65. https://smi-spec.io

  66. Do I really need a service mesh?

  67. Throwing more tech at the problem…

  68. Do you want to configure, install and renew (mutual) TLS certificates across an entire set of applications?

  69. Do you want to intercept and re-route network flows for: A/B testing, traffic shedding or failure tolerance (circuit breaking)?

  70. Do you want tracing / visibility of application request flows within your micro-service network?
  71. Should I just remove libraries from my apps?

  72. Istio - Circuit breaking - DestinationRule

  73. Istio - Circuit breaking - DestinationRule

  74. Spring + Hystrix Circuit breaker fallback. Note: Hystrix is deprecated and only used as an example

  75. Spring + Hystrix Circuit breaker fallback. Note: Hystrix is deprecated and only used as an example
  76. Tracing

  77. As an engineer you should still think about these concerns

  78. Key take-aways from this talk • A service mesh is a dedicated infra layer for service communication • Understand the why of using a service mesh • Understand the operational complexity, but also the benefits, e.g. it transparently adds cross-cutting concerns to a microservices architecture • Think about where you want to solve specific problems
  79. “Please rate my talk in the official J-Fall app” #jfall