Service Mesh - fixing Microservice Architecture for good

Hanna Prinz
September 29, 2020

Abstract: Let’s be honest, sometimes we wish we could go back to the good old monolith. A single application that can be easily operated, secured, and monitored and that does not have to deal with all the challenges a network introduces. But instead, many companies have decided to go with Microservices, for many good reasons such as faster delivery and more independence for developer teams.

Yet, the cross-cutting concerns developers implement around the business logic seem to have gotten a bit out of hand. Think about monitoring, circuit breaking, canary releasing, TLS termination. This is exactly what a Service Mesh promises to change. It lifts monitoring, resilience, routing, and security into the infrastructure. Sounds too good to be true? Indeed, a Service Mesh does not come without a price: cognitive complexity, increased resource consumption, and latency.
We need to talk about meaningful use cases for Service Meshes, about their drawbacks, and about implementations such as Istio and Linkerd.

Transcript

  1. Service Mesh - fixing Microservice Architecture for good. 29.09.2020, Hanna Prinz
  2. ~ Fix your Microservices by throwing a Mesh at it! ~
  3. How did we get here? @INNOQ @HannaPrinz

  4. Monolith Microservices

  5. Microservices

  6. Timeout, Circuit Breaking, Encryption, Retry, collect & emit Metrics, Decryption, Authorization, Routing
  7. Service Mesh: Metrics, Config, Retry, Timeout, Circuit Breaker, Routing, Encrypt, Decrypt, Authorization, Metrics, ...
  8. Service Mesh Evolution: Monolith, Microservices in Theory, Microservices in Practice, Microservices with Service Mesh
  9. Service Mesh Architecture (diagram)
    Application: Microservice 1, Microservice 2
    Data Plane: one Proxy per Microservice
    Control Plane: Control Plane App
    Infrastructure: Infrastructure-Service X, Infrastructure-Service Y
  10. Hurray, Technology!

  11. Service Mesh Features: Observability, Resilience, Routing, Security

  12. Monitoring
    A Service Mesh can automatically deliver all 4 "Golden Signals": Latency, Traffic Volume, Errors (Status Codes), Saturation
    ... but it cannot look into the Microservices' Business Logic
    https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/#xref_monitoring_golden-signals
  13. Monitoring with a Service Mesh
    Record Network Traffic Metrics
      -> Latency / Response Time
      -> HTTP Status Codes
      -> Requests per Second
    ... make them available to a Monitoring-System
    ... and visualize them with dashboards
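
The three metrics named on this slide can be derived from nothing more than what a proxy sees on the wire. The following is a minimal sketch of that idea, not how Istio or Linkerd implement it; the `RequestRecord` shape and the `summarize` helper are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RequestRecord:
    """One request as a sidecar proxy might observe it (hypothetical shape)."""
    timestamp_s: float  # arrival time in seconds
    latency_ms: float   # time until the response was sent
    status: int         # HTTP status code of the response


def summarize(records: list[RequestRecord]) -> dict:
    """Derive latency, status-code counts, and requests per second."""
    if not records:
        raise ValueError("need at least one record")
    timestamps = [r.timestamp_s for r in records]
    # Clamp the window to one second so a single burst does not divide by zero.
    window_s = max(max(timestamps) - min(timestamps), 1.0)
    status_counts = Counter(r.status for r in records)
    server_errors = sum(n for code, n in status_counts.items() if code >= 500)
    return {
        "avg_latency_ms": mean(r.latency_ms for r in records),
        "status_counts": dict(status_counts),
        "requests_per_second": len(records) / window_s,
        "error_rate": server_errors / len(records),
    }
```

None of these values require any knowledge of the business logic, which is why the application itself needs neither monitoring code nor libraries for them.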
  14. Demo Application: Order, Shipping, Invoicing, Postgres
    The services use neither code nor libraries for monitoring!
    https://github.com/ewolff/microservice-istio
  15. Istio

  16. Service Mesh Features: Observability, Resilience, Routing, Security

  17. Routing
    Typically implemented in the Edge Router / API Gateway, e.g. NGINX, Envoy, Ambassador, Traefik
    Diagrams, each with Instance A and Instance B: Load Balancing; Path-based Routing (/a, /b); Blue/Green Deployment; A/B-Testing (50% / 50%); Canary Releasing (Berlin / World)
  18. Routing with a Service Mesh (diagram)
    Application: Microservice 1, Microservice 2
    Data Plane: one Proxy per Microservice
    Control Plane: Control Plane App with Routing Rules
  19. Routing with a Service Mesh
    Complex Routing Rules for A/B Testing and Canary Releasing: Service 1 (Proxy) calls Service 2A (Proxy) and Service 2B (Proxy)
    Traffic Mirroring: Service 1 (Proxy) calls Service 2 (Proxy) in PRODUCTION and Service 2 (Proxy) in STAGING
    Diagram labels: GET /new, GET /, 90%, 10%, locality=Berlin, locality=*
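
A weighted split and a locality rule like the ones labelled on this slide can be sketched in a few lines. This is only an illustration of the routing decision a proxy makes per request, under assumptions that are not on the slide: the backend names `service-2a` and `service-2b`, the `choose_backend` function, and the rule that Berlin callers always receive the new version are all hypothetical.

```python
import random


def choose_backend(locality: str, rng: random.Random) -> str:
    """Pick a backend for one request, evaluating rules top to bottom."""
    # Rule 1 (assumed): callers from Berlin always see the new version.
    if locality == "Berlin":
        return "service-2b"
    # Rule 2: everyone else is split 90% / 10% between old and new.
    return rng.choices(["service-2a", "service-2b"], weights=[90, 10], k=1)[0]
```

In a mesh, rules of this kind live in configuration handed to the proxies rather than in application code, so the split can be changed without redeploying Service 1.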
  20. Service Mesh Features: Observability, Resilience, Routing, Security

  21. Resilience
    What if a service is not available as expected?
    Goal: Overall system continues to function ... with restrictions where necessary
    Methods: Retry, Timeout, Circuit Breaking
    Diagram label: 500
  22. Resilience with a Service Mesh (diagram)
    Application: Microservice 1, Microservice 2
    Data Plane: one Proxy per Microservice
    Control Plane: Control Plane App with Resilience Rules
  23. Resilience with a Service Mesh
    Fault Injection, Delay Injection, Timeout, Retry
    Diagrams: Service 1 (Proxy) calls Service 2 (Proxy); labels: 4s, 502
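
To make the resilience methods concrete, here is a minimal sketch of a retry wrapper and a circuit breaker of the kind a proxy applies on behalf of a service. It is not the implementation used by any particular mesh; `call_with_retry`, `CircuitBreaker`, and their parameters are hypothetical names, and timeouts are left out to keep the example short.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


def call_with_retry(fn: Callable[[], T], attempts: int = 3) -> T:
    """Call fn up to `attempts` times and re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for remaining in range(attempts - 1, -1, -1):
        try:
            return fn()
        except Exception:
            if remaining == 0:
                raise
    raise AssertionError("unreachable")


class CircuitBreaker:
    """Fail fast after repeated errors, then allow a trial call later."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failures = max_failures
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("circuit open, failing fast")
            # Reset period is over: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit again.
        self._failures = 0
        self._opened_at = None
        return result
```

The injectable `clock` exists only so the breaker can be exercised without real waiting; a caller would normally rely on the default.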
  24. Service Mesh Features: Observability, Resilience, Routing, Security

  25. Security with a Service Mesh (diagram)
    Application: Microservice 1, Microservice 2
    Data Plane: one Proxy per Microservice
    Control Plane: Control Plane App with Authorization Rules and TLS-Certificate
  26. Security with a Service Mesh
    Authentication with mTLS, Authorization
    Diagrams: Service 1 (Proxy) calls Service 2 (Proxy); labels: GET /api, GET /, Authorization Rule, TLS-Certificate "Service 1"
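
An authorization rule of this kind boils down to matching the caller's identity, taken from its mTLS certificate, together with the request method and path. The sketch below illustrates that check under stated assumptions: `AuthorizationRule` and `is_allowed` are hypothetical names, and which of the two requests on the slide is permitted is an assumption made here, since the transcript does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorizationRule:
    """Allow one caller identity to use one method on one path."""
    source: str  # identity taken from the caller's TLS certificate
    method: str
    path: str


def is_allowed(rules: list[AuthorizationRule], source: str,
               method: str, path: str) -> bool:
    """Deny by default; allow only if some rule matches exactly."""
    return any(
        rule.source == source and rule.method == method and rule.path == path
        for rule in rules
    )


# Assumed rule: "Service 1" may call GET /api and nothing else.
RULES = [AuthorizationRule(source="Service 1", method="GET", path="/api")]
```

With these rules, `is_allowed(RULES, "Service 1", "GET", "/api")` is true, while the same caller requesting `GET /` is rejected.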
  27. Service Mesh Features
    Observability: Network metrics and access logs; Emit tracing data to backend; Business metrics or logs; Passing on tracing headers; Alerting
    Resilience: Timeouts & Retries; Circuit Breaking; Use cache or standard responses in Circuit Breaker
    Routing: Complex routing rules; Canary Releasing & A/B-Testing; Automatic Canary Releasing
    Security: Authentication with mTLS; Authorization
  28. Service Mesh Market

  29. Service Mesh Implementations: Istio

  30. @INNOQ @HannaPrinz

  31. Nice Table.

  32. Let's not forget about the price

  33. Latency
    • Additional ~3ms latency for each call between services!
    • Depending on the service mesh implementation & your architecture
    Highly dependent on your project → make your own benchmark!
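
Because the overhead applies to every call between services, it adds up along a chain of synchronous calls. The following back-of-the-envelope sketch uses the slide's approximate 3 ms figure as a default; `added_latency_ms` is a hypothetical helper, and a real benchmark of your own setup should replace the constant.

```python
def added_latency_ms(sequential_calls: int,
                     overhead_per_call_ms: float = 3.0) -> float:
    """Estimate the extra latency of a chain of synchronous calls."""
    if sequential_calls < 0:
        raise ValueError("sequential_calls must not be negative")
    # Each hop passes through the proxies once more, so the cost is linear.
    return sequential_calls * overhead_per_call_ms
```

For example, a request that triggers five service-to-service calls one after another would pick up roughly `added_latency_ms(5)`, that is 15 ms, under this assumption.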
  34. Resources
    • Additional containers for Control Plane & sidecars
    • → increased CPU & memory consumption
    • Resource overhead depends on
      • ... the service mesh implementation
      • ... the number of services/pods
      • ... the traffic volume
    → make your own benchmark!
  35. ... but the real price of a Service Mesh is Complexity
    • non-happy-path customization
    • Moving functionality from services into the mesh (Retry/Timeout, mTLS)
    • Organizational aspects: Who owns the service mesh config?
    • Debugging
    • Debugging
    • Debugging
  36. Service Mesh Magic is built on a lot of YAML

    apiVersion: networking.istio.io/v1alpha3
    kind: EnvoyFilter
    metadata:
      name: istio-attributegen-filter
    spec:
      workloadSelector:
        labels:
          app: reviews
      configPatches:
      - applyTo: HTTP_FILTER
        match:
          context: SIDECAR_INBOUND
          proxy:
            proxyVersion: '1\.6.*'
          listener:
            filterChain:
              filter:
                name: "envoy.http_connection_manager"
                subFilter:
                  name: "istio.stats"
        patch:
          operation: INSERT_BEFORE
          value:
            name: istio.attributegen
            typed_config:
              "@type": type.googleapis.com/udpa.type.v1.TypedStruct
              type_url: type.googleapis.com/envoy.extensions.filters.http.was
              value:
                config:
                  configuration: |
                    {
                      "attributes": [
                        {
                          "output_attribute": "istio_operationId",
                          "match": [
                            {
                              "value": "GET /users",
                              "condition": "request.url_path == '/users' && r
                            },
                            {
                              "value": "POST /order",
                              "condition": "request.url_path == '/order' && r
                            },
                            {
                              "value": "GET /invoice/{id}",
                              "condition": "request.url_path.matches('^/invoi
                                && request.method == 'GET'"
                            }
                          ]
                        }
                      ]
                    }
                  vm_config:
                    runtime: envoy.wasm.runtime.null
                    code:
                      local: { inline_string: "envoy.wasm.attributegen" }
  37. TL;DR

  38. Service Mesh
    + Solves many essential problems of microservices ... without changing the code!
    - Another complex piece of technology
    - Increased latency and resource consumption
  39. Decision support
    Service Mesh Indicators
      • Many microservices, many synchronous calls
      • Many unsolved problems in monitoring, routing, resilience and/or security
      • Most services run in Kubernetes
    Selection criteria
      • Which features are really missing?
      • Existing infrastructure - Kubernetes, Consul, AWS, ...
      • Temporal and cognitive capacity in the team
      • Activity of the Community
    Objective: As much complexity as necessary, but as little as possible
  40. Complexity? Uhm...

  41. Monolith, Microservices, Modules?

  42. "don't distribute your objects." ☝ Martin Fowler, https://martinfowler.com/articles/distributed-objects-microservices.html

  43. Alternatives? https://www.infoq.com/articles/architecture-trends-2020/

  44. Try not to need a Service Mesh

  45. More Service Mesh
    • Service Mesh Comparison at servicemesh.es: https://servicemesh.es/
    • Blog Post: Happy without a Service Mesh: https://innoq.com/en/blog/happy-without-a-service-mesh/
    • Example-Application with Istio and Linkerd, Tutorial on GitHub: https://github.com/ewolff/microservice-istio https://github.com/ewolff/microservice-linkerd
    • Linkerd Tutorial: https://linkerd.io/2/tasks/
    • Istio Tutorial: https://istio.io/docs/setup/getting-started/
  46. Thank you! Questions?
    Hanna Prinz, hanna.prinz@innoq.com, @HannaPrinz
    Service Mesh Primer - 2nd Edition, free at leanpub.com/service-mesh-primer
    innoQ Deutschland GmbH (+49 2173 3366-0):
      Krischerstr. 100, 40789 Monheim am Rhein, Germany
      Ohlauer Str. 43, 10999 Berlin, Germany
      Ludwigstr. 180E, 63067 Offenbach, Germany
      Kreuzstr. 16, 80331 München, Germany
      Hermannstrasse 13, 20095 Hamburg, Germany
    innoQ Schweiz GmbH (+41 41 743 0116):
      Gewerbestr. 11, CH-6330 Cham, Switzerland
    www.innoq.com
    Icons made by srip, Smashicons, Nikita Golubev, Freepik, surang and Darius Dan from www.flaticon.com and licensed by CC 3.0 BY