Slide 1

Slide 1 text

What’s a Service Mesh and why do I need one? Jeroen Reijn #jfall

Slide 2

Slide 2 text

About me: • (Java) Programmer and architect • Big fan of the DevOps culture • Enjoys building cloud native solutions • Community member and emeritus committer at Apache Jeroen Reijn @jreijn /jeroenreijn

Slide 3

Slide 3 text

Monolith? Microservices? Kubernetes? Cloud?

Slide 4

Slide 4 text

Service mesh, … Istio, … service mesh

Slide 5

Slide 5 text

Have you heard about a service mesh before? +

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

So what is a ‘Service Mesh’ and what problem does it solve?

Slide 9

Slide 9 text

“A service mesh is a dedicated infrastructure layer for handling service-to-service communication”

Slide 10

Slide 10 text

Why a dedicated layer?

Slide 11

Slide 11 text

Microservices Distributed systems Network communication

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Reliable communication is complex

Slide 14

Slide 14 text

Evolution of networking

Slide 15

Slide 15 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic and a Networking Stack)

Slide 16

Slide 16 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, Flow control and a Networking Stack)

Slide 17

Slide 17 text

The evolution of networking Computer B Computer A Networking Stack Service A Service B Networking Stack Business Logic Flow control Business Logic Flow control

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

The 8 Fallacies of Distributed Computing 1. The network is reliable 2. Latency is zero 3. Bandwidth is infinite 4. The network is secure 5. Topology doesn’t change 6. There is one administrator 7. Transport cost is zero 8. The network is homogeneous Composed by Peter Deutsch and his fellow engineers at Sun Microsystems

Slide 20

Slide 20 text

Critical functions for microservices (goal: fast, reliable & safe microservices) • Routing: dynamic discovery, load balancing • Resiliency: circuit breaking, retries, rate limiting • Observability: metrics, logging, tracing • Security: policy enforcement
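Of the functions listed on this slide, retries are the easiest to show in a few lines. The block below is a minimal sketch only, assuming a blocking call and a doubling delay between attempts; the class name `Retry` and its parameters are hypothetical and do not come from the talk or from any library.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of any library mentioned in the talk.
final class Retry {
    private Retry() {}

    /**
     * Runs the action up to maxAttempts times. After each failed attempt
     * (except the last) it waits, doubling the delay every time.
     * The last failure is rethrown when all attempts are used up.
     */
    static <T> T withBackoff(Supplier<T> action, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
            }
            if (attempt < maxAttempts) {
                sleep(delay);
                delay *= 2;
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and give up instead of retrying further.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

A caller would wrap a remote call, for example `Retry.withBackoff(() -> client.fetch(), 3, 100)`, where `client.fetch()` stands for any call that may fail transiently. Retrying is only safe for idempotent operations.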

Slide 21

Slide 21 text

Routing - Service discovery (diagram: instances of Service A, Service B, Service C and Service D, each with a Registry client; a Registry-aware HTTP client; a Service Registry)
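The idea behind this slide, service instances registering themselves and clients looking them up, can be sketched as below. This is an in-memory stand-in under stated assumptions: the class `ServiceRegistry`, its method names and the round-robin choice are hypothetical, and a real registry would be a separate networked component with health checks.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for a service registry; all names are hypothetical.
final class ServiceRegistry {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    /** Called by a service instance (its registry client) when it starts. */
    void register(String service, String address) {
        instances.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    /** Called when an instance shuts down or is considered unhealthy. */
    void deregister(String service, String address) {
        List<String> known = instances.get(service);
        if (known != null) {
            known.remove(address);
        }
    }

    /** Returns the next instance in round-robin order (client-side load balancing). */
    String resolve(String service) {
        List<String> known = instances.get(service);
        // Work on a snapshot so a concurrent deregister cannot shrink the list mid-lookup.
        String[] snapshot = known == null ? new String[0] : known.toArray(new String[0]);
        if (snapshot.length == 0) {
            throw new IllegalStateException("no instances registered for " + service);
        }
        int n = counters.computeIfAbsent(service, k -> new AtomicInteger()).getAndIncrement();
        return snapshot[Math.floorMod(n, snapshot.length)];
    }
}
```

A registry-aware HTTP client would call `resolve("service-b")` before every request and send the request to the returned address.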

Slide 22

Slide 22 text

Resilience

Slide 23

Slide 23 text

Resilience - Cascading failure (diagram: Service 1, Service 2, Service 3, Service 4)

Slide 24

Slide 24 text

The Circuit Breaker pattern “A service client should invoke a remote service via a ‘proxy’ that functions in a similar fashion to an electrical circuit breaker” https://microservices.io/patterns/reliability/circuit-breaker.html

Slide 25

Slide 25 text

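Circuit breaker (state diagram) • Closed: stays closed on success or on a failure under the threshold • Closed to Open: failure threshold exceeded, set breaker • Open to Half Open: try reset after timeout • Half Open to Closed: success, reset breaker • Half Open to Open: failure threshold exceeded, set breaker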
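The three states on this slide (Closed, Open, Half Open) can be sketched as a small state machine. This is a simplified illustration, not the implementation used by Hystrix, resilience4j or Envoy: it counts consecutive failures, takes an injected clock so the timeout can be tested, and treats a single failure in the Half Open state as enough to open the breaker again. All names are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified circuit breaker sketch; names and policy choices are assumptions.
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long resetTimeoutMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreaker(int failureThreshold, long resetTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    /** Current state; an open breaker becomes half open once the timeout has passed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the remote call unless the breaker is open, in which case it fails fast. */
    <T> T call(Supplier<T> remoteCall) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("circuit open, call rejected");
        }
        try {
            T result = remoteCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    // Success: reset the breaker.
    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    // Failure: set the breaker when the threshold is reached or the trial call failed.
    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

Failing fast while the breaker is open is what stops one slow or broken service from tying up the threads of every caller upstream. In production code the clock would typically be `System::currentTimeMillis`.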

Slide 26

Slide 26 text

Observability of your services Golden triangle of monitoring Metrics Logs Traces

Slide 27

Slide 27 text

Security of microservices • OAuth / JWT Tokens • Mutual TLS / certificates

Slide 28

Slide 28 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, Flow control and a Networking Stack)

Slide 29

Slide 29 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, a box labelled "???", Flow control and a Networking Stack)

Slide 30

Slide 30 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, a box labelled "Library", Flow control and a Networking Stack)

Slide 31

Slide 31 text

Libraries resilience4j hystrix

Slide 32

Slide 32 text

Drawbacks of libraries • Glue linking the libraries: expensive • Limiting tools, runtimes, languages • Versioning hell • Teams should not forget to add them

Slide 33

Slide 33 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, a box labelled "Library", Flow control and a Networking Stack)

Slide 34

Slide 34 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, boxes labelled "Library" and "???", Flow control and a Networking Stack)

Slide 35

Slide 35 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, a Proxy, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, a box labelled "???", Flow control and a Networking Stack)

Slide 36

Slide 36 text

OSI Model • Level 7 Application: Spring, Vert.x, WildFly Swarm • Level 6 Presentation: JSON, XML • Level 5 Session: HTTP/1, HTTP/2, gRPC • Level 4 Transport: TCP • Level 1-3 Network (IP) / Data link / Physical (slide annotations: "From here", "To here")

Slide 37

Slide 37 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, a Proxy, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, Flow control and a Networking Stack)

Slide 38

Slide 38 text

Responsibility shift Development team(s) Platform team(s)

Slide 39

Slide 39 text

The evolution of networking

Slide 40

Slide 40 text

First generation service mesh (diagram labels: Computer A, Computer B, Service A, Service B, Service C, Service D, Proxy, Proxy)

Slide 41

Slide 41 text

Second generation service mesh - Pods and sidecars • Container platforms • Kubernetes • Mesos (diagram: a Node with two Pods, each showing a Container and a Proxy)

Slide 42

Slide 42 text

The evolution of networking (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, a Sidecar Proxy, Circuit Breaker, Service Discovery, Logs/metrics/traces, Security, Flow control and a Networking Stack)

Slide 43

Slide 43 text

Complex micro-service architectures: 450+ microservices

Slide 44

Slide 44 text

Controlling the service mesh (diagram: Computer A with Service A and Computer B with Service B, each showing Business Logic, Flow control, a Sidecar proxy and a Networking Stack, plus a Control plane)

Slide 45

Slide 45 text

The service mesh control plane Control plane

Slide 46

Slide 46 text

Proxy based Service meshes

Slide 47

Slide 47 text

Istio • An open platform to connect, monitor, and secure microservices • Introduced by Google, Lyft, IBM and others • Manages authentication, authorization, and encryption of communication between microservices • Logging, monitoring, and keeping services operational • Traffic management and policy control

Slide 48

Slide 48 text

Istio - Architecture B

Slide 49

Slide 49 text

Envoy Proxy • Dynamic service discovery • Load balancing • TLS termination • HTTP/2 and gRPC proxies • Circuit breakers • Health checks • Staged rollouts with %-based traffic split • Fault injection • Rich metrics
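The "%-based traffic split" bullet is the mechanism behind staged rollouts: a fixed share of requests goes to each version of a service. The block below is a sketch of that idea only, assuming integer percentages that must sum to 100. The class `TrafficSplit` is hypothetical and is not how Envoy or Istio implement it; in a mesh this is declared in configuration and applied by the proxy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration of percentage-based routing between service versions.
final class TrafficSplit {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private int total = 0;

    /** Assigns a percentage of traffic to a version, e.g. add("v2", 10). */
    TrafficSplit add(String version, int percent) {
        if (percent < 0) {
            throw new IllegalArgumentException("percent must not be negative");
        }
        weights.merge(version, percent, Integer::sum);
        total += percent;
        return this;
    }

    /** Maps a roll in [0, 100) to a version, walking the cumulative weights in insertion order. */
    String pick(int roll) {
        if (total != 100) {
            throw new IllegalStateException("weights must sum to 100, got " + total);
        }
        if (roll < 0 || roll >= 100) {
            throw new IllegalArgumentException("roll must be in [0, 100)");
        }
        int upper = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            upper += entry.getValue();
            if (roll < upper) {
                return entry.getKey();
            }
        }
        throw new AssertionError("unreachable: weights sum to 100");
    }

    /** Picks a version for one request using a random roll. */
    String pickRandom() {
        return pick(ThreadLocalRandom.current().nextInt(100));
    }
}
```

With `new TrafficSplit().add("v1", 90).add("v2", 10)`, roughly one request in ten would reach `v2`, which is the basis for a canary release.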

Slide 50

Slide 50 text

Istio - Proxy configuration YAML

Slide 51

Slide 51 text

Istio - Discovery and Load-balancing

Slide 52

Slide 52 text

Istio - Tracing • Automatic tracing of requests • Asynchronous span reporting • Multiple backends (Zipkin, Jaeger)

Slide 53

Slide 53 text

Istio - Telemetry

Slide 54

Slide 54 text

Istio - Advanced routing

Slide 55

Slide 55 text

Istio - Security / Two way TLS

Slide 56

Slide 56 text

Istio Security - RBAC • Role based access control • Based on rules, for instance on HTTP methods • ServiceRole (rule) • ServiceRoleBinding (assign role to set of nodes)

Slide 57

Slide 57 text

Istio gives you: • Telemetry • Security (mutual TLS, role based access control) • Resilience (circuit breaker, retry) • Advanced routing

Slide 58

Slide 58 text

Demo

Slide 59

Slide 59 text

Overhead • Definitely not ‘free’, more parts in the system • Proxies are used for both inbound and outbound requests • A lot of effort is going into reducing the overhead

Slide 60

Slide 60 text

Debugging • Debugging Envoy and Pilot (configuration) • Networking Issues • TLS issues • Envoy bouncing requests • …

Slide 61

Slide 61 text

Security • Many new parts of the system • Control plane components • Proxies • Envoys are everywhere • Role based access control

Slide 62

Slide 62 text

Istio. What you (want to) get: • Telemetry • Security • Circuit-breaker • Retry • Advanced routing. What you (don’t want to) get: • Overhead • Debugging • Security complexity

Slide 63

Slide 63 text

So we saw Istio… But are all service meshes equal?

Slide 64

Slide 64 text

Comparing Service Meshes Source: https://kubedex.com/istio-vs-linkerd-vs-linkerd2-vs-consul/ (Sept 2018)

Slide 65

Slide 65 text

https://smi-spec.io

Slide 66

Slide 66 text

Do I really need a service mesh?

Slide 67

Slide 67 text

Throwing more tech at the problem…

Slide 68

Slide 68 text

Do you want to configure, install and renew (mutual) TLS certificates across an entire set of applications?

Slide 69

Slide 69 text

Do you want to intercept and re-route network flows for: A/B testing, traffic shedding or failure tolerance (circuit breaking)?

Slide 70

Slide 70 text

Do you want tracing / visibility of application request flows within your micro-service network?

Slide 71

Slide 71 text

Should I just remove libraries from my apps?

Slide 72

Slide 72 text

Istio - Circuit breaking - DestinationRule

Slide 73

Slide 73 text

Istio - Circuit breaking - DestinationRule

Slide 74

Slide 74 text

Spring + Hystrix Circuit breaker fallback Note: Hystrix is deprecated and only used as an example

Slide 75

Slide 75 text

Spring + Hystrix Circuit breaker fallback Note: Hystrix is deprecated and only used as an example

Slide 76

Slide 76 text

Tracing
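Tracing is one concern that cannot be moved out of the application completely. The sidecar can create and report spans, but it cannot know which outgoing request belongs to which incoming one, so the application has to copy the trace headers across. The sketch below assumes the B3-style header names that the Istio documentation listed for propagation around the time of this talk; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; in practice a tracing library or HTTP client interceptor does this.
final class TraceHeaders {
    // Header names assumed from the Istio distributed tracing documentation (B3 / Zipkin style).
    static final List<String> PROPAGATED = Collections.unmodifiableList(Arrays.asList(
            "x-request-id",
            "x-b3-traceid",
            "x-b3-spanid",
            "x-b3-parentspanid",
            "x-b3-sampled",
            "x-b3-flags",
            "x-ot-span-context"));

    private TraceHeaders() {}

    /** Returns the subset of incoming headers that should be copied onto outgoing requests. */
    static Map<String, String> forOutgoing(Map<String, String> incoming) {
        // HTTP header names are case-insensitive, so normalise before matching.
        Map<String, String> lower = new HashMap<>();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            lower.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
        }
        Map<String, String> outgoing = new LinkedHashMap<>();
        for (String name : PROPAGATED) {
            String value = lower.get(name);
            if (value != null) {
                outgoing.put(name, value);
            }
        }
        return outgoing;
    }
}
```

Only the trace headers are forwarded; anything else on the incoming request, such as credentials, is deliberately left out of the result.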

Slide 77

Slide 77 text

As an engineer you should still think about these concerns

Slide 78

Slide 78 text

Key take-aways from this talk • A service mesh is a dedicated infra layer for service communication • Understand the why of using a service mesh • Understand the operational complexity, but also the benefits, e.g. it transparently adds cross-cutting concerns to a microservices architecture • Think about where you want to solve specific problems

Slide 79

Slide 79 text

“Please rate my talk in the official J-Fall app” #jfall