Slide 1

Slide 1 text

Seeking Observability: Getting Started with Service Mesh on GCP 16 November 2019 #DevFestLondon Jun Sakata @sakajunquality Google Developers Expert, Cloud

Slide 2

Slide 2 text

The Speaker - Jun Sakata - Google Developers Expert, Cloud - SRE at Ubie, Inc. - Social Media: @sakajunquality - First time

Slide 3

Slide 3 text

“Observability”

Slide 4

Slide 4 text

What is Observability?

Slide 5

Slide 5 text

Observ/ab/ility

Slide 6

Slide 6 text

Wikipedia says... In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. https://en.wikipedia.org/wiki/Observability

Slide 8

Slide 8 text

In Software Engineering ... Observability: collecting diagnostics data all across the stack to identify and debug production problems and also to provide critical signals about usage to our highly adaptive and scalable environment. Jaana B. Dogan, Google https://medium.com/observability/googles-approach-to-observability-frameworks-c89fc1f0e058

Slide 10

Slide 10 text

Think about what SREs / PEs do! SRE Books https://landing.google.com/sre/books/

Slide 11

Slide 11 text

What do we need?

Slide 12

Slide 12 text

We need metrics that matter!

Slide 13

Slide 13 text

We need logs that matter!

Slide 14

Slide 14 text

We need to trace what happened!

Slide 15

Slide 15 text

“Microservices”

Slide 16

Slide 16 text

Microservices (Generally speaking) Several services, possibly thousands, might be - written in different languages / frameworks / libraries - using many protocols - making distributed system calls

Slide 17

Slide 17 text

Microservices Observability Think about what happens - when starting a new service in a new language - when communicating with a new protocol - when making a breaking change to network and infrastructure

Slide 18

Slide 18 text

Microservices Observability We want to implement something that - is decoupled from languages, frameworks and libraries - supports many protocols or other procedures - decouples applications and the whole infrastructure

Slide 20

Slide 20 text

“Service Mesh”

Slide 21

Slide 21 text

Service Mesh - is a transparent network between services - Decoupled from application - Language independent - provides automated application network functions - Observability - Service Discovery - Policy Enforcement - etc...

Slide 22

Slide 22 text

Here’s what’s happening Let’s say we have two services written in different languages Service A (Java w/ Spring Boot) Service B (Python w/ Flask)

Slide 23

Slide 23 text

Here’s what’s happening Without Service Mesh, one calls the other directly Service A (Java w/ Spring Boot) Service B (Python w/ Flask)

Slide 24

Slide 24 text

Here’s what’s happening For observability, each service must implement things itself Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Metrics / Logs Service Metrics / Tracing Code Metrics / Tracing Code

Slide 25

Slide 25 text

Here’s what’s happening What if another service is deployed...? and with a new runtime or a new protocol...? Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Service C (Go w/o Framework) Metrics / Logs Service Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code

Slide 26

Slide 26 text

Here’s what’s happening Next thing you see might be ... Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Service C (Go w/o Framework) Service D (Scala w/ Play Framework) Service E (Python w/ Django) Service F (Python w/ own Framework) Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code

Slide 27

Slide 27 text

Here’s what’s happening Next thing you see might be ... Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Service C (Go w/o Framework) Service D (Scala w/ Play Framework) Service E (Python w/ Django) Service F (Python w/ own Framework) Service (Go Service (C++ Service (Go Service (Kotlin Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code

Slide 28

Slide 28 text

Here’s what’s happening Can you update all of them? Hopefully in a short period of time? Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Service C (Go w/o Framework) Service D (Scala w/ Play Framework) Service E (Python w/ Django) Service F (Python w/ own Framework) Service (Go Service (C++ Service (Go Service (Kotlin Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code Metrics / Tracing Code

Slide 29

Slide 29 text

Here’s what’s happening Instead of implementing those networking features in service applications, sidecar proxies are deployed along with them Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Metrics / Tracing Code Metrics / Tracing Code Sidecar Proxy Sidecar Proxy

Slide 30

Slide 30 text

Here’s what’s happening Services, both internal and external, call each other through sidecars Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Sidecar Proxy Sidecar Proxy

Slide 31

Slide 31 text

Here’s what’s happening That way, we can let sidecar proxies, instead of applications, do what we need for observability! Service A (Java w/ Spring Boot) Service B (Python w/ Flask) Sidecar Proxy Sidecar Proxy Metrics / Logs Service
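
The sidecar idea above can be illustrated with a toy sketch. This is not how Envoy works internally; it only shows the principle that telemetry is recorded in a wrapper around the service rather than inside it. The names `Sidecar`, `service_a`, and `service_b` are hypothetical, and plain Python functions stand in for services written in different stacks.

```python
import time
from collections import defaultdict
from typing import Callable


# Hypothetical stand-ins for two services built on different stacks.
def service_a(request: str) -> str:
    return f"A handled {request}"


def service_b(request: str) -> str:
    if request == "bad":
        raise ValueError("B failed")
    return f"B handled {request}"


class Sidecar:
    """Toy proxy: forwards calls to a service and records telemetry for it."""

    def __init__(self, name: str, handler: Callable[[str], str]) -> None:
        self.name = name
        self.handler = handler
        self.metrics = defaultdict(int)  # counters, default 0
        self.total_seconds = 0.0         # accumulated request latency

    def call(self, request: str) -> str:
        start = time.perf_counter()
        self.metrics["requests_total"] += 1
        try:
            return self.handler(request)
        except Exception:
            # The service code never has to count its own failures.
            self.metrics["errors_total"] += 1
            raise
        finally:
            self.total_seconds += time.perf_counter() - start


proxy_a = Sidecar("service-a", service_a)
proxy_b = Sidecar("service-b", service_b)

proxy_a.call("hello")
proxy_b.call("hello")
try:
    proxy_b.call("bad")
except ValueError:
    pass

for proxy in (proxy_a, proxy_b):
    print(proxy.name, dict(proxy.metrics))
```

Neither service function contains any metrics code, so adding a third service in another language would not require a new metrics library, only another proxy instance.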

Slide 32

Slide 32 text

Envoy - L7 proxy - Originally from Lyft - Configurable via API w/o restart - 100% OSS! No Premium Version - High Performance / High Reliability - Widely used as a service-to-service proxy

Slide 33

Slide 33 text

Envoy! Service A (Java w/ Spring Boot) Service B (Python w/ Flask) * Envoy is not only a proxy for service mesh Universal dataplane proxy!

Slide 34

Slide 34 text

Envoy “The network should be transparent to applications. When network and application problems do occur it should be easy to determine the source of the problem.” Matt Klein, Lyft https://www.envoyproxy.io/docs/envoy/latest/intro/what_is_envoy

Slide 35

Slide 35 text

https://www.youtube.com/watch?v=55yi4MMVBi4&t=444s

Slide 36

Slide 36 text

Okay, sidecars are nice. But we still need something that orchestrates those distributed proxies.

Slide 37

Slide 37 text

Istio

Slide 38

Slide 38 text

Istio - Open source Service Mesh - Originally from Google, Lyft, etc. - (Lyft is not using Istio) - Envoy is used as the sidecar

Slide 39

Slide 39 text

Istio functionality 1. Connect - Service Discovery 2. Secure - Authentication - Authorization - Encryption 3. Control - Policy like circuit breaker - A/B testing, Canary Release 4. Observe - Monitor traffic via telemetry https://istio.io/

Slide 40

Slide 40 text

Istio architecture https://istio.io/docs/concepts/what-is-istio/

Slide 41

Slide 41 text

Istio architecture https://istio.io/docs/concepts/what-is-istio/ Complicated …?

Slide 42

Slide 42 text

Istio architecture https://istio.io/docs/concepts/what-is-istio/ Dataplane

Slide 43

Slide 43 text

Istio architecture https://istio.io/docs/concepts/what-is-istio/ Control Plane

Slide 44

Slide 44 text

How it Works

Slide 45

Slide 45 text

How it works... https://istio.io/docs/concepts/what-is-istio/

Slide 46

Slide 46 text

How it works... https://istio.io/docs/concepts/what-is-istio/ Remember the sidecar!

Slide 47

Slide 47 text

How it works... https://istio.io/docs/concepts/what-is-istio/

Slide 48

Slide 48 text

How it works... https://istio.io/docs/concepts/what-is-istio/ Send telemetry to the control plane!

Slide 49

Slide 49 text

Metrics, Logs, Telemetry, whatever ... What kind of insights can we get from them?

Slide 50

Slide 50 text

List services and their status

Slide 51

Slide 51 text

How services are connected

Slide 52

Slide 52 text

Detailed metrics of each service

Slide 53

Slide 53 text

How services are connected

Slide 54

Slide 54 text

How services are connected

Slide 55

Slide 55 text

Status of Istio itself

Slide 56

Slide 56 text

Visualizing tools - Prometheus - Grafana - Kiali - Stackdriver - Datadog - and more...!

Slide 57

Slide 57 text

Looks great! How can we start?

Slide 58

Slide 58 text

How to start - Do the official “Getting Started” - https://istio.io/docs/setup/getting-started/ - Install Istio - Install demo app: guest book - Do some “Tasks” - https://istio.io/docs/tasks/

Slide 59

Slide 59 text

Read “ISTIO BY EXAMPLE” istiobyexample.dev

Slide 60

Slide 60 text

Try on GKE? - Try “Istio on GKE” w/ mTLS permissive - https://cloud.google.com/istio/docs/istio-on-gke/overview - Just one click - Not recommended for production yet!

Slide 61

Slide 61 text

What’s the latest update?

Slide 62

Slide 62 text

Istio 1.3 was released last month!

Slide 63

Slide 63 text

Istio 1.3 https://istio.io/news/2019/announcing-1.3/ - Performance Improvements - CLI Improvements - Dashboard Improvements - Intelligent Protocol Detection - Mixer-less HTTP Telemetry - Deployment Models Docs

Slide 64

Slide 64 text

Istio 1.4 was released last Thursday!

Slide 65

Slide 65 text

Istio 1.4 https://istio.io/news/2019/announcing-1.4 - Mixer-less telemetry - Authorization policy model in beta - Improved troubleshooting - Better sidecar

Slide 66

Slide 66 text

Performance and User Experience are improving!

Slide 67

Slide 67 text

Any tips for GCP users?

Slide 68

Slide 68 text

Save access logs to BigQuery - Enable Envoy access logs in JSON, written to /dev/stdout: --set global.proxy.accessLogFile="/dev/stdout" --set global.proxy.accessLogEncoding="JSON" - That way logs are collected by Stackdriver Logging - and you can export them to BigQuery
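
Once access logs are emitted as one JSON object per line, they are straightforward to aggregate. The sketch below is only an illustration of that property, run locally in Python rather than in BigQuery. The sample lines are made up, and the field names `response_code` and `duration` are assumptions about the log schema; check them against the access log format your proxies actually emit.

```python
import json
from collections import Counter

# Made-up sample lines; a real log would come from the proxy's stdout.
SAMPLE_LOG = """\
{"method": "GET", "path": "/a", "response_code": 200, "duration": 12}
{"method": "GET", "path": "/b", "response_code": 503, "duration": 40}
not a json line
{"method": "POST", "path": "/a", "response_code": 200, "duration": 8}
"""


def summarize(lines):
    """Count requests per response code and average the duration field."""
    codes = Counter()
    total_duration = 0
    parsed = 0
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        codes[entry.get("response_code")] += 1
        total_duration += entry.get("duration", 0)
        parsed += 1
    return {
        "requests": parsed,
        "by_code": dict(codes),
        "avg_duration_ms": total_duration / parsed if parsed else 0.0,
    }


summary = summarize(SAMPLE_LOG.splitlines())
print(summary)
```

The same grouping by response code would be a short SQL query once the logs are in BigQuery; structured JSON is what makes either approach simple.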

Slide 69

Slide 69 text

Try in-proxy telemetry to Stackdriver - In 1.4, mixer-less telemetry is implemented for Stackdriver - https://github.com/istio/proxy/blob/release-1.4/extensions/stackdriver/README.md

Slide 70

Slide 70 text

Wait for Anthos Service Mesh - Formerly called “Cloud Service Mesh” - To be a fully managed, Istio-based service mesh platform

Slide 71

Slide 71 text

Anthos Service Mesh Increase Observability With the Stackdriver Query Notation (Cloud Next '19) https://www.youtube.com/watch?v=NGFpGW8aQS8&t=2034s

Slide 73

Slide 73 text

Do I need a Service Mesh?

Slide 74

Slide 74 text

Maybe Not

Slide 75

Slide 75 text

Kelsey Hightower, Google https://twitter.com/kelseyhightower/status/1150158904900431873

Slide 76

Slide 76 text

Think Carefully ... - If you’re running a single monolithic application, you probably don’t - If you’re running services with a single technology stack, maybe you don’t - e.g. Java ecosystem - If the public cloud provides complete availability and observability of the network, you don’t! - (Speaking of Istio) If you don’t plan to use most of its functions, consider building a control plane on your own!

Slide 77

Slide 77 text

Takeaways

Slide 78

Slide 78 text

Takeaways - With Service Mesh you can get consistent observability, along with other functions, across languages and frameworks - Service Mesh decouples network and infrastructure functionality from applications - Service Mesh uses sidecar proxies for this - Istio is an all-in-one solution for Service Mesh

Slide 79

Slide 79 text

Takeaways - Think about whether Service Mesh is the right solution for you - There are many ways to do this - Istio is not the only option for Service Mesh

Slide 80

Slide 80 text

Thank You
