Slide 1

Slide 1 text

Building Reliable Microservices on GCP Jun Sakata Google Developers Expert, Cloud

Slide 2

Slide 2 text

Building Reliable Microservices on GCP Jun Sakata Google Developers Expert, Cloud Distributed Systems

Slide 3

Slide 3 text

- Google Developers Expert, Cloud - SRE/Technical Advisor - Travel/Photography/Cooking - GKE/Cloud Run @sakajunquality

Slide 4

Slide 4 text

Agenda - Microservices and Kubernetes - Service Mesh - Traffic Director - Serverless Runtime - Proxyless gRPC services - Takeaways

Slide 5

Slide 5 text

- Definition of microservices - Difference between microservices and distributed monolith - Pros and Cons of microservices - Technical Details of Kubernetes/Istio Those are NOT to be covered today!

Slide 6

Slide 6 text

Microservices and Kubernetes Distributed Systems

Slide 7

Slide 7 text

Microservices?

Slide 8

Slide 8 text

“Microservices are independently deployable services modeled around a business domain. They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman

Slide 9

Slide 9 text

“Microservices are independently deployable services modeled around a business domain. They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman Not covered today!

Slide 10

Slide 10 text

“Microservices are independently deployable services modeled around a business domain. They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman

Slide 11

Slide 11 text

“Microservices are independently deployable services modeled around a business domain. They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman

Slide 12

Slide 12 text

“Today, it’s arguable that most applications are distributed in some fashion, even if they don’t use microservices.” Distributed Tracing in Practice by Rebecca Isaacs; Ben Sigelman; Daniel Spoonhower; Jonathan Mace; Austin Parker

Slide 13

Slide 13 text

should support… - communication over network - variety of workloads, backends... Platform for Microservice-like Architecture

Slide 14

Slide 14 text

- Platform for container workloads - Based on Google’s Borg - Orchestrates computing, networking, and storage resources for containers Kubernetes (Quick Recap)

Slide 15

Slide 15 text

With Kubernetes… Kubernetes master Service A Manifest Image gcr.io/sakajunquality-test/foo-bar CPU 1, Memory 2G

Slide 16

Slide 16 text

With Kubernetes… Kubernetes master Service A Manifest Image gcr.io/sakajunquality-test/foo-bar CPU 1, Memory 2G

apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
  labels:
    app: service-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
      - name: service-a
        image: gcr.io/sakajunquality...
        ports:
        - containerPort: 8080

Slide 17

Slide 17 text

Service A Workloads With Kubernetes… Kubernetes master Service A Manifest Image gcr.io/sakajunquality-test/foo-bar CPU 1, Memory 2G Image gcr.io/sakajunquality-test/foo-bar CPU 1, Memory 2G

Slide 18

Slide 18 text

Service A Workloads Service B Workloads With Kubernetes… Kubernetes master Service A Manifest Service B Manifest In the same way... And More…!

Slide 19

Slide 19 text

Perfect?

Slide 20

Slide 20 text

Not so much...

Slide 21

Slide 21 text

Consider Service to Service connection Service A Service B

Slide 22

Slide 22 text

Service to Service connection Service A Service B Where is Service B? When should I retry? How long should I wait for the response? Is this a valid request? What’s going on here?

Slide 23

Slide 23 text

- (Intelligent) Service Discovery - (Intelligent) Traffic Control - Observability - Authn/Authz etc... What’s missing in Kubernetes

Slide 24

Slide 24 text

Service to Service connection Service A Service B Where is Service B? When should I retry? How long should I wait for the response? Is this a valid request? What’s going on here? Observability Service Discovery Authn/Authz Traffic Control

Slide 25

Slide 25 text

Service Mesh

Slide 26

Slide 26 text

“A service mesh is a programmable framework that allows you to observe, secure, and connect microservices. It doesn’t establish connectivity between microservices, but instead has policies and controls that are applied on top of an existing network to govern how microservices interact.” Istio Explained by Lin Sun and Daniel Berg

Slide 27

Slide 27 text

“A service mesh is a programmable framework that allows you to observe, secure, and connect microservices. It doesn’t establish connectivity between microservices, but instead has policies and controls that are applied on top of an existing network to govern how microservices interact.” Istio Explained by Lin Sun and Daniel Berg

Slide 28

Slide 28 text

Without Service Mesh.. (or something equivalent)

Slide 29

Slide 29 text

Without Service Mesh Service A Service B

Slide 30

Slide 30 text

Without Service Mesh Service A Service B Service Discovery Business Logic Authentication Observability

Slide 31

Slide 31 text

Without Service Mesh Service A Service B Service Discovery Business Logic Authentication Observability Non-Business Logic

Slide 32

Slide 32 text

Without Service Mesh Service A Service B Service Discovery Business Logic Authentication Traffic Control Service Discovery Business Logic Authentication Traffic Control Non-Business Logic on every application

Slide 33

Slide 33 text

Trying to implement tracing! Service A Service B Service Discovery Business Logic Authentication Traffic Control Service Discovery Authentication Traffic Control Tracing Tracing Business Logic Increases non-business logic in codebase

Slide 34

Slide 34 text

With Service Mesh

Slide 35

Slide 35 text

Instead of communicating directly Service A Service B

Slide 36

Slide 36 text

In Service Mesh, proxies, called “sidecar”, communicate on behalf of applications Service A Service B Sidecar Proxy Sidecar Proxy

Slide 37

Slide 37 text

In Service Mesh, proxies, called “sidecar”, communicate on behalf of applications Service A Service B Sidecar Proxy Sidecar Proxy Each application communicates only with its sidecar proxy

Slide 38

Slide 38 text

And sidecar proxies do non-business logic Service A Service B Sidecar Proxy Sidecar Proxy Service Discovery Traffic Control Tracing etc...

Slide 39

Slide 39 text

Envoy - L7 Proxy - Originally from Lyft - High Performance / High Reliability - Configurable via API - https://www.envoyproxy.io/

Slide 40

Slide 40 text

“The network should be transparent to applications. When network and application problems do occur it should be easy to determine the source of the problem.” What is Envoy (https://www.envoyproxy.io/docs/envoy/latest/intro/what_is_envoy) Announcing Envoy: C++ L7 proxy and communication bus by Matt Klein (https://eng.lyft.com/announcing-envoy-c-l7-proxy-and-communication-bus-92520b6c8191)

Slide 41

Slide 41 text

Envoy as a sidecar proxy Service A Service B

Slide 42

Slide 42 text

Envoy also works as a gateway Service A Service B Gateway

Slide 43

Slide 43 text

Generally a combination of both Service A Service B Gateway

Slide 44

Slide 44 text

Need to configure each of the proxies Service A Service B Sidecar Proxy Sidecar Proxy Configure

Slide 45

Slide 45 text

Need to configure each of the proxies Service A Service B Sidecar Proxy Sidecar Proxy Configure

{
  "configs": [
    {
      "@type": "type.googleapis.com/envoy.admin.v3.BootstrapConfigDump",
      "bootstrap": {
        "node": {
          "id": "sidecar~10.23.3.28~foo-68f69cbfd5-7jdq7.fbar.svc.cluster.local",
          "cluster": "foo-68f69cbfd5.foo-staging",
          "metadata": {
            "PROXY_CONFIG": {
              "parentShutdownDuration": "60s",
              "proxyAdminPort": 15000,
              "controlPlaneAuthPolicy": "MUTUAL_TLS",
              "drainDuration": "45s",
              "proxyMetadata": {
                "DNS_AGENT": ""
              },
              "terminationDrainDuration": "5s",
              "tracing": {
                "zipkin": {
                  "address": "zipkin.istio-system:9411"
                }
              },
              "statusPort": 15020,
              "serviceCluster": "foo-68f69cbfd5.bar",
              "envoyMetricsService": {},
              "binaryPath": "/usr/local/bin/envoy",
              "discoveryAddress": "istiod.istio-system.svc:15012",
              "concurrency": 2,
              "envoyAccessLogService": {},
              "statNameLength": 189,
              "configPath": "./etc/istio/proxy"
            },
            "PLATFORM_METADATA": {
              "gcp_project_number": "1234566791234",
              "gcp_location": "asia-northeast1",
              "gcp_gke_cluster_url": "https://container.googleapis.com/v1/projects/sakajunquality-test/locations/asia-northeast1/clusters/kluster",
              "gcp_gke_cluster_name": "kluster",
              "gcp_project": "sakajunquality-test",
              "gcp_gce_instance_id": "1234566791234"
            },
            "CLUSTER_ID": "Kubernetes",
            "APP_CONTAINERS": "foo-app",
            "LABELS": {
              "service.istio.io/canonical-revision": "release-20200702-2",
              "rollouts-pod-template-hash": "68f69cbfd5",
              "istio.io/rev": "default",
              "app": "foo",
              "service.istio.io/canonical-name": "foo-68f69cbfd5",
              "version": "xxxx",
              "security.istio.io/tlsMode": "istio"
            },
            …..

Hard work… (not always)

Slide 46

Slide 46 text

Control Plane Service A Service B Sidecar Proxy Sidecar Proxy Control Plane Data Plane

Slide 47

Slide 47 text

- Open-source Service Mesh software - Originally from Google, Lyft and IBM - https://istio.io/ Open-source example: Istio

Slide 48

Slide 48 text

Istio https://istio.io/latest/docs/concepts/what-is-istio/

Slide 49

Slide 49 text

Istio https://istio.io/latest/docs/concepts/what-is-istio/

Slide 50

Slide 50 text

Service A Service B Istio example (simplified) Kubernetes Traffic Management Manifest Apply manifests as Kubernetes CRDs Istiod Configure each proxy

Slide 51

Slide 51 text

Service Mesh - Using sidecar proxy, decouples infra-related non-business logic from applications

Slide 52

Slide 52 text

Looking for fully-managed solutions?

Slide 53

Slide 53 text

Traffic Director

Slide 54

Slide 54 text

- Service Mesh Control Plane - Fully-managed w/ SLA - Supports both VMs and containers Traffic Director

Slide 55

Slide 55 text

Traffic Director: Control Plane as a Service Service A Service B Traffic Director Data Plane Control Plane

Slide 56

Slide 56 text

- Traffic Splitting - Circuit Breaking - Outlier detection - Locality Load Balancing - etc Traffic Director’s Traffic Management
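To make one of these features concrete, here is a rough Go sketch of the idea behind outlier detection: stop sending traffic to a host that keeps failing. This is a simplification written for illustration, not how Traffic Director or Envoy implement it. The `hostTracker` type, the consecutive-failure threshold, and the host addresses are all assumptions, and re-admitting a host after an ejection period is left out.

```go
package main

import "fmt"

// hostTracker counts consecutive failures per host and ejects a host once
// the count reaches the threshold. (Hypothetical type for illustration.)
type hostTracker struct {
	threshold int
	failures  map[string]int
	ejected   map[string]bool
}

func newHostTracker(threshold int) *hostTracker {
	return &hostTracker{
		threshold: threshold,
		failures:  map[string]int{},
		ejected:   map[string]bool{},
	}
}

// report records the outcome of one request to host. A success resets the
// consecutive-failure counter; it does not undo an earlier ejection.
func (t *hostTracker) report(host string, ok bool) {
	if ok {
		t.failures[host] = 0
		return
	}
	t.failures[host]++
	if t.failures[host] >= t.threshold {
		t.ejected[host] = true
	}
}

// healthy returns the hosts that have not been ejected, in input order.
func (t *hostTracker) healthy(hosts []string) []string {
	var out []string
	for _, h := range hosts {
		if !t.ejected[h] {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	hosts := []string{"10.0.0.1", "10.0.0.2"}
	tracker := newHostTracker(3)
	for i := 0; i < 3; i++ {
		tracker.report("10.0.0.2", false)
	}
	fmt.Println(tracker.healthy(hosts)) // prints [10.0.0.1]
}
```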

Slide 57

Slide 57 text

Example of traffic splitting Clients Service A Service B Version 1 Service B Version 2 10% 90%
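A weighted split like this can be sketched in a few lines of Go. The slide does not say which version receives which share, so giving 90% to Version 1 is an assumption here, as are the names `weightedBackend` and `pick`. In a mesh the split is configured in the control plane, not written in application code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// weightedBackend pairs a backend name with its share of traffic.
type weightedBackend struct {
	name   string
	weight int
}

// pick maps a point in [0, total weight) to a backend by walking the
// cumulative weights in order.
func pick(backends []weightedBackend, point int) string {
	for _, b := range backends {
		if point < b.weight {
			return b.name
		}
		point -= b.weight
	}
	// Out-of-range points fall through to the last backend.
	return backends[len(backends)-1].name
}

func main() {
	backends := []weightedBackend{
		{name: "service-b-v1", weight: 90}, // assumed share
		{name: "service-b-v2", weight: 10}, // assumed share
	}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick(backends, rand.Intn(100))]++
	}
	fmt.Println(counts) // roughly 9000 vs 1000; exact counts vary per run
}
```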

Slide 58

Slide 58 text

- Manual Deployment for VM/Container - Automatic Deployment for GCE - GKE automatic injection - Proxyless Sidecar w/ Traffic Director

Slide 59

Slide 59 text

Traffic Director: GCE Service A Service B Traffic Director GCE envoy auto-deployment --service-proxy=enabled

Slide 60

Slide 60 text

Traffic Director: GKE Service A Service B Traffic Director Using Istio’s sidecar injection

Slide 61

Slide 61 text

Serverless Runtime

Slide 62

Slide 62 text

- Pay as you go - All the workloads are not necessarily required to be running all the time - e.g. event-driven workloads Serverless Computing Runtime

Slide 63

Slide 63 text

Serverless Computing Runtime Cloud Functions App Engine Cloud Run

Slide 64

Slide 64 text

- Fully-managed serverless environment for containers - Container with HTTP/gRPC listening to $PORT - Pay for CPU and memory @100ms + network transfer Cloud Run

Slide 65

Slide 65 text

- Managed Endpoint w/ TLS termination - Custom Domains w/ TLS - 1-80 concurrent requests per instance - Scale from zero to 1000 instances - Cloud SQL connection / VPC connection - 1-4 vCPU / 128MiB-4GiB RAM - Gradual Traffic Shifting Cloud Run

Slide 66

Slide 66 text

- Managed Endpoint w/ TLS termination - Custom Domains w/ TLS - 1-80 concurrent requests per instance - Scale from zero to 1000 instances - Cloud SQL connection / VPC connection - 1-4 vCPU / 128MiB-4GiB RAM - Gradual Traffic Shifting Cloud Run Easily Deployable Easily Scalable

Slide 67

Slide 67 text

- VPC Access w/ egress setting - GCLB w/ Serverless NEG - Cloud CDN / Cloud Armor / Cloud IAP - Events for Cloud Run - 1h request timeout - server-streaming for HTTP and gRPC - SIGTERM - 4GB RAM / 4 vCPUs - min instances - Cloud Code / Cloud Buildpacks / Deploy YAML - … Cloud Run Updates

Slide 68

Slide 68 text

- VPC Access w/ egress setting - GCLB w/ Serverless NEG - Cloud CDN / Cloud Armor / Cloud IAP - Events for Cloud Run - 1h request timeout - server-streaming for HTTP and gRPC - SIGTERM - 4GB RAM / 4 vCPUs - min instances - Cloud Code / Cloud Buildpacks / Deploy YAML - … Cloud Run Updates Updated Frequently!

Slide 69

Slide 69 text

Can I add apps running on serverless to the Mesh?

Slide 70

Slide 70 text

When you already have GKE-based mesh platform…. Service A Service B

Slide 71

Slide 71 text

How can a serverless app join the Mesh? Service A Service B Serverless Service C ???

Slide 72

Slide 72 text

Not possible, as long as you use a fully-managed serverless environment

Slide 73

Slide 73 text

- Network Connectivity - Sidecar Proxy Injection Serverless to Mesh

Slide 74

Slide 74 text

Serverless VPC Access - Enables VPC access from fully-managed serverless environment - Supports Cloud Run / App Engine / Cloud Functions - https://cloud.google.com/vpc/docs/configure-serverless-vpc-access?hl=en

Slide 75

Slide 75 text

Serverless VPC Access Non-VPC resources VPC resources

Slide 76

Slide 76 text

Sidecar Injection - Impossible... - or give up the fully-managed env

Slide 77

Slide 77 text

Serverless is nice But still wanna communicate with services in the mesh.

Slide 78

Slide 78 text

What if Service Mesh features are implemented as application library? and hopefully that does not increase the application codebase…

Slide 79

Slide 79 text

Traffic Director: proxyless gRPC services

Slide 80

Slide 80 text

- RPC using protocol buffers - Open-sourced by Google - Officially supports many languages - https://grpc.io/docs/languages/ - Great ecosystem gRPC

Slide 81

Slide 81 text

gRPC: xDS-Based Global Load Balancing https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md

Slide 82

Slide 82 text

“gRPC currently supports its own "grpclb" protocol for look-aside load-balancing. However, the popular Envoy proxy uses the xDS API for many types of configuration, including load balancing, and that API is evolving into a standard that will be used to configure a variety of data plane software.” xDS-Based Global Load Balancing https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md

Slide 83

Slide 83 text

“gRPC currently supports its own "grpclb" protocol for look-aside load-balancing. However, the popular Envoy proxy uses the xDS API for many types of configuration, including load balancing, and that API is evolving into a standard that will be used to configure a variety of data plane software.” xDS-Based Global Load Balancing https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md

Slide 84

Slide 84 text

xDS Control Plane and Dataplane Control Plane Control Plane Control Plane ※ and more…!

Slide 85

Slide 85 text

xDS Control Plane and Dataplane Control Plane Control Plane Control Plane ※ and more…!

Slide 86

Slide 86 text

xDS Client (Data Plane) Application Source Code Application Source Code

Slide 87

Slide 87 text

xDS Client (Data Plane) Application Source Code Application Source Code

// For client
package main

import (
	// abbreviated

	// To install the xds resolvers and balancers.
	_ "google.golang.org/grpc/xds"
)

Slide 88

Slide 88 text

Traffic Director: proxyless gRPC services Control Plane Control Plane Control Plane

Slide 89

Slide 89 text

Traffic Director: proxyless gRPC services Provides - Service Discovery - Client-side load-balancing - Route Matching - Traffic Splitting

Slide 90

Slide 90 text

Let’s see the same example... Service A Service B

Slide 91

Slide 91 text

Traffic Director using proxy-less gRPC service Service A Service B

Slide 92

Slide 92 text

Services can communicate with each other using the endpoint xds:///[service name]:[port], pre-defined in Traffic Director Service A Service B xds:///service-b:port
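To show what such an endpoint carries, here is a small Go sketch that takes a target apart using only the standard library. It assumes the triple-slash form `xds:///service-b:8080` that gRPC uses for targets without an authority. The port `8080` and the helper `parseXDSTarget` are made up for illustration. In a real client, gRPC's xDS resolver does this work once the target is passed to the dial call.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// parseXDSTarget splits a target such as "xds:///service-b:8080" into the
// service name and port that would be looked up in the control plane.
// (Hypothetical helper; gRPC handles this internally.)
func parseXDSTarget(target string) (service, port string, err error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "xds" {
		return "", "", fmt.Errorf("unexpected scheme %q, want \"xds\"", u.Scheme)
	}
	// With an empty authority, the service name and port end up in the path.
	return net.SplitHostPort(strings.TrimPrefix(u.Path, "/"))
}

func main() {
	service, port, err := parseXDSTarget("xds:///service-b:8080")
	if err != nil {
		fmt.Println("invalid target:", err)
		return
	}
	fmt.Println(service, port) // prints: service-b 8080
}
```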

Slide 93

Slide 93 text

Here’s how it works... Service A Service B Traffic Director 1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP

Slide 94

Slide 94 text

Here’s how it works... Service A Service B Traffic Director 1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP

{
  "xds_servers": [
    {
      "server_uri": "trafficdirector.googleapis.com:443",
      "channel_creds": [
        {
          "type": "google_default"
        }
      ]
    }
  ],
  "node": {
    "id": "b7f9c818-fb46-43ca-8662-d3bdbcf7ec18~10.0.0.1",
    "metadata": {
      "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012",
      "TRAFFICDIRECTOR_NETWORK_NAME": "default"
    },
    "locality": {
      "zone": "us-central1-a"
    }
  }
}
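As an illustration of step 2, the Go sketch below reads the file named by `GRPC_XDS_BOOTSTRAP` and pulls out the control plane address. gRPC does this itself when the xds package is imported, so application code does not normally need it. The `bootstrap` struct covers only the fields shown on the slide, and `parseBootstrap` and `loadBootstrap` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// bootstrap mirrors only the fields of the bootstrap file that this sketch
// reads; the real file may contain more.
type bootstrap struct {
	XDSServers []struct {
		ServerURI string `json:"server_uri"`
	} `json:"xds_servers"`
	Node struct {
		ID       string `json:"id"`
		Locality struct {
			Zone string `json:"zone"`
		} `json:"locality"`
	} `json:"node"`
}

// parseBootstrap decodes the JSON and checks that a control plane is listed.
func parseBootstrap(data []byte) (*bootstrap, error) {
	var b bootstrap
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, fmt.Errorf("decoding bootstrap: %w", err)
	}
	if len(b.XDSServers) == 0 {
		return nil, errors.New("bootstrap lists no xds_servers")
	}
	return &b, nil
}

// loadBootstrap reads the file whose path is stored in GRPC_XDS_BOOTSTRAP.
func loadBootstrap() (*bootstrap, error) {
	path := os.Getenv("GRPC_XDS_BOOTSTRAP")
	if path == "" {
		return nil, errors.New("GRPC_XDS_BOOTSTRAP is not set")
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return parseBootstrap(data)
}

func main() {
	b, err := loadBootstrap()
	if err != nil {
		fmt.Println("bootstrap error:", err)
		return
	}
	fmt.Println("control plane:", b.XDSServers[0].ServerURI, "zone:", b.Node.Locality.Zone)
}
```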

Slide 95

Slide 95 text

Here’s how it works... Service A Service B Traffic Director 1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP

{
  "xds_servers": [
    {
      "server_uri": "trafficdirector.googleapis.com:443",
      "channel_creds": [
        {
          "type": "google_default"
        }
      ]
    }
  ],
  "node": {
    "id": "b7f9c818-fb46-43ca-8662-d3bdbcf7ec18~10.0.0.1",
    "metadata": {
      "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012",
      "TRAFFICDIRECTOR_NETWORK_NAME": "default"
    },
    "locality": {
      "zone": "us-central1-a"
    }
  }
}

//...
initContainers:
- args:
  - --output
  - "/tmp/bootstrap/td-grpc-bootstrap.json"
  image: gcr.io/trafficdirector-prod/td-grpc-bootstrap:0.9.0
  imagePullPolicy: IfNotPresent
  name: grpc-td-init
  resources:
    limits:
      cpu: 100m
      memory: 100Mi
    requests:
      cpu: 10m
      memory: 100Mi
  volumeMounts:
  - name: grpc-td-conf
    mountPath: /tmp/bootstrap/
//...

Slide 96

Slide 96 text

Here’s how it works... Service A Service B Traffic Director 1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP 3. Get Service B’s info via xDS 4. Make RPC call xds:///service-b:port

Slide 97

Slide 97 text

Accessing Mesh Service from Serverless Environment Service A Service B Serverless Service C

Slide 98

Slide 98 text

Accessing Mesh Service from Serverless Environment Service A Service B Serverless Service C xds:///service-b:port

Slide 99

Slide 99 text

- Using the xDS implementation in gRPC as the data plane, Traffic Director manages traffic between services. Traffic Director: proxyless gRPC services

Slide 100

Slide 100 text

- Support in gRPC client only - Limited features in xDS are available - Some languages are still in progress - e.g. Node.js - or officially not supported like Rust xDS gRPC

Slide 101

Slide 101 text

- A28: gRPC xDS traffic splitting and routing - A30: xDS v3 Support - A31: gRPC xDS Timeout Support and Config Selector Design - A32: gRPC xDS circuit breaking - A34: `weighted_round_robin` lb_policy for per endpoint weight from `ClusterLoadAssignment` response More xDS features are proposed for implementation on the gRPC side

Slide 102

Slide 102 text

- Istio experimentally supports proxy-less data-plane Open-source project

Slide 103

Slide 103 text

“gRPC is the second xDS client… … but it will not be the last!” Mark D. Roth @ EnvoyCon 2020

Slide 104

Slide 104 text

Takeaways

Slide 105

Slide 105 text

- Infra/network-related, non-business logic has to be implemented in a microservice-like architecture or any distributed environment. - A service mesh takes care of this with sidecar proxies such as Envoy. Takeaways 1/4

Slide 106

Slide 106 text

- Service Mesh Control Plane uses the xDS API to configure Envoy. - Open-source example: Istio - Managed solution on GCP: Traffic Director Takeaways 2/4

Slide 107

Slide 107 text

Takeaways 3/4 - gRPC implements xDS for its load-balancing capabilities - Traffic Director (and Istio experimentally) support gRPC as a proxyless data plane

Slide 108

Slide 108 text

- gRPC proxyless services are extra dope! Takeaways 4/4

Slide 109

Slide 109 text

Thank you