Building Reliable Distributed Systems on GCP

DevFest 2020

sakajunquality

October 18, 2020

Transcript

  1. Building Reliable Microservices on GCP. Jun Sakata, Google Developers Expert, Cloud
  2. Building Reliable Microservices on GCP. Jun Sakata, Google Developers Expert, Cloud. Distributed Systems
  3. - Google Developers Expert, Cloud - SRE/Technical Advisor - Travel/Photography/Cooking - GKE/Cloud Run @sakajunquality
  4. Agenda - Microservices and Kubernetes - Service Mesh - Traffic Director - Serverless Runtime - Proxyless gRPC services - Takeaways
  5. Those are NOT to be covered today! - Definition of microservices - Difference between microservices and a distributed monolith - Pros and cons of microservices - Technical details of Kubernetes/Istio
  6. Microservices and Kubernetes Distributed Systems

  7. Microservices?

  8. “Microservices are independently deployable services modeled around a business domain. They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman
  9. “Microservices are independently deployable services modeled around a business domain. They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman. Not covered today!
  12. “Today, it’s arguable that most applications are distributed in some fashion, even if they don’t use microservices.” Distributed Tracing in Practice by Rebecca Isaacs, Ben Sigelman, Daniel Spoonhower, Jonathan Mace, Austin Parker
  13. Platform for Microservice-like Architecture should support… - communication over the network - a variety of workloads, backends...
  14. Kubernetes (Quick Recap) - Platform for container workloads - Based on Google’s Borg - Orchestrates computing, networking, and storage resources for containers
  15. With Kubernetes… Kubernetes master, Service A Manifest (Image gcr.io/sakajunquality-test/foo-bar, CPU 1, Memory 2G)
  16. With Kubernetes… Kubernetes master, Service A Manifest (Image gcr.io/sakajunquality-test/foo-bar, CPU 1, Memory 2G)

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: service-a
        labels:
          app: service-a
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: service-a   # must match the Pod template labels below
        template:
          metadata:
            labels:
              app: service-a
          spec:
            containers:
            - name: service-a
              image: gcr.io/sakajunquality...
              ports:
              - containerPort: 8080
  17. With Kubernetes… Kubernetes master: Service A Manifest (Image gcr.io/sakajunquality-test/foo-bar, CPU 1, Memory 2G) → Service A Workloads (Image gcr.io/sakajunquality-test/foo-bar, CPU 1, Memory 2G)
  18. With Kubernetes… Kubernetes master: Service A Manifest → Service A Workloads, Service B Manifest → Service B Workloads. In the same way... And more…!
  19. Perfect?

  20. Not so much...

  21. Consider a service-to-service connection: Service A → Service B

  22. Service-to-service connection: Service A → Service B. Where is Service B? When should I retry? How long should I wait for the response? Is this a valid request? What’s going on here?
  23. What’s missing in Kubernetes - (Intelligent) Service Discovery - (Intelligent) Traffic Control - Observability - Authn/Authz etc...
  24. Service-to-service connection: Service A → Service B. Where is Service B? (Service Discovery) When should I retry? How long should I wait for the response? (Traffic Control) Is this a valid request? (Authn/Authz) What’s going on here? (Observability)
  25. Service Mesh

  26. “A service mesh is a programmable framework that allows you to observe, secure, and connect microservices. It doesn’t establish connectivity between microservices, but instead has policies and controls that are applied on top of an existing network to govern how microservices interact.” Istio Explained by Lin Sun and Daniel Berg
  28. Without Service Mesh... (or something equivalent)

  29. Without Service Mesh Service A Service B

  30. Without Service Mesh: Service A, Service B - Service Discovery - Business Logic - Authentication - Observability
  31. Without Service Mesh: Service A, Service B - Service Discovery - Business Logic - Authentication - Observability. Non-Business Logic
  32. Without Service Mesh: Service A and Service B each carry Service Discovery, Authentication, and Traffic Control next to their Business Logic. Non-business logic on every application
  33. Trying to implement tracing! Service A and Service B each add Tracing next to Service Discovery, Authentication, Traffic Control, and Business Logic. This increases non-business logic in the codebase
  34. With Service Mesh

  35. Instead of communicating directly Service A Service B

  36. In a Service Mesh, proxies called “sidecars” communicate on behalf of applications (Service A and Service B, each with a Sidecar Proxy)
  37. In a Service Mesh, proxies called “sidecars” communicate on behalf of applications (Service A and Service B, each with a Sidecar Proxy). Each application communicates only with its sidecar proxy
  38. And sidecar proxies handle the non-business logic (Service A and Service B, each with a Sidecar Proxy): Service Discovery, Traffic Control, Tracing, etc...
  39. Envoy - L7 Proxy - Originally from Lyft - High Performance / High Reliability - Configurable via API - https://www.envoyproxy.io/
  40. “The network should be transparent to applications. When network and application problems do occur it should be easy to determine the source of the problem.” What is Envoy (https://www.envoyproxy.io/docs/envoy/latest/intro/what_is_envoy); Announcing Envoy: C++ L7 proxy and communication bus by Matt Klein (https://eng.lyft.com/announcing-envoy-c-l7-proxy-and-communication-bus-92520b6c8191)
  41. Envoy as a sidecar proxy Service A Service B

  42. Envoy also works as a gateway: Service A, Service B, Gateway
  43. Generally, a combination of both: Service A, Service B, Gateway

  44. Need to configure each proxy (Service A and Service B, each with a Sidecar Proxy): Configure
  45. Need to configure each proxy (Service A and Service B, each with a Sidecar Proxy): Configure

      {
        "configs": [
          {
            "@type": "type.googleapis.com/envoy.admin.v3.BootstrapConfigDump",
            "bootstrap": {
              "node": {
                "id": "sidecar~10.23.3.28~foo-68f69cbfd5-7jdq7.fbar.svc.cluster.local",
                "cluster": "foo-68f69cbfd5.foo-staging",
                "metadata": {
                  "PROXY_CONFIG": {
                    "parentShutdownDuration": "60s",
                    "proxyAdminPort": 15000,
                    "controlPlaneAuthPolicy": "MUTUAL_TLS",
                    "drainDuration": "45s",
                    "proxyMetadata": { "DNS_AGENT": "" },
                    "terminationDrainDuration": "5s",
                    "tracing": { "zipkin": { "address": "zipkin.istio-system:9411" } },
                    "statusPort": 15020,
                    "serviceCluster": "foo-68f69cbfd5.bar",
                    "envoyMetricsService": {},
                    "binaryPath": "/usr/local/bin/envoy",
                    "discoveryAddress": "istiod.istio-system.svc:15012",
                    "concurrency": 2,
                    "envoyAccessLogService": {},
                    "statNameLength": 189,
                    "configPath": "./etc/istio/proxy"
                  },
                  "PLATFORM_METADATA": {
                    "gcp_project_number": "1234566791234",
                    "gcp_location": "asia-northeast1",
                    "gcp_gke_cluster_url": "https://container.googleapis.com/v1/projects/sakajunquality-test/locations/asia-northeast1/clusters/kluster",
                    "gcp_gke_cluster_name": "kluster",
                    "gcp_project": "sakajunquality-test",
                    "gcp_gce_instance_id": "1234566791234"
                  },
                  "CLUSTER_ID": "Kubernetes",
                  "APP_CONTAINERS": "foo-app",
                  "LABELS": {
                    "service.istio.io/canonical-revision": "release-20200702-2",
                    "rollouts-pod-template-hash": "68f69cbfd5",
                    "istio.io/rev": "default",
                    "app": "foo",
                    "service.istio.io/canonical-name": "foo-68f69cbfd5",
                    "version": "xxxx",
                    "security.istio.io/tlsMode": "istio"
                  },
                  …..

      Hard work… (not always)
  46. Control Plane and Data Plane: the Control Plane configures the Sidecar Proxies of Service A and Service B, which form the Data Plane
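
The point of a control plane is that one component holds the desired configuration and pushes it to every proxy, so nobody has to hand-edit a config dump like the one above. The toy Go sketch below models only that push. The `routeConfig` shape, the `controlPlane` type, and its methods are hypothetical. The real xDS protocol streams versioned resources over gRPC with acknowledgements, which is far richer than this.

```go
package main

import (
	"fmt"
	"sync"
)

// routeConfig is a hypothetical, heavily simplified stand-in for the
// configuration a control plane would send to each proxy.
type routeConfig struct {
	Version int
	Weights map[string]int // upstream -> traffic weight
}

// controlPlane keeps the desired config and pushes every new version to all
// subscribed proxies, instead of an operator configuring each proxy by hand.
type controlPlane struct {
	mu      sync.Mutex
	current routeConfig
	subs    []chan routeConfig
}

// Subscribe registers a proxy and immediately delivers the current config.
func (c *controlPlane) Subscribe() <-chan routeConfig {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch := make(chan routeConfig, 1)
	ch <- c.current
	c.subs = append(c.subs, ch)
	return ch
}

// Update bumps the version and pushes the new config to every proxy.
// A proxy that has not read the previous version only sees the latest one.
func (c *controlPlane) Update(weights map[string]int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.current = routeConfig{Version: c.current.Version + 1, Weights: weights}
	for _, ch := range c.subs {
		select {
		case <-ch: // drop a stale, unread version
		default:
		}
		ch <- c.current // cannot block: buffer is empty and we are the only sender
	}
}

func main() {
	cp := &controlPlane{}
	proxyA, proxyB := cp.Subscribe(), cp.Subscribe()
	fmt.Println((<-proxyA).Version, (<-proxyB).Version) // 0 0
	cp.Update(map[string]int{"service-b-v1": 90, "service-b-v2": 10})
	fmt.Println((<-proxyA).Version, (<-proxyB).Version) // 1 1
}
```

Both proxies converge on the same version without anyone touching them individually.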
  47. Open-source example: Istio - Open-source Service Mesh software - Originally from Google, Lyft and IBM - https://istio.io/
  48. Istio https://istio.io/latest/docs/concepts/what-is-istio/

  50. Istio example (simplified): Traffic Management manifests are applied to Kubernetes as CRDs, and Istiod configures each proxy (Service A, Service B)
  51. Service Mesh - Using sidecar proxies, decouples infrastructure-related, non-business logic from applications
  52. Looking for fully-managed solutions?

  53. Traffic Director

  54. Traffic Director - Service Mesh Control Plane - Fully-managed w/ SLA - Supports both VMs and containers
  55. Traffic Director: Control Plane as a Service. Traffic Director (Control Plane) configures Service A and Service B (Data Plane)
  56. Traffic Director’s Traffic Management - Traffic Splitting - Circuit Breaking - Outlier detection - Locality Load Balancing - etc.
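
Outlier detection, one of the features listed above, can be illustrated with its core idea: stop sending traffic to an endpoint after it fails several times in a row. The Go sketch below implements only that consecutive-failure ejection. Real implementations also return ejected endpoints to service after an ejection time and cap how many endpoints can be ejected. The threshold, the endpoint addresses, and the type names are hypothetical.

```go
package main

import "fmt"

// outlierDetector ejects an endpoint once it fails maxConsecutive times in a
// row; any success resets that endpoint's counter. Un-ejection and
// max-ejection limits are intentionally omitted from this sketch.
type outlierDetector struct {
	maxConsecutive int
	failures       map[string]int
	ejected        map[string]bool
}

func newOutlierDetector(maxConsecutive int) *outlierDetector {
	return &outlierDetector{
		maxConsecutive: maxConsecutive,
		failures:       map[string]int{},
		ejected:        map[string]bool{},
	}
}

// Report records the outcome of one request to endpoint.
func (d *outlierDetector) Report(endpoint string, ok bool) {
	if ok {
		d.failures[endpoint] = 0
		return
	}
	d.failures[endpoint]++
	if d.failures[endpoint] >= d.maxConsecutive {
		d.ejected[endpoint] = true
	}
}

// Healthy returns the endpoints still eligible for traffic, in input order.
func (d *outlierDetector) Healthy(endpoints []string) []string {
	var out []string
	for _, ep := range endpoints {
		if !d.ejected[ep] {
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	// Hypothetical Service B endpoints.
	endpoints := []string{"10.0.0.1:8080", "10.0.0.2:8080"}
	d := newOutlierDetector(3)
	d.Report("10.0.0.1:8080", true)
	for i := 0; i < 3; i++ {
		d.Report("10.0.0.2:8080", false)
	}
	fmt.Println(d.Healthy(endpoints)) // [10.0.0.1:8080]
}
```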
  57. Example of traffic splitting: Clients, Service A, Service B Version 1, Service B Version 2 (10%, 90%)
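
Weighted traffic splitting comes down to a per-request choice in the data plane. Each version owns a slice of the total weight, and a uniform random draw picks the slice. The Go sketch below shows that selection. The slide shows a 10%/90% split, but which version gets which share is assumed here. The backend names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// backend is one traffic-split destination with a relative weight.
type backend struct {
	name   string
	weight int
}

// totalWeight sums the weights; it must be positive before drawing n.
func totalWeight(backends []backend) int {
	total := 0
	for _, b := range backends {
		total += b.weight
	}
	return total
}

// pick returns the backend selected by n, where n is drawn uniformly from
// [0, totalWeight). Each backend owns a range as wide as its weight, so it
// is chosen with probability weight/total.
func pick(backends []backend, n int) string {
	for _, b := range backends {
		if n < b.weight {
			return b.name
		}
		n -= b.weight
	}
	panic("n out of range")
}

func main() {
	// Assumed mapping: 90% stays on v1, 10% goes to v2.
	split := []backend{{"service-b-v1", 90}, {"service-b-v2", 10}}
	rng := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick(split, rng.Intn(totalWeight(split)))]++
	}
	fmt.Println(counts) // expect about 9000 for v1 and 1000 for v2
}
```

Shifting traffic gradually is then just a matter of changing the weights, which is exactly what the control plane pushes to the data plane.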
  58. - Manual Deployment for VM/Container - Automatic Deployment for GCE - GKE automatic injection - Proxyless Sidecar w/ Traffic Director
  59. Traffic Director: GCE. Service A, Service B, Traffic Director. GCE Envoy auto-deployment: --service-proxy=enabled
  60. Traffic Director: GKE. Service A, Service B, Traffic Director, using Istio’s sidecar injection
  61. Serverless Runtime

  62. Serverless Computing Runtime - Pay as you go - Not all workloads need to be running all the time - e.g. event-driven workloads
  63. Serverless Computing Runtime: Cloud Functions, App Engine, Cloud Run

  64. Cloud Run - Fully-managed serverless environment for containers - Container with an HTTP/gRPC server listening on $PORT - Pay for CPU and memory (billed per 100 ms) + network transfer
  65. Cloud Run - Managed endpoint w/ TLS termination - Custom domains w/ TLS - 1-80 concurrent requests per instance - Scale from zero to 1000 instances - Cloud SQL connection / VPC connection - 1-4 vCPU / 128MiB-4GiB RAM - Gradual traffic shifting
  66. Cloud Run - Managed endpoint w/ TLS termination - Custom domains w/ TLS - 1-80 concurrent requests per instance - Scale from zero to 1000 instances - Cloud SQL connection / VPC connection - 1-4 vCPU / 128MiB-4GiB RAM - Gradual traffic shifting. Easily Deployable, Easily Scalable
  67. Cloud Run Updates - VPC Access w/ egress setting - GCLB w/ Serverless NEG - Cloud CDN / Cloud Armor / Cloud IAP - Events for Cloud Run - 1h request timeout - Server-streaming for HTTP and gRPC - SIGTERM - 4GB RAM / 4 vCPUs - Min instances - Cloud Code / Cloud Buildpacks / Deploy YAML - …
  68. Cloud Run Updates - VPC Access w/ egress setting - GCLB w/ Serverless NEG - Cloud CDN / Cloud Armor / Cloud IAP - Events for Cloud Run - 1h request timeout - Server-streaming for HTTP and gRPC - SIGTERM - 4GB RAM / 4 vCPUs - Min instances - Cloud Code / Cloud Buildpacks / Deploy YAML - … Updated Frequently!
  69. Can I add apps running on serverless to the Mesh?

  70. When you already have a GKE-based mesh platform…: Service A, Service B
  71. How can a serverless app join the Mesh? Service A, Service B, Serverless Service C ???
  72. Not possible, as long as you use a fully-managed serverless environment

  73. Serverless to Mesh - Network Connectivity - Sidecar Proxy Injection

  74. Serverless VPC Access - Enables VPC access from fully-managed serverless environments - Supports Cloud Run / App Engine / Cloud Functions - https://cloud.google.com/vpc/docs/configure-serverless-vpc-access?hl=en
  75. Serverless VPC Access Non-VPC resources VPC resources

  76. Sidecar Injection - Impossible... - or give up the fully-managed environment

  77. Serverless is nice, but I still want to communicate with services in the mesh.
  78. What if Service Mesh features were implemented as an application library, hopefully without increasing the application codebase…?
  79. Traffic Director: proxyless gRPC services

  80. gRPC - RPC using protocol buffers - Open-sourced by Google - Officially supports many languages - https://grpc.io/docs/languages/ - Great ecosystem
  81. gRPC: xDS-Based Global Load Balancing https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md

  82. “gRPC currently supports its own "grpclb" protocol for look-aside load-balancing. However, the popular Envoy proxy uses the xDS API for many types of configuration, including load balancing, and that API is evolving into a standard that will be used to configure a variety of data plane software.” xDS-Based Global Load Balancing https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md
  84. xDS Control Plane and Data Plane (diagram: several control planes; ※ and more…!)
  86. xDS Client (Data Plane) Application Source Code Application Source Code

  87. xDS Client (Data Plane): Application Source Code, Application Source Code

      // For client
      package main

      import (
          // abbreviated
          // To install the xds resolvers and balancers.
          _ "google.golang.org/grpc/xds"
      )
  88. Traffic Director: proxyless gRPC services (diagram: Control Plane)
  89. Traffic Director: proxyless gRPC services provides - Service Discovery - Client-side load balancing - Route Matching - Traffic Splitting
  90. Let’s see the same example... Service A Service B

  91. Traffic Director using proxy-less gRPC service Service A Service B

  92. Services can communicate with each other using the endpoint xds:///[service name]:[port], pre-defined in Traffic Director. Service A → Service B: xds:///service-b:port
  93. Here’s how it works... (Service A, Service B, Traffic Director) 1. Service B is registered in Traffic Director. 2. gRPC finds the control plane using a bootstrap file whose path is set in the GRPC_XDS_BOOTSTRAP environment variable.
  94. Here’s how it works... (Service A, Service B, Traffic Director) 1. Service B is registered in Traffic Director. 2. gRPC finds the control plane using a bootstrap file whose path is set in the GRPC_XDS_BOOTSTRAP environment variable.

      {
        "xds_servers": [
          {
            "server_uri": "trafficdirector.googleapis.com:443",
            "channel_creds": [ { "type": "google_default" } ]
          }
        ],
        "node": {
          "id": "b7f9c818-fb46-43ca-8662-d3bdbcf7ec18~10.0.0.1",
          "metadata": {
            "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012",
            "TRAFFICDIRECTOR_NETWORK_NAME": "default"
          },
          "locality": { "zone": "us-central1-a" }
        }
      }
  95. Here’s how it works... (Service A, Service B, Traffic Director) 1. Service B is registered in Traffic Director. 2. gRPC finds the control plane using a bootstrap file (as on slide 94) whose path is set in the GRPC_XDS_BOOTSTRAP environment variable. The file is generated by an init container:

      # ...
      initContainers:
      - args:
        - --output
        - "/tmp/bootstrap/td-grpc-bootstrap.json"
        image: gcr.io/trafficdirector-prod/td-grpc-bootstrap:0.9.0
        imagePullPolicy: IfNotPresent
        name: grpc-td-init
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 100Mi
        volumeMounts:
        - name: grpc-td-conf
          mountPath: /tmp/bootstrap/
      # ...
  96. Here’s how it works... (Service A, Service B, Traffic Director) 1. Service B is registered in Traffic Director. 2. gRPC finds the control plane using a bootstrap file whose path is set in the GRPC_XDS_BOOTSTRAP environment variable. 3. gRPC gets Service B’s info via xDS. 4. It makes the RPC call to xds:///service-b:port
  97. Accessing Mesh Service from Serverless Environment: Service A, Service B, Serverless Service C
  98. Accessing Mesh Service from Serverless Environment: Service A, Service B, Serverless Service C → xds:///service-b:port
  99. Traffic Director: proxyless gRPC services - With gRPC’s xDS implementation acting as the data plane, Traffic Director manages traffic between services.
  100. xDS gRPC - Supported in gRPC clients only - Only a limited set of xDS features is available - Some languages are still in progress (e.g. Node.js) or not officially supported (e.g. Rust)
  101. More xDS features are proposed for the gRPC side - A28: gRPC xDS traffic splitting and routing - A30: xDS v3 Support - A31: gRPC xDS Timeout Support and Config Selector Design - A32: gRPC xDS circuit breaking - A34: `weighted_round_robin` lb_policy for per endpoint weight from `ClusterLoadAssignment` response
  102. Open-source project - Istio experimentally supports a proxyless data plane

  103. “gRPC is the second xDS client… … but it will not be the last!” Mark D. Roth @ EnvoyCon 2020
  104. Takeaways

  105. Takeaways 1/4 - Infrastructure/network-related, non-business logic is required in microservice-like architectures and other distributed environments. - Service Mesh handles this with sidecar proxies such as Envoy.
  106. Takeaways 2/4 - A Service Mesh control plane uses the xDS API to configure Envoy. - Open-source example: Istio - Managed solution in GCP: Traffic Director
  107. Takeaways 3/4 - gRPC implements xDS for its load-balancing capabilities - Traffic Director (and, experimentally, Istio) supports gRPC as a proxyless data plane
  108. Takeaways 4/4 - Proxyless gRPC services are extra dope!

  109. Thank you