Building Reliable Distributed Systems on GCP

DevFest 2020

sakajunquality

October 18, 2020

Transcript

  1. Agenda - Microservices and Kubernetes - Service Mesh - Traffic

    Director - Serverless Runtime - Proxyless gRPC services - Takeaways
  2. - Definition of microservices - Difference between microservices and distributed

    monolith - Pros and Cons of microservices - Technical Details of Kubernetes/Istio Those are NOT to be covered today!
  3. “Microservices are independently deployable services modeled around a business domain.

    They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman
  4. “Microservices are independently deployable services modeled around a business domain.

    They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman Not covered today!
  5. “Microservices are independently deployable services modeled around a business domain.

    They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman
  6. “Microservices are independently deployable services modeled around a business domain.

    They communicate with each other via networks, and as an architecture choice offer many options for solving the problems you may face.” Monolith to Microservices by Sam Newman
  7. “Today, it’s arguable that most applications are distributed in some

    fashion, even if they don’t use microservices.” Distributed Tracing in Practice by Rebecca Isaacs; Ben Sigelman; Daniel Spoonhower; Jonathan Mace; Austin Parker
  8. should support… - communication over network - variety of workloads,

    backends... Platform for Microservice-like Architecture
  9. - Platform for container workloads - based on Google’s

    Borg - Orchestrates computing, networking, and storage resources for containers Kubernetes (Quick Recap)
  10. With Kubernetes… Kubernetes master Service A Manifest Image gcr.io/sakajunquality-test/foo-bar CPU

    1, Memory 2G apiVersion: apps/v1 kind: Deployment metadata: name: service-a labels: app: service-a spec: replicas: 1 selector: matchLabels: app: service-a template: metadata: labels: app: service-a spec: containers: - name: service-a image: gcr.io/sakajunquality... ports: - containerPort: 8080
  11. Service A Workloads With Kubernetes… Kubernetes master Service A Manifest

    Image gcr.io/sakajunquality-test/foo-bar CPU 1, Memory 2G Image gcr.io/sakajunquality-test/foo-bar CPU 1, Memory 2G
  12. Service A Workloads Service B Workloads With Kubernetes… Kubernetes master

    Service A Manifest Service B Manifest In the same way... And More…!
  13. Service to Service connection Service A Service B Where is

    Service B? When should I retry? How long should I wait for the response? Is this a valid request? What’s going on here?
  14. Service to Service connection Service A Service B Where is

    Service B? When should I retry? How long should I wait for the response? Is this a valid request? What’s going on here? Observability Service Discovery Authn/Authz Traffic Control
  15. “A service mesh is a programmable framework that allows you

    to observe, secure, and connect microservices. It doesn’t establish connectivity between microservices, but instead has policies and controls that are applied on top of an existing network to govern how microservices interact. ” Istio Explained by Lin Sun and Daniel Berg
  16. “A service mesh is a programmable framework that allows you

    to observe, secure, and connect microservices. It doesn’t establish connectivity between microservices, but instead has policies and controls that are applied on top of an existing network to govern how microservices interact. ” Istio Explained by Lin Sun and Daniel Berg
  17. Without Service Mesh Service A Service B Service Discovery Business

    Logic Authentication Observability Non-Business Logic
  18. Without Service Mesh Service A Service B Service Discovery Business

    Logic Authentication Traffic Control Service Discovery Business Logic Authentication Traffic Control Non-Business Logic on every application
  19. Trying to implement tracing! Service A Service B Service Discovery

    Business Logic Authentication Traffic Control Service Discovery Authentication Traffic Control Tracing Tracing Business Logic Increases non-business logic in codebase
  20. In Service Mesh, proxies, called “sidecar”, communicate on behalf of

    applications Service A Service B Sidecar Proxy Sidecar Proxy
  21. In Service Mesh, proxies, called “sidecar”, communicate on behalf of

    applications Service A Service B Sidecar Proxy Sidecar Proxy Each application communicates only with its sidecar proxy
  22. And sidecar proxies do non-business logic Service A Service B

    Sidecar Proxy Sidecar Proxy Service Discovery Traffic Control Tracing etc...
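As a rough illustration of a proxy doing non-business work on behalf of an application, the Go sketch below puts a tiny reverse proxy in front of a stand-in Service B and stamps each request with a request ID. This is not how Envoy is implemented; `newSidecar`, `demo`, the `X-Request-Id` header choice, and the fixed ID `req-1` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newSidecar returns a reverse proxy that forwards every request to upstream
// and adds a request ID header if the caller did not set one. The application
// behind it never has to know about the header.
func newSidecar(upstream *url.URL, nextID func() string) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Request-Id") == "" {
			r.Header.Set("X-Request-Id", nextID())
		}
		proxy.ServeHTTP(w, r)
	})
}

// demo starts a stand-in Service B on a local test server, places the sidecar
// in front of it, sends one request through the sidecar, and returns the body.
func demo() (string, error) {
	serviceB := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "service-b saw request id %s", r.Header.Get("X-Request-Id"))
	}))
	defer serviceB.Close()

	target, err := url.Parse(serviceB.URL)
	if err != nil {
		return "", err
	}
	sidecar := httptest.NewServer(newSidecar(target, func() string { return "req-1" }))
	defer sidecar.Close()

	resp, err := http.Get(sidecar.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The handler for Service B contains no tracing code of its own; the header arrives because the proxy added it.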
  23. Envoy - L7 Proxy - Originally from Lyft - High

    Performance / High Reliability - Configurable via API - https://www.envoyproxy.io/
  24. “The network should be transparent to applications. When network and

    application problems do occur it should be easy to determine the source of the problem.” What is Envoy (https://www.envoyproxy.io/docs/envoy/latest/intro/what_is_envoy) Announcing Envoy: C++ L7 proxy and communication bus by Matt Klein (https://eng.lyft.com/announcing-envoy-c-l7-proxy-and-communication-bus-92520b6c8191)
  25. Need to configure each of the proxies Service A Service B

    Sidecar Proxy Sidecar Proxy Configure
  26. Need to configure each of the proxies Service A Service B

    Sidecar Proxy Sidecar Proxy Configure { "configs": [ { "@type": "type.googleapis.com/envoy.admin.v3.BootstrapConfigDump", "bootstrap": { "node": { "id": "sidecar~10.23.3.28~foo-68f69cbfd5-7jdq7.fbar.svc.cluster.local", "cluster": "foo-68f69cbfd5.foo-staging", "metadata": { "PROXY_CONFIG": { "parentShutdownDuration": "60s", "proxyAdminPort": 15000, "controlPlaneAuthPolicy": "MUTUAL_TLS", "drainDuration": "45s", "proxyMetadata": { "DNS_AGENT": "" }, "terminationDrainDuration": "5s", "tracing": { "zipkin": { "address": "zipkin.istio-system:9411" } }, "statusPort": 15020, "serviceCluster": "foo-68f69cbfd5.bar", "envoyMetricsService": {}, "binaryPath": "/usr/local/bin/envoy", "discoveryAddress": "istiod.istio-system.svc:15012", "concurrency": 2, "envoyAccessLogService": {}, "statNameLength": 189, "configPath": "./etc/istio/proxy" }, "PLATFORM_METADATA": { "gcp_project_number": "1234566791234", "gcp_location": "asia-northeast1", "gcp_gke_cluster_url": "https://container.googleapis.com/v1/projects/sakajunquality-test/locations/asia-northeast1/clusters/kluster", "gcp_gke_cluster_name": "kluster", "gcp_project": "sakajunquality-test", "gcp_gce_instance_id": "1234566791234" }, "CLUSTER_ID": "Kubernetes", "APP_CONTAINERS": "foo-app", "LABELS": { "service.istio.io/canonical-revision": "release-20200702-2", "rollouts-pod-template-hash": "68f69cbfd5", "istio.io/rev": "default", "app": "foo", "service.istio.io/canonical-name": "foo-68f69cbfd5", "version": "xxxx", "security.istio.io/tlsMode": "istio" }, ….. Hard work… (not always)
  27. - Open-source Service Mesh software - Originally from Google, Lyft

    and IBM - https://istio.io/ Open-source example: Istio
  28. Service A Service B Istio example (simplified) Kubernetes Traffic Management

    Manifest Apply manifests as Kubernetes CRD Istiod Configure each proxy
  29. - Service Mesh Control Plane - Fully-managed w/ SLA -

    Supports both VMs and containers Traffic Director
  30. Traffic Director: Control Plane as a Service Service A Service

    B Traffic Director Data Plane Control Plane
  31. - Traffic Splitting - Circuit Breaking - Outlier detection -

    Locality Load Balancing - etc Traffic Director’s Traffic Management
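Of the features listed, traffic splitting is the easiest to show in a few lines. The Go sketch below is only an illustration of weighted selection, not Traffic Director's implementation; the backend names `service-b-v1` and `service-b-v2` and the 90/10 weights are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// backend is one destination of a traffic split and its relative weight.
type backend struct {
	name   string
	weight int
}

// totalWeight sums the weights of all backends.
func totalWeight(backends []backend) int {
	total := 0
	for _, b := range backends {
		total += b.weight
	}
	return total
}

// pick maps a point in [0, totalWeight) to a backend, so that each backend
// owns a slice of the range proportional to its weight. It returns "" if the
// point falls outside the range.
func pick(backends []backend, point int) string {
	if point < 0 {
		return ""
	}
	for _, b := range backends {
		if point < b.weight {
			return b.name
		}
		point -= b.weight
	}
	return ""
}

func main() {
	split := []backend{
		{name: "service-b-v1", weight: 90},
		{name: "service-b-v2", weight: 10},
	}
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[pick(split, rand.Intn(totalWeight(split)))]++
	}
	// Roughly 900 requests go to v1 and 100 to v2; exact counts vary per run.
	fmt.Println(counts)
}
```

In a mesh, the weights come from the control plane and the selection happens in the data plane, so shifting traffic is a configuration change rather than a code change.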
  32. - Manual Deployment for VM/Container - Automatic Deployment for GCE

    - GKE automatic injection - Proxyless Sidecar w/ Traffic Director
  33. Traffic Director: GCE Service A Service B Traffic Director GCE

    envoy auto-deployment --service-proxy=enabled
  34. - Pay as you go - Not all workloads

    need to be running all the time - e.g. event-driven workloads Serverless Computing Runtime
  35. - Fully-managed serverless environment for containers - Container with HTTP/gRPC

    listening on $PORT - Pay for CPU and memory (billed per 100 ms) + network transfer Cloud Run
  36. - Managed Endpoint w/ TLS termination - Custom Domains w/

    TLS - 1-80 concurrent requests per instance - Scale from zero to 1000 instances - Cloud SQL connection / VPC connection - 1-4 vCPU / 128MiB-4GiB RAM - Gradual Traffic Shifting Cloud Run
  37. - Managed Endpoint w/ TLS termination - Custom Domains w/

    TLS - 1-80 concurrent requests per instance - Scale from zero to 1000 instances - Cloud SQL connection / VPC connection - 1-4 vCPU / 128MiB-4GiB RAM - Gradual Traffic Shifting Cloud Run Easily Deployable Easily Scalable
  38. - VPC Access w/ egress setting - GCLB w/ Serverless

    NEG - Cloud CDN / Cloud Armor / Cloud IAP - Events for Cloud Run - 1h request timeout - server-streaming for HTTP and gRPC - SIGTERM - 4GB RAM / 4 vCPUs - min instances - Cloud Code / Cloud Buildpacks / Deploy YAML - … Cloud Run Updates
  39. - VPC Access w/ egress setting - GCLB w/ Serverless

    NEG - Cloud CDN / Cloud Armor / Cloud IAP - Events for Cloud Run - 1h request timeout - server-streaming for HTTP and gRPC - SIGTERM - 4GB RAM / 4 vCPUs - min instances - Cloud Code / Cloud Buildpacks / Deploy YAML - … Cloud Run Updates Updated Frequently!
  40. How can a serverless app join the Mesh? Service A

    Service B Serverless Service C ???
  41. Serverless VPC Access - Enables VPC access from fully-managed serverless

    environment - Supports Cloud Run / App Engine / Cloud Functions - https://cloud.google.com/vpc/docs/configure-serverless-vpc-access?hl=en
  42. What if Service Mesh features are implemented as an application library?

    and hopefully that does not increase the application codebase…
  43. - RPC using protocol buffers - Open-sourced by Google -

    Officially supports many languages - https://grpc.io/docs/languages/ - Great ecosystem gRPC
  44. “gRPC currently supports its own "grpclb" protocol for look-aside load-balancing.

    However, the popular Envoy proxy uses the xDS API for many types of configuration, including load balancing, and that API is evolving into a standard that will be used to configure a variety of data plane software.” xDS-Based Global Load Balancing https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md
  45. “gRPC currently supports its own "grpclb" protocol for look-aside load-balancing.

    However, the popular Envoy proxy uses the xDS API for many types of configuration, including load balancing, and that API is evolving into a standard that will be used to configure a variety of data plane software.” xDS-Based Global Load Balancing https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md
  46. xDS Client (Data Plane) Application Source Code Application Source Code

    // For client package main import ( // abbreviated // To install the xds resolvers and balancers. _ "google.golang.org/grpc/xds" )
  47. Traffic Director: proxyless gRPC services Provides - Service Discovery -

    Client-side load-balancing - Route Matching - Traffic Splitting
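Client-side load balancing means the caller itself chooses among the endpoints that service discovery returned. The Go sketch below shows the simplest policy, round robin, purely as an illustration. It is not gRPC's balancer code, and the `roundRobin` type and the endpoint addresses are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin hands out endpoints in turn. In a real client the endpoint list
// would be refreshed from the control plane; here it is replaced via update.
type roundRobin struct {
	mu        sync.Mutex
	endpoints []string
	next      int
}

// update replaces the endpoint list, as a discovery update would.
func (r *roundRobin) update(endpoints []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.endpoints = endpoints
	r.next = 0
}

// pick returns the next endpoint, or "" if none are known.
func (r *roundRobin) pick() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.endpoints) == 0 {
		return ""
	}
	ep := r.endpoints[r.next%len(r.endpoints)]
	r.next++
	return ep
}

func main() {
	lb := &roundRobin{}
	lb.update([]string{"10.0.0.1:50051", "10.0.0.2:50051"}) // hypothetical endpoints
	for i := 0; i < 4; i++ {
		fmt.Println(lb.pick())
	}
}
```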
  48. Services can communicate with each other via the endpoint xds:///[service name]:[port], pre-defined

    in Traffic Director Service A Service B xds:///service-b:port
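To show how such a target breaks down, here is a small Go sketch that splits an `xds:///` target into service name and port using only the standard library. The function name `parseXDSTarget` and the port `50051` are assumptions; in a real client the gRPC library performs this parsing itself.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// parseXDSTarget splits a target such as "xds:///service-b:50051" into the
// service name and port. The authority part between "//" and the third "/"
// is left empty, so the service and port end up in the URL path.
func parseXDSTarget(target string) (service, port string, err error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "xds" {
		return "", "", fmt.Errorf("unexpected scheme %q, want \"xds\"", u.Scheme)
	}
	endpoint := strings.TrimPrefix(u.Path, "/")
	return net.SplitHostPort(endpoint)
}

func main() {
	service, port, err := parseXDSTarget("xds:///service-b:50051")
	if err != nil {
		panic(err)
	}
	fmt.Println(service, port)
}
```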
  49. Here’s how it works... Service A Service B Traffic Director

    1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP
  50. Here’s how it works... Service A Service B Traffic Director

    1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP { "xds_servers": [ { "server_uri": "trafficdirector.googleapis.com:443", "channel_creds": [ { "type": "google_default" } ] } ], "node": { "id": "b7f9c818-fb46-43ca-8662-d3bdbcf7ec18~10.0.0.1", "metadata": { "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012", "TRAFFICDIRECTOR_NETWORK_NAME": "default" }, "locality": { "zone": "us-central1-a" } } }
  51. Here’s how it works... Service A Service B Traffic Director

    1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP { "xds_servers": [ { "server_uri": "trafficdirector.googleapis.com:443", "channel_creds": [ { "type": "google_default" } ] } ], "node": { "id": "b7f9c818-fb46-43ca-8662-d3bdbcf7ec18~10.0.0.1", "metadata": { "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012", "TRAFFICDIRECTOR_NETWORK_NAME": "default" }, "locality": { "zone": "us-central1-a" } } } //... initContainers: - args: - --output - "/tmp/bootstrap/td-grpc-bootstrap.json" image: gcr.io/trafficdirector-prod/td-grpc-bootstrap:0.9.0 imagePullPolicy: IfNotPresent name: grpc-td-init resources: limits: cpu: 100m memory: 100Mi requests: cpu: 10m memory: 100Mi volumeMounts: - name: grpc-td-conf mountPath: /tmp/bootstrap/ //...
  52. Here’s how it works... Service A Service B Traffic Director

    1. Service B is registered in Traffic Director 2. gRPC will detect Control Plane using bootstrap file, which is defined in GRPC_XDS_BOOTSTRAP 3. Get Service B’s Info via xDS 4. Make an RPC call to xds:///service-b:port
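Step 2 can be sketched in Go as follows: read the file named by `GRPC_XDS_BOOTSTRAP` and pull out the control plane address. This is only an illustration of the file's shape as shown on the slides. The gRPC library does this internally, and the `bootstrap` struct and `parseBootstrap` function are assumptions that cover just the fields visible in the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// bootstrap mirrors only the fields that appear in the bootstrap file
// on the slides; a real file may contain more.
type bootstrap struct {
	XDSServers []struct {
		ServerURI    string `json:"server_uri"`
		ChannelCreds []struct {
			Type string `json:"type"`
		} `json:"channel_creds"`
	} `json:"xds_servers"`
	Node struct {
		ID       string                 `json:"id"`
		Metadata map[string]interface{} `json:"metadata"`
		Locality struct {
			Zone string `json:"zone"`
		} `json:"locality"`
	} `json:"node"`
}

// parseBootstrap decodes the JSON and checks that at least one xDS server
// (the control plane) is listed.
func parseBootstrap(data []byte) (*bootstrap, error) {
	var b bootstrap
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, err
	}
	if len(b.XDSServers) == 0 {
		return nil, errors.New("bootstrap: no xds_servers entry")
	}
	return &b, nil
}

func main() {
	path := os.Getenv("GRPC_XDS_BOOTSTRAP")
	if path == "" {
		fmt.Println("GRPC_XDS_BOOTSTRAP is not set")
		return
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("cannot read bootstrap file:", err)
		return
	}
	b, err := parseBootstrap(data)
	if err != nil {
		fmt.Println("cannot parse bootstrap file:", err)
		return
	}
	fmt.Println("control plane:", b.XDSServers[0].ServerURI)
	fmt.Println("node id:", b.Node.ID, "zone:", b.Node.Locality.Zone)
}
```

With the init container from the slides, the file would be written to /tmp/bootstrap/td-grpc-bootstrap.json, and the application container would point `GRPC_XDS_BOOTSTRAP` at that path.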
  53. - Using the xDS implementation in gRPC as the data plane, Traffic

    Director manages traffic between services. Traffic Director: proxyless gRPC services
  54. - Supported on the gRPC client side only - Only a limited

    subset of xDS features is available - Support in some languages is still in progress (e.g. Node.js) or not officially available (e.g. Rust) xDS gRPC
  55. - A28: gRPC xDS traffic splitting and routing - A30:

    xDS v3 Support - A31: gRPC xDS Timeout Support and Config Selector Design - A32: gRPC xDS circuit breaking - A34: `weighted_round_robin` lb_policy for per endpoint weight from `ClusterLoadAssignment` response More xDS features are proposed for implementation on the gRPC side
  56. “gRPC is the second xDS client… … but it will

    not be the last!” Mark D. Roth @ EnvoyCon 2020
  57. - Infrastructure- and network-related, non-business logic must be implemented in a microservice-like architecture or

    distributed environment. - A service mesh handles this with sidecar proxies such as Envoy. Takeaways 1/4
  58. - Service Mesh Control Plane uses xDS API to configure

    Envoy. - Open-source example: Istio - Managed-solution in GCP: Traffic Director Takeaways 2/4
  59. Takeaways 3/4 - gRPC implements xDS for its load-balancing capabilities

    - Traffic Director (and Istio, experimentally) support gRPC as a proxyless data plane