Building Reliable Distributed Systems on GCP

DevFest 2020

sakajunquality

October 18, 2020

Transcript

  1. Building Reliable
    Microservices on
    GCP
    Jun Sakata
    Google Developers Expert, Cloud

  2. Building Reliable
    Microservices on
    GCP
    Jun Sakata
    Google Developers Expert, Cloud
    Distributed Systems

  3. - Google Developers Expert, Cloud
    - SRE/Technical Advisor
    - Travel/Photography/Cooking
    - GKE/Cloud Run
    @sakajunquality

  4. Agenda
    - Microservices and Kubernetes
    - Service Mesh
    - Traffic Director
    - Serverless Runtime
    - Proxyless gRPC services
    - Takeaways

  5. - Definition of microservices
    - Difference between microservices and
    distributed monolith
    - Pros and Cons of microservices
    - Technical Details of Kubernetes/Istio
    Those are NOT to be covered today!

  6. Microservices
    and
    Kubernetes
    Distributed Systems

  7. Microservices?

  8. “Microservices are independently
    deployable services modeled around
    a business domain. They communicate
    with each other via networks, and
    as an architecture choice offer
    many options for solving the
    problems you may face.”
    Monolith to Microservices by Sam Newman

  9. “Microservices are independently
    deployable services modeled around
    a business domain. They communicate
    with each other via networks, and
    as an architecture choice offer
    many options for solving the
    problems you may face.”
    Monolith to Microservices by Sam Newman
    Not covered today!

  12. “Today, it’s arguable that most
    applications are distributed in
    some fashion, even if they don’t
    use microservices.”
    Distributed Tracing in Practice
    by Rebecca Isaacs; Ben Sigelman; Daniel Spoonhower; Jonathan Mace;
    Austin Parker

  13. should support…
    - communication over network
    - variety of workloads, backends...
    Platform for Microservice-like Architecture

  14. - Platform for container workloads
    - Based on Google’s Borg
    - Orchestrates computing, networking,
    and storage resources for containers
    Kubernetes (Quick Recap)

  15. With Kubernetes…
    Kubernetes
    master
    Service A
    Manifest
    Image gcr.io/sakajunquality-test/foo-bar
    CPU 1, Memory 2G

  16. With Kubernetes…
    Kubernetes
    master
    Service A
    Manifest
    Image gcr.io/sakajunquality-test/foo-bar
    CPU 1, Memory 2G
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: service-a
      labels:
        app: service-a
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: service-a  # must match the pod template labels below
      template:
        metadata:
          labels:
            app: service-a
        spec:
          containers:
          - name: service-a
            image: gcr.io/sakajunquality...
            ports:
            - containerPort: 8080

  17. Service A
    Workloads
    With Kubernetes…
    Kubernetes
    master
    Service A
    Manifest
    Image gcr.io/sakajunquality-test/foo-bar
    CPU 1, Memory 2G
    Image gcr.io/sakajunquality-test/foo-bar
    CPU 1, Memory 2G

  18. Service A
    Workloads
    Service B
    Workloads
    With Kubernetes…
    Kubernetes
    master
    Service A
    Manifest
    Service B
    Manifest
    In the same way...
    And More…!

  19. Perfect?

  20. Not so much...

  21. Consider Service to Service connection
    Service A Service B

  22. Service to Service connection
    Service A Service B
    Where is
    Service B?
    When should I
    retry?
    How long should I
    wait for the response?
    Is this a valid
    request?
    What’s going on
    here?

  23. - (Intelligent) Service Discovery
    - (Intelligent) Traffic Control
    - Observability
    - Authn/Authz
    etc...
    What’s missing in Kubernetes

  24. Service to Service connection
    Service A Service B
    Where is
    Service B?
    When should I
    retry?
    How long should I
    wait for the response?
    Is this a valid
    request?
    What’s going on
    here?
    Observability
    Service Discovery Authn/Authz
    Traffic Control

  25. Service Mesh

  26. “A service mesh is a programmable
    framework that allows you to
    observe, secure, and connect
    microservices. It doesn’t establish
    connectivity between microservices,
    but instead has policies and
    controls that are applied on top of
    an existing network to govern how
    microservices interact. ”
    Istio Explained
    by Lin Sun and Daniel Berg

  28. Without Service Mesh..
    (or something equivalent)

  29. Without Service Mesh
    Service A Service B

  30. Without Service Mesh
    Service A Service B
    Service Discovery
    Business
    Logic
    Authentication
    Observability

  31. Without Service Mesh
    Service A Service B
    Service Discovery
    Business
    Logic
    Authentication
    Observability
    Non-Business
    Logic

  32. Without Service Mesh
    Service A Service B
    Service Discovery
    Business
    Logic
    Authentication
    Traffic Control
    Service Discovery
    Business
    Logic
    Authentication
    Traffic Control
    Non-Business
    Logic on every
    application

  33. Trying to implement tracing!
    Service A Service B
    Service Discovery
    Business
    Logic
    Authentication
    Traffic Control
    Service Discovery
    Authentication
    Traffic Control
    Tracing Tracing
    Business
    Logic
    Increases non-business logic in the codebase

  34. With Service Mesh

  35. Instead of communicating directly
    Service A Service B

  36. In a service mesh, proxies called “sidecars” communicate on
    behalf of applications
    Service A Service B
    Sidecar
    Proxy
    Sidecar
    Proxy

  37. In a service mesh, proxies called “sidecars” communicate on
    behalf of applications
    Service A Service B
    Sidecar
    Proxy
    Sidecar
    Proxy
    Each application
    communicates only with
    its sidecar proxy

  38. And the sidecar proxies handle the non-business logic
    Service A Service B
    Sidecar
    Proxy
    Sidecar
    Proxy
    Service Discovery
    Traffic Control
    Tracing
    etc...

  39. Envoy
    - L7 Proxy
    - Originally from Lyft
    - High Performance / High Reliability
    - Configurable via API
    - https://www.envoyproxy.io/

  40. “The network should be transparent
    to applications. When network and
    application problems do occur it
    should be easy to determine the
    source of the problem.”
    What is Envoy
    (https://www.envoyproxy.io/docs/envoy/latest/intro/what_is_envoy)
    Announcing Envoy: C++ L7 proxy and communication bus by Matt Klein
    (https://eng.lyft.com/announcing-envoy-c-l7-proxy-and-communication-bus-92520b6c8191)

  41. Envoy as a sidecar proxy
    Service A Service B

  42. Envoy also works as a gateway
    Service A
    Service B
    Gateway

  43. Generally a combination of both
    Service A Service B
    Gateway

  44. Need to configure each of the proxies
    Service A Service B
    Sidecar
    Proxy
    Sidecar
    Proxy
    Configure

  45. Need to configure each of the proxies
    Service A Service B
    Sidecar
    Proxy
    Sidecar
    Proxy
    Configure
    {
      "configs": [
        {
          "@type": "type.googleapis.com/envoy.admin.v3.BootstrapConfigDump",
          "bootstrap": {
            "node": {
              "id": "sidecar~10.23.3.28~foo-68f69cbfd5-7jdq7.fbar.svc.cluster.local",
              "cluster": "foo-68f69cbfd5.foo-staging",
              "metadata": {
                "PROXY_CONFIG": {
                  "parentShutdownDuration": "60s",
                  "proxyAdminPort": 15000,
                  "controlPlaneAuthPolicy": "MUTUAL_TLS",
                  "drainDuration": "45s",
                  "proxyMetadata": {
                    "DNS_AGENT": ""
                  },
                  "terminationDrainDuration": "5s",
                  "tracing": {
                    "zipkin": {
                      "address": "zipkin.istio-system:9411"
                    }
                  },
                  "statusPort": 15020,
                  "serviceCluster": "foo-68f69cbfd5.bar",
                  "envoyMetricsService": {},
                  "binaryPath": "/usr/local/bin/envoy",
                  "discoveryAddress": "istiod.istio-system.svc:15012",
                  "concurrency": 2,
                  "envoyAccessLogService": {},
                  "statNameLength": 189,
                  "configPath": "./etc/istio/proxy"
                },
                "PLATFORM_METADATA": {
                  "gcp_project_number": "1234566791234",
                  "gcp_location": "asia-northeast1",
                  "gcp_gke_cluster_url": "https://container.googleapis.com/v1/projects/sakajunquality-test/locations/asia-northeast1/clusters/kluster",
                  "gcp_gke_cluster_name": "kluster",
                  "gcp_project": "sakajunquality-test",
                  "gcp_gce_instance_id": "1234566791234"
                },
                "CLUSTER_ID": "Kubernetes",
                "APP_CONTAINERS": "foo-app",
                "LABELS": {
                  "service.istio.io/canonical-revision": "release-20200702-2",
                  "rollouts-pod-template-hash": "68f69cbfd5",
                  "istio.io/rev": "default",
                  "app": "foo",
                  "service.istio.io/canonical-name": "foo-68f69cbfd5",
                  "version": "xxxx",
                  "security.istio.io/tlsMode": "istio"
                },
                …..
    Hard work…
    (not always)

  46. Control Plane
    Service A Service B
    Sidecar
    Proxy
    Sidecar
    Proxy
    Control Plane
    Data Plane

  47. - Open-source Service Mesh software
    - Originally from Google, Lyft and IBM
    - https://istio.io/
    Open-source example: Istio

  48. Istio
    https://istio.io/latest/docs/concepts/what-is-istio/

  50. Service A
    Service B
    Istio example (simplified)
    Kubernetes
    Traffic Management
    Manifest
    Apply manifests as
    Kubernetes CRDs
    Istiod
    Configures each proxy

  51. Service Mesh
    - Uses sidecar proxies to decouple
    infra-related, non-business logic from
    applications

  52. Looking for fully-managed
    solutions?

  53. Traffic
    Director

  54. - Service Mesh Control Plane
    - Fully-managed w/ SLA
    - Supports both VMs and containers
    Traffic Director

  55. Traffic Director: Control Plane as a Service
    Service A Service B
    Traffic Director
    Data Plane
    Control Plane

  56. - Traffic Splitting
    - Circuit Breaking
    - Outlier detection
    - Locality Load Balancing
    - etc
    Traffic Director’s Traffic Management

  57. Example of traffic splitting
    Clients Service A Service B
    Version 1
    Service B
    Version 2
    10%
    90%
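
As a rough illustration of what a weighted split means, the Go sketch below picks a backend in proportion to its weight. In the mesh this decision is made by the data plane from Traffic Director's configuration, not by application code. The `Backend` type, the `pick` helper, and the assignment of 90 to version 1 and 10 to version 2 are assumptions for the example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Backend is a hypothetical destination with a relative traffic weight.
type Backend struct {
	Name   string
	Weight int
}

// totalWeight sums the weights of all backends.
func totalWeight(backends []Backend) int {
	total := 0
	for _, b := range backends {
		total += b.Weight
	}
	return total
}

// pick maps a point in [0, total weight) onto the backend owning that slice.
func pick(backends []Backend, point int) string {
	for _, b := range backends {
		if point < b.Weight {
			return b.Name
		}
		point -= b.Weight
	}
	return "" // point was outside the total weight
}

func main() {
	backends := []Backend{{"service-b-v1", 90}, {"service-b-v2", 10}}
	total := totalWeight(backends)
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick(backends, rand.Intn(total))]++
	}
	fmt.Println(counts) // roughly a 90/10 split; exact counts vary per run
}
```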

  58. - Manual Deployment for VM/Container
    - Automatic Deployment for GCE
    - GKE automatic injection
    - Proxyless
    Sidecar w/ Traffic Director

  59. Traffic Director: GCE
    Service A Service B
    Traffic Director
    GCE envoy
    auto-deployment
    --service-proxy=enabled

  60. Traffic Director: GKE
    Service A Service B
    Traffic Director
    Using Istio’s sidecar
    injection

  61. Serverless
    Runtime

  62. - Pay as you go
    - Not all workloads need to be
    running all the time
    - e.g. event-driven workloads
    Serverless Computing Runtime

  63. Serverless Computing Runtime
    Cloud Functions App Engine Cloud Run

  64. - Fully-managed serverless environment
    for containers
    - Container with HTTP/gRPC listening to
    $PORT
    - Pay for CPU and memory (billed per 100 ms) +
    network transfer
    Cloud Run

  65. - Managed Endpoint w/ TLS termination
    - Custom Domains w/ TLS
    - 1-80 concurrent requests per instance
    - Scale from zero to 1000 instances
    - Cloud SQL connection / VPC connection
    - 1-4 vCPU / 128MiB-4GiB RAM
    - Gradual Traffic Shifting
    Cloud Run

  66. - Managed Endpoint w/ TLS termination
    - Custom Domains w/ TLS
    - 1-80 concurrent requests per instance
    - Scale from zero to 1000 instances
    - Cloud SQL connection / VPC connection
    - 1-4 vCPU / 128MiB-4GiB RAM
    - Gradual Traffic Shifting
    Cloud Run
    Easily Deployable
    Easily Scalable

  67. - VPC Access w/ egress setting
    - GCLB w/ Serverless NEG
    - Cloud CDN / Cloud Armor / Cloud IAP
    - Events for Cloud Run
    - 1h request timeout
    - server-streaming for HTTP and gRPC
    - SIGTERM
    - 4GB RAM / 4 vCPUs
    - min instances
    - Cloud Code / Cloud Buildpacks / Deploy YAML
    - …
    Cloud Run Updates

  68. - VPC Access w/ egress setting
    - GCLB w/ Serverless NEG
    - Cloud CDN / Cloud Armor / Cloud IAP
    - Events for Cloud Run
    - 1h request timeout
    - server-streaming for HTTP and gRPC
    - SIGTERM
    - 4GB RAM / 4 vCPUs
    - min instances
    - Cloud Code / Cloud Buildpacks / Deploy YAML
    - …
    Cloud Run Updates
    Updated Frequently!

  69. Can I add apps running on
    serverless to the Mesh?

  70. When you already have GKE-based mesh platform….
    Service A Service B

  71. How can a serverless app join the Mesh?
    Service A Service B
    Serverless
    Service C
    ???

  72. Not possible, as long as you
    use a fully managed
    serverless environment

  73. - Network Connectivity
    - Sidecar Proxy Injection
    Serverless to Mesh

  74. Serverless VPC Access
    - Enables VPC access from fully-managed
    serverless environment
    - Supports Cloud Run/ App Engine / Cloud
    Functions
    - https://cloud.google.com/vpc/docs/configure-serverless-vpc-access?hl=en

  75. Serverless VPC Access
    Non-VPC
    resources
    VPC resources

  76. Sidecar Injection
    - Impossible...
    - or give up the fully managed environment

  77. Serverless is nice
    But still wanna communicate
    with services in the mesh.

  78. What if service mesh
    features were implemented as
    an application library,
    hopefully without
    increasing the application
    codebase…

  79. Traffic
    Director:
    proxyless
    gRPC services

  80. - RPC using protocol buffers
    - Open-sourced by Google
    - Officially supports many languages
    - https://grpc.io/docs/languages/
    - Great ecosystem
    gRPC

  81. gRPC: xDS-Based Global Load Balancing
    https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md

  82. “gRPC currently supports its own
    "grpclb" protocol for look-aside
    load-balancing. However, the
    popular Envoy proxy uses the xDS
    API for many types of configuration,
    including load balancing, and that
    API is evolving into a standard
    that will be used to configure a
    variety of data plane software.”
    xDS-Based Global Load Balancing
    https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md

  84. xDS Control Plane and Dataplane
    Control Plane Control Plane Control Plane
    ※ and more…!

  86. xDS Client (Data Plane)
    Application
    Source Code
    Application
    Source Code

  87. xDS Client (Data Plane)
    Application
    Source Code
    Application
    Source Code
    // For client
    package main
    import (
        // abbreviated
        // To install the xds resolvers and balancers.
        _ "google.golang.org/grpc/xds"
    )

  88. Traffic Director: proxyless gRPC services
    Control Plane Control Plane Control Plane

    View Slide

  89. Traffic Director: proxyless gRPC services
    Provides
    - Service Discovery
    - Client-side load-balancing
    - Route Matching
    - Traffic Splitting

  90. Let’s see the same example...
    Service A Service B

  91. Traffic Director using proxy-less gRPC service
    Service A Service B

  92. Services can communicate with each other via the endpoint
    xds:///[service name]:[port], predefined in Traffic Director
    Service A Service B
    xds:///service-b:port
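
The target string follows the usual gRPC URI form `scheme://authority/endpoint`, where the scheme selects the resolver (here `xds`) and the authority is left empty, hence the three slashes. The Go sketch below only illustrates that structure with the standard library. The `Target` type, the `parseTarget` helper, and the port `8080` are assumptions for the example, and this is not gRPC's own parsing code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Target holds the pieces of a gRPC-style target URI.
type Target struct {
	Scheme    string // resolver to use, e.g. "xds"
	Authority string // empty for xds:///...
	Endpoint  string // service name and port as configured in the control plane
}

// parseTarget splits a target such as "xds:///service-b:8080" into its parts.
func parseTarget(raw string) (Target, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Target{}, err
	}
	if u.Scheme == "" {
		return Target{}, fmt.Errorf("target %q has no scheme", raw)
	}
	return Target{
		Scheme:    u.Scheme,
		Authority: u.Host,
		Endpoint:  strings.TrimPrefix(u.Path, "/"),
	}, nil
}

func main() {
	t, err := parseTarget("xds:///service-b:8080")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("scheme=%q authority=%q endpoint=%q\n", t.Scheme, t.Authority, t.Endpoint)
}
```

With the xDS resolver installed through the blank import of `google.golang.org/grpc/xds`, a client passes a target of this form to its dial call instead of a DNS name.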

  93. Here’s how it works...
    Service A Service B
    Traffic Director
    1. Service B is registered in Traffic Director
    2. gRPC finds the control plane using the bootstrap file, whose path is set in GRPC_XDS_BOOTSTRAP

  94. Here’s how it works...
    Service A Service B
    Traffic Director
    1. Service B is registered in Traffic Director
    2. gRPC finds the control plane using the bootstrap file, whose path is set in GRPC_XDS_BOOTSTRAP
    {
      "xds_servers": [
        {
          "server_uri": "trafficdirector.googleapis.com:443",
          "channel_creds": [
            {
              "type": "google_default"
            }
          ]
        }
      ],
      "node": {
        "id": "b7f9c818-fb46-43ca-8662-d3bdbcf7ec18~10.0.0.1",
        "metadata": {
          "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012",
          "TRAFFICDIRECTOR_NETWORK_NAME": "default"
        },
        "locality": {
          "zone": "us-central1-a"
        }
      }
    }

  95. Here’s how it works...
    Service A Service B
    Traffic Director
    1. Service B is registered in Traffic Director
    2. gRPC finds the control plane using the bootstrap file, whose path is set in GRPC_XDS_BOOTSTRAP
    {
      "xds_servers": [
        {
          "server_uri": "trafficdirector.googleapis.com:443",
          "channel_creds": [
            {
              "type": "google_default"
            }
          ]
        }
      ],
      "node": {
        "id": "b7f9c818-fb46-43ca-8662-d3bdbcf7ec18~10.0.0.1",
        "metadata": {
          "TRAFFICDIRECTOR_GCP_PROJECT_NUMBER": "123456789012",
          "TRAFFICDIRECTOR_NETWORK_NAME": "default"
        },
        "locality": {
          "zone": "us-central1-a"
        }
      }
    }
    # ...
    initContainers:
    - args:
      - --output
      - "/tmp/bootstrap/td-grpc-bootstrap.json"
      image: gcr.io/trafficdirector-prod/td-grpc-bootstrap:0.9.0
      imagePullPolicy: IfNotPresent
      name: grpc-td-init
      resources:
        limits:
          cpu: 100m
          memory: 100Mi
        requests:
          cpu: 10m
          memory: 100Mi
      volumeMounts:
      - name: grpc-td-conf
        mountPath: /tmp/bootstrap/
    # ...

  96. Here’s how it works...
    Service A Service B
    Traffic Director
    1. Service B is registered in Traffic Director
    2. gRPC finds the control plane using the bootstrap file, whose path is set in GRPC_XDS_BOOTSTRAP
    3. Get Service B’s info via xDS
    4. Make the RPC call
    xds:///service-b:port
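
Step 2 can be pictured with a short Go sketch that reads the file named by `GRPC_XDS_BOOTSTRAP` and extracts the control plane address. The gRPC library does this internally, so application code normally never needs it. The `Bootstrap` struct and `parseBootstrap` helper are hypothetical and model only the fields shown on the bootstrap slides.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Bootstrap models only the bootstrap fields shown on the slides.
type Bootstrap struct {
	XDSServers []struct {
		ServerURI    string `json:"server_uri"`
		ChannelCreds []struct {
			Type string `json:"type"`
		} `json:"channel_creds"`
	} `json:"xds_servers"`
	Node struct {
		ID       string                 `json:"id"`
		Metadata map[string]interface{} `json:"metadata"`
		Locality struct {
			Zone string `json:"zone"`
		} `json:"locality"`
	} `json:"node"`
}

// parseBootstrap decodes the JSON and requires at least one xDS server.
func parseBootstrap(data []byte) (*Bootstrap, error) {
	var b Bootstrap
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, err
	}
	if len(b.XDSServers) == 0 {
		return nil, fmt.Errorf("bootstrap lists no xds_servers")
	}
	return &b, nil
}

func main() {
	path := os.Getenv("GRPC_XDS_BOOTSTRAP")
	if path == "" {
		fmt.Println("GRPC_XDS_BOOTSTRAP is not set")
		return
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read bootstrap:", err)
		return
	}
	b, err := parseBootstrap(data)
	if err != nil {
		fmt.Println("parse bootstrap:", err)
		return
	}
	fmt.Println("control plane:", b.XDSServers[0].ServerURI, "zone:", b.Node.Locality.Zone)
}
```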

  97. Accessing Mesh Service from Serverless Environment
    Service A Service B
    Serverless
    Service C

  98. Accessing Mesh Service from Serverless Environment
    Service A Service B
    Serverless
    Service C
    xds:///service-b:port

  99. - Using the xDS implementation in gRPC as the data
    plane, Traffic Director manages
    traffic between services.
    Traffic Director: proxyless gRPC services

  100. - Supported on the gRPC client side only
    - Only a limited set of xDS features is available
    - Some languages are still in progress
    - e.g. Node.js
    - or are not officially supported, like Rust
    xDS gRPC

  101. - A28: gRPC xDS traffic splitting and routing
    - A30: xDS v3 Support
    - A31: gRPC xDS Timeout Support and Config Selector Design
    - A32: gRPC xDS circuit breaking
    - A34: `weighted_round_robin` lb_policy for per endpoint
    weight from `ClusterLoadAssignment` response
    More xDS features are proposed for
    implementation on the gRPC side

  102. - Istio experimentally supports
    proxy-less data-plane
    Open-source project

  103. “gRPC is the second xDS client…
    … but it will not be the last!”
    Mark D. Roth @ EnvoyCon 2020

  104. Takeaways

  105. - Infra/network-related, non-business
    logic has to be implemented in
    microservice-like architectures or
    distributed environments.
    - A service mesh moves this logic into sidecar
    proxies such as Envoy.
    Takeaways 1/4

  106. - Service Mesh Control Plane uses xDS
    API to configure Envoy.
    - Open-source example: Istio
    - Managed solution on GCP: Traffic
    Director
    Takeaways 2/4

  107. Takeaways 3/4
    - gRPC implements xDS for its
    load-balancing capabilities
    - Traffic Director (and Istio
    experimentally) support gRPC as a
    proxyless data plane

  108. - gRPC proxyless services are extra dope!
    Takeaways 4/4

  109. Thank you
