Learnings from Implementing Microservices Architecture with Kubernetes

Ray Tsang
October 29, 2019

Ray has been on a six-month rotation with an internal Google team, helping to bring a project to the public cloud using a cloud-native technology stack and Kubernetes. Ray will share the architecture, the technical details of the development environment, the DevOps tools, and some tough decisions that had to be made to move the project along while staying prepared for future changes.

Attend this session to learn about the journey: from development environment tool choices (Docker Compose, Skaffold, Kustomize, Jib), to the stack (Gradle, Spring Boot, Kafka, PostgreSQL, gRPC, gRPC-Web), to mono-repo vs. multi-repo, to the runtime infrastructure (Kubernetes, Istio, Prometheus, Grafana). With 20/20 hindsight, we'll revisit some best practices, lessons learned, and how decisions and compromises were made.

Transcript

  1. 2.

    @saturnism @gcpcloud
    Ray Tsang, Developer Advocate, Google Cloud Platform
    Java Champion
    Spring Cloud GCP: spring.io/projects/spring-cloud-gcp
    @saturnism | saturnism.me
  2. 7.

    [Architecture diagram. Labels: Kubernetes cluster / Private Cluster; Istio Namespace; Networking; Istio Ingress; Project Namespace; Virtual Services (Istio), shown twice; Frontend Deployment; Backends Deployment; Cloud Load Balancing; Identity-Aware Proxy; Istio Egress; Cloud NAT; Third-Party Services; PostgreSQL (Cloud SQL); Images (Container Registry); Prometheus (Monitoring); Grafana (Monitoring); Jaeger (Distributed Trace)]
  3. 8.
  4. 9.

    Infrastructure as Code
    Terraform for all cloud infrastructure - GKE, Cloud SQL, Static IPs, VPC...
    Terraform state in a Google Cloud Storage bucket
    Lay down other Kubernetes infrastructure - Let's Encrypt, Istio, ...
  5. 10.

    [Architecture diagram with the same components as slide 7, annotated "Terraformed"]
  6. 11.

    Terraform for all Environments

    terraform/
    +-- modules/
    |   +-- ...
    +-- dev/
    |   +-- backend.tf
    |   +-- main.tf
    |   +-- variables.tf
    +-- staging/
  7. 12.

    backend.tf

    terraform {
      # Pin version, to avoid unintended upgrades
      # 0.12.x has good fixes around API enablement
      required_version = "= 0.12.3"

      backend "gcs" {
        bucket = "project-dev-terraform-bucket"
        prefix = "state/project-dev"
      }
    }
  8. 13.

    [Architecture diagram with the same components as slide 7, annotated "Ready for deployment"]
  9. 15.

    Local Development != Production
    Production IS hard
    You'll be stuck for a long time…
    Productivity, Design
    Don't start with Kubernetes
  10. 18.

    Learned from Sergei Egorov
    You can also use Testcontainers during local run!
    https://bsideup.github.io/posts/local_development_with_testcontainers/
  11. 23.

                            Mono Repo                          Multi Repo
    Project Structure       Multi-module/multi-project         Single module/project
    Dependency Management   Parent / Includes;                 Common Parent/BOM;
                            all dependencies are up to date    automate dependency version updates
    Artifact Management     Can avoid initially                Need to publish artifacts; where to publish?
    Testing                 Easy                               Against snapshots, flaky
    CI                      Just one pipeline;                 Copy of pipeline per repo;
                            builds everything                  build only service that changed
    CD                      Which service to deploy?           Deploy service that changed
    Initial Velocity        Fast                               Slow...
    Long Term Velocity      Slow down over time; long builds   Faster
  12. 24.

    We still chose Mono Repo...
    Team is already familiar with Mono Repo
    Fast ramp up and velocity
    Lack of existing infrastructure for dependency and artifact management
    Setting up one repo and pipeline was difficult enough...
  13. 25.

    Anticipate Multi Repo
    We anticipate outgrowing the Mono Repo
    Make sure the Mono Repo is still splittable!
    Successfully split out 3 services from the Mono Repo
  14. 26.
  15. 27.

    apply from: '../common.gradle'

    group = 'com.example.services'
    mainClassName = 'com.example.services.auth'

    dependencies {
      implementation project(':common:user')
      protobuf project(path: ':services:user', configuration: 'proto')
    }
  16. 28.

    Learnings
    Consider risks outside of technical choices
    Design well - Anticipate changes
    Always list out pros/cons/why
  17. 32.

    gRPC
    Also because, .proto

    message AddUserRequest {
      string id = 1;
      string name = 2;
      google.protobuf.Timestamp registration_time = 3;
    }

    service UserService {
      rpc AddUser(AddUserRequest) returns (AddUserResponse);
    }
  18. 34.

    Kafka for Events / Async Process
    Using protobuf for Kafka messages
    New User (proto) → User Service → Audit Event (proto) → Kafka → Audit Service
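The event flow on this slide can be sketched without a broker. What follows is a minimal, hypothetical illustration, not the project's code: a `BlockingQueue<byte[]>` stands in for the Kafka topic, and a hand-rolled byte encoding stands in for protobuf serialization. The class names (`AuditEvent`, `UserServiceSketch`, `AuditServiceSketch`) and the `USER_ADDED` action are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a protobuf-generated AuditEvent message.
// A real generated class would provide toByteArray() and parseFrom().
final class AuditEvent {
    final String userId;
    final String action;

    AuditEvent(String userId, String action) {
        this.userId = userId;
        this.action = action;
    }

    byte[] toBytes() {
        return (userId + "\n" + action).getBytes(StandardCharsets.UTF_8);
    }

    static AuditEvent fromBytes(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\n", 2);
        return new AuditEvent(parts[0], parts[1]);
    }
}

// Producer side: handles the "new user" request, then emits an audit event.
final class UserServiceSketch {
    private final BlockingQueue<byte[]> topic; // stand-in for a Kafka topic

    UserServiceSketch(BlockingQueue<byte[]> topic) {
        this.topic = topic;
    }

    void addUser(String id) {
        // ... persisting the user would happen here ...
        topic.add(new AuditEvent(id, "USER_ADDED").toBytes());
    }
}

// Consumer side: reads serialized events from the topic and records them.
final class AuditServiceSketch {
    final List<String> log = new ArrayList<>();

    void drain(BlockingQueue<byte[]> topic) {
        byte[] payload;
        while ((payload = topic.poll()) != null) {
            AuditEvent event = AuditEvent.fromBytes(payload);
            log.add(event.action + ":" + event.userId);
        }
    }
}

final class EventFlowDemo {
    public static void main(String[] args) {
        BlockingQueue<byte[]> topic = new LinkedBlockingQueue<>();
        new UserServiceSketch(topic).addUser("u-1");
        AuditServiceSketch audit = new AuditServiceSketch();
        audit.drain(topic);
        System.out.println(audit.log);
    }
}
```

The point of the sketch is the decoupling: the user service only knows the topic and the message schema, so the audit service can be added, replaced, or scaled without touching the producer.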
  19. 35.

    .proto files
    Store all in the same directory? Or under individual services?
    Mono Repo or Multi Repo - again!
    Anticipate change!
    Store .proto files w/ the services
  20. 36.

    Project Structure

    project/
    +-- build.gradle
    +-- services/
        +-- common.gradle
        +-- auth/
        |   +-- src/main/proto/auth.proto
        +-- user/
        +-- email/
  21. 37.

    Bundle .proto files as a JAR

    configurations {
      proto
    }

    task protoJar(type: Jar) {
      archiveClassifier = 'proto'
      from 'src/main/proto'
    }

    artifacts {
      proto protoJar
    }
  22. 38.

    apply from: '../common.gradle'

    group = 'com.example.services'
    mainClassName = 'com.example.services.auth'

    dependencies {
      implementation project(':common:user')
      protobuf project(path: ':services:user', configuration: 'proto')
    }
  23. 40.

    Challenges
    Spring Data / ORM - Proto to POJO with MapStruct
    Spring Security - Custom interceptors for gRPC
    Bean Validation - Manual. Consider github.com/envoyproxy/protoc-gen-validate
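To make the first and last challenges concrete, here is a hand-written sketch of the kind of code a proto-to-POJO mapper and manual validation amount to. It does not use MapStruct or the protobuf runtime: `AddUserRequestStub` is a hypothetical stand-in for the generated `AddUserRequest` class, with `google.protobuf.Timestamp` modelled as seconds plus nanos, and `UserEntity` is an assumed persistence-side POJO.

```java
import java.time.Instant;

// Hypothetical stand-in for the protobuf-generated AddUserRequest.
// google.protobuf.Timestamp is modelled here as seconds + nanos.
final class AddUserRequestStub {
    final String id;
    final String name;
    final long registrationSeconds;
    final int registrationNanos;

    AddUserRequestStub(String id, String name, long registrationSeconds, int registrationNanos) {
        this.id = id;
        this.name = name;
        this.registrationSeconds = registrationSeconds;
        this.registrationNanos = registrationNanos;
    }
}

// Hypothetical POJO used on the persistence side.
final class UserEntity {
    final String id;
    final String name;
    final Instant registrationTime;

    UserEntity(String id, String name, Instant registrationTime) {
        this.id = id;
        this.name = name;
        this.registrationTime = registrationTime;
    }
}

final class UserMapperSketch {
    // Validates by hand, then converts field by field.
    static UserEntity toEntity(AddUserRequestStub request) {
        if (request.id == null || request.id.isEmpty()) {
            throw new IllegalArgumentException("id is required");
        }
        if (request.name == null || request.name.isEmpty()) {
            throw new IllegalArgumentException("name is required");
        }
        Instant registrationTime =
            Instant.ofEpochSecond(request.registrationSeconds, request.registrationNanos);
        return new UserEntity(request.id, request.name, registrationTime);
    }

    public static void main(String[] args) {
        UserEntity user = toEntity(new AddUserRequestStub("u-1", "Ray", 1572307200L, 0));
        System.out.println(user.registrationTime);
    }
}
```

Every message type needs a mapping like this, which is the repetitive work a code generator such as MapStruct takes over.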
  24. 41.

    gRPC-Web
    Most browsers don't support full HTTP/2 required by gRPC
    Use gRPC-Web: github.com/grpc/grpc-web
    Armeria
    gRPC-Web to gRPC Transcoding using Envoy or Istio
  25. 42.
  26. 43.

    Non-Local - Istio Service Mesh
    mTLS
    Traffic Routing
    gRPC-Web Transcoding
    Built-in Trace and Monitoring
    Egress Whitelisting
    Basic circuit breaking / retries
  27. 44.

    Why not Istio Locally?
    A lot to learn, troubleshoot
    Use fewer resources
    Focus on code
  28. 45.

    Istio - Helm Templating, No Tiller!

    # istio.yaml is prepended with CRD first
    helm template \
      istio-${ISTIO_VERSION}/install/kubernetes/helm/istio \
      --name istio \
      --namespace istio-system \
      -f dev-values.yaml >> istio.yaml

    istio.yaml change → PR (review, see diffs) → merge → CI/CD (apply)
  29. 46.

    Routing with Virtual Service

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: auth-virtualservice
    spec:
      hosts:
      - "*"
      gateways:
      - ingress-gateway
      http:
      - match:
        - uri:
            prefix: "/com.example.auth."
        route:
        - destination:
            host: "auth-service"
            port:
              number: ...
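Prefix routing works for gRPC because every gRPC call is an HTTP/2 request whose path has the form `/{package}.{Service}/{Method}`, so a prefix such as `/com.example.auth.` catches every service in that proto package. The sketch below only mimics that first-match prefix lookup in plain Java to show the idea. It is not how Istio is implemented, and the `AuthService/Login` path and the `user-service` route are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of ordered URI-prefix routing, mimicking what the VirtualService rule expresses.
final class PrefixRouterSketch {
    // LinkedHashMap keeps insertion order, so rules are evaluated first to last.
    private final Map<String, String> routes = new LinkedHashMap<>();

    PrefixRouterSketch route(String uriPrefix, String destinationHost) {
        routes.put(uriPrefix, destinationHost);
        return this;
    }

    // Returns the destination of the first rule whose prefix matches the request path.
    Optional<String> resolve(String path) {
        for (Map.Entry<String, String> rule : routes.entrySet()) {
            if (path.startsWith(rule.getKey())) {
                return Optional.of(rule.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PrefixRouterSketch router = new PrefixRouterSketch()
            .route("/com.example.auth.", "auth-service")
            .route("/com.example.user.", "user-service"); // hypothetical second rule

        // gRPC request path: /{package}.{Service}/{Method}
        System.out.println(router.resolve("/com.example.auth.AuthService/Login"));
        System.out.println(router.resolve("/unknown.Service/Call"));
    }
}
```

One consequence of this scheme is that proto package names become part of the routing contract: renaming a package changes the request path and therefore which rule matches.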
  30. 47.

    grpc-web transcoding

    apiVersion: v1
    kind: Service
    metadata:
      name: auth-service
    spec:
      ports:
      - name: grpc-web-port
        port: ...
        targetPort: ...
      selector:
        app: ...
  31. 48.

    Learnings
    Keep Development Environment Simple
    Store definitions with corresponding service
    Keep service simple, offload to service mesh in production
    saturnism.me/talk/grpc-101
    saturnism.me/talk/istio-101
  32. 50.

    Container Best Practices
    saturnism.me/talk/docker-tips-and-tricks
    What's in that image?
    Don't run as root
    Multi-stage build
    Create small image
    Fat JAR to Thin JAR
    Layering
    Build cache
    Pin versions
    Reduce layer size
    ...
  33. 53.

    Estimate Memory
    Cloud Foundry Buildpack Memory Calculator
    https://github.com/cloudfoundry/java-buildpack-memory-calculator
  34. 57.

    Local Development
    Started w/ Docker Compose
    Easy to get started
    But… Differed too much from Kubernetes
  35. 58.

    Local Kubernetes
    Linux - consider k3s, k3d, kind, …
    Mac - Docker Desktop, Minikube
  36. 59.

    Friends don't let friends write YAML files

    kubectl create deployment myservice --image=... --dry-run -oyaml > k8s/deployment.yaml
    kubectl create svc clusterip myservice --tcp=8080:8080 --dry-run -oyaml > k8s/service.yaml

    Keep with individual services
  37. 63.

    Liveness Probe
      When to use?      To check whether the application is alive.
      Failure means...  The application will be restarted, on the assumption that a restart will help it recover.
      Practices         Runs on the serving port of the application, e.g., 8080. Don't check dependencies, e.g., a dependent database connection.
      Example           A simple /alive URL that returns 200.

    Readiness Probe
      When to use?      To check whether the application is ready to serve requests.
      Failure means...  The pod instance is taken out of the load balancer.
      Practices         Flip to ready when the application has finished all initialization (e.g., cache preloaded). Upon SIGTERM, flip readiness to false. See Graceful Shutdown.
      Example           /actuator/health on the management port.
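The distinction between the two probes can be sketched with the JDK's built-in `com.sun.net.httpserver.HttpServer`. This is a minimal illustration under stated assumptions, not the Spring Boot Actuator setup mentioned on the slide: the `/ready` path, the class name, and port 8080 in `main` are hypothetical, and both endpoints share one port here for brevity.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

final class ProbeEndpointsSketch {
    // Starts false: the application is alive but not yet ready to serve.
    private final AtomicBoolean ready = new AtomicBoolean(false);

    // Liveness: no dependency checks. If this code runs at all, the process is alive.
    int livenessStatus() {
        return 200;
    }

    // Readiness: 200 only after initialization and before shutdown begins.
    int readinessStatus() {
        return ready.get() ? 200 : 503;
    }

    void markReady() {
        ready.set(true);
    }

    void markNotReady() {
        ready.set(false);
    }

    // Exposes both probes over HTTP; /ready is a hypothetical path.
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/alive", exchange -> respond(exchange, livenessStatus()));
        server.createContext("/ready", exchange -> respond(exchange, readinessStatus()));
        server.start();
        return server;
    }

    private static void respond(HttpExchange exchange, int status) throws IOException {
        exchange.sendResponseHeaders(status, -1); // -1 means no response body
        exchange.close();
    }

    public static void main(String[] args) throws IOException {
        ProbeEndpointsSketch probes = new ProbeEndpointsSketch();
        probes.start(8080);
        // ... preload caches, open connections, then:
        probes.markReady();
    }
}
```

Keeping liveness free of dependency checks matters: if it failed whenever the database was down, Kubernetes would restart pods that a restart cannot fix.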
  38. 64.

    Anatomy of a Graceful Shutdown
    1. Receive SIGTERM or PreStop Lifecycle Hook
    2. Fail Readiness Probe
    3. Receive requests until Kubernetes detects readiness probe failure
    4. Kubernetes removes pod endpoint from Service
    5. Finish serving in-flight requests
    6. Shutdown
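The sequence above can be sketched in plain Java. This is an illustrative outline, not the project's implementation: the class name, the recorded step strings, and the `drainMillis` parameter (how long to keep serving while Kubernetes notices the failing probe and removes the endpoint) are assumptions. A JVM shutdown hook is used because the JVM runs such hooks when it receives SIGTERM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class GracefulShutdownSketch {
    private final AtomicBoolean ready = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();
    final List<String> steps = new ArrayList<>(); // records the order of events

    // The readiness probe handler would report this value.
    boolean isReady() {
        return ready.get();
    }

    // Request handlers call these around each request.
    void requestStarted() {
        inFlight.incrementAndGet();
    }

    void requestFinished() {
        inFlight.decrementAndGet();
    }

    void shutdown(long drainMillis) {
        steps.add("SIGTERM received");                    // step 1
        ready.set(false);                                 // step 2
        steps.add("readiness failing");
        pause(drainMillis);                               // steps 3-4: keep serving meanwhile
        steps.add("endpoint assumed removed");
        while (inFlight.get() > 0 && !Thread.currentThread().isInterrupted()) {
            pause(10);                                    // step 5
        }
        steps.add("in-flight requests finished");
        steps.add("shutdown");                            // step 6
    }

    // Registers the sequence to run when the JVM receives SIGTERM.
    void installHook(long drainMillis) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> shutdown(drainMillis)));
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        GracefulShutdownSketch app = new GracefulShutdownSketch();
        app.shutdown(100);
        System.out.println(app.steps);
    }
}
```

The drain delay exists because endpoint removal is asynchronous: exiting immediately on SIGTERM would drop requests that the load balancer still routes to the pod.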
  39. 65.

    Road to Production
    VPC, NAT, Egress Whitelisting, ...
    Pod Security Policy / Pod Security Context
    Expect your app to not work in production environment with hardened security
    Try this early and fix issues
  40. 66.

    ClickOps → GitOps - Check in your TF/YAML, reproducible environment
    12 Factor App - Design your application well regardless of the target runtime environments
    Dev Env != Prod Env - Keep dev simple, test locally, Testcontainers
    Know your JDK Version - Use OpenJDK >= 8u142
    Friends don't let friends write ________ - Use tools that automate best practices
    Kubernetes' Application Lifecycle - Liveness vs Readiness Probe, Graceful Shutdown
    Development Tools - gRPC, Testcontainers, Jib, Dekorate, Skaffold, Kustomize, Cloud Code
    Infrastructure Tools - Terraform, Kubernetes, Istio