Learnings from Implementing Microservices Architecture with Kubernetes

Ray Tsang
October 29, 2019

Ray spent a six-month rotation with an internal Google team, helping bring a project to the public cloud using a cloud-native technology stack and Kubernetes. Ray will share the architecture, development environment details, DevOps tools, and some tough decisions that had to be made to move the project along while staying prepared for changes in the future.

This session covers the journey, from development environment tool choices (Docker Compose, Skaffold, Kustomize, Jib), to the stack (Gradle, Spring Boot, Kafka, PostgreSQL, gRPC, gRPC-Web), to mono-repo vs. multi-repo, to the runtime infrastructure (Kubernetes, Istio, Prometheus, Grafana). With 20/20 hindsight, we'll visit some best practices, lessons learned, and how decisions and compromises were made.

Transcript

  1. Learnings from Implementing
    Microservices w/ Kubernetes

  2.
    @saturnism @gcpcloud
    Ray Tsang
    Developer Advocate
    Google Cloud Platform
    Java Champion
    @saturnism | saturnism.me

  3.
    The Project...

  4.
    Microservices on Kubernetes!

  5.
    Help Alphabet companies to adopt Cloud

  6.
    Move from internal technologies
    To open source technologies

  7.
    2 implementation teams
    Infrastructure, CI/CD, cloud practices, monitoring
    Application development, business logic

  8.
    2 teams - working together
    Discuss business requirements, architecture, operations needs
    Application implementation team takes over operation

  9.
    Infrastructure

  10.
    [Architecture diagram: Kubernetes cluster / private cluster]
    Namespaces: Istio namespace, project namespace
    Networking: Cloud Load Balancing, Identity-Aware Proxy, Istio Ingress, Istio Egress, Cloud NAT, Third-Party Services
    Workloads: Frontend (Deployment), Backends (Deployment), Virtual Services (Istio)
    Data and images: PostgreSQL (Cloud SQL), Images (Container Registry)
    Observability: Prometheus (monitoring), Grafana (monitoring), Jaeger (distributed trace)

  11.
    #1 ClickOp → GitOp
    In the cloud, it's very easy to click
    It's hard to reproduce!

  12.
    Infrastructure as Code
    Terraform for all cloud infrastructure - GKE, Cloud SQL, Static IPs, VPC...
    Lay down other Kubernetes infrastructure - Istio, OPA Gatekeeper...

  13.
    [Architecture diagram from slide 10, repeated with the label "Terraformed"]

  14.
    [Architecture diagram from slide 10, repeated with the label "Ready for deployment"]

  15.
    There is a lot of YAML
    But at least, the process is repeatable

  16.
    Check in the final configuration
    helm template \
    istio-${ISTIO_VERSION}/install/kubernetes/helm/istio \
    --name istio \
    --namespace istio-system \
    -f dev-values.yaml >> istio.yaml
    istio.yaml change → PR (review, see diffs) → merge → CI/CD (apply)
    (Depends on how much you trust the templating engine)

  17.
    CI Pipeline (triggered by code commit)
    1. Build, test application
    2. Create container image
    3. Commit deployment manifests with new container image tag/sha
    CD Pipeline (triggered by manifest commit)
    1. Render full manifest if necessary
    2. Apply the full manifest
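
A minimal sketch of CI step 3 above, assuming the deployment manifest is a YAML file checked into the repo and that the pipeline pins the new image by rewriting its `image:` line. The class `ManifestImageUpdater`, the method `pinImage`, and the image names are hypothetical and not from the talk; a real pipeline might delegate this to a tool such as Kustomize instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ManifestImageUpdater {
    private ManifestImageUpdater() {}

    /**
     * Rewrites every "image:" line that references imageName so it points at
     * imageName:newTag. Lines for other images are left untouched.
     */
    static String pinImage(String manifestYaml, String imageName, String newTag) {
        // Matches "image: name", "image: name:tag" and "image: name@sha256:...",
        // optionally preceded by a YAML list dash. (?m) makes ^ and $ line-based.
        Pattern imageLine = Pattern.compile(
            "(?m)^([ \\t]*(?:-[ \\t]*)?image:[ \\t]*)"
                + Pattern.quote(imageName)
                + "(?:[:@]\\S+)?[ \\t]*$");
        // Keep the original indentation ($1); quote the rest so it is taken literally.
        String replacement = "$1" + Matcher.quoteReplacement(imageName + ":" + newTag);
        return imageLine.matcher(manifestYaml).replaceAll(replacement);
    }
}
```

The CI job would commit the returned text, and that commit is what triggers the CD pipeline. Because the whole image name must match, pinning `gcr.io/example/auth` does not touch `gcr.io/example/auth-proxy`.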

  18.
    #2 Adopt incrementally
    Understand the requirements for production
    Have a roadmap to know what technology to adopt, when

  19.
    Initial Learning: Click-op to GKE
    Repeatable Infrastructure: Infrastructure as code (Terraform)
    Networking / Ingress: POC with L4 LB (Kubernetes); moved to L7 LB with Ingress (Kubernetes)
    SSL: Let's Encrypt? GCP Managed Certificates?
    URL mapping/routing: Envoy → Istio
    Security: Enable mTLS (Istio); deny/allow egress (Istio); add security policies (Kubernetes); deny/allow container images (OPA Gatekeeper)
    Monitoring: Collect metrics (Prometheus); scrape the metrics; dashboards (Grafana)
    Alerts: SLOs; infrastructure-level alerts; uptime checks

  20.
    Some things are hard to change
    Be careful of one-way doors
    Istio sidecar requires privileges → Reevaluate/reinstall Istio w/ CNI
    Public cluster to private cluster → Delete and recreate

  21.
    #3 Adopt carefully
    Don't go back and say "I need everything on that slide!"
    Consider what you really want to achieve
    Explore and _make_ sure everything works as advertised

  22.
    Problem Statement
    Goal / Scope
    Solutions / Alternatives
    Pros / Cons
    Recommendation / Decision

  23.
    #4 Consider non-technical factors
    Write a doc!
    Consider reality
    Weigh the risks
    Maintenance / Operations

  24.
    Mono Repo or Multi Repo

  25.
    Distributed Monolith or Distributed Services?

  26.
    It Depends…!

  27.
    Aspect | Mono Repo | Multi Repo
    Project Structure | Multi-module/multi-project | Single module/project
    Dependency Management | Parent / Includes; all dependencies are up to date | Common Parent/BOM; automate dependency version updates
    Artifact Management | Can avoid initially | Need to publish artifacts; where to publish?
    Testing | Easy | Against snapshots, flaky
    CI | Just one pipeline; builds everything | Copy of pipeline per repo; build only the service that changed
    CD | Which service to deploy? | Deploy the service that changed
    Initial Velocity | Fast | Slow...
    Long Term Velocity | Slows down over time; long builds | Faster
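
One way to approach the mono repo's "Which service to deploy?" question is to map the files changed in a commit back to services. The sketch below assumes each service lives under `services/<name>/` and treats any change outside a known service directory (shared build files, for example) as affecting every service. The class `ChangedServices` and its method are hypothetical, and the sketch ignores dependencies between services.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

final class ChangedServices {
    private ChangedServices() {}

    /**
     * Returns the services touched by the changed paths. A path counts as
     * service-local only if it looks like services/<known-service>/...;
     * anything else falls back to "rebuild everything".
     */
    static Set<String> affected(Collection<String> changedPaths, Set<String> allServices) {
        Set<String> result = new TreeSet<>();
        for (String path : changedPaths) {
            String[] parts = path.split("/");
            boolean insideService = parts.length >= 3
                && parts[0].equals("services")
                && allServices.contains(parts[1]);
            if (insideService) {
                result.add(parts[1]);
            } else {
                // Shared build logic or an unknown location: be conservative.
                return new TreeSet<>(allServices);
            }
        }
        return result;
    }
}
```

A change to `services/auth/src/main/proto/auth.proto` yields only `auth`, while a change to `services/common.gradle` yields every service. A fuller version would also add the services that depend on the changed one.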

  28.
    We still chose Mono Repo...
    Team is already familiar with Mono Repo
    Fast ramp up and velocity
    Lack of existing infrastructure for dependency and artifact management
    Setting up one repo and pipeline was difficult enough...

  29.
    We do this analysis for everything
    Does every service have its own database?
    gRPC or REST?
    Kafka?
    Knative?

  30.
    #5 Anticipate changes
    Choices made today are made.
    Design to expect changes tomorrow.
    Avoid one-way doors.

  31.
    Anticipate Multi Repo
    We anticipate outgrowing the Mono Repo
    Make sure the Mono Repo is still splittable!

  32.
    project/
    +-- build.gradle
    +-- services/
        +-- common.gradle
        +-- auth/
        +-- user/
        +-- email/
    Project Structure

  33.
    project/
    +-- build.gradle
    +-- services/
        +-- common.gradle
        +-- auth/
        |   +-- src/main/proto/auth.proto
        +-- user/
        +-- email/
    Project Structure

  34.
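    // Pull in the build logic shared by all services
    apply from: '../common.gradle'
    group = 'com.example.services'
    mainClassName = 'com.example.services.auth'
    dependencies {
        implementation project(':common:user')
        // Consume the user service's .proto files through its 'proto' configuration
        protobuf project(path: ':services:user', configuration: 'proto')
    }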

  35.
    Anticipate Multi Repo
    As the team grows, and new teams come to take over services...
    Successfully split out 3 services from the Mono Repo

  36.
    #6 Focus on your application
    Architecture and design - it has nothing to do with Kubernetes
    If you design well, you can almost always deploy into Kubernetes
    12factor.net

  37.
    Adopt Carefully
    Anticipate Changes
    Microservices architecture is not the answer to everything
    Monolith works too, as long as it is designed well!

  38.
    #7 Local != Production
    Do not bring Slide 10 to local development
    Focus on velocity of well-designed application
    Rely on self-encapsulating unit/integration tests

  39.
    Why not Istio Locally?
    A lot to learn and troubleshoot
    Uses fewer compute resources without it

  40.
    Test Locally - without Kubernetes
    Unit tests
    Integration tests
    WireMock
    Testcontainers

  41.
    If you need to test something...
    Simple Envoy Proxy
    Local Kubernetes (k3s, minikube, ...)

  42.
    Cloud Code / Skaffold
    After you test everything, when you want to see the end-to-end result
    Continuous development loop

  43.
    Kustomize
    Every environment is different
    Single source of truth
    Kustomize for different environments
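
The idea of one base configuration plus a small per-environment overlay can be illustrated with a recursive map merge. This is plain Java showing the concept only, not how Kustomize is implemented or invoked; Kustomize itself works on YAML files. The class `OverlayMerge` and the keys used in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OverlayMerge {
    private OverlayMerge() {}

    /**
     * Returns a new map holding the base values, with overlay values winning.
     * Nested maps are merged recursively; neither input map is modified.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> overlay) {
        Map<String, Object> result = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> entry : overlay.entrySet()) {
            Object baseValue = result.get(entry.getKey());
            Object overlayValue = entry.getValue();
            if (baseValue instanceof Map && overlayValue instanceof Map) {
                // Both sides are nested objects: merge them key by key.
                result.put(entry.getKey(), merge(
                    (Map<String, Object>) baseValue,
                    (Map<String, Object>) overlayValue));
            } else {
                // Scalars (or mismatched types): the overlay replaces the base.
                result.put(entry.getKey(), overlayValue);
            }
        }
        return result;
    }
}
```

With a base of `replicas: 1` and `resources: {cpu: 100m, memory: 256Mi}`, a production overlay of `replicas: 3` and `resources: {cpu: 500m}` changes only those two values and keeps the memory setting from the base.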

  44.
    #8 Friends don't let friends ______
    Write Dockerfiles
    or YAML!

  45.
    Just Jib it
    Or pack it

  46.
    kubectl create deployment myservice --image=... --dry-run -oyaml > k8s/deployment.yaml
    kubectl create svc clusterip myservice --tcp=8080:8080 --dry-run -oyaml > k8s/service.yaml

  47.
    Automate best practices whenever possible

  48.
    In fact, automate the entire platform!

  49.
    #9 Contracts with the runtime environment
    When is your application ready to serve traffic?
    When is it in trouble?
    How do you shut down gracefully?

  50.
    Resources
    Resource Request
    and Resource Limits

  51.
    Liveness Probe
      When to use: to check whether the application is alive.
      Failure means: the application will be restarted, on the assumption that a restart will help it recover.
      Practices: runs on the serving port of the application, e.g., 8080. Don't check dependencies, e.g., don't check a dependent database connection.
      Example: a simple /alive URL that returns 200.
    Readiness Probe
      When to use: to check whether the application is ready to serve requests.
      Failure means: the pod instance is taken out of the load balancer.
      Practices: flip to ready when the application has done all its initialization (e.g., cache preloaded). Upon SIGTERM, flip readiness to false. See Graceful Shutdown.
      Example: /actuator/health on the management port.
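
A minimal sketch of the two probe endpoints, using only the JDK's built-in `com.sun.net.httpserver` so it stays self-contained. The `/ready` path and the class `ProbeEndpoints` are hypothetical; the slide's own example for readiness is Spring Boot's `/actuator/health` on the management port. For brevity this sketch serves both probes from one port.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

final class ProbeEndpoints {
    // Set to true once initialization is done, and back to false on shutdown.
    final AtomicBoolean ready = new AtomicBoolean(false);

    /** Liveness: the process can answer; deliberately checks no dependencies. */
    int livenessStatus() {
        return 200;
    }

    /** Readiness: 200 only while the application wants traffic, otherwise 503. */
    int readinessStatus() {
        return ready.get() ? 200 : 503;
    }

    /** Serves both probes on the given port (0 lets the OS pick a free port). */
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/alive", exchange -> respond(exchange, livenessStatus()));
        server.createContext("/ready", exchange -> respond(exchange, readinessStatus()));
        server.start();
        return server;
    }

    private static void respond(HttpExchange exchange, int status) throws IOException {
        // A response length of -1 means "no response body".
        exchange.sendResponseHeaders(status, -1);
        exchange.close();
    }
}
```

The application would call `ready.set(true)` after its caches are loaded, and `ready.set(false)` as the first step of shutdown, so the readiness probe starts failing before the process exits.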

  52.
    Anatomy of a Graceful Shutdown
    1. Receive SIGTERM or PreStop Lifecycle Hook
    2. Fail Readiness Probe
    3. Receive requests until Kubernetes detects readiness probe failure
    4. Kubernetes removes pod endpoint from Service
    5. Finish serving in-flight requests
    6. Shutdown
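
The sequence above could be sketched in plain Java as follows. This is an illustration under assumptions, not the project's code: the class `GracefulShutdown`, its methods, and the two timeouts are hypothetical, and `noticeMillis` merely stands in for the time Kubernetes needs to detect the failing readiness probe and remove the endpoint.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class GracefulShutdown {
    private final AtomicBoolean ready = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();

    /** What the readiness probe should report. */
    boolean isReady() { return ready.get(); }

    /** Call around every request so shutdown knows what is still running. */
    void requestStarted() { inFlight.incrementAndGet(); }
    void requestFinished() { inFlight.decrementAndGet(); }

    /**
     * Steps 2 to 5: fail readiness, keep serving while Kubernetes notices,
     * then wait for in-flight requests. Returns true if everything drained
     * before the deadline, false on timeout or interruption.
     */
    boolean shutDown(long noticeMillis, long drainMillis) {
        ready.set(false);                                  // step 2
        try {
            Thread.sleep(noticeMillis);                    // steps 3 and 4
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(drainMillis);
            while (inFlight.get() > 0) {                   // step 5
                if (System.nanoTime() - deadline >= 0) {
                    return false;
                }
                Thread.sleep(10);
            }
            return true;                                   // step 6 follows
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Step 1: the JVM runs shutdown hooks when it receives SIGTERM. */
    void installHook(long noticeMillis, long drainMillis) {
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> shutDown(noticeMillis, drainMillis)));
    }
}
```

The total of the two timeouts should stay below the pod's termination grace period, otherwise Kubernetes kills the process before draining completes.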

  53.
    We now speak the same language

  54.
    All the cross-cutting concerns are the same
    Monolith, microservices, Kubernetes, not Kubernetes...
    But the 2 teams now speak with the same nouns:
    Deployment, Service, Ingress, Virtual Service, ...

  55.
    Thanks!
    saturnism.me/talk/kubernetes-microservices-lessons-learned/
    bit.ly/k8s-lab | bit.ly/istio-lab | bit.ly/spring-gcp-lab
    gcplab.me/spring
    @saturnism | saturnism.me
