Learnings from Implementing Microservices Architecture with Kubernetes

Slide 1

Slide 1 text

Learnings from Implementing Microservices w/ Kubernetes

Slide 2

Slide 2 text

2 @saturnism @gcpcloud Ray Tsang Developer Advocate Google Cloud Platform Java Champion @saturnism | saturnism.me

Slide 3

Slide 3 text

The Project...

Slide 4

Slide 4 text

Microservices on Kubernetes!

Slide 5

Slide 5 text

Help Alphabet companies to adopt Cloud

Slide 6

Slide 6 text

Move from internal technologies
to open source technologies

Slide 7

Slide 7 text

2 implementation teams
  Infrastructure, CI/CD, cloud practices, monitoring
  Application development, business logic

Slide 8

Slide 8 text

2 teams - working together
  Discuss business requirements, architecture, operations needs
  Application implementation team takes over operations

Slide 9

Slide 9 text

Infrastructure

Slide 10

Slide 10 text

[Architecture diagram. Components shown: Kubernetes cluster / Private Cluster; Istio Namespace (Istio Ingress, Istio Egress); Project Namespace (Istio Virtual Services, Frontend Deployment, Backends Deployment); Cloud Load Balancing; Identity-Aware Proxy; Cloud NAT; Third-Party Services; PostgreSQL (Cloud SQL); Images (Container Registry); Prometheus (Monitoring); Grafana (Monitoring); Jaeger (Distributed Trace)]

Slide 11

Slide 11 text

#1 ClickOp → GitOp
In the cloud, it's very easy to click
It's hard to reproduce!

Slide 12

Slide 12 text

Infrastructure as Code
  Terraform for all cloud infrastructure - GKE, Cloud SQL, Static IPs, VPC...
  Lay down other Kubernetes infrastructure - Istio, OPA Gatekeeper...

Slide 13

Slide 13 text

[Architecture diagram from Slide 10, annotated: Terraformed]

Slide 14

Slide 14 text

[Architecture diagram from Slide 10, annotated: Ready for deployment]

Slide 15

Slide 15 text

There is a lot of YAML
But at least the process is repeatable

Slide 16

Slide 16 text

Check in the final configuration

    helm template \
      istio-${ISTIO_VERSION}/install/kubernetes/helm/istio \
      --name istio \
      --namespace istio-system \
      -f dev-values.yaml >> istio.yaml

istio.yaml change → PR (review, see diffs) → merge → CI/CD (apply)
(Depends on how much you trust the templating engine)

Slide 17

Slide 17 text

CI Pipeline (triggered by code commit)
1. Build, test application
2. Create container image
3. Commit deployment manifests with new container image tag/sha

CD Pipeline (triggered by manifest commit)
1. Render full manifest if necessary
2. Apply the full manifest
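Step 3 of the CI pipeline (committing the manifest with the new image tag/sha) could be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the project: the function name `set_image`, the repository `gcr.io/example/auth`, and the digest value are all hypothetical, and a real pipeline might use a dedicated tool instead of a regular expression.

```python
import re


def set_image(manifest: str, repository: str, new_ref: str) -> str:
    """Point every `image: <repository>[:tag|@digest]` line at a new reference.

    `new_ref` includes its separator, e.g. ":1.2.3" or "@sha256:...".
    All names here are illustrative assumptions, not from the talk.
    """
    pattern = re.compile(
        r"^(?P<prefix>[ \t]*(?:-[ \t]+)?image:[ \t]*)"
        + re.escape(repository)
        + r"(?:[:@]\S+)?[ \t]*$",
        re.MULTILINE,
    )
    updated, count = pattern.subn(
        lambda m: f"{m.group('prefix')}{repository}{new_ref}", manifest
    )
    if count == 0:
        # Failing loudly avoids committing a manifest that was not changed.
        raise ValueError(f"no image line found for {repository}")
    return updated


# Hypothetical manifest fragment used only to demonstrate the rewrite.
EXAMPLE_MANIFEST = """\
spec:
  containers:
  - name: auth
    image: gcr.io/example/auth:1.0.0
    ports:
    - containerPort: 8080
"""

print(set_image(EXAMPLE_MANIFEST, "gcr.io/example/auth", "@sha256:abc123"))
```

The rewritten manifest is what gets committed, and that commit is what triggers the CD pipeline. Pinning by digest rather than by tag is one option; the slide allows either.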

Slide 18

Slide 18 text

#2 Adopt incrementally
Understand the requirements for production
Have a roadmap to know what technology to adopt, when

Slide 19

Slide 19 text

Initial Learning
  Click-op to GKE
Repeatable Infrastructure
  Infrastructure as code (Terraform)
Networking / Ingress
  POC with L4 LB (Kubernetes)
  Moved to L7 LB with Ingress (Kubernetes)
  SSL: Let's Encrypt? GCP Managed Certificates?
  URL mapping/routing (Envoy → Istio)
Security
  Enable mTLS (Istio)
  Deny/allow Egress (Istio)
  Add security policies (Kubernetes)
  Deny/allow container images (OPA Gatekeeper)
Monitoring
  Collect metrics (Prometheus)
  Scrape the metrics
  Dashboards (Grafana)
  Alerts: SLOs, infrastructure-level alerts, uptime checks

Slide 20

Slide 20 text

Some things are hard to change
Be careful of one-way doors
  Istio sidecar requires privileges → Reevaluate/reinstall Istio w/ CNI
  Public cluster to private cluster → Delete and recreate

Slide 21

Slide 21 text

#3 Adopt carefully
Don't go back and say "I need everything on that slide!"
Consider what you really want to achieve
Explore and make sure everything works as advertised

Slide 22

Slide 22 text

Problem Statement
Goal / Scope
Solutions / Alternatives
Pros / Cons
Recommendation / Decision

Slide 23

Slide 23 text

#4 Consider non-technical factors
Write a doc!
Consider reality
Weigh the risks
Maintenance / Operations

Slide 24

Slide 24 text

Mono Repo or Multi Repo

Slide 25

Slide 25 text

Distributed Monolith or Distributed Services?

Slide 26

Slide 26 text

It Depends…!

Slide 27

Slide 27 text

Mono Repo vs. Multi Repo

Project Structure
  Mono Repo: Multi-module/multi-project
  Multi Repo: Single module/project
Dependency Management
  Mono Repo: Parent / Includes. All dependencies are up to date
  Multi Repo: Common Parent/BOM. Automate dependency version updates
Artifact Management
  Mono Repo: Can avoid initially
  Multi Repo: Need to publish artifacts. Where to publish?
Testing
  Mono Repo: Easy
  Multi Repo: Against Snapshots, Flaky
CI
  Mono Repo: Just one pipeline. Builds everything
  Multi Repo: Copy of pipeline per repo. Build only service that changed
CD
  Mono Repo: Which service to deploy?
  Multi Repo: Deploy service that changed
Initial Velocity
  Mono Repo: Fast
  Multi Repo: Slow...
Long Term Velocity
  Mono Repo: Slow down over time. Long builds
  Multi Repo: Faster
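The mono repo question "Which service to deploy?" can be approached by mapping changed file paths to services. The sketch below assumes the `services/<name>/` layout shown on Slide 32 and treats `build.gradle` and `services/common.gradle` as shared files that affect everything. The function name and the shared-file list are hypothetical, and the sketch ignores dependencies between services, which a real pipeline would also need to follow.

```python
from pathlib import PurePosixPath


def services_to_build(changed_paths, all_services,
                      shared_files=("build.gradle", "services/common.gradle")):
    """Return the set of services affected by a list of changed file paths.

    Assumes each service lives in services/<name>/ and that a change to any
    shared build file means every service must be rebuilt.
    """
    affected = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if str(path) in shared_files:
            # A shared build file changed: rebuild everything.
            return set(all_services)
        parts = path.parts
        if len(parts) >= 3 and parts[0] == "services" and parts[1] in all_services:
            affected.add(parts[1])
    return affected


# Service names taken from the project structure slides.
ALL_SERVICES = {"auth", "user", "email"}
print(services_to_build(["services/auth/src/main/proto/auth.proto"], ALL_SERVICES))
```

In CI, the list of changed paths would typically come from the version control diff of the triggering commit.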

Slide 28

Slide 28 text

We still chose Mono Repo...
  Team is already familiar with Mono Repo
  Fast ramp up and velocity
  Lack of existing infrastructure for dependency and artifact management
  Setting up one repo and pipeline was difficult enough...

Slide 29

Slide 29 text

We do this analysis for everything
  Does every service have its own database?
  gRPC or REST?
  Kafka?
  Knative?

Slide 30

Slide 30 text

#5 Anticipate changes
Choices made today are made.
Design to expect changes tomorrow.
Avoid one-way doors.

Slide 31

Slide 31 text

Anticipate Multi Repo
We anticipate outgrowing the Mono Repo
Make sure the Mono Repo is still splittable!

Slide 32

Slide 32 text

Project Structure

project/
+-- build.gradle
+-- services/
    +-- common.gradle
    +-- auth/
    +-- user/
    +-- email/

Slide 33

Slide 33 text

Project Structure

project/
+-- build.gradle
+-- services/
    +-- common.gradle
    +-- auth/
    |   +-- src/main/proto/auth.proto
    +-- user/
    +-- email/

Slide 34

Slide 34 text

apply from: '../common.gradle'

group = 'com.example.services'
mainClassName = 'com.example.services.auth'

dependencies {
    implementation project(':common:user')
    protobuf project(path: ':services:user', configuration: 'proto')
}

Slide 35

Slide 35 text

Anticipate Multi Repo
As the team grows, and new teams come to take over services...
Successfully split out 3 services from the Mono Repo

Slide 36

Slide 36 text

#6 Focus on your application
Architecture and design - it has nothing to do with Kubernetes
If you design well, you can almost always deploy into Kubernetes
12factor.net

Slide 37

Slide 37 text

Adopt Carefully
Anticipate Changes
Microservices architecture is not the answer to everything
Monolith works too, as long as it is designed well!

Slide 38

Slide 38 text

#7 Local != Production
Do not bring Slide 10 to local development
Focus on velocity of well-designed application
Rely on self-contained unit/integration tests

Slide 39

Slide 39 text

Why not Istio Locally?
  A lot to learn and troubleshoot
  Use fewer compute resources

Slide 40

Slide 40 text

Test Locally - without Kubernetes
  Unit tests
  Integration tests
  WireMock
  Testcontainers

Slide 41

Slide 41 text

If you need to test something...
  Simple Envoy Proxy
  Local Kubernetes (k3s, minikube, ...)

Slide 42

Slide 42 text

Cloud Code / Skaffold
After you test everything… want to see end-to-end result
Continuous development loop

Slide 43

Slide 43 text

Kustomize
Every environment is different
Single source of truth
Kustomize for different environments

Slide 44

Slide 44 text

#8 Friends don't let friends ______
Write Dockerfiles or YAML!

Slide 45

Slide 45 text

Just jib it
Or pack it

Slide 46

Slide 46 text

kubectl create deployment myservice --image=... --dry-run -oyaml > k8s/deployment.yaml
kubectl create svc clusterip myservice --tcp=8080:8080 --dry-run -oyaml > k8s/service.yaml

Slide 47

Slide 47 text

Automate best practices whenever possible

Slide 48

Slide 48 text

In fact, automate the entire platform!

Slide 49

Slide 49 text

#9 Contracts with the runtime environment
When is your application ready to serve traffic?
When is it in trouble?
How do you shut down gracefully?

Slide 50

Slide 50 text

Resources
Resource Requests and Resource Limits

Slide 51

Slide 51 text

Liveness Probe
  When to use: To tell whether the application is alive.
  Failure means: The application will be restarted, so fail only when a restart will help it recover.
  Practices: Run on the serving port of the application, e.g., 8080. Don't check dependencies, e.g., a dependent database connection.
  Example: A simple /alive URL that returns 200.

Readiness Probe
  When to use: To tell whether the application is ready to serve requests.
  Failure means: The pod instance is taken out of the load balancer.
  Practices: Flip to ready once the application has done all its initialization (e.g., cache preloaded). Upon SIGTERM, flip readiness to false. See Graceful Shutdown.
  Example: /actuator/health on the management port.
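The two probe contracts can be sketched with a small HTTP handler. This is an illustration under assumptions, not the talk's implementation: `/alive` follows the slide's example, while the `/ready` path, the `ProbeState` class, and the helper names are hypothetical (the slide's own readiness example is `/actuator/health` on the management port). The liveness endpoint checks nothing external; the readiness endpoint only reports a flag the application flips itself.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ProbeState:
    """Readiness flag that the application flips after initialization."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    def mark_not_ready(self):
        self._ready.clear()

    def is_ready(self):
        return self._ready.is_set()


def make_handler(state):
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/alive":
                # Liveness: no dependency checks, just "the process responds".
                self._reply(200, b"alive")
            elif self.path == "/ready":
                # Readiness: 503 until initialization is done (or after SIGTERM).
                if state.is_ready():
                    self._reply(200, b"ready")
                else:
                    self._reply(503, b"not ready")
            else:
                self._reply(404, b"not found")

        def _reply(self, status, body):
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of the logs in this sketch

    return ProbeHandler


def start_probe_server(state, host="127.0.0.1", port=0):
    """Start the probe endpoints in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An application would call `mark_ready()` once its caches are loaded and `mark_not_ready()` when it receives SIGTERM. Inside a pod the server would need to listen on an address the kubelet can reach rather than the loopback default used here.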

Slide 52

Slide 52 text

Anatomy of a Graceful Shutdown
1. Receive SIGTERM or PreStop Lifecycle Hook
2. Fail Readiness Probe
3. Receive requests until Kubernetes detects readiness probe failure
4. Kubernetes removes pod endpoint from Service
5. Finish serving in-flight requests
6. Shut down
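The shutdown sequence above can be sketched as a small coordinator object. The class name `GracefulShutdown`, the `drain_seconds` parameter, and the in-flight counter are assumptions made for illustration; frameworks usually provide their own equivalents. The fixed drain delay stands in for steps 3 and 4, where the application keeps serving while Kubernetes notices the failed readiness probe.

```python
import signal
import threading
import time


class GracefulShutdown:
    """Coordinates the steps from SIGTERM to process exit (illustrative only)."""

    def __init__(self, drain_seconds: float):
        self.drain_seconds = drain_seconds
        self.ready = True                  # value the readiness probe reports
        self.stopped = threading.Event()   # set once it is safe to exit
        self._in_flight = 0
        self._cond = threading.Condition()

    def request_started(self):
        with self._cond:
            self._in_flight += 1

    def request_finished(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def begin(self):
        # Step 2: fail the readiness probe.
        self.ready = False
        # Steps 3-4: keep serving while the endpoint is being removed.
        time.sleep(self.drain_seconds)
        # Step 5: wait for in-flight requests to finish. There is no timeout
        # here; the platform's termination grace period is the upper bound.
        with self._cond:
            self._cond.wait_for(lambda: self._in_flight == 0)
        # Step 6: signal that the process may shut down.
        self.stopped.set()

    def install(self):
        """Step 1: run the sequence when SIGTERM arrives (call from the main thread)."""
        signal.signal(
            signal.SIGTERM,
            lambda signum, frame: threading.Thread(target=self.begin).start(),
        )
```

A server's main loop would call `install()` at startup, wrap each request in `request_started()` / `request_finished()`, and exit once `stopped` is set.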

Slide 53

Slide 53 text

We now speak the same language

Slide 54

Slide 54 text

All the cross-cutting concerns are the same
Monolith, microservices, Kubernetes, not Kubernetes...
But the 2 teams now speak with the same nouns:
Deployment, Service, Ingress, Virtual Service, ...

Slide 55

Slide 55 text

Thanks!
saturnism.me/talk/kubernetes-microservices-lessons-learned/
bit.ly/k8s-lab | bit.ly/istio-lab | bit.ly/spring-gcp-lab
gcplab.me/spring
@saturnism | saturnism.me