Slide 1

Slide 1 text

Microservices On GKE At Mercari GCPUG Tokyo Kubernetes Engine Day @deeeet

Slide 2

Slide 2 text

@deeeet

Slide 3

Slide 3 text

Background

Slide 4

Slide 4 text

Start with Monolith

Slide 5

Slide 5 text

Small overhead across domains
Reusable code across domains
Effective operation by SRE team

Slide 6

Slide 6 text

3 scalabilities

Slide 7

Slide 7 text

Growth of business Growth of features Growth of organization

Slide 8

Slide 8 text

Growth of business Growth of features Growth of organization

Slide 9

Slide 9 text

Growth of business Growth of features Growth of organization

Slide 10

Slide 10 text

Huge Monolith

Slide 11

Slide 11 text

Difficult to understand the effect of a change
Difficult to test
Difficult to on-board
Difficult to isolate failure
Difficult to scale independently
Difficult to try new technologies

Slide 12

Slide 12 text

Growth of business Growth of features Growth of organization

Slide 13

Slide 13 text

Unclear ownership Communication overhead

Slide 14

Slide 14 text

Velocity is stalled ☔

Slide 15

Slide 15 text

Microservices

Slide 16

Slide 16 text

Microservices is a software development technique that structures an application as a collection of loosely coupled services with the smallest autonomous boundary.

Slide 17

Slide 17 text

Technical benefit Organization benefit

Slide 18

Slide 18 text

Technical benefit Organization benefit

Slide 19

Slide 19 text

Easy to test
Easy to deploy
Easy to on-board
Easy to isolate failure
Easy to scale independently

Slide 20

Slide 20 text

Technical benefit Organization benefit

Slide 21

Slide 21 text

Clear ownership Minimum communication overhead

Slide 22

Slide 22 text

Deliver new features faster ☀

Slide 23

Slide 23 text

How Microservices?

Slide 24

Slide 24 text

Gateway pattern Strangler pattern

Slide 25

Slide 25 text

Gateway pattern Strangler pattern

Slide 26

Slide 26 text

Service A Service B Mercari API

Slide 27

Slide 27 text

API Gateway Service A Service B Mercari API

Slide 28

Slide 28 text

API Gateway Service A Service B Service X Mercari API

Slide 29

Slide 29 text

API Gateway Service A Service B Service X Multiple services on a single endpoint SSL Termination DDoS Protection Common AuthZ/AuthN Mercari API
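A minimal sketch of "multiple services on a single endpoint", assuming the gateway chooses a backend by path prefix. The route table, the backend names, and `pick_backend` are hypothetical and are not taken from the Mercari gateway.

```python
# Hypothetical route table: path prefix -> backend name.
ROUTES = {
    "/service-a/": "service-a",
    "/service-b/": "service-b",
    "/service-x/": "service-x",
}
# Anything that has not been split out yet still goes to the monolith.
DEFAULT_BACKEND = "mercari-api"


def pick_backend(path: str) -> str:
    """Return the backend for a request path, trying the longest prefix first."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return DEFAULT_BACKEND
```

Under these assumptions, `pick_backend("/service-a/items")` selects `service-a`, while an unmatched path such as `/items` falls through to the monolith.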

Slide 30

Slide 30 text

Gateway pattern Strangler pattern

Slide 31

Slide 31 text

Mercari API API Gateway Service A Service B Service X

Slide 32

Slide 32 text

Mercari API API Gateway Service B Service X Service A

Slide 33

Slide 33 text

Mercari API API Gateway Service X Service A Service B

Slide 34

Slide 34 text

Mercari API API Gateway Function X Function Y Function Z Service C

Slide 35

Slide 35 text

Mercari API API Gateway Function X Facade C Function Y Function Z Service C

Slide 36

Slide 36 text

Mercari API API Gateway Facade C Function Y Function Z Service C Function X

Slide 37

Slide 37 text

Mercari API API Gateway Facade C Function Z Service C Function X Function Y

Slide 38

Slide 38 text

Mercari API API Gateway Facade C Service C Function X Function Y Function Z

Slide 39

Slide 39 text

Mercari API API Gateway Service C Function X Function Y Function Z

Slide 40

Slide 40 text

Mercari API API Gateway Service C Function X Function Y Service D Function Z
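The migration in the preceding slides (a facade inside the monolith, with functions moved to Service C one at a time) could be sketched as follows. The `Facade` class and its method names are illustrative assumptions, and the two backends are modelled as plain dictionaries of callables rather than real services.

```python
class Facade:
    """Routes each function call to the legacy code or to the new service."""

    def __init__(self, legacy, new_service):
        self._legacy = legacy        # dict: function name -> callable (monolith)
        self._new = new_service      # dict: function name -> callable (new service)
        self._migrated = set()       # names already moved to the new service

    def migrate(self, function_name):
        """Mark one function as served by the new service from now on."""
        self._migrated.add(function_name)

    def call(self, function_name, *args):
        """Dispatch to the new service if migrated, otherwise to the monolith."""
        target = self._new if function_name in self._migrated else self._legacy
        return target[function_name](*args)
```

Callers keep using the facade throughout, so each function can be switched independently. Once every function is migrated, the facade can be removed and callers can talk to the new service directly.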

Slide 41

Slide 41 text

Current Status

Slide 42

Slide 42 text

API Gateway Service A Service B Service X Mercari API

Slide 43

Slide 43 text

Technical Stack

Slide 44

Slide 44 text

API Gateway Authority Service A Service B Sakura Service X Mercari API

Slide 45

Slide 45 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine

Slide 46

Slide 46 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services

Slide 47

Slide 47 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container

Slide 48

Slide 48 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP

Slide 49

Slide 49 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP SSL Termination DDoS Protection Cloud Armor?

Slide 50

Slide 50 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP Routing to microservices Protocol transformation (HTTP to gRPC) Common logging & Tracing Request buffering SSL Termination DDoS Protection Cloud Armor?

Slide 51

Slide 51 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP Routing to microservices Protocol transformation (HTTP to gRPC) Common logging & Tracing Request buffering SSL Termination DDoS Protection Cloud Armor? Common AuthZ/AuthN

Slide 52

Slide 52 text

API Gateway Google Cloud Load balancing Authority Service A Service B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP Routing to microservices Protocol transformation (HTTP to gRPC) Common logging & Tracing Request buffering SSL Termination DDoS Protection Cloud Armor? Common AuthZ/AuthN Managed DB

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

"Another important takeaway is that even though all of these listed items are important, ultimately the most critical thing is observability. As I like to say: observability, observability, observability"
- Matt Klein, Seeking SRE (Chapter 6)

Slide 60

Slide 60 text

Service A Service B Network Logging? Tracing? (Observability) Network Logging? Tracing? (Observability)

Slide 61

Slide 61 text

Service A Service B Network AuthN and AuthZ? API limit ? Load balancing ? Request timeout ? Request retry with backoff? Circuit breaking ? Logging? Tracing? (Observability) Network Logging? Tracing? (Observability)
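As an illustration of the "request retry with backoff" concern, here is a minimal sketch using capped exponential backoff with full jitter. The function names and default values are assumptions for the example, and a real client would normally retry only on errors known to be transient.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=2.0):
    """Upper bounds for each retry delay: base * 2**i, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def call_with_retry(fn, retries=3, base=0.1, cap=2.0, sleep=time.sleep):
    """Call fn once, then retry up to `retries` times on any exception."""
    last_error = None
    # A leading 0.0 means the first attempt happens without waiting.
    for delay in [0.0] + backoff_delays(retries, base, cap):
        if delay:
            # Full jitter: sleep a random time between 0 and the bound.
            sleep(random.uniform(0, delay))
        try:
            return fn()
        except Exception as err:  # assumption: every error is retryable
            last_error = err
    raise last_error
```

The `sleep` parameter is injectable so the waiting can be replaced in tests. The jitter spreads out retries from many callers so they do not all hit a recovering service at the same moment.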

Slide 62

Slide 62 text

Service A Service B Network AuthN and AuthZ? API limit ? Load balancing ? Request timeout ? Request retry with backoff? Circuit breaking ? Logging? Tracing? (Observability) Network Logging? Tracing? (Observability) Different protocols..
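The "circuit breaking" item can be sketched as a small state holder that stops calling a failing dependency for a while. This is a simplified illustration, not the mechanism used in the talk. The class name, the thresholds, and the injected clock are all assumptions.

```python
import time


class CircuitBreaker:
    """Fails fast after repeated errors, then allows a trial call later."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise RuntimeError("circuit open")
            # Half-open: the wait is over, let one trial call through.
            self._opened_at = None
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

Writing this logic once per language and per protocol is the kind of duplicated effort these slides point at.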

Slide 63

Slide 63 text

Service A Service B Service C Service D

Slide 64

Slide 64 text

Service A Service B Service C Service D Se Se Se

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

How do we use GCP?

Slide 69

Slide 69 text

API Gateway Google Cloud Load balancing Authority Service X GCP Kubernetes Engine

Slide 70

Slide 70 text

API Gateway Google Cloud Load balancing Authority Service X GCP Kubernetes Engine How do we use GKE?

Slide 71

Slide 71 text

Cluster strategy GCP project strategy Node pool strategy Namespace strategy

Slide 72

Slide 72 text

Cluster strategy GCP project strategy Node pool strategy Namespace strategy

Slide 73

Slide 73 text

asia-northeast1 us-west1 europe-west1 Each region has its own Cluster

Slide 74

Slide 74 text

Production Cluster
Development Cluster
Testing/QA will be done in development cluster
All services in 1 cluster
No special cluster for specific service

Slide 75

Slide 75 text

Production Cluster In the future: one cluster per region, like Google Borg

Slide 76

Slide 76 text

Cluster strategy GCP project strategy Node pool strategy Namespace strategy

Slide 77

Slide 77 text

GCP project: GKE Production Production Cluster GCP project: GKE Development Development Cluster IAM: SRE IAM: SRE + α 1 cluster for 1 GCP project Only SRE can access cluster nodes

Slide 78

Slide 78 text

Cluster strategy GCP project strategy Node pool strategy Namespace strategy

Slide 79

Slide 79 text

GCP project: GKE Production Production Cluster n1-standard-16 node pool n1-highmem-16 node pool Machine learning workloads Normal applications Auto scaling Enabled Automatic node repair Enabled Preemptible Enabled (only in US)

Slide 80

Slide 80 text

Cluster strategy GCP project strategy Node pool strategy Namespace strategy

Slide 81

Slide 81 text

Each service has its own Kubernetes namespace GCP project: GKE Production Namespace: Service A Pod: A Pod: A Pod: A Namespace: Service B Pod: B Pod: B Production Cluster RBAC: Team X RBAC: Team X Each team can only access its own Kubernetes namespace
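One way to picture the namespace-per-service layout is as a pair of Kubernetes manifests: a Namespace, plus a RoleBinding that grants a team access only inside it. The sketch below builds them as plain dictionaries. The helper name, the binding name, the group name, and the choice of the built-in `edit` ClusterRole are assumptions, since the slides do not show the actual RBAC rules.

```python
import json


def namespace_manifests(service: str, team_group: str) -> list:
    """Build a Namespace and a namespaced RoleBinding for one service."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": service},
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        # The binding lives in the service's namespace, so it grants
        # nothing outside of that namespace.
        "metadata": {"name": f"{service}-team", "namespace": service},
        "subjects": [
            {
                "kind": "Group",
                "name": team_group,  # hypothetical group name
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": "edit",  # assumption: built-in user-facing role
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [namespace, role_binding]


if __name__ == "__main__":
    print(json.dumps(namespace_manifests("service-a", "team-x"), indent=2))
```

Because a RoleBinding is namespaced, binding Team X in `service-a` gives it no permissions in `service-b`, which matches the isolation described on the slide.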

Slide 82

Slide 82 text

API Gateway Google Cloud Load balancing Authority Service X GCP Kubernetes Engine How do we use GCP services?

Slide 83

Slide 83 text

How to limit access to GCP services? Each service should be allowed to access only its own GCP resources

Slide 84

Slide 84 text

No content

Slide 85

Slide 85 text

GCP project: GKE Production IAM: SRE Namespace: Service A Pod: A Pod: A Pod: A Namespace: Service B Pod: B Pod: B Production Cluster RBAC: Team X RBAC: Team Y

Slide 86

Slide 86 text

GCP project: GKE Production IAM: SRE Namespace: Service A Pod: A Pod: A Pod: A Namespace: Service B Pod: B Pod: B GCP project: Service A IAM: Team X + SRE GCP project: Service B IAM: Team Y + SRE Production Cluster Each service has its own GCP project RBAC: Team X RBAC: Team Y

Slide 87

Slide 87 text

GCP project: GKE Production IAM: SRE Namespace: Service A Pod: A Pod: A Pod: A Namespace: Service B Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Each service has its own GCP project RBAC: Team X RBAC: Team Y Service resources in its own GCP project

Slide 88

Slide 88 text

GCP project: GKE Production IAM: SRE Namespace: Service A Pod: A Pod: A Pod: A Namespace: Service B Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Each service has its own GCP project Each namespace has its own service account for its own GCP project RBAC: Team X RBAC: Team Y Service resources in its own GCP project

Slide 89

Slide 89 text

Each namespace has its own service account

Slide 90

Slide 90 text

GCP project: GKE Production IAM: SRE Namespace: Service A RBAC: Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Each service has its own GCP project Each namespace has its own service account for its own GCP project Service resources in its own GCP project

Slide 91

Slide 91 text

IAM: SRE Namespace: Service A RBAC: Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster GCP project creation…? Setup Spanner or Cloud SQL ..? GCP project: GKE Production

Slide 92

Slide 92 text

Infrastructure as Code

Slide 93

Slide 93 text

No content

Slide 94

Slide 94 text

Cloud SQL instance creation

Slide 95

Slide 95 text

Spanner instance creation

Slide 96

Slide 96 text

mercari / microservices-terraform Private

Slide 97

Slide 97 text

Just create a PR to create a new GCP project

Slide 98

Slide 98 text

Terraform plan on CI

Slide 99

Slide 99 text

Terraform apply on CI

Slide 100

Slide 100 text

The tool for notifying Terraform results is open source: https://github.com/mercari/tfnotify Terraform apply on CI

Slide 101

Slide 101 text

Common parts (GCP project creation, PagerDuty setup) can be bootstrapped

Slide 102

Slide 102 text

IAM: SRE Namespace: Service A RBAC: Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Stackdriver GCP project: GKE Production

Slide 103

Slide 103 text

IAM: SRE Namespace: Service A RBAC: Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Logging…? Stackdriver GCP project: GKE Production

Slide 104

Slide 104 text

How to limit access to Stackdriver Logging? Each team should be allowed to access only its own service's logs

Slide 105

Slide 105 text

No content

Slide 106

Slide 106 text

IAM: SRE Namespace: Service A RBAC: Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Logging…? Stackdriver GCP project: GKE Production

Slide 107

Slide 107 text

IAM: SRE Namespace: Service A RBAC: Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Stackdriver BigQuery BigQuery GCP project: GKE Production Create BQ for each service

Slide 108

Slide 108 text

IAM: SRE Namespace: Service A RBAC: Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Create BQ sink for each service Stackdriver BigQuery BigQuery sink sink GCP project: GKE Production Create BQ for each service

Slide 109

Slide 109 text

BigQuery sink creation
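The per-service sink idea can be sketched as a function that describes one log sink per namespace, exporting only that namespace's container logs to a BigQuery dataset in the service's own project. The helper name, the sink naming scheme, and the project and dataset names are hypothetical. The filter assumes the `k8s_container` resource type with a `namespace_name` label, which may differ from what the cluster in the talk emitted.

```python
def bigquery_sink(service_namespace: str, service_project: str, dataset: str) -> dict:
    """Describe a log sink that exports one namespace's logs to BigQuery."""
    return {
        # Hypothetical naming scheme for the sink.
        "name": f"{service_namespace}-logs",
        # Destination dataset lives in the service's own GCP project.
        "destination": (
            f"bigquery.googleapis.com/projects/{service_project}"
            f"/datasets/{dataset}"
        ),
        # Assumption: container logs are labelled with the namespace name.
        "filter": (
            'resource.type="k8s_container" AND '
            f'resource.labels.namespace_name="{service_namespace}"'
        ),
    }
```

A call such as `bigquery_sink("service-a", "service-a-prod", "service_a_logs")` yields a description whose filter matches only the `service-a` namespace, so Team X can be granted access to that dataset without seeing other teams' logs.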

Slide 110

Slide 110 text

No content

Slide 111

Slide 111 text

GCP and k8s Ecosystem

Slide 112

Slide 112 text

Just create an Ingress and the DNS records are created automatically with Cloud DNS

Slide 113

Slide 113 text

Disaster recovery: take backups of your cluster and restore in case of loss, with Cloud Storage

Slide 114

Slide 114 text

Non GCP?

Slide 115

Slide 115 text

Notification or Integration with GitHub vs. Container Builder

Slide 116

Slide 116 text

Integration with external services like CDN or AWS vs. Stackdriver monitoring

Slide 117

Slide 117 text

vs. Stackdriver error report Notification and Integration with GitHub

Slide 118

Slide 118 text

vs. ?? GCP does not have chaos as a service

Slide 119

Slide 119 text

Conclusion

Slide 120

Slide 120 text

Mercari ❤

Slide 121

Slide 121 text

@deeeet