Microservices on GKE at Mercari

taichi nakashima
April 26, 2018

Transcript

  1. Microservices on GKE at Mercari

    GCPUG Tokyo Kubernetes Engine Day, @deeeet
  2. @deeeet

  3. Background

  4. Start with Monolith

  5. Small overhead for cross-domain work; reusable code across domains; effective operation by the SRE team
  6. 3 scalabilities

  7. Growth of business Growth of features Growth of organization

  8. Growth of business Growth of features Growth of organization

  9. Growth of business Growth of features Growth of organization

  10. Huge Monolith

  11. Difficult to understand the effect of a change; difficult to test; difficult to on-board; difficult to isolate failure; difficult to scale independently; difficult to try new technologies
  12. Growth of business Growth of features Growth of organization

  13. Unclear ownership Communication overhead

  14. Velocity is stalled ☔

  15. Microservices

  16. Microservices is a software development technique that structures an application as a collection of loosely coupled services with the smallest autonomous boundary.
  17. Technical benefit Organization benefit

  18. Technical benefit Organization benefit

  19. Easy to test; easy to deploy; easy to on-board; easy to isolate failure; easy to scale independently
  20. Technical benefit Organization benefit

  21. Clear ownership Minimum communication overhead

  22. Deliver new features faster ☀

  23. How Microservices?

  24. Gateway pattern Strangler pattern

  25. Gateway pattern Strangler pattern

  26. Service A Service B Mercari API

  27. API Gateway Service A Service B Mercari API

  28. API Gateway Service A Service B Service X Mercari API

  29. API Gateway, Service A, Service B, Service X, Mercari API. Gateway roles: multiple services on a single endpoint; SSL termination; DDoS protection; common AuthZ/AuthN
  30. Gateway pattern Strangler pattern

  31. Mercari API API Gateway Service A Service B Service X

  32. Mercari API API Gateway Service B Service X Service A

  33. Mercari API API Gateway Service X Service A Service B

  34. Mercari API API Gateway Function X Function Y Function Z

    Service C
  35. Mercari API API Gateway Function X Facade C Function Y

    Function Z Service C
  36. Mercari API API Gateway Facade C Function Y Function Z

    Service C Function X
  37. Mercari API API Gateway Facade C Function Z Service C

    Function X Function Y
  38. Mercari API API Gateway Facade C Service C Function X

    Function Y Function Z
  39. Mercari API API Gateway Service C Function X Function Y

    Function Z
  40. Mercari API API Gateway Service C Function X Function Y

    Service D Function Z
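The gateway and strangler slides above can be read as a routing table that changes over time: every path starts on the monolith, and each migration step points one more path prefix at a new service. The following is a minimal sketch of that idea, not Mercari's gateway. The path prefixes, the service names, and the `Gateway` class are all hypothetical.

```python
from typing import Dict

MONOLITH = "mercari-api"  # stand-in name for the existing monolith backend


class Gateway:
    """Routes a request path to a backend; unmigrated paths stay on the monolith."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def migrate(self, prefix: str, service: str) -> None:
        """Point one path prefix at a microservice (one strangler step)."""
        self._routes[prefix] = service

    def backend_for(self, path: str) -> str:
        """Return the backend for a path, preferring the longest matching prefix."""
        matches = [p for p in self._routes if path.startswith(p)]
        if not matches:
            return MONOLITH
        return self._routes[max(matches, key=len)]


gateway = Gateway()
before = gateway.backend_for("/v1/items/42")    # still served by the monolith
gateway.migrate("/v1/items", "service-a")       # hypothetical prefix and service
after = gateway.backend_for("/v1/items/42")     # now served by the new service
untouched = gateway.backend_for("/v1/users/7")  # not migrated yet
print(before, after, untouched)
```

Because clients only ever talk to the gateway, each `migrate` call changes where traffic goes without changing the public endpoint.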
  41. Current Status

  42. API Gateway Service A Service B Service X Mercari API

  43. Technical Stack

  44. API Gateway Authority Service A Service B Sakura Service X

    Mercari API
  45. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine
  46. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services
  47. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container
  48. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP
  49. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP SSL Termination DDoS Protection Cloud Armor?
  50. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP Routing to microservices Protocol transformation (HTTP to gRPC) Common logging & Tracing Request buffering SSL Termination DDoS Protection Cloud Armor?
  51. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP Routing to microservices Protocol transformation (HTTP to gRPC) Common logging & Tracing Request buffering SSL Termination DDoS Protection Cloud Armor? Common AuthZ/AuthN
  52. API Gateway Google Cloud Load balancing Authority Service A Service

    B Sakura Service X Mercari API GCP Kubernetes Engine Cloud Resources Managed Services Container Over HTTP Routing to microservices Protocol transformation (HTTP to gRPC) Common logging & Tracing Request buffering SSL Termination DDoS Protection Cloud Armor? Common AuthZ/AuthN Managed DB
  53. None
  54. None
  55. None
  56. None
  57. None
  58. None
  59. "Another important takeaway is that even though all of these listed items are important, ultimately the most critical thing is observability. As I like to say: observability, observability, observability." - Matt Klein, Seeking SRE (Chapter 6)
  60. Service A, Service B, Network. Logging? Tracing? (Observability)
  61. Service A, Service B, Network. AuthN and AuthZ? API limit? Load balancing? Request timeout? Request retry with backoff? Circuit breaking? Logging? Tracing? (Observability)
  62. Service A, Service B, Network. AuthN and AuthZ? API limit? Load balancing? Request timeout? Request retry with backoff? Circuit breaking? Logging? Tracing? (Observability) Different protocols..
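Two of the concerns listed on these slides, request retry with backoff and circuit breaking, can be sketched in a few lines. This is an illustrative sketch only. The function and class names, the thresholds, and the jitter strategy are assumptions, and the talk does not say how or where Mercari implements these concerns.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
                    max_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with capped exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out retrying clients
    raise ValueError("attempts must be >= 1")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after reset_after seconds."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None and self._clock() - self._opened_at < self._reset_after:
            raise CircuitOpenError("backend is failing; call rejected")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0  # a success closes the circuit again
        self._opened_at = None
        return result
```

Every service that talks over the network needs some version of this logic, which is why the slides list it alongside logging and tracing as a cross-cutting concern.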
  63. Service A Service B Service C Service D

  64. Service A Service B Service C Service D Se Se

    Se
  65. None
  66. None
  67. None
  68. How do we use GCP?

  69. API Gateway Google Cloud Load balancing Authority Service X GCP

    Kubernetes Engine
  70. API Gateway Google Cloud Load balancing Authority Service X GCP

    Kubernetes Engine. How do we use GKE?
  71. Cluster strategy GCP project strategy Node pool strategy Namespace strategy

  72. Cluster strategy GCP project strategy Node pool strategy Namespace strategy

  73. asia-northeast1 us-west1 europe-west1 Each region has its own Cluster

  74. Production Cluster; Development Cluster. Testing/QA is done in the development cluster. All services run in 1 cluster; no special cluster for a specific service
  75. Production Cluster. In the future: 1 region, 1 cluster, like Google Borg
  76. Cluster strategy GCP project strategy Node pool strategy Namespace strategy

  77. GCP project: GKE Production (Production Cluster); GCP project: GKE Development (Development Cluster); IAM: SRE; IAM: SRE + α. 1 cluster for 1 GCP project. Only SRE can access cluster nodes
  78. Cluster strategy GCP project strategy Node pool strategy Namespace strategy

  79. GCP project: GKE Production, Production Cluster. Node pools: n1-standard-16, n1-highmem-16. Workloads: machine learning workloads, normal applications. Auto scaling: enabled. Automatic node repair: enabled. Preemptible: enabled (only in US)
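One way to steer workloads onto a particular node pool is a `nodeSelector` on the node-pool label that GKE sets on its nodes (`cloud.google.com/gke-nodepool`). The sketch below builds such a pod spec fragment as a plain dictionary. The pool names and the workload-to-pool mapping are assumptions, since the slide lists the pools and workload types without naming the pools or pairing them.

```python
from typing import Any, Dict

NODE_POOL_LABEL = "cloud.google.com/gke-nodepool"  # label GKE puts on each node

# Hypothetical pool names and pairing; the slide does not give either.
POOL_FOR_WORKLOAD = {
    "normal": "standard-16-pool",
    "ml": "highmem-16-pool",
}


def pod_spec(image: str, workload: str = "normal") -> Dict[str, Any]:
    """Return a pod spec fragment that pins the pod to the pool for its workload type."""
    return {
        "nodeSelector": {NODE_POOL_LABEL: POOL_FOR_WORKLOAD[workload]},
        "containers": [{"name": "app", "image": image}],
    }


spec = pod_spec("gcr.io/example-project/trainer:1", workload="ml")  # hypothetical image
print(spec["nodeSelector"])
```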
  80. Cluster strategy GCP project strategy Node pool strategy Namespace strategy

  81. Each service has its own Kubernetes namespace. GCP project: GKE

    Production Namespace: Service A Pod: A Pod: A Pod: A Namespace: Service B Pod: B Pod: B Production Cluster RBAC: Team X RBAC: Team X. Each team can only access its own Kubernetes namespace
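The namespace-per-service rule maps naturally onto two Kubernetes objects per service: a `Namespace` and a `RoleBinding` that grants the owning team access inside that namespace only. The sketch below generates both as dictionaries and prints them as JSON, which `kubectl apply -f` accepts. The namespace name, the group name, and the choice of the built-in `edit` ClusterRole are assumptions, not details from the talk.

```python
import json
from typing import Any, Dict, List


def namespace_manifests(service: str, team_group: str) -> List[Dict[str, Any]]:
    """Build a Namespace plus a RoleBinding scoping a team's access to that namespace."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": service},
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{service}-team", "namespace": service},
        "subjects": [{
            "kind": "Group",
            "name": team_group,
            "apiGroup": "rbac.authorization.k8s.io",
        }],
        # Binding a ClusterRole through a RoleBinding grants it in this namespace only.
        "roleRef": {
            "kind": "ClusterRole",
            "name": "edit",  # built-in role; an assumed choice for this sketch
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [namespace, role_binding]


manifests = namespace_manifests("service-a", "team-x@example.com")  # hypothetical names
print(json.dumps(manifests, indent=2))
```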
  82. API Gateway Google Cloud Load balancing Authority Service X GCP

    Kubernetes Engine. How do we use GCP services?
  83. How to limit access to GCP services? Each service should be allowed to access only its own GCP resources
  84. None
  85. GCP project: GKE Production IAM: SRE Namespace: Service A Pod:

    A Pod: A Pod: A Namespace: Service B Pod: B Pod: B Production Cluster RBAC: Team X RBAC: Team Y
  86. GCP project: GKE Production IAM: SRE Namespace: Service A Pod:

    A Pod: A Pod: A Namespace: Service B Pod: B Pod: B GCP project: Service A IAM: Team X + SRE GCP project: Service B IAM: Team Y + SRE Production Cluster. Each service has its own GCP project. RBAC: Team X RBAC: Team Y
  87. GCP project: GKE Production IAM: SRE Namespace: Service A Pod:

    A Pod: A Pod: A Namespace: Service B Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster. Each service has its own GCP project. RBAC: Team X RBAC: Team Y. Service resources in its own GCP project
  88. GCP project: GKE Production IAM: SRE Namespace: Service A Pod:

    A Pod: A Pod: A Namespace: Service B Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster. Each service has its own GCP project. Each namespace has its own service account for its own GCP project. RBAC: Team X RBAC: Team Y. Service resources in its own GCP project
  89. Each namespace has its own service account

  90. GCP project: GKE Production IAM: SRE Namespace: Service A RBAC:

    Team X Pod: A Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster. Each service has its own GCP project. Each namespace has its own service account for its own GCP project. Service resources in its own GCP project
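The isolation rule on these slides (one GCP project per service, one service account per namespace, resources living in the service's own project) can be modelled as a small check. The sketch below is a toy model, not how IAM evaluates permissions. The project IDs and the naming convention that ties the account name to the namespace are assumptions. Only the `NAME@PROJECT_ID.iam.gserviceaccount.com` email format is standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    namespace: str   # Kubernetes namespace of the service
    project_id: str  # the service's own GCP project (hypothetical IDs below)

    @property
    def service_account(self) -> str:
        """Service account email, assuming the account is named after the namespace."""
        return f"{self.namespace}@{self.project_id}.iam.gserviceaccount.com"


def may_access(service: Service, resource_project: str) -> bool:
    """A service may only touch resources that live in its own GCP project."""
    return service.project_id == resource_project


service_a = Service(namespace="service-a", project_id="example-service-a")
service_b = Service(namespace="service-b", project_id="example-service-b")

print(service_a.service_account)
print(may_access(service_a, "example-service-a"))  # own Cloud SQL instance
print(may_access(service_a, "example-service-b"))  # Service B's Spanner instance
```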
  91. IAM: SRE Namespace: Service A RBAC: Team X Pod: A

    Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster GCP project creation…? Setup Spanner or Cloud SQL ..? GCP project: GKE Production
  92. Infrastructure as Code

  93. None
  94. CloudSQL instance creation

  95. Spanner instance creation

  96. mercari / microservices-terraform Private

  97. Just create a PR to create new GCP project

  98. Terraform plan on CI

  99. Terraform apply on CI

  100. Terraform apply on CI. The tool for notifying Terraform results is open sourced: https://github.com/mercari/tfnotify
  101. The common parts (GCP project creation, PagerDuty setup) can be bootstrapped
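The pull-request workflow in the slides above (open a PR, run a Terraform plan on CI, then run a Terraform apply on CI) can be sketched as a function that picks the commands for a CI event. Running the plan on pull requests and the apply on pushes to `master` is an assumption about the trigger points, as are the event names. The Terraform flags used here (`-input=false`, `-auto-approve`) are standard CLI flags.

```python
from typing import List

INIT = ["terraform", "init", "-input=false"]


def terraform_steps(event: str, branch: str) -> List[List[str]]:
    """Return the Terraform commands a CI job would run for this event."""
    if event == "pull_request":
        # Show reviewers what would change, without changing anything.
        return [INIT, ["terraform", "plan", "-input=false"]]
    if event == "push" and branch == "master":  # assumed main branch name
        # The PR was reviewed and merged, so apply without an interactive prompt.
        return [INIT, ["terraform", "apply", "-input=false", "-auto-approve"]]
    return []  # other branches and events do nothing


for step in terraform_steps("pull_request", "add-service-x-project"):
    print(" ".join(step))
```

In a real job each command list could be passed to `subprocess.run(step, check=True)`, so a failed plan or apply fails the build.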

  102. IAM: SRE Namespace: Service A RBAC: Team X Pod: A

    Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Stackdriver GCP project: GKE Production
  103. IAM: SRE Namespace: Service A RBAC: Team X Pod: A

    Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Logging…? Stackdriver GCP project: GKE Production
  104. How to limit access to Stackdriver Logging? Each team should be allowed to access only its own service's logs
  105. None
  106. IAM: SRE Namespace: Service A RBAC: Team X Pod: A

    Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Logging…? Stackdriver GCP project: GKE Production
  107. IAM: SRE Namespace: Service A RBAC: Team X Pod: A

    Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster Stackdriver BigQuery BigQuery GCP project: GKE Production. Create BQ for each service
  108. IAM: SRE Namespace: Service A RBAC: Team X Pod: A

    Pod: A Pod: A Namespace: Service B RBAC: Team Y Pod: B Pod: B GCP project: Service A IAM: Team X + SRE Cloud SQL GCP project: Service B Spanner IAM: Team Y + SRE Production Cluster. Create BQ sink for each service. Stackdriver BigQuery BigQuery sink sink GCP project: GKE Production. Create BQ for each service
  109. BigQuery sink creation
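A per-service log sink boils down to three fields: a name, a BigQuery destination in the service's own project, and a filter that selects only that service's namespace. The sketch below builds that configuration as a dictionary. The sink and dataset names are hypothetical, and the filter is an assumption: it uses the `k8s_container` resource type of current GKE logging, whereas older clusters logged under `resource.type="container"` with a `namespace_id` label.

```python
from typing import Dict


def bigquery_sink(service: str, project_id: str, dataset: str = "logs") -> Dict[str, str]:
    """Describe a log sink that exports one namespace's logs to the service's own project."""
    return {
        "name": f"{service}-bq-sink",
        # Export destination format for a BigQuery dataset.
        "destination": f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset}",
        # Only container logs from this service's Kubernetes namespace.
        "filter": (
            'resource.type="k8s_container" '
            f'AND resource.labels.namespace_name="{service}"'
        ),
    }


sink = bigquery_sink("service-a", "example-service-a")  # hypothetical names
print(sink["destination"])
print(sink["filter"])
```

Since the dataset lives in the service's own GCP project, the project-level IAM shown earlier (Team X + SRE) is what limits who can read the exported logs.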

  110. None
  111. GCP and k8s Ecosystem

  112. Just create an Ingress and the DNS records are created automatically with Cloud DNS
  113. Disaster recovery: take backups of your cluster and restore in case of loss, with Cloud Storage
  114. Non GCP?

  115. Notification or Integration with GitHub vs. Container Builder

  116. Integration with external services like CDN or AWS vs. Stackdriver Monitoring
  117. Notification and integration with GitHub vs. Stackdriver Error Reporting

  118. vs. ?? GCP does not have chaos as a service

  119. Conclusion

  120. Mercari ❤

  121. @deeeet