
Microservices on GKE at Mercari

taichi nakashima
April 26, 2018

Transcript

  1. Microservices
    On GKE At Mercari
    GCPUG Tokyo Kubernetes Engine Day
    @deeeet

    View Slide

  2. @deeeet

    View Slide

  3. Background

    View Slide

  4. Start with Monolith

    View Slide

  5. Small overhead across domains
    Reusable code across domains
    Effective operation by SRE team

    View Slide

  6. 3 scalabilities

    View Slide

  7. Growth of business
    Growth of features
    Growth of organization

    View Slide

  8. Growth of business
    Growth of features
    Growth of organization

    View Slide

  9. Growth of business
    Growth of features
    Growth of organization

    View Slide

  10. Huge Monolith

    View Slide

  11. Difficult to understand change effect
    Difficult to test
    Difficult to on-board
    Difficult to isolate failure
    Difficult to scale independently
    Difficult to try new technologies

    View Slide

  12. Growth of business
    Growth of features
    Growth of organization

    View Slide

  13. Unclear ownership
    Communication overhead

    View Slide

  14. Velocity is stalled ☔

    View Slide

  15. Microservices

    View Slide

  16. Microservices is a software development technique that structures an
    application as a collection of loosely coupled services with the
    smallest autonomous boundary.

    View Slide

  17. Technical benefit
    Organization benefit

    View Slide

  18. Technical benefit
    Organization benefit

    View Slide

  19. Easy to test
    Easy to deploy
    Easy to on-board
    Easy to isolate failure
    Easy to scale independently

    View Slide

  20. Technical benefit
    Organization benefit

    View Slide

  21. Clear ownership
    Minimum communication overhead

    View Slide

  22. Deliver new features faster ☀

    View Slide

  23. How Microservices?

    View Slide

  24. Gateway pattern
    Strangler pattern

    View Slide

  25. Gateway pattern
    Strangler pattern

    View Slide

  26. Service A
    Service B
    Mercari API

    View Slide

  27. API Gateway
    Service A
    Service B
    Mercari API

    View Slide

  28. API Gateway
    Service A
    Service B
    Service X
    Mercari API

    View Slide

  29. API Gateway
    Service A
    Service B
    Service X
    Multiple services on a single endpoint
    SSL Termination
    DDoS Protection
    Common AuthZ/AuthN
    Mercari API

    View Slide

  30. Gateway pattern
    Strangler pattern

    View Slide

  31. Mercari API
    API Gateway
    Service A
    Service B
    Service X

    View Slide

  32. Mercari API
    API Gateway
    Service B
    Service X Service A

    View Slide

  33. Mercari API
    API Gateway
    Service X Service A Service B

    View Slide

  34. Mercari API
    API Gateway
    Function X
    Function Y
    Function Z
    Service C

    View Slide

  35. Mercari API
    API Gateway
    Function X
    Facade C
    Function Y
    Function Z
    Service C

    View Slide

  36. Mercari API
    API Gateway
    Facade C
    Function Y
    Function Z
    Service C
    Function X

    View Slide

  37. Mercari API
    API Gateway
    Facade C
    Function Z
    Service C
    Function X
    Function Y

    View Slide

  38. Mercari API
    API Gateway
    Facade C
    Service C
    Function X
    Function Y
    Function Z

    View Slide

  39. Mercari API
    API Gateway
    Service C
    Function X
    Function Y
    Function Z

    View Slide

  40. Mercari API
    API Gateway
    Service C
    Function X
    Function Y
    Service D
    Function Z

    View Slide
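
The strangler steps in slides 34 to 40 can be sketched as a routing table in the gateway: each function that has moved out of the monolith gets a route to its new service, and everything else still falls through to the Mercari API. This is a minimal sketch, not Mercari's gateway code. The path prefixes and backend names are hypothetical; only the Function X/Y/Z to Service C/D grouping follows slide 40.

```python
from typing import Dict

# Hypothetical path prefixes for functions already moved out of the monolith.
MIGRATED_ROUTES: Dict[str, str] = {
    "/function-x": "service-c",
    "/function-y": "service-c",
    "/function-z": "service-d",
}

# Hypothetical backend name for the remaining monolith.
MONOLITH = "mercari-api"


def resolve_backend(path: str, routes: Dict[str, str] = MIGRATED_ROUTES) -> str:
    """Return the backend for a request path using longest-prefix match."""
    best = ""
    for prefix in routes:
        # Match the prefix exactly or as a full path segment, never mid-word.
        matches = path == prefix or path.startswith(prefix + "/")
        if matches and len(prefix) > len(best):
            best = prefix
    # Anything without a migrated route is still served by the monolith.
    return routes[best] if best else MONOLITH
```

Migrating another function then means adding one entry to the table, while clients keep calling the same gateway endpoint.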

  41. Current Status

    View Slide

  42. API Gateway
    Service A
    Service B
    Service X
    Mercari API

    View Slide

  43. Technical Stack

    View Slide

  44. API Gateway
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API

    View Slide

  45. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine

    View Slide

  46. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services

    View Slide

  47. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container

    View Slide

  48. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP

    View Slide

  49. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    SSL Termination
    DDoS Protection
    Cloud Armor?

    View Slide

  50. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    Routing to microservices
    Protocol transformation (HTTP to gRPC)
    Common logging & Tracing
    Request buffering
    SSL Termination
    DDoS Protection
    Cloud Armor?

    View Slide

  51. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    Routing to microservices
    Protocol transformation (HTTP to gRPC)
    Common logging & Tracing
    Request buffering
    SSL Termination
    DDoS Protection
    Cloud Armor?
    Common AuthZ/AuthN

    View Slide

  52. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    Routing to microservices
    Protocol transformation (HTTP to gRPC)
    Common logging & Tracing
    Request buffering
    SSL Termination
    DDoS Protection
    Cloud Armor?
    Common AuthZ/AuthN
    Managed DB

    View Slide

  53. View Slide

  54. View Slide

  55. View Slide

  56. View Slide

  57. View Slide

  58. View Slide

  59. Another important takeaway is that even though all of these listed
    items are important, ultimately the most critical thing is observability.
    As I like to say: observability, observability, observability
    - Matt Klein, Seeking SRE (Chapter 6)

    View Slide

  60. Service A Service B
    Network
    Logging? Tracing? (Observability)
    Network
    Logging? Tracing? (Observability)

    View Slide

  61. Service A Service B
    Network
    AuthN and AuthZ?
    API limit ?
    Load balancing ?
    Request timeout ?
    Request retry with backoff?
    Circuit breaking ?
    Logging? Tracing? (Observability)
    Network
    Logging? Tracing? (Observability)

    View Slide
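
Two of the questions on this slide, request timeout and retry with backoff, can be illustrated with a small client-side helper. This is a generic sketch of exponential backoff with full jitter, not something taken from the talk. The function name, defaults, and the injectable `sleep` parameter are illustrative assumptions, and the timeout itself is assumed to be enforced inside `call`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Cap the exponential delay, then sleep a random time below the cap.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Every service that talks over the network needs some version of this logic, which is why the slide lists it next to load balancing and circuit breaking.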

  62. Service A Service B
    Network
    AuthN and AuthZ?
    API limit ?
    Load balancing ?
    Request timeout ?
    Request retry with backoff?
    Circuit breaking ?
    Logging? Tracing? (Observability)
    Network
    Logging? Tracing? (Observability)
    Different protocols..

    View Slide
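
Circuit breaking, also listed above, can be sketched in a few lines. This is a simplified illustration rather than the behaviour of any particular library or proxy. The class name, thresholds, and the injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling more requests onto a backend that is already struggling.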

  63. Service A Service B
    Service C
    Service D

    View Slide

  64. Service A Service B
    Service C
    Service D
    Se
    Se
    Se

    View Slide

  65. View Slide

  66. View Slide

  67. View Slide

  68. How do we use GCP?

    View Slide

  69. API Gateway
    Google Cloud Load balancing
    Authority
    Service X
    GCP
    Kubernetes Engine

    View Slide

  70. API Gateway
    Google Cloud Load balancing
    Authority
    Service X
    GCP
    Kubernetes Engine
    How do we use GKE?

    View Slide

  71. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

    View Slide

  72. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

    View Slide

  73. asia-northeast1
    us-west1
    europe-west1
    Each region has its own Cluster

    View Slide

  74. Production Cluster
    Development Cluster
    Testing/QA will be done in
    development cluster
    All services in 1 cluster
    No special cluster for specific service

    View Slide

  75. Production Cluster
    In future, 1 region 1 cluster
    like Google Borg

    View Slide

  76. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

    View Slide

  77. GCP project: GKE Production
    Production Cluster
    GCP project: GKE Development
    Development Cluster
    IAM: SRE IAM: SRE + α
    1 cluster for 1 GCP project
    Only SRE can access cluster nodes

    View Slide

  78. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

    View Slide

  79. GCP project: GKE Production
    Production Cluster
    n1-standard-16
    node pool
    n1-highmem-16
    node pool
    Machine learning workloads
    Normal applications
    Auto scaling Enabled
    Automatic node repair Enabled
    Preemptible Enabled (only in US)

    View Slide
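
One way to place workloads on a specific node pool is a `nodeSelector` on the `cloud.google.com/gke-nodepool` label that GKE sets on its nodes. The sketch below builds such a Pod spec as a plain dictionary. The pool names are hypothetical, and pairing n1-highmem-16 with machine learning workloads is an assumption read from the slide layout, not something the talk states outright.

```python
from typing import Any, Dict

# Hypothetical node pool names; the slide only gives the machine types.
NODE_POOLS: Dict[str, str] = {
    "normal": "standard-16-pool",  # assumed to be the n1-standard-16 pool
    "ml": "highmem-16-pool",       # assumed to be the n1-highmem-16 pool
}


def pod_spec_for(workload: str, image: str) -> Dict[str, Any]:
    """Build a minimal Pod spec pinned to the node pool for `workload`."""
    return {
        # GKE labels every node with the name of the pool it belongs to.
        "nodeSelector": {"cloud.google.com/gke-nodepool": NODE_POOLS[workload]},
        "containers": [{"name": "app", "image": image}],
    }
```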

  80. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

    View Slide

  81. Each service has its own
    Kubernetes namespace
    GCP project: GKE Production
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    Production Cluster
    RBAC: Team X
    RBAC: Team Y
    Each team can only access
    its own Kubernetes namespace

    View Slide
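
The namespace-per-service rule can be expressed with three Kubernetes objects: a Namespace, a Role scoped to it, and a RoleBinding that grants the Role to the owning team. The sketch below generates them as dictionaries. The role name, the resources and verbs, and the group name in the usage line are illustrative assumptions, not Mercari's actual policy.

```python
import json
from typing import Any, Dict, List


def namespace_manifests(service: str, team_group: str) -> List[Dict[str, Any]]:
    """Return Namespace, Role and RoleBinding manifests for one service."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": service},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "developer", "namespace": service},
        "rules": [
            {
                # Illustrative resources and verbs; adjust for real use.
                "apiGroups": ["", "apps"],
                "resources": ["pods", "pods/log", "deployments", "services"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "developer-binding", "namespace": service},
        "subjects": [
            {
                "kind": "Group",
                "name": team_group,
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        "roleRef": {
            "kind": "Role",
            "name": "developer",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [namespace, role, binding]


if __name__ == "__main__":
    # Print the manifests as JSON for review; the group name is hypothetical.
    print(json.dumps(namespace_manifests("service-a", "team-x@example.com"), indent=2))
```

Because the Role and RoleBinding live inside the service's namespace, the team gets no permissions in any other namespace of the shared cluster.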

  82. API Gateway
    Google Cloud Load balancing
    Authority
    Service X
    GCP
    Kubernetes Engine
    How do we use GCP services?

    View Slide

  83. How to limit access to GCP services?
    Each service should be allowed to access
    only its own GCP resources

    View Slide

  84. View Slide

  85. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    Production Cluster
    RBAC: Team X
    RBAC: Team Y

    View Slide

  86. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    GCP project: Service B
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    RBAC: Team X
    RBAC: Team Y

    View Slide

  87. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    RBAC: Team X
    RBAC: Team Y
    Service resources in
    its own GCP project

    View Slide

  88. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    Each namespace has its own service account
    for its own GCP project
    RBAC: Team X
    RBAC: Team Y
    Service resources in
    its own GCP project

    View Slide

  89. Each namespace has its own service account

    View Slide

  90. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    Each namespace has its own service account
    for its own GCP project
    Service resources in
    its own GCP project

    View Slide
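
The slides do not show how pods obtain the per-namespace service account. One common approach, assumed here, is to store the account's key in a Kubernetes Secret in the service's namespace and point `GOOGLE_APPLICATION_CREDENTIALS` at the mounted file. The mount path, volume name, Secret name, and the `key.json` file name below are all hypothetical.

```python
from typing import Any, Dict

KEY_DIR = "/var/secrets/google"  # hypothetical mount path for the key file


def with_gcp_credentials(pod_spec: Dict[str, Any], secret_name: str) -> Dict[str, Any]:
    """Return a copy of pod_spec whose containers read a service account key from a Secret."""
    volume = {"name": "gcp-key", "secret": {"secretName": secret_name}}
    containers = []
    for container in pod_spec.get("containers", []):
        patched = dict(container)
        patched["volumeMounts"] = list(container.get("volumeMounts", [])) + [
            {"name": "gcp-key", "mountPath": KEY_DIR, "readOnly": True}
        ]
        # Google client libraries read this variable to locate credentials.
        # The Secret is assumed to contain an entry named key.json.
        patched["env"] = list(container.get("env", [])) + [
            {"name": "GOOGLE_APPLICATION_CREDENTIALS", "value": f"{KEY_DIR}/key.json"}
        ]
        containers.append(patched)
    patched_spec = dict(pod_spec)
    patched_spec["containers"] = containers
    patched_spec["volumes"] = list(pod_spec.get("volumes", [])) + [volume]
    return patched_spec
```

Since Secrets are namespaced, pods in the Service B namespace cannot mount the key that belongs to Service A.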

  91. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    GCP project creation…?
    Setup Spanner or Cloud SQL ..?
    GCP project: GKE Production

    View Slide

  92. Infrastructure as Code

    View Slide

  93. View Slide

  94. CloudSQL instance creation

    View Slide

  95. Spanner instance creation

    View Slide

  96. mercari / microservices-terraform Private

    View Slide

  97. Just create a PR to create new GCP project

    View Slide

  98. Terraform plan on CI

    View Slide

  99. Terraform apply on CI

    View Slide

  100. The tool for notifying Terraform results is open sourced
    https://github.com/mercari/tfnotify
    Terraform apply on CI

    View Slide

  101. Common parts (GCP project creation, PagerDuty setup) can be bootstrapped

    View Slide

  102. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Stackdriver
    GCP project: GKE Production

    View Slide

  103. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Logging…?
    Stackdriver
    GCP project: GKE Production

    View Slide

  104. How to limit access to Stackdriver Logging?
    Each team should be allowed to access
    only its service log

    View Slide

  105. View Slide

  106. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Logging…?
    Stackdriver
    GCP project: GKE Production

    View Slide

  107. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Stackdriver
    BigQuery
    BigQuery
    GCP project: GKE Production
    Create BQ for each service

    View Slide

  108. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Create BQ sink for each service
    Stackdriver
    BigQuery
    BigQuery
    sink
    sink
    GCP project: GKE Production
    Create BQ for each service

    View Slide

  109. BigQuery sink creation

    View Slide
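
A per-service sink boils down to a log filter that selects one namespace plus a BigQuery destination in the service's own project. The sketch below builds those values as a dictionary with the name, filter, and destination fields a log sink needs. The sink naming scheme, project ID, and dataset name are hypothetical, and the filter uses the current `k8s_container` resource type, which may differ from the one in use at the time of the talk.

```python
from typing import Dict


def bigquery_sink(namespace: str, project_id: str, dataset: str) -> Dict[str, str]:
    """Describe a log sink that exports one namespace's container logs to BigQuery."""
    # Select only container logs from the given Kubernetes namespace.
    log_filter = (
        'resource.type="k8s_container" AND '
        f'resource.labels.namespace_name="{namespace}"'
    )
    return {
        "name": f"{namespace}-to-bigquery",  # hypothetical naming scheme
        "filter": log_filter,
        "destination": f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset}",
    }
```

For example, `bigquery_sink("service-a", "service-a-prod", "service_a_logs")` describes a sink that sends only Service A's logs to a dataset that Team X can query through IAM on its own project.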

  110. View Slide

  111. GCP and k8s Ecosystem

    View Slide

  112. Just create an Ingress and it automatically creates DNS records
    with Cloud DNS

    View Slide

  113. Disaster Recovery
    Take backups of your cluster and restore in case of loss.
    with Cloud Storage

    View Slide

  114. Non GCP?

    View Slide

  115. Notification or Integration with GitHub
    vs. Container Builder

    View Slide

  116. Integration with external services like CDN or AWS
    vs. Stackdriver monitoring

    View Slide

  117. vs. Stackdriver Error Reporting
    Notification and Integration with GitHub

    View Slide

  118. vs. ??
    GCP does not have chaos as a service

    View Slide

  119. Conclusion

    View Slide

  120. Mercari ❤

    View Slide

  121. @deeeet

    View Slide