Microservices on GKE at Mercari

taichi nakashima
April 26, 2018

Transcript

  1. Microservices
    On GKE At Mercari
    GCPUG Tokyo Kubernetes Engine Day
    @deeeet

  2. Start with Monolith

  3. Small overhead across domains
    Reusable code across domains
    Effective operation by SRE team

  4. 3 scalabilities

  5. Growth of business
    Growth of features
    Growth of organization

  6. Growth of business
    Growth of features
    Growth of organization

  7. Growth of business
    Growth of features
    Growth of organization

  8. Huge Monolith

  9. Difficult to understand the effect of changes
    Difficult to test
    Difficult to on-board
    Difficult to isolate failure
    Difficult to scale independently
    Difficult to try new technologies

  10. Growth of business
    Growth of features
    Growth of organization

  11. Unclear ownership
    Communication overhead

  12. Velocity is stalled ☔

  13. Microservices

  14. Microservices is a software development technique that structures an
    application as a collection of loosely coupled services with the
    smallest autonomous boundary.

  15. Technical benefit
    Organization benefit

  16. Technical benefit
    Organization benefit

  17. Easy to test
    Easy to deploy
    Easy to on-board
    Easy to isolate failure
    Easy to scale independently

  18. Technical benefit
    Organization benefit

  19. Clear ownership
    Minimum communication overhead

  20. Deliver new features faster ☀

  21. How Microservices?

  22. Gateway pattern
    Strangler pattern

  23. Gateway pattern
    Strangler pattern

  24. Service A
    Service B
    Mercari API

  25. API Gateway
    Service A
    Service B
    Mercari API

  26. API Gateway
    Service A
    Service B
    Service X
    Mercari API

  27. API Gateway
    Service A
    Service B
    Service X
    Multiple services on a single endpoint
    SSL Termination
    DDoS Protection
    Common AuthZ/AuthN
    Mercari API
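
A minimal sketch of how a gateway can expose multiple services on a single endpoint, assuming simple path-prefix routing. The prefixes and backend names below are hypothetical and are not taken from the talk; requests that match no prefix fall through to the existing Mercari API.

```python
# Hypothetical route table: path prefix -> backend name.
ROUTES = {
    "/service-a/": "service-a",
    "/service-b/": "service-b",
    "/service-x/": "service-x",
}

# Anything not claimed by a microservice still goes to the monolith.
DEFAULT_BACKEND = "mercari-api"


def resolve_backend(path, routes=ROUTES, default=DEFAULT_BACKEND):
    """Return the backend for a request path using longest-prefix matching."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return default
    return routes[max(matches, key=len)]


if __name__ == "__main__":
    for path in ("/service-a/items", "/v1/items"):
        print(path, "->", resolve_backend(path))
```

Because unmatched paths default to the monolith, new services can be added to the table one at a time without changing what clients call.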

  28. Gateway pattern
    Strangler pattern

  29. Mercari API
    API Gateway
    Service A
    Service B
    Service X

  30. Mercari API
    API Gateway
    Service B
    Service X Service A

  31. Mercari API
    API Gateway
    Service X Service A Service B

  32. Mercari API
    API Gateway
    Function X
    Function Y
    Function Z
    Service C

  33. Mercari API
    API Gateway
    Function X
    Facade C
    Function Y
    Function Z
    Service C

  34. Mercari API
    API Gateway
    Facade C
    Function Y
    Function Z
    Service C
    Function X

  35. Mercari API
    API Gateway
    Facade C
    Function Z
    Service C
    Function X
    Function Y

  36. Mercari API
    API Gateway
    Facade C
    Service C
    Function X
    Function Y
    Function Z

  37. Mercari API
    API Gateway
    Service C
    Function X
    Function Y
    Function Z

  38. Mercari API
    API Gateway
    Service C
    Function X
    Function Y
    Service D
    Function Z
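
The migration steps in slides 32 to 38 can be sketched as a facade that sends each function either to the monolith or to the new service. This is an illustration under assumed names (`Facade`, `migrate`, `call`), not the implementation described in the talk: functions move one at a time, and the facade becomes removable once nothing is left behind it.

```python
class Facade:
    """Routes each function either to the monolith or to the new service."""

    def __init__(self, monolith, service):
        self._monolith = monolith  # dict: function name -> callable
        self._service = service    # dict: function name -> callable
        self._migrated = set()

    def migrate(self, name):
        """Switch one function over to the new service."""
        if name not in self._service:
            raise KeyError(f"{name} is not implemented in the new service")
        self._migrated.add(name)

    def call(self, name, *args):
        """Invoke the function on whichever side currently owns it."""
        target = self._service if name in self._migrated else self._monolith
        return target[name](*args)

    def fully_migrated(self):
        """True once every monolith function has been moved."""
        return self._migrated >= set(self._monolith)


if __name__ == "__main__":
    monolith = {"x": lambda: "monolith:x", "y": lambda: "monolith:y"}
    service = {"x": lambda: "service:x", "y": lambda: "service:y"}
    facade = Facade(monolith, service)
    facade.migrate("x")
    print(facade.call("x"), facade.call("y"), facade.fully_migrated())
```

Callers only ever talk to the facade, so moving a function is a routing change rather than a client change.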

  39. Current Status

  40. API Gateway
    Service A
    Service B
    Service X
    Mercari API

  41. Technical Stack

  42. API Gateway
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API

  43. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine

  44. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services

  45. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container

  46. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP

  47. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    SSL Termination
    DDoS Protection
    Cloud Armor?

  48. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    Routing to microservices
    Protocol transformation (HTTP to gRPC)
    Common logging & Tracing
    Request buffering
    SSL Termination
    DDoS Protection
    Cloud Armor?

  49. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    Routing to microservices
    Protocol transformation (HTTP to gRPC)
    Common logging & Tracing
    Request buffering
    SSL Termination
    DDoS Protection
    Cloud Armor?
    Common AuthZ/AuthN

  50. API Gateway
    Google Cloud Load balancing
    Authority
    Service A
    Service B
    Sakura
    Service X
    Mercari API
    GCP
    Kubernetes Engine
    Cloud Resources
    Managed Services
    Container
    Over HTTP
    Routing to microservices
    Protocol transformation (HTTP to gRPC)
    Common logging & Tracing
    Request buffering
    SSL Termination
    DDoS Protection
    Cloud Armor?
    Common AuthZ/AuthN
    Managed DB

  51. Another important takeaway is that even though all of these listed
    items are important, ultimately the most critical thing is observability.
    As I like to say: observability, observability, observability
    - Matt Klein, Seeking SRE (Chapter 6)

  52. Service A Service B
    Network
    Logging? Tracing? (Observability)
    Network
    Logging? Tracing? (Observability)

  53. Service A Service B
    Network
    AuthN and AuthZ?
    API limit ?
    Load balancing ?
    Request timeout ?
    Request retry with backoff?
    Circuit breaking ?
    Logging? Tracing? (Observability)
    Network
    Logging? Tracing? (Observability)

  54. Service A Service B
    Network
    AuthN and AuthZ?
    API limit ?
    Load balancing ?
    Request timeout ?
    Request retry with backoff?
    Circuit breaking ?
    Logging? Tracing? (Observability)
    Network
    Logging? Tracing? (Observability)
    Different protocols..
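
Two of the questions above, request retry with backoff and circuit breaking, might look like the following sketch. The thresholds, delays, and class names are illustrative assumptions; the talk does not say how these concerns are implemented, and in practice they are often handled by a shared library or proxy rather than by each service.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func):
        # While open, fail fast until the reset timeout has elapsed.
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def retry_with_backoff(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func, retrying failures with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # Retrying against an open circuit is pointless.
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `clock` and `sleep` parameters exist only so the behavior can be exercised without real waiting.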

  55. Service A Service B
    Service C
    Service D

  56. Service A Service B
    Service C
    Service D
    Se
    Se
    Se

  57. How do we use GCP?

  58. API Gateway
    Google Cloud Load balancing
    Authority
    Service X
    GCP
    Kubernetes Engine

  59. API Gateway
    Google Cloud Load balancing
    Authority
    Service X
    GCP
    Kubernetes Engine
    How do we use GKE?

  60. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

  61. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

  62. asia-northeast1
    us-west1
    europe-west1
    Each region has its own Cluster

  63. Production Cluster
    Development Cluster
    Testing/QA is done in
    the development cluster
    All services in 1 cluster
    No special cluster for specific service

  64. Production Cluster
    In the future, 1 region 1 cluster
    like Google Borg

  65. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

  66. GCP project: GKE Production
    Production Cluster
    IAM: SRE
    GCP project: GKE Development
    Development Cluster
    IAM: SRE + α
    1 cluster for 1 GCP project
    Only SRE can access cluster nodes

  67. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

  68. GCP project: GKE Production
    Production Cluster
    n1-standard-16
    node pool
    n1-highmem-16
    node pool
    Machine learning workloads
    Normal applications
    Auto scaling Enabled
    Automatic node repair Enabled
    Preemptible Enabled (only in US)

  69. Cluster strategy
    GCP project strategy
    Node pool strategy
    Namespace strategy

  70. Each service has its own
    Kubernetes namespace
    GCP project: GKE Production
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    Production Cluster
    RBAC: Team X
    RBAC: Team Y
    Each team can only access
    its own Kubernetes namespace
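
One possible way to express "one namespace per service, one team per namespace" is a Namespace plus a RoleBinding, built here as plain Python dictionaries that could be serialized to YAML or JSON. The group name and the choice of the built-in `edit` ClusterRole are assumptions for illustration; the slides do not say which roles each team is granted.

```python
def namespace_manifests(service, team_group, role="edit"):
    """Build a Namespace and a RoleBinding scoped to that namespace."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": service},
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        # A RoleBinding only grants access inside its own namespace.
        "metadata": {"name": f"{service}-team", "namespace": service},
        "subjects": [
            {
                "kind": "Group",
                "name": team_group,  # hypothetical group identifier
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": role,  # assumed role; not stated in the slides
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [namespace, role_binding]


if __name__ == "__main__":
    import json

    print(json.dumps(namespace_manifests("service-a", "team-x@example.com"), indent=2))
```

Binding a ClusterRole through a namespaced RoleBinding reuses one role definition while keeping each team's access limited to its own namespace.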

  71. API Gateway
    Google Cloud Load balancing
    Authority
    Service X
    GCP
    Kubernetes Engine
    How do we use GCP services?

  72. How to limit access to GCP services?
    Each service should be allowed to access
    only its own GCP resources

  73. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    Production Cluster
    RBAC: Team X
    RBAC: Team Y

  74. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    GCP project: Service B
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    RBAC: Team X
    RBAC: Team Y

  75. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    RBAC: Team X
    RBAC: Team Y
    Service resources in
    its own GCP project

  76. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    Pod: A Pod: A Pod: A
    Namespace: Service B
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    Each namespace has its own service account
    for its own GCP project
    RBAC: Team X
    RBAC: Team Y
    Service resources in
    its own GCP project

  77. Each namespace has its own service account
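
The per-service layout described in the preceding slides (one namespace, one GCP project, one service account) can be sketched as a small data structure. The `example` prefix and the project ID scheme are hypothetical; only the `NAME@PROJECT_ID.iam.gserviceaccount.com` shape is GCP's standard service account email format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLayout:
    namespace: str
    gcp_project: str
    service_account: str


def layout_for(service, env="prod", org_prefix="example"):
    """Derive the namespace, project, and service account for one service."""
    project = f"{org_prefix}-{service}-{env}"  # hypothetical naming scheme
    account = f"{service}@{project}.iam.gserviceaccount.com"
    return ServiceLayout(namespace=service, gcp_project=project, service_account=account)


def may_access(layout, resource_project):
    """A service may only touch resources in its own GCP project."""
    return layout.gcp_project == resource_project


if __name__ == "__main__":
    a = layout_for("service-a")
    print(a)
    print(may_access(a, "example-service-b-prod"))
```

In a real setup the access rule is enforced by IAM bindings on each project, not by application code; `may_access` only states the intended policy.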

  78. GCP project: GKE Production
    IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Each service has its own GCP project
    Each namespace has its own service account
    for its own GCP project
    Service resources in
    its own GCP project

  79. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    GCP project creation…?
    Setup Spanner or Cloud SQL ..?
    GCP project: GKE Production

  80. Infrastructure as Code

  81. Cloud SQL instance creation

  82. Spanner instance creation

  83. mercari / microservices-terraform Private

  84. Just create a PR to create a new GCP project

  85. Terraform plan on CI

  86. Terraform apply on CI

  87. The tool for notifying Terraform results is open source
    https://github.com/mercari/tfnotify
    Terraform apply on CI

  88. Common parts (GCP project creation, PagerDuty setup) can be bootstrapped

  89. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Stackdriver
    GCP project: GKE Production

  90. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Logging…?
    Stackdriver
    GCP project: GKE Production

  91. How to limit access to Stackdriver Logging?
    Each team should be allowed to access
    only its own service's logs

  92. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Logging…?
    Stackdriver
    GCP project: GKE Production

  93. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Stackdriver
    BigQuery
    BigQuery
    GCP project: GKE Production
    Create BQ for each service

  94. IAM: SRE
    Namespace: Service A
    RBAC: Team X
    Pod: A Pod: A Pod: A
    Namespace: Service B
    RBAC: Team Y
    Pod: B Pod: B
    GCP project: Service A
    IAM: Team X + SRE
    Cloud SQL
    GCP project: Service B
    Spanner
    IAM: Team Y + SRE
    Production Cluster
    Create BQ sink for each service
    Stackdriver
    BigQuery
    BigQuery
    sink
    sink
    GCP project: GKE Production
    Create BQ for each service

  95. BigQuery sink creation
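
A per-service BigQuery sink comes down to a destination dataset plus a log filter that selects one namespace. The sketch below builds such a definition as a dictionary. The sink and dataset naming, and the use of the `k8s_container` resource type with the `namespace_name` label, are assumptions; older Stackdriver logging setups use a different resource type and label, so the filter may need adjusting.

```python
def bigquery_sink(service, service_project, dataset_suffix="_logs"):
    """Describe a log sink that exports one namespace's logs to BigQuery."""
    # BigQuery dataset IDs allow letters, digits, and underscores only.
    dataset = service.replace("-", "_") + dataset_suffix
    return {
        "name": f"{service}-bq-sink",  # hypothetical naming scheme
        "destination": (
            f"bigquery.googleapis.com/projects/{service_project}/datasets/{dataset}"
        ),
        # Assumed filter: only logs from this service's namespace.
        "filter": (
            'resource.type="k8s_container" AND '
            f'resource.labels.namespace_name="{service}"'
        ),
    }


if __name__ == "__main__":
    print(bigquery_sink("service-a", "example-service-a-prod"))
```

Because the dataset lives in the service's own GCP project, the project's IAM settings can restrict each team to its own logs.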

  96. GCP and k8s Ecosystem

  97. Just create an Ingress and DNS records are created automatically
    with Cloud DNS

  98. Disaster Recovery
    Take backups of your cluster and restore in case of loss.
    with Cloud Storage

  99. Notification or Integration with GitHub
    vs. Container Builder

  100. Integration with external services like CDN or AWS
    vs. Stackdriver monitoring

  101. vs. Stackdriver error report
    Notification and Integration with GitHub

  102. vs. ??
    GCP does not have chaos as a service
