How We Harden Platform Security at Mercari

These are the slides for the CloudNative Days Tokyo 2021 keynote (https://event.cloudnativedays.jp/cndt2021/talks/1208).

At Mercari, we have been building an internal development platform on top of Kubernetes and the cloud-native ecosystem for more than 3 years. The history of building the platform is also a history of security hardening. In this session, I introduce the security hardening we have implemented, from basic k8s manifest security policy enforcement to supply chain integrity checking, IaC automation security, and zero-touch-based access automation.

taichi nakashima

November 04, 2021

Transcript

  1. How We Harden Platform Security
    CloudNative Days Tokyo 2021

  3. How We Harden Platform Security
    CloudNative Days Tokyo 2021

  4. Taichi Nakashima
    @deeeet / @tcnksm
    Engineering head of Developer Productivity Engineering

  5. https://e34.fm

  6. Table of Contents

    ● Microservices Platform Overview
    ● 3 Cases of Hardening Platform Security
      ○ Multi-tenant Security
      ○ Production Operation Security
      ○ Supply Chain Security
    ● Lessons Learned

  7. Microservices Platform Overview

  8. Mercari and Merpay Microservices
    [Diagram: Services A to E running on Google Kubernetes Engine]
    ● 200+ microservices
    ● 4000+ Kubernetes Pods
    ● 2 main businesses

  9. [Diagram: the Service A, Service B, and Service C teams each own their service on the Platform, which is run by the Platform Team. Mercari SRE and Merpay SRE work closely with the teams or are embedded in them.]

  10. Harden Platform Security

  11. Base Principle: Shared Responsibility


  12. Base Principle: Defense in Depth


  13. Harden Platform Security

    ● Multi-tenant Security
    ● Production Operation Security
    ● Supply Chain Security

  14. Multi-tenancy

    Multi-tenancy is an architecture pattern that, instead of building a platform
    per business or service, prepares an isolated tenant per service and hosts them
    together on a single platform.
    While multi-tenancy increases complexity, it lets you avoid reinventing the wheel
    across the organization, reduce operational costs, and share improvements with
    all tenants.

  15. Principle: Least Privilege

    Least privilege means a human user or workload must be able to access only
    the resources that are necessary for its legitimate purpose.
    In a multi-tenancy context, it is important to make sure that only a tenant's
    owners can access that tenant's resources.

  16. Multi-tenant Least Privileges on

    ● Kubernetes Cluster
    ● Infrastructure as Code (IaC) Monorepo and Build System

  17. [Diagram: a Kubernetes Cluster with a Service A Namespace, a Service B Namespace, and a System Namespace, each holding its own containers and resources. RBAC lets the Service A Team reach the Service A Namespace only; the other two namespaces are blocked (🚫).]

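The namespace isolation on this slide can be expressed as one RoleBinding per tenant namespace. The following is a minimal sketch in Python, not Mercari's actual configuration: the group name `service-a-team@example.com` and the helper `namespace_role_binding` are hypothetical, while `view` is one of the default user-facing ClusterRoles in Kubernetes RBAC.

```python
import json


def namespace_role_binding(team_group: str, namespace: str, role: str = "view") -> dict:
    """Build a RoleBinding manifest that scopes a team's access to one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        # A RoleBinding is namespaced, so the grant cannot leak into other tenants.
        "metadata": {"name": f"{namespace}-{role}", "namespace": namespace},
        "subjects": [
            {
                "kind": "Group",
                "name": team_group,  # hypothetical group name
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": role,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


binding = namespace_role_binding("service-a-team@example.com", "service-a")
print(json.dumps(binding, indent=2))
```

Because no binding exists for the team in the Service B or System namespaces, requests there are denied by default.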

  18. Multi-tenant Least Privileges on

    ● Kubernetes Cluster
    ● Infrastructure as Code (IaC) Monorepo and Build System

  19. [Diagram: the Service A Team, Service B Team, and Platform Team configure resources by PR to the IaC Monorepo. The Build System runs a CI System per tenant (Service A Tenant, Service B Tenant, System Tenant), and each tenant holds its own containers and resources.]

  20. [Diagram repeated from slide 19: teams configure their tenants by PR to the IaC Monorepo, and the Build System runs a CI System per tenant.]

  21. Infra as Code Monorepo
    /service-a    ✅ Service A Team is CODEOWNER
      module.tf
      google_spanner_database.tf
      google_storage_bucket.tf
      ...
    /service-b    🚫 Service A Team
      module.tf
      google_bigquery_dataset.tf
      google_pubsub_topic.tf
      ...
    /system       🚫 Service A Team
      module.tf
      google_container_cluster.tf
      google_compute_firewall.tf
      ...

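The ownership rule on this slide can be illustrated with a small check over the files a PR touches. This is a simplified sketch: it matches on the top-level directory only and does not implement the real CODEOWNERS pattern syntax, and the team names and the helper names `owner_of` and `paths_needing_other_approval` are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical ownership table: top-level directory -> owning team.
OWNERS: Dict[str, str] = {
    "service-a": "service-a-team",
    "service-b": "service-b-team",
    "system": "platform-team",
}


def owner_of(path: str) -> Optional[str]:
    """Return the team that owns the top-level directory of a changed file."""
    top_level = path.strip("/").split("/")[0]
    return OWNERS.get(top_level)


def paths_needing_other_approval(author_team: str, changed: List[str]) -> List[str]:
    """List changed files that the author's team does not own."""
    return [p for p in changed if owner_of(p) != author_team]


pr_files = [
    "service-a/google_storage_bucket.tf",
    "system/google_compute_firewall.tf",
]
blocked = paths_needing_other_approval("service-a-team", pr_files)
print(blocked)
```

In this example the Service A Team can approve its own bucket change, while the firewall change under `/system` still needs an owner from another team.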

  22. [Diagram repeated from slide 19: teams configure their tenants by PR to the IaC Monorepo, and the Build System runs a CI System per tenant.]

  23. [Diagram: the CI Systems in the Build System use one Build Account, which has IAM access to the Service A Tenant, the Service B Tenant, and the System Tenant.]

  24. [Diagram: the Build Account (keyless) impersonates one account per tenant: Service A Account, Service B Account, and System Account (all keyless). Each of these accounts has IAM access to its own tenant.]

  25. [Diagram: for a Service A build, the Build Account (keyless) impersonates the Service A Account (keyless) and receives a short-lived token. The token has IAM access to the Service A Tenant; the Service B Tenant and System Tenant are blocked (🚫).]

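The effect of impersonation with short-lived tokens can be modelled in a few lines. This is a toy model under stated assumptions, not a cloud IAM client: the `ShortLivedToken` type, the `impersonate` and `can_access` helpers, the account naming scheme, and the 15-minute lifetime are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ShortLivedToken:
    account: str          # tenant account the token was minted for
    expires_at: datetime  # the token is useless after this instant


def tenant_account(tenant: str) -> str:
    """Hypothetical naming scheme: one account per tenant."""
    return f"{tenant}-account"


def impersonate(tenant: str, now: datetime,
                ttl: timedelta = timedelta(minutes=15)) -> ShortLivedToken:
    """The build account holds no key; it only mints a token for one tenant account."""
    return ShortLivedToken(tenant_account(tenant), now + ttl)


def can_access(token: ShortLivedToken, tenant: str, now: datetime) -> bool:
    """A token works only for its own tenant and only until it expires."""
    return now < token.expires_at and token.account == tenant_account(tenant)


now = datetime(2021, 11, 4, tzinfo=timezone.utc)
token = impersonate("service-a", now)
print(can_access(token, "service-a", now))                       # own tenant
print(can_access(token, "service-b", now))                       # other tenant
print(can_access(token, "service-a", now + timedelta(hours=1)))  # after expiry
```

The point of the design is that a leaked token is bounded both in scope (one tenant) and in time (the TTL), and there is no long-lived key to leak in the first place.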

  26. Harden Platform Security

    ● Multi-tenant Security
    ● Production Operation Security
    ● Supply Chain Security

  27. https://sre.google/books/building-secure-reliable-systems/

  28. Goal: Zero Touch Production

    The specific goal of these interfaces—like Zero Touch Production (ZTP) ,..., is to
    make Google safer and reduce outages by removing direct human access to
    production roles. Instead, humans have indirect access to production through
    tooling and automation that make predictable and controlled changes to
    production infrastructure.
    - Building Secure and Reliable Systems, Chapter 5

  29. [Diagram: the Service A Team has View access to the Service A Namespace in the Kubernetes Cluster. Edit goes through the IaC Repository + Build System. The Service B Namespace and System Namespace are also shown.]

  30. [Diagram: as on slide 29, the Service A Team has View access to the Service A Namespace and Edit goes through the IaC Repository + Build System. A Temporary Role Grant adds a second Edit path.]

  31. [Diagram: as on slide 30, the Service A Team has View access, and Edit goes through the IaC Repository + Build System or a Temporary Role Grant. Automated Workflows add a third Edit path.]

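A Temporary Role Grant can be thought of as a role binding with an expiry time. The sketch below assumes that `view` is the standing role and that `edit` is only ever granted temporarily; the `TemporaryGrant` type, the `effective_roles` helper, and the member name are hypothetical and do not describe Mercari's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Set


@dataclass(frozen=True)
class TemporaryGrant:
    member: str
    namespace: str
    role: str
    expires_at: datetime


def effective_roles(member: str, namespace: str,
                    grants: List[TemporaryGrant], now: datetime) -> Set[str]:
    """Standing access is read-only; extra roles count only until they expire."""
    roles = {"view"}
    for grant in grants:
        if (grant.member == member
                and grant.namespace == namespace
                and now < grant.expires_at):
            roles.add(grant.role)
    return roles


now = datetime(2021, 11, 4, 9, 0, tzinfo=timezone.utc)
grants = [TemporaryGrant("alice", "service-a", "edit", now + timedelta(hours=1))]

print(effective_roles("alice", "service-a", grants, now))
print(effective_roles("alice", "service-a", grants, now + timedelta(hours=2)))
```

Once the grant expires, access falls back to read-only without anyone having to remember to revoke it.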

  32. Harden Platform Security

    ● Multi-tenant Security
    ● Production Operation Security
    ● Supply Chain Security

  33. [Diagram: the software supply chain, with the stages Source, Build, Registry, Deploy, and Cluster, plus Dependency as an input.]

  34. [Diagram: threats along the supply chain (Source, Build, Registry, Deploy, Cluster, Dependency)]
    ● Bypass code review
    ● Compromise source control system
    ● Alter code
    ● Compromise build system
    ● Inject bad/vulnerable dependency
    ● Compromise artifact registry
    ● Inject bad container image
    ● Compromise deploy system
    ● Use bad image

  35. Practice: Verify Artifacts, Not Just People
    The controls around the source, build, and test infrastructure have limited
    effect if adversaries can bypass them by deploying directly to production. It is
    not sufficient to verify who initiated a deployment, because that actor may
    make a mistake or may be intentionally deploying a malicious change.
    Instead, deployment environments should verify what is being deployed.
    - Building Secure and Reliable Systems, Chapter 14

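  36. [Diagram: the supply chain (Source, Build, Registry, Deploy, Cluster, Dependency) with three additions: a "Sign" step, a Metadata store, and a "Check" step performed by Kritis at the Cluster.]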

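The sign-then-check flow can be sketched with the Python standard library. This is an illustration of the idea only: it uses an HMAC with a shared demo key as a stand-in for the signature, whereas a real deploy-time check would typically rely on asymmetric signatures, and the names `sign`, `admit`, and `SIGNING_KEY` are hypothetical.

```python
import hashlib
import hmac
from typing import Dict

SIGNING_KEY = b"demo-key-held-by-the-build-system"  # placeholder, not a real secret


def image_digest(image_bytes: bytes) -> str:
    """Identify an image by the hash of its content, not by a mutable tag."""
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()


def sign(digest: str) -> str:
    """Build side: produce a signature over the digest (HMAC stand-in)."""
    return hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()


def admit(digest: str, metadata: Dict[str, str]) -> bool:
    """Deploy side: admit only images whose digest has a valid signature."""
    signature = metadata.get(digest)
    return signature is not None and hmac.compare_digest(signature, sign(digest))


# The trusted build signs what it produced and records it as metadata.
built = image_digest(b"image built by the trusted pipeline")
metadata = {built: sign(built)}

# An image pushed around the pipeline has no signature for its digest.
tampered = image_digest(b"image injected directly into the registry")

print(admit(built, metadata))
print(admit(tampered, metadata))
```

The check looks at what is being deployed (the digest and its signature) rather than at who triggered the deployment, which is the practice quoted on the previous slide.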

  37. Lessons Learned

  38. Secure By Default

    Security hardening after the fact = “migration”, which takes a lot of time and cost.
    Build the security policy as an allowlist instead of a denylist!

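The difference between an allowlist and a denylist shows up when something new appears. The sketch below checks the volume types in a pod spec against an allowlist; the allowed set and the helper `disallowed_volumes` are assumptions chosen for illustration, not Mercari's actual policy.

```python
from typing import List

# Hypothetical allowlist: only these volume types may be used by tenants.
ALLOWED_VOLUME_TYPES = {"configMap", "secret", "emptyDir"}


def disallowed_volumes(pod_spec: dict) -> List[str]:
    """Report every volume whose type is not explicitly allowed."""
    found = []
    for volume in pod_spec.get("volumes", []):
        for key in volume:
            # Each volume entry has a "name" plus one key naming its type.
            if key != "name" and key not in ALLOWED_VOLUME_TYPES:
                found.append(f"{volume.get('name', '?')}: {key}")
    return found


pod_spec = {
    "volumes": [
        {"name": "config", "configMap": {"name": "app-config"}},
        {"name": "host", "hostPath": {"path": "/var/run"}},
    ]
}
print(disallowed_volumes(pod_spec))
```

With a denylist, a risky volume type that nobody thought to list would pass; with an allowlist it is rejected until someone deliberately allows it, so no later migration is needed to close the gap.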

  39. Build Abstraction

    Hide infrastructure and security complexity from developers and let experts
    control it centrally in the background.
    This makes future migrations easy!

  40. Example abstraction built internally at Mercari with CUE

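The transcript does not include the CUE example itself, so the following is only a sketch of the general idea, written in Python rather than CUE: developers provide a small spec, and a centrally maintained function expands it into a full Kubernetes Deployment with security defaults. The spec fields, the `expand` function, and the image path are hypothetical; the manifest fields used are standard Kubernetes Deployment and securityContext fields.

```python
def expand(spec: dict) -> dict:
    """Expand a small developer-facing spec into a Deployment with secure defaults."""
    labels = {"app": spec["name"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec["name"], "namespace": spec["name"]},
        "spec": {
            "replicas": spec.get("replicas", 2),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Security settings are owned by the platform, not by each team.
                    "securityContext": {"runAsNonRoot": True},
                    "containers": [
                        {
                            "name": spec["name"],
                            "image": spec["image"],
                            "securityContext": {
                                "allowPrivilegeEscalation": False,
                                "readOnlyRootFilesystem": True,
                            },
                        }
                    ],
                },
            },
        },
    }


# What a developer writes (hypothetical spec and image path).
service_spec = {"name": "service-a", "image": "registry.example.com/service-a:1.0.0"}
manifest = expand(service_spec)
print(manifest["spec"]["template"]["spec"]["containers"][0]["securityContext"])
```

If the platform team later tightens a default, it changes `expand` once and every service picks it up on its next deploy, which is what keeps future migrations cheap.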

  41. Thank you!

  42. We are Hiring!
    https://careers.mercari.com/search-jobs/
