hhiroshell
December 08, 2025

Kubernetes Multi-tenancy: Principles and Practices for Large Scale Internal Platforms

This is for my presentation at Open Source Summit Japan 2025.

# Description:
Platform Engineering enables developers to focus on business value-aligned tasks by providing internal developer platforms (IDPs) that streamline development workflows. Kubernetes is widely used as a foundation for IDPs thanks to its scalability and flexibility.

As organizations grow, supporting multiple development projects often requires a multi-tenant platform built on Kubernetes. To meet demands for security, resource efficiency, and scalability in a multi-tenant scenario, it is essential to understand and effectively use the Kubernetes features that support them. Moreover, real-world use cases often require widening the scope to other cloud native technologies that are used to deliver IDP capabilities on Kubernetes.

In this session, I will explore key considerations for designing a multi-tenant Kubernetes-based platform, drawing on real-world experience developing and operating one. The content spans Kubernetes fundamentals through relevant technologies across the cloud native ecosystem. Participants will gain a comprehensive view of the multi-tenant landscape and practical insights for operating their own platforms at scale.

Transcript

  1. 1 Open Source Summit Japan 2025 Hiroshi Hayakawa, LY Corporation

    Kubernetes Multi-tenancy: Principles and Practices for Large Scale Internal Platforms
  2. About Me

    Hiroshi Hayakawa | @hhiroshell
    • Working for LY Corporation
      - An internet company that offers various services, including communication, internet portals, media, and commerce, primarily in Japan.
    • Contributing to the CNCF platform engineering community group
    • Author of books on Kubernetes
    • DIY keyboard enthusiast
  3. Agenda

    1. Background
    2. Architecture Patterns for Kubernetes Based IDPs
    3. Practices for Multi-tenant Kubernetes Based IDPs
    4. Conclusion
  5. What is Platform Engineering?

    • Recent IT paradigms have shortened release cycles but also burdened developers with mastering many tools (cf. extraneous cognitive load):
      - Cloud Infrastructure & IaC
      - Microservices
      - Continuous Integration & Delivery (CI/CD)
      - DevOps
  6. What is Platform Engineering?

    • An initiative to provide a foundation called an IDP (Internal Developer Platform)
    • An IDP lets internal developers focus on creating essential value for their business
    [Diagram labels: versatile but burdensome tools and infrastructure; IDP; Platform Team (provide); Developers (use)]
  7. Kubernetes is an IDP, right?

    • It streamlines application development and day-to-day operations by providing:
      - abstraction of low-level computing resources
      - declarative workload management
      - auto-healing and scaling
      - safe application updates
    • It is extensible, so it can be tailored to organizational requirements
  8. Dedicated Kubernetes Cluster for Each Tenant + Addons

    [Diagram: Developers request a cluster through a CLI / GUI; a Crossplane composition on the Platform Cluster (EKS Provider, Helm Provider) creates the cluster and applies policies and addons (✓ RBAC ✓ ingress controller ✓ monitoring agent ✓ …); Developers use the resulting cluster.]
  10. Dedicated Kubernetes Cluster for Each Tenant + Addons

    • Pros
      - Strong isolation: security / fault containment / cluster lifecycle
      - High configurability for users
    • Cons
      - Operational overhead for managing a large number of dedicated clusters
      - Low resource efficiency

    In organizations with lots of tenants, the cons really start to show. This is where multi-tenancy comes in!
  11. (Multi-Tenancy) is Not a Silver Bullet

    Find the right balance for your organization:
    • Isolation
    • Configurability
    • Operational overhead
    • Resource efficiency
  13. Architecture Patterns for Multi-tenant, Kubernetes Based IDP

    • #1: Single large Kubernetes cluster
    • #2: Multiple Kubernetes clusters with a single multi-tenant API
  15. #1 Single large Kubernetes cluster

    [Diagram: a Developer and one large cluster running many Application Pods alongside System Component Pods. Label: Namespace / vCluster.]
  16. #2 Multiple Kubernetes clusters with single multi-tenant API

    [Diagram: Developers, a Control Plane Cluster with a Cluster Scheduler (Custom Controller), and Workload Clusters running Application Pods and System Component Pods. Caption: Interact with developers’ own namespaces.]
  17. #2 Multiple Kubernetes clusters with single multi-tenant API

    [Diagram: Developers, a Control Plane Cluster with a Cluster Scheduler (Custom Controller), and Workload Clusters running Application Pods and System Component Pods. Caption: Interact with developers’ own tenancy endpoint.]
  23. Single Large Cluster vs. Multiple Clusters

    [Chart: Single Large Cluster compared with Multiple Clusters on Isolation, Configurability, Resource Efficiency, Scalability, and Management Overhead. Annotations on the chart: "Need to handle the less predictable bottlenecks" and "Need to handle the complexity introduced by additional moving parts".]
  24. Which one is the winner?

    ✓ Both are well-proven approaches.
    ✓ The decision should be based on the requirements and skill set of the platform team.
      - management cost and resource efficiency → single large cluster
      - tenant isolation options and predictable scalability → multi-cluster
    ✓ There are different ways to implement each of them (e.g., vCluster, KCP)
  26. Practices for Multi-tenant Kubernetes Based IDPs

    • Leverage Kubernetes features for workload isolation
    • Protect your system components
    • Plan isolation strategies based on workload traits
    • Handle tenant context properly
  27. Leverage Kubernetes features for workload isolation

    • In both the single large cluster and multi-cluster architectures, multiple tenants’ applications share the resources of a single cluster.
    • Kubernetes features can provide various forms of isolation to support this.
  28. Kubernetes’ Isolation Capabilities

    • Computing resources
      - ResourceQuota, LimitRange
      - Set limits on CPU, memory, and other computing resources available to containers and namespaces.
    • Permissions for users
      - Role/RoleBinding
      - ClusterRole/ClusterRoleBinding
      - Enforce least-privilege access boundaries between tenants.
    • Networking
      - NetworkPolicy
      - Restrict pod-to-pod communication across namespaces and tenants.
    • Runtime isolation from lower layers
      - Pod Security Admission
      - Restrict host-level access (privileged container, hostNetwork, hostPID, etc.).
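
As a rough illustration of several of the capabilities listed above, the following sketch builds a Namespace with Pod Security Admission labels, a ResourceQuota, and a NetworkPolicy for one tenant as plain Python dictionaries and prints them as JSON (kubectl accepts JSON manifests as well as YAML). The namespace name `tenant-a`, the object names, and all quota values are hypothetical and do not come from the talk.

```python
import json


def tenant_namespace(name: str) -> dict:
    """Namespace whose label asks Pod Security Admission to enforce the 'restricted' profile."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {"pod-security.kubernetes.io/enforce": "restricted"},
        },
    }


def resource_quota(namespace: str, cpu: str, memory: str, pods: int) -> dict:
    """Cap the total CPU requests, memory requests, and pod count of a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(pods),
            }
        },
    }


def same_namespace_only_policy(namespace: str) -> dict:
    """Allow ingress only from pods that live in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            # A podSelector without a namespaceSelector matches pods in this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    ns = "tenant-a"  # hypothetical tenant namespace
    manifests = [
        tenant_namespace(ns),
        resource_quota(ns, cpu="20", memory="64Gi", pods=100),
        same_namespace_only_policy(ns),
    ]
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

Generating all three objects from one function call per tenant is one way to keep the isolation settings consistent across tenants; RBAC bindings could be added to the same bundle in the same style.
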
  29. Protect Your System Components

    • When a system component runs into trouble, multiple tenants can be affected
    • We need to protect system components from anything, including tenant workloads, that could make them unstable
  30. PriorityClass

    • Assigns higher scheduling priority to system and platform components, ensuring Kubernetes treats them as more important than tenant workloads.
    • Helps keep system components stable, even under heavy load or tenant misbehavior.
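
To make the idea concrete, here is a minimal sketch that builds a PriorityClass manifest and attaches it to a pod spec, again as Python dictionaries. The class name `platform-critical`, the value `1000000`, and the agent image are hypothetical; the talk does not say which values LY Corporation uses.

```python
import json


def priority_class(name: str, value: int, description: str) -> dict:
    """Build a PriorityClass manifest; pods with higher values are scheduled ahead of lower ones."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,  # tenant pods that name no class keep the default priority
        "description": description,
    }


def with_priority(pod_spec: dict, class_name: str) -> dict:
    """Return a copy of a pod spec that references the given PriorityClass."""
    return {**pod_spec, "priorityClassName": class_name}


if __name__ == "__main__":
    pc = priority_class(
        "platform-critical",  # hypothetical name
        1000000,              # hypothetical value, well above the default of 0
        "Platform components that must outrank tenant workloads.",
    )
    agent_spec = with_priority(
        {"containers": [{"name": "agent", "image": "example.invalid/agent:1.0"}]},
        "platform-critical",
    )
    print(json.dumps(pc, indent=2))
    print(json.dumps(agent_spec, indent=2))
```

Leaving `globalDefault` false means tenant workloads that do not name a class stay at the default priority, so only components that opt in are raised above them.
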
  31. Implementing Rate Limiting

    • Apply rate limiting to system components whose load fluctuates based on tenant application behavior. This helps shed excess load before those system components become unstable.
    • Effective for components such as logging and metrics agents.
    [Diagram: containers on a Kubernetes Node, a Telemetry Agent, and a Telemetry Backend.]
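
The talk does not say which rate-limiting algorithm is used. One common choice is a token bucket, sketched below under that assumption: each record costs one token, tokens refill at a fixed rate, and records that arrive when the bucket is empty are shed. The class name and parameters are illustrative only.

```python
import time


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested without sleeping
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the record may be forwarded, False if it should be shed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=100.0, capacity=200.0)  # hypothetical limits
    forwarded = sum(1 for _ in range(1000) if bucket.allow())
    print(f"forwarded {forwarded} of 1000 records, shed the rest")
```

A telemetry agent could keep one bucket per container or per tenant, so that a single noisy workload exhausts only its own budget instead of the agent's overall capacity.
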
  32. Plan isolation strategies based on workload traits

    • Isolate workloads that may affect other tenants or are vulnerable to noisy-neighbor issues.
    • Useful when allocating specific node resources (e.g., GPUs) exclusively to particular tenants.
    • Isolation units can include:
      - individual nodes
      - entire clusters (in multi-cluster setups).
  33. Kubernetes’ Node Isolation Capabilities

    • NodeSelector
      - Schedules a Pod only onto nodes that have specific labels.
      - Simple and explicit, but limited in flexibility.
    • Taints & Tolerations
      - Nodes can “repel” Pods using taints; only Pods with matching tolerations can be scheduled onto them.
      - Useful for creating exclusive nodes or protecting critical workloads from interference.
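
The interplay of the two mechanisms can be sketched with a simplified model of the checks involved. This is not the Kubernetes scheduler's implementation: nodes are reduced to a flat dictionary with `labels` and `taints`, and the label `pool=gpu` and taint `dedicated=tenant-a` are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        # An empty key is only meaningful with Exists, where it tolerates every taint.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_spec: dict, node: dict) -> bool:
    """Check nodeSelector labels first, then every hard taint on the node."""
    labels = node.get("labels", {})
    for key, value in pod_spec.get("nodeSelector", {}).items():
        if labels.get(key) != value:
            return False
    tolerations = pod_spec.get("tolerations", [])
    for taint in node.get("taints", []):
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft preference; it never blocks scheduling
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


if __name__ == "__main__":
    gpu_node = {
        "labels": {"pool": "gpu"},
        "taints": [{"key": "dedicated", "value": "tenant-a", "effect": "NoSchedule"}],
    }
    tenant_a_pod = {
        "nodeSelector": {"pool": "gpu"},
        "tolerations": [
            {"key": "dedicated", "operator": "Equal", "value": "tenant-a", "effect": "NoSchedule"}
        ],
    }
    tenant_b_pod = {"nodeSelector": {"pool": "gpu"}}
    print(can_schedule(tenant_a_pod, gpu_node))
    print(can_schedule(tenant_b_pod, gpu_node))
```

Note that the two mechanisms are complementary: the taint keeps other tenants' pods off the node, while the nodeSelector keeps the owning tenant's pods from landing elsewhere. Exclusive nodes generally need both.
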
  35. Distribution of Resource Consumption per Application

    • It polarizes into a few massive applications and countless tiny ones.
    [Chart: per-application resource consumption; the labeled values are 2500 [core] / 2.5 [TB Mem], 640 [core] / 100 [GB Mem], 360 [core] / 128 [GB Mem], …]
  36. Scheduling Strategies to Avoid Noisy Neighbors

    • Isolate massive applications into dedicated clusters
    [Diagram: a Cluster Scheduler, workload clusters shared with multiple tenants, and workload clusters dedicated to specific tenants.]
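
A placement rule of this kind might look like the sketch below. It is only an illustration of the idea: the talk does not describe the Cluster Scheduler's actual logic, and the thresholds, application names, tenant names, and cluster naming scheme are all hypothetical. The resource figures reuse the three values labeled on the distribution chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    tenant: str
    cpu_cores: int
    memory_gb: int


# Hypothetical cut-offs; the talk does not state where "massive" begins.
DEDICATED_CPU_CORES = 500
DEDICATED_MEMORY_GB = 1000


def place(app: App) -> str:
    """Send massive applications to a tenant-dedicated cluster, everything else to the shared pool."""
    if app.cpu_cores >= DEDICATED_CPU_CORES or app.memory_gb >= DEDICATED_MEMORY_GB:
        return f"dedicated-{app.tenant}"
    return "shared"


if __name__ == "__main__":
    apps = [
        App("app-1", "tenant-a", 2500, 2500),  # 2500 cores / 2.5 TB
        App("app-2", "tenant-b", 640, 100),
        App("app-3", "tenant-c", 360, 128),
    ]
    for app in apps:
        print(app.name, "->", place(app))
```

Because consumption is so polarized, a simple threshold like this moves only a handful of applications out of the shared clusters while removing the largest sources of noisy-neighbor pressure.
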
  37. Pool, Silo, and Tenant Context

    • Pool:
      - Resources shared across multiple tenants
      - The tenant should be isolated at the upper layer
    • Silo:
      - Resources dedicated to a single tenant
      - It offers a physical boundary from other tenants, and the tenant is essentially identified with the underlying resource
    • Tenant Context:
      - Information that identifies the tenant when a workload runs or operates, represented as tokens or other elements

    https://www.oreilly.com/library/view/building-multi-tenant-saas/9781098140632/
  38. Handle tenant context properly

    • Propagate tenant context across the platform and to any external systems it integrates with.
    • This helps maximize the benefits of multi-tenancy, even when external systems are involved.
  40. Pipelines for Container Resource Metrics - Before

    • Lack of tenant context in metrics causes backend overflow
    [Diagram: Kubernetes Nodes, each with a kubelet, containers, and a Metrics Agent (daemonset); an MQ Platform; Metrics Backends per Tenants (default, Tenant A, Tenant B); Developers. Callout: “I can’t identify tenants from kubelet metrics…”]
  41. Pipelines for Container Resource Metrics - After

    • Tenant contexts allow the backend to leverage its multi-tenant capabilities.
    [Diagram: Kubernetes Nodes, each with a kubelet, containers, and a Metrics Agent (daemonset); an MQ Platform; Metrics Backends per Tenants (default, Tenant A, Tenant B); Developers. Callout: “The new plugin empowered me to do that.”]
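
The slides do not show how the plugin works, so the following is only a sketch of the general technique: look up the tenant from a label the sample already carries (here the `namespace` label), attach it as tenant context, and group samples by tenant for routing. The label name `tenant`, the fallback to a `default` tenant, and the namespace mapping are assumptions made for illustration.

```python
from collections import defaultdict

TENANT_LABEL = "tenant"      # hypothetical label name
DEFAULT_TENANT = "default"   # assumed fallback for samples with no known owner


def attach_tenant(sample: dict, namespace_to_tenant: dict) -> dict:
    """Return a copy of the sample with a tenant label derived from its namespace label."""
    labels = dict(sample["labels"])
    namespace = labels.get("namespace")
    labels[TENANT_LABEL] = namespace_to_tenant.get(namespace, DEFAULT_TENANT)
    return {**sample, "labels": labels}


def route(samples: list, namespace_to_tenant: dict) -> dict:
    """Group enriched samples by tenant so each group can go to that tenant's backend."""
    by_tenant = defaultdict(list)
    for sample in samples:
        enriched = attach_tenant(sample, namespace_to_tenant)
        by_tenant[enriched["labels"][TENANT_LABEL]].append(enriched)
    return dict(by_tenant)


if __name__ == "__main__":
    mapping = {"team-a-prod": "tenant-a", "team-b-prod": "tenant-b"}  # hypothetical mapping
    samples = [
        {"name": "container_cpu_usage_seconds_total",
         "labels": {"namespace": "team-a-prod", "pod": "web-0"}, "value": 12.5},
        {"name": "container_cpu_usage_seconds_total",
         "labels": {"namespace": "kube-system", "pod": "dns-0"}, "value": 3.1},
    ]
    for tenant, group in route(samples, mapping).items():
        print(tenant, len(group))
```

Once every sample carries its tenant, downstream systems can apply per-tenant limits and storage instead of treating all container metrics as one undifferentiated stream.
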
  43. Pipelines for Application Specific Metrics - Before

    • The inability to dynamically identify tenants hinders the scaling of agents.
    [Diagram: Kubernetes Nodes running containers; Metrics Agents (deployment); an MQ Platform; Metrics Backends per Tenants (default, Tenant A, Tenant B); Developers. Callout: “Metrics in my tenant are overwhelming …”]
  44. Pipelines for Application Specific Metrics - After

    • Dynamic tenant identification allows the agents to scale.
    [Diagram: Kubernetes Nodes running containers, each with a Metrics Agent (daemonset); an MQ Platform; Metrics Backends per Tenants (default, Tenant A, Tenant B); Developers.]
  46. Lessons & Key Takeaways

    ✓ Why Multi-Tenancy Matters
      • As organizations scale, shared platform efficiency and tenant isolation become increasingly essential.
    ✓ Choose the Right Architecture
      • Single-cluster and multi-cluster approaches each involve clear trade-offs in isolation, efficiency, scalability, and operational overhead.
    ✓ Practices that Make Multi-Tenancy Work
      • Effective isolation, protection of system components, workload-aware separation, and proper tenant context propagation are key.
  47. Cloud Native Platform Engineering

    • Cloud Native technologies serve as crucial building blocks in creating IDPs
      - A variety of middleware and a robust OSS ecosystem centered around Kubernetes
    * CNCF graduated projects