
Multi-tenancy Best Practices for Google Kubernetes Engine

Video: https://www.youtube.com/watch?v=RkY8u1_f5yY

In this talk, we cover the Kubernetes APIs and GKE features that allow you to create a multi-tenant cluster.

Refer to the documentation at https://cloud.google.com/kubernetes-engine/ or https://kubernetes.io for up-to-date instructions, as information in this slide deck can go out of date quickly.

Ahmet Alp Balkan

July 26, 2018

Transcript

  1. IO232: Multi-Tenancy Best Practices for Google Kubernetes Engine

    Ahmet Alp Balkan, Software Engineer, Google Cloud
    Yoshi Tamura, Product Manager, Google Cloud
    Thursday, July 26
  2. Who are we?

    Ahmet Alp Balkan (@ahmetb), Software Engineer at Developer Relations.
    I work on making Kubernetes Engine easier to understand and use for
    developers and operators, and I write open source tools for
    Kubernetes. Previously, I worked at Microsoft Azure on porting Docker
    to Windows and on ACR. I maintain "kubectx".
  3. Who are we?

    Yoshi Tamura (@yoshiat), Product Manager, Kubernetes Engine.
    I work on multi-tenancy and hardware accelerators (GPU and Cloud TPU)
    in Kubernetes Engine.
  4. Practical Multi-Tenancy on Kubernetes Engine

    The following slides are heavily inspired by the KubeCon EU '18 talk
    by David Oppenheimer, Software Engineer, Google.
    Register your interest at: gke.page.link/multi-tenancy
  5. Topics covered: trust; multi-tenancy modes; multi-tenancy features —
    isolation, access control, policy management, resource usage,
    scheduling, preventing contention, billing.
  6. 0 What is multi-tenancy? 6

  7. Software multi-tenancy: a single instance of software runs on a
    server and serves multiple tenants.
  8. 8

  9. Kubernetes multi-tenancy: providing isolation and fair resource
    sharing between multiple users and their workloads within a cluster.
  10. 1 Trust 10

  11. Do you trust...

    • your compiler*
    • operating system
    • dependencies
    • deployment pipeline
    • container runtime
    ...

    * Bonus reading on compilers:
    - Reflections on Trusting Trust. Ken Thompson. CACM 27, 8
      (August 1984), 761–763.
    - Fully Countering Trusting Trust through Diverse Double-Compiling.
      D. A. Wheeler. PhD thesis, George Mason University, Oct. 2009.
  12. Levels of trust in software multi-tenancy

    • Trusted — the code comes from an audited source, built and run by
      trusted components (a.k.a. "the dream")
    • Semi-trusted — trusted code, but with 3rd-party dependencies or
      software that is not fully audited (a.k.a. most people)
    • Non-trusted — the code comes from potentially hostile users; you
      cannot assume good intent (a.k.a. hosting providers)
  13. 2 Kubernetes Engine Multi-Tenancy Primitives 13

  14. Kubernetes: cluster vs. namespace boundary. (Diagram: two
    clusters, each containing multiple namespaces.)
  15. Cluster per Tenant (diagram: project-1 and project-2, each holding
    one cluster per tenant)

    Pros:
    • Separate control plane (API) for each tenant (for free*)
    • Strong network isolation (if it's a per-cluster VPC)
    However:
    • Need tools to manage 10s or 100s of clusters
    • Resource/configuration fragmentation across clusters
    • Slow turn-up: need to create a cluster for each new tenant

    * The Google Kubernetes Engine control plane (master) is free of
      charge.
  16. Namespace per tenant (intra-cluster multi-tenancy)

    Namespaces (diagram: ns1–ns4 in one cluster) provide logical
    isolation between tenants on a cluster. Kubernetes policies are
    namespace-scoped.
    • Logical isolation between tenants
    • Policies for API access restrictions & resource usage constraints
    Pros:
    • Tenants can reuse extensions/controllers/CRDs
    • Shared control plane (= shared ops, shared security/auditing…)
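    A tenant namespace itself is just a Namespace object. A minimal
    sketch — the `tenant-a` name and label are illustrative, not from
    the talk:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a        # one namespace per tenant/team
  labels:
    tenant: tenant-a    # label that quota/NetworkPolicy tooling can select on
```

    The namespace-scoped policies shown later (quotas, limit ranges,
    network policies, role bindings) all attach to namespaces like this
    one.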
  17. Kubernetes Engine primitives, by category:

    • Access Control — IAM, RBAC, Admission Control
    • Resource Sharing — Quotas, Limit Range, Pod Priority,
      Pod Affinity/Anti-Affinity
    • Runtime Isolation — Network Policy, Pod Security Policy,
      Sandbox Pods
  18. 3 Use cases of Kubernetes Multi-tenancy 18

  19. Multi-tenancy use cases in Kubernetes:

    • Enterprise
    • SaaS (Software as a Service)
    • KaaS (Kubernetes as a Service)
  20. "Enterprise" model

    All users are from the same company/organization.
    Namespaces ⇔ Tenants ⇔ Teams.
    Semi-trusted tenants (you can fire them on violation).
    Cluster roles:
    • Cluster Admin
      ◦ CRUD any policy objects
      ◦ Create/assign namespaces to "Namespace Admins"
      ◦ Manage policies (resource usage quotas, networking)
    • Namespace Admin
      ◦ Manage users in the namespace(s) they own
    • User
      ◦ CRUD non-policy objects in the namespace(s) they have access to
    (Diagram: control plane (apiserver) managed by a Cluster Admin;
    namespaces ns1–ns4, each owned by a Namespace Admin.)
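    The "User" persona above maps naturally onto a namespaced Role plus
    RoleBinding. A minimal sketch — the namespace `ns1`, the role name,
    the resource list, and the account are illustrative assumptions, not
    from the talk:

```yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: ns1                  # the team's namespace
  name: tenant-user
rules:
- apiGroups: ["", "apps"]         # core + apps API groups
  resources: ["pods", "services", "deployments", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: ns1
  name: team1-users
roleRef:
  kind: Role
  name: tenant-user
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: User
  name: "dev@example.com"         # hypothetical Google account
  apiGroup: rbac.authorization.k8s.io
```

    Note the rules cover only non-policy objects; quota, NetworkPolicy,
    and RBAC objects stay with the Cluster Admin.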
  21. "Enterprise" model (continued)

    Many apps from different teams, semi-trusted.
    • Vanilla container isolation may suffice
    • If not: sandboxing with gVisor, limiting capabilities, using
      seccomp/AppArmor/...
    Network isolation:
    • Allow all traffic within a namespace
    • Whitelist traffic from/to other namespaces (= teams)
  22. "Software as a Service" model

    The consumer deploys their app through a custom control plane: SaaS
    consumers talk to a SaaS API/proxy, which in turn talks to the
    cluster's control plane (apiserver), administered by a Cluster
    Admin.
  23. "Software as a Service" model (continued)

    The consumer deploys their app through the custom control plane.
    After the app is deployed, customers connect to the app directly.
    Example: WordPress hosting.
  24. "Software as a Service" model (continued)

    The SaaS API is a trusted client of Kubernetes; cluster admins can
    access the Kubernetes API directly. Tenant workloads may have
    untrusted pieces:
    • such as WordPress extensions
    • may require sandboxing with gVisor etc.
  25. "Kubernetes as a Service" model

    Untrusted tenants running untrusted code (Platform as a Service, or
    hosting companies). Tenants may create their own namespaces, but
    cannot set policy objects.
    Stronger isolation requirements than enterprise/SaaS:
    • isolated world view (separate control plane)
    • tenants must not see each other
    • strong node and network isolation
      ◦ sandbox pods
      ◦ sole-tenant nodes
      ◦ multi-tenant networking/DNS
  27. 4 Kubernetes Multi-tenancy Policy APIs and Features 27

  28. Kubernetes Engine multi-tenancy primitives, by category:

    • Access Control — IAM, RBAC, Admission Control
    • Resource Sharing — Quotas, Limit Range, Pod Priority, Pod Affinity
    • Runtime Isolation — Network Policy, Pod Security Policy,
      Pod Security Context, Sandbox Pods
  29. The same primitives, split another way:

    • Auth related — IAM, RBAC, Admission Control, Pod Security Policy,
      Pod Security Context, Network Policy
    • Scheduling related — Quotas, Limit Range, Pod Priority,
      Pod Affinity, Sandbox Pods
  30. Auth related features 30

  31. Authentication, Authorization, Admission

    A request to the control plane (apiserver) passes through pluggable
    authentication (GKE IAM), then the authorizer — where either Cloud
    IAM policies or RBAC objects ({Cluster,}Role bound via
    {Cluster,}RoleBinding) can allow the request — then admission
    control, before the object is persisted to etcd.
  32. Kubernetes RBAC: which users/groups/service accounts can do which
    operations on which API resources in which namespaces.
  33. Kubernetes RBAC

    Mostly useful for:
    • Giving access to pods calling the Kubernetes API (with Kubernetes
      service accounts)
    • Giving fine-grained access to people/groups calling the Kubernetes
      API (with Google accounts)
    Concepts:
    • ClusterRole — a preset of capabilities, cluster-wide
    • Role — like ClusterRole, but namespace-scoped
    • ClusterRoleBinding — gives the permissions of a ClusterRole to
      Google users/groups, Google Cloud IAM service accounts, or
      Kubernetes service accounts
    • RoleBinding — like ClusterRoleBinding, but namespace-scoped
  34. Kubernetes RBAC

    Example ClusterRole + binding for namespace-creator (note: a
    ClusterRoleBinding's roleRef must reference a ClusterRole, not a
    Role):

    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: "namespace-creator"
    rules:
    - apiGroups: [""] # core
      resources: ["namespaces"]
      verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: "admins:namespace-creator"
    roleRef:
      kind: ClusterRole
      name: "namespace-creator"
      apiGroup: rbac.authorization.k8s.io
    subjects:
    - kind: User
      name: "ahmetalpbalkan@gmail.com" # Google user
      apiGroup: rbac.authorization.k8s.io
  35. Kubernetes Engine + Cloud IAM

    Curated IAM "Roles", practical for giving Google users/groups
    project-wide access:
    • Admin — can do everything
    • Viewer — can view everything
    • Cluster Admin — can manage clusters (create/delete/upgrade
      clusters); cannot view what's in the clusters (Kubernetes API)
    • Developer — can do everything in a cluster (Kubernetes API);
      cannot manage clusters (create/delete/upgrade clusters)
    You can curate new ones with Cloud IAM Custom Roles.
  36. Kubernetes Engine + IAM

    Give someone the "Developer" role on all clusters in the project:

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member=user:SOMEONE_ELSE@gmail.com \
        --role=roles/container.developer

    Give a Google Group the "Viewer" role on all clusters in the
    project:

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member=group:SOME_TEAM@googlegroups.com \
        --role=roles/container.viewer
  37. Admission Controls

    Intercept API requests before the resource is persisted (to etcd).
    Admission control can mutate and allow/deny requests, via a chain of
    admission plugins.
  38. Admission Controls

    Admission plugins are compiled into the Kubernetes apiserver binary.
    The set of enabled admission plugins cannot be changed on Kubernetes
    Engine, but these 15 admission plugins are already enabled:
    Initializers, NamespaceLifecycle, LimitRanger, ServiceAccount,
    PersistentVolumeLabel, DefaultStorageClass, DefaultTolerationSeconds,
    NodeRestriction, PodPreset, ExtendedResourceToleration,
    PersistentVolumeClaimResize, Priority, StorageObjectInUseProtection,
    MutatingAdmissionWebhook, ValidatingAdmissionWebhook
  39. Extending Admission Controls

    You can develop webhooks to create your own admission controllers:
    the built-in ValidatingAdmissionWebhook and MutatingAdmissionWebhook
    plugins call your webhooks alongside the other admission plugins,
    before the object is persisted to etcd.
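    Registering a validating webhook is done with a
    ValidatingWebhookConfiguration object. A minimal sketch, assuming a
    webhook server exposed as Service `policy-webhook` in namespace
    `webhooks` (all names, the path, and the CA bundle placeholder are
    illustrative):

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: tenant-policy-webhook
webhooks:
- name: pods.policy.example.com      # hypothetical, must be a DNS-style name
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      namespace: webhooks            # where the webhook server runs
      name: policy-webhook
      path: "/validate"
    caBundle: <base64-encoded CA certificate>
  failurePolicy: Fail                # deny pod creation if the webhook is unreachable
```

    The apiserver POSTs an AdmissionReview to the webhook for every
    matching request and honors the allow/deny in the response.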
  40. PodSecurityPolicy

    Restricts access to host {filesystem, network, ports, PID namespace,
    IPC namespace}... Limits privileged containers and volume types,
    enforces read-only filesystems, etc. Enforced through its own
    admission plugin: the PSP admission controller checks the pod spec
    against PodSecurityPolicy objects and allows/denies it.
  41. PodSecurityPolicy

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: unprivileged
    spec:
      # Don't allow privileged pods!
      privileged: false
      # Don't allow root containers!
      runAsUser:
        rule: "MustRunAsNonRoot"

    Grant specific users/groups the ability to use this policy:

    $ kubectl create role psp:unprivileged \
        --verb=use \
        --resource=podsecuritypolicy \
        --resource-name=unprivileged
    $ kubectl create rolebinding developers:unprivileged \
        --role=psp:unprivileged \
        --group=developers@googlegroups.com \
        --user=ahmetb@example.com

    A pod like this gets REJECTED:

    apiVersion: v1
    kind: Pod
    metadata:
      name: foo
    spec:
      containers:
      - image: k8s.gcr.io/pause
        securityContext:
          privileged: true
  42. Network Policy

    Controls which pods can talk to which other pods (based on their
    namespace/labels) or IP ranges. Available on Kubernetes Engine with
    the Calico network plugin (--enable-network-policy).
  43. Network Policy

    Example: allow traffic to "mysql" pods only from "frontend" pods:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: db-allow-frontend
    spec:
      podSelector:
        matchLabels:
          app: mysql
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: frontend
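    For namespace-per-tenant setups, a common baseline is a default-deny
    ingress policy in each tenant namespace, on top of which specific
    allowances are added. A minimal sketch (the namespace name is
    illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-a        # hypothetical tenant namespace
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
  - Ingress                  # no ingress rules listed -> all inbound traffic denied
```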
  44. ...Pragmatic recipes at github.com/ahmetb/kubernetes-network-policy-recipes Network Policy 44

  46. Scheduling related features 46

  47. Pod Priority/Preemption (beta — Kubernetes 1.11)

    Pod Priority: puts high-priority pods waiting in Pending state at
    the front of the scheduling queue.
    Pod Preemption: evicts lower-priority pod(s) from a node if a
    high-priority pod cannot be scheduled due to insufficient
    space/resources in the cluster.
    Use PriorityClasses to define priorities:

    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: "high"
    value: 1000000

    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: "normal"
    value: 1000
    globalDefault: true

    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: "low"
    value: 10
  48. Resource Quotas

    Limit the total memory/CPU/storage that pods can use, and how many
    objects of each type (pods, load balancers, ConfigMaps, etc.) can
    exist, on a per-namespace basis.
  49. Resource Quotas — examples:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: staging
    spec:
      hard:
        requests.cpu: "8"
        requests.memory: 2Gi
        limits.cpu: "10"
        limits.memory: 3Gi
        requests.storage: 120Gi

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: object-quota
      namespace: staging
    spec:
      hard:
        pods: "30"
        services: "2"
        services.loadbalancers: "0"
        persistentvolumeclaims: "5"
  50. Resource Quotas + PriorityClass

    Set different quotas for pods per PriorityClass (alpha in Kubernetes
    1.11, disabled by default):

    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: low
    value: 10

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: low-priority-compute
    spec:
      scopeSelector:
        matchExpressions:
        - scopeName: PriorityClass
          operator: In
          values: ["low"]
      hard:
        pods: "100"
        cpu: "10"
        memory: 12Gi

    apiVersion: v1
    kind: Pod
    metadata:
      name: unimportant-pod
    spec:
      priorityClassName: low
      containers: [...]
  51. Limit Range

    Specify {default, min, max} resource constraints for each
    pod/container per namespace. If a pod spec doesn't specify
    limits/requests, these defaults are used (defaults apply to type
    "Container"):

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-compute-limits
    spec:
      limits:
      - type: Container
        default:
          memory: 128Mi
          cpu: 200m
        defaultRequest:
          memory: 64Mi
          cpu: 100m
  52. Limit Range

    Specify {default, min, max} resource constraints for each
    pod/container. A container cannot have fewer resources than "min" or
    more than "max":

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: compute-limits
    spec:
      limits:
      - type: "Container"
        min:
          memory: 32Mi
          cpu: 10m
        max:
          memory: 800Mi
          cpu: "2"
  53. Pod Anti-Affinity

    Constrain scheduling of pods based on the labels of other pods
    already scheduled on a node. Example — "keep me off of nodes that
    have pods without the team=billing label":

    apiVersion: v1
    kind: Pod
    metadata:
      name: foo
      labels:
        team: "billing"
    spec: ...

    apiVersion: v1
    kind: Pod
    metadata:
      name: bar
      labels:
        team: "billing"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchExpressions:
              - key: "team"
                operator: NotIn
                values: ["billing"]
  54. Dedicated Nodes

    Use taints on nodes and tolerations on pods to dedicate a partition
    of the cluster to particular pods/users. Useful for
    partitioning/dedicating special machines on the cluster (e.g. GPU
    nodes reserved for the ML team) to the team(s) that asked for them.
  55. Dedicated Nodes

    You can apply "taints" to Kubernetes Engine node pools at creation
    time:

    $ gcloud container node-pools create gpu-pool \
        --cluster=example-cluster \
        --node-taints=team=machine-learning:NoSchedule

    (This is better than the "kubectl taint nodes" command, as it keeps
    working when node pools resize or nodes are auto-repaired.)
  56. Dedicated Nodes

    Use a "toleration" on the pods from this team:

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        team: "machine-learning"
    spec:
      tolerations:
      - key: "team"
        operator: "Equal"
        value: "machine-learning"
        effect: "NoSchedule"
  57. Sandboxed Pods

    Linux kernel bugs and security vulnerabilities may bypass container
    security boundaries. Approaches in this space:
    • Kata Containers
    • gVisor (Google's approach!)
    Check out the talk: IO310 — Sandboxing your containers with gVisor
  58. gVisor — Google's approach to sandbox pods

    A sandbox for containers that implements Linux system calls in user
    space. Zero config, written in Go. Instead of containers issuing
    system calls directly against the host kernel, gVisor exposes a
    limited set of system calls through an independent user-space
    kernel, giving virtualization-based strong isolation.
  59. gVisor on Kubernetes — architecture

    • runsc: an OCI runtime powered by gVisor, plugged in under
      Kubernetes
    • The Sentry (an emulated Linux kernel running as a user-space
      kernel, e.g. on KVM) is the 1st isolation boundary
    • seccomp + namespaces are the 2nd isolation boundary
    • The Gofer handles network and file I/O, talking to the Sentry
      over 9P
  60. Sandbox Pods in Kubernetes (work in progress)

    RuntimeClass is a new API to specify runtimes; specify the
    RuntimeClass in your Pod spec:

    apiVersion: v1alpha1
    kind: RuntimeClass
    metadata:
      name: gvisor
    spec:
      runtimeHandler: gvisor
    ...

    apiVersion: v1
    kind: Pod
    ...
    spec:
      ...
      runtimeClassName: gvisor
  61. 5 Applying multi-tenancy & Current limitations 61

  62. Scalable Policy Management

    You wrote all these policies, but how do you deploy and manage them
    in practice? Keeping Kubernetes/IAM policies up to date across
    namespaces / clusters / projects (diagram: several projects with
    clusters cluster1–cluster4, each holding namespaces ns1/ns2) is
    difficult!
  63. Kubernetes Engine Policy Management — NEW! (alpha)

    Centrally defined policies:
    • Single source of truth
    • ...as opposed to "git" vs "Kubernetes API" vs "Cloud IAM"
    Applies policies hierarchically:
    • Organization → Folder → Project → Cluster → Namespace
    • Policies are inherited
    Lets you manage namespaces, RBAC, and more...
    Check out the talk (happening now): IO200 — Take Control of your
    Multi-cluster, Multi-Tenant Kubernetes Workloads
    Participate in the alpha: goog.page.link/kpm-alpha
  64. Kubernetes multi-tenancy limitations today

    Kubernetes API:
    • API calls are currently not rate limited, leaving the apiserver
      open to DoS from tenants, impacting others.
    Networking:
    • Networking is not a scheduled resource in Kubernetes yet (cannot
      be used with limits/requests)
    • Tenants can still discover each other via Kubernetes DNS
    Many more...
  65. Key Takeaways

    Determine your use case:
    • How trusted are your tenant users and workloads?
    • What degree and kinds of isolation do you need?
    Namespace-centric multi-tenancy:
    • Utilize policy objects for scheduling and access control.
    • Think about personas and map them to RBAC cluster roles.
    • Automate policies across clusters with GKE Policy Management
      (alpha).
  66. Participate!

    Kubernetes Multi-tenancy Working Group:
    • https://github.com/kubernetes/community/tree/master/wg-multitenancy
    • kubernetes-wg-multitenancy@googlegroups.com
    • Organizers: David Oppenheimer (@davidopp), Google;
      Jessie Frazelle (@jessfraz), Microsoft
    Kubernetes Policy Working Group:
    • https://github.com/kubernetes/community/tree/master/wg-policy
    • kubernetes-wg-policy@googlegroups.com
    Register your interest at: gke.page.link/multi-tenancy
  67. Thank you. Ahmet Alp Balkan (@ahmetb) Yoshi Tamura (@yoshiat) 67

    Register your interest at: gke.page.link/multi-tenancy
  68. Internal Billing/Chargeback

    Example: "the testing team has 10,000 CPU-hours per month."
    Most of the resources are billable on the cloud:
    • Compute: CPU/memory
    • Networking: transfer costs, load balancing, reserved IPs
    • Storage: persistent disks, SSDs
    • Other services (Cloud Pub/Sub, Cloud SQL, ...) provisioned through
      Service Catalog
    Kubernetes doesn't offer a way to do internal chargeback for
    compute/cloud resources used.
  69. Approvals & Reviews (internal tracking)

    • Speaker(s): ahmetb / yoshiat — ahmetb: done (7/19); yoshiat: —
    • Peer Reviewer: davidopp, 7/23 — LGTM ("a couple of small remaining
      comments to resolve, but nothing to block")
    • PR: jacinda
    • PMM: hrdinsky / praveenz
    • Legal, Design, Practice Buddy (optional): —