Kubernetes: Greek for "helmsman"; also the root of the words "governor" and "cybernetic"
• Manages container clusters
• Inspired and informed by Google's experiences and internal systems
• Supports multiple cloud and bare-metal environments
• Supports multiple container runtimes
• 100% open source, written in Go
Manage applications, not machines
• Portability across cloud providers and on-prem data centers
• Ability to manage 2000+ machines with lots of compute resources
• Separation of concerns
  ◦ Admins vs. developers
• Maximize utilization without compromising performance
• Self-healing
• Building block for distributed systems
Strong isolation
Pros:
• apps cannot impact each other's performance
  • users don't worry about interference
• Predictability - repeated runs of the same app give ~equal performance
  • allows strong performance SLAs
Cons:
• Usability - how do I know how much I need?
  • the system can help with this
• Utilization - with strong isolation, unused resources get lost
  • costly - the system has to keep them available for the requester
  • mitigate with overcommitment
  • challenge: how to do it safely
Request:
• amount of a resource a container is asking to use, with a strong guarantee of availability
  • CPU (fractional cores)
  • RAM (bytes)
• the scheduler will not over-commit requests
Limit:
• max amount of a resource a container can access
• the scheduler ignores limits (overcommitment)
Repercussions:
• Usage > Request: the extra resources might not be available
• Usage > Limit: killed or throttled
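In manifest form, requests and limits are declared per container in the Pod spec. A minimal sketch (the pod name, image, and values are illustrative, not from the slides):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25          # illustrative image
    resources:
      requests:                # guaranteed by the scheduler; requests are never over-committed
        cpu: "500m"            # half a core (fractional cores)
        memory: "256Mi"        # bytes
      limits:                  # hard ceiling; the scheduler ignores it
        cpu: "1"
        memory: "512Mi"
```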
Guaranteed: highest protection
• request > 0 && limit == request for all containers in a Pod
• sensitive or critical apps
Best Effort: lowest protection
• request == 0 (limit == node size) for all containers in a Pod
• data-processing apps
Burstable: medium protection
• request > 0 && limit > request for any container in a Pod
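The QoS class follows from the shape of the requests and limits, roughly as sketched below (container `resources` stanzas only; the enclosing Pod specs are omitted and all values are illustrative):

```yaml
# Guaranteed: limit == request (> 0) for every container in the Pod
resources:
  requests: {cpu: "1", memory: "1Gi"}
  limits:   {cpu: "1", memory: "1Gi"}
---
# Burstable: at least one container sets a request, and limit > request (or no limit)
resources:
  requests: {cpu: "250m", memory: "256Mi"}
  limits:   {cpu: "1",    memory: "1Gi"}
---
# Best Effort: no requests or limits on any container in the Pod
resources: {}
```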
How is protection implemented?
• CPU: a Best Effort/Burstable container using more than its request may be throttled
  • CPU shares + CPU quota
• Memory: a Best Effort/Burstable container using more than its request may be killed
  • OOM score + user-space evictions
• Storage: isolated on a best-effort basis
  • user-space evictions to improve node stability
  • evict images, dead containers, and pods (based on QoS) to free up disk space
  • evictions based on disk space & inodes
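As a rough illustration of the CPU and memory mechanisms, the comments below sketch the approximate cgroup settings the kubelet derives for a Burstable container. The conversion shown (shares ≈ milliCPU * 1024 / 1000, plus a CFS quota per 100 ms period) is the usual one, but exact behavior depends on the Kubernetes version; values are illustrative:

```yaml
resources:
  requests:
    cpu: "250m"        # -> cpu.shares ≈ 256: relative weight when CPUs are contended
    memory: "128Mi"    # -> no hard cap; mainly influences OOM score and eviction order
  limits:
    cpu: "500m"        # -> CFS quota ≈ 50000us per 100000us period: throttled above this
    memory: "256Mi"    # -> memory limit in the cgroup: OOM-killed if exceeded
```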
• Reserve resources for system daemons
  ◦ kubelet, Docker, systemd, sshd, etc.
• Node overhead is a function of pod & container density (Dashboard)
• Statically configurable via the Kubelet
• The scheduler uses Allocatable as "usable capacity"
• Eviction thresholds
  ◦ hard, soft, minimum reclamation
(Diagram: node Capacity = Allocatable + resources reserved for system daemons)
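On recent Kubernetes versions, the reservations and eviction thresholds the slide describes as "statically configurable via Kubelet" can be expressed in a KubeletConfiguration file; a sketch with illustrative values:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:              # set aside for OS daemons (systemd, sshd, ...)
  cpu: "500m"
  memory: "512Mi"
kubeReserved:                # set aside for Kubernetes daemons (kubelet, container runtime)
  cpu: "500m"
  memory: "1Gi"
evictionHard:                # hard thresholds: evict as soon as they are crossed
  memory.available: "200Mi"
  nodefs.available: "10%"
evictionSoft:                # soft thresholds: evict only after a grace period
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "1m"
evictionMinimumReclaim:      # once evicting, reclaim at least this much
  memory.available: "100Mi"
```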
cAdvisor runs on each node (built into the kubelet)
• gathers node & container stats
• exports them via REST
Run Heapster as a pod in the cluster
• aggregates stats
• just another container, no special access
Writes metrics to InfluxDB; they can be viewed in a Grafana dashboard
• observe CPU & memory usage
Or plug in your own monitoring system!
• Sysdig, Prometheus, Datadog, Stackdriver, etc.
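For reference, "Heapster as a pod" looked roughly like the Deployment below; the image tag, service account, and sink URL are illustrative rather than exact:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels: {k8s-app: heapster}
  template:
    metadata:
      labels: {k8s-app: heapster}
    spec:
      serviceAccountName: heapster      # hypothetical; only needs read access to node/pod stats
      containers:
      - name: heapster
        image: k8s.gcr.io/heapster-amd64:v1.5.4      # illustrative tag
        command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default                    # pull stats from the kubelets
        - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086   # write to InfluxDB
```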
• Initial resources - sets Request based on observed historical CPU/memory usage of the container
• Horizontal Pod Autoscaling - changes the # of pod replicas based on CPU usage & app metrics (alpha)
• Cluster autoscaling
  • add nodes when needed, e.g. CPU usage too high
  • remove nodes when not needed, e.g. CPU usage too low
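A minimal Horizontal Pod Autoscaler in today's stable autoscaling/v1 form (the slide describes the alpha-era feature; the target Deployment name and thresholds here are hypothetical):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                       # hypothetical name
spec:
  scaleTargetRef:                     # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: web                         # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70  # add replicas above ~70% of requested CPU, remove below
```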
Resource quota, per namespace:
• caps total requests or limits across all pods
• applies to each type of resource (CPU, mem)
• user must specify request or limit
• caps the maximum number of a particular kind of object
Ensures no user/app/department abuses the cluster
Applied at admission time
Just another API object
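A sketch of such a quota object, scoped to a namespace, capping aggregate requests/limits and an object count (names and values illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota           # hypothetical name
  namespace: team-a          # quota applies to everything in this namespace
spec:
  hard:
    requests.cpu: "20"       # sum of CPU requests across all pods
    requests.memory: 64Gi    # sum of memory requests
    limits.cpu: "40"         # sum of CPU limits
    limits.memory: 128Gi
    pods: "100"              # max number of pods (object count)
```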
The scheduler is a separate component running on the cluster
• uses the Kubernetes API to learn cluster state and request bindings
• you can run multiple schedulers in parallel, responsible for different pods
1) Predicate functions determine which nodes a pod is eligible to run on
2) Priority functions determine which of those nodes is "best"
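When multiple schedulers run in parallel, a pod opts in to one by name via `schedulerName`; a minimal sketch (the scheduler and pod names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job-pod                   # hypothetical
spec:
  schedulerName: my-batch-scheduler     # bound by this scheduler instead of the default one
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sleep", "3600"]
```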
Predicates include:
• the node has enough free resources for the pod's requests (CPU, mem)
• node affinity/anti-affinity
  • label(s) on node, label query on pod
  • e.g. "put the pod on a node in zone abc"
  • e.g. "put the pod on a node with an Intel CPU"
• inter-pod affinity/anti-affinity
  • label(s) on pod, label query on pod
  • which other pods this pod must/cannot co-exist with
  • e.g. "co-locate the pods from service A and service B on the same node/zone/... since they communicate a lot with each other"
  • e.g. "never run pods from service A and service B on the same node"
• others
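The hard ("required") forms of the two examples above look roughly like this pod spec fragment (labels, zone value, and topology keys are illustrative; the zone label shown is the modern well-known one):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard requirement, i.e. a predicate
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone             # "put the pod on a node in zone abc"
            operator: In
            values: ["abc"]
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # "never run next to service B"
      - labelSelector:
          matchLabels:
            app: service-b                               # hypothetical pod label
        topologyKey: kubernetes.io/hostname              # scope: the same node
```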
Priorities: each eligible node gets a score combining these factors:
• spreading or best-fit (resources)
• node affinity/anti-affinity ("soft" version)
  • e.g. "put the pod in zone abc if possible"
• inter-pod affinity/anti-affinity ("soft" version)
  • e.g. "co-locate the pods of service A and service B in the same zone as much as possible"
  • e.g. "spread the pods from service A across zones"
• the node already has the image(s) the container needs cached
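The "soft" versions feed into this score via `preferred...` rules with weights; a pod spec fragment (weights, labels, and keys illustrative):

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:   # scored, not required
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone             # "put the pod in zone abc if possible"
            operator: In
            values: ["abc"]
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:   # "co-locate with service B if possible"
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: service-b                             # hypothetical pod label
          topologyKey: topology.kubernetes.io/zone       # scope: the same zone
```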
• GPU support (e.g. multiple GPUs/node)
• Runtime-extensible resource set
• Usage-based scheduling
• Dedicated nodes - mix private and shared nodes in a single cluster
• Priority/preemption - when quota is overcommitted and the cluster is full, decide who runs
• Improved QoS enforcement
  • Pod & QoS cgroups (alpha in v1.5)
  • Linux disk quota for tracking and isolation
  • Exclusive CPUs, NUMA, etc.