A Crash Course on Container Orchestration

Google Cloud Platform
A Crash Course on Container Orchestration & Kubernetes
Interop ITX, May 15, 2017
Tim Hockin <thockin@google.com>
Principal Software Engineer
@thockin

Containers are a great way to package and run apps:
• Self-contained
• Low overhead
• Fast starting
• Easy to compose
• Easy to replace

Almost everything at Google runs in containers:
• Gmail, Web Search, Maps, ...
• MapReduce, batch, ...
• GFS, Colossus, ...
• Even Google’s Cloud Platform: our VMs run in containers!
We launch billions of containers every week

For the rest of this presentation, I am going to assume that you are bought in to containers.
Images by Connie Zhou

How do I deploy containers?

Deployment options
Understanding how containers work is not the same as using them in production (duh!)
Different organizations will have very different needs (duh!)
• Not everyone needs to operate “at scale”
There are many ways to run and manage containers (duh!)

Humans
Never underestimate the value of manual solutions
SSH into machines and run docker
Pro: simple, available everywhere, no special tools needed, easily understood
Con: not automated, not reproducible (humans make mistakes), doesn’t scale, doesn’t self-heal

Scripts
A very common first step into containers
Puppet, Chef, Ansible, Salt, or just bespoke scripts
Pro: integrates with existing environments, easily understood results, reproducible
Con: manual scheduling, doesn’t self-heal, doesn’t scale, generally non-portable

Orchestration systems
Automatically match containers to available machines
Mesos, Kubernetes, Docker Swarm, Nomad, or even home-grown systems
Pro: automated, reproducible, self-healing, scalable, generally portable
Con: some overhead, requires new tooling and training, more complex results

What’s in an orchestration system?

Basic features
Scheduling: match containers to machines
• by resource needs (CPU, memory)
• by affinity requirements (put X near Y)
• by labels (put X on a “test” machine)
Replication: run N copies
Handle machine failures
Discovery: find peers and services in other containers
Inspection: tell me what is happening
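
As a rough illustration of the scheduling step described above, here is a minimal sketch that filters machines by free CPU, free memory, and required labels, then picks one. The `Machine` and `Container` types, the field names, and the "most free CPU wins" tie-break are all assumptions made for this example; affinity is left out, and no real orchestrator is claimed to work exactly this way.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    cpu_free: float                      # cores not yet allocated
    mem_free: int                        # MiB not yet allocated
    labels: dict = field(default_factory=dict)


@dataclass
class Container:
    name: str
    cpu: float                           # cores requested
    mem: int                             # MiB requested
    node_labels: dict = field(default_factory=dict)  # labels the machine must carry


def fits(machine, container):
    """True if the machine carries every required label and has enough free resources."""
    has_labels = all(
        machine.labels.get(key) == value
        for key, value in container.node_labels.items()
    )
    return (
        has_labels
        and machine.cpu_free >= container.cpu
        and machine.mem_free >= container.mem
    )


def schedule(container, machines):
    """Place the container on the feasible machine with the most free CPU.

    Returns the chosen machine's name, or None if nothing fits.
    The scoring rule is an arbitrary choice for this sketch.
    """
    candidates = [m for m in machines if fits(m, container)]
    if not candidates:
        return None
    best = max(candidates, key=lambda m: m.cpu_free)
    best.cpu_free -= container.cpu       # record the allocation
    best.mem_free -= container.mem
    return best.name
```

A container that asks for `{"env": "test"}` will only land on a machine labeled that way, which is the “put X on a test machine” case from the slide.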

Advanced features
Built-in load-balancers
Automated updates
Cluster auto-scaling: better utilization
App auto-scaling: handle spikes and troughs
Provisioning storage
Re-packing machines
Late-binding configuration

Making better use of human time
These are things that were historically managed by humans at human speed
Who carries an on-call duty pager, or has a dev/ops team that does?
When does that pager usually go off?
• In my experience, usually 3am
Orchestration can handle a lot of situations for you automatically - turn 3am pages into advisory emails

What options do I have?

Orchestrators, at a glance
Mesos:
• Most mature (predates Docker)
• Two-level system
Docker Swarm Mode:
• Built-in to Docker
• Easy to set up
Nomad:
• HashiCorp
• Youngest of the bunch
Kubernetes:
• Derives from Google’s Borg & Omega
• Rapidly growing adoption

Mesos, at a glance
Started as a UC Berkeley research project
Now owned by the Apache Software Foundation
Commercial support by Mesosphere (DC/OS)
Two-level scheduler - core and “frameworks” (e.g. Spark, Cassandra, Marathon)
Big tech shops (Twitter, Uber, Apple, Netflix)
Scales very well (10k+ machines)
Complex to set up and administer

Docker Swarm Mode, at a glance
Built into Docker
Focuses on ease of use & easy setup
Very similar to Kubernetes in many ways
• First version “Docker Swarm” was totally different
Less mature than Mesos or Kubernetes
Primarily developed by Docker, Inc.
Scales to thousands of machines

Nomad, at a glance
From HashiCorp
Integrates with Consul and Vault (both very well regarded)
Designed to be simple
Least mature of the bunch
Scales to thousands of machines
Not much adoption, yet

Kubernetes, at a glance
Derives ideas from Google’s Borg & Omega
Owned by the Cloud Native Computing Foundation
Designed to be composable
More complex than some others
Scales to thousands of machines
Very rapid adoption
Large community - thousands of developers

Diving deeper into Kubernetes

Kubernetes
Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
• Manages container clusters
• Inspired and informed by Google’s experiences and internal systems
• Supports multiple cloud and bare-metal environments
• Supports multiple container runtimes
• 100% Open source, written in Go
Manage applications, not machines

The 10000 foot view
[Diagram labels: users, UI, CLI, API, master (apiserver, scheduler, controllers, etcd), nodes (kubelet)]

All you really care about
[Diagram labels: UI, API, Container Cluster]

Running a container

[Diagram sequence, built up across several slides. Labels: apiserver, etcd, scheduler, controller manager, kubelet, docker, cloud provider]

Co-scheduling

Highly-coupled containers
[Diagram sequence across three slides. Labels: File Puller, Web Server, “?”, REJECTED]

Pods
Small group of containers & volumes
Tightly coupled
The atom of scheduling & placement
Shared namespaces
• share IP address & localhost
• share IPC, etc.
Managed lifecycle
• bound to a node, restart in place
• can die, cannot be reborn with same ID
[Diagram labels: Consumers, Content Manager, File Puller, Web Server, Volume, Pod]

Pods
Examples:
• data syncer (e.g. from git) & server
• log producer & log saver
• monitoring adapter
• policy-enforcing (e.g. auth) proxy
• cache

Finding things

Physical view

Logical view

Labels and selectors
Arbitrary metadata
Attached to any API object
Generally represent identity
Queryable by selectors
• think SQL ‘select ... where ...’
The only grouping mechanism
• pods in a ReplicaSet
• pods in a Service
• capabilities of a node (constraints)

Selectors
Four labeled objects:
• App: MyApp, Phase: prod, Role: FE
• App: MyApp, Phase: test, Role: FE
• App: MyApp, Phase: prod, Role: BE
• App: MyApp, Phase: test, Role: BE
Selectors applied, one per slide:
• App = MyApp
• App = MyApp, Role = FE
• App = MyApp, Role = BE
• App = MyApp, Phase = prod
• App = MyApp, Phase = test
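
The selector slides can be mimicked in a few lines. This sketch treats a selector as a set of key/value pairs that must all be present on an object's labels (equality matching only). The object names such as `prod-fe` are made up for the example; the label values come from the slides.

```python
def matches(labels, selector):
    """True if every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects, selector):
    """Return the names of all objects whose labels satisfy the selector."""
    return [name for name, labels in objects.items() if matches(labels, selector)]


# The four labeled objects from the slides; the names are hypothetical.
OBJECTS = {
    "prod-fe": {"App": "MyApp", "Phase": "prod", "Role": "FE"},
    "test-fe": {"App": "MyApp", "Phase": "test", "Role": "FE"},
    "prod-be": {"App": "MyApp", "Phase": "prod", "Role": "BE"},
    "test-be": {"App": "MyApp", "Phase": "test", "Role": "BE"},
}

frontends = select(OBJECTS, {"App": "MyApp", "Role": "FE"})
production = select(OBJECTS, {"App": "MyApp", "Phase": "prod"})
```

Adding a pair to the selector can only narrow the result, which is why `App = MyApp` alone selects all four objects while each two-key selector picks out two.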

Logical view

Discovery

Services
A group of pods that work together
• grouped by a selector
Defines access policy
• “load balanced” or “headless”
Can have a stable virtual IP and port
• also a DNS name!
VIP is managed by kube-proxy
• watches all services
• updates iptables when backends change
• default implementation - can be replaced!
Hides complexity
[Diagram labels: Client, Virtual IP]
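
To make the idea of a stable front for a changing set of pods concrete, here is a toy in-process stand-in. It is not how kube-proxy works (the slide says that is done with iptables rules); the `Service` class, its method names, the IP addresses, and the round-robin choice are all assumptions for illustration.

```python
import itertools


class Service:
    """Toy model: one stable entry point in front of pods chosen by a selector."""

    def __init__(self, selector):
        self.selector = selector
        self.backends = []
        self._cycle = iter(())

    def update_backends(self, pods):
        """Recompute the backend list; call this whenever pods come or go.

        `pods` maps a pod name to an (ip, labels) tuple.
        """
        self.backends = sorted(
            ip
            for ip, labels in pods.values()
            if all(labels.get(k) == v for k, v in self.selector.items())
        )
        self._cycle = itertools.cycle(self.backends)

    def route(self):
        """Pick the backend for the next request (round-robin in this sketch)."""
        if not self.backends:
            raise RuntimeError("no backends match the selector")
        return next(self._cycle)


pods = {
    "fe-1": ("10.0.0.1", {"App": "MyApp", "Role": "FE"}),
    "fe-2": ("10.0.0.2", {"App": "MyApp", "Role": "FE"}),
    "be-1": ("10.0.0.3", {"App": "MyApp", "Role": "BE"}),
}
frontend = Service({"App": "MyApp", "Role": "FE"})
frontend.update_backends(pods)
```

Clients only ever talk to `frontend`; when a pod disappears, a fresh call to `update_backends` changes where traffic goes without the client noticing.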

External services
Service VIPs are only available inside the cluster
Need to receive traffic from “the outside world”
Service “type”
• NodePort: expose on a port on every node
• LoadBalancer: provision a cloud load-balancer
DiY load-balancer solutions
• socat (for nodePort remapping)
• haproxy
• nginx

Replication

ReplicaSets
Declaration of intent: run N copies of a pod
Simple control loop
One job: ensure N copies
• too few? start some
• too many? kill some
Layered on top of Pods
[Diagram: a ReplicaSet talking to the API Server]
ReplicaSet
- name = “my-rc”
- selector = {“App”: “MyApp”}
- template = { ... }
- replicas = 4
Exchange with the API Server: How many? 3. Start 1 more. OK. How many? 4.
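
The “too few? start some, too many? kill some” loop fits in a handful of lines. This is a sketch of one reconciliation pass under simple assumptions: `running` is a plain list of pod names, and `start` and `kill` are hypothetical callbacks standing in for calls to the API server.

```python
def reconcile(desired, running, start, kill):
    """One pass of the control loop: make len(running) equal to desired.

    `start()` must return the name of a newly started pod;
    `kill(name)` stops the named pod. Both are stand-ins for real API calls.
    """
    diff = desired - len(running)
    if diff > 0:
        # Too few: start the missing copies.
        for _ in range(diff):
            running.append(start())
    elif diff < 0:
        # Too many: kill the surplus, newest first.
        for _ in range(-diff):
            kill(running.pop())
    return running
```

In the slide's exchange the controller sees 3 pods against `replicas = 4`, starts one, and finds 4 on the next pass. Running the same pass again when the counts already match does nothing, which is what makes the loop safe to repeat forever.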

Deployments
Manages replica changes for you
• stable object name
• simply edit the object
• configurable server-side rolling-updates
Can have multiple updates in flight
Layered on top of ReplicaSets
...

Updates

Rolling Update
[Diagram sequence across several slides]
Start:
ReplicaSet
- name: my-app-v1
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
Service
- app: MyApp
A second ReplicaSet is added:
ReplicaSet
- name: my-app-v2
- replicas: 0
- selector:
  - app: MyApp
  - version: v2
End: only the ReplicaSet with selector app: MyApp, version: v2 remains, behind the same Service (app: MyApp)
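
The intermediate steps between the first and last diagram can be sketched as a sequence of replica counts. The one-at-a-time step size and the "scale up before scaling down" ordering below are assumptions chosen for the example; the slides describe the rate as configurable and do not fix either.

```python
def rolling_update(old_replicas, new_target):
    """Yield (old, new) replica counts while shifting from v1 to v2.

    Adds one new replica, then removes one old replica, and repeats
    until the old ReplicaSet is empty and the new one is at its target.
    """
    old, new = old_replicas, 0
    yield old, new
    while new < new_target or old > 0:
        if new < new_target:
            new += 1
            yield old, new
        if old > 0:
            old -= 1
            yield old, new


steps = list(rolling_update(3, 3))
```

Because the Service selects on `app: MyApp` only, it keeps sending traffic to whichever mix of v1 and v2 pods exists at each step, so capacity never falls below the original three in this sketch.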

Configuration and secrets

ConfigMaps & Secrets
Goal: manage app configuration & secrets
• ...without making overly-brittle container images
12-factor says config comes from the environment
• Kubernetes is the environment
Manage configs via the Kubernetes API
Inject them as virtual volumes into your Pods
• late-binding, live-updated (atomic)
• also available as env vars
[Diagram labels: node, API, Pod, Config Map]
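
From the application's side, late-bound configuration might look like the sketch below: the same image reads a setting from an environment variable if one is set, otherwise from a file in a mounted config directory (a ConfigMap volume exposes one file per key). The function name, the precedence order, and the key-to-variable naming rule are assumptions for this example, not Kubernetes behavior.

```python
import os
from pathlib import Path


def load_setting(key, config_dir, env=os.environ, default=None):
    """Look up one setting without baking it into the container image.

    Order used in this sketch: environment variable, then a file named
    after the key inside config_dir, then the default.
    """
    # Hypothetical naming rule: "log-level" becomes LOG_LEVEL.
    env_name = key.upper().replace("-", "_").replace(".", "_")
    if env_name in env:
        return env[env_name]
    path = Path(config_dir) / key
    if path.is_file():
        # Re-reading the file on each call picks up live updates to the volume.
        return path.read_text().strip()
    return default
```

A call such as `load_setting("log-level", "/etc/config", default="info")` would then work unchanged in test and prod; `/etc/config` is a hypothetical mount path chosen by whoever writes the pod spec.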

Auto-scaling

HorizontalPodAutoscalers
Automatically scale number of pods as needed
• based on CPU utilization (for now)
• custom metrics coming
Efficiency now, capacity when you need it
Operates within user-defined min/max bounds
Set it and forget it
[Diagram labels: ..., Stats]
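
One way to turn CPU utilization and min/max bounds into a replica count is a proportional rule: scale the current count by the ratio of observed to target utilization, round up, and clamp. The slide does not give a formula, so treat this as a sketch of the idea with hypothetical parameter names; real autoscalers add tolerances and cool-down periods that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu, min_replicas, max_replicas):
    """Proportional scaling, clamped to the user-defined bounds.

    current_cpu and target_cpu are average utilization percentages.
    """
    raw = math.ceil(current_replicas * current_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 pods averaging 90% CPU against a 60% target gives 6 pods; the same pods idling at 10% would shrink to the configured minimum rather than to 1.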

ClusterAutoScaler
Automatically scale number of nodes as needed
• based on scheduler backlog & idleness
Efficiency now, capacity when you need it
Operates within user-defined min/max bounds
Set it and forget it
[Diagram labels: ..., Sched]

Storage

PersistentVolumes
Manage storage with its own lifecycle
Driver plugins - more than 20 supported
• Google Persistent Disk
• Amazon EBS
• Azure Volumes
• Gluster
• Ceph
• iSCSI
• Cinder
• ScaleIO
• Portworx
• ...
Dynamic provisioning - allocate on-demand
Local disk volumes in development
Containers are not just for stateless apps!

About the Kubernetes Project

Community
1500+ Contributors
400+ Person-Years of Effort
Top 0.001% of all GitHub Projects
4000+ External Projects Based on K8s
[Diagram labels: Contributors, Users]

Kubernetes is Open
open community, open design, open source, open to ideas
https://kubernetes.io
Code: github.com/kubernetes/kubernetes
Chat: slack.k8s.io
Twitter: @kubernetesio
