
A Crash Course on Container Orchestration


Tim Hockin

May 15, 2017

Transcript

  1. Google Cloud Platform
    A Crash Course on Container Orchestration & Kubernetes
    Interop ITX, May 15, 2017
    Tim Hockin <[email protected]>, Principal Software Engineer, @thockin
  2. Containers are a great way to package and run apps:
    • Self-contained
    • Low overhead
    • Fast starting
    • Easy to compose
    • Easy to replace
  3. Almost everything at Google runs in containers:
    • Gmail, Web Search, Maps, ...
    • MapReduce, batch, ...
    • GFS, Colossus, ...
    • Even Google’s Cloud Platform: our VMs run in containers!
  4. Almost everything at Google runs in containers:
    • Gmail, Web Search, Maps, ...
    • MapReduce, batch, ...
    • GFS, Colossus, ...
    • Even Google’s Cloud Platform: our VMs run in containers!
    We launch billions of containers every week
  5. For the rest of this presentation, I am going to assume that you are bought into containers.
    (Images by Connie Zhou)
  6. Deployment options
    Understanding how containers work is not the same as using them in production (duh!)
    Different organizations will have very different needs (duh!)
      • Not everyone needs to operate “at scale”
    There are many ways to run and manage containers (duh!)
  7. Humans
    Never underestimate the value of manual solutions
    SSH into machines and run docker
    Pro: simple, available everywhere, no special tools needed, easily understood
    Con: not automated, not reproducible (humans make mistakes), doesn’t scale, doesn’t self-heal
  8. Scripts
    A very common first step into containers
    Puppet, Chef, Ansible, Salt, or just bespoke scripts
    Pro: integrates with existing environments, easily understood results, reproducible
    Con: manual scheduling, doesn’t self-heal, doesn’t scale, generally non-portable
  9. Orchestration systems
    Automatically match containers to available machines
    Mesos, Kubernetes, Docker Swarm, Nomad, or even home-grown systems
    Pro: automated, reproducible, self-healing, scalable, generally portable
    Con: some overhead, requires new tooling and training, more complex results
  10. Basic features
    Scheduling: match containers to machines
      • by resource needs (CPU, memory)
      • by affinity requirements (put X near Y)
      • by labels (put X on a “test” machine)
    Replication: run N copies
    Handle machine failures
    Discovery: find peers and services in other containers
    Inspection: tell me what is happening
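
The scheduling bullet can be made concrete with a naive first-fit placer. This is a minimal Go sketch under stated assumptions, not how Kubernetes, Mesos, Swarm, or Nomad actually schedule: it matches only on CPU/memory requests and required machine labels (affinity is left out), and the `Machine`, `Container`, and `schedule` names are hypothetical.

```go
package main

import "fmt"

// Machine describes a node's free capacity and its labels.
type Machine struct {
	Name    string
	FreeCPU int // millicores
	FreeMem int // MiB
	Labels  map[string]string
}

// Container describes what one container requests.
type Container struct {
	Name    string
	CPU     int               // millicores requested
	Mem     int               // MiB requested
	NodeSel map[string]string // labels the machine must carry
}

// fits reports whether m has room for c and carries every required label.
func fits(m Machine, c Container) bool {
	if m.FreeCPU < c.CPU || m.FreeMem < c.Mem {
		return false
	}
	for k, v := range c.NodeSel {
		if m.Labels[k] != v {
			return false
		}
	}
	return true
}

// schedule places each container on the first machine that fits (first-fit)
// and deducts the resources it uses. Unplaceable containers map to "".
func schedule(machines []Machine, containers []Container) map[string]string {
	placement := make(map[string]string)
	for _, c := range containers {
		placement[c.Name] = ""
		for i := range machines {
			if fits(machines[i], c) {
				machines[i].FreeCPU -= c.CPU
				machines[i].FreeMem -= c.Mem
				placement[c.Name] = machines[i].Name
				break
			}
		}
	}
	return placement
}

func main() {
	machines := []Machine{
		{Name: "node-a", FreeCPU: 1000, FreeMem: 2048, Labels: map[string]string{"env": "prod"}},
		{Name: "node-b", FreeCPU: 2000, FreeMem: 4096, Labels: map[string]string{"env": "test"}},
	}
	containers := []Container{
		{Name: "web", CPU: 500, Mem: 1024},
		{Name: "db", CPU: 800, Mem: 1024},
		{Name: "canary", CPU: 100, Mem: 128, NodeSel: map[string]string{"env": "test"}},
	}
	fmt.Println(schedule(machines, containers))
	// map[canary:node-b db:node-b web:node-a]
}
```

First-fit keeps the sketch short; production schedulers typically filter candidate machines and then score them, which spreads load better than always taking the first match.
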
  11. Advanced features
    Built-in load-balancers
    Automated updates
    Cluster auto-scaling: better utilization
    App auto-scaling: handle spikes and troughs
    Provisioning storage
    Re-packing machines
    Late-binding configuration
  12. Making better use of human time
    These are things that were historically managed by humans at human speed
  13. Making better use of human time
    These are things that were historically managed by humans at human speed
    Who carries an on-call duty pager, or has a dev/ops team that does?
  14. Making better use of human time
    These are things that were historically managed by humans at human speed
    Who carries an on-call duty pager, or has a dev/ops team that does?
    When does that pager usually go off?
      • In my experience, usually 3am
  15. Making better use of human time
    These are things that were historically managed by humans at human speed
    Who carries an on-call duty pager, or has a dev/ops team that does?
    When does that pager usually go off?
      • In my experience, usually 3am
    Orchestration can handle a lot of situations for you automatically: turn 3am pages into advisory emails
  16. Orchestrators, at a glance
    Mesos:
      • Most mature (predates Docker)
      • Two-level system
    Docker Swarm Mode:
      • Built into Docker
      • Easy to set up
    Nomad:
      • From HashiCorp
      • Youngest of the bunch
    Kubernetes:
      • Derives from Google’s Borg & Omega
      • Rapidly growing adoption
  17. Mesos, at a glance
    Started as a UC Berkeley research project
    Now an Apache Software Foundation project
    Commercial support by Mesosphere (DC/OS)
    Two-level scheduler: core and “frameworks” (e.g. Spark, Cassandra, Marathon)
    Adopted by big tech shops (Twitter, Uber, Apple, Netflix)
    Scales very well (10k+ machines)
    Complex to set up and administer
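
Mesos’ two-level design splits the work: the core offers free resources, and each framework decides which offers to accept. The Go sketch below is a toy model of that idea under simplifying assumptions (one offer per agent, one task per accepted offer, frameworks asked in a fixed order rather than by Mesos’ real allocation policy); every type and function name is hypothetical and none of it is the Mesos API.

```go
package main

import "fmt"

// Offer is a bundle of free resources on one agent, offered by the core
// (level one) to a framework.
type Offer struct {
	Agent string
	CPUs  int
}

// Framework is a level-two scheduler with its own placement policy.
type Framework struct {
	Name        string
	CPUsPerTask int
	TasksWanted int
}

// Decide accepts offers big enough for one task until the framework has
// placed all the tasks it wants; everything else is declined.
func (f *Framework) Decide(offers []Offer) []string {
	var accepted []string
	for _, o := range offers {
		if f.TasksWanted == 0 {
			break
		}
		if o.CPUs >= f.CPUsPerTask {
			accepted = append(accepted, o.Agent)
			f.TasksWanted--
		}
	}
	return accepted
}

// offerRound offers the free resources to each framework in turn; offers a
// framework declines are passed on to the next one.
func offerRound(offers []Offer, frameworks []*Framework) map[string][]string {
	result := make(map[string][]string)
	for _, fw := range frameworks {
		accepted := fw.Decide(offers)
		result[fw.Name] = accepted
		taken := make(map[string]bool)
		for _, agent := range accepted {
			taken[agent] = true
		}
		var declined []Offer
		for _, o := range offers {
			if !taken[o.Agent] {
				declined = append(declined, o)
			}
		}
		offers = declined
	}
	return result
}

func main() {
	offers := []Offer{{"agent-1", 4}, {"agent-2", 1}, {"agent-3", 8}}
	frameworks := []*Framework{
		{Name: "analytics", CPUsPerTask: 4, TasksWanted: 2},
		{Name: "web", CPUsPerTask: 1, TasksWanted: 1},
	}
	fmt.Println(offerRound(offers, frameworks))
	// map[analytics:[agent-1 agent-3] web:[agent-2]]
}
```

The point of the split is that the core never needs to understand each framework’s placement rules; it only tracks what is free and who accepted what.
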
  18. Docker Swarm Mode, at a glance
    Built into Docker
    Focuses on ease of use & easy setup
    Very similar to Kubernetes in many ways
      • First version “Docker Swarm” was totally different
    Less mature than Mesos or Kubernetes
    Primarily developed by Docker, Inc.
    Scales to thousands of machines
  19. Nomad, at a glance
    From HashiCorp
    Integrates with Consul and Vault (both very well regarded)
    Designed to be simple
    Least mature of the bunch
    Scales to thousands of machines
    Not much adoption yet
  20. Kubernetes, at a glance
    Derives ideas from Google’s Borg & Omega
    Owned by the Cloud Native Computing Foundation (CNCF)
    Designed to be composable
    More complex than some others
    Scales to thousands of machines
    Very rapid adoption
    Large community: thousands of developers
  21. Kubernetes
    Greek for “helmsman”; also the root of the words “governor” and “cybernetic”
      • Manages container clusters
      • Inspired and informed by Google’s experiences and internal systems
      • Supports multiple cloud and bare-metal environments
      • Supports multiple container runtimes
      • 100% open source, written in Go
    Manage applications, not machines
  22. The 10,000-foot view
    [Diagram: users reach the master through the UI, CLI, and API; the master runs the apiserver, scheduler, and controllers, backed by etcd; each node runs a kubelet.]
  23. Pods
    Small group of containers & volumes
    Tightly coupled
    The atom of scheduling & placement
    Shared namespaces
      • share IP address & localhost
      • share IPC, etc.
    Managed lifecycle
      • bound to a node, restart in place
      • can die, cannot be reborn with same ID
    [Diagram: a Pod in which a File Puller and a Web Server share a Volume; a Content Manager and Consumers sit outside the Pod]
  24. Pods
    Examples:
      • data syncer (e.g. from git) & server
      • log producer & log saver
      • monitoring adapter
      • policy-enforcing (e.g. auth) proxy
      • cache
    [Same Pod diagram as the previous slide]
  25. Labels and selectors
    Arbitrary metadata
    Attached to any API object
    Generally represent identity
    Queryable by selectors
      • think SQL ‘select ... where ...’
    The only grouping mechanism
      • pods in a ReplicaSet
      • pods in a Service
      • capabilities of a node (constraints)
  26. Selectors
    Four pods, labeled:
      • App: MyApp, Phase: prod, Role: FE
      • App: MyApp, Phase: test, Role: FE
      • App: MyApp, Phase: prod, Role: BE
      • App: MyApp, Phase: test, Role: BE
  27. Selectors: App = MyApp selects all four pods
  28. Selectors: App = MyApp, Role = FE selects the two FE pods (prod and test)
  29. Selectors: App = MyApp, Role = BE selects the two BE pods (prod and test)
  30. Selectors: App = MyApp, Phase = prod selects the two prod pods (FE and BE)
  31. Selectors: App = MyApp, Phase = test selects the two test pods (FE and BE)
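
Equality-based selectors like the ones on these slides are an AND over key/value pairs. A minimal Go sketch of the matching logic, assuming only equality-based requirements (Kubernetes also has set-based operators such as `in` and `notin`, which are not modeled); `Labels`, `Selector`, and `Select` are illustrative names, not the Kubernetes client API.

```go
package main

import "fmt"

// Labels are arbitrary key/value metadata attached to an object.
type Labels map[string]string

// Selector is an equality-based selector: every key must be present with
// the given value (logical AND), like "App = MyApp, Role = FE".
type Selector map[string]string

// Matches reports whether the labels satisfy every requirement in s.
func (s Selector) Matches(l Labels) bool {
	for k, v := range s {
		if got, ok := l[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// Select returns the indices of the objects whose labels match s.
func Select(s Selector, objs []Labels) []int {
	var out []int
	for i, l := range objs {
		if s.Matches(l) {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	pods := []Labels{
		{"App": "MyApp", "Phase": "prod", "Role": "FE"},
		{"App": "MyApp", "Phase": "test", "Role": "FE"},
		{"App": "MyApp", "Phase": "prod", "Role": "BE"},
		{"App": "MyApp", "Phase": "test", "Role": "BE"},
	}
	fmt.Println(Select(Selector{"App": "MyApp"}, pods))                  // [0 1 2 3]
	fmt.Println(Select(Selector{"App": "MyApp", "Role": "FE"}, pods))    // [0 1]
	fmt.Println(Select(Selector{"App": "MyApp", "Phase": "prod"}, pods)) // [0 2]
}
```

Because a selector is just a query over labels, the same pods can belong to several groups at once (a ReplicaSet, a Service, a canary) without any of those groups owning them exclusively.
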
  32. Services
    A group of pods that work together
      • grouped by a selector
    Defines access policy
      • “load balanced” or “headless”
    Can have a stable virtual IP and port
      • also a DNS name!
    VIP is managed by kube-proxy
      • watches all services
      • updates iptables when backends change
      • default implementation - can be replaced!
    Hides complexity
    [Diagram: a Client connects to the Service’s Virtual IP, which fronts the group of pods]
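
The service VIP idea boils down to a table from a virtual address to a changing set of backends. This Go sketch keeps that table in user space and rotates through backends round-robin, which is only an analogy: kube-proxy’s iptables mode programs kernel rules and picks a backend at random rather than running code like this. The VIP and pod addresses are made up, and `ServiceTable` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// ServiceTable maps a service's virtual IP:port to its current backends.
type ServiceTable struct {
	mu       sync.Mutex
	backends map[string][]string // VIP -> pod IP:port list
	next     map[string]int      // VIP -> round-robin cursor
}

func NewServiceTable() *ServiceTable {
	return &ServiceTable{backends: map[string][]string{}, next: map[string]int{}}
}

// SetBackends is what a watcher would call whenever the set of pods matching
// the service's selector changes.
func (t *ServiceTable) SetBackends(vip string, endpoints []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.backends[vip] = append([]string(nil), endpoints...)
	t.next[vip] = 0
}

// Pick returns the backend for the next connection to vip, or false if the
// service currently has no backends.
func (t *ServiceTable) Pick(vip string) (string, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	eps := t.backends[vip]
	if len(eps) == 0 {
		return "", false
	}
	i := t.next[vip] % len(eps)
	t.next[vip] = i + 1
	return eps[i], true
}

func main() {
	table := NewServiceTable()
	vip := "10.0.0.10:80" // hypothetical service VIP
	table.SetBackends(vip, []string{"10.1.0.5:8080", "10.1.1.7:8080"})
	for i := 0; i < 3; i++ {
		ep, _ := table.Pick(vip)
		fmt.Println(ep) // 10.1.0.5:8080, 10.1.1.7:8080, 10.1.0.5:8080
	}
	// A pod went away; the watcher pushes the new endpoint list.
	table.SetBackends(vip, []string{"10.1.1.7:8080"})
	ep, _ := table.Pick(vip)
	fmt.Println(ep) // 10.1.1.7:8080
}
```

Clients only ever see the VIP; when pods come and go, only the table changes, which is the “hides complexity” point on the slide.
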
  33. External services
    Service VIPs are only available inside the cluster
    Need to receive traffic from “the outside world”
    Service “type”
      • NodePort: expose on a port on every node
      • LoadBalancer: provision a cloud load-balancer
    DIY load-balancer solutions
      • socat (for NodePort remapping)
      • haproxy
      • nginx
  34. ReplicaSets
    Declaration of intent: run N copies of a pod
    Simple control loop
    One job: ensure N copies
      • too few? start some
      • too many? kill some
    Layered on top of Pods
    [Diagram: a ReplicaSet (name = “my-rc”, selector = {“App”: “MyApp”}, template = { ... }, replicas = 4) talks to the API Server: “How many?” 3. “Start 1 more.” OK. “How many?” 4.]
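
The ReplicaSet control loop on the slide (“How many? 3. Start 1 more.”) can be sketched in a few lines of Go. `Cluster` is a hypothetical in-memory stand-in for the API server, and the selector and pod template are omitted; a real controller watches the API for changes and creates or deletes pods through it.

```go
package main

import "fmt"

// ReplicaSet is the declared intent: keep Replicas copies of a pod running.
// (Selector and pod template are omitted to keep the sketch small.)
type ReplicaSet struct {
	Name     string
	Replicas int
}

// Cluster is a hypothetical stand-in for the API server that just tracks
// the names of running pods owned by one ReplicaSet.
type Cluster struct {
	pods   []string
	serial int
}

// Start launches one more pod with a unique name.
func (c *Cluster) Start(prefix string) {
	c.serial++
	c.pods = append(c.pods, fmt.Sprintf("%s-%d", prefix, c.serial))
}

// Kill removes the most recently started pod.
func (c *Cluster) Kill() {
	c.pods = c.pods[:len(c.pods)-1]
}

// Reconcile performs one pass of the control loop: observe, compare, act.
func Reconcile(rs ReplicaSet, c *Cluster) (started, killed int) {
	want := rs.Replicas
	if want < 0 {
		want = 0
	}
	for len(c.pods) < want { // too few? start some
		c.Start(rs.Name)
		started++
	}
	for len(c.pods) > want { // too many? kill some
		c.Kill()
		killed++
	}
	return started, killed
}

func main() {
	c := &Cluster{}
	for i := 0; i < 3; i++ {
		c.Start("my-rc") // three replicas already running
	}
	rs := ReplicaSet{Name: "my-rc", Replicas: 4}
	s, k := Reconcile(rs, c)
	fmt.Println(s, k, len(c.pods)) // 1 0 4 ("Start 1 more")
	s, k = Reconcile(rs, c)
	fmt.Println(s, k, len(c.pods)) // 0 0 4 ("OK")
}
```

The loop compares observed state with desired state on every pass instead of reacting to individual events, so running it repeatedly is harmless and a missed event is corrected on the next pass.
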
  35. Deployments
    Manages replica changes for you
      • stable object name
      • simply edit the object
      • configurable server-side rolling updates
    Can have multiple updates in flight
    Layered on top of ReplicaSets
  36. Rolling Update
    ReplicaSet my-app-v1 (replicas: 3; selector: app: MyApp, version: v1)
    Service (selector: app: MyApp)
  37. Rolling Update
    ReplicaSet my-app-v1 (replicas: 3; selector: app: MyApp, version: v1)
    ReplicaSet my-app-v2 (replicas: 0; selector: app: MyApp, version: v2)
    Service (selector: app: MyApp)
  38. Rolling Update: my-app-v1 replicas: 3, my-app-v2 replicas: 1
  39. Rolling Update: my-app-v1 replicas: 2, my-app-v2 replicas: 1
  40. Rolling Update: my-app-v1 replicas: 2, my-app-v2 replicas: 2
  41. Rolling Update: my-app-v1 replicas: 1, my-app-v2 replicas: 2
  42. Rolling Update: my-app-v1 replicas: 1, my-app-v2 replicas: 3
  43. Rolling Update: my-app-v1 replicas: 0, my-app-v2 replicas: 3
  44. Rolling Update
    ReplicaSet my-app-v2 (replicas: 3; selector: app: MyApp, version: v2)
    Service (selector: app: MyApp)
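
The rolling update on slides 36 to 44 alternates between growing the new ReplicaSet by one and shrinking the old one by one, while the Service (selecting only app: MyApp) keeps sending traffic to whichever pods exist. The Go sketch below reproduces that exact sequence, assuming a surge of one pod and zero unavailable pods at each step; real Deployments make both limits configurable, so other step sequences are possible. `Step` and `rollingUpdate` are hypothetical names.

```go
package main

import "fmt"

// Step records the replica counts of the old and new ReplicaSets.
type Step struct{ Old, New int }

// rollingUpdate moves from `old` replicas of v1 to `want` replicas of v2,
// one pod at a time: add one new pod, then retire one old pod. This mirrors
// a "surge by 1, never go below capacity" strategy (an assumption here).
func rollingUpdate(old, want int) []Step {
	cur, nw := old, 0
	steps := []Step{{cur, nw}}
	for cur > 0 || nw < want {
		if nw < want {
			nw++
			steps = append(steps, Step{cur, nw})
		}
		if cur > 0 {
			cur--
			steps = append(steps, Step{cur, nw})
		}
	}
	return steps
}

func main() {
	for _, s := range rollingUpdate(3, 3) {
		fmt.Printf("my-app-v1: %d  my-app-v2: %d\n", s.Old, s.New)
	}
}
```

Scaling up before scaling down means total serving capacity never drops below the original three pods, at the cost of briefly running one extra pod.
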
  45. ConfigMaps & Secrets
    Goal: manage app configuration & secrets
      • ...without making overly-brittle container images
    12-factor says config comes from the environment
      • Kubernetes is the environment
    Manage configs via the Kubernetes API
    Inject them as virtual volumes into your Pods
      • late-binding, live-updated (atomic)
      • also available as env vars
    [Diagram: a Config Map managed through the API and injected into a Pod on a node]
  46. HorizontalPodAutoscalers
    Automatically scale number of pods as needed
      • based on CPU utilization (for now)
      • custom metrics coming
    Efficiency now, capacity when you need it
    Operates within user-defined min/max bounds
    Set it and forget it
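
The HPA’s core calculation, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), held within the user’s min/max bounds. A Go sketch of that arithmetic for CPU utilization in percent; it omits details the real controller applies (such as a tolerance band), and `desiredReplicas` is a hypothetical helper name.

```go
package main

import "fmt"

// desiredReplicas computes ceil(current * observed / target), clamped to
// [minR, maxR]. observedPct and targetPct are CPU utilization percentages.
func desiredReplicas(current, observedPct, targetPct, minR, maxR int) int {
	if targetPct <= 0 {
		return current // no meaningful target; leave things alone
	}
	n := (current*observedPct + targetPct - 1) / targetPct // integer ceiling
	if n < minR {
		n = minR
	}
	if n > maxR {
		n = maxR
	}
	return n
}

func main() {
	fmt.Println(desiredReplicas(4, 90, 50, 2, 10))  // 8: 4*90/50 = 7.2, rounded up
	fmt.Println(desiredReplicas(4, 10, 50, 2, 10))  // 2: 0.8 rounds up to 1, raised to the minimum
	fmt.Println(desiredReplicas(4, 300, 50, 2, 10)) // 10: 24, capped at the maximum
}
```

Rounding up errs on the side of capacity: a spike that needs 7.2 pods’ worth of CPU gets 8.
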
  47. Cluster Autoscaler
    Automatically scale number of nodes as needed
      • based on scheduler backlog & idleness
    Efficiency now, capacity when you need it
    Operates within user-defined min/max bounds
    Set it and forget it
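
The cluster autoscaler’s inputs on the slide are scheduler backlog and idleness. The Go function below is a deliberately crude model of that decision, not the real Cluster Autoscaler algorithm (which, among other things, checks whether pending pods would actually fit on a new node): it assumes a fixed number of pods fit per node, adds enough nodes to absorb the pending pods, otherwise removes idle nodes, and always stays within min/max. `ClusterState` and its fields are hypothetical.

```go
package main

import "fmt"

// ClusterState holds hypothetical inputs for one autoscaling decision.
type ClusterState struct {
	Nodes       int // current node count
	PendingPods int // scheduler backlog: pods that could not be placed
	PodsPerNode int // assumed capacity of one node, in pods
	IdleNodes   int // nodes judged idle enough to remove
	MinNodes    int
	MaxNodes    int
}

// targetNodes grows the cluster to absorb the backlog, otherwise shrinks it
// by the number of idle nodes, and always clamps to [MinNodes, MaxNodes].
func targetNodes(s ClusterState) int {
	n := s.Nodes
	switch {
	case s.PendingPods > 0 && s.PodsPerNode > 0:
		n += (s.PendingPods + s.PodsPerNode - 1) / s.PodsPerNode // ceiling division
	case s.PendingPods == 0:
		n -= s.IdleNodes
	}
	if n < s.MinNodes {
		n = s.MinNodes
	}
	if n > s.MaxNodes {
		n = s.MaxNodes
	}
	return n
}

func main() {
	fmt.Println(targetNodes(ClusterState{Nodes: 3, PendingPods: 5, PodsPerNode: 4, MinNodes: 1, MaxNodes: 10})) // 5
	fmt.Println(targetNodes(ClusterState{Nodes: 5, IdleNodes: 3, PodsPerNode: 4, MinNodes: 3, MaxNodes: 10}))   // 3
}
```

Note how this complements the HPA sketch: the HPA creates more pods, which may become a scheduler backlog, which in turn drives this node-level decision.
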
  48. PersistentVolumes
    Manage storage with its own lifecycle
    Driver plugins: more than 20 supported
      • Google Persistent Disk
      • Amazon EBS
      • Azure Volumes
      • Gluster
      • Ceph
      • iSCSI
      • Cinder
      • ScaleIO
      • Portworx
      • ...
    Dynamic provisioning: allocate on demand
    Local disk volumes in development
    Containers are not just for stateless apps!
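
Dynamic provisioning means a storage claim can be satisfied either by an existing volume or by one created on demand. This Go sketch uses one plausible policy, binding a claim to the smallest free volume that is big enough, and falls back to a provisioner when none fits. The provisioner is a hypothetical stand-in for a driver plugin (for example, one that creates cloud disks), and `Volume`, `Provisioner`, and `bind` are illustrative names, not the Kubernetes PersistentVolume API.

```go
package main

import (
	"fmt"
	"sort"
)

// Volume is an illustrative stand-in for a persistent volume.
type Volume struct {
	Name    string
	SizeGiB int
	BoundTo string // claim name, or "" if the volume is free
}

// Provisioner is a hypothetical stand-in for a storage driver plugin that
// can create a new volume on demand.
type Provisioner func(claim string, sizeGiB int) Volume

// bind satisfies a claim with the smallest free volume that is large enough.
// If none fits and a provisioner is available, it creates a volume instead.
// It returns the updated pool, the chosen volume name, and whether it bound.
func bind(pool []Volume, claim string, sizeGiB int, provision Provisioner) ([]Volume, string, bool) {
	sort.Slice(pool, func(i, j int) bool { return pool[i].SizeGiB < pool[j].SizeGiB })
	for i := range pool {
		if pool[i].BoundTo == "" && pool[i].SizeGiB >= sizeGiB {
			pool[i].BoundTo = claim
			return pool, pool[i].Name, true
		}
	}
	if provision == nil {
		return pool, "", false // no match and no provisioner: the claim stays pending
	}
	v := provision(claim, sizeGiB)
	v.BoundTo = claim
	return append(pool, v), v.Name, true
}

func main() {
	pool := []Volume{{Name: "pv-100g", SizeGiB: 100}, {Name: "pv-10g", SizeGiB: 10}}
	// Hypothetical provisioner: pretend to create a disk of exactly the requested size.
	provision := func(claim string, sizeGiB int) Volume {
		return Volume{Name: "dyn-" + claim, SizeGiB: sizeGiB}
	}
	claims := []struct {
		name    string
		sizeGiB int
	}{{"data", 5}, {"logs", 50}, {"cache", 20}}
	for _, c := range claims {
		var name string
		var ok bool
		pool, name, ok = bind(pool, c.name, c.sizeGiB, provision)
		fmt.Println(c.name, "->", name, ok)
	}
	// data -> pv-10g true
	// logs -> pv-100g true
	// cache -> dyn-cache true
}
```

Because volumes are bound to claims rather than to pods, a pod can be rescheduled and reattach the same storage, which is what lets stateful apps run in containers.
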
  49. Community
    1500+ contributors
    400+ person-years of effort
    Top 0.001% of all GitHub projects
    4000+ external projects based on K8s
  50. Kubernetes is Open
    https://kubernetes.io
    Code: github.com/kubernetes/kubernetes
    Chat: slack.k8s.io
    Twitter: @kubernetesio
    open community, open design, open source, open to ideas