
A Crash Course on Container Orchestration


Tim Hockin

May 15, 2017

Transcript

  1. A Crash Course on Container Orchestration & Kubernetes

    Google Cloud Platform
    Interop ITX, May 15, 2017
    Tim Hockin <thockin@google.com>, Principal Software Engineer, @thockin
  2. Containers are a great way to package and run apps:

    • Self-contained
    • Low overhead
    • Fast starting
    • Easy to compose
    • Easy to replace
  3. Almost everything at Google runs in containers:

    • Gmail, Web Search, Maps, ...
    • MapReduce, batch, ...
    • GFS, Colossus, ...
    • Even Google’s Cloud Platform: our VMs run in containers!
  4. Almost everything at Google runs in containers:

    • Gmail, Web Search, Maps, ...
    • MapReduce, batch, ...
    • GFS, Colossus, ...
    • Even Google’s Cloud Platform: our VMs run in containers!
    We launch billions of containers every week
  5. For the rest of this presentation, I am going to assume that you are bought in to containers.

    Images by Connie Zhou
  6. How do I deploy containers?

  7. Deployment options

    Understanding how containers work is not the same as using them in production (duh!)
    Different organizations will have very different needs (duh!)
    • Not everyone needs to operate “at scale”
    There are many ways to run and manage containers (duh!)
  8. Humans

    Never underestimate the value of manual solutions
    SSH into machines and run docker
    Pro: simple, available everywhere, no special tools needed, easily understood
    Con: not automated, not reproducible (humans make mistakes), doesn’t scale, doesn’t self-heal
  9. Scripts

    A very common first step into containers
    Puppet, Chef, Ansible, Salt, or just bespoke scripts
    Pro: integrates with existing environments, easily understood results, reproducible
    Con: manual scheduling, doesn’t self-heal, doesn’t scale, generally non-portable
  10. Orchestration systems

    Automatically match containers to available machines
    Mesos, Kubernetes, Docker Swarm, Nomad, or even home-grown systems
    Pro: automated, reproducible, self-healing, scalable, generally portable
    Con: some overhead, requires new tooling and training, more complex results
  11. What’s in an orchestration system?

  12. Basic features

    Scheduling: match containers to machines
    • by resource needs (CPU, memory)
    • by affinity requirements (put X near Y)
    • by labels (put X on a “test” machine)
    Replication: run N copies
    Handle machine failures
    Discovery: find peers and services in other containers
    Inspection: tell me what is happening
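To make the scheduling bullet concrete, here is a minimal first-fit sketch in Python. It is only an illustration under assumed names (`Machine`, `Container`, `fits`, `schedule` are hypothetical); real orchestrators score and rank candidate machines rather than taking the first one that fits, and affinity is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    cpu_free: float                      # free CPU cores
    mem_free: int                        # free memory, MiB
    labels: dict = field(default_factory=dict)


@dataclass
class Container:
    name: str
    cpu: float                           # requested CPU cores
    mem: int                             # requested memory, MiB
    node_labels: dict = field(default_factory=dict)  # labels the machine must carry


def fits(c, m):
    """True if machine m has the labels and free resources container c asks for."""
    has_labels = all(m.labels.get(k) == v for k, v in c.node_labels.items())
    return has_labels and m.cpu_free >= c.cpu and m.mem_free >= c.mem


def schedule(containers, machines):
    """First-fit placement. Returns {container name: machine name, or None if unplaced}."""
    placement = {}
    for c in containers:
        target = next((m for m in machines if fits(c, m)), None)
        if target is not None:
            # Reserve the resources so later containers see what is left.
            target.cpu_free -= c.cpu
            target.mem_free -= c.mem
        placement[c.name] = target.name if target else None
    return placement


machines = [
    Machine("m1", 2.0, 4096, {"env": "prod"}),
    Machine("m2", 4.0, 8192, {"env": "test"}),
]
containers = [
    Container("web", 1.0, 1024),
    Container("ci", 1.0, 2048, {"env": "test"}),   # "put X on a test machine"
    Container("big", 8.0, 1024),                   # too large for any machine
]
print(schedule(containers, machines))
# {'web': 'm1', 'ci': 'm2', 'big': None}
```

A container that fits nowhere stays unplaced (`None`), which is the kind of backlog a cluster auto-scaler would react to.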
  13. Advanced features

    Built-in load-balancers
    Automated updates
    Cluster auto-scaling: better utilization
    App auto-scaling: handle spikes and troughs
    Provisioning storage
    Re-packing machines
    Late-binding configuration
  14. Making better use of human time

    These are things that were historically managed by humans at human speed
  15. Making better use of human time

    These are things that were historically managed by humans at human speed
    Who carries an on-call duty pager, or has a dev/ops team that does?
  16. Making better use of human time

    These are things that were historically managed by humans at human speed
    Who carries an on-call duty pager, or has a dev/ops team that does?
    When does that pager usually go off?
    • In my experience, usually 3am
  17. Making better use of human time

    These are things that were historically managed by humans at human speed
    Who carries an on-call duty pager, or has a dev/ops team that does?
    When does that pager usually go off?
    • In my experience, usually 3am
    Orchestration can handle a lot of situations for you automatically: turn 3am pages into advisory emails
  18. What options do I have?

  19. Orchestrators, at a glance

    Mesos:
    • Most mature (predates Docker)
    • Two-level system
    Docker Swarm Mode:
    • Built-in to Docker
    • Easy to set up
    Nomad:
    • HashiCorp
    • Youngest of the bunch
    Kubernetes:
    • Derives from Google’s Borg & Omega
    • Rapidly growing adoption
  20. Mesos, at a glance

    Started as a UC Berkeley research project
    Now owned by the Apache Software Foundation
    Commercial support by Mesosphere (DC/OS)
    Two-level scheduler: core and “frameworks” (e.g. Spark, Cassandra, Marathon)
    Big tech shops (Twitter, Uber, Apple, Netflix)
    Scales very well (10k+ machines)
    Complex to set up and administer
  21. Docker Swarm Mode, at a glance

    Built into Docker
    Focuses on ease of use & easy setup
    Very similar to Kubernetes in many ways
    • First version “Docker Swarm” was totally different
    Less mature than Mesos or Kubernetes
    Primarily developed by Docker, Inc.
    Scales to thousands of machines
  22. Nomad, at a glance

    From HashiCorp
    Integrates with Consul and Vault (both very well regarded)
    Designed to be simple
    Least mature of the bunch
    Scales to thousands of machines
    Not much adoption, yet
  23. Kubernetes, at a glance

    Derives ideas from Google’s Borg & Omega
    Owned by the Cloud Native Computing Foundation
    Designed to be composable
    More complex than some others
    Scales to thousands of machines
    Very rapid adoption
    Large community: thousands of developers
  24. Diving deeper into Kubernetes

  25. Kubernetes

    Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
    • Manages container clusters
    • Inspired and informed by Google’s experiences and internal systems
    • Supports multiple cloud and bare-metal environments
    • Supports multiple container runtimes
    • 100% Open source, written in Go
    Manage applications, not machines
  26. The 10000 foot view

    [Diagram labels: users, UI, CLI, API; master: apiserver, scheduler, controllers, etcd; nodes: kubelet, kubelet, kubelet]
  27. All you really care about

    [Diagram labels: UI, API, Container Cluster]
  28. Running a container

  29. [Diagram labels: apiserver]

  30. [Diagram labels: apiserver, etcd]

  31. [Diagram labels: apiserver, etcd]

  32. [Diagram labels: apiserver, etcd, scheduler, controller manager]

  33. [Diagram labels: apiserver, etcd, scheduler, controller manager]

  34. [Diagram labels: apiserver, etcd, scheduler, controller manager, kubelet]

  35. [Diagram labels: apiserver, etcd, scheduler, controller manager, kubelet, docker]

  36. [Diagram labels: apiserver, etcd, scheduler, controller manager, kubelet, docker]

  37. [Diagram labels: apiserver, etcd, scheduler, controller manager, kubelet, docker, cloud provider]

  38. [Diagram labels: apiserver, etcd, scheduler, controller manager, kubelet, docker, cloud provider]
  39. Co-scheduling

  40. Highly-coupled containers

    [Diagram labels: File Puller, Web Server, ?]
  41. Highly-coupled containers

    [Diagram labels: File Puller, Web Server]
  42. Highly-coupled containers

    [Diagram labels: File Puller, Web Server, REJECTED]

  43. Pods

    Small group of containers & volumes
    Tightly coupled
    The atom of scheduling & placement
    Shared namespaces
    • share IP address & localhost
    • share IPC, etc.
    Managed lifecycle
    • bound to a node, restart in place
    • can die, cannot be reborn with same ID
    [Diagram labels: Consumers, Content Manager, File Puller, Web Server, Volume, Pod]
  44. Pods

    Examples:
    • data syncer (e.g. from git) & server
    • log producer & log saver
    • monitoring adapter
    • policy-enforcing (e.g. auth) proxy
    • cache
    [Diagram labels: Consumers, Content Manager, File Puller, Web Server, Volume, Pod]
  45. Finding things

  46. Physical view

  47. Physical view

  48. Physical view

  49. Logical view

  50. Labels and selectors

    Arbitrary metadata
    Attached to any API object
    Generally represent identity
    Queryable by selectors
    • think SQL ‘select ... where ...’
    The only grouping mechanism
    • pods in a ReplicaSet
    • pods in a Service
    • capabilities of a node (constraints)
  51. Selectors

    Labels: {App: MyApp, Phase: prod, Role: FE}, {App: MyApp, Phase: test, Role: FE}, {App: MyApp, Phase: prod, Role: BE}, {App: MyApp, Phase: test, Role: BE}
  52. Selectors

    Labels: {App: MyApp, Phase: prod, Role: FE}, {App: MyApp, Phase: test, Role: FE}, {App: MyApp, Phase: prod, Role: BE}, {App: MyApp, Phase: test, Role: BE}
    Selector: App = MyApp
  53. Selectors

    Labels: {App: MyApp, Phase: prod, Role: FE}, {App: MyApp, Phase: test, Role: FE}, {App: MyApp, Phase: prod, Role: BE}, {App: MyApp, Phase: test, Role: BE}
    Selector: App = MyApp, Role = FE
  54. Selectors

    Labels: {App: MyApp, Phase: prod, Role: FE}, {App: MyApp, Phase: test, Role: FE}, {App: MyApp, Phase: prod, Role: BE}, {App: MyApp, Phase: test, Role: BE}
    Selector: App = MyApp, Role = BE
  55. Selectors

    Labels: {App: MyApp, Phase: prod, Role: FE}, {App: MyApp, Phase: test, Role: FE}, {App: MyApp, Phase: prod, Role: BE}, {App: MyApp, Phase: test, Role: BE}
    Selector: App = MyApp, Phase = prod
  56. Selectors

    Labels: {App: MyApp, Phase: prod, Role: FE}, {App: MyApp, Phase: test, Role: FE}, {App: MyApp, Phase: prod, Role: BE}, {App: MyApp, Phase: test, Role: BE}
    Selector: App = MyApp, Phase = test
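The selector slides boil down to "every key in the selector must match the object's labels". A small Python sketch of that equality-based matching follows, using the four label sets from the slides; the pod names (`fe-prod`, etc.) and the helper names `matches` and `select` are made up for illustration.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def select(selector, objects):
    """Names of all objects whose labels satisfy the selector, sorted."""
    return sorted(name for name, labels in objects.items() if matches(selector, labels))


# Hypothetical pod names; the labels are the ones shown on the slides.
pods = {
    "fe-prod": {"App": "MyApp", "Phase": "prod", "Role": "FE"},
    "fe-test": {"App": "MyApp", "Phase": "test", "Role": "FE"},
    "be-prod": {"App": "MyApp", "Phase": "prod", "Role": "BE"},
    "be-test": {"App": "MyApp", "Phase": "test", "Role": "BE"},
}

print(select({"App": "MyApp"}, pods))
# ['be-prod', 'be-test', 'fe-prod', 'fe-test']
print(select({"App": "MyApp", "Role": "FE"}, pods))
# ['fe-prod', 'fe-test']
print(select({"App": "MyApp", "Phase": "prod"}, pods))
# ['be-prod', 'fe-prod']
```

The same objects fall into different overlapping groups depending on the query, which is why labels can serve as the only grouping mechanism.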
  57. Logical view

  58. Logical view

  59. Logical view

  60. Logical view

  61. Discovery

  62. Services

    A group of pods that work together
    • grouped by a selector
    Defines access policy
    • “load balanced” or “headless”
    Can have a stable virtual IP and port
    • also a DNS name!
    VIP is managed by kube-proxy
    • watches all services
    • updates iptables when backends change
    • default implementation - can be replaced!
    Hides complexity
    [Diagram labels: Client, Virtual IP]
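As a rough mental model of a Service, the Python sketch below keeps a backend list in sync with a selector and hands each request to one backend. The `Service` class, its methods, and the IP addresses are hypothetical; this is not how kube-proxy is implemented (it programs iptables rules rather than proxying in a Python object), only the idea of a stable front for a changing set of pods.

```python
import random


class Service:
    """Toy model: a stable name in front of whichever pods match a selector."""

    def __init__(self, selector):
        self.selector = selector
        self.backends = []

    def sync(self, pods):
        """Recompute backends from {pod IP: labels}; call again when pods change."""
        self.backends = sorted(
            ip for ip, labels in pods.items()
            if all(labels.get(k) == v for k, v in self.selector.items())
        )

    def pick(self, rng=random):
        """Choose one backend for a new connection (random choice is an assumption)."""
        return rng.choice(self.backends)


# Hypothetical pod IPs and labels.
pods = {
    "10.0.0.1": {"App": "MyApp", "Role": "FE"},
    "10.0.0.2": {"App": "MyApp", "Role": "FE"},
    "10.0.0.3": {"App": "MyApp", "Role": "BE"},
}
svc = Service({"App": "MyApp", "Role": "FE"})
svc.sync(pods)
print(svc.backends)  # ['10.0.0.1', '10.0.0.2']

del pods["10.0.0.1"]          # a pod dies
svc.sync(pods)                # backends follow; clients keep using the same Service
print(svc.backends)  # ['10.0.0.2']
```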
  63. External services

    Service VIPs are only available inside the cluster
    Need to receive traffic from “the outside world”
    Service “type”
    • NodePort: expose on a port on every node
    • LoadBalancer: provision a cloud load-balancer
    DIY load-balancer solutions
    • socat (for nodePort remapping)
    • haproxy
    • nginx
  64. Replication

  65. ReplicaSets

    Declaration of intent: run N copies of a pod
    Simple control loop
    One job: ensure N copies
    • too few? start some
    • too many? kill some
    Layered on top of Pods
    [Diagram: ReplicaSet (name = “my-rc”, selector = {“App”: “MyApp”}, template = { ... }, replicas = 4) and API Server exchange: “How many?” “3” “Start 1 more” “OK” “How many?” “4”]
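The "too few? start some, too many? kill some" loop can be sketched in a few lines of Python. The function name `reconcile` and the `start`/`kill` callbacks are assumptions for illustration; a real controller would query the API server for pods matching its selector instead of holding a local list.

```python
import itertools


def reconcile(desired, running, start, kill):
    """One pass of the control loop: make len(running) equal to desired.

    running is a list of pod names, start() returns the name of a new pod,
    kill(name) stops one. Both callbacks are stand-ins for API calls.
    """
    diff = desired - len(running)
    if diff > 0:                      # too few? start some
        for _ in range(diff):
            running.append(start())
    elif diff < 0:                    # too many? kill some
        for _ in range(-diff):
            kill(running.pop())
    return running


ids = itertools.count(1)
pods = ["pod-a", "pod-b", "pod-c"]    # "How many?" -> 3
reconcile(4, pods, start=lambda: f"pod-new-{next(ids)}", kill=lambda name: None)
print(pods)
# ['pod-a', 'pod-b', 'pod-c', 'pod-new-1']
```

Running the same call again changes nothing, which is what makes the loop safe to repeat forever.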
  66. Deployments

    Manages replica changes for you
    • stable object name
    • simply edit the object
    • configurable server-side rolling-updates
    Can have multiple updates in flight
    Layered on top of ReplicaSets
    [Diagram: ...]
  67. Updates

  68. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 3, selector: {app: MyApp, version: v1}
    Service: app: MyApp
  69. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 3, selector: {app: MyApp, version: v1}
    ReplicaSet: name: my-app-v2, replicas: 0, selector: {app: MyApp, version: v2}
    Service: app: MyApp
  70. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 3, selector: {app: MyApp, version: v1}
    ReplicaSet: name: my-app-v2, replicas: 1, selector: {app: MyApp, version: v2}
    Service: app: MyApp
  71. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 2, selector: {app: MyApp, version: v1}
    ReplicaSet: name: my-app-v2, replicas: 1, selector: {app: MyApp, version: v2}
    Service: app: MyApp
  72. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 2, selector: {app: MyApp, version: v1}
    ReplicaSet: name: my-app-v2, replicas: 2, selector: {app: MyApp, version: v2}
    Service: app: MyApp
  73. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 1, selector: {app: MyApp, version: v1}
    ReplicaSet: name: my-app-v2, replicas: 2, selector: {app: MyApp, version: v2}
    Service: app: MyApp
  74. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 1, selector: {app: MyApp, version: v1}
    ReplicaSet: name: my-app-v2, replicas: 3, selector: {app: MyApp, version: v2}
    Service: app: MyApp
  75. Rolling Update

    ReplicaSet: name: my-app-v1, replicas: 0, selector: {app: MyApp, version: v1}
    ReplicaSet: name: my-app-v2, replicas: 3, selector: {app: MyApp, version: v2}
    Service: app: MyApp
  76. Rolling Update

    ReplicaSet: name: my-app-v2, replicas: 3, selector: {app: MyApp, version: v2}
    Service: app: MyApp
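The sequence of replica counts on the rolling-update slides can be reproduced with a short Python generator. This is a sketch of that one pattern (scale the new ReplicaSet up by one, then the old one down by one); the function name is hypothetical, and real Deployments make the step sizes configurable.

```python
def rolling_update(replicas):
    """Yield (old_replicas, new_replicas) after each step of the rollout."""
    old, new = replicas, 0
    yield old, new            # new ReplicaSet created with 0 replicas
    while old > 0:
        new += 1              # bring up one pod of the new version first...
        yield old, new
        old -= 1              # ...then retire one pod of the old version
        yield old, new


for v1, v2 in rolling_update(3):
    print(f"my-app-v1: {v1}  my-app-v2: {v2}")
# my-app-v1: 3  my-app-v2: 0
# my-app-v1: 3  my-app-v2: 1
# my-app-v1: 2  my-app-v2: 1
# my-app-v1: 2  my-app-v2: 2
# my-app-v1: 1  my-app-v2: 2
# my-app-v1: 1  my-app-v2: 3
# my-app-v1: 0  my-app-v2: 3
```

Because the Service selects only on `app: MyApp`, it keeps routing to pods of both versions throughout, and the total never drops below the original replica count.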
  77. Configuration and secrets

  78. ConfigMaps & Secrets

    Goal: manage app configuration & secrets
    • ...without making overly-brittle container images
    12-factor says config comes from the environment
    • Kubernetes is the environment
    Manage configs via the Kubernetes API
    Inject them as virtual volumes into your Pods
    • late-binding, live-updated (atomic)
    • also available as env vars
    [Diagram labels: node, API, Pod, Config Map]
  79. Auto-scaling

  80. HorizontalPodAutoScalers

    Automatically scale number of pods as needed
    • based on CPU utilization (for now)
    • custom metrics coming
    Efficiency now, capacity when you need it
    Operates within user-defined min/max bounds
    Set it and forget it
    [Diagram labels: ..., Stats]
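One way to picture CPU-based pod auto-scaling is a proportional rule: scale the replica count by the ratio of observed to target utilization, then clamp to the user-defined bounds. The Python sketch below follows the general shape of the rule described in the Kubernetes documentation, but the function and parameter names are assumptions and it omits details such as tolerances and cool-down periods.

```python
import math


def desired_replicas(current, cpu_now, cpu_target, min_replicas, max_replicas):
    """Proportional scaling rule, clamped to [min_replicas, max_replicas].

    cpu_now and cpu_target are average CPU utilization percentages.
    """
    wanted = math.ceil(current * cpu_now / cpu_target)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(4, 90, 60, 2, 10))   # 6  (busy: scale up)
print(desired_replicas(4, 10, 60, 2, 10))   # 2  (idle: scale down, but not below min)
print(desired_replicas(4, 300, 60, 2, 10))  # 10 (spike: capped at max)
```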
  81. ClusterAutoScaler

    Automatically scale number of nodes as needed
    • based on scheduler backlog & idleness
    Efficiency now, capacity when you need it
    Operates within user-defined min/max bounds
    Set it and forget it
    [Diagram labels: ..., Sched]
  82. Storage

  83. PersistentVolumes

    Manage storage with its own lifecycle
    Driver plugins - more than 20 supported
    • Google Persistent Disk
    • Amazon EBS
    • Azure Volumes
    • Gluster
    • Ceph
    • iSCSI
    • Cinder
    • ScaleIO
    • Portworx
    • ...
    Dynamic provisioning - allocate on-demand
    Local disk volumes in development
    Containers are not just for stateless apps!
  84. About the Kubernetes Project

  85. Community

    1500+ Contributors
    400+ Person-Years of Effort
    Top 0.001% of all GitHub Projects
    4000+ External Projects Based on K8s
    [Diagram labels: Contributors, Users]
  86. Kubernetes is Open

    open community, open design, open source, open to ideas
    https://kubernetes.io
    Code: github.com/kubernetes/kubernetes
    Chat: slack.k8s.io
    Twitter: @kubernetesio