Pain Management for Containers

Tim Hockin

April 27, 2016

Transcript

  1. Google Cloud Platform: Pain Management for Containers

    OpenStack Summit Containers Meetup, 4/26/2016
    Tim Hockin <[email protected]>, Senior Staff SW Engineer, @thockin
  2. Google has been developing and using containers to manage our applications for over 12 years.

    Images by Connie Zhou
  3. ca. 2002

    App-specific machine pools
      • Inefficient and painful to manage
    Shared machines
      • Chroots, ulimits, and nice
      • Noisy neighbors: a real problem
      • Limited our ability to share
  4. ca. 2002

    App-specific machine pools
      • Inefficient and painful to manage
    Shared machines
      • Chroots, ulimits, and nice
      • Noisy neighbors: a real problem
      • Limited our ability to share
    The fleet gets larger
      • Inefficiency hurts more at scale
  5. ca. 2002

    App-specific machine pools
      • Inefficient and painful to manage
    Shared machines
      • Chroots, ulimits, and nice
      • Noisy neighbors: a real problem
      • Limited our ability to share
    The fleet gets larger
      • Inefficiency hurts more at scale
    Share harder!
      • Good fences make good neighbors
  6. ca. 2006

    Google develops cgroups
      • Inescapable resource isolation
      • Enables better sharing
  7. ca. 2006

    Google develops cgroups
      • Inescapable resource isolation
      • Enables better sharing
    Isolation is paramount
      • Namespacing is secondary
      • cf. github.com/google/lmctfy
  8. ca. 2006

    Google develops cgroups
      • Inescapable resource isolation
      • Enables better sharing
    Isolation is paramount
      • Namespacing is secondary
      • cf. github.com/google/lmctfy
    Needs change, solutions evolve
      • mistakes were made
      • lessons were learned
  9. Kubernetes

    Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
      • Manages container clusters
      • Inspired and informed by Google’s experiences and internal systems
      • Supports multiple cloud and bare-metal environments
      • Supports multiple container runtimes
      • 100% open source, written in Go
    Manage applications, not machines
  10. Workload Portability

    Goal: Avoid vendor lock-in
    Runs in many environments, including “bare metal” and “your laptop”
    The API and the implementation are 100% open
    The whole system is modular and replaceable
  11. Workload Portability

    Goal: Write once, run anywhere*
    Don’t force apps to know about concepts that are cloud-provider-specific
    Examples of this:
      • Network model
      • Ingress
      • Service load-balancers
      • PersistentVolumes
    * approximately
  12. Workload Portability

    Goal: Avoid coupling
    Don’t force apps to know about concepts that are Kubernetes-specific
    Examples of this:
      • Namespaces
      • Services / DNS
      • Downward API
      • Secrets / ConfigMaps
  13. Workload Portability

    Result: Portability
    Build your apps on-prem, lift-and-shift into cloud when you are ready
    Don’t get stuck with a platform that doesn’t work for you
    Put your app on wheels and move it whenever and wherever you need
  14. Host ports

    [Diagram labels: A: 172.16.1.1; B: 172.16.1.2; C: 172.16.1.1; ports 3306, 80, 9376, 11878, 8000; SNAT]
  15. Host ports

    [Diagram labels: A: 172.16.1.1; B: 172.16.1.2; C: 172.16.1.1; ports 3306, 80, 9376, 11878, 8000; SNAT; REJECTED]
  16. Kubernetes networking

    IPs are cluster-scoped
      • vs docker default private IP
    Pods can reach each other directly
      • even across nodes
    No brokering of port numbers
      • too complex, why bother?
    This is a fundamental requirement
      • can be L3 routed
      • can be underlayed (cloud)
      • can be overlayed (SDN)
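A rough way to see why the slides avoid port brokering is to model both approaches. The sketch below is a toy model, not Kubernetes code: `HostPortAllocator`, `pod_endpoint`, the container names, and the IP addresses are all hypothetical and chosen only for illustration.

```python
class HostPortAllocator:
    """Toy model of host-port sharing: one port space per node."""

    def __init__(self):
        self._taken = {}  # host port -> name of the container that owns it

    def bind(self, container, port):
        # Two containers on the same node cannot both own one host port.
        if port in self._taken:
            raise RuntimeError(
                f"port {port} already bound by {self._taken[port]}"
            )
        self._taken[port] = container


def pod_endpoint(pod_ip, port):
    """Toy model of IP-per-pod: the (IP, port) pair is the whole address."""
    return f"{pod_ip}:{port}"


# Host-port model: the second container asking for port 80 is rejected.
node = HostPortAllocator()
node.bind("web-a", 80)
try:
    node.bind("web-b", 80)
    conflict = False
except RuntimeError:
    conflict = True

# IP-per-pod model: both pods listen on port 80 and nothing is brokered.
endpoints = {pod_endpoint("10.0.1.5", 80), pod_endpoint("10.0.2.7", 80)}
```

In the first model someone has to hand out and track port numbers per node; in the second the pod's own IP already makes the endpoint unique.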
  17. Pods

    Small group of containers & volumes
    Tightly coupled
    The atom of scheduling & placement
    Shared namespace
      • share IP address & localhost
      • share IPC, etc.
    Managed lifecycle
      • bound to a node, restart in place
      • can die, cannot be reborn with same ID
    Example: data puller & web server
    [Diagram labels: Consumers, Content Manager, File Puller, Web Server, Volume, Pod]
  18. Volumes

    Pod-scoped storage
    Support many types of volume plugins
      • Empty dir (and tmpfs)
      • Host path
      • Git repository
      • GCE Persistent Disk
      • AWS Elastic Block Store
      • Azure File Storage
      • iSCSI
      • Flocker
      • NFS
      • GlusterFS
      • Ceph File and RBD
      • Cinder
      • FibreChannel
      • Secret, ConfigMap, DownwardAPI
      • Flex (exec a binary)
      • ...
  19. PersistentVolumes

    A higher-level storage abstraction
      • insulation from any one cloud environment
    Admin provisions them, users claim them
      • NEW: auto-provisioning (alpha in v1.2)
    Independent lifetime from consumers
      • lives until user is done with it
      • can be handed-off between pods
    Dynamically “scheduled” and managed, like nodes and pods
    [Diagram label: Claim]
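The "admin provisions, users claim" split can be sketched as a small matching problem. This is an illustrative model only: the `Volume` type, the `bind_claim` helper, and the smallest-fit policy are assumptions of the sketch, not a description of the actual Kubernetes binder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    name: str
    capacity_gib: int
    claimed_by: Optional[str] = None  # None means the volume is still free


def bind_claim(volumes: List[Volume], claim: str, request_gib: int) -> Optional[Volume]:
    """Bind a claim to the smallest free volume that is large enough."""
    candidates = [
        v for v in volumes
        if v.claimed_by is None and v.capacity_gib >= request_gib
    ]
    if not candidates:
        return None  # the claim stays pending until a volume appears
    best = min(candidates, key=lambda v: v.capacity_gib)
    best.claimed_by = claim
    return best


# The admin provisions volumes up front; users only state what they need.
pool = [Volume("pv-a", 100), Volume("pv-b", 10)]
first = bind_claim(pool, "db-data", 8)
second = bind_claim(pool, "logs", 8)
third = bind_claim(pool, "cache", 8)
```

The user never names a specific disk, which is what keeps the workload insulated from any one storage backend.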
  20. Labels

    Arbitrary metadata
    Attached to any API object
    Generally represent identity
    Queryable by selectors
      • think SQL ‘select ... where ...’
    The only grouping mechanism
      • pods in a ReplicaSet
      • pods in a Service
      • capabilities of a node (constraints)
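The "select ... where ..." analogy can be made concrete with a minimal sketch of equality-based selection. The pod names and the `matches` and `select` helpers are hypothetical; the label keys `app` and `version` follow the ones used on the rolling-update slides.

```python
def matches(selector, labels):
    """True when every key/value pair in the selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector, objects):
    """Return the names of all objects whose labels satisfy the selector."""
    return [o["name"] for o in objects if matches(selector, o["labels"])]


pods = [
    {"name": "web-1", "labels": {"app": "MyApp", "version": "v1"}},
    {"name": "web-2", "labels": {"app": "MyApp", "version": "v2"}},
    {"name": "db-1", "labels": {"app": "MyDB"}},
]

# A Service-style selector spans both versions of the app.
service_pods = select({"app": "MyApp"}, pods)

# A controller-style selector narrows to a single version.
v1_pods = select({"app": "MyApp", "version": "v1"}, pods)
```

Because grouping is only ever a query over labels, the same pod can belong to a Service and to a controller at the same time without either object knowing about the other.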
  21. Rolling Updates

    ReplicationController
      - replicas: 3
      - selector:
        - app: MyApp
        - version: v1
    Service
      - app: MyApp
  22. Rolling Updates

    ReplicationController
      - replicas: 3
      - selector:
        - app: MyApp
        - version: v1
    ReplicationController
      - replicas: 0
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
  23. Rolling Updates

    ReplicationController
      - replicas: 3
      - selector:
        - app: MyApp
        - version: v1
    ReplicationController
      - replicas: 1
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
  24. Rolling Updates

    ReplicationController
      - replicas: 2
      - selector:
        - app: MyApp
        - version: v1
    ReplicationController
      - replicas: 1
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
  25. Rolling Updates

    ReplicationController
      - replicas: 2
      - selector:
        - app: MyApp
        - version: v1
    ReplicationController
      - replicas: 2
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
  26. Rolling Updates

    ReplicationController
      - replicas: 1
      - selector:
        - app: MyApp
        - version: v1
    ReplicationController
      - replicas: 2
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
  27. Rolling Updates

    ReplicationController
      - replicas: 1
      - selector:
        - app: MyApp
        - version: v1
    ReplicationController
      - replicas: 3
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
  28. Rolling Updates

    ReplicationController
      - replicas: 0
      - selector:
        - app: MyApp
        - version: v1
    ReplicationController
      - replicas: 3
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
  29. Rolling Updates

    ReplicationController
      - replicas: 3
      - selector:
        - app: MyApp
        - version: v2
    Service
      - app: MyApp
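The replica counts on the rolling-update slides follow a simple pattern: add one v2 replica, then remove one v1 replica, and repeat. A minimal simulation of that pattern is shown below; the `rolling_update` generator is a hypothetical helper written for this illustration, not the logic of `kubectl`.

```python
def rolling_update(replicas):
    """Yield (old_replicas, new_replicas) after each step of the update."""
    old, new = replicas, 0
    yield old, new          # new controller created with zero replicas
    while old > 0:
        new += 1            # scale the new controller up by one
        yield old, new
        old -= 1            # then scale the old controller down by one
        yield old, new


steps = list(rolling_update(3))

# The Service selects only on app: MyApp, so it sees pods of both versions.
serving = [old + new for old, new in steps]
```

With three replicas the simulation produces the seven intermediate states shown on the slides, and the Service never has fewer than three pods behind it.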
  30. Deployments

    Updates-as-a-service
      • Rolling update is imperative, client-side
    Deployment manages replica changes for you
      • stable object name
      • updates are configurable, done server-side
      • kubectl edit or kubectl apply
    Aggregates stats
    Can have multiple updates in flight
    Status: BETA in Kubernetes v1.2
  31. Multi-Zone Clusters

    Goal: zone-fault tolerance for applications
    Zero API changes relative to Kubernetes
      • Create services, replication controllers, etc. exactly as usual
    Nodes and PersistentVolumes are labelled with their availability zone
      • Fully automatic for GKE, GCE, AWS
      • Manual for on-premise and other cloud providers (for now)
    Status: GA in Kubernetes v1.2
    [Diagram labels: User, Master, Zone A, Zone B, Zone C]
  32. HorizontalPodAutoscalers

    Goal: Automatically scale pods as needed
      • based on CPU utilization (for now)
      • custom metrics in Alpha
    Efficiency now, capacity when you need it
    Operates within user-defined min/max bounds
    Set it and forget it
    Status: GA in Kubernetes v1.2
    [Diagram label: Stats]
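The slide does not spell out the scaling rule. The sketch below assumes a proportional rule, in which the replica count scales with the ratio of observed to target CPU utilization and is then clamped to the user-defined bounds. The `desired_replicas` function and its parameter names are hypothetical.

```python
def desired_replicas(current, cpu_percent, target_percent,
                     min_replicas, max_replicas):
    """Scale in proportion to observed vs. target CPU, then clamp."""
    if target_percent <= 0:
        raise ValueError("target_percent must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    wanted = -(-(current * cpu_percent) // target_percent)
    # Never leave the user-defined min/max bounds.
    return max(min_replicas, min(max_replicas, wanted))


# 4 pods at 90% CPU with a 60% target: grow to 6.
scale_up = desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10)

# Same load, but the upper bound stops growth at 5.
capped = desired_replicas(4, 90, 60, min_replicas=2, max_replicas=5)

# 4 pods at 10% CPU: one pod would suffice, but the lower bound holds at 2.
scale_down = desired_replicas(4, 10, 60, min_replicas=2, max_replicas=10)
```

The clamp is what makes "set it and forget it" safe: a metrics spike cannot scale the workload past the ceiling the user chose.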
  33. DaemonSets

    Problem: how to run a Pod on every node?
      • or a subset of nodes
    Similar to ReplicationController
      • principle: do one thing, don’t overload
    “Which nodes?” is a selector
    Use familiar tools and patterns
    Status: BETA in Kubernetes v1.2
    [Diagram label: Pod]
  34. Node Drain

    Goal: Evacuate a node for maintenance
      • e.g. kernel upgrades
    CLI: kubectl drain
      • disallow scheduling
      • allow grace period for pods to terminate
      • kill pods
    When done: kubectl uncordon
      • the node rejoins the cluster
  35. Ingress (L7)

    Many apps are HTTP/HTTPS
    Services are L3/L4 (IP + port)
    Ingress maps incoming traffic to backend services
      • by HTTP host headers
      • by HTTP URL paths
    HAProxy, NGINX, AWS and GCE implementations in progress
    Now with SSL!
    Status: BETA in Kubernetes v1.2
    [Diagram labels: Client, URL Map]
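Mapping by host header and URL path amounts to a lookup over a rule table. The sketch below is a simplified model: the `route` function, the hostnames, and the backend names are hypothetical, and the longest-prefix tie-break is an assumption of the sketch, since real matching behavior depends on the Ingress implementation in use.

```python
def route(rules, host, path):
    """Return the backend with the longest matching path prefix for this host."""
    best = None  # (prefix, backend) of the best match seen so far
    for rule_host, prefix, backend in rules:
        if rule_host != host or not path.startswith(prefix):
            continue
        if best is None or len(prefix) > len(best[0]):
            best = (prefix, backend)
    return best[1] if best else None


# Each rule is (host, path prefix, backend service name).
rules = [
    ("foo.example.com", "/", "web"),
    ("foo.example.com", "/api", "api"),
    ("bar.example.com", "/", "bar"),
]

api_backend = route(rules, "foo.example.com", "/api/v1/users")
web_backend = route(rules, "foo.example.com", "/index.html")
unknown = route(rules, "baz.example.com", "/")
```

One external entry point can fan out to several L3/L4 Services this way, which is the gap between Services and HTTP applications that the slide describes.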
  36. ConfigMaps

    Goal: manage app configuration
      • ...without making overly-brittle container images
    12-factor says config comes from the environment
      • Kubernetes is the environment
    Manage config via the Kubernetes API
    Inject config as a virtual volume into your Pods
      • late-binding, live-updated (atomic)
      • also available as env vars
    Status: GA in Kubernetes v1.2
    [Diagram labels: node, API, Pod, Config Map]
  37. Secrets

    Goal: grant a pod access to a secured something
      • don’t put secrets in the container image!
    12-factor says config comes from the environment
      • Kubernetes is the environment
    Manage secrets via the Kubernetes API
    Inject secrets as virtual volumes into your Pods
      • late-binding, tmpfs - never touches disk
      • also available as env vars
    [Diagram labels: node, API, Pod, Secret]
  38. Network Isolation

    Describe the DAG of your app, enforce it in the network
    Restrict Pod-to-Pod traffic or across Namespaces
    Designed by the network SIG
      • implementations for Calico, OpenShift, Romana, OpenContrail (so far)
    Status: Alpha in v1.2, expect beta in v1.3
  39. Community

    Top 0.01% of all GitHub projects
    1200+ external projects based on k8s
    800+ unique contributors
    [Logo grids: Companies Contributing, Companies Using]
  40. Velocity

    [Chart labels: 1.0, 1.1, 1.2]
    v1.2:
      - 5k commits
      - +50% unique contributors
  41. Kubernetes is Open

    https://kubernetes.io
    Code: github.com/kubernetes/kubernetes
    Chat: slack.k8s.io
    Twitter: @kubernetesio
    open community
    open design
    open source
    open to ideas