
Pain Management for Containers

Tim Hockin

April 27, 2016

Transcript

  1. Google Cloud Platform: Pain Management for Containers

    OpenStack Summit Containers Meetup, 4/26/2016
    Tim Hockin <thockin@google.com>, Senior Staff SW Engineer, @thockin
  2. Google has been developing and using containers to manage our applications for over 12 years.

    Images by Connie Zhou
  3. A Brief History

  4-7. ca. 2002:

    App-specific machine pools
    • Inefficient and painful to manage
    Shared machines
    • Chroots, ulimits, and nice
    • Noisy neighbors: a real problem
    • Limited our ability to share
    The fleet gets larger
    • Inefficiency hurts more at scale
    Share harder!
    • Good fences make good neighbors
  8-10. ca. 2006: Google develops cgroups

    • Inescapable resource isolation
    • Enables better sharing
    Isolation is paramount
    • Namespacing is secondary
    • cf. github.com/google/lmctfy
    Needs change, solutions evolve
    • mistakes were made
    • lessons were learned
  11. ca. 2013: Docker!

  12. Kubernetes

    Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
    • Manages container clusters
    • Inspired and informed by Google’s experiences and internal systems
    • Supports multiple cloud and bare-metal environments
    • Supports multiple container runtimes
    • 100% open source, written in Go
    Manage applications, not machines
  13. Pain: Lock-In

  14. Workload Portability

    Goal: Avoid vendor lock-in
    Runs in many environments, including “bare metal” and “your laptop”
    The API and the implementation are 100% open
    The whole system is modular and replaceable
  15. Workload Portability

    Goal: Write once, run anywhere*
    Don’t force apps to know about concepts that are cloud-provider-specific
    Examples of this:
    • Network model
    • Ingress
    • Service load-balancers
    • PersistentVolumes
    * approximately
  16. Workload Portability

    Goal: Avoid coupling
    Don’t force apps to know about concepts that are Kubernetes-specific
    Examples of this:
    • Namespaces
    • Services / DNS
    • Downward API
    • Secrets / ConfigMaps
  17. Workload Portability

    Result: Portability
    Build your apps on-prem, lift-and-shift into cloud when you are ready
    Don’t get stuck with a platform that doesn’t work for you
    Put your app on wheels and move it whenever and wherever you need
  18. Pain: Networking & port mapping

  19. Docker networking

    [Diagram: containers with IPs 172.16.1.1, 172.16.1.2, 172.16.1.1, 172.16.1.1]
  20. Docker networking

    [Diagram: containers with IPs 172.16.1.1, 172.16.1.2, 172.16.1.1, 172.16.1.1; NAT on each connection]
  21. Host ports

    [Diagram: A: 172.16.1.1, B: 172.16.1.2, C: 172.16.1.1; ports 3306, 80, 9376, 11878, 8000; SNAT]
  22. Host ports

    [Diagram: A: 172.16.1.1, B: 172.16.1.2, C: 172.16.1.1; ports 3306, 80, 9376, 11878, 8000; SNAT; marked REJECTED]
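The host-port slides can be read as a bookkeeping problem: when containers only have host-private IPs, each exposed container port has to be mapped to a port on the host, and a host port can belong to only one container at a time. The following is a minimal sketch of that constraint, not Docker's actual implementation; `HostPortTable` and the assignment of ports to containers A, B, and C are assumptions made for illustration.

```python
class HostPortTable:
    """Hypothetical per-host registry of which container owns which host port."""

    def __init__(self):
        self._owner_by_port = {}

    def claim(self, container, host_port):
        """Return True if `container` owns `host_port` after the call,
        False if a different container already holds it."""
        owner = self._owner_by_port.setdefault(host_port, container)
        return owner == container


table = HostPortTable()
print(table.claim("A", 3306))  # first claim for this port on this host
print(table.claim("B", 80))    # a different port, so no conflict
print(table.claim("C", 80))    # same host port as B: the claim is refused
```

Someone has to broker these assignments and tell every client which port it ended up with, which is the overhead the next slide argues against.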
  23. Kubernetes networking

    IPs are cluster-scoped
    • vs Docker default private IP
    Pods can reach each other directly
    • even across nodes
    No brokering of port numbers
    • too complex, why bother?
    This is a fundamental requirement
    • can be L3 routed
    • can be underlayed (cloud)
    • can be overlayed (SDN)
  24. Kubernetes networking

    [Diagram: subnet 10.1.1.0/24 with 10.1.1.1 and 10.1.1.2; subnet 10.1.2.0/24 with 10.1.2.1; subnet 10.1.3.0/24 with 10.1.3.1]
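The addressing in the diagram suggests each node owning a /24 carved out of a larger cluster range, which is what lets pods on different nodes reach each other without NAT or port brokering. A rough sketch using Python's standard `ipaddress` module, assuming a cluster range of 10.1.0.0/16 and hypothetical node names (neither appears on the slide):

```python
import ipaddress


def assign_node_subnets(cluster_cidr, nodes, prefixlen=24):
    """Give each node its own non-overlapping subnet from the cluster range."""
    pool = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefixlen)
    next(pool)  # skip 10.1.0.0/24 so the numbering lines up with the slide
    return {node: next(pool) for node in nodes}


# Assumed cluster range and node names, for illustration only.
subnets = assign_node_subnets("10.1.0.0/16", ["node-a", "node-b", "node-c"])

for node, subnet in subnets.items():
    # subnet[1] is the first usable address in that node's range
    print(node, subnet, "first pod IP:", subnet[1])
```

Because every pod IP is unique within the cluster range, a pod on one node can address a pod on another node directly, whether the routing underneath is plain L3, a cloud underlay, or an overlay.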
  25. Pain: Running together

  26. Coupled containers

    [Diagram: File Puller, Web Server, ?]
  27. Coupled containers

    [Diagram: File Puller, Web Server]
  28. Coupled containers

    [Diagram: File Puller, Web Server; marked REJECTED]

  29. Pods

    Small group of containers & volumes
    Tightly coupled
    The atom of scheduling & placement
    Shared namespace
    • share IP address & localhost
    • share IPC, etc.
    Managed lifecycle
    • bound to a node, restart in place
    • can die, cannot be reborn with same ID
    Example: data puller & web server
    [Diagram: Consumers, Content Manager; Pod containing File Puller, Web Server, Volume]
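The data puller and web server example can be written down as a Pod object with two containers that mount the same volume. The sketch below builds such an object as a plain Python dictionary in the shape of the v1 Pod API; the pod name, image names, and mount paths are hypothetical placeholders, not taken from the talk.

```python
import json

# Hypothetical names and images; only the overall structure follows the v1 Pod API.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "content-server"},
    "spec": {
        # One pod-scoped scratch volume shared by both containers.
        "volumes": [{"name": "content", "emptyDir": {}}],
        "containers": [
            {
                "name": "file-puller",
                "image": "example/file-puller",
                "volumeMounts": [{"name": "content", "mountPath": "/data"}],
            },
            {
                "name": "web-server",
                "image": "example/web-server",
                "volumeMounts": [
                    {"name": "content", "mountPath": "/srv/www", "readOnly": True}
                ],
            },
        ],
    },
}


def shared_volumes(pod_spec):
    """Return the names of volumes mounted by every container in the pod."""
    mounts = [
        {m["name"] for m in c.get("volumeMounts", [])}
        for c in pod_spec["spec"]["containers"]
    ]
    return set.intersection(*mounts) if mounts else set()


print(shared_volumes(pod))
print(json.dumps(pod, indent=2))
```

The two containers are scheduled together, share an IP address, and see the same files, which is the coupling the earlier "Coupled containers" slides were asking for.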
  30. Pain: Storage

  31. Volumes

    Pod-scoped storage
    Support many types of volume plugins
    • Empty dir (and tmpfs)
    • Host path
    • Git repository
    • GCE Persistent Disk
    • AWS Elastic Block Store
    • Azure File Storage
    • iSCSI
    • Flocker
    • NFS
    • GlusterFS
    • Ceph File and RBD
    • Cinder
    • FibreChannel
    • Secret, ConfigMap, DownwardAPI
    • Flex (exec a binary)
    • ...
  32. PersistentVolumes

    A higher-level storage abstraction
    • insulation from any one cloud environment
    Admin provisions them, users claim them
    • NEW: auto-provisioning (alpha in v1.2)
    Independent lifetime from consumers
    • lives until user is done with it
    • can be handed off between pods
    Dynamically “scheduled” and managed, like nodes and pods
    [Diagram: Claim]
  33. Pain: Container soup

  34. Physical view

  35. Physical view

  36. Logical view

  37. Labels

    Arbitrary metadata
    Attached to any API object
    Generally represent identity
    Queryable by selectors
    • think SQL ‘select ... where ...’
    The only grouping mechanism
    • pods in a ReplicaSet
    • pods in a Service
    • capabilities of a node (constraints)
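The "select ... where ..." analogy can be made concrete with a few lines of code. This is a minimal sketch of equality-based selection only, assuming labels and selectors are plain key/value mappings; the pod names and label values are made up for illustration.

```python
def matches(labels, selector):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects, selector):
    """Return the names of all objects whose labels satisfy the selector."""
    return sorted(name for name, labels in objects.items() if matches(labels, selector))


# Hypothetical pods and labels.
pods = {
    "web-1": {"app": "MyApp", "tier": "frontend", "version": "v1"},
    "web-2": {"app": "MyApp", "tier": "frontend", "version": "v2"},
    "db-1": {"app": "MyApp", "tier": "backend", "version": "v1"},
    "other": {"app": "Other", "tier": "frontend"},
}

print(select(pods, {"app": "MyApp"}))
print(select(pods, {"app": "MyApp", "tier": "frontend"}))
print(select(pods, {"app": "MyApp", "version": "v2"}))
```

The same objects fall into different groups depending on which selector is applied, so one pod can belong to a Service and to a controller at the same time without either knowing about the other.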
  38. Logical view

  39. Logical view

  40. Logical view

  41. Logical view

  42. Pain: Updates

  43. Rolling Updates

    ReplicationController - replicas: 3 - selector: - app: MyApp - version: v1
    Service - app: MyApp
  44. Rolling Updates

    ReplicationController - replicas: 3 - selector: - app: MyApp - version: v1
    ReplicationController - replicas: 0 - selector: - app: MyApp - version: v2
    Service - app: MyApp
  45. Rolling Updates

    ReplicationController - replicas: 3 - selector: - app: MyApp - version: v1
    ReplicationController - replicas: 1 - selector: - app: MyApp - version: v2
    Service - app: MyApp
  46. Rolling Updates

    ReplicationController - replicas: 2 - selector: - app: MyApp - version: v1
    ReplicationController - replicas: 1 - selector: - app: MyApp - version: v2
    Service - app: MyApp
  47. Rolling Updates

    ReplicationController - replicas: 2 - selector: - app: MyApp - version: v1
    ReplicationController - replicas: 2 - selector: - app: MyApp - version: v2
    Service - app: MyApp
  48. Rolling Updates

    ReplicationController - replicas: 1 - selector: - app: MyApp - version: v1
    ReplicationController - replicas: 2 - selector: - app: MyApp - version: v2
    Service - app: MyApp
  49. Rolling Updates

    ReplicationController - replicas: 1 - selector: - app: MyApp - version: v1
    ReplicationController - replicas: 3 - selector: - app: MyApp - version: v2
    Service - app: MyApp
  50. Rolling Updates

    ReplicationController - replicas: 0 - selector: - app: MyApp - version: v1
    ReplicationController - replicas: 3 - selector: - app: MyApp - version: v2
    Service - app: MyApp
  51. Rolling Updates

    ReplicationController - replicas: 3 - selector: - app: MyApp - version: v2
    Service - app: MyApp
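The replica counts in the rolling-update slides follow a simple pattern: scale the new controller up by one, then the old controller down by one, and repeat until the old one reaches zero. The loop below reproduces that sequence. It is a simplified sketch assuming one-pod steps and no health checks, and `rolling_update_steps` is a hypothetical helper, not part of kubectl.

```python
def rolling_update_steps(replicas):
    """Return the (old, new) replica counts after each step of the update."""
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        if new < replicas:
            new += 1  # bring up one pod of the new version first
            steps.append((old, new))
        if old > 0:
            old -= 1  # then retire one pod of the old version
            steps.append((old, new))
    return steps


for old, new in rolling_update_steps(3):
    print(f"v1 replicas: {old}  v2 replicas: {new}")
```

Throughout the sequence the Service selects only on `app: MyApp`, so it keeps sending traffic to whichever mix of v1 and v2 pods exists at that moment.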
  52. Deployments

    Updates-as-a-service
    • Rolling update is imperative, client-side
    Deployment manages replica changes for you
    • stable object name
    • updates are configurable, done server-side
    • kubectl edit or kubectl apply
    Aggregates stats
    Can have multiple updates in flight
    Status: BETA in Kubernetes v1.2
    ...
  53. Pain: High availability

  54. Multi-Zone Clusters

    Goal: zone-fault tolerance for applications
    Zero API changes relative to Kubernetes
    • Create services, replication controllers, etc. exactly as usual
    Nodes and PersistentVolumes are labelled with their availability zone
    • Fully automatic for GKE, GCE, AWS
    • Manual for on-premise and other cloud providers (for now)
    Status: GA in Kubernetes v1.2
    [Diagram: User, Master, Zone A, Zone B, Zone C]
  55. Pain: Handling load

  56. HorizontalPodAutoScalers

    Goal: Automatically scale pods as needed
    • based on CPU utilization (for now)
    • custom metrics in Alpha
    Efficiency now, capacity when you need it
    Operates within user-defined min/max bounds
    Set it and forget it
    Status: GA in Kubernetes v1.2
    [Diagram: ... Stats]
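A proportional rule of the kind this slide implies scales the replica count by the ratio of observed to target CPU utilization and then clamps the result to the user-defined bounds. The sketch below is a simplification under that assumption (a real controller would also need tolerances and delays between scaling decisions); the function name and parameters are illustrative, not the Kubernetes API.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas):
    """Scale replicas in proportion to CPU load, within [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


# Load above target: scale up.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
# Load well below target: scale down, but not below the minimum.
print(desired_replicas(4, 30, 60, min_replicas=3, max_replicas=10))
# Very high load: scale up, but not above the maximum.
print(desired_replicas(10, 100, 50, min_replicas=1, max_replicas=15))
```

The min/max clamp is what makes "set it and forget it" safe: a metrics glitch can push the count to a bound but never past it.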
  57. Pain: Run once on each node

  58. DaemonSets

    Problem: how to run a Pod on every node?
    • or a subset of nodes
    Similar to ReplicationController
    • principle: do one thing, don’t overload
    “Which nodes?” is a selector
    Use familiar tools and patterns
    Status: BETA in Kubernetes v1.2
    [Diagram: Pod]
  59. Pain: Node maintenance

  60. Node Drain

    Goal: Evacuate a node for maintenance
    • e.g. kernel upgrades
    CLI: kubectl drain
    • disallow scheduling
    • allow grace period for pods to terminate
    • kill pods
    When done: kubectl uncordon
    • the node rejoins the cluster
  61. Pain: HTTP load balancing

  62. Ingress (L7)

    Many apps are HTTP/HTTPS
    Services are L3/L4 (IP + port)
    Ingress maps incoming traffic to backend services
    • by HTTP host headers
    • by HTTP URL paths
    HAProxy, NGINX, AWS and GCE implementations in progress
    Now with SSL!
    Status: BETA in Kubernetes v1.2
    [Diagram: Client, URL Map]
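Mapping by host header and URL path amounts to a lookup in a small rule table. The sketch below assumes longest-prefix matching on the path, which is one plausible policy rather than the behavior of any particular Ingress implementation; the hostnames and backend service names are hypothetical.

```python
def route(rules, host, path):
    """Pick a backend for a request.

    `rules` is a list of (host, path_prefix, backend) tuples. Among rules whose
    host matches exactly and whose prefix matches the path, the longest prefix wins.
    Returns None when no rule applies.
    """
    candidates = [
        (prefix, backend)
        for rule_host, prefix, backend in rules
        if rule_host == host and path.startswith(prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: len(item[0]))[1]


# Hypothetical URL map.
rules = [
    ("foo.example.com", "/", "web-service"),
    ("foo.example.com", "/api", "api-service"),
    ("bar.example.com", "/", "bar-service"),
]

print(route(rules, "foo.example.com", "/index.html"))
print(route(rules, "foo.example.com", "/api/v1/users"))
print(route(rules, "unknown.example.com", "/"))
```

The backends named in the rules are ordinary Services, so the L7 layer only decides which Service gets the request and the L3/L4 machinery takes over from there.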
  63. Pain: Configuration

  64. ConfigMaps

    Goal: manage app configuration
    • ...without making overly-brittle container images
    12-factor says config comes from the environment
    • Kubernetes is the environment
    Manage config via the Kubernetes API
    Inject config as a virtual volume into your Pods
    • late-binding, live-updated (atomic)
    • also available as env vars
    Status: GA in Kubernetes v1.2
    [Diagram: node, API, Pod, Config Map]
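From the application's side, a ConfigMap mounted as a volume appears as a directory with one file per key, and the same value may alternatively arrive as an environment variable. A minimal sketch of an app reading a setting that way, assuming a hypothetical mount directory, key names, and environment variable names:

```python
import os
import tempfile


def read_setting(key, mount_dir, env_var, default=None):
    """Read a setting from a mounted config directory, falling back to an
    environment variable and then to a default.

    The file is re-read on every call, so an updated value in the volume is
    picked up without restarting the process.
    """
    path = os.path.join(mount_dir, key)
    if os.path.isfile(path):
        with open(path) as f:
            return f.read().strip()
    return os.environ.get(env_var, default)


# Simulate a mounted volume with a temporary directory (illustration only).
with tempfile.TemporaryDirectory() as mount_dir:
    with open(os.path.join(mount_dir, "log_level"), "w") as f:
        f.write("debug\n")

    print(read_setting("log_level", mount_dir, "EXAMPLE_LOG_LEVEL", "info"))
    print(read_setting("timeout", mount_dir, "EXAMPLE_TIMEOUT", "30"))
```

Nothing in `read_setting` refers to Kubernetes, which keeps the container image free of baked-in configuration and matches the earlier goal of not coupling apps to platform-specific concepts.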
  65. Secrets

    Goal: grant a pod access to a secured something
    • don’t put secrets in the container image!
    12-factor says config comes from the environment
    • Kubernetes is the environment
    Manage secrets via the Kubernetes API
    Inject secrets as virtual volumes into your Pods
    • late-binding, tmpfs - never touches disk
    • also available as env vars
    [Diagram: node, API, Pod, Secret]
  66. Pain: Security

  67. Network Isolation

    Describe the DAG of your app, enforce it in the network
    Restrict Pod-to-Pod traffic or across Namespaces
    Designed by the network SIG
    • implementations for Calico, OpenShift, Romana, OpenContrail (so far)
    Status: Alpha in v1.2, expect beta in v1.3
  68. Enough with the pain!

  69. Community

    Top 0.01% of all GitHub projects
    1200+ external projects based on k8s
    Companies Contributing
    Companies Using
    800+ unique contributors
  70. Velocity

    [Chart: releases 1.0, 1.1, 1.2]
    v1.2:
    - 5k commits
    - +50% unique contributors
  71. Kubernetes is Open

    https://kubernetes.io
    Code: github.com/kubernetes/kubernetes
    Chat: slack.k8s.io
    Twitter: @kubernetesio
    open community
    open design
    open source
    open to ideas