Slide 1

Slide 1 text

Google Cloud Platform
Pain Management for Containers
OpenStack Summit Containers Meetup, 4/26/2016
Tim Hockin, Senior Staff SW Engineer, @thockin

Slide 2

Slide 2 text

Google has been developing and using containers to manage our applications for over 12 years.
Images by Connie Zhou

Slide 3

Slide 3 text

A Brief History

Slide 4

Slide 4 text

ca. 2002: App-specific machine pools
• Inefficient and painful to manage

Slide 5

Slide 5 text

ca. 2002: App-specific machine pools
• Inefficient and painful to manage
Shared machines
• Chroots, ulimits, and nice
• Noisy neighbors: a real problem
• Limited our ability to share

Slide 6

Slide 6 text

ca. 2002: App-specific machine pools
• Inefficient and painful to manage
Shared machines
• Chroots, ulimits, and nice
• Noisy neighbors: a real problem
• Limited our ability to share
The fleet gets larger
• Inefficiency hurts more at scale

Slide 7

Slide 7 text

ca. 2002: App-specific machine pools
• Inefficient and painful to manage
Shared machines
• Chroots, ulimits, and nice
• Noisy neighbors: a real problem
• Limited our ability to share
The fleet gets larger
• Inefficiency hurts more at scale
Share harder!
• Good fences make good neighbors

Slide 8

Slide 8 text

ca. 2006: Google develops cgroups
• Inescapable resource isolation
• Enables better sharing

Slide 9

Slide 9 text

ca. 2006: Google develops cgroups
• Inescapable resource isolation
• Enables better sharing
Isolation is paramount
• Namespacing is secondary
• cf. github.com/google/lmctfy

Slide 10

Slide 10 text

ca. 2006: Google develops cgroups
• Inescapable resource isolation
• Enables better sharing
Isolation is paramount
• Namespacing is secondary
• cf. github.com/google/lmctfy
Needs change, solutions evolve
• Mistakes were made
• Lessons were learned

Slide 11

Slide 11 text

ca. 2013: Docker!

Slide 12

Slide 12 text

Kubernetes
Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
• Manages container clusters
• Inspired and informed by Google’s experiences and internal systems
• Supports multiple cloud and bare-metal environments
• Supports multiple container runtimes
• 100% open source, written in Go
Manage applications, not machines

Slide 13

Slide 13 text

Pain: Lock-In

Slide 14

Slide 14 text

Workload Portability
Goal: Avoid vendor lock-in
Runs in many environments, including “bare metal” and “your laptop”
The API and the implementation are 100% open
The whole system is modular and replaceable

Slide 15

Slide 15 text

Workload Portability
Goal: Write once, run anywhere*
Don’t force apps to know about concepts that are cloud-provider-specific
Examples of this:
● Network model
● Ingress
● Service load-balancers
● PersistentVolumes
* approximately

Slide 16

Slide 16 text

Workload Portability
Goal: Avoid coupling
Don’t force apps to know about concepts that are Kubernetes-specific
Examples of this:
● Namespaces
● Services / DNS
● Downward API
● Secrets / ConfigMaps

Slide 17

Slide 17 text

Workload Portability
Result: Portability
Build your apps on-prem, lift-and-shift into cloud when you are ready
Don’t get stuck with a platform that doesn’t work for you
Put your app on wheels and move it whenever and wherever you need

Slide 18

Slide 18 text

Pain: Networking & port mapping

Slide 19

Slide 19 text

Docker networking
Diagram labels: container IPs 172.16.1.1, 172.16.1.2, 172.16.1.1, 172.16.1.1

Slide 20

Slide 20 text

Docker networking
Diagram labels: container IPs 172.16.1.1, 172.16.1.2, 172.16.1.1, 172.16.1.1; NAT (×5)

Slide 21

Slide 21 text

Host ports
Diagram labels: A: 172.16.1.1; B: 172.16.1.2; C: 172.16.1.1; ports 3306, 80, 9376, 11878, 8000; SNAT (×2)

Slide 22

Slide 22 text

Host ports
Diagram labels: A: 172.16.1.1; B: 172.16.1.2; C: 172.16.1.1; ports 3306, 80, 9376, 11878, 8000; SNAT (×2)
REJECTED
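The pain with host ports can be sketched in a few lines: every container on a machine competes for the same host port space, so something has to broker the claims and turn late arrivals away. This is a minimal illustration, not Docker's implementation; `HostPortTable`, `claim`, and the choice of port 80 as the contested port are all assumptions made for the example.

```python
class HostPortTable:
    """Hypothetical per-node table of claimed host ports (illustration only)."""

    def __init__(self):
        self._owners = {}  # host port -> name of the container that holds it

    def claim(self, container, port):
        """Claim a host port; return False if another container already holds it."""
        if port in self._owners:
            return False
        self._owners[port] = container
        return True


node = HostPortTable()
results = [
    node.claim("A", 3306),  # free, accepted
    node.claim("B", 80),    # free, accepted
    node.claim("C", 80),    # assumed conflict: same host port as B, rejected
]
print(results)
```

With one IP per pod there is no shared port space to broker, which is why the Kubernetes model drops this bookkeeping.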

Slide 23

Slide 23 text

Kubernetes networking
IPs are cluster-scoped
• vs Docker default private IP
Pods can reach each other directly
• even across nodes
No brokering of port numbers
• too complex, why bother?
This is a fundamental requirement
• can be L3 routed
• can be underlayed (cloud)
• can be overlayed (SDN)

Slide 24

Slide 24 text

Kubernetes networking
Diagram labels: 10.1.1.0/24 (10.1.1.1, 10.1.1.2); 10.1.2.0/24 (10.1.2.1); 10.1.3.0/24 (10.1.3.1)
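One way to read the diagram: each node owns a distinct subnet, so a pod IP is unique across the cluster and identifies the node that hosts it. The sketch below assumes that reading. The node names and the helper functions are hypothetical; only the subnets and pod IPs come from the slide.

```python
import ipaddress

# Assumed mapping: one subnet per node (node names are made up for the example).
NODE_SUBNETS = {
    "node-1": ipaddress.ip_network("10.1.1.0/24"),
    "node-2": ipaddress.ip_network("10.1.2.0/24"),
    "node-3": ipaddress.ip_network("10.1.3.0/24"),
}


def node_for_pod(pod_ip):
    """Return the node whose subnet contains pod_ip, or None if no subnet does."""
    addr = ipaddress.ip_address(pod_ip)
    for node, subnet in NODE_SUBNETS.items():
        if addr in subnet:
            return node
    return None


def subnets_are_disjoint(subnets):
    """True if no two subnets overlap, i.e. pod IPs cannot collide across nodes."""
    nets = list(subnets)
    return not any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])


print(node_for_pod("10.1.2.1"))
print(subnets_are_disjoint(NODE_SUBNETS.values()))
```

Compare this with the Docker default, where 172.16.1.1 can appear on several machines at once and therefore has to be hidden behind NAT.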

Slide 25

Slide 25 text

Pain: Running together

Slide 26

Slide 26 text

Coupled containers
Diagram labels: File Puller, Web Server, ?

Slide 27

Slide 27 text

Coupled containers
Diagram labels: File Puller, Web Server

Slide 28

Slide 28 text

Coupled containers
Diagram labels: File Puller, Web Server
REJECTED

Slide 29

Slide 29 text

Pods
Small group of containers & volumes
Tightly coupled
The atom of scheduling & placement
Shared namespace
• share IP address & localhost
• share IPC, etc.
Managed lifecycle
• bound to a node, restart in place
• can die, cannot be reborn with same ID
Example: data puller & web server
Diagram labels: Consumers, Content Manager, File Puller, Web Server, Volume, Pod

Slide 30

Slide 30 text

Pain: Storage

Slide 31

Slide 31 text

Volumes
Pod-scoped storage
Support many types of volume plugins
• Empty dir (and tmpfs)
• Host path
• Git repository
• GCE Persistent Disk
• AWS Elastic Block Store
• Azure File Storage
• iSCSI
• Flocker
• NFS
• GlusterFS
• Ceph File and RBD
• Cinder
• FibreChannel
• Secret, ConfigMap, DownwardAPI
• Flex (exec a binary)
• ...

Slide 32

Slide 32 text

PersistentVolumes
A higher-level storage abstraction
• insulation from any one cloud environment
Admin provisions them, users claim them
• NEW: auto-provisioning (alpha in v1.2)
Independent lifetime from consumers
• lives until user is done with it
• can be handed off between pods
Dynamically “scheduled” and managed, like nodes and pods
Diagram label: Claim

Slide 33

Slide 33 text

Pain: Container soup

Slide 34

Slide 34 text

Physical view

Slide 35

Slide 35 text

Physical view

Slide 36

Slide 36 text

Logical view

Slide 37

Slide 37 text

Labels
Arbitrary metadata
Attached to any API object
Generally represent identity
Queryable by selectors
• think SQL ‘select ... where ...’
The only grouping mechanism
• pods in a ReplicaSet
• pods in a Service
• capabilities of a node (constraints)
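The "select ... where ..." analogy can be made concrete with a small sketch of equality-based selection: an object matches when every key/value pair in the selector appears in its labels. This is an illustration of the idea only, not the Kubernetes API; the `matches` and `select` helpers and the pod names are hypothetical, and set-based selector operators are left out.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects, selector):
    """Return the names of the objects whose labels satisfy the selector."""
    return [obj["name"] for obj in objects if matches(selector, obj["labels"])]


# Hypothetical pods; the label keys mirror the app/version labels used in the deck.
pods = [
    {"name": "pod-a", "labels": {"app": "MyApp", "version": "v1"}},
    {"name": "pod-b", "labels": {"app": "MyApp", "version": "v2"}},
    {"name": "pod-c", "labels": {"app": "Other", "version": "v1"}},
]

print(select(pods, {"app": "MyApp"}))                   # what a Service would pick
print(select(pods, {"app": "MyApp", "version": "v2"}))  # what one controller would pick
```

A broad selector and a narrow one can cover the same pods, so a Service and a controller can group them independently.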

Slide 38

Slide 38 text

Logical view

Slide 39

Slide 39 text

Logical view

Slide 40

Slide 40 text

Logical view

Slide 41

Slide 41 text

Logical view

Slide 42

Slide 42 text

Pain: Updates

Slide 43

Slide 43 text

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
Service
- app: MyApp

Slide 44

Slide 44 text

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 0
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 45

Slide 45 text

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 46

Slide 46 text

Rolling Updates
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 47

Slide 47 text

Rolling Updates
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 48

Slide 48 text

Rolling Updates
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 49

Slide 49 text

Rolling Updates
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 50

Slide 50 text

Rolling Updates
ReplicationController
- replicas: 0
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 51

Slide 51 text

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp
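The sequence on the preceding slides (scale the v2 controller up by one, then the v1 controller down by one, repeat) can be reproduced with a short simulation. This is a sketch of the stepping pattern only; `rolling_update` is a hypothetical helper, and a real client also waits for new pods to become ready between steps.

```python
def rolling_update(old_replicas, target_replicas):
    """Return the (v1_replicas, v2_replicas) pairs seen during a one-at-a-time rollout."""
    old, new = old_replicas, 0
    steps = [(old, new)]  # starting point: the new controller exists with 0 replicas
    while new < target_replicas or old > 0:
        if new < target_replicas:
            new += 1  # scale the new controller up first, so capacity never dips
            steps.append((old, new))
        if old > 0:
            old -= 1  # then retire one replica of the old version
            steps.append((old, new))
    return steps


for v1, v2 in rolling_update(3, 3):
    print(f"v1 replicas: {v1}, v2 replicas: {v2}")
```

Because the Service selects only on `app: MyApp`, it keeps routing to whichever mix of v1 and v2 pods exists at each step.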

Slide 52

Slide 52 text

Deployments
Updates-as-a-service
• Rolling update is imperative, client-side
Deployment manages replica changes for you
• stable object name
• updates are configurable, done server-side
• kubectl edit or kubectl apply
Aggregates stats
Can have multiple updates in flight
Status: BETA in Kubernetes v1.2

Slide 53

Slide 53 text

Pain: High availability

Slide 54

Slide 54 text

Multi-Zone Clusters
Goal: zone-fault tolerance for applications
Zero API changes relative to Kubernetes
● Create services, replication controllers, etc. exactly as usual
Nodes and PersistentVolumes are labelled with their availability zone
● Fully automatic for GKE, GCE, AWS
● Manual for on-premise and other cloud providers (for now)
Status: GA in Kubernetes v1.2
Diagram labels: User, Master, Zone A, Zone B, Zone C

Slide 55

Slide 55 text

Pain: Handling load

Slide 56

Slide 56 text

HorizontalPodAutoScalers
Goal: Automatically scale pods as needed
• based on CPU utilization (for now)
• custom metrics in Alpha
Efficiency now, capacity when you need it
Operates within user-defined min/max bounds
Set it and forget it
Status: GA in Kubernetes v1.2
Diagram label: Stats
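A rough sketch of the scaling decision: scale the replica count in proportion to how far observed CPU utilization is from the target, then clamp to the user-defined bounds. The proportional rule follows the one described in the Kubernetes autoscaling documentation, but this simplified version omits tolerances and stabilization. `desired_replicas` and its parameter names are hypothetical.

```python
import math


def desired_replicas(current, current_cpu_pct, target_cpu_pct, min_replicas, max_replicas):
    """Proportional scaling on CPU utilization, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


# Illustrative numbers only (not from the slides).
print(desired_replicas(3, 90, 50, 1, 10))   # overloaded: scale up
print(desired_replicas(3, 10, 50, 2, 10))   # idle: scale down, but not below the minimum
print(desired_replicas(4, 200, 50, 1, 10))  # spike: capped at the maximum
```

The clamp is what makes "set it and forget it" safe: the autoscaler can react to load but never leaves the range the user chose.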

Slide 57

Slide 57 text

Pain: Run once on each node

Slide 58

Slide 58 text

DaemonSets
Problem: how to run a Pod on every node?
• or a subset of nodes
Similar to ReplicationController
• principle: do one thing, don’t overload
“Which nodes?” is a selector
Use familiar tools and patterns
Status: BETA in Kubernetes v1.2
Diagram label: Pod

Slide 59

Slide 59 text

Pain: Node maintenance

Slide 60

Slide 60 text

Node Drain
Goal: Evacuate a node for maintenance
• e.g. kernel upgrades
CLI: kubectl drain
• disallow scheduling
• allow grace period for pods to terminate
• kill pods
When done: kubectl uncordon
• the node rejoins the cluster

Slide 61

Slide 61 text

Pain: HTTP load balancing

Slide 62

Slide 62 text

Ingress (L7)
Many apps are HTTP/HTTPS
Services are L3/L4 (IP + port)
Ingress maps incoming traffic to backend services
• by HTTP host headers
• by HTTP URL paths
HAProxy, NGINX, AWS and GCE implementations in progress
Now with SSL!
Status: BETA in Kubernetes v1.2
Diagram labels: Client, URL Map
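The host-and-path mapping can be sketched as a small routing table: pick the rules whose host matches the request's Host header, then choose the longest matching path prefix. This is an illustration of the idea, not the behavior of any particular Ingress controller. The hostnames, service names, and `route` function are all hypothetical, and the plain string-prefix match is a simplification.

```python
# Hypothetical URL map: (host, path prefix, backend service name).
RULES = [
    ("foo.example.com", "/api", "api-svc"),
    ("foo.example.com", "/", "web-svc"),
    ("bar.example.com", "/", "bar-svc"),
]


def route(host, path):
    """Return the backend for a request, preferring the longest matching path prefix."""
    best = None  # (prefix, backend) of the best rule seen so far
    for rule_host, prefix, backend in RULES:
        if host == rule_host and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, backend)
    return best[1] if best else None


print(route("foo.example.com", "/api/v1/users"))
print(route("foo.example.com", "/index.html"))
print(route("baz.example.com", "/"))
```

Each backend in the table is an ordinary L3/L4 Service, so the L7 decision sits in front of the existing Service machinery without replacing it.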

Slide 63

Slide 63 text

Pain: Configuration

Slide 64

Slide 64 text

ConfigMaps
Goal: manage app configuration
• ...without making overly-brittle container images
12-factor says config comes from the environment
• Kubernetes is the environment
Manage config via the Kubernetes API
Inject config as a virtual volume into your Pods
• late-binding, live-updated (atomic)
• also available as env vars
Status: GA in Kubernetes v1.2
Diagram labels: node, API, Pod, Config Map

Slide 65

Slide 65 text

Secrets
Goal: grant a pod access to a secured something
• don’t put secrets in the container image!
12-factor says config comes from the environment
• Kubernetes is the environment
Manage secrets via the Kubernetes API
Inject secrets as virtual volumes into your Pods
• late-binding, tmpfs - never touches disk
• also available as env vars
Diagram labels: node, API, Pod, Secret

Slide 66

Slide 66 text

Pain: Security

Slide 67

Slide 67 text

Network Isolation
Describe the DAG of your app, enforce it in the network
Restrict Pod-to-Pod traffic or across Namespaces
Designed by the network SIG
• implementations for Calico, OpenShift, Romana, OpenContrail (so far)
Status: Alpha in v1.2, expect beta in v1.3

Slide 68

Slide 68 text

Enough with the pain!

Slide 69

Slide 69 text

Community
Top 0.01% of all GitHub projects
1200+ external projects based on k8s
800+ unique contributors
Companies Contributing
Companies Using

Slide 70

Slide 70 text

Velocity
Chart labels: 1.0, 1.1, 1.2
v1.2:
- 5k commits
- +50% unique contributors

Slide 71

Slide 71 text

Kubernetes is Open
open community, open design, open source, open to ideas
https://kubernetes.io
Code: github.com/kubernetes/kubernetes
Chat: slack.k8s.io
Twitter: @kubernetesio