Slide 1

What’s New in Kuberland?
NY Kubernetes Meetup, Nov. 5, 2015
Tim Hockin, Senior Staff Software Engineer
@thockin (GitHub, Slack, IRC, Twitter)

Slide 2

Everything at Google runs in containers:
• Gmail, Web Search, Maps, ...
• MapReduce, batch, ...
• GFS, Colossus, ...
• Even Google’s Cloud Platform: VMs run in containers!

Slide 3

Everything at Google runs in containers:
• Gmail, Web Search, Maps, ...
• MapReduce, batch, ...
• GFS, Colossus, ...
• Even Google’s Cloud Platform: VMs run in containers!
We launch over 2 billion containers per week

Slide 4

Kubernetes
Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
• Runs and manages containers
• Inspired and informed by Google’s experiences and internal systems
• Supports multiple cloud and bare-metal environments
• Supports multiple container runtimes
• 100% open source, written in Go
Manage applications, not machines

Slide 5

Container clusters: A story in two parts

Slide 6

Container clusters: A story in two parts
1. Setting up the cluster
• Choose a cloud: GCE, AWS, Azure, Rackspace, on-premises, ...
• Choose a node OS: CoreOS, Atomic, RHEL, Debian, CentOS, Ubuntu, ...
• Provision machines: boot VMs, install and run kube components, ...
• Configure networking: IP ranges for Pods, Services, SDN, ...
• Start cluster services: DNS, logging, monitoring, ...
• Manage nodes: kernel upgrades, OS updates, hardware failures, ...
Not the easy or fun part, but unavoidable
This is where things like Google Container Engine (GKE) really help

Slide 7

Container clusters: A story in two parts
2. Using the cluster
• Run Pods & Containers
• Replication controllers
• Services
• Volumes
This is the fun part!
A distinct set of problems from cluster setup and management
Don’t make developers deal with cluster administration!
Accelerate development by focusing on the applications, not the cluster

Slide 8

Pods

Slide 9

Pods
Small group of containers & volumes
Tightly coupled
The atom of scheduling & placement
Shared namespace
• share IP address & localhost
• share IPC, etc.
Managed lifecycle
• bound to a node, restart in place
• can die, cannot be reborn with same ID
Example: data puller & web server
(diagram: a File Puller and a Web Server sharing a Volume inside one Pod; a Content Manager and Consumers sit outside the Pod)
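
A minimal sketch of that pattern as a Pod manifest: two tightly coupled containers sharing one volume and one network namespace. The image names and mount paths are placeholders, not from the talk.

# pod.yaml - two coupled containers sharing a volume (illustrative only)
apiVersion: v1
kind: Pod
metadata:
  name: web-with-puller
  labels:
    App: MyApp
spec:
  volumes:
  - name: shared-content
    emptyDir: {}            # pod-scoped scratch space, shares the pod's lifetime
  containers:
  - name: file-puller
    image: example.com/file-puller:latest   # hypothetical image
    volumeMounts:
    - name: shared-content
      mountPath: /data
  - name: web-server
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: shared-content
      mountPath: /usr/share/nginx/html
      readOnly: true

Both containers also share the pod IP, so they can talk over localhost.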

Slide 10

Volumes
Very similar to Docker’s concept
Pod scoped storage
Share the pod’s lifetime & fate
Support many types of volume plugins:
• Empty dir (and tmpfs)
• Host path
• Git repository
• GCE Persistent Disk
• AWS Elastic Block Store
• iSCSI
• NFS
• GlusterFS
• Ceph File and RBD
• Cinder
• Secret
• ...
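
Any of those plugins slots into the same volumes stanza. A sketch using a GCE Persistent Disk; the disk name is an assumption and would have to exist already.

# pd-pod.yaml - the same pattern, backed by a GCE Persistent Disk (illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: pd-example
spec:
  volumes:
  - name: data
    gcePersistentDisk:
      pdName: my-data-disk   # assumed pre-provisioned disk
      fsType: ext4
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /var/lib/data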

Slide 11

Labels & Selectors

Slide 12

Labels
(diagram: four pods, all labeled App: MyApp, crossing Phase: prod / test with Role: FE / BE)
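
A sketch of what those labels look like on an object, and how a selector slices across them; the values simply mirror the diagram.

# Labels are arbitrary key/value metadata on the object (illustrative values).
apiVersion: v1
kind: Pod
metadata:
  name: myapp-fe-prod-1
  labels:
    App: MyApp
    Phase: prod
    Role: FE
spec:
  containers:
  - name: app
    image: nginx
# A selector such as {App: MyApp, Phase: prod} matches this pod but not the
# test-phase pods, e.g. `kubectl get pods -l App=MyApp,Phase=prod`.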

Slide 13

ReplicationControllers

Slide 14

ReplicationControllers
A simple control loop
Runs out-of-process wrt API server
Has 1 job: ensure N copies of a pod
• if too few, start some
• if too many, kill some
• grouped by a selector
Cleanly layered on top of the core
• all access is by public APIs
Replicated pods are fungible
• No implied order or identity
(diagram: a ReplicationController with name = “my-rc”, selector = {“App”: “MyApp”}, podTemplate = { ... }, replicas = 4 polls the API Server: “How many?” 3. Start 1 more. OK. “How many?” 4.)
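
A hedged sketch of the controller from the diagram as a manifest; the pod template is reduced to a single placeholder container.

# rc.yaml - keep 4 fungible copies of a pod running (illustrative)
apiVersion: v1
kind: ReplicationController
metadata:
  name: my-rc
spec:
  replicas: 4
  selector:
    App: MyApp            # pods carrying this label set are the ones counted
  template:               # the podTemplate: what to start when copies are missing
    metadata:
      labels:
        App: MyApp
    spec:
      containers:
      - name: app
        image: nginx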

Slide 15

Services

Slide 16

Services
A group of pods that work together
• grouped by a selector
Defines access policy
• “load balanced” or “headless”
Gets a stable virtual IP and port
• sometimes called the service portal
• also a DNS name
VIP is managed by kube-proxy
• watches all services
• updates iptables when backends change
Hides complexity - ideal for non-native apps
(diagram: a Client reaching the backend pods through the Virtual IP)
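
A minimal Service sketch putting a stable VIP and DNS name in front of the MyApp pods; names and ports are illustrative.

# service.yaml - stable virtual IP + DNS name for a set of pods (sketch)
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    App: MyApp            # backends are whatever pods carry this label
  ports:
  - port: 80              # the service (VIP) port
    targetPort: 80        # the port the pods actually listen on
# With the cluster DNS add-on, this resolves as my-app.<namespace>.svc.cluster.local
# (using the default cluster domain).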

Slide 17

External Services
Service IPs are only available inside the cluster
Need to receive traffic from “the outside world”
Built in: the Service “type”
• NodePort: expose on a port on every node
• LoadBalancer: provision a cloud load-balancer
DIY load-balancer solutions
• socat (for NodePort remapping)
• haproxy
• nginx
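
A sketch of the built-in route: the same Service, but typed so every node forwards a port to it. The chosen node port is an illustrative value.

# external-service.yaml - expose the service outside the cluster (sketch)
apiVersion: v1
kind: Service
metadata:
  name: my-app-external
spec:
  type: NodePort          # or LoadBalancer, on clouds that can provision one
  selector:
    App: MyApp
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080       # optional; omit to let the cluster pick one in range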

Slide 18

Ingress (L7)
Services are assumed L3/L4
Lots of apps want HTTP/HTTPS
Ingress maps incoming traffic to backend services
• by HTTP host headers
• by HTTP URL paths
HAProxy and GCE implementations
No SSL yet
Status: BETA in Kubernetes v1.1
(diagram: a Client hits the Ingress, which uses a URL Map to pick a backend service)
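
A sketch of host- and path-based routing with the beta Ingress API; the hostname and second service are placeholders.

# ingress.yaml - route HTTP by host header and URL path to services (sketch)
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app-ingress
spec:
  rules:
  - host: foo.example.com
    http:
      paths:
      - path: /web
        backend:
          serviceName: my-app
          servicePort: 80
      - path: /api
        backend:
          serviceName: my-app-api   # hypothetical second service
          servicePort: 80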

Slide 19

Rolling updates

Slide 20

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
Service
- app: MyApp

Slide 21

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 0
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 22

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 23

Rolling Updates
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 24

Rolling Updates
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 25

Rolling Updates
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 2
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 26

Rolling Updates
ReplicationController
- replicas: 1
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 27

Rolling Updates
ReplicationController
- replicas: 0
- selector:
  - app: MyApp
  - version: v1
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp

Slide 28

Rolling Updates
ReplicationController
- replicas: 3
- selector:
  - app: MyApp
  - version: v2
Service
- app: MyApp
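
In the tooling of this era, that ramp of replica counts is what kubectl's rolling-update drives for you. A hedged sketch of the replacement controller it scales up; the names and image tag are placeholders, not from the talk.

# my-app-v2-rc.yaml - the replacement controller for the rolling update (sketch)
apiVersion: v1
kind: ReplicationController
metadata:
  name: my-app-v2
spec:
  replicas: 3
  selector:
    app: MyApp
    version: v2           # distinct from the v1 controller's selector
  template:
    metadata:
      labels:
        app: MyApp        # still matched by the Service selector throughout
        version: v2
    spec:
      containers:
      - name: app
        image: example.com/my-app:v2   # hypothetical image tag
# Driven by e.g.: kubectl rolling-update my-app-v1 -f my-app-v2-rc.yaml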

Slide 29

Secrets

Slide 30

Secrets
Problem: how to grant a pod access to a secured something?
• don’t put secrets in the container image!
12-factor says: config comes from the environment
• Kubernetes is the environment
Manage secrets via the Kubernetes API
Inject them as virtual volumes into Pods
• late-binding
• tmpfs - never touches disk
(diagram: the node pulls the Secret from the API and mounts it into the Pod)
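
A sketch of a Secret managed through the API and injected as a volume; the secret name, value, and mount path are placeholders.

# secret.yaml - a secret stored via the API, mounted as a tmpfs-backed volume (sketch)
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
data:
  password: c3VwZXJzZWNyZXQ=      # base64 of a placeholder value
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-secret
spec:
  volumes:
  - name: creds
    secret:
      secretName: db-credentials
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: creds
      mountPath: /etc/creds       # files appear here; backed by tmpfs on the node
      readOnly: true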

Slide 31

Graceful Termination

Slide 32

Graceful Termination
Give pods time to clean up
• finish in-flight operations
• log state
• flush to disk
• 30 seconds by default
Catch SIGTERM, clean up, exit ASAP
Pod status “Terminating”
Declarative: ‘DELETE’ manifests as an object field in the API
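
The grace period is a per-pod field; a minimal sketch overriding the 30-second default with an illustrative value.

# Time between SIGTERM and SIGKILL when this pod is deleted (sketch).
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown
spec:
  terminationGracePeriodSeconds: 120
  containers:
  - name: app
    image: nginx          # the process should catch SIGTERM, flush, and exit promptly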

Slide 33

DaemonSets

Slide 34

DaemonSets
Problem: how to run a Pod on every node
• or a subset of nodes
Similar to ReplicationController
• principle: do one thing, don’t overload
“Which nodes?” is a selector
Use familiar tools and patterns
Status: ALPHA in Kubernetes v1.1
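
A hedged sketch of the alpha-era API: one agent pod per matching node. The name, image, and node label are assumptions for illustration.

# daemonset.yaml - run one log-collector pod on each selected node (sketch)
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      nodeSelector:
        logging: "true"     # "which nodes?" - omit to run on every node
      containers:
      - name: fluentd
        image: fluent/fluentd   # placeholder image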

Slide 35

PersistentVolumes

Slide 36

PersistentVolumes
A higher-level abstraction
• insulation from any one cloud environment
Admin provisions them, users claim them
Independent lifetime and fate
Can be handed off between pods; lives until the user is done with it
Dynamically “scheduled” and managed, like nodes and pods
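
A sketch of both halves: the admin-provisioned volume and the user's claim against it. The disk name and sizes are illustrative assumptions.

# The admin provisions a PersistentVolume; a user claims it by size/access mode (sketch).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  gcePersistentDisk:
    pdName: my-data-disk    # assumed pre-provisioned disk
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
# Pods then reference the claim, not the volume:
#   volumes:
#   - name: data
#     persistentVolumeClaim:
#       claimName: my-claim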

Slide 37

PersistentVolumes lifecycle (diagram sequence, slides 37-50): the Cluster Admin provisions a pool of PersistentVolumes; a User creates a PVClaim; the Binder matches the claim to an available volume; the User creates Pods that reference the claim; Pods can be deleted and re-created while the claim, and the data behind it, persists; when the User finally deletes the claim, the Recycler reclaims the volume and returns it to the pool.

Slide 51

Namespaces

Slide 52

Namespaces
Problem: I have too much stuff!
• name collisions in the API
• poor isolation between users
• don’t want to expose things like Secrets
Solution: slice up the cluster
• create new Namespaces as needed
• per-user, per-app, per-department, etc.
• part of the API - NOT private machines
• most API objects are namespaced
• part of the REST URL path
• Namespaces are just another API object
• one-step cleanup - delete the Namespace
• obvious hook for policy enforcement (e.g. quota)
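
A minimal sketch: a Namespace is just another API object, and namespaced objects carve themselves into it via metadata and the REST path. Names are placeholders.

# A Namespace, plus an object created inside it (illustrative names).
apiVersion: v1
kind: Namespace
metadata:
  name: team-frontend
---
apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: team-frontend   # lands under /api/v1/namespaces/team-frontend/pods/web
spec:
  containers:
  - name: app
    image: nginx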

Slide 53

Resource Isolation

Slide 54

Resource Isolation
Principles:
• Apps must not be able to affect each other’s perf
• if so, it is an isolation failure
• Repeated runs of the same app should see ~equal behavior
• QoS levels drive resource decisions in (soft) real-time
• Correct in all cases, optimal in some
• reduce unreliable components
• SLOs are the lingua franca

Slide 55

Requests and Limits
Request:
• how much of a resource you are asking to use, with a strong guarantee of availability
• CPU (seconds/second)
• RAM (bytes)
• scheduler will not over-commit requests
Limit:
• max amount of a resource you can access
Repercussions:
• Usage > Request: resources might be available
• Usage > Limit: throttled or killed
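
A sketch of how that looks on a container: a guaranteed floor (requests) with room to burst up to a ceiling (limits). The numbers are illustrative.

# Burstable container: requests below limits (sketch).
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 250m          # 0.25 CPU-seconds/second, what the scheduler accounts for
        memory: 256Mi
      limits:
        cpu: 500m          # CPU usage above this is throttled
        memory: 512Mi      # memory usage above this gets the container killed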

Slide 56

Quality of Service
Defined in terms of Request and Limit
Guaranteed: highest protection
• request > 0 && limit == request
Burstable: medium protection
• request > 0 && limit > request
Best Effort: lowest protection
• request == 0
What does “protection” mean?
• OOM score
• CPU scheduling
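
The class simply falls out of how requests and limits are set. A sketch of the Guaranteed case, in contrast to the Burstable example above; values are illustrative.

# Guaranteed-class pod: limits equal requests for every resource (sketch).
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m          # limit == request on every resource => Guaranteed
        memory: 512Mi      # omit resources entirely => Best Effort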

Slide 57

Quota and Limits

Slide 58

ResourceQuota
Admission control: apply limits in aggregate
Per-namespace: ensure no user/app/department abuses the cluster
Reminiscent of disk quota by design
Applies to each type of resource
• CPU and memory for now
Disallows pods without resources
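
A sketch of aggregate caps for one namespace; the namespace name and the values are illustrative.

# quota.yaml - aggregate limits enforced at admission time, per namespace (sketch)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-frontend
spec:
  hard:
    cpu: "20"              # total CPU requested across all pods in the namespace
    memory: 64Gi
    pods: "50"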

Slide 59

LimitRange
Admission control: limit the limits
• min and max
• ratio of limit/request
Default values for unspecified limits
Per-namespace
Together with ResourceQuota gives cluster admins powerful tools
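
A sketch of per-container bounds and defaults in a namespace; all values are illustrative.

# limits.yaml - bound and default per-container resources in a namespace (sketch)
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: team-frontend
spec:
  limits:
  - type: Container
    min:
      cpu: 50m
      memory: 32Mi
    max:
      cpu: "2"
      memory: 2Gi
    default:               # applied when a pod does not specify limits
      cpu: 250m
      memory: 256Mi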

Slide 60

Network Plugins

Slide 61

Network Plugins
Introduced in Kubernetes v1.0
• VERY experimental
Uses CNI (CoreOS) in v1.1
• simple exec interface
• not using Docker libnetwork
• but can defer to Docker for networking
Cluster admins can customize their installs
• DHCP, MACVLAN, Flannel, custom
Status: ALPHA in Kubernetes v1.1

Slide 62

HorizontalPodAutoscalers

Slide 63

HorizontalPodAutoscalers
Automatically scale ReplicationControllers to a target utilization
• CPU utilization for now
• probably more later
Operates within user-defined min/max bounds
Set it and forget it
Status: BETA in Kubernetes v1.1
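
A sketch using the schema that later stabilized as autoscaling/v1; the v1.1 beta exposed the same idea under extensions/v1beta1 with slightly different field names. The target and bounds are illustrative.

# hpa.yaml - scale a controller to hold average CPU near a target (sketch)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: my-app-v2
  minReplicas: 2                       # user-defined bounds
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70   # target average CPU utilization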

Slide 64

New and coming soon
• Cluster auto-scaling
• Jobs (run-to-completion)
• Cron (scheduled jobs)
• Privileged containers
• Graceful termination
• Downward API
• L7 load-balancing
• Interactive containers
• Network plugins: CNI
• Bandwidth shaping
• Scalability++ (250 nodes in v1.1)
• Performance++
• HA masters
• Config injection
• Simpler deployments
• Cluster federation
• Easier setup (e.g. networking)
• More volume types
• Private Docker registry
• External DNS integration
• Volume auto-provisioning
• Pod auto-scaling

Slide 65

Kubernetes status & plans
Open sourced in June 2014
• v1.0 in July 2015
• v1.1 in Nov 2015
Google Container Engine (GKE)
• hosted Kubernetes - don’t think about cluster setup
• GA in August 2015
PaaSes:
• Red Hat OpenShift, Deis, Stratos
Distros:
• CoreOS Tectonic, Mirantis Murano (OpenStack), Red Hat Atomic, Mesos
Shooting for a 1.2 release in O(months)

Slide 66

The Goal: Shake things up
Containers are a new way of working
Requires new concepts and new tools
Google has a lot of experience...
...but we are listening to the users
Workload portability is important!

Slide 67

Kubernetes is Open
- open community
- open design
- open source
- open to ideas
http://kubernetes.io
https://github.com/kubernetes/kubernetes
slack: kubernetes
twitter: @kubernetesio

Slide 68

Backup Slides

Slide 69

Networking

Slide 70

Docker networking
(diagram: three hosts on subnets 10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24; containers on each host get private bridge addresses such as 172.16.1.1 and 172.16.1.2, so the same container IPs repeat on every host)

Slide 71

Docker networking
(same diagram, with NAT marked on every cross-host path: container traffic between hosts must be NATed)

Slide 72

Host ports
(diagram: containers A (172.16.1.1:3306), B (172.16.1.2:80), and C (172.16.1.1:8000) are published through host ports such as 9376 and 11878, with SNAT applied on the hosts)

Slide 73

Kubernetes networking
IPs are routable
• vs Docker’s default private IP
Pods can reach each other without NAT
• even across nodes
No brokering of port numbers
• too complex, why bother?
This is a fundamental requirement
• can be L3 routed
• can be underlayed (cloud)
• can be overlayed (SDN)

Slide 74

Kubernetes networking
(diagram: the three-host figure from the Docker networking slides, revisited under the Kubernetes model, with pod labels added)

Slide 75

Labels
Arbitrary metadata
Attached to any API object
Generally represent identity
Queryable by selectors
• think SQL ‘select ... where ...’
The only grouping mechanism
• pods under a ReplicationController
• pods in a Service
• capabilities of a node (constraints)

Slide 76

Cluster Add-Ons

Slide 77

Monitoring
Run cAdvisor on each node (in the kubelet)
• gather stats from all containers
• export via REST
Run Heapster as a pod in the cluster
• just another pod, no special access
• aggregate stats
Run InfluxDB and Grafana in the cluster
• more pods
• alternately: store in Google Cloud Monitoring
Or plug in your own!
• e.g. Google Cloud Monitoring

Slide 78

Logging
Run fluentd as a pod on each node
• gather logs from all containers
• export to Elasticsearch
Run Elasticsearch as a pod in the cluster
• just another pod, no special access
• aggregate logs
Run Kibana in the cluster
• yet another pod
• alternately: store in Google Cloud Logging
Or plug in your own!
• e.g. Google Cloud Logging

Slide 79

DNS
Run SkyDNS as a pod in the cluster
• kube2sky bridges Kubernetes API -> SkyDNS
• tell kubelets about it (static service IP)
Strictly optional, but practically required
• LOTS of things depend on it
• probably will become more integrated
Or plug in your own!