Slide 1

Slide 1 text

Cluster Management with Kubernetes
Please open the gears tab below for the speaker notes
Satnam Singh
Work of the Google Kubernetes team and many open source contributors
University of Edinburgh, 5 June 2015

Slide 2

Slide 2 text

The promise of cloud computing

Slide 3

Slide 3 text

Cloud software deployment is soul destroying
Typically a cloud cluster node is a VM running a specific version of Linux.
User applications comprise components, each of which may have different and conflicting requirements for libraries, runtimes and kernel features.
Applications are coupled to the version of the host operating system: bad.
Evolution of the application components is coupled to (and in tension with) the evolution of the host operating system: bad.
We also need to deal with node failures, spinning up and turning down replicas to deal with varying load, updating components without disruption …
You thought you were a programmer but you are now a sys-admin.

Slide 4

Slide 4 text

Docker Source: Google Trends

Slide 5

Slide 5 text

What is Docker?
An implementation of the container idea
A package format
Resource isolation
An ecosystem

Slide 6

Slide 6 text

Virtual Machines workloads? We need to isolate the application components from the host environment.

Slide 7

Slide 7 text

VM vs. Docker

Slide 8

Slide 8 text

Docker “build once, run anywhere”

Slide 9

Slide 9 text

Resource isolation
Implemented by a number of Linux APIs:
• cgroups: Restrict resources a process can consume
  • CPU, memory, disk IO, ...
• namespaces: Change a process’s view of the system
  • Network interfaces, PIDs, users, mounts, ...
• capabilities: Limit what a user can do
  • mount, kill, chown, ...
• chroots: Determine what parts of the filesystem a user can see

Slide 10

Slide 10 text

We need more than just packing and isolation
Scheduling: Where should my containers run?
Lifecycle and health: Keep my containers running despite failures
Discovery: Where are my containers now?
Monitoring: What’s happening with my containers?
Auth{n,z}: Control who can do things to my containers
Aggregates: Compose sets of containers into jobs
Scaling: Making jobs bigger or smaller
...

Slide 11

Slide 11 text

Google confidential │ Do not distribute
Everything at Google runs in containers:
• Gmail, Web Search, Maps, ...
• MapReduce, MillWheel, Pregel, ...
• Colossus, BigTable, Spanner, ...
• Even Google’s Cloud Computing product GCE itself: VMs run in containers

Slide 12

Slide 12 text

Open Source Containers: Kubernetes
Greek for “Helmsman”; also the root of the words “Governor” and “cybernetic”
• Container orchestrator
• Builds on Docker containers
  • also supporting other container technologies
• Multiple cloud and bare-metal environments
• Supports existing OSS apps
  • cannot require apps to become cloud-native
• Inspired and informed by Google’s experiences and internal systems
• 100% open source, written in Go
Let users manage applications, not machines

Slide 13

Slide 13 text

Primary concepts
Container: A sealed application package (Docker)
Pod: A small group of tightly coupled Containers
Labels: Identifying metadata attached to objects
Selector: A query against labels, producing a set result
Controller: A reconciliation loop that drives current state towards desired state
Service: A set of pods that work together

Slide 14

Slide 14 text

Application Containers
Kubernetes API: Unified Compute Substrate
Homogeneous Machine Fleet (Virtual or Physical)

Slide 15

Slide 15 text

Kubernetes Architecture
etcd
API Server
Scheduler
Controller Manager
Kubelet
Service Proxy
kubectl, ajax, etc.

Slide 16

Slide 16 text

Modularity
Loose coupling is a goal everywhere
• simpler
• composable
• extensible
Code-level plugins where possible
Multi-process where possible
Isolate risk by interchangeable parts
Example: ReplicationController
Example: Scheduler

Slide 17

Slide 17 text

Reconciliation between declared and actual state

Slide 18

Slide 18 text

Control loops
Drive current state -> desired state
Act independently
APIs - no shortcuts or back doors
Observed state is truth
Recurring pattern in the system
Example: ReplicationController
observe -> diff -> act

Slide 19

Slide 19 text

Atomic storage
Backing store for all master state
Hidden behind an abstract interface
Stateless means scalable
Watchable
• this is a fundamental primitive
• don’t poll, watch
Using CoreOS etcd
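The “don’t poll, watch” idea can be illustrated with a minimal sketch. `WatchableStore` below is a hypothetical in-memory stand-in written for this example; it is not the etcd API, and the key names are assumptions. Clients register a callback once and are notified on each change, instead of repeatedly reading the key.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical callback type: receives the key and its new value.
Watcher = Callable[[str, str], None]


class WatchableStore:
    """In-memory key-value store that pushes changes to registered watchers."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Watcher]] = defaultdict(list)

    def watch(self, key: str, callback: Watcher) -> None:
        # Register interest once; no polling loop is needed afterwards.
        self._watchers[key].append(callback)

    def put(self, key: str, value: str) -> None:
        # Store the value, then notify every watcher of this key.
        self._data[key] = value
        for callback in self._watchers[key]:
            callback(key, value)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


events: List[Tuple[str, str]] = []
store = WatchableStore()
store.watch("/pods/counter", lambda key, value: events.append((key, value)))
store.put("/pods/counter", "Pending")
store.put("/pods/counter", "Running")
store.put("/pods/other", "Pending")  # nobody watches this key
print(events)
```

Only the two changes to the watched key reach the callback; the write to the unwatched key is stored but triggers no notification.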

Slide 20

Slide 20 text

Pods: Grouping containers Container Foo Namespaces - Net - IPC - .. Container Bar

Slide 21

Slide 21 text

Pods: Networking Container Foo Container Bar Namespaces - Net - IPC - ..

Slide 22

Slide 22 text

Pods: Volumes Container Foo Container Bar Namespaces - Net - IPC - ..

Slide 23

Slide 23 text

Pods: Labels Container Foo Container Bar Namespaces - Net - IPC - ..

Slide 24

Slide 24 text

Persistent Volumes
A higher-level abstraction - insulation from any one cloud environment
Admin provisions them, users claim them
Independent lifetime and fate
Can be handed off between pods and live until the user is done with them
Dynamically “scheduled” and managed, like nodes and pods
Diagram labels: User owned, Admin owned; Pod, ClaimRef, PVClaim, PersistentVolume; GCE PD, AWS EBS, NFS, iSCSI

Slide 25

Slide 25 text

Labels
Arbitrary metadata
Attached to any API object
Generally represent identity
Queryable by selectors
• think SQL ‘select ... where ...’
The only grouping mechanism
Use to determine which objects to apply an operation to
• pods under a ReplicationController
• pods in a Service
• capabilities of a node (scheduling constraints)
Example label sets:
• App: Nifty, Phase: Dev, Role: FE
• App: Nifty, Phase: Dev, Role: BE
• App: Nifty, Phase: Test, Role: FE
• App: Nifty, Phase: Test, Role: BE

Slide 26

Slide 26 text

Selectors
• App: Nifty, Phase: Dev, Role: FE
• App: Nifty, Phase: Test, Role: FE
• App: Nifty, Phase: Dev, Role: BE
• App: Nifty, Phase: Test, Role: BE

Slide 27

Slide 27 text

Selectors
Selector: App == Nifty
• App: Nifty, Phase: Dev, Role: FE
• App: Nifty, Phase: Test, Role: FE
• App: Nifty, Phase: Dev, Role: BE
• App: Nifty, Phase: Test, Role: BE

Slide 28

Slide 28 text

Selectors
Selector: App == Nifty, Role == FE
• App: Nifty, Phase: Dev, Role: FE
• App: Nifty, Phase: Test, Role: FE
• App: Nifty, Phase: Dev, Role: BE
• App: Nifty, Phase: Test, Role: BE

Slide 29

Slide 29 text

Selectors
Selector: App == Nifty, Role == BE
• App: Nifty, Phase: Dev, Role: FE
• App: Nifty, Phase: Test, Role: FE
• App: Nifty, Phase: Dev, Role: BE
• App: Nifty, Phase: Test, Role: BE

Slide 30

Slide 30 text

Selectors
Selector: App == Nifty, Phase == Dev
• App: Nifty, Phase: Dev, Role: FE
• App: Nifty, Phase: Test, Role: FE
• App: Nifty, Phase: Dev, Role: BE
• App: Nifty, Phase: Test, Role: BE

Slide 31

Slide 31 text

Selectors
Selector: App == Nifty, Phase == Test
• App: Nifty, Phase: Dev, Role: FE
• App: Nifty, Phase: Test, Role: FE
• App: Nifty, Phase: Dev, Role: BE
• App: Nifty, Phase: Test, Role: BE
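The selector queries on the preceding slides can be sketched in a few lines. This is a simplified illustration of equality-based matching over the four label sets from the slides; the `matches` and `select` helper names are assumptions made for this example, not Kubernetes API functions.

```python
from typing import Dict, List

Labels = Dict[str, str]

# The four label sets shown on the slides.
PODS: List[Labels] = [
    {"App": "Nifty", "Phase": "Dev", "Role": "FE"},
    {"App": "Nifty", "Phase": "Dev", "Role": "BE"},
    {"App": "Nifty", "Phase": "Test", "Role": "FE"},
    {"App": "Nifty", "Phase": "Test", "Role": "BE"},
]


def matches(selector: Labels, labels: Labels) -> bool:
    """True when every key == value requirement in the selector holds."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector: Labels, objects: List[Labels]) -> List[Labels]:
    """Return the subset of objects whose labels satisfy the selector."""
    return [labels for labels in objects if matches(selector, labels)]


everything = select({"App": "Nifty"}, PODS)
frontends = select({"App": "Nifty", "Role": "FE"}, PODS)
test_phase = select({"App": "Nifty", "Phase": "Test"}, PODS)
print(len(everything), len(frontends), len(test_phase))
```

Adding a requirement to the selector narrows the result set, much like adding a condition to a SQL `where` clause.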

Slide 32

Slide 32 text

Pod lifecycle
Once scheduled to a node, pods do not move
• restart policy means restart in-place
Pods can be observed pending, running, succeeded, or failed
• failed is really the end - no more restarts
• no complex state machine logic
Pods are not rescheduled by the scheduler or apiserver
• even if a node dies
• controllers are responsible for this
• keeps the scheduler simple
Apps should consider these rules
• Services hide this
• Makes pod-to-pod communication more formal

Slide 33

Slide 33 text

Replication Controllers production backend production backend production backend #N

Slide 34

Slide 34 text

Replication Controllers
A type of controller (control loop)
Ensure N copies of a pod always running
• if too few, start new ones
• if too many, kill some
• group == selector
Cleanly layered on top of the core
• all access is by public APIs
Replicated pods are fungible
• No implied ordinality or identity
Other kinds of controllers coming
• e.g. job controller for batch
Replication Controller
- Name = “nifty-rc”
- Selector = {“App”: “Nifty”}
- PodTemplate = { ... }
- NumReplicas = 4
Exchange with the API Server: How many? 3. Start 1 more. OK. How many? 4.
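The observe, diff, act cycle of a replication controller can be sketched as follows. This is a minimal illustration under stated assumptions: `FakeApiServer`, `Pod`, and the `reconcile` method are hypothetical names invented for the example, and the real controller talks to the API server over its public API rather than to an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    name: str
    labels: Dict[str, str]


@dataclass
class FakeApiServer:
    """Hypothetical in-memory stand-in for the API server."""
    pods: List[Pod] = field(default_factory=list)

    def list_pods(self, selector: Dict[str, str]) -> List[Pod]:
        return [p for p in self.pods
                if all(p.labels.get(k) == v for k, v in selector.items())]

    def create_pod(self, pod: Pod) -> None:
        self.pods.append(pod)

    def delete_pod(self, pod: Pod) -> None:
        self.pods.remove(pod)


class ReplicationController:
    def __init__(self, name: str, selector: Dict[str, str], replicas: int) -> None:
        self.name = name
        self.selector = selector
        self.replicas = replicas
        self._created = 0  # used only to give new pods unique names

    def reconcile(self, api: FakeApiServer) -> int:
        """Run one observe/diff/act pass; return desired minus observed."""
        observed = api.list_pods(self.selector)   # observe
        diff = self.replicas - len(observed)      # diff
        if diff > 0:                              # act: too few, start more
            for _ in range(diff):
                self._created += 1
                api.create_pod(Pod(f"{self.name}-{self._created}", dict(self.selector)))
        elif diff < 0:                            # act: too many, kill some
            for pod in observed[:-diff]:
                api.delete_pod(pod)
        return diff


# Mirrors the slide: 3 pods observed, 4 desired, so 1 more is started.
api = FakeApiServer([Pod(f"existing-{i}", {"App": "Nifty"}) for i in range(3)])
rc = ReplicationController("nifty-rc", {"App": "Nifty"}, replicas=4)
first_pass = rc.reconcile(api)
second_pass = rc.reconcile(api)
print(first_pass, len(api.list_pods(rc.selector)), second_pass)
```

Because the pods are fungible, the sketch does not care which pod it deletes when there are too many; it only compares counts for the group defined by the selector.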

Slide 35

Slide 35 text

Services production backend production backend production backend port(s) name 1.2.3.4 “name”

Slide 36

Slide 36 text

Services
Service
- Name = “nifty-svc”
- Selector = {“App”: “Nifty”}
- Port = 9376
- ContainerPort = 8080
Portal IP is assigned
kube-proxy watches the apiserver
Client -> 10.0.0.1 : 9376 (TCP / UDP) -> iptables DNAT -> kube-proxy -> (TCP / UDP) 10.240.1.1 : 8080, 10.240.2.2 : 8080, 10.240.3.3 : 8080
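The mapping from a service portal address to pod backends can be sketched as below, using the addresses from the slide. `ServiceProxy` and `endpoints_for` are hypothetical names, the fourth pod is invented to show a non-match, and round-robin selection is an assumption made for this example; the slide does not state how a backend is chosen.

```python
from itertools import cycle
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]


def endpoints_for(selector: Dict[str, str], container_port: int,
                  pods: List[Tuple[str, Dict[str, str]]]) -> List[Endpoint]:
    """Return (pod IP, container port) for every pod whose labels match."""
    return [
        (ip, container_port)
        for ip, labels in pods
        if all(labels.get(k) == v for k, v in selector.items())
    ]


class ServiceProxy:
    """Hypothetical proxy: rewrites the portal address to a backend address."""

    def __init__(self, portal: Endpoint, backends: List[Endpoint]) -> None:
        if not backends:
            raise ValueError("service has no backends")
        self.portal = portal
        self._next_backend = cycle(backends)  # assumed round-robin policy

    def route(self, destination: Endpoint) -> Endpoint:
        # Traffic not addressed to the portal passes through unchanged.
        if destination != self.portal:
            return destination
        return next(self._next_backend)


pods = [
    ("10.240.1.1", {"App": "Nifty"}),
    ("10.240.2.2", {"App": "Nifty"}),
    ("10.240.3.3", {"App": "Nifty"}),
    ("10.240.4.4", {"App": "Other"}),  # hypothetical pod outside the service
]
backends = endpoints_for({"App": "Nifty"}, 8080, pods)
proxy = ServiceProxy(("10.0.0.1", 9376), backends)
routed = [proxy.route(("10.0.0.1", 9376)) for _ in range(4)]
print(routed)
```

The client only ever sees the stable portal address; which pods sit behind it can change as the selector's result set changes.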

Slide 37

Slide 37 text

A Kubernetes cluster on Google Compute Engine

Slide 38

Slide 38 text

A Kubernetes cluster on Google Compute Engine

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

A fresh Kubernetes cluster

Slide 41

Slide 41 text

Node 0f64: logging

Slide 42

Slide 42 text

Node 02ej: logging, monitoring

Slide 43

Slide 43 text

Node pk22: logging, DNS

Slide 44

Slide 44 text

Node 27gf: logging

Slide 45

Slide 45 text

A counter pod
apiVersion: v1
kind: Pod
metadata:
  name: counter
  namespace: demo
spec:
  containers:
  - name: count
    image: ubuntu:14.04
    args: [bash, -c, 'for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 1; done']

Slide 46

Slide 46 text

A counter pod
$ kubectl create -f counter-pod.yaml --namespace=demo
pods/counter
$ kubectl get pods
NAME                                           READY   REASON    RESTARTS   AGE
fluentd-cloud-logging-kubernetes-minion-1xe3   1/1     Running   0          5m
fluentd-cloud-logging-kubernetes-minion-p6cu   1/1     Running   0          5m
fluentd-cloud-logging-kubernetes-minion-s2dl   1/1     Running   0          5m
fluentd-cloud-logging-kubernetes-minion-ypau   1/1     Running   0          5m
kube-dns-v3-55k7n                              3/3     Running   0          6m
monitoring-heapster-v1-55ix9                   0/1     Running   12         6m

Slide 47

Slide 47 text

Node 27gf: logging, counter

Slide 48

Slide 48 text

Observing the output of the counter
$ kubectl logs counter --namespace=demo
0: Tue Jun 2 21:37:31 UTC 2015
1: Tue Jun 2 21:37:32 UTC 2015
2: Tue Jun 2 21:37:33 UTC 2015
3: Tue Jun 2 21:37:34 UTC 2015
4: Tue Jun 2 21:37:35 UTC 2015
5: Tue Jun 2 21:37:36 UTC 2015
...

Slide 49

Slide 49 text

ssh onto node and “ps”
# docker ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
532247036a78  ubuntu:14.04  "\"bash -c 'i=0; whi  About a minute ago  Up About a minute  k8s_count.dca54bea_counter_demo_479b8894-0971-11e5-a784-42010af00df1_f6159d40
8cd07658287d  gcr.io/google_containers/pause:0.8.0  "/pause"  About a minute ago  Up About a minute  k8s_POD.e4cc795_counter_demo_479b8894-0971-11e5-a784-42010af00df1_7de2fec0
b2dc87db6608  gcr.io/google_containers/fluentd-gcp:1.6  "\"/bin/sh -c '/usr/  16 minutes ago  Up 16 minutes  k8s_fluentd-cloud-logging.463ca0af_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_3e908886
c5d8641d884d  gcr.io/google_containers/pause:0.8.0  "/pause"  16 minutes ago  Up 16 minutes  k8s_POD.e4cc795_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_2b980b91

Slide 50

Slide 50 text

Example: Music DB + UI http://music-db:9200 http://music-ui:5601 music-db music-db music-db music-db music-ui

Slide 51

Slide 51 text

Example: Elasticsearch + Kibana Music DB & UI
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    app: music-db
  name: music-db
spec:
  replicas: 4
  selector:
    app: music-db
  template:
    metadata:
      labels:
        app: music-db
    spec:
      containers:
      - name: es
        image: kubernetes/elasticsearch:1.0
        env:
        - name: "CLUSTER_NAME"
          value: "mytunes-db"
        - name: "SELECTOR"
          value: "name=music-db"
        - name: "NAMESPACE"
          value: "mytunes"
        ports:
        - name: es
          containerPort: 9200
        - name: es-transport
          containerPort: 9300

Slide 52

Slide 52 text

Music DB Replication Controller
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    app: music-db
  name: music-db
spec:
  replicas: 4
  selector:
    app: music-db
  template:
    metadata:
      labels:
        app: music-db
    spec:
      containers: ...

Slide 53

Slide 53 text

Music DB container
containers:
- name: es
  image: kubernetes/elasticsearch:1.0
  env:
  - name: "CLUSTER_NAME"
    value: "mytunes-db"
  - name: "SELECTOR"
    value: "name=music-db"
  - name: "NAMESPACE"
    value: "mytunes"
  ports:
  - name: es
    containerPort: 9200
  - name: es-transport
    containerPort: 9300

Slide 54

Slide 54 text

Music DB Service
apiVersion: v1
kind: Service
metadata:
  name: music-db
  labels:
    app: music-db
spec:
  selector:
    app: music-db
  ports:
  - name: db
    port: 9200
    targetPort: es

Slide 55

Slide 55 text

Music DB http://music-db:9200 music-db music-db music-db music-db

Slide 56

Slide 56 text

Music DB Query

Slide 57

Slide 57 text

Music UI Pod
apiVersion: v1
kind: Pod
metadata:
  name: music-ui
  labels:
    app: music-ui
spec:
  containers:
  - name: kibana
    image: kubernetes/kibana:1.0
    env:
    - name: "ELASTICSEARCH_URL"
      value: "http://music-db:9200"
    ports:
    - name: kibana
      containerPort: 5601

Slide 58

Slide 58 text

Music UI Service
apiVersion: v1
kind: Service
metadata:
  name: music-ui
  labels:
    app: music-ui
spec:
  selector:
    app: music-ui
  ports:
  - name: kibana
    port: 5601
    targetPort: kibana
  type: LoadBalancer

Slide 59

Slide 59 text

Music DB + UI http://music-db:9200 http://music-ui:5601 music-db music-db music-db music-db music-ui http://104.197.86.235:5601

Slide 60

Slide 60 text

Music UI Query

Slide 61

Slide 61 text

Scale DB and UI independently music-db music-db music-db music-ui music-ui

Slide 62

Slide 62 text

Monitoring
Optional add-on to Kubernetes clusters
Run cAdvisor as a pod on each node
• gather stats from all containers
• export via REST
Run Heapster as a pod in the cluster
• just another pod, no special access
• aggregate stats
Run Influx and Grafana in the cluster
• more pods
• alternately: store in Google Cloud Monitoring

Slide 63

Slide 63 text

Logging
Optional add-on to Kubernetes clusters
Run fluentd as a pod on each node
• gather logs from all containers
• export to Elasticsearch
Run Elasticsearch as a pod in the cluster
• just another pod, no special access
• aggregate logs
Run Kibana in the cluster
• yet another pod
• alternately: store in Google Cloud Logging

Slide 64

Slide 64 text

Example: Rolling Upgrade with Labels
Servers: backend pods, labeled v1.2 or v1.3
Replication Controller for v1.2: replicas: 4, then 3, 2, 1, 0
Replication Controller for v1.3: replicas: 1, then 2, 3, 4
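The replica counts on this slide follow a simple pattern that can be sketched as a generator. This is an illustration only: the `rolling_upgrade` name is an assumption, and so is the ordering (grow the new controller by one, then shrink the old one by one), which the flattened slide text does not make explicit.

```python
from typing import Iterator, List, Tuple


def rolling_upgrade(total: int) -> Iterator[Tuple[int, int]]:
    """Yield (old_replicas, new_replicas) after each resize step."""
    old, new = total, 0
    yield old, new
    while old > 0:
        new += 1   # start one pod of the new version first (assumed order)
        yield old, new
        old -= 1   # then retire one pod of the old version
        yield old, new


steps: List[Tuple[int, int]] = list(rolling_upgrade(4))
for old_replicas, new_replicas in steps:
    print(f"v1.2 replicas: {old_replicas}  v1.3 replicas: {new_replicas}")
```

With this ordering the number of running backends never drops below the original total, at the cost of briefly running one extra pod.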

Slide 65

Slide 65 text

ISA

Slide 66

Slide 66 text

ISA?

Slide 67

Slide 67 text

Open source: contribute!

Slide 68

Slide 68 text

Pets vs. Cattle

Slide 69

Slide 69 text

Questions?
Images by Connie Zhou
http://kubernetes.io