
Cluster Management with Kubernetes


On Friday 5 June 2015 I gave a talk called Cluster Management with Kubernetes to a general audience at the University of Edinburgh. The talk includes an example of a music store system with a Kibana front-end UI and an Elasticsearch-based back end, which helps make concepts like pods, replication controllers and services concrete.

Satnam Singh

June 05, 2015


Transcript

  1. Cluster Management with Kubernetes. Please open the gears tab below for the speaker notes. Satnam Singh ([email protected]). Work of the Google Kubernetes team and many open source contributors. University of Edinburgh, 5 June 2015.
  2. Cloud software deployment is soul-destroying. Typically a cloud cluster node is a VM running a specific version of Linux. User applications comprise components, each of which may have different and conflicting requirements on libraries, runtimes and kernel features. Applications are coupled to the version of the host operating system: bad. Evolution of the application components is coupled to (and in tension with) the evolution of the host operating system: bad. You also need to deal with node failures, spinning up and turning down replicas to handle varying load, updating components without disruption … You thought you were a programmer, but you are now a sys-admin.
  3. What is Docker? • An implementation of the container idea • A package format • Resource isolation • An ecosystem
  4. Resource isolation. Implemented by a number of Linux APIs: • cgroups: restrict the resources a process can consume (CPU, memory, disk IO, ...) • namespaces: change a process's view of the system (network interfaces, PIDs, users, mounts, ...) • capabilities: limit what a user can do (mount, kill, chown, ...) • chroots: determine what parts of the filesystem a user can see
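    To make this concrete, here is a minimal sketch of a single docker run invocation that applies cgroup limits and gets its own namespaces; the image and the specific limits are illustrative, not from the talk:

      # Run an Ubuntu container with cgroup limits on memory and CPU shares.
      # The container also gets its own PID, mount and network namespaces by default.
      docker run --rm -it --memory=256m --cpu-shares=512 ubuntu:14.04 bash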
  5. We need more than just packing and isolation. Scheduling: where should my containers run? Lifecycle and health: keep my containers running despite failures. Discovery: where are my containers now? Monitoring: what's happening with my containers? Auth{n,z}: control who can do things to my containers. Aggregates: compose sets of containers into jobs. Scaling: make jobs bigger or smaller. ...
  6. Everything at Google runs in containers: • Gmail, Web Search, Maps, ... • MapReduce, MillWheel, Pregel, ... • Colossus, BigTable, Spanner, ... • Even Google's Cloud Computing product GCE itself: VMs run in containers
  7. Open Source Containers: Kubernetes. Greek for "helmsman"; also the root of the words "governor" and "cybernetic". • Container orchestrator • Builds on Docker containers (also supporting other container technologies) • Multiple cloud and bare-metal environments • Supports existing OSS apps (cannot require apps to become cloud-native) • Inspired and informed by Google's experiences and internal systems • 100% open source, written in Go. Let users manage applications, not machines.
  8. Primary concepts. Container: a sealed application package (Docker). Pod: a small group of tightly coupled containers. Labels: identifying metadata attached to objects. Selector: a query against labels, producing a set result. Controller: a reconciliation loop that drives current state towards desired state. Service: a set of pods that work together.
  9. Modularity. Loose coupling is a goal everywhere: • simpler • composable • extensible. Code-level plugins where possible. Multi-process where possible. Isolate risk with interchangeable parts. Examples: ReplicationController, Scheduler.
  10. Control loops. Drive current state -> desired state. Act independently. APIs - no shortcuts or back doors. Observed state is truth. A recurring pattern in the system. Example: ReplicationController (observe, diff, act).
  11. Atomic storage. Backing store for all master state. Hidden behind an abstract interface. Stateless means scalable. Watchable • this is a fundamental primitive • don't poll, watch. Using CoreOS etcd.
  12. Persistent Volumes. A higher-level abstraction - insulation from any one cloud environment. Admin provisions them, users claim them. Independent lifetime and fate. Can be handed off between pods and lives until the user is done with it. Dynamically "scheduled" and managed, like nodes and pods. (Diagram: a user-owned Pod references a PersistentVolumeClaim via a ClaimRef; the claim binds to an admin-owned PersistentVolume backed by GCE PD, AWS EBS, NFS, iSCSI, ...)
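    A minimal sketch of the admin/user split described above; the names, sizes and the NFS server address are illustrative placeholders, not from the talk:

      # Admin-owned PersistentVolume (hypothetical NFS backend)
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: pv0001
      spec:
        capacity:
          storage: 10Gi
        accessModes:
        - ReadWriteOnce
        nfs:
          server: nfs.example.com
          path: /exports/data
      ---
      # User-owned claim; Kubernetes binds it to a matching PersistentVolume
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: myclaim
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi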
  13. Labels. Arbitrary metadata attached to any API object. Generally represent identity. Queryable by selectors • think SQL 'select ... where ...'. The only grouping mechanism. Used to determine which objects to apply an operation to: • pods under a ReplicationController • pods in a Service • capabilities of a node (scheduling constraints). Example label sets: {App: Nifty, Phase: Dev, Role: FE}, {App: Nifty, Phase: Dev, Role: BE}, {App: Nifty, Phase: Test, Role: FE}, {App: Nifty, Phase: Test, Role: BE}.
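    As a sketch, attaching one of the label sets above to a pod; the pod name and image are placeholders:

      apiVersion: v1
      kind: Pod
      metadata:
        name: nifty-fe-dev            # placeholder name
        labels:
          App: Nifty
          Phase: Dev
          Role: FE
      spec:
        containers:
        - name: fe
          image: example/nifty-fe:1.0   # hypothetical image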
  14. Selectors. The four example pods: {App: Nifty, Phase: Dev, Role: FE}, {App: Nifty, Phase: Test, Role: FE}, {App: Nifty, Phase: Dev, Role: BE}, {App: Nifty, Phase: Test, Role: BE}.
  15. Selectors. App == Nifty matches all four pods.
  16. Selectors. App == Nifty, Role == FE matches {App: Nifty, Phase: Dev, Role: FE} and {App: Nifty, Phase: Test, Role: FE}.
  17. Selectors. App == Nifty, Role == BE matches {App: Nifty, Phase: Dev, Role: BE} and {App: Nifty, Phase: Test, Role: BE}.
  18. Selectors. App == Nifty, Phase == Dev matches {App: Nifty, Phase: Dev, Role: FE} and {App: Nifty, Phase: Dev, Role: BE}.
  19. Selectors. App == Nifty, Phase == Test matches {App: Nifty, Phase: Test, Role: FE} and {App: Nifty, Phase: Test, Role: BE}.
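    The same queries can be expressed as kubectl label selectors; a sketch, assuming the capitalized label keys used on the slides:

      # All pods of the Nifty app
      kubectl get pods -l App=Nifty

      # Only the front ends, only the back ends
      kubectl get pods -l App=Nifty,Role=FE
      kubectl get pods -l App=Nifty,Role=BE

      # Only the Dev or Test phase
      kubectl get pods -l App=Nifty,Phase=Dev
      kubectl get pods -l App=Nifty,Phase=Test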
  20. Pod lifecycle. Once scheduled to a node, pods do not move • restart policy means restart in place. Pods can be observed as pending, running, succeeded, or failed • failed is really the end - no more restarts • no complex state machine logic. Pods are not rescheduled by the scheduler or apiserver • even if a node dies • controllers are responsible for this • keeps the scheduler simple. Apps should consider these rules • Services hide this • makes pod-to-pod communication more formal.
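    The in-place restart behaviour is set per pod via restartPolicy; a minimal sketch, with the pod name, image and command as placeholders:

      apiVersion: v1
      kind: Pod
      metadata:
        name: example-pod
      spec:
        restartPolicy: Always          # or OnFailure / Never; restarts happen in place on the same node
        containers:
        - name: app
          image: ubuntu:14.04
          command: ["bash", "-c", "sleep 3600"]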
  21. Replication Controllers. A type of controller (control loop). Ensure N copies of a pod are always running: • if too few, start new ones • if too many, kill some • group == selector. Cleanly layered on top of the core • all access is via public APIs. Replicated pods are fungible • no implied ordinality or identity. Other kinds of controllers are coming • e.g. a job controller for batch. Example: Replication Controller with Name = "nifty-rc", Selector = {"App": "Nifty"}, PodTemplate = { ... }, NumReplicas = 4. The control loop with the apiserver: "How many?" 3. "Start 1 more." OK. "How many?" 4.
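    Written out as a manifest, the example controller from the slide might look like the sketch below; only the name, selector and replica count come from the slide, and the pod template contents are placeholders:

      apiVersion: v1
      kind: ReplicationController
      metadata:
        name: nifty-rc
      spec:
        replicas: 4
        selector:
          App: Nifty
        template:
          metadata:
            labels:
              App: Nifty
          spec:
            containers:
            - name: nifty
              image: example/nifty:1.2   # hypothetical image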
  22. Services. Example: Service with Name = "nifty-svc", Selector = {"App": "Nifty"}, Port = 9376, ContainerPort = 8080. A portal IP is assigned (e.g. 10.0.0.1:9376). kube-proxy watches the apiserver; iptables DNAT redirects client TCP/UDP traffic for the portal IP to kube-proxy, which forwards it to one of the backing pods (10.240.1.1:8080, 10.240.2.2:8080, 10.240.3.3:8080).
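    A sketch of the same service as a manifest; only the name, selector and the two port numbers come from the slide:

      apiVersion: v1
      kind: Service
      metadata:
        name: nifty-svc
      spec:
        selector:
          App: Nifty
        ports:
        - port: 9376          # the port exposed on the portal IP
          targetPort: 8080    # the container port on the backing pods
          protocol: TCP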
  23. A counter pod.
    apiVersion: v1
    kind: Pod
    metadata:
      name: counter
      namespace: demo
    spec:
      containers:
      - name: count
        image: ubuntu:14.04
        args: [bash, -c, 'for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 1; done']
  24. A counter pod.
    $ kubectl create -f counter-pod.yaml --namespace=demo
    pods/counter
    $ kubectl get pods
    NAME                                          READY  REASON   RESTARTS  AGE
    fluentd-cloud-logging-kubernetes-minion-1xe3  1/1    Running  0         5m
    fluentd-cloud-logging-kubernetes-minion-p6cu  1/1    Running  0         5m
    fluentd-cloud-logging-kubernetes-minion-s2dl  1/1    Running  0         5m
    fluentd-cloud-logging-kubernetes-minion-ypau  1/1    Running  0         5m
    kube-dns-v3-55k7n                             3/3    Running  0         6m
    monitoring-heapster-v1-55ix9                  0/1    Running  12        6m
  25. Observing the output of the counter.
    $ kubectl logs counter --namespace=demo
    0: Tue Jun 2 21:37:31 UTC 2015
    1: Tue Jun 2 21:37:32 UTC 2015
    2: Tue Jun 2 21:37:33 UTC 2015
    3: Tue Jun 2 21:37:34 UTC 2015
    4: Tue Jun 2 21:37:35 UTC 2015
    5: Tue Jun 2 21:37:36 UTC 2015
    ...
  26. ssh onto the node and run "docker ps".
    # docker ps
    CONTAINER ID  IMAGE                                     COMMAND                 CREATED             STATUS             NAMES
    532247036a78  ubuntu:14.04                              "bash -c 'i=0; whi      About a minute ago  Up About a minute  k8s_count.dca54bea_counter_demo_479b8894-0971-11e5-a784-42010af00df1_f6159d40
    8cd07658287d  gcr.io/google_containers/pause:0.8.0      "/pause"                About a minute ago  Up About a minute  k8s_POD.e4cc795_counter_demo_479b8894-0971-11e5-a784-42010af00df1_7de2fec0
    b2dc87db6608  gcr.io/google_containers/fluentd-gcp:1.6  "/bin/sh -c '/usr/      16 minutes ago      Up 16 minutes      k8s_fluentd-cloud-logging.463ca0af_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_3e908886
    c5d8641d884d  gcr.io/google_containers/pause:0.8.0      "/pause"                16 minutes ago      Up 16 minutes      k8s_POD.e4cc795_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_2b980b91
  27. Example: Elasticsearch + Kibana Music DB & UI.
    apiVersion: v1
    kind: ReplicationController
    metadata:
      labels:
        app: music-db
      name: music-db
    spec:
      replicas: 4
      selector:
        app: music-db
      template:
        metadata:
          labels:
            app: music-db
        spec:
          containers:
          - name: es
            image: kubernetes/elasticsearch:1.0
            env:
            - name: "CLUSTER_NAME"
              value: "mytunes-db"
            - name: "SELECTOR"
              value: "name=music-db"
            - name: "NAMESPACE"
              value: "mytunes"
            ports:
            - name: es
              containerPort: 9200
            - name: es-transport
              containerPort: 9300
  28. Music DB Replication Controller.
    apiVersion: v1
    kind: ReplicationController
    metadata:
      labels:
        app: music-db
      name: music-db
    spec:
      replicas: 4
      selector:
        app: music-db
      template:
        metadata:
          labels:
            app: music-db
        spec:
          containers: ...
  29. Music DB container.
    containers:
    - name: es
      image: kubernetes/elasticsearch:1.0
      env:
      - name: "CLUSTER_NAME"
        value: "mytunes-db"
      - name: "SELECTOR"
        value: "name=music-db"
      - name: "NAMESPACE"
        value: "mytunes"
      ports:
      - name: es
        containerPort: 9200
      - name: es-transport
        containerPort: 9300
  30. Music DB Service.
    apiVersion: v1
    kind: Service
    metadata:
      name: music-db
      labels:
        app: music-db
    spec:
      selector:
        app: music-db
      ports:
      - name: db
        port: 9200
        targetPort: es
  31. Music UI Pod.
    apiVersion: v1
    kind: Pod
    metadata:
      name: music-ui
      labels:
        app: music-ui
    spec:
      containers:
      - name: kibana
        image: kubernetes/kibana:1.0
        env:
        - name: "ELASTICSEARCH_URL"
          value: "http://music-db:9200"
        ports:
        - name: kibana
          containerPort: 5601
  32. Music UI Service.
    apiVersion: v1
    kind: Service
    metadata:
      name: music-ui
      labels:
        app: music-ui
    spec:
      type: LoadBalancer
      selector:
        app: music-ui
      ports:
      - name: kibana
        port: 5601
        targetPort: kibana
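    To tie the example together, a sketch of deploying the music store and finding the Kibana endpoint; the file names are placeholders, and the namespace follows the NAMESPACE value set earlier:

      $ kubectl create -f music-db-rc.yaml --namespace=mytunes
      $ kubectl create -f music-db-service.yaml --namespace=mytunes
      $ kubectl create -f music-ui-pod.yaml --namespace=mytunes
      $ kubectl create -f music-ui-service.yaml --namespace=mytunes
      # The LoadBalancer service exposes Kibana externally; look up its address with:
      $ kubectl get services --namespace=mytunes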
  33. Monitoring. An optional add-on to Kubernetes clusters. Run cAdvisor as a pod on each node • gather stats from all containers • export via REST. Run Heapster as a pod in the cluster • just another pod, no special access • aggregate stats. Run InfluxDB and Grafana in the cluster • more pods • alternately: store in Google Cloud Monitoring.
  34. Logging. An optional add-on to Kubernetes clusters. Run fluentd as a pod on each node • gather logs from all containers • export to Elasticsearch. Run Elasticsearch as a pod in the cluster • just another pod, no special access • aggregate logs. Run Kibana in the cluster • yet another pod • alternately: store in Google Cloud Logging.
  35. Example: Rolling Upgrade with Labels. Two replication controllers select backend pods by version label: the v1.2 controller starts with replicas: 4 and the v1.3 controller with replicas: 1. The v1.2 count is stepped down (4, 3, 2, 1, 0) while the v1.3 count is stepped up (1, 2, 3, 4), replacing the backend v1.2 pods with backend v1.3 pods a few at a time.
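    kubectl can drive this pattern for you with rolling-update; a sketch, where the controller names, image and file name are placeholders rather than values from the talk:

      # Replace the pods managed by backend-v1-2 with pods from a new controller,
      # shifting replica counts between the two controllers a pod at a time.
      $ kubectl rolling-update backend-v1-2 backend-v1-3 --image=example/backend:v1.3

      # Or supply the new ReplicationController manifest explicitly:
      $ kubectl rolling-update backend-v1-2 -f backend-v1-3-rc.yaml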
  36. ISA