Cluster Management with Kubernetes

On Friday 5 June 2015 I gave a talk called Cluster Management with Kubernetes to a general audience at the University of Edinburgh. The talk includes an example of a music store system, with a Kibana front-end UI and an Elasticsearch-based back end, that helps make concepts like pods, replication controllers and services concrete.

Satnam Singh

June 05, 2015


  1. Cluster Management with Kubernetes

    Please open the gears tab below for the speaker notes
    Satnam Singh, satnam@google.com
    Work of the Google Kubernetes team and many open source contributors
    University of Edinburgh, 5 June 2015
  2. Cloud software deployment is soul destroying

    Typically a cloud cluster node is a VM running a specific version of Linux.
    User applications comprise components, each of which may have different and conflicting requirements from libraries, runtimes and kernel features.
    Applications are coupled to the version of the host operating system: bad.
    Evolution of the application components is coupled to (and in tension with) the evolution of the host operating system: bad.
    Also need to deal with node failures, spinning up and turning down replicas to deal with varying load, updating components with disruption …
    You thought you were a programmer but you are now a sys-admin.
  3. What is Docker?

    An implementation of the container idea
    A package format
    Resource isolation
    An ecosystem
  4. Resource isolation

    Implemented by a number of Linux APIs:
    • cgroups: Restrict resources a process can consume
      • CPU, memory, disk IO, ...
    • namespaces: Change a process’s view of the system
      • Network interfaces, PIDs, users, mounts, ...
    • capabilities: Limits what a user can do
      • mount, kill, chown, ...
    • chroots: Determines what parts of the filesystem a user can see
  5. We need more than just packing and isolation

    Scheduling: Where should my containers run?
    Lifecycle and health: Keep my containers running despite failures
    Discovery: Where are my containers now?
    Monitoring: What’s happening with my containers?
    Auth{n,z}: Control who can do things to my containers
    Aggregates: Compose sets of containers into jobs
    Scaling: Making jobs bigger or smaller
    ...
  6. Everything at Google runs in containers

    Google confidential │ Do not distribute
    • Gmail, Web Search, Maps, ...
    • MapReduce, MillWheel, Pregel, ...
    • Colossus, BigTable, Spanner, ...
    • Even Google’s Cloud Computing product GCE itself: VMs run in containers
  7. Open Source Containers: Kubernetes

    Greek for “Helmsman”; also the root of the word “Governor” and “cybernetic”
    • Container orchestrator
    • Builds on Docker containers
      • also supporting other container technologies
    • Multiple cloud and bare-metal environments
    • Supports existing OSS apps
      • cannot require apps becoming cloud-native
    • Inspired and informed by Google’s experiences and internal systems
    • 100% Open source, written in Go
    Let users manage applications, not machines
  8. Primary concepts

    Container: A sealed application package (Docker)
    Pod: A small group of tightly coupled Containers
    Labels: Identifying metadata attached to objects
    Selector: A query against labels, producing a set result
    Controller: A reconciliation loop that drives current state towards desired state
    Service: A set of pods that work together
  9. Modularity

    Loose coupling is a goal everywhere
      • simpler
      • composable
      • extensible
    Code-level plugins where possible
    Multi-process where possible
    Isolate risk by interchangeable parts
    Example: ReplicationController
    Example: Scheduler
  10. Control loops

    Drive current state -> desired state
    Act independently
    APIs - no shortcuts or back doors
    Observed state is truth
    Recurring pattern in the system
    Example: ReplicationController
    Loop: observe -> diff -> act
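The observe, diff, act cycle on this slide can be sketched in a few lines of Python. This is only an illustration of the pattern, not Kubernetes code: the names `control_loop`, `observe`, `act` and the in-memory `cluster` dictionary are all hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, int]


def diff(desired: State, observed: State) -> State:
    """Return the non-zero gap (desired minus observed) for every key."""
    keys = set(desired) | set(observed)
    gaps = {k: desired.get(k, 0) - observed.get(k, 0) for k in keys}
    return {k: gap for k, gap in gaps.items() if gap != 0}


def control_loop(desired: State,
                 observe: Callable[[], State],
                 act: Callable[[State], None],
                 max_rounds: int = 100) -> int:
    """Repeat observe -> diff -> act until the states agree.

    Returns the number of rounds in which an action was taken.
    """
    rounds = 0
    for _ in range(max_rounds):
        observed = observe()            # observed state is truth
        gaps = diff(desired, observed)
        if not gaps:
            break                       # current state == desired state
        act(gaps)
        rounds += 1
    return rounds


# Hypothetical in-memory "cluster": replica counts per application.
cluster: State = {"web": 1, "db": 3}


def observe() -> State:
    return dict(cluster)


def act(gaps: State) -> None:
    # Move each count by one step per round, as a slow real system might.
    for name, gap in gaps.items():
        cluster[name] = cluster.get(name, 0) + (1 if gap > 0 else -1)


rounds = control_loop({"web": 3, "db": 2}, observe, act)
print(cluster, rounds)
```

The loop never assumes that an earlier action succeeded; it re-observes on every round, which is what lets independent controllers share the same API without shortcuts.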
  11. Atomic storage

    Backing store for all master state
    Hidden behind an abstract interface
    Stateless means scalable
    Watchable
      • this is a fundamental primitive
      • don’t poll, watch
    Using CoreOS etcd
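To make "don't poll, watch" concrete, here is a toy key-value store that pushes change events to registered watchers. It is a sketch only and does not reproduce the etcd API; `WatchableStore`, its methods and the key paths are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (kind, key, value); value is None for deletions.
Event = Tuple[str, str, Optional[str]]


class WatchableStore:
    """Toy key-value store that notifies watchers instead of being polled."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: List[Tuple[str, Callable[[Event], None]]] = []

    def watch(self, prefix: str, callback: Callable[[Event], None]) -> None:
        """Call `callback` for every later change to a key under `prefix`."""
        self._watchers.append((prefix, callback))

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._notify(("PUT", key, value))

    def delete(self, key: str) -> None:
        if key in self._data:
            del self._data[key]
            self._notify(("DELETE", key, None))

    def _notify(self, event: Event) -> None:
        for prefix, callback in self._watchers:
            if event[1].startswith(prefix):
                callback(event)


store = WatchableStore()
seen: List[Event] = []
store.watch("/pods/", seen.append)      # only interested in pods

store.put("/pods/counter", "Pending")
store.put("/nodes/node-1", "Ready")     # ignored by the pod watcher
store.put("/pods/counter", "Running")
store.delete("/pods/counter")
```

The watcher receives exactly the three pod events and does no work while nothing changes, which is the advantage over a polling loop.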
  12. Persistent Volumes

    A higher-level abstraction - insulation from any one cloud environment
    Admin provisions them, users claim them
    Independent lifetime and fate
    Can be handed off between pods and lives until the user is done with it
    Dynamically “scheduled” and managed, like nodes and pods
    Diagram labels: User owned; Admin owned; Pod, ClaimRef, PVClaim, PersistentVolume; backing stores: GCE PD, AWS EBS, NFS, iSCSI
  13. Labels

    Arbitrary metadata
    Attached to any API object
    Generally represent identity
    Queryable by selectors
      • think SQL ‘select ... where ...’
    The only grouping mechanism
    Use to determine which objects to apply an operation to
      • pods under a ReplicationController
      • pods in a Service
      • capabilities of a node (scheduling constraints)
    Example pods:
      • App: Nifty, Phase: Dev, Role: FE
      • App: Nifty, Phase: Dev, Role: BE
      • App: Nifty, Phase: Test, Role: FE
      • App: Nifty, Phase: Test, Role: BE
  14. Selectors

    Four pods, labelled:
      • App: Nifty, Phase: Dev, Role: FE
      • App: Nifty, Phase: Test, Role: FE
      • App: Nifty, Phase: Dev, Role: BE
      • App: Nifty, Phase: Test, Role: BE
  15. Selectors: App == Nifty

    Same four pods as slide 14; the selector matches all four.
  16. Selectors: App == Nifty, Role == FE

    Same four pods as slide 14; the selector matches the two FE pods.
  17. Selectors: App == Nifty, Role == BE

    Same four pods as slide 14; the selector matches the two BE pods.
  18. Selectors: App == Nifty, Phase == Dev

    Same four pods as slide 14; the selector matches the two Dev pods.
  19. Selectors: App == Nifty, Phase == Test

    Same four pods as slide 14; the selector matches the two Test pods.
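The selector slides can be reproduced with a small Python sketch. It assumes equality-based selectors only (every key in the selector must be present on the object with the same value); the function names `matches` and `select` are hypothetical and not part of any Kubernetes client.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """True when every key/value pair of the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector: Labels, objects: List[Labels]) -> List[Labels]:
    """Return the objects whose labels satisfy the selector."""
    return [labels for labels in objects if matches(selector, labels)]


# The four pods from the slides.
pods: List[Labels] = [
    {"App": "Nifty", "Phase": "Dev", "Role": "FE"},
    {"App": "Nifty", "Phase": "Test", "Role": "FE"},
    {"App": "Nifty", "Phase": "Dev", "Role": "BE"},
    {"App": "Nifty", "Phase": "Test", "Role": "BE"},
]

all_nifty = select({"App": "Nifty"}, pods)
front_ends = select({"App": "Nifty", "Role": "FE"}, pods)
dev_pods = select({"App": "Nifty", "Phase": "Dev"}, pods)

print(len(all_nifty), len(front_ends), len(dev_pods))
```

Because a selector is just a query, the same pod can belong to several groups at once, for example the front-end set and the Dev set.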
  20. Pod lifecycle

    Once scheduled to a node, pods do not move
      • restart policy means restart in-place
    Pods can be observed pending, running, succeeded, or failed
      • failed is really the end - no more restarts
      • no complex state machine logic
    Pods are not rescheduled by the scheduler or apiserver
      • even if a node dies
      • controllers are responsible for this
      • keeps the scheduler simple
    Apps should consider these rules
      • Services hide this
      • Makes pod-to-pod communication more formal
  21. Replication Controllers

    A type of controller (control loop)
    Ensure N copies of a pod always running
      • if too few, start new ones
      • if too many, kill some
      • group == selector
    Cleanly layered on top of the core
      • all access is by public APIs
    Replicated pods are fungible
      • No implied ordinality or identity
    Other kinds of controllers coming
      • e.g. job controller for batch
    Diagram: Replication Controller (Name = “nifty-rc”, Selector = {“App”: “Nifty”}, PodTemplate = { ... }, NumReplicas = 4) talking to the API Server: How many? 3. Start 1 more. OK. How many? 4.
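The exchange in the diagram (count 3, start 1 more, count 4) can be sketched as follows. This is a simplified model under stated assumptions: pods are plain dictionaries, the pod list stands in for the API server, and the class and pod names are hypothetical.

```python
import itertools
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    return all(labels.get(k) == v for k, v in selector.items())


class ReplicationController:
    """Keeps `replicas` pods matching `selector` alive (toy model)."""

    def __init__(self, name: str, selector: Labels,
                 template_labels: Labels, replicas: int) -> None:
        self.name = name
        self.selector = selector
        self.template_labels = template_labels
        self.replicas = replicas
        self._ids = itertools.count(1)

    def reconcile(self, pods: List[dict]) -> List[str]:
        """Start or kill pods in `pods` until the count is right.

        Returns a log of the actions taken.
        """
        mine = [p for p in pods if matches(self.selector, p["labels"])]
        actions: List[str] = []
        while len(mine) < self.replicas:        # too few: start new ones
            pod = {"name": f"{self.name}-{next(self._ids)}",
                   "labels": dict(self.template_labels)}
            pods.append(pod)
            mine.append(pod)
            actions.append(f"start {pod['name']}")
        while len(mine) > self.replicas:        # too many: kill some
            pod = mine.pop()                    # pods are fungible
            pods.remove(pod)
            actions.append(f"kill {pod['name']}")
        return actions


pods = [
    {"name": "nifty-a", "labels": {"App": "Nifty"}},
    {"name": "nifty-b", "labels": {"App": "Nifty"}},
    {"name": "nifty-c", "labels": {"App": "Nifty"}},
    {"name": "other-x", "labels": {"App": "Other"}},
]
rc = ReplicationController("nifty-rc", {"App": "Nifty"}, {"App": "Nifty"}, 4)

first = rc.reconcile(pods)      # 3 matching pods observed, 1 started
second = rc.reconcile(pods)     # 4 matching pods observed, nothing to do
rc.replicas = 2
third = rc.reconcile(pods)      # scale down by killing two pods
```

The controller only looks at labels, so a pod created by hand with matching labels would be counted (and possibly killed) like any other.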
  22. Services

    Service: Name = “nifty-svc”, Selector = {“App”: “Nifty”}, Port = 9376, ContainerPort = 8080
    Portal IP is assigned
    Diagram labels: Client; TCP / UDP to port :9376; iptables DNAT; kube-proxy; apiserver (watch); TCP / UDP to three pods on port :8080
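The mapping from the service port (9376) to the container port (8080) on the selected pods can be sketched like this. The slide does not say how kube-proxy chooses a backend, so the round-robin policy here is an assumption, as are all IP addresses and the names `endpoints` and `ServiceProxy`.

```python
import itertools
from typing import Dict, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


def endpoints(selector: Dict[str, str], pods: List[dict],
              container_port: int) -> List[Endpoint]:
    """Backends of a service: every pod whose labels match the selector."""
    return [(pod["ip"], container_port) for pod in pods
            if all(pod["labels"].get(k) == v for k, v in selector.items())]


class ServiceProxy:
    """Toy stand-in for kube-proxy: forwards portal traffic to backends."""

    def __init__(self, portal: Endpoint, backends: List[Endpoint]) -> None:
        self.portal = portal
        # Round robin is an assumed policy, chosen for simplicity.
        self._backends = itertools.cycle(backends) if backends else None

    def route(self, destination: Endpoint) -> Optional[Endpoint]:
        """Pick a backend for traffic sent to the portal IP and port."""
        if destination != self.portal or self._backends is None:
            return None
        return next(self._backends)


# Hypothetical pod IPs; only the ports come from the slide.
pods = [
    {"ip": "10.244.1.5", "labels": {"App": "Nifty"}},
    {"ip": "10.244.2.7", "labels": {"App": "Nifty"}},
    {"ip": "10.244.3.9", "labels": {"App": "Other"}},
    {"ip": "10.244.3.4", "labels": {"App": "Nifty"}},
]
portal = ("10.0.0.10", 9376)
backends = endpoints({"App": "Nifty"}, pods, 8080)
proxy = ServiceProxy(portal, backends)

routed = [proxy.route(portal) for _ in range(4)]
```

Clients only ever see the portal address, so pods can come and go behind the service without the client noticing.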
  23. A counter pod

    apiVersion: v1
    kind: Pod
    metadata:
      name: counter
      namespace: demo
    spec:
      containers:
      - name: count
        image: ubuntu:14.04
        args: [bash, -c, 'for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 1; done']
  24. A counter pod

    $ kubectl create -f counter-pod.yaml --namespace=demo
    pods/counter
    $ kubectl get pods
    NAME                                           READY   REASON    RESTARTS   AGE
    fluentd-cloud-logging-kubernetes-minion-1xe3   1/1     Running   0          5m
    fluentd-cloud-logging-kubernetes-minion-p6cu   1/1     Running   0          5m
    fluentd-cloud-logging-kubernetes-minion-s2dl   1/1     Running   0          5m
    fluentd-cloud-logging-kubernetes-minion-ypau   1/1     Running   0          5m
    kube-dns-v3-55k7n                              3/3     Running   0          6m
    monitoring-heapster-v1-55ix9                   0/1     Running   12         6m
  25. Observing the output of the counter

    $ kubectl logs counter --namespace=demo
    0: Tue Jun 2 21:37:31 UTC 2015
    1: Tue Jun 2 21:37:32 UTC 2015
    2: Tue Jun 2 21:37:33 UTC 2015
    3: Tue Jun 2 21:37:34 UTC 2015
    4: Tue Jun 2 21:37:35 UTC 2015
    5: Tue Jun 2 21:37:36 UTC 2015
    ...
  26. ssh onto node and “ps”

    # docker ps
    CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
    532247036a78  gcr image: ubuntu:14.04  "\"bash -c 'i=0; whi  About a minute ago  Up About a minute  k8s_count.dca54bea_counter_demo_479b8894-0971-11e5-a784-42010af00df1_f6159d40
    8cd07658287d  gcr.io/google_containers/pause:0.8.0  "/pause"  About a minute ago  Up About a minute  k8s_POD.e4cc795_counter_demo_479b8894-0971-11e5-a784-42010af00df1_7de2fec0
    b2dc87db6608  gcr.io/google_containers/fluentd-gcp:1.6  "\"/bin/sh -c '/usr/  16 minutes ago  Up 16 minutes  k8s_fluentd-cloud-logging.463ca0af_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_3e908886
    c5d8641d884d  gcr.io/google_containers/pause:0.8.0  "/pause"  16 minutes ago  Up 16 minutes  k8s_POD.e4cc795_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_2b980b91
  27. Example: Elasticsearch + Kibana

    Music DB & UI

    apiVersion: v1
    kind: ReplicationController
    metadata:
      labels:
        app: music-db
      name: music-db
    spec:
      replicas: 4
      selector:
        app: music-db
      template:
        metadata:
          labels:
            app: music-db
        spec:
          containers:
          - name: es
            image: kubernetes/elasticsearch:1.0
            env:
            - name: "CLUSTER_NAME"
              value: "mytunes-db"
            - name: "SELECTOR"
              value: "name=music-db"
            - name: "NAMESPACE"
              value: "mytunes"
            ports:
            - name: es
              containerPort: 9200
            - name: es-transport
              containerPort: 9300
  28. Music DB Replication Controller

    apiVersion: v1
    kind: ReplicationController
    metadata:
      labels:
        app: music-db
      name: music-db
    spec:
      replicas: 4
      selector:
        app: music-db
      template:
        metadata:
          labels:
            app: music-db
        spec:
          containers: ...
  29. Music DB container

    containers:
    - name: es
      image: kubernetes/elasticsearch:1.0
      env:
      - name: "CLUSTER_NAME"
        value: "mytunes-db"
      - name: "SELECTOR"
        value: "name=music-db"
      - name: "NAMESPACE"
        value: "mytunes"
      ports:
      - name: es
        containerPort: 9200
      - name: es-transport
        containerPort: 9300
  30. Music DB Service

    apiVersion: v1
    kind: Service
    metadata:
      name: music-db
      labels:
        app: music-db
    spec:
      selector:
        app: music-db
      ports:
      - name: db
        port: 9200
        targetPort: es
  31. Music UI Pod

    apiVersion: v1
    kind: Pod
    metadata:
      name: music-ui
      labels:
        app: music-ui
    spec:
      containers:
      - name: kibana
        image: kubernetes/kibana:1.0
        env:
        - name: "ELASTICSEARCH_URL"
          value: "http://music-db:9200"
        ports:
        - name: kibana
          containerPort: 5601
  32. Music UI Service

    apiVersion: v1
    kind: Service
    metadata:
      name: music-ui
      labels:
        app: music-ui
    spec:
      selector:
        app: music-ui
      ports:
      - name: kibana
        port: 5601
        targetPort: kibana
      type: LoadBalancer
  33. Monitoring

    Optional add-on to Kubernetes clusters
    Run cAdvisor as a pod on each node
      • gather stats from all containers
      • export via REST
    Run Heapster as a pod in the cluster
      • just another pod, no special access
      • aggregate stats
    Run Influx and Grafana in the cluster
      • more pods
      • alternately: store in Google Cloud Monitoring
  34. Logging

    Optional add-on to Kubernetes clusters
    Run fluentd as a pod on each node
      • gather logs from all containers
      • export to elasticsearch
    Run Elasticsearch as a pod in the cluster
      • just another pod, no special access
      • aggregate logs
    Run Kibana in the cluster
      • yet another pod
      • alternately: store in Google Cloud Logging
  35. Example: Rolling Upgrade with Labels

    Servers, with their labels: four labelled backend + v1.2 and four labelled backend + v1.3
    Replication Controller for v1.2, replicas: 4 -> 3 -> 2 -> 1 -> 0
    Replication Controller for v1.3, replicas: 1 -> 2 -> 3 -> 4
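The replica counts on this slide follow a simple schedule: grow the new controller by one, then shrink the old controller by one, until the old one reaches zero. A minimal sketch of that schedule is below; the exact interleaving of the two resizes is an assumption, and `rolling_upgrade` is a hypothetical helper rather than a kubectl feature.

```python
from typing import List, Tuple


def rolling_upgrade(total: int) -> List[Tuple[int, int]]:
    """Return (old_replicas, new_replicas) after each resize step.

    Assumed policy: add one new replica first, then remove one old
    replica, so capacity never drops below `total`.
    """
    old, new = total, 0
    steps = [(old, new)]
    while old > 0:
        new += 1                 # scale the v1.3 controller up by one
        steps.append((old, new))
        old -= 1                 # scale the v1.2 controller down by one
        steps.append((old, new))
    return steps


steps = rolling_upgrade(4)
for old, new in steps:
    print(f"v1.2 replicas: {old}  v1.3 replicas: {new}")
```

Because both sets of pods carry the backend label, a service selecting on that label alone keeps routing to whichever version is running throughout the upgrade.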
  36. ISA