Slide 1

Google confidential │ Do not distribute

Kubernetes & cAdvisor
Docker meetup - Bangalore
Vishnu Kannan ([email protected])
Software Engineer, Google Inc.
GitHub, IRC: vishh

Slide 2

Google has been developing and using containers to manage our applications for over 10 years.
Images by Connie Zhou

Slide 3

Traditional computing
• Server per component
• Configuration & management
• Plan ahead
• Op Ex
• Scalability limits
• Utilization
(Diagram: one app with its own libs per kernel, repeated across many servers)

Slide 4

Cluster Computing
Solution: Use Computers!
• Automation
• Scalability
• Think about resources
• Diverse workloads
• Ease of management
Omega, Mesos, Kubernetes, etc.
(Diagram: the same app/libs/kernel stacks, now managed as one cluster)

Slide 5

Typical parts of a Cluster Management System
• Scheduler
• Node manager
• Binary deployment service
• Application discovery
• Application config management
• Node and application monitoring

Slide 6

One application per machine? Can we do better?
1. Place multiple applications on one machine
2. Partition the physical machine - VMs
3. Partition the resources on a physical machine - cgroups, namespaces (isolation)
Smarter node management: capacity vs. usage

Slide 7

Old Way: Shared Machines
• No isolation
• No namespacing
• Common libs
• Highly coupled apps and OS
(Diagram: several apps sharing one kernel and one set of libs)

Slide 8

Old Way: Virtual Machines
• Some isolation
• Expensive and inefficient
• Still highly coupled to the OS
• Hard to manage
(Diagram: apps split across VMs, each with its own libs and kernel)

Slide 9

New Way: Containers
Think of lightweight VMs
• Isolate CPU, RAM, disk, users, network, etc.
• Powered by Linux APIs:
  ● cgroups
  ● namespaces
  ● capabilities
  ● chroots
• Better resource utilization
(Diagram: apps with their own libs, all sharing one kernel)

Slide 10

cAdvisor
Understand resource usage and performance of applications
• Google OSS project
• Written in Go; tiny resource footprint
• Supports Docker containers natively; LXC and raw cgroups also supported
• Understands CPU, memory, filesystem and network utilization
• Easy-to-use REST API
• Runs in a Docker container
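The REST API returns cumulative counters, so consumers derive rates themselves. A minimal sketch in Python of that derivation, assuming two samples shaped loosely like cAdvisor's CPU stats (the timestamps, field layout, and values here are illustrative, not a live API response):

```python
from datetime import datetime

def cpu_usage_cores(prev, cur):
    """Average CPU usage (in cores) between two cumulative samples.

    cAdvisor reports total CPU time as a cumulative counter in
    nanoseconds, so the rate is the counter delta over the
    wall-clock interval."""
    secs = (datetime.fromisoformat(cur["timestamp"]) -
            datetime.fromisoformat(prev["timestamp"])).total_seconds()
    nanos = cur["cpu"]["usage"]["total"] - prev["cpu"]["usage"]["total"]
    return nanos / (secs * 1e9)

# Two illustrative samples, 10 seconds apart.
prev = {"timestamp": "2014-12-06T10:00:00",
        "cpu": {"usage": {"total": 5_000_000_000}}}
cur = {"timestamp": "2014-12-06T10:00:10",
       "cpu": {"usage": {"total": 20_000_000_000}}}
print(cpu_usage_cores(prev, cur))  # 1.5
```

Here the container burned 15 CPU-seconds over a 10-second window, i.e. 1.5 cores on average.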

Slide 11

Heapster
Cluster container monitoring using cAdvisor
• Default monitoring solution in Kubernetes
• Filesystem-based API to support other cluster management systems
• CoreOS supported using the filesystem API
• Discovers and collects stats from the cAdvisors running on all the nodes
• Pushes data to InfluxDB or BigQuery
• Typical setup: Heapster + InfluxDB + Grafana

Slide 12

Kubernetes
Greek for “Helmsman”; also the root of the word “Governor”
• Container orchestrator
• Runs Docker containers
• Supports multiple cloud and bare-metal environments
• Inspired and informed by Google’s experiences
• Open source, written in Go
Manage applications, not machines

Slide 13

(image-only slide)

Slide 14

High Level Design
(Diagram: users drive the CLI, API, and UI, which talk to the apiserver on the master; the scheduler sits beside the apiserver, and a kubelet runs on each node)

Slide 15

Primary Concepts
Container: A sealed application package (Docker)
Pod: A small group of tightly coupled containers (example: content syncer & web server)
Controller: A loop that drives current state towards desired state (example: replication controller)
Service: A set of running pods that work together (example: load-balanced backends)
Labels: Identifying metadata attached to other objects (example: phase=canary vs. phase=prod)
Selector: A query against labels, producing a set result (example: all pods where label phase == prod)

Slide 16

Design Principles
• Declarative > imperative: state your desired results, let the system actuate
• Control loops: observe, rectify, repeat
• Simple > complex: try to do as little as possible
• Modularity: components, interfaces, & plugins
• Legacy compatible: requiring apps to change is a non-starter
• Network-centric: IP addresses are cheap
• No grouping: labels are the only groups
• Cattle > pets: manage your workload in bulk
• Open > closed: open source, standards, REST, JSON, etc.

Slide 17

Pets vs. Cattle

Slide 18

Control Loops
• Drive current state -> desired state
• Act independently
• Use APIs - no shortcuts or back doors
• Observed state is truth
• A recurring pattern in the system
• Example: ReplicationController
(Diagram: an observe -> diff -> act cycle)
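The observe-rectify-repeat pattern is simple enough to sketch directly. A toy version in Python, where `observe`, `desired`, and `act` are illustrative callables rather than Kubernetes APIs:

```python
import time

def reconcile(observe, desired, act):
    """One observe-diff-act pass: measure the gap between desired
    and observed state and ask `act` to close it. Returns the diff."""
    diff = desired() - observe()
    if diff:
        act(diff)
    return diff

def control_loop(observe, desired, act, interval=5.0):
    # The recurring pattern: observe, rectify, repeat -- forever.
    while True:
        reconcile(observe, desired, act)
        time.sleep(interval)

# Toy usage: drive a replica count from 1 to the desired 3.
state = {"replicas": 1}
diff = reconcile(lambda: state["replicas"],
                 lambda: 3,
                 lambda d: state.update(replicas=state["replicas"] + d))
print(state["replicas"])  # 3
```

Note that the loop re-observes every pass rather than trusting its last action: observed state is truth.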

Slide 19

Atomic Storage
• Backing store for all master state
• Hidden behind an abstract interface
• Stateless means scalable
• Watchable
  • this is a fundamental primitive
  • don’t poll, watch
• Using CoreOS etcd
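“Don’t poll, watch” can be illustrated with a toy in-process store in Python. The class and method names are invented for this sketch and are not the etcd API; the point is only that writers push changes to watchers instead of watchers re-reading:

```python
import queue

class WatchableStore:
    """Minimal sketch of a watchable key-value store, in the spirit
    of the etcd watch primitive."""

    def __init__(self):
        self.data = {}
        self.watchers = []

    def watch(self):
        # Each watcher gets its own queue of change events.
        q = queue.Queue()
        self.watchers.append(q)
        return q

    def set(self, key, value):
        self.data[key] = value
        for q in self.watchers:  # push the change to every watcher
            q.put((key, value))

store = WatchableStore()
events = store.watch()
store.set("/pods/nifty-1", {"phase": "running"})
print(events.get_nowait())  # ('/pods/nifty-1', {'phase': 'running'})
```

A control loop blocked on `events.get()` reacts the moment state changes, with no polling interval to tune.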

Slide 20

Pods
• Small group of containers & volumes
• Tightly coupled
• Scheduling atom
• Shared namespace
  • share IP address & localhost
• Ephemeral
  • can die and be replaced
Example: file puller & web server sharing a volume
(Diagram: a pod holding a file puller and a web server around a shared volume; a content manager feeds the puller, consumers talk to the web server)

Slide 21

Pod Networking
• Pod IPs are routable
  • Docker default is private IP
• Pods can reach each other without NAT
  • even across nodes
• Pods can egress traffic
  • if allowed by the cloud environment
• No brokering of port numbers
• A fundamental requirement
  • several SDN solutions exist

Slide 22

Volumes
• Pod scoped
• Share pod’s lifetime & fate
• Various types supported:
  • empty directory (default)
  • host file/directory
  • Git repository
  • GCE Persistent Disk
  • ...more to come, suggestions welcome
(Diagram: a pod whose containers mount an empty dir, a host directory, a Git repo clone from GitHub, and a GCE PD)

Slide 23

Pod Lifecycle
• Once scheduled to a node, pods do not move
  • restart policy means restart in-place
• Pods can be observed pending, running, succeeded, or failed
  • failed is really the end - no more restarts
  • no complex state machine logic
• Pods are not rescheduled by the scheduler or apiserver
  • even if a node dies
  • controllers are responsible for this
  • keeps the scheduler simple

Slide 24

Labels
• Arbitrary metadata
• Attached to any API object
• Generally represent identity
• Queryable by selectors
  • think SQL ‘select ... where ...’
• The only grouping mechanism
  • pods under a ReplicationController
  • pods in a Service
  • capabilities of a node (constraints)
Example: “phase: canary”
(Diagram: four pods, all App=Nifty, covering Phase in {Dev, Test} and Role in {FE, BE})
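Equality-based selection is easy to sketch. A minimal Python version over the label sets used in this deck; the `matches` helper is invented for illustration and is not the Kubernetes implementation:

```python
def matches(selector, labels):
    """Equality-based selector: every key in the selector must be
    present in the object's labels with the same value."""
    return all(labels.get(k) == v for k, v in selector.items())

pods = [
    {"App": "Nifty", "Phase": "Dev",  "Role": "FE"},
    {"App": "Nifty", "Phase": "Dev",  "Role": "BE"},
    {"App": "Nifty", "Phase": "Test", "Role": "FE"},
    {"App": "Nifty", "Phase": "Test", "Role": "BE"},
]

# "App == Nifty, Phase == Dev" selects the two Dev pods.
dev = [p for p in pods if matches({"App": "Nifty", "Phase": "Dev"}, p)]
print(len(dev))  # 2
```

The same query mechanism yields the FE/BE and Dev/Test slices shown on the next few slides.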

Slide 25

Selectors
(Diagram: four pods, all App=Nifty, covering Phase in {Dev, Test} and Role in {FE, BE})

Slide 26

Selectors
App == Nifty
(Diagram: matches all four Nifty pods)

Slide 27

Selectors
App == Nifty, Role == FE
(Diagram: matches the two FE pods, in Dev and Test)

Slide 28

Selectors
App == Nifty, Role == BE
(Diagram: matches the two BE pods, in Dev and Test)

Slide 29

Selectors
App == Nifty, Phase == Dev
(Diagram: matches the Dev FE and BE pods)

Slide 30

Selectors
App == Nifty, Phase == Test
(Diagram: matches the Test FE and BE pods)

Slide 31

Replication Controllers
• The canonical example of a control loop
• Runs out-of-process with respect to the API server
• Has one job: ensure N copies of a pod
  • if too few, start new ones
  • if too many, kill some
  • group == selector
• Cleanly layered on top of the core
  • all access is via public APIs
(Diagram: a ReplicationController with Name = “nifty-rc”, Selector = {“App”: “Nifty”}, a PodTemplate, and NumReplicas = 4 asks the API server “how many?”; on hearing “3” it starts one more, then observes 4)
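That one job, ensure N copies, can be sketched as a reconcile step in Python. The `FakeAPI` stands in for the public API surface the controller is layered on; all names here are invented for illustration:

```python
import uuid

class ReplicationController:
    """Sketch of the 'ensure N copies of a pod' loop."""

    def __init__(self, api, selector, num_replicas):
        self.api = api
        self.selector = selector
        self.num_replicas = num_replicas

    def reconcile(self):
        pods = self.api.list_pods(self.selector)
        diff = self.num_replicas - len(pods)
        if diff > 0:                 # too few: start new ones
            for _ in range(diff):
                self.api.create_pod(self.selector)
        elif diff < 0:               # too many: kill some
            for pod in pods[:-diff]:
                self.api.delete_pod(pod)

class FakeAPI:
    """Stand-in for the API server, just enough for the sketch."""
    def __init__(self): self.pods = []
    def list_pods(self, selector): return list(self.pods)
    def create_pod(self, labels): self.pods.append(uuid.uuid4().hex[:5])
    def delete_pod(self, pod): self.pods.remove(pod)

api = FakeAPI()
api.pods = ["f0118", "d9376", "b0111"]   # 3 running, 4 desired
ReplicationController(api, {"App": "Nifty"}, 4).reconcile()
print(len(api.pods))  # 4
```

This mirrors the backup-slide animation: a lost pod drops Current to 3, and the next reconcile pass starts a replacement.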

Slide 32

Services
• A group of pods that act as one
  • group == selector
• Defines an access policy
  • only “load balanced” for now
• Gets a stable virtual IP and port
  • called the service portal
  • soon to have DNS
• The VIP is captured by kube-proxy
  • watches the service constituency
  • updates when backends change
• Hides complexity - ideal for non-native apps
(Diagram: clients connect to the portal VIP, which spreads traffic across the pods)
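A sketch in Python of the proxy's role: hold the current backend set, refresh it when the watch fires, and spread connections across it. Class and method names are invented for illustration; the real kube-proxy captures the VIP at the packet level (iptables DNAT) rather than picking backends in application code:

```python
import itertools

class ServiceProxy:
    """Toy stand-in for kube-proxy's per-service load balancing."""

    def __init__(self, backends):
        self.update(backends)

    def update(self, backends):
        # Called when the watch on the service constituency fires.
        self.backends = list(backends)
        self._rr = itertools.cycle(self.backends)

    def pick(self):
        # Round-robin over whatever backends currently exist.
        return next(self._rr)

proxy = ServiceProxy(["10.240.1.1:8080", "10.240.2.2:8080"])
print([proxy.pick() for _ in range(3)])  # cycles through both backends

# A backend died and a replacement appeared; the watch updates us,
# and clients keep using the same portal address throughout.
proxy.update(["10.240.2.2:8080", "10.240.3.3:8080"])
```

The stable portal address is the point: backends churn, but clients never renegotiate.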

Slide 33

Cluster Services
Logging, monitoring, DNS, etc.
• All run as pods in the cluster - no special treatment, no back doors
• Open-source solutions for everything
  • cAdvisor + InfluxDB + Heapster == cluster monitoring
  • fluentd + Elasticsearch + Kibana == cluster logging
  • SkyDNS + kube2sky == cluster DNS
• Can be easily replaced by custom solutions
  • modular clusters to fit your needs

Slide 34

Status & Plans
• Open sourced in June 2014
• Google just launched Google Container Engine (GKE)
  • hosted Kubernetes
  • https://cloud.google.com/container-engine/
• Roadmap: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/roadmap.md
• Driving towards a 1.0 release in O(months)

Slide 35

The Goal: Shake Things Up
• Containers are a new way of working
• They require new concepts and new tools
• Google has a lot of experience...
• ...but we are listening to the users
• Workload portability is important!

Slide 36

cAdvisor & Kubernetes are Open Source
We want your help!
• http://kubernetes.io
• https://github.com/google/cadvisor
• https://github.com/GoogleCloudPlatform/heapster
• irc.freenode.net #google-containers

Slide 37

Backup slides

Slide 38

Why containers?
• Performance
• Repeatability
• Isolation
• Quality of service
• Accounting
• Visibility
• Portability
A fundamentally different way of managing applications
Images by Connie Zhou

Slide 39

cAdvisor Internals
● Collect
● Measure
● Analyze
● Export
(Diagram: cAdvisor reads container data from the kernel through Docker, LXC, and lmctfy, and serves the results to users)

Slide 40

Docker
• Dramatically simplifies node management
• Easy to use
• Build, test and deploy - anywhere
• Provides resource isolation and security
• A big ecosystem exists around Docker
• WIP: better resource isolation, hardening, performance, etc.

Slide 41

cAdvisor roadmap
• Better signals and more resources
  • Memory
  • Disk I/O
  • Network
• More suggestions
  • Insufficient resources
  • Performance effects
• Start applying suggestions

Slide 42

Heapster roadmap
• Auto scaling
  • Nodes
  • Containers
• Recognize antagonists
  • Bad interactions between containers
  • Current work: CPI2
• React to signals
  • Migrate containers (CRIU)

Slide 43

Docker Networking
(Diagram: three hosts on subnets 10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24, with containers at 10.1.1.93, 10.1.1.113, 10.1.2.118, and 10.1.3.129)

Slide 44

Docker Networking
(Diagram: the same hosts and containers, with NAT on every cross-host container connection)

Slide 45

Pod Networking
(Diagram: the same hosts and addresses, but pod IPs are routable and traffic flows with no NAT)

Slide 46

Replication Controllers
(Diagram: pods f0118, d9376, b0111, and a1209 spread across four nodes; Desired = 4, Current = 4)

Slide 47

Replication Controllers
(Diagram: one pod is lost; Desired = 4, Current = 3)

Slide 48

Replication Controllers
(Diagram: replacement pod c9bad is started; Desired = 4, Current = 4)

Slide 49

Replication Controllers
(Diagram: an extra pod appears; Desired = 4, Current = 5)

Slide 50

Replication Controllers
(Diagram: the surplus pod is killed; Desired = 4, Current = 4)

Slide 51

Services
(Diagram: a client connects to the portal at 10.0.0.1:9376; kube-proxy, watching the apiserver, uses iptables DNAT to forward the TCP/UDP traffic to backends 10.240.1.1:8080, 10.240.2.2:8080, and 10.240.3.3:8080. The Service has Name = “nifty-svc”, Selector = {“App”: “Nifty”}, Port = 9376, ContainerPort = 8080)