
Kubernetes Meetup Bangalore 17-01-2015

Vaibhav Kohli
January 17, 2015


This slide deck includes the following talks:

1. Commentary on Google "Omega" paper & Areas to Contribute - Kumar Gaurav

2. Introduction to Kubernetes - Vaibhav Kohli

3. Kubernetes Architecture - Shruti Sharma

4. Kubernetes golang code structure - Nupur Agrawal



Transcript

  1. About the Google Omega paper
     • Published in EuroSys 2013
     • "Omega: flexible, scalable schedulers for large compute clusters"
     • Google maintains some of the biggest clusters in the world – the authors had plenty of real trace data to validate against
  2. 1. Introduction (types of cluster schedulers)
     • Monolithic: uses a single, centralized scheduler for all jobs
     • Two-level: a single active resource manager offers compute resources to multiple parallel, independent "scheduler frameworks"
     • Shared state: uses lock-free optimistic concurrency control
     General philosophy: run a mix of workloads on the same machines for efficiency
  3. 2. Requirements (of a cluster scheduler)
     • Goals:
       – High resource utilization
       – User-supplied placement constraints
       – Rapid decision making
       – "Fairness" and business priority
       – Robust and always available
     • Types of jobs to cater to:
       – Service: long-running, so spend time scheduling it for a perfect fit
       – Batch: performs a computation and then finishes, so point-in-time demand
  4. 3. Taxonomy
     • Design issues to be addressed:
       – Partitioning the scheduling work: load balancing, specialized schedulers
       – Choice of resources: a subset or all of the cluster's resources
       – Interference: internal competition (only in the shared-state scheduler)
       – Allocation granularity: if a job contains many tasks, the choice of all-or-nothing
       – Cluster-wide behavior: fairness, a common priority definition
     • Monolithic schedulers
       – A single instance of scheduling code applies the same algorithm to all incoming jobs
       – To support different scheduling policies, it can provide multiple code paths
     • Two-level schedulers
       – Static partitioning may lead to fragmentation and sub-optimal utilization
       – In Mesos, a centralized resource allocator dynamically partitions the cluster
     • Shared-state schedulers
       – Grant each scheduler full access to the entire cluster; schedulers compete in a free-for-all
       – Once a scheduler makes a placement decision, it updates the shared copy of cell state in an atomic commit
       – Schedulers can have different policies; fairness is not guaranteed
  5. 4. Design comparisons
     • Light-weight simulator – runs in minutes, also available as open source
     • High-fidelity simulator – runs in days, very close to Google's clusters
     • Compared (a) Google's monolithic scheduler, (b) Mesos, (c) Omega:
       – Monolithic: scheduler busyness and wait time increase linearly with the number of jobs
       – Mesos: achieves fairness by alternately offering all available cluster resources to different schedulers; assumes resources become available frequently and scheduler decisions are quick
       – Omega: designed with 2 schedulers, one for batch workloads and one for services; both start with a local copy of cell state (resynced when scheduling); average job wait times are comparable to Mesos; Omega manages to schedule all jobs (unlike Mesos); also tried with 32 batch schedulers to check for locking-type issues
  6. Concluding, and the remaining sections…
     • 5. Trace-driven simulation: used the high-fidelity simulator; same results
     • 6. Demonstrated Omega's superiority over Google's monolithic scheduler on a historical trace of real Map-Reduce jobs
     • 7. Additional related work:
       – Optimistic concurrency control – think of databases?
       – Multi-core OS schedulers? How about checking Linux on HP Moonshot?
     • 8. Future work: provide global guarantees (fairness, starvation avoidance, etc.) and handle interference amongst schedulers
     Kubernetes has inherited the shared-state scheduler idea: multiple master nodes are possible. All of the software on the master is stateless and uses etcd as its backing store, and etcd can be configured in a multi-master clustered setup. An HA setup for the master would be three different master nodes, with etcd running multi-master between them and a load balancer in front of the API servers to balance traffic between clients and the master.
  7. Areas to Contribute in Kubernetes
     Start with community work: (a) bug fixes, (b) documentation, (c) blogs on porting K8s to various platforms
     Code contribution:
     (a) Compute: supports a random scheduler; resource-based scheduling is in development; multiple masters are in the ideation phase
     (b) Network: kbr0 and OVS; bring in SDN capabilities
     (c) Storage: support for policies
     (d) (App) Management: important for SIs and SPs
     (e) K8s UI: not there yet
  8. Agenda
     • Talk 1: Commentary on the Google "Omega" paper
     • Talk 2: Introduction to Kubernetes with an example
     • Talk 3: Kubernetes Architecture
     • Talk 4: Kubernetes golang code structure
     • Talk 5: Areas to contribute in Kubernetes
     • Talk 6: VMware's contribution to Kubernetes
  9. What is Kubernetes?
     • A system for container cluster management
     • Open sourced by Google, launched in June 2014 at Google I/O
     • Supports Rackspace, GCE, CoreOS, Azure, vSphere
     • Manages Docker containers as the default implementation, but will support other Linux containers soon
     • Kubernetes is:
       • Lean: lightweight, simple, accessible
       • Portable: public, private, hybrid, multi-cloud
       • Extensible: modular, pluggable, hookable, composable
       • Self-healing: auto-placement, auto-restart, auto-replication
  10. Key Concepts
      • Master
      • Minion
      • Pod: grouping for containers
      • Service and Labels
      • Container
      • Kubernetes Node
      • Kubelet
      • Kubernetes Proxy
      • API Server
      • etcd
      • cAdvisor
  11. Master
      • The master maintains the state of the Kubernetes server runtime
      • It is the point of entry for all client calls to configure and manage Kubernetes components such as minions, pods and containers
      • kubectl provides commands to display pod, service and replication-controller status
      • All persistent master state is stored in an instance of etcd, a highly available key-value store
      • The master is made up of the following components:
        • API Server
        • Scheduler
        • Registries (Minion Registry, Pod Registry, Service Registry, Binding Registry)
  12. Minion
      • Represents the host/VM where containers are created, on any of the supported cloud providers
      • A minion is identified by a name and a host IP
      • Key components of a minion: Pods, Kubelet, cAdvisor, Proxy
      • cAdvisor (Container Advisor) gives container users an understanding of the resource usage and performance characteristics of their running containers; it is a separate open-source project, currently integrated with Kubernetes
  13. Important Terminologies
      • Pod: the smallest scheduling unit in Kubernetes. It is a collection of Docker containers that share volumes and don't have port conflicts. It can be created easily by defining a simple JSON file.
      • kubectl: a command that provides access to the Kubernetes APIs on the master. Through it, the user can deploy, delete and list pods, among other things.
      • kubelet: a service that runs on minions. It processes container manifests to ensure the containers are deployed as described by the user.
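The "simple JSON file" mentioned above looked roughly like this in the v1beta1 API of early 2015 (field names are taken from the guestbook example of that era and have since changed completely; treat this as an illustrative sketch, not the current API):

```json
{
  "id": "redis-master",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "redis-master",
      "containers": [{
        "name": "master",
        "image": "dockerfile/redis",
        "ports": [{ "containerPort": 6379, "hostPort": 6379 }]
      }]
    }
  },
  "labels": { "name": "redis-master" }
}
```

The `labels` map is what services and replication controllers later select on.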
  14. Pod Labels
      • Labels are key/value pairs attached to objects, such as pods. Labels can be used to organize and to select subsets of objects.
      • Via a label selector, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes.
      • Examples of pod labels:
        • environment=dev, environment=qa, environment=production
        • tier=frontend, tier=backend
        • user=vkohli, user=shrutis
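The label-selector matching described above is plain equality testing: a selector matches an object if every pair in the selector also appears on the object. A minimal sketch in Go (the type and method names here are our own, not the Kubernetes API):

```go
package main

import "fmt"

// Labels are arbitrary key/value pairs attached to objects such as pods.
type Labels map[string]string

// Matches reports whether every key/value pair in the selector is also
// present on the object: equality-based selection.
func (selector Labels) Matches(object Labels) bool {
	for key, value := range selector {
		if object[key] != value {
			return false
		}
	}
	return true
}

type Pod struct {
	Name   string
	Labels Labels
}

func main() {
	pods := []Pod{
		{"frontend-1", Labels{"tier": "frontend", "environment": "dev"}},
		{"frontend-2", Labels{"tier": "frontend", "environment": "production"}},
		{"backend-1", Labels{"tier": "backend", "environment": "production"}},
	}

	// A selector picks every pod carrying environment=production,
	// regardless of its other labels.
	selector := Labels{"environment": "production"}
	for _, pod := range pods {
		if selector.Matches(pod.Labels) {
			fmt.Println("selected:", pod.Name)
		}
	}
}
```

Note that the selector matches a superset: extra labels on the pod do not prevent a match.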
  15. Replication Controller
      • Responsible for re-creating a pod in case of failure.
      • Replication controllers should be defined during the creation of pods. Each replicated pod runs on a different minion in order to provide HA.
      • The replication manager is responsible for polling pods and maintaining the pod lifecycle.
      • A replication controller can be defined in a similar way to a pod, using a JSON file.
  16. Replication Controller (contd.)
      • A replication controller creates new pods from a template, which is currently inline in the ReplicationController object.
      • Pods created by a replication controller are not subsequently updated when the template changes.
      • Labels play a very important role in a replication controller: a loosely coupled relationship is created between the pods and the controller.
      • A replication controller will generally carry the same labels as its pods to maintain the mapping.
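A replication controller definition of that era (v1beta1, modeled on the guestbook example's redis-slave controller; the field names are historical and illustrative only) looked roughly like:

```json
{
  "id": "redisSlaveController",
  "kind": "ReplicationController",
  "apiVersion": "v1beta1",
  "desiredState": {
    "replicas": 2,
    "replicaSelector": { "name": "redisslave" },
    "podTemplate": {
      "desiredState": {
        "manifest": {
          "version": "v1beta1",
          "id": "redisSlaveController",
          "containers": [{
            "name": "slave",
            "image": "brendanburns/redis-slave",
            "ports": [{ "containerPort": 6379, "hostPort": 6380 }]
          }]
        }
      },
      "labels": { "name": "redisslave" }
    }
  },
  "labels": { "name": "redisslave" }
}
```

Note how the controller's `labels`, the `replicaSelector`, and the pod template's `labels` all carry the same `name=redisslave` pair: that shared label is the loose coupling between the controller and its pods.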
  17. Services
      • An abstraction of a software service (e.g. a relational database), consisting of a proxy port and a selector that determines which pods a service request should go to.
      • Elements of a service: Name, Port of the proxy, Labels of the service, Selector, Uses LoadBalancer, Container port
  18. Services (contd.)
      • Kubernetes pods can come up and go down at any time, and while each pod gets its own IP address, those IP addresses cannot be relied upon to be stable over time.
      • Suppose some set of pods (call them backends) provides functionality to other pods (call them frontends): this cannot work if we tightly couple them to each other, since pods come and go.
      • To counter this problem we define services, which provide an abstraction: a service offers clients an IP-and-port pair which, when accessed, redirects to the appropriate backends. The set of pods targeted is determined by a label selector.
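In the v1beta1 API of the time, the whole abstraction fit in a few lines (this sketch follows the guestbook example's redis-master service; field names are historical): clients connect to the stable `port`, and the `selector` decides which backend pods receive the traffic.

```json
{
  "id": "redismaster",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 10000,
  "selector": { "name": "redis-master" }
}
```

If the redis-master pod is rescheduled onto another minion with a new IP, the service keeps working because the selector re-resolves to whichever pod currently carries `name=redis-master`.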
  19. Guestbook example
      • A simple multi-tier web application using Kubernetes and Docker:
      • One redis-master pod for storage.
      • 2 replicated redis-slave pods.
      • 3 replicated front-end pods running a PHP web application.
      • We will also run some services in order to make the pods independent of each other.
  20. Current Kubernetes Implementations
      • Fedora manual setup
      • Fedora Ansible setup
      • GCE (Google Compute Engine)
      • vSphere
      • Windows Azure
      • AWS (Amazon Web Services)
      • Vagrant + VirtualBox
      • CoreOS
      • Many more… https://github.com/GoogleCloudPlatform/kubernetes/tree/master/docs/getting-started-guides
  21. Master
      • Contains 2 major components – etcd and the api-server
      • etcd – stores configuration data
      • api-server – serves the Kubernetes APIs; validates and configures 3 types of objects (pods, replication controllers, services)
      • A registry is an interface implemented by things that know how to store objects
  22. Pod Registry
      • A wrapper on top of the etcd persistent store
      • Keeps track of pods and their mapping to minions
      • Actions performed on a pod registry:
        – List pods – based on a selector
        – Watch pods
        – Create a pod
        – Update a pod
        – Delete a pod
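To make the registry idea concrete, here is a hypothetical in-memory sketch in Go. The real pod registry wraps etcd, and the actual interface in `kubernetes/pkg/registry` differs; the types and method names below are simplified stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a minimal stand-in for the API object stored in the registry.
type Pod struct {
	ID     string
	Labels map[string]string
}

// PodRegistry mirrors the shape the api-server programs against:
// a store that knows how to list, create, and delete pods.
type PodRegistry interface {
	ListPods(selector map[string]string) []Pod
	CreatePod(pod Pod) error
	DeletePod(id string) error
}

// MemoryRegistry keeps pods in a map; the real registry persists to etcd.
type MemoryRegistry struct {
	pods map[string]Pod
}

func NewMemoryRegistry() *MemoryRegistry {
	return &MemoryRegistry{pods: map[string]Pod{}}
}

func (r *MemoryRegistry) CreatePod(pod Pod) error {
	if _, exists := r.pods[pod.ID]; exists {
		return errors.New("pod already exists: " + pod.ID)
	}
	r.pods[pod.ID] = pod
	return nil
}

func (r *MemoryRegistry) DeletePod(id string) error {
	if _, exists := r.pods[id]; !exists {
		return errors.New("no such pod: " + id)
	}
	delete(r.pods, id)
	return nil
}

// ListPods returns pods whose labels match every pair in the selector.
func (r *MemoryRegistry) ListPods(selector map[string]string) []Pod {
	var out []Pod
	for _, pod := range r.pods {
		matches := true
		for k, v := range selector {
			if pod.Labels[k] != v {
				matches = false
				break
			}
		}
		if matches {
			out = append(out, pod)
		}
	}
	return out
}

func main() {
	var registry PodRegistry = NewMemoryRegistry()
	registry.CreatePod(Pod{ID: "redis-master", Labels: map[string]string{"name": "redis-master"}})
	registry.CreatePod(Pod{ID: "frontend", Labels: map[string]string{"name": "frontend"}})
	fmt.Println(len(registry.ListPods(map[string]string{"name": "frontend"})))
}
```

Because the api-server talks to the interface rather than to etcd directly, backing stores can be swapped without touching the callers.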
  23. Service Registry
      • A wrapper on top of the etcd persistent store which keeps track of services
      • A service is an abstraction containing the details of a list of pods and the policy to access them, providing a socket to clients
      • Actions that can be performed on this registry:
        – Create a service
        – Get a service
        – Delete a service
        – Update a service
        – Update the endpoints for a service
  24. Scheduler Implementations
      • Random scheduler
      • Round-robin scheduler
      • VM scheduler (a contribution to Kubernetes by VMware to enable an elastic approach)
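The round-robin policy is easy to sketch: assign each incoming pod to the next minion in a fixed list, wrapping around at the end. This is an illustration of the policy only, not the actual Kubernetes scheduler code.

```go
package main

import "fmt"

// RoundRobinScheduler assigns each incoming pod to the next minion
// in a fixed list, cycling back to the start when the list is exhausted.
type RoundRobinScheduler struct {
	minions []string
	next    int
}

// Schedule returns the minion the given pod should be placed on.
func (s *RoundRobinScheduler) Schedule(podID string) string {
	minion := s.minions[s.next]
	s.next = (s.next + 1) % len(s.minions)
	return minion
}

func main() {
	s := &RoundRobinScheduler{minions: []string{"minion-1", "minion-2", "minion-3"}}
	for _, pod := range []string{"pod-a", "pod-b", "pod-c", "pod-d"} {
		// The fourth pod wraps around to minion-1 again.
		fmt.Printf("%s -> %s\n", pod, s.Schedule(pod))
	}
}
```

A random scheduler is the same shape with the index drawn at random; a resource-based scheduler would instead rank minions by free capacity before choosing.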
  25. Kubelet
      • The component which runs on each minion and manages the pod and container lifecycle
      • There is a 1:1 mapping between a host and a kubelet
      • Key elements of a kubelet:
        – Docker client
        – Pod workers
        – etcd client
        – cAdvisor client
  26. Kubelet (contd.)
      • Key elements of a kubelet:
        – Hostname: the name of the host
        – Docker client: based on github.com/fsouza/go-dockerclient; used for Docker container create, start, stop and delete
        – Pod workers: workers which act on each pod
        – etcd client: interface to the persistent store
        – cAdvisor client
        – Health checker
  27. Functions performed by a Kubelet
      • Run an action on a pod using a worker
      • Bind volumes to a container
      • Bind ports to a container
      • Run a single container in a given pod
      • Kill a container
      • Create a network container for a pod
      • Delete all containers in a pod
      • Sync pod state with the data structure in the kubelet
      • Run a command in a container
      • Report health information for a container
      • Get root and pod info from cAdvisor
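The "sync pod state" step above boils down to diffing the containers the manifest asks for against the containers actually running, then starting what is missing and killing what is no longer wanted. A hedged sketch (the function and names are ours, not the kubelet's actual code):

```go
package main

import "fmt"

// syncPod compares the containers a pod manifest asks for against the
// containers actually running, and returns the corrective actions:
// start what is missing, kill what is no longer wanted.
func syncPod(desired, running []string) (toStart, toKill []string) {
	want := map[string]bool{}
	for _, c := range desired {
		want[c] = true
	}
	have := map[string]bool{}
	for _, c := range running {
		have[c] = true
	}
	for _, c := range desired {
		if !have[c] {
			toStart = append(toStart, c)
		}
	}
	for _, c := range running {
		if !want[c] {
			toKill = append(toKill, c)
		}
	}
	return toStart, toKill
}

func main() {
	toStart, toKill := syncPod(
		[]string{"net", "php-redis"},    // what the manifest describes
		[]string{"net", "old-frontend"}, // what is actually running
	)
	fmt.Println("start:", toStart)
	fmt.Println("kill:", toKill)
}
```

Repeating this diff in a loop is what makes the kubelet converge the host toward the user's described state.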
  28. Golang Intro
      • Golang was initially developed at Google
      • Its syntax is loosely derived from C
      • No classical OOP concepts (such as class inheritance) are embedded in it
      • Kubernetes = Golang + shell scripts
  29. Points to cover
      • Using GoDep
      • Building the Kubernetes code and deploying a cluster from it
      • Code walkthrough for some key components
  30. GoDep
      • Used for solving package dependencies in Go
      • Install: go get github.com/tools/godep
      • Using GoDep:
        1) godep save
        2) godep restore
        3) Adding a dependency
        4) Updating a dependency
  31. Building K8s code
      • Building using Docker containers – the build happens based on the Dockerfile in kubernetes/build/build-image/
      • Some scripts need to be run to build the binaries:
        • run.sh hack/build-go.sh
        • run.sh hack/build-cross.sh
        • run.sh hack/test-go.sh
        • run.sh hack/test-integration.sh
  32. Building K8s code (contd.)
      • release.sh – builds everything and tests it => a tarball
      • kubernetes.tar.gz will include:
        – A script for picking up and running the right client binary based on the platform
        – Examples
        – Cluster deployment scripts for various clouds
        – Salt scripts shared across multiple deployment scripts
  33. Walkthrough the code
      • How to get a cluster (e.g. vSphere) up and running
      • Important components in a Kubernetes cluster:
        Master:
          1) API Server
          2) etcd
          3) Scheduler
          4) Replication Controller
          5) Registries (Minion Registry, Pod Registry, Service Registry, Binding Registry)
          6) Storage
          7) kubecfg – command used to query the API server
        Minion:
          1) Kubelet
          2) cAdvisor
          3) Docker
  34. Package kubernetes/pkg/
      1) Kubelet
         • kubelet.go – loads initialization variables for the kubelet instance on a particular host
         • dockertools – gets information about the containers running on the host using the Docker client
      2) Master – made up of the following components:
         – API Server
         – Scheduler
         – Registries (Minion Registry, Pod Registry, Service Registry, Binding Registry)
         – Storage