
Kubernetes Meetup Bangalore 17-01-2015

Vaibhav Kohli
January 17, 2015


This slide deck includes the following talks:

1. Commentary on Google "Omega" paper & Areas to Contribute - Kumar Gaurav

2. Introduction to Kubernetes - Vaibhav Kohli

3. Kubernetes Architecture - Shruti Sharma

4. Kubernetes golang code structure - Nupur Agrawal



Transcript

  1. About the Google Omega paper
     • Published in EuroSys 2013
     • "Omega: flexible, scalable schedulers for large compute clusters"
     • Google maintains some of the biggest clusters in the world – the authors had plenty of real trace data to validate against
  2. 1. Introduction (types of cluster schedulers)
     • Monolithic: uses a single, centralized scheduler for all jobs
     • Two-level: a single active resource manager offers compute resources to multiple parallel, independent "scheduler frameworks"
     • Shared state: uses lock-free optimistic concurrency control
     General philosophy: run a mix of workloads on the same machines for efficiency
  3. 2. Requirements (of a cluster scheduler)
     • Goals:
       – High resource utilization
       – User-supplied placement constraints
       – Rapid decision making
       – "Fairness" and business priority
       – Robust and always available
     • Types of jobs to cater to:
       – Service: long-running, so spend time scheduling it for a perfect fit
       – Batch: performs a computation and then finishes, so point-in-time demand
  4. 3. Taxonomy
     • Design issues to be addressed:
       – Partitioning the scheduling work: load balancing, specialized schedulers
       – Choice of resources: a subset or all of the cluster's resources
       – Interference: internal competition (only in the shared-state scheduler)
       – Allocation granularity: if a job contains many tasks, the choice of all-or-nothing
       – Cluster-wide behavior: fairness, a common priority definition
     • Monolithic schedulers
       – A single instance of scheduling code applies the same algorithm to all incoming jobs
       – To support different scheduling policies, it can provide multiple code paths
     • Two-level schedulers
       – Static partitioning may lead to fragmentation and sub-optimal utilization
       – In Mesos, a centralized resource allocator dynamically partitions the cluster
     • Shared-state schedulers
       – Grant each scheduler full access to the entire cluster; schedulers compete in a free-for-all
       – Once a scheduler makes a placement decision, it updates the shared copy of cell state in an atomic commit
       – Schedulers can have different policies; fairness is not guaranteed
  5. 4. Design comparisons
     • Light-weight simulator – runs in minutes, also available as open source
     • High-fidelity simulator – runs in days, very close to Google's clusters
     • Compared (a) Google's monolithic scheduler, (b) Mesos, (c) Omega:
       – Monolithic: scheduler busyness and wait time increase linearly with the number of jobs
       – Mesos: achieves fairness by alternately offering all available cluster resources to different schedulers; assumes resources become available frequently and scheduler decisions are quick
       – Omega: designed with 2 schedulers, one for batch workloads and one for services; both start with a local copy of cell state (resynced when scheduling); average job wait times are comparable to Mesos; Omega manages to schedule all jobs (unlike Mesos); also tried with 32 batch schedulers to check for locking-type issues
  6. Concluding, and the remaining sections…
     • 5. Trace-driven simulation: used the high-fidelity simulator; same results
     • 6. Demonstrated Omega's superiority over Google's monolithic scheduler on a historical trace of real Map-Reduce jobs
     • 7. Additional related work:
       – Optimistic concurrency control – think of databases?
       – Multi-core OS schedulers? How about checking Linux on HP Moonshot?
     • 8. Future work: provide global guarantees (fairness, starvation avoidance, etc.) and handle interference amongst schedulers
     Kubernetes has inherited the shared-state scheduler idea: multiple master nodes are possible. All of the software on the master is stateless and uses etcd as its backing store, and etcd can be configured in a multi-master clustered setup. An HA setup for the master would be three different master nodes, with etcd running multi-master between them and a load balancer in front of the API servers to balance traffic between clients and the master.
  7. Areas to Contribute in Kubernetes
     Start with community work: (a) bug fixes, (b) documentation, (c) blogs on porting K8s to various platforms
     Code contribution:
     (a) Compute: supports a random scheduler; resource-based scheduling is in development; multiple masters are in the ideation phase
     (b) Network: kbr0 and OVS; bring in SDN capabilities
     (c) Storage: support for policies
     (d) (App) Management: important for SIs and SPs
     (e) K8s UI: not there yet
  8. Agenda
     • Talk 1: Commentary on the Google "Omega" paper
     • Talk 2: Introduction to Kubernetes with an example
     • Talk 3: Kubernetes Architecture
     • Talk 4: Kubernetes golang code structure
     • Talk 5: Areas to contribute in Kubernetes
     • Talk 6: VMware's contribution to Kubernetes
  9. What is Kubernetes?
     • A system for container cluster management
     • Open sourced by Google, launched in June 2014 at Google I/O
     • Supports Rackspace, GCE, CoreOS, Azure, vSphere
     • Manages Docker containers as the default implementation, but will support other Linux containers soon
     • Kubernetes is:
       • Lean: lightweight, simple, accessible
       • Portable: public, private, hybrid, multi-cloud
       • Extensible: modular, pluggable, hookable, composable
       • Self-healing: auto-placement, auto-restart, auto-replication
  10. Key Concepts
      • Master
      • Minion
      • Pod: grouping for containers
      • Service and Labels
      • Container
      • Kubernetes Node
      • Kubelet
      • Kubernetes Proxy
      • API Server
      • etcd
      • cAdvisor
  11. Master
      • The master maintains the state of the Kubernetes server runtime
      • It is the point of entry for all client calls to configure and manage Kubernetes components such as minions, pods and containers
      • kubectl provides commands to display pod, service and replication-controller status
      • All persistent master state is stored in an instance of etcd, a highly available key-value store
      • The master is made up of the following components:
        • API Server
        • Scheduler
        • Registries (Minion Registry, Pod Registry, Service Registry, Binding Registry)
  12. Minion
      • Represents the host/VM where containers are created, on any of the supported cloud providers
      • A minion is identified by a name and a host IP
      • Key components of a minion: Pods, Kubelet, cAdvisor, Proxy
      • cAdvisor (Container Advisor) gives container users an understanding of the resource usage and performance characteristics of their running containers; it is a separate open-source project, currently integrated with Kubernetes
  13. Important Terminologies
      • Pod: the smallest scheduling unit in Kubernetes. It is a collection of Docker containers that share volumes and don't have port conflicts. It can be created easily by defining a simple JSON file.
      • kubectl: a command that provides access to the Kubernetes APIs on the master. Through it, the user can deploy, delete and list pods, among other things.
      • kubelet: a service that runs on minions. It processes container manifests to ensure the containers are deployed as described by the user.
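The "simple JSON file" mentioned above looked roughly like this in the v1beta1 API of early 2015 (field names are taken from the guestbook example of that era and have since changed completely; treat this as an illustrative sketch, not the current API):

```json
{
  "id": "redis-master",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "redis-master",
      "containers": [{
        "name": "master",
        "image": "dockerfile/redis",
        "ports": [{ "containerPort": 6379, "hostPort": 6379 }]
      }]
    }
  },
  "labels": { "name": "redis-master" }
}
```

The `labels` map is what services and replication controllers later select on.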
  14. Pod Labels
      • Labels are key/value pairs attached to objects, such as pods. Labels can be used to organize and to select subsets of objects.
      • Via a label selector, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes.
      • Examples of pod labels:
        • environment=dev, environment=qa, environment=production
        • tier=frontend, tier=backend
        • user=vkohli, user=shrutis
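The label-selector matching described above is plain equality testing: a selector matches an object if every pair in the selector also appears on the object. A minimal sketch in Go (the type and method names here are our own, not the Kubernetes API):

```go
package main

import "fmt"

// Labels are arbitrary key/value pairs attached to objects such as pods.
type Labels map[string]string

// Matches reports whether every key/value pair in the selector is also
// present on the object: equality-based selection.
func (selector Labels) Matches(object Labels) bool {
	for key, value := range selector {
		if object[key] != value {
			return false
		}
	}
	return true
}

type Pod struct {
	Name   string
	Labels Labels
}

func main() {
	pods := []Pod{
		{"frontend-1", Labels{"tier": "frontend", "environment": "dev"}},
		{"frontend-2", Labels{"tier": "frontend", "environment": "production"}},
		{"backend-1", Labels{"tier": "backend", "environment": "production"}},
	}

	// A selector picks every pod carrying environment=production,
	// regardless of its other labels.
	selector := Labels{"environment": "production"}
	for _, pod := range pods {
		if selector.Matches(pod.Labels) {
			fmt.Println("selected:", pod.Name)
		}
	}
}
```

Note that the selector matches a superset: extra labels on the pod do not prevent a match.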
  15. Replication Controller
      • Responsible for re-creating a pod in case of failure.
      • Replication controllers should be defined during the creation of pods. Each replicated pod runs on a different minion in order to provide HA.
      • The replication manager is responsible for polling pods and maintaining the pod lifecycle.
      • A replication controller can be defined in a similar way to a pod, using a JSON file.
  16. Replication Controller (contd.)
      • A replication controller creates new pods from a template, which is currently inline in the ReplicationController object.
      • Pods created by a replication controller are not subsequently updated when the template changes.
      • Labels play a very important role in a replication controller: a loosely coupled relationship is created between the pods and the controller.
      • A replication controller will generally carry the same labels as its pods to maintain the mapping.
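A replication controller definition of that era (v1beta1, modeled on the guestbook example's redis-slave controller; the field names are historical and illustrative only) looked roughly like:

```json
{
  "id": "redisSlaveController",
  "kind": "ReplicationController",
  "apiVersion": "v1beta1",
  "desiredState": {
    "replicas": 2,
    "replicaSelector": { "name": "redisslave" },
    "podTemplate": {
      "desiredState": {
        "manifest": {
          "version": "v1beta1",
          "id": "redisSlaveController",
          "containers": [{
            "name": "slave",
            "image": "brendanburns/redis-slave",
            "ports": [{ "containerPort": 6379, "hostPort": 6380 }]
          }]
        }
      },
      "labels": { "name": "redisslave" }
    }
  },
  "labels": { "name": "redisslave" }
}
```

Note how the controller's `labels`, the `replicaSelector`, and the pod template's `labels` all carry the same `name=redisslave` pair: that shared label is the loose coupling between the controller and its pods.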
  17. Services
      • An abstraction of a software service (e.g. a relational database), consisting of a proxy port and a selector that determines which pods a service request should go to.
      • Elements of a service: Name, Port of the proxy, Labels of the service, Selector, Uses LoadBalancer, Container port
  18. Services (contd.)
      • Kubernetes pods can come up and go down at any time, and while each pod gets its own IP address, those IP addresses cannot be relied upon to be stable over time.
      • Suppose some set of pods (call them backends) provides functionality to other pods (call them frontends): this cannot work if we tightly couple them to each other, since pods come and go.
      • To counter this problem we define services, which provide an abstraction: a service offers clients an IP-and-port pair which, when accessed, redirects to the appropriate backends. The set of pods targeted is determined by a label selector.
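In the v1beta1 API of the time, the whole abstraction fit in a few lines (this sketch follows the guestbook example's redis-master service; field names are historical): clients connect to the stable `port`, and the `selector` decides which backend pods receive the traffic.

```json
{
  "id": "redismaster",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 10000,
  "selector": { "name": "redis-master" }
}
```

If the redis-master pod is rescheduled onto another minion with a new IP, the service keeps working because the selector re-resolves to whichever pod currently carries `name=redis-master`.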
  19. Guestbook example
      • A simple multi-tier web application using Kubernetes and Docker:
      • One redis-master pod for storage.
      • 2 replicated redis-slave pods.
      • 3 replicated front-end pods running a PHP web application.
      • We will also run some services in order to make the pods independent of each other.
  20. Current Kubernetes Implementations
      • Fedora manual setup
      • Fedora Ansible setup
      • GCE (Google Compute Engine)
      • vSphere
      • Windows Azure
      • AWS (Amazon Web Services)
      • Vagrant + VirtualBox
      • CoreOS
      • Many more… https://github.com/GoogleCloudPlatform/kubernetes/tree/master/docs/getting-started-guides
  21. Master
      • Contains 2 major components – etcd and the api-server
      • etcd – stores configuration data
      • api-server – serves the Kubernetes APIs; validates and configures 3 types of objects (pods, replication controllers, services)
      • A registry is an interface implemented by things that know how to store objects
  22. Pod Registry
      • A wrapper on top of the etcd persistent store
      • Keeps track of pods and their mapping to minions
      • Actions performed on a pod registry:
        – List pods – based on a selector
        – Watch pods
        – Create a pod
        – Update a pod
        – Delete a pod
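To make the registry idea concrete, here is a hypothetical in-memory sketch in Go. The real pod registry wraps etcd, and the actual interface in `kubernetes/pkg/registry` differs; the types and method names below are simplified stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a minimal stand-in for the API object stored in the registry.
type Pod struct {
	ID     string
	Labels map[string]string
}

// PodRegistry mirrors the shape the api-server programs against:
// a store that knows how to list, create, and delete pods.
type PodRegistry interface {
	ListPods(selector map[string]string) []Pod
	CreatePod(pod Pod) error
	DeletePod(id string) error
}

// MemoryRegistry keeps pods in a map; the real registry persists to etcd.
type MemoryRegistry struct {
	pods map[string]Pod
}

func NewMemoryRegistry() *MemoryRegistry {
	return &MemoryRegistry{pods: map[string]Pod{}}
}

func (r *MemoryRegistry) CreatePod(pod Pod) error {
	if _, exists := r.pods[pod.ID]; exists {
		return errors.New("pod already exists: " + pod.ID)
	}
	r.pods[pod.ID] = pod
	return nil
}

func (r *MemoryRegistry) DeletePod(id string) error {
	if _, exists := r.pods[id]; !exists {
		return errors.New("no such pod: " + id)
	}
	delete(r.pods, id)
	return nil
}

// ListPods returns pods whose labels match every pair in the selector.
func (r *MemoryRegistry) ListPods(selector map[string]string) []Pod {
	var out []Pod
	for _, pod := range r.pods {
		matches := true
		for k, v := range selector {
			if pod.Labels[k] != v {
				matches = false
				break
			}
		}
		if matches {
			out = append(out, pod)
		}
	}
	return out
}

func main() {
	var registry PodRegistry = NewMemoryRegistry()
	registry.CreatePod(Pod{ID: "redis-master", Labels: map[string]string{"name": "redis-master"}})
	registry.CreatePod(Pod{ID: "frontend", Labels: map[string]string{"name": "frontend"}})
	fmt.Println(len(registry.ListPods(map[string]string{"name": "frontend"})))
}
```

Because the api-server talks to the interface rather than to etcd directly, backing stores can be swapped without touching the callers.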
  23. Service Registry
      • A wrapper on top of the etcd persistent store which keeps track of services
      • A service is an abstraction containing the details of a list of pods and the policy to access them, providing a socket to clients
      • Actions that can be performed on this registry:
        – Create a service
        – Get a service
        – Delete a service
        – Update a service
        – Update the endpoints for a service
  24. Scheduler Implementations
      • Random scheduler
      • Round-robin scheduler
      • VM scheduler (a contribution to Kubernetes by VMware to enable an elastic approach)
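The round-robin policy is easy to sketch: assign each incoming pod to the next minion in a fixed list, wrapping around at the end. This is an illustration of the policy only, not the actual Kubernetes scheduler code.

```go
package main

import "fmt"

// RoundRobinScheduler assigns each incoming pod to the next minion
// in a fixed list, cycling back to the start when the list is exhausted.
type RoundRobinScheduler struct {
	minions []string
	next    int
}

// Schedule returns the minion the given pod should be placed on.
func (s *RoundRobinScheduler) Schedule(podID string) string {
	minion := s.minions[s.next]
	s.next = (s.next + 1) % len(s.minions)
	return minion
}

func main() {
	s := &RoundRobinScheduler{minions: []string{"minion-1", "minion-2", "minion-3"}}
	for _, pod := range []string{"pod-a", "pod-b", "pod-c", "pod-d"} {
		// The fourth pod wraps around to minion-1 again.
		fmt.Printf("%s -> %s\n", pod, s.Schedule(pod))
	}
}
```

A random scheduler is the same shape with the index drawn at random; a resource-based scheduler would instead rank minions by free capacity before choosing.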
  25. Kubelet
      • The component which runs on each minion and manages the pod and container lifecycle
      • There is a 1:1 mapping between a host and a kubelet
      • Key elements of a kubelet:
        – Docker client
        – Pod workers
        – etcd client
        – cAdvisor client
  26. Kubelet (contd.)
      • Key elements of a kubelet:
        – Hostname: the name of the host
        – Docker client: based on github.com/fsouza/go-dockerclient; used for Docker container create, start, stop and delete
        – Pod workers: workers which act on each pod
        – etcd client: interface to the persistent store
        – cAdvisor client
        – Health checker
  27. Functions performed by a Kubelet
      • Run an action on a pod using a worker
      • Bind volumes to a container
      • Bind ports to a container
      • Run a single container in a given pod
      • Kill a container
      • Create a network container for a pod
      • Delete all containers in a pod
      • Sync pod state with the data structure in the kubelet
      • Run a command in a container
      • Report health information for a container
      • Get root and pod info from cAdvisor
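The "sync pod state" step above boils down to diffing the containers the manifest asks for against the containers actually running, then starting what is missing and killing what is no longer wanted. A hedged sketch (the function and names are ours, not the kubelet's actual code):

```go
package main

import "fmt"

// syncPod compares the containers a pod manifest asks for against the
// containers actually running, and returns the corrective actions:
// start what is missing, kill what is no longer wanted.
func syncPod(desired, running []string) (toStart, toKill []string) {
	want := map[string]bool{}
	for _, c := range desired {
		want[c] = true
	}
	have := map[string]bool{}
	for _, c := range running {
		have[c] = true
	}
	for _, c := range desired {
		if !have[c] {
			toStart = append(toStart, c)
		}
	}
	for _, c := range running {
		if !want[c] {
			toKill = append(toKill, c)
		}
	}
	return toStart, toKill
}

func main() {
	toStart, toKill := syncPod(
		[]string{"net", "php-redis"},    // what the manifest describes
		[]string{"net", "old-frontend"}, // what is actually running
	)
	fmt.Println("start:", toStart)
	fmt.Println("kill:", toKill)
}
```

Repeating this diff in a loop is what makes the kubelet converge the host toward the user's described state.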
  28. Golang Intro
      • Golang was initially developed at Google
      • Its syntax is loosely derived from C
      • No classical OOP concepts (such as class inheritance) are embedded in it
      • Kubernetes = Golang + shell scripts
  29. Points to cover
      • Using GoDep
      • Building the Kubernetes code and deploying a cluster from it
      • Code walkthrough for some key components
  30. GoDep
      • Used for solving package dependencies in Go
      • Install: go get github.com/tools/godep
      • Using GoDep:
        1) godep save
        2) godep restore
        3) Adding a dependency
        4) Updating a dependency
  31. Building K8s code
      • Building using Docker containers – the build happens based on the Dockerfile in kubernetes/build/build-image/
      • Some scripts need to be run to build the binaries:
        • run.sh hack/build-go.sh
        • run.sh hack/build-cross.sh
        • run.sh hack/test-go.sh
        • run.sh hack/test-integration.sh
  32. Building K8s code (contd.)
      • release.sh – builds everything and tests it => a tarball
      • kubernetes.tar.gz will include:
        – A script for picking up and running the right client binary based on the platform
        – Examples
        – Cluster deployment scripts for various clouds
        – Salt scripts shared across multiple deployment scripts
  33. Walkthrough the code
      • How to get a cluster (e.g. vSphere) up and running
      • Important components in a Kubernetes cluster:
        Master:
          1) API Server
          2) etcd
          3) Scheduler
          4) Replication Controller
          5) Registries (Minion Registry, Pod Registry, Service Registry, Binding Registry)
          6) Storage
          7) kubecfg – command used to query the API server
        Minion:
          1) Kubelet
          2) cAdvisor
          3) Docker
  34. Package kubernetes/pkg/
      1) Kubelet
         • kubelet.go – loads initialization variables for the kubelet instance on a particular host
         • dockertools – gets information about the containers running on the host using the Docker client
      2) Master – made up of the following components:
         – API Server
         – Scheduler
         – Registries (Minion Registry, Pod Registry, Service Registry, Binding Registry)
         – Storage