The Scope of This Talk • Kubernetes • by Cloud Native Computing Foundation • Docker 1.12+ • by Docker Inc. • Compose + Swarm is kind of legacy, so they will not be included in this talk • Mesos • by Apache Software Foundation • only with Marathon; DC/OS is not included (the scope of the latter is larger)
Kubernetes • Build the right things with containers by following concepts and conventions • like a “Spring Framework” in the container ecosystem • Design • master • api-server, scheduler, controller-manager • node • kubelet, kube-proxy • independent binaries • Pros: modular, transparent, manageable • Cons: a little bit complex to set up (1.4 is much better now) • network & volume plugins • driven by control loops
Kubernetes [Diagram. Components: api-server, etcd, scheduler, controller-manager (ControlLoop), and on each node a kubelet (SyncLoop) and a proxy. Objects: pod, replica, namespace, service, endpoint, job, deployment, volume, petset, … Reconcile (handler): desired world vs. real world]
Tips: Control Theory* *Andrei, Neculai (2005). "Modern Control Theory – A historical Perspective" • It’s the basic model for: • Kubernetes controller and all other event loops • SwarmKit orchestrator • … ControlLoop
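The control-loop model above can be sketched as a function that compares the desired world with the real world and emits the actions needed to close the gap. This is a minimal illustration only, not code from Kubernetes or SwarmKit; the names `reconcile`, `apply_actions`, and the sample objects are hypothetical.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (verb, object name)


def reconcile(desired: Dict[str, str], actual: Dict[str, str]) -> List[Action]:
    """Compare desired state with observed state and list the corrective actions."""
    actions: List[Action] = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions: List[Action], desired: Dict[str, str], actual: Dict[str, str]) -> None:
    """Pretend to act on the real world (hypothetical stand-in for real side effects)."""
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = desired[name]


desired = {"web": "nginx:1.11", "db": "mysql:5.7"}
actual = {"web": "nginx:1.10", "cache": "redis:3"}

# The loop keeps running until observation and desire agree.
while True:
    pending = reconcile(desired, actual)
    if not pending:
        break
    apply_actions(pending, desired, actual)
```

A real controller never exits this loop: it keeps watching, because the real world can drift again at any time.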
Docker 1.12+ • Built-in cluster support for Docker containers • powered by SwarmKit • SwarmKit Design • built-in data store • manager • several components built into one binary • control loop driven • worker • uses a pull model to connect with the manager WARNING: SwarmKit is currently an early-stage project; expect this part to change
SwarmKit Manager: API and Store • API: accepts commands from the client (e.g. $ docker service create) • Creates objects in the raft-based memory store • github.com/coreos/etcd/raft for consensus • github.com/hashicorp/go-memdb for in-memory object storage • state, cluster, node, service, task, network … [Diagram: SwarmKit Manager with API, Store, Orchestrator, Allocator, Scheduler, Dispatcher]
SwarmKit Manager: Orchestrator • Creates Tasks from a Service object • Task: “start a container” etc. • Reconcile loop for Service objects • Control Theory again [Diagram: SwarmKit Manager with API, Store, Orchestrator, Allocator, Scheduler, Dispatcher; the Orchestrator checks whether a Service (replica=2) has two Tasks or not]
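The "check if replica=2 or not" step can be sketched as a reconcile function over one Service object. This is an illustration under simplifying assumptions, not SwarmKit's actual code; `Service`, `reconcile_service`, and the task naming scheme are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

_task_ids = count(1)  # hypothetical global task numbering


@dataclass
class Service:
    name: str
    replicas: int                      # desired number of tasks
    tasks: List[str] = field(default_factory=list)  # tasks that currently exist


def reconcile_service(service: Service) -> List[str]:
    """Create or remove tasks until the task count matches service.replicas."""
    events: List[str] = []
    while len(service.tasks) < service.replicas:
        task = f"{service.name}.{next(_task_ids)}"
        service.tasks.append(task)
        events.append(f"create {task}")
    while len(service.tasks) > service.replicas:
        task = service.tasks.pop()
        events.append(f"remove {task}")
    return events


web = Service(name="web", replicas=2)
first = reconcile_service(web)    # ['create web.1', 'create web.2']
web.tasks.pop(0)                  # simulate a task dying
second = reconcile_service(web)   # ['create web.3']
web.replicas = 1                  # the user scales the service down
third = reconcile_service(web)    # ['remove web.3']
```

The same function handles first creation, failure recovery, and scaling, which is the appeal of driving everything from a desired replica count.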
SwarmKit Manager: Allocator • Allocates IP addresses to Services and Tasks • (and will allocate volumes in the future) • VIP and ports for a Service • IP for all endpoints (veth pairs) in the network the task is attached to [Diagram: SwarmKit Manager with API, Store, Orchestrator, Allocator, Scheduler, Dispatcher; the Allocator reacts to "Network Create"]
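As a rough picture of what "allocates IP addresses to Services and Tasks" means, the sketch below hands out addresses from one subnet, one per owner. It is an assumption-laden toy, not SwarmKit's allocator: the class `IPAllocator`, the owner keys, and the subnet are hypothetical, and real allocators also handle gateways, release, and persistence.

```python
import ipaddress
from typing import Dict


class IPAllocator:
    """Hands out the next free host address of a subnet, once per owner."""

    def __init__(self, cidr: str) -> None:
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self.assigned: Dict[str, str] = {}

    def allocate(self, owner: str) -> str:
        # Allocation is idempotent: the same owner always gets the same address.
        if owner not in self.assigned:
            try:
                self.assigned[owner] = str(next(self._free))
            except StopIteration:
                raise RuntimeError("subnet exhausted") from None
        return self.assigned[owner]


alloc = IPAllocator("10.0.0.0/24")
vip = alloc.allocate("service/web")      # virtual IP for the Service
task_ip = alloc.allocate("task/web.1")   # endpoint IP for one Task
```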
SwarmKit Manager: Scheduler • Assigns Tasks to Nodes • unassignedTasks • nodeHeap • searches the heap to find the best node, i.e. one that meets the constraints && has the lightest workload • ReadyFilter, ResourceFilter, ConstraintFilter [Diagram: SwarmKit Manager with API, Store, Orchestrator, Allocator, Scheduler, Dispatcher]
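The filter-then-pick-lightest idea can be sketched as follows. The filter names mirror the ones on the slide, but everything else (the `Node` and `Task` fields, using task count as "workload", the tie-break by name) is an assumption for illustration, not SwarmKit's implementation.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str                # assumed unique, also used as a tie-breaker
    ready: bool
    free_cpus: float
    labels: Dict[str, str] = field(default_factory=dict)
    task_count: int = 0      # stand-in for "workload"


@dataclass
class Task:
    name: str
    cpus: float
    constraints: Dict[str, str] = field(default_factory=dict)


def ready_filter(node: Node, task: Task) -> bool:
    return node.ready


def resource_filter(node: Node, task: Task) -> bool:
    return node.free_cpus >= task.cpus


def constraint_filter(node: Node, task: Task) -> bool:
    return all(node.labels.get(k) == v for k, v in task.constraints.items())


FILTERS = [ready_filter, resource_filter, constraint_filter]


def schedule(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Return the least-loaded node that passes every filter, or None."""
    heap = [(n.task_count, n.name, n) for n in nodes
            if all(f(n, task) for f in FILTERS)]
    if not heap:
        return None          # the task stays unassigned
    heapq.heapify(heap)
    _, _, best = heap[0]     # smallest task_count is at the top of the heap
    best.task_count += 1
    best.free_cpus -= task.cpus
    return best


nodes = [
    Node("n1", True, 4.0, {"disk": "ssd"}, task_count=3),
    Node("n2", True, 2.0, {"disk": "ssd"}, task_count=1),
    Node("n3", False, 8.0, {"disk": "ssd"}, task_count=0),  # not ready
    Node("n4", True, 8.0, {"disk": "hdd"}, task_count=0),   # wrong label
]
placed = schedule(Task("web.1", 1.0, {"disk": "ssd"}), nodes)  # picks n2
too_big = schedule(Task("batch.1", 16.0), nodes)               # None
```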
Mesos 1.0 • A distributed systems kernel • originally designed to run big data jobs • core idea: fine-grained resource sharing • Mesos Design • Master + Slave + ZooKeeper • two-level scheduling • scheduler + executor = framework • needs frameworks like Marathon for orchestration and management • containerizer • multiple container runtime & image support (>=1.0)
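Two-level scheduling can be pictured as a master that decides who gets offered resources, and framework schedulers that decide what to do with each offer. The sketch below is a deliberately simplified model, not the Mesos API: the classes, the `resource_offers` method, and the fixed offer order are hypothetical (real Mesos uses its own allocator policy to choose which framework to offer to).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


class FrameworkScheduler:
    """Second level: decides how much of an offer to use."""

    def __init__(self, name: str, cpus_per_task: float, mem_per_task: int, pending: int) -> None:
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.mem_per_task = mem_per_task
        self.pending = pending  # tasks still waiting to be launched

    def resource_offers(self, offer: Offer) -> int:
        """Return how many tasks to launch on this offer; 0 means decline."""
        fit = min(int(offer.cpus // self.cpus_per_task),
                  offer.mem_mb // self.mem_per_task,
                  self.pending)
        self.pending -= fit
        return fit


class Master:
    """First level: decides which framework sees which resources."""

    def __init__(self, agents: List[Offer]) -> None:
        self.agents = agents

    def offer_round(self, frameworks: List[FrameworkScheduler]) -> Dict[str, int]:
        launched = {fw.name: 0 for fw in frameworks}
        for agent in self.agents:
            for fw in frameworks:
                n = fw.resource_offers(Offer(agent.agent, agent.cpus, agent.mem_mb))
                agent.cpus -= n * fw.cpus_per_task      # used resources leave the pool
                agent.mem_mb -= n * fw.mem_per_task
                launched[fw.name] += n
        return launched


master = Master([Offer("agent1", 4.0, 8192), Offer("agent2", 2.0, 4096)])
marathon = FrameworkScheduler("marathon", cpus_per_task=1.0, mem_per_task=1024, pending=3)
spark = FrameworkScheduler("spark", cpus_per_task=2.0, mem_per_task=4096, pending=2)
result = master.offer_round([marathon, spark])  # {'marathon': 3, 'spark': 1}
```

The master never needs to understand what a framework's tasks are; it only tracks resources, which is what lets very different frameworks share one cluster.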
Checkpoint

| | Kubernetes | Docker SwarmKit | Mesos+Marathon |
| Design | control-loop driven | control-loop driven (but in a single binary) | two-level scheduling |
| Coordination | etcd | built-in raft | ZooKeeper |
| Container Runtime | multiple | single, but has potential for more OCI runtimes | multiple |
| Container Image | Docker Image, ACI, more in future | Docker Image | Docker Image, ACI, more in future |
| Docker Daemon | no need | need | no need |
About the Built-In Data Store • Pros • easy to set up • fewer round trips • easy to do performance tuning • Cons • hard to understand & debug • hard to do backup/restore, migration, monitoring/audit • lack of a management API (compare: the etcd admin guide)
Control Panel: Orchestration + Management • “Defines when and what to do next throughout the automated workflow” • workload management • secret management • configuration management • scale and autoscaling • stateful workload • … and more
Workload Management e.g. “a web server with 2 replicas”

| | Kubernetes | Docker SwarmKit | Mesos+Marathon |
| Description | Deployment | Service | Application |
| Version Control | yes (revision) | not yet | yes (deployments) |
• Docker SwarmKit “Service” • $ docker service create SERVICE --replicas=5 … • $ docker service scale SERVICE=REPLICAS • $ docker service update [OPTIONS] SERVICE • rolling update • 30+ update options are supported • --container-label-add value • --container-label-rm value • --env-add value • --env-rm value • --image string • …
Secret Management • Kubernetes • Secret volume • encrypted and stored in etcd • consumed by ENV or volume • Docker SwarmKit • under discussion: https://github.com/docker/swarmkit/issues/1329 • Mesos + Marathon • only in DC/OS • stored in ZooKeeper, exposed as ENV in Marathon • Another similar feature is Configuration Management
Autoscaling • Kubernetes • HorizontalPodAutoscaler • default: CPU • Custom Metrics: • user-defined endpoint, e.g. http://localhost:9100/metrics • shares the same metric data format with CNCF projects like Prometheus • Docker SwarmKit • not yet: https://github.com/docker/swarmkit/issues/486#issuecomment-219133613 • Mesos + Marathon • a standalone script, `marathon-autoscale.py` • autoscales an application based on the utilization metrics from Mesos
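To make CPU-based autoscaling concrete, the sketch below uses the commonly documented proportional rule: scale the replica count by the ratio of observed to target utilization, then clamp to the configured bounds. It is a simplified illustration, not the Kubernetes controller itself; the function name and parameters are hypothetical, and the real autoscaler adds tolerances and cooldowns.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: int,
                     target_utilization: int,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Proportional scaling: more load per pod than targeted means more pods."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods at 90% CPU with a 60% target: scale up to 6.
scale_up = desired_replicas(4, 90, 60, min_replicas=2, max_replicas=15)
# 4 pods at 20% CPU with a 60% target: ceil(1.33) = 2, which is also the minimum.
scale_down = desired_replicas(4, 20, 60, min_replicas=2, max_replicas=15)
# 10 pods at 200% CPU with a 50% target would need 40, capped at the maximum.
capped = desired_replicas(10, 200, 50, min_replicas=2, max_replicas=15)
```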
Service Discovery & Load Balance • Docker SwarmKit • Load Balancer • ipvs NAT mode • External Access • Routing Mesh • Name Service • embedded DNS server • for service and task • Two kinds of sandboxes • ingress: on every worker • container: on workers where the task lives • Two networks are needed • ingress overlay • user-defined overlay [Diagram: two Workers, each with an ingress sandbox (iptables, ipvs) and a container sandbox (Container 1, Container 2). Outside traffic (when the service is created with -p) enters through port mapping and iptables, then ipvs. Internal traffic uses DNS: svc->vip, then ipvs. Gossip is used to update the iptables & ipvs rules]
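The internal path (DNS resolves a service name to a VIP, then the load balancer picks a task behind that VIP) can be sketched like this. It is a toy model, not Docker's networking code: the class `ServiceMesh`, its methods, the addresses, and the round-robin policy are assumptions for illustration.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class ServiceMesh:
    """Toy model of name service plus virtual-IP load balancing."""

    def __init__(self) -> None:
        self._dns: Dict[str, str] = {}                 # service name -> VIP
        self._backends: Dict[str, Iterator[str]] = {}  # VIP -> rotating task IPs

    def register(self, service: str, vip: str, task_ips: List[str]) -> None:
        self._dns[service] = vip
        self._backends[vip] = cycle(task_ips)

    def resolve(self, service: str) -> str:
        """What the embedded DNS server would answer for the service name."""
        return self._dns[service]

    def route(self, vip: str) -> str:
        """What the load balancer would do: pick the next task behind the VIP."""
        return next(self._backends[vip])


mesh = ServiceMesh()
mesh.register("web", "10.0.0.2", ["10.0.0.3", "10.0.0.4"])
vip = mesh.resolve("web")
targets = [mesh.route(vip) for _ in range(3)]
```

Clients only ever see the stable VIP, so tasks can come and go without changing what the service name resolves to.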
Kubernetes • Pod as the scheduling unit • this is unique, but why? • Multi-Scheduler • pod1: scheduler1, pod2: scheduler2 • QoS tiers • does anyone remember the core idea of Borg? • Guaranteed (requests == limits) • Burstable (requests < limits) • Best-Effort (no requests & limits) • More Borg features are on the way • equivalence class, pod-level resource boundary … [Diagram: Burstable Pod]
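The three QoS tiers can be expressed as a small classification function. This sketch follows only the rules on the slide for a single container, plus the default that requests fall back to limits when omitted; the real Kubernetes rules are evaluated per resource across all containers in a pod. The function name and resource values are hypothetical.

```python
from typing import Dict, Optional


def qos_tier(requests: Optional[Dict[str, str]],
             limits: Optional[Dict[str, str]]) -> str:
    """Classify one container's resources into a QoS tier."""
    requests = dict(requests or {})
    limits = dict(limits or {})
    if not requests and not limits:
        return "Best-Effort"
    if limits and not requests:
        requests = dict(limits)  # omitted requests default to the limits
    if limits and requests == limits:
        return "Guaranteed"
    return "Burstable"


guaranteed = qos_tier({"cpu": "500m", "memory": "256Mi"},
                      {"cpu": "500m", "memory": "256Mi"})
burstable = qos_tier({"cpu": "250m"}, {"cpu": "500m"})
best_effort = qos_tier(None, None)
```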
Mesos + Marathon • Task as the scheduling unit (Pod support is planned) • Multi-Scheduler • Mesos is designed to run multiple frameworks (schedulers) • Strategy • Two-level scheduling (the killer feature of Mesos) • Twitter scale … • fine-grained resource sharing (like Borg) • QoS tiers • of course • And much more • task eviction, data locality, max-min fairness, priority, offer rejection, Delay Scheduling • and Big Data of course
A Use Case: hyper.sh • hyper.sh is “Docker Done the Right Way” • $ hyper run mysql • $ hyper run --link mysql wordpress • $ hyper fip attach 22.33.44.55 wordpress • But hyper.sh is powered by Kubernetes • and also maintains Kubernetes features
Just a Personal Opinion • So, if • I am an individual developer/org, trying to find something that is friendly and just works • I use Docker SwarmKit • I have a “Twitter scale” cluster to manage or I am a Big Data user • I need Mesos • But if what I need is an infrastructure layer to build my systems on top of in the right way • Kubernetes is the choice