Container Orchestration and Management Systems Comparison from Technical View

This is the talk I gave at KubeCon 2016.

Lei (Harry) Zhang

June 28, 2017

Transcript

  1. Container Orchestration and
    Management Systems
    Comparison from Technical View
    Harry Zhang, Member of #CNCF

  2. The Scope of This Talk
    • Kubernetes
    • by Cloud Native Computing Foundation
    • Docker 1.12+
    • by Docker Inc.
    • Compose + Swarm is essentially legacy, so it is not covered in this talk
    • Mesos
    • by Apache Software Foundation
    • only with Marathon; DC/OS is not included (the scope of the latter is larger)

  3. Chapter 1:
    Core Idea and Architecture

  4. Kubernetes
    • Build the right things with containers by following its concepts and conventions
    • like a “Spring Framework” for the container ecosystem
    • Design
    • master
    • api-server, scheduler, controller-manager
    • node
    • kubelet, kube-proxy
    • independent binaries
    • Pros: modular, transparent, manageable
    • Cons: a little bit complex to set up (1.4 is much better now)
    • network & volume plugins
    • driven by control loops

  5. Kubernetes
    [Diagram: api-server, scheduler and etcd on the master; two nodes, each running kubelet (SyncLoop) and kube-proxy]
    Step 1: Pod created

  6. Kubernetes
    [Same diagram]
    Step 2: Pod object added to etcd

  7. Kubernetes
    [Same diagram]
    Step 3.1: scheduler detects the new pod object
    Step 3.2: scheduler binds the pod to a node

  8. Kubernetes
    [Same diagram]
    Step 4.1: kubelet detects a pod bound to its node
    Step 4.2: kubelet starts the containers in the pod

  9. Kubernetes
    [Diagram: api-server, scheduler, etcd and controller-manager (ControlLoop) on the master; nodes running kubelet (SyncLoop) and kube-proxy]
    Objects driven by control loops: pod, replica, namespace, service, endpoint, job, deployment, volume, petset
    Reconcile: desired world vs. real world, then invoke the handler

  10. Tips: Control Theory*
    *Andrei, Neculai (2005). "Modern Control Theory – A historical Perspective"
    • It’s the basic model for:
    • Kubernetes controller and all other event loops
    • SwarmKit orchestrator
    • …
    [Diagram: a generic control loop]
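    A minimal Go sketch of this reconcile pattern (hypothetical types and names, not the actual Kubernetes or SwarmKit code):

        package main

        import "fmt"

        // state holds the desired world vs. the real world for one replicated workload.
        type state struct {
            desired int // replicas requested in the object's spec
            actual  int // replicas actually running
        }

        // reconcile observes the difference and acts on it; this is the loop body
        // behind Kubernetes controllers and the SwarmKit orchestrator alike.
        func reconcile(s *state) {
            for s.actual < s.desired {
                s.actual++ // stand-in for "create a pod/task"
                fmt.Println("scale up: started one replica")
            }
            for s.actual > s.desired {
                s.actual-- // stand-in for "delete a pod/task"
                fmt.Println("scale down: stopped one replica")
            }
        }

        func main() {
            s := &state{desired: 2, actual: 0}
            reconcile(s) // in the real systems this runs on every observed change
        }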

  11. Docker 1.12+
    • Built-in cluster support for Docker containers
    • powered by SwarmKit
    • SwarmKit Design
    • built-in data store
    • manager
    • several components built into one binary
    • control-loop driven
    • worker
    • uses a pull model to connect to the manager
    WARNING:
    SwarmKit is still a young project; expect this part to change

  12. SwarmKit Manager: API and Store
    [Diagram: manager components API, Store, Orchestrator, Allocator, Scheduler, Dispatcher; entry point $ docker service create]
    • API: accepts commands from the client
    • Creates objects in the raft-based memory store
    • github.com/coreos/etcd/raft for consensus
    • github.com/hashicorp/go-memdb for in-memory object storage
    • objects: state, cluster, node, service, task, network …

  13. SwarmKit Manager: Orchestrator
    [Diagram: Orchestrator expands a Service (replicas=2) into Tasks and checks whether two Tasks actually exist]
    • Creates Tasks from a Service object
    • Task: “start a container”, etc.
    • Reconcile loop over Service objects
    • control theory again

  14. SwarmKit Manager: Allocator
    [Diagram: Allocator highlighted among the manager components; triggered on network create]
    • Allocates IP addresses to Services and Tasks
    • (and will allocate volumes in the future)
    • VIP and ports for a Service
    • IPs for all endpoints (veth pairs) in the network the task is attached to

  15. SwarmKit Manager: Scheduler
    [Diagram: Scheduler highlighted among the manager components]
    • Assigns Tasks to Nodes
    • unassignedTasks
    • nodeHeap
    • searched to find the best node that meets the constraints and has the lightest workload
    • ReadyFilter, ResourceFilter, ConstraintFilter
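    A Go sketch of that filter-then-pick step (simplified to a linear scan over hypothetical node structs instead of the real nodeHeap):

        package main

        import "fmt"

        type node struct {
            name    string
            ready   bool
            freeCPU int // millicores still available
        }

        // filter reports whether a node may run the task; SwarmKit chains
        // ReadyFilter, ResourceFilter and ConstraintFilter the same way.
        type filter func(n node) bool

        // pickNode keeps nodes that pass every filter, then prefers the one
        // with the lightest workload (here: the most free CPU).
        func pickNode(nodes []node, filters []filter) (node, bool) {
            var best node
            found := false
            for _, n := range nodes {
                ok := true
                for _, f := range filters {
                    if !f(n) {
                        ok = false
                        break
                    }
                }
                if ok && (!found || n.freeCPU > best.freeCPU) {
                    best, found = n, true
                }
            }
            return best, found
        }

        func main() {
            nodes := []node{
                {name: "n1", ready: true, freeCPU: 500},
                {name: "n2", ready: true, freeCPU: 1500},
                {name: "n3", ready: false, freeCPU: 4000},
            }
            filters := []filter{
                func(n node) bool { return n.ready },          // ReadyFilter
                func(n node) bool { return n.freeCPU >= 250 }, // ResourceFilter: task wants 250m
            }
            if n, ok := pickNode(nodes, filters); ok {
                fmt.Println("assign task to", n.name) // prints: assign task to n2
            }
        }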

  16. SwarmKit Manager: Dispatcher
    [Diagram: Dispatcher streams Tasks to Agents over gRPC streams]
    • Node (agent) management
    • Dispatches each assigned Task to its corresponding Node

  17. SwarmKit Agent
    [Diagram: Agents, each containing a Worker and an Executor; the executor talks to the Docker Daemon through an adapter via docker.sock]
    • Worker:
    • connects to the Dispatcher to check assigned tasks
    • Executor: executes tasks (containers) on this Node
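    The pull model in one Go sketch (a hypothetical in-process stand-in for the gRPC stream between Dispatcher and Agent):

        package main

        import "fmt"

        // dispatcher knows which tasks are assigned to which node.
        type dispatcher struct{ assigned map[string][]string }

        // tasksFor is the agent-facing call: "what should my node run?"
        func (d *dispatcher) tasksFor(node string) []string { return d.assigned[node] }

        // agentLoop pulls assignments and hands them to the executor;
        // the real agent blocks on a gRPC stream instead of polling.
        func agentLoop(d *dispatcher, node string) {
            for _, task := range d.tasksFor(node) {
                fmt.Println("executor: starting container for task", task)
            }
        }

        func main() {
            d := &dispatcher{assigned: map[string][]string{"worker1": {"web.1", "web.2"}}}
            agentLoop(d, "worker1")
        }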

  18. Mesos 1.0
    • A distributed systems kernel
    • originally designed to run big data jobs
    • core idea: fine-grained resource sharing
    • Mesos Design
    • Master + Slave + Zookeeper
    • two level scheduling
    • scheduler + executor = framework
    • need to use frameworks like Marathon for orchestration and management
    • containerizer
    • multiple container runtime & image support (>=1.0)

  19. Mesos two-level scheduling (step 1)
    [Diagram: MPI and Hadoop frameworks (job + scheduler) above the Mesos master’s allocation module; Mesos slaves below run MPI executors with tasks. The master picks a framework to offer resources to and sends it a resource offer]
    *Adapted from: Operating Systems and Systems Programming, Lecture 24, Anthony D. Joseph, https://cs162.eecs.berkeley.edu/

  20. Mesos two-level scheduling (step 2)
    [Same diagram: the resource offer reaches the MPI scheduler]
    Resource offer = list of (node, availableResources)
    e.g. { (node1, <2 CPUs, 4 GB>), (node2, <3 CPUs, 2 GB>) }
    *Adapted from: Operating Systems and Systems Programming, Lecture 24, Anthony D. Joseph, https://cs162.eecs.berkeley.edu/

  21. Mesos two-level scheduling (step 3)
    [Same diagram: the framework does its framework-specific scheduling and answers the offer with tasks; the master launches and isolates executors on the slaves, here adding a Hadoop executor next to the MPI ones]
    *Adapted from: Operating Systems and Systems Programming, Lecture 24, Anthony D. Joseph, https://cs162.eecs.berkeley.edu/
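    The offer data structure and the framework’s half of the decision, sketched in Go (hypothetical names):

        package main

        import "fmt"

        // resources advertised for one node inside an offer.
        type resources struct {
            cpus  float64
            memGB float64
        }

        // offer mirrors "list of (node, availableResources)" from the slide.
        type offer map[string]resources

        // schedule is the framework side of two-level scheduling: the master
        // decided WHO gets the offer; the framework decides WHAT to run on it,
        // or rejects it and waits for a better one.
        func schedule(o offer, needCPU, needMem float64) (string, bool) {
            for node, r := range o {
                if r.cpus >= needCPU && r.memGB >= needMem {
                    return node, true
                }
            }
            return "", false
        }

        func main() {
            o := offer{
                "node1": {cpus: 2, memGB: 4},
                "node2": {cpus: 3, memGB: 2},
            }
            if node, ok := schedule(o, 1, 3); ok {
                fmt.Println("launch task on", node) // node1 is the only fit
            }
        }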

  22. How does Docker plug into Mesos?
    • Before 1.0
    • Docker Containerizer
    • Docker image -> task -> mesos-docker-executor -> Docker Daemon
    • Mesos 1.0
    • Supports multiple runtimes & images
    • MesosContainerizer
    • “Mesos native container stack”
    • Isolators
    • Launcher
    [Diagram: a Mesos slave running a Hadoop executor’s task alongside a mesos-docker-executor]

  23. Checkpoint
                         Kubernetes               Docker SwarmKit               Mesos+Marathon
    Design               control loops driven     control loops driven          two-level scheduling
                                                  (but in a single binary)
    Coordination         etcd                     built-in raft                 ZooKeeper
    Container Runtime    multiple                 single, but has potential     multiple
                                                  for more OCI runtimes
    Container Image      Docker image, ACI,       Docker image                  Docker image, ACI,
                         more in future                                         more in future
    Docker Daemon        not needed               needed                        not needed

  24. About the Built-In Data Store
    Pros                               Cons
    easy to set up                     hard to understand & debug
    fewer round trips                  hard to do backup/restore, migration, monitoring/audit
    easy to do performance tuning      lacks mgmt APIs (cf. the etcd admin guide)

  25. Chapter 2: Control Plane

  26. Control Plane: Orchestration + Management
    • “Defines when and what to do next throughout the automated workflow”
    • workload management
    • secret management
    • configuration management
    • scale and autoscaling
    • stateful workload
    • … and more

  27. Workload Management
    e.g. “a web server with 2 replicas”
                       Kubernetes       Docker SwarmKit   Mesos+Marathon
    Description        Deployment       Service           Application
    Version Control    yes (revision)   not yet           yes (deployments)

  28. • Kubernetes “Deployment”
    • $ kubectl create -f
    • $ kubectl edit
    • this opens and edits the object stored in etcd
    • the update triggers a rolling update
    • $ kubectl set image
    • $ kubectl scale --replicas=5
    • $ kubectl rollout history
    • $ kubectl rollout undo --to-revision=

  29. • Docker SwarmKit “Service”
    • $ docker service create SERVICE --replicas=5
    • $ docker service scale SERVICE=REPLICAS
    • $ docker service update [OPTIONS] SERVICE
    • rolling update
    • 30+ update options are supported
    • --container-label-add value
    • --container-label-rm value
    • --env-add value
    • --env-rm value
    • --image string
    • …

  30. • Mesos + Marathon “Application”
    • $ dcos marathon app start [--force] []
    • $ dcos marathon app update [--force] […]
    • rolling update
    • app dependencies are respected
    • $ dcos marathon app version list [--max-count=]

    • $ dcos marathon deployment list [--json ]
    • $ dcos marathon deployment rollback

  31. Secret Management
    • Kubernetes
    • Secret volume
    • base64-encoded and stored in etcd
    • consumed by ENV or volume
    • Docker SwarmKit
    • under discussion: https://github.com/docker/swarmkit/issues/1329
    • Mesos + Marathon
    • only in DC/OS
    • stored in ZooKeeper, exposed as ENV in Marathon
    • Another similar feature is Configuration Management

  32. Configuration Management
    • Kubernetes
    • ConfigMap
    • stored in etcd, consumed by ENV or volume
    • $ kubectl create configmap example-redis-config --from-file=docs/redis-config
    • Docker SwarmKit
    • under discussion: https://github.com/docker/swarmkit/issues/1329
    • Mesos + Marathon
    • not yet

  33. Autoscaling
    • Kubernetes
    • HorizontalPodAutoscaler
    • default metric: CPU
    • Custom Metrics:
    • user-defined endpoint, e.g. http://localhost:9100/metrics
    • shares the same metric data structure with CNCF projects like Prometheus
    • Docker SwarmKit
    • not yet: https://github.com/docker/swarmkit/issues/486#issuecomment-219133613
    • Mesos + Marathon
    • a standalone `marathon-autoscale.py` script
    • autoscales applications based on utilization metrics from Mesos
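    The core autoscaling rule is easy to state in code; a simplified Go sketch (the real controller adds tolerances and cool-down windows):

        package main

        import (
            "fmt"
            "math"
        )

        // desiredReplicas follows the HPA rule
        //   ceil(currentReplicas * currentMetric / targetMetric)
        // where the metric is CPU by default or comes from a custom metrics endpoint.
        func desiredReplicas(current int, currentMetric, targetMetric float64) int {
            return int(math.Ceil(float64(current) * currentMetric / targetMetric))
        }

        func main() {
            // 4 replicas running at 90% CPU against a 60% target -> 6 replicas
            fmt.Println(desiredReplicas(4, 90, 60))
        }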

  34. Stateful Workload
    • Kubernetes
    • PetSet: replicas with stable membership and volumes
    • stable hostname
    • ordinal index
    • stable storage
    • Docker SwarmKit
    • not yet, and stateful services are not recommended
    • Mesos + Marathon
    • Stateful Applications
    • dynamic reservations, reservation labels, and persistent volumes
    [Diagram: cassandra-0 with volume 0 at cassandra-0.cassandra.default.svc.cluster.local; cassandra-1 with volume 1 at cassandra-1.cassandra.default.svc.cluster.local]
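    What the stable identity buys you, in a tiny Go sketch built from the DNS pattern above (stableIdentity is a hypothetical helper):

        package main

        import "fmt"

        // stableIdentity derives a member's hostname and DNS name from its
        // ordinal index, so replica i always comes back as the same "pet"
        // attached to the same volume.
        func stableIdentity(set, svc, ns string, ordinal int) (host, dns string) {
            host = fmt.Sprintf("%s-%d", set, ordinal)
            dns = fmt.Sprintf("%s.%s.%s.svc.cluster.local", host, svc, ns)
            return
        }

        func main() {
            for i := 0; i < 2; i++ {
                host, dns := stableIdentity("cassandra", "cassandra", "default", i)
                fmt.Println(host, "->", dns)
            }
        }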

  35. Chapter 3:
    Service Discovery & Load Balance

  36. Service Discovery & LB
    • Kubernetes
    • Load Balancer
    • iptables
    • External Access
    • NodePort: <nodeIP>:<port>
    • External LoadBalancer
    • Ingress (L7)
    • Ingress Pod: Nginx, HAProxy
    • SSL
    • Name Service
    • built-in skyDNS pod
    [Diagram: internal traffic hits the portal iptables rule (10.10.0.116:8001), which random-mode iptables rules spread across Pod 1 and Pod 2; outside traffic for http://foo.bar.com enters through an Ingress Pod on a node]
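    Conceptually the random-mode rules behave like this Go model (a sketch of the behavior, not the actual iptables rules kube-proxy writes):

        package main

        import (
            "fmt"
            "math/rand"
        )

        // pickBackend models the portal rule: each new connection to the
        // service VIP is DNAT'ed to one pod endpoint chosen at random.
        func pickBackend(pods []string) string {
            return pods[rand.Intn(len(pods))]
        }

        func main() {
            pods := []string{"Pod 1", "Pod 2"}
            for i := 0; i < 3; i++ {
                fmt.Println("10.10.0.116:8001 ->", pickBackend(pods))
            }
        }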

  37. Service Discovery & Load Balance
    • Docker SwarmKit
    • Load Balancer
    • ipvs NAT mode
    • External Access
    • Routing Mesh
    • Name Service
    • embedded DNS server (resolves a service name to its VIP)
    • for services and tasks
    • Two kinds of sandboxes
    • ingress: on every worker
    • container: on workers where a task lives
    • Two networks are needed
    • ingress overlay
    • user-defined overlay
    [Diagram: outside traffic (when the service is created with -p) enters any worker through port mapping and iptables into the ingress sandbox, whose ipvs forwards it onward; internal traffic goes through ipvs in the container sandbox; gossip updates the iptables & ipvs rules]

  38. Service Discovery & Load Balance
    • Mesos + Marathon
    • Load Balancer
    • Marathon-lb: HAProxy-based
    • virtual addresses (VIPs) in DC/OS
    • External Access
    • http://<host>:<port> of Marathon-lb
    • external load balancer
    • Name Service
    • Mesos-DNS
    [Diagram: Marathon-lb and Mesos-DNS in front of containers running on the slaves]

  39. Checkpoint
                       Kubernetes                 Docker SwarmKit           Mesos+Marathon
    Filter             iptables VIP               iptables VIP              no need
    LB                 iptables random mode       ipvs NAT mode             HAProxy
    External Access    nodeIP:port, Ingress,      Routing Mesh              same as exposing HAProxy
                       external IP/LB             (ingress overlay)         to the public
    Update             watch etcd                 gossip                    marathon_lb.py & template

  40. Chapter 4: Scheduling

  41. Kubernetes
    • Pod as the scheduling unit
    • this is unique, but why?
    • Multi-Scheduler
    • pod1: scheduler1, pod2: scheduler2
    • QoS tiers
    • anyone remember the core idea of Borg?
    • Guaranteed (requests == limits)
    • Burstable (requests < limits)
    • Best-Effort (no requests & limits)
    • More Borg features are on the way
    • equivalence class, pod-level resource boundary …
    [Diagram: a Burstable pod]
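    The three tiers fall out of a simple comparison of requests and limits; a Go sketch (one resource instead of full CPU+memory, for brevity):

        package main

        import "fmt"

        // qosClass mirrors the Kubernetes tiering: Guaranteed when
        // requests == limits, Best-Effort when neither is set,
        // Burstable for everything in between.
        func qosClass(request, limit int) string {
            switch {
            case request == 0 && limit == 0:
                return "Best-Effort"
            case limit > 0 && request == limit:
                return "Guaranteed"
            default:
                return "Burstable"
            }
        }

        func main() {
            fmt.Println(qosClass(500, 500)) // Guaranteed
            fmt.Println(qosClass(250, 500)) // Burstable
            fmt.Println(qosClass(0, 0))     // Best-Effort
        }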

  42. Docker SwarmKit
    • Task (container) as the scheduling unit
    • Multi-Scheduler
    • not yet
    • Strategy
    • pipeline of filters
    • ReadyFilter, ResourceFilter, ConstraintFilter
    • used to sort the nodeHeap
    • QoS tiers
    • not yet

  43. Mesos + Marathon
    • Task as the scheduling unit (Pod support is planned)
    • Multi-Scheduler
    • Mesos is designed to run multiple frameworks (schedulers)
    • Strategy
    • two-level scheduling (the killer feature of Mesos)
    • Twitter scale …
    • fine-grained resource sharing (like Borg)
    • QoS tiers
    • of course
    • And much more
    • task eviction, data locality, max-min fairness, priority, offer rejection, delay scheduling
    • and Big Data, of course

  44. Chapter 5: Summary

  45. A Use Case: hyper.sh
    • hyper.sh is “Docker Done the Right Way”
    • $ hyper run mysql
    • $ hyper run --link mysql wordpress
    • $ hyper fip attach 22.33.44.55 wordpress
    • But Hyper.sh is powered by Kubernetes
    • and it also keeps Kubernetes features

  46. Extensibility Really Matters
    • Hypernetes (h8s = k8s + HyperContainer) is what’s backing Hyper.sh:
    • HyperContainer runtime
    • Multi-tenant network based on Neutron
    • Custom Cinder plugin with Ceph backend
    • Custom HAproxy based Service
    • Kubernetes is truly extensible and configurable

  47. Just a Personal Opinion
    • So, if
    • I am an individual developer/org looking for something friendly that just works
    • I use Docker SwarmKit
    • I have a “Twitter scale” cluster to manage, or I am a Big Data user
    • I need Mesos
    • But if what I need is an infrastructure layer to build my systems on top of, the right way
    • Kubernetes is the choice

  48. THE END
    @resouer
