Heart of the SwarmKit: Object Model

Stephen Day
October 07, 2016

The design of SwarmKit's object model minimizes problems that commonly occur in distributed orchestration systems. Slides from Docker's Distributed Systems Summit in Berlin.

Docker Swarm Mode (https://docs.docker.com/engine/swarm/)
Docker SwarmKit (https://github.com/docker/swarmkit)
Docker SwarmKit Protobufs/GRPC (https://github.com/docker/swarmkit/tree/master/api)
Borg Paper (http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf)
Raft Consensus Algorithm (https://raft.github.io/)
Control Theory (https://en.wikipedia.org/wiki/Control_theory)

Transcript

  1. Heart of the SwarmKit: Object Model
    Stephen Day
    Docker, Inc.
    Docker Distributed Systems Summit, Berlin
    October 2016
    v1

  2. Stephen Day
    Docker, Inc.
    github.com/stevvooe
    @stevvooe

  3. SwarmKit
    A new framework by Docker for building orchestration systems.

  4. Orchestration
    A control system for your cluster
    [Diagram: feedback control loop; labels: D, O, -, Δ, Cluster, $S_t$]
    D = Desired State
    O = Orchestrator
    C = Cluster
    $S_t$ = State at time t
    Δ = Operations to converge S to D
    https://en.wikipedia.org/wiki/Control_theory

  5. Convergence
    A functional view
    D = Desired State
    O = Orchestrator
    C = Cluster
    $S_t$ = State at time t
    $f(D, S_{n-1}, C) \to S_n \mid \min(S - D)$

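The functional view of convergence can be sketched as a small reconciliation loop. This is an illustration only, not SwarmKit code: the state is modeled as a mapping from service name to replica count, and the names `diff`, `step`, `converge`, and the `budget` parameter are hypothetical. Each pass computes Δ from D and the previous state, lets a stand-in "cluster" apply part of it, and repeats until no operations remain.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]   # assumed model: service name -> replica count
Op = Tuple[str, str]     # ("start" | "stop", service name)


def diff(desired: State, current: State) -> List[Op]:
    """Compute the operations (the slide's delta) that move current toward desired."""
    ops: List[Op] = []
    for name in sorted(set(desired) | set(current)):
        want, have = desired.get(name, 0), current.get(name, 0)
        if want > have:
            ops += [("start", name)] * (want - have)
        elif have > want:
            ops += [("stop", name)] * (have - want)
    return ops


def step(current: State, ops: List[Op], budget: int) -> State:
    """Stand-in for the cluster C: apply at most `budget` operations."""
    nxt = dict(current)
    for kind, name in ops[:budget]:
        nxt[name] = nxt.get(name, 0) + (1 if kind == "start" else -1)
    # Drop services that no longer have any replicas.
    return {k: v for k, v in nxt.items() if v > 0}


def converge(desired: State, current: State, budget: int = 2) -> List[State]:
    """Repeat f(D, S_{n-1}, C) -> S_n until nothing is left to do; return each S_n."""
    if budget < 1:
        raise ValueError("budget must be at least 1 or the loop cannot progress")
    history: List[State] = []
    ops = diff(desired, current)
    while ops:
        current = step(current, ops, budget)
        history.append(current)
        ops = diff(desired, current)
    return history
```

Calling `converge({"redis": 3}, {"redis": 0, "old": 1})` walks through the intermediate states one budgeted step at a time; the loop ends when the difference between observed and desired state is empty.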

  6. Observability and Controllability
    The Problem
    [Figure: scale from Low Observability to High Observability; labels: Failure, Process State, User Input]

  7. Data Model Requirements
    - Represent difference in cluster state
    - Maximize Observability
    - Support Convergence
    - Do this while being Extensible and Reliable

  8. Show me your data structures and I’ll show you your orchestration system

  9. Services
    - Express desired state of the cluster
    - Abstraction to control a set of containers
    - Enumerates resources, network availability, placement
    - Leave runtime details to the container process
    - Implement these services by distributing processes across a cluster
    [Diagram: Node 1, Node 2, Node 3]

  10. Declarative
    $ docker network create -d overlay backend
    31ue4lvbj4m301i7ef3x8022t
    $ docker service create -p 6379:6379 --network backend redis
    bhk0gw6f0bgrbhmedwt5lful6
    $ docker service scale serene_euler=3
    serene_euler scaled to 3
    $ docker service ls
    ID            NAME          REPLICAS  IMAGE  COMMAND
    dj0jh3bnojtm  serene_euler  3/3       redis

  11. Reconciliation
    Spec → Object
    - Object: Current State
    - Spec: Desired State

  12. Task Model
    Atomic Scheduling Unit of SwarmKit
    [Diagram labels: Object (Current State), Spec (Desired State), Orchestrator, Task 0, Task 1, …, Task n, Scheduler]

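One way to picture the task model is as two small functions: an orchestrator that turns a spec into task slots, and a scheduler that places those tasks on nodes. The sketch below is a simplification under stated assumptions: the `image` and `replicas` fields, the zero-based slots, and the round-robin placement are all hypothetical and chosen for brevity, not taken from SwarmKit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceSpec:
    image: str      # assumed field, for illustration
    replicas: int   # assumed field, for illustration


@dataclass
class Task:
    slot: int
    image: str
    node_id: Optional[str] = None   # left empty until the scheduler runs


def orchestrate(spec: ServiceSpec, tasks: List[Task]) -> List[Task]:
    """Return a task list in which slots 0..replicas-1 match the spec."""
    # Keep tasks that still fit the spec; everything else is dropped.
    kept = [t for t in tasks if t.slot < spec.replicas and t.image == spec.image]
    used = {t.slot for t in kept}
    # Create a fresh task for every slot that is not covered yet.
    new = [Task(slot=s, image=spec.image) for s in range(spec.replicas) if s not in used]
    return sorted(kept + new, key=lambda t: t.slot)


def schedule(tasks: List[Task], nodes: List[str]) -> None:
    """Toy scheduler: place unassigned tasks round-robin by slot number."""
    for t in tasks:
        if t.node_id is None:
            t.node_id = nodes[t.slot % len(nodes)]
```

Keeping the two steps separate mirrors the diagram: the orchestrator only decides which tasks should exist, and the scheduler only decides where an existing task runs.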

  13. Task Model
    Runtime
    - Prepare: set up resources
    - Start: start the task
    - Wait: wait until the task exits
    - Shutdown: stop the task cleanly

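The four runtime steps can be read as an interface that any task runtime implements. The following is a minimal sketch with hypothetical names (`Controller`, `RecordingController`, `run_task`); the decision to always call `shutdown` once `prepare` has succeeded is an assumption made here for illustration, not something the slide states.

```python
from abc import ABC, abstractmethod
from typing import List


class Controller(ABC):
    """Hypothetical runtime interface with the four steps from the slide."""

    @abstractmethod
    def prepare(self) -> None:
        """Set up the resources the task needs."""

    @abstractmethod
    def start(self) -> None:
        """Start the task."""

    @abstractmethod
    def wait(self) -> int:
        """Block until the task exits and return its exit code."""

    @abstractmethod
    def shutdown(self) -> None:
        """Stop the task cleanly."""


class RecordingController(Controller):
    """Fake runtime that records the call order instead of running anything."""

    def __init__(self, exit_code: int = 0) -> None:
        self.calls: List[str] = []
        self.exit_code = exit_code

    def prepare(self) -> None:
        self.calls.append("prepare")

    def start(self) -> None:
        self.calls.append("start")

    def wait(self) -> int:
        self.calls.append("wait")
        return self.exit_code

    def shutdown(self) -> None:
        self.calls.append("shutdown")


def run_task(ctl: Controller) -> int:
    """Drive one task through prepare, start, wait, and shutdown."""
    ctl.prepare()
    try:
        ctl.start()
        return ctl.wait()
    finally:
        # Assumed policy: always shut down once prepare has succeeded.
        ctl.shutdown()
```

Because the driver depends only on the interface, a container runtime and a test double like `RecordingController` can be swapped without changing `run_task`.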

  14. Service Spec
    Protobuf Example
    message ServiceSpec {
        // Task defines the task template this service will spawn.
        TaskSpec task = 2 [(gogoproto.nullable) = false];
        // UpdateConfig controls the rate and policy of updates.
        UpdateConfig update = 6;
        // Service endpoint specifies the user provided configuration
        // to properly discover and load balance a service.
        EndpointSpec endpoint = 8;
    }

  15. Service Object
    Protobuf Example
    message Service {
        ServiceSpec spec = 3;
        // UpdateStatus contains the status of an update, if one is in
        // progress.
        UpdateStatus update_status = 5;
        // Runtime state of service endpoint. This may be different
        // from the spec version because the user may not have entered
        // the optional fields like node_port or virtual_ip and it
        // could be auto allocated by the system.
        Endpoint endpoint = 4;
    }

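The comment on `endpoint` describes a split between what the user asked for and what the system actually allocated. A rough sketch of that idea follows; the `target_port` field, the `allocate` function, and the port range are assumptions for illustration and are not taken from the SwarmKit API.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class EndpointSpec:
    target_port: int                  # assumed field, for illustration
    node_port: Optional[int] = None   # None means "let the system choose"


@dataclass(frozen=True)
class Endpoint:
    spec: EndpointSpec   # what the user asked for, kept unchanged
    node_port: int       # what was actually allocated


def allocate(spec: EndpointSpec, in_use: Set[int],
             lo: int = 30000, hi: int = 32767) -> Endpoint:
    """Fill in the runtime endpoint; the default port range is an assumption."""
    if spec.node_port is not None:
        if spec.node_port in in_use:
            raise ValueError(f"port {spec.node_port} is already in use")
        return Endpoint(spec=spec, node_port=spec.node_port)
    # The user left the field empty, so pick the first free port.
    for port in range(lo, hi + 1):
        if port not in in_use:
            return Endpoint(spec=spec, node_port=port)
    raise RuntimeError("no free port in the configured range")
```

The spec is never modified: the allocated value lives on the object next to it, so the user's intent and the system's decision stay distinguishable.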

  16. Data Flow
    [Diagram labels: Manager, Worker; ServiceSpec containing TaskSpec; Service containing ServiceSpec and TaskSpec; Task containing TaskSpec; Task, Task]

  17. Field Ownership
    Consistency
    Only one component of the system can write to a field

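The single-writer rule can be made concrete with a record that checks the writer on every update. This is a toy sketch: `OwnedRecord` and `OwnershipError` are hypothetical names, and the field-to-owner mapping passed in is made up for illustration.

```python
from typing import Any, Dict


class OwnershipError(Exception):
    """Raised when a component writes to a field it does not own."""


class OwnedRecord:
    """Record in which each field accepts writes from exactly one component."""

    def __init__(self, owners: Dict[str, str]) -> None:
        self._owners = dict(owners)          # field name -> owning component
        self._values: Dict[str, Any] = {}

    def write(self, component: str, field: str, value: Any) -> None:
        owner = self._owners.get(field)
        if owner is None:
            raise KeyError(field)
        if owner != component:
            raise OwnershipError(
                f"{component} may not write {field}; owner is {owner}"
            )
        self._values[field] = value

    def read(self, field: str) -> Any:
        # Reads are unrestricted; only writes are owned.
        return self._values.get(field)
```

With one writer per field, two components can never race on the same value, which is the consistency property the slide points at.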

  18. TaskSpec
    Protobuf Examples
    message TaskSpec {
        oneof runtime {
            NetworkAttachmentSpec attachment = 8;
            ContainerSpec container = 1;
        }
        // Resource requirements for the container.
        ResourceRequirements resources = 2;
        // RestartPolicy specifies what to do when a task fails or finishes.
        RestartPolicy restart = 4;
        // Placement specifies node selection constraints
        Placement placement = 5;
        // Networks specifies the list of network attachment
        // configurations (which specify the network and per-network
        // aliases) that this task spec is bound to.
        repeated NetworkAttachmentConfig networks = 7;
    }

  19. Task
    Protobuf Example
    message Task {
        TaskSpec spec = 3;
        string service_id = 4;
        uint64 slot = 5;
        string node_id = 6;
        TaskStatus status = 9;
        TaskState desired_state = 10;
        repeated NetworkAttachment networks = 11;
        Endpoint endpoint = 12;
        Driver log_driver = 13;
    }
    Owner legend: User, Orchestrator, Allocator, Scheduler, Shared

  20. Task State
    States, in order: New, Allocated, Assigned, Preparing, Ready, Starting, Running
    Terminal States: Complete, Shutdown, Failed, Rejected
    [Diagram labels: Manager, Worker, Pre-Run]

  21. Field Handoff
    Task Status
    State         Owner
    < Assigned    Manager
    >= Assigned   Worker

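The handoff table reduces to a comparison on an ordered state enum. The sketch below assumes the states are ordered as listed on the Task State slide; the numeric values and the function names are hypothetical, and only the relative order matters.

```python
from enum import IntEnum


class TaskState(IntEnum):
    # Numeric values are assumptions; only the relative order is used.
    NEW = 0
    ALLOCATED = 1
    ASSIGNED = 2
    PREPARING = 3
    READY = 4
    STARTING = 5
    RUNNING = 6
    COMPLETE = 7
    SHUTDOWN = 8
    FAILED = 9
    REJECTED = 10


def status_owner(state: TaskState) -> str:
    """Return which side owns the task status at the given state."""
    return "manager" if state < TaskState.ASSIGNED else "worker"


def may_write_status(component: str, current: TaskState) -> bool:
    """Check whether a component is allowed to update the status right now."""
    return component == status_owner(current)
```

Ownership of the status field therefore changes exactly once, when the task reaches Assigned, so the single-writer rule still holds at every point in time.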

  22. Observability and Controllability
    The Problem
    [Figure: scale from Low Observability to High Observability; labels: Failure, Process State, User Input]

  23. Orchestration
    A control system for your cluster
    [Diagram: feedback control loop; labels: D, O, -, Δ, Cluster, $S_t$]
    D = Desired State
    O = Orchestrator
    C = Cluster
    $S_t$ = State at time t
    Δ = Operations to converge S to D
    https://en.wikipedia.org/wiki/Control_theory

  24. Task Model
    Atomic Scheduling Unit of SwarmKit
    [Diagram labels: Object (Current State), Spec (Desired State), Orchestrator, Task 0, Task 1, …, Task n, Scheduler]

  25. SwarmKit doesn’t Quit

  26. Links
    Documentation
    - Docker Swarm Mode
    Source Code
    - SwarmKit
    - SwarmKit Protobuf/GRPC
    Interesting Topics
    - Borg Paper
    - Raft Consensus Algorithm
    - Control Theory
