Heart of the SwarmKit: Object Model

Stephen Day
October 07, 2016

The design of SwarmKit's object model minimizes problems that commonly occur in distributed orchestration systems. Slides from Docker's Distributed Systems Summit in Berlin.

Docker Swarm Mode (https://docs.docker.com/engine/swarm/)
Docker SwarmKit (https://github.com/docker/swarmkit)
Docker SwarmKit Protobufs/GRPC (https://github.com/docker/swarmkit/tree/master/api)
Borg Paper (http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf)
Raft Consensus Algorithm (https://raft.github.io/)
Control Theory (https://en.wikipedia.org/wiki/Control_theory)

Transcript

  1. Heart of the SwarmKit: Object Model
    Stephen Day
    Docker, Inc.
    Docker Distributed Systems Summit, Berlin
    October 2016
    v1

  2. Stephen Day
    Docker, Inc.
    github.com/stevvooe
    @stevvooe

  3. SwarmKit
    A new framework by Docker for building orchestration systems.

  4. Orchestration
    A control system for your cluster
    [Diagram: feedback control loop with the labels D, O, Δ, Cluster, S_t, and a subtraction node]
    D = Desired State
    O = Orchestrator
    C = Cluster
    S_t = State at time t
    Δ = Operations to converge S to D
    https://en.wikipedia.org/wiki/Control_theory

  5. Convergence
    A functional view
    D = Desired State
    O = Orchestrator
    C = Cluster
    S_t = State at time t
    $f(D, S_{n-1}, C) \to S_n \mid \min(S - D)$
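
The functional view above can be sketched as a small loop. This is only an illustration under stated assumptions: the state is reduced to a single replica count, the cluster C is reduced to a `capacity` number, and `State`, `distance`, `step`, and `converge` are hypothetical names, not SwarmKit APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    replicas: int  # assumption: the whole state is one replica count


def distance(s: State, d: State) -> int:
    """The quantity to minimize, standing in for min(S - D)."""
    return abs(s.replicas - d.replicas)


def step(desired: State, previous: State, capacity: int) -> State:
    """One application of f(D, S_{n-1}, C) -> S_n."""
    # The cluster constrains what is reachable: at most `capacity` replicas.
    target = min(desired.replicas, capacity)
    if previous.replicas < target:
        return State(previous.replicas + 1)
    if previous.replicas > target:
        return State(previous.replicas - 1)
    return previous


def converge(desired: State, start: State, capacity: int, max_steps: int = 100):
    """Apply `step` until the state stops changing; return every state visited."""
    history = [start]
    for _ in range(max_steps):
        nxt = step(desired, history[-1], capacity)
        if nxt == history[-1]:
            break
        history.append(nxt)
    return history
```

When the cluster cannot hold the desired state, the loop settles on the closest reachable state rather than on D itself.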

  6. Observability and Controllability
    The Problem
    [Figure: a scale from Low Observability to High Observability, labeled with Failure, Process State, and User Input]

  7. Data Model Requirements
    - Represent difference in cluster state
    - Maximize Observability
    - Support Convergence
    - Do this while being Extensible and Reliable

  8. Show me your data structures
    and I’ll show you your
    orchestration system

  9. Services
    - Express desired state of the cluster
    - Abstraction to control a set of containers
    - Enumerates resources, network availability, placement
    - Leave the details of runtime to container process
    - Implement these services by distributing processes across a cluster
    [Diagram: Node 1, Node 2, Node 3]

  10. Declarative
    $ docker network create -d overlay backend
    31ue4lvbj4m301i7ef3x8022t
    $ docker service create -p 6379:6379 --network backend redis
    bhk0gw6f0bgrbhmedwt5lful6
    $ docker service scale serene_euler=3
    serene_euler scaled to 3
    $ docker service ls
    ID            NAME          REPLICAS  IMAGE  COMMAND
    dj0jh3bnojtm  serene_euler  3/3       redis

  11. Reconciliation
    Spec → Object
    Object: Current State
    Spec: Desired State

  12. Task Model
    Atomic Scheduling Unit of SwarmKit
    [Diagram: Orchestrator with Object (Current State) and Spec (Desired State), producing Task 0, Task 1, … Task n for the Scheduler]
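
One way to picture the orchestrator turning a spec into tasks is a slot-filling pass. The sketch below is hypothetical: `ServiceSpec.replicas`, `Task.slot`, and `reconcile` are simplified stand-ins chosen for illustration, and the real orchestrator handles far more (updates, restarts, failures).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    replicas: int  # desired state: how many tasks should be running


@dataclass(frozen=True)
class Task:
    service: str
    slot: int  # position of this task within the service, 1-based here


def reconcile(spec: ServiceSpec, tasks: list) -> tuple:
    """Compare the spec with existing tasks; return (to_create, to_shut_down)."""
    mine = [t for t in tasks if t.service == spec.name]
    wanted = set(range(1, spec.replicas + 1))
    used = {t.slot for t in mine}
    # Empty slots get a new task; tasks in slots beyond the spec are removed.
    to_create = [Task(spec.name, s) for s in sorted(wanted - used)]
    to_shut_down = [t for t in mine if t.slot not in wanted]
    return to_create, to_shut_down
```

Calling `reconcile` again after applying its result returns two empty lists, which is the converged case.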

  13. Task Model
    Runtime
    Prepare: set up resources
    Start: start the task
    Wait: wait until the task exits
    Shutdown: stop the task cleanly
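
The four runtime phases can be read as an interface that a worker drives in order. The following is a minimal sketch, not SwarmKit's agent code: `RecordingController` and `run_task` are hypothetical, and calling shutdown right after wait returns is an assumption made to keep the example short.

```python
class RecordingController:
    """Hypothetical runtime stand-in that only records which phase ran."""

    def __init__(self, exit_code: int = 0):
        self.exit_code = exit_code
        self.calls = []

    def prepare(self):
        # Set up resources (for a container runtime: images, volumes, networks).
        self.calls.append("prepare")

    def start(self):
        self.calls.append("start")

    def wait(self) -> int:
        # Block until the task exits and report its exit code.
        self.calls.append("wait")
        return self.exit_code

    def shutdown(self):
        self.calls.append("shutdown")


def run_task(controller) -> int:
    """Drive one task through prepare, start, wait, and shutdown."""
    controller.prepare()
    controller.start()
    try:
        return controller.wait()
    finally:
        # Clean up even if wait() raises.
        controller.shutdown()
```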

  14. Service Spec
    Protobuf Example
    message ServiceSpec {
      // Task defines the task template this service will spawn.
      TaskSpec task = 2 [(gogoproto.nullable) = false];
      // UpdateConfig controls the rate and policy of updates.
      UpdateConfig update = 6;
      // Service endpoint specifies the user provided configuration
      // to properly discover and load balance a service.
      EndpointSpec endpoint = 8;
    }

  15. Service Object
    Protobuf Example
    message Service {
      ServiceSpec spec = 3;
      // UpdateStatus contains the status of an update, if one is in
      // progress.
      UpdateStatus update_status = 5;
      // Runtime state of service endpoint. This may be different
      // from the spec version because the user may not have entered
      // the optional fields like node_port or virtual_ip and it
      // could be auto allocated by the system.
      Endpoint endpoint = 4;
    }
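
The comment on `endpoint` describes the split between what the user wrote and what the system filled in. A rough sketch of that idea follows, assuming simplified stand-in types: `EndpointSpec`, `Endpoint`, `Service`, and `allocate` here are hypothetical Python classes, not the generated protobuf types, and picking the lowest free port is an arbitrary choice.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EndpointSpec:
    target_port: int
    node_port: Optional[int] = None  # optional: the user may leave it unset


@dataclass(frozen=True)
class Endpoint:
    target_port: int
    node_port: int  # always concrete once the system has allocated it


@dataclass(frozen=True)
class Service:
    spec: EndpointSpec                   # desired state, written by the user
    endpoint: Optional[Endpoint] = None  # runtime state, written by the system


def allocate(service: Service, free_ports: set) -> Service:
    """Fill in the runtime endpoint without touching the user's spec."""
    port = service.spec.node_port
    if port is None:
        port = min(free_ports)  # assumption: take the lowest free port
    return replace(service, endpoint=Endpoint(service.spec.target_port, port))
```

Because the spec is never rewritten, the user's original intent (no port chosen) stays visible next to the value the system picked.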

  16. Data Flow
    [Diagram: nested objects flowing between Manager and Worker. ServiceSpec contains TaskSpec; Service contains ServiceSpec and TaskSpec; Task contains TaskSpec.]

  17. Consistency

  18. Field Ownership
    Consistency
    Only one component of the system can write to a field
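
The single-writer rule can be enforced mechanically. The sketch below is an illustration only: `OwnedRecord` and `OwnershipError` are hypothetical, and the field-to-owner mapping in the usage note is an assumption made for the example, not taken from the slides.

```python
class OwnershipError(Exception):
    """Raised when a component writes a field it does not own."""


class OwnedRecord:
    """A record in which every field has exactly one writer."""

    def __init__(self, owners: dict):
        self._owners = dict(owners)  # field name -> owning component
        self._values = {}

    def write(self, component: str, field: str, value) -> None:
        owner = self._owners.get(field)
        if owner != component:
            raise OwnershipError(
                f"{component!r} may not write {field!r} (owner: {owner!r})"
            )
        self._values[field] = value

    def read(self, field: str):
        # Reads are unrestricted; only writes are owned.
        return self._values.get(field)
```

For example, with `OwnedRecord({"node_id": "scheduler", "desired_state": "orchestrator"})`, a `write("scheduler", "node_id", "node-1")` succeeds while the same write from `"orchestrator"` raises `OwnershipError`. With one writer per field, two components never race on the same value.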

  19. TaskSpec
    Protobuf Examples
    message TaskSpec {
      oneof runtime {
        NetworkAttachmentSpec attachment = 8;
        ContainerSpec container = 1;
      }
      // Resource requirements for the container.
      ResourceRequirements resources = 2;
      // RestartPolicy specifies what to do when a task fails or finishes.
      RestartPolicy restart = 4;
      // Placement specifies node selection constraints
      Placement placement = 5;
      // Networks specifies the list of network attachment
      // configurations (which specify the network and per-network
      // aliases) that this task spec is bound to.
      repeated NetworkAttachmentConfig networks = 7;
    }

  20. Task
    Protobuf Example
    message Task {
      TaskSpec spec = 3;
      string service_id = 4;
      uint64 slot = 5;
      string node_id = 6;
      TaskStatus status = 9;
      TaskState desired_state = 10;
      repeated NetworkAttachment networks = 11;
      Endpoint endpoint = 12;
      Driver log_driver = 13;
    }
    Owner legend: User, Orchestrator, Allocator, Scheduler, Shared

  21. Task State
    [Diagram: states in sequence: New, Allocated, Assigned, Preparing, Ready, Starting, Running. Terminal States: Complete, Shutdown, Failed, Rejected. Region labels: Manager, Worker, Pre-Run.]

  22. Field Handoff
    Task Status
    State        Owner
    < Assigned   Manager
    >= Assigned  Worker
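
The handoff rule reads naturally as a comparison on an ordered enum. This is a sketch under assumptions: the numeric values are illustrative rather than SwarmKit's actual enum values, `status_owner` and `can_advance` are hypothetical helpers, and the forward-only rule in `can_advance` is an assumption of this example.

```python
from enum import IntEnum


class TaskState(IntEnum):
    # Illustrative values; only the relative order matters here.
    NEW = 0
    ALLOCATED = 1
    ASSIGNED = 2
    PREPARING = 3
    READY = 4
    STARTING = 5
    RUNNING = 6
    COMPLETE = 7
    SHUTDOWN = 8
    FAILED = 9
    REJECTED = 10


TERMINAL = {
    TaskState.COMPLETE,
    TaskState.SHUTDOWN,
    TaskState.FAILED,
    TaskState.REJECTED,
}


def status_owner(state: TaskState) -> str:
    """Who may write the task status: manager below Assigned, worker from Assigned on."""
    return "manager" if state < TaskState.ASSIGNED else "worker"


def can_advance(current: TaskState, new: TaskState) -> bool:
    """Assumed rule: a task only moves forward and never leaves a terminal state."""
    return current not in TERMINAL and new > current
```

Ordering the states lets a single comparison decide ownership, so no extra "owner" field needs to be stored or kept consistent.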

  23. Observability and Controllability
    The Problem
    [Figure: a scale from Low Observability to High Observability, labeled with Failure, Process State, and User Input]

  24. Orchestration
    A control system for your cluster
    [Diagram: feedback control loop with the labels D, O, Δ, Cluster, S_t, and a subtraction node]
    D = Desired State
    O = Orchestrator
    C = Cluster
    S_t = State at time t
    Δ = Operations to converge S to D
    https://en.wikipedia.org/wiki/Control_theory

  25. Task Model
    Atomic Scheduling Unit of SwarmKit
    [Diagram: Orchestrator with Object (Current State) and Spec (Desired State), producing Task 0, Task 1, … Task n for the Scheduler]

  26. SwarmKit doesn’t Quit

  27. Links
    Documentation
    - Docker Swarm Mode
    Source Code
    - SwarmKit
    - SwarmKit Protobuf/GRPC
    Interesting Topics
    - Borg Paper
    - Raft Consensus Algorithm
    - Control Theory

  28. THANK YOU
