Slide 1

Slide 1 text

SwarmKit: Docker’s Simple Model for Complex Orchestration
Stephen Day
Docker, Inc.
ContainerCon+LinuxCon EU
October 2016
v1

Slide 2

Slide 2 text

Stephen Day
Docker, Inc.
github.com/stevvooe
@stevvooe

Slide 3

Slide 3 text

SwarmKit
A new framework by Docker for building orchestration systems.

Slide 4

Slide 4 text

What is orchestration?

Slide 5

Slide 5 text

Orchestration
A Docker-oriented History
- Orchestration Systems
  - Mostly use a service-based model
  - Mostly wrap Docker
  - Challenging to use
- Standalone Swarm, July 2015
  - Scale containers
  - Docker API “native”
  - No higher-level abstraction

Slide 6

Slide 6 text

Why orchestration?

Slide 7

Slide 7 text

Example

    $ docker network create -d overlay backend
    31ue4lvbj4m301i7ef3x8022t
    $ docker service create -p 6379:6379 --network backend redis
    bhk0gw6f0bgrbhmedwt5lful6
    $ docker service scale serene_euler=3
    serene_euler scaled to 3
    $ docker service ls
    ID            NAME          REPLICAS  IMAGE  COMMAND
    dj0jh3bnojtm  serene_euler  3/3       redis

Slide 8

Slide 8 text

Why services? What is wrong with containers?

Slide 9

Slide 9 text

Nodes
- Arbitrary cluster resources
- Connected across a common network
- Topology control
- Cryptographic identity
(Diagram: Node 1, Node 2, Node 3)

Slide 10

Slide 10 text

Services
- Express desired state of the cluster
- Abstraction to control a set of containers
- Enumerates resources, network availability, placement
- Leave the details of runtime to container process
- Implement these services by distributing processes across a cluster
(Diagram: Node 1, Node 2, Node 3)

Slide 11

Slide 11 text

Networks
- Defines broadcast domains
- Services can attach to networks
- Routing mesh will route connections to active service process
(Diagram: Node 1, Node 2, Node 3)

Slide 12

Slide 12 text

Simple systems can exhibit complex behavior

Slide 13

Slide 13 text

Orchestration
A control system for your cluster
(Diagram labels: D, O, Δ, Cluster, S_n)
D = Desired State
O = Orchestrator
C = Cluster
S_t = State at time t
Δ = Operations to converge S to D
https://en.wikipedia.org/wiki/Control_theory

Slide 14

Slide 14 text

Convergence
A functional view
D = Desired State
O = Orchestrator
C = Cluster
S_t = State at time t

$f(D, S_{n-1}, C) \to S_n \mid \min(S - D)$
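The functional view above can be sketched as a small step function. This is only an illustration under simplifying assumptions: cluster state is reduced to a map of service name to replica count, the cluster argument C is omitted, and the names State, Op, diff, and step are hypothetical, not SwarmKit APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a service name to its replica count. It is a deliberately
// tiny stand-in for real cluster state (assumption for this sketch).
type State map[string]int

// Op is one operation in Δ: change a service's replica count by Delta.
type Op struct {
	Service string
	Delta   int
}

// diff computes Δ, the operations needed to move current toward desired.
func diff(desired, current State) []Op {
	var ops []Op
	for name, want := range desired {
		if have := current[name]; have != want {
			ops = append(ops, Op{Service: name, Delta: want - have})
		}
	}
	// Services that are running but no longer desired are scaled to zero.
	for name, have := range current {
		if _, ok := desired[name]; !ok && have != 0 {
			ops = append(ops, Op{Service: name, Delta: -have})
		}
	}
	sort.Slice(ops, func(i, j int) bool { return ops[i].Service < ops[j].Service })
	return ops
}

// step plays the role of f(D, S_{n-1}) → S_n: it applies Δ to a copy of
// the previous state and returns the next state.
func step(desired, prev State) State {
	next := State{}
	for name, count := range prev {
		next[name] = count
	}
	for _, op := range diff(desired, prev) {
		next[op.Service] += op.Delta
		if next[op.Service] == 0 {
			delete(next, op.Service)
		}
	}
	return next
}

func main() {
	desired := State{"redis": 3}
	current := State{"redis": 1, "old": 2}
	fmt.Println("ops:", diff(desired, current))
	fmt.Println("next:", step(desired, current))
}
```

In a real cluster each operation can fail or only partially apply, so a loop would call step repeatedly against observed state rather than assume one pass converges.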

Slide 15

Slide 15 text

Observability and Controllability
The Problem
(Diagram labels: Low Observability, High Observability, Failure, Process State, User Input)

Slide 16

Slide 16 text

Data Model Requirements
- Represent difference in cluster state
- Maximize Observability
- Support Convergence
- Do this while being Extensible and Reliable

Slide 17

Slide 17 text

Show me your data structures and I’ll show you your orchestration system

Slide 18

Slide 18 text

Declarative

Slide 19

Slide 19 text

Declarative

    $ docker network create -d overlay backend
    31ue4lvbj4m301i7ef3x8022t
    $ docker service create --network backend redis
    bhk0gw6f0bgrbhmedwt5lful6

Slide 20

Slide 20 text

Reconciliation
Spec → Object
- Object: Current State
- Spec: Desired State

Slide 21

Slide 21 text

Task Model
Atomic Scheduling Unit of SwarmKit
- Object: Current State
- Spec: Desired State
(Diagram labels: Orchestrator, Task 0, Task 1, …, Task n, Scheduler)
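One way to picture the task model is an orchestrator that creates a task for every unfilled slot of a service and a scheduler that assigns those tasks to nodes. The sketch below is illustrative only: the Task struct is a trimmed, hypothetical version of the real message, and orchestrate, schedule, and the round-robin placement are assumptions, not SwarmKit's actual algorithms.

```go
package main

import "fmt"

// Task is a hypothetical, trimmed-down scheduling unit.
type Task struct {
	ServiceID string
	Slot      int
	NodeID    string // empty until the scheduler assigns a node
}

// orchestrate returns the tasks that must be created so that every slot
// 1..replicas of the service has a task.
func orchestrate(serviceID string, replicas int, existing []Task) []Task {
	filled := make(map[int]bool)
	for _, t := range existing {
		if t.ServiceID == serviceID {
			filled[t.Slot] = true
		}
	}
	var created []Task
	for slot := 1; slot <= replicas; slot++ {
		if !filled[slot] {
			created = append(created, Task{ServiceID: serviceID, Slot: slot})
		}
	}
	return created
}

// schedule assigns every unassigned task to a node, round-robin.
// Real schedulers weigh resources and constraints; this one does not.
func schedule(tasks []Task, nodes []string) {
	if len(nodes) == 0 {
		return
	}
	next := 0
	for i := range tasks {
		if tasks[i].NodeID == "" {
			tasks[i].NodeID = nodes[next%len(nodes)]
			next++
		}
	}
}

func main() {
	existing := []Task{{ServiceID: "redis", Slot: 1, NodeID: "node-1"}}
	created := orchestrate("redis", 3, existing)
	schedule(created, []string{"node-2", "node-3"})
	for _, t := range created {
		fmt.Printf("%s.%d -> %s\n", t.ServiceID, t.Slot, t.NodeID)
	}
}
```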

Slide 22

Slide 22 text

Consistency

Slide 23

Slide 23 text

Versioned Updates
Consistency

    service := getCurrentService()

    spec := service.Spec
    spec.Image = "my.serv/myimage:mytag"

    update(spec, service.Version)
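The read, modify, update-with-version pattern on this slide resembles optimistic concurrency control. A minimal runnable sketch follows, assuming an in-memory store; Store, Get, Update, and ErrOutOfSequence are hypothetical names chosen for illustration and are not taken from the SwarmKit API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Spec and Service are trimmed, hypothetical versions of the real objects.
type Spec struct{ Image string }

type Service struct {
	Spec    Spec
	Version uint64
}

// ErrOutOfSequence is returned when the caller's version is stale.
var ErrOutOfSequence = errors.New("update out of sequence")

// Store holds a single service and guards it with a mutex.
type Store struct {
	mu      sync.Mutex
	service Service
}

// Get returns a copy of the current service, including its version.
func (s *Store) Get() Service {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.service
}

// Update applies spec only if version matches the stored version,
// then bumps the version so later stale writers are rejected.
func (s *Store) Update(spec Spec, version uint64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if version != s.service.Version {
		return ErrOutOfSequence
	}
	s.service.Spec = spec
	s.service.Version++
	return nil
}

func main() {
	store := &Store{service: Service{Spec: Spec{Image: "redis"}, Version: 1}}

	service := store.Get()
	spec := service.Spec
	spec.Image = "my.serv/myimage:mytag"

	fmt.Println("first update:", store.Update(spec, service.Version))
	// Reusing the old version simulates a second writer with a stale read.
	fmt.Println("stale update:", store.Update(spec, service.Version))
}
```

A caller that receives the out-of-sequence error would typically re-read the service, reapply its change to the fresh spec, and retry.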

Slide 24

Slide 24 text

Field Ownership
Consistency
Only one component of the system can write to a field

Slide 25

Slide 25 text

Task State
(Diagram groups: Manager, Worker, Pre-Run, Terminal States)
States: New, Allocated, Assigned, Preparing, Ready, Starting, Running, Complete, Shutdown, Failed, Rejected
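The states named on this slide can be modeled as an ordered enumeration in which a task only ever moves forward. The sketch below assumes that ordering and that forward-only rule for illustration; the TaskState type, the advance helper, and the choice of Complete, Shutdown, Failed, and Rejected as the terminal group are assumptions, not the SwarmKit definitions.

```go
package main

import "fmt"

// TaskState lists the states from the slide in an assumed order.
type TaskState int

const (
	New TaskState = iota
	Allocated
	Assigned
	Preparing
	Ready
	Starting
	Running
	Complete
	Shutdown
	Failed
	Rejected
)

// Terminal reports whether a state is one of the assumed terminal states.
func (s TaskState) Terminal() bool { return s >= Complete }

// advance returns next if the transition moves the task forward.
// Moving backward, or leaving a terminal state, is rejected.
func advance(current, next TaskState) (TaskState, error) {
	if current.Terminal() {
		return current, fmt.Errorf("state %d is terminal", current)
	}
	if next <= current {
		return current, fmt.Errorf("cannot move from state %d back to %d", current, next)
	}
	return next, nil
}

func main() {
	state := New
	for _, next := range []TaskState{Assigned, Running, Ready} {
		s, err := advance(state, next)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		state = s
	}
	fmt.Println("final state:", state)
}
```

A forward-only state makes it safe for different components to report status without coordinating, since a late or duplicate report can never roll a task back.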

Slide 26

Slide 26 text

Extensible

Slide 27

Slide 27 text

Task Model
Runtime
- Prepare: set up resources
- Start: start the task
- Wait: wait until task exits
- Shutdown: stop task, cleanly
- Terminate: kill the task, forcefully
- Update: update task metadata, without interruption
- Remove: remove resources used by task
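The runtime operations listed above map naturally onto an interface that any executor could implement. The following is a hedged sketch: the Runtime interface, its method signatures, the recorder fake, and the run and stop helpers are all hypothetical and simplified, not the interface SwarmKit ships.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime is a hypothetical interface mirroring the operations on the slide.
type Runtime interface {
	Prepare() error
	Start() error
	Wait() error
	Shutdown() error
	Terminate() error
	Update(metadata map[string]string) error
	Remove() error
}

// recorder is a fake runtime that records which operations were called.
type recorder struct {
	calls        []string
	failShutdown bool
}

func (r *recorder) record(name string) error {
	r.calls = append(r.calls, name)
	return nil
}

func (r *recorder) Prepare() error                 { return r.record("prepare") }
func (r *recorder) Start() error                   { return r.record("start") }
func (r *recorder) Wait() error                    { return r.record("wait") }
func (r *recorder) Terminate() error               { return r.record("terminate") }
func (r *recorder) Remove() error                  { return r.record("remove") }
func (r *recorder) Update(map[string]string) error { return r.record("update") }

func (r *recorder) Shutdown() error {
	r.record("shutdown")
	if r.failShutdown {
		return errors.New("shutdown timed out")
	}
	return nil
}

// run drives a task through prepare, start, and wait, and removes its
// resources afterwards even if start or wait failed.
func run(rt Runtime) error {
	if err := rt.Prepare(); err != nil {
		return err
	}
	err := rt.Start()
	if err == nil {
		err = rt.Wait()
	}
	if rmErr := rt.Remove(); err == nil {
		err = rmErr
	}
	return err
}

// stop tries a clean shutdown first and falls back to a forceful kill.
func stop(rt Runtime) error {
	if err := rt.Shutdown(); err != nil {
		return rt.Terminate()
	}
	return nil
}

func main() {
	r := &recorder{failShutdown: true}
	fmt.Println(run(r), stop(r), r.calls)
}
```

Keeping the interface this small is what makes the model extensible: anything that can implement these operations can be scheduled as a task, whether or not it is a container.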

Slide 28

Slide 28 text

Reliable

Slide 29

Slide 29 text

SwarmKit doesn’t Quit

Slide 30

Slide 30 text

Architecture Data Structures

Slide 31

Slide 31 text

Service Spec
Protobuf Example

    message ServiceSpec {
        // Task defines the task template this service will spawn.
        TaskSpec task = 2 [(gogoproto.nullable) = false];

        // UpdateConfig controls the rate and policy of updates.
        UpdateConfig update = 6;

        // Service endpoint specifies the user provided configuration
        // to properly discover and load balance a service.
        EndpointSpec endpoint = 8;
    }

Slide 32

Slide 32 text

Service Object
Protobuf Example

    message Service {
        ServiceSpec spec = 3;

        // Runtime state of service endpoint. This may be different
        // from the spec version because the user may not have entered
        // the optional fields like node_port or virtual_ip and it
        // could be auto allocated by the system.
        Endpoint endpoint = 4;

        // UpdateStatus contains the status of an update, if one is in
        // progress.
        UpdateStatus update_status = 5;
    }

Slide 33

Slide 33 text

Task
Protobuf Example

    // Task specifies the parameters for implementing a Spec. A task is effectively
    // immutable and idempotent. Once it is dispatched to a node, it will not be
    // dispatched to another node.
    message Task {
        TaskSpec spec = 3;
        string service_id = 4;
        uint64 slot = 5;
        string node_id = 6;
        TaskStatus status = 9;
        TaskState desired_state = 10;
        repeated NetworkAttachment networks = 11;
        Endpoint endpoint = 12;
        Driver log_driver = 13;
    }

Slide 34

Slide 34 text

Blue Green Deployments
Applied
Sillyproxy
- Uses rolling updates to set proxy backends
- Desired state is encoded in environment variables
- Rolling updates can control traffic between backends

Slide 35

Slide 35 text

Distributed Periodic Scheduler
Applied
From a GitHub comment
- Scheduling criteria set via environment variables
- Can leverage something like redis to do this, as well
- Leverage restarts to dispatch to available nodes

Slide 36

Slide 36 text

Future

Slide 37

Slide 37 text

Links
Documentation
- Docker Swarm Mode
Source Code
- SwarmKit
- SwarmKit Protobuf/gRPC
Interesting Topics
- Borg Paper
- Raft Consensus Algorithm
- Control Theory

Slide 38

Slide 38 text

Booth D38 @ LinuxCon + ContainerCon

Tues Oct 4th
● Build Distributed Systems without Docker, using Docker Plumbing Projects - Patrick Chanezon, David Chung and Captain Phil Estes
● Getting Started with Docker Services - Mike Goelzer
● Swarmkit: Docker’s Simplified Model for Complex Orchestration - Stephen Day
● User Namespace and Seccomp Support in Docker Engine - Paul Novarese
● Build Efficient Parallel Testing Systems with Docker - Docker Captain Laura Frank

Wed Oct 5th
● How Secure is your Container? A Docker Engine Security Update - Phil Estes
● Docker Orchestration: Beyond the Basics - Aaron Lehmann
● When the Going gets Tough, get TUF Going - Riyaz Faizullabhoy and Lily Guo

Thurs Oct 6th
● Orchestrating Linux Containers while Tolerating Failures - Drew Erny
● Unikernels: When you Should and When you Shouldn’t - Amir Chaudhry
● Berlin Docker Meetup

Friday Oct 7th
● Tutorial: Comparing Container Orchestration Tools - Neependra Khare
● Tutorial: Orchestrate Containers in Production at Scale with Docker Swarm - Jerome Petazzoni

Slide 39

Slide 39 text

THANK YOU