Operator
Datacenter
PYTHON
PYTHON
GOLANG
GOLANG
GOLANG
Skywalker Vader Leia Solo
Slide 14
Slide 14 text
Operator
Datacenter
RUBY PYTHON
PYTHON
PYTHON
GOLANG
GOLANG
GOLANG
GOLANG
NODE
Skywalker Vader Leia Solo
Slide 15
Slide 15 text
Operator
Datacenter
RUBY PYTHON
PYTHON
PYTHON
GOLANG
GOLANG
GOLANG
GOLANG
NODE
Skywalker Vader Leia Solo
RUBY | 192.168.1.4 | 88:45:13:B6:87:C4
VADER | 192.168.1.5 | 94:CE:4F:C8:54:C3
LEIA | 192.168.1.7 | CA:9A:3D:7F:8B:CB
SOLO | 192.168.1.253 | 72:30:9C:0D:1E:74
Randomly kills applications
Slide 16
Slide 16 text
Operator
Datacenter
RUBY PYTHON
PYTHON
PYTHON
GOLANG
GOLANG
GOLANG
GOLANG
NODE
Skywalker Leia Solo
RUBY | 192.168.1.4 | 88:45:13:B6:87:C4
VADER | 192.168.1.5 | 94:CE:4F:C8:54:C3
LEIA | 192.168.1.7 | CA:9A:3D:7F:8B:CB
SOLO | 192.168.1.253 | 72:30:9C:0D:1E:74
Randomly kills applications
F
F
Vader
Slide 17
Slide 17 text
Operator
Datacenter
RUBY PYTHON
PYTHON
PYTHON
GOLANG
GOLANG
GOLANG
GOLANG
NODE
Skywalker Leia Solo
RUBY | 192.168.1.4 | 88:45:13:B6:87:C4
VADER | 192.168.1.5 | 94:CE:4F:C8:54:C3
LEIA | 192.168.1.7 | CA:9A:3D:7F:8B:CB
SOLO | 192.168.1.253 | 72:30:9C:0D:1E:74
Randomly kills applications
F
F
Vader
PYTHON
PYTHON
PYTHON
Slide 18
Slide 18 text
Operator
Datacenter
RUBY GOLANG
GOLANG
GOLANG
GOLANG
NODE
Skywalker Leia Solo
RUBY | 192.168.1.4 | 88:45:13:B6:87:C4
VADER | 192.168.1.5 | 94:CE:4F:C8:54:C3
LEIA | 192.168.1.7 | CA:9A:3D:7F:8B:CB
SOLO | 192.168.1.253 | 72:30:9C:0D:1E:74
Randomly kills applications
Vader
PYTHON
PYTHON
PYTHON
Slide 19
Slide 19 text
Operator
Datacenter
RUBY GOLANG
GOLANG
GOLANG
GOLANG
NODE
Skywalker Leia Solo
RUBY | 192.168.1.4 | 88:45:13:B6:87:C4
VADER | 192.168.1.9 | 94:CE:4F:C8:54:C3
LEIA | 192.168.1.7 | CA:9A:3D:7F:8B:CB
SOLO | 192.168.1.253 | 72:30:9C:0D:1E:74
Rebuilt on 04/20/2016
Vader
PYTHON
PYTHON
PYTHON
Slide 20
Slide 20 text
Operator
Datacenter
RUBY GOLANG
GOLANG
GOLANG
GOLANG
NODE
Skywalker Leia Solo
RUBY | 192.168.1.4 | 88:45:13:B6:87:C4
VADER | 192.168.1.9 | 94:CE:4F:C8:54:C3
LEIA | 192.168.1.7 | CA:9A:3D:7F:8B:CB
SOLO | 192.168.1.253 | 72:30:9C:0D:1E:74
Rebuilt on 04/20/2016
Vader
PYTHON
PYTHON
PYTHON
Slide 21
Slide 21 text
This does not scale
Slide 22
Slide 22 text
CPU Scheduler
CORE
CORE
CORE
CORE
CPU
SCHEDULER
KERNEL
APACHE
REDIS
BASH
Slide 23
Slide 23 text
CPU Scheduler
CORE
CORE
CPU
SCHEDULER
KERNEL
APACHE
REDIS
BASH
Slide 24
Slide 24 text
Schedulers in the Wild
Type | Work | Resources
CPU Scheduler | Threads | Physical Cores
EC2 / Nova | Virtual Machines | Hypervisors
Hadoop YARN | MapReduce Jobs | Client Nodes
Cluster Scheduler | Applications | Machines
Slide 25
Slide 25 text
Scheduler Advantages
Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service
Slide 26
Slide 26 text
Scheduler Advantages
Bin Packing
Over-Subscription
Job Queueing
Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service
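Bin packing can be illustrated with a small Go sketch. This is only a one-dimensional best-fit heuristic under stated assumptions: the `Node` type, the `bestFit` helper, and the memory figures are hypothetical and are not taken from the slides or from any real scheduler.

```go
package main

import "fmt"

// Node is a hypothetical machine with some free memory, in MB.
type Node struct {
	Name   string
	FreeMB int
}

// bestFit returns the index of the node that can hold askMB and has the
// least free memory (the tightest fit), or -1 if no node can hold the task.
func bestFit(nodes []Node, askMB int) int {
	best := -1
	for i, n := range nodes {
		if n.FreeMB < askMB {
			continue // the task does not fit on this node
		}
		if best == -1 || n.FreeMB < nodes[best].FreeMB {
			best = i
		}
	}
	return best
}

func main() {
	nodes := []Node{{"skywalker", 4096}, {"leia", 1024}, {"solo", 2048}}
	for _, ask := range []int{512, 512, 1024} {
		i := bestFit(nodes, ask)
		if i < 0 {
			fmt.Println("no node can fit", ask, "MB")
			continue
		}
		nodes[i].FreeMB -= ask
		fmt.Printf("placed %d MB on %s (free now %d MB)\n", ask, nodes[i].Name, nodes[i].FreeMB)
	}
}
```

Filling the tightest node first tends to leave the larger nodes free for larger tasks, which is one way packing raises utilization.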
Slide 27
Slide 27 text
Scheduler Advantages
Abstraction
API Contracts
Standardization
Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service
Slide 28
Slide 28 text
Scheduler Advantages
Priorities
Resource Isolation
Pre-emption
Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service
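Priorities can be sketched as a priority queue over pending jobs. The Go example below uses the standard `container/heap` package. The `Job` type, the `drain` helper, and the priority numbers are illustrative assumptions, and the sketch covers ordering only, not pre-emption or resource isolation.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Job is a hypothetical unit of queued work; a higher Priority runs first.
type Job struct {
	Name     string
	Priority int
}

// jobQueue implements heap.Interface as a max-heap on Priority.
type jobQueue []Job

func (q jobQueue) Len() int            { return len(q) }
func (q jobQueue) Less(i, j int) bool  { return q[i].Priority > q[j].Priority }
func (q jobQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *jobQueue) Push(x interface{}) { *q = append(*q, x.(Job)) }
func (q *jobQueue) Pop() interface{} {
	old := *q
	n := len(old)
	j := old[n-1]
	*q = old[:n-1]
	return j
}

// drain queues every job and returns the names in the order they would run.
func drain(jobs []Job) []string {
	q := &jobQueue{}
	for _, j := range jobs {
		heap.Push(q, j)
	}
	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(Job).Name)
	}
	return order
}

func main() {
	fmt.Println(drain([]Job{{"batch-report", 20}, {"web", 80}, {"cache", 50}}))
}
```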
Nomad
Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service
Slide 38
Slide 38 text
Designing Nomad
Slide 39
Slide 39 text
Nomad
Multi-Datacenter
Multi-Region
Flexible Workloads
Job Priorities
Bin Packing
Large Scale
Operationally Simple
Slide 40
Slide 40 text
Scaling Requirements
Thousands of regions
Tens of thousands of clients per region
Thousands of jobs per region
Slide 41
Slide 41 text
Built on Experience
GOSSIP CONSENSUS
Slide 42
Slide 42 text
Serf
Cluster Management
Gossip Based (P2P)
Membership
Failure Detection
Event System
Slide 43
Slide 43 text
Serf
Gossip Protocol
Large Scale
Production Hardened
Operationally Simple
Slide 44
Slide 44 text
Consul
Service Discovery
Configuration
Coordination (Locking)
Central Servers + Distributed Clients
Slide 45
Slide 45 text
Consul
Multi-Datacenter
Raft Consensus
Large Scale
Production Hardened
Slide 46
Slide 46 text
Built on Experience
GOSSIP CONSENSUS
Mature Libraries Proven Design Patterns
Slide 47
Slide 47 text
Built on Experience
GOSSIP CONSENSUS
Mature Libraries Proven Design Patterns
Lacking Scheduling Logic
Slide 48
Slide 48 text
Built on Research
GOSSIP CONSENSUS
Slide 49
Slide 49 text
No content
Slide 50
Slide 50 text
Optimistic vs Pessimistic
Internal vs External State
Single vs Multi Level
Fixed vs Pluggable
Service vs Batch Oriented
Slide 51
Slide 51 text
Nomad
Inspired by Google Omega
Optimistic Concurrency
State Coordination
Service & Batch workloads
Pluggable Architecture
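Optimistic concurrency can be sketched as follows: schedulers plan against a snapshot without holding a lock, and a plan is re-checked against the current state when it is committed. The Go code below is a simplified illustration under that assumption. `State`, `Plan`, `Apply`, and `ErrConflict` are hypothetical names, not Nomad's or Omega's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical shared cluster state.
type State struct {
	Index  uint64         // bumped on every committed change
	FreeMB map[string]int // free memory per node, in MB
}

// Plan is a placement a scheduler computed from a possibly stale snapshot.
type Plan struct {
	Node  string
	AskMB int
}

// ErrConflict signals that the plan no longer fits and must be recomputed.
var ErrConflict = errors.New("plan conflicts with current state; re-plan")

// Apply commits the plan only if the node still has room at commit time.
func (s *State) Apply(p Plan) error {
	if s.FreeMB[p.Node] < p.AskMB {
		return ErrConflict
	}
	s.FreeMB[p.Node] -= p.AskMB
	s.Index++
	return nil
}

func main() {
	state := &State{FreeMB: map[string]int{"leia": 1024}}

	// Two schedulers both saw 1024 MB free on leia and each planned 768 MB.
	planA := Plan{Node: "leia", AskMB: 768}
	planB := Plan{Node: "leia", AskMB: 768}

	fmt.Println("plan A:", state.Apply(planA)) // first commit wins
	fmt.Println("plan B:", state.Apply(planB)) // second is rejected and must retry
}
```

The trade-off is that conflicting plans do wasted work and must retry, in exchange for schedulers that never block one another while planning.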
Slide 52
Slide 52 text
Consul Architecture
CLIENT CLIENT CLIENT CLIENT CLIENT CLIENT
SERVER SERVER SERVER
REPLICATION REPLICATION
RPC
RPC
LAN GOSSIP
SERVER
SERVER SERVER
REPLICATION REPLICATION
WAN GOSSIP
Slide 53
Slide 53 text
Consul
Multi-Datacenter
Servers per DC
Failure Isolation Domain is the Datacenter
Slide 54
Slide 54 text
Single-Region Architecture
SERVER SERVER SERVER
CLIENT CLIENT CLIENT
DC1 DC2 DC3
FOLLOWER LEADER FOLLOWER
REPLICATION
FORWARDING
REPLICATION
FORWARDING
RPC RPC RPC
Slide 55
Slide 55 text
Multi-Region Architecture
SERVER SERVER SERVER
FOLLOWER LEADER FOLLOWER
REPLICATION
FORWARDING
REPLICATION
REGION B
GOSSIP
REPLICATION REPLICATION
FORWARDING
REGION FORWARDING
REGION A
SERVER
FOLLOWER
SERVER SERVER
LEADER FOLLOWER
Slide 56
Slide 56 text
Nomad
Region is Isolation Domain
1-N Datacenters Per Region
Flexibility to map regions to datacenters 1:1 (as Consul does)
Scheduling Boundary
Slide 57
Slide 57 text
Data Model
ALLOCATION
JOB
EVALUATION
NODE
Slide 58
Slide 58 text
Evaluation ~= State Change
Slide 59
Slide 59 text
Evaluations
Create / Update / Delete Job
Node Up / Node Down
Allocation Failed
Evaluations
SCHEDULER
func(Evaluation) => []AllocationUpdates
Service, Batch, System
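The scheduler signature above can be read as a plain function type. The Go sketch below is one hedged interpretation: the field names on `Evaluation`, the `AllocationUpdate` type, and the `serviceScheduler` logic are assumptions made for illustration, not Nomad's real data model.

```go
package main

import "fmt"

// Evaluation records that something changed and scheduling may be needed.
// The fields here are hypothetical.
type Evaluation struct {
	JobID   string
	Desired int // instances the job asks for
	Running int // instances currently allocated
}

// AllocationUpdate is a single change the scheduler wants applied.
type AllocationUpdate struct {
	JobID  string
	Action string // "place" or "stop"
}

// Scheduler mirrors the slide: func(Evaluation) => []AllocationUpdates.
type Scheduler func(Evaluation) []AllocationUpdate

// serviceScheduler reconciles the running count toward the desired count.
func serviceScheduler(e Evaluation) []AllocationUpdate {
	var updates []AllocationUpdate
	for i := e.Running; i < e.Desired; i++ {
		updates = append(updates, AllocationUpdate{e.JobID, "place"})
	}
	for i := e.Desired; i < e.Running; i++ {
		updates = append(updates, AllocationUpdate{e.JobID, "stop"})
	}
	return updates
}

func main() {
	// A registry keyed by job type; only "service" is filled in here.
	schedulers := map[string]Scheduler{"service": serviceScheduler}

	eval := Evaluation{JobID: "web", Desired: 3, Running: 1}
	fmt.Println(schedulers["service"](eval))
}
```

Because every scheduler shares one signature, batch and system variants could be registered in the same map without changing the caller.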
Slide 62
Slide 62 text
Server Architecture
Omega Class Scheduler
Pluggable Logic
Internal Coordination and State
Multi-Region / Multi-Datacenter
Slide 63
Slide 63 text
Client Architecture
Broad OS Support
Host Fingerprinting
Pluggable Drivers
Slide 64
Slide 64 text
Fingerprinting
Type | Examples
Operating System | Kernel, OS, Version
Hardware | CPU, Memory, Disk
Apps (Capabilities) | Docker, Java, Consul
Environment | AWS, GCE
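A fingerprinting pass can be sketched as a function that gathers host facts into a flat attribute map. The Go example below uses only the standard library. The attribute keys and the "binary on PATH means capability present" rule are assumptions for illustration and do not reflect Nomad's actual fingerprinters.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"strconv"
)

// fingerprint collects a few host attributes. The key names are hypothetical.
func fingerprint() map[string]string {
	attrs := map[string]string{
		"os.name":      runtime.GOOS,
		"cpu.arch":     runtime.GOARCH,
		"cpu.numcores": strconv.Itoa(runtime.NumCPU()),
	}
	// Capability detection: treat a binary found on PATH as available.
	for _, bin := range []string{"docker", "java", "consul"} {
		if _, err := exec.LookPath(bin); err == nil {
			attrs["app."+bin] = "1"
		}
	}
	return attrs
}

func main() {
	fmt.Println(fingerprint())
}
```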
Slide 65
Slide 65 text
Constrain Placement and Bin Pack
Slide 66
Slide 66 text
“Task Requires Linux, Docker, and PCI-Compliant Hardware”
expressed as constraints in the job file
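Constraint checking against fingerprinted attributes can be sketched as a filter over candidate nodes. In the Go example below, the `Constraint` type, the attribute keys (`kernel.name`, `app.docker`, `meta.pci_compliant`), and the node data are all hypothetical. A real job file would express the same requirement declaratively.

```go
package main

import "fmt"

// Constraint requires a node attribute to equal a value (equality only).
type Constraint struct {
	Attr  string
	Value string
}

// Node is a hypothetical client with fingerprinted attributes.
type Node struct {
	Name  string
	Attrs map[string]string
}

// satisfies reports whether the attributes meet every constraint.
func satisfies(attrs map[string]string, cs []Constraint) bool {
	for _, c := range cs {
		if attrs[c.Attr] != c.Value {
			return false
		}
	}
	return true
}

// feasible returns the names of nodes that meet all constraints, in input order.
func feasible(nodes []Node, cs []Constraint) []string {
	var names []string
	for _, n := range nodes {
		if satisfies(n.Attrs, cs) {
			names = append(names, n.Name)
		}
	}
	return names
}

func main() {
	// "Task requires Linux, Docker, and PCI-compliant hardware."
	task := []Constraint{
		{"kernel.name", "linux"},
		{"app.docker", "1"},
		{"meta.pci_compliant", "true"},
	}
	nodes := []Node{
		{"leia", map[string]string{"kernel.name": "linux", "app.docker": "1", "meta.pci_compliant": "true"}},
		{"solo", map[string]string{"kernel.name": "linux", "app.docker": "1"}},
		{"skywalker", map[string]string{"kernel.name": "windows"}},
	}
	fmt.Println(feasible(nodes, task))
}
```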
Slide 67
Slide 67 text
“Task needs 512MB RAM and 1 Core”
expressed as resources in the job file
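A resource ask like the one quoted can be modeled as a small struct that is checked against, and subtracted from, a node's free capacity. This Go sketch assumes two dimensions only (memory in MB and whole cores). The `Resources` type and the `maxTasks` helper are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Resources is a hypothetical two-dimensional resource vector.
type Resources struct {
	MemoryMB int
	Cores    int
}

// Fits reports whether ask can be satisfied from r.
func (r Resources) Fits(ask Resources) bool {
	return ask.MemoryMB <= r.MemoryMB && ask.Cores <= r.Cores
}

// Sub returns what remains of r after reserving ask.
func (r Resources) Sub(ask Resources) Resources {
	return Resources{r.MemoryMB - ask.MemoryMB, r.Cores - ask.Cores}
}

// maxTasks counts how many copies of ask fit into free.
func maxTasks(free, ask Resources) int {
	if ask.MemoryMB <= 0 && ask.Cores <= 0 {
		return 0 // an empty ask would otherwise loop forever
	}
	n := 0
	for free.Fits(ask) {
		free = free.Sub(ask)
		n++
	}
	return n
}

func main() {
	free := Resources{MemoryMB: 2048, Cores: 2}
	ask := Resources{MemoryMB: 512, Cores: 1} // "512MB RAM and 1 Core"
	fmt.Println("tasks that fit:", maxTasks(free, ask))
}
```

In this example the core count, not memory, is the limiting dimension, which is why every dimension has to be checked before a placement is accepted.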