HASHICORP
Taming the modern public and private clouds with Nomad
Diptanu Gon Choudhury
@diptanu
PhillyETE 2016
Slide 2
Evolution of compute infrastructure
[Timeline figure: 1995, 2000, 2015]
Slide 3
Evolution of compute infrastructure
Slide 4
Evolution of compute infrastructure
[Diagram: global public cloud regions (AWS US-West-2, AWS US-East-1, GCP US-Central-1) alongside private clouds]
Slide 5
Challenges of the modern cloud
Tens of thousands of compute nodes to manage
Compute clusters are spread across the globe
Static, offline partitioning of clusters is no longer efficient
Slide 6
Challenges of the modern cloud
Heterogeneous APIs for accessing compute infrastructure
Heterogeneous primitives for managing network, secrets, etc.
Slide 7
Evolution of application architecture
SOA and microservices are replacing monoliths
Distributed systems are the new normal
Slide 8
Challenges in running modern services
Orchestrated deployment and rollback strategies
More failure modes
Slide 9
Cluster Schedulers to the rescue
Decouple Work from Resources
Better Quality of Service
Higher Resource Utilization
Slide 10
Nomad
Multi-Datacenter
Multi-Region
Flexible Workloads
Job Priorities
Bin Packing
Large Scale
Operationally Simple
Slide 11
Nomad as the Cluster Scheduler
Higher Resource Utilization: Bin Packing, Job Queueing, Over-Subscription
Decouple Work from Resources
Better Quality of Service
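Bin packing can be illustrated with a small best-fit placement sketch. This is not Nomad's scoring algorithm; the `Node` type, the `place` function, and the crude "sum of leftover CPU and memory" score are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: int  # MHz still unallocated
    mem_free: int  # MB still unallocated


def place(nodes, cpu, mem):
    """Best fit: choose the feasible node left with the least spare capacity."""
    feasible = [n for n in nodes if n.cpu_free >= cpu and n.mem_free >= mem]
    if not feasible:
        return None  # a real scheduler would queue the job instead
    # Hypothetical score: leftover MHz plus leftover MB (units mixed for brevity).
    best = min(feasible, key=lambda n: (n.cpu_free - cpu) + (n.mem_free - mem))
    best.cpu_free -= cpu
    best.mem_free -= mem
    return best.name


nodes = [Node("a", 4000, 8192), Node("b", 1000, 1024)]
first = place(nodes, 500, 256)
second = place(nodes, 500, 256)
third = place(nodes, 500, 256)
```

The small node is filled before the large one is touched, which leaves the large node free for tasks that need it.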
Slide 12
Nomad as the Cluster Scheduler
Higher Resource Utilization
Decouple Work from Resources: Abstraction, API Contracts, Standardization
Better Quality of Service
Slide 13
Nomad as the Cluster Scheduler
Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service: Priorities, Resource Isolation, Pre-emption
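How priorities and pre-emption interact can be sketched as follows. This is a generic illustration and not a description of Nomad's implementation; `Alloc`, `preempt_for`, and the single-resource (CPU only) model are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Alloc:
    job: str
    priority: int  # higher number means more important
    cpu: int       # MHz reserved on the node


def preempt_for(running, capacity, new):
    """Return the allocations to evict so `new` fits, or None if it cannot fit.

    Only strictly lower-priority allocations are evicted, lowest priority first.
    """
    free = capacity - sum(a.cpu for a in running)
    victims = []
    for alloc in sorted(running, key=lambda a: a.priority):
        if free >= new.cpu:
            break
        if alloc.priority >= new.priority:
            break  # everything left is at least as important as the new job
        victims.append(alloc)
        free += alloc.cpu
    return victims if free >= new.cpu else None


running = [Alloc("batch", 20, 600), Alloc("web", 80, 300)]
evicted = preempt_for(running, 1000, Alloc("api", 50, 500))
blocked = preempt_for(running, 1000, Alloc("low", 10, 500))
```

Here the priority-50 job displaces the priority-20 batch job, while the priority-10 job has nothing it is allowed to evict.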
Slide 14
Job Specification
Declares what to run
Slide 15
example.nomad
# Define our simple redis job
job "redis" {
  # Run only in us-east-1
  datacenters = ["us-east-1"]

  # Define the single redis task using Docker
  task "redis" {
    driver = "docker"

    config {
      image = "redis:latest"
    }

    resources {
      cpu    = 500 # MHz
      memory = 256 # MB

      network {
        mbits         = 10
        dynamic_ports = ["redis"]
      }
    }
  }
}
Slide 16
Job Specification
Nomad determines where to run and manages how to run
Slide 17
Job Specification
Abstract work from resources
Slide 18
Supports multiple Clouds, DCs and Regions
Resources across DCs are presented as a single pool
Developers can target multiple datacenters in the same job file
Unified interface for developers across clouds
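Targeting several datacenters from one job file amounts to filtering a single node pool by the job's datacenter list. A minimal sketch, assuming a plain-dict data model; `candidate_nodes` and the `onprem-dc1` datacenter name are hypothetical.

```python
def candidate_nodes(nodes, datacenters):
    """Keep only nodes whose datacenter is listed in the job."""
    wanted = set(datacenters)
    return [n["name"] for n in nodes if n["datacenter"] in wanted]


# One pool of nodes spanning public and private clouds.
nodes = [
    {"name": "n1", "datacenter": "us-east-1"},
    {"name": "n2", "datacenter": "us-west-2"},
    {"name": "n3", "datacenter": "onprem-dc1"},  # hypothetical on-prem DC
]
job = {"name": "redis", "datacenters": ["us-east-1", "onprem-dc1"]}
eligible = candidate_nodes(nodes, job["datacenters"])
```

The job author only names datacenters; which cloud backs each one does not appear in the job at all.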
Slide 19
Unified interface across hybrid clouds
[Diagram: a job spec is submitted to Nomad, which sits in front of AWS, GCP, Azure, and an on-prem DC]
Slide 20
Single Region Architecture
[Diagram: three servers (one leader, two followers) with replication and forwarding between them; clients in DC1, DC2, and DC3 reach the servers over RPC]
Slide 21
Multi Region Architecture
[Diagram: Region A and Region B each run three servers (one leader, two followers) with replication and forwarding inside the region; the two regions are linked by gossip and region forwarding]
Slide 22
Nomad
Region is Isolation Domain
1-N Datacenters Per Region
Flexibility to map one datacenter per region (1:1, as in Consul)
Scheduling Boundary
Slide 23
Data Model
Slide 24
Evaluations ~= State Change Event
Slide 25
Create / Update / Delete Job
Node Up / Node Down
Allocation Failed
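The idea that each state change produces evaluations can be sketched as a simple queue. The `Evaluation` fields, the trigger strings, and `on_state_change` are hypothetical names chosen for illustration, not Nomad's data model.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count

_ids = count(1)   # simple monotonically increasing evaluation IDs
queue = deque()   # evaluations waiting for a scheduler


@dataclass
class Evaluation:
    id: int
    trigger: str  # hypothetical labels, e.g. "job-register", "node-down"
    job: str


def on_state_change(trigger, affected_jobs):
    """Enqueue one evaluation per job touched by the state change."""
    for job in affected_jobs:
        queue.append(Evaluation(next(_ids), trigger, job))


on_state_change("job-register", ["redis"])    # Create / Update / Delete Job
on_state_change("node-down", ["redis", "web"])  # Node Up / Node Down
```

A node failure fans out into one evaluation per affected job, so each job can be rescheduled independently.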
Scheduler Architecture
Concurrent and optimistic scheduling
Event Driven invocation of schedulers
No head-of-line blocking across different types of workloads
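Optimistic scheduling can be illustrated with schedulers that plan against a snapshot and a single commit point that re-checks each plan. This is a generic sketch of the technique under a CPU-only model; `State`, `schedule`, and the plan tuple are assumptions, not Nomad's internals.

```python
class State:
    """Authoritative cluster state with a serialized commit point."""

    def __init__(self, free_cpu):
        self.free_cpu = dict(free_cpu)  # node name -> free MHz

    def snapshot(self):
        return dict(self.free_cpu)

    def apply(self, plan):
        """Re-check the plan against current state; reject it if stale."""
        node, cpu = plan
        if self.free_cpu[node] < cpu:
            return False
        self.free_cpu[node] -= cpu
        return True


def schedule(snapshot, cpu):
    """Pick the first node that fits, using a possibly stale snapshot."""
    for node, free in snapshot.items():
        if free >= cpu:
            return (node, cpu)
    return None


state = State({"n1": 600, "n2": 1000})
# Two schedulers plan concurrently from the same snapshot and collide on n1.
plan_a = schedule(state.snapshot(), 500)
plan_b = schedule(state.snapshot(), 500)
ok_a = state.apply(plan_a)
ok_b = state.apply(plan_b)                 # rejected: n1 no longer has room
retry_b = schedule(state.snapshot(), 500)  # re-plan on fresh state
ok_retry = state.apply(retry_b)
```

Schedulers never hold locks while planning; a conflict only costs the losing scheduler a retry.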
Slide 28
Client Architecture
Broad OS Support
Host Fingerprinting
Pluggable Drivers
Slide 29
Drivers
Execute Tasks
Provide Resource Isolation
Containerized: Docker, Rocket, Windows Server Containers
Virtualized: Qemu / KVM, Hyper-V, Xen
Standalone: Java Jar, Static Binaries, C#
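A pluggable driver layer can be sketched as a small interface plus a registry keyed by the `driver` name from the job file. The `Driver` base class, `ProcessDriver`, the `process_demo` name, and `run_task` are all hypothetical, and this toy driver provides no resource isolation.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class Driver(ABC):
    name: str

    @abstractmethod
    def start(self, config):
        """Run the task described by `config` and return its output."""


class ProcessDriver(Driver):
    """Toy driver: runs a command as a child process, without isolation."""

    name = "process_demo"

    def start(self, config):
        cmd = [config["command"], *config.get("args", [])]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout


DRIVERS = {}


def register(driver):
    DRIVERS[driver.name] = driver


def run_task(task):
    """Dispatch a task to whichever driver its `driver` field names."""
    return DRIVERS[task["driver"]].start(task["config"])


register(ProcessDriver())
task = {
    "driver": "process_demo",
    "config": {"command": sys.executable, "args": ["-c", "print('hello')"]},
}
out = run_task(task)
```

Adding a container or VM driver would mean registering another `Driver` subclass; the dispatch code stays the same.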
Slide 32
Maintenance Primitives
First class support for doing maintenance on nodes
Drain allocations running on a node
nomad node-drain -enable 149cc920
Are you sure you want to enable drain mode for node "149cc920"? [y/N]
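What draining a node involves can be sketched as: mark the node as draining, then move its allocations to the remaining nodes. This is an illustrative model only; the `drain` function, the dict layout, the round-robin choice of target, and the `node-b` / `node-c` names are assumptions.

```python
from itertools import cycle


def drain(node_id, nodes, allocs):
    """Mark `node_id` as draining and move its allocations elsewhere.

    Returns a mapping of moved allocation -> new node.
    """
    nodes[node_id]["drain"] = True
    targets = sorted(n for n, meta in nodes.items() if not meta["drain"])
    if not targets:
        return {}  # nowhere to migrate to; allocations stay put
    ring = cycle(targets)  # hypothetical policy: simple round robin
    moved = {}
    for alloc in sorted(allocs):
        if allocs[alloc] == node_id:
            allocs[alloc] = next(ring)
            moved[alloc] = allocs[alloc]
    return moved


nodes = {
    "149cc920": {"drain": False},
    "node-b": {"drain": False},
    "node-c": {"drain": False},
}
allocs = {"redis.0": "149cc920", "web.0": "node-b", "web.1": "149cc920"}
moved = drain("149cc920", nodes, allocs)
```

Once the node holds no allocations it can be patched or rebooted without affecting running services.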
Slide 33
Service Discovery Aware
Allows developers to define services exposed by a job
Keep services and checks synced
Slide 34
example.nomad
job "redis" {
  task "redis" {
    ………
    service {
      name = "binstore"
      tags = ["env:staging", "stack:beta"]
      port = "http"
      check {
        name     = "binstore-http"
        type     = "http"
        path     = "/status"
        interval = "30s"
        timeout  = "2s"
      }
    }
    …………
  }
}
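Keeping services and checks synced can be viewed as reconciling what the job declares against what the service catalog currently holds. A minimal sketch of that reconciliation; the `sync` function, the `service:` / `check:` ID scheme, and the stale `old-cache` entry are hypothetical.

```python
def sync(desired, registered):
    """Return (to_register, to_deregister) so the catalog matches the job."""
    desired, registered = set(desired), set(registered)
    to_register = sorted(desired - registered)
    to_deregister = sorted(registered - desired)
    return to_register, to_deregister


# Declared by the job file: one service and its HTTP check.
desired = ["service:binstore", "check:binstore-http"]
# Currently in the catalog: the service plus a stale, hypothetical entry.
registered = ["service:binstore", "service:old-cache"]

to_register, to_deregister = sync(desired, registered)
```

Running the same reconciliation periodically makes it idempotent: once the catalog matches the job, both lists are empty.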
Slide 35
System Job Scheduler
Runs a job on every node in the cluster
Great for running monitoring, logging, auditing software
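The "one instance per node" behavior can be sketched as computing which ready nodes still lack an allocation of the job. The `system_placements` function, the node status strings, and the `log-shipper` job name are assumptions for illustration.

```python
def system_placements(job, nodes, existing):
    """Return the ready nodes that do not yet run `job`."""
    return [
        name
        for name, status in sorted(nodes.items())
        if status == "ready" and (job, name) not in existing
    ]


nodes = {"n1": "ready", "n2": "down", "n3": "ready"}
existing = {("log-shipper", "n1")}  # (job, node) pairs already running
missing = system_placements("log-shipper", nodes, existing)

# When a new node joins, the next pass picks it up as well.
nodes["n4"] = "ready"
after_join = system_placements("log-shipper", nodes, existing)
```

Because placement is derived from the node list, new nodes receive the job without anyone editing it.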
Slide 36
Log Management
Handles log rotation for services
Log forwarding coming soon