Slide 1

Slide 1 text

Scheduling Applications at Scale with Nomad
Photo by Sašo Tušar on Unsplash
@anubhavm

Slide 2

Slide 2 text

@anubhavm Anubhav Mishra Developer Advocate, HashiCorp

Slide 3

Slide 3 text

@anubhavm Anubhav Mishra Developer Advocate, HashiCorp
has stickers
Atlantis Gopher Artwork by Ashley McNamara

Slide 4

Slide 4 text

PROVISION, SECURE AND RUN ANY INFRASTRUCTURE
OSS TOOL SUITE (FOR INDIVIDUALS): Vagrant, Packer, Terraform, Vault, Consul, Nomad
PRODUCT SUITE (FOR TEAMS): Terraform Enterprise, Vault Enterprise, Consul Enterprise, Nomad Enterprise
PROVISION Infrastructure | SECURE Application Infrastructure | RUN Applications

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Copyright © 2017 HashiCorp @anubhavm
Globally Distributed Optimistically Concurrent Scheduler

Slide 7

Slide 7 text

Globally Distributed Optimistically Concurrent Scheduler

Slide 8

Slide 8 text

Scheduler
Photo by Emma Matthews on Unsplash

Slide 9

Slide 9 text

Scheduler
A person or machine that helps with scheduling during each day of work.

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Scheduler (Computing)
A computer program that controls or manages the execution of jobs / processes / operations.

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Schedulers map a set of work to a set of resources

Slide 14

Slide 14 text

CPU Scheduler
[Diagram labels: KERNEL, CPU SCHEDULER; work: APACHE, REDIS, BASH; resources: CORE ×4]

Slide 15

Slide 15 text

CPU Scheduler
[Diagram labels: KERNEL, CPU SCHEDULER; work: APACHE, REDIS, BASH; resources: CORE ×2]

Slide 16

Slide 16 text

CPU Scheduler
[Diagram labels: KERNEL, CPU SCHEDULER; work: APACHE, REDIS, BASH; resources: CORE ×2]

Slide 17

Slide 17 text

CPU Scheduler
[Diagram labels: KERNEL, CPU SCHEDULER; work: APACHE, REDIS, BASH; resources: CORE ×2]

Slide 18

Slide 18 text

Scheduler Advantages
Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service

Slide 19

Slide 19 text

Nope! Schedulers Aren't a New Concept

Slide 20

Slide 20 text

Landscape

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Designing

Slide 23

Slide 23 text

Multi-Datacenter
Multi-Region
Flexible Workloads
Job Priorities
Bin Packing
Large Scale
Operationally Simple

Slide 24

Slide 24 text

Scaling Requirements
Thousands of regions
Tens of thousands of clients per region
Thousands of jobs per region

Slide 25

Slide 25 text

Our Past Experience
GOSSIP
CONSENSUS

Slide 26

Slide 26 text

Serf
Cluster Management
Gossip Based (P2P)
Membership
Failure Detection
Event System

Slide 27

Slide 27 text

Service Mesh
Service Discovery
Configuration
Coordination (Locking)
Central Servers + Distributed Clients

Slide 28

Slide 28 text

Our Past Experience
GOSSIP
CONSENSUS
Mature Libraries
Proven Design Patterns

Slide 29

Slide 29 text

Our Past Experience
GOSSIP
CONSENSUS
Mature Libraries
Proven Design Patterns
?

Slide 30

Slide 30 text

Our Past Experience
GOSSIP
CONSENSUS

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Inspired by Google Omega
Optimistic Concurrency
State Coordination
Service & Batch workloads
Pluggable Architecture

Slide 33

Slide 33 text

Consul Cluster
[Diagram labels: CLIENT ×6; SERVER ×3 and SERVER ×3, each set linked by REPLICATION; RPC; LAN GOSSIP; WAN GOSSIP]

Slide 34

Slide 34 text

Single Region Architecture
[Diagram labels: SERVER ×3 (FOLLOWER, LEADER, FOLLOWER); CLIENT ×3 (DC1, DC2, DC3); REPLICATION; FORWARDING; RPC]

Slide 35

Slide 35 text

Multi Region Architecture
[Diagram labels: REGION A and REGION B, each with SERVER ×3 (FOLLOWER, LEADER, FOLLOWER); REPLICATION; FORWARDING; GOSSIP; REGION FORWARDING]

Slide 36

Slide 36 text

Region is Isolation Domain
1-N Datacenters Per Region
Flexibility to do 1:1 (Consul)
Scheduling Boundary
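How a job might pin itself to one region and list several of that region's datacenters can be sketched in a job file. This is only a sketch: the region name "us", the second datacenter name, the job, group and task names, and the image are assumptions made for illustration and do not come from the talk.

```hcl
job "webapp" {
  # The region is the scheduling boundary: only the servers of this
  # region evaluate the job. "us" is a hypothetical region name.
  region = "us"

  # Clients in any of these datacenters (assumed to belong to the
  # region above) are placement candidates.
  datacenters = ["us-east-1", "us-west-2"]

  group "web" {
    task "web" {
      driver = "docker"

      config {
        image = "example/webapp:1.0" # hypothetical image
      }
    }
  }
}
```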

Slide 37

Slide 37 text

Installing and Configuring

Slide 38

Slide 38 text

Installing Nomad
Client
Server

Slide 39

Slide 39 text

Fingerprinting
Type                 | Examples
Operating System     | Kernel, OS, Version
Hardware             | CPU, Memory, Disk
Apps (Capabilities)  | Docker, Java, Consul
Environment          | AWS, GCE

Slide 40

Slide 40 text

Constrain Placement and Bin Pack

Slide 41

Slide 41 text

“Task Requires Linux, Docker, and PCI-Compliant Hardware” expressed as constraints in job file
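A requirement like this one could be written roughly as follows. This is a hedged sketch, not the job file from the talk: the job, group and task names, the image, and the metadata key pci_compliant are hypothetical, and the key is assumed to have been set by an operator in each client's configuration. The Docker requirement carries no explicit constraint here, on the assumption that Nomad only considers clients where the task's driver was fingerprinted.

```hcl
job "payments" {
  datacenters = ["us-east-1"]

  # Only consider clients fingerprinted as running a Linux kernel.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  # Only consider clients that carry this operator-defined metadata.
  # "pci_compliant" is a hypothetical key, not a built-in attribute.
  constraint {
    attribute = "${meta.pci_compliant}"
    value     = "true"
  }

  group "api" {
    task "api" {
      # Choosing the docker driver limits placement to Docker-capable clients.
      driver = "docker"

      config {
        image = "example/payments-api:1.0" # hypothetical image
      }
    }
  }
}
```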

Slide 42

Slide 42 text

“Task needs 512MB RAM and 1 Core” expressed as resource in job file
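In the job files shown in this deck, CPU is requested in MHz and memory in MB, so the request above might look like the sketch below. The value 1000 MHz is an assumption standing in for "1 core", and the job, group and task names and the image are hypothetical.

```hcl
job "worker" {
  datacenters = ["us-east-1"]

  group "worker" {
    task "worker" {
      driver = "docker"

      config {
        image = "example/worker:1.0" # hypothetical image
      }

      # The scheduler bin packs tasks onto clients using these numbers.
      resources {
        cpu    = 1000 # MHz; assumed here to approximate one core
        memory = 512  # MB
      }
    }
  }
}
```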

Slide 43

Slide 43 text

Terminal $ nomad agent -config=/path/to/config.hcl
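The file passed with -config could look something like this. It is a minimal single-node sketch for experimentation, assuming one agent runs as both server and client; a production cluster would normally run the two roles on separate machines. The datacenter name, data directory, and metadata key are assumptions.

```hcl
# Hypothetical contents of /path/to/config.hcl
datacenter = "us-east-1"
data_dir   = "/var/lib/nomad" # assumed path

server {
  enabled          = true
  bootstrap_expect = 1 # a single server; a real cluster would expect 3 or 5
}

client {
  enabled = true

  # Operator-defined metadata that job constraints can match on.
  # "pci_compliant" is a hypothetical key.
  meta {
    pci_compliant = "true"
  }
}
```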

Slide 44

Slide 44 text

Terminal $ echo "let's try it?"

Slide 45

Slide 45 text

Job File

Slide 46

Slide 46 text

job "redis" {
  datacenters = ["us-east-1"]

  task "redis" {
    driver = "docker"

    config {
      image = "redis:latest"
    }

    resources {
      cpu    = 500 # MHz
      memory = 256 # MB

      network {
        mbits = 10
        port "redis" {}
      }
    }
  }
}

Slide 47

Slide 47 text

job "webserver" {
  datacenters = ["us-east-1"]

  task "webserver" {
    driver = "exec"

    config {
      command = "yet-another-golang-webserver"
    }

    artifact {
      source = "https://github.com/download/../yet-another-golang-webserver"
    }

    resources {
      cpu    = 500 # MHz
      memory = 128 # MB

      network {
        port "http" {
          static = 8080
        }
      }
    }
  }
}

Slide 48

Slide 48 text

Terminal $ echo "let's try it?"

Slide 49

Slide 49 text

Job specification declares what to run

Slide 50

Slide 50 text

Nomad determines how and where to run

Slide 51

Slide 51 text

Nomad abstracts work from resources

Slide 52

Slide 52 text

Job Types

Slide 53

Slide 53 text

Job Types
Nomad has three scheduler types that can be used when creating your job: service, batch, and system.

Slide 54

Slide 54 text

Job Types: Service
Service Scheduler Job Type
The service scheduler is designed for scheduling long-lived services that should never go down. It ranks a large portion of the nodes that meet the job's constraints and selects the optimal node to place a task group on.
Examples: webapp, redis
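A service job could be declared along these lines. This is a sketch under assumptions: the job, group and task names, the instance count, and the image are hypothetical.

```hcl
job "frontend" {
  datacenters = ["us-east-1"]

  # "service" is the default type, so this line is optional.
  type = "service"

  group "web" {
    # Keep three instances of the task group running.
    count = 3

    task "web" {
      driver = "docker"

      config {
        image = "example/frontend:1.0" # hypothetical image
      }
    }
  }
}
```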

Slide 55

Slide 55 text

Job Types: Batch
Batch Scheduler Job Type
Batch jobs are less sensitive to short-term performance fluctuations and are short-lived, finishing after some period.
Examples: billing, data replication
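A batch job such as the billing example might be sketched as below. The periodic block is an optional addition that is not mentioned on the slide; the schedule, the names, and the command path are hypothetical.

```hcl
job "billing" {
  datacenters = ["us-east-1"]
  type        = "batch"

  # Optional: launch the job on a cron schedule instead of only on submit.
  periodic {
    cron             = "0 2 * * *" # assumed schedule: daily at 02:00
    prohibit_overlap = true        # skip a launch if the previous run is still going
  }

  group "report" {
    task "report" {
      driver = "exec"

      config {
        command = "/usr/local/bin/billing-report" # hypothetical binary
      }
    }
  }
}
```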

Slide 56

Slide 56 text

Job Types: System
System Scheduler Job Type
The system scheduler is used to register jobs that should be run on all clients that meet the job's constraints. The system scheduler is also invoked when clients join the cluster or transition into the ready state.
Examples: logging agent, security auditing tool
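A system job for something like a logging agent could look roughly like this. The job, group and task names, the image, and the resource values are assumptions for illustration.

```hcl
job "log-shipper" {
  datacenters = ["us-east-1"]

  # Run one instance on every client that satisfies the job's constraints,
  # including clients that join the cluster later.
  type = "system"

  group "agent" {
    task "shipper" {
      driver = "docker"

      config {
        image = "example/log-shipper:1.0" # hypothetical image
      }

      resources {
        cpu    = 100 # MHz
        memory = 64  # MB
      }
    }
  }
}
```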

Slide 57

Slide 57 text

Drivers

Slide 58

Slide 58 text

Containerized  | Virtualized | Standalone
Docker         | Qemu / KVM  | Java Jar
rkt            |             | Static Binaries
LXC            |             |

Slide 59

Slide 59 text

Containerized              | Virtualized | Standalone
Docker                     | Qemu / KVM  | Java Jar
rkt                        | Hyper-V     | Static Binaries
LXC                        | Xen         | C#
Windows Server Containers  |             |

Slide 60

Slide 60 text

Service Discovery

Slide 61

Slide 61 text

Service Discovery using Consul
Consul is a free and open-source tool by HashiCorp that implements service discovery. It uses the Raft consensus protocol and a gossip protocol to scale to large clusters. It integrates with health checks, so unhealthy services are kept out of the service discovery layer. It uses a client-server model similar to Nomad's.

Slide 62

Slide 62 text

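job "redis" {
  datacenters = ["us-east-1"]

  task "redis" {
    # driver, config, and resources as in the earlier redis job
    .....

    # Register the task in Consul with a TCP health check.
    service {
      name = "redis"
      port = "redis"

      check {
        type     = "tcp"
        port     = "redis"
        interval = "10s"
        timeout  = "5s"
      }
    }
  }
}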
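Once a service is registered in Consul, another job could look it up through a template block. The sketch below is an assumption about how a consumer might be wired, not something shown in the talk: the job, group and task names, the image, and the destination file are hypothetical, and the template body uses Consul Template syntax to list the healthy "redis" instances.

```hcl
job "webapp" {
  datacenters = ["us-east-1"]

  group "web" {
    task "web" {
      driver = "docker"

      config {
        image = "example/webapp:1.0" # hypothetical image
      }

      # Render the addresses of healthy "redis" instances from Consul into
      # a file in the task directory; restart the task when they change.
      template {
        data = <<EOH
{{ range service "redis" }}
redis_addr = {{ .Address }}:{{ .Port }}
{{ end }}
EOH

        destination = "local/redis.conf" # hypothetical file name
        change_mode = "restart"
      }
    }
  }
}
```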

Slide 63

Slide 63 text

Terminal $ echo "let's try it?"

Slide 64

Slide 64 text

Advanced Job Strategies

Slide 65

Slide 65 text

Advanced Job Strategies
Nomad currently supports two advanced job strategies: rolling upgrades, and blue/green and canary deployments.

Slide 66

Slide 66 text

job "redis" {
  # Add an update stanza to enable rolling updates of the service
  update {
    max_parallel     = 1
    min_healthy_time = "30s"
    healthy_deadline = "10m"
  }
  .....
}

Slide 67

Slide 67 text

job "redis" {
  # Add an update stanza to enable rolling updates of the service
  update {
    max_parallel     = 1
    canary           = 5
    min_healthy_time = "30s"
    auto_revert      = true
  }
  .....
}

Slide 68

Slide 68 text

Terminal $ echo "let's try it?"

Slide 69

Slide 69 text

Multi-Datacenter
Multi-Region
Flexible Workloads
Job Priorities
Bin Packing
Large Scale
Operationally Simple

Slide 70

Slide 70 text

Multi-Datacenter
Multi-Region
Flexible Workloads
Job Priorities
Bin Packing
Large Scale
Operationally Simple

Slide 71

Slide 71 text

ONE MORE DEMO

Slide 72

Slide 72 text

Nomad Million Container Challenge
1,000 Jobs
1,000 Tasks per Job
5,000 Hosts on GCE
1,000,000 Containers

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

“640 KB ought to be enough for anybody.” - Bill Gates

Slide 76

Slide 76 text

2nd Largest Hedge Fund
18K Cores
5 Hours
2,200 Containers/second

Slide 77

Slide 77 text

CircleCI
7+ Million Builds a Month
Sustain 400-1000 Jobs a Minute
Great Talk By Danielle Tomlinson: https://youtu.be/b8NQO_vFAYo

Slide 78

Slide 78 text

Globally Distributed Optimistically Concurrent Scheduler

Slide 79

Slide 79 text

Higher Resource Utilization
Decouple Work from Resources
Better Quality of Service

Slide 80

Slide 80 text

Thank You!
I have stickers! Ask me anything.
@anubhavm Anubhav Mishra
www.hashicorp.com
FOR EVERYONE, EVERYWHERE