Slide 1

Slide 1 text

Scheduling at Scale
Photo by Sašo Tušar on Unsplash
@anubhavm

Slide 2

Slide 2 text

Anubhav Mishra (@anubhavm)
Developer Advocate, HashiCorp

Slide 3

Slide 3 text

Anubhav Mishra (@anubhavm)
Developer Advocate, HashiCorp, has stickers

Slide 4

Slide 4 text

PROVISION, SECURE AND RUN ANY INFRASTRUCTURE
(OSS Tool Suite, for individuals | Product Suite, for teams)
PROVISION Infrastructure: Vagrant, Packer, Terraform | Terraform Enterprise
SECURE Application Infrastructure: Vault | Vault Enterprise
RUN Applications: Nomad, Consul | Nomad Enterprise, Consul Enterprise

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Globally Distributed Optimistically Concurrent Scheduler
Copyright © 2017 HashiCorp @anubhavm

Slide 7

Slide 7 text

Globally Distributed Optimistically Concurrent Scheduler

Slide 8

Slide 8 text

Scheduling
Photo by Emma Matthews on Unsplash

Slide 9

Slide 9 text

Scheduling: Assigning an appropriate number of workers to the jobs during each day of work. [1]
[1] Read more: http://www.businessdictionary.com/definition/scheduling.html

Slide 10

Slide 10 text

Scheduler: A person or machine that helps with scheduling during each day of work.

Slide 11

Slide 11 text


Slide 12

Slide 12 text

Scheduler (Computing): A computer program that controls or manages the execution of jobs, processes, or operations.

Slide 13

Slide 13 text


Slide 14

Slide 14 text

Schedulers map a set of work to a set of resources
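To make "map a set of work to a set of resources" concrete, here is a minimal Python sketch. The Node and Task types, the node names, and the least-loaded placement rule (pick the feasible node with the most free memory) are assumptions chosen for illustration, not Nomad's actual algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu_mhz: int        # total CPU capacity
    memory_mb: int      # total memory capacity
    used_cpu: int = 0
    used_mem: int = 0

    def fits(self, cpu: int, mem: int) -> bool:
        return (self.used_cpu + cpu <= self.cpu_mhz
                and self.used_mem + mem <= self.memory_mb)


@dataclass
class Task:
    name: str
    cpu_mhz: int
    memory_mb: int


def schedule(tasks: List[Task], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Map each task to a node name, or to None if no node has room."""
    placement: Dict[str, Optional[str]] = {}
    for task in tasks:
        candidates = [n for n in nodes if n.fits(task.cpu_mhz, task.memory_mb)]
        if not candidates:
            placement[task.name] = None
            continue
        # Assumed rule: least-loaded node, i.e. the one with the most free memory.
        node = max(candidates, key=lambda n: n.memory_mb - n.used_mem)
        node.used_cpu += task.cpu_mhz
        node.used_mem += task.memory_mb
        placement[task.name] = node.name
    return placement
```

For example, with two hypothetical nodes of 2,000 MHz and 1,024 MB each, a redis task (500 MHz, 256 MB) lands on the first node, a webserver task (500 MHz, 128 MB) on the second, and a 2,048 MB task stays unplaced. The work is described once; where it runs is decided by the scheduler.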

Slide 15

Slide 15 text

Traditional Scheduling: DATACENTER

Slide 16

Slide 16 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER

Slide 17

Slide 17 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 18

Slide 18 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 19

Slide 19 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 20

Slide 20 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 21

Slide 21 text


Slide 22

Slide 22 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 23

Slide 23 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 24

Slide 24 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 25

Slide 25 text

Traditional Scheduling: DATACENTER, OPERATIONS ENGINEER, Gandalf, Gollum, Frodo, Sam

Slide 26

Slide 26 text


Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

CPU Scheduler [diagram: KERNEL with a CPU SCHEDULER over four CORES; processes APACHE, REDIS, BASH]

Slide 29

Slide 29 text

CPU Scheduler [diagram: KERNEL with a CPU SCHEDULER over two CORES; processes APACHE, REDIS, BASH]

Slide 30

Slide 30 text

CPU Scheduler [diagram: KERNEL with a CPU SCHEDULER over two CORES; processes APACHE, REDIS, BASH]

Slide 31

Slide 31 text

CPU Scheduler [diagram: KERNEL with a CPU SCHEDULER over two CORES; processes APACHE, REDIS, BASH]
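The CPU scheduler slides show three processes sharing two cores. As a rough illustration, this Python sketch simulates a simple round-robin policy with a one-unit time slice. Round-robin is an assumption chosen for clarity; real kernel schedulers are considerably more elaborate.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def round_robin(bursts: Dict[str, int], cores: int = 2) -> List[Tuple[Optional[str], ...]]:
    """Simulate round-robin scheduling of processes onto `cores` cores.

    bursts maps a process name to the number of time units it needs.
    Returns one tuple per time unit listing the process on each core (None = idle).
    """
    ready = deque(bursts)          # process names, in arrival order
    remaining = dict(bursts)       # time units still needed per process
    timeline = []
    while ready:
        # Hand the next `cores` processes one time unit each.
        running = [ready.popleft() for _ in range(min(cores, len(ready)))]
        for name in running:
            remaining[name] -= 1
        timeline.append(tuple(running) + (None,) * (cores - len(running)))
        # Unfinished processes rejoin the back of the queue.
        ready.extend(name for name in running if remaining[name] > 0)
    return timeline
```

With apache needing 2 units, redis 1, and bash 2, the two cores run (apache, redis), then (bash, apache), then (bash, idle): three processes share two cores without any of them being pinned to a specific core.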

Slide 32

Slide 32 text

Scheduler Advantages: Higher Resource Utilization, Decouple Work from Resources, Better Quality of Service

Slide 33

Slide 33 text

Scheduler Advantages: Higher Resource Utilization (Bin Packing, Over-Subscription, Job Queueing), Decouple Work from Resources, Better Quality of Service
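Bin packing is what drives the higher utilization on this slide. A minimal Python sketch of the first-fit-decreasing (FFD) heuristic, packing hypothetical memory demands (in MB) onto nodes of equal capacity. FFD is one well-known heuristic, used here as an illustration; it is not necessarily the one Nomad uses.

```python
from typing import List


def first_fit_decreasing(demands: List[int], capacity: int) -> List[List[int]]:
    """Pack demands into as few bins (nodes) as first-fit decreasing finds."""
    bins: List[List[int]] = []
    for size in sorted(demands, reverse=True):
        if size > capacity:
            raise ValueError(f"demand {size} exceeds node capacity {capacity}")
        for b in bins:
            if sum(b) + size <= capacity:   # first node with room wins
                b.append(size)
                break
        else:                               # no existing node had room
            bins.append([size])
    return bins
```

Six workloads of 1024, 512, 512, 256, 256 and 128 MB fit on two 2,048 MB nodes, where a one-application-per-server placement would tie up six machines.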

Slide 34

Slide 34 text

Scheduler Advantages: Higher Resource Utilization, Decouple Work from Resources (Abstraction, API Contracts, Standardization), Better Quality of Service

Slide 35

Slide 35 text

Scheduler Advantages: Higher Resource Utilization, Decouple Work from Resources, Better Quality of Service (Priorities, Resource Isolation, Pre-emption)
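Priorities and pre-emption can be illustrated with a small Python sketch: when a node is full, a higher-priority job may evict strictly lower-priority ones, lowest priority first. The job tuples, the single memory dimension, and the eviction order are assumptions for illustration, not a description of how Nomad implements pre-emption.

```python
from typing import List, Tuple

Job = Tuple[str, int, int]  # (name, priority, memory_mb)


def place_with_preemption(running: List[Job], capacity: int, new_job: Job):
    """Try to place new_job on a node, evicting strictly lower-priority jobs if needed.

    Returns (placed, jobs_now_running, evicted_jobs).
    """
    _, priority, size = new_job
    used = sum(job[2] for job in running)
    evicted: List[Job] = []
    # Only lower-priority jobs are eligible victims, lowest priority first.
    for victim in sorted((j for j in running if j[1] < priority), key=lambda j: j[1]):
        if used + size <= capacity:
            break
        evicted.append(victim)
        used -= victim[2]
    if used + size > capacity:
        return False, running, []   # cannot fit even after pre-emption; evict nothing
    evicted_names = {job[0] for job in evicted}
    kept = [job for job in running if job[0] not in evicted_names]
    return True, kept + [new_job], evicted
```

On a hypothetical 2,048 MB node running batch-report (priority 20), cache (50) and web (70) at 512 MB each, a 1,024 MB payments job at priority 90 evicts only batch-report; a priority-10 job of the same size is simply not placed.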

Slide 36

Slide 36 text

Nope! Schedulers Aren't a New Concept

Slide 37

Slide 37 text

Landscape

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

@anubhavm  !39 Cluster Scheduler Deployments Job Specification

Slide 40

Slide 40 text

@anubhavm  !X job "redis" { datacenters = ["us-east-1"] task "redis" { driver = "docker" config { image = "redis:latest" } resources { cpu = 500 # Mhz memory = 256 # MB network { mbits = 10 port "redis" {} } } } }

Slide 41

Slide 41 text

@anubhavm  !X job "webserver" { datacenters = ["us-east-1"] task "webserver" { driver = "exec" config { command = "yet-another-golang-webserver-linux_amd64" } artifact { source = "https://github.com/anubhavmishra/yet-another-golang-webserver/releases/ download/v1.0.0/yet-another-golang-webserver-linux_amd64" } resources { cpu = 500 # Mhz memory = 128 # MB network { port "http" { static = 8080 } } } } }

Slide 42

Slide 42 text

Job specification declares what to run

Slide 43

Slide 43 text

Nomad determines how and where to run

Slide 44

Slide 44 text

Nomad abstracts work from resources

Slide 45

Slide 45 text

Designing

Slide 46

Slide 46 text

@anubhavm  !46 Multi-Datacenter Multi-Region Flexible Workloads Job Priorities Bin Packing Large Scale Operationally Simple

Slide 47

Slide 47 text

Scaling Requirements: Thousands of regions; Tens of thousands of clients per region; Thousands of jobs per region

Slide 48

Slide 48 text

Our Past Experience: GOSSIP, CONSENSUS

Slide 49

Slide 49 text

@anubhavm  !49 Cluster Management Gossip Based (P2P) Membership Failure Detection Event System Serf

Slide 50

Slide 50 text

@anubhavm  !50 Serf Gossip Protocol Large Scale Production Hardened Operationally Simple

Slide 51

Slide 51 text

@anubhavm  !51 Service Discovery Configuration Coordination (Locking) Central Servers + Distributed Clients

Slide 52

Slide 52 text

@anubhavm  !52 Multi-Datacenter Raft Consensus Large Scale Production Hardened

Slide 53

Slide 53 text

Our Past Experience: GOSSIP, CONSENSUS: Mature Libraries, Proven Design Patterns

Slide 54

Slide 54 text

Our Past Experience: GOSSIP, CONSENSUS: Mature Libraries, Proven Design Patterns, ?

Slide 55

Slide 55 text

Our Past Experience: GOSSIP, CONSENSUS

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

@anubhavm  !57 Optimistic vs Pessimistic Internal vs External State Single vs Multi Level Fixed vs Pluggable Service vs Batch Oriented

Slide 58

Slide 58 text

@anubhavm  !58 Inspired by Google Omega Optimistic Concurrency State Coordination Service & Batch workloads Pluggable Architecture

Slide 59

Slide 59 text

Consul Cluster [diagram: six CLIENTs connected to three SERVERs over RPC and LAN GOSSIP, servers linked by REPLICATION; a second group of three SERVERs (REPLICATION) joined to the first over WAN GOSSIP]

Slide 60

Slide 60 text

Single Region Architecture [diagram: three SERVERs (FOLLOWER, LEADER, FOLLOWER) linked by REPLICATION and FORWARDING; CLIENTs in DC1, DC2, and DC3 connect to the servers over RPC]

Slide 61

Slide 61 text

Multi-Region Architecture [diagram: REGION A and REGION B, each with three SERVERs (FOLLOWER, LEADER, FOLLOWER) linked by REPLICATION and FORWARDING; the two regions are connected by GOSSIP and REGION FORWARDING]

Slide 62

Slide 62 text

A region is an isolation domain; 1 to N datacenters per region; flexibility to map regions 1:1 to datacenters (as in Consul); a region is the scheduling boundary

Slide 63

Slide 63 text

Server Architecture: Omega Class Scheduler, Pluggable Logic, Internal Coordination and State, Multi-Region / Multi-Datacenter

Slide 64

Slide 64 text

Client Architecture: Broad OS Support, Host Fingerprinting, Pluggable Drivers

Slide 65

Slide 65 text

Fingerprinting
Type | Examples
Operating System | Kernel, OS, Version
Hardware | CPU, Memory, Disk
Apps (Capabilities) | Docker, Java, Consul
Environment | AWS, GCE
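Fingerprinting is the client probing its own host and reporting attributes to the servers. A rough Python sketch using only the standard library; the attribute keys are hypothetical, and the memory, disk, and Environment (AWS, GCE) rows are omitted because detecting them portably needs platform-specific probes or cloud metadata endpoints.

```python
import os
import platform
import shutil
from typing import Dict, Union

Attr = Union[str, int, bool, None]


def fingerprint() -> Dict[str, Attr]:
    """Collect a few host attributes (hypothetical attribute names)."""
    return {
        "os.name": platform.system(),            # e.g. "Linux", "Windows", "Darwin"
        "kernel.version": platform.release(),    # kernel / OS release string
        "cpu.numcores": os.cpu_count(),          # may be None if undetermined
        # Capability checks: is the tool's binary on PATH?
        "driver.docker": shutil.which("docker") is not None,
        "driver.java": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    print(fingerprint())
```

The scheduler never probes hosts itself; it only compares job requirements against attributes like these, reported by each client.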

Slide 66

Slide 66 text

Constrain Placement and Bin Pack

Slide 67

Slide 67 text

“Task requires Linux, Docker, and PCI-compliant hardware” is expressed as constraints in the job file

Slide 68

Slide 68 text

“Task needs 512 MB RAM and 1 core” is expressed as resources in the job file
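The two requirements above (constraints, then resources) suggest a two-step placement: filter out nodes that fail a constraint or lack room, then rank the survivors so work is packed tightly. A minimal Python sketch; the attribute keys (including meta.pci), the equality-only constraint check, and the tightest-fit ranking are assumptions for illustration, and CPU is counted in whole cores to match the slide.

```python
from typing import Dict, List


def rank_nodes(nodes: Dict[str, dict], constraints: Dict[str, object],
               cores: int, memory_mb: int) -> List[str]:
    """Filter nodes by constraints and free resources, then bin pack.

    Feasible nodes are returned tightest fit first (least memory left over),
    so work is packed onto already-busy nodes.
    """
    feasible = []
    for name, node in nodes.items():
        attrs = node["attrs"]
        meets_constraints = all(attrs.get(k) == v for k, v in constraints.items())
        has_room = node["free_cores"] >= cores and node["free_mem"] >= memory_mb
        if meets_constraints and has_room:
            feasible.append(name)
    return sorted(feasible, key=lambda n: nodes[n]["free_mem"] - memory_mb)
```

For a task needing Linux, Docker, PCI-compliant hardware, 1 core and 512 MB, a Windows node and a non-PCI node are filtered out first, and of the remaining nodes the one with the least memory to spare is preferred.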

Slide 69

Slide 69 text

Drivers

Slide 70

Slide 70 text

@anubhavm  !70 Containerized Virtualized Standalone Docker Qemu / KVM Java Jar Static Binaries rkt LXC

Slide 71

Slide 71 text

@anubhavm  !71 Containerized Virtualized Standalone Docker Qemu / KVM Java Jar Static Binaries rkt LXC Windows Server Containers Hyper-V Xen C#

Slide 72

Slide 72 text

Schedulers, Fingerprints, Drivers, Job Specification

Slide 73

Slide 73 text

@anubhavm  !73 Single Binary No Dependencies Highly Available

Slide 74

Slide 74 text

Nomad Million Container Challenge: 1,000 Jobs; 1,000 Tasks per Job; 5,000 Hosts on GCE; 1,000,000 Containers
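The challenge numbers are internally consistent. Assuming one container per task, the total and the average density per host work out as:

```latex
1{,}000\ \text{jobs} \times 1{,}000\ \tfrac{\text{tasks}}{\text{job}} = 10^{6}\ \text{containers},
\qquad
\frac{10^{6}\ \text{containers}}{5{,}000\ \text{hosts}} = 200\ \tfrac{\text{containers}}{\text{host}}\ \text{(on average)}
```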

Slide 75

Slide 75 text


Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

“640 KB ought to be enough for anybody.” (often attributed to Bill Gates)

Slide 78

Slide 78 text

2nd Largest Hedge Fund: 18K Cores, 5 Hours, 2,200 Containers/second

Slide 79

Slide 79 text

7+ Million Builds a Month; Sustains 400-1,000 Jobs a Minute
Great talk by Danielle Tomlinson: https://youtu.be/b8NQO_vFAYo

Slide 80

Slide 80 text

DEMO

Slide 81

Slide 81 text

Globally Distributed Optimistically Concurrent Scheduler

Slide 82

Slide 82 text

Higher Resource Utilization, Decouple Work from Resources, Better Quality of Service

Slide 83

Slide 83 text

June 25-27, 2018

Slide 84

Slide 84 text

October 22-24, 2018 | San Francisco

Slide 85

Slide 85 text

Thank You! I have stickers! Ask me anything. @anubhavm www.hashicorp.com Anubhav Mishra