Scheduling Applications at Scale with Nomad

Session details on Linux Foundation OSS Summit Website: https://ossna18.sched.com/event/FAP0/scheduling-applications-at-scale-with-nomad-anubhav-mishra-hashicorp

Scheduler frameworks enable reliable and repeatable application deploys. In this session, attendees will use Nomad, a single-binary cluster scheduler, to build a multi-region, self-healing production environment that runs a diverse set of workloads. They will also get hands-on experience writing and submitting job specifications, interacting with the API, and applying deployment strategies. This session will cover the following:
- Nomad Overview
- Installing and Configuring Nomad
- Creating, Running, and Inspecting Jobs
- Service Registration
- Interacting via the HTTP API
- Advanced Job Strategies (rolling updates, blue-green)
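
For the HTTP API item in the outline above, a minimal Python sketch of a request that lists jobs might look like the following. It assumes a Nomad agent reachable at `http://127.0.0.1:4646` (the default HTTP port); the helper names `list_jobs_request` and `fetch_json` are hypothetical and not part of any Nomad client library.

```python
import json
import urllib.request

# Assumption: a local Nomad agent listening on the default HTTP port.
NOMAD_ADDR = "http://127.0.0.1:4646"


def list_jobs_request(addr=NOMAD_ADDR):
    """Build (but do not send) a GET request for the job list endpoint."""
    return urllib.request.Request(f"{addr}/v1/jobs", method="GET")


def fetch_json(request, timeout_s=5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

With an agent running, `fetch_json(list_jobs_request())` would return the decoded job list; building the request separately keeps the URL easy to inspect without a cluster.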

Anubhav Mishra

August 29, 2018

Transcript

  1. Provision, Secure and Run Any Infrastructure

    Product suite: PROVISION Infrastructure, SECURE Application Infrastructure, RUN Applications.
    - OSS tool suite (for individuals): Vagrant, Packer, Terraform, Vault, Consul, Nomad
    - Enterprise (for teams): Terraform Enterprise, Vault Enterprise, Consul Enterprise, Nomad Enterprise
  2. Scheduler

    A person or machine that helps scheduling during each day of work. (Copyright © 2017 HashiCorp, @anubhavm)
  3. Scheduler (Computing)

    A computer program that controls or manages the execution of jobs / processes / operations.
  4. CPU Scheduler

    Diagram labels: APACHE, REDIS, BASH; KERNEL; CPU SCHEDULER; four COREs.
  5. CPU Scheduler

    Diagram labels: APACHE, REDIS, BASH; KERNEL; CPU SCHEDULER; two COREs.
  6. CPU Scheduler

    Diagram labels: APACHE, REDIS, BASH; KERNEL; CPU SCHEDULER; two COREs.
  7. CPU Scheduler

    Diagram labels: APACHE, REDIS, BASH; KERNEL; CPU SCHEDULER; two COREs.
  8. Scheduler Advantages

    - Higher Resource Utilization
    - Decouple Work from Resources
    - Better Quality of Service
  9. Scaling Requirements

    - Thousands of regions
    - Tens of thousands of clients per region
    - Thousands of jobs per region
  10. Our Past Experience

    - GOSSIP, CONSENSUS
    - Mature Libraries
    - Proven Design Patterns
  11. Our Past Experience

    - GOSSIP, CONSENSUS
    - Mature Libraries
    - Proven Design Patterns
    - ?
  12. Inspired by Google Omega

    - Optimistic Concurrency
    - State Coordination
    - Service & Batch workloads
    - Pluggable Architecture
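
The optimistic concurrency idea on this slide can be sketched roughly as follows: a scheduler plans against a snapshot of cluster state that may be stale, and a single serialization point re-checks each plan against current state before committing it, rejecting plans that conflict. This is a simplified illustration, not Nomad's implementation; every class and function name here (`Cluster`, `Plan`, `make_plan`, `schedule`) is hypothetical, and memory is the only resource modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int


@dataclass
class Plan:
    node: str
    memory_mb: int


@dataclass
class Cluster:
    nodes: dict = field(default_factory=dict)

    def snapshot(self):
        # Schedulers plan against a copy, which may be stale by commit time.
        return {name: n.free_mb for name, n in self.nodes.items()}

    def apply(self, plan):
        # Single serialization point: re-verify the plan against live state.
        node = self.nodes[plan.node]
        if node.free_mb < plan.memory_mb:
            return False  # conflict: the caller must re-plan
        node.free_mb -= plan.memory_mb
        return True


def make_plan(snapshot, memory_mb):
    # Pick the first node that appears to have room in the snapshot.
    for name, free in snapshot.items():
        if free >= memory_mb:
            return Plan(name, memory_mb)
    return None


def schedule(cluster, memory_mb, max_attempts=3):
    # Plan optimistically, then retry with a fresh snapshot on rejection.
    for _ in range(max_attempts):
        plan = make_plan(cluster.snapshot(), memory_mb)
        if plan is None:
            return None
        if cluster.apply(plan):
            return plan.node
    return None
```

If two schedulers plan from the same snapshot and both pick the same node, the first `apply` succeeds and the second is rejected, so the losing scheduler simply plans again against newer state. No locks are held while planning.
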
  13. Consul Cluster

    Diagram: six CLIENTs and two groups of three SERVERs. Labels: REPLICATION (within each server group), RPC, LAN GOSSIP, WAN GOSSIP.
  14. Single Region Architecture

    Diagram: three SERVERs (FOLLOWER, LEADER, FOLLOWER) and three CLIENTs in DC1, DC2, DC3. Labels: REPLICATION, FORWARDING, RPC.
  15. Multi Region Architecture

    Diagram: REGION A and REGION B, each with three SERVERs (FOLLOWER, LEADER, FOLLOWER). Labels: REPLICATION, FORWARDING, GOSSIP, REGION FORWARDING.
  16. Region is Isolation Domain

    - 1-N Datacenters Per Region
    - Flexibility to do 1:1 (Consul)
    - Scheduling Boundary
  17. Fingerprinting

    Type: Examples
    - Operating System: Kernel, OS, Version
    - Hardware: CPU, Memory, Disk
    - Apps (Capabilities): Docker, Java, Consul
    - Environment: AWS, GCE
  18. “Task Requires Linux, Docker, and PCI-Compliant Hardware”

    Expressed as constraints in the job file.
  19. “Task needs 512MB RAM and 1 Core”

    Expressed as resources in the job file.
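
Taken together, the fingerprinting, constraint, and resource slides describe a feasibility check: a node is a candidate only if its fingerprinted attributes satisfy every constraint and it has enough free capacity. A minimal sketch under stated assumptions follows. The attribute keys (`os`, `docker`, `pci_compliant`), the resource keys (`memory_mb`, `cores`), and the node dictionary layout are all made up for illustration and are not Nomad's attribute names.

```python
def is_feasible(node, constraints, resources):
    """Return True if the node meets every constraint and has capacity."""
    attrs = node["attributes"]
    # Constraints: each required attribute must match the fingerprinted value.
    for key, expected in constraints.items():
        if attrs.get(key) != expected:
            return False
    # Resources: the node must have at least the requested amount free.
    free = node["free"]
    return all(free.get(name, 0) >= amount for name, amount in resources.items())


def feasible_nodes(nodes, constraints, resources):
    """Names of all nodes that could run the task, in input order."""
    return [n["name"] for n in nodes if is_feasible(n, constraints, resources)]
```

For the two quoted requirements, the constraints would be something like `{"os": "linux", "docker": True, "pci_compliant": True}` and the resources `{"memory_mb": 512, "cores": 1}`.
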
  20. Job file: redis (docker driver)

        job "redis" {
          datacenters = ["us-east-1"]

          task "redis" {
            driver = "docker"

            config {
              image = "redis:latest"
            }

            resources {
              cpu    = 500 # MHz
              memory = 256 # MB

              network {
                mbits = 10
                port "redis" {}
              }
            }
          }
        }
  21. Job file: webserver (exec driver)

        job "webserver" {
          datacenters = ["us-east-1"]

          task "webserver" {
            driver = "exec"

            config {
              command = "yet-another-golang-webserver"
            }

            artifact {
              source = "https://github.com/download/../yet-another-golang-webserver"
            }

            resources {
              cpu    = 500 # MHz
              memory = 128 # MB

              network {
                port "http" {
                  static = 8080
                }
              }
            }
          }
        }
  22. Job Types

    Nomad has three scheduler types that can be used when creating your job: service, batch, and system.
  23. Job Types: Service Scheduler

    The service scheduler is designed for scheduling long-lived services that should never go down. It ranks a large portion of the nodes that meet the job's constraints and selects the optimal node to place a task group on. Examples: webapp, redis.
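
The slide says the service scheduler ranks feasible nodes and picks the best one, but it does not give the scoring function. As a hedged illustration, the sketch below assumes a simple bin-packing score (prefer the node that ends up most fully used), which is one common ranking criterion. The function names, the node dictionary layout, and the use of memory as the only dimension are assumptions.

```python
def score(node, ask_mb):
    """Fraction of node memory in use after placement (higher packs tighter)."""
    used_after = node["total_mb"] - node["free_mb"] + ask_mb
    return used_after / node["total_mb"]


def pick_node(nodes, ask_mb):
    """Return the name of the best-scoring node that fits, or None."""
    candidates = [n for n in nodes if n["free_mb"] >= ask_mb]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, ask_mb))["name"]
```

Packing tightly leaves other nodes empty for large tasks, which ties back to the "Higher Resource Utilization" advantage from earlier in the deck. A spread-oriented score would make the opposite trade.
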
  24. Job Types: Batch Scheduler

    Batch jobs are less sensitive to short-term performance fluctuations and are short lived, finishing after some period. Examples: billing, data replication.
  25. Job Types: System Scheduler

    The system scheduler is used to register jobs that should be run on all clients that meet the job's constraints. The system scheduler is also invoked when clients join the cluster or transition into the ready state. Examples: logging agent, security auditing tool.
  26. Task drivers

    - Containerized: Docker, rkt, LXC, Windows Server Containers
    - Virtualized: Qemu / KVM, Hyper-V, Xen
    - Standalone: Java Jar, Static Binaries, C#
  27. Service Discovery using Consul

    Consul is a free and open-source tool by HashiCorp that implements service discovery. It uses the Raft and gossip protocols to reach massive scale. It integrates with health checks, so unhealthy services are not added to the service discovery layer. It uses a client-server model similar to Nomad's.
  28. Job file: service registration with a health check

        job "redis" {
          datacenters = ["us-east-1"]

          service {
            name = "redis"
            port = "redis"

            check {
              type     = "tcp"
              port     = "redis"
              interval = "10s"
              timeout  = "5s"
            }
          }
        }
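
The `check` stanza above asks for a TCP check with a timeout of 5 seconds, repeated every 10 seconds. Conceptually, a single TCP check just tries to open a connection within the timeout. The sketch below illustrates that idea only; it is not how Consul implements its checks, and the function name `tcp_check` is hypothetical.

```python
import socket


def tcp_check(host, port, timeout_s=5.0):
    """Return True if a TCP connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

A caller would run this on the configured interval and treat the service as unhealthy while it returns `False`, which is what keeps failing instances out of service discovery.
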
  29. Advanced Job Strategies

    Nomad currently supports two advanced job strategies: Rolling Upgrades, and Blue/Green & Canary Deployments.
  30. Job file: rolling updates

        job "redis" {
          # Add an update stanza to enable rolling updates of the service
          update {
            max_parallel     = 1
            min_healthy_time = "30s"
            healthy_deadline = "10m"
          }
          .....
        }
  31. Job file: canaries and automatic revert

        job "redis" {
          # Add an update stanza to enable rolling updates of the service
          update {
            max_parallel     = 1
            canary           = 5
            min_healthy_time = "30s"
            auto_revert      = true
          }
          .....
        }
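
The effect of `max_parallel` and `auto_revert` in the two `update` stanzas can be approximated by a small simulation: allocations are replaced in batches of `max_parallel`, the rollout continues only while each batch is healthy, and with `auto_revert` a failed rollout restores the previous version. This is a simplified sketch, not Nomad's deployment logic. The function `rolling_update` and the `is_healthy` callback are hypothetical, the callback stands in for `min_healthy_time` and `healthy_deadline`, and canaries are not modeled.

```python
def rolling_update(allocs, new_version, max_parallel, is_healthy, auto_revert=False):
    """Update allocs in place, max_parallel at a time.

    Returns True if every allocation reached new_version, or False if the
    rollout stopped because a batch was unhealthy.
    """
    original = dict(allocs)
    names = list(allocs)
    for start in range(0, len(names), max_parallel):
        batch = names[start:start + max_parallel]
        for name in batch:
            allocs[name] = new_version
        # Only move on to the next batch once this one is healthy.
        if not all(is_healthy(name, new_version) for name in batch):
            if auto_revert:
                # Restore every allocation to its pre-rollout version.
                allocs.clear()
                allocs.update(original)
            return False
    return True
```

With `max_parallel = 1`, at most one instance is ever running an unproven version, so a bad release is caught after touching a single allocation.
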
  32. “640 KB ought to be enough for anybody.”

    - Bill Gates
  33. 2nd Largest Hedge Fund

    - 18K Cores
    - 5 Hours
    - 2,200 Containers/second
  34. CircleCI

    - 7+ Million Builds a Month
    - Sustain 400-1000 Jobs a Minute
    - Great Talk By Danielle Tomlinson: https://youtu.be/b8NQO_vFAYo
  35. Scheduler Advantages (recap)

    - Higher Resource Utilization
    - Decouple Work from Resources
    - Better Quality of Service