Scheduling Applications at Scale

Armon Dadgar
November 20, 2015


Tools like Docker and rkt make it easier than ever to package and distribute applications. Unfortunately, not all organizations have the luxury of being able to package their applications in a container runtime.

Many organizations have virtualized workloads that cannot be easily containerized, such as applications that require full hardware isolation or virtual appliances. On the opposite end of the spectrum, some organizations deploy workloads that are already static binaries such as Go applications or Java applications that only rely on the JVM. These types of applications do not benefit from containerization as they are already self-contained. To address the growing heterogeneity of workloads, HashiCorp created Nomad - a globally aware, distributed scheduler and cluster manager.

Nomad is designed to handle many types of workloads, on a variety of operating systems, at massive scale. Nomad empowers developers to specify jobs and tasks using a high-level specification in a plain-text file. Nomad accepts the job specification, parses the information, determines which compatible hosts have available resources, and then automatically manages the placement, healing, and scaling of the application. By placing multiple applications per host, Nomad maximizes resource utilization and dramatically reduces infrastructure costs.
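The placement step described above is at heart a bin-packing problem: fit each task's resource ask onto a host that still has capacity to spare. A toy first-fit sketch in Go illustrates the idea (the `Task`, `Node`, and `place` names are invented for illustration and are not Nomad's internals):

```go
package main

import "fmt"

// Task is a simplified resource ask, in the spirit of a Nomad
// job's resources block (illustrative only).
type Task struct {
	Name     string
	CPUMHz   int
	MemoryMB int
}

// Node tracks the remaining free capacity on a client host.
type Node struct {
	Name     string
	CPUMHz   int
	MemoryMB int
}

// place assigns each task to the first node with enough room,
// deducting the task's ask from that node's free capacity.
func place(tasks []Task, nodes []Node) map[string]string {
	out := map[string]string{}
	for _, t := range tasks {
		for i := range nodes {
			if nodes[i].CPUMHz >= t.CPUMHz && nodes[i].MemoryMB >= t.MemoryMB {
				nodes[i].CPUMHz -= t.CPUMHz
				nodes[i].MemoryMB -= t.MemoryMB
				out[t.Name] = nodes[i].Name
				break
			}
		}
	}
	return out
}

func main() {
	nodes := []Node{{"node-1", 1000, 512}, {"node-2", 4000, 4096}}
	tasks := []Task{{"redis", 500, 256}, {"web", 800, 512}, {"api", 500, 256}}
	fmt.Println(place(tasks, nodes))
}
```

Real schedulers refine this with scoring, anti-affinity, and constraints, but packing multiple tasks per host is what drives the utilization gains described above.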

The flexibility of Nomad's design brings the benefits of a scheduled application workflow to organizations with heterogeneous workloads and operating systems. This talk discusses the pros and cons of running in a scheduled environment and gives an overview of Nomad's design and architecture.


Transcript

  1. Nomad HASHICORP

  2. HASHICORP Armon Dadgar @armon

  3. None
  4. HASHICORP

  5. Nomad HASHICORP Distributed Optimistically Concurrent Scheduler

  6. Nomad HASHICORP Distributed Optimistically Concurrent Scheduler

  7. HASHICORP Schedulers map a set of work to a set of resources
  8. HASHICORP CPU Scheduler. Work (Input): Web Server - Thread 1, Web Server - Thread 2, Redis - Thread 1, Kernel - Thread 1. Resources: CPU - Core 1, CPU - Core 2
  9. HASHICORP CPU Scheduler. Work (Input): Web Server - Thread 1, Web Server - Thread 2, Redis - Thread 1, Kernel - Thread 1. Resources: CPU - Core 1, CPU - Core 2
  10. HASHICORP Schedulers In the Wild
      Type | Work | Resources
      CPU Scheduler | Threads | Physical Cores
      AWS EC2 / OpenStack Nova | Virtual Machines | Hypervisors
      Hadoop YARN | MapReduce Jobs | Client Nodes
      Cluster Scheduler | Applications | Servers
  11. HASHICORP Advantages: Higher Resource Utilization, Decouple Work from Resources, Better Quality of Service
  12. HASHICORP Advantages: Higher Resource Utilization (Bin Packing, Over-Subscription, Job Queueing), Decouple Work from Resources, Better Quality of Service
  13. HASHICORP Advantages: Higher Resource Utilization, Decouple Work from Resources (Abstraction, API Contracts, Standardization), Better Quality of Service
  14. HASHICORP Advantages: Higher Resource Utilization, Decouple Work from Resources, Better Quality of Service (Priorities, Resource Isolation, Pre-emption)
  15. HASHICORP

  16. Nomad HASHICORP

  17. Nomad HASHICORP Cluster Scheduler Easily Deploy Applications Job Specification

  18. HASHICORP example.nomad
      # Define our simple redis job
      job "redis" {
        # Run only in us-east-1
        datacenters = ["us-east-1"]

        # Define the single redis task using Docker
        task "redis" {
          driver = "docker"

          config {
            image = "redis:latest"
          }

          resources {
            cpu    = 500 # MHz
            memory = 256 # MB

            network {
              mbits = 10
              dynamic_ports = ["redis"]
            }
          }
        }
      }
  19. HASHICORP Job Specification Declares what to run

  20. HASHICORP Job Specification Nomad determines where and manages how to run
  21. HASHICORP Job Specification Powerful yet simple

  22. HASHICORP
      # Define our simple redis job
      job "redis" {
        # Run only in us-east-1
        datacenters = ["us-east-1"]

        # Define the single redis task using Docker
        task "redis" {
          driver = "docker"

          config {
            image = "redis:latest"
          }

          resources {
            cpu    = 500 # MHz
            memory = 256 # MB

            network {
              mbits = 10
              dynamic_ports = ["redis"]
            }
          }
        }
      }
  23. HASHICORP Containerized: Docker. Virtualized: Qemu / KVM. Standalone: Java Jar, Static Binaries
  24. HASHICORP Containerized: Docker, Jetpack, Windows Server Containers. Virtualized: Qemu / KVM, Hyper-V, Xen. Standalone: Java Jar, Static Binaries, C#
  25. Nomad HASHICORP Application Deployment, Docker, Multi-Datacenter and Multi-Region, Flexible Workloads, Bin Packing, HCL Job Specifications
  26. Nomad HASHICORP Easy for developers Operationally simple Built for scale

  27. HASHICORP Easy for Developers

  28. HASHICORP Nomad for Developers: Simple Data Model, Declarative Job Specification, Sane Defaults
  29. HASHICORP
      job "foobar" {
        # Restrict the parallelism in updates
        update {
          stagger      = "60s"
          max_parallel = 3
        }
        …
      }
  30. HASHICORP
      job "foobar" {
        group "api" {
          # Scale our service up
          count = 5
          …
        }
      }
  31. HASHICORP
      job "foobar" {
        group "api" {
          # Scale our service down
          count = 3
          …
        }
      }
  32. HASHICORP
      job "foobar" {
        group "hdfs-data-node" {
          # Ensure the scheduler does not put
          # multiple instances on one host
          constraint {
            distinct_hosts = true
          }
          …
        }
      }
  33. HASHICORP
      job "foobar" {
        group "hdfs-data-node" {
          # Attempt restart of tasks if they
          # fail unexpectedly
          restart {
            attempts = 5
            interval = "10m"
            delay    = "30s"
          }
          …
        }
      }
  34. HASHICORP
      job "foobar" {
        task "my-app" {
          # Ensure modern kernel available
          constraint {
            attribute = "kernel.version"
            version   = ">= 3.14"
          }
          …
        }
      }
  35. HASHICORP
      job "foobar" {
        task "my-app" {
          # Inject environment variables
          env {
            MY_FEATURE_FLAG = "ON"
          }
          …
        }
      }
  36. HASHICORP
      job "foobar" {
        task "my-app" {
          # Register with Consul for service
          # discovery and health checking
          service {
            port = "http"
            check {
              type     = "tcp"
              interval = "10s"
            }
          }
          …
        }
      }
  37. HASHICORP
      job "foobar" {
        # Make sure this task runs everywhere
        type = "system"
        # Nothing should evict our collector
        priority = 100
        task "stats-collector" {
          …
        }
      }
  38. Terminal HASHICORP
      $ nomad agent -dev
      ==> Starting Nomad agent...
      ==> Nomad agent configuration:
          Atlas: <disabled>
          Client: true
          Log Level: DEBUG
          Region: global (DC: dc1)
          Server: true
      ==> Nomad agent started! Log data will stream in below:
      [INFO] serf: EventMemberJoin: nomad.global 127.0.0.1
      [INFO] nomad: starting 4 scheduling worker(s) for [service batch _core]
      [INFO] raft: Node at 127.0.0.1:4647 [Follower] entering Follower state
      [INFO] nomad: adding server nomad.global (Addr: 127.0.0.1:4647) (DC: dc1)
      [DEBUG] client: applied fingerprints [storage arch cpu host memory]
      [DEBUG] client: available drivers [docker exec]
  39. Nomad HASHICORP Infrastructure As Code: Declarative Jobs, Desired State, Emergent State
  40. HASHICORP Operationally Simple

  41. HASHICORP Client Server

  42. HASHICORP Built for Scale

  43. HASHICORP Built on Experience: gossip, consensus

  44. HASHICORP Built on Research: gossip, consensus

  45. HASHICORP Single Region Architecture (diagram: a leader and two follower servers with replication and request forwarding; clients in DC1, DC2, and DC3 communicate with the servers over RPC)
  46. HASHICORP Multi Region Architecture (diagram: Region A and Region B, each with a leader and follower servers replicating internally; the regions are linked by gossip and cross-region forwarding)
  47. Nomad HASHICORP Region is Isolation Domain, 1-N Datacenters Per Region, Flexibility to do 1:1 (Consul), Scheduling Boundary
  48. HASHICORP Thousands of regions, tens of thousands of clients per region, thousands of jobs per region
  49. HASHICORP Optimistically Concurrent

  50. HASHICORP Data Model

  51. HASHICORP Evaluations ~= State Change Event

  52. HASHICORP Create / Update / Delete Job, Node Up / Node Down, Allocation Failed
  53. HASHICORP “Scheduler” = func(Eval) => []AllocUpdates

  54. HASHICORP Scheduler funcs can specialize (Service, Batch, System, etc.)
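Slide 53's definition can be read almost literally as a function type: a scheduler consumes an evaluation and emits allocation updates, and specializing the scheduler just means registering a different function per job type. A hedged Go sketch of the idea (the types and registry here are invustrative inventions, not Nomad's internals):

```go
package main

import "fmt"

// Evaluation is a state-change event: a job was registered,
// a node went down, an allocation failed, etc. (illustrative).
type Evaluation struct {
	JobID string
	Type  string // "service", "batch", "system", ...
}

// AllocUpdate is a proposed change to the set of allocations.
type AllocUpdate struct {
	JobID  string
	Action string
}

// Scheduler mirrors the slide's "func(Eval) => []AllocUpdates".
type Scheduler func(Evaluation) []AllocUpdate

// Specialized scheduler funcs are registered per job type.
var schedulers = map[string]Scheduler{
	"service": func(e Evaluation) []AllocUpdate {
		return []AllocUpdate{{e.JobID, "place-long-running"}}
	},
	"batch": func(e Evaluation) []AllocUpdate {
		return []AllocUpdate{{e.JobID, "place-until-complete"}}
	},
}

// process dispatches an evaluation to the scheduler for its job type.
func process(e Evaluation) []AllocUpdate {
	return schedulers[e.Type](e)
}

func main() {
	fmt.Println(process(Evaluation{JobID: "redis", Type: "service"}))
}
```

Modeling the scheduler as a pure function over evaluations is what lets the service, batch, and system schedulers share the same queuing and plan machinery.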

  55. HASHICORP Evaluation Enqueue

  56. HASHICORP Evaluation Dequeue

  57. HASHICORP Plan Generation

  58. HASHICORP Plan Execution

  59. HASHICORP External Event, Evaluation Creation, Evaluation Queuing, Evaluation Processing, Optimistic Coordination, State Updates
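The optimistic coordination step is why the scheduler can be concurrent: workers generate plans in parallel against a state snapshot, and only the commit is serialized. Each plan carries the state index it was computed at; a plan that raced with another commit is rejected and re-evaluated instead of holding locks. A toy compare-and-swap sketch in Go (the `Plan`/`State` names are invented for illustration, not Nomad's actual plan-apply code):

```go
package main

import (
	"fmt"
	"sync"
)

// Plan is the output of a scheduler worker, stamped with the
// state index (snapshot version) it was computed against.
type Plan struct {
	SnapshotIndex uint64
	Desc          string
}

// State is the leader's authoritative, versioned cluster state.
type State struct {
	mu    sync.Mutex
	Index uint64
}

// Apply commits a plan only if no other plan has landed since the
// worker's snapshot; otherwise the caller must re-evaluate.
func (s *State) Apply(p Plan) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if p.SnapshotIndex != s.Index {
		return false // stale: state changed since the snapshot
	}
	s.Index++ // a successful commit bumps the version
	return true
}

func main() {
	s := &State{Index: 7}
	fresh := Plan{SnapshotIndex: 7, Desc: "place redis on node-1"}
	stale := Plan{SnapshotIndex: 7, Desc: "place web on node-1"}
	fmt.Println(s.Apply(fresh)) // true: commits, index becomes 8
	fmt.Println(s.Apply(stale)) // false: computed at 7, state is now 8
}
```

The optimistic bet is that plans rarely conflict, so throughput scales with the number of workers while the leader's check keeps placements consistent.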
  60. HASHICORP Server Architecture: Omega Class Scheduler, Pluggable Logic, Internal Coordination and State, Multi-Region / Multi-Datacenter
  61. HASHICORP Client Architecture Broad OS Support Host Fingerprinting Pluggable Drivers

  62. HASHICORP Fingerprinting
      Type | Examples
      Operating System | Kernel, OS, Versions
      Hardware | CPU, Memory, Disk
      Applications | Java, Docker, Consul
      Environment | AWS, GCE
  63. HASHICORP Fingerprinting Constrain Placement and Bin Pack

  64. HASHICORP Drivers Execute Tasks Provide Resource Isolation

  65. Nomad HASHICORP Workload Flexibility: Schedulers Fingerprints Drivers Job Specification

  66. Nomad HASHICORP Operational Simplicity: Single Binary No Dependencies Highly Available

  67. HASHICORP

  68. HASHICORP Nomad 0.1: Released in October, Service and Batch Scheduler, Docker / Qemu / Exec / Java Drivers
  69. HASHICORP Case Study

  70. HASHICORP Case Study: 3 servers in NYC3, 100 clients in NYC3 / SFO1 / AMS2-3, 1000 containers
  71. HASHICORP Case Study: <1s to schedule, 1s to first start, 6s to 95%, 8s to 99%
  72. HASHICORP Nomad 0.2 - Service Workloads: Service Discovery, System Scheduler, Restart Policies, Enhanced Constraints
  73. HASHICORP Nomad 0.3 - Batch Workloads: Cron, Job Queuing, Latency-Aware Scheduling
  74. Nomad HASHICORP Cluster Scheduler Easily Deploy Applications Job Specification

  75. Nomad HASHICORP Higher Resource Utilization, Decouple Work from Resources, Better Quality of Service
  76. HASHICORP Thanks! Q/A