Scheduling Applications at Scale

Armon Dadgar
November 20, 2015

Tools like Docker and rkt make it easier than ever to package and distribute applications. Unfortunately, not all organizations have the luxury of being able to package their applications in a container runtime.

Many organizations have virtualized workloads that cannot be easily containerized, such as applications that require full hardware isolation or virtual appliances. On the opposite end of the spectrum, some organizations deploy workloads that are already static binaries such as Go applications or Java applications that only rely on the JVM. These types of applications do not benefit from containerization as they are already self-contained. To address the growing heterogeneity of workloads, HashiCorp created Nomad - a globally aware, distributed scheduler and cluster manager.

Nomad is designed to handle many types of workloads, on a variety of operating systems, at massive scale. Nomad empowers developers to specify jobs and tasks using a high-level specification in a plain-text file. Nomad accepts the job specification, parses the information, determines which compatible hosts have available resources, and then automatically manages the placement, healing, and scaling of the application. By placing multiple applications per host, Nomad maximizes resource utilization and dramatically reduces infrastructure costs.

The flexibility of Nomad's design brings the benefits of a scheduled application workflow to organizations with heterogeneous workloads and operating systems. This talk discusses the pros and cons of running in a scheduled environment and gives an overview of the design and architecture of Nomad.

Transcript

  1. Nomad
    HASHICORP

  2. HASHICORP
    Armon Dadgar
    @armon

  3. Nomad
    HASHICORP
    Distributed
    Optimistically Concurrent
    Scheduler

  5. HASHICORP
    Schedulers map a set of work to a
    set of resources

  6. HASHICORP
    CPU Scheduler
    Work (Input): Web Server - Thread 1, Web Server - Thread 2,
    Redis - Thread 1, Kernel - Thread 1
    Resources: CPU - Core 1, CPU - Core 2

  8. HASHICORP
    Schedulers In the Wild
    Type                     | Work             | Resources
    CPU Scheduler            | Threads          | Physical Cores
    AWS EC2 / OpenStack Nova | Virtual Machines | Hypervisors
    Hadoop YARN              | MapReduce Jobs   | Client Nodes
    Cluster Scheduler        | Applications     | Servers

  9. HASHICORP
    Advantages
    Higher Resource Utilization
    Decouple Work from Resources
    Better Quality of Service

  10. HASHICORP
    Advantages
    Bin Packing
    Over-Subscription
    Job Queueing
    Higher Resource Utilization
    Decouple Work from Resources
    Better Quality of Service

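Bin packing is what drives the higher-utilization claim above: the scheduler fills each node before spilling work onto the next. A minimal first-fit sketch in Go (illustrative only; Nomad's actual algorithm ranks feasible nodes across multiple resource dimensions):

```go
package main

import "fmt"

// Node models a host with remaining capacity (MHz of CPU, MB of memory).
type Node struct {
	Name    string
	CPUFree int
	MemFree int
}

// Task models a unit of work with resource requirements.
type Task struct {
	Name string
	CPU  int
	Mem  int
}

// firstFit places each task on the first node with enough free capacity,
// returning a task->node assignment and any tasks that could not be placed
// (in a real scheduler, unplaced tasks would be queued).
func firstFit(nodes []*Node, tasks []Task) (map[string]string, []Task) {
	placed := map[string]string{}
	var unplaced []Task
	for _, t := range tasks {
		found := false
		for _, n := range nodes {
			if n.CPUFree >= t.CPU && n.MemFree >= t.Mem {
				n.CPUFree -= t.CPU
				n.MemFree -= t.Mem
				placed[t.Name] = n.Name
				found = true
				break
			}
		}
		if !found {
			unplaced = append(unplaced, t)
		}
	}
	return placed, unplaced
}

func main() {
	nodes := []*Node{
		{Name: "node1", CPUFree: 1000, MemFree: 512},
		{Name: "node2", CPUFree: 1000, MemFree: 512},
	}
	tasks := []Task{
		{Name: "redis", CPU: 500, Mem: 256},
		{Name: "web", CPU: 500, Mem: 256},
		{Name: "batch", CPU: 800, Mem: 256},
	}
	placed, unplaced := firstFit(nodes, tasks)
	fmt.Println(placed, len(unplaced))
}
```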
  11. HASHICORP
    Advantages
    Abstraction
    API Contracts
    Standardization
    Higher Resource Utilization
    Decouple Work from Resources
    Better Quality of Service

  12. HASHICORP
    Advantages
    Priorities
    Resource Isolation
    Pre-emption
    Higher Resource Utilization
    Decouple Work from Resources
    Better Quality of Service

  13. Nomad
    HASHICORP

  14. Nomad
    HASHICORP
    Cluster Scheduler
    Easily Deploy Applications
    Job Specification

  15. HASHICORP
    example.nomad
    # Define our simple redis job
    job "redis" {
      # Run only in us-east-1
      datacenters = ["us-east-1"]

      # Define the single redis task using Docker
      task "redis" {
        driver = "docker"

        config {
          image = "redis:latest"
        }

        resources {
          cpu    = 500 # MHz
          memory = 256 # MB

          network {
            mbits         = 10
            dynamic_ports = ["redis"]
          }
        }
      }
    }

  16. HASHICORP
    Job Specification
    Declares what to run

  17. HASHICORP
    Job Specification
    Nomad determines where and
    manages how to run

  18. HASHICORP
    Job Specification
    Powerful yet simple

  20. HASHICORP
    Containerized
    Virtualized
    Standalone
    Docker
    Qemu / KVM
    Java Jar
    Static Binaries

  21. HASHICORP
    Containerized
    Virtualized
    Standalone
    Docker
    Jetpack
    Windows Server Containers
    Qemu / KVM
    Hyper-V
    Xen
    Java Jar
    Static Binaries
    C#

  22. Nomad
    HASHICORP
    Application Deployment
    Docker
    Multi-Datacenter and Multi-Region
    Flexible Workloads
    Bin Packing
    HCL Job Specifications

  23. Nomad
    HASHICORP
    Easy for developers
    Operationally simple
    Built for scale

  24. HASHICORP
    Easy for Developers

  25. HASHICORP
    Nomad for Developers
    Simple Data Model
    Declarative Job Specification
    Sane Defaults

  26. HASHICORP
    job "foobar" {
      # Restrict the parallelism in updates
      update {
        stagger      = "60s"
        max_parallel = 3
      }
    }

  27. HASHICORP
    job "foobar" {
      group "api" {
        # Scale our service up
        count = 5
      }
    }

  28. HASHICORP
    job "foobar" {
      group "api" {
        # Scale our service down
        count = 3
      }
    }

  29. HASHICORP
    job "foobar" {
      group "hdfs-data-node" {
        # Ensure the scheduler does not put
        # multiple instances on one host
        constraint {
          distinct_hosts = true
        }
      }
    }

  30. HASHICORP
    job "foobar" {
      group "hdfs-data-node" {
        # Attempt restart of tasks if they
        # fail unexpectedly
        restart {
          attempts = 5
          interval = "10m"
          delay    = "30s"
        }
      }
    }

  31. HASHICORP
    job "foobar" {
      task "my-app" {
        # Ensure modern kernel available
        constraint {
          attribute = "kernel.version"
          version   = ">= 3.14"
        }
      }
    }

  32. HASHICORP
    job "foobar" {
      task "my-app" {
        # Inject environment variables
        env {
          MY_FEATURE_FLAG = "ON"
        }
      }
    }

  33. HASHICORP
    job "foobar" {
      task "my-app" {
        # Register with Consul for service
        # discovery and health checking
        service {
          port = "http"

          check {
            type     = "tcp"
            interval = "10s"
          }
        }
      }
    }

  34. HASHICORP
    job "foobar" {
      # Make sure this task runs everywhere
      type = "system"

      # Nothing should evict our collector
      priority = 100

      task "stats-collector" {
      }
    }

  35. Terminal
    HASHICORP
    $ nomad agent -dev
    ==> Starting Nomad agent...
    ==> Nomad agent configuration:
    Atlas:
    Client: true
    Log Level: DEBUG
    Region: global (DC: dc1)
    Server: true
    ==> Nomad agent started! Log data will stream in below:
    [INFO] serf: EventMemberJoin: nomad.global 127.0.0.1
    [INFO] nomad: starting 4 scheduling worker(s) for [service batch _core]
    [INFO] raft: Node at 127.0.0.1:4647 [Follower] entering Follower state
    [INFO] nomad: adding server nomad.global (Addr: 127.0.0.1:4647) (DC: dc1)
    [DEBUG] client: applied fingerprints [storage arch cpu host memory]
    [DEBUG] client: available drivers [docker exec]

  36. Nomad
    HASHICORP
    Infrastructure As Code
    Declarative Jobs
    Desired State
    Emergent State

  37. HASHICORP
    Operationally Simple

  38. HASHICORP
    Client Server

  39. HASHICORP
    Built for Scale

  40. HASHICORP
    Built on Experience
    gossip consensus

  41. HASHICORP
    Built on Research
    gossip consensus

  42. HASHICORP
    Single Region Architecture
    Three servers (one LEADER, two FOLLOWERs) replicate state and
    forward requests to the leader; clients in DC1, DC2, and DC3
    submit work to the servers over RPC.

  43. HASHICORP
    Multi-Region Architecture
    Each region runs its own server cluster (one LEADER plus FOLLOWERs)
    with intra-region replication and forwarding; regions A and B are
    federated via gossip, and requests are forwarded between regions.

  44. Nomad
    HASHICORP
    Region is Isolation Domain
    1-N Datacenters Per Region
    Flexibility to do 1:1 (Consul)
    Scheduling Boundary

  45. HASHICORP
    Thousands of regions
    Tens of thousands of clients per region
    Thousands of jobs per region

  46. HASHICORP
    Optimistically Concurrent

  47. HASHICORP
    Data Model

  48. HASHICORP
    Evaluations ~= State Change Event

  49. HASHICORP
    Create / Update / Delete Job
    Node Up / Node Down
    Allocation Failed

  50. HASHICORP
    “Scheduler” =
    func(Eval) => []AllocUpdates

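That functional model can be sketched directly in Go; the types below are illustrative stand-ins, not Nomad's internal API:

```go
package main

import "fmt"

// Evaluation represents a state-change event that may require scheduling work.
type Evaluation struct {
	JobID   string
	Trigger string // e.g. "job-register", "node-down", "alloc-failure"
}

// AllocUpdate represents one placement decision proposed by a scheduler.
type AllocUpdate struct {
	JobID  string
	NodeID string
	Op     string // "place" or "stop"
}

// Scheduler is the core abstraction: a function from an evaluation
// to a set of proposed allocation updates.
type Scheduler func(Evaluation) []AllocUpdate

// serviceScheduler is a trivial specialization: on job registration,
// propose a single placement on a fixed node.
func serviceScheduler(eval Evaluation) []AllocUpdate {
	if eval.Trigger == "job-register" {
		return []AllocUpdate{{JobID: eval.JobID, NodeID: "node1", Op: "place"}}
	}
	return nil
}

func main() {
	// Different workload types (service, batch, system) plug in as
	// different Scheduler functions over the same evaluation stream.
	var sched Scheduler = serviceScheduler
	plan := sched(Evaluation{JobID: "redis", Trigger: "job-register"})
	fmt.Println(len(plan), plan[0].Op)
}
```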
  51. HASHICORP
    Scheduler func’s can specialize
    (Service, Batch, System, etc)

  52. HASHICORP
    Evaluation Enqueue

  53. HASHICORP
    Evaluation Dequeue

  54. HASHICORP
    Plan Generation

  55. HASHICORP
    Plan Execution

  56. HASHICORP
    External Event
    Evaluation Creation
    Evaluation Queuing
    Evaluation Processing
    Optimistic Coordination
    State Updates

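The optimistic-coordination step can be sketched as a version check at plan time: a plan computed against a stale snapshot of cluster state is rejected, and the scheduler must refresh and retry. A simplified sketch (Nomad's actual plan applier evaluates conflicts per node rather than on a single global index):

```go
package main

import "fmt"

// Plan is a scheduler's proposed set of placements, stamped with the
// state index it was computed against.
type Plan struct {
	SnapshotIndex uint64
	Placements    []string
}

// State is the authoritative cluster state, guarded by a version index.
type State struct {
	Index uint64
}

// applyPlan commits a plan only if no conflicting write has landed since
// the scheduler took its snapshot. On rejection, the scheduler retries
// against fresh state instead of holding a pessimistic lock.
func (s *State) applyPlan(p Plan) bool {
	if p.SnapshotIndex < s.Index {
		return false // stale: state changed underneath the scheduler
	}
	s.Index++
	return true
}

func main() {
	s := &State{Index: 5}
	fresh := Plan{SnapshotIndex: 5, Placements: []string{"redis->node1"}}
	stale := Plan{SnapshotIndex: 4}
	fmt.Println(s.applyPlan(fresh), s.applyPlan(stale))
}
```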
  57. HASHICORP
    Server Architecture
    Omega Class Scheduler
    Pluggable Logic
    Internal Coordination and State
    Multi-Region / Multi-Datacenter

  58. HASHICORP
    Client Architecture
    Broad OS Support
    Host Fingerprinting
    Pluggable Drivers

  59. HASHICORP
    Fingerprinting
    Type             | Examples
    Operating System | Kernel, OS, Versions
    Hardware         | CPU, Memory, Disk
    Applications     | Java, Docker, Consul
    Environment     | AWS, GCE

  60. HASHICORP
    Fingerprinting
    Constrain Placement and Bin Pack

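A fingerprinter reduces a host to a map of attributes that job constraints can match against. A toy sketch (real fingerprinters also probe hardware, installed drivers, and cloud metadata; this one only reads what the Go runtime exposes):

```go
package main

import (
	"fmt"
	"runtime"
)

// fingerprint gathers host attributes the scheduler can constrain on.
func fingerprint() map[string]string {
	return map[string]string{
		"kernel.name": runtime.GOOS,
		"cpu.arch":    runtime.GOARCH,
	}
}

// matches reports whether a node's attributes satisfy a simple
// equality constraint (real constraints also support version and
// regexp operators).
func matches(attrs map[string]string, attribute, value string) bool {
	return attrs[attribute] == value
}

func main() {
	attrs := fingerprint()
	fmt.Println(matches(attrs, "cpu.arch", attrs["cpu.arch"]))
}
```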
  61. HASHICORP
    Drivers
    Execute Tasks
    Provide Resource Isolation

  62. Nomad
    HASHICORP
    Workload Flexibility:
    Schedulers
    Fingerprints
    Drivers
    Job Specification

  63. Nomad
    HASHICORP
    Operational Simplicity:
    Single Binary
    No Dependencies
    Highly Available

  64. HASHICORP
    Nomad 0.1
    Released in October
    Service and Batch Scheduler
    Docker, Qemu, Exec, Java Drivers

  65. HASHICORP
    Case Study

  66. HASHICORP
    Case Study
    3 servers in NYC3
    100 clients in NYC3, SFO1, AMS2/3
    1000 Containers

  67. HASHICORP
    Case Study
    <1s to schedule
    1s to first start
    6s to 95%
    8s to 99%

  68. HASHICORP
    Nomad 0.2 - Service Workloads
    Service Discovery
    System Scheduler
    Restart Policies
    Enhanced Constraints

  69. HASHICORP
    Nomad 0.3 - Batch Workloads
    Cron
    Job Queuing
    Latency-Aware Scheduling

  70. Nomad
    HASHICORP
    Cluster Scheduler
    Easily Deploy Applications
    Job Specification

  71. Nomad
    HASHICORP
    Higher Resource Utilization
    Decouple Work from Resources
    Better Quality of Service

  72. HASHICORP
    Thanks!
    Q/A
