Nomad Overview (Cloud Native PDX Meetup)

An overview of Nomad, a general-purpose cluster scheduler that can be used to build a multi-region runtime environment for a diverse set of workloads, including non-containerized applications.

Presented at the Cloud Native PDX Meetup on 2 August 2018.

christiekoehler

August 02, 2018

Transcript

  1. Copyright © 2018 HashiCorp @christi3k
    Agenda
    ▪ HashiCorp overview
    ▪ What is Nomad?
    ▪ Features:
      – Easily Deploy Applications
      – Operationally Simple
      – Flexible Workloads
      – Built for Scale
    ▪ Integration with HashiCorp Suite
    ▪ Resilient Infrastructure
    ▪ Questions
  2. HashiCorp Overview
    ▪ Founded in 2012 by Mitchell Hashimoto (@mitchellh) and Armon Dadgar (@armon).
    ▪ Cloud Infrastructure Automation
    ▪ Consistent workflows to provision, secure, connect, and run any infrastructure for any application.
    ▪ Remote-first with lots of Portland employees, and we're hiring!
  3. Tao of HashiCorp
    The foundation that guides our vision, roadmap, and product design.
    ▪ Workflows, not technologies
    ▪ Simple, Modular, Composable
    https://www.hashicorp.com/tao-of-hashicorp
  4. HashiCorp Suite
    [Diagram: four columns, each with an OSS tool suite and a product suite for teams]
    ▪ PROVISION (infrastructure): Terraform, Packer, Vagrant; Terraform Enterprise
    ▪ SECURE (infrastructure & applications): Vault; Vault Enterprise
    ▪ CONNECT (infrastructure & applications): Consul; Consul Enterprise
    ▪ RUN (applications): Nomad; Nomad Enterprise
  5. Nomad is...
    ▪ Simple and flexible scheduler
    ▪ Designed for long-running services and batch jobs.
    ▪ Can schedule virtualized, containerized, and standalone applications.
    ▪ Multi-cluster, multi-datacenter, and multi-region deployments out of the box.
    ▪ Cloud agnostic.
  6. Containerization brought some improvements...
    Schedulers and orchestration tools leverage containerization to:
    ▪ improve deployment workflows
    ▪ reduce tight coupling between operators and developers
    ▪ increase resilience of running applications
    ▪ enable more efficient use of computing resources
  7. But also some drawbacks...
    ▪ overly complex
    ▪ difficult to deploy and operate
    ▪ inflexible regarding packaging and workload
    ▪ difficult to scale and deploy across regions
    ▪ mental model that is difficult to grasp / steep learning curve
  8. Nomad's Features
    ▪ Easily Deploy Applications
    ▪ Operationally Simple
    ▪ Flexible Workloads
    ▪ Built for Scale
  9. Deployment Workflow
    [Diagram: a user submits a job to the Nomad servers, which place it on Nomad clients. Client labels: "Deploy App", "Skip (Busy)", "Deploy App".]
  10. Declarative Job Specification (example.nomad)
    job "redis" {
      datacenters = ["us-east-1"]
      task "redis" {
        driver = "docker"
        config {
          image = "redis:latest"
        }
        resources {
          cpu    = 500 # MHz
          memory = 256 # MB
          network {
            mbits = 10
            port "redis" {}
          }
        }
      }
    }
  11. Single Binary - Client/Server Deployment Topology
    ▪ Client: nomad -client
    ▪ Server: nomad -server
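
The client/server split on this slide is typically expressed through agent configuration files handed to the same binary. The following is a minimal sketch, assuming 2018-era Nomad agent configuration; the file names, data directory, server address, and server count are illustrative assumptions and do not come from the talk.

```hcl
# --- server.hcl (hypothetical file name): run the agent in server mode ---
data_dir = "/var/lib/nomad" # assumed path

server {
  enabled          = true
  bootstrap_expect = 3 # assumed cluster size: wait for 3 servers before bootstrapping
}

# --- client.hcl (hypothetical file name, a separate file): run the agent in client mode ---
# data_dir = "/var/lib/nomad"
#
# client {
#   enabled = true
#   servers = ["10.0.0.10:4647"] # assumed address of a Nomad server (RPC port)
# }
```

Each file would be passed to the agent with `nomad agent -config=<file>`. The client half is commented out here only because the two halves belong in separate files on separate machines.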
  12. Multi-Datacenter and Multi-Region Aware: Single Region Architecture
    [Diagram: three servers (one leader, two followers) and three clients spread across DC1, DC2, and DC3. Labels: "Replication" and "Forwarding" between the leader and followers; "RPC" between clients and servers.]
  13. Multi-Datacenter and Multi-Region Aware: Multi Region Architecture
    [Diagram: Region A and Region B, each with three servers (one leader, two followers). Labels: "Replication" and "Forwarding" within each region; "Gossip" and "Region Forwarding" between the regions.]
  14. Extensible support for task drivers
    ▪ OS: Windows, Linux, BSD, Solaris
    ▪ Workloads: Long Running Service, Short Lived Batch, Periodic Cron, System Agents
    ▪ Drivers: Docker / Rkt / LXC, Qemu / KVM, "exec" (cgroups + chroot), Static Binaries / Fat JARs
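
To make the workload and driver combinations above concrete, here is a minimal sketch of a periodic batch job that runs a standalone binary with the "exec" driver. The job name, schedule, binary path, and arguments are hypothetical; only the stanza and parameter names are taken from the Nomad job specification.

```hcl
# Hypothetical periodic batch job; names and paths are assumptions for illustration.
job "nightly-report" {
  datacenters = ["dc1"]
  type        = "batch" # short-lived work, as opposed to "service" or "system"

  periodic {
    cron             = "0 2 * * *" # run once a day at 02:00
    prohibit_overlap = true        # skip a launch if the previous one is still running
  }

  group "report" {
    task "generate" {
      driver = "exec" # isolated fork/exec of a binary, no container image required

      config {
        command = "/usr/local/bin/generate-report" # assumed path to a static binary
        args    = ["--format", "csv"]
      }

      resources {
        cpu    = 200 # MHz
        memory = 128 # MB
      }
    }
  }
}
```

Changing `type` and `driver` (for example to "service" with "docker", or to "java" for a fat JAR) selects a different cell of the matrix on the slide; each driver has its own `config` keys.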
  15. Host Fingerprinting
    Type                  Examples
    Operating System      Kernel, OS, Version
    Hardware              CPU, Memory, Disk
    Apps (Capabilities)   Docker, Java, Consul
    Environment           AWS, GCE
  16. Host Fingerprinting
    ▪ "Task requires Linux, Docker, and PCI-compliant hardware": expressed as constraints in the job file
    ▪ "Task needs 512 MB RAM and 1 core": expressed as resources in the job file
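
The two requirements quoted on this slide could be written roughly as follows. This is a sketch under stated assumptions: `pci_compliant` is a hypothetical client metadata key that operators would have to set themselves, the job and image names are invented, and the CPU figure is only a stand-in for "1 core", since the slides express CPU in MHz.

```hcl
job "payments" { # hypothetical job name
  datacenters = ["dc1"]

  # Place only on hosts whose fingerprinted kernel is Linux.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  # Place only on nodes tagged as PCI compliant.
  # "pci_compliant" is an assumed client meta key, not a built-in attribute.
  constraint {
    attribute = "${meta.pci_compliant}"
    value     = "true"
  }

  group "api" {
    task "server" {
      driver = "docker" # only nodes where the Docker driver was fingerprinted qualify

      config {
        image = "example/payments:1.0" # hypothetical image
      }

      resources {
        cpu    = 1000 # MHz; assumed approximation of "1 core"
        memory = 512  # MB
      }
    }
  }
}
```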
  17. Million Container Challenge
    https://www.hashicorp.com/c1m
    ▪ 1,000 Tasks per Job
    ▪ 1,000 Jobs
    ▪ 5,000 Hosts on GCE
    ▪ 1,000,000 Containers
  18. Million Container Challenge
    A cluster of five Nomad servers scheduled one million containers in less than five minutes, a rate of 3,750 containers per second.
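
As a quick consistency check on the Million Container Challenge figures, assuming the quoted rate is an average over the whole run:

```latex
\begin{align*}
1{,}000 \text{ jobs} \times 1{,}000 \text{ tasks per job} &= 1{,}000{,}000 \text{ containers} \\
\frac{1{,}000{,}000 \text{ containers}}{3{,}750 \text{ containers/s}} &\approx 267 \text{ s} \approx 4 \text{ min } 27 \text{ s}
\end{align*}
```

That elapsed time agrees with the "less than five minutes" claim.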
  19. Citadel
    ▪ 2nd Largest Hedge Fund
    ▪ 18K Cores
    ▪ 5 Hours
    ▪ 2,200 Containers/second
  20. CircleCI
    ▪ 7+ Million Builds a Month
    ▪ Sustain 400-1000 Jobs a Minute
    ▪ Great Talk By Danielle Tomlinson: https://youtu.be/b8NQO_vFAYo
  21. HashiCorp Vault integration
    ▪ Native Vault Integration
    ▪ No Secrets in Jobs
    ▪ No Secrets on Client Disk
    ▪ Minimize Trust
    https://www.nomadproject.io/docs/job-specification/vault.html
  22. Vault Integration
    job "my-app" {
      …
      task "my-app" {
        vault {
          policies = ["my-app-role"]
        }
      }
    }
  23. HashiCorp Consul integration
    ▪ Register services and health checks
    ▪ Dynamic configuration (consul-template)
    ▪ Automatic bootstrapping
  24. Consul Integration
    job "my_job" {
      group "example" {
        task "server" {
          service {
            tags = ["leader", "mysql"]
            port = "db"
            check {
              type     = "script"
              name     = "check_table"
              command  = "/usr/local/bin/check_mysql_table_status"
              args     = ["--verbose"]
              interval = "60s"
              timeout  = "5s"
            }
          }
          template {
            source        = "local/redis.conf.tpl"
            destination   = "local/redis.conf"
            change_mode   = "signal"
            change_signal = "SIGINT"
          }
        }
      }
    }
  25. Restarting failed tasks (restart stanza)
    job "docs" {
      group "example" {
        restart {
          attempts = 3
          delay    = "30s"
        }
      }
    }
  26. Restarting unhealthy tasks (check_restart stanza)
    job "mysql" {
      group "mysqld" {
        ...
        task "server" {
          service {
            ...
            check {
              type     = "script"
              name     = "check_table"
              command  = "/usr/local/bin/check_mysql_table_status"
              args     = ["--verbose"]
              interval = "60s"
              timeout  = "5s"
              check_restart {
                limit           = 3
                grace           = "90s"
                ignore_warnings = false
              }
            }
          }
        }
      }
    }
  27. Rescheduling tasks (reschedule stanza)
    job "docs" {
      group "example" {
        reschedule {
          attempts       = 15
          interval       = "1h"
          delay          = "30s"
          delay_function = "exponential"
          max_delay      = "120s"
          unlimited      = false
        }
      }
    }
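
The effect of `delay_function = "exponential"` with the values above can be worked out as follows, assuming the delay doubles on each successive reschedule and is capped at `max_delay`, which is how the Nomad documentation describes this function. Here $d_n$ denotes the delay before the $n$-th reschedule attempt (notation introduced for this note).

```latex
\begin{align*}
d_n &= \min\left(30\,\text{s} \cdot 2^{\,n-1},\ 120\,\text{s}\right) \\
d_1 &= 30\,\text{s}, \quad d_2 = 60\,\text{s}, \quad d_3 = 120\,\text{s}, \quad d_4 = 120\,\text{s}, \ \dots
\end{align*}
```

From the third attempt onward every delay sits at the 120-second cap, so the 15 attempts allowed per interval space out quickly but never wait longer than two minutes each.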
  28. Updates & Deployments (update stanza)
    job "docs" {
      update {
        max_parallel      = 3
        health_check      = "checks"
        min_healthy_time  = "10s"
        healthy_deadline  = "5m"
        progress_deadline = "10m"
        auto_revert       = true
        canary            = 1
        stagger           = "30s"
      }
    }
  29. Canary Deployments (update stanza)
    job "docs" {
      update {
        canary       = 1
        max_parallel = 3
      }
    }
  30. Blue-Green Deployments (update stanza)
    group "api-server" {
      count = 3
      update {
        canary       = 3
        max_parallel = 3
      }
      ...
    }
  31. Node draining (migrate stanza)
    job "docs" {
      migrate {
        max_parallel     = 1
        health_check     = "checks"
        min_healthy_time = "10s"
        healthy_deadline = "5m"
      }
    }