Chef and Sensu - Delightful Monitoring

Chef makes it easier for organizations to go fast, enabling them to continuously deliver new services and features to their customers. With Chef, you can quickly provision infrastructure and deploy applications, but can your monitoring keep up? Or will it keep you up at night?

Sean will demonstrate how Sensu 2.0 is designed to monitor Chef-driven infrastructure. Sensu 2.0 is the next release of the open source monitoring framework, rewritten in Go, with new capabilities and reduced operational overhead. Sean will show how the Sensu Chef cookbook can be used to deploy Sensu and manage service checks, metric collection, and alert notifications. He will also go over the best practices that lead to delightful monitoring with Chef and Sensu.

portertech

May 24, 2018

Transcript

  1. Chef & Sensu - Delightful Monitoring
    Sean Porter
    CTO for Sensu

  2. ● Sean Porter
    ● Author of Sensu
    ● CTO for Sensu Inc.
    ● @portertech

  3. We solve our problems
    with technology.

  4. We create new problems
    with technology.

  5. Illustration by Fredrik Skarstedt

  6. HOST
    APP APP APP APP

  7. HOST
    VM VM
    APP APP APP APP

  8. HOST
    VM VM
    APP APP APP APP

  9. COMPLEXITY
    TIME

  10. Apps can span any
    number of technologies.

  14. Design

  25. {
        timestamp: 1516663186,
        entity: { … },
        check: { … },
        metrics: { … }
      }

  31. Configuration
    ● Backend REST API
    ● RBAC (multi-tenancy)
      ○ Organization
      ○ Environment

  33. Configuration
    ● sensuctl (CLI tool)
    ● Dashboard
    ● Chef

  34. The Cookbook
    The new “sensu-go” Chef Cookbook.

  35. Chef Cookbook
    ● sensu/sensu-go-chef
    ● Custom resources
      ○ Sensu services
      ○ Sensu object config

  36. sensu_backend "chefconf" do
        version "2.0.0.beta.1-1"
        repo "sensu/beta"
        config_home "/etc/sensu"
        config(
          "state-dir": "/var/lib/sensu"
        )
      end

  37. sensu_agent "chefconf" do
        version "2.0.0.beta.1-1"
        repo "sensu/beta"
        config_home "/etc/sensu"
        config(
          "organization": "acme",
          "environment": "production",
          "backend-url": ["ws://backend:8081"],
          "subscriptions": ["mysql"]
        )
      end

  39. sensu_ctl "default" do
        version "2.0.0.beta.1-1"
        repo "sensu/beta"
        username secrets["username"]
        password secrets["password"]
        backend_url "ws://backend:8081"
      end

  40. 3 Methods
    The three methods of data collection with Sensu
    and example Chef resources.

  41. 1. Service Checks

  42. Service Checks
    ● Script
    ● STDOUT (message and data)
    ● Exit code (severity)

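The check contract above (a script, a message on STDOUT, severity in the exit code) can be sketched in plain Ruby. This is an illustrative sketch, not a plugin from the talk: `check_tcp` is a hypothetical helper, and it only tests whether a TCP port accepts connections. The codes 0 (OK) and 2 (CRITICAL) are the ones shown on the slides.

```ruby
require "socket"

# Exit codes as shown on the slides: 0 = OK, 2 = CRITICAL.
OK = 0
CRITICAL = 2

# Hypothetical check: can a TCP connection be opened to host:port?
# Returns [output, status] so the logic can be tested without exiting.
def check_tcp(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { |_sock| }
  ["OK: connected to #{host}:#{port}", OK]
rescue SystemCallError, SocketError => e
  ["CRITICAL: cannot connect to #{host}:#{port} (#{e.class})", CRITICAL]
end

# A check script would finish by printing the message and exiting
# with the severity, for example:
#   output, status = check_tcp(ARGV[0], Integer(ARGV[1]))
#   puts output
#   exit status
```

Returning the pair instead of calling `exit` inside the helper keeps the check logic testable; only the last three lines of a real script need to touch STDOUT and the process exit code.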

  43. Service Checks
    check_mysql -H localhost -P 3360
    Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0
    Opens: 107 Flush tables: 1 Open tables: 26 Queries per
    second avg: 0.006|Connections=9c;;; Open_files=6;;;
    Open_tables=27;;; Qcache_free_memory=16760152;;;
    Qcache_hits=0c;;; Qcache_inserts=0c;;;
    Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;;
    Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;;
    Table_locks_waited=0c;;; Threads_connected=1;;;
    Threads_running=1;;; Uptime=798c;;;
    Exit 0 (OK)

  44. Service Checks
    check_mysql -H localhost -P 3360
    Can't connect to MySQL server on 'localhost'
    Exit 2 (CRITICAL)

  45. {
        timestamp: 1516663186,
        entity: { … },
        check: {
          command: "check_mysql -H ...",
          output: "Can’t connect ... ",
          status: 2,
        },
        metrics: { … }
      }

  46. Symptoms

  48. sensu_check "mysql" do
        command "check_mysql -H ..."
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers [ … ]
      end

  49. Service Checks
    check_mysql -H localhost -P 3360
    Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0
    Opens: 107 Flush tables: 1 Open tables: 26 Queries per
    second avg: 0.006|Connections=9c;;; Open_files=6;;;
    Open_tables=27;;; Qcache_free_memory=16760152;;;
    Qcache_hits=0c;;; Qcache_inserts=0c;;;
    Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;;
    Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;;
    Table_locks_waited=0c;;; Threads_connected=1;;;
    Threads_running=1;;; Uptime=798c;;;
    Exit 0 (OK)

  50. sensu_check "mysql" do
        command "check_mysql -H ..."
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers [ … ]
        output_metric_format "nagios_perfdata"
        output_metric_handlers [ … ]
      end

  51. {
        timestamp: 1516663186,
        entity: { … },
        check: { … },
        metrics: {
          handlers: [ … ],
          points: [{
            name: "mysql.connections",
            value: 9,
            tags: [ … ]
          }, … ]
        }
      }

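To make the step from Nagios perfdata in the check output to metric points more concrete, here is a rough Ruby sketch of such an extraction. It is not Sensu's parser: `parse_perfdata` is a hypothetical helper, the `prefix:` option and the lowercasing of labels are assumptions made only to reproduce the `mysql.connections` shape on the slide, and quoted labels containing spaces are not handled.

```ruby
# Split check output into the human-readable message and the perfdata
# section after "|", then turn each "label=value[unit];warn;crit;min;max"
# token into a { name:, value: } point.
def parse_perfdata(output, prefix: nil)
  _message, perfdata = output.split("|", 2)
  return [] if perfdata.nil?

  points = perfdata.split.map do |token|
    label, rest = token.split("=", 2)
    next if rest.nil?

    # Keep only the leading number; drop the unit ("c") and thresholds.
    number = rest.split(";").first.to_s[/\A-?\d+(?:\.\d+)?/]
    next if number.nil?

    { name: [prefix, label.downcase].compact.join("."), value: number.to_f }
  end
  points.compact
end
```

With the output from the slides, `parse_perfdata(output, prefix: "mysql")` would yield one point per `label=value` pair, starting with a point named `mysql.connections`.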

  54. sensu_asset "mysql-plugins" do
        url "https://…/mysql-plugins.tar.gz"
        sha512 "4e6f621ebe652d3b0ba5d4dea ..."
        organization "acme"
      end

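The `sha512` property pins the asset archive to a known digest. The slides do not show how the digest is checked, so the following is only a sketch of the general idea using Ruby's standard `Digest` library; `asset_checksum_ok?` and `sha512_of_file` are hypothetical helper names.

```ruby
require "digest"

# True when the given bytes hash to the expected SHA-512 hex digest.
def asset_checksum_ok?(bytes, expected_sha512)
  Digest::SHA512.hexdigest(bytes).casecmp?(expected_sha512.strip)
end

# Hex digest of a file on disk, e.g. to fill in the sha512 property
# after downloading an archive.
def sha512_of_file(path)
  Digest::SHA512.file(path).hexdigest
end
```

Computing the digest once with `sha512_of_file` and committing it next to the URL means a changed or corrupted archive is rejected rather than silently executed.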

  55. sensu_check "mysql" do
        command "check_mysql -H ..."
        runtime_assets ["mysql-plugins"]
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers [ … ]
        output_metric_format "nagios_perfdata"
        output_metric_handlers [ … ]
      end

  56. Service Checks
    ● Simple
    ● Accessible
    ● Shareable
    ● Legacy

  57. 2. Events API

  58. Events API
    ● REST API (Agent & Backend)
    ● Entity management
    ● External checks
    ● Metrics

  59. POST /events
      {
        timestamp: 1516663186,
        entity: { … },
        check: { … },
        metrics: { … }
      }

  60. {
        timestamp: 1516663186,
        entity: {
          name: "leviathan",
          class: "application",
          tags: [ … ],
          ...
        },
        check: { … },
        metrics: { … }
      }

  61. {
        timestamp: 1516663186,
        entity: { … },
        check: {
          output: "Backup failed ... ",
          status: 2,
          ttl: 21600,
        },
        metrics: { … }
      }

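An external job, such as a backup script, could report its own result through the Events API roughly as follows. This is a sketch under assumptions: `build_event` and `events_request` are hypothetical helpers, the `name` field on the check and the base URL `http://127.0.0.1:3031` are assumptions not shown on the slides, and only the fields that do appear there (`timestamp`, `entity`, `check`, `output`, `status`, `ttl`) are used otherwise.

```ruby
require "json"
require "net/http"
require "uri"

# Build an event for an external job. The ttl (in seconds) is included
# only when given.
def build_event(entity_name:, check_name:, output:, status:, ttl: nil, now: Time.now)
  check = { name: check_name, output: output, status: status }
  check[:ttl] = ttl if ttl
  { timestamp: now.to_i, entity: { name: entity_name }, check: check }
end

# Wrap the event in a JSON POST request to /events (not sent here).
def events_request(base_url, event)
  uri = URI.join(base_url, "/events")
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(event)
  [uri, request]
end

# Sending it (assumed local agent address):
#   uri, request = events_request("http://127.0.0.1:3031", event)
#   Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
```

A `ttl` of 21600 seconds, as on the slide, corresponds to six hours: the job is expected to report again within that window, so a job that stops running can be noticed even though it never posts a failure.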

  62. {
        timestamp: 1516663186,
        entity: { … },
        check: { … },
        metrics: {
          handlers: [ … ],
          points: [{
            name: "logins",
            value: 2,
            tags: [ … ]
          }, … ]
        }
      }

  63. 3. StatsD

  64. StatsD
    ● Agent listeners (TCP & UDP)
    ● Stats aggregation
    ● Gauges, counters, etc.
    ● Protocol enhancements (tags)

  65. <metric name>:<value>|c[|@<sample rate>]
    UDP localhost 8125

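An application can emit a StatsD counter to a local listener with nothing more than a UDP socket. The sketch below uses the plain StatsD counter line format (`name:value|c`, with an optional `|@rate` suffix) and the UDP port 8125 from the slide; `statsd_counter` and `send_statsd` are hypothetical helpers, and the tag extensions mentioned in the talk are not covered.

```ruby
require "socket"

# Format a counter in the plain StatsD line protocol:
#   <name>:<value>|c[|@<sample rate>]
def statsd_counter(name, value = 1, sample_rate: nil)
  line = "#{name}:#{value}|c"
  line += "|@#{sample_rate}" if sample_rate
  line
end

# Fire-and-forget send over UDP. 127.0.0.1 stands in for "localhost";
# 8125 is the port shown on the slide.
def send_statsd(line, host: "127.0.0.1", port: 8125)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
ensure
  socket&.close
end
```

For example, `send_statsd(statsd_counter("request_count", 42))` sends the single datagram `request_count:42|c`. Because UDP is connectionless, the call does not confirm that a listener received it.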

  66. {
        timestamp: 1516663186,
        entity: { … },
        check: { … },
        metrics: {
          handlers: [ … ],
          points: [{
            name: "request_count",
            value: 42,
            tags: [ … ]
          }, … ]
        }
      }

  67. 3 Methods Recap
    ● Service checks
    ● Events API
    ● StatsD

  68. Event Processing
    Do things with the collected data.

  69. sensu_handler "slack" do
        type "pipe"
        command "handler-slack --webhook-url http..."
        timeout 10
        filters ["is_incident", "not_silenced"]
      end

  70. sensu_handler "influxdb" do
        type "pipe"
        command "handler-influx -a http..."
        timeout 10
        filters ["has_metrics"]
      end

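A pipe handler is a command that receives the event as JSON on STDIN. The following Ruby sketch shows the general shape of such a command. It is not the `handler-slack` or `handler-influx` plugin: `incident?` and `metrics?` are only rough analogues of the filter names on the slides (`is_incident`, `has_metrics`), not Sensu's filter implementations, and the message format is made up for illustration.

```ruby
require "json"

# Rough analogue of an "is_incident" style filter: non-zero check status.
def incident?(event)
  event.dig("check", "status").to_i != 0
end

# Rough analogue of "has_metrics": the event carries at least one point.
def metrics?(event)
  !Array(event.dig("metrics", "points")).empty?
end

# Hypothetical one-line alert text built from the event.
def alert_message(event)
  entity = event.dig("entity", "name") || "unknown entity"
  output = event.dig("check", "output").to_s.strip
  "#{entity}: #{output} (status #{event.dig('check', 'status')})"
end

# Read one JSON event from io (STDIN for a pipe handler) and write an
# alert line to out only when the event looks like an incident.
def handle(io, out: $stdout)
  event = JSON.parse(io.read)
  out.puts(alert_message(event)) if incident?(event)
end
```

A script wrapping this would simply call `handle($stdin)`. Passing the input and output streams as arguments keeps the handler easy to exercise with a saved event file.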

  71. sensu_check "mysql" do
        command "check_mysql -H ..."
        runtime_assets ["mysql-plugins"]
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers ["slack"]
        output_metric_format "nagios_perfdata"
        output_metric_handlers ["influxdb"]
      end

  72. COMPLEXITY
    TIME

  75. Thank You
    Co-Founder & CTO, Sensu Inc.
    Sean Porter (@portertech)
    ChefConf 2018
