Consul as a Monitoring Service

There are two sides to monitoring – exposing problems with alerts, and acting on those alerts to resolve the problems they expose. To expose problems, users can define any script for Consul to run as a health check and report the status of every node in a cluster. These scripts can be as simple as checking for an HTTP 200 response, or as complex as measuring the load and query response time on a database server. Other monitoring solutions already provide such functionality; where Consul shines is in the second half of monitoring – automatic intervention that resolves problems without human operators.

Since Consul has built-in health checking, it not only notifies operators of a node or service failure, but also automatically routes traffic away from unhealthy nodes. Consul also routes traffic back to a troubled node once that node reports it is healthy again. In this way Consul pushes the existing paradigms of monitoring, making it much more than a simple notification system: it surfaces problems and solves them without human intervention. Don’t worry about that pager going off in the middle of the night – rest easy with Consul.

Seth Vargo

May 29, 2015

Transcript

  1. CONSUL AS A MONITORING SERVICE

  2. SETH VARGO
    @sethvargo

  3. SERVICE
    ORIENTED
    ARCHITECTURE

  4. SOA  PRIMER
    Autonomous
    Limited Scope
    Loose Coupling

  5. ORDER PROCESSING
    WEB APP
    ORDER HISTORY
    FORECASTING

  6. ORDER
    PROCESSING
    WEB APP
    DISCOVERY
    Which nodes are part of "order processing"?

  7. ORDER
    PROCESSING
    WEB APP
    LOAD  BALANCING
    How to ensure request leveling across providers?
    NODE 1
    NODE 2
    NODE N

  8. ORDER
    PROCESSING
    WEB APP
    ANTI-PATTERN
    Load Balancer is a Single Point of Failure (SPOF)
    NODE 1
    NODE 2
    NODE N
    LOAD
    BALANCER

  9. ORDER
    PROCESSING
    WEB APP
    HEALTH  CHECKING
    How to avoid routing to unhealthy hosts?
    NODE 1
    NODE 2
    NODE 3
    LOAD
    BALANCER

  10. WEB APP
    CONFIGURATION
    How to efficiently push dynamic configuration?
    WEB 1
    WEB 2
    WEB N
    maintenance: false
    feature_a: true
    role: "web"

  11. SERVICE DISCOVERY
    LOAD BALANCING
    HEALTH CHECKING
    KEY-VALUE CONFIGURATION
    4  BASIC  PROBLEMS

  12. ZOOKEEPER ETCD SENSU SMART  STACK
    EXISTING  "SOLUTIONS"
    http://consul.io/intro/vs

  13. Service Discovery
    HTTP + DNS

  14. demo  master dig web-frontend.service.consul

  15. demo  master dig web-frontend.service.consul
    ; <<>> DiG 9.8.3-P1 <<>> web-frontend.service.consul. ANY
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29981
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
    ;; QUESTION SECTION:
    ;web-frontend.service.consul. IN ANY
    ;; ANSWER SECTION:
    web-frontend.service.consul. 0 IN A 10.0.3.83
    web-frontend.service.consul. 0 IN A 10.0.1.109

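The same lookup can be made from application code. The following is a minimal Python sketch, assuming the host's resolver forwards `.consul` queries to a Consul agent's DNS interface. `resolve_service` and its `resolver` parameter are hypothetical names introduced here for illustration; they are not part of Consul.

```python
import socket


def resolve_service(name, port=80, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses behind a service name.

    `resolver` defaults to the system resolver and can be swapped out, so the
    function can be exercised without a running Consul agent.
    """
    infos = resolver(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for IPv4 the
    # sockaddr is an (address, port) tuple.
    return sorted({sockaddr[0] for *_, sockaddr in infos})
```

Calling `resolve_service("web-frontend.service.consul")` on a suitably configured host would be expected to return the addresses from the answer section above.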

  16. Datacenter Aware

  17. CLIENT CLIENT CLIENT CLIENT CLIENT CLIENT
    SERVER SERVER SERVER
    REPLICATION REPLICATION
    RPC
    RPC
    LAN  GOSSIP

  18. CLIENT CLIENT CLIENT CLIENT CLIENT CLIENT
    SERVER SERVER SERVER
    REPLICATION REPLICATION
    RPC
    RPC
    LAN  GOSSIP
    SERVER
    SERVER SERVER
    REPLICATION REPLICATION

  19. CLIENT CLIENT CLIENT CLIENT CLIENT CLIENT
    SERVER SERVER SERVER
    REPLICATION REPLICATION
    RPC
    RPC
    LAN  GOSSIP
    SERVER
    SERVER SERVER
    REPLICATION REPLICATION
    WAN  GOSSIP

  20. Host & Service
    Level Health Checks

  21. >
    listen http-in
    bind *:8000
    server web-0 127.0.0.1:80
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo  master consul-template -template="example.ctmpl" -dry
    demo  master

  22. >
    listen http-in
    bind *:8000
    server web-0 127.0.0.1:80
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo  master consul-template -template="example.ctmpl" -dry
    demo  master sudo stop webserver

  23. >
    listen http-in
    bind *:8000
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo  master consul-template -template="example.ctmpl" -dry
    demo  master sudo stop webserver

  24. >
    listen http-in
    bind *:8000
    server web-0 127.0.0.1:80
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo  master consul-template -template="example.ctmpl" -dry
    demo  master sudo start webserver

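Slides 21 to 24 show the rendered proxy configuration losing and regaining `web-0` as its web server is stopped and started. The following is a rough Python stand-in for that rendering step, not consul-template's own template language. `render_backend` and the instance tuples are illustrative assumptions; in practice the list would come from Consul's view of healthy instances.

```python
def render_backend(instances, bind_port=8000):
    """Render a listen block from (name, address, port) tuples."""
    lines = ["listen http-in", f"bind *:{bind_port}"]
    for name, address, port in instances:
        lines.append(f"server {name} {address}:{port}")
    return "\n".join(lines)


# All three web servers are healthy.
all_up = [("web-0", "127.0.0.1", 80), ("web-1", "127.0.0.1", 80), ("web-2", "127.0.0.1", 80)]
# web-0 has been stopped, so it drops out of the healthy list.
one_down = [instance for instance in all_up if instance[0] != "web-0"]
```

Re-rendering whenever the healthy list changes is what keeps the proxy configuration in step with the cluster, with no operator editing the file by hand.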

  25. K/V Store
    HTTP API

  26. demo  master curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
    true

  27. demo  master curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
    true
    demo  master curl http://localhost:8500/v1/kv/foo
    [
      {
        "CreateIndex": 100,
        "ModifyIndex": 200,
        "Key": "foo",
        "Flags": 0,
        "Value": "YmFy"
      }
    ]

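The `Value` field in the response is base64-encoded: `YmFy` decodes to the `bar` that was stored. The following is a small Python sketch of decoding such a response body. `decode_kv_response` is a hypothetical helper name, and the body is assumed to have the shape shown on the slide.

```python
import base64
import json


def decode_kv_response(body):
    """Map each key in a /v1/kv response body to its decoded string value."""
    decoded = {}
    for entry in json.loads(body):
        raw = entry["Value"]
        # Guard against an entry whose Value is null rather than a string.
        decoded[entry["Key"]] = None if raw is None else base64.b64decode(raw).decode("utf-8")
    return decoded


SLIDE_BODY = '[{"CreateIndex": 100, "ModifyIndex": 200, "Key": "foo", "Flags": 0, "Value": "YmFy"}]'
```

With the body from the slide, `decode_kv_response(SLIDE_BODY)` gives `{"foo": "bar"}`.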

  28. TRUSTED  BY

  29. HEALTH
    CHECKS

  30. WHAT  IS  A  CHECK?
    Any command that returns an exit code

  31. WHAT  IS  A  CHECK?
    Any command that returns an exit code
    0: PASSING
    1: WARNING
    __ (any other code): FAILING

  32. WHAT  IS  A  CHECK?
    Output is captured as a "note" for inspection
    $ curl http://127.0.0.1:4455/_health
    curl: (7) Failed to connect to 127.0.0.1 port 4455: Connection refused

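The exit-code convention and the captured "note" from the last three slides can be sketched in a few lines of Python. This is an illustration of the idea only, not Consul's implementation; `run_check` and `STATUS_BY_EXIT_CODE` are names invented here. The slides say FAILING, while Consul's own API reports that state as `critical`.

```python
import subprocess
import sys

# Exit-code convention from the slide: 0 passing, 1 warning, anything else failing.
STATUS_BY_EXIT_CODE = {0: "passing", 1: "warning"}


def run_check(command):
    """Run `command` (a list of arguments) and return (status, note)."""
    result = subprocess.run(command, capture_output=True, text=True)
    status = STATUS_BY_EXIT_CODE.get(result.returncode, "failing")
    # Keep whatever the command printed so a person can inspect it later.
    note = (result.stdout + result.stderr).strip()
    return status, note


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python run_check.py curl http://127.0.0.1:4455/_health
    print(run_check(sys.argv[1:]))
```

Run against the `curl` command on the slide, this would be expected to report a failing status with curl's "Connection refused" message as the note, since curl exits with code 7 in that case.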

  33. CREATING  A  CHECK
    Use a custom script
    {
      "check": {
        "id": "mem-util",
        "name": "Memory utilization",
        "script": "/usr/local/bin/check_mem.py",
        "interval": "10s"
      }
    }

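The talk does not show the contents of `check_mem.py`. The following is a hypothetical sketch of what such a script might contain, assuming a Linux host whose `/proc/meminfo` reports `MemAvailable`. The 80% and 90% thresholds are arbitrary values chosen for illustration.

```python
# Thresholds are assumptions for this sketch, not values from the talk.
WARN_PERCENT = 80.0
CRIT_PERCENT = 90.0


def parse_meminfo(text):
    """Return {field: value in kB} parsed from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def used_percent(meminfo):
    """Percentage of memory that is not available to new workloads."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemAvailable"]) / total


def exit_code_for(percent):
    """Map utilization to the check convention: 0 passing, 1 warning, 2 failing."""
    if percent >= CRIT_PERCENT:
        return 2
    if percent >= WARN_PERCENT:
        return 1
    return 0


def main(path="/proc/meminfo"):
    with open(path) as handle:
        percent = used_percent(parse_meminfo(handle.read()))
    # Printed output is what ends up as the check's note.
    print(f"memory used: {percent:.1f}%")
    return exit_code_for(percent)

# When installed as the check script, the file would end with:
#     import sys; sys.exit(main())
```

Keeping the threshold logic in `exit_code_for` separate from the file reading makes it possible to test the script without touching `/proc`.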

  34. CREATING  A  CHECK
    Use a built-in check type
    {
      "check": {
        "id": "api",
        "name": "HTTP API on port 4455",
        "http": "http://localhost:4455/_health",
        "interval": "10s",
        "timeout": "1s"
      }
    }

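For an HTTP check, the response code plays the role that the exit code plays for a script. As far as Consul's documentation describes it, any 2xx response is passing, 429 is a warning, and anything else is a failure. The following is a hedged Python sketch of that mapping; `status_for_http_code` and `http_check` are hypothetical names, and treating connection errors and timeouts as failing is an assumption of this sketch.

```python
import urllib.error
import urllib.request


def status_for_http_code(code):
    """Map an HTTP response code to a check status."""
    if 200 <= code <= 299:
        return "passing"
    if code == 429:
        return "warning"
    return "failing"


def http_check(url, timeout=1.0):
    """Request `url` once and return the resulting check status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_for_http_code(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx and 5xx responses; the code is still available.
        return status_for_http_code(error.code)
    except OSError:
        # Connection refused, DNS failure, or timeout.
        return "failing"
```

The `timeout` argument corresponds to the `"timeout": "1s"` field in the definition above.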

  35. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Pushes information into a silo
    WEB 1
    WEB 2
    WEB N

  36. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Pushes information into a silo
    WEB 1
    WEB 2
    WEB N

  37. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Pushes information into a silo
    WEB 1
    WEB 2
    WEB N

  38. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Pushes information into a silo
    WEB 1
    WEB 2
    WEB N

  39. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Pushes information into a silo
    WEB 1
    WEB 2
    WEB N
    U

  40. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Pushes information into a silo
    WEB 1
    WEB 2
    WEB N
    U
    F
    F

  41. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Pushes information into a silo
    WEB 1
    WEB 2
    WEB N
    U
    F
    F

  42. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N

  43. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N

  44. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.4
    10.0.1.5
    10.0.1.6

  45. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.4
    10.0.1.5
    10.0.1.6

  46. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.5
    10.0.1.6

  47. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.5
    10.0.1.6
    host: web.service.consul

  48. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.5
    10.0.1.6
    host: web.service.consul

  49. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.5
    10.0.1.6
    host: web.service.consul

  50. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.5
    10.0.1.6
    host: web.service.consul

  51. CONSUL
    CONSUL  MONITORING
    Removes unhealthy nodes from service discovery layer
    WEB 1
    WEB 2
    WEB N
    dig web.service.consul
    10.0.1.4
    10.0.1.5
    10.0.1.6
    host: web.service.consul

  52. CONSUL  MONITORING
    Unhealthy nodes are not returned from DNS queries
    dig web.service.consul
    web-01, web-02, web-03

  53. CONSUL  MONITORING
    Unhealthy nodes are not returned from HTTP API
    curl /v1/services/web
    web-01, web-02, web-03

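A client can apply the same filtering itself when it has per-instance check results. The following is a minimal Python sketch, assuming a response shaped like Consul's `/v1/health/service/<name>` endpoint (a list of entries with `Node`, `Service` and `Checks`). That endpoint is an assumption made here, since the slide abbreviates the path, and `passing_addresses` is a hypothetical helper.

```python
import json


def passing_addresses(body):
    """Return addresses of instances whose checks are all passing."""
    addresses = []
    for entry in json.loads(body):
        if all(check["Status"] == "passing" for check in entry["Checks"]):
            # Fall back to the node's address when the service has none of its own.
            service_address = entry["Service"].get("Address")
            addresses.append(service_address or entry["Node"]["Address"])
    return addresses
```

An instance with one failing check is dropped from the result, which mirrors how the unhealthy node disappears from the `dig` answers on slides 44 to 51.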

  54. CONSUL  LOCK
    Allows for a new kind of "HA"
    demo  master consul lock [options] prefix child...

  55. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3

  56. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L
    L

  57. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L

  58. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L
    LEADER  ELECTION

  59. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L
    GET /secret/foo
    REQUEST

  60. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L
    GET /secret/foo
    REQUEST

  61. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L
    GET /secret/foo
    REQUEST

  62. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L

  63. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    l

  64. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L

  65. CONSUL  LOCK
    Making standby HA much simpler
    CONSUL
    VAULT 1
    VAULT 2
    VAULT 3
    L

  66. CONSUL  LOCK
    Making standby HA much simpler
    VAULT 1
    VAULT 2
    VAULT 3
    GET /secret/foo
    REQUEST
    CONSUL
    L

  67. CONSUL  LOCK
    Making standby HA much simpler
    VAULT 1
    VAULT 2
    VAULT 3
    GET /secret/foo
    REQUEST
    CONSUL
    L

  68. CONSUL  LOCK
    Solves the "exactly one of these must always be running" problem

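The standby pattern on slides 55 to 67 can be modelled with a single lock that at most one instance holds at a time. The following is a toy in-memory Python model of that idea, not Consul's session and lock implementation. `ToyLock`, `elect` and the `vault-N` names are illustrative assumptions.

```python
class ToyLock:
    """A lock that at most one holder can own at a time."""

    def __init__(self):
        self.holder = None

    def acquire(self, candidate):
        """Take the lock if it is free; return True if `candidate` now holds it."""
        if self.holder is None:
            self.holder = candidate
        return self.holder == candidate

    def release(self, candidate):
        """Free the lock, but only if `candidate` is the current holder."""
        if self.holder == candidate:
            self.holder = None
            return True
        return False


def elect(lock, candidates):
    """Return the first candidate that obtains the lock, or None."""
    for candidate in candidates:
        if lock.acquire(candidate):
            return candidate
    return None


lock = ToyLock()
leader = elect(lock, ["vault-1", "vault-2", "vault-3"])  # vault-1 becomes leader
standby_got_lock = lock.acquire("vault-2")               # refused while vault-1 holds it
lock.release("vault-1")                                  # the leader fails or steps down
new_leader = elect(lock, ["vault-2", "vault-3"])         # a standby takes over
```

In the real system the release would happen when the leader's health checks fail or its session is invalidated, which is what lets a standby take over without an operator.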

  69. CONSUL  LOCK
    Also great as a semaphore - rolling restarts

  70. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Notifies/polls all statuses
    WEB 1
    WEB 2
    WEB N
    "I'm healthy"
    "Good, thanks for asking!"

  71. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Notifies/polls all statuses
    WEB 1
    WEB 2
    WEB 1,000
    1,000'S OF
    REQUESTS

  72. MONITORING
    SERVICE
    TRADITIONAL  MONITORING
    Notifies/polls all statuses
    WEB 1
    WEB 2
    WEB 1,000
    1,000'S OF
    REQUESTS
    HA

  73. CONSUL
    WEB 1
    WEB 2
    WEB N
    My status has changed
    CONSUL  MONITORING
    Notifies on status changes

  74. CONSUL
    CONSUL  MONITORING
    Notifies on status changes
    WEB 1
    WEB 2
    WEB 1,000
    10'S OF
    REQUESTS

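The difference between reporting every status and reporting only changes can be shown with a small Python sketch. It illustrates the edge-triggered idea under simplified assumptions (one stream of `(node, status)` samples) and is not a model of Consul's gossip protocol. `status_changes` is a hypothetical helper.

```python
def status_changes(samples):
    """Yield (node, old_status, new_status) only when a node's status differs
    from the last one seen for it; the first sighting has old_status None."""
    last_seen = {}
    for node, status in samples:
        previous = last_seen.get(node)
        if previous != status:
            last_seen[node] = status
            yield node, previous, status


samples = [
    ("web-1", "passing"), ("web-2", "passing"),
    ("web-1", "passing"), ("web-2", "critical"),
    ("web-1", "passing"), ("web-2", "passing"),
]
changes = list(status_changes(samples))
```

Here six samples produce four notifications, and a node that stays healthy generates no further traffic after its first report. That gap widens as the number of nodes grows.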

  75. SERVICE DISCOVERY
    LOAD BALANCING
    HEALTH CHECKING
    KEY-VALUE CONFIGURATION
    SOLVES  4  BASIC  PROBLEMS

  76. RESPONSIVE
    LEADER ELECTION
    SEMAPHORE LOCKING
    SCALABLE
    SOLVES  4  MORE  PROBLEMS

  77. SETH VARGO
    @sethvargo
    QUESTIONS?
