Consul as a Monitoring Service

There are two sides to monitoring: exposing problems through alerts, and acting on those alerts to resolve the problems they expose. For exposing problems, users can define any script for Consul to run as a health check, reporting the health status of every node in a cluster. A check can be as simple as confirming that an endpoint returns HTTP 200, or as complex as measuring the load and query response time of a database server. Other monitoring solutions already provide this functionality. Where Consul shines is the second half of monitoring: automatic intervention that works around problems without a human operator.

Because Consul has built-in health checking, it does more than notify operators of a node or service failure. It also automatically routes traffic away from unhealthy nodes by leaving them out of service discovery results. Once a troubled node reports that it is healthy again, Consul routes traffic back to it. In this way Consul goes beyond the existing monitoring paradigm. Rather than being a simple notification system, it surfaces problems and works around them without human intervention. Don't worry about that pager going off in the middle of the night; rest easy with Consul.

Seth Vargo

May 29, 2015

Transcript

  1. CONSUL AS A MONITORING SERVICE

  2. SETH VARGO @sethvargo

  3. (no text)

  4. SERVICE ORIENTED ARCHITECTURE

  5. SOA PRIMER: Autonomous, Limited Scope, Loose Coupling

  6. [Diagram: ORDER PROCESSING, WEB APP, ORDER HISTORY, FORECASTING]

  7. DISCOVERY: Which nodes are part of "order processing"?
     [Diagram: ORDER PROCESSING, WEB APP]

  8. LOAD BALANCING: How to ensure request leveling across providers?
     [Diagram: ORDER PROCESSING, WEB APP, NODE 1, NODE 2, NODE N]

  9. ANTI-PATTERN: Load Balancer is a Single Point of Failure (SPOF)
     [Diagram: ORDER PROCESSING, WEB APP, NODE 1, NODE 2, NODE N, LOAD BALANCER]

  10. HEALTH CHECKING: How to avoid routing to unhealthy hosts?
      [Diagram: ORDER PROCESSING, WEB APP, NODE 1, NODE 2, NODE 3, LOAD BALANCER]

  11. CONFIGURATION: How to efficiently push dynamic configuration?
      [Diagram: WEB APP; WEB 1, WEB 2, WEB N; maintenance: false, feature_a: true, role: "web"]

  12. 4 BASIC PROBLEMS: SERVICE DISCOVERY, LOAD BALANCING, HEALTH CHECKING, KEY-VALUE CONFIGURATION

  13. EXISTING "SOLUTIONS": ZOOKEEPER, ETCD, SENSU, SMARTSTACK (http://consul.io/intro/vs)

  14. CONSUL

  15. Service Discovery: HTTP + DNS

  16. $ dig web-frontend.service.consul

  17. $ dig web-frontend.service.consul

      ; <<>> DiG 9.8.3-P1 <<>> web-frontend.service.consul. ANY
      ;; global options: +cmd
      ;; Got answer:
      ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29981
      ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

      ;; QUESTION SECTION:
      ;web-frontend.service.consul.   IN  ANY

      ;; ANSWER SECTION:
      web-frontend.service.consul. 0  IN  A   10.0.3.83
      web-frontend.service.consul. 0  IN  A   10.0.1.109

  18. Datacenter Aware

  19. [Diagram: six CLIENTs and three SERVERs; labels: REPLICATION, RPC, LAN GOSSIP]

  20. [Diagram: as in slide 19, plus a second group of three SERVERs with REPLICATION]

  21. [Diagram: as in slide 20, plus WAN GOSSIP]

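Consul's DNS interface answers names of the form `[tag.]<service>.service[.<datacenter>].consul`, which ties the `dig` demo to the datacenter-aware setup. Adding a datacenter label asks Consul to forward the query to that datacenter. The minimal Python sketch below only builds such names. The tag `primary` and datacenter `dc2` are hypothetical. The agent's DNS server listens on port 8600 by default. Without DNS forwarding, a manual lookup would therefore be `dig @127.0.0.1 -p 8600 web-frontend.service.consul`.

```python
from typing import Optional


def service_dns_name(service: str,
                     tag: Optional[str] = None,
                     datacenter: Optional[str] = None,
                     domain: str = "consul") -> str:
    """Build a Consul service lookup name: [tag.]<service>.service[.<dc>].<domain>."""
    labels = []
    if tag:
        labels.append(tag)          # optional tag filter
    labels += [service, "service"]
    if datacenter:
        labels.append(datacenter)   # selects a (possibly remote) datacenter
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(service_dns_name("web-frontend"))                   # web-frontend.service.consul
    print(service_dns_name("web-frontend", datacenter="dc2"))  # web-frontend.service.dc2.consul
    print(service_dns_name("db", tag="primary"))               # primary.db.service.consul
```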
  22. Host & Service Level Health Checks

  23. $ consul-template -template="example.ctmpl" -dry
      > listen http-in
          bind *:8000
          server web-0 127.0.0.1:80
          server web-1 127.0.0.1:80
          server web-2 127.0.0.1:80

  24. $ consul-template -template="example.ctmpl" -dry
      > listen http-in
          bind *:8000
          server web-0 127.0.0.1:80
          server web-1 127.0.0.1:80
          server web-2 127.0.0.1:80
      $ sudo stop webserver

  25. $ consul-template -template="example.ctmpl" -dry
      $ sudo stop webserver
      > listen http-in
          bind *:8000
          server web-1 127.0.0.1:80
          server web-2 127.0.0.1:80

  26. $ consul-template -template="example.ctmpl" -dry
      $ sudo start webserver
      > listen http-in
          bind *:8000
          server web-0 127.0.0.1:80
          server web-1 127.0.0.1:80
          server web-2 127.0.0.1:80

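consul-template watches Consul and re-renders its template whenever the set of healthy instances changes. That is why `web-0` disappears from the HAProxy config after `sudo stop webserver` and returns after `sudo start webserver`. The contents of `example.ctmpl` are not shown on the slides. The following is therefore a plain-Python sketch of the same effect, not the actual template. The instance list is hypothetical, and keeping only `passing` instances is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Instance:
    name: str
    address: str
    port: int
    status: str  # "passing", "warning" or "critical"


def render_listen_block(instances: Iterable[Instance], bind: str = "*:8000") -> str:
    """Render an HAProxy 'listen' section containing only passing instances."""
    lines = ["listen http-in", f"    bind {bind}"]
    for inst in instances:
        if inst.status != "passing":
            continue  # unhealthy instances never reach the load balancer
        lines.append(f"    server {inst.name} {inst.address}:{inst.port}")
    return "\n".join(lines)


if __name__ == "__main__":
    web = [Instance(f"web-{i}", "127.0.0.1", 80, "passing") for i in range(3)]
    print(render_listen_block(web))   # web-0, web-1 and web-2
    web[0].status = "critical"        # e.g. after `sudo stop webserver`
    print(render_listen_block(web))   # web-1 and web-2 only
```

In the real setup, consul-template would also be given a destination file and a reload command, so HAProxy picks up each re-render.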
  27. K/V Store: HTTP API

  28. $ curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
      true

  29. $ curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
      true
      $ curl http://localhost:8500/v1/kv/foo
      [
        {
          "CreateIndex": 100,
          "ModifyIndex": 200,
          "Key": "foo",
          "Flags": 0,
          "Value": "YmFy"
        }
      ]

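The KV API returns values base64-encoded. That is why the `bar` written with `PUT` comes back as `"Value": "YmFy"`. Here is a minimal Python sketch of reading a key. It assumes a local agent on the default HTTP port 8500, as in the `curl` examples. `decode_kv_response` and `get_key` are illustrative helper names, not part of any Consul client library.

```python
import base64
import json
import urllib.request

CONSUL = "http://localhost:8500"  # assumed local agent address


def decode_kv_response(body: str) -> dict:
    """Map each key in a /v1/kv response body to its decoded UTF-8 value."""
    result = {}
    for entry in json.loads(body):
        raw = entry.get("Value")
        # Keys stored without a value come back with "Value": null.
        result[entry["Key"]] = None if raw is None else base64.b64decode(raw).decode("utf-8")
    return result


def get_key(key: str) -> dict:
    """Fetch one key from a running agent; urlopen raises HTTPError (404) if it is missing."""
    with urllib.request.urlopen(f"{CONSUL}/v1/kv/{key}") as resp:
        return decode_kv_response(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sample = '[{"CreateIndex": 100, "ModifyIndex": 200, "Key": "foo", "Flags": 0, "Value": "YmFy"}]'
    print(decode_kv_response(sample))  # {'foo': 'bar'}
```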
  30. TRUSTED BY

  31. HEALTH CHECKS

  32. WHAT IS A CHECK? Any command that returns an exit code

  33. WHAT IS A CHECK? Any command that returns an exit code:
      0 = PASSING, 1 = WARNING, any other code = FAILING

  34. WHAT IS A CHECK? Output is captured as a "note" for inspection
      $ curl http://127.0.0.1:4455/_health
      curl: (7) Failed to connect to 127.0.0.1 port 4455: Connection refused

  35. CREATING A CHECK: Use a custom script
      {
        "check": {
          "id": "mem-util",
          "name": "Memory utilization",
          "script": "/usr/local/bin/check_mem.py",
          "interval": "10s"
        }
      }

  36. CREATING A CHECK: Use a built-in check type
      {
        "check": {
          "id": "api",
          "name": "HTTP API on port 4455",
          "http": "http://localhost:4455/_health",
          "interval": "10s",
          "timeout": "1s"
        }
      }

  37. RESPONSIVE

  38. (no text)

  39. (no text)

  40-43. TRADITIONAL MONITORING: Pushes information into a silo
      [Diagram: MONITORING SERVICE; WEB 1, WEB 2, WEB N]

  44. As slide 40, with a "U" marker

  45-46. As slide 40, with "U", "F", "F" markers

  47-48. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer
      [Diagram: CONSUL; WEB 1, WEB 2, WEB N]

  49-50. As slide 47, plus: dig web.service.consul returns 10.0.1.4, 10.0.1.5, 10.0.1.6

  51. As slide 47, plus: dig web.service.consul returns 10.0.1.5, 10.0.1.6

  52-55. As slide 51, plus the setting host: web.service.consul

  56. As slide 52, but dig web.service.consul again returns 10.0.1.4, 10.0.1.5, 10.0.1.6

  57. CONSUL MONITORING: Unhealthy nodes are not returned from DNS queries
      dig web.service.consul: web-01, web-02, web-03

  58. CONSUL MONITORING: Unhealthy nodes are not returned from HTTP API
      curl /v1/services/web: web-01, web-02, web-03

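Over HTTP, the health-filtered view of a service comes from `GET /v1/health/service/<service>?passing`. It returns one entry per instance with its `Node`, `Service` and `Checks`. The Python sketch below applies the same all-checks-passing rule locally. It shows why `10.0.1.4` drops out of the answer while WEB 1 is failing and reappears once WEB 1 recovers. The sample entries are hypothetical and trimmed to the fields used.

```python
from typing import List


def passing_addresses(entries: List[dict]) -> List[str]:
    """Addresses of instances whose every check is passing."""
    addrs = []
    for entry in entries:
        if all(check["Status"] == "passing" for check in entry["Checks"]):
            # An empty service address means "use the node's address".
            addrs.append(entry["Service"].get("Address") or entry["Node"]["Address"])
    return addrs


def sample_entry(node: str, addr: str, status: str) -> dict:
    """Hypothetical health entry with a node-level and a service-level check."""
    return {
        "Node": {"Node": node, "Address": addr},
        "Service": {"Service": "web", "Address": "", "Port": 80},
        "Checks": [{"CheckID": "serfHealth", "Status": "passing"},
                   {"CheckID": "service:web", "Status": status}],
    }


if __name__ == "__main__":
    nodes = [sample_entry("web-1", "10.0.1.4", "critical"),  # WEB 1 is failing
             sample_entry("web-2", "10.0.1.5", "passing"),
             sample_entry("web-3", "10.0.1.6", "passing")]
    print(passing_addresses(nodes))  # ['10.0.1.5', '10.0.1.6']
```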
  59. LOCKING

  60. CONSUL LOCK: Allows for a new kind of "HA"
      $ consul lock [options] prefix child...

  61. CONSUL LOCK: Making standby HA much simpler
      [Diagram: CONSUL; VAULT 1, VAULT 2, VAULT 3]

  62. As slide 61, with two "L" markers

  63. As slide 61, with one "L" marker

  64. As slide 61, with one "L" marker and the label LEADER ELECTION

  65-67. As slide 61, with one "L" marker and a GET /secret/foo REQUEST

  68. As slide 61, with one "L" marker

  69. As slide 61, with a lowercase "l" marker

  70-71. As slide 61, with one "L" marker

  72-73. As slide 61, with one "L" marker and a GET /secret/foo REQUEST

  74. CONSUL LOCK: Solves the "exactly one of these must always be running" problem

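Consul locks are built on sessions and the KV store. A client creates a session (`PUT /v1/session/create`) and then tries `PUT /v1/kv/<key>?acquire=<session>`. That request succeeds only if no other session holds the key. If the holder's session is invalidated, for example because its node fails its health checks, the lock is released and a standby can take over. `consul lock` wraps this pattern: it runs the child command only while it holds the lock. The in-memory Python model below is a sketch of those semantics only. `FakeKV` and the key `service/vault/leader` are hypothetical. Real Consul also enforces a lock-delay (15 seconds by default) before a released lock can be re-acquired, which is omitted here.

```python
from typing import Dict, Optional


class FakeKV:
    """In-memory stand-in for Consul's KV acquire/release semantics."""

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}

    def acquire(self, key: str, session: str) -> bool:
        holder = self.holders.get(key)
        if holder is None or holder == session:
            self.holders[key] = session
            return True
        return False  # another session already holds the lock

    def invalidate(self, session: str) -> None:
        # Invalidating a session releases every lock it held.
        for key in [k for k, s in self.holders.items() if s == session]:
            del self.holders[key]

    def leader(self, key: str) -> Optional[str]:
        return self.holders.get(key)


if __name__ == "__main__":
    kv = FakeKV()
    key = "service/vault/leader"
    for node in ("vault-1", "vault-2", "vault-3"):
        kv.acquire(key, node)     # only the first attempt wins
    print(kv.leader(key))         # vault-1
    kv.invalidate("vault-1")      # vault-1 fails; its session is invalidated
    for node in ("vault-2", "vault-3"):
        kv.acquire(key, node)     # standbys retry; one takes over
    print(kv.leader(key))         # vault-2
```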
  75. CONSUL LOCK: Also great as a semaphore (rolling restarts)

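Raising the holder limit with `consul lock -n=<limit>` turns the lock into a semaphore. At most `<limit>` nodes then run the child command at the same time, which is what throttles a rolling restart. The sketch below uses Python's `threading.BoundedSemaphore` as a local stand-in for Consul's distributed semaphore. The node names and the sleep standing in for a restart are placeholders.

```python
import threading
import time
from typing import List, Tuple


def rolling_restart(nodes: List[str], limit: int) -> Tuple[List[str], int]:
    """Restart every node, never more than `limit` at once; return (order, peak)."""
    slots = threading.BoundedSemaphore(limit)  # stands in for `consul lock -n=<limit>`
    guard = threading.Lock()                    # protects the counters below
    active = 0
    peak = 0
    restarted: List[str] = []

    def restart_node(name: str) -> None:
        nonlocal active, peak
        with slots:                 # block until one of `limit` slots is free
            with guard:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)        # placeholder for the real restart
            with guard:
                active -= 1
                restarted.append(name)

    threads = [threading.Thread(target=restart_node, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return restarted, peak


if __name__ == "__main__":
    order, peak = rolling_restart([f"web-{i}" for i in range(6)], limit=2)
    print(order, "peak concurrency:", peak)
```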

  76. SCALABILITY

  77. TRADITIONAL MONITORING: Notifies/polls all statuses
      [Diagram: MONITORING SERVICE; WEB 1, WEB 2, WEB N; "I'm healthy" / "Good, thanks for asking!"]

  78. As slide 77, scaled to WEB 1, WEB 2, ... WEB 1,000: 1,000'S OF REQUESTS

  79. As slide 78, with an added HA label

  80. CONSUL MONITORING: Notifies on status changes
      [Diagram: CONSUL; WEB 1, WEB 2, WEB N; "My status has changed"]

  81. As slide 80, scaled to WEB 1, WEB 2, ... WEB 1,000: 10'S OF REQUESTS

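The scalability argument is simple arithmetic. When every node reports on every interval, message volume grows with nodes times intervals. Reporting only on status changes grows with the number of changes instead. Here is a hypothetical Python sketch that counts both for the same sample history. It ignores Consul's periodic anti-entropy syncs, which add some background traffic.

```python
from typing import Dict, List


def polling_messages(samples: Dict[str, List[str]]) -> int:
    """One message per node per interval."""
    return sum(len(history) for history in samples.values())


def change_messages(samples: Dict[str, List[str]]) -> int:
    """One message per status transition between consecutive samples."""
    total = 0
    for history in samples.values():
        total += sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return total


if __name__ == "__main__":
    # 1,000 nodes sampled 6 times; 10 of them flap to "critical" and back once.
    samples = {f"web-{i}": ["passing"] * 6 for i in range(1000)}
    for i in range(10):
        samples[f"web-{i}"][2:4] = ["critical", "critical"]
    print(polling_messages(samples))  # 6000
    print(change_messages(samples))   # 20
```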
  82. CONCLUSION

  83. SOLVES 4 BASIC PROBLEMS: SERVICE DISCOVERY, LOAD BALANCING, HEALTH CHECKING, KEY-VALUE CONFIGURATION

  84. SOLVES 4 MORE PROBLEMS: RESPONSIVE, LEADER ELECTION, SEMAPHORE LOCKING, SCALABLE

  85. SETH VARGO @sethvargo: QUESTIONS?