Consul as a Monitoring Service

Monitoring has two sides: exposing problems through alerts, and acting on those alerts to resolve the problems they expose. To expose problems, users can define any script as a check, and Consul runs it to report the health of every node in a cluster. A check can be as simple as confirming that an endpoint returns a 200, or as complex as measuring load and query response time on a database server. Other monitoring solutions already provide this functionality; where Consul shines is the second half of monitoring: automatic intervention that resolves problems without human operators.
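As a rough illustration of a user-defined check, the sketch below maps a measurement to the exit codes Consul reads from a script check (0 for passing, 1 for warning, anything else for failing). The disk-usage metric, the function names, and the 80%/95% thresholds are hypothetical choices for this example, not something taken from the talk.

```python
import shutil

# Exit codes Consul interprets for script checks.
# Any code other than 0 or 1 counts as failing; 2 is used here by convention.
PASSING, WARNING, FAILING = 0, 1, 2


def exit_code_for(used_percent, warn_at=80.0, fail_at=95.0):
    """Map a utilization percentage to a check exit code.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if used_percent >= fail_at:
        return FAILING
    if used_percent >= warn_at:
        return WARNING
    return PASSING


def disk_check(path="/"):
    """Measure disk usage for `path`, print a note, and return an exit code."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total
    # Whatever the script prints is captured by Consul as the check output.
    print(f"{path} is {used_percent:.1f}% full")
    return exit_code_for(used_percent)


# A real check script would finish with: sys.exit(disk_check())
```

Keeping the threshold logic in a pure function makes it easy to test without touching the machine's real state.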

Because health checking is built into Consul, it not only notifies operators of a node or service failure but also automatically routes traffic away from unhealthy nodes. Once a troubled node reports that it is healthy again, Consul routes traffic back to it. In this way Consul pushes past the existing paradigms of monitoring and becomes much more than a simple notification system: it surfaces problems and solves them without human intervention. Don’t worry about that pager going off in the middle of the night; rest easy with Consul.
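The routing behavior described above can be modeled in a few lines. This is a simplified in-memory sketch, not Consul's implementation: the `routable_addresses` helper and the dictionary layout are hypothetical, and the model assumes an address is dropped only while one of its checks is in the `critical` state. The addresses are the ones shown in the slides.

```python
def routable_addresses(checks):
    """Return addresses that have no check in the "critical" state.

    `checks` maps an address to a list of check statuses
    ("passing", "warning", or "critical"). Insertion order is preserved.
    """
    return [
        address
        for address, statuses in checks.items()
        if "critical" not in statuses
    ]


checks = {
    "10.0.1.4": ["passing"],
    "10.0.1.5": ["passing"],
    "10.0.1.6": ["passing"],
}

before = routable_addresses(checks)   # all three nodes receive traffic

checks["10.0.1.4"] = ["critical"]     # the node's check starts failing
during = routable_addresses(checks)   # 10.0.1.4 is no longer returned

checks["10.0.1.4"] = ["passing"]      # the node recovers
after = routable_addresses(checks)    # 10.0.1.4 is returned again
```

No operator action is involved at any step: the set of returned addresses is derived entirely from the current check results.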

Seth Vargo

May 29, 2015

Transcript

  1. LOAD BALANCING: How to ensure request leveling across providers? Diagram: ORDER PROCESSING, WEB APP; NODE 1, NODE 2, NODE N.
  2. ANTI-PATTERN: Load Balancer is a Single Point of Failure (SPOF). Diagram: ORDER PROCESSING, WEB APP; LOAD BALANCER; NODE 1, NODE 2, NODE N.
  3. HEALTH CHECKING: How to avoid routing to unhealthy hosts? Diagram: ORDER PROCESSING, WEB APP; LOAD BALANCER; NODE 1, NODE 2, NODE 3.
  4. CONFIGURATION: How to efficiently push dynamic configuration? Diagram: WEB APP; WEB 1, WEB 2, WEB N. Settings: maintenance: false, feature_a: true, role: "web".
  5. demo master dig web-frontend.service.consul
    ; <<>> DiG 9.8.3-P1 <<>> web-frontend.service.consul. ANY
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29981
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
    ;; QUESTION SECTION:
    ;web-frontend.service.consul. IN ANY
    ;; ANSWER SECTION:
    web-frontend.service.consul. 0 IN A 10.0.3.83
    web-frontend.service.consul. 0 IN A 10.0.1.109
  6. Diagram labels: CLIENT (x6), SERVER (x3), REPLICATION (x2), RPC (x2), LAN GOSSIP, SERVER (x3), REPLICATION (x2).
  7. Diagram labels: CLIENT (x6), SERVER (x3), REPLICATION (x2), RPC (x2), LAN GOSSIP, SERVER (x3), REPLICATION (x2), WAN GOSSIP.
  8. demo master consul-template -template="example.ctmpl" -dry
    > listen http-in
    bind *:8000
    server web-0 127.0.0.1:80
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo master
  9. demo master consul-template -template="example.ctmpl" -dry
    > listen http-in
    bind *:8000
    server web-0 127.0.0.1:80
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo master sudo stop webserver
  10. demo master consul-template -template="example.ctmpl" -dry
    > listen http-in
    bind *:8000
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo master sudo stop webserver
  11. demo master consul-template -template="example.ctmpl" -dry
    > listen http-in
    bind *:8000
    server web-0 127.0.0.1:80
    server web-1 127.0.0.1:80
    server web-2 127.0.0.1:80
    demo master sudo start webserver
  12. demo master curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
    true
    demo master curl http://localhost:8500/v1/kv/foo
    [ { "CreateIndex": 100, "ModifyIndex": 200, "Key": "foo", "Flags": 0, "Value": "YmFy" } ]
  13. WHAT IS A CHECK? Any command that returns an exit code: 0 = PASSING, 1 = WARNING, __ = FAILING.
  14. WHAT IS A CHECK? Output is captured as a "note" for inspection.
    $ curl http://127.0.0.1:4455/_health
    curl: (7) Failed to connect to 127.0.0.1 port 4455: Connection refused
  15. CREATING A CHECK: Use a custom script.
    { "check": { "id": "mem-util", "name": "Memory utilization", "script": "/usr/local/bin/check_mem.py", "interval": "10s" } }
  16. CREATING A CHECK: Use a built-in check type.
    { "check": { "id": "api", "name": "HTTP API on port 4455", "http": "http://localhost:4455/_health", "interval": "10s", "timeout": "1s" } }
  17. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.4, 10.0.1.5, 10.0.1.6.
  18. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.4, 10.0.1.5, 10.0.1.6.
  19. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.5, 10.0.1.6.
  20. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.5, 10.0.1.6. host: web.service.consul
  21. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.5, 10.0.1.6. host: web.service.consul
  22. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.5, 10.0.1.6. host: web.service.consul
  23. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.5, 10.0.1.6. host: web.service.consul
  24. CONSUL MONITORING: Removes unhealthy nodes from service discovery layer. Diagram: CONSUL; WEB 1, WEB 2, WEB N. dig web.service.consul: 10.0.1.4, 10.0.1.5, 10.0.1.6. host: web.service.consul
  25. CONSUL MONITORING: Unhealthy nodes are not returned from DNS queries. dig web.service.consul: web-01, web-02, web-03.
  26. CONSUL MONITORING: Unhealthy nodes are not returned from HTTP API. curl /v1/services/web: web-01, web-02, web-03.
  27. CONSUL LOCK: Allows for a new kind of "HA".
    demo master consul lock [options] prefix child...
  28. CONSUL LOCK: Making standby HA much simpler. Diagram: CONSUL; VAULT 1, VAULT 2, VAULT 3; L; REQUEST: GET /secret/foo.
  29. CONSUL LOCK: Making standby HA much simpler. Diagram: CONSUL; VAULT 1, VAULT 2, VAULT 3; L; REQUEST: GET /secret/foo.
  30. CONSUL LOCK: Making standby HA much simpler. Diagram: CONSUL; VAULT 1, VAULT 2, VAULT 3; L; REQUEST: GET /secret/foo.
  31. CONSUL LOCK: Making standby HA much simpler. Diagram: VAULT 1, VAULT 2, VAULT 3; REQUEST: GET /secret/foo; CONSUL; L.
  32. CONSUL LOCK: Making standby HA much simpler. Diagram: VAULT 1, VAULT 2, VAULT 3; REQUEST: GET /secret/foo; CONSUL; L.
  33. CONSUL MONITORING: Notifies on status changes. Diagram: CONSUL; WEB 1, WEB 2, WEB N; "My status has changed".
  34. SOLVES 4 BASIC PROBLEMS: SERVICE DISCOVERY, LOAD BALANCING, HEALTH CHECKING, KEY-VALUE CONFIGURATION.
  35. SOLVES 4 MORE PROBLEMS: RESPONSIVE, LEADER ELECTION, SEMAPHORE LOCKING, SCALABLE.
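
In slide 12, the value `bar` written to the key `foo` comes back as `"Value": "YmFy"`, because the key-value HTTP API returns values base64-encoded. A minimal sketch of decoding that response follows; the `decode_kv` helper is a hypothetical name, and the response body is copied from the slide rather than fetched from a live agent.

```python
import base64
import json

# Response body as shown in slide 12 for: curl http://localhost:8500/v1/kv/foo
response_body = (
    '[ { "CreateIndex": 100, "ModifyIndex": 200, '
    '"Key": "foo", "Flags": 0, "Value": "YmFy" } ]'
)


def decode_kv(body):
    """Return a {key: decoded string value} dict from a KV response body.

    Entries whose Value is null are mapped to None.
    """
    decoded = {}
    for entry in json.loads(body):
        raw = entry["Value"]
        decoded[entry["Key"]] = (
            None if raw is None else base64.b64decode(raw).decode("utf-8")
        )
    return decoded


print(decode_kv(response_body))  # prints {'foo': 'bar'}
```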