Practical Service Discovery with Consul

Presented at Gluecon 2016

Chris Stevens

May 26, 2016

Transcript

  1. 1.

    Practical Service Discovery with Consul
    Chris Stevens @stevenscg
    Gluecon 2016
    This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License
  2. 4.

    “Service discovery is about knowing when any process in the cluster is listening on a TCP or UDP port, and being able to look up and connect to that port by name.”
    – Jeff Lindsay @progrium
    http://progrium.com/blog/2014/07/29/understanding-modern-service-discovery-with-docker/
  3. 15.

    Static Configuration

    // config.php
    $db = '192.168.1.50:3306';
    $api = '192.168.1.51:443';
    $cache = [
        '192.168.1.60:11211',
        '192.168.1.61:11211',
    ];

    Pets. Not Cattle.

     ______
    < Meow >
     ------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||
  4. 16.

    Configuration Management

    // ansible-playbook: web-servers.php
    - hosts: localhost
      connection: local
      tasks:
        - name: AWS - Gather RabbitMQ server instances
          shell: >
            aws ec2 describe-instances
            --region {{ region }}
            --filters "Name=tag:class,Values=rabbitmq"
            --query "Reservations[].Instances[].PrivateIpAddress"
            --output json
          register: rabbitmq_instances_raw
          changed_when: false

    Ansible + EC2 Tags => Templating
  5. 17.

    Service Discovery

    // config.php (using environment variables)
    // via DNS (load-balanced)
    $db = dns_get_record(getenv('db.service.consul'), DNS_SRV);
    $api = dns_get_record(getenv('api.service.consul'), DNS_SRV);

    // via API
    $cacheServers = getServers(getenv('cache.service.consul'));

    function getServers($name) {
        // Double quotes so that $name is interpolated into the URL.
        $url = "http://localhost:8500/v1/catalog/service/$name";
        $resp = json_decode(file_get_contents($url), true);
        $servers = array_map(function($item) {
            // format to host, port, weight, etc.
        }, $resp);
        return $servers;
    }

    Real-time. Healthy Services. Load Balancing.
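
The "format to host, port, weight, etc." step on this slide could look roughly like the Python sketch below. It assumes the list-of-objects response of `/v1/catalog/service/<name>` with its `Address`, `ServiceAddress`, and `ServicePort` fields. The function name `format_servers` and the sample data are hypothetical and not taken from the talk.

```python
import json


def format_servers(catalog_entries):
    """Turn catalog entries into (host, port) pairs.

    Assumes each entry carries "Address", "ServiceAddress" and
    "ServicePort"; an empty "ServiceAddress" falls back to the node address.
    """
    servers = []
    for entry in catalog_entries:
        host = entry.get("ServiceAddress") or entry["Address"]
        servers.append((host, entry["ServicePort"]))
    return servers


# Hypothetical sample of what the catalog endpoint might return.
sample = json.loads("""
[
  {"Node": "cache-1", "Address": "192.168.1.60",
   "ServiceName": "cache", "ServiceAddress": "", "ServicePort": 11211},
  {"Node": "cache-2", "Address": "192.168.1.61",
   "ServiceName": "cache", "ServiceAddress": "10.0.0.61", "ServicePort": 11211}
]
""")

print(format_servers(sample))
```

The catalog endpoint lists registered instances regardless of check status. If only healthy instances are wanted, `/v1/health/service/<name>?passing` is the endpoint that filters on passing checks, although its response nests these fields differently.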
  6. 22.

    Consul Feature PowerIndex
    Service Registry + Health Checks
    Distributed Key / Value Store
    First-class DNS Interface
    Distributed Locks
    Vault / Nomad Backing Store
  7. 25.

    Implementing Consul
    • Planning
    • Deployment
    • High Availability
    • Cluster / Instance sizing
    • Service naming conventions
    • Configuring DNS
    • DNS Recursors
    • Monitoring
    • Upgrades
  8. 26.

    Planning
    Register external services (RDS, etc).
    Register batch services.
    Feature flags with KV.
    Locks / Semaphores.
    App Integration.
  9. 27.

    Deployment
    Consul Servers: Odd number required for quorum.
    Consul Agent: One per instance or Docker host.
    Security Groups - TCP/UDP ports.
    Plan for attended provisioning.
  10. 28.

    Detect consul servers via EC2 tags

    // ansible playbook: consul.yml
    // consul_servers_via_tag: consul
    - name: Gather consul server instances
      run_once: true
      shell: >
        aws ec2 describe-instances
        --region {{ region }}
        --filters "Name=tag:class,Values={{ consul_servers_via_tag }}"
        --query "Reservations[].Instances[].PrivateIpAddress"
        --output json
      register: consul_instances_raw
      changed_when: false
      when: consul_servers_via_tag is defined
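
One way the registered JSON output might be consumed is sketched below in Python: the list of private IPs is turned into `-retry-join` flags for `consul agent`. The talk does not show how the playbook uses `consul_instances_raw`, so the function name `retry_join_args` and the sample addresses are assumptions.

```python
import json


def retry_join_args(describe_instances_stdout):
    """Build -retry-join flags from a JSON list of private IP addresses."""
    addresses = json.loads(describe_instances_stdout)
    # Defensive: drop empty entries rather than emit a broken flag.
    return ["-retry-join={}".format(ip) for ip in addresses if ip]


# Hypothetical stdout as it might be captured in consul_instances_raw.
stdout = '["10.0.1.10", "10.0.2.10", "10.0.3.10"]'
print(" ".join(retry_join_args(stdout)))
```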
  11. 29.

    Security Groups: Consul Agents

    // ansible playbook: security-groups.yml
    - name: ConsulAgentSG
      description: Consul agent security group
      rules:
        # Server RPC
        - proto: tcp
          from_port: 8300
          to_port: 8300
          cidr_ip: 10.0.0.0/8
        # Serf gossip LAN
        - proto: tcp
          from_port: 8301
          to_port: 8301
          cidr_ip: 10.0.0.0/8
        - proto: udp
          from_port: 8301
          to_port: 8301
          cidr_ip: 10.0.0.0/8
        # CLI RPC
        - proto: tcp
          from_port: 8400
          to_port: 8400
          cidr_ip: 10.0.0.0/8
        # HTTP API
        - proto: tcp
          from_port: 8500
          to_port: 8500
          cidr_ip: 10.0.0.0/8
  12. 30.

    Security Groups: Consul Servers

    // ansible playbook: security-groups.yml
    - name: ConsulServerSG
      description: Consul server security group
      rules:
        # Server RPC
        - proto: tcp
          from_port: 8300
          to_port: 8300
          cidr_ip: 10.0.0.0/8
        # Serf gossip LAN
        - proto: tcp
          from_port: 8301
          to_port: 8301
          cidr_ip: 10.0.0.0/8
        - proto: udp
          from_port: 8301
          to_port: 8301
          cidr_ip: 10.0.0.0/8
        # CLI RPC
        - proto: tcp
          from_port: 8400
          to_port: 8400
          cidr_ip: 10.0.0.0/8
        # HTTP API
        - proto: tcp
          from_port: 8500
          to_port: 8500
          cidr_ip: 10.0.0.0/8
        # DNS Interface
        - proto: tcp
          from_port: 8600
          to_port: 8600
          cidr_ip: 10.0.0.0/8
        - proto: udp
          from_port: 8600
          to_port: 8600
          cidr_ip: 10.0.0.0/8
  13. 31.

    Availability
    Consul is a distributed, highly available system.
    Consul nodes within a datacenter participate in a gossip protocol.
    Consul servers within a datacenter are part of a single Raft peer set.
    The Raft Paper [PDF]
    https://www.consul.io/intro
    https://www.consul.io/docs/internals/architecture.html
  14. 32.

    Cluster / Instance Sizing
    Odd number of servers.
    Use t2.medium or larger.
    Monitor for leader transitions.
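
The preference for an odd number of servers follows from majority arithmetic in Raft: a write needs a majority of the voting servers, so an even-sized cluster tolerates no more failures than the next smaller odd-sized one. A small Python sketch of that arithmetic (the function names are illustrative only):

```python
def quorum(servers):
    """Votes needed for a Raft majority among `servers` voting members."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """Servers that can be lost while a majority remains."""
    return servers - quorum(servers)


# Columns: cluster size, quorum size, failures tolerated.
for n in range(1, 8):
    print(n, quorum(n), failure_tolerance(n))
```

Four servers tolerate one failure, the same as three, while five tolerate two.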
  15. 33.
  16. 34.
  17. 35.

    Service Naming Conventions
    • DNS-compatible from the start
    • Service name:
      • api
    • DNS names:
      • api.service.consul
      • api.service.dc1.consul
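
"DNS-compatible" can be checked mechanically. The Python sketch below tests a service name against the usual DNS label rules (letters, digits, and hyphens, 1 to 63 characters, no leading or trailing hyphen). Restricting names to lowercase is an extra convention assumed here, not something the talk states, and `is_dns_compatible` is a hypothetical helper.

```python
import re

# DNS label rules: letters, digits, hyphens; 1-63 characters;
# no leading or trailing hyphen. Lowercase-only is an assumed house rule.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_dns_compatible(service_name):
    """True if the name can be used unchanged as a single DNS label."""
    return _LABEL.fullmatch(service_name) is not None


for name in ("api", "demo-app", "my_service", "-api"):
    print(name, is_dns_compatible(name))
```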
  18. 39.

    dnsmasq

    // ansible playbook: dnsmasq.yml
    - name: Configure dnsmasq to listen only on loopback
      lineinfile:
        dest: /etc/dnsmasq.conf
        regexp: "^#?interface="
        line: "interface=lo"
        state: present

    - name: Delegate consul DNS requests to the consul DNS port
      copy: >
        content='server=/{{ consul_domain }}/{{ consul_client_address }}#{{ consul_port_dns }}'
        dest=/etc/dnsmasq.d/10-consul
      notify:
        - Restart dnsmasq
  19. 40.

    resolv.conf

    // ansible playbook: dnsmasq.yml
    - name: Add localhost nameserver to resolv.conf
      lineinfile:
        dest: /etc/resolv.conf
        line: 'nameserver 127.0.0.1'
        insertbefore: "^nameserver"
        state: present

    - name: Ensure dhclient maintains the localhost nameserver
      lineinfile:
        dest: /etc/dhcp/dhclient.conf
        line: 'prepend domain-name-servers 127.0.0.1;'
        state: present
      when: dh.stat.exists
  20. 41.

    Protip: DNS Recursors
    Required for AWS managed services (RDS).
    AWS provides DNS for VPCs (i.e. 10.0.0.2)
    Alert on...
    syslog.appName:"consul" "dns: all resolvers failed"
  21. 42.

    Consul DNS Configuration

    // ansible group_vars
    consul_dns_config:
      allow_stale: true
      max_stale: 2s
      service_ttl:
        "*": 0s
        rds: 5s
        elasticache: 5s

    consul_dns_recursors:
      # - 169.254.169.253
      - 10.0.0.2
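
These group variables presumably end up in the agent's JSON configuration under the `dns_config` and `recursors` keys. The Python sketch below shows one way such a fragment could be rendered. How the playbook actually templates the file is not shown in the talk, so `render_agent_dns_config` is a hypothetical helper.

```python
import json


def render_agent_dns_config(dns_config, recursors):
    """Return an agent JSON fragment with "dns_config" and "recursors" keys."""
    return json.dumps(
        {"dns_config": dns_config, "recursors": recursors},
        indent=2,
        sort_keys=True,
    )


# Values copied from the group_vars on this slide.
dns_config = {
    "allow_stale": True,
    "max_stale": "2s",
    "service_ttl": {"*": "0s", "rds": "5s", "elasticache": "5s"},
}
print(render_agent_dns_config(dns_config, ["10.0.0.2"]))
```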
  22. 46.

    Monitoring
    Built-in telemetry support.
    StatsD or Statsite.
    Log to syslog / Loggly / etc.
    consul.consul.rpc.request
    consul.consul.leader.reconcile
    consul.serf.events.consul_new-leader
    consul.serf.member.failed
  23. 49.

    Upgrades
    Dedicated documentation section.
    Servers first. Then agents.
    4 upgrades so far (0.6.0 - 0.6.4).
    Zero downtime.
    https://www.consul.io/docs/upgrading.html
  24. 52.

    Bootstrapping a cluster
    1. Launch Consul in a container ("consul1").
    2. Join 2 additional server containers: "consul2" and "consul3".
    3. Kill off "consul2".
    4. Check cluster health.
    Just Released! Official docker image
    https://hub.docker.com/_/consul/
  25. 53.
  26. 54.

    Companion Project
    • Vagrant VM
    • Single-node Consul Cluster
    • Interactive demo app + services
    • Consul Agent, UI, and API
    • Ansible playbooks
    • Curl examples
    https://github.com/stevenscg/service-discovery-with-consul
  27. 57.

    Register a service
    Consul is already running in our VM.
    Run a playbook to register services: statsd, demo-app (web), memcached.
    Lookup services via DNS and API.
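
A registration like the one the playbook performs can be expressed as a JSON body for the agent's `/v1/agent/service/register` endpoint. The Python sketch below builds such a body for the demo app, using the `web` service name and the `http://localhost:8001/health_check` URL that appear in the demo output. The `10s` interval and the helper name `service_registration` are assumptions, and the companion playbook may build its request differently.

```python
import json


def service_registration(name, port, http_check_url=None, interval="10s"):
    """Build a service registration body, optionally with one HTTP check."""
    payload = {"Name": name, "Port": port}
    if http_check_url:
        # The interval is an assumed value for this sketch.
        payload["Check"] = {"HTTP": http_check_url, "Interval": interval}
    return payload


web = service_registration("web", 8001, "http://localhost:8001/health_check")
print(json.dumps(web, indent=2))
```

The body would be sent with an HTTP PUT to `http://localhost:8500/v1/agent/service/register` on the local agent.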
  28. 58.

    Register a service

    $ ansible-playbook 01-register-service.yml -i .vagrant/provisioners/ansible/inventory

    PLAY ***************************************************************************

    TASK [setup] *******************************************************************
    ok: [default]

    TASK [Fetch existing services from consul API] *********************************
    ok: [default]

    TASK [Existing services] *******************************************************
    ok: [default] => {
        "msg": {
            "consul": []
        }
    }

    TASK [Register statsd with consul] *********************************************
    changed: [default]

    TASK [Register demo-app with consul] *******************************************
    changed: [default]

    TASK [Register memcached with consul] ******************************************
    changed: [default]

    TASK [Wait for the checks to be evaluated (3 seconds)] *************************
    Pausing for 3 seconds
    (ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
    ok: [default]

    TASK [Fetch updated services from consul API] **********************************
    ok: [default]

    TASK [Updated services] ********************************************************
    ok: [default] => {
        "msg": {
            "cache": [],
            "consul": [],
            "statsd": [],
            "web": []
        }
    }

    PLAY RECAP *********************************************************************
    default : ok=9 changed=3 unreachable=0 failed=0
  29. 61.

    Health checks
    Consul is already running in our VM.
    Run a playbook to register new checks for existing services.
    Evaluate the checks via the Consul API.
  30. 62.

    Health checks

    $ ansible-playbook 02-register-health-checks.yml -i .vagrant/provisioners/ansible/inventory

    PLAY ***************************************************************************

    TASK [setup] *******************************************************************
    ok: [default]

    TASK [Fetch existing health checks from consul API] ****************************
    ok: [default]

    TASK [Existing health checks] **************************************************
    ok: [default] => {
        "msg": {
            "service:web": {
                "CheckID": "service:web",
                "CreateIndex": 0,
                "ModifyIndex": 0,
                "Name": "Service 'web' check",
                "Node": "demo",
                "Notes": "",
                "Output": "HTTP GET http://localhost:8001/health_check: 200 OK Output: ",
                "ServiceID": "web",
                "ServiceName": "web",
                "Status": "passing"
            }
        }
    }
  31. 63.

    Health checks

    TASK [Register a status code check for the demo-app] ***************************
    changed: [default]

    TASK [Wait for the checks to be evaluated (3 seconds)] *************************
    Pausing for 3 seconds
    (ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
    ok: [default]

    TASK [Fetch updated health checks from consul API] *****************************
    ok: [default]

    TASK [Updated health checks] ***************************************************
    ok: [default] => {
        "msg": {
            "service:web": {
                "CheckID": "service:web",
                "CreateIndex": 0,
                "ModifyIndex": 0,
                "Name": "Service 'web' check",
                "Node": "demo",
                "Notes": "",
                "Output": "HTTP GET http://localhost:8001/health_check: 200 OK Output: ",
                "ServiceID": "web",
                "ServiceName": "web",
                "Status": "passing"
            },
            "service:web:status_code": {
                "CheckID": "service:web:status_code",
                "CreateIndex": 0,
                "ModifyIndex": 0,
                "Name": "service:web:status_code",
                "Node": "demo",
                "Notes": "",
                "Output": "HTTP GET http://localhost:8001: 200 OK Output:",
                "ServiceID": "",
                "ServiceName": "",
                "Status": "passing"
            }
        }
    }

    PLAY RECAP *********************************************************************
    default : ok=7 changed=1 unreachable=0 failed=0
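
A map of checks keyed by `CheckID`, as in the output above, is straightforward to evaluate in code. The Python sketch below lists every check whose `Status` is anything other than `passing`. The helper name `failing_checks` and the trimmed sample data are illustrative only.

```python
def failing_checks(checks):
    """Return the sorted CheckIDs whose Status is not "passing"."""
    return sorted(
        check_id
        for check_id, check in checks.items()
        if check.get("Status") != "passing"
    )


# Trimmed-down version of the check map from the demo output.
checks = {
    "service:web": {"ServiceName": "web", "Status": "passing"},
    "service:web:status_code": {"ServiceName": "", "Status": "passing"},
}
print(failing_checks(checks))
```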
  32. 65.

    DNS Interface
    • Standard lookup
      [tag.]<service>.service[.datacenter].<domain>
      dig @127.0.0.1 -p 8600 consul.service.consul SRV
      dig consul.service.consul SRV
      dig consul.service.dc1.consul SRV
    • Prepared query lookup
      <query or name>.query[.datacenter].<domain>
      dig redis-primary.query.consul SRV
    • Randomized (healthy) results for load balancing
    https://www.consul.io/docs/agent/dns.html
  33. 66.

    Key / Value Store
    Consul is already running in our VM.
    Consul-PHP-SDK.
    Curl.
    Any HTTP client.
  34. 69.

    Key / Value Store: Recommendations
    • Manage KV items with a dedicated git repository.
    • git2consul detects changes and updates KV paths.
    • Full change history.
    • Familiar workflows.
  35. 70.

    Key / Value Store: Recommendations
    • App config managed by consul-template.
    • KV changes detected by consul-template.
    • App config updated and restarted.
    • Works with Ansible, Chef, Puppet, etc.
  36. 71.

    Distributed Locks
    Uses Consul leader election client-side.
    Ensures only a single client process can execute.
    Scheduled Jobs.
    One-off tasks.
    https://www.consul.io/docs/guides/leader-election.html
  37. 72.

    Distributed Locks
    Consul is already running in our VM.
    LockHandler from Consul-PHP-SDK.
    Consul KV + Session.
    https://github.com/stevenscg/consul-php-sdk
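
The "Consul KV + Session" pattern can be sketched against the HTTP API directly: create a session, try to acquire a key with it, run the job only if the acquire succeeds, then release the key and destroy the session. The Python below is a minimal illustration of that flow and is not the `LockHandler` from Consul-PHP-SDK. The agent address, the function names, and the injectable `put` parameter are all assumptions of this sketch.

```python
import json
import urllib.request

CONSUL = "http://localhost:8500"  # assumed local agent address


def http_put(url, body=None):
    """Send a PUT request and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else b""
    req = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())


def run_exclusively(key, job, put=http_put):
    """Run job() only if this process acquires the lock on `key`.

    Returns True if the job ran, False if another session held the lock.
    """
    session = put(CONSUL + "/v1/session/create", {"Name": key})["ID"]
    try:
        if not put("{}/v1/kv/{}?acquire={}".format(CONSUL, key, session)):
            return False  # another session holds the lock
        try:
            job()
        finally:
            put("{}/v1/kv/{}?release={}".format(CONSUL, key, session))
        return True
    finally:
        # Always clean up the session, whether or not the job ran.
        put("{}/v1/session/destroy/{}".format(CONSUL, session))
```

A scheduled job could then be wrapped as `run_exclusively("locks/nightly-report", generate_report)`, where both the key path and `generate_report` are placeholders. The `put` parameter exists so the flow can be exercised without a running agent.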
  38. 74.

    Feature Flags
    Consul KV with a simple JSON structure.
    Feature Helper from Consul-PHP-SDK.
    Top level "feature" key path.
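
Reading a flag from the KV store comes down to decoding the base64 `Value` of an entry and interpreting the JSON inside it. The Python sketch below illustrates this for keys under a `feature/` prefix. The talk does not show the JSON structure or the Feature Helper's interface, so the `{"enabled": ...}` shape, the flag names, and both function names are assumptions.

```python
import base64
import json


def decode_kv_value(kv_entry):
    """Decode the base64 "Value" of one KV entry into a Python object."""
    return json.loads(base64.b64decode(kv_entry["Value"]).decode("utf-8"))


def feature_enabled(kv_entries, name, prefix="feature/"):
    """True if the flag `name` exists under `prefix` and is enabled.

    The {"enabled": bool} structure is an assumption for this sketch.
    """
    for entry in kv_entries:
        if entry["Key"] == prefix + name:
            return bool(decode_kv_value(entry).get("enabled", False))
    return False


# Hypothetical entries as GET /v1/kv/feature/?recurse might return them.
entries = [
    {"Key": "feature/new-checkout",
     "Value": base64.b64encode(b'{"enabled": true}').decode("ascii")},
    {"Key": "feature/beta-search",
     "Value": base64.b64encode(b'{"enabled": false}').decode("ascii")},
]
print(feature_enabled(entries, "new-checkout"))
```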
  39. 79.

    Advanced Topics
    • ACLs
      https://www.consul.io/docs/internals/acl.html
    • Multi-Datacenter
      https://www.consul.io/docs/guides/datacenters.html
    • Prepared Queries
      https://www.consul.io/docs/agent/http/query.html
    • Network Tomography
      https://www.consul.io/docs/internals/coordinates.html
  40. 82.

    Practical Service Discovery with Consul
    This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License
    Chris Stevens @stevenscg
    Gluecon 2016
  41. 83.

    Image Credits
    1 Flickr / Shane Gorski https://flic.kr/p/4TsYxV
    2 Traxo
    5 Flickr / Xaf https://flic.kr/p/8bWWrR
    6 Flickr / Tony Hisgett https://flic.kr/p/5aGNRj
    7 Flickr / Tony Hisgett https://flic.kr/p/ovbczG
    9 Github / @adrianco https://github.com/adrianco/spigo
    19 Flickr / Xaf https://flic.kr/p/74gWVS
    20-21 HashiCorp
    24 Flickr / Xaf https://flic.kr/p/74gWVS
    37 Flickr / Sasha G https://flic.kr/p/purZ4
    44-45 Dean Mouhtaropoulos
    48 Flickr / Paul Williams https://flic.kr/p/5spYAJ
    51 Flickr / Timelapsed https://flic.kr/p/oVyUnH
    52 Docker
    80-81 HashiCorp
    82 Flickr / Shane Gorski https://flic.kr/p/4TsYxV