Mesos + Consul

Given to the Bay Area Mesos Meetup July 22, 2015

In this talk we discuss integrating Consul (consul.io) with Apache Mesos. Mesos allows us to mix diverse workloads across multiple systems and Consul provides service discovery, health checks and support for dynamic system configuration.

By making these tools work together, we can build a flexible and powerful platform that supports a wide range of use cases, from running Docker containers to data-centric applications like Kafka and Spark.

Steven Borrelli

July 22, 2015

  1. MESOS + CONSUL

    BAY AREA MESOS MEETUP, JULY 2015
    Steven Borrelli, @stevendborrelli, @Asteris_LLC
  2. ABOUT ME

    Founded Asteris (2014)
    Systems engineering, HPC, big data & cloud
    Focus on continuous delivery, streaming data, and infrastructure software
  3. +

  4. DIVERSE USER TYPES

    Development, Analytics, Engineering
  5. EMERGING USE CASES

    Continuous Delivery, Mobile First, Microservices, Multiple Languages, Streaming Data, Unstructured Data, Multiple Data Stores, Machine Learning, Cloud, DevOps
  6. FRAMEWORKS

    App-specific: Generic:
  7. MESOS CHALLENGES

    • Deployment
    • Framework Development
    • Security & Management
    • Monitoring
    • Service Discovery
  8. RUNNING CONSUL

    • Single binary (golang)
    • Run on every system
    • 1-7 servers per datacenter, rest of systems are clients
    • Config via .json files or CLI parameters
    • Optional Web UI
  9. PROBLEMS RUNNING CONSUL IN DOCKER

    • ARP cache issues with Docker networking, need to install conntrack to flush.
    • PITA to mount volumes and open network ports
    • Health checks become more complex
    • Network latency seems to cause instability
  10. Clients: Failure Detection, Health Checks, Respond to local requests

    Servers: Leader Election, Forward Request, Replicate Data

    Consensus is achieved via gossip protocol (nodes) or Raft (server data)
  11. CONSENSUS MODEL

    [Diagram: Consistency / Availability / Partition Tolerance]
    Gossip: Consul Agent, Cassandra
    Paxos/Raft: Zookeeper, Consul K/V, etcd
  12. CONSUL CONSISTENCY

    • Servers use Raft for consistency (CP)
    • Loss of server quorum will cause availability failure
    • Run a small (odd) number of servers per DC
    • Agents use LAN gossip for node failure detection
    • WAN gossip is used across DCs, higher latency
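The advice to run a small, odd number of servers follows from Raft's majority quorum. A minimal Python sketch of that arithmetic; the function names are illustrative and not part of Consul:

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a Raft peer set with `servers` members."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """Number of servers that can be lost while a majority remains."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    # Print cluster size, quorum size, and tolerated failures.
    for n in range(1, 8):
        print(n, quorum_size(n), failure_tolerance(n))
```

Four servers tolerate one failure, the same as three, so an even-sized cluster adds a machine without adding resilience.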
  13. CONSUL API CONSISTENCY MODES

    • default: server can serve requests during election. Possible stale values.
    • consistent: leader must be elected
    • stale: any server can respond, even non-leaders.
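In the HTTP API the non-default modes are selected with a bare `?stale` or `?consistent` query flag. A small sketch that builds such URLs for the K/V endpoint; `kv_url` is a hypothetical helper, and the address is the localhost agent used elsewhere in this deck:

```python
CONSUL_ADDR = "http://localhost:8500"  # local agent address, as on the slides

VALID_MODES = ("default", "consistent", "stale")


def kv_url(key: str, mode: str = "default", base: str = CONSUL_ADDR) -> str:
    """Build a K/V read URL for the given consistency mode.

    The default mode adds no flag; the other modes are passed as a
    bare query parameter (?stale or ?consistent).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown consistency mode: {mode!r}")
    url = f"{base}/v1/kv/{key}"
    if mode != "default":
        url += f"?{mode}"
    return url


if __name__ == "__main__":
    for m in VALID_MODES:
        print(kv_url("web/key1", m))
```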
  14. REGISTER A SERVICE

    Create a file called: marathon.json

    {
      "service": {
        "name": "marathon",
        "tags": [ "admin" ],
        "port": 8080,
        "check": {
          "script": "curl --silent --show-error --fail --dump-header /dev/stderr --retry 2",
          "interval": "10s"
        }
      }
    }

    "name": DNS Name
    "check": Optional Health Check
    HTTP API also supported
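The same definition can be generated rather than hand-written. A hedged sketch: `service_definition` is a hypothetical helper, and the check command in the usage example is a placeholder path, since the slide's curl command is missing its URL:

```python
import json


def service_definition(name, port, tags=(), check_script=None, interval="10s"):
    """Return a service definition dict shaped like marathon.json above.

    The health check is optional; when given, it uses the same
    "script" and "interval" fields as the slide.
    """
    service = {"name": name, "tags": list(tags), "port": port}
    if check_script is not None:
        service["check"] = {"script": check_script, "interval": interval}
    return {"service": service}


if __name__ == "__main__":
    definition = service_definition(
        "marathon",
        8080,
        tags=["admin"],
        check_script="/usr/local/bin/check-marathon",  # placeholder command
    )
    print(json.dumps(definition, indent=2))
```

Writing the printed JSON to a file in the agent's configuration directory and reloading the agent registers the service, as the next slide shows.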
  15. DNS REGISTRATION

    # consul reload
    # dig marathon.service.consul +short

    If a health check fails, entry will not show in DNS.
  16. SERVICE TAGS

    # dig admin.marathon.service.consul +short

    Tags are supported in DNS
  17. DNS SRV RECORDS

    Get the port for any service:

    # dig zookeeper.service.consul SRV +short
    1 1 2181 mi-control-01.node.dc1.consul.
    1 1 2181 mi-control-03.node.dc1.consul.
    1 1 2181 mi-control-02.node.dc1.consul.

    Nodes are automatically registered in DNS. You can even query services and nodes in other DCs!
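Each line of the short SRV output is priority, weight, port, and target. A minimal sketch that parses such output into records; `SrvRecord` and `parse_srv_short` are illustrative names, not part of any Consul tooling:

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_short(output: str) -> List[SrvRecord]:
    """Parse `dig ... SRV +short` output into SrvRecord tuples.

    Lines that do not have exactly four fields are skipped, and the
    trailing dot of the fully qualified target name is removed.
    """
    records = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        priority, weight, port, target = fields
        records.append(
            SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))
        )
    return records


if __name__ == "__main__":
    sample = (
        "1 1 2181 mi-control-01.node.dc1.consul.\n"
        "1 1 2181 mi-control-03.node.dc1.consul.\n"
        "1 1 2181 mi-control-02.node.dc1.consul.\n"
    )
    for rec in parse_srv_short(sample):
        print(f"{rec.target}:{rec.port}")
```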
  18. SIMPLIFY MESOS CONFIGURATION

    Zookeeper config string: zk://zookeeper.service.consul:2181/mesos
    Marathon config string: http://marathon.service.consul:8080
    Mesos config string (we’ll discuss leader later): mesos://leader.mesos.service.consul:5050
  19. BONUS! HEALTH CHECKS YOUR MESOS CLUSTER
  20. HEALTH CHECKS ARE RUN BY THE NODES, EXPOSE STATE VIA API

    curl -L http://localhost:8500/v1/health/service/chronos?pretty=true

    [
      {
        "Node": { "Node": "mi-control-01", "Address": "" },
        "Service": {
          "ID": "chronos",
          "Service": "chronos",
          "Tags": [ "chronos" ],
          "Address": "",
          "Port": 14400
        },
        "Checks": [
          {
            "Node": "mi-control-01",
            "CheckID": "service:chronos",
            "Name": "Service 'chronos' check",
            "Status": "critical",
            "Notes": "",
            "Output": "",
            "ServiceID": "chronos",
            "ServiceName": "chronos"
          },
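A client consuming this response usually wants only the nodes whose checks all pass. A sketch of that filter, assuming the response has already been parsed from JSON into a list of entries shaped like the one above; `passing_nodes` is a hypothetical helper:

```python
def passing_nodes(entries):
    """Return node names whose checks are all in the "passing" state.

    `entries` is the parsed JSON list from /v1/health/service/<name>.
    An entry with an empty "Checks" list counts as healthy here,
    which is a simplifying assumption of this sketch.
    """
    healthy = []
    for entry in entries:
        checks = entry.get("Checks", [])
        if all(check.get("Status") == "passing" for check in checks):
            healthy.append(entry["Node"]["Node"])
    return healthy


if __name__ == "__main__":
    response = [
        {
            "Node": {"Node": "mi-control-01", "Address": ""},
            "Service": {"ID": "chronos", "Service": "chronos", "Port": 14400},
            "Checks": [{"CheckID": "service:chronos", "Status": "critical"}],
        }
    ]
    print(passing_nodes(response))
```

With the critical check from the slide, the node is filtered out and the list is empty.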
  21. HEALTH CHECK EXIT CODES

    Consul checks are compatible with Nagios/Sensu:

    Exit code 0 - Check is passing
    Exit code 1 - Check is warning
    Any other code - Check is critical
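The exit-code convention above is easy to mirror in a test harness for check scripts. A minimal sketch; `check_status` and `run_check` are illustrative names, and the commands in the usage example are stand-ins for real check scripts:

```python
import subprocess
import sys


def check_status(exit_code: int) -> str:
    """Map a check's exit code to the state named on the slide."""
    if exit_code == 0:
        return "passing"
    if exit_code == 1:
        return "warning"
    return "critical"


def run_check(command) -> str:
    """Run a check command (a list of arguments) and classify its result."""
    completed = subprocess.run(command, capture_output=True)
    return check_status(completed.returncode)


if __name__ == "__main__":
    # Stand-in checks: tiny Python processes that exit with a chosen code.
    for code in (0, 1, 2):
        cmd = [sys.executable, "-c", f"raise SystemExit({code})"]
        print(code, run_check(cmd))
```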
  22. CONSUL K/V EXPOSED VIA API

    curl -X PUT -d 'test' http://localhost:8500/v1/kv/web/key1
    curl http://localhost:8500/v1/kv/?recurse
    [{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1","Flags":0,"Value":"dGVzdA=="},

    Or use consul-cli:

    consul-cli kv-read --ssl nodes/config/test
    Hello World
    consul-cli kv-delete --ssl --consul=consul.service.consul:8500 --recurse nodes/config/test
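Values come back from the K/V API base64-encoded, which is why 'test' appears as "dGVzdA==" above. A short sketch that decodes a response body; `decode_kv` is a hypothetical helper, and treating values as UTF-8 text is an assumption:

```python
import base64
import json


def decode_kv(response_body: str) -> dict:
    """Map each key in a K/V API response to its decoded string value.

    Values are base64-encoded in the response. A null value is kept
    as None. Decoding as UTF-8 assumes the stored values are text.
    """
    decoded = {}
    for entry in json.loads(response_body):
        raw = entry.get("Value")
        if raw is None:
            decoded[entry["Key"]] = None
        else:
            decoded[entry["Key"]] = base64.b64decode(raw).decode("utf-8")
    return decoded


if __name__ == "__main__":
    body = (
        '[{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1",'
        '"Flags":0,"Value":"dGVzdA=="}]'
    )
    print(decode_kv(body))
```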
  23. CONSUL ACLS

    • Only use in 0.5.2 or higher (upsert support)
    • Master tokens are used to create ACL entries
    • Every ACL entry has a token
    • read/write/deny policy on k/v and service endpoints
    • Can manage with API or
  24. CONSUL-CLI

    • Released today! (7/22/2015)
    • Wraps the Consul API with an easy-to-use CLI for scripting
    • Manages ACLs, Checks, Locks, K/V, HealthChecks, Services, Sessions, Raft Status
    • https://github.com/CiscoCloud/consul-cli
  25. CONSUL-CLI

    • Example: distributed lock

    $ ./consul-cli kv-lock --ttl=0 test/locks
    ba7c8cda-d197-a062-4e3e-f9a737237aa1
    $ ./consul-cli kv-read --format=prettyjson test/locks
    {
      "Key": "test/locks",
      "CreateIndex": 386,
      "ModifyIndex": 386,
      "LockIndex": 1,
      "Flags": 0,
      "Value": "",
      "Session": "ba7c8cda-d197-a062-4e3e-f9a737237aa1"
    }
    $ ./consul-cli kv-unlock \
        --session=ba7c8cda-d197-a062-4e3e-f9a737237aa1 test/locks
  26. CONSUL TEMPLATE

    • Reads data from Consul k/v and service catalog
    • Writes out text files based on go text/template
    • Can be used to dynamically configure systems and applications

    {{range service "web@datacenter"}}
    server {{.Name}} {{.Address}}:{{.Port}}{{end}}

    Becomes

    server nyc_web_01 123.456.789.10:8080
    server nyc_web_02 456.789.101.213:8080
  27. DYNAMIC ZOOKEEPER ENSEMBLE

    • Update zoo.cfg as ZK nodes come up/down
    • Writes out text files based on go text/template
    • Restarts Zookeeper nodes
    • https://github.com/CiscoCloud/docker-zookeeper

    {{ with $s := env "CONSUL_QUERY" }}
    {{ range service $s "passing, warning" }}
    ZK_HOSTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]={{.Address}}
    ZK_CLIENT_PORTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]=2181
    ZK_PEER_PORTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]=2888
    ZK_ELECTION_PORTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]=3888
    {{end}}{{end}}
  28. MESOS-CONSUL

    • Dynamically adds Mesos tasks to Consul
    • Located at https://github.com/CiscoCloud/mesos-consul
    • Easy to run as Docker container via Marathon

    curl -X POST [email protected] -H "Content-Type: application/json" http://marathon.service.consul:8080/v2/apps

    • Mesos task <taskname> shows up as: taskname.service.consul
  29. MESOS-CONSUL

    • Leader detection built-in. Use: leader.mesos.service.consul
    • Mesos doesn’t have an event bus. Mesos-consul needs to poll every few seconds.
    • Mesos (0.22.1 and earlier) doesn’t export Docker port mapping information, so all ports are registered to the same DNS name.
  30. MARATHON-CONSUL

    • Dynamically adds Marathon tasks to Consul K/V. Can be used to build proxy configurations.
    • Located at https://github.com/CiscoCloud/marathon-consul
    • Easy to run as Docker container via Marathon
    • Listens to Marathon event bus:

    curl -X POST 'http://marathon.service.consul:8080/v2/eventSubscriptions?callbackUrl=http://marathon-consul.service.consul:4000/events'
  31. DISTRIBUTIVE

    • https://github.com/CiscoCloud/distributive
    • Single 4 MB binary, no gem or pip installs
    • Checks defined in .json format
    • Integrates with Consul, Nagios & Sensu
    • Will verify every node’s configuration
    • Cluster tests itself, no external tools needed
  32. PUTTING IT ALL TOGETHER
  33. MICROSERVICES-INFRASTRUCTURE

    • Integrates Mesos + Consul
    • Easy deployment
    • Includes Logstash, collectd, Docker, Mesos, Marathon, Chronos (and more coming)
    • 1,300+ stars on GitHub
    • Apache 2.0
  34. MICROSERVICES-INFRASTRUCTURE

    • Uses Terraform to provision to the following cloud providers:
      • AWS
      • Google Cloud
      • OpenStack
      • Digital Ocean
      • vSphere
  35. GETTING SUPPORT

    • Docs: http://microservices-infrastructure.readthedocs.org/en/latest/
    • Github Issues: https://github.com/CiscoCloud/microservices-infrastructure/issues
    • Gitter.im chat room: https://gitter.im/CiscoCloud/microservices-infrastructure
    • Bug reports and pull requests welcome!