When Mesos met Consul

Presented at Mesos NYC, June 17th, 2015

Combining Mesos and Consul to build out a powerful next-generation platform.

Steven Borrelli

June 17, 2015

  7. Mesos Challenges

    • Deployment
    • Framework Development
    • Security & Management
    • Monitoring
    • Service Discovery
  8. Running Consul

    • Single binary (golang)
    • Run on every system
    • 1-7 servers per datacenter; the rest of the systems are clients
    • Config via .json files or CLI parameters
    • Optional Web UI
  9. Don’t Run Consul in Docker

    • ARP cache issues with Docker networking; need to install conntrack to flush
    • PITA to mount volumes and open network ports
    • Health checks become more complex
    • Network latency seems to cause instability
  10. Clients:

    • Failure Detection
    • Health Checks
    • Respond to local requests

    Servers:

    • Leader Election
    • Forward Request
    • Replicate Data

    Consensus is achieved via gossip protocol (nodes) or Raft (server data)
  11. CAP: Consistency, Availability, Partition Tolerance

    • Gossip: Consul Agent, Cassandra
    • Paxos/Raft: Zookeeper, Consul K/V, etcd
  12. Consul Consistency

    • Servers use Raft for consistency (CP)
    • Loss of server quorum will cause availability failure
    • Run a small (odd) number of servers per DC
    • Agents use LAN gossip for node failure detection
    • WAN gossip is used across DCs, with higher latency
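The advice to run a small, odd number of servers follows from majority arithmetic. The sketch below is only an illustration of that arithmetic, not Consul code; the function names are made up for this example.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a Raft server set."""
    return servers // 2 + 1


def failures_tolerated(servers: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return servers - quorum(servers)


# Compare cluster sizes: an even count adds a server without adding tolerance.
for n in (1, 3, 4, 5, 7):
    print(f"{n} servers: quorum={quorum(n)}, tolerated failures={failures_tolerated(n)}")
```

Going from 3 to 4 servers raises the quorum to 3 but still tolerates only one failure, which is why odd sizes are preferred.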
  13. Consul API Consistency Modes

    • default: served by the leader without a quorum check; stale values are possible in the window around a leader election.
    • consistent: a leader must be elected and confirmed by a quorum before it responds.
    • stale: any server can respond, even non-leaders.
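In the HTTP API a non-default mode is selected with a bare `?stale` or `?consistent` query parameter on a read. A minimal sketch of building such URLs follows; the helper name `kv_read_url` and the default base address are assumptions for illustration.

```python
from urllib.parse import quote


def kv_read_url(key: str, mode: str = "default",
                base: str = "http://localhost:8500") -> str:
    """Build a K/V read URL for the given consistency mode."""
    if mode not in ("default", "consistent", "stale"):
        raise ValueError(f"unknown consistency mode: {mode}")
    url = f"{base}/v1/kv/{quote(key)}"
    # The default mode needs no parameter; the other two are bare flags.
    return url if mode == "default" else f"{url}?{mode}"


print(kv_read_url("web/key1"))
print(kv_read_url("web/key1", mode="stale"))
```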
  14. Register a Service

    Create a file called: marathon.json

    {
      "service": {
        "name": "marathon",
        "tags": [ "admin" ],
        "port": 8080,
        "check": {
          "script": "curl --silent --show-error --fail --dump-header /dev/stderr --retry 2",
          "interval": "10s"
        }
      }
    }

    • "name" becomes the DNS name
    • "check" is an optional health check
    • HTTP API also supported
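A service definition like the one above can also be generated rather than written by hand. This is a sketch under assumptions: the helper `service_definition` is hypothetical, and the check command passed in the example (a curl against Marathon's `/ping` on localhost) is a placeholder, not the command from the slide.

```python
import json


def service_definition(name, port, tags=(), check_script=None, interval="10s"):
    """Return a dict shaped like the marathon.json service definition."""
    service = {"name": name, "tags": list(tags), "port": port}
    if check_script is not None:
        # The health check is optional, so only add it when a script is given.
        service["check"] = {"script": check_script, "interval": interval}
    return {"service": service}


definition = service_definition(
    "marathon", 8080, tags=["admin"],
    check_script="curl --silent --fail http://localhost:8080/ping",  # assumed check
)
print(json.dumps(definition, indent=2))
```

The printed JSON can be saved into the agent's configuration directory and picked up with a reload.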
  15. DNS Registration

    # consul reload
    # dig marathon.service.consul +short

    If a health check fails, the entry will not show in DNS.
  16. Service Tags

    # dig admin.marathon.service.consul +short

    Tags are supported in DNS
  17. DNS SRV Records

    Get the port for any service:

    # dig zookeeper.service.consul SRV +short
    1 1 2181 mi-control-01.node.dc1.consul.
    1 1 2181 mi-control-03.node.dc1.consul.
    1 1 2181 mi-control-02.node.dc1.consul.

    Nodes are automatically registered in DNS. You can even query services and nodes in other DCs!
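Each `+short` SRV line carries priority, weight, port, and target, in that order. A small sketch of turning that output into structured records, using the lines from the slide as sample input (the `SrvRecord` type and `parse_srv` helper are names invented here):

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(dig_output: str) -> List[SrvRecord]:
    """Parse `dig ... SRV +short` output, skipping lines that do not fit."""
    records = []
    for line in dig_output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        priority, weight, port, target = fields
        # Drop the trailing dot of the fully qualified target name.
        records.append(SrvRecord(int(priority), int(weight), int(port),
                                 target.rstrip(".")))
    return records


sample = """\
1 1 2181 mi-control-01.node.dc1.consul.
1 1 2181 mi-control-03.node.dc1.consul.
1 1 2181 mi-control-02.node.dc1.consul.
"""
records = parse_srv(sample)
for record in records:
    print(f"{record.target}:{record.port}")
```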
  18. Simplify Mesos Configuration

    Zookeeper config string:
    zk://zookeeper.service.consul:2181/mesos

    Marathon config string:
    http://marathon.service.consul:8080

    Mesos config string (we’ll discuss leader later):
    mesos://leader.mesos.service.consul:5050
  19. Bonus! Health Checks Your Mesos Cluster
  20. Health Checks Are Run by the Nodes, Expose State via API

    curl -L http://localhost:8500/v1/health/service/chronos?pretty=true

    [
      {
        "Node": {
          "Node": "mi-control-01",
          "Address": ""
        },
        "Service": {
          "ID": "chronos",
          "Service": "chronos",
          "Tags": [ "chronos" ],
          "Address": "",
          "Port": 14400
        },
        "Checks": [
          {
            "Node": "mi-control-01",
            "CheckID": "service:chronos",
            "Name": "Service 'chronos' check",
            "Status": "critical",
            "Notes": "",
            "Output": "",
            "ServiceID": "chronos",
            "ServiceName": "chronos"
          },
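Because the health endpoint returns each node together with its checks, a client can filter for healthy instances itself. The sketch below assumes the response has already been fetched and parsed; the sample data is illustrative, and the second node entry is invented to show a passing case.

```python
def healthy_nodes(entries):
    """Return node names whose checks all report 'passing'."""
    return [
        entry["Node"]["Node"]
        for entry in entries
        if all(check["Status"] == "passing" for check in entry["Checks"])
    ]


# Trimmed-down entries in the shape of /v1/health/service/<name> output.
sample = [
    {"Node": {"Node": "mi-control-01"},
     "Checks": [{"CheckID": "service:chronos", "Status": "critical"}]},
    {"Node": {"Node": "mi-control-02"},  # hypothetical healthy node
     "Checks": [{"CheckID": "service:chronos", "Status": "passing"}]},
]
print(healthy_nodes(sample))
```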
  21. Health Check Exit Codes

    Consul checks are compatible with Nagios/Sensu:

    Exit code 0 - Check is passing
    Exit code 1 - Check is warning
    Any other code - Check is critical
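The exit-code convention is simple enough to express directly. This is a sketch of the mapping only, not Consul's implementation; `run_check` is a hypothetical helper that shows how a script's return code would feed into it.

```python
import subprocess


def status_from_exit_code(code: int) -> str:
    """Map a check script's exit code to a check status."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    return "critical"


def run_check(command):
    """Run a check command (a list of arguments) and return its status."""
    result = subprocess.run(command, capture_output=True, text=True)
    return status_from_exit_code(result.returncode)


for code in (0, 1, 2, 127):
    print(code, status_from_exit_code(code))
```

An existing Nagios or Sensu plugin can be passed to `run_check` unchanged, since the same exit codes are used.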
  22. Consul K/V Exposed via API

    curl -X PUT -d 'test' http://localhost:8500/v1/kv/web/key1
    curl http://localhost:8500/v1/kv/?recurse
    [{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1","Flags":0,"Value":"dGVzdA=="},

    Or use consulkv:

    consulkv read --ssl nodes/config/test
    Hello World
    consulkv delete --ssl --consul=consul.service.consul:8500 --recurse nodes/config/test
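Values come back from the K/V API base64-encoded, which is why the slide shows `dGVzdA==` rather than `test`. A minimal decoding sketch, assuming the response body has already been fetched (the helper name `decode_kv` is made up here):

```python
import base64
import json


def decode_kv(response_body: str) -> dict:
    """Map each key in a K/V response to its decoded string value."""
    return {
        entry["Key"]: base64.b64decode(entry["Value"]).decode("utf-8")
        for entry in json.loads(response_body)
        if entry["Value"] is not None  # keys stored without a value
    }


body = ('[{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1",'
        '"Flags":0,"Value":"dGVzdA=="}]')
print(decode_kv(body))  # {'web/key1': 'test'}
```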
  23. Consul ACLs

    • Only use in 0.5.2 or higher (upsert support)
    • Master tokens are used to create ACL entries
    • Every ACL entry has a token
    • read/write/deny policy on k/v and service endpoints
    • Can manage with API or …
  24. Consul Template

    • Reads data from Consul k/v and service catalog
    • Writes out text files based on go text/template
    • Can be used to dynamically configure systems and applications

    {{range service "web@datacenter"}}
    server {{.Name}} {{.Address}}:{{.Port}}{{end}}

    Becomes

    server nyc_web_01 123.456.789.10:8080
    server nyc_web_02 456.789.101.213:8080
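Consul Template does this expansion with Go's text/template. The loop below is only a Python stand-in that mimics what the `range` block produces, using the two example servers from the slide as input data.

```python
def render_servers(services):
    """Emit one `server` line per service instance, like the range block."""
    return "\n".join(
        f"server {s['Name']} {s['Address']}:{s['Port']}" for s in services
    )


# Example instances copied from the slide's rendered output.
services = [
    {"Name": "nyc_web_01", "Address": "123.456.789.10", "Port": 8080},
    {"Name": "nyc_web_02", "Address": "456.789.101.213", "Port": 8080},
]
print(render_servers(services))
```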
  25. Dynamic Zookeeper Ensemble

    • Update zoo.cfg as ZK nodes come up/down
    • Writes out text files based on go text/template
    • Restarts Zookeeper nodes
    • https://github.com/CiscoCloud/docker-zookeeper

    {{ with $s := env "CONSUL_QUERY" }}
    {{ range service $s "passing, warning" }}
    ZK_HOSTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]={{.Address}}
    ZK_CLIENT_PORTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]=2181
    ZK_PEER_PORTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]=2888
    ZK_ELECTION_PORTS[{{.ID | regexReplaceAll ".*:zkid-([0-9]*)" "$1"}}]=3888
    {{end}}{{end}}
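The template relies on a regular expression to pull the Zookeeper ID out of each service ID. The Python sketch below reproduces that extraction to show what the rendered lines look like. The service ID format and the addresses in the sample are assumptions, since the slide does not show real IDs.

```python
import re

# Same pattern as the template; Python writes the capture group as \1.
ZKID = re.compile(r".*:zkid-([0-9]*)")


def zk_lines(services):
    """Render the four ZK_* array assignments for each service instance."""
    lines = []
    for service in services:
        zkid = ZKID.sub(r"\1", service["ID"])
        lines.append(f"ZK_HOSTS[{zkid}]={service['Address']}")
        lines.append(f"ZK_CLIENT_PORTS[{zkid}]=2181")
        lines.append(f"ZK_PEER_PORTS[{zkid}]=2888")
        lines.append(f"ZK_ELECTION_PORTS[{zkid}]=3888")
    return lines


# Hypothetical service IDs and addresses, for illustration only.
sample = [
    {"ID": "node1:zookeeper:zkid-1", "Address": "10.0.0.1"},
    {"ID": "node2:zookeeper:zkid-2", "Address": "10.0.0.2"},
]
print("\n".join(zk_lines(sample)))
```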
  26. Mesos-Consul

    • Dynamically adds Mesos tasks to Consul
    • Located at https://github.com/CiscoCloud/mesos-consul
    • Easy to run as Docker container via Marathon:

    curl -X POST [email protected] -H "Content-Type: application/json" 'http://marathon.service.consul:8080/v2/apps'

    • Mesos task <taskname> shows up as: taskname.service.consul
  27. Mesos-Consul

    • Leader detection built-in. Use: leader.mesos.service.consul
    • Mesos doesn’t have an event bus. Mesos-consul needs to poll every few seconds.
    • Mesos (0.22.1 and earlier) doesn’t export Docker port mapping information, so all ports are registered to the same DNS name.
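Without an event bus, a poller has to compare what it registered last time against what is running now. The sketch below shows that comparison step only. It is not taken from mesos-consul, and the function name and task names are illustrative.

```python
def diff_tasks(registered, running):
    """Return (to_register, to_deregister) between two polls."""
    registered, running = set(registered), set(running)
    # New tasks need a Consul entry; vanished tasks need theirs removed.
    return sorted(running - registered), sorted(registered - running)


to_register, to_deregister = diff_tasks({"web", "db"}, {"web", "cache"})
print("register:", to_register)
print("deregister:", to_deregister)
```

A real poller would run this every few seconds and apply both lists against the Consul catalog.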
  28. Marathon-Consul

    • Dynamically adds Marathon tasks to Consul K/V. Can be used to build proxy configurations
    • Located at https://github.com/CiscoCloud/marathon-consul
    • Easy to run as Docker container via Marathon
    • Listens to Marathon event bus:

    curl -X POST 'http://marathon.service.consul:8080/v2/eventSubscriptions?callbackUrl=http://marathon-consul.service.consul:4000/events'
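The subscription request is a POST whose only input is the `callbackUrl` query parameter. The sketch below builds that URL with the callback percent-encoded, which is one safe way to pass a URL inside a query string. The helper name is an assumption, and the hosts are the ones from the slide.

```python
from urllib.parse import urlencode


def subscription_url(marathon: str, callback: str) -> str:
    """Build the Marathon event subscription URL for a callback endpoint."""
    return f"{marathon}/v2/eventSubscriptions?{urlencode({'callbackUrl': callback})}"


url = subscription_url(
    "http://marathon.service.consul:8080",
    "http://marathon-consul.service.consul:4000/events",
)
print(url)
```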
  29. Let’s Build the Future
  30. The Next Platform

    • Runs diverse workloads, from containers to big data
    • Resistant to failure
    • Can be deployed anywhere rapidly
    • Easy to configure and manage
    • Batteries included: service discovery, logging, security, etc.
  31. Microservices-Infrastructure

    • Integrates Mesos + Consul
    • Easy deployment
    • Includes Logstash, collectd, Docker, Mesos, Marathon, Chronos (and more coming)
    • 1,200+ stars on GitHub (#6 trending, 500 stars this week)
    • Apache 2.0
  32. Microservices-Infrastructure

    • Configures HA & Security
    • 0.3.1 released today:
      • Digital Ocean Support!
      • VMware vSphere Support!
      • Chronos!
      • Distributive added!
  33. Microservices-Infrastructure

    • Uses Terraform to provision to the following cloud providers:
      • AWS
      • Google Cloud
      • OpenStack
      • Digital Ocean
      • vSphere
  34. New Feature: Distributive

    • Our new framework for distributed health checks
    • Single 4 MB binary, no gem or pip installs
    • Checks defined in .json format
    • Integrates with Consul, Nagios & Sensu
    • Will verify every node’s configuration
    • Cluster tests itself, no external tools needed
  35. Roadmap

    • IP per Mesos task
    • DCOS Support
    • Easy deployment of Mesos frameworks
    • Kubernetes Support
    • Vault Support
    • Dynamic Configuration
  36. Release Schedule

    • 0.3: June 9th
    • 0.3.1: June 17th
    • 0.3.2: June 25th
    • 0.4.0: July 17th
    • 0.5.0 (MesosCon)
  37. Getting Support

    • Docs: http://microservices-infrastructure.readthedocs.org/en/latest/
    • GitHub Issues: https://github.com/CiscoCloud/microservices-infrastructure/issues
    • Gitter.im chat room: https://gitter.im/CiscoCloud/microservices-infrastructure
    • Bug reports and pull requests welcome!