Monitoring with Sensu at AppsFlyer

AppsFlyer
February 19, 2015

Transcript

  1. What we are going to cover
     • What do we need to monitor at AppsFlyer?
     • Why did we choose Sensu, and which alternatives were considered?
     • Sensu architecture
     • Sensu configuration: server, clients, checks, handlers, mutators
     • Defining an alert: a simple demonstration
     • Our metrics flow, from collecting the metrics to getting alerts
     • Sensu API
     • Uchiwa dashboard
     • Mobile application
     • What next?
  2. What do we need to monitor?
     • Hundreds of instances for liveness
     • System metrics
     • Service liveness
     • Service application metrics
     • Performance
     • Dozens of third-party software packages, e.g. Kafka, RabbitMQ, Couchbase, TokuMX and many more
     • That our service flow works correctly
     • That our exposed services are available from outside
  3. Why Sensu?
     • Scales properly
     • Flexible to expand
     • Simple configuration
     • Natively passive
     • Standard check plugins
     • API
     • Interactive UI
     • Enables metric collection (although we use collectd)
     • Secure message bus
     • Very fast implementation, ideal for growing startups
  4. Sensu Configuration
     • Server: define SSL keys, port, password, Redis, API, dashboard, handlers
     • Client: define client info (name, address, etc.), subscriptions, keepalive, key-value data
     • Checks: define command, subscribers, interval, occurrences, handler, dependencies, flap detection, aggregation
     • Handlers: define handler name, type, command
     • Mutators: define tag, command
     • Uchiwa (dashboard): define API access and dashboard credentials
     • RabbitMQ: define SSL certs, RabbitMQ configuration (host, port, vhost, creds, password, etc.)
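The "command" in a check definition is an ordinary executable: Sensu reads its exit status using the Nagios-style convention (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN) and keeps its output with the event. The following is a minimal sketch of that convention only. The function names, the thresholds and the measured value are hypothetical and do not come from the deck.

```python
# Exit statuses that Sensu (like Nagios) reads from a check command.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warn, crit):
    """Map a measured value to (exit_status, message) using two thresholds.

    Hypothetical helper: higher values are assumed to be worse.
    """
    if value is None:
        return UNKNOWN, "UNKNOWN: no measurement available"
    if value >= crit:
        return CRITICAL, f"CRITICAL: value {value} >= {crit}"
    if value >= warn:
        return WARNING, f"WARNING: value {value} >= {warn}"
    return OK, f"OK: value {value}"


def run_check(value, warn, crit):
    """Print the message and return the status a check script would exit with."""
    status, message = evaluate(value, warn, crit)
    print(message)  # Sensu stores the command's output alongside the event
    return status


# A real check script would finish with something like:
#     sys.exit(run_check(measured_value, warn=80, crit=90))
```

With "occurrences" set in the check definition, a handler would typically act only after the check has returned a non-zero status that many times in a row.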
  5. Client config generation
     • Based on naming scheme: couchbase-20017-014-prod.eu1.appsflyer.com
       (components labelled on the slide: Service/infrastructure, # in cluster, Cluster #, Env, Region, Domain)

       {
         "client": {
           "name": "<%= node[:fqdn] %>",
           "address": "<%= node[:fqdn] %>",
           "subscriptions": [ "base", "<%= node[:fqdn].to_s.split('-')[0] %>" ]
         }
       }

     ➢ The Chef template configuration uses the service name as a subscription

       {
         "checks": {
           "couchbase": {
             "handlers": ["pagerduty"],
             "command": "/opt/consul/scripts/chk_couchbase.py",
             "interval": 60,
             "occurrences": 2,
             "subscribers": [ "default", "couchbase" ]
           }
         }
       }

     ➢ So, on the server, when we configure an alert we set “couchbase” as the subscriber
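The subscription logic in the Chef template can be sketched outside Chef. This is only an illustration, assuming the host name always starts with the service name followed by a hyphen, as in the slide's example. `subscriptions_for` and `checks_for` are hypothetical helper names, and the matching rule (a client runs a check when they share at least one subscription) is a simplified description of how Sensu publishes checks to subscribers.

```python
def subscriptions_for(fqdn, base="base"):
    """Mirror the template: "base" plus the text before the first hyphen."""
    service = fqdn.split("-")[0]
    return [base, service]


def checks_for(client_subscriptions, checks):
    """Return the names of checks that share a subscription with the client."""
    wanted = set(client_subscriptions)
    return [
        name
        for name, definition in checks.items()
        if wanted & set(definition.get("subscribers", []))
    ]


# Check definition taken from the slide; the host name is the slide's example.
CHECKS = {
    "couchbase": {
        "handlers": ["pagerduty"],
        "command": "/opt/consul/scripts/chk_couchbase.py",
        "interval": 60,
        "occurrences": 2,
        "subscribers": ["default", "couchbase"],
    }
}

subs = subscriptions_for("couchbase-20017-014-prod.eu1.appsflyer.com")
matching = checks_for(subs, CHECKS)
```

Because the service name is derived from the host name, a new Couchbase node picks up the "couchbase" check without any per-host configuration on the server.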
  6. Sensu API
     The Sensu API lets you retrieve, update and delete:
     • Checks (e.g. http://localhost:4567/checks)
     • Clients (e.g. http://localhost:4567/clients)
     • Events (e.g. http://localhost:4567/events)
     • Stashes (e.g. http://localhost:4567/stashes)
     • Sensu dependencies health (Redis, RabbitMQ)
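Reading from the endpoints listed on the slide could look like the sketch below. It assumes the API listens on http://localhost:4567 as in the slide's examples, returns JSON, and needs no authentication, which may not hold for a given deployment. `resource_url` and `fetch` are hypothetical helper names, and only the four resource paths shown on the slide are used.

```python
import json
from urllib.request import urlopen

API_BASE = "http://localhost:4567"  # base URL from the slide's examples
RESOURCES = ("checks", "clients", "events", "stashes")


def resource_url(resource, base=API_BASE):
    """Build the URL for one of the resources listed on the slide."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    return f"{base.rstrip('/')}/{resource}"


def fetch(resource, base=API_BASE, opener=urlopen, timeout=5):
    """GET a resource and decode the JSON body.

    `opener` is injectable so the function can be exercised without a
    running Sensu API; by default it is urllib's urlopen.
    """
    with opener(resource_url(resource, base), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch("events")` would return the decoded list of current events when a Sensu API is reachable at the assumed address.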
  7. Metrics Flow (diagram)
     • System + 3rd party metrics: Collectd
     • Application metrics: Metric libs
     • Other components shown: StatsD agents, StatsD proxy, RabbitMQ, Collectd consumer, Graphite
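On the application side, a metric library that feeds a StatsD agent ultimately sends short text lines over UDP. The sketch below shows that wire format only, assuming the agent listens on the StatsD default of 127.0.0.1:8125. The metric names and both function names are hypothetical, and the deck does not say which library AppsFlyer actually uses.

```python
import socket

# StatsD metric types: counter, gauge, timer (milliseconds).
METRIC_TYPES = ("c", "g", "ms")


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD text protocol: <name>:<value>|<type>."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send a single metric to a StatsD agent as a fire-and-forget UDP datagram."""
    payload = statsd_line(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical usage inside an application:
#     send_metric("app.requests", 1, "c")
#     send_metric("app.request_time", 320, "ms")
```

UDP keeps metric emission cheap for the application: a lost datagram costs one data point rather than blocking a request.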
  8. So, what next?
     • Metrics database (InfluxDB / OpenTSDB)?
     • UI for adding checks
     • Full integration with all systems (auto healing, auto scaling, deployments, etc.)
     • More checks, dashboards, etc.