Monitoring

portertech
September 05, 2012

A Polyglot Vancouver presentation on application & infrastructure monitoring, covering how to reduce MTTD and MTTR, instrumentation, log streams, dependency chains and relationships, tools, and a demo.

I have uploaded the source for others to use; just be sure to give me a mention :) http://portertech.s3.amazonaws.com/monitoring_slides.tar.gz

Transcript

  1. MONITORING Applications & Infrastructure

  2. Sean Porter @PorterTech

  3. FOCUS • MTTD - Mean Time To Detect • MTTR - Mean Time To Repair

  4. GOAL • MTTD - Mean Time To Detect • MTTR - Mean Time To Repair REDUCE!

  5. Let’s start with an application.

  6. API GET /ping POST /contacts GET /contacts/:id PUT /contacts/:id DELETE /contacts/:id

  7. None
  8. “You can't manage what you haven't measured”

  9. Gather the data that we as developers & operators care about.

  10. EMIT & EXPOSE Instrumentation

  11. EMIT & EXPOSE Instrumentation Log

  12. EMIT & EXPOSE Instrumentation Log Storage

  13. EMIT & EXPOSE Instrumentation Log Storage GET /stats

  14. EMIT & EXPOSE Instrumentation Log Storage GET /stats Process Title ...

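
The "GET /stats" and "Process Title" items on slide 14 both expose in-process counters to the outside. A minimal Ruby sketch of that idea follows. The `Stats` class, the `requests` counter, and the `contacts-api` name are hypothetical and not taken from the talk.

```ruby
require "json"

# Hypothetical in-process counters, exposed in two ways.
class Stats
  def initialize
    @counts = Hash.new(0)
    @lock = Mutex.new
  end

  # Thread-safe increment of a named counter.
  def increment(name)
    @lock.synchronize { @counts[name] += 1 }
  end

  # Body that a GET /stats handler could return.
  def to_json(*_args)
    @lock.synchronize { JSON.generate(@counts) }
  end

  # Short summary suitable for the process title (visible in ps or top).
  def title(app_name)
    @lock.synchronize { "#{app_name} [requests=#{@counts["requests"]}]" }
  end
end

stats = Stats.new
3.times { stats.increment("requests") }
Process.setproctitle(stats.title("contacts-api"))
```

An operator can then read the counter from `ps` output without making an HTTP request, while tools can poll the JSON form.
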
  15. (fn [request] (let [start (System/currentTimeMillis) response (handler request) finish (System/currentTimeMillis) time (- finish start)] ...

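
The Clojure wrapper on slide 15 records the time before and after calling the handler. A rough Ruby equivalent, written as a Rack-style middleware sketch, might look like this. `RequestTimer` and the `sink` callable are illustrative names, and what the talk does with the measured time is elided on the slide, so passing it to a sink is an assumption.

```ruby
# Rack-style middleware sketch: times each request around the wrapped handler.
# RequestTimer and sink are hypothetical names, not from the talk.
class RequestTimer
  def initialize(app, sink)
    @app = app    # downstream handler responding to call(env)
    @sink = sink  # callable receiving (method, path, status, elapsed_ms)
  end

  def call(env)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
    status, headers, body = @app.call(env)
    finish = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
    @sink.call(env["REQUEST_METHOD"], env["PATH_INFO"], status, finish - start)
    [status, headers, body]
  end
end
```

A monotonic clock is used here instead of wall-clock time so that system clock adjustments cannot produce negative durations.
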
  16. A few great libraries you should read. • Metrics (Java) codahale/metrics • Metriks (Ruby) eric/metriks • Folsom (Erlang) boundary/folsom

  17. Let’s talk about Logs...

  18. LOGS • Already being produced. • A log is a stream of events. • Full of performance & usage indicators.

  19. LOGS METRICS! • Already being produced. • A log is a stream of events. • Full of performance & usage indicators.

  20. "request :get /ping 200 (2ms)" OR { "request_method": "get", "request_uri": "/ping", "response_status": 200, "response_time": 2 }

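
To illustrate the effort that parsing involves, here is a small Ruby sketch that turns the free-text line from slide 20 into the structured form shown beside it. The regular expression matches only that example line; `REQUEST_LINE` and `parse_request_line` are hypothetical names, and real log formats will need their own patterns.

```ruby
require "json"

# Pattern for the example line on slide 20 only; an assumption, not a general format.
REQUEST_LINE = /\Arequest :(?<method>\w+) (?<uri>\S+) (?<status>\d{3}) \((?<ms>\d+)ms\)\z/

# Returns a hash with the slide's field names, or nil if the line does not match.
def parse_request_line(line)
  m = REQUEST_LINE.match(line)
  return nil unless m

  {
    "request_method" => m[:method],
    "request_uri" => m[:uri],
    "response_status" => m[:status].to_i,
    "response_time" => m[:ms].to_i
  }
end

puts JSON.generate(parse_request_line("request :get /ping 200 (2ms)"))
```

Emitting the structured form directly from the application avoids maintaining a pattern like this for every log format.
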
  21. Parsing logs requires effort, let’s send metrics elsewhere.

  22. STORAGE sock = TCPSocket.new(host, port); sock.puts "name value #{Time.now.to_i}"; sock.close
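
The snippet on slide 22 can be expanded into a small helper. This is a sketch under assumptions: the default host and the metric name in the usage note are placeholders, and port 2003 is the usual Graphite plaintext port rather than something the slide states.

```ruby
require "socket"

# Formats one metric in the "name value timestamp" form shown on the slide.
def graphite_line(name, value, time = Time.now)
  "#{name} #{value} #{time.to_i}"
end

# Sends one metric over TCP. The block form closes the socket even if the write fails.
# host and port defaults are assumptions; adjust them for your own setup.
def send_metric(name, value, host: "localhost", port: 2003)
  TCPSocket.open(host, port) do |sock|
    sock.puts graphite_line(name, value)
  end
end
```

For example, `send_metric("api.ping.response_time", 2)` would write a single line for a hypothetical metric named `api.ping.response_time`.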

  23. Let’s get back to the application.

  24. None
  25. There is more to it ...

  26. HAProxy There is a load balancer. One or more instances of the application.

  27. HAProxy There is a load balancer. One or more instances of the application. MEMORY CPU DISK NETWORK

  28. Know your application dependencies and understand their relationships.

  29. Monitor all the way down to the resources they consume.

  30. HAProxy MEMORY CPU DISK NETWORK /ping HAProxy

  31. How?

  32. None
  33. Think Unix toolchain.

  34. SENSU “simple, malleable, and scalable” Nagios replacement.

  35. SENSU • JSON configuration. • Uses the Nagios check spec. • Clients self-register. • Easy to scale out. sensu/sensu

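
"Uses the Nagios check spec" means a check is any program that prints a line of output and reports its state through its exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. A minimal Ruby sketch of the decision logic follows. The `check_response_time` function and its thresholds are hypothetical and not taken from the talk.

```ruby
# Exit statuses defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical check: classify a response time (in ms) against two thresholds.
# Returns [exit_status, message].
def check_response_time(ms, warn: 200, crit: 500)
  return [UNKNOWN, "UNKNOWN: no measurement"] if ms.nil?
  return [CRITICAL, "CRITICAL: #{ms}ms"] if ms >= crit
  return [WARNING, "WARNING: #{ms}ms"] if ms >= warn

  [OK, "OK: #{ms}ms"]
end
```

A check script would print the message and then call `exit` with the status. Because the contract is only output plus an exit code, existing Nagios plugins can be reused unchanged.
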
  36. LOGSTASH “collect logs, parse them, and store them for later use”

  37. LOGSTASH INPUTS File Syslog AMQP 0MQ ... FILTERS Grep Grok Multiline Mutate ... OUTPUTS ES Graphite AMQP Nagios ... logstash/logstash

  38. GRAPHITE “scalable realtime graphing” name value timestamp

  39. GRAPHITE Many powerful functions() to analyze data. • derivative() • summarize() • sumSeries() • movingAverage() • holtWintersForecast() • drawAsInfinite() • highestCurrent() • mostDeviant() • hitcount() • threshold()

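
These functions are applied by wrapping a metric name in the `target` parameter of Graphite's render API. The Ruby sketch below builds such a URL. The host `graphite.example.com` and the metric name are placeholders, and `render_url` is a hypothetical helper.

```ruby
require "uri"

# Builds a render API URL. base is a placeholder for your Graphite host.
def render_url(base, target, from: "-1h", format: "json")
  query = URI.encode_www_form(target: target, from: from, format: format)
  "#{base}/render?#{query}"
end

puts render_url("http://graphite.example.com",
                "movingAverage(api.ping.response_time,10)")
```

Functions can be nested inside `target`, so one request can, for instance, sum several series and then smooth the result.
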
  40. DEMO!

  41. Final words.

  42. Sean Porter @PorterTech THANK YOU