Effective Monitoring with statsD

Presented at DevOpsDays Tokyo 2013

Alexis Lê-Quôc

September 28, 2013

Transcript

  1. Effective Monitoring with statsD

  2. @alq, CTO at Datadog

  3. An application through the naked eye

  4. An application through a monitoring tool

  5. OODA Loop (simplified): Observe, Orient, Decide, Act

  6. OODA Loop (simplified): Observe, Orient, Decide, Act

  7. OODA Loop (simplified): Observe (Monitoring Tool), Orient, Decide, Act

  8. OODA Loop (simplified): Observe (Monitoring Tool), Orient (You), Decide, Act

  9. OODA Loop (simplified): Observe (Monitoring Tool), Orient (You), Decide (You), Act

  10. OODA Loop (simplified): Observe (Monitoring Tool), Orient (You), Decide (You), Act (You)

  11. Observations need to be... 1. Timely 2. Correct 3. Comprehensive

  12. Observations need to be... 1. Timely 2. Correct 3. Comprehensive

  13. Observations need to be... 1. Timely 2. Correct 3. Comprehensive. Else

  14. Observations need to be... 1. Timely 2. Correct 3. Comprehensive. Else: Garbage In, Garbage Out

  15. Timely: Initial assumptions -> Initial set of metrics -> Contact with reality -> Revised assumptions -> Revised set of metrics

  16. Timely: Initial assumptions -> Initial set of metrics -> Contact with reality -> Revised assumptions -> Revised set of metrics. Minutes, not weeks

  17. Comprehensive: Work, Resources (x5), Value

  18. Comprehensive: Work, Resources (x5), Value. Easy to collect, generic, but not actionable

  19. Comprehensive: Work, Resources (x5), Value. Easy to collect, generic, but not actionable. Harder to collect, custom, but most actionable

  20. statsD Easy

  21. statsD Easy Timely

  22. statsD Easy Timely Comprehensive

  23. How statsD works: Client libraries talk to a simple UDP server using a simple text protocol:

    pageviews:100|c|@0.25
    latency:320|ms
    backlog:333|g
    uniques:765|s
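
The wire format on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular client library: the helper names `format_metric` and `send_metric` are hypothetical, and the host and port (`127.0.0.1:8125`, the usual statsD default) are assumptions about your setup.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build one statsD payload: <name>:<value>|<type>[|@<sample_rate>]."""
    payload = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        payload += f"|@{sample_rate}"
    return payload


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget: send one payload over UDP; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


if __name__ == "__main__":
    # The four example metrics from the slide.
    examples = [
        ("pageviews", 100, "c", 0.25),  # counter, sampled at 25%
        ("latency", 320, "ms", None),   # timer, in milliseconds
        ("backlog", 333, "g", None),    # gauge
        ("uniques", 765, "s", None),    # set member
    ]
    for args in examples:
        send_metric(format_metric(*args))
```

Because the transport is UDP, the application never blocks waiting for the monitoring server, and a lost datagram only costs one data point.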
  24. statsD types

    Type        Definition                  Example
    Gauges      Absolute values             Queue size
    Counters    Per-second rates            Page views
    Histograms  Gauge summary               Page latency
    Timers      Gauge distribution          Page latency
    Sets        Counters of unique things   Unique visitors
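
To make the type definitions concrete, here is a rough sketch of how a statsD-style server might aggregate each type between flushes. It is a toy model under stated assumptions, not the Etsy implementation: the `Aggregator` class, the output key suffixes (`.rate`, `.avg`, `.max`, `.count`), and the 10-second flush interval are all chosen for illustration, and histograms are left out because they would be summarized much like timers.

```python
from collections import defaultdict

FLUSH_INTERVAL_S = 10  # assumed flush interval, in seconds


class Aggregator:
    """Toy in-memory aggregator for gauges, counters, timers, and sets."""

    def __init__(self):
        self.gauges = {}
        self.counters = defaultdict(float)
        self.timers = defaultdict(list)
        self.sets = defaultdict(set)

    def ingest(self, line):
        """Parse one '<name>:<value>|<type>[|@<rate>]' line and record it."""
        name, rest = line.split(":", 1)
        fields = rest.split("|")
        value, metric_type = fields[0], fields[1]
        rate = 1.0
        if len(fields) > 2 and fields[2].startswith("@"):
            rate = float(fields[2][1:])
        if metric_type == "g":
            self.gauges[name] = float(value)             # latest value wins
        elif metric_type == "c":
            self.counters[name] += float(value) / rate   # undo client sampling
        elif metric_type == "ms":
            self.timers[name].append(float(value))       # keep every sample
        elif metric_type == "s":
            self.sets[name].add(value)                   # unique members only
        else:
            raise ValueError(f"unsupported metric type: {metric_type}")

    def flush(self):
        """Summarize the interval, then reset everything except gauges."""
        out = dict(self.gauges)
        for name, total in self.counters.items():
            out[name + ".rate"] = total / FLUSH_INTERVAL_S  # per-second rate
        for name, samples in self.timers.items():
            out[name + ".avg"] = sum(samples) / len(samples)
            out[name + ".max"] = max(samples)
        for name, members in self.sets.items():
            out[name + ".count"] = len(members)
        self.counters.clear()
        self.timers.clear()
        self.sets.clear()
        return out
```

Feeding it `pageviews:100|c|@0.25` records 400 page views for the interval (100 scaled up by the 25% sample rate), which flushes as a rate of 40 per second under the assumed interval.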
  25. statsD problems

    Type        Definition                  Problem
    Gauges      Absolute values             Latest value wins. Gauge deltas???
    Counters    Per-second rates            Rates, not counts (!= rrdtool)
    Histograms  Gauge summary               Assumes normal distribution
    Timers      Gauge distribution          Can measure much more than time
    Sets        Counters of unique things   :-)
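
Two of these problems are easy to see with a little arithmetic. The sketch below assumes a 10-second flush interval (the Etsy statsD default) and uses made-up latency samples; `rate_to_count` is a hypothetical helper, not part of any library.

```python
import statistics

FLUSH_INTERVAL_S = 10  # assumed flush interval, in seconds


def rate_to_count(rate_per_second, interval_s=FLUSH_INTERVAL_S):
    """Recover the number of events in one interval from a per-second rate."""
    return rate_per_second * interval_s


# Counters: a reported value of 40 is a rate, not "40 page views".
events_in_interval = rate_to_count(40.0)

# Histograms: a mean-based summary misleads when the data is skewed.
# Illustrative samples only: nine fast requests and one very slow one.
latencies_ms = [20, 22, 21, 19, 23, 20, 21, 22, 20, 2000]
mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)

if __name__ == "__main__":
    print(f"events in interval: {events_in_interval}")
    print(f"mean latency: {mean_ms} ms, median latency: {median_ms} ms")
```

With these samples the mean is roughly ten times the median, so a summary that assumes a bell curve hides the fact that most requests were fast and one was pathological.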
  26. #1 pitfall: “Counters” http://dtdg.co/tokyo-counters

  27. How we use statsD http://dtdg.co/tokyo-dog

  28. Essential: Tagging http://dtdg.co/tokyo-tags

  29. How to get started

    • statsD: https://github.com/etsy/statsd
    • client libraries: https://github.com/etsy/statsd/wiki
    • (my company) 1-stop shop: http://www.datadoghq.com
  30. Thank you very much! Questions? @alq