Lies, Damn Lies, and Metrics (Distill 2014)

Metrics are great, and measuring things can provide tremendously useful insights. But there's a problem: metrics lie to you. Metrics just report the numbers that were measured. Analyzing those numbers is up to us, and that analysis can go wrong in so, so many ways. Learn how to arm yourself against human intuition, interpreter pauses, routing, instrumentation lag, and other issues. Don't get so caught up in instrumenting that you lose sight of why metrics exist! Make sure your metrics are telling you actionable information, instead of just accurate numbers.

André Arko

August 08, 2014

Transcript

  1. Lies, Damn Lies, and Metrics

  2. André Arko @indirect

  3. None
  4. Bundler

  5. Metrics

  6. Metrics are important

  7. Metrics tell you what is happening

  8. you rn →

  9. Metrics convince you you understand

  10. you later →

  11. Averages convince you you understand

  12. Averages are lie-candy for your brain

  13. “Normal” (chart; axes span -5 to 5 and 0 to 0.4)

  14. “Normal” (chart; axes span -5 to 5 and 0 to 0.4)

  15. Real Life (chart; axes span -5 to 5 and 0 to 0.4)
  16. brendangregg.com

  17. brendangregg.com

  18. just heard “we have a great average” →

  19. Averages mask problems

  20. (chart; axes span 0 to 10 and 0 to 250)
  21. Graph the median

  22. (chart; axes span 0 to 10 and 0 to 250)
  23. Graph 95th percentile

  24. (chart; axes span 0 to 10 and 0 to 250)
  25. Graph 99th percentile

  26. (chart; axes span 0 to 10 and 0 to 1000)
  27. Aggregate graphs: another average

  28. None
  29. Breakout graphs: show each source

  30. None
  31. Aggregate alerts: more dead servers than alive servers

  32. site’s up if any servers are up!

  33. Breakout alerts: first dead server, not all the servers

  34. Servers

  35. Servers: you have no idea what is going on

  36. really.

  37. Runtime lag

  38. Runtime lag: how do you tell you lost consciousness?

  39. Runtime lag: you have it.

  40. Runtime lag: you have it. how bad is it?

  41. VM lag

  42. VM lag: do you have it?

  43. VM lag: do you check for it?

  44. VM lag: do you know how to check for it?

  45. Routing

  46. Routing: your app has this

  47. Routing: how does it work?

  48. Development (diagram labels: App, You)

  49. Production (diagram labels: People, Router, Server, App, App, Router, Server, App, App, Router)

  50. Routing: how slow is it?

  51. Routing: does it back up?

  52. Request time

  53. Request time: not the time you measure

  54. Request time: wall-clock time from real clients

  55. Request time: make requests from around the world

  56. So, in the end: metrics are good

  57. So, in the end: know what you are measuring

  58. @indirect andre@arko.net Questions?
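
The slides on averages, the median, and the 95th and 99th percentiles can be illustrated with a small sketch. It is not taken from the talk: the sample data, the function names `percentile` and `summarize`, and the choice of the nearest-rank percentile method are all assumptions made for illustration. It shows how a mean can hide a slow tail that the 99th percentile exposes.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the mean next to the median and two tail percentiles."""
    return {
        "mean": statistics.mean(samples),
        "median": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Hypothetical response times in milliseconds (made up for this example):
# 97 fast requests and 3 very slow ones.
response_times_ms = [20] * 97 + [2000] * 3

summary = summarize(response_times_ms)
for name, value in summary.items():
    print(f"{name}: {value} ms")
```

With this made-up data the mean lands between the two groups and describes no request that actually happened, the median and 95th percentile both look healthy, and only the 99th percentile shows the slow requests. Which percentiles are worth graphing depends on your traffic.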