Lies, Damn Lies, and Metrics (Strange Loop 2016)

André Arko
September 17, 2016

Metrics are great, and measuring things can provide tremendously useful insights. But there's a problem: metrics lie to you. Metrics just report the numbers that were measured. Analyzing those numbers is up to us, and that analysis can go wrong in so, so many ways. Learn how to arm yourself against human intuition, interpreter pauses, routing, instrumentation lag, and other issues. Don't get so caught up in instrumenting that you lose sight of why metrics exist! Make sure your metrics are telling you actionable information, instead of just accurate numbers.

Transcript

  1. Lies, Damn Lies, and Metrics

  2. André Arko @indirect

  3. (no text on this slide)
  4. Bundler: managing application dependencies since 2009

  5. Metrics

  6. Metrics are important

  7. Metrics tell you what is happening

  8. you rn →

  9. Metrics convince you you understand

  10. you later →

  11. Averages convince you you understand

  12. Averages are lie-candy for your brain

  13. “Normal” (plot: x axis from -5 to 5, y axis from 0 to 0.4)

  14. “Normal” (plot: x axis from -5 to 5, y axis from 0 to 0.4)

  15. Real Life (plot: x axis from -5 to 5, y axis from 0 to 0.4)
  16. brendangregg.com

  17. brendangregg.com

  18. just heard “we have a great average” →

  19. The problem with averages: If you put one hand in a bucket of ice and the other in a bucket of hot coals, on average, you’re comfortable. (Erik Michaels-Ober, @sferik)
  20. Averages mask problems

  21. (graph: x axis from 0 to 10, y axis from 0 to 250)
  22. Graph the median

  23. (graph: x axis from 0 to 10, y axis from 0 to 250)
  24. Graph 95th percentile

  25. (graph: x axis from 0 to 10, y axis from 0 to 250)
  26. Graph 99th percentile

  27. (graph: x axis from 0 to 10, y axis from 0 to 1000)
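As a rough illustration of slides 20 through 27, the sketch below compares the mean with the median and the 95th and 99th percentiles. The latency values are invented for this example, and `percentile` is a hypothetical nearest-rank helper, not something taken from the talk.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 90 fast, 8 slow, 2 pathological.
latencies_ms = [40] * 90 + [400] * 8 + [5000] * 2

summary = {
    "mean": statistics.mean(latencies_ms),
    "median": statistics.median(latencies_ms),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}

for name, value in summary.items():
    print(f"{name}: {value} ms")
```

With these made-up numbers the mean is 168 ms, a response time that no request actually had, while the median is 40 ms, the 95th percentile is 400 ms, and the 99th percentile is 5000 ms.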
  28. Aggregate graphs: another average

  29. (no text on this slide)
  30. Breakout graphs: show each source

  31. (no text on this slide)
  32. Seriously, do it: visualize your data

  33. graphic by Schutz and Avenue, CC-Attribution-ShareAlike, taken from the Wikipedia article on Anscombe's quartet
  34. Average of X: 9 (in each of the four data sets)
  35. Average of Y: 7.50 (in each of the four data sets)
  36. All four data sets have the same: average of X, average of Y, variance of X, variance of Y, correlation of X and Y, linear regression
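The claim on slides 33 through 36 can be checked directly. The sketch below uses the published Anscombe's quartet values and computes a few of the shared summary statistics. The `pearson` helper is written out by hand for this example, and the rounding to two decimals is a choice made here, not something from the talk.

```python
import math
import statistics

# Anscombe's quartet. Data sets I, II, and III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# (mean of x, mean of y, sample variance of x, correlation), rounded to 2 decimals.
summaries = {
    name: (
        round(statistics.mean(xs), 2),
        round(statistics.mean(ys), 2),
        round(statistics.variance(xs), 2),
        round(pearson(xs, ys), 2),
    )
    for name, (xs, ys) in quartet.items()
}

for name, row in summaries.items():
    print(name, row)
```

Every row comes out the same at this precision, even though the four scatter plots look nothing alike, which is the argument for plotting the data instead of trusting the summary.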
  37. Aggregate alerts: more dead servers than alive servers

  38. site’s up if any servers are up!

  39. Breakout alerts: first dead server, not all the servers
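One possible reading of slides 37 through 39, sketched under stated assumptions: an aggregate alert fires only when dead servers outnumber live ones, while breakout alerts report each dead server individually. The function names, the server names, and the `health` dictionary are all hypothetical, and the slides do not say how the speaker's own alerting is implemented.

```python
def aggregate_alert(health):
    """Fire only when dead servers outnumber live ones (an assumed reading of slide 37)."""
    dead = sum(1 for up in health.values() if not up)
    alive = len(health) - dead
    return dead > alive


def breakout_alerts(health):
    """Return the name of every dead server, so the first failure is already visible."""
    return sorted(name for name, up in health.items() if not up)


# Hypothetical fleet: one of four servers is down.
health = {"web1": True, "web2": False, "web3": True, "web4": True}

aggregate_fired = aggregate_alert(health)
dead_servers = breakout_alerts(health)

print("aggregate alert fired:", aggregate_fired)
print("breakout alerts for:", dead_servers)
```

In this made-up fleet the aggregate alert stays quiet because most servers are still up, while the breakout view names `web2` straight away.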

  40. Servers

  41. Servers: you have no idea what is going on

  42. really.

  43. Runtime lag

  44. Runtime lag: how do you tell you lost consciousness?

  45. Runtime lag: you have it.

  46. Runtime lag: you have it. how bad is it?
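One common way to estimate how bad runtime lag is, offered here as an assumption and not as the method from the talk: ask to sleep for a short, fixed interval and record how much later than requested the process actually wakes up. The `measure_pauses` function and its default parameters are hypothetical.

```python
import time


def measure_pauses(interval_s=0.001, iterations=100):
    """Sleep for interval_s repeatedly and record how late each wakeup was, in seconds."""
    overshoots = []
    for _ in range(iterations):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Anything beyond the requested interval is time the process was not running:
        # scheduler delay, interpreter pauses, or a stalled host.
        overshoots.append(max(0.0, elapsed - interval_s))
    return overshoots


overshoots = measure_pauses()
worst_ms = max(overshoots) * 1000
print(f"worst wakeup delay: {worst_ms:.3f} ms over {len(overshoots)} samples")
```

The process cannot observe its own pause while it is paused, so the sketch compares clock readings taken before and after. As with the latency slides earlier, the worst case and the high percentiles of these delays say more than their average.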

  47. VM lag

  48. VM lag: do you have it?

  49. VM lag: do you check for it?

  50. VM lag: do you know how to check for it?

  51. Routing

  52. Routing: your app has this

  53. Routing: how does it work?

  54. Development (diagram labels: App, You)

  55. Production (diagram labels: People, three Routers, two Servers, four Apps)
  56. Routing: how slow is it?

  57. Routing: does it back up?

  58. Request time

  59. Request time: not the time you measure

  60. Request time: wall-clock time from real clients

  61. Request time: make requests from around the world

  62. So, in the end: metrics are good

  63. but know what you are measuring

  64. @indirect andre@arko.net Questions?