
Lies, Damn Lies, and Metrics (Strange Loop 2016)

André Arko
September 17, 2016

Metrics are great, and measuring things can provide tremendously useful insights. But there's a problem: metrics lie to you. Metrics just report the numbers that were measured. Analyzing those numbers is up to us, and that analysis can go wrong in so, so many ways. Learn how to arm yourself against human intuition, interpreter pauses, routing, instrumentation lag, and other issues. Don't get so caught up in instrumenting that you lose sight of why metrics exist! Make sure your metrics are telling you actionable information, instead of just accurate numbers.

Transcript

  1. Lies, Damn Lies,
    and Metrics

    View Slide

  2. André Arko
    @indirect

    View Slide

  3. View Slide

  4. Bundler
    Managing application dependencies since 2009

    View Slide

  5. Metrics

    View Slide

  6. Metrics
    are important

    View Slide

  7. Metrics
    tell you what
    is happening

    View Slide


  8. you rn →

    View Slide

  9. Metrics
    convince you
    you understand

    View Slide


  10. you later →

    View Slide

  11. Averages
    convince you
    you understand

    View Slide

  12. Averages
    are lie-candy
    for your brain

    View Slide

  13. “Normal”
    [Chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

    View Slide

  14. “Normal”
    [Chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

    View Slide

  15. Real Life
    [Chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

    View Slide

  16. brendangregg.com

    View Slide

  17. brendangregg.com

    View Slide


  18. just heard
    “we have a great average” →

    View Slide

  19. The problem with averages:
    If you put one hand in a bucket of ice
    and the other in a bucket of hot coals,
    on average, you’re comfortable.
    Erik Michaels-Ober
    @sferik

    View Slide

  20. Averages
    mask problems

    View Slide

  21. [Chart: x-axis from 0 to 10, y-axis from 0 to 250]

    View Slide

  22. Graph
    the median

    View Slide

  23. [Chart: x-axis from 0 to 10, y-axis from 0 to 250]

    View Slide

  24. Graph
    95th percentile

    View Slide

  25. [Chart: x-axis from 0 to 10, y-axis from 0 to 250]

    View Slide

  26. Graph
    99th percentile

    View Slide

  27. [Chart: x-axis from 0 to 10, y-axis from 0 to 1000]

    View Slide
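The point of slides 20 through 27 (averages mask problems, so graph the median and the 95th and 99th percentiles) can be sketched in a few lines of Python. The latency numbers below are made up for illustration and do not come from the talk, and `percentile` is a hypothetical helper using the simple nearest-rank definition, one of several common percentile definitions.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    # Ceiling division in integer arithmetic avoids float rounding surprises.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: most requests are fast,
# a few are slow, and a couple are terrible.
latencies_ms = [20] * 90 + [100] * 8 + [3000] * 2

summary = {
    "mean": statistics.mean(latencies_ms),
    "median": statistics.median(latencies_ms),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}

for name, value in summary.items():
    print(f"{name:>6}: {value} ms")
```

With this data the mean describes no request that actually happened, the median shows the typical request, and only the high percentiles reveal the requests that hurt.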

  28. Aggregate graphs
    another average

    View Slide

  29. View Slide

  30. Breakout graphs
    show each source

    View Slide

  31. View Slide

  32. Seriously, do it
    Visualize your data

    View Slide

  33. graphic by Schutz and Avenue, CC-Attribution-ShareAlike, taken from the Wikipedia article on Anscombe's quartet

    View Slide

  34. Average of X: 9 (shown on each of the four data sets)

    View Slide

  35. Average of Y: 7.50 (shown on each of the four data sets)

    View Slide

  36. All four data sets have the same
    Average of X
    Average of Y
    Variance of X
    Variance of Y
    Correlation of X and Y
    Linear regression

    View Slide
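The claim on slide 36 can be checked directly. The sketch below uses the published values of Anscombe's quartet (the numbers are not on the slides themselves) and computes the summary statistics with the standard library. `pearson` and `summarize` are hypothetical helper names. The statistics agree only to about two decimal places, which is how the quartet is usually described.

```python
import math
import statistics

# Published Anscombe's quartet values; sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def summarize(xs, ys):
    """Summary statistics that look the same for all four data sets."""
    return {
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
        "var_x": statistics.variance(xs),  # sample variance
        "var_y": statistics.variance(ys),
        "corr": pearson(xs, ys),
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed rows are nearly identical, yet the four scatter plots look nothing alike, which is the argument for visualizing the data instead of trusting summaries.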

  37. Aggregate alerts
    more dead servers
    than alive servers

    View Slide


  38. site’s up if any
    servers are up!

    View Slide

  39. Breakout alerts
    first dead server
    not all the servers

    View Slide
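The difference between the aggregate alert on slide 37 and the breakout alert on slide 39 can be sketched as two small checks over the same health data. The server names and the `statuses` mapping are hypothetical, and a real monitoring system would express these as alert rules, not Python functions.

```python
def aggregate_alert(statuses):
    """Fire only when dead servers outnumber live ones (slide 37)."""
    dead = sum(1 for is_up in statuses.values() if not is_up)
    alive = len(statuses) - dead
    return dead > alive


def breakout_alerts(statuses):
    """Return one alert per dead server, starting with the first (slide 39)."""
    return sorted(name for name, is_up in statuses.items() if not is_up)


# Hypothetical fleet: one of four servers has died.
statuses = {"web1": True, "web2": False, "web3": True, "web4": True}

print("aggregate alert fired:", aggregate_alert(statuses))
print("breakout alerts:", breakout_alerts(statuses))
```

With one dead server out of four, the aggregate check stays quiet while the breakout check names the failing machine right away.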

  40. Servers

    View Slide

  41. Servers
    you have no idea
    what is going on

    View Slide

  42. really.

    View Slide

  43. Runtime lag

    View Slide

  44. Runtime lag
    how do you tell you
    lost consciousness?

    View Slide

  45. Runtime lag
    you have it.

    View Slide

  46. Runtime lag
    you have it.
    how bad is it?

    View Slide
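One way to answer "how bad is it?" on slide 46 is to ask for a short sleep and record how much longer than requested the process was actually away. This is a minimal sketch and not a method shown in the talk. `measure_pauses` is a hypothetical name, and the overshoot it reports mixes together every source of delay (scheduler, garbage collector, hypervisor), so it gives a rough upper bound and not a diagnosis.

```python
import time


def measure_pauses(interval=0.01, samples=20):
    """Sleep `interval` seconds `samples` times and record each overshoot.

    The overshoot is the extra time, in seconds, that passed beyond the
    requested sleep. Large values suggest the process was paused.
    """
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval)
        elapsed = time.monotonic() - start
        overshoots.append(max(0.0, elapsed - interval))
    return overshoots


pauses = measure_pauses(interval=0.005, samples=10)
print(f"worst overshoot: {max(pauses) * 1000:.2f} ms")
print(f"typical overshoot: {sorted(pauses)[len(pauses) // 2] * 1000:.2f} ms")
```

Running a loop like this inside the application and graphing the worst overshoot per minute is one cheap way to notice pauses that in-process timers cannot see on their own.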

  47. VM lag

    View Slide

  48. VM lag
    do you have it?

    View Slide

  49. VM lag
    do you check for it?

    View Slide

  50. VM lag
    do you know how
    to check for it?

    View Slide

  51. Routing

    View Slide

  52. Routing
    your app has this

    View Slide

  53. Routing
    how does it work?

    View Slide

  54. Development
    App
    You

    View Slide

  55. Production
    People Router
    Server
    App
    App
    Router
    Server
    App
    App
    Router

    View Slide

  56. Routing
    how slow is it?

    View Slide

  57. Routing
    does it back up?

    View Slide

  58. Request time

    View Slide

  59. Request time
    not the time
    you measure

    View Slide

  60. Request time
    wall-clock time
    from real clients

    View Slide

  61. Request time
    make requests from
    around the world

    View Slide
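Slides 56 through 60 ask whether routing backs up and note that request time is not the time you measure inside the app. A tiny deterministic simulation shows why. It assumes a single worker behind a first-in, first-out queue, with made-up arrival and service times. `simulate` is a hypothetical name, and real routers are more complicated than this.

```python
def simulate(arrival_gap_ms, service_ms, count):
    """Simulate one worker serving a FIFO queue of identical requests.

    Returns, per request, the time the app would report (`app_ms`) and the
    wall-clock time the client waited (`client_ms`), both in milliseconds.
    """
    results = []
    worker_free_at = 0
    for i in range(count):
        arrived = i * arrival_gap_ms
        started = max(arrived, worker_free_at)  # wait if the worker is busy
        finished = started + service_ms
        worker_free_at = finished
        results.append({"app_ms": service_ms, "client_ms": finished - arrived})
    return results


# Requests arrive every 80 ms, but each one takes 100 ms to serve.
for i, row in enumerate(simulate(arrival_gap_ms=80, service_ms=100, count=5)):
    print(f"request {i}: app saw {row['app_ms']} ms, "
          f"client saw {row['client_ms']} ms")
```

The app reports a flat 100 ms for every request while the client's wait grows with each arrival, so instrumentation inside the app alone would never show the queue backing up.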

  62. So, in the end
    metrics are good

    View Slide

  63. but
    know what you
    are measuring

    View Slide

  64. @indirect
    [email protected]
    Questions?

    View Slide