Lies, Damn Lies, and Metrics (Strange Loop 2016)

André Arko
September 17, 2016

Metrics are great, and measuring things can provide tremendously useful insights. But there's a problem: metrics lie to you. Metrics just report the numbers that were measured. Analyzing those numbers is up to us, and that analysis can go wrong in so, so many ways. Learn how to arm yourself against human intuition, interpreter pauses, routing, instrumentation lag, and other issues. Don't get so caught up in instrumenting that you lose sight of why metrics exist! Make sure your metrics are telling you actionable information, instead of just accurate numbers.

Transcript

  1. Lies, Damn Lies,
    and Metrics

  2. André Arko
    @indirect

  3. Bundler
    Managing application dependencies since 2009

  4. Metrics
    are important

  5. Metrics
    tell you what
    is happening

  6. Metrics
    convince you
    you understand


  7. you later →

  8. Averages
    convince you
    you understand

  9. Averages
    are lie-candy
    for your brain

  10. “Normal”
    [chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

  11. “Normal”
    [chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

  12. Real Life
    [chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

  13. brendangregg.com

  14. brendangregg.com


  15. just heard
    “we have a great average” →

  16. The problem with averages:
    If you put one hand in a bucket of ice
    and the other in a bucket of hot coals,
    on average, you’re comfortable.
    Erik Michaels-Ober
    @sferik

  17. Averages
    mask problems

  18. [chart: x-axis from 0 to 10, y-axis from 0 to 250]

  19. Graph
    the median

  20. [chart: x-axis from 0 to 10, y-axis from 0 to 250]

  21. Graph
    95th percentile

  22. [chart: x-axis from 0 to 10, y-axis from 0 to 250]

  23. Graph
    99th percentile
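
The slides on averages, the median, and the 95th and 99th percentiles can be illustrated with a minimal Python sketch. The response times and the `percentile` helper (a nearest-rank percentile) are illustrative assumptions, not data or code from the talk.

```python
import math
import statistics

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: 95 fast requests and 5 very slow ones.
response_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(response_ms)
median_ms = statistics.median(response_ms)
p95_ms = percentile(response_ms, 95)
p99_ms = percentile(response_ms, 99)

print(f"mean={mean_ms} median={median_ms} p95={p95_ms} p99={p99_ms}")
```

With these made-up numbers the mean is 119 ms, a latency no request actually had. The median and 95th percentile are both 20 ms, and only the 99th percentile (2000 ms) exposes the slow requests.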

  24. [chart: x-axis from 0 to 10, y-axis from 0 to 1000]

  25. Aggregate graphs
    another average

  26. Breakout graphs
    show each source
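
As a rough illustration of aggregate versus breakout graphs, the sketch below computes one median over all samples and then one median per source. The server names and timings are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical (server, response time in ms) samples; names and values are made up.
samples = [
    ("web-1", 20), ("web-1", 22), ("web-1", 21),
    ("web-2", 19), ("web-2", 23), ("web-2", 21),
    ("web-3", 480), ("web-3", 510), ("web-3", 495),
]

def aggregate_median(samples):
    """One number for every source combined, which is what an aggregate graph plots."""
    return statistics.median([ms for _, ms in samples])

def breakout_median(samples):
    """One number per source, which is what a breakout graph plots."""
    by_source = defaultdict(list)
    for source, ms in samples:
        by_source[source].append(ms)
    return {source: statistics.median(values) for source, values in by_source.items()}

print("aggregate:", aggregate_median(samples))
print("breakout:", breakout_median(samples))
```

Here the aggregate median is 22 ms and looks healthy, while the breakout shows that the hypothetical `web-3` sits near 495 ms.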

  27. Seriously, do it
    Visualize your data

  28. graphic by Schutz and Avenue, CC-Attribution-ShareAlike, taken from the Wikipedia article on Anscombe's quartet

  29. Average of X: 9 Average of X: 9
    Average of X: 9 Average of X: 9

  30. Average of Y: 7.50 Average of Y: 7.50
    Average of Y: 7.50 Average of Y: 7.50

  31. All four data sets have the same
    Average of X
    Average of Y
    Variance of X
    Variance of Y
    Correlation of X and Y
    Linear regression

  32. Aggregate alerts
    more dead servers
    than alive servers


  33. site’s up if any
    servers are up!

  34. Breakout alerts
    first dead server
    not all the servers
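
The difference between an aggregate alert and a breakout alert can be sketched in a few lines of Python. The `health` mapping and both function names are hypothetical and not tied to any particular monitoring tool.

```python
def aggregate_alert(health):
    """Fire only when dead servers outnumber live ones."""
    dead = sum(1 for is_up in health.values() if not is_up)
    alive = len(health) - dead
    return dead > alive

def breakout_alerts(health):
    """Fire once per dead server, so the first failure is already visible."""
    return sorted(name for name, is_up in health.items() if not is_up)

# Hypothetical fleet: one of four servers is down.
health = {"web-1": True, "web-2": False, "web-3": True, "web-4": True}

print("aggregate alert fires:", aggregate_alert(health))
print("breakout alerts:", breakout_alerts(health))
```

With one of four servers down, the aggregate rule stays silent while the breakout rule names `web-2`.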

  35. Servers
    you have no idea
    what is going on

  36. Runtime lag
    how do you tell you
    lost consciousness?

  37. Runtime lag
    you have it.

  38. Runtime lag
    you have it.
    how bad is it?
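
One possible way to estimate "how bad is it?" is to ask for a short sleep repeatedly and record how late each wake-up arrives. This is only a sketch under that assumption: the delay it reports mixes interpreter pauses with operating system scheduling, and the interval and sample count are arbitrary choices.

```python
import time

def measure_lag(interval_s=0.01, samples=20):
    """Sleep for interval_s repeatedly and record how late each wake-up was, in seconds."""
    lags = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval_s)
        elapsed = time.monotonic() - start
        # Anything beyond the requested interval is time the process was not running.
        lags.append(max(0.0, elapsed - interval_s))
    return lags

lags = measure_lag()
print(f"worst wake-up delay: {max(lags) * 1000:.2f} ms")
```

Running a loop like this in a background thread of the real application, and graphing the worst delay per minute, would be one way to turn the pauses into a metric.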

  39. VM lag
    do you have it?

  40. VM lag
    do you check for it?

  41. VM lag
    do you know how
    to check for it?
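
The slides do not say how to check for VM lag. One common approach on Linux guests, assumed here, is to read the "steal" counter from the aggregate `cpu` line of `/proc/stat` twice and compare. The sample lines below are invented, and the sketch assumes a kernel that reports at least eight counters on that line.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()

def steal_fraction(before, after):
    """Fraction of CPU time taken by the hypervisor between two 'cpu' lines."""
    def fields(line):
        return [int(value) for value in line.split()[1:]]
    deltas = [later - earlier for earlier, later in zip(fields(before), fields(after))]
    # Counters 0..7 are user, nice, system, idle, iowait, irq, softirq, steal.
    total = sum(deltas[:8])
    return deltas[7] / total if total else 0.0

# Invented samples; on a real Linux guest, call read_cpu_line() twice, some seconds apart.
before = "cpu  1000 0 500 8000 100 0 50 10 0 0"
after = "cpu  1400 0 700 8600 120 0 60 120 0 0"

print(f"steal: {steal_fraction(before, after):.1%}")
```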

  42. Routing
    your app has this

  43. Routing
    how does it work?

  44. Development
    App
    You

  45. Production
    People Router
    Server
    App
    App
    Router
    Server
    App
    App
    Router

  46. Routing
    how slow is it?

  47. Routing
    does it back up?

  48. Request time

  49. Request time
    not the time
    you measure

  50. Request time
    wall-clock time
    from real clients

  51. Request time
    make requests from
    around the world
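
To see why the time the app measures can differ from what a client experiences, here is a toy Python simulation of a single app process behind a router queue. It is a sketch under simplifying assumptions: one worker, a fixed service time, and no network latency. All numbers are hypothetical.

```python
def simulate(arrivals_ms, service_ms):
    """Return (app_measured_ms, client_observed_ms) for each request, in arrival order."""
    free_at = 0
    timings = []
    for arrived in arrivals_ms:
        started = max(arrived, free_at)  # the request waits in the router queue until the app is free
        finished = started + service_ms
        free_at = finished
        timings.append((finished - started, finished - arrived))
    return timings

# Hypothetical burst: five requests arrive 10 ms apart, each needing 50 ms of app time.
timings = simulate([0, 10, 20, 30, 40], service_ms=50)

for app_ms, client_ms in timings:
    print(f"app measured {app_ms} ms, client waited {client_ms} ms")
```

The app reports 50 ms for every request, while the last client in the burst waits 210 ms. Only a measurement taken from the client side sees the queueing.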

  52. So, in the end
    metrics are good

  53. but
    know what you
    are measuring
