Lies, Damn Lies, and Metrics (Distill 2014)

Metrics are great, and measuring things can provide tremendously useful insights. But there's a problem: metrics lie to you. Metrics just report the numbers that were measured. Analyzing those numbers is up to us, and that analysis can go wrong in so, so many ways. Learn how to arm yourself against human intuition, interpreter pauses, routing, instrumentation lag, and other issues. Don't get so caught up in instrumenting that you lose sight of why metrics exist! Make sure your metrics are telling you actionable information, instead of just accurate numbers.

André Arko

August 08, 2014

Transcript

  1. Lies, Damn Lies,
    and Metrics

  2. André Arko
    @indirect

  3. Metrics
    are important

  4. Metrics
    tell you what
    is happening

  5. Metrics
    convince you
    you understand

  6. you later →

  7. Averages
    convince you
    you understand

  8. Averages
    are lie-candy
    for your brain

  9. “Normal”
    [Plot: x-axis from -5 to 5, y-axis from 0 to 0.4]

  10. “Normal”
    [Plot: x-axis from -5 to 5, y-axis from 0 to 0.4]

  11. Real Life
    [Plot: x-axis from -5 to 5, y-axis from 0 to 0.4]

  12. brendangregg.com

  13. brendangregg.com

  14. just heard
    “we have a great average” →

  15. Averages
    mask problems

  16. [Plot: x-axis from 0 to 10, y-axis from 0 to 250]

  17. Graph
    the median

  18. [Plot: x-axis from 0 to 10, y-axis from 0 to 250]

  19. Graph
    95th percentile

  20. [Plot: x-axis from 0 to 10, y-axis from 0 to 250]

  21. Graph
    99th percentile

  22. [Plot: x-axis from 0 to 10, y-axis from 0 to 1000]

  23. Aggregate graphs
    another average

  24. Breakout graphs
    show each source

  25. Aggregate alerts
    more dead servers
    than alive servers

  26. site’s up if any
    servers are up!

  27. Breakout alerts
    first dead server
    not all the servers

  28. Servers
    you have no idea
    what is going on

  29. Runtime lag
    how do you tell you
    lost consciousness?

  30. Runtime lag
    you have it.

  31. Runtime lag
    you have it.
    how bad is it?

  32. VM lag
    do you have it?

  33. VM lag
    do you check for it?

  34. VM lag
    do you know how
    to check for it?

  35. Routing
    your app has this

  36. Routing
    how does it work?

  37. Development
    App
    You

  38. Production
    [Diagram with boxes labeled People, Router, and two Servers, each containing a Router and two Apps]

  39. Routing
    how slow is it?

  40. Routing
    does it back up?

  41. Request time

  42. Request time
    not the time
    you measure

  43. Request time
    wall-clock time
    from real clients

  44. Request time
    make requests from
    around the world

  45. So, in the end
    metrics are good

  46. So, in the end
    know what you
    are measuring

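The point of slides 15 to 22 (averages mask problems, so graph the median and the 95th and 99th percentiles) can be made concrete with a small sketch. This is not code from the talk: the latency numbers are invented for illustration, and `percentile` and `summarize` are hypothetical helper names using the simple nearest-rank definition of a percentile.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at
    least pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the mean next to the percentiles the slides suggest graphing."""
    return {
        "mean": statistics.mean(samples),
        "median": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Hypothetical response times in milliseconds (made-up data):
# 90 fast requests, 8 slow ones, and 2 very slow ones.
latencies_ms = [20] * 90 + [200] * 8 + [5000] * 2

summary = summarize(latencies_ms)
for name, value in summary.items():
    print(f"{name}: {value} ms")
```

With this made-up data the mean is 134 ms, a number that no request actually experienced. The median (20 ms) shows what a typical request looks like, while the 95th percentile (200 ms) and 99th percentile (5000 ms) expose the slow tail that the average hides.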