André Arko
August 08, 2014

# Lies, Damn Lies, and Metrics (Distill 2014)

Metrics are great, and measuring things can provide tremendously useful insights. But there's a problem: metrics lie to you. Metrics just report the numbers that were measured. Analyzing those numbers is up to us, and that analysis can go wrong in so, so many ways. Learn how to arm yourself against human intuition, interpreter pauses, routing, instrumentation lag, and other issues. Don't get so caught up in instrumenting that you lose sight of why metrics exist! Make sure your metrics are telling you actionable information, instead of just accurate numbers.

## Transcript

1. Lies, Damn Lies,
and Metrics

2. André Arko
@indirect

3. Bundler

4. Metrics

5. Metrics
are important

6. Metrics
tell you what
is happening

7. you rn →

8. Metrics
convince you
you understand

9. you later →

10. Averages
convince you
you understand

11. Averages
are lie-candy

12. “Normal”
[chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

13. “Normal”
[chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

14. Real Life
[chart: x-axis from -5 to 5, y-axis from 0 to 0.4]

15. brendangregg.com

16. brendangregg.com

17. just heard
“we have a great average” →

18. Averages

19. [chart: x-axis from 0 to 10, y-axis from 0 to 250]

20. Graph
the median

21. [chart: x-axis from 0 to 10, y-axis from 0 to 250]

22. Graph
95th percentile

23. [chart: x-axis from 0 to 10, y-axis from 0 to 250]

24. Graph
99th percentile

25. [chart: x-axis from 0 to 10, y-axis from 0 to 1000]
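
The progression on these slides, from average to median to 95th and 99th percentile, can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the sample response times are invented, and `nearest_rank_percentile` and `summarize` are hypothetical helper names. The percentile uses the nearest-rank definition; other definitions interpolate and give slightly different values.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the smallest sample that at least pct percent of samples are <= to."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank definition: rank is 1-based and never below 1.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Compute the four numbers the slides compare."""
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "p95": nearest_rank_percentile(samples, 95),
        "p99": nearest_rank_percentile(samples, 99),
    }


# Invented data: 97 fast requests (20 ms) and 3 very slow ones (2000 ms).
response_times_ms = [20] * 97 + [2000] * 3
summary = summarize(response_times_ms)

for name, value in summary.items():
    print(f"{name}: {value}")
```

With this made-up data the mean lands near 79 ms, a value no request actually took, while the median and 95th percentile both sit at 20 ms and only the 99th percentile exposes the 2000 ms requests.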

26. Aggregate graphs
another average

27. Breakout graphs
show each source

than alive servers

29. site’s up if any
servers are up!

not all the servers
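
The aggregate-versus-breakout point can be illustrated with a small Python sketch. The server names and latency values are invented for illustration, and `aggregate_and_breakout` is a hypothetical helper, not something from the talk.

```python
import statistics


def aggregate_and_breakout(latencies_by_server):
    """Return one median over all samples plus a separate median per server."""
    all_samples = [
        value
        for samples in latencies_by_server.values()
        for value in samples
    ]
    aggregate = statistics.median(all_samples)
    breakout = {
        name: statistics.median(samples)
        for name, samples in latencies_by_server.items()
    }
    return aggregate, breakout


# Invented data: two healthy servers and one struggling server that
# handles fewer requests because it is so slow.
latencies_ms = {
    "web1": [20] * 50,
    "web2": [22] * 50,
    "web3": [900] * 10,
}
aggregate, breakout = aggregate_and_breakout(latencies_ms)

print("aggregate median:", aggregate)
for name, value in breakout.items():
    print(f"{name} median: {value}")
```

In this example the aggregate median is 22 ms, which looks healthy, while the per-server view shows `web3` at 900 ms.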

31. Servers

32. Servers
you have no idea
what is going on

33. really.

34. Runtime lag

35. Runtime lag
how do you tell you
lost consciousness?

36. Runtime lag
you have it.

37. Runtime lag
you have it.
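
One common way to notice that a process "lost consciousness" is to ask for a short sleep and measure how much longer than requested it actually took. The sketch below assumes that approach; it is not taken from the talk, and `measure_pause_lag` is a hypothetical name. The overshoot includes ordinary scheduler jitter as well as interpreter or garbage-collector pauses, so treat large outliers as a hint rather than a diagnosis.

```python
import time


def measure_pause_lag(interval_s=0.01, iterations=50,
                      sleep=time.sleep, clock=time.monotonic):
    """Sleep repeatedly and record how far each sleep overshot the request.

    `sleep` and `clock` are injectable so the function can be exercised
    with fakes; by default it uses the real monotonic clock.
    """
    lags = []
    for _ in range(iterations):
        start = clock()
        sleep(interval_s)
        elapsed = clock() - start
        # Anything beyond the requested interval is time the process
        # was not running when it expected to be.
        lags.append(max(0.0, elapsed - interval_s))
    return lags


# Short real run: five 10 ms sleeps.
observed = measure_pause_lag(iterations=5)
print(f"worst overshoot: {max(observed) * 1000:.2f} ms")
```

Running this inside the same process as the application, on a background thread, is what makes it reflect that process's pauses; a separate monitoring process would only see its own.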

38. VM lag

39. VM lag
do you have it?

40. VM lag
do you check for it?

41. VM lag
do you know how
to check for it?
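
On Linux guests, one way to check for VM lag is the "steal" counter in `/proc/stat`: time the hypervisor spent running something else while this virtual CPU wanted to run. The sketch below assumes that approach and is not from the talk. The function names and the sample snapshots are invented; on a real machine you would read `/proc/stat` twice, a few seconds apart, and pass both texts in.

```python
def parse_cpu_times(proc_stat_text):
    """Return the counters from the aggregate "cpu" line of /proc/stat."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_fraction(before, after):
    """Fraction of CPU time stolen between two snapshots.

    Field order is user, nice, system, idle, iowait, irq, softirq, steal,
    followed by guest and guest_nice. Only the first eight are summed,
    on the understanding that guest time is already counted in user/nice.
    """
    deltas = [later - earlier for earlier, later in zip(before, after)]
    if len(deltas) < 8:
        return 0.0  # this kernel does not report steal time
    total = sum(deltas[:8])
    return deltas[7] / total if total > 0 else 0.0


# Invented snapshots standing in for two reads of /proc/stat.
SAMPLE_BEFORE = "cpu  100 0 50 800 10 0 0 40 0 0\ncpu0 100 0 50 800 10 0 0 40 0 0"
SAMPLE_AFTER = "cpu  140 0 60 830 10 0 0 60 0 0\ncpu0 140 0 60 830 10 0 0 60 0 0"

fraction = steal_fraction(parse_cpu_times(SAMPLE_BEFORE),
                          parse_cpu_times(SAMPLE_AFTER))
print(f"steal: {fraction:.0%}")
```

The sample numbers work out to 20% steal, meaning a fifth of the interval was spent waiting on the hypervisor.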

42. Routing

43. Routing

44. Routing
how does it work?

45. Development
App
You

46. Production
[diagram: People, Routers, Servers, Apps]

47. Routing
how slow is it?

48. Routing
does it back up?

49. Request time

50. Request time
not the time
you measure

51. Request time
wall-clock time
from real clients

52. Request time
make requests from
around the world
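
Measuring wall-clock time from the client side can be sketched as below. This is an assumption-laden illustration rather than the talk's method: `time_call`, `fetch_url`, and `client_side_timings` are hypothetical names, and the sketch only measures from wherever it runs, so covering "around the world" means running it from several locations.

```python
import time
import urllib.request


def time_call(fetch, clock=time.perf_counter):
    """Return the wall-clock seconds a single call to fetch() takes."""
    start = clock()
    fetch()
    return clock() - start


def fetch_url(url, timeout_s=10):
    """Download the full response body, as a real client would."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        response.read()


def client_side_timings(url, attempts=5):
    """Time several complete requests, including DNS, connect, and routing."""
    return [time_call(lambda: fetch_url(url)) for _ in range(attempts)]


# Usage (makes real network requests, so it is not run here):
#   timings = client_side_timings("https://example.com/")
#   print(max(timings))
```

The number this produces includes everything between the client and the app, such as routing and queueing, which the app's own timers never see.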

53. So, in the end
metrics are good

54. So, in the end
know what you
are measuring

55. @indirect
[email protected]
Questions?