Aaron Quint
March 28, 2013

# Correlation: The Next Frontier

My talk from #monitorama 2013 with ideas about how to apply different types of correlation to our data

## Transcript

1. CORRELATION:
THE NEXT FRONTIER
monitorama 2013 / boston / @aq

2. CTO of

3. Chief Taco Officer
CTO of

4. A little bit of @aq

5. A little bit of @aq
• Expert Eater

6. A little bit of @aq
• Expert Eater
• Experienced Ruby and JS Developer

7. A little bit of @aq
• Expert Eater
• Experienced Ruby and JS Developer
• Growing Student of Operations

8. A little bit of @aq
• Expert Eater
• Experienced Ruby and JS Developer
• Growing Student of Operations
• Beginner Distributed Systems
Maintainer

9. Give each other dap.
WE did it

10. OPS DONE.
WE have DATA Y’ALL

11. Uhhh
SO NO MORE TEARS,
right?

12. THE LEVELS OF
MONITORING
NIRVANA

13. PURE DATA

14. PURE DATA
BASIC INFERENCES
AND CORRELATIONS

15. PURE DATA
BASIC INFERENCES
AND CORRELATIONS
THE
FUCKING
MATRIX

16. PURE DATA
BASIC INFERENCES
AND CORRELATIONS
THE
FUCKING
MATRIX

17. PURE DATA
BASIC INFERENCES
AND CORRELATIONS
THE
FUCKING
MATRIX
whoa

18. PURE DATA

19. PURE DATA
BASIC INFERENCES
AND CORRELATIONS

20. PURE DATA
BASIC INFERENCES
AND CORRELATIONS
PREDICTIVE
AND DIRECT
RELATIONSHIPS

21. Aligning the data.
CORRELATION

22. Except when it does.
CORRELATION DOES NOT
IMPLY CAUSATION

23. The marshmallow test

24. And let us get back to Shaving Yaks.
correlation can
narrow our work

25. Step back, HE’S DOING MATH!
MATHEMATICAL
CORRELATION

26. Say that 5 times fast.
PEARSON Product
moment correlation
coefficient
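
For reference, the coefficient named on this slide is usually written as below for n paired samples (x_i, y_i). The symbols are standard textbook notation, not taken from the slides.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Equivalently, r is the sum of the products of the z-scores of x and y divided by n - 1, when the z-scores use the sample standard deviation.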

27. [Scatter plot: CPU vs Response Time; axis ticks run 0 to 900 and 0 to 6]

28. [Scatter plot: CPU vs Response Time; axis ticks run 0 to 900 and 0 to 6]

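29. data = [
  [100, 0.7],
  [125, 0.5],
  [150, 1],
  [300, 2.1],
  [500, 3.4],
  [900, 6]
]

# Split into columns and convert to floats so the means are not truncated
x, y = data.transpose.map { |column| column.map(&:to_f) }
n = data.size
x_mean = x.reduce(:+) / n
y_mean = y.reduce(:+) / n
# Sample standard deviation; inject(0.0) seeds the sum so the first point is squared too
x_stddev = Math.sqrt(x.inject(0.0) { |sum, i| sum + (i - x_mean)**2 } / (n - 1))
y_stddev = Math.sqrt(y.inject(0.0) { |sum, i| sum + (i - y_mean)**2 } / (n - 1))
# z-scores: distance from the mean in units of standard deviation
z_x = x.collect { |i| (i - x_mean) / x_stddev }
z_y = y.collect { |i| (i - y_mean) / y_stddev }
# Divide by n - 1 to match the sample standard deviations above
pearsons = z_x.zip(z_y).collect { |a, b| a * b }.reduce(:+) / (n - 1)
# => approximately 0.9975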

30. PEarson

31. PEarson
• Close to absolute 1 = probably
correlated samples

32. PEarson
• Close to absolute 1 = probably
correlated samples
• Could be applied to moving
averages?

33. PEarson
• Close to absolute 1 = probably
correlated samples
• Could be applied to moving
averages?
• Could we pull it into a graphite
function? (Hackathon anyone?)
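
One way to read the "moving averages" question is to compute Pearson's r over a sliding window, so the coefficient becomes a time series of its own. The following is a minimal sketch under that assumption; `pearson`, `rolling_pearson`, the window size, and the sample values are all hypothetical and not from the talk.

```ruby
# Pearson's r for two equal-length arrays of numbers.
# Returns nil when either series is flat, because r is undefined there.
def pearson(xs, ys)
  n = xs.size.to_f
  x_mean = xs.sum / n
  y_mean = ys.sum / n
  covariance = xs.zip(ys).sum { |x, y| (x - x_mean) * (y - y_mean) }
  x_var = xs.sum { |x| (x - x_mean)**2 }
  y_var = ys.sum { |y| (y - y_mean)**2 }
  return nil if x_var.zero? || y_var.zero?
  covariance / Math.sqrt(x_var * y_var)
end

# Hypothetical helper: one r value per sliding window of `size` points.
def rolling_pearson(xs, ys, size)
  x_windows = xs.each_cons(size).to_a
  y_windows = ys.each_cons(size).to_a
  x_windows.zip(y_windows).map { |wx, wy| pearson(wx, wy) }
end

# Made-up samples, assumed to be taken at the same timestamps.
cpu      = [10, 12, 15, 30, 50, 90, 85, 40]
response = [0.7, 0.5, 1.0, 2.1, 3.4, 6.0, 5.8, 2.9]

windows = rolling_pearson(cpu, response, 4)
windows.each_with_index { |r, i| puts "window #{i}: #{r.inspect}" }
```

A window where r drops away from 1 or -1 would be a hint that the relationship between the two metrics changed during that period, which is the kind of signal a graphing function could surface.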

34. LIMITS OF Mathematical correlation

35. LIMITS OF Mathematical correlation
• Requires known inputs and
assumptions

36. LIMITS OF Mathematical correlation
• Requires known inputs and
assumptions
• Suggestion of correlation, not proof

37. LIMITS OF Mathematical correlation
• Requires known inputs and
assumptions
• Suggestion of correlation, not proof
• Needs a large amount of knowledge
of the data set to make decisions

38. I can see it!
TIME BASED VISUAL
CORRELATION

39. A Graphite
Story

40. LIMITS OF VISUAL CORRELATION

41. LIMITS OF VISUAL CORRELATION
• Takes a good eye

42. LIMITS OF VISUAL CORRELATION
• Takes a good eye
• Hard to see the signal through the
noise

43. LIMITS OF VISUAL CORRELATION
• Takes a good eye
• Hard to see the signal through the
noise
• Doesn’t really account for domino
events

44. LIMITS OF VISUAL CORRELATION
• Takes a good eye
• Hard to see the signal through the
noise
• Doesn’t really account for domino
events
• Good for trends but not as much for
events

45. There’s a disturbance in the force.
EMOTIONAL
CORRELATION

46. RASHoMONING

47. Each person uses their unique knowledge
of the situation to point out unique data
points.
RASHoMONING

48. LIMITS OF EMOTIONAL correlation

49. LIMITS OF EMOTIONAL correlation
• Provides a trail not an answer

50. LIMITS OF EMOTIONAL correlation
• Provides a trail not an answer
• Depends on having a team of
people

51. LIMITS OF EMOTIONAL correlation
• Provides a trail not an answer
• Depends on having a team of
people
• Many ideas, needs a “judge”

52. LIMITS OF EMOTIONAL correlation
• Provides a trail not an answer
• Depends on having a team of
people
• Many ideas, needs a “judge”
• HUMANS (Hence Rashomoning)

53. I’m sold, show me how!

54. But I have some ideas.
I don’t know exactly

55. TRYING TO MAKE THE
DATA MORE VISIBLE

56. A conflagration of data
HOTPOT

57. Hotpot = Chef, Sensu, Graphite, (Logstash)
Simply align disparate
sources of data TO
VISUALLY CORRELATE
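
Aligning disparate sources mostly means putting their samples on a shared time axis. The sketch below assumes each source has already been reduced to `[epoch_seconds, value]` pairs. It does not use any Chef, Sensu, Graphite, or Logstash API, and the `bucket` and `align` helpers and the sample data are hypothetical.

```ruby
# Round each [epoch_seconds, value] sample down to the start of its bucket
# and average the values that land in the same bucket.
def bucket(samples, step)
  samples
    .group_by { |ts, _value| ts - (ts % step) }
    .transform_values { |rows| rows.sum { |_ts, value| value.to_f } / rows.size }
end

# Keep only the buckets present in every source, in time order.
# Returns [[bucket_start, { source_name => averaged_value }], ...].
def align(sources, step)
  bucketed = sources.transform_values { |samples| bucket(samples, step) }
  shared = bucketed.values.map(&:keys).reduce(:&).sort
  shared.map { |ts| [ts, bucketed.transform_values { |by_ts| by_ts[ts] }] }
end

# Made-up samples from two sources that report at different moments.
sources = {
  cpu: [[1000, 40], [1010, 60], [1065, 80]],
  response_time: [[1003, 1.0], [1061, 2.0], [1130, 3.0]]
}

aligned = align(sources, 60)
aligned.each { |ts, values| puts "#{ts}: #{values}" }
```

Dropping buckets that are missing from any source keeps the rows directly comparable. Filling the gaps instead would be an equally reasonable choice, depending on how sparse the sources are.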

58. create relationships
notifications

59. Math to filter out noise.
USE PEARSON’S to pull
out potentially
related data
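
As a rough illustration of using Pearson's r as a noise filter, the sketch below scores every pair of series and keeps only the pairs whose absolute r clears a cutoff. The metric names, the values, and the 0.9 threshold are assumptions for the example, and a high score is only a suggestion of which graphs to look at together.

```ruby
# Pearson's r for two equal-length arrays; nil when either series is flat.
def pearson(xs, ys)
  n = xs.size.to_f
  x_mean = xs.sum / n
  y_mean = ys.sum / n
  covariance = xs.zip(ys).sum { |x, y| (x - x_mean) * (y - y_mean) }
  spread = Math.sqrt(xs.sum { |x| (x - x_mean)**2 } * ys.sum { |y| (y - y_mean)**2 })
  spread.zero? ? nil : covariance / spread
end

# Hypothetical metrics, assumed to be already aligned on the same timestamps.
series = {
  "cpu"           => [100, 125, 150, 300, 500, 900],
  "response_time" => [0.7, 0.5, 1, 2.1, 3.4, 6],
  "queue_depth"   => [5, 1, 4, 2, 6, 3]
}

THRESHOLD = 0.9 # assumed cutoff; tune it against your own data

# Score every pair of metrics, then keep the strongly correlated ones.
scored = series.keys.combination(2).map do |a, b|
  [a, b, pearson(series[a], series[b])]
end
related = scored.select { |_a, _b, r| r && r.abs >= THRESHOLD }

related.each { |a, b, r| puts "#{a} <-> #{b}: r = #{r.round(3)}" }
```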

60. Have the ability to easily divide datasets by
“cohorts”
cohort analysis for
processes/nodes
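
Dividing a dataset by cohort can be as simple as grouping samples on an attribute and summarizing each group. This is a minimal sketch with made-up node records. The attribute names (`role`, `deploy`) and the `cohort_means` helper are hypothetical.

```ruby
# Hypothetical per-node samples; in practice these attributes might come
# from whatever inventory or configuration data is available.
samples = [
  { node: "web-1",    role: "web",    deploy: "v41", response_time: 0.5 },
  { node: "web-2",    role: "web",    deploy: "v42", response_time: 1.5 },
  { node: "worker-1", role: "worker", deploy: "v41", response_time: 3.0 },
  { node: "worker-2", role: "worker", deploy: "v42", response_time: 5.0 }
]

# Group samples by the cohort attribute and return the mean of the metric
# for each cohort.
def cohort_means(samples, cohort_key, metric)
  samples
    .group_by { |sample| sample[cohort_key] }
    .transform_values { |rows| rows.sum { |row| row[metric] } / rows.size.to_f }
end

by_role   = cohort_means(samples, :role, :response_time)
by_deploy = cohort_means(samples, :deploy, :response_time)

puts "by role:   #{by_role}"
puts "by deploy: #{by_deploy}"
```

Slicing the same samples by different attributes is the point: a difference that is invisible in the overall average may stand out once nodes are split by role or by deploy.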

61. “Node notes”. Document everything.
Treat personal and
institutional
knowledge as data

62. By making more data available to everyone.
Make emotional
correlation less EMO

63. So if that’s just level 2,
what’s level 3?

64. TAKE the correlations
and let the machine
turn them into
decisions

65. Let’s figure out level 2 first.
NOT YET.

66. github.com/quirkey
github.com/paperlesspost