Correlation: The Next Frontier

My talk from #monitorama 2013 with ideas about how to apply different types of correlation to our data

Aaron Quint

March 28, 2013

Transcript

  1. CORRELATION: THE NEXT FRONTIER monitorama 2013 / boston / @aq

  3. Chief Taco Officer CTO of

  10. A little bit of @aq • Expert Eater • Experienced Ruby and JS Developer • Growing Student of Operations • Beginner Distributed Systems Maintainer

  11. WE DID IT. Give each other dap.

  12. OPS DONE. WE have DATA Y’ALL

  14. Uhhh SO NO MORE TEARS, right?

  16. THE LEVELS OF MONITORING NIRVANA

  22. PURE DATA / BASIC INFERENCES AND CORRELATIONS / THE FUCKING MATRIX (whoa)

  26. PURE DATA / BASIC INFERENCES AND CORRELATIONS / PREDICTIVE AND DIRECT RELATIONSHIPS

  27. CORRELATION: Aligning the data.

  28. CORRELATION DOES NOT IMPLY CAUSATION. Except when it does.

  29. The marshmallow test

  30. CORRELATION CAN NARROW OUR WORK. And let us get back to shaving yaks.

  31. MATHEMATICAL CORRELATION. Step back, HE’S DOING MATH!

  32. PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT. Say that 5 times fast.
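For reference, the coefficient named on this slide in its usual sample form, for n paired samples (x_i, y_i) with means \bar{x}, \bar{y} and sample standard deviations s_x, s_y. The second expression is the same value computed from z-scores, and r always lies between -1 and 1:

```latex
\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{1}{n-1} \sum_{i=1}^{n} \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y},
\qquad
s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
```

s_y is defined the same way over the y_i. The 1/(n-1) factors in s_x, s_y and in front of the z-score sum cancel, which is why both forms agree.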

  33. [Chart: CPU vs Response Time]

  34. [Chart: CPU vs Response Time]

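  36. Pearson’s r in Ruby:

      # Paired samples, e.g. [CPU, response time]
      data = [
        [100, 0.7],
        [125, 0.5],
        [150, 1],
        [300, 2.1],
        [500, 3.4],
        [900, 6]
      ]

      x, y = data.transpose
      n = data.size
      # Float division: x holds only Integers, so x.reduce(:+) / n would truncate
      x_mean = x.reduce(:+) / n.to_f
      y_mean = y.reduce(:+) / n.to_f
      # Sample standard deviations; inject(0.0) starts the sum at zero
      # (without a seed, inject uses the first raw value as the starting sum)
      x_stddev = Math.sqrt(x.inject(0.0) { |sum, i| sum + (i - x_mean)**2 } / (n - 1))
      y_stddev = Math.sqrt(y.inject(0.0) { |sum, i| sum + (i - y_mean)**2 } / (n - 1))
      # Standard scores (z-scores) for each sample
      z_x = x.collect { |i| (i - x_mean) / x_stddev }
      z_y = y.collect { |i| (i - y_mean) / y_stddev }
      # Divide by n - 1 to match the sample standard deviations above
      pearsons = z_x.zip(z_y).collect { |a, b| a * b }.reduce(:+) / (n - 1)
      # => approximately 0.9975
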
  40. PEARSON • Close to an absolute value of 1 = probably correlated samples • Could it be applied to moving averages? • Could we pull it into a Graphite function? (Hackathon, anyone?)

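The moving-average question above can be tried with a short sketch. It assumes two equal-length series sampled at the same timestamps and Ruby 2.4+ (for `Array#sum`); `moving_average` and `pearson` are hypothetical helpers, and the CPU and response-time numbers are made up so that sign-flipping noise hides a shared upward trend:

```ruby
# Hypothetical helper: simple moving average. Returns one mean per full
# window, so the result has values.size - window + 1 elements.
def moving_average(values, window)
  values.each_cons(window).map { |w| w.sum.to_f / window }
end

# Hypothetical helper: Pearson's r for two equal-length numeric arrays.
def pearson(xs, ys)
  raise ArgumentError, "series must have the same length" unless xs.size == ys.size

  n = xs.size
  x_mean = xs.sum.to_f / n
  y_mean = ys.sum.to_f / n
  # Sum of cross-products and sums of squared deviations
  # (the 1/(n - 1) factors cancel, so they are left out)
  sxy = xs.zip(ys).sum { |x, y| (x - x_mean) * (y - y_mean) }
  sxx = xs.sum { |x| (x - x_mean)**2 }
  syy = ys.sum { |y| (y - y_mean)**2 }
  sxy / Math.sqrt(sxx * syy)
end

# Made-up samples: a shared upward trend plus noise that flips sign every
# sample, out of phase between the two series
cpu  = [25, 5, 45, 25, 65, 45, 85, 65]
resp = [0.5, 4.5, 2.5, 6.5, 4.5, 8.5, 6.5, 10.5]

raw      = pearson(cpu, resp)
smoothed = pearson(moving_average(cpu, 2), moving_average(resp, 2))
puts format("raw r = %.3f, smoothed r = %.3f", raw, smoothed)
```

With a 2-sample window the out-of-phase noise cancels exactly, so r rises from about 0.41 on the raw samples to 1.0 on the smoothed ones. The flip side is that overlapping windows share samples, so smoothing can also make loosely related series look more correlated than they are.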
  44. LIMITS OF MATHEMATICAL CORRELATION • Requires known inputs and assumptions • Suggestion of correlation, not proof • Needs a large amount of knowledge of the data set to make decisions

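One concrete case of "requires known inputs and assumptions": Pearson’s r only measures linear association. In this sketch (Ruby 2.4+, made-up numbers, hypothetical `pearson` helper), y is completely determined by x, yet r comes out as 0 because the relationship is a parabola rather than a line:

```ruby
# Hypothetical helper: Pearson's r for two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size
  x_mean = xs.sum.to_f / n
  y_mean = ys.sum.to_f / n
  sxy = xs.zip(ys).sum { |x, y| (x - x_mean) * (y - y_mean) }
  sxx = xs.sum { |x| (x - x_mean)**2 }
  syy = ys.sum { |y| (y - y_mean)**2 }
  sxy / Math.sqrt(sxx * syy)
end

xs = [-2, -1, 0, 1, 2]
ys = xs.map { |x| x * x } # ys depends entirely on xs, just not linearly

r = pearson(xs, ys)
puts "r = #{r.round(6)}"
```

A pair that is mathematically "uncorrelated" can still be tightly linked, which is part of why the coefficient needs someone who knows the data to interpret it.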
  45. TIME-BASED VISUAL CORRELATION. I can see it!

  46. A Graphite Story

  51. LIMITS OF VISUAL CORRELATION • Takes a good eye • Hard to see the signal through the noise • Doesn’t really account for domino events • Good for trends but not as much for events

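Domino events are one place where eyeballing graphs struggles: one metric moves and another follows a few samples later. One way to put a number on that, offered here as an illustration (lagged correlation), is to compute Pearson’s r between the leading series and a time-shifted copy of the following series, then look for the lag with the highest r. The series, `pearson`, and `lagged_pearson` are hypothetical; Ruby 2.4+ is assumed:

```ruby
# Hypothetical helper: Pearson's r for two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size
  x_mean = xs.sum.to_f / n
  y_mean = ys.sum.to_f / n
  sxy = xs.zip(ys).sum { |x, y| (x - x_mean) * (y - y_mean) }
  sxx = xs.sum { |x| (x - x_mean)**2 }
  syy = ys.sum { |y| (y - y_mean)**2 }
  sxy / Math.sqrt(sxx * syy)
end

# Hypothetical helper: correlate leader[t] with follower[t + lag], dropping
# samples that have no partner at this lag. Assumes equal-length series
# and 0 <= lag < leader.size - 1.
def lagged_pearson(leader, follower, lag)
  len = leader.size - lag
  pearson(leader.first(len), follower.drop(lag).first(len))
end

# Made-up per-minute series: queue depth spikes, response time follows
# two minutes later
queue_depth   = [1, 1, 5, 1, 1, 1, 6, 1, 1, 1]
response_time = [1, 1, 1, 1, 5, 1, 1, 1, 6, 1]

(0..3).each do |lag|
  puts format("lag %d: r = %.3f", lag, lagged_pearson(queue_depth, response_time, lag))
end

best_lag = (0..3).max_by { |lag| lagged_pearson(queue_depth, response_time, lag) }
puts "strongest relationship at a lag of #{best_lag} samples"
```

At lag 0 the spikes never line up, so the two series look unrelated; shifting by two samples lines them up exactly. A constant slice (no variance) would make r undefined (NaN), so real data would need a guard for that.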
  52. EMOTIONAL CORRELATION. There’s a disturbance in the Force.

  53. RASHOMONING

  54. RASHOMONING: Each person uses their unique knowledge of the situation to point out unique data points.

  59. LIMITS OF EMOTIONAL CORRELATION • Provides a trail, not an answer • Depends on having a team of people • Many ideas, needs a “judge” • HUMANS (hence Rashomoning)

  60. I’m sold, show me how!

  61. I don’t know exactly. But I have some ideas.

  62. TRYING TO MAKE THE DATA MORE VISIBLE

  63. HOTPOT: A conflagration of data.

  65. Hotpot = Chef, Sensu, Graphite, (Logstash). Simply align disparate sources of data TO VISUALLY CORRELATE.

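A minimal sketch of the "simply align" step, assuming every source (Graphite, Chef, Sensu, Logstash) can be reduced to timestamped events. The `Event` struct, its fields, the `align` helper, and the sample events are hypothetical, not Hotpot’s actual data model. Bucketing by a fixed interval puts things that happened around the same time next to each other:

```ruby
# Hypothetical common shape for data pulled from different tools.
Event = Struct.new(:source, :time, :description)

# Group events into fixed-width time buckets keyed by each bucket's start
# (epoch seconds), returned in chronological order.
def align(events, bucket_seconds)
  events
    .group_by { |e| e.time - (e.time % bucket_seconds) }
    .sort_by { |bucket_start, _| bucket_start }
    .to_h
end

# Made-up events; times are epoch seconds
events = [
  Event.new(:logstash, 1_364_400_130, "burst of 500s in the web logs"),
  Event.new(:graphite, 1_364_400_010, "response time above 2s"),
  Event.new(:chef,     1_364_400_030, "chef-client run on app3"),
  Event.new(:sensu,    1_364_400_055, "HTTP check critical on app3")
]

align(events, 60).each do |bucket_start, group|
  line = group.map { |e| "#{e.source}: #{e.description}" }.join(" | ")
  puts "#{Time.at(bucket_start).utc} #{line}"
end
```

Fixed buckets are the simplest choice; an event just before a bucket boundary lands apart from one just after it, so a sliding window would be the next refinement.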
  66. Create relationships for alerts and notifications.
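One simple way to create relationships between alerts is to chain any alerts that fire within a short window of each other into one group, so a burst of notifications reads as a single incident. The `Alert` struct, the check and node names, and the 120-second window are assumptions for illustration (Ruby 2.2+ for `slice_when`):

```ruby
# Hypothetical alert record; times are epoch seconds.
Alert = Struct.new(:check, :node, :time)

# Sort alerts by time and start a new group whenever the gap to the
# previous alert exceeds `window` seconds.
def related_alerts(alerts, window)
  alerts.sort_by(&:time).slice_when { |a, b| b.time - a.time > window }.to_a
end

alerts = [
  Alert.new("ntp_drift",         "app4", 5_000),
  Alert.new("disk_usage",        "db1",  1_000),
  Alert.new("mysql_replication", "db2",  1_030),
  Alert.new("http_5xx",          "app1", 1_090)
]

related_alerts(alerts, 120).each_with_index do |group, i|
  puts "incident #{i + 1}: " + group.map { |a| "#{a.check}@#{a.node}" }.join(", ")
end
```

Chaining on gaps means a long, steady trickle of alerts can merge into one group; capping the total duration of a group is one way to limit that.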

  68. USE PEARSON’S to pull out potentially related data. Math to filter out noise.

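A rough sketch of using Pearson’s r as a noise filter: score every candidate series against the metric under investigation and keep only the strongly correlated ones. `pearson`, `related_series`, the 0.8 threshold, and the metric names and values are all hypothetical (Ruby 2.4+ for `Array#sum`):

```ruby
# Hypothetical helper: Pearson's r for two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size
  x_mean = xs.sum.to_f / n
  y_mean = ys.sum.to_f / n
  sxy = xs.zip(ys).sum { |x, y| (x - x_mean) * (y - y_mean) }
  sxx = xs.sum { |x| (x - x_mean)**2 }
  syy = ys.sum { |y| (y - y_mean)**2 }
  sxy / Math.sqrt(sxx * syy)
end

# Keep candidate series whose |r| against the target meets the threshold,
# strongest first. A negative r (one rises as the other falls) counts too.
def related_series(target, candidates, threshold: 0.8)
  candidates
    .map { |name, values| [name, pearson(target, values)] }
    .select { |_, r| r.abs >= threshold }
    .sort_by { |_, r| -r.abs }
end

response_time = [0.2, 0.3, 0.5, 0.9, 1.4, 2.0]
candidates = {
  "app.cpu"         => [20, 30, 45, 70, 85, 95],
  "db.slow_queries" => [1, 2, 4, 9, 15, 22],
  "cache.hit_rate"  => [0.99, 0.97, 0.93, 0.85, 0.74, 0.60],
  "ntp.offset_ms"   => [3, -2, 4, -1, 2, -3]
}

related_series(response_time, candidates).each do |name, r|
  puts format("%-16s r = %+.3f", name, r)
end
```

Here the NTP offset series (|r| around 0.47) is dropped, and the cache hit rate is kept despite its negative r because it falls as response time rises. With many candidate series, some will clear any threshold by chance, so the result is a shortlist to look at rather than a verdict.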
  69. COHORT ANALYSIS for processes/nodes. Have the ability to easily divide data sets by “cohorts”.

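A sketch of dividing one metric by cohorts, assuming each sample carries attributes such as the node’s role and data center. `Sample`, `cohort_means`, and the numbers are hypothetical (Ruby 2.4+ for `transform_values` and `Array#sum`). The block passed in decides the cohort, so the same data can be cut several ways:

```ruby
# Hypothetical per-node measurement with attributes to slice on.
Sample = Struct.new(:node, :role, :datacenter, :response_ms)

# Mean response time per cohort; the block picks each sample's cohort.
def cohort_means(samples, &cohort)
  samples.group_by(&cohort).transform_values do |group|
    group.sum(&:response_ms).fdiv(group.size)
  end
end

samples = [
  Sample.new("app1", "app", "east", 120),
  Sample.new("app2", "app", "west", 130),
  Sample.new("app3", "app", "west", 480),
  Sample.new("db1",  "db",  "east", 15),
  Sample.new("db2",  "db",  "west", 18)
]

p cohort_means(samples, &:role)
p cohort_means(samples, &:datacenter)
```

Cohorts can be combined by returning an array from the block, for example `cohort_means(samples) { |s| [s.role, s.datacenter] }`.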
  70. “Node notes”. Document everything. Treat personal and institutional knowledge as data.

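Treating notes as data can be as simple as storing them under the same node names and timestamps as the metrics, so they can be pulled up (or overlaid on a graph) for the window being investigated. A hypothetical sketch; `NodeNote`, `notes_for`, and the notes themselves are made up:

```ruby
require "time"

# Hypothetical note record: who wrote what about which node, and when.
NodeNote = Struct.new(:node, :time, :author, :text)

# Notes for one node inside a time range, oldest first.
def notes_for(notes, node, range)
  notes.select { |n| n.node == node && range.cover?(n.time) }.sort_by(&:time)
end

notes = [
  NodeNote.new("db2",  Time.parse("2013-03-27T10:00:00Z"), "ops", "replication lag after nightly backup"),
  NodeNote.new("db2",  Time.parse("2013-03-20T14:05:00Z"), "ops", "replaced failing disk"),
  NodeNote.new("app3", Time.parse("2013-03-27T09:30:00Z"), "dev", "debug logging left on")
]

window = Time.parse("2013-03-27T00:00:00Z")...Time.parse("2013-03-28T00:00:00Z")
notes_for(notes, "db2", window).each do |n|
  puts "#{n.time.utc.iso8601} #{n.node} (#{n.author}): #{n.text}"
end
```

Keying notes the same way as metrics is the design choice that matters here: it lets the same time-window query that pulls graphs also pull the human context.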
  71. MAKE EMOTIONAL CORRELATION LESS EMO. By making more data available to everyone.

  72. So if that’s just level 2, what’s level 3?

  74. TAKE the correlations and let the machine turn them into decisions.

  75. NOT YET. Let’s figure out level 2 first.

  76. github.com/quirkey github.com/paperlesspost twitter.com/aq twitter.com/paperlessdev quirkey.com paperlesspost.com THANKS!