Correlation: The Next Frontier

My talk from #monitorama 2013, with ideas about how to apply different types of correlation to our data.

Aaron Quint

March 28, 2013

Transcript

  1. CORRELATION:
    THE NEXT FRONTIER
    monitorama 2013 / boston / @aq

  2. CTO of

  3. Chief Taco Officer
    CTO of

  6. A little bit of @aq

  7. A little bit of @aq
    • Expert Eater

  8. A little bit of @aq
    • Expert Eater
    • Experienced Ruby and JS Developer

  9. A little bit of @aq
    • Expert Eater
    • Experienced Ruby and JS Developer
    • Growing Student of Operations

  10. A little bit of @aq
    • Expert Eater
    • Experienced Ruby and JS Developer
    • Growing Student of Operations
    • Beginner Distributed Systems
    Maintainer

  11. Give each other dap.
    WE did it

  12. OPS DONE.
    WE have DATA Y’ALL

  14. Uhhh
    SO NO MORE TEARS,
    right?

  16. THE LEVELS OF
    MONITORING
    NIRVANA

  18. PURE DATA

  19. PURE DATA
    BASIC INFERENCES
    AND CORRELATIONS

  20. PURE DATA
    BASIC INFERENCES
    AND CORRELATIONS
    THE
    FUCKING
    MATRIX

  21. PURE DATA
    BASIC INFERENCES
    AND CORRELATIONS
    THE
    FUCKING
    MATRIX

  22. PURE DATA
    BASIC INFERENCES
    AND CORRELATIONS
    THE
    FUCKING
    MATRIX
    whoa

  24. PURE DATA

  25. PURE DATA
    BASIC INFERENCES
    AND CORRELATIONS

  26. PURE DATA
    BASIC INFERENCES
    AND CORRELATIONS
    PREDICTIVE
    AND DIRECT
    RELATIONSHIPS

  27. Aligning the data.
    CORRELATION

  28. CORRELATION DOES NOT
    IMPLY CAUSATION
    Except when it does.

  29. The marshmallow test

  30. correlation can
    narrow our work
    And let us get back to Shaving Yaks.

  31. Step back, HE’S DOING MATH!
    MATHEMATICAL
    CORRELATION

  32. Say that 5 times fast.
    PEARSON Product
    moment correlation
    coefficient
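
For reference, one common way to write the sample form of this coefficient for n paired samples (x_i, y_i) is sketched here. The symbols (r, the means, and the sample standard deviations s_x and s_y) are standard textbook notation, not something taken from the slides:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{1}{n-1} \sum_{i=1}^{n} \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y}
```

The second form is the "average product of z-scores" view: standardize both series, multiply pairwise, and divide the sum by n - 1.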

  33. [Chart: CPU vs Response Time; axis ticks 0 to 6 and 0 to 900]

  34. [Chart: CPU vs Response Time; axis ticks 0 to 6 and 0 to 900]

  36. data = [
      [100, 0.7],
      [125, 0.5],
      [150, 1],
      [300, 2.1],
      [500, 3.4],
      [900, 6]
    ]
    x, y = data.transpose
    n = data.size
    # to_f avoids integer division when every sample is an Integer
    x_mean = x.reduce(:+) / n.to_f
    y_mean = y.reduce(:+) / n.to_f
    # sample standard deviation; inject(0.0) starts the sum at zero instead of at the first sample
    x_stddev = Math.sqrt(x.inject(0.0) {|sum, i| sum + (i - x_mean)**2 } / (n - 1))
    y_stddev = Math.sqrt(y.inject(0.0) {|sum, i| sum + (i - y_mean)**2 } / (n - 1))
    # standardize each sample to a z-score
    z_x = x.collect {|i| (i - x_mean) / x_stddev }
    z_y = y.collect {|i| (i - y_mean) / y_stddev }
    # divide by n - 1 to match the sample standard deviation used above
    pearsons = z_x.zip(z_y).collect {|zx, zy| zx * zy }.reduce(:+) / (n - 1)
    # => approximately 0.997

  37. PEARSON

  38. PEARSON
    • Close to absolute 1 = probably
    correlated samples

  39. PEARSON
    • Close to absolute 1 = probably
    correlated samples
    • Could be applied to moving
    averages?

  40. PEARSON
    • Close to absolute 1 = probably
    correlated samples
    • Could be applied to moving
    averages?
    • Could we pull it into a graphite
    function? (Hackathon anyone?)
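
One way the moving-average idea might look in practice is sketched here. This is only an illustration: the helper names `pearson` and `moving_average`, the window size of 3, and the sample values are all assumptions made up for the example, and it is plain Ruby rather than an actual Graphite function.

```ruby
# Sample Pearson correlation coefficient of two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size.to_f
  x_mean = xs.reduce(:+) / n
  y_mean = ys.reduce(:+) / n
  cov   = xs.zip(ys).reduce(0.0) { |s, (x, y)| s + (x - x_mean) * (y - y_mean) }
  x_var = xs.reduce(0.0) { |s, x| s + (x - x_mean)**2 }
  y_var = ys.reduce(0.0) { |s, y| s + (y - y_mean)**2 }
  cov / Math.sqrt(x_var * y_var)
end

# Simple moving average: the mean of every run of `window` consecutive values.
def moving_average(values, window)
  values.each_cons(window).map { |w| w.reduce(:+) / window.to_f }
end

# Hypothetical, noisy samples taken at the same instants.
cpu      = [100, 140, 90, 310, 280, 520, 480, 900]
response = [0.7, 0.4, 1.1, 1.9, 2.3, 3.1, 3.6, 6.0]

raw_r      = pearson(cpu, response)
smoothed_r = pearson(moving_average(cpu, 3), moving_average(response, 3))
puts format("raw r = %.3f, smoothed r = %.3f", raw_r, smoothed_r)
```

Smoothing both series with the same window keeps them aligned, so the coefficient compares trends rather than sample-to-sample jitter.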

  41. LIMITS OF MATHEMATICAL CORRELATION

  42. LIMITS OF MATHEMATICAL CORRELATION
    • Requires known inputs and
    assumptions

  43. LIMITS OF MATHEMATICAL CORRELATION
    • Requires known inputs and
    assumptions
    • Suggestion of correlation, not proof

  44. LIMITS OF MATHEMATICAL CORRELATION
    • Requires known inputs and
    assumptions
    • Suggestion of correlation, not proof
    • Needs a large amount of knowledge
    of the data set to make decisions
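
A small sketch of why the "known inputs and assumptions" point matters: Pearson only measures linear association, so a series that is completely determined by another can still score zero. The data here is synthetic and the `pearson` helper is an illustrative assumption, not code from the talk.

```ruby
# Sample Pearson correlation coefficient of two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size.to_f
  x_mean = xs.reduce(:+) / n
  y_mean = ys.reduce(:+) / n
  cov   = xs.zip(ys).reduce(0.0) { |s, (x, y)| s + (x - x_mean) * (y - y_mean) }
  x_var = xs.reduce(0.0) { |s, x| s + (x - x_mean)**2 }
  y_var = ys.reduce(0.0) { |s, y| s + (y - y_mean)**2 }
  cov / Math.sqrt(x_var * y_var)
end

xs        = (-5..5).to_a
linear    = xs.map { |x| 2 * x + 1 }  # straight-line dependence on x
quadratic = xs.map { |x| x**2 }       # fully determined by x, but not linearly

linear_r    = pearson(xs, linear)
quadratic_r = pearson(xs, quadratic)
puts format("linear r = %.3f, quadratic r = %.3f", linear_r, quadratic_r)
```

The quadratic series depends entirely on x, yet its coefficient comes out at zero because the symmetric halves cancel. A low score is therefore not evidence that two metrics are unrelated.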

  45. I can see it!
    TIME BASED VISUAL
    CORRELATION

  46. A Graphite
    Story

  47. LIMITS OF VISUAL CORRELATION

  48. LIMITS OF VISUAL CORRELATION
    • Takes a good eye

  49. LIMITS OF VISUAL CORRELATION
    • Takes a good eye
    • Hard to see the signal through the
    noise

  50. LIMITS OF VISUAL CORRELATION
    • Takes a good eye
    • Hard to see the signal through the
    noise
    • Doesn’t really account for domino
    events

  51. LIMITS OF VISUAL CORRELATION
    • Takes a good eye
    • Hard to see the signal through the
    noise
    • Doesn’t really account for domino
    events
    • Good for trends but not as much for
    events
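
Domino events, where one metric reacts some time after another, are hard to spot when two graphs are simply overlaid. One possible complement, sketched here under assumptions, is to shift one series by a few samples and see which lag gives the strongest coefficient. The metric names, the values, and the helpers `lagged_pearson` and `best_lag` are hypothetical.

```ruby
# Sample Pearson correlation coefficient of two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size.to_f
  x_mean = xs.reduce(:+) / n
  y_mean = ys.reduce(:+) / n
  cov   = xs.zip(ys).reduce(0.0) { |s, (x, y)| s + (x - x_mean) * (y - y_mean) }
  x_var = xs.reduce(0.0) { |s, x| s + (x - x_mean)**2 }
  y_var = ys.reduce(0.0) { |s, y| s + (y - y_mean)**2 }
  cov / Math.sqrt(x_var * y_var)
end

# Correlate leader[t] with follower[t + lag] by trimming both series.
def lagged_pearson(leader, follower, lag)
  return pearson(leader, follower) if lag.zero?
  pearson(leader[0...-lag], follower[lag..-1])
end

# The lag (in samples) with the largest absolute coefficient.
def best_lag(leader, follower, max_lag)
  (0..max_lag).max_by { |lag| lagged_pearson(leader, follower, lag).abs }
end

# Hypothetical series: error spikes trail queue spikes by two samples.
queue_depth = [1, 1, 9, 2, 1, 1, 8, 1, 1, 1]
error_rate  = [0, 0, 0, 0, 7, 1, 0, 0, 6, 0]

lag = best_lag(queue_depth, error_rate, 3)
puts "strongest correlation at a lag of #{lag} samples"
```

A strong coefficient at a non-zero lag is still only a hint about ordering, not proof that the first event caused the second.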

  52. There’s a disturbance in the force.
    EMOTIONAL
    CORRELATION

  53. RASHOMONING

  54. Each person uses their unique knowledge
    of the situation to point out unique data
    points.
    RASHOMONING

  55. LIMITS OF EMOTIONAL CORRELATION

  56. LIMITS OF EMOTIONAL CORRELATION
    • Provides a trail not an answer

  57. LIMITS OF EMOTIONAL CORRELATION
    • Provides a trail not an answer
    • Depends on having a team of
    people

  58. LIMITS OF EMOTIONAL CORRELATION
    • Provides a trail not an answer
    • Depends on having a team of
    people
    • Many ideas, needs a “judge”

  59. LIMITS OF EMOTIONAL CORRELATION
    • Provides a trail not an answer
    • Depends on having a team of
    people
    • Many ideas, needs a “judge”
    • HUMANS (Hence Rashomoning)

  60. I’m sold, show me how!

  61. I don’t know exactly
    But I have some ideas.

  62. TRYING TO MAKE THE
    DATA MORE VISIBLE

  63. A conflagration of data
    HOTPOT

  65. Hotpot = Chef, Sensu, Graphite, (Logstash)
    Simply align disparate
    sources of data TO
    VISUALLY CORRELATE
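
A rough sketch of what "aligning disparate sources" could mean at the data level: round every timestamp down to a shared bucket so that samples from different tools land in the same row. The source names, sample values, 60-second bucket, and the `align` helper are assumptions for illustration. This is not Hotpot's actual code and it does not call any Chef, Sensu, or Graphite API.

```ruby
# Hypothetical raw samples: [unix_timestamp, value] pairs from different tools.
SOURCES = {
  "graphite.cpu"       => [[1000, 40], [1061, 55], [1122, 90]],
  "sensu.check_failed" => [[1065, 1]],
  "chef.converge"      => [[1003, 1], [1119, 1]]
}

# Round each timestamp down to the start of its bucket and merge all sources
# into one row per bucket, so every series shares the same time axis.
def align(sources, bucket_seconds)
  rows = Hash.new { |hash, key| hash[key] = {} }
  sources.each do |name, samples|
    samples.each do |timestamp, value|
      bucket = timestamp - (timestamp % bucket_seconds)
      rows[bucket][name] = value # the last sample in a bucket wins
    end
  end
  rows.sort.to_h
end

aligned = align(SOURCES, 60)
aligned.each { |bucket, values| puts "#{bucket}: #{values}" }
```

Once rows share a time axis, a Chef run or a failed check can be drawn as a marker on the same graph as the metric it may have affected.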

  66. create relationships
    for alerts and
    notifications

  68. Math to filter out noise.
    USE PEARSON’S to pull
    out potentially
    related data
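
As a sketch of using the coefficient as a noise filter, the snippet here scores every pair of metrics and keeps only the pairs above a threshold. The metric names, the `disk_free` values, the 0.8 cutoff, and the `related_pairs` helper are assumptions for the example. The CPU and response-time values are the ones from the earlier slide.

```ruby
# Sample Pearson correlation coefficient of two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size.to_f
  x_mean = xs.reduce(:+) / n
  y_mean = ys.reduce(:+) / n
  cov   = xs.zip(ys).reduce(0.0) { |s, (x, y)| s + (x - x_mean) * (y - y_mean) }
  x_var = xs.reduce(0.0) { |s, x| s + (x - x_mean)**2 }
  y_var = ys.reduce(0.0) { |s, y| s + (y - y_mean)**2 }
  cov / Math.sqrt(x_var * y_var)
end

METRICS = {
  "cpu"           => [100, 125, 150, 300, 500, 900],
  "response_time" => [0.7, 0.5, 1.0, 2.1, 3.4, 6.0],
  "disk_free"     => [70, 71, 69, 70, 72, 70] # hypothetical unrelated metric
}

# Score every pair of metrics and keep those whose |r| meets the threshold,
# strongest first.
def related_pairs(metrics, threshold)
  metrics.keys.combination(2)
         .map { |a, b| [a, b, pearson(metrics[a], metrics[b])] }
         .select { |_, _, r| r.abs >= threshold }
         .sort_by { |_, _, r| -r.abs }
end

pairs = related_pairs(METRICS, 0.8)
pairs.each { |a, b, r| puts format("%s ~ %s (r = %.3f)", a, b, r) }
```

The number of pairs grows quadratically with the number of metrics, so a real system would probably need to narrow the candidates first, for example by node or by time window.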

  69. Have the ability to easily divide datasets by
    “cohorts”
    cohort analysis for
    processes/nodes
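
Cohort analysis for nodes could be as simple as grouping them by an attribute and comparing a metric per group. This sketch assumes a hypothetical inventory where each node carries a role, a Ruby version, and a p95 latency. The field names, the values, and the `cohort_means` helper are invented for illustration.

```ruby
# Hypothetical node inventory with one latency metric per node.
NODES = [
  { name: "app1",    role: "app",    ruby: "1.9.3", p95_ms: 120 },
  { name: "app2",    role: "app",    ruby: "2.0.0", p95_ms: 90 },
  { name: "worker1", role: "worker", ruby: "1.9.3", p95_ms: 130 },
  { name: "worker2", role: "worker", ruby: "2.0.0", p95_ms: 100 }
]

# Group nodes by an attribute (the cohort key) and average a metric per cohort.
def cohort_means(nodes, key, metric)
  nodes.group_by { |node| node[key] }.map do |cohort, members|
    values = members.map { |member| member[metric] }
    [cohort, values.reduce(:+) / values.size.to_f]
  end.to_h
end

by_ruby = cohort_means(NODES, :ruby, :p95_ms)
by_role = cohort_means(NODES, :role, :p95_ms)
puts "by ruby version: #{by_ruby}"
puts "by role: #{by_role}"
```

Slicing the same nodes along different keys makes it easier to see whether a slowdown follows a role, a runtime version, or neither.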

  70. “Node notes”. Document everything.
    Treat personal and
    institutional
    knowledge as data

  71. Make emotional
    correlation less EMO
    By making more data available to everyone.

  72. So if that’s just level 2,
    what’s level 3?

  74. TAKE the correlations
    and let the machine
    turn them into
    decisions

  75. NOT YET.
    Let’s figure out level 2 first.

  76. github.com/quirkey
    github.com/paperlesspost
    twitter.com/aq
    twitter.com/paperlessdev
    quirkey.com
    paperlesspost.com
    THANKS!
