Adopting Stream Processing for Instrumentation

In the midst of building a multi-datacenter, multi-tenant instrumentation and visibility system, we arrived at stream processing as an alternative to storing, forwarding, and post-processing metrics as traditional systems do. However, the streaming paradigm is alien to many engineers and sysadmins who are used to working with "wall-of-graphs" dashboards, predefined aggregates, and point-and-click alert configuration.

Taking inspiration from REPLs, literate programming, and DevOps practices, we've designed an interface to our instrumentation system that focuses on interactive feedback, note-taking, and team communication. An engineer can both experiment with new flows at low risk, and codify longer-term practices into runbooks that embed live visualizations of instrumentation data. As a result, we can start to free our users from understanding the mechanics of the stream processor and instead focus on the domain of instrumentation.

In this talk, we will discuss how the interface described above works, how the stream processor manages flows on behalf of the user, and some tradeoffs we have encountered while preparing the system to roll out into our organization.

Sean Cribbs

June 27, 2017

Transcript

  1. ADOPTING STREAM PROCESSING FOR INSTRUMENTATION

    MAKING A LAZY RIVER, NOT A WHITEWATER RAPIDS. SEAN CRIBBS, SENIOR PRINCIPAL ENGINEER. All photos are my own unless attributed. Permission is granted to Comcast to use these photos only in this presentation.
  2. ON INSTRUMENTATION

  3. HOW ARE WE INSTRUMENTING?

    • Wide variety of products
  4. super important metric

  5. super important metric (label repeated 4 times)

  6. super important (label repeated 12 times)

    “operational visibility”
  7. AREAS FOR IMPROVEMENT

    (background labels: "super important", repeated)
  8. AREAS FOR IMPROVEMENT

    ✘ Human anomaly detector
    (background labels: "super important", repeated)
  9. AREAS FOR IMPROVEMENT

    ✘ Human anomaly detector
    ✘ Correlation is awkward
    (background labels: "super important", repeated)
  10. AREAS FOR IMPROVEMENT

    ✘ Human anomaly detector
    ✘ Correlation is awkward
    ✘ Copious data, low fidelity
    (background labels: "super important", repeated)
  11. HOW ARE WE INSTRUMENTING?

    • Wide variety of products
  12. HOW ARE WE INSTRUMENTING?

    • Wide variety of products
    • Log files
  13. HOW ARE WE INSTRUMENTING?

    • Wide variety of products
    • Log files
    • …more log files
  14. None
  15. index.js:65

    500 Server Error
  16. index.js:65

    it’s broke fam
  17. AREAS FOR IMPROVEMENT

  18. AREAS FOR IMPROVEMENT

    ✘ Developers write bad logs
  19. AREAS FOR IMPROVEMENT

    ✘ Developers write bad logs
    ✘ Logs lack context
  20. AREAS FOR IMPROVEMENT

    ✘ Developers write bad logs
    ✘ Logs lack context
    ✘ Text logs lack fidelity
  21. WHY STREAM PROCESSING?

  22. PROGRAMMABILITY GIVES

    FULL FLEXIBILITY OVER MONITORING BEHAVIOR
  23. HIGHER FIDELITY DATA GIVES A

    RICH SET OF DIMENSIONS
  24. INCREMENTAL PROCESSING GIVES

    PERFORMANCE AND SCALE
  25. INCREMENTAL PROCESSING GIVES

    PERFORMANCE AND SCALE. http://riemann.io
  26. NETWORK MONITORING AS A STREAMING ANALYTICS PROBLEM

    GUPTA, ET AL, HOTNETS ’16. “Sonata can capture 95% of all traffic pertaining to the query, while reducing the overall data rate by a factor of about 400 and the number of required counters by four orders of magnitude.”
  27. CHALLENGES

  28. CHALLENGES

    ✘ “You’re asking me to program my monitoring system?”
  29. CHALLENGES

    ✘ “You’re asking me to program my monitoring system?”
    ✘ New paradigm, new concepts: windows, triggers, partitioning, etc
  30. CHALLENGES

    ✘ “You’re asking me to program my monitoring system?”
    ✘ New paradigm, new concepts: windows, triggers, partitioning, etc
    ✘ Our goal is not to make Hadoop easier/better/faster
  31. DRAWING INSPIRATION

  32. “Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”

    – DONALD KNUTH, 1983. http://roxygen.org/knuth-literate-programming.pdf
  33. JUPYTER, MATHEMATICA

    INTERACTIVE REPLS WITH HISTORY AND PROSE. https://www.wolfram.com/mathematica/ http://jupyter.org/
  34. EVE

    DATA-FOCUSED DEVELOPMENT, AUTONOMOUS CODE BLOCKS. http://play.witheve.com/#/examples/bar-graph.eve
  35. LITERATE PROGRAMMING BENEFITS

    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  36. LITERATE PROGRAMMING BENEFITS

    • "Literate programming forces you to consider a human audience."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  37. LITERATE PROGRAMMING BENEFITS

    • "Literate programming forces you to consider a human audience."
    • "The human brain is wired to engage with and remember stories."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  38. LITERATE PROGRAMMING BENEFITS

    • "Literate programming forces you to consider a human audience."
    • "The human brain is wired to engage with and remember stories."
    • "...literate programming encourages the programmer to arrange [programs] in a way that makes narrative sense."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  39. LITERATE PROGRAMMING BENEFITS

    • "Literate programming forces you to consider a human audience."
    • "The human brain is wired to engage with and remember stories."
    • "...literate programming encourages the programmer to arrange [programs] in a way that makes narrative sense."
    • "...you don't really understand something until you explain it to someone else."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  40. LITERATE DASHBOARDS, EXECUTABLE RUNBOOKS

    THE AHA! MOMENT
  41. CREATING LITERATE DASHBOARDS

    OUR PROTOTYPE UI
  42. UNDER THE HOOD

    Photo by Comcast
  43. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS. Emoji provided free by Emoji One
  44. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT. Emoji provided free by Emoji One
  45. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT. Emoji provided free by Emoji One
  46. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES. Emoji provided free by Emoji One
  47. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES. Emoji provided free by Emoji One
  48. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, “MULTI-TENANT” “MULTI-REGION” “HIGHLY-AVAILABLE” “REAL-TIME” “STREAMING” “PLATFORM”, EXTERNAL SERVICES. Emoji provided free by Emoji One
  49. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, “MULTI-TENANT” “MULTI-REGION” “HIGHLY-AVAILABLE” “REAL-TIME” “STREAMING” “PLATFORM”, EXTERNAL SERVICES. Emoji provided free by Emoji One
  50. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES. Emoji provided free by Emoji One
  51. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES. Emoji provided free by Emoji One
  52. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER. Emoji provided free by Emoji One
  53. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER, API & CONTROL. Emoji provided free by Emoji One
  54. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER, API & CONTROL, INFRASTRUCTURE SERVICE. Emoji provided free by Emoji One
  55. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER, API & CONTROL, INFRASTRUCTURE SERVICE, PROCESSOR. Emoji provided free by Emoji One
  56. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER, API & CONTROL, INFRASTRUCTURE SERVICE, PROCESSOR. Emoji provided free by Emoji One
  57. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER, API & CONTROL, INFRASTRUCTURE SERVICE, PROCESSOR, CHANNELS. Emoji provided free by Emoji One
  58. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER, API & CONTROL, INFRASTRUCTURE SERVICE, PROCESSOR, CHANNELS. Emoji provided free by Emoji One
    ➡ Pipeline execution
  59. OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES, BROWSER, API & CONTROL, INFRASTRUCTURE SERVICE, PROCESSOR, CHANNELS. Emoji provided free by Emoji One
    ➡ Pipeline execution
    ➡ Agent-to-platform discovery
  60. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER
  61. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand
  62. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events
  63. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events; demand
  64. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events; demand

    File.stream!("/tmp/words.txt")
  65. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events; demand

    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
  66. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events; demand

    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
  67. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events; demand

    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
  68. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events; demand

    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
    |> Flow.reduce(&Map.new/0, &(Map.update(&2, &1, 1, fn c -> c + 1 end)))
  69. ELIXIR GENSTAGE & FLOW

    Diagram labels: PRODUCER, PRODUCER/CONSUMER, PRODUCER/CONSUMER, CONSUMER; subscribe with demand; events; demand

    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
    |> Flow.reduce(&Map.new/0, &(Map.update(&2, &1, 1, fn c -> c + 1 end)))
    |> Enum.to_list()
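The word-count pipeline on slide 69 partitions words before reducing them, so that every occurrence of a given word is counted by the same stage. The sketch below imitates that idea using only the Elixir standard library, without Flow or GenStage. The module name `WordCountSketch`, the `partitions` argument, and the hash-based routing are assumptions for illustration; this is not how Flow is implemented internally.

```elixir
defmodule WordCountSketch do
  # Hypothetical module: a stdlib-only imitation of "partition, then reduce".
  def count(lines, partitions \\ 4) do
    lines
    |> Enum.flat_map(&String.split/1)
    # Route each word by hash so all occurrences of a word share one partition.
    |> Enum.group_by(&:erlang.phash2(&1, partitions))
    |> Map.values()
    # Reduce each partition independently, one task per partition.
    |> Task.async_stream(fn words ->
      Enum.reduce(words, %{}, fn word, acc ->
        Map.update(acc, word, 1, &(&1 + 1))
      end)
    end)
    # Partitions hold disjoint keys, so a plain merge cannot lose counts.
    |> Enum.reduce(%{}, fn {:ok, counts}, acc -> Map.merge(acc, counts) end)
  end
end
```

Calling `WordCountSketch.count(File.stream!("/tmp/words.txt"))` would mirror the input used on the slide. Unlike this sketch, Flow streams events between stages on demand rather than materializing the whole list first.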
  70. METRICS-FOCUSED STREAM COMBINATORS

    SEEKING DOMAIN-SPECIFIC ABSTRACTIONS
  71. METRICS-FOCUSED STREAM COMBINATORS

    SEEKING DOMAIN-SPECIFIC ABSTRACTIONS

    where(type: ["disk", "free", "percent"])
  72. METRICS-FOCUSED STREAM COMBINATORS

    SEEKING DOMAIN-SPECIFIC ABSTRACTIONS

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount])
  73. METRICS-FOCUSED STREAM COMBINATORS

    SEEKING DOMAIN-SPECIFIC ABSTRACTIONS

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma
  74. METRICS-FOCUSED STREAM COMBINATORS

    SEEKING DOMAIN-SPECIFIC ABSTRACTIONS

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0)
  75. METRICS-FOCUSED STREAM COMBINATORS

    SEEKING DOMAIN-SPECIFIC ABSTRACTIONS

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0) |> forward(:on_call_alert)
  76. METRICS-FOCUSED STREAM COMBINATORS

    SEEKING DOMAIN-SPECIFIC ABSTRACTIONS

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0) |> forward(:on_call_alert) |> draw(:table)
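The slides name the combinators `where`, `by`, `ewma`, and `threshold` but do not show how they behave. The following is a minimal sketch of one plausible reading, written over lazy standard-library streams of plain maps. Everything here is an assumption: the module name `CombinatorSketch`, the event shape (`:type`, `:host`, `:mount`, `:value`), the `alpha` smoothing parameter, and the tagging scheme used to carry the group key. `forward` and `draw` are omitted because their effects depend on the platform.

```elixir
defmodule CombinatorSketch do
  # Hypothetical stand-ins for the slide's combinators; events are plain maps.

  # Keep only events whose :type equals the given path.
  def where(events, type: type), do: Stream.filter(events, &(&1.type == type))

  # Tag each event with its group key so later stages can keep per-group state.
  def by(events, keys), do: Stream.map(events, &{Map.take(&1, keys), &1})

  # Exponentially weighted moving average, tracked separately per group.
  # alpha is an assumed smoothing factor; the first event seeds the average.
  def ewma(tagged, alpha \\ 0.5) do
    Stream.transform(tagged, %{}, fn {key, event}, state ->
      avg =
        case state do
          %{^key => prev} -> alpha * event.value + (1 - alpha) * prev
          _ -> event.value
        end

      {[{key, %{event | value: avg}}], Map.put(state, key, avg)}
    end)
  end

  # Pass through only events whose smoothed value is under the limit.
  def threshold(tagged, below: limit) do
    Stream.filter(tagged, fn {_key, event} -> event.value < limit end)
  end
end
```

With these definitions imported, `events |> where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma() |> threshold(below: 10.0)` has the same shape as the pipeline on slide 74 and yields one smoothed event per group each time that group's average drops below the limit.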
  77. AUTOMATIC WRITE-ATTENUATION BY SEGMENTING PIPELINES FOR RE-USE

  78. AUTOMATIC WRITE-ATTENUATION BY SEGMENTING PIPELINES FOR RE-USE

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0) |> forward(:on_call_alert) |> draw(:table)
  79. AUTOMATIC WRITE-ATTENUATION BY SEGMENTING PIPELINES FOR RE-USE

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0) |> forward(:on_call_alert) |> draw(:table)
  80. AUTOMATIC WRITE-ATTENUATION BY SEGMENTING PIPELINES FOR RE-USE

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0) |> forward(:on_call_alert) |> draw(:table)
    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> history(minutes: 30)
  81. AUTOMATIC WRITE-ATTENUATION BY SEGMENTING PIPELINES FOR RE-USE

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0) |> forward(:on_call_alert) |> draw(:table)
    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> history(minutes: 30)
  82. AUTOMATIC WRITE-ATTENUATION BY SEGMENTING PIPELINES FOR RE-USE

    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> threshold(below: 10.0) |> forward(:on_call_alert) |> draw(:table)
    where(type: ["disk", "free", "percent"]) |> by([:host, :mount]) |> ewma |> history(minutes: 30)
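The two pipelines on slide 80 begin with the same three stages (`where`, `by`, `ewma`) and differ only afterwards. One plausible reading of "segmenting for re-use" is that the shared prefix runs once and its output feeds both tails. The sketch below shows only the prefix-finding step, under the assumption that a pipeline can be represented as a list of `{stage, args}` pairs. The module name `SegmentSketch` and that representation are hypothetical and do not come from the talk.

```elixir
defmodule SegmentSketch do
  # Hypothetical representation: a pipeline is a keyword list of {stage, args}.
  # Returns {shared_prefix, rest_of_a, rest_of_b}.
  def split_shared(a, b), do: split_shared(a, b, [])

  # Both pipelines start with an identical stage: move it to the shared prefix.
  defp split_shared([stage | rest_a], [stage | rest_b], shared),
    do: split_shared(rest_a, rest_b, [stage | shared])

  # The pipelines diverge (or one ends): whatever remains is pipeline-specific.
  defp split_shared(rest_a, rest_b, shared),
    do: {Enum.reverse(shared), rest_a, rest_b}
end
```

Applied to the two pipelines from the slide, the shared prefix would be the `where`, `by`, and `ewma` stages, leaving `threshold |> forward |> draw` and `history` as separate tails that subscribe to the same upstream segment.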
  83. LESSONS LEARNED

  84. RETROSPECTIVE

  85. RETROSPECTIVE

    ✓ Literate programs make for good collaboration
  86. RETROSPECTIVE

    ✓ Literate programs make for good collaboration
    ✘ Our vision is ahead of the organization
  87. RETROSPECTIVE

    ✓ Literate programs make for good collaboration
    ✘ Our vision is ahead of the organization
    ✘ Stream processing is just the means, not the end
  88. RETROSPECTIVE

    ✓ Literate programs make for good collaboration
    ✘ Our vision is ahead of the organization
    ✘ Stream processing is just the means, not the end
    ✘ Much more research is needed
  89. REFERENCES

    https://git.io/vQIgp
  90. extra slides

  91. WHY ELIXIR?

  92. WHY ELIXIR?

    ✓ Familiarity
  93. WHY ELIXIR?

    ✓ Familiarity
    ✓ Meta-programming
  94. WHY ELIXIR?

    ✓ Familiarity
    ✓ Meta-programming
    ✓ Transparent, low-latency runtime
  95. WHY ELIXIR?

    ✓ Familiarity
    ✓ Meta-programming
    ✓ Transparent, low-latency runtime
    ✓ Our ops time/budget is small
  96. WHY ELIXIR?

    ✓ Familiarity
    ✓ Meta-programming
    ✓ Transparent, low-latency runtime
    ✓ Our ops time/budget is small
    ✓ Generic Erlang/Elixir pitch, “let it crash”, etc
  97. MUSKETEER: ALL FOR ONE, ONE FOR ALL IN DATA PROCESSING SYSTEMS

    GOG, ET AL, EUROSYS ’15. “For small inputs (≤ 0.5GB), the Metis single-machine MapReduce system performs best. This matters, as small inputs are common in practice: 40–80% of Cloudera customers’ MapReduce jobs and 70% of jobs in a Facebook trace have ≤ 1GB of input.”