Adopting Stream Processing for Instrumentation

In the midst of building a multi-datacenter, multi-tenant instrumentation and visibility system, we arrived at stream processing as an alternative to storing, forwarding, and post-processing metrics as traditional systems do. However, the streaming paradigm is alien to many engineers and sysadmins who are used to working with "wall-of-graphs" dashboards, predefined aggregates, and point-and-click alert configuration.

Taking inspiration from REPLs, literate programming, and DevOps practices, we've designed an interface to our instrumentation system that focuses on interactive feedback, note-taking, and team communication. An engineer can both experiment with new flows at low risk and codify longer-term practices into runbooks that embed live visualizations of instrumentation data. As a result, we can start to free our users from having to understand the mechanics of the stream processor and let them focus instead on the domain of instrumentation.

In this talk, we will discuss how the interface described above works, how the stream processor manages flows on behalf of the user, and some tradeoffs we have encountered while preparing the system to roll out into our organization.

Sean Cribbs

June 27, 2017

Transcript

  1. Adopting Stream Processing for Instrumentation

    Making a Lazy River, Not a Whitewater Rapids
    Sean Cribbs, Senior Principal Engineer
    All photos are my own unless attributed. Permission is granted to Comcast to use these photos only in this presentation.
  2. On Instrumentation
  3. How Are We Instrumenting?

    • Wide variety of products
  4. “Operational visibility”

    [Slide graphic: repeated “super important” labels]
  5. Areas for Improvement

    [Slide graphic: repeated “super important” labels]
  6. Areas for Improvement

    ✘ Human anomaly detector
    [Slide graphic: repeated “super important” labels]
  7. Areas for Improvement

    ✘ Human anomaly detector
    ✘ Correlation is awkward
    [Slide graphic: repeated “super important” labels]
  8. Areas for Improvement

    ✘ Human anomaly detector
    ✘ Correlation is awkward
    ✘ Copious data, low fidelity
    [Slide graphic: repeated “super important” labels]
  9. How Are We Instrumenting?

    • Wide variety of products
  10. How Are We Instrumenting?

    • Wide variety of products
    • Log files
  11. How Are We Instrumenting?

    • Wide variety of products
    • Log files
    • …more log files
  12. index.js:65 500 Server Error
  13. index.js:65 it’s broke fam
  14. Areas for Improvement
  15. Areas for Improvement

    ✘ Developers write bad logs
  16. Areas for Improvement

    ✘ Developers write bad logs
    ✘ Logs lack context
  17. Areas for Improvement

    ✘ Developers write bad logs
    ✘ Logs lack context
    ✘ Text logs lack fidelity
  18. Why Stream Processing?
  19. Programmability gives full flexibility over monitoring behavior
  20. Higher fidelity data gives a rich set of dimensions
  21. Incremental processing gives performance and scale
  22. Incremental processing gives performance and scale

    http://riemann.io
  23. Network Monitoring as a Streaming Analytics Problem

    Gupta et al., HotNets ’16
    “Sonata can capture 95% of all traffic pertaining to the query, while reducing the overall data rate by a factor of about 400 and the number of required counters by four orders of magnitude.”
  24. Challenges

    ✘ “You’re asking me to program my monitoring system?”
  25. Challenges

    ✘ “You’re asking me to program my monitoring system?”
    ✘ New paradigm, new concepts: windows, triggers, partitioning, etc.
  26. Challenges

    ✘ “You’re asking me to program my monitoring system?”
    ✘ New paradigm, new concepts: windows, triggers, partitioning, etc.
    ✘ Our goal is not to make Hadoop easier/better/faster
  27. Drawing Inspiration
  28. Donald Knuth, 1983

    “Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”
    http://roxygen.org/knuth-literate-programming.pdf
  29. Jupyter, Mathematica

    Interactive REPLs with history and prose
    https://www.wolfram.com/mathematica/
    http://jupyter.org/
  30. Eve

    Data-focused development, autonomous code blocks
    http://play.witheve.com/#/examples/bar-graph.eve
  31. Literate Programming Benefits

    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  32. Literate Programming Benefits

    • "Literate programming forces you to consider a human audience."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  33. Literate Programming Benefits

    • "Literate programming forces you to consider a human audience."
    • "The human brain is wired to engage with and remember stories."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  34. Literate Programming Benefits

    • "Literate programming forces you to consider a human audience."
    • "The human brain is wired to engage with and remember stories."
    • "...literate programming encourages the programmer to arrange [programs] in a way that makes narrative sense."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  35. Literate Programming Benefits

    • "Literate programming forces you to consider a human audience."
    • "The human brain is wired to engage with and remember stories."
    • "...literate programming encourages the programmer to arrange [programs] in a way that makes narrative sense."
    • "...you don't really understand something until you explain it to someone else."
    https://github.com/witheve/rfcs/blob/master/proposed/syntax.md#program-structure
  36. Literate Dashboards, Executable Runbooks

    The aha! moment
  37. Creating Literate Dashboards

    Our prototype UI
  38. Under the Hood

    Photo by Comcast
  39. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics]
    Emoji provided free by Emoji One
  40. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent]
  41. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent]
  42. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services]
  43. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services]
  44. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services]
    “Multi-Tenant” “Multi-Region” “Highly-Available” “Real-Time” “Streaming” “Platform”
  45. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services]
    “Multi-Tenant” “Multi-Region” “Highly-Available” “Real-Time” “Streaming” “Platform”
  46. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services]
  47. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services]
  48. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser]
  49. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser; API & Control]
  50. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser; API & Control; Infrastructure Service]
  51. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser; API & Control; Infrastructure Service; Processor]
  52. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser; API & Control; Infrastructure Service; Processor]
  53. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser; API & Control; Infrastructure Service; Processor; Channels]
  54. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser; API & Control; Infrastructure Service; Processor; Channels]
    ➡ Pipeline execution
  55. Operational Visibility Project

    [Diagram labels: System Health; Application Metrics; Agent; External Services; Browser; API & Control; Infrastructure Service; Processor; Channels]
    ➡ Pipeline execution
    ➡ Agent-to-platform discovery
  56. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
  57. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand
  58. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events
  59. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events; demand
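
The demand-driven flow in this diagram can be illustrated without GenStage itself. The following is a minimal sketch using plain Elixir processes, not GenStage's actual API: the module name `DemandSketch`, the `{:ask, pid, demand}` and `{:events, list}` message shapes, and the fixed number of rounds are all assumptions made for illustration. It shows only the core idea that demand travels upstream and the producer never emits more events than were requested.

```elixir
defmodule DemandSketch do
  # Producer: emits consecutive integers, but never more than the
  # consumer asked for.
  def producer(next) do
    receive do
      {:ask, consumer, demand} ->
        events = next |> Stream.iterate(&(&1 + 1)) |> Enum.take(demand)
        send(consumer, {:events, events})
        producer(next + demand)

      :stop ->
        :ok
    end
  end

  # Consumer: sends demand upstream, waits for the matching events, and
  # repeats for a fixed number of rounds. Returns the batches in order.
  def consume(_producer, _demand, 0, batches), do: Enum.reverse(batches)

  def consume(producer, demand, rounds, batches) do
    send(producer, {:ask, self(), demand})

    receive do
      {:events, events} ->
        consume(producer, demand, rounds - 1, [events | batches])
    end
  end
end

producer = spawn(fn -> DemandSketch.producer(1) end)
batches = DemandSketch.consume(producer, 3, 2, [])
send(producer, :stop)
IO.inspect(batches)
# prints [[1, 2, 3], [4, 5, 6]]
```

Because the consumer only asks again once it has handled a batch, a slow consumer naturally slows the producer down, which is the back-pressure property the diagram's "demand" arrows suggest.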
  60. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events; demand
    File.stream!("/tmp/words.txt")
  61. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events; demand
    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
  62. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events; demand
    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
  63. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events; demand
    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
  64. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events; demand
    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
    |> Flow.reduce(&Map.new/0, &(Map.update(&2, &1, 1, fn c -> c + 1 end)))
  65. Elixir GenStage & Flow

    [Diagram stages: Producer; Producer/Consumer; Producer/Consumer; Consumer]
    Labels: subscribe with demand; events; demand
    File.stream!("/tmp/words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
    |> Flow.reduce(&Map.new/0, &(Map.update(&2, &1, 1, fn c -> c + 1 end)))
    |> Enum.to_list()
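
The choice of default in `Map.update/4` matters for a word count like the one on these slides: when a key is missing, the default is stored as-is and the update function is not applied, so a default of 0 would leave every word one short. The sketch below reproduces the counting step with only the Elixir standard library (no Flow, no file) so the effect is easy to see. The sample lines and the `count_words` helper are assumptions for illustration only.

```elixir
lines = ["the quick brown fox", "the lazy dog", "the end"]

# Sequential stand-in for the flat_map + reduce stages of the pipeline.
# `default` is the value stored the first time a word is seen.
count_words = fn lines, default ->
  lines
  |> Enum.flat_map(&String.split/1)
  |> Enum.reduce(%{}, fn word, acc ->
    Map.update(acc, word, default, fn c -> c + 1 end)
  end)
end

counts_from_one = count_words.(lines, 1)
counts_from_zero = count_words.(lines, 0)

# "the" appears three times in the sample input.
IO.inspect(counts_from_one["the"])   # prints 3
IO.inspect(counts_from_zero["the"])  # prints 2
```

In the partitioned Flow version each partition builds its own map for the words routed to it, so the same reasoning applies per partition.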
  66. Metrics-Focused Stream Combinators

    Seeking domain-specific abstractions
  67. Metrics-Focused Stream Combinators

    Seeking domain-specific abstractions
    where(type: ["disk", "free", "percent"])
  68. Metrics-Focused Stream Combinators

    Seeking domain-specific abstractions
    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
  69. Metrics-Focused Stream Combinators

    Seeking domain-specific abstractions
    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
  70. Metrics-Focused Stream Combinators

    Seeking domain-specific abstractions
    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
  71. Metrics-Focused Stream Combinators

    Seeking domain-specific abstractions
    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
    |> forward(:on_call_alert)
  72. Metrics-Focused Stream Combinators

    Seeking domain-specific abstractions
    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
    |> forward(:on_call_alert)
    |> draw(:table)
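
The slides do not show how these combinators are implemented. As a rough sketch of what the first four stages might compute, the code below applies them to an in-memory list of events using only the Elixir standard library. Everything here is an assumption for illustration: the `MetricsSketch` module, the event map shape (`:type`, `:host`, `:mount`, `:value`), the explicit first argument, the smoothing factor of 0.5, and the batch rather than streaming evaluation. `forward` and `draw` are omitted because their targets are not described.

```elixir
defmodule MetricsSketch do
  # Keep only events whose fields equal the given values.
  def where(events, filters) do
    Enum.filter(events, fn event ->
      Enum.all?(filters, fn {key, value} -> Map.get(event, key) == value end)
    end)
  end

  # Group events by the given dimensions, keeping arrival order per group.
  def by(events, keys) do
    Enum.group_by(events, fn event -> Enum.map(keys, &Map.get(event, &1)) end)
  end

  # Exponentially weighted moving average per group:
  # s_1 = x_1, s_t = alpha * x_t + (1 - alpha) * s_(t-1).
  def ewma(groups, alpha \\ 0.5) do
    Map.new(groups, fn {key, events} ->
      [first | rest] = Enum.map(events, & &1.value)
      avg = Enum.reduce(rest, first, fn x, s -> alpha * x + (1 - alpha) * s end)
      {key, avg}
    end)
  end

  # Keep only groups whose smoothed value is below the limit.
  def threshold(averages, below: limit) do
    averages |> Enum.filter(fn {_key, avg} -> avg < limit end) |> Map.new()
  end
end

disk = ["disk", "free", "percent"]

events = [
  %{type: disk, host: "a", mount: "/", value: 12.0},
  %{type: disk, host: "a", mount: "/", value: 4.0},
  %{type: disk, host: "b", mount: "/", value: 40.0},
  %{type: ["cpu", "idle"], host: "a", mount: nil, value: 1.0}
]

alerts =
  events
  |> MetricsSketch.where(type: disk)
  |> MetricsSketch.by([:host, :mount])
  |> MetricsSketch.ewma()
  |> MetricsSketch.threshold(below: 10.0)

IO.inspect(alerts)
# prints %{["a", "/"] => 8.0}
```

Host "a" smooths 12.0 and 4.0 to 8.0 and falls below the limit, while host "b" stays at 40.0 and is dropped, so only one group would be forwarded to an on-call alert.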
  73. Automatic write-attenuation by segmenting pipelines for re-use
  74. Automatic write-attenuation by segmenting pipelines for re-use

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
    |> forward(:on_call_alert)
    |> draw(:table)
  75. Automatic write-attenuation by segmenting pipelines for re-use

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
    |> forward(:on_call_alert)
    |> draw(:table)
  76. Automatic write-attenuation by segmenting pipelines for re-use

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> history(minutes: 30)

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
    |> forward(:on_call_alert)
    |> draw(:table)
  77. Automatic write-attenuation by segmenting pipelines for re-use

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> history(minutes: 30)

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
    |> forward(:on_call_alert)
    |> draw(:table)
  78. Automatic write-attenuation by segmenting pipelines for re-use

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> history(minutes: 30)

    where(type: ["disk", "free", "percent"])
    |> by([:host, :mount])
    |> ewma
    |> threshold(below: 10.0)
    |> forward(:on_call_alert)
    |> draw(:table)
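
The two pipelines on these slides start with the same three stages. The slides do not say how the processor detects or shares that segment, so the following is only a sketch of one plausible first step: representing each pipeline as a list of stage descriptors and finding the longest common prefix, which could then be run once and fanned out to the differing tails. The `SegmentSketch` module and the `{name, args}` tuple encoding are assumptions for illustration.

```elixir
defmodule SegmentSketch do
  # Longest run of identical leading stages shared by two pipelines.
  # Using the same variable `stage` in both patterns requires equality.
  def shared_prefix([stage | rest_a], [stage | rest_b]) do
    [stage | shared_prefix(rest_a, rest_b)]
  end

  def shared_prefix(_a, _b), do: []
end

alerting = [
  {:where, [type: ["disk", "free", "percent"]]},
  {:by, [:host, :mount]},
  {:ewma, []},
  {:threshold, [below: 10.0]},
  {:forward, :on_call_alert},
  {:draw, :table}
]

history = [
  {:where, [type: ["disk", "free", "percent"]]},
  {:by, [:host, :mount]},
  {:ewma, []},
  {:history, [minutes: 30]}
]

shared = SegmentSketch.shared_prefix(alerting, history)
alerting_tail = Enum.drop(alerting, length(shared))
history_tail = Enum.drop(history, length(shared))

IO.inspect(length(shared))
# prints 3
IO.inspect(history_tail)
# prints [history: [minutes: 30]]
```

Under this assumption the `where`, `by`, and `ewma` stages would be evaluated once, with their output feeding both the alerting tail and the history tail.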
  79. Retrospective

    ✓ Literate programs make for good collaboration
  80. Retrospective

    ✓ Literate programs make for good collaboration
    ✘ Our vision is ahead of the organization
  81. Retrospective

    ✓ Literate programs make for good collaboration
    ✘ Our vision is ahead of the organization
    ✘ Stream processing is just the means, not the end
  82. Retrospective

    ✓ Literate programs make for good collaboration
    ✘ Our vision is ahead of the organization
    ✘ Stream processing is just the means, not the end
    ✘ Much more research is needed
  83. References

    https://git.io/vQIgp
  84. Why Elixir?

    ✓ Familiarity
  85. Why Elixir?

    ✓ Familiarity
    ✓ Meta-programming
  86. Why Elixir?

    ✓ Familiarity
    ✓ Meta-programming
    ✓ Transparent, low-latency runtime
  87. Why Elixir?

    ✓ Familiarity
    ✓ Meta-programming
    ✓ Transparent, low-latency runtime
    ✓ Our ops time/budget is small
  88. Why Elixir?

    ✓ Familiarity
    ✓ Meta-programming
    ✓ Transparent, low-latency runtime
    ✓ Our ops time/budget is small
    ✓ Generic Erlang/Elixir pitch, “let it crash”, etc.
  89. Musketeer: All for One, One for All in Data Processing Systems

    Gog et al., EuroSys ’15
    “For small inputs (≤ 0.5GB), the Metis single-machine MapReduce system performs best. This matters, as small inputs are common in practice: 40–80% of Cloudera customers’ MapReduce jobs and 70% of jobs in a Facebook trace have ≤ 1GB of input.”