Streamlining Distributed Stream Processing with Superchief

Video: https://www.heavybit.com/library/blog/streamlining-distributed-stream-processing-with-superchief/

Distributed stream processing is all about gaining actionable insight from high volumes of data in near real time. Stream processing is complex, so many people look to frameworks for help. Last October, Librato went in the other direction, moving from Apache Storm to an in-house distributed stream processing system named SuperChief.

Ray discusses the experience of moving from Apache Storm to SuperChief. He talks about some of the challenges Librato faced with Storm and the motivation to move away from frameworks. He covers the design and implementation of SuperChief, the benefits of this new approach and the direction SuperChief is going in the future.

Ray Jenkins

May 06, 2016

Transcript

  1. Stream Processing at Librato • Librato API ~15TB/day • 1-2Gb/sec Ingress Streaming Tier • Millions Messages/sec • 20 Kafka Topics / 15 Kafka nodes • 9 SuperChief clusters/workloads • 17 m3.2xls - 24GB Heap / G1 • 12 c3.2xls - 8GB Heap / G1 • Avg CPU Util 15% / Max 30% utilization
  2. [Diagram: processing topology. Node labels: r60, r900, r3600, r_out_1, r_out_60, r_out_900, r_out_3600, rwriter, rwriter60, raw, C*, API]
  3. Disparate Tasks In Executors • Tweak # of tasks to alter hashing, reship, and hope for a better allocation
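
The point of slide 3 can be illustrated with a small sketch. It is not Storm's scheduler and not SuperChief code. It only assumes that work is mapped to tasks by a stable hash modulo the task count, so changing the task count reshuffles the allocation with no guarantee that the result is more even. The key names and the helpers `task_for`, `allocation` and `skew` are hypothetical.

```python
import zlib
from collections import Counter


def task_for(key: str, num_tasks: int) -> int:
    """Map a key to a task index using a stable hash (CRC32) modulo the task count."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def allocation(keys, num_tasks):
    """Return the number of keys that land on each task (0 for idle tasks)."""
    counts = Counter(task_for(k, num_tasks) for k in keys)
    return [counts.get(t, 0) for t in range(num_tasks)]


def skew(loads):
    """Busiest task's load divided by the mean load; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


if __name__ == "__main__":
    # Hypothetical stream keys; real workloads are rarely this uniform.
    keys = [f"metric-{i}" for i in range(1000)]
    for n in (4, 6, 8):  # "tweak # of tasks" and inspect the new allocation
        loads = allocation(keys, n)
        print(n, loads, round(skew(loads), 2))
```

Each change to the task count moves many keys to different tasks, which is why the slide describes the process as tweak, reship, and hope.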