Elastic stream processing without tears

Michael Hausenblas

October 01, 2015

Transcript

  1. © 2015 Mesosphere, Inc. All Rights Reserved. ELASTIC STREAM PROCESSING WITHOUT TEARS. Michael Hausenblas, Developer & Cloud Advocate | 2015-10-01 | Strata NYC
  2. LET'S TALK ABOUT WORKLOADS* … batch, streaming, PaaS, MapReduce. *) kudos to Timothy St. Clair, @timothysc
  3. LOCAL OS VS. DISTRIBUTED OS http://bitly.com/os-vs-dcos
  4. DCOS IS A DISTRIBUTED OPERATING SYSTEM • local OS per node (+container enabled) • scheduling (long-lived, batch) • networking • service discovery • stateful services • security • monitoring, logging, debugging
  5. DCOS BENEFITS • Run stateless services such as Web servers, app servers, etc. and Big Data services like HDFS, C*, Spark, etc. together on one cluster • Dynamic partitioning of your cluster, depending on your needs (business requirements) • Increased utilization (10% → 80% and more)
  6. MESSAGE QUEUES & ROUTERS • Kafka • ØMQ, RabbitMQ, Disque (Redis-based), etc. • fluentd, Logstash, Flume, etc. • Akka Streams • cloud-only: AWS SQS, Google Cloud Pub/Sub • see also queues.io
  7. STREAM PROCESSING PLATFORMS • Storm • Spark • Samza • Flink • Concord • cloud-only: AWS Kinesis, Google Cloud Dataflow • see also my webinar on stream processing
  8. TIME SERIES DATASTORES • InfluxDB • OpenTSDB • KairosDB • Prometheus • see also iot-a.info
  9. Concord • Distributed, event-based stream processing framework • Built on top of Apache Mesos, in C++ • Simple to use, all-in-one stream processing
  10. Benchmarking a Stream Processor • Distributed systems mean distributed results • You can't profile processes as you would on a single machine • Latency measurements require instrumentation
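
The deck doesn't show what that instrumentation looks like; as a minimal Python sketch (the field name `_sent_at` and the JSON envelope are illustrative assumptions, not the benchmark's actual code), a message can carry a send timestamp that the sink subtracts from its own clock:

```python
import json
import time

def instrument(payload: dict) -> bytes:
    """Attach a wall-clock send timestamp before the message enters the pipeline."""
    payload["_sent_at"] = time.time()
    return json.dumps(payload).encode()

def measure_latency(raw: bytes) -> float:
    """At the sink, recover end-to-end latency in seconds.

    Assumes producer and consumer clocks are synchronized (e.g. via NTP);
    cross-machine clock skew is the main source of error with this scheme.
    """
    payload = json.loads(raw)
    return time.time() - payload["_sent_at"]

msg = instrument({"word": "stream"})
latency = measure_latency(msg)   # near zero in-process; pipeline delay in production
```

This is the simplest form of the "follow a message all the way through the pipeline" idea mentioned later in the deck; real tracing would propagate the timestamp through every operator.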
  11. Prior approaches • Benchmarking Apache Samza: 1.2 million messages per second on a single node: https://engineering.linkedin.com/performance/benchmarking-apache-samza-12-million-messages-second-single-node
  12. Prior approaches • Benchmarking scenarios – message passing – key counting in memory • Isolating framework performance vs. user code performance • Sampling throughput in 1-second windows
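
"Sampling throughput in 1-second windows" can be read as a counter that flushes a msgs/sec figure each time a window boundary passes; a small Python sketch of that idea (illustrative only, not the harness used in these benchmarks):

```python
import time

class ThroughputSampler:
    """Count processed messages and emit one msgs/sec sample per fixed window."""

    def __init__(self, window_s: float = 1.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock            # injectable for deterministic testing
        self.window_start = clock()
        self.count = 0
        self.samples = []             # one msgs/sec figure per completed window

    def record(self, n: int = 1) -> None:
        now = self.clock()
        # flush every fully elapsed window (empty windows yield 0.0 samples)
        while now - self.window_start >= self.window_s:
            self.samples.append(self.count / self.window_s)
            self.count = 0
            self.window_start += self.window_s
        self.count += n

# drive it with a controllable clock for a deterministic demo
t = [0.0]
s = ThroughputSampler(clock=lambda: t[0])
for _ in range(5):
    s.record()
t[0] = 1.0
s.record()                  # crossing the boundary flushes the first window
assert s.samples == [5.0]   # 5 messages in the first 1-second window
```

Using `time.monotonic` rather than wall-clock time keeps the window accounting immune to NTP adjustments.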
  13. Our Approach • Be realistic: Kafka as a data source – convenient way to regulate data flow • Frequency counting as a simple task – demonstrates the correctness of the framework, not accidentally benchmarking C++ vs. Java, etc. – dictionary limited to 9,000 words to avoid excess memory allocation / pressure • Sample msg throughput for data source & sink • End-to-end latency (Concord only)
  14. Setup • Each cluster has 6 nodes: n1-standard-4 (4 vCPUs, 15 GB RAM, 160 GB SSD) • One “master”, 5 “workers” • Kafka prefilled with 1.13 billion messages (random words) • One worker dedicated to consuming from Kafka • Remaining workers process msgs & log results
  15. Test problem: key counting (single node & 5-node cluster) • 3-operator topology, a => b => c • a reads from a queue • b counts words – with every tuple, the updated count is emitted downstream • c writes the result into a log file as CSV plaintext – word, frequency • Log files are post-processed to determine accuracy
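
The a => b => c topology's data flow can be sketched as three chained Python generators (Concord itself is C++ on Mesos; this only illustrates the per-tuple behavior described above):

```python
import csv
import io
from collections import Counter

def source(queue):
    """Operator a: read words from a queue (Kafka in the real benchmark)."""
    yield from queue

def count(words):
    """Operator b: running word count; every input tuple emits the updated
    (word, frequency) pair downstream, as the slide describes."""
    counts = Counter()
    for w in words:
        counts[w] += 1
        yield w, counts[w]

def sink(pairs, out):
    """Operator c: write (word, frequency) as CSV plaintext for post-processing."""
    writer = csv.writer(out)
    for word, freq in pairs:
        writer.writerow([word, freq])

buf = io.StringIO()
sink(count(source(["a", "b", "a"])), buf)
# the log's final row for each word carries its total count,
# which is what the accuracy post-processing step can check
```

Emitting on every tuple (rather than once at the end) is what makes the task a stream-processing workload instead of a batch aggregation.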
  16. Test problem: word counting (single node & 5-node cluster) • 3-operator topology • Log files are post-processed to determine accuracy
  17. Results (in progress) • Storm – single-node throughput: 16,000 msgs/sec – cluster-wide throughput: 65,000 msgs/sec • Concord – single-node throughput: 100,000 msgs/sec – cluster-wide throughput: pending
  18. Lessons Learned • It’s hard to set up each of these systems • Measuring latency is tricky – requires instrumentation – the ability to follow a message all the way through the processing pipeline • Necessary to isolate Kafka consumer performance
  19. Future Plans • Finish benchmarking for Spark Streaming & Concord • Scale up the Kafka cluster • Isolate the performance of Kafka consumers • Optimization efforts • Other frameworks like Samza, Flink, etc. • Instrument non-Concord frameworks with tracing to measure end-to-end latency
  20. Questions & Feedback? We know this is far from perfect, but we had to start somewhere… :) Sign up for our office hours at: http://bit.ly/concordoh [email protected]
  21. MESOSPHERE IS HIRING, WORLDWIDE … San Francisco • New York • Hamburg • https://mesosphere.com/careers/
  22. Q & A • @mhausenblas • mhausenblas.info • @mesosphere • mesosphere.io/product • mesosphere.com/infinity