
Reactive data-pipelines with Spring XD and Kafka

Slides from webinar on April 28, 2015 https://www.youtube.com/watch?v=v9_rlNzhl98

In recent years, drastic increases in data volume and a greater demand for low latency have led to a radical shift in business requirements and application development methods. In response, frameworks such as RxJava and high-throughput messaging systems such as Kafka have emerged as key building blocks. However, integrating technologies is never easy, and Spring XD provides a solution. Through its development model and runtime, Spring XD makes it easy to develop highly scalable data pipelines and lets you focus on writing and testing business logic rather than integrating and scaling a big data stack. Come and see how easy this can be in this webinar, where we will demonstrate how to build highly scalable data pipelines with RxJava and Kafka, using Spring XD as a platform.

Marius Bogoevici

April 28, 2015

Transcript

  1. Unless otherwise indicated, these slides are © 2013-2014 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

    Reactive data-pipelines with Spring XD and Kafka
    Marius Bogoevici and Mark Pollack, Pivotal
  2. Introductions

    Mark Pollack
    • Co-lead for Spring XD
    Marius Bogoevici
    • Staff Engineer with Pivotal, Spring XD team
  3. Big Data Landscape and Stream processing

    Now have the ability to cheaply store and analyze huge quantities of data
    Traditionally the realm of batch processing but now also demand for real-time processing
    • aka ‘Stream Processing’
    Real-time analysis examples
    • Fraud Detection
    • Measuring Quality of Service
    • Predictive Maintenance
    Some characteristics of Stream processing applications
    • High volume of data
    • Low latency
    • Event ordering
  4. Characteristics of Stream Processing Applications
  5. Apache Kafka for Stream Processing

    Apache project
    • http://kafka.apache.org
    “Publish-subscribe messaging rethought as a distributed commit log”
    • Distributed
    • Partitioned
    • Replicated
    • Strong ordering guarantees
    • Replayable
    • High volume
    Performance
    • ~ millions msgs/sec
    • 2-3 ms latency
    • http://bit.ly/kafka-perf
  6. Kafka: Partitions and Topics

    Topics are split into replicated partitions
    Writes always append to a partition
    Producers can specify a target partition
    • Directly: 0,1,2
    • Logical partitioning key: customerId
    Order of consumption preserved within a partition
    Efficient local stateful computation
  7. Kafka – Spring support

    Spring Integration Kafka project
    Building off the Simple Consumer API
    Overcoming the limitations of the High Level Consumer
    • Buggy
    • No offset control
    • No partition control
    • Writing offsets to ZooKeeper
    High Level Consumer imminent deprecation
  8. RxJava - Introduction

    ReactiveX – http://reactivex.io/
    • “An API for asynchronous programming with observable streams”
    • “ReactiveX is a combination of the best ideas from the Observer pattern, the Iterator pattern, and functional programming”
    • Java Version - NetflixOSS project: https://github.com/ReactiveX/RxJava
    Functional style programming with asynchronous events
    • Map, filter, reduce
    • Functional transformations are a natural way of describing stream processing
  9. RxJava and Stream Processing (2)

    RxJava Observables vs. Java 8 Streams
    • Similar: both promote a functional programming model
    • Different: Asynchronous (Observables) vs. synchronous (Java 8 Streams)
    Asynchronous model maps well to stream processing
    • Infinite sequence of messages delivered by an event-driven source
    Ordered processing
    • An Observable’s items are processed sequentially
  10. RxJava and Stream processing (3)

    Rich operator set
    • Time/Count windowing (window, buffer)
    • Grouping (groupBy)
    • Reduction (reduce)
    • Libraries (rxjava-math)
    • Joining and merging (merge, zip, join)
  11. Spring XD - Introduction

    Modules
    • Source polls external source or Event Driven
    • Processor takes input and produces output
    • Sink consumes input, outputs to external system
    Streams
    • Source | {Processor}0…n | Sink
    Taps
    • Dynamically add taps to listen for events
    Jobs
    • Directed Graph of Steps
    • Master/Slave runtime based on Spring Batch
    • Workflow orchestration on Hadoop or Spark
  12. Spring XD – Stream Model

    Sources: HTTP, JMS, Kafka, RabbitMQ, JMS, Gemfire, File, SFTP, Mail, JDBC, Twitter, Syslog, TCP, UDP, MQTT, Trigger
    Processors: Filter, Transformer, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Python, Groovy, Java, RxJava, Spark Streaming
    Sinks: File, HDFS, HAWQ, Kafka, RabbitMQ, Redis, Splunk, Mongo, Redis, JDBC, TCP, Log, Mail, Gemfire, MQTT, Dynamic Router, Counters
  13. Stream DSL
  14. Spring XD – Message Bus

    Binds module inputs and outputs to a transport
    Performs Serialization (Kryo)
    Local, Rabbit, Redis, and Kafka
  15. Spring XD - Runtime

    (Diagram labels: http, hdfs)
  16. Spring XD and Kafka - Partitioning

    Partitioning logic configured via deployment manifest
    • partitionKeyExpression=payload.sensorId
    When using Kafka as a bus, partition key expression maps to the Kafka producer partition key
  17. Spring XD – Programming with RxJava

    (Diagram labels: Source, Bus, RxJava Processor, Bus, Sink; Observable API; Input Stream, Output Stream)
  18. DEMO!
  19. Future Directions: Fluent Stream Definition

    Stream definition with fluent API
    Processor definitions via lambdas
    https://github.com/aclement/spring-xd-fluent
  20. Future Directions: Designer and Monitoring UI
  21. Conclusions

    Spring XD integrates with a variety of technologies
    The combination of Spring XD, Kafka and RxJava is a powerful stack for building streaming applications
    • Spring XD as a runtime and deployment model;
    • Kafka as a high-throughput, low-latency transport;
    • RxJava as a functional programming model;
    Spring XD is flexible, other options exist:
    • Lower throughput, traditional middleware as message bus;
    • Imperative programming model as Java code;
    • Delegate to Spark Streaming;
  22. Resources

    • XD Project Home page
    • Video Intro
    • InfoQ Article
    • Mobile Device Case Study
    • RxJava
      • http://reactivex.io/
      • https://github.com/ReactiveX/RxJava
    • Kafka
      • http://kafka.apache.org
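
Slides 6, 10 and 16 describe three ideas that fit together: a partition key such as sensorId decides which partition a message lands in, order is preserved within a partition, and per-key computations such as count windows can then run locally. The following is a minimal sketch of that flow in plain Java. It does not use the Kafka, RxJava or Spring XD APIs. The class `PartitionedWindowSketch`, the `Reading` type and all method names are hypothetical, the hash is a stand-in (Kafka's own partitioner is different), and `windowAverages` only approximates count-based windowing by dropping a trailing partial window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionedWindowSketch {

    /** Hypothetical payload; only the sensorId field echoes the deck's payload.sensorId. */
    static final class Reading {
        final String sensorId;
        final double value;

        Reading(String sensorId, double value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    /** Stand-in partition selection (assumption): the same key always maps to the same partition. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Appends each reading to its partition's log, keeping arrival order within each partition. */
    static Map<Integer, List<Reading>> partition(List<Reading> readings, int numPartitions) {
        Map<Integer, List<Reading>> logs = new TreeMap<>();
        for (Reading r : readings) {
            int p = partitionFor(r.sensorId, numPartitions);
            logs.computeIfAbsent(p, k -> new ArrayList<>()).add(r);
        }
        return logs;
    }

    /** Averages each full window of `size` values; a trailing partial window is dropped. */
    static List<Double> windowAverages(List<Double> values, int size) {
        List<Double> averages = new ArrayList<>();
        for (int start = 0; start + size <= values.size(); start += size) {
            averages.add(values.subList(start, start + size).stream()
                    .mapToDouble(Double::doubleValue)
                    .average()
                    .orElse(0.0));
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading("sensor-1", 20.0), new Reading("sensor-2", 5.0),
                new Reading("sensor-1", 22.0), new Reading("sensor-2", 7.0),
                new Reading("sensor-1", 30.0), new Reading("sensor-1", 34.0));

        // Each partition can be processed independently because all readings
        // for a given sensor live in exactly one partition, in arrival order.
        for (Map.Entry<Integer, List<Reading>> log : partition(readings, 3).entrySet()) {
            Map<String, List<Double>> bySensor = new LinkedHashMap<>();
            for (Reading r : log.getValue()) {
                bySensor.computeIfAbsent(r.sensorId, k -> new ArrayList<>()).add(r.value);
            }
            for (Map.Entry<String, List<Double>> e : bySensor.entrySet()) {
                System.out.println("partition " + log.getKey() + " / " + e.getKey()
                        + " -> window averages " + windowAverages(e.getValue(), 2));
            }
        }
    }
}
```

Because every reading for a sensor is routed to one partition, the windowing step needs no coordination across partitions, which is the "efficient local stateful computation" point from slide 6. In the stack the deck describes, the routing would come from the bus's partition key expression and the windowing from an Observable operator rather than from hand-written loops like these.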