Slide 1

Slide 1 text

Reactive data-pipelines with Spring XD and Kafka
Marius Bogoevici and Mark Pollack, Pivotal

Unless otherwise indicated, these slides are © 2013-2014 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

Slide 2

Slide 2 text

Introductions

Mark Pollack
• Co-lead for Spring XD

Marius Bogoevici
• Staff Engineer with Pivotal, Spring XD team

Slide 3

Slide 3 text

Big Data Landscape and Stream Processing

We now have the ability to cheaply store and analyze huge quantities of data
Traditionally this was the realm of batch processing, but there is now also demand for real-time processing
• aka ‘Stream Processing’

Real-time analysis examples
• Fraud Detection
• Measuring Quality of Service
• Predictive Maintenance

Some characteristics of stream processing applications
• High volume of data
• Low latency
• Event ordering

Slide 4

Slide 4 text

Characteristics of Stream Processing Applications

Slide 5

Slide 5 text

Apache Kafka for Stream Processing

Apache project
• http://kafka.apache.org

“Publish-subscribe messaging rethought as a distributed commit log”
• Distributed
• Partitioned
• Replicated
• Strong ordering guarantees
• Replayable
• High volume

Performance
• ~ millions msgs/sec
• 2-3 ms latency
• http://bit.ly/kafka-perf
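
The "commit log" and "replayable" properties can be illustrated with a small in-memory sketch. This is not Kafka's API: the class `CommitLogSketch` and its methods `append` and `readFrom` are hypothetical names chosen for illustration, and the sketch ignores distribution, partitioning, and replication entirely. It only shows that records are appended at increasing offsets and that reading from an offset never removes anything, so a consumer can re-read from any earlier position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a single log; not Kafka's API.
final class CommitLogSketch {

    private final List<String> records = new ArrayList<>();

    // Appends a record at the end of the log and returns its offset.
    long append(String record) {
        records.add(record);
        return records.size() - 1L;
    }

    // Returns a copy of all records from the given offset onward.
    // Reading does not consume: the log itself is left unchanged.
    List<String> readFrom(long offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        log.append("order-created");
        log.append("order-paid");
        long third = log.append("order-shipped");

        System.out.println(log.readFrom(0));     // a new consumer reads everything
        System.out.println(log.readFrom(third)); // a caught-up consumer reads only the latest record
        System.out.println(log.readFrom(0));     // replay: the earlier records are still there
    }
}
```

Because the consumer, not the log, decides which offset to read from, replaying a stream amounts to resetting that offset.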

Slide 6

Slide 6 text

Kafka: Partitions and Topics

Topics are split into replicated partitions
Writes always append to a partition
Producers can specify a target partition
• Directly: 0,1,2
• Logical partitioning key: customerId
Order of consumption preserved within a partition
Efficient local stateful computation

[Diagram label: Topic]
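
A minimal sketch of how a logical partitioning key keeps related messages together and in order. The hash-modulo rule and the names `PartitionSketch`, `partitionFor`, and `route` are assumptions made for illustration; Kafka's own partitioners may use a different hash function. The point is only that every message with the same key lands in the same partition, and that each partition is written by appending.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of key-based partitioning; not Kafka's API.
final class PartitionSketch {

    // Assumed rule: hash the key and take it modulo the partition count.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Appends each {key, value} message to the partition chosen by its key.
    static List<List<String>> route(String[][] messages, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String[] message : messages) {
            String key = message[0];
            String value = message[1];
            partitions.get(partitionFor(key, numPartitions)).add(value); // append-only write
        }
        return partitions;
    }

    public static void main(String[] args) {
        String[][] messages = {
            {"customer-1", "login"},
            {"customer-2", "login"},
            {"customer-1", "purchase"},
            {"customer-1", "logout"},
        };
        List<List<String>> partitions = route(messages, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Whichever partition `customer-1` maps to, its three events appear there in the order they were sent, which is what makes per-customer stateful processing possible without coordination across partitions.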

Slide 7

Slide 7 text

Kafka – Spring support

Spring Integration Kafka project
Building off the Simple Consumer API
Overcoming the limitations of the High Level Consumer
• Buggy
• No offset control
• No partition control
• Writing offsets to ZooKeeper
High Level Consumer imminent deprecation

Slide 8

Slide 8 text

RxJava - Introduction

ReactiveX – http://reactivex.io/
• “An API for asynchronous programming with observable streams”
• “ReactiveX is a combination of the best ideas from the Observer pattern, the Iterator pattern, and functional programming”
• Java version - NetflixOSS project: https://github.com/ReactiveX/RxJava

Functional style programming with asynchronous events
• Map, filter, reduce
• Functional transformations are a natural way of describing stream processing

Slide 9

Slide 9 text

RxJava and Stream Processing (2)

RxJava Observables vs. Java 8 Streams
• Similar: both promote a functional programming model
• Different: asynchronous (Observables) vs. synchronous (Java 8 Streams)

Asynchronous model maps well to stream processing
• Infinite sequence of messages delivered by an event-driven source

Ordered processing
• An Observable’s items are processed sequentially
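
The pull versus push distinction can be sketched with the JDK alone. The first method uses real Java 8 Streams on a finite list. The second part is a hand-written callback, `RunningSum`, that stands in for a subscriber to an event-driven source; it is a hypothetical illustration, not RxJava's API. Both apply the same filter, map, and reduce logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntConsumer;

// Contrast sketch: Java 8 Streams (pull) vs. a hand-written push-style callback.
final class PullVsPushSketch {

    // Pull: the caller drives a finite, already-available collection to completion.
    static int sumOfEvenSquaresPull(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)
                .map(v -> v * v)
                .reduce(0, Integer::sum);
    }

    // Push: the source calls accept() whenever an item arrives. The running
    // result can be read at any time, even if the sequence never ends.
    static final class RunningSum implements IntConsumer {
        private int total;

        @Override
        public void accept(int v) {
            if (v % 2 == 0) {   // filter
                total += v * v; // map, then fold into the running reduction
            }
        }

        int total() {
            return total;
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfEvenSquaresPull(values));

        RunningSum subscriber = new RunningSum();
        for (int v : values) {  // this loop stands in for an event-driven source
            subscriber.accept(v);
        }
        System.out.println(subscriber.total());
    }
}
```

With an Observable, the operators themselves would be declared in the same functional style as the Streams version while being driven in the push style of the second half.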

Slide 10

Slide 10 text

RxJava and Stream Processing (3)

Rich operator set
• Time/Count windowing (window, buffer)
• Grouping (groupBy)
• Reduction (reduce)
• Libraries (rxjava-math)
• Joining and merging (merge, zip, join)
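
To make count-based windowing concrete, here is a plain-JDK sketch of the idea behind buffering by count, followed by a per-window reduction. It works on a finite list rather than an asynchronous Observable, and the names `CountWindowSketch`, `buffer`, and `averages` are hypothetical; it illustrates the semantics only and does not use RxJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of count windowing plus reduction; not RxJava's API.
final class CountWindowSketch {

    // Splits items into consecutive, non-overlapping windows of at most `count` items.
    // The last window may be shorter if the input does not divide evenly.
    static <T> List<List<T>> buffer(List<T> items, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        List<List<T>> windows = new ArrayList<>();
        for (int start = 0; start < items.size(); start += count) {
            int end = Math.min(start + count, items.size());
            windows.add(new ArrayList<>(items.subList(start, end)));
        }
        return windows;
    }

    // Reduces each window to the average of its readings.
    static List<Double> averages(List<Integer> readings, int count) {
        List<Double> result = new ArrayList<>();
        for (List<Integer> window : buffer(readings, count)) {
            double sum = 0;
            for (int reading : window) {
                sum += reading;
            }
            result.add(sum / window.size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> readings = Arrays.asList(10, 20, 30, 40, 50);
        System.out.println(buffer(readings, 2));
        System.out.println(averages(readings, 2));
    }
}
```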

Slide 11

Slide 11 text

Spring XD - Introduction

Modules
• Source: polls an external source or is event-driven
• Processor: takes input and produces output
• Sink: consumes input, outputs to an external system

Streams
• Source | {Processor}0…n | Sink

Taps
• Dynamically add taps to listen for events

Jobs
• Directed graph of steps
• Master/Slave runtime based on Spring Batch
• Workflow orchestration on Hadoop or Spark
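
The `Source | {Processor}0…n | Sink` shape can be sketched with standard functional interfaces. This is an in-process illustration only: `StreamModelSketch` and `run` are hypothetical names, and real Spring XD modules are separately deployed components connected by a message bus, not method calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical in-process model of a stream definition; not Spring XD's API.
final class StreamModelSketch {

    // Passes every message from the source through each processor in order
    // (zero or more of them), then hands the result to the sink.
    static <T> void run(Iterable<T> source, List<Function<T, T>> processors, Consumer<T> sink) {
        for (T message : source) {
            T current = message;
            for (Function<T, T> processor : processors) {
                current = processor.apply(current);
            }
            sink.accept(current);
        }
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(" hello ", " world ");

        List<Function<String, String>> processors = new ArrayList<>();
        processors.add(String::trim);
        processors.add(s -> s.toUpperCase());

        Consumer<String> sink = System.out::println;

        run(source, processors, sink);
    }
}
```

An empty processor list gives the simplest stream, a source wired directly to a sink.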

Slide 12

Slide 12 text

Spring XD – Stream Model

Sources: HTTP, JMS, Kafka, RabbitMQ, JMS, Gemfire, File, SFTP, Mail, JDBC, Twitter, Syslog, TCP, UDP, MQTT, Trigger
Processors: Filter, Transformer, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Python, Groovy, Java, RxJava, Spark Streaming
Sinks: File, HDFS, HAWQ, Kafka, RabbitMQ, Redis, Splunk, Mongo, Redis, JDBC, TCP, Log, Mail, Gemfire, MQTT, Dynamic Router, Counters

Slide 13

Slide 13 text

Stream DSL

Slide 14

Slide 14 text

Spring XD – Message Bus

Binds module inputs and outputs to a transport
Performs serialization (Kryo)
Local, Rabbit, Redis, and Kafka

Slide 15

Slide 15 text

Spring XD - Runtime

[Diagram labels: http, hdfs]

Slide 16

Slide 16 text

Spring XD and Kafka - Partitioning

Partitioning logic configured via deployment manifest
• partitionKeyExpression=payload.sensorId
When using Kafka as a bus, the partition key expression maps to the Kafka producer partition key
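
A sketch of what a key expression such as `payload.sensorId` buys downstream. Everything here is hypothetical and in-process: the `Reading` payload type, the `countPerInstance` method, and the hash-modulo assignment are assumptions for illustration, and the key expression is modeled as a plain Java function rather than an evaluated expression string. The sketch shows that when messages are routed by key, each consumer instance can keep state for its own sensors without sharing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of key-based routing to consumer instances.
final class PartitionKeySketch {

    // Assumed payload type exposing the sensorId property named in the expression.
    static final class Reading {
        final String sensorId;
        final double value;

        Reading(String sensorId, double value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    // Routes each reading by its extracted key and counts readings per sensor.
    // Each map in the result is the private state of one consumer instance.
    static List<Map<String, Integer>> countPerInstance(
            List<Reading> readings, Function<Reading, String> keyExpression, int instances) {
        List<Map<String, Integer>> state = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            state.add(new HashMap<>());
        }
        for (Reading reading : readings) {
            String key = keyExpression.apply(reading);
            int instance = Math.floorMod(key.hashCode(), instances); // assumed routing rule
            state.get(instance).merge(key, 1, Integer::sum);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading("sensor-1", 20.5),
                new Reading("sensor-2", 19.0),
                new Reading("sensor-1", 21.0));
        // The lambda plays the role of partitionKeyExpression=payload.sensorId.
        System.out.println(countPerInstance(readings, r -> r.sensorId, 2));
    }
}
```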

Slide 17

Slide 17 text

Spring XD – Programming with RxJava

[Diagram labels: Source, RxJava Processor, Sink, Bus, Bus, Observable API, Input Stream, Output Stream]

Slide 18

Slide 18 text

DEMO!

Slide 19

Slide 19 text

Future Directions: Fluent Stream Definition

Stream definition with fluent API
Processor definitions via lambdas
https://github.com/aclement/spring-xd-fluent

Slide 20

Slide 20 text

Future Directions: Designer and Monitoring UI

Slide 21

Slide 21 text

Conclusions

Spring XD integrates with a variety of technologies
The combination of Spring XD, Kafka and RxJava is a powerful stack for building streaming applications
• Spring XD as a runtime and deployment model
• Kafka as a high-throughput, low-latency transport
• RxJava as a functional programming model
Spring XD is flexible; other options exist:
• Lower throughput, traditional middleware as message bus
• Imperative programming model as Java code
• Delegate to Spark Streaming

Slide 22

Slide 22 text

Resources

• XD Project Home page
• Video Intro
• InfoQ Article
• Mobile Device Case Study
• RxJava
  • http://reactivex.io/
  • https://github.com/ReactiveX/RxJava
• Kafka
  • http://kafka.apache.org