Reactive data-pipelines with Spring XD and Kafka

Slides from the webinar held on April 28, 2015: https://www.youtube.com/watch?v=v9_rlNzhl98

In recent years, drastic increases in data volume and a greater demand for low latency have led to a radical shift in business requirements and application development methods. In response, frameworks such as RxJava and high-throughput messaging systems such as Kafka have emerged as key building blocks. Integrating these technologies is never easy, however, and Spring XD provides a solution. Through its development model and runtime, Spring XD makes it easy to develop highly scalable data pipelines and lets you focus on writing and testing business logic rather than on integrating and scaling a big data stack. Come and see how easy this can be in this webinar, where we demonstrate how to build highly scalable data pipelines with RxJava and Kafka, using Spring XD as a platform.

Marius Bogoevici

April 28, 2015

Transcript

  1. 1 Unless otherwise indicated, these slides are © 2013-2014 Pivotal Software, Inc. and licensed under a
    Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
    Reactive data-pipelines with Spring XD and Kafka
    Marius Bogoevici and Mark Pollack, Pivotal

  2. Introductions
    Mark Pollack
    • Co-lead for Spring XD
    Marius Bogoevici
    • Staff Engineer with Pivotal, Spring XD team

  3. Big Data Landscape and Stream processing
    It is now possible to cheaply store and analyze huge quantities of data
    Traditionally the realm of batch processing, but there is now also demand for real-time processing
    • a.k.a. ‘Stream Processing’
    Real-time analysis examples
    • Fraud Detection
    • Measuring Quality of Service
    • Predictive Maintenance
    Some characteristics of Stream processing applications
    • High volume of data
    • Low latency
    • Event ordering

  4. Characteristics of Stream Processing Applications

  5. Apache Kafka for Stream Processing
    Apache project
    • http://kafka.apache.org
    “Publish-subscribe messaging rethought as a distributed commit log”
    • Distributed
    • Partitioned
    • Replicated
    • Strong ordering guarantees
    • Replayable
    • High volume
    Performance
    • ~ millions msgs/sec
    • 2-3 ms latency
    • http://bit.ly/kafka-perf

  6. Kafka: Partitions and Topics
    Topics are split into replicated partitions
    Writes always append to a partition
    Producers can specify a target partition
    • Directly: 0,1,2
    • Logical partitioning key: customerId
    Order of consumption preserved within a partition
    Efficient local stateful computation
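
The partitioning points above can be sketched in plain Java. This is a minimal illustration, not Kafka's implementation: the class and method names are hypothetical, the "key:value" message format is invented for the example, and the hash-modulo rule is an assumption standing in for Kafka's own partitioner, so real partition numbers will differ. What it shows is that every message with the same logical key (for example a customerId) is appended to the same partition, so per-key order is preserved and state for a key can be kept local to one consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KeyPartitioningSketch {

    // Map a logical key to a partition index in [0, numPartitions).
    // Assumption: plain hashCode modulo; Kafka uses its own partitioner.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Append each "key:value" message to the partition selected by its key.
    // Each partition is an append-only list, so arrival order is kept.
    static List<List<String>> route(List<String> messages, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String message : messages) {
            String key = message.substring(0, message.indexOf(':'));
            partitions.get(partitionFor(key, numPartitions)).add(message);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
                "c1:login", "c2:login", "c1:purchase", "c2:logout", "c1:logout");
        List<List<String>> partitions = route(messages, 3);
        for (int p = 0; p < partitions.size(); p++) {
            System.out.println("partition " + p + " -> " + partitions.get(p));
        }
    }
}
```

Because all events for one key land in a single append-only partition, a consumer of that partition sees them in the order they were written.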

  7. Kafka – Spring support
    Spring Integration Kafka project
    Built on the Simple Consumer API
    Overcomes the limitations of the High Level Consumer
    • Buggy
    • No offset control
    • No partition control
    • Writes offsets to ZooKeeper
    The High Level Consumer faces imminent deprecation

  8. RxJava - Introduction
    ReactiveX – http://reactivex.io/
    • “An API for asynchronous programming with observable streams”
    • “ReactiveX is a combination of the best ideas from the Observer pattern, the Iterator pattern, and functional programming”
    • Java version (a NetflixOSS project): https://github.com/ReactiveX/RxJava
    Functional-style programming with asynchronous events
    • Map, filter, reduce
    • Functional transformations are a natural way of describing stream processing

  9. RxJava and Stream Processing (2)
    RxJava Observables vs. Java 8 Streams
    • Similar: both promote a functional programming model
    • Different: Asynchronous (Observables) vs. synchronous (Java 8 Streams)
    Asynchronous model maps well to stream processing
    • Infinite sequence of messages delivered by an event-driven source
    Ordered processing
    • An Observable’s items are processed sequentially
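
The contrast between the two models can be sketched with the JDK alone. This is a minimal illustration under stated assumptions: RxJava is not used here, `EventSource` and `RunningEvenSquareSum` are hypothetical stand-ins for an Observable and its subscriber, and the push side runs on the emitting thread rather than on a scheduler. The pull version drives a finite Java 8 Stream to completion; the push version keeps a running result that is updated one item at a time, in emission order, as the source delivers events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class PullVsPushSketch {

    // Pull: the caller drives a finite java.util.stream pipeline
    // and waits for the final result.
    static int sumOfEvenSquaresPull(List<Integer> values) {
        return values.stream()
                .map(v -> v * v)
                .filter(sq -> sq % 2 == 0)
                .reduce(0, Integer::sum);
    }

    // Push: a hypothetical, minimal event source that hands each item
    // to its subscribers as it arrives, in emission order.
    static final class EventSource {
        private final List<Consumer<Integer>> subscribers = new ArrayList<>();

        void subscribe(Consumer<Integer> subscriber) {
            subscribers.add(subscriber);
        }

        void emit(int value) {
            for (Consumer<Integer> subscriber : subscribers) {
                subscriber.accept(value);
            }
        }
    }

    // Same computation as the pull version, kept as a running total
    // that changes whenever the source pushes a new value.
    static final class RunningEvenSquareSum implements Consumer<Integer> {
        private int total = 0;

        @Override
        public void accept(Integer value) {
            int square = value * value;
            if (square % 2 == 0) {
                total += square;
            }
        }

        int total() {
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquaresPull(Arrays.asList(1, 2, 3, 4)));

        EventSource source = new EventSource();
        RunningEvenSquareSum sum = new RunningEvenSquareSum();
        source.subscribe(sum);
        for (int v = 1; v <= 4; v++) {
            source.emit(v);
        }
        System.out.println(sum.total());
    }
}
```

The push version never needs the sequence to end, which is why that model suits an unbounded message stream.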

  10. RxJava and Stream processing (3)
    Rich operator set
    • Time/Count windowing (window, buffer)
    • Grouping (groupBy)
    • Reduction (reduce)
    • Libraries (rxjava-math)
    • Joining and merging (merge, zip, join)
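
Count-based windowing followed by a reduction can be sketched in plain Java. This is a minimal illustration over an in-memory list rather than an Observable; the class and method names are hypothetical. The `buffer` helper mimics the idea behind the count-based `buffer` operator listed above: consecutive, non-overlapping groups of a fixed size, with a shorter final group kept. Each window is then reduced to its average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CountWindowSketch {

    // Split items into consecutive, non-overlapping windows of `size` items.
    // A shorter final window is kept rather than dropped.
    static <T> List<List<T>> buffer(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> windows = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            windows.add(new ArrayList<>(items.subList(start, end)));
        }
        return windows;
    }

    // Reduce one non-empty window to a single value: its arithmetic mean.
    static double average(List<Integer> window) {
        int sum = 0;
        for (int value : window) {
            sum += value;
        }
        return (double) sum / window.size();
    }

    public static void main(String[] args) {
        List<Integer> readings = Arrays.asList(10, 20, 30, 40, 50, 60, 70);
        for (List<Integer> window : buffer(readings, 3)) {
            System.out.println(window + " -> " + average(window));
        }
    }
}
```

In a streaming setting the same per-window reduction would run each time a window fills, instead of over a list that is already complete.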

  11. Spring XD - Introduction
    Modules
    • Source: polls an external system or is event-driven
    • Processor: takes input and produces output
    • Sink: consumes input and writes to an external system
    Streams
    • Source | {Processor}0…n | Sink
    Taps
    • Dynamically add taps to listen for events
    Jobs
    • Directed Graph of Steps
    • Master/Slave runtime based on Spring Batch
    • Workflow orchestration on Hadoop or Spark
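
The "Source | {Processor}0…n | Sink" shape of a stream can be sketched in plain Java. This is only an in-process illustration with hypothetical names; it does not use Spring XD, where modules are connected through a message bus rather than direct method calls. It shows the composition rule: each message from the source passes through zero or more processors, in order, and the result is handed to the sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class StreamModelSketch {

    // Run every message from the source through the processors, in order,
    // and hand the final value to the sink. An empty processor list means
    // the source feeds the sink directly.
    static void run(Iterable<String> source,
                    List<Function<String, String>> processors,
                    Consumer<String> sink) {
        for (String message : source) {
            String current = message;
            for (Function<String, String> processor : processors) {
                current = processor.apply(current);
            }
            sink.accept(current);
        }
    }

    public static void main(String[] args) {
        List<Function<String, String>> processors = new ArrayList<>();
        processors.add(String::trim);
        processors.add(String::toUpperCase);

        // Source: a fixed list; sink: standard output.
        run(Arrays.asList("  hello ", "world"), processors, System.out::println);
    }
}
```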

  12. Spring XD – Stream Model
    Sources: HTTP, JMS, Kafka, RabbitMQ, Gemfire, File, SFTP, Mail, JDBC, Twitter, Syslog, TCP, UDP, MQTT, Trigger
    Processors: Filter, Transformer, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Python, Groovy, Java, RxJava, Spark Streaming
    Sinks: File, HDFS, HAWQ, Kafka, RabbitMQ, Redis, Splunk, Mongo, JDBC, TCP, Log, Mail, Gemfire, MQTT, Dynamic Router, Counters

  13. Stream DSL

  14. Spring XD – Message Bus
    Binds module inputs and outputs to a transport
    Performs Serialization (Kryo)
    Supported transports: Local, Rabbit, Redis, and Kafka

  15. Spring XD - Runtime
    http hdfs

  16. Spring XD and Kafka - Partitioning
    Partitioning logic is configured via the deployment manifest
    • partitionKeyExpression=payload.sensorId
    When Kafka is used as the bus, the partition key expression maps to the Kafka producer partition key

  17. Spring XD – Programming with RxJava
    [Diagram: Source → Bus → RxJava Processor → Bus → Sink; the processor uses the Observable API to turn an input stream into an output stream]

  18. DEMO!

  19. Future Directions: Fluent Stream Definition
    Stream definition with fluent API
    Processor definitions via lambdas
    https://github.com/aclement/spring-xd-fluent

  20. Future Directions: Designer and Monitoring UI

  21. Conclusions
    Spring XD integrates with a variety of technologies
    The combination of Spring XD, Kafka and RxJava is a powerful stack for building streaming applications
    • Spring XD as a runtime and deployment model
    • Kafka as a high-throughput, low-latency transport
    • RxJava as a functional programming model
    Spring XD is flexible; other options exist:
    • Lower-throughput, traditional middleware as the message bus
    • An imperative programming model in plain Java code
    • Delegating to Spark Streaming

  22. Resources
    • XD Project Home page
    • Video Intro
    • InfoQ Article
    • Mobile Device Case Study
    • RxJava
    • http://reactivex.io/
    • https://github.com/ReactiveX/RxJava
    • Kafka
    • http://kafka.apache.org
