
Reactive Functional Data Pipelines with Spring Cloud Microservices

[Talk given together with Mark Pollack, on February 23, 2017 at DevNexus 2017, Atlanta]

Well-written microservices obey the laws of domain-driven design, one of which is finding a ubiquitous language to describe their abstractions accurately.

Functional and reactive APIs provide the best language for building event-driven microservices that operate over continuous streams of events or data, whether for data movement, analytics, ES/CQRS or more traditional enterprise integration.

We will explore, with live coding examples, the newly added reactive programming support in Spring Cloud Stream, built on the same foundation as Spring 5: Project Reactor. We will see how it complements and enriches the ability to write event-driven microservices that can transparently implement high-level primitives such as consumer groups and partitioning on messaging systems such as RabbitMQ, Kafka or Google PubSub, and how to integrate with other layers, such as Spring Reactive Web. We will also show how to use Reactive Streams clients, such as Reactive Kafka, for building end-to-end reactive applications.

Finally, we will show you how to chain the microservices into complex pipelines that can be seamlessly deployed on Cloud Foundry, Kubernetes or Mesos, using Spring Cloud Data Flow.

Marius Bogoevici

February 23, 2017


Transcript

  1. Reactive Functional Data Pipelines with Spring Cloud Microservices

    Marius Bogoevici (@mariusbogoevici), Mark Pollack (@markpollack), Pivotal
    @devnexus, February 23, 2017, Atlanta
  2. A general IoT architecture

    [Diagram labels: Collection, Storage, Machine Learning, Batch Analytics, ETL, Streaming Analytics, Internet, Presentation, Things, Device Data, Business Applications, Business Data, Data Pipelines, Event processing, Enterprise]
  3. A general IoT architecture

    [Same diagram, with added callouts: High concurrency, Network latency, Data volume]
  4. A general IoT architecture

    [Same diagram, with added callouts: High concurrency, Network latency, Data volume, Complex topologies, Intuitive programming models]
  5. How to receive data at the edge?

    [Pipeline diagram: Sensor Data Generator, HTTP Endpoint, Data Cleaning, Storage, Average Calculation]
  6. Server Threads, Application

    [Diagram timings: 150 ms, 150 ms, 15 ms, 15 ms, 20 ms, 50 ms, 150 ms, 250 ms]
    Requires a large number of threads: one per concurrent request
  7. Server Threads, Application

    [Same diagram, with added callout: Processing Latency]
    Requires a large number of threads: one per concurrent request
  8. Server Threads, Application

    [Same diagram, with added callouts: Network Latency, Processing Latency]
    Requires a large number of threads: one per concurrent request
  9. Application, IO selector

    [Diagram labels: IO selector, worker, worker, worker; timings 20 ms, 50 ms, 150 ms, 250 ms]
    Typically one thread per core
  10. Application, IO selector

    [Same diagram]
    Typically one thread per core
    Must be nonblocking
  11. Application, IO selector

    [Same diagram]
    Typically one thread per core
    Must be nonblocking
    Must handle backpressure
  12. Project Reactor

    • Reactive and non-blocking foundation for the JVM
    • Reactive Streams-based (with JDK 9 support too)
      ◦ An interop standard for nonblocking backpressure
    • API for reactive programming focusing on Java 8 APIs
      ◦ Functional programming model: map(), flatMap(), groupBy(), window()
      ◦ Composability
    • Extensions for TCP, Netty, Aeron, Kafka
    • Core of Reactive Spring efforts
      ◦ Spring 5, Spring Data, Spring Cloud Stream, ...
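The "nonblocking backpressure" contract named on slide 12 can be sketched with nothing but the JDK 9 `java.util.concurrent.Flow` interfaces. This is a minimal, single-threaded illustration under stated assumptions, not Reactor's implementation and not a fully specification-compliant publisher; `RangePublisher` and `BatchingSubscriber` are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

public class BackpressureSketch {

    /** Hypothetical publisher: emits 1..count, but never more than the subscriber has requested. */
    static final class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;

        RangePublisher(int count) { this.count = count; }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 1;
                private long demand = 0;
                private boolean emitting = false;
                private boolean done = false;

                @Override
                public void request(long n) {
                    if (done || n <= 0) return;
                    demand += n;
                    if (emitting) return; // re-entrant call from onNext: the loop below sees the new demand
                    emitting = true;
                    while (demand > 0 && next <= count && !done) {
                        demand--;
                        subscriber.onNext(next++);
                    }
                    emitting = false;
                    if (next > count && !done) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() { done = true; }
            });
        }
    }

    /** Hypothetical subscriber: signals demand in batches and records what it receives. */
    static final class BatchingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final List<Long> requests = new ArrayList<>();
        boolean completed = false;
        private final int batchSize;
        private Flow.Subscription subscription;
        private int pending;

        BatchingSubscriber(int batchSize) { this.batchSize = batchSize; }

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; ask(); }

        @Override public void onNext(Integer item) {
            received.add(item);
            if (--pending == 0) ask(); // only ask for more once the current batch is consumed
        }

        @Override public void onError(Throwable t) { throw new IllegalStateException(t); }

        @Override public void onComplete() { completed = true; }

        private void ask() {
            pending = batchSize;
            requests.add((long) batchSize);
            subscription.request(batchSize);
        }
    }

    static BatchingSubscriber run(int count, int batchSize) {
        BatchingSubscriber subscriber = new BatchingSubscriber(batchSize);
        new RangePublisher(count).subscribe(subscriber);
        return subscriber;
    }

    public static void main(String[] args) {
        BatchingSubscriber s = run(5, 2);
        System.out.println(s.received + " completed=" + s.completed); // prints [1, 2, 3, 4, 5] completed=true
    }
}
```

The point of the sketch is that the consumer, not the producer, sets the pace: nothing is emitted until `request(n)` is called, which is what lets a small number of nonblocking threads serve many slow consumers.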
  13. Reactor Kafka

    • Reactive API for Kafka based on Reactor
    • Thin layer on top of Kafka Producer/Consumer API
    • Efficient, non-blocking interaction with Kafka, with backpressure
    • End-to-end reactive pipeline
    • Reactive Streams support (via Reactor)
    • Currently 1.0.0.M1
  14. Spring Cloud Stream

    • Event-driven microservice framework
    • Middleware as a utility
    • Opinionated infrastructure
    • Currently version 1.2
    • Built on Spring portfolio components
      ◦ Spring Boot - self-contained applications, configurations
      ◦ Spring Integration - binder implementations, programming model
      ◦ Reactor - Reactive API
  15. Spring Cloud Stream in a nutshell

    [Diagram labels: Application Core, Messaging Middleware, Binder, Inputs, Outputs, Spring Boot Configuration]
  16. Spring Cloud Stream in a nutshell

    [Same diagram]
    Pluggable Messaging Middleware: RabbitMQ, Kafka, Google PubSub, JMS
  17. Spring Cloud Stream in a nutshell

    [Same diagram]
    Flexible input/output model: Spring Integration Channels, KStream, Flux
  18. Spring Cloud Stream in a nutshell

    [Same diagram]
    Flexible programming model: Spring Integration, KStream, Reactor, RxJava
  19. Spring Cloud Stream in a nutshell

    [Same diagram]
    Standardized configuration model
  20. Spring Cloud Stream primitives

    • Durable Publish-Subscribe messaging
      ◦ For easily creating complex topologies
    • Consumer groups
      ◦ Multiple instances can be competing consumers when scaling
    • Declarative data partitioning
      ◦ Colocating related data in consumer instances
    • Content negotiation
      ◦ Flexible, self-descriptive serialization/deserialization strategies
    • Schema evolution with Avro
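The idea behind "colocating related data in consumer instances" on slide 20 can be illustrated with a small key-to-partition function. The sketch below is an assumption-laden illustration of the general technique (hash the partition key, take it modulo the partition count), not Spring Cloud Stream's own code; in the framework the key is declared in configuration, for example via the `partitionKeyExpression` property shown in the deployment manifest on slide 34. `PartitioningSketch`, `partitionFor` and `route` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PartitioningSketch {

    /** Maps a partition key to a partition index in [0, partitionCount). */
    static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Groups keys by the partition (and so the consumer instance) that would receive them. */
    static Map<Integer, List<String>> route(List<String> keys, int partitionCount) {
        return keys.stream().collect(Collectors.groupingBy(
                key -> partitionFor(key, partitionCount), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        // Hypothetical customer ids standing in for the payload.custId key.
        List<String> custIds = List.of("cust-1", "cust-2", "cust-3", "cust-1", "cust-2");
        // Every occurrence of the same key lands in the same partition.
        System.out.println(route(custIds, 3));
    }
}
```

Because the mapping is deterministic, all messages for one key reach the same consumer instance, which is what makes per-key state such as a running average safe to keep locally.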
  21. Building the HTTP endpoint

    [Diagram labels, as extracted: Spring Web Flux Reactive HTTP Spring Cloud Stream Spring Cloud Stream Reactive Kafka Binder Reactor Kafka]
    End to end Reactive
  22. How do we process data?

    [Pipeline diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]
  23. Functional Programming for Stream processing?

    • Different goals than the web endpoint
      ◦ Fewer concerns about external clients, network latency, resource usage, backpressure
    • ‘Event at a time’ vs. ‘stream processing’
      ◦ Event at a time: classical messages are considered independent of each other.
      ◦ Stream processing: concerned with groups of messages; ordered processing is important.
    • Functional programming is a better domain language
      ◦ Obvious operations on a stream of data vs. using ‘aggregator’ and ‘reducer’ classes.
    • Easy to adopt due to flexibility of Spring Cloud Stream
      ◦ Reactive programming adapters for classical messaging
      ◦ ‘Native’ reactive adapters where a full reactive stack is required
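Slide 23's claim that functional operators read more naturally than hand-written 'aggregator' and 'reducer' classes can be illustrated with the average calculation from the demo pipeline. The sketch below uses `java.util.stream` over a finite list as a stand-in for one window of an unbounded stream; the talk itself uses Reactor, where operators such as `groupBy()` and `window()` play this role. The `Reading` class and `averageBySensor` method are hypothetical; only the `sensorId` and `temperature` field names come from the demo stream definitions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AverageSketch {

    /** One sensor reading; field names follow the sensorId/temperature columns of the demo. */
    static final class Reading {
        private final String sensorId;
        private final double temperature;

        Reading(String sensorId, double temperature) {
            this.sensorId = sensorId;
            this.temperature = temperature;
        }

        String sensorId() { return sensorId; }

        double temperature() { return temperature; }
    }

    /** Average temperature per sensor for one window of readings, keyed and sorted by sensor id. */
    static Map<String, Double> averageBySensor(List<Reading> window) {
        return window.stream().collect(Collectors.groupingBy(
                Reading::sensorId, TreeMap::new, Collectors.averagingDouble(Reading::temperature)));
    }

    public static void main(String[] args) {
        List<Reading> window = List.of(
                new Reading("s1", 20.0), new Reading("s1", 22.0), new Reading("s2", 30.0));
        System.out.println(averageBySensor(window)); // prints {s1=21.0, s2=30.0}
    }
}
```

The grouping and the aggregation are stated in one expression, with no mutable aggregator object to manage, which is the "domain language" argument the slide makes.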
  24. Building the processing pipeline

    [Diagram: two applications, Field Transformer and Average Calculator, each labeled, as extracted: Spring Cloud Stream Spring Cloud Stream Reactive Spring Cloud Stream Kafka Binder]
  25. How do we store data?

    [Pipeline diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]
  26. Building the JDBC Sink

    [Diagram labels, as extracted: JDBC Sink Spring Cloud Stream Spring Cloud Stream Kafka Binder Spring Integration JDBC]
  27. Spring Cloud Stream: Imperative to Reactive

    [Diagram labels, as extracted, one line per repeated "Application" group:]
    Application Spring Integration Binder (RabbitMQ, Kafka, JMS, Google PubSub) Message Channels
    Application Reactive Programming Model Spring Integration Binder (RabbitMQ, Kafka, JMS, Google PubSub) Message Channels Spring Cloud Stream Reactive Adapter
    Application Reactive Programming Model Reactive API (Reactor, RxJava) Reactive Streams Binder (>1.2) Reactive Streams Integration (Kafka)
    [Captions: Imperative; Reactive Functional Programming; Non-reactive messaging; Full Reactive Stack; Spring Integration Programming Model]
  28. Checkpoint

    [IoT architecture diagram labels: Collection, Storage, Machine Learning, Batch Analytics, ETL, Streaming Analytics, Internet, Presentation, Things, Device Data, Business Applications, Business Data, Data Pipelines, Event processing, Enterprise]
    [Added callouts: Reactive, Functional]
  29. Data Pipelines using Microservices

    • Stand-alone, production grade applications focused on data processing
    • Communicating with ‘lightweight mechanisms’: messaging middleware

    “Write programs that do one thing and do it well.”
    “Write programs to work together.”
    “Write programs to handle text streams, because that is a universal interface.”

    $ cat book.txt | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head
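The shell pipeline on slide 29 maps closely onto a chain of functional operators, which is the analogy the talk draws between Unix pipes and data microservices. The following is a rough Java counterpart, offered as a sketch: it is an approximation of the shell command (it splits on any whitespace, drops empty tokens, and breaks ties alphabetically), and `WordCountPipeline` and `topWords` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountPipeline {

    /** Returns the most frequent words in the text, most frequent first. */
    static List<Map.Entry<String, Long>> topWords(String text, int limit) {
        Map<String, Long> counts = Arrays.stream(text.split("\\s+"))   // roughly: tr ' ' '\n'
                .map(word -> word.toLowerCase(Locale.ROOT))            // tr '[:upper:]' '[:lower:]'
                .map(word -> word.replaceAll("\\p{Punct}", ""))        // tr -d '[:punct:]'
                .filter(word -> word.matches("[a-z]+"))                // grep -v '[^a-z]', also dropping empty tokens
                .collect(Collectors.groupingBy(                        // sort | uniq -c
                        Function.identity(), TreeMap::new, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())) // sort -rn
                .limit(limit)                                          // head
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topWords("The cat and the hat. The end!", 2));
    }
}
```

Each stage does one thing and hands its output to the next, so a stage can be replaced or scaled without touching its neighbours; in the talk's setting the stages are separate applications and the pipe is messaging middleware.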
  30. Spring Cloud Data Flow

    An orchestration service for data microservice applications on modern runtimes
    Designed for integration, streaming, and batch job use-cases
    [Diagram label: Data Flow Server]
  31. Stream DSL

    Stream Definition: sensorStream = http | jdbc

    SCDF Shell:
      app register --type source --name http --uri maven://org.example:http-source-kafka-10:1.1.2.RELEASE
      app register --type sink --name jdbc --uri maven://org.example:jdbc-sink-kafka-10:1.1.1.RELEASE
      stream create --name sensorStream --definition "http | jdbc" --deploy

    Map names in DSL onto Maven/Docker artifacts
    [Diagram labels: http, jdbc]
  32. Demo Streams

    SCDF Shell:
      stream create --name sensorstream --definition "rxhttp | rxtransformer | jdbc --tableName=sensors --columns=sensorId,temperature"
      stream create --name sensoravg --definition ":sensorstream.rxtransformer > rxavg | jdbc --tableName=sensors_avg --columns=sensorId,average"

    [Diagram labels: rxhttp, rxtransformer, jdbc, rxavg, jdbc]
  33. Demo Streams Deployment

    [Diagram labels: Runtime Platform, Data Flow Server, DB, Message Broker, rxhttp, jdbc, rxtransformer, rxavg, jdbc]
  34. Deployment Manifest

    SCDF Shell:
      stream create s1 --definition "http | work | hdfs"
      stream deploy s1 --propertiesFile ingest.properties

    ingest.properties:
      app.http.count=2
      app.work.count=3
      app.hdfs.count=4
      app.http.producer.partitionKeyExpression=payload.custId
      app.work.spring.cloud.deployer.memory=2048