Reactive Functional Data Pipelines with Spring Cloud Microservices

[Talk given together with Mark Pollack, on February 23, 2017 at DevNexus 2017, Atlanta]

Well-written microservices follow the principles of domain-driven design, one of which is finding a ubiquitous language that describes their abstractions accurately.

Functional and reactive APIs provide the best language for building event-driven microservices that operate over continuous streams of events or data, whether for data movement, analytics, ES/CQRS or more traditional enterprise integration.

Using live coding examples, we will explore the newly added reactive programming support in Spring Cloud Stream, which is built on the same foundation as Spring 5: Project Reactor. We will see how it complements and enriches the ability to write event-driven microservices that transparently implement high-level primitives, such as consumer groups and partitioning, on top of messaging systems such as RabbitMQ, Kafka or Google PubSub, and how these microservices integrate with other layers, such as Spring Reactive Web. We will also show how to use Reactive Streams clients, such as Reactive Kafka, to build end-to-end reactive applications.

Finally, we will show you how to chain the microservices in complex pipelines that can be seamlessly deployed on Cloud Foundry, Kubernetes or Mesos, using Spring Cloud Data Flow.

Marius Bogoevici

February 23, 2017

Transcript

  1. Reactive Functional Data Pipelines with Spring Cloud Microservices
    Marius Bogoevici (@mariusbogoevici) and Mark Pollack (@markpollack), Pivotal. @devnexus, February 23, 2017, Atlanta
  2. A general IoT architecture (diagram): Internet of Things device data and enterprise business data; data pipelines for collection, ETL, event processing, storage, streaming analytics, batch analytics and machine learning; presentation and business applications.
  3. A general IoT architecture (diagram, as above), highlighting: high concurrency, network latency, data volume.
  4. A general IoT architecture (diagram, as above), highlighting: high concurrency, network latency, data volume, complex topologies, intuitive programming models.
  5. A smaller-scale version … (diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation)
  6. How to receive data at the edge? (diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation)
  7. (Diagram: server threads handling requests to the application, with latencies between 15 ms and 250 ms.) Requires a large number of threads: one per concurrent request.
  8. (Diagram as on the previous slide, with the processing latency highlighted.) Requires a large number of threads: one per concurrent request.
  9. (Diagram as on the previous slide, with the network latency and the processing latency highlighted.) Requires a large number of threads: one per concurrent request.
  10. (Slide without text.)
  11. (Diagram: an IO selector dispatching requests, with latencies between 20 ms and 250 ms, to a few worker threads in the application.)
  12. (Diagram as on the previous slide.) Typically one thread per core.
  13. (Diagram as on the previous slide.) Typically one thread per core. Must be non-blocking.
  14. (Diagram as on the previous slide.) Typically one thread per core. Must be non-blocking. Must handle backpressure.
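
The selector-and-workers model on slides 11 to 14 (often called an event loop) can be approximated with the JDK alone. The sketch below is not how Netty or Reactor are implemented; it assumes a worker pool sized to the number of cores and simulates each request's latency with CompletableFuture.delayedExecutor (Java 9+), so a waiting request holds no worker thread. The class and method names are hypothetical; the latencies are the ones shown on the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventLoopSketch {

    // Handles one simulated request per latency value on a small, fixed pool of worker threads.
    static List<String> handleAll(long[] latenciesMs) {
        // Typically one thread per core, as on the slides.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService workers = Executors.newFixedThreadPool(cores);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (long latency : latenciesMs) {
                // The latency is scheduled, not slept: no worker thread is blocked while waiting.
                // The supplier runs on a worker only once the simulated response is ready.
                pending.add(CompletableFuture.supplyAsync(
                        () -> "done after " + latency + " ms",
                        CompletableFuture.delayedExecutor(latency, TimeUnit.MILLISECONDS, workers)));
            }
            // Collect results in request order; join() blocks only the caller, to end the demo.
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> response : pending) {
                results.add(response.join());
            }
            return results;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] pattern = {20, 50, 150, 250};
        long[] latencies = new long[1000]; // far more concurrent requests than threads
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = pattern[i % pattern.length];
        }
        long start = System.nanoTime();
        List<String> results = handleAll(latencies);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " requests handled in about " + elapsedMs + " ms");
    }
}
```

Because the waits overlap instead of each pinning a thread, the thousand simulated requests should finish in roughly the time of the slowest one plus scheduling overhead, on only a handful of threads.
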
  15. Project Reactor
    • Reactive and non-blocking foundation for the JVM
    • Reactive Streams-based (with JDK 9 support too)
      ◦ An interop standard for non-blocking backpressure
    • API for reactive programming focusing on Java 8 APIs
      ◦ Functional programming model: map(), flatMap(), groupBy(), window()
      ◦ Composability
    • Extensions for TCP, Netty, Aeron, Kafka
    • Core of Reactive Spring efforts
      ◦ Spring 5, Spring Data, Spring Cloud Stream, ...
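
Slide 15 describes Reactor as Reactive Streams-based, with JDK 9 support. The JDK 9 java.util.concurrent.Flow interfaces mirror the Reactive Streams contract, so backpressure can be shown without any library: the subscriber below requests one item at a time, and the publisher delivers only what was requested. This is a minimal sketch that uses SubmissionPublisher as a stand-in for a Reactor Flux; the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class BackpressureSketch {

    // A subscriber that signals demand for exactly one item at a time.
    static final class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // initial demand: one item
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);
            subscription.request(1); // ask for the next item only after handling this one
        }

        @Override
        public void onError(Throwable throwable) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    static List<Integer> publishAndCollect(int count) {
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        // SubmissionPublisher keeps a bounded buffer per subscriber and delivers only
        // requested items; submit() blocks while that buffer is full.
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 1; i <= count; i++) {
                publisher.submit(i);
            }
        } // close() signals completion once the buffered items have been delivered
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(subscriber.received);
    }

    public static void main(String[] args) {
        System.out.println(publishAndCollect(5)); // [1, 2, 3, 4, 5]
    }
}
```

Requesting one item per onNext is the simplest possible policy; a real subscriber would usually request in batches to reduce signalling overhead.
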
  16. Spring WebFlux in Spring 5 http://docs.spring.io/spring-framework/docs/5.0.0.BUILD-SNAPSHOT/spring-framework-reference/html/web-reactive.html

  17. Building the HTTP endpoint (diagram: Spring WebFlux, reactive HTTP)

  18. Reactor Kafka
    • Reactive API for Kafka based on Reactor
    • Thin layer on top of the Kafka producer and consumer APIs
    • Efficient, non-blocking interaction with Kafka, with backpressure
    • End-to-end reactive pipeline
    • Reactive Streams support (via Reactor)
    • Currently 1.0.0.M1
  19. Spring Cloud Stream
    • Event-driven microservice framework
    • Middleware as a utility
    • Opinionated infrastructure
    • Currently version 1.2
    • Built on Spring portfolio components
      ◦ Spring Boot: self-contained applications, configuration
      ◦ Spring Integration: binder implementations, programming model
      ◦ Reactor: reactive API
  20. Spring Cloud Stream in a nutshell (diagram: an application core with inputs and outputs, connected through a binder to the messaging middleware; Spring Boot configuration)
  21. Spring Cloud Stream in a nutshell (diagram as on the previous slide)
    Pluggable messaging middleware: RabbitMQ, Kafka, Google PubSub, JMS
  22. Spring Cloud Stream in a nutshell (diagram as on the previous slide)
    Flexible input/output model: Spring Integration channels, KStream, Flux
  23. Spring Cloud Stream in a nutshell (diagram as on the previous slide)
    Flexible programming model: Spring Integration, KStream, Reactor, RxJava
  24. Spring Cloud Stream in a nutshell (diagram as on the previous slide)
    Standardized configuration model
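
To make the split between the application core and the binder on slides 20 to 24 concrete, here is a deliberately simplified model. Every type in it (Binder, InMemoryBinder, bindProcessor, the destination names) is hypothetical and is not the Spring Cloud Stream API; it only illustrates that the core is a plain function, while the binder decides how inputs and outputs reach the middleware, so swapping the middleware does not touch the core.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

public class BinderSketch {

    // Hypothetical binder contract: connects named destinations to application code.
    interface Binder {
        void bindConsumer(String destination, Consumer<String> handler);
        Consumer<String> bindProducer(String destination);
    }

    // Hypothetical in-memory "middleware"; a real binder would talk to RabbitMQ, Kafka, etc.
    static final class InMemoryBinder implements Binder {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        @Override
        public void bindConsumer(String destination, Consumer<String> handler) {
            subscribers.computeIfAbsent(destination, d -> new ArrayList<>()).add(handler);
        }

        @Override
        public Consumer<String> bindProducer(String destination) {
            // Publish-subscribe: every handler bound to the destination receives the message.
            return message -> subscribers.getOrDefault(destination, List.of())
                    .forEach(handler -> handler.accept(message));
        }
    }

    // Wires an application core (a plain function) between an input and an output destination.
    static void bindProcessor(Binder binder, String input, String output, Function<String, String> core) {
        Consumer<String> out = binder.bindProducer(output);
        binder.bindConsumer(input, message -> out.accept(core.apply(message)));
    }

    public static void main(String[] args) {
        Binder binder = new InMemoryBinder();
        List<String> sink = new ArrayList<>();
        binder.bindConsumer("sensor-clean", sink::add);
        bindProcessor(binder, "sensor-raw", "sensor-clean", String::trim);

        binder.bindProducer("sensor-raw").accept("  42.5  ");
        System.out.println(sink); // [42.5]
    }
}
```
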
  25. Spring Cloud Stream in a 10000 ft nutshell

  26. Spring Cloud Stream primitives
    • Durable publish-subscribe messaging
      ◦ For easily creating complex topologies
    • Consumer groups
      ◦ Multiple instances can be competing consumers when scaling
    • Declarative data partitioning
      ◦ Colocating related data in consumer instances
    • Content negotiation
      ◦ Flexible, self-descriptive serialization/deserialization strategies
    • Schema evolution with Avro
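
Declarative partitioning means that messages sharing a partition key always reach the same consumer instance within a group. A common way to achieve that, sketched below, is to hash the key and take it modulo the partition count. This assumes a simple hash-modulo strategy and is not necessarily the exact algorithm Spring Cloud Stream uses; the sensor keys and the instance count are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitioningSketch {

    // Maps a partition key to a partition index in [0, partitionCount).
    // Math.floorMod avoids negative results for negative hash codes.
    static int selectPartition(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int instances = 3; // e.g. three consumer instances in one group, one partition each
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (String sensorId : List.of("sensor-1", "sensor-2", "sensor-3", "sensor-1", "sensor-2")) {
            assignment.computeIfAbsent(selectPartition(sensorId, instances), p -> new ArrayList<>())
                    .add(sensorId);
        }
        // Every reading from a given sensor lands in the same partition.
        System.out.println(assignment);
    }
}
```

Because the mapping depends only on the key and the partition count, related data stays colocated as long as the number of partitions does not change.
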
  27. Building the HTTP endpoint (diagram): Spring WebFlux (reactive HTTP), Spring Cloud Stream, the Spring Cloud Stream Reactive Kafka Binder and Reactor Kafka. End-to-end reactive.
  28. Code deep dive

  29. How do we process data? (diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation)
  30. Functional programming for stream processing?
    • Different goals than the web endpoint
      ◦ Fewer concerns about external clients, network latency, resource usage, backpressure
    • ‘Event at a time’ vs. ‘stream processing’
      ◦ Event-at-a-time model: classical messages are considered independent of each other.
      ◦ Stream processing: concerned with groups of messages; ordered processing is important.
    • Functional programming is a better domain language
      ◦ Obvious operations on a stream of data vs. using ‘aggregator’ and ‘reducer’ classes.
    • Easy to adopt due to the flexibility of Spring Cloud Stream
      ◦ Reactive programming adapters for classical messaging
      ◦ ‘Native’ reactive adapters where a full reactive stack is required
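
The "stream processing" view on slide 30 works on groups of related, ordered events rather than on isolated messages. The sketch below computes per-sensor averages over fixed-size windows using java.util.stream, mirroring the groupBy() and window() operators listed on slide 15. Java streams are finite and pull-based, so this only illustrates the shape of the computation; a Reactor Flux applies the same idea to an unbounded stream. The SensorReading type and its fields are assumptions loosely based on the demo's columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WindowedAverageSketch {

    // Illustrative event type; field names are assumptions.
    static final class SensorReading {
        final int sensorId;
        final double temperature;

        SensorReading(int sensorId, double temperature) {
            this.sensorId = sensorId;
            this.temperature = temperature;
        }
    }

    // groupBy(sensorId), then window(windowSize), then average each window.
    static Map<Integer, List<Double>> windowedAverages(List<SensorReading> readings, int windowSize) {
        Map<Integer, List<Double>> bySensor = readings.stream()
                .collect(Collectors.groupingBy((SensorReading r) -> r.sensorId, TreeMap::new,
                        Collectors.mapping((SensorReading r) -> r.temperature, Collectors.toList())));

        Map<Integer, List<Double>> averages = new TreeMap<>();
        bySensor.forEach((sensorId, temps) -> {
            List<Double> windowAverages = new ArrayList<>();
            // Tumbling windows of windowSize readings, in arrival order; a trailing partial window is averaged too.
            for (int start = 0; start < temps.size(); start += windowSize) {
                List<Double> window = temps.subList(start, Math.min(start + windowSize, temps.size()));
                windowAverages.add(window.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN));
            }
            averages.put(sensorId, windowAverages);
        });
        return averages;
    }

    public static void main(String[] args) {
        List<SensorReading> readings = List.of(
                new SensorReading(1, 20.0), new SensorReading(2, 30.0),
                new SensorReading(1, 22.0), new SensorReading(2, 34.0),
                new SensorReading(1, 24.0), new SensorReading(1, 26.0));
        System.out.println(windowedAverages(readings, 2)); // {1=[21.0, 25.0], 2=[32.0]}
    }
}
```
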
  31. Code deep dive

  32. Building the processing pipeline (diagram): a Field Transformer and an Average Calculator, each built with Spring Cloud Stream, Spring Cloud Stream Reactive and the Spring Cloud Stream Kafka Binder
  33. How do we store data? (diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation)
  34. Building the JDBC Sink (diagram): a JDBC Sink built with Spring Cloud Stream, the Spring Cloud Stream Kafka Binder and Spring Integration JDBC
  35. Spring Cloud Stream: imperative to reactive (diagram comparing three application models)
    • Imperative, Spring Integration programming model: the application uses message channels, backed by a Spring Integration binder (RabbitMQ, Kafka, JMS, Google PubSub)
    • Reactive functional programming over non-reactive messaging: a reactive programming model connected to message channels through the Spring Cloud Stream reactive adapter, backed by a Spring Integration binder (RabbitMQ, Kafka, JMS, Google PubSub)
    • Full reactive stack: a reactive programming model using a reactive API (Reactor, RxJava), backed by a Reactive Streams binder (>1.2) with Reactive Streams integration (Kafka)
  36. Checkpoint: the general IoT architecture again (diagram), annotated "Reactive" and "Functional".
  37. Data Pipelines using Microservices
    • Stand-alone, production-grade applications focused on data processing
    • Communicating with ‘lightweight mechanisms’: messaging middleware
    “Write programs that do one thing and do it well.” “Write programs to work together.” “Write programs to handle text streams, because that is a universal interface.”
    $ cat book.txt | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head
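
The shell pipeline on slide 37 composes small, single-purpose programs, and the same composition reads naturally as a chain of functional stream operators. The sketch below is a rough java.util.stream translation in which each stage is annotated with the command it mirrors. Two approximations are assumed: text is split on any whitespace rather than only on spaces, and ties in the count are broken alphabetically so the result is deterministic.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountPipeline {

    // Each step mirrors one stage of the shell pipeline on the slide.
    static List<Map.Entry<String, Long>> topWords(String text, int limit) {
        return Arrays.stream(text.split("\\s+"))                 // tr ' ' '\n' (one word per element)
                .map(String::toLowerCase)                        // tr '[:upper:]' '[:lower:]'
                .map(w -> w.replaceAll("\\p{Punct}", ""))        // tr -d '[:punct:]'
                .filter(w -> w.matches("[a-z]+"))                // grep -v '[^a-z]' (empty words dropped too)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())) // sort | uniq -c
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey())) // sort -rn, ties alphabetical
                .limit(limit)                                    // head
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topWords("The cat and the hat. The end!", 3)); // [the=3, and=1, cat=1]
    }
}
```
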
  38. Spring Cloud Data Flow
    An orchestration service for data microservice applications on modern runtimes, designed for integration, streaming, and batch job use cases. (Diagram: Data Flow Server)
  39. Stream DSL (diagram: the names in the stream definition sensorStream = http | jdbc are mapped onto Maven/Docker artifacts)
    SCDF Shell:
      app register --type source --name http --uri maven://org.example:http-source-kafka-10:1.1.2.RELEASE
      app register --type sink --name jdbc --uri maven://org.example:jdbc-sink-kafka-10:1.1.1.RELEASE
      stream create --name sensorStream --definition "http | jdbc" --deploy
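
The name-to-artifact mapping on slide 39 can be pictured as a registry lookup. The sketch below is hypothetical and is not the Data Flow server's parser: it splits a linear definition on "|", treats the first app as the source, the last as the sink and any others as processors, ignores app options such as --tableName, and resolves each name against URIs registered the way "app register" does. Taps and named destinations are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamDslSketch {

    // Hypothetical registry keyed by "type:name", filled the way "app register" fills the real one.
    private final Map<String, String> registry = new HashMap<>();

    void register(String type, String name, String uri) {
        registry.put(type + ":" + name, uri);
    }

    // Resolves a linear definition such as "http | jdbc" to the registered artifact URIs.
    List<String> resolve(String definition) {
        String[] apps = Arrays.stream(definition.split("\\|")).map(String::trim).toArray(String[]::new);
        List<String> artifacts = new ArrayList<>();
        for (int i = 0; i < apps.length; i++) {
            String name = apps[i].split("\\s+")[0]; // ignore app options such as --tableName=...
            // In a linear stream the first app is the source, the last the sink, the rest processors.
            String type = i == 0 ? "source" : (i == apps.length - 1 ? "sink" : "processor");
            String uri = registry.get(type + ":" + name);
            if (uri == null) {
                throw new IllegalArgumentException("No " + type + " registered as '" + name + "'");
            }
            artifacts.add(uri);
        }
        return artifacts;
    }

    public static void main(String[] args) {
        StreamDslSketch dsl = new StreamDslSketch();
        dsl.register("source", "http", "maven://org.example:http-source-kafka-10:1.1.2.RELEASE");
        dsl.register("sink", "jdbc", "maven://org.example:jdbc-sink-kafka-10:1.1.1.RELEASE");
        System.out.println(dsl.resolve("http | jdbc"));
    }
}
```
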
  40. Demo Streams
    SCDF Shell:
      stream create --name sensorstream --definition "rxhttp | rxtransformer | jdbc --tableName=sensors --columns=sensorId,temperature"
      stream create --name sensoravg --definition ":sensorstream.rxtransformer > rxavg | jdbc --tableName=sensors_avg --columns=sensorId,average"
    (Diagram: rxhttp | rxtransformer | jdbc, with rxavg tapping the output of rxtransformer and writing to a second jdbc sink)
  41. Demo Streams Deployment (diagram: Runtime Platform, Data Flow Server, DB, Message Broker; apps rxhttp, rxtransformer, jdbc, rxavg, jdbc)
  42. Deployment Manifest
    SCDF Shell:
      stream create s1 --definition "http | work | hdfs"
      stream deploy s1 --propertiesFile ingest.properties
    ingest.properties:
      app.http.count=2
      app.work.count=3
      app.hdfs.count=4
      app.http.producer.partitionKeyExpression=payload.custId
      app.work.spring.cloud.deployer.memory=2048
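
Every key in ingest.properties on slide 42 follows an app.<appName>.<property> pattern, which is how each setting is scoped to one application in the stream. The sketch below only groups the entries by app name with java.util.Properties to make that convention visible; it makes no claim about how the Data Flow server interprets or applies these properties, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class DeploymentPropertiesSketch {

    // Groups "app.<appName>.<property>=value" entries by app name.
    static Map<String, Map<String, String>> byApp(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        Map<String, Map<String, String>> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("app.")) {
                continue; // only the app.* prefix is considered in this sketch
            }
            String rest = key.substring("app.".length());
            int dot = rest.indexOf('.');
            if (dot <= 0) {
                continue; // malformed key without a property part
            }
            String app = rest.substring(0, dot);
            String property = rest.substring(dot + 1);
            result.computeIfAbsent(app, a -> new TreeMap<>()).put(property, props.getProperty(key));
        }
        return result;
    }

    public static void main(String[] args) {
        String ingest = String.join("\n",
                "app.http.count=2",
                "app.work.count=3",
                "app.hdfs.count=4",
                "app.http.producer.partitionKeyExpression=payload.custId",
                "app.work.spring.cloud.deployer.memory=2048");
        byApp(ingest).forEach((app, settings) -> System.out.println(app + " -> " + settings));
    }
}
```
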
  43. Deployment Manifest

  44. Data Flow Stream Demo

  45. Getting Started
    • https://projectreactor.io/
    • https://cloud.spring.io/spring-cloud-stream/
    • https://cloud.spring.io/spring-cloud-dataflow/
    • Sample App
      ◦ https://github.com/mbogoevici/devnexus2017
  46. Q & A