Slide 1

Slide 1 text

Reactive Functional Data Pipelines with Spring Cloud Microservices
Marius Bogoevici (@mariusbogoevici), Mark Pollack (@markpollack)
Pivotal
@devnexus, February 23, 2017, Atlanta

Slide 2

Slide 2 text

A general IoT architecture
[Diagram labels: Collection, Storage, Machine Learning, Batch Analytics, ETL, Streaming Analytics, Internet, Presentation, Things, Device Data, Business Applications, Business Data, Data Pipelines, Event processing, Enterprise]

Slide 3

Slide 3 text

A general IoT architecture
[Diagram labels: Collection, Storage, Machine Learning, Batch Analytics, ETL, Streaming Analytics, Internet, Presentation, Things, Device Data, Business Applications, Business Data, Data Pipelines, Event processing, Enterprise]
Callouts: High concurrency, Network latency, Data volume

Slide 4

Slide 4 text

A general IoT architecture
[Diagram labels: Collection, Storage, Machine Learning, Batch Analytics, ETL, Streaming Analytics, Internet, Presentation, Things, Device Data, Business Applications, Business Data, Data Pipelines, Event processing, Enterprise]
Callouts: High concurrency, Network latency, Data volume, Complex topologies, Intuitive programming models

Slide 5

Slide 5 text

A smaller-scale version …
[Diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]

Slide 6

Slide 6 text

How to receive data at the edge?
[Diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]

Slide 7

Slide 7 text

Requires a large number of threads - one per concurrent request
[Diagram: Server, Threads, Application; timings 150 ms, 150 ms, 15 ms, 15 ms, 20 ms, 50 ms, 150 ms, 250 ms]

Slide 8

Slide 8 text

Requires a large number of threads - one per concurrent request
[Diagram: Server, Threads, Application; timings 150 ms, 150 ms, 15 ms, 15 ms, 20 ms, 50 ms, 150 ms, 250 ms]
Callouts: Processing Latency

Slide 9

Slide 9 text

Requires a large number of threads - one per concurrent request
[Diagram: Server, Threads, Application; timings 150 ms, 150 ms, 15 ms, 15 ms, 20 ms, 50 ms, 150 ms, 250 ms]
Callouts: Network Latency, Processing Latency
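Why thread-per-request needs so many threads can be estimated with Little's law: requests in flight equal arrival rate times time per request. The sketch below is only an illustration. The latencies are taken from the diagram, while the load of 1,000 requests per second and the class name `ThreadEstimate` are assumptions, not figures from the talk.

```java
final class ThreadEstimate {

    /**
     * Little's law: average number of in-flight requests = arrival rate x time in system.
     * With one thread per in-flight request, this is also the number of busy threads.
     */
    static long threadsNeeded(double requestsPerSecond, double latencyMillis) {
        return (long) Math.ceil(requestsPerSecond * latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        double rate = 1_000.0; // hypothetical load, not from the talk
        for (double latencyMs : new double[] {20, 50, 150, 250}) {
            System.out.printf("%.0f ms per request -> about %d threads%n",
                    latencyMs, threadsNeeded(rate, latencyMs));
        }
    }
}
```

Most of that time is spent waiting on the network rather than computing, which is the motivation for the event-loop model on the following slides.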

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

[Diagram: Application, IO selector, three workers; timings 20 ms, 50 ms, 150 ms, 250 ms]

Slide 12

Slide 12 text

[Diagram: Application, IO selector, three workers; timings 20 ms, 50 ms, 150 ms, 250 ms]
Typically one thread per core

Slide 13

Slide 13 text

[Diagram: Application, IO selector, three workers; timings 20 ms, 50 ms, 150 ms, 250 ms]
Typically one thread per core
Must be nonblocking

Slide 14

Slide 14 text

[Diagram: Application, IO selector, three workers; timings 20 ms, 50 ms, 150 ms, 250 ms]
Typically one thread per core
Must be nonblocking
Must handle backpressure

Slide 15

Slide 15 text

Project Reactor
● Reactive and non-blocking foundation for the JVM
● Reactive Streams-based (with JDK 9 support too)
  ○ An interop standard for nonblocking backpressure
● API for reactive programming focusing on Java 8 APIs
  ○ Functional programming model: map(), flatMap(), groupBy(), window()
  ○ Composability
● Extensions for TCP, Netty, Aeron, Kafka
● Core of Reactive Spring efforts
  ○ Spring 5, Spring Data, Spring Cloud Stream, ...
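The backpressure contract that Reactive Streams standardizes is also exposed in JDK 9 as `java.util.concurrent.Flow`. The sketch below uses only that JDK API, not Reactor itself, to show a subscriber signalling demand one item at a time. The class and method names (`BackpressureSketch`, `OneAtATime`, `publishAll`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class BackpressureSketch {

    /** Subscriber that requests one item at a time, so the publisher cannot outrun it. */
    static final class OneAtATime implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);            // signal initial demand
        }
        @Override public void onNext(Integer item) {
            received.add(item);
            subscription.request(1); // ask for the next item only after handling this one
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    /** Publishes all items through a SubmissionPublisher and returns what the subscriber saw. */
    static List<Integer> publishAll(List<Integer> items) {
        OneAtATime subscriber = new OneAtATime();
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            items.forEach(publisher::submit); // submit blocks when the subscriber's buffer is full
        }                                     // close() delivers onComplete after pending items
        try {
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for completion", e);
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(publishAll(List.of(1, 2, 3, 4, 5)));
    }
}
```

Reactor's `Flux` implements the same request-based protocol, so operators such as map() or window() can propagate demand upstream without application code calling `request` directly.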

Slide 16

Slide 16 text

Spring WebFlux in Spring 5
http://docs.spring.io/spring-framework/docs/5.0.0.BUILD-SNAPSHOT/spring-framework-reference/html/web-reactive.html

Slide 17

Slide 17 text

Building the HTTP endpoint
[Diagram: Spring WebFlux, Reactive HTTP]

Slide 18

Slide 18 text

Reactor Kafka
● Reactive API for Kafka based on Reactor
● Thin layer on top of the Kafka Producer/Consumer API
● Efficient, non-blocking interaction with Kafka, with backpressure
● End-to-end reactive pipeline
● Reactive Streams support (via Reactor)
● Currently 1.0.0.M1

Slide 19

Slide 19 text

Spring Cloud Stream
● Event-driven microservice framework
● Middleware as a utility
● Opinionated infrastructure
● Currently version 1.2
● Built on Spring portfolio components
  ○ Spring Boot - self-contained applications, configurations
  ○ Spring Integration - binder implementations, programming model
  ○ Reactor - Reactive API

Slide 20

Slide 20 text

Spring Cloud Stream in a nutshell
[Diagram: Application Core, Messaging Middleware, Binder, Inputs, Outputs, Spring Boot Configuration]

Slide 21

Slide 21 text

Spring Cloud Stream in a nutshell
[Diagram: Application Core, Messaging Middleware, Binder, Inputs, Outputs, Spring Boot Configuration]
Pluggable Messaging Middleware: RabbitMQ, Kafka, Google PubSub, JMS

Slide 22

Slide 22 text

Spring Cloud Stream in a nutshell
[Diagram: Application Core, Messaging Middleware, Binder, Inputs, Outputs, Spring Boot Configuration]
Flexible input/output model: Spring Integration Channels, KStream, Flux

Slide 23

Slide 23 text

Spring Cloud Stream in a nutshell
[Diagram: Application Core, Messaging Middleware, Binder, Inputs, Outputs, Spring Boot Configuration]
Flexible programming model: Spring Integration, KStream, Reactor, RxJava

Slide 24

Slide 24 text

Spring Cloud Stream in a nutshell
[Diagram: Application Core, Messaging Middleware, Binder, Inputs, Outputs, Spring Boot Configuration]
Standardized configuration model

Slide 25

Slide 25 text

Spring Cloud Stream in a 10000 ft nutshell

Slide 26

Slide 26 text

Spring Cloud Stream primitives
● Durable Publish-Subscribe messaging
  ○ For easily creating complex topologies
● Consumer groups
  ○ Multiple instances can be competing consumers when scaling
● Declarative data partitioning
  ○ Colocating related data in consumer instances
● Content negotiation
  ○ Flexible, self-descriptive serialization/deserialization strategies
● Schema evolution with Avro
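The idea behind data partitioning can be sketched in a few lines: derive a partition index from a key so that all messages with the same key reach the same consumer instance. This is a plain-Java illustration of the concept, assuming a simple hash-modulo rule. It is not Spring Cloud Stream's implementation, and `PartitionSketch`, `partitionFor`, and `assign` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionSketch {

    /** Maps a key to a partition index in [0, partitionCount); assumed hash-modulo rule. */
    static int partitionFor(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount); // floorMod avoids negative indexes
    }

    /** Groups sensor ids by the partition they would be routed to. */
    static Map<Integer, List<String>> assign(List<String> sensorIds, int partitionCount) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String id : sensorIds) {
            byPartition.computeIfAbsent(partitionFor(id, partitionCount), p -> new ArrayList<>())
                       .add(id);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Hypothetical sensor ids; the same id always lands in the same partition.
        System.out.println(assign(List.of("sensor-1", "sensor-2", "sensor-3", "sensor-1"), 3));
    }
}
```

Because the mapping is deterministic, a per-sensor computation such as an average can keep its state local to one consumer instance.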

Slide 27

Slide 27 text

Building the HTTP endpoint
[Diagram: Spring WebFlux, Reactive HTTP, Spring Cloud Stream, Spring Cloud Stream Reactive, Kafka Binder, Reactor Kafka]
End to end Reactive

Slide 28

Slide 28 text

Code deep dive

Slide 29

Slide 29 text

How do we process data?
[Diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]

Slide 30

Slide 30 text

Functional Programming for Stream processing?
● Different goals than the web endpoint
  ○ Fewer concerns about external clients, network latency, resource usage, backpressure
● ‘Event at a time’ vs. ‘stream processing’
  ○ Event at a time: in the classical model, messages are considered independent of each other.
  ○ Stream processing: concerned with groups of messages; ordered processing is important.
● Functional programming is a better domain language
  ○ Obvious operations on a stream of data vs. using ‘aggregator’ and ‘reducer’ classes.
● Easy to adopt due to flexibility of Spring Cloud Stream
  ○ Reactive programming adapters for classical messaging
  ○ ‘Native’ reactive adapters where a full reactive stack is required
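The difference between event-at-a-time and stream processing shows up as soon as a result depends on a group of messages, such as an average per sensor over a window. The sketch below expresses that with JDK streams only, as a stand-in for Reactor's window() and groupBy() operators. The `Reading` type, its fields, and the fixed window size are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AverageSketch {

    /** Hypothetical sensor reading; the field names are assumptions. */
    static final class Reading {
        private final int sensorId;
        private final double temperature;
        Reading(int sensorId, double temperature) {
            this.sensorId = sensorId;
            this.temperature = temperature;
        }
        int sensorId() { return sensorId; }
        double temperature() { return temperature; }
    }

    /** Splits the readings into consecutive windows of at most `size` elements. */
    static List<List<Reading>> windows(List<Reading> readings, int size) {
        List<List<Reading>> result = new ArrayList<>();
        for (int start = 0; start < readings.size(); start += size) {
            result.add(readings.subList(start, Math.min(start + size, readings.size())));
        }
        return result;
    }

    /** Average temperature per sensor within one window, keyed by sensor id. */
    static Map<Integer, Double> averageBySensor(List<Reading> window) {
        return window.stream().collect(Collectors.groupingBy(
                Reading::sensorId, TreeMap::new, Collectors.averagingDouble(Reading::temperature)));
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(
                new Reading(1, 20.0), new Reading(1, 22.0), new Reading(2, 30.0), new Reading(2, 32.0));
        for (List<Reading> window : windows(readings, 2)) {
            System.out.println(averageBySensor(window));
        }
    }
}
```

The whole computation is one grouping expression, with no separate aggregator or reducer classes to write.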

Slide 31

Slide 31 text

Code deep dive

Slide 32

Slide 32 text

Building the processing pipeline
[Diagram: two applications, Field Transformer and Average Calculator, each built on Spring Cloud Stream, Spring Cloud Stream Reactive, and the Spring Cloud Stream Kafka Binder]

Slide 33

Slide 33 text

How do we store data?
[Diagram: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]

Slide 34

Slide 34 text

Building the JDBC Sink
[Diagram: JDBC Sink built on Spring Cloud Stream, Spring Cloud Stream Kafka Binder, Spring Integration JDBC]

Slide 35

Slide 35 text

Spring Cloud Stream: Imperative to Reactive
[Diagram: three application stacks]
● Application, Spring Integration, Binder (RabbitMQ, Kafka, JMS, Google PubSub), Message Channels
● Application, Reactive Programming Model, Spring Integration, Binder (RabbitMQ, Kafka, JMS, Google PubSub), Message Channels, Spring Cloud Stream Reactive Adapter
● Application, Reactive Programming Model, Reactive API (Reactor, RxJava), Reactive Streams Binder (>1.2), Reactive Streams Integration (Kafka)
Labels: Imperative; Reactive Functional Programming; Non-reactive messaging; Full Reactive Stack; Spring Integration Programming Model

Slide 36

Slide 36 text

Checkpoint
[Diagram labels: Collection, Storage, Machine Learning, Batch Analytics, ETL, Streaming Analytics, Internet, Presentation, Things, Device Data, Business Applications, Business Data, Data Pipelines, Event processing, Enterprise]
Callouts: Reactive, Functional

Slide 37

Slide 37 text

Data Pipelines using Microservices
● Stand-alone, production grade applications focused on data processing
● Communicating with ‘lightweight mechanisms’ – messaging middleware
“Write programs that do one thing and do it well.”
“Write programs to work together.”
“Write programs to handle text streams, because that is a universal interface.”
$ cat book.txt | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head
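The same composition idea carries over to code: each stage is a small function and `andThen` plays the role of the shell pipe. The sketch below roughly mirrors the word-count pipeline on this slide using only the JDK. The stage names are made up, and the filter is an approximation of the `grep` step, since it also drops empty tokens.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class WordCountPipeline {

    // tr ' ' '\n': split the text into one token per word
    static final Function<String, List<String>> SPLIT =
            text -> Arrays.asList(text.split("\\s+"));

    // tr upper->lower, tr -d punct, then keep only tokens made of a-z
    static final Function<List<String>, List<String>> NORMALIZE = words -> words.stream()
            .map(w -> w.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", ""))
            .filter(w -> w.matches("[a-z]+"))
            .collect(Collectors.toList());

    // sort | uniq -c: count occurrences of each word
    static final Function<List<String>, Map<String, Long>> COUNT = words -> words.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    /** sort -rn | head: most frequent words first, ties broken alphabetically. */
    static List<Map.Entry<String, Long>> topWords(String text, int limit) {
        return SPLIT.andThen(NORMALIZE).andThen(COUNT).apply(text).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topWords("The cat. The dog!", 10));
    }
}
```

In a data pipeline built from microservices, each of these stages would be its own application, and the messaging middleware would take the place of `andThen`.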

Slide 38

Slide 38 text

Spring Cloud Data Flow
An orchestration service for data microservice applications on modern runtimes
Designed for integration, streaming, and batch job use-cases
[Diagram: Data Flow Server]

Slide 39

Slide 39 text

Stream DSL
Stream Definition: sensorStream = http | jdbc
SCDF Shell:
app register --type source --name http --uri maven://org.example:http-source-kafka-10:1.1.2.RELEASE
app register --type sink --name jdbc --uri maven://org.example:jdbc-sink-kafka-10:1.1.1.RELEASE
stream create --name sensorStream --definition "http | jdbc" --deploy
Map names in DSL onto Maven/Docker artifacts
[Diagram: http, jdbc]

Slide 40

Slide 40 text

Demo Streams
SCDF Shell:
stream create --name sensorstream --definition "rxhttp | rxtransformer | jdbc --tableName=sensors --columns=sensorId,temperature"
stream create --name sensoravg --definition ":sensorstream.rxtransformer > rxavg | jdbc --tableName=sensors_avg --columns=sensorId,average"
[Diagram: rxhttp, rxtransformer, jdbc; rxavg, jdbc]

Slide 41

Slide 41 text

Demo Streams Deployment
[Diagram: Runtime Platform, Data Flow Server, DB, Message Broker; applications rxhttp, jdbc, rxtransformer, rxavg, jdbc]

Slide 42

Slide 42 text

Deployment Manifest
SCDF Shell:
stream create s1 --definition "http | work | hdfs"
stream deploy s1 --propertiesFile ingest.properties
ingest.properties:
app.http.count=2
app.work.count=3
app.hdfs.count=4
app.http.producer.partitionKeyExpression=payload.custId
app.work.spring.cloud.deployer.memory=2048
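The manifest is an ordinary properties file whose keys follow an `app.<name>.<property>` pattern. As a small illustration of that structure, the sketch below reads such a file with `java.util.Properties` and extracts the instance count per application. It is not how Data Flow itself processes the file, and `ManifestSketch` and `instanceCounts` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class ManifestSketch {

    /** Returns the value of every app.<name>.count key, keyed by application name. */
    static Map<String, Integer> instanceCounts(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            String[] parts = key.split("\\.");
            // Only keys of the exact form app.<name>.count are instance counts.
            if (parts.length == 3 && parts[0].equals("app") && parts[2].equals("count")) {
                counts.put(parts[1], Integer.parseInt(props.getProperty(key).trim()));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String manifest = String.join("\n",
                "app.http.count=2",
                "app.work.count=3",
                "app.hdfs.count=4",
                "app.http.producer.partitionKeyExpression=payload.custId",
                "app.work.spring.cloud.deployer.memory=2048");
        System.out.println(instanceCounts(manifest));
    }
}
```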

Slide 43

Slide 43 text

Deployment Manifest

Slide 44

Slide 44 text

Data Flow Stream Demo

Slide 45

Slide 45 text

Getting Started
● https://projectreactor.io/
● https://cloud.spring.io/spring-cloud-stream/
● https://cloud.spring.io/spring-cloud-dataflow/
● Sample App
  ○ https://github.com/mbogoevici/devnexus2017

Slide 46

Slide 46 text

Q & A