Reactive Functional Data Pipelines with Spring Cloud Microservices

[Talk given together with Mark Pollack, on February 23, 2017 at DevNexus 2017, Atlanta]

Well-written microservices obey the laws of domain-driven design, one of which is finding a ubiquitous language to describe their abstractions accurately.

Functional and reactive APIs provide the best language for building event-driven microservices that operate over continuous streams of events or data, whether for data movement, analytics, ES/CQRS or more traditional enterprise integration.

We will explore, with live coding examples, the newly added reactive programming support in Spring Cloud Stream, built on the same foundation as Spring 5: Project Reactor. We will see how it complements and enriches the ability to write event-driven microservices that transparently implement high-level primitives such as consumer groups and partitioning on top of messaging systems such as RabbitMQ, Kafka or Google PubSub, and how to integrate it with other layers, such as Spring Reactive Web. We will also show how to use Reactive Streams clients, such as Reactive Kafka, to build end-to-end reactive applications.
Finally, we will show you how to chain the microservices in complex pipelines that can be seamlessly deployed on Cloud Foundry, Kubernetes or Mesos, using Spring Cloud Data Flow.

Marius Bogoevici

February 23, 2017

Transcript

  1. Reactive Functional Data Pipelines
    with Spring Cloud Microservices
    Marius Bogoevici Mark Pollack
    @mariusbogoevici @markpollack
    Pivotal
    @devnexus February 23, 2017 Atlanta

  2. A general IoT architecture
    [Diagram labels: Collection, Storage, Machine Learning, Batch Analytics, ETL, Streaming Analytics, Internet, Presentation, Things, Device Data, Business Applications, Business Data, Data Pipelines, Event processing, Enterprise]

  3. A general IoT architecture
    [Same diagram as slide 2, with added callouts: High concurrency, Network latency, Data volume]

  4. A general IoT architecture
    [Same diagram as slide 2, with added callouts: High concurrency, Network latency, Data volume, Complex topologies, Intuitive programming models]

  5. A smaller-scale version …
    [Diagram stages: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]

  6. How to receive data at the edge?
    [Diagram stages: Sensor Data Generator, HTTP Endpoint, Data Cleaning, Storage, Average Calculation]

  7. Server Threads
    Application
    150 ms 150 ms
    15 ms 15 ms
    20 ms
    50 ms
    150 ms
    250 ms
    Requires a large number of threads - one per concurrent request

  8. Server Threads
    Application
    150 ms 150 ms
    15 ms 15 ms
    20 ms
    50 ms
    150 ms
    250 ms
    Requires a large number of threads - one per concurrent request
    Processing Latency

  9. Server Threads
    Application
    150 ms 150 ms
    15 ms 15 ms
    20 ms
    50 ms
    150 ms
    250 ms
    Network Latency
    Requires a large number of threads - one per concurrent request
    Processing Latency
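
As a rough illustration of the thread-per-request callout on slides 7 to 9, the number of threads such a server needs can be estimated with Little's law: requests in flight = arrival rate x average latency. The sketch below applies it to the 250 ms latency shown on the slides. The request rate used in the note that follows, and the names `ThreadEstimate` and `threadsNeeded`, are assumptions made for illustration; they are not figures or code from the talk.

```java
final class ThreadEstimate {

    /**
     * Little's law: average number of requests in flight equals
     * arrival rate multiplied by average latency. With one thread per
     * concurrent request, that is also the number of busy threads.
     */
    static long threadsNeeded(double requestsPerSecond, double latencyMillis) {
        return (long) Math.ceil(requestsPerSecond * latencyMillis / 1000.0);
    }
}
```

With an assumed load of 2,000 requests per second, `ThreadEstimate.threadsNeeded(2000, 250)` evaluates to 500 threads, while the same load at 15 ms latency needs only 30. Latency, not CPU work, drives the thread count in this model.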

  10. Application
    IO
    selector
    worker worker worker
    20 ms
    50 ms
    150 ms
    250 ms

  11. Application
    IO
    selector
    worker worker worker
    20 ms
    50 ms
    150 ms
    250 ms
    Typically one thread per core

  12. Application
    IO
    selector
    worker worker worker
    20 ms
    50 ms
    150 ms
    250 ms
    Typically one thread per core
    Must be nonblocking

  13. Application
    IO
    selector
    worker worker worker
    20 ms
    50 ms
    150 ms
    250 ms
    Typically one thread per core
    Must be nonblocking
    Must handle backpressure

  14. Project Reactor
    ● Reactive and non-blocking foundation for the JVM
    ● Reactive Streams-based (with JDK 9 support too)
    ○ An interop standard for nonblocking backpressure
    ● API for reactive programming focusing on Java 8 APIs
    ○ Functional programming model: map(), flatMap(), groupBy(), window()
    ○ Composability
    ● Extensions for TCP, Netty, Aeron, Kafka
    ● Core of Reactive Spring efforts
    ○ Spring 5, Spring Data, Spring Cloud Stream,...
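
To make "nonblocking backpressure" concrete, here is a minimal sketch using the JDK 9 `java.util.concurrent.Flow` interfaces, which mirror the Reactive Streams contract mentioned on this slide. It uses the JDK's `SubmissionPublisher` rather than Reactor so that it runs without extra dependencies. The class names `BackpressureSketch` and `BatchingSubscriber` and the batch size are hypothetical choices for illustration, not code from the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

final class BackpressureSketch {

    /** Subscriber that requests items in batches; batchSize must be positive. */
    static final class BatchingSubscriber implements Flow.Subscriber<Integer> {
        private final int batchSize;
        private final List<Integer> received = new ArrayList<>();
        private final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;
        private int remainingInBatch;

        BatchingSubscriber(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            remainingInBatch = batchSize;
            s.request(batchSize); // signal demand: deliver at most batchSize items
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);
            if (--remainingInBatch == 0) { // batch consumed, so ask for the next one
                remainingInBatch = batchSize;
                subscription.request(batchSize);
            }
        }

        @Override
        public void onError(Throwable t) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Publishes 0..items-1 and returns what the subscriber received, in order. */
    static List<Integer> run(int items, int batchSize) {
        BatchingSubscriber subscriber = new BatchingSubscriber(batchSize);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < items; i++) {
                publisher.submit(i); // buffers the item; blocks only if the buffer is full
            }
        } // close() signals onComplete once buffered items are delivered
        try {
            if (!subscriber.done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for completion");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return subscriber.received;
    }
}
```

The key point is that the subscriber, not the publisher, decides how many items may be in flight, by calling `request(n)`. Reactor's operators handle this demand signalling internally.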

  15. Spring WebFlux in Spring 5
    http://docs.spring.io/spring-framework/docs/5.0.0.BUILD-SNAPSHOT/spring-framework-reference/html/web-reactive.html

  16. Building the HTTP endpoint
    Spring WebFlux
    Reactive HTTP

  17. Reactor Kafka
    ● Reactive API for Kafka based on Reactor
    ● Thin layer on top of Kafka Producer/Consumer API
    ● Efficient, non-blocking interaction with Kafka, with backpressure
    ● End-to-end reactive pipeline
    ● Reactive Streams support (via Reactor)
    ● Currently 1.0.0.M1

  18. Spring Cloud Stream
    ● Event-driven microservice framework
    ● Middleware as a utility
    ● Opinionated infrastructure
    ● Currently version 1.2
    ● Built on Spring portfolio components
    ○ Spring Boot - self-contained applications, configurations
    ○ Spring Integration - binder implementations, programming model
    ○ Reactor - Reactive API

  19. Spring Cloud Stream in a nutshell
    Application Core
    Messaging
    Middleware
    Binder
    Inputs
    Outputs
    Spring Boot Configuration

  20. Spring Cloud Stream in a nutshell
    Application Core
    Messaging
    Middleware
    Binder
    Inputs
    Outputs
    Spring Boot Configuration
    Pluggable Messaging Middleware: RabbitMQ, Kafka, Google PubSub, JMS

  21. Spring Cloud Stream in a nutshell
    Application Core
    Messaging
    Middleware
    Binder
    Inputs
    Outputs
    Spring Boot Configuration
    Flexible input/output model: Spring Integration Channels, KStream, Flux

  22. Spring Cloud Stream in a nutshell
    Application Core
    Messaging
    Middleware
    Binder
    Inputs
    Outputs
    Spring Boot Configuration
    Flexible programming model: Spring Integration, KStream, Reactor, RxJava

  23. Spring Cloud Stream in a nutshell
    Application Core
    Messaging
    Middleware
    Binder
    Inputs
    Outputs
    Spring Boot Configuration
    Standardized configuration model

  24. Spring Cloud Stream in a 10000 ft nutshell

  25. Spring Cloud Stream primitives
    ● Durable Publish-Subscribe messaging
    ○ For easily creating complex topologies
    ● Consumer groups
    ○ Multiple instances can be competing consumers when scaling
    ● Declarative data partitioning
    ○ Colocating related data in consumer instances
    ● Content negotiation
    ○ Flexible, self-descriptive serialization/deserialization strategies
    ● Schema evolution with Avro
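
As a sketch of what "colocating related data in consumer instances" means in practice, the example below maps a partition key to a partition index with a simple hash-modulo scheme, so that all messages with the same key reach the same consumer instance. This illustrates the general technique only; the class `PartitionSketch`, its methods, and the sample keys are hypothetical, and the framework's own selection logic may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionSketch {

    /** Maps a partition key to an index in [0, partitionCount). */
    static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Groups keys by the partition they would be routed to. */
    static Map<Integer, List<String>> assign(List<String> keys, int partitionCount) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition
                    .computeIfAbsent(partitionFor(key, partitionCount), p -> new ArrayList<>())
                    .add(key);
        }
        return byPartition;
    }
}
```

Because the mapping depends only on the key and the partition count, every producer instance routes a given key the same way without coordination. The trade-off is that changing the partition count remaps most keys.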

  26. Building the HTTP endpoint
    Spring WebFlux
    Reactive HTTP
    Spring Cloud Stream
    Spring Cloud Stream Reactive Kafka Binder
    Reactor Kafka
    End-to-end reactive

  27. Code deep dive

  28. How do we process data?
    [Diagram stages: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]

  29. Functional Programming for Stream processing?
    ● Different goals than the web endpoint
    ○ Fewer concerns about external clients, network latency, resource usage, backpressure
    ● ‘Event at a time’ vs. ‘stream processing’
    ○ Event at a time: classical messages are considered independent of each other.
    ○ Stream processing: concerned with groups of messages; ordered processing is important.
    ● Functional programming is a better domain language
    ○ Obvious operations on a stream of data vs. using ‘aggregator’ and ‘reducer’ classes.
    ● Easy to adopt due to flexibility of Spring Cloud Stream
    ○ Reactive programming adapters for classical messaging
    ○ ‘Native’ reactive adapters where a full reactive stack is required
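
To illustrate the "obvious operation on a stream of data" point, here is a minimal sketch of a per-sensor average written as a functional pipeline. It uses plain Java Streams over a finite list as a stand-in for one window of a reactive stream; in Reactor the comparable building blocks would be the `groupBy()` and `window()` operators named on slide 14. The `Reading` type, the NaN filter as a cleansing step, and all names are assumptions for illustration, with field names echoing the demo's sensorId and temperature columns.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AverageSketch {

    /** Hypothetical event type for a single sensor reading. */
    static final class Reading {
        private final String sensorId;
        private final double temperature;

        Reading(String sensorId, double temperature) {
            this.sensorId = sensorId;
            this.temperature = temperature;
        }

        String sensorId() {
            return sensorId;
        }

        double temperature() {
            return temperature;
        }
    }

    /** Average temperature per sensor for one window of readings, skipping NaN values. */
    static Map<String, Double> averageBySensor(List<Reading> window) {
        return window.stream()
                .filter(r -> !Double.isNaN(r.temperature()))   // drop invalid readings
                .collect(Collectors.groupingBy(
                        Reading::sensorId,                      // group by sensor
                        TreeMap::new,                           // sorted keys for stable output
                        Collectors.averagingDouble(Reading::temperature)));
    }
}
```

The whole computation reads as filter, group, average, with no separate aggregator or reducer class to maintain.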

  30. Code deep dive

  31. Building the processing pipeline
    Spring Cloud Stream
    Spring Cloud Stream Reactive
    Spring Cloud Stream Kafka Binder
    Spring Cloud Stream
    Spring Cloud Stream Reactive
    Spring Cloud Stream Kafka Binder
    Field Transformer Average Calculator

  32. How do we store data?
    [Diagram stages: Sensor Data Generator, HTTP Endpoint, Data Cleansing, Storage, Average Calculation]

  33. Building the JDBC Sink
    JDBC Sink
    Spring Cloud Stream
    Spring Cloud Stream Kafka Binder
    Spring Integration
    JDBC

  34. Spring Cloud Stream: Imperative to Reactive
    Application
    Spring Integration Binder
    (RabbitMQ, Kafka, JMS,
    Google PubSub)
    Message Channels
    Application
    Reactive
    Programming Model
    Spring Integration Binder
    (RabbitMQ, Kafka, JMS,
    Google PubSub)
    Message Channels
    Spring Cloud Stream Reactive Adapter
    Application
    Reactive Programming
    Model
    Reactive API (Reactor, RxJava)
    Reactive Streams Binder (>1.2)
    Reactive Streams
    Integration (Kafka)
    Imperative Reactive Functional Programming
    Non-reactive messaging
    Full Reactive Stack
    Spring Integration
    Programming Model

  35. Checkpoint
    [Same diagram as slide 2, with added callouts: Reactive, Functional]

  36. Data Pipelines using Microservices
    ● Stand-alone, production grade applications focused on data processing
    ● Communicating with ‘lightweight mechanisms’ – messaging middleware
    “Write programs that do one thing and do it well.”
    “Write programs to work together.”
    “Write programs to handle text streams, because that is a universal interface.”
    $ cat book.txt | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' |
    tr -d '[:punct:]' |
    grep -v '[^a-z]' |
    sort | uniq -c | sort -rn | head

  37. Spring Cloud Data Flow
    An orchestration service for data microservice applications on modern runtimes
    Designed for integration, streaming, and batch job use-cases
    Data Flow Server

  38. Stream DSL
    Stream Definition sensorStream = http | jdbc
    app register --type source --name http
    --uri maven://org.example:http-source-kafka-10:1.1.2.RELEASE
    app register --type sink --name jdbc
    --uri maven://org.example:jdbc-sink-kafka-10:1.1.1.RELEASE
    stream create --name sensorStream --definition "http | jdbc" --deploy
    SCDF Shell
    Map names in DSL onto Maven/Docker artifacts
    http jdbc

  39. Demo Streams
    stream create --name sensorstream --definition
    "rxhttp | rxtransformer | jdbc --tableName=sensors
    --columns=sensorId,temperature"
    stream create --name sensoravg --definition
    ":sensorstream.rxtransformer > rxavg | jdbc --tableName=sensors_avg
    --columns=sensorId,average"
    SCDF Shell
    rxhttp rxtransformer jdbc
    rxavg jdbc

  40. Demo Streams Deployment
    Runtime Platform
    Data Flow Server DB
    Message Broker
    rxhttp
    jdbc
    rxtransformer
    rxavg
    jdbc

  41. Deployment Manifest
    stream create s1 --definition "http | work | hdfs"
    stream deploy s1 --propertiesFile ingest.properties
    app.http.count=2
    app.work.count=3
    app.hdfs.count=4
    app.http.producer.partitionKeyExpression=payload.custId
    app.work.spring.cloud.deployer.memory=2048
    SCDF Shell
    ingest.properties

  42. Deployment Manifest

  43. Data Flow Stream Demo

  44. Getting Started
    ● https://projectreactor.io/
    ● https://cloud.spring.io/spring-cloud-stream/
    ● https://cloud.spring.io/spring-cloud-dataflow/
    ● Sample App
    ○ https://github.com/mbogoevici/devnexus2017
