Sweet Streams (Are made of this)

Spring Cloud Data Flow - A gentle introduction.

Corneil du Plessis

October 14, 2022

Transcript

  1. Version 1.0, October 2022: Sweet Streams (Are made of this)

    Spring Cloud Data Flow - A gentle overview. Copyright © 2022 VMware, Inc. or its affiliates.
  2. Agenda

    • What is Spring Cloud Data Flow?
    • Spring Batch
    • Spring Task
    • Spring Cloud Functions
    • Spring Cloud Stream Applications
    • Interview with a User
    • Q+A
  3. What is Spring Cloud Data Flow?

    • Microservice based Enterprise Application Integration.
      ◦ Batch Jobs
      ◦ Tasks
      ◦ Streams
  4. What is Spring Cloud Data Flow?

    • Orchestration and Deployment
      ◦ Tanzu Application Service / Cloud Foundry
      ◦ Kubernetes
      ◦ Local
  5. What is Spring Cloud Data Flow?

    • Observability
      ◦ Micrometer
      ◦ Prometheus
      ◦ WaveFront
      ◦ InfluxDB
      ◦ More…
  6. Stream DSL

    file | splitter | csv-to-json: transform > :input-stream
    http > :input-stream
    :input-stream > filter | aggregator > :output-stream
    :output-stream > jdbc
    :output-stream > json-to-ws-notification: transform | websocket

    Legend: Stream, App, Topic, App Label
  7. Batch/Task Workflows

    • The DSL can define complex Batch and/or Task application topologies in addition to a single Batch or Task application.
    • Each box is a Spring Batch or Task application.
    • The application flow can split and join, based on optional conditional expressions.
  8. How did we build Spring Cloud Data Flow?

    Spring Cloud Data Flow, Spring Boot, Spring Cloud, Spring Batch, Spring Integration, Spring AMQP, Spring Kafka, Spring Security, Spring Data
  9. Spring Cloud Data Flow Architecture

    • Data Flow UI, REST Client or Shell
    • Data Flow Server
    • Skipper Server
    • Messaging Middleware
  10. Pre-packaged Stream Applications

    aggregator analytics bridge cassandra cdc-debezium dataflow-tasklauncher elasticsearch file filter ftp groovy header-enricher http http-request image-recognition jdbc jms load-generator log mail mongodb mqtt object-detection pgcopy rabbit redis router rsocket s3 script semantic-segmentation sftp splitter syslog tcp throughput time transform twitter-message twitter-search twitter-stream twitter-trend twitter-update wavefront websocket zeromq
  11. Data ingest

    • Data comes in via file or web service
    • Once in the DB the batch process can consume
  12. Design

    • How do you eat an elephant?
    • Enable more real-time processing
    • Improve the feedback loop for failures
    • Use Java
    • Functional programming paradigm
  13. Stream processing

    • System is built to be event driven
    • Files are broken into individual events
    • File and web service events are processed by streaming applications
    • All streaming apps are custom. (Didn’t use any off the shelf applications)
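The slide describes files being broken into individual events that custom streaming applications then process. A minimal sketch of that idea in plain Java follows. The class name `FileToEventsSketch`, the two-column `id,amount` CSV layout, and the hand-built JSON are assumptions made for illustration and do not come from the talk.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch; names and data layout are illustrative, not from the talk. */
final class FileToEventsSketch {

    /** Splits file content into one event per non-blank line. */
    static List<String> split(String fileContent) {
        return fileContent.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    /** Converts an assumed two-column CSV line "id,amount" into a small JSON string. */
    static final Function<String, String> CSV_TO_JSON = line -> {
        String[] cols = line.split(",", -1);
        if (cols.length != 2) {
            throw new IllegalArgumentException("Expected 2 columns: " + line);
        }
        return "{\"id\":\"" + cols[0].trim() + "\",\"amount\":" + cols[1].trim() + "}";
    };

    /** Applies the per-event function to every event produced by the splitter. */
    static List<String> process(String fileContent) {
        return split(fileContent).stream()
                .map(CSV_TO_JSON)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two CSV lines become two independent JSON events.
        process("a1,10\nb2,20\n").forEach(System.out::println);
    }
}
```

In a Spring Cloud Stream application a `java.util.function.Function` like this would typically be exposed as a bean and bound to topics by the framework; that wiring is left out here so the sketch runs on the JDK alone.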
  14. Stream processing

    • Failures can be handled on individual events
    • Streaming apps are designed to be idempotent
    • Load can be spread out much better

    (Diagram label: Error handler)
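Handling failures per event means one bad record does not stop the rest of the file. The sketch below shows one way this could look in plain Java; `PerEventErrorHandlingSketch` and its `processAll` helper are hypothetical names, and the in-memory list standing in for an error handler is an assumption rather than the system described in the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of per-event failure handling; not the speaker's code. */
final class PerEventErrorHandlingSketch {

    /**
     * Applies the step to each event. A failing event is handed to the
     * error handler and processing continues with the next event.
     */
    static <I, O> List<O> processAll(List<I> events,
                                     Function<I, O> step,
                                     BiConsumer<I, RuntimeException> errorHandler) {
        List<O> results = new ArrayList<>();
        for (I event : events) {
            try {
                results.add(step.apply(event));
            } catch (RuntimeException e) {
                errorHandler.accept(event, e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> failed = new ArrayList<>();
        List<Integer> parsed = processAll(
                List.of("1", "oops", "3"),
                text -> Integer.parseInt(text),
                (event, error) -> failed.add(event));
        System.out.println(parsed + " failed=" + failed);
    }
}
```

With a message broker in place, the error handler would more likely publish the failed event to a separate topic for inspection or replay, which is why the idempotent design on the same slide matters.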
  15. Batch processing

    • System is built to be event driven
    • Files are broken into individual events

    (Diagram label: Spring Batch)
  16. Idempotent processing

    • At some point you must write to the DB (sink)
    • Processes are designed to be idempotent
    • The sink that persists is designed to cater for items being replayed
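A sink that tolerates replayed items can be sketched as a write keyed by a unique event id, so that delivering the same event twice leaves the stored state unchanged. The example below is a minimal illustration under that assumption: `IdempotentSinkSketch` is a hypothetical name, and the in-memory map stands in for a database table, where a unique constraint or an upsert would usually play the same role.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical idempotent sink; the map is a stand-in for a database table. */
final class IdempotentSinkSketch {

    private final Map<String, String> table = new LinkedHashMap<>();

    /**
     * Stores the payload only if the event id has not been seen before.
     * Returns true when a new row was written, false for a replayed event.
     */
    boolean write(String eventId, String payload) {
        return table.putIfAbsent(eventId, payload) == null;
    }

    /** Number of distinct events persisted so far. */
    int rowCount() {
        return table.size();
    }

    public static void main(String[] args) {
        IdempotentSinkSketch sink = new IdempotentSinkSketch();
        sink.write("evt-1", "first");
        sink.write("evt-1", "first"); // replay of the same event is ignored
        sink.write("evt-2", "second");
        System.out.println("rows=" + sink.rowCount());
    }
}
```

The design choice is that the event id, not the arrival order, decides whether a write happens, so upstream applications are free to retry or replay.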
  17. Developer experience

    • Everything is a Spring Boot application (Our Devs love this)
    • Changes can be tested locally
    • Custom templates
    • Dependency management (Solution BOM)
    • Easy to understand applications that are small and do one thing
    • Unit and integration testing is easy
    • Containers

    Paving the road to production
  18. Deployment

    Data Flow deployment
    • Deployed to Kubernetes (using Bitnami Helm Charts)
    • Message broker is Apache Kafka
    • Customized to use non-OSS database
    • JDK17 Stream and Batch applications
    • GitOps
      ◦ Applications are defined in git
      ◦ Stream definition (DSL) and deployment info is in git
      ◦ Deployments via Dataflow REST API
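To illustrate deployments driven through the Data Flow REST API from definitions kept in git, the sketch below builds (but does not send) an HTTP request that would create a stream definition. It assumes the `POST /streams/definitions` endpoint with `name`, `definition` and `deploy` form parameters, which should be checked against the REST API guide for the server version in use. The class name, stream name, DSL text and server URL are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; endpoint and parameter names are assumptions to verify. */
final class StreamDefinitionRequestSketch {

    /** Form-encodes the stream name and DSL text, for example read from a file in git. */
    static String formBody(String name, String dsl) {
        return "name=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&definition=" + URLEncoder.encode(dsl, StandardCharsets.UTF_8)
                + "&deploy=false";
    }

    /** Builds the request only; sending it is left to the caller. */
    static HttpRequest createDefinition(String serverUrl, String name, String dsl) {
        return HttpRequest.newBuilder(URI.create(serverUrl + "/streams/definitions"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(name, dsl)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createDefinition("http://localhost:9393", "demo", "http | log");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody("demo", "http | log"));
    }
}
```

A pipeline step could send the request with `java.net.http.HttpClient` once the definition file changes in git; authentication and error handling are omitted here.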
  19. Management and Observability

    Monitoring
    • Application metrics (uptime, resource utilization)
    • Monitor topic lag
    • Logs exported via sidecar application

    Scaling
    • Make sure JVM properties are set (Heap memory, CPU count, GC, etc.)
    • Single instance of most apps
    • No dynamic scaling yet
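One way to confirm that the JVM properties mentioned above took effect inside a container is to have the application report what the JVM actually sees at startup. The following sketch uses only standard JDK APIs; the class name `JvmSettingsReport` is hypothetical, and how the values are logged or exported is left open.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that reports the effective JVM settings. */
final class JvmSettingsReport {

    /** CPU count as seen by the JVM. */
    static int cpuCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectors() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("cpus=" + cpuCount());
        System.out.println("maxHeapMiB=" + maxHeapMiB());
        System.out.println("gc=" + collectors());
    }
}
```

Comparing this output with the limits set on the container makes it easier to spot a heap size or CPU count that was left at its default.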
  20. Resources

    • dataflow.spring.io
    • github.com/spring-cloud/spring-cloud-dataflow
    • dataflow.spring.io/docs/applications/pre-packaged/
    • via.vmw.com/sweet-streams