Data Integration and Real-time Data Processing with Spring Boot

Sabby Anandan
December 17, 2018

SF JUG Meetup.

More details here: https://www.meetup.com/sfjava/events/256850303/

Transcript

  1. Goal:

    - Introduce Spring projects
    - Discuss how they address Data Integration / Data Processing Challenges
  2. Spring Cloud Data Flow

    A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
  3. Spring Cloud Data Flow

    A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
  4. Spring Integration

    - Decentralization (no ESB)
    - Lightweight applications that contain integration logic
    - Loose coupling through message channels
  5. Spring Cloud Data Flow

    A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
  6. Spring Cloud Stream: an event-driven microservices framework

    Programming Model: Message Channel Abstraction

        @EnableBinding(Processor.class)
        public class Application {

            @StreamListener("foo")
            @SendTo("bar")
            public String replaceStringMsgHandler(String payload) {
                return StringUtils.replace(payload, "foo", "bar");
            }
        }

    [Diagram: EVENTS -> BINDING -> foo channel -> bar channel -> BINDING -> CONSUMERS]
  7. Spring Cloud Stream: an event-driven microservices framework

    Programming Model: Native Kafka Streams

        @EnableBinding(Processor.class)
        public class Application {

            @StreamListener("foo")
            @SendTo("bar")
            public KStream<Object, Foo> handler(KStream<Object, Event> input) {
                return . .;
            }
        }

    [Diagram: EVENTS -> BINDING -> foo channel -> bar channel -> BINDING -> CONSUMERS]
  8. Spring Cloud Stream: an event-driven microservices framework

    Programming Model: Native Reactor Fluxes

        @EnableBinding(Processor.class)
        public class Application {

            @StreamListener("foo")
            @SendTo("bar")
            public Flux<Average> sensorAverage(Flux<Sensor> data) {
                return . .;
            }
        }

    [Diagram: EVENTS -> BINDING -> foo channel -> bar channel -> BINDING -> CONSUMERS]
  9. Spring Cloud Stream: an event-driven microservices framework

    Programming Model: Plain Old Java Functions

        @EnableBinding(Processor.class)
        public class Application {

            @Bean
            public Function<String, String> toUpperCase() {
                return s -> s.toUpperCase();
            }
        }

    [Diagram: EVENTS -> BINDING -> foo channel -> bar channel -> BINDING -> CONSUMERS]
  10. Spring Cloud Stream: an event-driven microservices framework

    - Pluggable Binder Implementations
    - Stream Partitions
    - Consumer Groups
    - Message Headers
    - Testing Framework
    - Content-type Negotiation
    - Imperative + Functional Programming Model

    Code fragment on the slide (truncated in the source):

        public class TransferServiceImpl implements TransferService {
            public TransferServiceImpl(AccountRepository ar) {
                this.accountRepository = ar;
            }
  11. Spring Cloud Stream: an event-driven microservices framework

    Pluggable Binder Implementations:

    - RabbitMQ
    - Apache Kafka
    - Google PubSub
    - Amazon Kinesis
    - Azure Event Hubs
    - Solace

    Opportunities:

    - Same code + same test harness
    - Drop-in replacement for a variety of messaging systems
  12. Spring Cloud Stream: an event-driven microservices framework

    - Pluggable Binder Implementations
    - Stream Partitions
    - Consumer Groups
    - Message Headers
    - Testing Framework
    - Content-type Negotiation
    - Imperative + Functional Programming Model

    Code fragment on the slide (truncated in the source):

        public class TransferServiceImpl implements TransferService {
            public TransferServiceImpl(AccountRepository ar) {
                this.accountRepository = ar;
            }
  13. Windowed views over the event stream:

    - User activity in the last 30s
    - Users created in the last 2 mins
    - User interaction by region in the last 1hr window
  14. Spring Cloud Data Flow

    A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
  15. Spring Batch (JSR 352)

    - Closely related processing steps that perform a discrete business process
    - Deployable unit comprised of one or more job steps
    - Lifecycle management for jobs/steps
  16. Spring Cloud Task: a short-lived microservices framework

    Programming Model: Spring Batch Job as a Short-lived Application

        @EnableTask
        @EnableBatchProcessing
        public class BatchJobApplication {

            @Bean
            public Step extractStep() {
                // extract business logic
            }

            @Bean
            public Step transformStep() {
                // transformation logic
            }

            @Bean
            public Step loadStep() {
                // persistence logic
            }

            @Bean
            public Job etlJob() {
                return this.jobBuilderFactory.get("etlJob")
                        .start(extractStep())
                        .next(transformStep())
                        .next(loadStep())
                        .build();
            }
        }

    [Diagram: Database, REPOSITORY]
  17. Spring Cloud Task: a short-lived microservices framework

    Programming Model: Arbitrary business logic as a Short-lived Application

        @EnableTask
        public class TimestampTask {

            @Bean
            public TimestampTask timeStampTask() {
                return new TimestampTask();
            }

            public static class TimestampTask implements CommandLineRunner {

                @Override
                public void run(String... strings) throws Exception {
                    DateFormat dateFormat = . .
                    logger.info(dateFormat.format(new Date()));
                }
            }
        }

    [Diagram: Database, REPOSITORY]
  18. Spring Cloud Task: a short-lived microservices framework

    - Lifecycle Management
    - Transactions
    - Bookkeeping for Restarts/Replay
    - Historical Representation
    - Remote Partitions
  19. Orchestrated by Spring Cloud Data Flow

    Components: SFTP Server, SFTP Source, TaskLauncher, ETL Job/Task, Database

    Steps:

    - poll for new files
    - publish each file
    - launch task for each file
    - persist parsed data
  20. Spring Cloud Data Flow

    A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
  21. Spring Cloud Data Flow

    A toolkit for building data integration, real-time streaming, and batch data processing pipelines.

    But wait, there’s more!
  22. Mask each Payload

    Input:

        111-22-3333
        444-55-6666
        777-88-9999
        . . .

    Output:

        The Security Number = xxx-xx-3333
        The Security Number = xxx-xx-6666
        The Security Number = xxx-xx-9999
        . . .

    [Diagram labels: Don’t Disturb, Don’t Disturb, Fix This!]
  23. Project overview

    - Spring Cloud Stream: Build highly scalable event-driven microservices connected with shared messaging systems.
    - Spring Cloud Task: Build short-lived microservices to perform data processing locally or in the cloud.
    - Spring Cloud Skipper: Discover applications and manage their lifecycle on multiple Cloud Platforms.
    - Spring Cloud Data Flow: Orchestrate data pipelines made of Spring Cloud Stream or Spring Cloud Task microservices.

    Opportunities:

    - Consolidate Development and Testing Practices
    - Standardize CI/CD Tooling & Automation
  24. Next

    - Spring Boot 2.1 Compatibility: Stream, Task, Skipper, and SCDF
    - Function Composition / Function Chaining
    - OAuth2 + OpenID Connect by Default
    - Deeper Integration with Micrometer for Metrics/Monitoring
    - New Data Integration Apps
  25. Resources

    - Spring Cloud Stream: Samples | Gitter | StackOverflow
    - Spring Cloud Task: Samples | Gitter | StackOverflow
    - Spring Cloud Skipper: Samples | Gitter | StackOverflow
    - Spring Cloud Data Flow: Samples | Gitter | StackOverflow

    Demos:

    - Demo #1: Events + Kafka Streams
    - Demo #2: File-ingest
    - Demo #3: CI/CD for Data Pipelines
  26. Q+A
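
The "Mask each Payload" slide and the "Plain Old Java Functions" programming model can be illustrated together with a minimal sketch in plain Java, with no Spring dependency. This is not code from the talk: the class name `SsnMaskingSketch`, the constants `MASK`, `LABEL` and `MASK_AND_LABEL`, and the regular expression are all assumptions made for illustration. It shows two `java.util.function.Function` instances chained with `andThen`, which is one way to picture the "Function Composition / Function Chaining" item from the "Next" slide.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical example class; none of these names come from the talk.
final class SsnMaskingSketch {

    // Assumed payload shape: NNN-NN-NNNN. The last four digits are captured.
    private static final Pattern SSN =
            Pattern.compile("\\b\\d{3}-\\d{2}-(\\d{4})\\b");

    // Step 1: replace everything except the last four digits with x's.
    static final Function<String, String> MASK =
            payload -> SSN.matcher(payload).replaceAll("xxx-xx-$1");

    // Step 2: wrap the masked value in the message text shown on the slide.
    static final Function<String, String> LABEL =
            masked -> "The Security Number = " + masked;

    // Composition: MASK runs first, then LABEL receives its result.
    static final Function<String, String> MASK_AND_LABEL = MASK.andThen(LABEL);

    // Applies the composed function to every payload, preserving order.
    static List<String> process(List<String> payloads) {
        return payloads.stream()
                .map(MASK_AND_LABEL)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        process(List.of("111-22-3333", "444-55-6666", "777-88-9999"))
                .forEach(System.out::println);
    }
}
```

Running `main` prints one masked line per input payload, in the same form as the output shown on the slide. Keeping the two steps as separate functions means each can be unit-tested on its own, and in a Spring Cloud Stream application each could plausibly be exposed as a `@Bean` in the way the `toUpperCase` function is on the "Plain Old Java Functions" slide.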