Slide 1

Slide 1 text

Real-Time Performance Analysis of Data-Processing Pipelines with Spring Cloud Data Flow
Christian Tzolov (@christzolov), Sabby Anandan (@sabbyanandan)

Slide 2

Slide 2 text

Unless otherwise indicated, these slides are © 2013-2019 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Plot
• Open: Data Intensive Applications // Orchestration + Operational Challenges
• Pitch: Spring Cloud Data Flow // Orchestration + Operationalization
• Use: Credit Cards as the theme
• Show: Architecture details, solution walkthrough, and demos

Slide 3

Slide 3 text

Developers ❤ Applications

Slide 4

Slide 4 text

Monolith Application

Slide 5

Slide 5 text

Monolith Microservices Application

Slide 6

Slide 6 text


Slide 7

Slide 7 text

Apps, apps, and more apps

Slide 8

Slide 8 text

Apps Crash

Slide 9

Slide 9 text

Apps are Hungry/Slow

Slide 10

Slide 10 text

Apps in the Critical Path

Slide 11

Slide 11 text

Apps and Partitioned Data

Slide 12

Slide 12 text

Apps and Data Processing Volume
[Figure: event volume along a timeline from 00:05 to 00:40, with labeled rates of 1 MILLION, 10 MILLION, and 2 MILLION events / sec]

Slide 13

Slide 13 text

Apps and Multiple Platforms
Kubernetes, Cloud Foundry
Deployment Topologies: Active-Passive, Active-Active

Slide 14

Slide 14 text

Orchestrating data-intensive applications at scale is tough

Slide 15

Slide 15 text

“Simple things should be simple, complex things should be possible.” - Alan Kay

Slide 16

Slide 16 text

Spring Cloud Data Flow
Microservices-based streaming and batch data processing on Cloud Foundry and Kubernetes

Slide 17

Slide 17 text

Microservices for Data Processing
Spring Cloud Stream: Event-driven Spring Boot microservices for real-time data processing
Message Broker
Use-cases:
• Enterprise data integration (EAI/EIP)
• Event-driven architectures
• IoT and real-time predictive analytics
Spring Cloud Task: Short-lived Spring Boot microservices for batch data processing
Use-cases:
• Scheduled data migration jobs
• Extract, Transform, and Load (ETL)
• Offline machine learning and model training

Slide 18

Slide 18 text

Spring Cloud Data Flow
[Diagram labels: Build Apps, Spring Cloud Stream, Spring Cloud Task, Ship Apps, Deploy Streams, Launch Batch Jobs, Monitor Performance, Track Lifecycle, Message Broker, Kubernetes / Cloud Foundry]

Slide 19

Slide 19 text

Build, Run, Monitor
[Diagram labels: Build Apps, Spring Cloud Stream, Spring Cloud Task, Ship Apps, Deploy Streams, Launch Batch Jobs, Monitor Performance, Track Lifecycle, Message Broker]

Slide 20

Slide 20 text

Demo 1: Credit Card Data + Change Data Capture
Postgres, CDC
{1, -1.15823309349523, ..}
{2, -1.35835406159823, ..}
{3, -0.966271711572087, ..}
……..
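Change data capture turns row-level changes in a source table into an ordered stream of events that downstream apps can replay. The sketch below illustrates only that idea. The `ChangeEvent` shape, the `op` codes, and `apply_events` are hypothetical names chosen for illustration; they are not the event format produced by the CDC source used in the demo, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[float, ...]  # numeric columns of one transaction row (assumed shape)


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event describing one row-level change."""
    op: str                # "insert", "update", or "delete" (assumed codes)
    key: int               # primary key of the changed row
    after: Optional[Row]   # new row state; None for deletes


def apply_events(replica: Dict[int, Row], events: Iterable[ChangeEvent]) -> Dict[int, Row]:
    """Replay change events in order so the replica converges on the source table."""
    for event in events:
        if event.op == "delete":
            replica.pop(event.key, None)
        elif event.op in ("insert", "update"):
            replica[event.key] = event.after
        else:
            raise ValueError(f"unknown op: {event.op}")
    return replica


# Illustrative events only; values are invented for the example.
events = [
    ChangeEvent("insert", 1, (-1.5,)),
    ChangeEvent("insert", 2, (-1.25,)),
    ChangeEvent("update", 2, (0.5,)),
    ChangeEvent("delete", 1, None),
]
replica = apply_events({}, events)
```

Because each event carries the full new row state, a consumer that replays the stream from the start ends up with the same rows as the source, which is what lets a downstream processor work from the stream instead of querying the database.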

Slide 21

Slide 21 text

Let’s Recap
• Apps at Massive Scale
• Apps and Partitioned Data
• Apps in the Critical Path
• Apps are Hungry/Slow
• Apps and Multiple Platforms
• Apps Crash
• Apps and Data Processing Volume

Slide 22

Slide 22 text

Time Dimensions
• Apps at Massive Scale
• Apps and Partitioned Data
• Apps in the Critical Path
• Apps are Hungry/Slow
• Apps and Multiple Platforms
• Apps Crash
• Apps and Data Processing Volume

Slide 23

Slide 23 text

Why Time Dimensions?
• Sequence of metrics data ordered by timestamp
• Identifiable by labels and tag dimensions
• Multi-dimensional time range aggregation
• Focus on the recent view of the metrics
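The points above can be sketched as a small data model: each sample has a timestamp, a metric name, and a set of tags, and a query sums values per tag over a recent time window. This is a minimal illustration, not how any particular time-series database stores or queries data; `Sample`, `sum_by_tag`, the metric name, and the tag names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class Sample:
    timestamp: float          # seconds on some shared clock
    name: str                 # metric name
    tags: Mapping[str, str]   # label/tag dimensions identifying the series
    value: float


def sum_by_tag(samples: List[Sample], name: str, tag: str,
               start: float, end: float) -> Dict[str, float]:
    """Sum one metric per tag value over the half-open time range [start, end)."""
    totals: Dict[str, float] = defaultdict(float)
    for s in samples:
        if s.name == name and start <= s.timestamp < end and tag in s.tags:
            totals[s.tags[tag]] += s.value
    return dict(totals)


# Hypothetical samples: message counts reported by two apps of one stream.
samples = [
    Sample(10, "messages.received", {"stream": "fraud", "app": "cdc"}, 100),
    Sample(20, "messages.received", {"stream": "fraud", "app": "fraud-detection"}, 80),
    Sample(30, "messages.received", {"stream": "fraud", "app": "cdc"}, 120),
    Sample(95, "messages.received", {"stream": "fraud", "app": "fraud-detection"}, 90),
]
recent_by_app = sum_by_tag(samples, "messages.received", "app", start=0, end=60)
recent_by_stream = sum_by_tag(samples, "messages.received", "stream", start=0, end=60)
```

The same samples answer different questions depending on which tag is used for grouping and which time range is selected, which is the sense in which the aggregation is multi-dimensional.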

Slide 24

Slide 24 text

Micrometer
A simple facade over the instrumentation clients for the most popular monitoring systems
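To illustrate what a facade over instrumentation clients means in practice, here is a minimal sketch of the pattern: application code records a metric once, and the facade forwards it to whichever backends are registered. This is not Micrometer's API; `MeterFacade`, `Backend`, and `InMemoryBackend` are hypothetical names used only to show the shape of the idea.

```python
from typing import Dict, List, Protocol, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class Backend(Protocol):
    """Hypothetical interface each monitoring-system client would implement."""
    def record(self, name: str, tags: Dict[str, str], amount: float) -> None: ...


class InMemoryBackend:
    """Stand-in for a real monitoring client; keeps running totals per series."""
    def __init__(self) -> None:
        self.totals: Dict[SeriesKey, float] = {}

    def record(self, name: str, tags: Dict[str, str], amount: float) -> None:
        key = (name, tuple(sorted(tags.items())))
        self.totals[key] = self.totals.get(key, 0.0) + amount


class MeterFacade:
    """Single entry point for application code; fans out to all backends."""
    def __init__(self, backends: List[Backend]) -> None:
        self._backends = backends

    def increment(self, name: str, tags: Dict[str, str], amount: float = 1.0) -> None:
        for backend in self._backends:
            backend.record(name, tags, amount)


first, second = InMemoryBackend(), InMemoryBackend()
meters = MeterFacade([first, second])
meters.increment("transactions.scored", {"app": "fraud-detection"})
meters.increment("transactions.scored", {"app": "fraud-detection"}, amount=2.0)
```

The application depends only on the facade, so swapping or adding a monitoring system changes the list of backends and nothing else.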

Slide 25

Slide 25 text

Demo 2: Credit Card Fraud Detection (NEW!)
Postgres, CDC
{1, -1.15823309349523, ..}
{2, -1.35835406159823, ..}
{3, -0.966271711572087, ..}
……..

Slide 26

Slide 26 text

Fraud Detection Genesis
• Studied popular fraud-detection solutions from Kaggle
• Trained a model on the dataset to detect fraudulent transactions
• Generated a pre-trained model for real-time inference
• Developed a TensorFlow-based fraud-detection Spring Cloud Stream processor
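The last two steps amount to wrapping a pre-trained model in a processor: a function that takes one transaction from the input stream and emits it with a fraud score attached. The sketch below shows that shape only. The stub logistic model, its weights, the `features` field, and the 0.5 threshold are all assumptions for illustration; the actual processor in the talk uses a TensorFlow model, which is not reproduced here.

```python
import math
from typing import Any, Callable, Dict, Sequence

Model = Callable[[Sequence[float]], float]


def make_stub_model(weights: Sequence[float], bias: float) -> Model:
    """Stand-in for a pre-trained model: logistic score in (0, 1). Weights are made up."""
    def predict(features: Sequence[float]) -> float:
        z = bias + sum(w * x for w, x in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-z))
    return predict


def make_fraud_processor(model: Model, threshold: float = 0.5) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a processor: one transaction in, the same transaction plus a verdict out."""
    def process(transaction: Dict[str, Any]) -> Dict[str, Any]:
        score = model(transaction["features"])
        return {**transaction, "fraud_score": score, "fraud": score >= threshold}
    return process


# Hypothetical model and transactions, for illustration only.
processor = make_fraud_processor(make_stub_model(weights=[2.0, -1.0], bias=0.0))
suspicious = processor({"id": 1, "features": [3.0, 0.0]})
ordinary = processor({"id": 2, "features": [-3.0, 0.0]})
```

Keeping the model behind a plain callable means the processor logic stays the same when the stub is replaced by a real inference call.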

Slide 27

Slide 27 text

Demo 3: Autoscale `fraud-detection` Processor (NEW!)
Postgres, CDC
{1, -1.15823309349523, ..}
{2, -1.35835406159823, ..}
{3, -0.966271711572087, ..}
……..
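One way to think about metrics-driven autoscaling is as a small decision function: given an observed input rate and an estimate of what one instance can handle, compute how many instances are needed. The sketch below is an assumption-laden illustration, not the rule used in the demo; `desired_instances`, the per-instance capacity, and the instance bounds are hypothetical, and the input rates simply echo the volume figures shown earlier in the deck.

```python
import math


def desired_instances(input_rate: float, per_instance_capacity: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the instance count needed for input_rate, clamped to the allowed range."""
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    needed = math.ceil(input_rate / per_instance_capacity)
    return max(min_instances, min(max_instances, needed))


# Hypothetical capacity: one processor instance handles 500,000 events/sec.
capacity = 500_000
plan = [
    desired_instances(rate, capacity, max_instances=32)
    for rate in (1_000_000, 10_000_000, 2_000_000)
]
```

Clamping to a minimum and maximum keeps the processor available when traffic drops to zero and bounds resource use during spikes.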

Slide 28

Slide 28 text

tl;dr: Model Training
[Figures: Model Accuracy vs. Data Size, with regions labeled "Need More Data" and "Enough Data"; Model Accuracy vs. Time Window]

Slide 29

Slide 29 text

tl;dr: Model Training
More Data = More Accuracy
More Accuracy = More Time
Requires a Dynamic Deployment Topology

Slide 30

Slide 30 text

Cloud-native Batch for Model Training
Each step is short-lived; in other words, each step runs only as long as its business logic runs
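The idea of short-lived steps can be sketched as a runner that starts each step, lets it finish its work, records how long it took, and moves on, so nothing stays running between steps. This is only an in-process illustration: here each step is a plain function, and the step names, the toy "training" logic, and `run_pipeline` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]


def run_pipeline(steps: List[Step], context: Context) -> Tuple[Context, List[Dict[str, Any]]]:
    """Run steps one after another; each exists only for the duration of its own work."""
    history: List[Dict[str, Any]] = []
    for name, step in steps:
        started = time.monotonic()
        context = step(context)
        history.append({"step": name, "status": "COMPLETED",
                        "seconds": time.monotonic() - started})
    return context, history


# Hypothetical steps for a toy training job.
def load_data(ctx: Context) -> Context:
    return {**ctx, "rows": [1.0, 2.0, 3.0, 6.0]}


def train_model(ctx: Context) -> Context:
    rows = ctx["rows"]
    return {**ctx, "model": {"mean": sum(rows) / len(rows)}}


def export_model(ctx: Context) -> Context:
    return {**ctx, "exported": True}


result, history = run_pipeline(
    [("load-data", load_data), ("train-model", train_model), ("export-model", export_model)],
    {},
)
```

The recorded history gives each step its own start-to-finish duration, which is the kind of per-step lifecycle information a batch orchestrator can track.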

Slide 31

Slide 31 text

Demo 4: Cloud-native Predictive Model Training

Slide 32

Slide 32 text

10,000ft Architecture
[Diagram labels: Streaming Data Pipeline, Batch Data Pipeline, Message Broker, RSocket Bidirectional Connection, Prometheus RSocket Proxy, Scrape, Prometheus TSDB, PromQL, Grafana]

Slide 33

Slide 33 text

Bringing Orchestration + Monitoring to Developers
Event-driven Streaming + Cloud-native Batch =
• Apps at Massive Scale
• Apps and Partitioned Data
• Apps in the Critical Path
• Apps are Hungry/Slow
• Apps and Multiple Platforms
• Apps Crash
• Apps and Data Processing Volume

Slide 34

Slide 34 text

Metrics & Monitoring Roadmap
Current Milestone / GA Milestone:
• Spring Cloud Task 2.2: M3 / November 2019
• Spring Cloud Stream Hoxton / 3.0: M4
• Spring Cloud Data Flow 2.3: M2
Timelines / Next Steps
• Monitoring stateful workloads (e.g., Kafka Streams)
• SCDF-native `scale()` operation for metrics-driven autoscaling

Slide 35

Slide 35 text

Resources
• Slides: https://github.com/sabbyanandan/s1p2019
• Fraud-detection in Action: https://github.com/tzolov/cdc-fraud-detection-demo
• SCDF Microsite: https://dataflow.spring.io
• SCDF Docs: https://spring.io/projects/spring-cloud-dataflow#learn
• ….

Slide 36

Slide 36 text

Q+A
@sabbyanandan / @christzolov