Real-Time Performance Analysis of Data-Processing Pipelines with Spring Cloud Data Flow

Sabby Anandan

October 09, 2019

Transcript

  1. Real-Time Performance Analysis of Data-Processing Pipelines with Spring Cloud Data Flow. Christian Tzolov (@christzolov), Sabby Anandan (@sabbyanandan)
  2. Unless otherwise indicated, these slides are © 2013-2019 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

    Plot. Open: Data Intensive Applications // Orchestration + Operational Challenges. Pitch: Spring Cloud Data Flow // Orchestration + Operationalization. Use: Credit Cards as the theme. Show: Architecture details, solution walkthrough, and demos.
  3. Developers ❤ Applications
  4. Monolith Application
  5. Monolith Microservices Application
  7. Apps, apps, and more apps
  8. Apps Crash
  9. Apps are Hungry/Slow
  10. Apps in the Critical Path
  11. Apps and Partitioned Data
  12. Apps and Data Processing Volume [Chart: event volume along a timeline from 00:05 to 00:40, with levels labeled 1 million, 10 million, and 2 million events/sec]
  13. Apps and Multiple Platforms. Platforms: Kubernetes, Cloud Foundry. Deployment topologies: Active-Active, Active-Passive.
  14. Orchestrating data-intensive applications at scale is tough
  15. “Simple things should be simple, complex things should be possible.” - Alan Kay
  16. Spring Cloud Data Flow: microservices-based streaming and batch data processing on Cloud Foundry and Kubernetes
  17. Microservices for Data Processing

    Spring Cloud Stream: event-driven Spring Boot microservices for real-time data processing (Message Broker). Use cases: enterprise data integration (EAI/EIP); event-driven architectures; IoT and real-time predictive analytics.

    Spring Cloud Task: short-lived Spring Boot microservices for batch data processing. Use cases: scheduled data migration jobs; Extract, Transform, and Load (ETL); offline machine learning and model training.
  18. [Diagram labels: Build Apps; Spring Cloud Stream; Spring Cloud Task; Message Broker; Spring Cloud Data Flow; Ship Apps; Deploy Streams; Launch Batch Jobs; Monitor Performance; Track Lifecycle; Kubernetes / Cloud Foundry]
  19. Build, Run, Monitor [Diagram labels: Build Apps; Spring Cloud Stream; Spring Cloud Task; Message Broker; Ship Apps; Deploy Streams; Launch Batch Jobs; Monitor Performance; Track Lifecycle]
  20. Demo 1: Credit Card Data + Change Data Capture [Diagram labels: Postgres; CDC; sample records {1, -1.15823309349523, ..} {2, -1.35835406159823, ..} {3, -0.966271711572087, ..} …, shown twice]
  21. Let’s Recap: Apps at Massive Scale; Apps and Partitioned Data; Apps in the Critical Path; Apps are Hungry/Slow; Apps and Multiple Platforms; Apps Crash; Apps and Data Processing Volume
  22. Time Dimensions: Apps at Massive Scale; Apps and Partitioned Data; Apps in the Critical Path; Apps are Hungry/Slow; Apps and Multiple Platforms; Apps Crash; Apps and Data Processing Volume
  23. Why Time Dimensions?
    • Sequence of metrics data ordered by timestamp
    • Identifiable by labels and tag dimensions
    • Multi-dimensional time range aggregation
    • Focus on the recent view of the metrics
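
The four points on the "Why Time Dimensions?" slide can be pictured with a small in-memory sketch: timestamped samples, identified by a metric name plus tags, aggregated over a recent time range and grouped by a chosen tag. This is only an illustration, not how Micrometer or Prometheus store data; the metric name `messages.sent` and the tag keys `stream` and `app` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    timestamp: float                    # seconds on some shared clock
    name: str                           # metric name (hypothetical)
    tags: Tuple[Tuple[str, str], ...]   # sorted (key, value) tag pairs
    value: float


def make_sample(timestamp: float, name: str, value: float, **tags: str) -> Sample:
    """Build a sample whose identity is its name plus its tag set."""
    return Sample(timestamp, name, tuple(sorted(tags.items())), value)


def aggregate(samples: List[Sample], name: str, start: float, end: float,
              group_by: List[str]) -> Dict[Tuple[str, ...], float]:
    """Sum one metric over the range [start, end), grouped by the given tag keys."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for s in samples:
        if s.name != name or not (start <= s.timestamp < end):
            continue
        tag_map = dict(s.tags)
        key = tuple(tag_map.get(k, "") for k in group_by)
        totals[key] += s.value
    return dict(totals)


# Hypothetical samples from two apps in one stream.
samples = [
    make_sample(0, "messages.sent", 100, stream="fraud", app="cdc"),
    make_sample(5, "messages.sent", 120, stream="fraud", app="cdc"),
    make_sample(5, "messages.sent", 80, stream="fraud", app="fraud-detection"),
    make_sample(65, "messages.sent", 90, stream="fraud", app="fraud-detection"),
]

# Only the first 60 seconds, viewed per app and then per stream.
per_app = aggregate(samples, "messages.sent", start=0, end=60, group_by=["app"])
per_stream = aggregate(samples, "messages.sent", start=0, end=60, group_by=["stream"])
print(per_app)
print(per_stream)
```

Changing `group_by` re-slices the same samples along a different dimension, and moving `start` and `end` shifts the view to a different time range.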
  24. Micrometer: a simple facade over the instrumentation clients for the most popular monitoring systems
  25. Demo 2: Credit Card Fraud Detection (NEW!) [Diagram labels: Postgres; CDC; sample records {1, -1.15823309349523, ..} {2, -1.35835406159823, ..} {3, -0.966271711572087, ..} …, shown twice]
  26. Fraud Detection Genesis
    • Studied popular fraud-detection solutions from Kaggle
    • Trained a model on the dataset to detect fraudulent transactions
    • Generated a pre-trained model for real-time inference
    • Developed a TensorFlow-based fraud-detection Spring Cloud Stream processor
  27. Demo 3: Autoscale `fraud-detection` Processor (NEW!) [Diagram labels: Postgres; CDC; sample records {1, -1.15823309349523, ..} {2, -1.35835406159823, ..} {3, -0.966271711572087, ..} …, shown twice]
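
The transcript does not show how the demo decides when to scale. As a minimal sketch of one possible metrics-driven rule, the function below sizes a processor from an observed input rate and an assumed per-instance capacity. The function name, the capacity of 500,000 events/sec per instance, and the instance limits are all hypothetical; only the three event volumes are taken from the earlier "Apps and Data Processing Volume" slide. It is not the SCDF `scale()` operation.

```python
import math


def desired_instances(events_per_sec: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 50) -> int:
    """Return how many instances are needed to absorb the observed rate.

    The result is rounded up and clamped to [min_instances, max_instances].
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(events_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical capacity: one instance handles 500,000 events/sec.
CAPACITY = 500_000

# Volumes from the earlier slide: 1M, 10M, then 2M events/sec.
plan = [desired_instances(rate, CAPACITY) for rate in (1_000_000, 10_000_000, 2_000_000)]
print(plan)
```

Clamping keeps at least one instance running when the stream is idle and caps growth during a spike; a production rule would also need some damping so that brief fluctuations do not cause repeated scale changes.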
  28. tl;dr: Model Training [Chart labels: Need More Data; Enough Data; Model Accuracy; Data Size; Time Window]
  29. tl;dr: Model Training. More Data = More Accuracy; More Accuracy = More Time. Requires a Dynamic Deployment Topology.
  30. Cloud-native Batch for Model Training: each step is short-lived; in other words, each step runs only as long as its business logic runs
  31. Demo 4: Cloud-native Predictive Model Training
  32. 10,000ft Architecture [Diagram labels: Streaming Data Pipeline; Batch Data Pipeline; Message Broker; Prometheus RSocket Proxy; RSocket Bidirectional Connection; Prometheus TSDB; Scrape; Grafana; PromQL]
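
In an architecture like this, dashboards typically turn scraped counter values into per-second rates. The sketch below shows that idea in simplified form, assuming counter samples arrive as `(timestamp, value)` pairs. It is not PromQL's actual `rate()` implementation, which also extrapolates to the edges of the query window; the function name and the sample values are hypothetical.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase of a monotonically increasing counter.

    `samples` holds (timestamp_seconds, counter_value) pairs ordered by time.
    A drop in value is treated as a counter reset (for example, an app restart).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so `cur` is the increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes taken every 15 seconds.
steady = [(0, 100), (15, 400), (30, 700)]
with_reset = [(0, 100), (15, 400), (30, 50), (45, 350)]

print(per_second_rate(steady))
print(per_second_rate(with_reset))
```

Working from increases rather than raw values is what lets a rate survive an application restart, which matters for short-lived batch steps and for stream apps that are redeployed or rescaled.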
  33. Bringing Orchestration + Monitoring to Developers: Event-driven Streaming + Cloud-native Batch. [Recap labels: Apps at Massive Scale; Apps and Partitioned Data; Apps in the Critical Path; Apps are Hungry/Slow; Apps and Multiple Platforms; Apps Crash; Apps and Data Processing Volume]
  34. Timelines / Next Steps

    Current Milestone:
    • Spring Cloud Task 2.2 M3
    • Spring Cloud Stream Hoxton / 3.0 M4
    • Spring Cloud Data Flow 2.3 M2

    GA Milestone: November 2019

    Metrics & Monitoring Roadmap:
    • Monitoring stateful workloads (e.g., Kafka Streams)
    • SCDF-native `scale()` operation for metrics-driven autoscaling
  35. Resources
    • Slides: https://github.com/sabbyanandan/s1p2019
    • Fraud-detection in Action: https://github.com/tzolov/cdc-fraud-detection-demo
    • SCDF Microsite: https://dataflow.spring.io
    • SCDF Docs: https://spring.io/projects/spring-cloud-dataflow#learn
    • …