Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Plot
• Open: Data Intensive Applications // Orchestration + Operational Challenges
• Pitch: Spring Cloud Data Flow // Orchestration + Operationalization
• Use: Credit Cards as the theme
• Show: Architecture details, solution walkthrough, and demos
Monolith Microservices Application
Apps and Data Processing Volume
Chart: events per second over a timeline (00:05 to 00:40), with labeled volumes of 1 million, 10 million, and 2 million events/sec
Apps and Multiple Platforms
Platforms: Kubernetes, Cloud Foundry
Deployment Topologies: Active-Passive, Active-Active
Orchestrating data-intensive applications at scale is tough
“Simple things should be simple, complex things should be possible.” - Alan Kay
Spring Cloud Data Flow
Microservices-based streaming and batch data processing on Cloud Foundry and Kubernetes
Microservices for Data Processing
Spring Cloud Task: short-lived Spring Boot microservices for batch data processing
Use-cases:
• Scheduled data migration jobs
• Extract, Transform, and Load (ETL)
• Offline machine learning and model training
Spring Cloud Stream: event-driven Spring Boot microservices for real-time data processing, connected through a message broker
Use-cases:
• Enterprise data integration (EAI/EIP)
• Event-driven architectures
• IoT and real-time predictive analytics
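The event-driven half of this slide can be sketched in plain Java. Spring Cloud Stream 3.0 supports writing a processor as a `java.util.function.Function`, so the sketch below shows only the function itself, with no Spring wiring or message broker. `CardTransaction`, `ScoredTransaction`, `FraudProcessorSketch`, and the amount threshold are hypothetical names and values chosen to fit the credit-card theme; they are not code from the talk, and the threshold rule is only a stand-in for a real model.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical inbound event for the credit-card theme (not from the talk).
final class CardTransaction {
    final String cardId;
    final double amount;

    CardTransaction(String cardId, double amount) {
        this.cardId = cardId;
        this.amount = amount;
    }
}

// Hypothetical outbound event: the original transaction plus a flag.
final class ScoredTransaction {
    final CardTransaction transaction;
    final boolean suspicious;

    ScoredTransaction(CardTransaction transaction, boolean suspicious) {
        this.transaction = transaction;
        this.suspicious = suspicious;
    }
}

final class FraudProcessorSketch {
    // Assumed threshold, purely illustrative; a real processor would call a trained model.
    static final double SUSPICIOUS_AMOUNT = 5_000.0;

    // One event in, one event out: the shape of a stream "processor".
    static Function<CardTransaction, ScoredTransaction> scoreTransaction() {
        return tx -> new ScoredTransaction(tx, tx.amount > SUSPICIOUS_AMOUNT);
    }

    public static void main(String[] args) {
        Function<CardTransaction, ScoredTransaction> processor = scoreTransaction();
        List<CardTransaction> events = List.of(
                new CardTransaction("card-1", 42.50),
                new CardTransaction("card-2", 9_800.00));
        // In a deployed stream the broker would deliver events; here we loop over a list.
        for (CardTransaction tx : events) {
            ScoredTransaction scored = processor.apply(tx);
            System.out.println(scored.transaction.cardId + " suspicious=" + scored.suspicious);
        }
    }
}
```

Because the function has no dependency on the broker or the platform, the same logic could in principle be unit-tested locally and bound to a message destination at deployment time.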
Spring Cloud Data Flow
Diagram labels: Build Apps (Spring Cloud Stream, Spring Cloud Task), Ship Apps, Deploy Streams, Launch Batch Jobs, Monitor Performance, Track Lifecycle, Message Broker, Kubernetes / Cloud Foundry
Build, Run, Monitor
Diagram labels: Build Apps (Spring Cloud Stream, Spring Cloud Task), Ship Apps, Deploy Streams, Launch Batch Jobs, Monitor Performance, Track Lifecycle, Message Broker
Let’s Recap
• Apps and Data Processing Volume
• Apps at Massive Scale
• Apps and Partitioned Data
• Apps in the Critical Path
• Apps are Hungry/Slow
• Apps and Multiple Platforms
• Apps Crash
Time Dimensions
• Apps and Data Processing Volume
• Apps at Massive Scale
• Apps and Partitioned Data
• Apps in the Critical Path
• Apps are Hungry/Slow
• Apps and Multiple Platforms
• Apps Crash
Why Time Dimensions?
• Sequence of metrics data ordered by timestamp
• Identifiable by labels and tag dimensions
• Multi-dimensional time range aggregation
• Focus on the recent view of the metrics
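The points above can be illustrated with a small in-memory sketch: each sample carries a timestamp, a metric name, and a set of tags, and a query aggregates one metric over a time window while filtering on any subset of tags. `Sample`, `TimeSeriesSketch`, the metric name, and the tag keys are all hypothetical; a real deployment would rely on a time-series database rather than a list in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One measurement: a value at a point in time, identified by a metric name plus tags.
final class Sample {
    final long timestampMillis;
    final String metric;
    final Map<String, String> tags;
    final double value;

    Sample(long timestampMillis, String metric, Map<String, String> tags, double value) {
        this.timestampMillis = timestampMillis;
        this.metric = metric;
        this.tags = tags;
        this.value = value;
    }
}

// Hypothetical in-memory store, for illustration only.
final class TimeSeriesSketch {
    private final List<Sample> samples = new ArrayList<>();

    void record(long timestampMillis, String metric, Map<String, String> tags, double value) {
        samples.add(new Sample(timestampMillis, metric, tags, value));
    }

    // Sums one metric over [fromInclusive, toExclusive), keeping only samples
    // whose tags contain every key/value pair in tagFilter.
    double sum(String metric, Map<String, String> tagFilter, long fromInclusive, long toExclusive) {
        double total = 0.0;
        for (Sample s : samples) {
            boolean inWindow = s.timestampMillis >= fromInclusive && s.timestampMillis < toExclusive;
            boolean tagsMatch = s.tags.entrySet().containsAll(tagFilter.entrySet());
            if (inWindow && tagsMatch && s.metric.equals(metric)) {
                total += s.value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        TimeSeriesSketch series = new TimeSeriesSketch();
        String metric = "transactions.processed"; // assumed metric name
        series.record(1_000L, metric, Map.of("app", "fraud-detector", "instance", "0"), 100.0);
        series.record(2_000L, metric, Map.of("app", "fraud-detector", "instance", "1"), 250.0);
        series.record(9_000L, metric, Map.of("app", "fraud-detector", "instance", "0"), 400.0);

        // Aggregate across all instances of one app, but only inside the first five seconds.
        System.out.println(series.sum(metric, Map.of("app", "fraud-detector"), 0L, 5_000L));
        // Narrow the same metric to a single instance over the whole recorded range.
        System.out.println(series.sum(metric, Map.of("instance", "0"), 0L, 10_000L));
    }
}
```

The same stored samples answer both queries; only the tag filter and the time window change, which is the sense in which the aggregation is multi-dimensional.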
Micrometer
A simple facade over the instrumentation clients for the most popular monitoring systems
Fraud Detection Genesis
• Studied popular fraud-detection solutions from Kaggle
• Trained a model on the dataset to detect fraudulent transactions
• Generated a pre-trained model for real-time inference
• Developed a TensorFlow-based fraud-detection Spring Cloud Stream processor
tl;dr: Model Training
Charts with axes Model Accuracy, Data Size, and Time Window; regions labeled Need More Data and Enough Data
tl;dr: Model Training
• More Data = More Accuracy
• More Accuracy = More Time
• Requires a Dynamic Deployment Topology
Cloud-native Batch for Model Training
Each step is short-lived; in other words, each step runs only as long as its business logic runs
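A minimal sketch of what "short-lived" means here, in plain Java rather than Spring Cloud Task itself: a step starts, runs its business logic once, and ends, leaving behind a record of its start time, end time, and exit code. `TaskExecutionRecord`, `ShortLivedTaskSketch`, and the step name are hypothetical; they only mimic the kind of lifecycle information a task framework would track.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical record of one finished step (illustrative, not a Spring Cloud Task type).
final class TaskExecutionRecord {
    final String name;
    final Instant start;
    final Instant end;
    final int exitCode;

    TaskExecutionRecord(String name, Instant start, Instant end, int exitCode) {
        this.name = name;
        this.start = start;
        this.end = end;
        this.exitCode = exitCode;
    }

    Duration duration() {
        return Duration.between(start, end);
    }
}

final class ShortLivedTaskSketch {
    // Runs the business logic exactly once and returns; nothing stays resident afterwards.
    static TaskExecutionRecord run(String name, Runnable businessLogic) {
        Instant start = Instant.now();
        int exitCode = 0;
        try {
            businessLogic.run();
        } catch (RuntimeException e) {
            exitCode = 1; // a failed step is still a finished step
        }
        return new TaskExecutionRecord(name, start, Instant.now(), exitCode);
    }

    public static void main(String[] args) {
        // The lambda stands in for one training step; the name is an assumption.
        TaskExecutionRecord record = run("train-fraud-model", () -> {
            long checksum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                checksum += i;
            }
            System.out.println("step finished, checksum=" + checksum);
        });
        System.out.println(record.name + " exitCode=" + record.exitCode
                + " tookMillis=" + record.duration().toMillis());
    }
}
```

Since a step holds resources only for the duration of `run`, a platform can release them as soon as the record is written.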
Demo 4: Cloud-native Predictive Model Training
Bringing Orchestration + Monitoring to Developers
Event-driven Streaming + Cloud-native Batch =
• Apps and Data Processing Volume
• Apps at Massive Scale
• Apps and Partitioned Data
• Apps in the Critical Path
• Apps are Hungry/Slow
• Apps and Multiple Platforms
• Apps Crash
Timelines / Next Steps
Current Milestone:
• Spring Cloud Task: 2.2 M3
• Spring Cloud Stream: Hoxton / 3.0 M4
• Spring Cloud Data Flow: 2.3 M2
GA Milestone: November 2019
Metrics & Monitoring Roadmap:
• Monitoring stateful workloads (e.g., Kafka Streams)
• SCDF-native `scale()` operation for metrics-driven autoscaling
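To make "metrics-driven autoscaling" concrete, here is a rough sketch of the kind of decision such a feature has to make: turn an observed event rate into an instance count, bounded by a minimum and maximum. This is not the SCDF `scale()` operation, whose design the deck does not describe. `AutoscaleSketch`, `desiredInstances`, the per-instance capacity, and the bounds are all assumptions; the event rates reuse the volume figures from earlier in the deck (1, 10, and 2 million events/sec).

```java
final class AutoscaleSketch {
    // Returns how many instances are needed to absorb the observed rate,
    // clamped to [minInstances, maxInstances]. All parameters are illustrative.
    static int desiredInstances(long eventsPerSecond, long perInstanceCapacity,
                                int minInstances, int maxInstances) {
        if (perInstanceCapacity <= 0) {
            throw new IllegalArgumentException("perInstanceCapacity must be positive");
        }
        // Ceiling division: a partially loaded instance still counts as one instance.
        long needed = (eventsPerSecond + perInstanceCapacity - 1) / perInstanceCapacity;
        return (int) Math.max(minInstances, Math.min(maxInstances, needed));
    }

    public static void main(String[] args) {
        long capacity = 500_000L; // assumed throughput of one app instance, events/sec
        long[] observedRates = {1_000_000L, 10_000_000L, 2_000_000L};
        for (long rate : observedRates) {
            System.out.println(rate + " events/sec -> "
                    + desiredInstances(rate, capacity, 1, 16) + " instances");
        }
    }
}
```

With these assumed numbers the spike to 10 million events/sec would ask for 20 instances and be capped at 16, which shows why an upper bound matters as much as the metric itself.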