the topic for software architects
- Create a “big picture” for the architecture of ML-based systems
- Architecture language for ML-based systems
- Foundation for
  - structured thinking about and designing ML-based systems
  - talking to ML experts and data scientists
  - judging existing concepts and technologies and filling one's own toolbox
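[Figure: traditional vs. ML-based software systems, each at DevTime and RunTime]
- Traditional, DevTime: Traditional Software Engineering (Methods & Tools) turns Requirements into a Software System
- Traditional, RunTime: the Software System turns Input Data into Output Data
- ML-Based, DevTime (ML-Training): SE for ML-Based Systems (Methods & Tools) and Machine Learning / Data Science (Methods & Tools) turn Requirements, Input Data and Expected Output into a Software System based on ML with an ML Component
- ML-Based, RunTime (ML-Inference): the Software System based on ML, with its ML Component, turns Input Data into Output Data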
ML
[Figure: Traditional Software Engineering (Methods & Tools) is used to develop a Software System; SE for ML-Based Systems (Methods & Tools) and Data Science (Methods & Tools) are used to develop a Software System based on ML and its ML Component. Annotation: system with substantial size, complexity, quality requirements.]
- ML algorithms in detail
- Topology design of NNs
- Detailed technologies in ML
- Data analytics with respective tools (dashboard visualizations)
- Detailed architecture of autonomous driving systems
System based on ML
[Figure: three alternatives for a Software System based on ML, each placed between "Sensors, Cameras, …" and "Driving Actuators"]
- Alternative A: “Driving”, with Data Pre-Processing and Data Post-Processing
- Alternative B: Driving Area Detection, Obstacle Detection, Roadsign Detection, “Steering”
- Alternative C: Road Marking Detection, …
(Generalized, Neural Network)
ML Component
- Data
  - Config Data
    - Model Config Parameters (e.g. NN weights)
    - Learning Method / Algorithm (e.g. NN topology)
    - Hyperparameters (training config params)
  - In-Model State during Inference
- Code / Logic
  - Basic Inference Logic
  - Learning / Training Logic
- ML Model (fixed in inference)
- Inputs and outputs: Training Data, Input Data, Output Data
Notes
- The ML Component can be treated as a black box, architecturally
- The ML Component is the unit of training
- State: e.g. in recurrent neural networks with feedback relationships
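The separation of Config Data, In-Model State, and Code / Logic in the ML Component can be made concrete with a minimal sketch. All class, field, and method names below are hypothetical and chosen only for illustration; the "model" is a single linear unit, not a real neural network.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class ConfigData:
    """Config Data of the ML Component (illustrative names only)."""
    weights: List[float]                   # Model Config Parameters (e.g. NN weights)
    topology: str = "single linear unit"   # Learning Method / Algorithm (e.g. NN topology)
    learning_rate: float = 0.1             # Hyperparameter (training config param)


@dataclass
class MLComponent:
    """Black box seen from outside: train() at DevTime, infer() at RunTime."""
    config: ConfigData
    state: List[float] = field(default_factory=list)  # In-Model State during Inference

    def _predict(self, inputs: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.config.weights, inputs))

    def infer(self, inputs: Sequence[float]) -> float:
        """Basic Inference Logic: reads the config data, never changes it."""
        output = self._predict(inputs)
        self.state.append(output)  # state may change during inference
        return output

    def train(self, training_data: Sequence[Tuple[Sequence[float], float]],
              epochs: int = 50) -> None:
        """Learning / Training Logic: the only place where weights change."""
        for _ in range(epochs):
            for inputs, expected in training_data:
                error = self._predict(inputs) - expected
                self.config.weights = [
                    w - self.config.learning_rate * error * x
                    for w, x in zip(self.config.weights, inputs)
                ]


# Hypothetical usage: learn y = 2 * x from two samples, then run one inference.
component = MLComponent(ConfigData(weights=[0.0]))
component.train([([1.0], 2.0), ([2.0], 4.0)])
prediction = component.infer([3.0])
```

Callers only see `train` and `infer`, which matches the view that the component is a black box and the unit of training.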
Convolutional Neural Network (CNN)
- Topology (layers, nodes, relationships)
- Decisions about the topology of the neural network are mainly made by data scientists.
- Architects need a basic understanding to judge external implications.
https://www.easy-tensorflow.com/tf-tutorials/convolutional-neural-nets-cnns
Data
1) Feed training data into NN (Selected Training Data)
2) Calculate loss function
3) Adjust Config Data [by learning logic or data scientist]
   - Weights, biases
   - Adjust topology
   - Hyperparameters
Figure labels: Feed Forward, Back Propagation
https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6
[Figure: ML Component with Config Data (Model Config Parameters, e.g. NN weights; Learning Method / Algorithm, e.g. NN topology; Hyperparameters, i.e. training config params), Code / Logic (Basic Inference Logic; Learning / Training Logic), ML Model (fixed in inference), and In-Model State during Inference]
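The three training steps (feed training data into the NN, calculate the loss function, adjust the weights and biases) can be sketched for the smallest possible case: a single sigmoid neuron trained on the logical OR function. This is an illustration under stated assumptions, not the method of the linked article; the function names, the squared-error loss, and the learning rate are choices made for this sketch. For a single neuron, back propagation reduces to one application of the chain rule.

```python
import math


def feed_forward(weights, bias, inputs):
    """Step 1: feed one training sample through the (one-neuron) network."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation


def loss(predicted, expected):
    """Step 2: squared-error loss for one sample."""
    return 0.5 * (predicted - expected) ** 2


def adjust(weights, bias, inputs, predicted, expected, learning_rate):
    """Step 3: gradient step on weights and bias (chain rule through sigmoid)."""
    delta = (predicted - expected) * predicted * (1.0 - predicted)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias


def train(training_data, epochs=2000, learning_rate=0.5):
    """Repeat steps 1 to 3 over the training data; return weights, bias, loss history."""
    weights = [0.0] * len(training_data[0][0])
    bias = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for inputs, expected in training_data:
            predicted = feed_forward(weights, bias, inputs)
            total += loss(predicted, expected)
            weights, bias = adjust(weights, bias, inputs, predicted,
                                   expected, learning_rate)
        history.append(total)
    return weights, bias, history


# Selected training data: the logical OR function.
OR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias, history = train(OR_DATA)
```

Adjusting the topology or the hyperparameters, which the slide also lists under step 3, happens outside this loop and is typically a decision of the data scientist.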
Involved
ML-Training (DevTime): Data Collection -> Data Preparation -> Model Selection & Training -> Model Evaluation -> Model Persistence
- Large amounts of data
- Computing-intensive training
- Exploratory approach
Model Deployment
ML-Inference (RunTime): Data Ingestion -> Data Preparation -> Inference
- Concrete input data
- Inference is comparatively cheap
- “Just computation”
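[Figure: feedback loop between ML-Training (DevTime) and ML-Inference (RunTime)]
ML-Training (DevTime): Data Collection -> Data Preparation -> Model Selection & Training -> Model Evaluation -> Model Persistence
Model Deployment: Deploy optimized model
ML-Inference (RunTime): Data Ingestion -> Data Preparation -> Inference
Feedback: New Training Data from Live Operation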
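The loop between ML-Training (DevTime) and ML-Inference (RunTime) can be sketched end to end with a deliberately tiny model. Everything here is an assumption made for illustration: the function names, the one-parameter model `y = slope * x`, and JSON as the persistence format.

```python
import json


def prepare(raw):
    """Data Preparation: drop incomplete samples, convert to floats."""
    return [(float(x), float(y)) for x, y in raw if x is not None and y is not None]


def train(data):
    """Model Selection & Training: least-squares fit of y = slope * x."""
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return {"slope": slope}


def evaluate(model, data):
    """Model Evaluation: mean absolute error on the given data."""
    return sum(abs(model["slope"] * x - y) for x, y in data) / len(data)


def persist(model):
    """Model Persistence: serialize the trained model."""
    return json.dumps(model)


def deploy(blob):
    """Model Deployment: load the persisted model into the runtime system."""
    return json.loads(blob)


def infer(model, x):
    """Inference: plain computation on concrete input data."""
    return model["slope"] * x


# DevTime: collect, prepare, train, evaluate, persist.
collected = [(1, 2), (2, 4), (None, 5), (3, 6)]
training_data = prepare(collected)
model = train(training_data)
error = evaluate(model, training_data)
blob = persist(model)

# RunTime: deploy and infer.
deployed = deploy(blob)
result = infer(deployed, 5.0)

# Feedback: new training data from live operation leads to an updated model.
live_data = prepare([(4, 12)])
updated = deploy(persist(train(training_data + live_data)))
```

The persisted blob is the only artifact that crosses from DevTime to RunTime in this sketch, which is why deployment can be repeated whenever new training data arrives.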
from Current Fleet – Driving Real World, not Autonomously Yet
[Figure: cars in live operation connected to a central data collection and learning system]
- New Training Data from Live Operation
  - Camera images
  - Driving situations
  - Data labeled from driver behavior / steering
  - Data labeled from explicit user feedback
  - Data labeled from additional sensors (e.g. radar)
- Central Data Collection and Learning
  - Data Preparation
  - Model Selection & Training & Evaluation
  - Model Persistence
  - Partially human pre-processed data
- Instruct cars which data to collect
- Deploy optimized driving functions model
Architects need an overall system perspective
- Strong integration between runtime system (cars) and development time (learning and improvement)
- Continuous improvement and deployment
- Learning from the pre-phase of autonomous driving and continuously during operation
ML-Inference (RunTime): Data Ingestion -> Data Preparation -> Inference
ML-Training / Retraining (RunTime): Model Training / Optimization -> Model Persistence -> Model Deployment
- New Training Data from Live Operation
- Learning can happen at defined points in time (rather than after every inference)
(DevTime): … Model Selection & Training -> Model Evaluation -> Model Persistence -> Model Deployment
- The data science work is still done at DevTime
- The model is selected and trained at DevTime
- At RunTime, the model is only optimized
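Retraining at defined points in time, rather than after every inference, can be sketched as a component that buffers new training data from live operation and only optimizes its model once the buffer is full. The class, the `retrain_every` and `step` parameters, and the one-number "model" are hypothetical and serve only to show the control flow.

```python
class RetrainingComponent:
    """Runtime component whose model is optimized only every N observations."""

    def __init__(self, retrain_every, step=0.5, initial_estimate=0.0):
        self.retrain_every = retrain_every  # defined point in time: every N samples
        self.step = step                    # how far one optimization moves the model
        self.estimate = initial_estimate    # model as selected and trained at DevTime
        self.buffer = []                    # new training data from live operation
        self.retrain_count = 0

    def infer(self):
        """Inference uses the current model and never changes it."""
        return self.estimate

    def observe(self, value):
        """Collect live data; trigger retraining only when the buffer is full."""
        self.buffer.append(value)
        if len(self.buffer) >= self.retrain_every:
            self._retrain()

    def _retrain(self):
        """Runtime optimization: move the model toward the mean of the new data."""
        batch_mean = sum(self.buffer) / len(self.buffer)
        self.estimate += self.step * (batch_mean - self.estimate)
        self.buffer.clear()
        self.retrain_count += 1


# Hypothetical usage: the model stays fixed for two observations, then updates.
component = RetrainingComponent(retrain_every=3)
component.observe(4.0)
component.observe(4.0)
before = component.infer()
component.observe(4.0)
after = component.infer()
```

The model type itself is never changed at runtime in this sketch; only its parameter is adjusted, in line with the note that the data science work is still done at DevTime.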
data for training needed
- Amount depends on the application area, the available data, and the ML models / algorithms
- Very different types and formats of data (text, images, video, audio, …)
  - require very different treatment
  - result in very different computational load
Autonomous Driving
Data needs
- Large data
- Varied data
- Real data
- Collect data from the fleet
- Create simulation data
- Cover edge and unusual cases
Image: https://www.youtube.com/watch?v=-b041NXGPZ8
[Figure: design alternatives for deploying the ML Component]
- ML-Training (DevTime): Training HW (powerful server) with ML Component
- ML-Training (RunTime): Model Training / Optimization, Model Persistence, Model Deployment
- ML-Inference (RunTime): Data Ingestion, Data Preparation, Inference
- Design Alternatives: ML Component placed on the Client and/or the Server
Systems (Cars)
Learning strategies
- Online learning in each car?
- Batch learning in a central system only?
- Can cars communicate?
- Compare: learning of typing recognition on a mobile phone
[Figure: Training HW (powerful server) with an ML Component performs ML-Training (DevTime); several Software Systems based on ML, each with its own ML Component, perform ML-Inference (RunTime). "New Training Data from Live Operation" flows to training; "Deploy optimized model" flows back to the systems.]
for ML
Different Levels of Reuse
- Fully trained model, immutable (as API or library) [e.g. service for image tagging]
- Fully trained model, retrainable (as API or library) [e.g. service for image tagging]
- Predefined topology (as API or library) [e.g. predefined CNNs]
- Basic ML model (as library) [e.g. general NN logic]
Axes: Degree of freedom, Knowledge needed, Effort needed
[Figure: ML Component with Data (Config Data: Model Config Parameters, e.g. NN weights; Learning Method / Algorithm, e.g. NN topology; Hyperparameters, i.e. training config params; In-Model State during Inference), Code / Logic (Basic Inference Logic; Learning / Training Logic), and ML Model (fixed in inference)]
- Fully trained model, immutable (as API or library) [e.g. service for image tagging]
- Fully trained model, retrainable (as API or library) [e.g. service for image tagging]
- Predefined topology (as API or library) [e.g. predefined CNNs]
- Basic ML model (as library) [e.g. general NN logic]
Predefined topology (as API or library) [e.g. predefined CNNs]
- ML as a technology inherently aims more at realizing functionality than at realizing quality attributes (in contrast to e.g. communication middleware, blockchain, …)
- However, ML can be used to support achieving some quality attributes (e.g. certain aspects of security, by detecting attack patterns with ML)
- The usage of ML has a significant impact on quality attributes, and thus needs architectural treatment
- One key aspect: missing comprehensibility / explainability of what is happening in the ML component
  - Safety, reliability: conflicts with safety standards, needs countermeasures
  - UX: explaining to the user what happens / integrating the user into the overall flow
- Fulfill the respective quality attributes of the system, respecting the overall “scale” of the system
  - Performance (latency, throughput), scalability, …
  - Considering the runtime system, but also the devtime / learning system
- Provide an adequate execution environment
  - Sufficient computing power
  - Sufficient storage capacity
- Provide the right data with adequate frequency and latency
- The architect has to know the requirements / implications of the ML algorithm / model
me? What can I do?
- Keep an eye on the architectural big picture, even if there is ML in the system ;-)
- Understand the very nature of ML-based systems
- Learn from existing systems and their solution approaches
- Remember the essentials of software architecture
  - Achieving quality attributes
  - Dealing with uncertainty
  - Organizing and distributing work
- Fill your toolbox with knowledge about patterns and technologies in the ML area
- Start working with data scientists / data engineers and establish a common language