Future of Software Engineering: SWEBOK Evolution and Machine Learning SE including ML Design Patterns

Hironori Washizaki, "Future of Software Engineering: SWEBOK Evolution and Machine Learning SE including ML Design Patterns," Scientific Talk 45th, BINUS University, August 30th, 2022.

washizaki

August 30, 2022
Transcript

  1. Future of Software Engineering: SWEBOK Evolution and Machine Learning SE including ML Design Patterns

    https://www.computer.org/volunteering/nomination-election/election
  2. Prof. Dr. Hironori Washizaki

    • Professor and the Associate Dean of the Research Promotion Division at Waseda University in Tokyo
    • Visiting Professor at the National Institute of Informatics
    • Outside Director of SYSTEM INFORMATION and eXmotion
    • Research and education projects
    • Leading a large-scale grant at MEXT enPiT-Pro Smart SE
    • Leading the framework team of the JST MIRAI eAI project
    • Professional contributions
    • IEEE Computer Society Vice President for Professional and Educational Activities
    • Editorial Board Member of MDPI Education Sciences
    • Steering Committee Member of the IEEE Conference on Software Engineering Education and Training (CSEE&T)
    • Associate Editor of IEEE Transactions on Emerging Topics in Computing
    • Advisory Committee Member of the IEEE-CS COMPSAC
    • Steering Committee Member of Asia-Pacific Software Engineering Conference (APSEC)
    • Convener of ISO/IEC/JTC1 SC7/WG20
  3. Agenda • SWEBOK Evolution • Machine Learning Software Engineering •

    Machine Learning Design Patterns 3
  4. Necessity and Importance of Application of Engineering to Software

    • We need to be able to produce reliable and trustworthy systems economically and quickly!
    • It is usually cheaper to use software engineering methods and techniques than to just write the programs.
    • The majority of costs are the costs of changing the software after it has gone into use.
  5. Software Engineering

    • First appeared in 1968 at the NATO Science Committee
    • B.W. Boehm (1935-2022), 1976 – “The practical application of scientific knowledge to the design and construction of computer programs and the associated documentation required to develop, operate, and maintain them.”
    • R.E. Fairley, 1985 – “The technological and managerial discipline concerned with systematic production and maintenance of Software products that are developed and modified on time and within cost estimates.”
    • IEEE 2004 and SWEBOK Guide – “The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of Software.”
  6. Guide to the Software Engineering Body of Knowledge (SWEBOK) http://swebokwiki.org

    • History: 2001 v1, 2004 v2, 2005 ISO/IEC Technical Report, 2014 v3, 2022 v4
    • Objective
    – Guiding learners, researchers and practitioners to identify and share a common understanding of “generally accepted knowledge” in software engineering
    – Defining the boundary between software engineering and related disciplines
    – Providing foundations for certifications and educational curricula
    • Adoption
    – IEEE-CS software professional certification programs based on SWEBOK (Associate Software Developer, Professional Software Developer, Professional Software Engineering Master)
    – ISO/IEC 24773-4: Certification of software and systems engineering professionals - Part 4: Software engineering
    – Software Engineering Competency Model (SWECOM)
    Figure labels: Body of Knowledge; Islands of Knowledge; Activities (and practices); Tasks (and activities); To Do, Doing, Done
  7. SWEBOK Evolution from V3 to V4

    • V4 reflects modern software engineering, changes and updates in practice, the growth of the BOK, and recently developed areas
    • Public review is ongoing! https://www.computer.org/volunteering/boards-and-committees/professional-educational-activities/software-engineering-committee/swebok-evolution
    • SWEBOK V3: Requirements; Design; Construction; Testing; Maintenance; Configuration Management; Engineering Management; Process; Models and Methods; Quality; Professional Practice; Economics; Computing Foundations; Mathematical Foundations; Engineering Foundations
    • SWEBOK V4: Requirements; Architecture; Design; Construction; Testing; Operations; Maintenance; Configuration Management; Engineering Management; Process; Models and Methods; Quality; Security; Professional Practice; Economics; Computing Foundations; Mathematical and Engineering Foundations
    • Annotations on V4: Agile, DevOps; Agile testing …; Agile security …
  8. Vision of SWEBOK V4 (subject to change) (Evolution lead: Hironori Washizaki, since 2018)

    https://www.computer.org/volunteering/boards-and-committees/professional-educational-activities/software-engineering-committee/swebok-evolution
    • Related areas
    – AI/Machine Learning
    – Restructuring foundation areas incl. Internet of Things (IoT)
    • Value in SE
    – Value proposition
    • Dependable SE
    – Architecture
    – Security
    • Modern SE
    – Agile
    – DevOps
  9. Agenda • SWEBOK Evolution • Machine Learning Software Engineering •

    Machine Learning Design Patterns 9
  10. Paradigm shifts in “new” software engineering

    • Scope and perspective
    – Current: Software systems
    – New: Software systems, business, society and related disciplines
    • Process
    – Current: Planned, static, common, and closed
    – New: Adaptive, dynamic, diverse, and open
    • Focus
    – Current: Specification
    – New: Value, data, and speed
    • Thinking
    – Current: Cognitive (logical) or affective (design)
    – New: Cognitive (logical), affective (design), and conative (conceptual)
    • Inference
    – Current: Deduction and analogy
    – New: Deduction, analogy, induction, and abduction
    Hironori Washizaki, Junzo Hagimoto, Kazuo Hamai, Mitsunori Seki, Takeshi Inoue, Shinya Taniguchi, Hiroshi Kobayashi, Kenji Hiranabe and Eiichi Hanyuda, “Framework and Value-Driven Process of Software Engineering for Business and Society (SE4BS),” 5th International Conference on Enterprise Architecture and Information Systems (EAIS 2020)
  11. ML software engineering: Induction (and abduction)

    • Conventional software engineering: Deduction
    • ML software engineering: Induction (and abduction)
    Figure labels (shown for both flows): Goal, Data, Model, Behavior
    H. Maruyama, “Machine Learning Engineering and Reuse of AI Work Products,” The First International Workshop on Sharing and Reuse of AI Work Products, 2017
    Hironori Washizaki, “Towards Software Value Co-Creation with AI,” The 44th IEEE Computer Society Signature Conference on Computers, Software, and Applications (COMPSAC 2020), Fast Abstract
  12. ML meets SE

    • Machine learning (ML) systems are complex systems.
    – ML algorithms behave probabilistically because their behavior depends on training data.
    – This probabilistic behavior gives rise to new quality and validation concerns.
    – Studies have investigated bugs in ML systems that stem from algorithms, data dependencies, and architectures.
    • Current status of ML software engineering
    – Quality aspects of ML systems, such as DNN testing and repair, are active research topics.
    – ML system development challenges are also discussed through empirical case studies (e.g., [Lwakatare19]).
    [Lwakatare19] L. E. Lwakatare et al., “A taxonomy of software engineering challenges for machine learning systems: An empirical investigation,” XP 2019
  13. Techniques specific or particularly useful for ML quality assurance

    Figure: quality-assurance concerns and techniques annotated on the elements of an ML system.
    • System elements: Training data; Trained model; Prediction, inference; Infrastructure software system; New data
    • Concerns and techniques: ML model debugging; Monitoring, goal-oriented modeling; Testing oracle problem, balanced dataset and coverage; Performance, robustness and explainability; Architecture validity and quality assurance; Suitability with objective, handling unexpected situations; Metamorphic testing; Search-based testing; Framework, practices and design patterns
    N. Uchihira, AI and Software Engineering, JUSE SQiP 2017
    Eric Breck et al., The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction, IEEE Big Data 2017
  14. Practice investigation: List of papers investigated

    1. C. Hill et al., “Trials and tribulations of developers of intelligent systems: A field study,” VL/HCC 2016.
    2. M. Kim et al., “The Emerging Role of Data Scientists on Software Development Teams,” ICSE 2016.
    3. G. Dove et al., “UX Design Innovation: Challenges for Working with Machine Learning As a Design Material,” CHI 2017.
    4. R. Fiebrink et al., “Human Model Evaluation in Interactive Supervised Learning,” CHI 2011.
    5. C. T. Wolf, “Professional Identity and Information Use: On Becoming a Machine Learning Developer,” iConference 2019.
    6. T. Seymoens et al., “A Methodology to Involve Domain Experts and Machine Learning Techniques in the Design of Human-Centered Algorithms,” HWID 2018.
    7. A. Billewicz, “Silly Lamp: Study of a Relationship with Engaging Machine Learning Artefacts,” CHI 2018.
    8. S. Amershi et al., “Software Engineering for Machine Learning: A Case Study,” ICSE 2019.
    9. M. Zinkevich, “Rules of Machine Learning: Best Practices for ML Engineering,” http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf , 2017.
    Y. Watanabe, H. Washizaki, et al., “Preliminary Literature Review of Machine Learning System Development Practices,” 45th IEEE Computer Society Signature Conference on Computers, Software and Applications (COMPSAC 2021)
  15. ML system development process and emphasized phases

    • Model Training and Model Evaluation phases are frequently described.
    – There are issues and practices in both phases.
    • Some phases are mentioned in specific communities only.
    – Two papers published in human-centered design communities describe issues in Model Requirements.
    – In contrast, issues in Data Cleaning & Data Labeling, practices in Data Collection, and issues and practices in Feature Engineering, Model Training & Model Deployment are mentioned only by papers published in computer science communities.
    Figure labels (phases): Model Requirements; Data Collection; Data Cleaning; Feature Engineering; Data Labeling; Model Training; Model Evaluation; Model Deployment; Model Monitoring
    S. Amershi, et al., “Software engineering for machine learning: a case study,” ICSE (SEIP) 2019
    Y. Watanabe, H. Washizaki, et al., “Preliminary Literature Review of Machine Learning System Development Practices,” 45th IEEE Computer Society Signature Conference on Computers, Software and Applications (COMPSAC 2021)
  16. Issues and practices by phase

    • Model Requirement
    – Issue: For designers, difficult to understand ML algorithms and their potentials…
    – Practice: Design and implement metrics first…
    • Data Collection
    – Issue: Difficult to understand data from third party…
    – Practice: Launch products without ML…
    • Data Cleaning
    – Issue: Unclear methods to preprocess data…
    • Data Labeling
    – Issue: Strains on resources for labelling…
    • Feature Engineering
    – Issue: Difficult to measure the effect of features…
    – Practice: Domain knowledge and past experiences…
    • Model Training
    – Issue: Ad-hoc algorithm selection based on past experiences…
    – Practice: Select simplest algorithms…
    • Model Evaluation
    – Issue: Difficult to understand results…
    – Practice: Measure model performances…
    • Model Deployment
    – Issue: Copy pipeline and drop necessary data…
    – Practice: Model infrastructures correctly…
    • Model Monitoring
    – Know data attribute requirements…
    • Cross-Cutting
    – Issue: Highly dependent on individuals…
    – Practice: Provide opportunities to share knowledge…
    Y. Watanabe, H. Washizaki, et al., “Preliminary Literature Review of Machine Learning System Development Practices,” 45th IEEE Computer Society Signature Conference on Computers, Software and Applications (COMPSAC 2021)
  17. Identified common practices

    • Traditional practices
    – Separation of concerns: To identify sub-issues and deal with them step by step
    – Goal-oriented: To focus on a project’s goal
    – Other traditional practices
    • ML-specific practices
    – Data concerns: To handle data issues
    – Start small: To start with simplified issues
    – Measurement: To measure uncertainty
    – Heuristics: Relying on ML developers’ experiences
    Y. Watanabe, H. Washizaki, et al., “Preliminary Literature Review of Machine Learning System Development Practices,” 45th IEEE Computer Society Signature Conference on Computers, Software and Applications (COMPSAC 2021)
  18. Techniques specific or particularly useful for ML quality assurance

    Figure: quality-assurance concerns and techniques annotated on the elements of an ML system.
    • System elements: Training data; Trained model; Prediction, inference; Infrastructure software system; New data
    • Concerns and techniques: ML model debugging; Monitoring, goal-oriented modeling; Testing oracle problem, balanced dataset and coverage; Performance, robustness and explainability; Architecture validity and quality assurance; Suitability with objective and handling unexpected situations; Metamorphic testing; Search-based testing; Framework, practices and design patterns
    N. Uchihira, AI and Software Engineering, JUSE SQiP 2017
    Eric Breck et al., The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction, IEEE Big Data 2017
  19. Metamorphic testing

    • Testing based on metamorphic relations
    • A metamorphic relation is a relationship whereby a change to the input predicts the change to the output: f(t(x)) = g(f(x)), where f is the system under test, t transforms the input, and g is the corresponding transformation of the output.
    – E.g., Search results using a query “X” ⊇ Search results using “X AND Y”
    – E.g., sin(x) = sin(x + 360°)
    • Change in input (t) → change in output (g)
    – Sorting → None
    – Adding noise, semantically identical, statistically identical → Similar, slight change
    – Constant addition and multiplication → Constant addition and multiplication
    – Narrowing → Subset
    – Completely different → Disjoint
    Reference: S. Segura et al., "Metamorphic Testing of RESTful Web APIs," IEEE Transactions on Software Engineering, 2017
    Reference: C. Murphy, “Applications of Metamorphic Testing”, http://www.cis.upenn.edu/~cdmurphy/pubs/MetamorphicTesting-Columbia-17Nov2011.ppt
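The two example relations on this slide can be checked mechanically without knowing any expected output in advance. The following is a minimal sketch, not taken from the talk: the toy `search` function, the `DOCS` corpus, and the helper names are assumptions made for illustration, and the sine check uses radians (a shift of 2π) where the slide uses degrees.

```python
import math
import random


def search(documents, *terms):
    """Toy search (hypothetical): the set of documents containing every term."""
    return {d for d in documents if all(t in d.split() for t in terms)}


def check_sine_periodicity(trials=100, seed=0):
    """Relation: shifting the input by one full period leaves sin unchanged."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        if not math.isclose(math.sin(x), math.sin(x + 2 * math.pi), abs_tol=1e-9):
            return False
    return True


def check_search_narrowing(documents, x, y):
    """Relation: narrowing the query with AND can only shrink the result set."""
    return search(documents, x, y) <= search(documents, x)


# Illustrative corpus, not from the source.
DOCS = ["ml design patterns", "ml testing", "software design"]
```

Neither check needs an oracle for the correct answer; each only compares two runs of the same function, which is why the technique suits ML components whose exact outputs are hard to specify.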
  20. Metamorphic testing in ML-based detection and recognition • Adding noise

    Autonomous cars [Tian’18] Characters Y Tian, et al., DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, ICSE 2018 https://arxiv.org/pdf/1708.08559.pdf 20
  21. Neural network model debugging (repair) • Retraining – A straightforward

    method, but time-consuming and costly – Difficulty in considering additional training data, possible performance degradation • Online learning – Modification through sequential learning with specific data – Possible side effects of performance degradation • Data augmentation: generation [a], selection [b], expansion [c], etc. – Trial and error without directly modifying model parameters – Potential vulnerability to hostile samples • Direct modification of parameters according to specific samples – Correction for specific labels for adversarial examples [d]. – Finding and correcting impacted areas in failed data [e]. 21 [a] Generative adversarial nets, NIPS 2014 [b] MODE: automated neural network model debugging via state differential analysis and input selection, ESEC/FSE 2018 [c] Autoaugment: Learning augmentation policies from data, arXiv:1805.09501, 2019 [d] Unlearned Modification of Neural Network Models for Adversarial Examples and Its Evaluation, JSSST 2019 [e] Search Based Repair of Deep Neural Networks, arXiv:1912.12463, 2019
  22. Debugging for adversarial examples [e]

    • Localize the area to be fixed and directly change the weights, in order to shorten the model-fixing process.
    Figure labels: Average of each neuron’s output values for successful data; Average of each neuron’s output values for misrecognition data; Neurons with high priority for fixing
    [d] Unlearned Modification of Neural Network Models for Adversarial Examples and Its Evaluation, JSSST 2019
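The figure labels suggest comparing each neuron's average output on successful data with its average output on misrecognized data, and prioritizing the neurons where the two differ most. The sketch below shows one plausible reading of that idea only. It is not the procedure from the cited papers; the ranking by absolute difference, the function names, and the activation values are all assumptions.

```python
def mean_per_neuron(activations):
    """Average each neuron's output over a batch (rows are samples)."""
    n = len(activations)
    return [sum(column) / n for column in zip(*activations)]


def rank_neurons(success_acts, failure_acts, top_k=2):
    """Indices of the neurons whose mean outputs differ most between the two sets."""
    ok = mean_per_neuron(success_acts)
    bad = mean_per_neuron(failure_acts)
    gaps = [abs(a - b) for a, b in zip(ok, bad)]
    return sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:top_k]


# Made-up activations for a layer with three neurons.
success = [[0.1, 0.9, 0.5], [0.3, 0.7, 0.5]]
failure = [[0.2, 0.1, 0.9], [0.2, 0.3, 0.7]]
```

With these values, neuron 1 shows the largest gap and neuron 2 the next largest, so they would be the first candidates for a direct weight change.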
  23. Multi-view modeling of ML systems with patterns • ML project

    canvas to describe project requirements 23 Jati H. Husen, Hironori Washizaki, Hnin Thandar Tun, Nobukazu Yoshioka and Yoshiaki Fukazawa, “Modeling Tool for Managing Canvas-Based Models Traceability in ML System Development,” ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems (MODELS 2022)
  24. Multi-view modeling of ML systems with patterns • Goal-oriented approach

    to align and rationalize multiple models and patterns with higher-level development goals 24 Jati H. Husen, Hironori Washizaki, Hnin Thandar Tun, Nobukazu Yoshioka and Yoshiaki Fukazawa, “Modeling Tool for Managing Canvas-Based Models Traceability in ML System Development,” ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems (MODELS 2022) ML Design Patterns
  25. Agenda • SWEBOK Evolution • Machine Learning Software Engineering •

    Machine Learning Design Patterns 25 Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  26. 26 Street Cafe Problem: Needs to have a place where

    people can sit lazily, legitimately, be on view, and watch the world go by… Solution: Encourage local cafes to spring up in each neighborhood. Make them intimate places, with several rooms, open to a busy path … Alexander, Christopher, et al. A Pattern Language. Oxford University Press, 1977. https://unsplash.com/photos/8IKf54pc3qk https://unsplash.com/photos/zACLEreWKXE
  27. Towards a pattern language … OK, so, to attract many

    people to our city, Small Public Squares should be located in the center. At the SMALL PUBLIC SQUARE, make Street Cafes be Opening to the Street ... 27 Small Public Square Street Cafe Opening to the Street https://unsplash.com/photos/EdpbTj3Br-Y https://unsplash.com/photos/GqurqYbj7aU https://unsplash.com/photos/zFoRwZirFvY
  28. ML software engineering needs patterns!

    • Bridge between abstract paradigms and concrete cases/tools
    – Documenting Know-Why, Know-What and Know-How
    – Reusing solutions and problems
    – Getting consistent architecture
    • Common language among stakeholders
    – Software engineers, data scientists, domain experts, network engineers, …
    Figure labels: Paradigm; Case; Tool; FW; Instruction; ?
  29. Practices and patterns in ML-SE

    • Researchers and practitioners study best practices as they strive to design Machine Learning (ML) systems and software.
    • Some practices are formalized as patterns. (Note: ML model patterns are not covered here.)
    • Examples: Data Lake for ML; Different Workloads in Different Computing Environments (e.g., Facebook)
    Figure labels: FBLearner Feature Store, FBLearner Flow, FBLearner Predictor; Data, Features, Trained model, Deployed model; Storage, CPU, CPU + GPU; Structured data, Unstructured data
    Image: https://unsplash.com/photos/_HPlEmsKgP0
    K. M. Hazelwood, et al., Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, HPCA 2018
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  30. Problem and goal

    • ML system architecture and design patterns at different abstraction levels are not well classified and studied.
    • Thus, we conducted a survey of software developers and a systematic literature review.
    Figure labels: Scholarly papers; Gray documents; Well-documented patterns (Different Workloads in Different Computing Environments); Practices with less information (Data Lake for ML)
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  31. Research questions

    • RQ1. Do academic and gray literature address the design of ML systems and software?
    – 19 scholarly and 19 gray documents identified
    – 15 SE patterns for ML applications extracted
    • RQ2. Can ML patterns be classified?
    – Categories of scopes: Topology, programming and model
    – Quality attributes: ISO/IEC 25010:2011 System and software product quality attributes, ML model and prediction quality attributes
    • RQ3. How do practitioners perceive ML patterns?
    – Questionnaire-based survey of 600+ developers
    – Developers were unfamiliar with most ML patterns, although there were several major patterns used by 20+% of the respondents
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  32. RQ2. Can ML patterns be classified? • Model operation patterns

    that focus on ML models • Programming patterns that define the design of a particular component • Topology patterns that define the entire system architecture. 32 Training data Trained model Prediction Training Infrastructure Input data Programming patterns Serving Infrastructure Model operation patterns Topology patterns Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  33. Topology patterns

    • Different Workloads in Different Computing Environments
    – Problem: It is necessary to separate and quickly change the ML data workload …
    – Solution: Physically isolate different workloads to separate machines …
    • Distinguish Business Logic from ML Models
    – Problem: The overall business logic should be isolated from the ML models …
    – Solution: Separate the business logic and the inference engine, loosely coupling the business logic and ML-specific dataflows.
    • ML Gateway Routing Architecture
    – Problem: Difficult to set up and manage individual endpoints for each service …
    – Solution: Install a gateway before a set of applications …
    • Microservice Architecture for ML
    – Problem: ML applications may be confined to some “known” ML frameworks …
    – Solution: Provide well-defined services to use for ML frameworks …
    • Lambda Architecture for ML
    – Problem: Real-time data processing requires scalability, fault tolerance, predictability …
    – Solution: The batch layer keeps producing views while the speed layer creates the relevant real-time views …
    • Kappa Architecture for ML
    – Problem: It is necessary to deal with huge amount of data with less code resource …
    – Solution: Support both real-time data processing and continuous reprocessing with a single stream processing engine …
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  34. Distinguish Business Logic from ML Models

    • Problem: Business logic should be isolated from ML models so that the models can be changed without impacting the rest of the business logic.
    • Solution: Separate the business logic and the inference engine, loosely coupling the business logic and ML-specific dataflows.
    Figure: architectural layers of a system deployed as an ML system.
    – Layers: Data Layer; Logic Layer; Presentation Layer
    – Elements: User Interface; Database; Data Collection; Data Lake; Business Logic; Data Processing; Inference Engine; Real World
    – Partitions: Business Logic Specific; ML Specific
    – Legend: Business Logic Data Flow; ML Runtime Data Flow; ML Development Data Flow
    H. Yokoyama, Machine Learning System Architectural Pattern for Improving Operational Stability, ICSA-C, 2019
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
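The loose coupling described in this pattern can be pictured as business logic that depends only on a narrow inference interface. This is a minimal sketch under assumed names: `InferenceEngine`, `KeywordEngine`, and `ChatbotLogic` are hypothetical and do not come from the talk or the cited paper.

```python
from typing import Protocol


class InferenceEngine(Protocol):
    """Hypothetical boundary between the business logic and the ML side."""

    def predict(self, text: str) -> str: ...


class KeywordEngine:
    """Stand-in model; a trained model exposing predict() could replace it."""

    def predict(self, text: str) -> str:
        return "greeting" if "hello" in text.lower() else "other"


class ChatbotLogic:
    """Business logic that knows only the InferenceEngine interface."""

    def __init__(self, engine: InferenceEngine) -> None:
        self.engine = engine

    def reply(self, message: str) -> str:
        intent = self.engine.predict(message)
        return "Hi there!" if intent == "greeting" else "Could you rephrase that?"
```

Because `ChatbotLogic` never imports a model framework, swapping `KeywordEngine` for a different engine changes one constructor argument and leaves the business rules untouched.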
  35. Usage of Distinguish Business Logic from ML Models

    Architectural elements (example role as chatbot), with “What” and “How” for each:
    – User Interface (Chatbot UI): Web App Front-end; Slack
    – Business Logic (Chatbot Logic): Web App Back-end; Slack
    – Data Collection (Dataset): Datasets; Nagoya Univ. Conversation Corpus
    – Data Processing (Text to Vector Transformer): NN Model pre- and post-processing; TensorFlow
    – Inference Engine (Language Model): NN Model; TensorFlow
    – Database (Previous Q&A Store): DB Server; (None)
    – Data Lake (Vectorized Corpus): Word Vector; TensorFlow (Text)
    Other figure labels: Data Layer; Logic Layer; Presentation Layer; Users; Data Source; Input; Output; Datasets; ML Input; ML Output; Input Data; Output Data; Word Vector
    Legend: Business Logic Data Flow; ML Development Data Flow; ML Runtime Data Flow
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  36. Programming patterns

    • Data Lake for ML
    – Problem: We cannot foresee the kind of analyses that will be performed on the data …
    – Solution: Store data, which range from structured to unstructured, as “raw” as possible into a data storage …
    • Separation of Concerns and Modularization of ML Components
    – Problem: ML applications must accommodate regular and frequent changes to their ML components …
    – Solution: Decouple at different levels of complexity from the simplest to the most complex …
    • Encapsulate ML Models within Rule-based Safeguards
    – Problem: ML models are known to be unstable and vulnerable to adversarial attacks, drifts, …
    – Solution: Encapsulate functionality in the containing system using deterministic and verifiable rules …
    • Discard PoC Code
    – Problem: The code created for Proof of Concept (PoC) often includes code that sacrifices maintainability …
    – Solution: Discard the code created for the PoC and rebuild maintainable code …
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  37. Encapsulate ML Models within Rule-based Safeguards

    • Problem: ML models are known to be unstable and vulnerable to adversarial attacks, noise, and data drift.
    • Solution: Encapsulate functionality provided by ML models and deal with the inherent uncertainty in the containing system using deterministic and verifiable rules.
    • Known usage: E.g., Apollo’s object detection [Peng20]
    Figure labels: Business Logic API; Rule-based Safeguard; Rule; Inference (Prediction); Encapsulated ML model; Input; Output
    [Peng20] Z. Peng, et al., A First Look at the Integration of Machine Learning Models in Complex Autonomous Driving Systems, ESEC/FSE’20
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
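One way to picture this pattern is a wrapper that calls the model and then applies deterministic checks before anything reaches the business logic. The sketch below is illustrative only: the `Detection` type, the two rules, and the thresholds are assumptions and do not describe how Apollo or the cited paper implements safeguards.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Detection:
    """Hypothetical model output for an object detector."""
    label: str
    confidence: float
    distance_m: float


def safeguarded_predict(model: Callable[[object], Detection], x: object,
                        min_confidence: float = 0.8,
                        max_range_m: float = 200.0) -> Detection:
    """Run the encapsulated model, then apply deterministic rules to its output."""
    det = model(x)
    # Rule 1 (assumed): physically implausible distances are rejected outright.
    if not (0.0 <= det.distance_m <= max_range_m):
        return Detection("unknown", 0.0, det.distance_m)
    # Rule 2 (assumed): low-confidence predictions fall back to a conservative label.
    if det.confidence < min_confidence:
        return Detection("unknown", det.confidence, det.distance_m)
    return det
```

The rules are plain conditionals, so they can be reviewed and unit-tested independently of the model, which is the property the pattern relies on.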
  38. Model operation patterns

    • Parameter-Server Abstraction
    – Problem: For distributed learning, widely accepted abstractions are lacking …
    – Solution: Distribute both data and workloads over worker nodes, while the server nodes maintain globally shared parameters …
    • Data Flows Up, Model Flows Down
    – Problem: Standard ML approaches require centralizing the training data on one machine …
    – Solution: Enable mobile devices to collaboratively learn while keeping all the training data on the device as federated learning …
    • Secure Aggregation
    – Problem: The system needs to communicate and aggregate model updates in a secure and scalable way …
    – Solution: Encrypt data from each device and calculate totals and averages without individual examination …
    • Deployable Canary Model
    – Problem: A surrogate ML that approximates the behavior of best model must be built to provide explainability …
    – Solution: Run the explainable inference pipeline in parallel to monitor prediction differences …
    • ML Versioning
    – Problem: ML models and their different versions may change the behavior of the overall ML applications …
    – Solution: Record the ML model, dataset, and code to ensure reproducible training and inference processes …
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
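As a small illustration of the ML Versioning entry, recording the model, dataset, and code together can be as simple as fingerprinting all three and deriving one identifier from the result. This is a sketch under assumptions: the record layout, the `version_record` helper, and the use of SHA-256 are choices made here, not details from the talk.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def version_record(model_bytes: bytes, dataset_bytes: bytes, code_text: str) -> dict:
    """Fingerprint model, dataset, and code, plus one ID covering all three."""
    record = {
        "model": fingerprint(model_bytes),
        "dataset": fingerprint(dataset_bytes),
        "code": fingerprint(code_text.encode("utf-8")),
    }
    # The combined ID changes whenever any one of the three parts changes.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["version_id"] = fingerprint(canonical)[:12]
    return record
```

Storing such a record next to each deployed model makes it possible to tell whether a behavior change came from new weights, new data, or new code.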
  39. Deployable Canary Model • Problem: A surrogate ML that approximates

    the behavior of the best ML model must be built to provide explainability. • Solution: Run the explainable inference pipeline in parallel with the primary inference pipeline to monitor prediction differences. • Known usage: Image-based anomaly detection at factory 39 S. Ghanta et al., Interpretability and reproducibility in production machine learning applications, ICMLA 2018 Input Decoy model Data lake Canary model (E.g., Decision tree) Production model (E.g., DNN) Monitoring and comparison Output Output Reproduce and retraining Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
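The monitoring step in this pattern amounts to feeding the same inputs to both pipelines and tracking how often they disagree. The sketch below uses two stand-in functions in place of a real DNN and decision tree; the model definitions, the inputs, and the `disagreement_rate` helper are assumptions for illustration.

```python
from typing import Callable, Sequence


def disagreement_rate(production: Callable[[float], int],
                      canary: Callable[[float], int],
                      inputs: Sequence[float]) -> float:
    """Share of inputs on which the two pipelines give different predictions."""
    if not inputs:
        return 0.0
    diffs = sum(1 for x in inputs if production(x) != canary(x))
    return diffs / len(inputs)


def production_model(x: float) -> int:
    """Stand-in for an opaque production model such as a DNN."""
    return int(x * x > 0.25)


def canary_model(x: float) -> int:
    """Stand-in for an explainable model such as a one-split decision tree."""
    return int(x > 0.5)


# Illustrative inputs; the two stand-ins disagree only on negative values.
SAMPLE_INPUTS = [-1.0, 0.0, 0.4, 0.6, 1.0]
```

A rising disagreement rate would be the signal to inspect the inputs involved and, following the figure, to reproduce and retrain.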
  40. RQ3. Practitioners’ insights on ML design patterns

    • Surveyed 600+ developers, 118 answered
    • Have you ever referred to ML patterns?
    – Major: ML Versioning, Microservice Architecture for ML
    – None: Secure Aggregation, Data Flows Up (a.k.a. Federated Learning)
    Figure: bar chart of responses per pattern (horizontal axis 0 to 120 respondents).
    – Patterns, in chart order: Data Flows Up, Model Flows Down; Secure Aggregation; Deployable Canary Model; Kappa Architecture for ML; Parameter-Server Abstraction; Different Workloads in Different Computing…; Encapsulate ML models within rule-base…; ML Gateway Routing Architecture; Lambda Architecture for ML; Separation of Concerns and Modularization of…; Distinguish Business Logic from ML Models; Data Lake for ML; Discard PoC code; Microservice Architecture for ML; ML Versioning
    – Response categories: Knew it / Didn’t know it; Used it / Never used it; Consider using it / Not consider
  41. Practitioners’ insights on ML design patterns

    • Have you ever referred to ML patterns?
    – Developers were unfamiliar with most ML patterns, although there were several major patterns (such as ML Versioning and Microservice Architecture for ML) used by 20+% of the respondents.
    – For all patterns, most respondents indicated that they would consider using them in future designs.
    – Promoting existing ML patterns will increase their utilization.
    • How do you solve and share design challenges of ML application systems?
    – 37 respondents (31%) organized design patterns and past design results.
    – The more organized respondents were in reusing solutions to design problems, the higher their pattern usage ratio.
    – Development teams and organizations will reuse more ML patterns as they become more consistent in their reuse approach.
    • Design solution and reuse practice: #Respondents; #Patterns used; Pattern usage ratio
    – Lv3. Organizing, reusing patterns (and past results): 37; 64; 11.5%
    – Lv2. Reusing externally documented patterns: 31; 50; 10.8%
    – Lv1. Resolving problems in an ad-hoc way: 37; 35; 6.3%
    – Others: 13; 3; 1.5%
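The usage ratios in the table are consistent with dividing the number of patterns used by the number of respondents times the 15 patterns extracted in the study. That formula is inferred here from the figures and is not stated on the slide. A short sketch reproduces the four percentages under that assumption:

```python
NUM_PATTERNS = 15  # SE patterns for ML applications extracted in the study


def usage_ratio(respondents: int, patterns_used: int) -> float:
    """Share of all (respondent, pattern) pairs in which the pattern was used."""
    return patterns_used / (respondents * NUM_PATTERNS)


# (respondents, patterns used) per reuse level, copied from the table.
levels = {
    "Lv3": (37, 64),
    "Lv2": (31, 50),
    "Lv1": (37, 35),
    "Others": (13, 3),
}
ratios = {name: round(100 * usage_ratio(r, p), 1) for name, (r, p) in levels.items()}
```

The respondent counts also sum to 118, matching the number of survey answers reported earlier in the deck.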
  42. Future research direction

    • Identify ML patterns addressing specific quality attributes that are not handled well now
    – Security, usability, and explainability
    • Investigate the impact of patterns on quality attributes of systems
    • Analyze relationships among patterns including related ones towards a pattern language
    – V. Lakshmanan et al., “Machine Learning Design Patterns,” O’Reilly, 2020.
    – Y. Shibui, “Machine Learning System Design Patterns”, https://github.com/mercari/ml-system-design-pattern, 2020.
    • Integrate patterns into a framework that covers everything from requirements to implementation and testing/debugging
    Jati H. Husen, Hnin Thandar Tun, Nobukazu Yoshioka, Hironori Washizaki and Yoshiaki Fukazawa, “Goal-Oriented Machine Learning-Based Component Development Process,” ACM/IEEE 24th International Conference on Model Driven Engineering Languages and Systems (MODELS), Poster
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
  43. Conclusion

    • Software engineering and SWEBOK evolution from V3 to V4
    – Modern software engineering, changes and updates in practice, the growth of the BOK, and recently developed areas
    – Architecture, operations, security, agile, AI/IoT
    – Public review of V4 draft is ongoing!
    • Machine Learning Software Engineering
    – Paradigm shifts in “new” software engineering
    – ML software engineering as induction
    – Traditional and ML-specific practices
    – SE techniques particularly useful for ML quality assurance: Metamorphic testing, neural network model debugging, multi-view (goal) modeling
    • Machine Learning Design Patterns
    – ML software engineering needs patterns as common language
    – Developers were unfamiliar with most ML patterns, although there were several major patterns used by 20+% of the respondents.
    – Promoting existing ML patterns will increase their utilization
  44. https://www.computer.org/volunteering/nomination-election/election Future of Software Engineering: SWEBOK Evolution and Machine Learning

    SE including ML Design Patterns