SWEBOK Evolution and Machine Learning Software Engineering including ML Design Patterns

Hironori Washizaki, “SWEBOK Evolution and Machine Learning Software Engineering including ML Design Patterns,” SELMA Seminar, Polytechnique Montreal, Concordia University, September 20th-21st, 2022.

Hironori Washizaki

September 21, 2022

Transcript

  1. SWEBOK Evolution and Machine Learning SE
    including ML Design Patterns
    Hironori Washizaki
    Waseda University / National Institute of Informatics /
    SYSTEM INFORMATION / eXmotion
    Twitter: @Hiro_Washi
    http://www.washi.cs.waseda.ac.jp/
    V20220920
    2022 September 20-21 Montreal

  2. Prof. Dr. Hironori Washizaki
    • Professor and the Associate Dean of the Research Promotion
    Division at Waseda University in Tokyo
    • Visiting Professor at the National Institute of Informatics
    • Outside Director of SYSTEM INFORMATION and eXmotion
    • Research and education projects
    • Leading a large-scale grant at MEXT enPiT-Pro Smart SE
    • Leading framework team of JST MIRAI eAI project
    • Professional contributions
    • IEEE Computer Society, Vice President for Professional
    and Educational Activities 2022, 1st Vice President 2023
    • IEEE Conference on Software Engineering Education and
    Training (CSEE&T), Steering Committee
    • IEEE-CS COMPSAC, Advisory Committee
    • Asia-Pacific Software Engineering Conference (APSEC),
    Steering Committee
    • Convener of ISO/IEC/JTC1 SC7/WG20

  3. Agenda
    • SWEBOK Evolution
    • Machine Learning Software Engineering
    • Machine Learning Design Patterns
    • Multi-view ML system modeling

  4. Guide to the Software Engineering
    Body of Knowledge (SWEBOK)
    http://swebokwiki.org
    • History: 2001 v1, 2004 v2, 2005 ISO/IEC Technical
    Report, 2014 v3, 2022 v4
    • Objective
    – Guiding learners, researchers and practitioners to identify and reach a common understanding of “generally accepted knowledge” in software engineering
    – Defining the boundary between software engineering and related disciplines
    – Providing foundations for certifications and educational curricula
    • Adoption
    – IEEE-CS software professional certification programs
    based on SWEBOK (Associate Software Developer,
    Professional Software Developer, Professional Software
    Engineering Master)
    – ISO/IEC 24773-4: Certification of software and
    systems engineering professionals - Part 4: Software
    engineering
    – Software Engineering Competency Model (SWECOM)
    [Figure: a Body of Knowledge and Islands of Knowledge related to Activities (and practices) and Tasks (and activities); task board columns: To Do, Doing, Done]

  5. SWEBOK Evolution from V3 to V4
    • Reflects modern software engineering: changed and updated practices, growth of the BOK, and recently developed areas
    • Public review is ongoing! https://www.computer.org/volunteering/boards-and-committees/professional-educational-activities/software-engineering-committee/swebok-evolution
    SWEBOK V3 knowledge areas:
    Requirements; Design; Construction; Testing; Maintenance; Configuration Management; Engineering Management; Process; Models and Methods; Quality; Professional Practice; Economics; Computing Foundations; Mathematical Foundations; Engineering Foundations
    SWEBOK V4 knowledge areas:
    Requirements; Architecture; Design; Construction; Testing; Operations; Maintenance; Configuration Management; Engineering Management; Process; Models and Methods; Quality; Security; Professional Practice; Economics; Computing Foundations; Mathematical and Engineering Foundations
    Changes annotated in the figure: Agile, DevOps; Agile testing ・・・; Agile security ・・・; restructuring foundation areas incl. IoT, AI/ML

  6. Agenda
    • SWEBOK Evolution
    • Machine Learning Software Engineering
    • Machine Learning Design Patterns
    • Multi-view ML system modeling

  7. ML meets SE: Induction (and abduction)
    [Figure: Goal, Data, Model and Behavior arranged differently for the two paradigms below]
    Conventional software engineering: Deduction
    ML software engineering: Induction (and abduction)
    – Probabilistic behavior gives rise to new quality and validation concerns. Quality aspects, including data quality, are active research topics.
    – ML system development challenges are being discussed through empirical case studies.
    – Practices and patterns need to be organized.
    H. Maruyama, “Machine Learning Engineering and Reuse of AI Work Products,” The First International Workshop on Sharing and Reuse of AI Work Products, 2017
    Hironori Washizaki, “Towards Software Value Co-Creation with AI”, The 44th IEEE Computer Society Signature Conference on Computers, Software, and Applications (COMPSAC 2020), Fast Abstract

  8. Techniques specific or particularly useful for ML quality assurance
    [Figure: ML system elements: training data, trained model, prediction/inference, infrastructure software system, new data]
    • Quality concerns shown: testing oracle problem, balanced dataset and coverage; performance, robustness and explainability; architecture validity and quality assurance; suitability with objective, handling unexpected situations
    • Techniques shown: ML model debugging; monitoring, goal-oriented modeling; metamorphic testing; search-based testing; practices and patterns; quality measurement
    N. Uchihira, AI and Software Engineering, JUSE SQiP 2017
    Eric Breck et al., The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction, IEEE Big Data 2017

  9. AI system and data quality
    [Figure: ML system elements (training data, trained model, prediction/inference, infrastructure software system, new data) annotated with the standards below]
    • ISO/IEC DIS 25059: AI System Product Quality
    • ISO/IEC DIS 25059: AI System Quality in Use
    • ISO/IEC AWI 5259-2: Data quality measures
    N. Uchihira, AI and Software Engineering, JUSE SQiP 2017
    Eric Breck et al., The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction, IEEE Big Data 2017

  10. Preliminary study on practitioners’ insights on quality
    • Surveyed 300+ developers; 46 who work in ML development answered
    • Which product quality attributes are considered?
    – Maintainability, reliability, security, and usability
    • Which model and prediction quality attributes are considered?
    – Robustness, accuracy, and explainability
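    [Bar chart, product quality attributes (respondent counts, axis 0 to 40). Values in extraction order: 3, 3, 4, 13, 19, 20, 20, 21, 28. Categories in extraction order: (not considered), Compatibility, Portability, Performance efficiency, Usability, Reliability, Security, Maintainability, Functional suitability]
    [Bar chart, model and prediction quality attributes (respondent counts, axis 0 to 30). Values in extraction order: 10, 4, 20, 21, 26. Categories in extraction order: (not considered), Fairness, Accuracy, Explainability, Robustness]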
    H. Washizaki, et al., Practitioners’ insights on machine-learning software engineering design patterns: a preliminary study, ICSME 2020

  11. Metamorphic testing
    • Testing based on metamorphic relations: relations whereby a change to the input predicts the change in the output.
    • System quality: Functional correctness, reliability, …
    • Data quality: Completeness, context coverage, effectiveness of a data set, …
    Change in input ( t ) → Change in output ( g )
    • Sorting, adding noise, semantically identical, statistically identical → None
    • Similar → Slight change
    • Constant addition and multiplication → Constant addition and multiplication
    • Narrowing → Subset
    • Completely different → Disjoint
    [Figure: transformation t maps x to t(x); f maps x to f(x) and t(x) to f(t(x)); g maps f(x) to g(f(x)); the relation holds when f(t(x)) = g(f(x))]
    Reference: S. Segura et al., "Metamorphic Testing of RESTful Web APIs," IEEE Transactions on Software Engineering, 2017
    Reference: C. Murphy, “Applications of Metamorphic Testing”, http://www.cis.upenn.edu/~cdmurphy/pubs/MetamorphicTesting-Columbia-17Nov2011.ppt
    Y Tian, et al., DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, ICSE 2018 https://arxiv.org/pdf/1708.08559.pdf
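    The relation f(t(x)) = g(f(x)) can be sketched in a few lines of Python. This is only an illustrative sketch: the function under test (`mean`) and the helper names (`check_relation`, `run_metamorphic_tests`) are hypothetical and not taken from the cited papers. It checks two relations from this slide: sorting the input leaves the output unchanged, and adding a constant to the input shifts the output by the same constant.

```python
import math
import random


def mean(xs):
    """Function under test f: arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def check_relation(f, t, g, x, tol=1e-9):
    """Return True if f(t(x)) equals g(f(x)) within a tolerance."""
    return math.isclose(f(t(x)), g(f(x)), rel_tol=tol, abs_tol=tol)


# Relation 1: sorting the input should not change the output (g is identity).
def sort_input(xs):
    return sorted(xs)


def identity(y):
    return y


# Relation 2: adding a constant to every input shifts the output by it.
C = 5.0  # arbitrary constant chosen for the example


def add_constant(xs):
    return [v + C for v in xs]


def shift_output(y):
    return y + C


def run_metamorphic_tests(f, trials=100, seed=0):
    """Check both relations on random inputs; no expected outputs are needed."""
    rng = random.Random(seed)
    relations = [(sort_input, identity), (add_constant, shift_output)]
    for _ in range(trials):
        x = [rng.uniform(-100, 100) for _ in range(rng.randint(1, 20))]
        for t, g in relations:
            if not check_relation(f, t, g, x):
                return False
    return True
```

    The point of the design is that no test oracle is required: the test never states what `mean(x)` should be, only how two outputs must relate.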

  12. Neural network model fixing (repair)
    • System quality: Functional correctness, reliability
    • Retraining and online learning
    – A straightforward method, but time-consuming and
    costly
    – Possible side effects of performance degradation
    • Data augmentation: generation [a], selection [b],
    expansion [c], etc.
    – Trial and error without directly modifying model
    parameters
    – Potential vulnerability to adversarial examples
    • Direct modification of parameters according to
    specific samples
    – Correction for specific labels for adversarial
    examples [d].
    – Finding and correcting impacted areas in failed data
    [e].
    [a] Generative adversarial nets, NIPS 2014
    [b] MODE: automated neural network model debugging via state differential analysis and input selection, ESEC/FSE 2018
    [c] Autoaugment: Learning augmentation policies from data, arXiv:1805.09501, 2019
    [d] Unlearned Modification of Neural Network Models for Adversarial Examples and Its Evaluation, JSSST 2019
    [e] Search Based Repair of Deep Neural Networks, arXiv:1912.12463, 2019
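    As a rough illustration of "finding impacted areas in failed data", the sketch below ranks neurons by comparing their average output on successful data with their average output on misrecognized data. The scoring rule (absolute difference of the two averages), the function names and the toy numbers are assumptions made for this example; the cited repair methods use their own criteria.

```python
def mean_activations(activations):
    """Column-wise mean of a list of per-sample activation vectors."""
    n = len(activations)
    width = len(activations[0])
    return [sum(sample[j] for sample in activations) / n for j in range(width)]


def rank_neurons_for_fixing(success_acts, failure_acts, top_k=2):
    """Rank neurons by how much their average output differs between
    correctly handled samples and misrecognized samples (assumed rule)."""
    mean_ok = mean_activations(success_acts)
    mean_ng = mean_activations(failure_acts)
    gaps = [abs(a - b) for a, b in zip(mean_ok, mean_ng)]
    order = sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)
    return order[:top_k]


# Toy activations of a 4-neuron layer (made-up numbers).
success = [[0.9, 0.1, 0.5, 0.2], [0.8, 0.2, 0.5, 0.1]]
failure = [[0.1, 0.1, 0.5, 0.9], [0.2, 0.2, 0.4, 0.8]]
suspects = rank_neurons_for_fixing(success, failure)
```

    With these toy numbers the two flagged neurons are 0 and 3, the ones whose averages differ most between the two data sets. A repair step would then adjust only the weights connected to those neurons.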
    [Figure: average of each neuron’s output values for successful data compared with the average for misrecognition data; neurons with high priority for fixing are highlighted]

  13. Summary of this part
    • ML meets SE.
    • Refer to quality models to clarify quality requirements and evaluation measures:
    – ISO/IEC DIS 25059 AI system quality: product quality and quality in use
    – ISO/IEC AWI 5259-2 Data quality measures
    • Then apply quality techniques
    – such as metamorphic testing and DNN fixing/repair
    • Practitioners are concerned with various quality attributes
    – Need to have practices and patterns that handle multiple attributes!

  14. Agenda
    • SWEBOK Evolution
    • Machine Learning Software Engineering
    • Machine Learning Design Patterns
    • Multi-view ML system modeling

  15. Street Cafe
    Problem: People need a place where they can sit lazily, legitimately, be on view, and watch the world go by…
    Solution: Encourage local cafes to spring
    up in each neighborhood. Make them
    intimate places, with several rooms, open
    to a busy path …
    Alexander, Christopher, et al. A Pattern Language. Oxford University Press, 1977.
    https://unsplash.com/photos/8IKf54pc3qk https://unsplash.com/photos/zACLEreWKXE

  16. Towards a pattern language
    … OK, so, to attract many people to our city, Small Public Squares should be located in the center. At the SMALL PUBLIC SQUARE, make STREET CAFES with an OPENING TO THE STREET ...
    [Figure: three related patterns: Small Public Square, Street Cafe, Opening to the Street]
    https://unsplash.com/photos/EdpbTj3Br-Y
    https://unsplash.com/photos/GqurqYbj7aU
    https://unsplash.com/photos/zFoRwZirFvY

  17. Example of designing ML system
    • We wish to identify the type of instrument from the sound picked up by the phone, and to record and respond according to the type.
    • However, the memory and performance of the phone are limited, and a large deep learning model is unlikely to fit.
    How can we do this?
    [Figure: two pretrained models]
    • Let's use Two-Phase Predictions, where a small model on the phone determines whether a sound is a musical instrument, and a large model in the cloud classifies the type of sound only if it is.
    • For the large model, we will adopt Transfer Learning to achieve precise classification.
    Machine Learning Design Patterns
    (V. Lakshmanan, et al. 2020)
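    The phone/cloud split described on this slide can be sketched as a simple gate. The two model functions below are hypothetical stand-ins (plain Python rules, not real pretrained models), and the threshold value is an assumption; only the control flow is the point.

```python
from typing import Callable, Optional, Sequence


def two_phase_predict(
    audio: Sequence[float],
    small_model: Callable[[Sequence[float]], float],
    large_model: Callable[[Sequence[float]], str],
    threshold: float = 0.5,
) -> Optional[str]:
    """Run the small on-device model first; call the large cloud model
    only when the small model thinks the sound is an instrument."""
    if small_model(audio) < threshold:
        return None  # not an instrument: no cloud call is made
    return large_model(audio)


def tiny_on_device_model(audio):
    """Stand-in for the small model: pretends the fraction of loud
    samples is the probability that the clip contains an instrument."""
    loud = sum(1 for s in audio if abs(s) > 0.3)
    return loud / len(audio)


cloud_calls = []  # records each (simulated) request sent to the cloud


def large_cloud_model(audio):
    """Stand-in for the large cloud classifier."""
    cloud_calls.append(len(audio))
    return "piano" if max(audio) > 0.8 else "guitar"


quiet = two_phase_predict([0.0, 0.1, 0.05, 0.0], tiny_on_device_model, large_cloud_model)
music = two_phase_predict([0.9, 0.5, 0.4, 0.1], tiny_on_device_model, large_cloud_model)
```

    Here `quiet` is `None` and triggers no cloud call, while `music` is classified by the large model. In a real system the second phase would be a network request, so the gate also saves bandwidth and latency.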

  18. ML software engineering needs patterns!
    • Bridge between abstract paradigms and concrete cases/tools
    – Documenting Know-Why, Know-What and Know-How
    – Reusing solutions and problems
    – Getting consistent architecture
    • Common language among stakeholders
    – Software engineers, data scientists, domain experts, network engineers, …
    • Researchers and practitioners studying best practices strive
    to design ML systems and software.
    – ML system architecture and design patterns at different abstraction
    levels are not well classified and studied.
    – Thus, we conducted a survey of software developers and a
    Systematic Literature Review.
    [Figure: a gap, marked “?”, between Paradigm and the concrete Case, Tool, FW and Instruction]

    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering
    Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.

  19. Software Engineering Patterns for ML applications (SEP4MLA)
    • Academic and gray literature address the design of ML systems and software
    – 19 scholarly and 19 gray documents were identified, and 15 SE patterns were extracted.
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda,
    “Software Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
    [Figure: patterns placed on the ML pipeline: training data, training infrastructure, trained model, input data, serving infrastructure, prediction]
    Programming patterns
    • Data Lake for ML
    • Separation of Concerns and Modularization of ML Components
    • Encapsulate ML Models within Rule-based Safeguards
    • Discard PoC Code
    Topology patterns
    • Different Workloads in Different Computing Environments
    • Distinguish Business Logic from ML Models
    • ML Gateway Routing Architecture
    • Microservice Architecture for ML
    • Lambda Architecture for ML
    • Kappa Architecture for ML
    Model operation patterns
    • Parameter-Server Abstraction
    • Data Flows Up, Model Flows Down
    • Secure Aggregation
    • Deployable Canary Model
    • ML Versioning

  20. 20
    Columns: Category, Pattern, Performance, Compatibility, Reliability, Security, Maintainability, Portability, Robustness, Explainability, Accuracy
    Topology
    • Different Workloads in Different Computing Environments: X X
    • Distinguish Business Logic from ML Models: X
    • ML Gateway Routing Architecture: X X
    • Microservice Architecture for ML: X X X
    • Lambda Architecture for ML: X X
    • Kappa Architecture for ML: X X
    Programming
    • Data Lake for ML: X X X
    • Separation of Concerns and Modularization of ML Components: X
    • Encapsulate ML Models within Rule-based Safeguards: X
    • Discard PoC Code: X
    Model operation
    • Parameter-Server Abstraction: X X
    • Data Flows Up, Model Flows Down: X X X
    • Secure Aggregation: X X X
    • Deployable Canary Model: X X
    • ML Versioning: X X X

  21. Topology pattern: Distinguish Business Logic from ML Models
    • Problem: Business logic should be isolated from ML models so that the models can be changed without impacting the rest of the business logic.
    • Solution: Separate the business logic and the inference engine, loosely coupling the business logic and ML-specific dataflows.
    H. Yokoyama, Machine Learning System Architectural Pattern for Improving Operational Stability, ICSA-C, 2019
    [Figure: architectural layers (Presentation Layer, Logic Layer, Data Layer) divided into business-logic-specific and ML-specific parts. Components: User Interface, Business Logic, Database, Data Collection, Data Processing, Inference Engine, Data Lake, Real World. Legend: Business Logic Data Flow, ML Runtime Data Flow, ML Development Data Flow. Deployed as ML System.]
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering
    Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
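    One possible way to express the loose coupling in code is to let the business logic depend only on a small inference interface. This is a sketch under assumptions: the class names (`InferenceEngine`, `ThresholdEngine`, `ConstantEngine`, `OrderService`) and the order-review scenario are invented for illustration and do not come from the cited papers.

```python
from typing import Protocol


class InferenceEngine(Protocol):
    """Interface the business logic depends on (hypothetical name)."""

    def predict(self, features: dict) -> float:
        ...


class ThresholdEngine:
    """Stand-in inference engine; a trained model would sit behind the same method."""

    def predict(self, features: dict) -> float:
        return 0.9 if features.get("amount", 0) > 1000 else 0.1


class ConstantEngine:
    """A second engine, used to show that engines can be swapped freely."""

    def __init__(self, score: float):
        self._score = score

    def predict(self, features: dict) -> float:
        return self._score


class OrderService:
    """Business logic: knows the InferenceEngine interface, not the model."""

    def __init__(self, engine: InferenceEngine, review_threshold: float = 0.5):
        self._engine = engine
        self._review_threshold = review_threshold

    def handle(self, order: dict) -> str:
        risk = self._engine.predict({"amount": order["amount"]})
        return "manual_review" if risk >= self._review_threshold else "approved"
```

    Replacing `ThresholdEngine` with another engine changes the predictions but requires no change to `OrderService`, which is the isolation the pattern asks for.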

  22. Programming pattern: Encapsulate ML Models within
    Rule-based Safeguards
    • Problem: ML models are known to be unstable and vulnerable to
    adversarial attacks, noise, and data drift.
    • Solution: Encapsulate functionality provided by ML models and deal
    with the inherent uncertainty in the containing system using
    deterministic and verifiable rules.
    • Known usage: e.g., Apollo’s object detection [Peng20]
    [Figure: Business Logic API, Rule-based Safeguard with its Rule, and the Encapsulated ML model performing Inference (Prediction) between Input and Output]
    Z. Peng, et al., A First Look at the Integration of Machine Learning Models in Complex Autonomous Driving Systems, ESEC/FSE’20
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering
    Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
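    A minimal sketch of the safeguard idea: the ML model is called only through a wrapper that applies deterministic, verifiable rules to its output. The `Detection` type, the two rules and their thresholds are assumptions chosen for illustration; they are not taken from Apollo or from the cited papers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Detection:
    label: str
    confidence: float
    distance_m: float


class SafeguardedDetector:
    """Encapsulates an ML detector behind deterministic rules."""

    def __init__(
        self,
        model: Callable[[list], Detection],
        min_confidence: float = 0.6,
        max_range_m: float = 200.0,
    ):
        self._model = model
        self._min_confidence = min_confidence
        self._max_range_m = max_range_m

    def detect(self, sensor_frame: list) -> Detection:
        raw = self._model(sensor_frame)
        # Rule 1: reject physically implausible distances (also catches NaN).
        if not (0.0 <= raw.distance_m <= self._max_range_m):
            return Detection("unknown", 0.0, raw.distance_m)
        # Rule 2: below the confidence floor, do not trust the label.
        if raw.confidence < self._min_confidence:
            return Detection("unknown", raw.confidence, raw.distance_m)
        return raw
```

    Callers such as the business logic see only `SafeguardedDetector.detect`, so the rules can be reviewed and tested independently of the model.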

  23. Model operation pattern: Deployable Canary Model
    • Problem: A surrogate ML model that approximates the behavior of the best ML model must be built to provide explainability.
    • Solution: Run the explainable inference pipeline in parallel with the primary inference pipeline to monitor prediction differences.
    • Known usage: Image-based anomaly detection at a factory
    S. Ghanta et al., Interpretability and reproducibility in production machine learning applications, ICMLA 2018
    [Figure: Input; Production model (e.g., DNN) and Canary model (e.g., decision tree), each with an Output; Monitoring and comparison; Decoy model; Data lake; Reproduce and retraining]
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering
    Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
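    The parallel-pipeline idea can be sketched as a small monitor that serves the production model's output while recording where the explainable canary model disagrees. The class name, the alert ratio and the two toy models are assumptions for illustration only.

```python
class CanaryMonitor:
    """Runs a canary (explainable) model beside the production model."""

    def __init__(self, production_model, canary_model, alert_ratio=0.2):
        self._production = production_model
        self._canary = canary_model
        self._alert_ratio = alert_ratio
        self.total = 0
        self.disagreements = []  # (input, production output, canary output)

    def predict(self, x):
        primary = self._production(x)
        shadow = self._canary(x)
        self.total += 1
        if primary != shadow:
            self.disagreements.append((x, primary, shadow))
        return primary  # only the production output is served

    def disagreement_ratio(self):
        return len(self.disagreements) / self.total if self.total else 0.0

    def needs_attention(self):
        return self.disagreement_ratio() > self._alert_ratio


# Toy stand-ins: an opaque production rule and a simpler canary rule.
def production(x):
    return "anomaly" if x * x > 4.0 else "normal"


def canary(x):
    return "anomaly" if x > 2.0 else "normal"


monitor = CanaryMonitor(production, canary)
served = [monitor.predict(x) for x in [0.5, 3.0, -3.0, 1.0]]
```

    In this run the two models disagree only on the input -3.0, giving a disagreement ratio of 0.25, which exceeds the assumed alert ratio. The logged disagreements are the cases worth reproducing and using for retraining.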

  24. Practitioners’ insights on ML design patterns
    • Surveyed 600+ developers; 118 answered
    – Developers were unfamiliar with most patterns, although there were several major patterns (e.g., ML Versioning and Microservice Architecture for ML).
    – Most respondents indicated they would consider using most of the patterns in future designs.
    – Promoting existing ML patterns will increase their utilization.
    • Pattern usage ratios were 10% at organizations that reuse solutions, 6% at organizations that solve problems ad hoc, and 1% at others.
    – As respondents became more organized in their approach to design problems by reuse, the pattern usage ratio increased.
    – Development teams and organizations will reuse more ML patterns as they become more consistent in their reuse approach.
    [Bar chart, 0 to 120 respondents per pattern. Legends: Knew it / Didn’t know it; Used it / Never used it / Consider using it / Not consider. Patterns in extraction order: Data Flows Up, Model Flows Down; Secure Aggregation; Deployable Canary Model; Kappa Architecture for ML; Parameter-Server Abstraction; Different Workloads in Different Computing…; Encapsulate ML models within rule-base…; ML Gateway Routing Architecture; Lambda Architecture for ML; Separation of Concerns and Modularization of…; Distinguish Business Logic from ML Models; Data Lake for ML; Discard PoC code; Microservice Architecture for ML; ML Versioning]

  25. More primitive ML design patterns
    • Machine Learning Design Patterns (Google, V. Lakshmanan,
    et al. 2020)
    [Figure: pattern categories placed on the ML pipeline (training data, training infrastructure, trained model, input data, serving infrastructure, prediction): Data Representation patterns, Problem Representation patterns, Model Training patterns, Resilience patterns, Reproducibility patterns, Responsible AI patterns]

  26. Primitive ML design patterns
    Data Representation
    • Hashed Feature
    • Embeddings
    • Feature Cross
    • Multimodal Input
    Problem Representation
    • Reframing
    • Multilabel
    • Ensembles
    • Cascade
    • Neutral Class
    • Rebalancing
    Model Training
    • Useful Overfitting
    • Checkpoints
    • Transfer Learning
    • Distribution Strategy
    • Hyperparameter Tuning
    Machine Learning Design Patterns
    (V. Lakshmanan, et al. 2020)

  27. Primitive ML design patterns (cont.)
    Resilience
    • Stateless Serving Function
    • Batch Serving
    • Continued Model Evaluation
    • Two-Phase Predictions
    • Keyed Predictions
    Reproducibility
    • Transform
    • Repeatable Splitting
    • Bridged Schema
    • Windowed Inference
    • Workflow Pipeline
    • Feature Store
    • Model Versioning
    Responsible AI
    • Heuristic Benchmark
    • Explainable Predictions
    • Fairness Lens
    Machine Learning Design Patterns
    (V. Lakshmanan, et al. 2020)

  28. Summary of this part
    • ML needs patterns!
    • Identify ML patterns addressing specific quality attributes
    that are not handled well now
    – Security, usability, and explainability
    • Future work
    – Detection of ML patterns
    – Investigate the impact of patterns on quality attributes of systems
    – Analyze relationships among patterns, including related ones, towards a pattern language
    – Integration into a framework that covers requirements through implementation and testing/debugging
    Jati H. Husen, Hnin Thandar Tun, Nobukazu Yoshioka, Hironori Washizaki and Yoshiaki Fukazawa, “Goal-Oriented Machine Learning-Based Component
    Development Process,” ACM/IEEE 24th International Conference on Model Driven Engineering Languages and Systems (MODELS), Poster
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software Engineering
    Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.

  29. Agenda
    • SWEBOK Evolution
    • Machine Learning Software Engineering
    • Machine Learning Design Patterns
    • Multi-view ML system modeling

  30. Multi-view modeling of ML systems with patterns
    • Iterative, standards-compliant, traceable, and consistent
    [Figure: views and their models. Views as labeled: Goal, ML Ops, Safety, Architecture, Value. Models: AI Project Canvas, KAOS Goal Model, ML Canvas, STAMP/STPA, Safety Case, Architecture Diagram]
    Jati H. Husen, Hironori Washizaki, Hnin Thandar Tun, Nobukazu Yoshioka and Yoshiaki Fukazawa, “Modeling Tool for Managing Canvas-Based Models
    Traceability in ML System Development,” ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems (MODELS 2022) Poster

  31. Consistency based on metamodel
    • Goal-oriented approach to align and rationalize multiple
    models and patterns with higher-level development goals
    Jati H. Husen, Hironori Washizaki, Hnin Thandar Tun, Nobukazu Yoshioka and Yoshiaki Fukazawa, “Modeling Tool for Managing Canvas-Based Models
    Traceability in ML System Development,” ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems (MODELS 2022) Poster
    [Figure: metamodel, including ML Design Patterns]

  32. 32
    [Figure: traceability function linking the ML Canvas and the KAOS goal model]
    Jati H. Husen, Hironori Washizaki, Hnin Thandar Tun, Nobukazu Yoshioka and Yoshiaki Fukazawa, “Modeling Tool for Managing Canvas-Based Models
    Traceability in ML System Development,” ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems (MODELS 2022) Poster

  33. Conclusion
    • Software engineering and SWEBOK evolution from V3 to V4
    – Reflects modern software engineering: changed and updated practices, growth of the BOK, and recently developed areas
    – Architecture, operations, security, agile, AI/IoT
    – Public review of V4 draft is ongoing!
    • Machine Learning Software Engineering
    – Paradigm shifts in “new” software engineering
    – ML software engineering as induction
    – Traditional and ML-specific practices
    – SE techniques particularly useful for ML quality assurance: Metamorphic testing,
    neural network model debugging, multi-view (goal) modeling
    • Machine Learning Design Patterns
    – ML software engineering needs patterns as common language
    – Developers were unfamiliar with most ML patterns, although there were several
    major patterns used by 20+% of the respondents.
    – Promoting existing ML patterns will increase their utilization