SQuaRE Matters: Reflection of Quality Evaluation, Benchmark, and Practitioners’ Perception through SQuaRE

Hironori Washizaki, “SQuaRE Matters: Reflection of Quality Evaluation, Benchmark, and Practitioners’ Perception through SQuaRE,” Keynote, 4th International Workshop on Experience with SQuaRE series and its Future Direction in conjunction with the 29th Asia-Pacific Software Engineering Conference (APSEC 2022), Tokyo (virtual), 6th December 2022.

Hironori Washizaki

December 09, 2022

Transcript

  1. SQuaRE Matters: Reflection of Quality
    Evaluation, Benchmark and Practitioners'
    Perception through SQuaRE
    Hironori Washizaki
    Waseda University / National Institute of Informatics / SYSTEM
    INFORMATION / eXmotion
    ISO/IEC/JTC1 SC7/WG20 Convenor, SC7/WG6 Member
    IEEE-CS Vice President
    [email protected]

  2. Agenda
    • SQuaRE matters?
    • Software systems: Quality evaluation and benchmarking
    • Machine Learning systems: Practitioners’ perception and
    design patterns
    • Conclusion

  3. ISO/IEC 9126-1:2001
    Quality in use: Effectiveness, Productivity, Safety, Satisfaction
    Internal and external quality
    • Functionality: Suitability, Accurateness, Interoperability, Security, Compliance
    • Reliability: Maturity, Fault tolerance, Recoverability, Compliance
    • Usability: Understandability, Learnability, Operability, Attractiveness, Compliance
    • Efficiency: Time behavior, Resource behavior, Compliance
    • Maintainability: Analyzability, Changeability, Stability, Testability, Compliance
    • Portability: Adaptability, Installability, Co-existence, Replaceability, Compliance
    ISO/IEC 9126-1:2001, Software engineering - Product quality - Part 1: Quality model

  4. ISO/IEC 25010: 2011 SQuaRE
    Quality in use: Effectiveness, Efficiency, Freedom from risk, Satisfaction, Context coverage
    Internal and external quality
    • Functional suitability: Functional completeness, Functional correctness, Functional appropriateness
    • Compatibility: Co-existence, Interoperability
    • Security: Confidentiality, Integrity, Non-repudiation, Accountability, Authenticity
    • Reliability: Maturity, Fault tolerance, Recoverability, Availability
    • Usability: Appropriateness recognisability, Learnability, Operability, User error protection, UI aesthetics, Accessibility
    • Performance efficiency: Time-behavior, Resource utilization, Capacity
    • Maintainability: Analysability, Modifiability, Reusability, Testability, Modularity
    • Portability: Adaptability, Installability, Replaceability
    ISO/IEC 25010:2011 Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - System and software quality models

  5. SQuaRE matters?
    Survey at Japanese online seminar on software engineering standards, Aug 2020 (N=58)
    [Figure: survey responses, with the following categories]
    • Almost unknown
    • Adopted organizationally
    • Adopted in SQA division
    • Used in specific projects
    • Known by limited division and experts

  6. Agenda
    • SQuaRE matters?
    • Software systems: Quality evaluation and benchmarking
    • Machine Learning systems: Practitioners’ perception and
    design patterns
    • Conclusion

  7. Comprehensive Software Quality Evaluation
    Framework and Benchmark based on SQuaRE
    • SQuaRE is independent of the domain or product.
    – It assembles important quality characteristics, measurement values, and
    evaluation methods.
    – Definitions of measurements are rather general and abstract.
    – Relationships among different characteristics have yet to be confirmed
    in detail.
    • Our contributions
    – The WSQF, a comprehensive quality evaluation framework with
    concretized SQuaRE quality measurement methods.
    – A comprehensive benchmark of 21 actual software products, revealing
    the limited scope of the present state of product quality and quality-
    in-use characteristics.
    Naohiko Tsuda, Hironori Washizaki, Kiyoshi Honda, Hidenori Nakai, Yoshiaki Fukazawa, Motoei Azuma, Toshihiro Komiyama, Tadashi Nakano, Hirotsugu Suzuki,
    Sumie Morita, Katsue Kojima, Akiyoshi Hando: WSQF: comprehensive software quality evaluation framework and benchmark based on SQuaRE. ICSE (SEIP) 2019

  8. [Figure from the WSQF paper (Tsuda et al., ICSE (SEIP) 2019)]

  9. Measurement definition and value normalization
    • Calculating scores by using the percentile against the 21 products
    – E.g., top 30% = 0.7
    [Figure: histogram of measured values, from low to high]
    • Goal-Question-Metric (GQM) approach for
    defining concrete measurements
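
The percentile scoring on this slide can be sketched as follows. This is a minimal illustration, not the WSQF implementation: the function name `percentile_score`, the `higher_is_better` flag, and the choice to count only strictly worse benchmark values are assumptions, picked so that a product with 30% of the benchmark at or above it scores 0.7.

```python
from typing import Sequence


def percentile_score(value: float,
                     benchmark: Sequence[float],
                     higher_is_better: bool = True) -> float:
    """Return the fraction of benchmark values that are strictly worse.

    Assumption: "top 30% = 0.7" is read as "70% of the benchmark
    products have a worse measured value than this product".
    """
    if not benchmark:
        raise ValueError("benchmark must contain at least one value")
    if higher_is_better:
        worse = sum(1 for b in benchmark if b < value)
    else:
        # For measurements such as response time, smaller is better.
        worse = sum(1 for b in benchmark if b > value)
    return worse / len(benchmark)


# Hypothetical benchmark of ten measured values (not data from the talk).
example_benchmark = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
example_score = percentile_score(8, example_benchmark)
```

Normalizing every measurement to the same 0 to 1 scale is what allows scores from unrelated measurements to be aggregated per quality characteristic.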

  10. Measurement summary
    • 21 package software and SaaS products
    • Waseda Software Quality Benchmark (WSQB2017)
    • ISO/IEC 25000 SQuaRE-based concrete measurement
    • Relationships among different quality characteristics
    [Figure: chart on a 0-100 scale over Functional suitability, Performance efficiency, Compatibility, Usability, Reliability, Security, Maintainability, Portability]
    [Figure: evaluation scheme with the labels 12 vendors, 21 products, IV&V org., Delegate, Result, ISO/IEC JTC1/SC7/WG6 SQuaRE editors, Cooperation, Eval. result]

  11. Product quality measurements
    • 66 measurements (i.e., metrics)
    • Measurement coverage: 34% of 66×21 products
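
The coverage figure above is a ratio over a measurements-by-products grid. A minimal sketch, assuming missing values are recorded as `None`; the function name `measurement_coverage` and the sample grid are hypothetical:

```python
from typing import Optional, Sequence


def measurement_coverage(table: Sequence[Sequence[Optional[float]]]) -> float:
    """Fraction of (measurement, product) cells that hold a value."""
    total = sum(len(row) for row in table)
    if total == 0:
        raise ValueError("table has no cells")
    measured = sum(1 for row in table for v in row if v is not None)
    return measured / total


# Size of the full grid described on the slide: 66 measurements x 21 products.
TOTAL_CELLS = 66 * 21

# Hypothetical 2 x 3 grid; None marks a measurement that could not be taken.
sample_table = [
    [1.0, None, 0.5],
    [None, None, 2.0],
]
sample_coverage = measurement_coverage(sample_table)
```

With 1386 cells in the full grid, 34% coverage corresponds to roughly 470 measured values.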
    [Figure: stacked bar chart of "#Measurements defined" per characteristic (Functional suitability, Performance efficiency, Compatibility, Usability, Reliability, Security, Maintainability, Portability), split into "Measured" and "NOT measured". Bar labels in extraction order: 4, 10, 2, 9, 13, 6, 8, 5 and 0, 1, 0, 0, 3, 0, 4, 1]

  12. Quality in Use
    User questionnaire
    • Standard questionnaire
    – E.g., Are you satisfied … ?
    • 3 products
    User testing
    • 10 products
    [Figure: process across the Waseda U. team and the vendor]
    1 Declare functions
    2 Extract core functions
    3 Define normal test cases
    4 Define exceptional test cases
    5 Conduct test
    6 Measure quality
    Coverage 24% (Measured vs. NOT measured)

  13. Measurement results
    [Figure: score distributions, labeled as follows]
    • Similar to normal distribution
    • Many products having low scores
    • Bipolarized

  14. Correlation among quality characteristics
    [Figure: correlations among product quality and quality-in-use characteristics]
    • The negative correlation between usability and functionality
    may be due to the trade-off between implementing a stable
    functionality and improving user experience
    • As portability is built for
    various environments, the
    product quality may be
    reviewed from
    multifaceted perspectives.
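
A correlation between two quality characteristics can be computed from per-product scores as sketched below. The slide does not state which coefficient was used, so Pearson's r is an assumption here, and the scores are invented for illustration only.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical normalized scores for four products (not data from the talk).
functional_suitability = [0.1, 0.4, 0.7, 0.9]
usability = [0.9, 0.6, 0.3, 0.1]
r = pearson(functional_suitability, usability)
```

In this made-up sample usability falls exactly as functional suitability rises, so r is -1 up to rounding; real benchmark data would give weaker values.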

  15. Reliability model based on fault count accumulation
    • Fault discovery in a time series was obtained from nine products.
    • Upon applying the reliability growth model to these nine products,
    we categorized them into three fault discovery pattern types
    • Stable, gradually increasing, and explosively increasing
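
Fitting a reliability growth model to cumulative fault counts can be sketched as below. The slide does not name the model that was applied, so the Goel-Okumoto exponential model m(t) = a(1 - e^{-bt}) is swapped in purely for illustration; the grid search, function names, and data are likewise assumptions.

```python
import math
from typing import Sequence, Tuple


def goel_okumoto(t: float, a: float, b: float) -> float:
    """Expected cumulative number of faults found by time t."""
    return a * (1.0 - math.exp(-b * t))


def fit_goel_okumoto(times: Sequence[float],
                     faults: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of (a, b) by a grid search over b in 0.01..2.00.

    For a fixed b the model is linear in a, so the best a has the
    closed form sum(y * f) / sum(f * f) with f = 1 - exp(-b * t).
    """
    best = None
    for step in range(1, 201):
        b = step / 100.0
        f = [1.0 - math.exp(-b * t) for t in times]
        denom = sum(v * v for v in f)
        if denom == 0:
            continue
        a = sum(y * v for y, v in zip(faults, f)) / denom
        sse = sum((y - a * v) ** 2 for y, v in zip(faults, f))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    if best is None:
        raise ValueError("no usable data points")
    return best[1], best[2]


# Synthetic fault history generated from a = 100, b = 0.3 (not real data).
weeks = list(range(1, 11))
cumulative_faults = [goel_okumoto(t, 100.0, 0.3) for t in weeks]
a_hat, b_hat = fit_goel_okumoto(weeks, cumulative_faults)
```

Under this model, a fitted total `a_hat` close to the faults already found suggests that discovery has leveled off, whereas a much larger `a_hat` suggests that many faults remain to be found.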

  16. • Functional suitability, reliability, and
    effectiveness have high scores for
    the stable type.
    • Sufficient testing has already
    been performed to identify
    defects for products with a high
    functional suitability, reliability,
    and effectiveness.
    • Explosively increasing type has a
    low quality in performance efficiency
    and compatibility.
    • Sufficient faults have yet to be
    discovered in tests for software
    products with a low performance
    efficiency and compatibility.
    Hence, it is highly likely that
    defects will continue to be
    discovered in the future.

  17. Case study of two products
    • The first priority of Pa is
    Performance efficiency, owing to the
    requirement for strong data
    processing.
    • Pa has a Security score lower than
    the median score. This is consistent
    with the developers’ intention to
    give Security a low priority and use
    only a part of the technologies.
    • Pb prioritized Security in cloud
    service and Effectiveness to
    strengthen the differentiation with
    competing products.
    • Pb has a low Reliability score,
    although Reliability had a high
    priority. The gap indicated by the
    evaluation led to the discovery of
    implicit issues.

  18. Recommendations
    Waseda Software Quality Benchmark
    http://www.washi.cs.waseda.ac.jp/?page_id=3479
    Continuous measurement and PSQ-Certification
    Industry: improve quality management
    1. Low security and compatibility in some products.
    Necessary to address these in IoT era.
    2. Negative correlation between usability and functionality.
    Need to adopt user-centered development.
    3. Limited data and goals. Necessary to measure/benchmark
    ISO/IEC JTC1/SC7/WG6: improve SQuaRE
    4. Incorporate concrete measurements into SQuaRE
    5. Address properties specific to Agile and Cloud

  19. Agenda
    • SQuaRE matters?
    • Software systems: Quality evaluation and benchmarking
    • Machine Learning systems: Practitioners’ perception and
    design patterns
    • Conclusion

  20. Preliminary study on practitioners’ insights on quality
    • Surveyed 300+ developers, of whom 46 engaged in ML development answered
    • What product quality attributes are considered?
    – Maintainability, reliability, security, and usability
    • What model and prediction quality attributes are considered?
    – Robustness, accuracy, and explainability
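    [Figure: product quality attributes considered, number of responses]
    (not considered): 3
    Compatibility: 3
    Portability: 4
    Performance efficiency: 13
    Usability: 19
    Reliability: 20
    Security: 20
    Maintainability: 21
    Functional suitability: 28
    [Figure: model and prediction quality attributes considered, number of responses]
    (not considered): 10
    Fairness: 4
    Accuracy: 20
    Explainability: 21
    Robustness: 26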
    H. Washizaki, et al., Practitioners’ insights on machine-learning software engineering design patterns: a preliminary study, ICSME 2020

  21. Software Engineering Patterns for ML Applications (SEP4MLA) [Computer’22]
    • Patterns are recurring problems and corresponding solutions under specific contexts.
    • 15 software engineering design patterns were extracted from academic and gray literature.
    Hironori Washizaki, Foutse Khomh, Yann-Gael Gueheneuc, Hironori Takeuchi, Naotake Natori, Takuo Doi, Satoshi Okuda, “Software
    Engineering Design Patterns for Machine Learning Applications,” IEEE Computer, Vol. 55, No. 3, pp. 30-39, 2022.
    [Figure: ML pipeline (training data, training infrastructure, trained model, input data, serving infrastructure, prediction) annotated with the pattern categories]
    Topology patterns
    • Different Workloads in Different Computing Environments
    • Distinguish Business Logic from ML Models
    • ML Gateway Routing Architecture
    • Microservice Architecture for ML
    • Lambda Architecture for ML
    • Kappa Architecture for ML
    Programming patterns
    • Data Lake for ML
    • Separation of Concerns and Modularization of ML Components
    • Encapsulate ML Models within Rule-based Safeguards
    • Discard PoC Code
    Model operation patterns
    • Parameter-Server Abstraction
    • Data Flows Up, Model Flows Down
    • Secure Aggregation
    • Deployable Canary Model
    • ML Versioning

  22. Category, Pattern, and quality attributes (Performance, Compatibility, Reliability, Security, Maintainability, Portability, Robustness, Explainability, Accuracy)
    Topology
    • Different Workloads in Different Computing Environments: X X
    • Distinguish Business Logic from ML Models: X
    • ML Gateway Routing Architecture: X X
    • Microservice Architecture for ML: X X X
    • Lambda Architecture for ML: X X
    • Kappa Architecture for ML: X X
    Programming
    • Data Lake for ML: X X X
    • Separation of Concerns and Modularization of ML Components: X
    • Encapsulate ML Models within Rule-based Safeguards: X
    • Discard PoC Code: X
    Model operation
    • Parameter-Server Abstraction: X X
    • Data Flows Up, Model Flows Down: X X X
    • Secure Aggregation: X X X
    • Deployable Canary Model: X X
    • ML Versioning: X X X

  23. Model operation pattern: Deployable Canary Model
    • Problem: A surrogate ML model that approximates the behavior of the best
    ML model must be built to provide explainability.
    • Solution: Run the explainable inference pipeline in parallel with the
    primary inference pipeline to monitor prediction differences.
    • Known usage: Image-based anomaly detection at factory
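
The solution above, running an explainable pipeline alongside the primary one and watching for prediction differences, can be sketched as follows. This is a minimal illustration under stated assumptions: both models are plain callables, the name `run_with_canary` is hypothetical, and the threshold functions merely stand in for a DNN and a decision tree.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def run_with_canary(
    inputs: Sequence[X],
    production_model: Callable[[X], Y],
    canary_model: Callable[[X], Y],
) -> Tuple[List[Y], List[Tuple[X, Y, Y]]]:
    """Serve production predictions and log inputs where the canary differs.

    Only the production output is returned to callers; the canary runs
    in parallel purely for monitoring and comparison.
    """
    outputs: List[Y] = []
    disagreements: List[Tuple[X, Y, Y]] = []
    for x in inputs:
        primary = production_model(x)
        shadow = canary_model(x)
        outputs.append(primary)
        if shadow != primary:
            disagreements.append((x, primary, shadow))
    return outputs, disagreements


# Hypothetical stand-ins: a "production" and an "explainable" anomaly rule.
def production(score: float) -> bool:
    return score > 0.5


def canary(score: float) -> bool:
    return score > 0.6


served, diffs = run_with_canary([0.2, 0.55, 0.9], production, canary)
```

The logged disagreements are the cases worth inspecting: they show where the explainable surrogate no longer approximates the production model.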
    S. Ghanta et al., Interpretability and reproducibility in production machine learning applications, ICMLA 2018
    [Figure: Input → Production model (e.g., DNN) and Canary model (e.g., decision tree) → Outputs → Monitoring and comparison; other labels: Decoy model, Data lake, Reproduce and retraining]

  24. Conclusion: SQuaRE matters!
    • SQuaRE is useful to construct a comprehensive quality
    evaluation framework
    – Applied to 21 products
    – Case study of benchmarking
    • SQuaRE with extension is useful to examine software
    architecture and patterns of ML systems