Machine Learning and Value Generation in Software Development: A Survey

Exactpro
November 07, 2019

Barakat Akinsanya, Luiz Araujo, Mariia Charikova, Susanna Gimaeva, Alexandr Grichshenko, Adil Khan, Manuel Mazzara, Ozioma Okonicha and Daniil Shilintsev

International Conference on Software Testing, Machine Learning and Complex Process Analysis (TMPA-2019)
7-9 November 2019, Tbilisi

Video: https://youtu.be/mpjjDqNOx8Q

TMPA Conference website https://tmpaconf.org/
TMPA Conference on Facebook https://www.facebook.com/groups/tmpaconf/

Transcript

  1. Machine Learning and value generation in Software Development: a survey

    Barakat Akinsanya, Luiz Araujo, Mariia Charikova, Susanna Gimaeva, Alexandr Grichshenko, Adil Khan, Manuel Mazzara, Ozioma Okonicha and Daniil Shilintsev, Innopolis University
  2. Presentation outline

    ❏ Machine learning and its potential usage
    ❏ Predicting risks to a project
    ❏ Predicting programming effort
    ❏ Predicting software defects
    ❏ Discussion
    ❏ Conclusion and future research
  3. Machine learning

    • Subfield of artificial intelligence
    • Mathematical models identify patterns in the input data and draw conclusions from that data
    • Gaining more and more popularity
  4. Types of risks

    • Budget
    • Management
    • Schedule
    • Technical

    (Hu et al. Software project risk management modeling with neural network and support vector machine approaches. (2007))
  5. Budget

    Regression techniques

    All of the learning algorithms used in the research have close prediction performances. (Ceylan et al. Software defect identification using machine learning techniques. (2006))
  6. Management

    Multiple Logistic Regression

    Helps point out the risk factors. (Christiansen et al. Prediction of risk factors of software development project by using multiple logistic regression. (2015))
  7. Metrics

    • Lines of code
    • Function points
    • Use case points
    • Labour hours
    • Story points
  8. Techniques

    1. Expert estimation
    2. Logical statistical models
    3. Classical ML models
    4. Deep learning models
  9. Expert estimation

    Planning Poker

    • Considerable human bias
    • Overestimates in 40% of instances
    • Very high Mean Magnitude of Relative Error (MMRE) score of 106.8%

    (Moharreri et al. (2016))
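
    The MMRE figure quoted on this slide is easier to read with the definition at hand: for each project the magnitude of relative error is |actual - estimated| / actual, and MMRE is the mean of those values. The sketch below follows that standard definition; the function name and the sample story-point values are illustrative assumptions, not data from Moharreri et al.

```python
def mmre(actual, estimated):
    """Mean Magnitude of Relative Error: mean of |actual - estimated| / actual."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual effort values must be positive")
    relative_errors = [abs(a - e) / a for a, e in zip(actual, estimated)]
    return sum(relative_errors) / len(relative_errors)


# Hypothetical efforts (for example, story points); not taken from the survey.
actual_effort = [10, 20, 40]
estimated_effort = [15, 10, 40]
print(f"MMRE = {mmre(actual_effort, estimated_effort):.1%}")
```

    An MMRE above 100% means that, on average, an estimate misses the actual effort by more than the actual effort itself.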
  10. Logical statistical models

    • Constructive Cost Model (COCOMO)
    • Software Lifecycle Management (SLIM)
    • Function Points

    Inconsistent performance due to the noisy nature of datasets (Azzeh, M.: Software effort estimation based on optimized model tree. (2011))
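
    As an illustration of what a parametric model of this kind looks like, here is a minimal sketch of Basic COCOMO, where effort in person-months is a * KLOC^b. The slide does not say which COCOMO variant the surveyed studies used, so the choice of the basic form is an assumption; the coefficients are the commonly cited ones from Boehm's 1981 formulation, and the function name is hypothetical.

```python
# Basic COCOMO coefficients (a, b) per project class, as commonly cited
# from Boehm (1981). Using the basic variant here is an assumption.
COCOMO_COEFFICIENTS = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc, project_class="organic"):
    """Estimated effort in person-months for a project of `kloc` thousand lines."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b = COCOMO_COEFFICIENTS[project_class]
    return a * kloc ** b


# Hypothetical project size; not taken from the survey.
for name in COCOMO_COEFFICIENTS:
    print(f"{name}: {basic_cocomo_effort(10, name):.1f} person-months")
```

    Because the coefficients are fixed, the estimate depends entirely on the size input, which is one way noisy project data can translate into inconsistent performance.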
  11. Classical ML models

    1. Case-based reasoning (more suitable if data is limited)
    2. Decision trees

    Highly interpretable, superior or at least competitive with parametric methods (Wen et al. Systematic literature review of machine learning based software development effort estimation models. (2012))
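
    Case-based reasoning can be pictured as estimation by analogy: find the past projects most similar to the new one and reuse their effort. The sketch below is one simple reading of that idea (min-max scaling, Euclidean distance, mean effort of the k nearest cases), not the procedure of any study cited here; the feature layout, the function name and the sample history are assumptions made for illustration.

```python
import math


def estimate_by_analogy(past_projects, new_features, k=2):
    """Mean effort of the k past projects closest to `new_features`.

    past_projects: list of (features, effort) pairs, features being numeric tuples.
    """
    if not past_projects:
        raise ValueError("at least one past project is required")
    # Min-max scale every feature so that none dominates the distance.
    columns = list(zip(*(features for features, _ in past_projects)))
    lows = [min(column) for column in columns]
    spans = [(max(column) - min(column)) or 1.0 for column in columns]

    def scale(features):
        return [(x - low) / span for x, low, span in zip(features, lows, spans)]

    target = scale(new_features)
    ranked = sorted(past_projects, key=lambda case: math.dist(scale(case[0]), target))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Hypothetical history: (team size, function points) -> effort in person-days.
history = [
    ((5, 10), 100),
    ((6, 12), 120),
    ((20, 40), 400),
]
print(estimate_by_analogy(history, (5.5, 11), k=2))
```

    Because the estimate is built directly from a handful of stored cases, the approach still works with a small history, which matches the slide's remark about limited data.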
  12. Deep learning models

    • Noise tolerance
    • High parallelism
    • Generalisation capabilities

    Outperformed Regression Tree, KNN and Regression Analysis. (Kim et al. A comparison of techniques for software development effort estimating. (2005))
  13. Metrics

    • Lines of code
    • Weighted methods per class
    • Coupling between objects
    • Response for class
    • Branch count
  14. Comparison of models w.r.t. dataset size

    • Large datasets: Random Forest
    • Small datasets: Naive Bayes

    (Catal et al. Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. (2009))
  15. Comparison of models w.r.t. metrics

    • Method-level metrics: Random Forest classifier
    • Class-level metrics: SVM

    (Shanthini et al. Applying machine learning for fault prediction using software metrics. (2012))
  16. Results

    Comparison between Random Forest, AdaBoost, Bagging, Multilayer Perceptron and Genetic Programming

    Best results: Random Forest and Bagging algorithms (Malhotra et al. Fault prediction using statistical and machine learning methods for improving software quality. (2012))
  17. Summary

    • Best fault prediction models: Random Forest and SVM
    • Class-level metrics show better prediction performance than method-level metrics.

    (Karim et al. Software metrics for fault prediction using machine learning approaches: A literature review with PROMISE repository dataset. (2017))
  18. Models

    Widely used models: Case-based, Neural Networks

    ML models are gaining popularity in the academic community. However, in industry these models are not used as frequently as their reported performance would suggest.
  19. Limitations

    • Lack of large software datasets
    • Imbalance of datasets
    • Outdated datasets
    • Lack of a united and shared dataset
  20. Future research and recommendations

    • Reinforcement learning
    • Convolutional NN
    • Recurrent NN
    • Larger, more representative datasets
    • Closer interaction between academic and industrial communities