The Impact of Parameter Tuning on Software Effort Estimation Using Learning Machines

by Liyan Song, Leandro Minku and Xin Yao.

Transcript

  1. The Impact of Parameter Tuning on Software Effort Estimation Using Learning Machines. Liyan Song, Leandro Minku, Xin Yao. Department of Computer Science, University of Birmingham. lxs189 & L.L.Minku & [email protected]. PROMISE, 2013.
  2. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  3. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  4. Background. Software Effort Estimation (SEE): predicting the effort required to develop a software system. Used at early stages of software development as a decision tool in bidding; over/under-estimations can be problematic. From a machine learning viewpoint, SEE is a regression problem: use completed projects to predict the unknown ones, employing ML approaches such as k-NN, MLPs, RTs, and SVR.
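To make the regression framing concrete, here is a minimal sketch of effort estimation with k-NN in scikit-learn; the features (team size, functional size, duration) and all numbers are hypothetical, invented purely for illustration.

```python
# Hypothetical sketch: SEE as regression with k-NN. The projects,
# feature values and effort figures below are invented, not real data.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Completed projects: (team size, functional size, duration in months)
X_completed = np.array([[4, 120, 6], [7, 300, 12], [3, 80, 4], [10, 450, 18]])
effort = np.array([520.0, 2100.0, 310.0, 4200.0])  # effort in person-hours

knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X_completed, effort)        # learn from completed projects
print(knn.predict([[5, 200, 8]]))   # estimate effort for a new project
```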
  5. Brief Background – Online Scenario. Why online? Only completed projects can be used, and the environment may change. How does it work in SEE? At each time step, all completed projects are used to predict the next ten. Performance: MAE at each time step; overall, the average MAE across all time steps.
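The evaluation loop on this slide can be sketched as follows; the `make_model` factory, the synthetic data, and the `min_train` warm-up size are our assumptions, not details from the paper.

```python
# Sketch of the online SEE evaluation: at each time step, all projects
# completed so far train a model that predicts the next ten projects;
# the overall score is the average of the per-time-step MAEs.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

def online_evaluation(X, y, make_model, horizon=10, min_train=10):
    step_maes = []
    for t in range(min_train, len(X) - horizon + 1):
        model = make_model()
        model.fit(X[:t], y[:t])                  # all completed projects
        preds = model.predict(X[t:t + horizon])  # predict the next ten
        step_maes.append(mean_absolute_error(y[t:t + horizon], preds))
    return np.mean(step_maes), step_maes

# Example run on synthetic data with a regression tree (RT):
rng = np.random.default_rng(0)
X, y = rng.random((60, 3)), rng.random(60) * 1000
overall_mae, _ = online_evaluation(X, y, lambda: DecisionTreeRegressor(max_depth=3))
print(overall_mae)
```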
  6. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  7. Motivation. 1 We investigate whether or not the PS strongly affects results in SEE: parameter setting is very important in general ML, yet many SEE studies implicitly assume that the PS will not affect the results much. 2 Bagging (ensemble) has been shown to improve performance; does this benefit extend to the sensitivity to PSs? [1] L. Minku and X. Yao. Ensemble and Locality: Insight on Improving Software Effort Estimation. IST, 2013. [2] L. Minku and X. Yao. Can Cross-company Data Improve Performance in Software Effort Estimation? PROMISE, 2012. [3] A. Corazza, S. Di Martino, F. Ferrucci, C. Gravino, F. Sarro, and E. Mendes. How Effective is Tabu Search to Configure Support Vector Regression for Effort Estimation? PROMISE, 2010.
  8. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  9. Research Questions Overview. 1 Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps? 2 Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? 3 Could Bagging also help to lessen the base learners’ sensitivity to parameter settings?
  10. How to Achieve our Goal? Empirical methods.
  11. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  12. Basic Information: Data Sets & Approaches. Data sets: 1 Kitchenham 2 Maxwell 3 SingleISBSG. Five approaches studied: 1 MLPs 2 Bagging+MLPs 3 RTs 4 Bagging+RTs 5 k-NN.
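As a rough illustration, the five approaches could be instantiated in scikit-learn as below; every parameter value is a placeholder, and this is not necessarily the toolkit or configuration the authors used.

```python
# Hypothetical instantiation of the five studied approaches; every
# parameter value here is a placeholder, not the paper's setting.
from sklearn.ensemble import BaggingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

approaches = {
    "MLPs": lambda: MLPRegressor(hidden_layer_sizes=(5,), max_iter=1000),
    "Bagging+MLPs": lambda: BaggingRegressor(
        MLPRegressor(hidden_layer_sizes=(5,), max_iter=1000), n_estimators=10),
    "RTs": lambda: DecisionTreeRegressor(max_depth=4),
    "Bagging+RTs": lambda: BaggingRegressor(
        DecisionTreeRegressor(max_depth=4), n_estimators=10),
    "k-NN": lambda: KNeighborsRegressor(n_neighbors=3),
}
```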
  13. Basic Information: Parameter Values. Each approach is run with every combination of the candidate parameter values (Table 2 in the paper). Among these combinations, the best, worst, and default PSs in terms of average performance across time steps are determined.
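The exhaustive search over parameter combinations can be sketched by reusing the `online_evaluation` function and the data from the earlier sketch; the candidate grid below is illustrative, not the grid from Table 2.

```python
# Enumerate every parameter combination, score each by average MAE
# across time steps, and keep the best/worst settings.
import itertools
from sklearn.neighbors import KNeighborsRegressor

grid = {"n_neighbors": [1, 3, 5, 9], "weights": ["uniform", "distance"]}
avg_mae = {}
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    # p=params freezes the current combination inside the factory
    overall, _ = online_evaluation(X, y, lambda p=params: KNeighborsRegressor(**p))
    avg_mae[combo] = overall

best_ps = min(avg_mae, key=avg_mae.get)   # smallest average MAE
worst_ps = max(avg_mae, key=avg_mae.get)
print(best_ps, worst_ps)
```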
  14. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  15. Evaluation Criteria. Mean Absolute Error: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$. Symmetric and unbiased; smaller MAE → better performance. Statistical test: Wilcoxon signed-rank test with Holm-Bonferroni correction (0.05). Effect size → quantify the size of the difference.
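A sketch of how per-time-step MAEs of two settings could be compared with the Wilcoxon signed-rank test and a Holm-Bonferroni correction; the per-step MAE arrays here are random placeholders.

```python
# Paired Wilcoxon signed-rank tests on per-time-step MAEs, followed by
# Holm's step-down correction at the 0.05 level. Data are placeholders.
import numpy as np
from scipy.stats import wilcoxon

def holm_bonferroni(p_values, alpha=0.05):
    order = np.argsort(p_values)            # smallest p-value first
    reject = np.zeros(len(p_values), dtype=bool)
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (len(p_values) - rank):
            reject[idx] = True
        else:
            break                           # step-down: stop at first failure
    return reject

rng = np.random.default_rng(1)
best_maes = rng.random(30)                             # per-step MAEs, best PS
rival_maes = [rng.random(30) + 0.2 for _ in range(3)]  # three other PSs
p_values = np.array([wilcoxon(best_maes, m).pvalue for m in rival_maes])
print(holm_bonferroni(p_values))
```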
  16. Evaluation Criteria: Statistical Test Procedure.
  17. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  18. RQ1: Different Sensitivity Preview. Research Question 1: Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps?
  19. RQ1 – MLPs & Bagging+MLPs (Table 3 in Paper). [1] Extremely sensitive to parameter settings.
  20. RQ1 – MLPs & Bagging+MLPs (Continued). [2] Default PSs are acceptable.
  21. RQ1 – RTs & Bagging+RTs (Table 4 in Paper). [1] Usually not sensitive to PSs → blind tuning is acceptable. Suggestion: if time does not allow tuning → use the default PS.
  22. RQ1 – k-NN (Table 5 in Paper). [1] 1-NN always performs the worst, while k = 3 or 5 always performs the best (see Fig. 1–3 in the paper). [2] k-NN is not sensitive to parameter choices in SEE.
  23. RQ1: Different Sensitivity Overview. Research Question 1: Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps? Our Response 1: Different learning machines have different sensitivities to their PSs. MLPs & Bagging+MLPs: extremely sensitive. RTs & Bagging+RTs: usually not sensitive. k-NN: not sensitive.
  24. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  25. RQ2: Time-step performance of best PSs Preview. Research Question 2: Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? We only show results for: 1 MLPs + Maxwell 2 RTs + SingleISBSG 3 k-NN + Kitchenham. See Fig. 4–10 in our paper.
  26. RQ2: Time-step performance of best PSs. [1] Best PSs outperform across most time steps, though at a few time steps the default/worst PSs outperform them.
  27. RQ2: Time-step performance of best PSs (Continued).
  28. RQ2: Time-step performance of best PSs Overview. Research Question 2: Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? Our Response 2: Best PSs achieve better performance at most time steps.
  29. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  30. RQ3: Bagging helps? Preview. Research Question 3: Could Bagging also help to lessen the base learners’ sensitivity to parameter settings?
  31. RQ3: Bagging helps? [1] Bagging helps the default PSs get closer to the best ones. [2] In most cases, Bagging helps to improve the overall performance.
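For reference, wrapping a base learner in Bagging is a one-liner in scikit-learn; the weak `max_depth=2` setting is a made-up stand-in for a poor PS (the constructor argument is named `estimator` from scikit-learn 1.2 on, `base_estimator` before).

```python
# Bagging trains each base learner on a bootstrap sample and averages
# the predictions, which tends to smooth out a poorly chosen PS.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

bagged_rt = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=2),  # deliberately weak PS
    n_estimators=50,
    random_state=0,
)
# bagged_rt is used exactly like the single tree, e.g. inside the
# online_evaluation sketch shown earlier.
```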
  32. RQ3: Bagging helps? (Continued).
  33. RQ3: Bagging helps? Overview. Research Question 3: Could Bagging also help to lessen the base learners’ sensitivity to parameter settings? Our Response 3: Bagging can help bring the default PSs’ performance closer to that of the best ones.
  34. Threats to Validity. 1 Other learning machines and more data sets should be investigated; we evaluate 5 representative approaches on 3 standard data sets. 2 It is impossible to exhaust all parameter values; we cover a good range (Table 2), and additional values can be tested in the future.
  35. Overall Review. 1. Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps? * Different learning machines have different sensitivities to their PSs. 2. Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? * Best PSs achieve better performance at most time steps. 3. Could Bagging also help to lessen the base learners’ sensitivity to parameter settings? * Bagging can help bring the default PSs’ performance closer to that of the best ones.
  36. Thank You!
  37. Appendix Slides. Effect Size: quantifies the size of the difference between two groups. Formula: $d_1 = \frac{\widehat{\mathrm{MAE}}_1 - \widehat{\mathrm{MAE}}_2}{\sqrt{(\mathrm{std}_1^2 + \mathrm{std}_2^2)/2}}$. Pooled standard deviation: $\mathrm{std}_{\mathrm{pooled}} = \sqrt{\frac{\mathrm{std}_1^2 + \mathrm{std}_2^2 + \dots + \mathrm{std}_n^2}{n}}$.
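The two appendix formulas transcribe directly into code; the numbers in the example calls are invented for illustration.

```python
# Effect size between two groups' MAEs, and the pooled standard
# deviation over n groups, exactly as in the formulas above.
import math

def effect_size(mae1, mae2, std1, std2):
    return (mae1 - mae2) / math.sqrt((std1 ** 2 + std2 ** 2) / 2)

def pooled_std(stds):
    return math.sqrt(sum(s ** 2 for s in stds) / len(stds))

print(effect_size(450.0, 500.0, 60.0, 80.0))  # illustrative values
print(pooled_std([60.0, 80.0, 70.0]))
```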