The Impact of Parameter Tuning on Software Effort Estimation Using Learning Machines

by Liyan Song, Leandro Minku and Xin Yao.

Transcript

  1. The Impact of Parameter Tuning on Software Effort Estimation Using Learning Machines. Liyan Song, Leandro Minku, Xin Yao. Department of Computer Science, University of Birmingham. lxs189 & L.L.Minku & [email protected]. PROMISE, 2013.
  2. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  3. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  4. Background. Software Effort Estimation (SEE): predicting the effort required to develop a software system. Used at early stages of software development as a decision tool in bidding; over/under-estimations can be problematic. From a machine learning viewpoint, SEE is a regression problem: use completed projects to predict the unknown ones, employing ML approaches such as k-NN, MLPs, RTs, and SVR.
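To make the regression framing concrete, here is a minimal sketch of effort estimation with k-NN in scikit-learn; the features (team size, functional size, duration) and all numbers are hypothetical, invented purely for illustration.

```python
# Hypothetical sketch: SEE as regression with k-NN. The projects,
# feature values and effort figures below are invented, not real data.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Completed projects: (team size, functional size, duration in months)
X_completed = np.array([[4, 120, 6], [7, 300, 12], [3, 80, 4], [10, 450, 18]])
effort = np.array([520.0, 2100.0, 310.0, 4200.0])  # effort in person-hours

knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X_completed, effort)        # learn from completed projects
print(knn.predict([[5, 200, 8]]))   # estimate effort for a new project
```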
  5. Brief Background – Online Scenario. Why online? Only completed projects can be used, and the environment may change. How does it work in SEE? At each time step, all completed projects are used to predict the next ten. Performance: MAE at each time step; overall, the average MAE across all time steps.
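The evaluation loop on this slide can be sketched as follows; the `make_model` factory, the synthetic data, and the `min_train` warm-up size are our assumptions, not details from the paper.

```python
# Sketch of the online SEE evaluation: at each time step, all projects
# completed so far train a model that predicts the next ten projects;
# the overall score is the average of the per-time-step MAEs.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

def online_evaluation(X, y, make_model, horizon=10, min_train=10):
    step_maes = []
    for t in range(min_train, len(X) - horizon + 1):
        model = make_model()
        model.fit(X[:t], y[:t])                  # all completed projects
        preds = model.predict(X[t:t + horizon])  # predict the next ten
        step_maes.append(mean_absolute_error(y[t:t + horizon], preds))
    return np.mean(step_maes), step_maes

# Example run on synthetic data with a regression tree (RT):
rng = np.random.default_rng(0)
X, y = rng.random((60, 3)), rng.random(60) * 1000
overall_mae, _ = online_evaluation(X, y, lambda: DecisionTreeRegressor(max_depth=3))
print(overall_mae)
```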
  6. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  7. Motivation. 1 We investigate whether or not the PS strongly affects results in SEE: parameter setting is very important in general ML, yet many SEE studies implicitly assume that the PS will not affect the results much. 2 Bagging (ensemble) has been shown to improve performance; does this benefit extend to the sensitivity to PSs? [1] L. Minku and X. Yao. Ensemble and Locality: Insight on Improving Software Effort Estimation. IST, 2013. [2] L. Minku and X. Yao. Can Cross-company Data Improve Performance in Software Effort Estimation? PROMISE, 2012. [3] A. Corazza, S. Di Martino, F. Ferrucci, C. Gravino, F. Sarro, and E. Mendes. How Effective is Tabu Search to Configure Support Vector Regression for Effort Estimation? PROMISE, 2010.
  8. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  9. Research Questions Overview. 1 Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps? 2 Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? 3 Could Bagging also help to lessen the base learners’ sensitivity to parameter settings?
  10. How to Achieve our Goal? Empirical methods.
  11. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  12. Basic Information: Data Sets & Approaches. Data sets: 1 Kitchenham 2 Maxwell 3 SingleISBSG. Five approaches studied: 1 MLPs 2 Bagging+MLPs 3 RTs 4 Bagging+RTs 5 k-NN.
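As a rough illustration, the five approaches could be instantiated in scikit-learn as below; every parameter value is a placeholder, and this is not necessarily the toolkit or configuration the authors used.

```python
# Hypothetical instantiation of the five studied approaches; every
# parameter value here is a placeholder, not the paper's setting.
from sklearn.ensemble import BaggingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

approaches = {
    "MLPs": lambda: MLPRegressor(hidden_layer_sizes=(5,), max_iter=1000),
    "Bagging+MLPs": lambda: BaggingRegressor(
        MLPRegressor(hidden_layer_sizes=(5,), max_iter=1000), n_estimators=10),
    "RTs": lambda: DecisionTreeRegressor(max_depth=4),
    "Bagging+RTs": lambda: BaggingRegressor(
        DecisionTreeRegressor(max_depth=4), n_estimators=10),
    "k-NN": lambda: KNeighborsRegressor(n_neighbors=3),
}
```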
  13. Basic Information: Parameter Values. Each approach is run with every combination of the candidate parameter values (Table 2 in the paper). Among these combinations, the best, worst, and default PSs in terms of average performance across time steps are determined.
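The exhaustive search over parameter combinations can be sketched by reusing the `online_evaluation` function and the data from the earlier sketch; the candidate grid below is illustrative, not the grid from Table 2.

```python
# Enumerate every parameter combination, score each by average MAE
# across time steps, and keep the best/worst settings.
import itertools
from sklearn.neighbors import KNeighborsRegressor

grid = {"n_neighbors": [1, 3, 5, 9], "weights": ["uniform", "distance"]}
avg_mae = {}
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    # p=params freezes the current combination inside the factory
    overall, _ = online_evaluation(X, y, lambda p=params: KNeighborsRegressor(**p))
    avg_mae[combo] = overall

best_ps = min(avg_mae, key=avg_mae.get)   # smallest average MAE
worst_ps = max(avg_mae, key=avg_mae.get)
print(best_ps, worst_ps)
```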
  14. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  15. Evaluation Criteria. Mean Absolute Error: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$. Symmetric and unbiased; smaller MAE → better performance. Statistical test: Wilcoxon signed-rank test with Holm-Bonferroni correction (0.05). Effect size → quantify the size of the difference.
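A sketch of how per-time-step MAEs of two settings could be compared with the Wilcoxon signed-rank test and a Holm-Bonferroni correction; the per-step MAE arrays here are random placeholders.

```python
# Paired Wilcoxon signed-rank tests on per-time-step MAEs, followed by
# Holm's step-down correction at the 0.05 level. Data are placeholders.
import numpy as np
from scipy.stats import wilcoxon

def holm_bonferroni(p_values, alpha=0.05):
    order = np.argsort(p_values)            # smallest p-value first
    reject = np.zeros(len(p_values), dtype=bool)
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (len(p_values) - rank):
            reject[idx] = True
        else:
            break                           # step-down: stop at first failure
    return reject

rng = np.random.default_rng(1)
best_maes = rng.random(30)                             # per-step MAEs, best PS
rival_maes = [rng.random(30) + 0.2 for _ in range(3)]  # three other PSs
p_values = np.array([wilcoxon(best_maes, m).pvalue for m in rival_maes])
print(holm_bonferroni(p_values))
```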
  16. Evaluation Criteria: Statistical Test Procedure.
  17. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  18. RQ1: Different Sensitivity Preview. Research Question 1: Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps?
  19. RQ1 – MLPs & Bagging+MLPs (Table 3 in Paper). [1] Extremely sensitive to parameter settings.
  20. RQ1 – MLPs & Bagging+MLPs (Continued). [2] Default PSs are acceptable.
  21. RQ1 – RTs & Bagging+RTs (Table 4 in Paper). [1] Usually not sensitive to PSs → blind tuning is acceptable. Suggestion: if time does not allow tuning → use the default PS.
  22. RQ1 – k-NN (Table 5 in Paper). [1] 1-NN always performs the worst, while k = 3 or 5 always performs the best (see Fig. 1–3 in the paper). [2] k-NN is not sensitive to parameter choices in SEE.
  23. RQ1: Different Sensitivity Overview. Research Question 1: Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps? Our Response 1: Different learning machines have different sensitivities to their PSs. MLPs & Bagging+MLPs: extremely sensitive. RTs & Bagging+RTs: usually not sensitive. k-NN: not sensitive.
  24. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  25. RQ2: Time-step performance of best PSs Preview. Research Question 2: Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? We only show results for: 1 MLPs + Maxwell 2 RTs + SingleISBSG 3 k-NN + Kitchenham. See Fig. 4–10 in our paper.
  26. RQ2: Time-step performance of best PSs. [1] Best PSs outperform across most time steps, though at a few time steps the default/worst PSs outperform them.
  27. RQ2: Time-step performance of best PSs (Continued).
  28. RQ2: Time-step performance of best PSs Overview. Research Question 2: Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? Our Response 2: Best PSs achieve better performance at most time steps.
  29. Outline: 1 Introduction (Brief Background; Motivation; Three Research Questions) 2 Experiment Design (Basic Information; Evaluation Criteria) 3 Experimental Analysis (RQ1: Different Sensitivity; RQ2: Time-step performance of best PSs; RQ3: Bagging helps?) 4 Threats to Validity
  30. RQ3: Bagging helps? Preview. Research Question 3: Could Bagging also help to lessen the base learners’ sensitivity to parameter settings?
  31. RQ3: Bagging helps? [1] Bagging helps the default PSs get closer to the best ones. [2] In most cases, Bagging helps to improve the overall performance.
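For reference, wrapping a base learner in Bagging is a one-liner in scikit-learn; the weak `max_depth=2` setting is a made-up stand-in for a poor PS (the constructor argument is named `estimator` from scikit-learn 1.2 on, `base_estimator` before).

```python
# Bagging trains each base learner on a bootstrap sample and averages
# the predictions, which tends to smooth out a poorly chosen PS.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

bagged_rt = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=2),  # deliberately weak PS
    n_estimators=50,
    random_state=0,
)
# bagged_rt is used exactly like the single tree, e.g. inside the
# online_evaluation sketch shown earlier.
```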
  32. RQ3: Bagging helps? (Continued).
  33. RQ3: Bagging helps? Overview. Research Question 3: Could Bagging also help to lessen the base learners’ sensitivity to parameter settings? Our Response 3: Bagging can help bring the default PSs’ performance closer to that of the best ones.
  34. Threats to Validity. 1 Other learning machines and more data sets should be investigated; we evaluate 5 representative approaches on 3 standard data sets. 2 It is impossible to exhaust all parameter values; we cover a good range (Table 2), and additional values can be tested in the future.
  35. Overall Review. 1. Given an approach and a data set, how sensitive is this approach to different parameter settings in terms of its average performance across time steps? * Different learning machines have different sensitivities to their PSs. 2. Given an approach and a data set, does the best parameter setting in terms of average performance across time steps perform consistently well at each time step compared with other parameter settings? * Best PSs achieve better performance at most time steps. 3. Could Bagging also help to lessen the base learners’ sensitivity to parameter settings? * Bagging can help bring the default PSs’ performance closer to that of the best ones.
  36. Thank You!
  37. Appendix Slides. Effect Size: quantifies the size of the difference between two groups. Formula: $d_1 = \frac{\widehat{\mathrm{MAE}}_1 - \widehat{\mathrm{MAE}}_2}{\sqrt{(\mathrm{std}_1^2 + \mathrm{std}_2^2)/2}}$. Pooled standard deviation: $\mathrm{std}_{\mathrm{pooled}} = \sqrt{\frac{\mathrm{std}_1^2 + \mathrm{std}_2^2 + \dots + \mathrm{std}_n^2}{n}}$.
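The two appendix formulas transcribe directly into code; the numbers in the example calls are invented for illustration.

```python
# Effect size between two groups' MAEs, and the pooled standard
# deviation over n groups, exactly as in the formulas above.
import math

def effect_size(mae1, mae2, std1, std2):
    return (mae1 - mae2) / math.sqrt((std1 ** 2 + std2 ** 2) / 2)

def pooled_std(stds):
    return math.sqrt(sum(s ** 2 for s in stds) / len(stds))

print(effect_size(450.0, 500.0, 60.0, 80.0))  # illustrative values
print(pooled_std([60.0, 80.0, 70.0]))
```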