Slide 1

Slide 1 text

Identifying Utility Functions using Random Forests
Tamara Mendes, Marco Tulio Valente, Andre Hora
Federal University of Minas Gerais, Brazil
Alexander Serebrenik
Eindhoven University of Technology, The Netherlands
SANER 2016

Slide 2

Slide 2 text

Utility Functions
• General-purpose functions
• Separately packaged to facilitate reuse
• Examples: date, time, string manipulation, etc.

Slide 3

Slide 3 text

Examples

/* ace/lib/ace/incremental_search.js */
function isRegExp(obj) {
  return obj instanceof RegExp;
}

/* brackets/src/language/HTMLTokenizer.js */
function isWhitespace(c) {
  return c === " " || c === "\t" || c === "\r" || c === "\n";
}

Slide 4

Slide 4 text

Wrongly Placed Utility Functions

/* ace/lib/ace/incremental_search.js */
function isRegExp(obj) {
  return obj instanceof RegExp;
}

/* brackets/src/language/HTMLTokenizer.js */
function isWhitespace(c) {
  return c === " " || c === "\t" || c === "\r" || c === "\n";
}

Slide 5

Slide 5 text

Wrongly Placed Utility Functions

✖ Specific library
/* ace/lib/ace/incremental_search.js */
function isRegExp(obj) {
  return obj instanceof RegExp;
}

✖ Specific library
/* brackets/src/language/HTMLTokenizer.js */
function isWhitespace(c) {
  return c === " " || c === "\t" || c === "\r" || c === "\n";
}

Slide 6

Slide 6 text

Wrongly Placed Utility Functions

Specific library
/* ace/lib/ace/incremental_search.js */
function isRegExp(obj) {
  return obj instanceof RegExp;
}
…

Utility library
/* ace/lib/ace/util.js */
function isRegExp(obj) {
  return obj instanceof RegExp;
}
…
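
As a minimal sketch of the move shown on this slide, a general-purpose helper can live in one utility module that other modules call into. The `util` object below stands in for the contents of a util file, and `describeNeedle` is a hypothetical caller invented for illustration; neither reflects the actual module layout of ace or brackets.

```javascript
// Hypothetical stand-in for a utility module (e.g. what a util.js file might export).
const util = {
  // General-purpose check: depends on no editor-specific state.
  isRegExp(obj) {
    return obj instanceof RegExp;
  },
  // General-purpose check for the four whitespace characters used on the slides.
  isWhitespace(c) {
    return c === " " || c === "\t" || c === "\r" || c === "\n";
  },
};

// Hypothetical caller inside a specific library: it reuses the shared helper
// instead of defining its own private copy.
function describeNeedle(needle) {
  return util.isRegExp(needle) ? "regexp" : "plain text";
}
```

With the helper in one place, a second module that needs the same check can call `util.isRegExp` rather than duplicating it.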

Slide 7

Slide 7 text

Solution
• Use machine learning to identify utility functions
• Suggest Move Method refactoring
• Utility functions in util libraries: increase visibility and reuse, and decrease code duplication

Slide 8

Slide 8 text

Preliminary Exploratory Study

Assumption 1 (Research problem)
There are utility functions that are not implemented in util libraries
• FP: functions implemented in util libraries that are not utility functions
• FN: utility functions not implemented in util libraries

Assumption 2 (Availability of Training Data)
Most functions in util libraries are indeed utility functions

Slide 10

Slide 10 text

Study Design: Case Studies
• 22 popular JavaScript projects from GitHub
• 84 Java projects from Qualitas Corpus

Slide 11

Slide 11 text

Study Design: Classifier
• Random Forest: robust to noise and outliers, and widely used in software engineering research
• 10-fold cross-validation
• Predictors: collected by static analysis (function level)
  • 20 for JavaScript (e.g., complexity, LOC, DOM uses, references to this, function calls…)
  • 24 for Java (e.g., complexity, LOC, is static, outgoing and incoming calls…)
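
The 10-fold cross-validation protocol named on this slide can be sketched as an index split. The slide does not say how folds were drawn or which tooling was used, so `kFoldSplits` and its round-robin assignment are assumptions made for illustration; training the random forest itself is left out.

```javascript
// Split sample indices 0..n-1 into k folds of near-equal size.
// Each fold serves once as the test set; the other k-1 folds form the training set.
// Assumption: samples are already in random order (no shuffling is done here).
function kFoldSplits(n, k) {
  if (!Number.isInteger(n) || !Number.isInteger(k) || k < 2 || k > n) {
    throw new RangeError("need integers with 2 <= k <= n");
  }
  const folds = Array.from({ length: k }, () => []);
  for (let i = 0; i < n; i++) {
    folds[i % k].push(i); // round-robin keeps fold sizes within 1 of each other
  }
  return folds.map((test, f) => ({
    test,
    train: folds.filter((_, g) => g !== f).flat(),
  }));
}
```

For example, `kFoldSplits(functions.length, 10)` would give ten train/test index pairs; a classifier is fit on each `train` set and scored on the matching `test` set, and the scores are then aggregated.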

Slide 12

Slide 12 text

Study Design: Input Data
• Utility functions: functions with path = *util*
• Non-util functions: randomly selected functions with path ≠ *util*
(Chart: % of utility functions)
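
The labeling rule on this slide can be sketched as a path test plus a random sample. The record shape (`name`, `path`), the case-insensitive match, and the `sampleSize` and `random` parameters are all assumptions introduced here; the slide gives only the *util* pattern and says that non-util functions are randomly selected.

```javascript
// Label by path: a function counts as "util" when its file path contains "util".
// Case-insensitive matching is an assumption; the slide only shows the pattern *util*.
function isUtilPath(path) {
  return /util/i.test(path);
}

// Build a labeled data set: every function under a *util* path, plus a random
// sample of functions from other paths. `sampleSize` and `random` are hypothetical knobs.
function buildDataset(functions, sampleSize, random = Math.random) {
  const utils = functions.filter((f) => isUtilPath(f.path));
  const pool = functions.filter((f) => !isUtilPath(f.path));
  // Fisher-Yates shuffle of the non-util pool, then keep the first `sampleSize` entries.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return [
    ...utils.map((f) => ({ ...f, label: "util" })),
    ...pool.slice(0, sampleSize).map((f) => ({ ...f, label: "non-util" })),
  ];
}
```

Note that under this rule a function such as `isRegExp` in ace/lib/ace/incremental_search.js is labeled non-util in the training data, which is exactly the kind of case the classifier is meant to flag.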

Slide 13

Slide 13 text

Results: AUC and Precision
• AUC: 0.9 (Java), 0.8 (JavaScript)
• Precision: 0.83 (Java), 0.75 (JavaScript)

Slide 14

Slide 14 text

Results: Recall and FM
• Recall: 0.88 (Java), 0.89 (JavaScript)
• FM (F-measure): 0.84 (Java), 0.76 (JavaScript)
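
For reference, precision, recall, and F-measure can be computed from confusion-matrix counts as sketched below, with "utility function" as the positive class. The sketch assumes that FM denotes the balanced F-measure (F1); returning 0 when a denominator is zero is a convention chosen here, not something stated on the slides.

```javascript
// Precision, recall and F-measure from confusion-matrix counts.
// tp: utility functions predicted as utility
// fp: non-utility functions predicted as utility
// fn: utility functions predicted as non-utility
function classificationMetrics(tp, fp, fn) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  // Harmonic mean of precision and recall (balanced F-measure, assumed here).
  const fMeasure =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, fMeasure };
}
```

The harmonic mean stays low unless both precision and recall are high, which is why it is often reported next to them.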

Slide 15

Slide 15 text

Best Predictors: Java
(Chart: average rank position of each predictor, with the best predictors marked)

Slide 16

Slide 16 text

Best Predictors: JavaScript
(Chart: average rank position of each predictor, with the best predictors marked)

Slide 17

Slide 17 text

Practical Application: Move Utility Function

Specific library
/* ace/lib/ace/incremental_search.js */
function isRegExp(obj) {
  return obj instanceof RegExp;
}
…

Utility library
/* ace/lib/ace/util.js */
function isRegExp(obj) {
  return obj instanceof RegExp;
}
…

Utility functions in util libraries: increase visibility and reuse, and decrease code duplication
