Identifying Utility Functions using Random Forests (SANER 2016)

Utility functions are general-purpose functions
that are useful in many parts of a system. To facilitate reuse,
they are usually implemented in specific libraries. However,
developers frequently miss opportunities to implement general-purpose
functions in utility libraries, which decreases the chances
of reuse. In this paper, we describe our ongoing investigation into
using Random Forest classifiers to automatically identify utility
functions. Using a list of static source code metrics, we train
a classifier to identify such functions, both in Java (using 84
projects from the Qualitas Corpus) and in JavaScript (using 22
popular projects from GitHub). We achieve the following median
results for Java: 0.90 (AUC), 0.83 (precision), 0.88 (recall), and
0.84 (F-measure). For JavaScript, the median results are 0.80
(AUC), 0.75 (precision), 0.89 (recall), and 0.76 (F-measure).

ASERG, DCC, UFMG

March 16, 2016

Transcript

  1. Identifying Utility Functions using Random Forests

    Tamara Mendes, Marco Tulio Valente, Andre Hora (Federal University of Minas Gerais, Brazil)
    Alexander Serebrenik (Eindhoven University of Technology, The Netherlands)
    SANER 2016
  2. Utility Functions

    General-purpose functions
    Separately packaged to facilitate reuse
    Examples: date, time, string manipulation, etc.
  3. Examples

    /* ace/lib/ace/incremental_search.js */
    function isRegExp(obj) {
      return obj instanceof RegExp;
    }

    /* brackets/src/language/HTMLTokenizer.js */
    function isWhitespace(c) {
      return c === " " || c === "\t" || c === "\r" || c === "\n";
    }
  4. Wrongly Placed Utility Functions

    /* ace/lib/ace/incremental_search.js */
    function isRegExp(obj) {
      return obj instanceof RegExp;
    }

    /* brackets/src/language/HTMLTokenizer.js */
    function isWhitespace(c) {
      return c === " " || c === "\t" || c === "\r" || c === "\n";
    }
  5. Wrongly Placed Utility Functions

    ✖ Specific library
    /* ace/lib/ace/incremental_search.js */
    function isRegExp(obj) {
      return obj instanceof RegExp;
    }

    ✖ Specific library
    /* brackets/src/language/HTMLTokenizer.js */
    function isWhitespace(c) {
      return c === " " || c === "\t" || c === "\r" || c === "\n";
    }
  6. Wrongly Placed Utility Functions

    Specific library:
    /* ace/lib/ace/incremental_search.js */
    …
    function isRegExp(obj) { return obj instanceof RegExp; }
    …

    Utility library:
    /* ace/lib/ace/util.js */
    …
    function isRegExp(obj) { return obj instanceof RegExp; }
    …
  7. Solution

    Use machine learning to identify utility functions
    Suggest Move Method refactoring
    Utility functions in util libraries: increase visibility, reuse, and decrease code duplication
  8. Preliminary Exploratory Study

    Assumption 1 (Research problem): There are utility functions that are not implemented in util libraries
    FP: functions implemented in util libraries that are not utility functions
    FN: utility functions not implemented in util libraries
    Assumption 2 (Availability of Training Data): Most functions in util libraries are indeed utility functions
  10. Study Design: Case Studies

    • 22 popular JavaScript projects from GitHub
    • 84 Java projects from the Qualitas Corpus
  11. Study Design: Classifier

    • Random Forest: robust to noise and outliers & widely used in software engineering research
    + 10-fold cross-validation
    • Predictors: collected by static analysis (function level)
      • 20 for JavaScript (e.g., complexity, LOC, DOM uses, references to this, function calls…)
      • 24 for Java (e.g., complexity, LOC, is static, outgoing and incoming calls…)
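To make the idea of function-level predictors concrete, here is a minimal sketch that approximates three of the JavaScript predictors named on the slide (LOC, references to this, and a complexity proxy). The helper name `roughMetrics` and the regex-based counting are assumptions made for illustration; the slides do not say how the predictors were extracted, and a real pipeline would most likely rely on a proper parser.

```javascript
// Rough, regex-based approximations of three function-level predictors.
// Assumption: this is illustrative only; string literals or comments that
// contain keywords would be miscounted, which a real parser would avoid.
function roughMetrics(source) {
  // Lines of code: non-blank lines in the function's source text.
  const loc = source.split("\n").filter((line) => line.trim() !== "").length;

  // References to `this`, matched as a whole word.
  const thisRefs = (source.match(/\bthis\b/g) || []).length;

  // Complexity proxy: 1 + branching keywords and short-circuit operators.
  const branches =
    source.match(/\b(?:if|for|while|case|catch)\b|&&|\|\|/g) || [];
  const complexity = 1 + branches.length;

  return { loc, thisRefs, complexity };
}

// Example function taken from the slides.
function isWhitespace(c) {
  return c === " " || c === "\t" || c === "\r" || c === "\n";
}

// Function.prototype.toString() returns the function's source text.
const metrics = roughMetrics(isWhitespace.toString());
console.log(metrics);
```

Each function in a project would yield one such record, and the records would form the rows of the training table given to the classifier.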
  12. Study Design: Input Data

    [Chart: % of utility functions]
    • Utility functions: functions with path = *util*
    • Non-util functions: randomly selected functions with path ≠ *util*
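The path-based labeling rule can be sketched as follows. The helper name `labelByPath` and the case-insensitive match are assumptions; the slide only states that the path matches *util*.

```javascript
// Hypothetical helper: labels a function by the path of the file that
// defines it, following the rule "path = *util*" from the slide.
// Assumption: the match is case-insensitive and applies to the whole path.
function labelByPath(filePath) {
  return /util/i.test(filePath) ? "utility" : "non-utility";
}

// Paths taken from the examples in the slides.
const samplePaths = [
  "ace/lib/ace/util.js",
  "ace/lib/ace/incremental_search.js",
  "brackets/src/language/HTMLTokenizer.js",
];

const labels = samplePaths.map(labelByPath);
console.log(labels);
```

Under this rule, a utility function such as isRegExp in incremental_search.js is labeled non-utility in the training data. That is exactly the kind of misplaced function the classifier is meant to flag.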
  13. Results: AUC and Precision

    [Plots, Java vs. JavaScript. Median AUC: 0.90 (Java), 0.80 (JavaScript). Median precision: 0.83 (Java), 0.75 (JavaScript)]
  14. Results: Recall and FM

    [Plots, Java vs. JavaScript. Median recall: 0.88 (Java), 0.89 (JavaScript). Median F-measure: 0.84 (Java), 0.76 (JavaScript)]
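For reference, precision, recall, and F-measure can be computed from a confusion matrix as in the sketch below. The counts are hypothetical and are not taken from the study. The reported values are medians over projects, so the F-measure on the slide need not equal the value obtained by plugging the median precision and recall into the formula.

```javascript
// Standard definitions, with "utility function" as the positive class.
// tp: utility functions classified as utility
// fp: non-utility functions classified as utility
// fn: utility functions classified as non-utility
function scores({ tp, fp, fn }) {
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  // F-measure: harmonic mean of precision and recall.
  const fMeasure = (2 * precision * recall) / (precision + recall);
  return { precision, recall, fMeasure };
}

// Hypothetical counts, for illustration only.
const example = scores({ tp: 80, fp: 20, fn: 10 });
console.log(example);
```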
  15. Best Predictors: Java

    [Chart: average rank position of the best predictors]
  16. Best Predictors: JavaScript

    [Chart: average rank position of the best predictors]

  17. Practical Application: Move Utility Function

    Specific library:
    /* ace/lib/ace/incremental_search.js */
    …
    function isRegExp(obj) { return obj instanceof RegExp; }
    …

    Utility library:
    /* ace/lib/ace/util.js */
    …
    function isRegExp(obj) { return obj instanceof RegExp; }
    …

    Utility functions in util libraries: increase visibility, reuse, and decrease code duplication