Identifying Utility Functions using Random Forests (SANER 2016)

Utility functions are general-purpose functions
that are useful in many parts of a system. To facilitate reuse,
they are usually implemented in dedicated libraries. However,
developers frequently miss opportunities to implement general-purpose
functions in utility libraries, which decreases the chances
of reuse. In this paper, we describe our ongoing investigation into
using Random Forest classifiers to automatically identify utility
functions. Using a list of static source code metrics, we train
a classifier to identify such functions, both in Java (using 84
projects from the Qualitas Corpus) and in JavaScript (using 22
popular projects from GitHub). We achieve the following median
results for Java: 0.90 (AUC), 0.83 (precision), 0.88 (recall), and
0.84 (F-measure). For JavaScript, the median results are 0.80
(AUC), 0.75 (precision), 0.89 (recall), and 0.76 (F-measure).

ASERG, DCC, UFMG

March 16, 2016

Transcript

  1. Identifying Utility Functions using Random Forests
    Tamara Mendes, Marco Tulio Valente, Andre Hora
    Federal University of Minas Gerais, Brazil
    Alexander Serebrenik
    Eindhoven University of Technology, The Netherlands
    SANER 2016

  2. Utility Functions
    General purpose functions
    Separately packaged to facilitate reuse
    Examples: date, time, string manipulation, etc.

  3. Examples
    /* ace/lib/ace/incremental_search.js */
    function isRegExp(obj) {
        return obj instanceof RegExp;
    }

    /* brackets/src/language/HTMLTokenizer.js */
    function isWhitespace(c) {
        return c === " " || c === "\t" || c === "\r" || c === "\n";
    }

  4. Wrongly Placed Utility Functions
    The isRegExp and isWhitespace examples from slide 3

  5. Wrongly Placed Utility Functions
    Both functions are implemented in a specific library
    (incremental_search.js and HTMLTokenizer.js), not in a utility library

  6. Wrongly Placed Utility Functions
    Specific library:
    /* ace/lib/ace/incremental_search.js */
    function isRegExp(obj) {
        return obj instanceof RegExp;
    }
    Utility library:
    /* ace/lib/ace/util.js */
    function isRegExp(obj) {
        return obj instanceof RegExp;
    }
    … …
    … …

  7. Solution
    Use machine learning to identify utility functions
    Suggest Move Method refactoring
    Utility functions in util libraries: increase visibility,
    reuse, and decrease code duplication
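
The "Suggest Move Method refactoring" step can be pictured with the small JavaScript sketch below. It is not the authors' tool: it assumes a classifier has already assigned each function a probability of being a utility function, and the `probability` values, the 0.5 cutoff, the case-insensitive "util" path check, the `isUtilPath`/`suggestMoves` helpers, and the `onKeyDown` function are all hypothetical.

```javascript
// Hypothetical sketch: recommend Move Method refactorings for functions that a
// classifier labels as utility functions but that live outside util libraries.

// Assumed heuristic: a file belongs to a util library if its path contains
// "util" (case-insensitive).
function isUtilPath(path) {
  return path.toLowerCase().includes("util");
}

// functions: [{ name, path, probability }], where probability is the (assumed)
// classifier estimate that the function is a utility function.
function suggestMoves(functions, threshold = 0.5) {
  return functions
    .filter((fn) => !isUtilPath(fn.path) && fn.probability >= threshold)
    .sort((a, b) => b.probability - a.probability) // most confident first
    .map((fn) => `Move ${fn.name} from ${fn.path} to a utility library`);
}

// Usage with made-up probabilities.
const suggestions = suggestMoves([
  { name: "isRegExp", path: "ace/lib/ace/incremental_search.js", probability: 0.92 },
  { name: "isWhitespace", path: "brackets/src/language/HTMLTokenizer.js", probability: 0.81 },
  { name: "isRegExp", path: "ace/lib/ace/util.js", probability: 0.95 }, // already in a util library
  { name: "onKeyDown", path: "ace/lib/ace/incremental_search.js", probability: 0.1 },
]);
console.log(suggestions);
```

Functions already under a util path are skipped, so only the two wrongly placed functions from the examples are reported, most confident first.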

  8. Preliminary Exploratory Study
    Assumption 1 (Research problem)
    There are utility functions that are not implemented in util libraries
    FP (false positive): a function implemented in a util library that is not a utility function
    FN (false negative): a utility function not implemented in a util library
    Assumption 2 (Availability of Training Data)
    Most functions in util libraries are indeed utility functions
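
The two assumptions can be made measurable by comparing the path heuristic against manual labels. In the hypothetical sketch below, `isUtility` stands for a manual judgment, the "util" substring check is an assumed reading of "util library", and `initEditor` is a made-up function. Assumption 1 predicts a non-empty `fn` list, while Assumption 2 predicts that `fp` stays small relative to all functions in util libraries.

```javascript
// Hypothetical sketch: where does the path heuristic disagree with manual labels?
// FP: in a util library but not a utility function (Assumption 2: these are rare).
// FN: a utility function outside util libraries (Assumption 1: these exist).

function isUtilPath(path) {
  // Assumed heuristic: the path contains "util" (case-insensitive).
  return path.toLowerCase().includes("util");
}

// functions: [{ name, path, isUtility }], where isUtility is a manual judgment.
function heuristicErrors(functions) {
  const fp = [];
  const fn = [];
  for (const f of functions) {
    const inUtilLibrary = isUtilPath(f.path);
    if (inUtilLibrary && !f.isUtility) fp.push(f.name);
    if (!inUtilLibrary && f.isUtility) fn.push(f.name);
  }
  return { fp, fn };
}

const errors = heuristicErrors([
  { name: "isRegExp", path: "ace/lib/ace/incremental_search.js", isUtility: true },
  { name: "isRegExp", path: "ace/lib/ace/util.js", isUtility: true },
  { name: "initEditor", path: "ace/lib/ace/util.js", isUtility: false }, // made-up example
]);
console.log(errors); // fp holds "initEditor"; fn holds the misplaced "isRegExp"
```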

  10. Study Design: Case Studies
    22 popular JavaScript projects from GitHub
    + 84 Java projects from the Qualitas Corpus

  11. Study Design: Classifier
    • Random Forest: robust to noise and outliers, and widely used
    in software engineering research; evaluated with 10-fold cross-validation
    • Predictors: collected by static analysis (function level)
    • 20 for JavaScript (e.g., complexity, LOC, DOM uses, references
    to this, function calls…)
    • 24 for Java (e.g., complexity, LOC, is static, outgoing and
    incoming calls…)
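
To make the classifier concrete, here is a deliberately small random forest in JavaScript: each tree is grown on a bootstrap sample, only about √d randomly chosen features are considered at each split, splits minimize Gini impurity, and the forest averages the leaf probabilities of its trees. This is not the implementation used in the study (the slides do not name one); the parameters (`nTrees`, `maxDepth`, `minSize`, `seed`), the helper names, and the two-feature toy data are assumptions.

```javascript
// Minimal random forest sketch for binary labels (1 = utility, 0 = non-utility).
// Not the study's implementation; all parameters are illustrative assumptions.

// Seeded PRNG (mulberry32) so that training is reproducible.
function mulberry32(a) {
  return function () {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Gini impurity of a list of 0/1 labels.
function gini(labels) {
  if (labels.length === 0) return 0;
  const p = labels.reduce((s, v) => s + v, 0) / labels.length;
  return 2 * p * (1 - p);
}

// Grow one tree on the rows listed in idx (row indices may repeat).
function buildTree(X, y, idx, depth, opts, rand) {
  const p = idx.reduce((s, i) => s + y[i], 0) / idx.length; // fraction of label 1
  if (depth >= opts.maxDepth || idx.length < opts.minSize || p === 0 || p === 1) {
    return { leaf: true, p };
  }
  // At each split, consider m = floor(sqrt(d)) random features
  // (Fisher-Yates shuffle, then take the first m).
  const d = X[0].length;
  const m = Math.max(1, Math.floor(Math.sqrt(d)));
  const feats = [...Array(d).keys()];
  for (let i = d - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [feats[i], feats[j]] = [feats[j], feats[i]];
  }
  let best = null;
  for (const f of feats.slice(0, m)) {
    for (const t of new Set(idx.map((i) => X[i][f]))) {
      const left = idx.filter((i) => X[i][f] <= t);
      const right = idx.filter((i) => X[i][f] > t);
      if (left.length === 0 || right.length === 0) continue;
      // Weighted Gini impurity of the two children.
      const score =
        (left.length * gini(left.map((i) => y[i])) +
          right.length * gini(right.map((i) => y[i]))) / idx.length;
      if (best === null || score < best.score) best = { f, t, left, right, score };
    }
  }
  if (best === null) return { leaf: true, p }; // no useful split on these features
  return {
    leaf: false,
    f: best.f,
    t: best.t,
    left: buildTree(X, y, best.left, depth + 1, opts, rand),
    right: buildTree(X, y, best.right, depth + 1, opts, rand),
  };
}

function predictTree(node, x) {
  while (!node.leaf) node = x[node.f] <= node.t ? node.left : node.right;
  return node.p;
}

// Train nTrees trees, each on a bootstrap sample (n draws with replacement).
function trainForest(X, y, { nTrees = 50, maxDepth = 5, minSize = 2, seed = 42 } = {}) {
  const rand = mulberry32(seed);
  const opts = { maxDepth, minSize };
  const trees = [];
  for (let k = 0; k < nTrees; k++) {
    const idx = Array.from({ length: X.length }, () => Math.floor(rand() * X.length));
    trees.push(buildTree(X, y, idx, 0, opts, rand));
  }
  // Probability of label 1 = mean of the leaf probabilities across trees.
  return { predictProba: (x) => trees.reduce((s, tr) => s + predictTree(tr, x), 0) / trees.length };
}

// Toy data with made-up values: [LOC, function calls]; label 1 = utility.
const X = [[3, 0], [4, 1], [5, 0], [6, 1], [7, 1], [40, 9], [55, 12], [60, 8], [80, 15], [90, 11]];
const y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0];
const forest = trainForest(X, y, { nTrees: 25 });
console.log(forest.predictProba([2, 0]), forest.predictProba([100, 20]));
```

Real studies would typically rely on an established implementation rather than hand-rolled trees; the sketch only shows the ingredients (bagging, random feature subsets, impurity-based splits, averaged votes) that make a random forest robust to noisy individual predictors.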

  12. Study Design: Input Data
    % of utility functions
    n utility functions: functions whose path matches *util*
    n non-util functions: randomly selected functions whose path
    does not match *util*
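
The input-data step and the 10-fold split can be sketched as follows. This assumes that "path = *util*" means a case-insensitive substring match, that the n non-util functions are drawn uniformly at random, and that folds are assigned round-robin after a seeded shuffle; the helper names, seeds, and sample paths are hypothetical.

```javascript
// Hypothetical sketch of the input-data step plus the 10-fold split.
// Assumptions: "path = *util*" is a case-insensitive substring match, non-util
// functions are sampled uniformly at random, and a seeded PRNG makes runs repeatable.

// Seeded PRNG (mulberry32).
function mulberry32(a) {
  return function () {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle that returns a new array.
function shuffle(items, rand) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const isUtilPath = (path) => path.toLowerCase().includes("util");

// n util functions (label 1) plus up to n randomly chosen non-util functions (label 0).
function buildDataset(functions, seed = 7) {
  const util = functions.filter((f) => isUtilPath(f.path));
  const others = shuffle(functions.filter((f) => !isUtilPath(f.path)), mulberry32(seed));
  return [
    ...util.map((f) => ({ ...f, label: 1 })),
    ...others.slice(0, util.length).map((f) => ({ ...f, label: 0 })),
  ];
}

// Assign row indices 0..n-1 to k folds round-robin after shuffling,
// so fold sizes differ by at most one.
function kFolds(n, k = 10, seed = 7) {
  const folds = Array.from({ length: k }, () => []);
  shuffle([...Array(n).keys()], mulberry32(seed)).forEach((row, pos) => folds[pos % k].push(row));
  return folds;
}

// Made-up input: 3 functions in util paths and 12 elsewhere.
const functions = [
  { name: "a", path: "lib/util.js" },
  { name: "b", path: "lib/util.js" },
  { name: "c", path: "src/StringUtils.java" },
  ...Array.from({ length: 12 }, (_, i) => ({ name: `f${i}`, path: `src/module${i}.js` })),
];
const data = buildDataset(functions);
console.log(data.length); // 6
console.log(kFolds(25).map((fold) => fold.length)); // five folds of 3, five folds of 2
```

In 10-fold cross-validation, each fold then serves once as the test set while the other nine are used for training.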

  13. Results: AUC and Precision
    Median AUC: Java 0.90, JavaScript 0.80
    Median precision: Java 0.83, JavaScript 0.75
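
AUC, the area under the ROC curve, equals the probability that a randomly chosen utility function receives a higher score than a randomly chosen non-utility function, with ties counting one half; 0.5 corresponds to a random ranking and 1.0 to a perfect one. The sketch below computes it directly from that pairwise definition. It assumes the scores are the classifier's predicted probabilities; the `auc` helper and the numbers are illustrative only.

```javascript
// AUC via the pairwise (Mann-Whitney) definition: the fraction of
// (utility, non-utility) pairs ranked correctly, counting ties as 1/2.
// O(P * N) time, which is fine for a sketch.
function auc(scores, labels) {
  const pos = [];
  const neg = [];
  scores.forEach((s, i) => (labels[i] === 1 ? pos : neg).push(s));
  let wins = 0;
  for (const p of pos) {
    for (const n of neg) wins += p > n ? 1 : p === n ? 0.5 : 0;
  }
  return wins / (pos.length * neg.length);
}

console.log(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])); // 0.75
```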

  14. Results: Recall and F-measure (FM)
    Median recall: Java 0.88, JavaScript 0.89
    Median F-measure: Java 0.84, JavaScript 0.76
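
Precision, recall, and F-measure follow from the confusion counts for the utility class, and the reported figures are medians. The sketch below assumes label 1 means "utility function" and that the median is taken over one value per project or run; the slides do not say which, and the helper names are hypothetical.

```javascript
// Precision, recall, and F-measure for the utility class (label 1).
// Returns NaN values when there are no predicted or no actual positives.
function classificationMetrics(predicted, actual) {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  predicted.forEach((p, i) => {
    if (p === 1 && actual[i] === 1) tp++;
    else if (p === 1) fp++; // predicted utility, actually not
    else if (actual[i] === 1) fn++; // missed a utility function
  });
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const fMeasure = (2 * precision * recall) / (precision + recall); // harmonic mean
  return { precision, recall, fMeasure };
}

// Median of a list of numbers (e.g., one value per project or per run).
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

const r = classificationMetrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]);
console.log(r); // precision, recall, and fMeasure are all 2/3
console.log(median([0.9, 0.8, 0.85])); // 0.85
```

Because each metric's median is taken separately, the median F-measure need not equal the F-measure of the median precision and recall: for Java, 2 × 0.83 × 0.88 / (0.83 + 0.88) ≈ 0.85 versus the reported 0.84, and for JavaScript about 0.81 versus 0.76.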

  15. Best Predictors: Java
    Figure: average rank position of the best predictors

  16. Best Predictors: JavaScript
    Figure: average rank position of the best predictors
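
The "average rank position" of a predictor can be computed by ranking the predictors in each run by an importance score and averaging each predictor's rank across runs. The slides do not state the importance measure or what a run is; the sketch below assumes one importance map per run (a random forest's variable importance would be one plausible source), and the `averageRanks` helper, the `thisReferences` predictor name, and all scores are hypothetical.

```javascript
// Hypothetical sketch: average rank position of each predictor across runs.
// Each run maps predictor name -> importance score; rank 1 = most important.
// Assumes every predictor appears in every run.
function averageRanks(runs) {
  const totals = new Map();
  for (const run of runs) {
    const ranked = Object.keys(run).sort((a, b) => run[b] - run[a]);
    ranked.forEach((name, i) => totals.set(name, (totals.get(name) ?? 0) + i + 1));
  }
  return [...totals]
    .map(([name, sum]) => ({ name, avgRank: sum / runs.length }))
    .sort((a, b) => a.avgRank - b.avgRank); // best (lowest) average rank first
}

// Made-up importance scores for three predictors of the kind listed on slide 11.
const ranking = averageRanks([
  { LOC: 0.5, complexity: 0.3, thisReferences: 0.2 },
  { LOC: 0.2, complexity: 0.6, thisReferences: 0.1 },
  { LOC: 0.4, complexity: 0.1, thisReferences: 0.3 },
]);
console.log(ranking); // LOC first (avg 4/3), then complexity (2), then thisReferences (8/3)
```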

  17. Practical Application: Move Utility Function
    Specific library:
    /* ace/lib/ace/incremental_search.js */
    function isRegExp(obj) {
        return obj instanceof RegExp;
    }
    Utility library:
    /* ace/lib/ace/util.js */
    function isRegExp(obj) {
        return obj instanceof RegExp;
    }
    … …
    … …
    Utility functions in util libraries: increase visibility, reuse, and
    decrease code duplication
