Assessing the Threat of Untracked Changes in Software Evolution (ICSE 2018)

While refactoring is extensively performed by practitioners, many Mining Software Repositories (MSR) approaches neither detect nor keep track of refactorings when performing source code evolution analysis. In the best case, keeping track of refactorings could be unnecessary work; in the worst case, these untracked changes could significantly affect the performance of MSR approaches. Since the extent of the threat is unknown, the goal of this paper is to assess whether it is significant. Based on an extensive empirical study, we answer positively: we found that between 10% and 21% of changes at the method level in 15 large Java systems are untracked. This results in a large proportion (25%) of entities that may have their histories split by these changes, and a measurable effect on at least two MSR approaches. We conclude that handling untracked changes should be systematically considered by MSR studies.

ASERG, DCC, UFMG

June 01, 2018

Transcript

  1. Assessing the Threat of Untracked
    Changes in Software Evolution
    André Hora, Danilo Silva,
    Marco Tulio Valente, Romain Robbes
    ICSE 2018

  2. Outline
    1. Context
    2. Problem
    3. Background
    4. Study Design
    5. Results
    6. Final Remarks

  3. Mining Software
    Repositories (MSR)

  4. MSR Examples
    • Library migration
    • Change prediction
    • Bug fixing
    • Warnings prioritization
    • Code expert computation
    • …

  5. Level of analysis
    Changes in classes and
    methods over time

  6. Example 1
    java.util.Vector —> java.util.List

  9. Example 2
    FileInputStream() —> Okio.source()
    Rule would not be detected due to
    the method renaming
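
A minimal sketch of why the rename hides the rule, assuming a miner that pairs methods across two versions by name and derives "removed call -> added call" rules from each pair. The class `RuleMiningSketch`, the method names `readFile` and `load`, and the `renames` map are hypothetical and are not taken from the talk.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: mines "removed call -> added call" rules per method.
// Methods are matched across versions by name unless a rename is supplied.
class RuleMiningSketch {

    static Set<String> mineRules(Map<String, Set<String>> before,
                                 Map<String, Set<String>> after,
                                 Map<String, String> renames) {
        Set<String> rules = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : before.entrySet()) {
            // Name of the method in the next version; unchanged if no rename is known.
            String nameAfter = renames.getOrDefault(e.getKey(), e.getKey());
            Set<String> callsAfter = after.get(nameAfter);
            if (callsAfter == null) continue; // method looks deleted: nothing to compare

            Set<String> removed = new TreeSet<>(e.getValue());
            removed.removeAll(callsAfter);
            Set<String> added = new TreeSet<>(callsAfter);
            added.removeAll(e.getValue());
            for (String r : removed)
                for (String a : added)
                    rules.add(r + " -> " + a);
        }
        return rules;
    }

    public static void main(String[] args) {
        // Assumed scenario: readFile() used FileInputStream(); the same commit
        // renamed it to load() and switched the call to Okio.source().
        Map<String, Set<String>> before = Map.of("readFile", Set.of("FileInputStream()"));
        Map<String, Set<String>> after = Map.of("load", Set.of("Okio.source()"));

        System.out.println(mineRules(before, after, Map.of()));                   // []
        System.out.println(mineRules(before, after, Map.of("readFile", "load"))); // [FileInputStream() -> Okio.source()]
    }
}
```

Without the rename information the method appears to be deleted and re-added, so no call-level difference is computed and the rule is missed.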

  10. Several other examples…

  11. Refactoring is common
    practice in software
    development

  12. Outline
    1. Context
    2. Problem
    3. Background
    4. Study Design
    5. Results
    6. Final Remarks

  13. MSR studies may be
    affected by refactoring

  15. MSR researchers are aware of this
    “threat”, but they often do not assess it
    “Our tool is unable to verify if an entity in revision n has been renamed
    in revision n+1” [48]
    “The development history of a file can be lost in case of renaming
    operations, copy or file split” [3]
    “It is possible to miss bug-introducing changes when a file changes its
    name since the approach does not track such name changes” [38]
    “We detect renamed or moved units as units that are removed first and
    added later” [50]
    [2, 5, 6, 7, 12, 22, 26, 27, 28, 29, 34, 36,
    42, 45, 53, 54, 59, 61, 62, 66, 67, 68…]

  16. What is the impact of
    refactoring on MSR
    studies?

  17. Outline
    1. Context
    2. Problem
    3. Background
    4. Study Design
    5. Results
    6. Final Remarks

  18. Tracked and Untracked
    Changes
    version 1
    public void foo() {
      obj.print();
    }
    version 2
    public void foo() {
      obj.println();
    }
    version 3
    public void bar() {
      obj.println();
    }
    tracked change: preserves the entity name and
    modifies its source code
    untracked change: modifies the entity name,
    and may also modify its source code
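
The two definitions can be sketched as a check on entity names. This is a simplified illustration: it assumes the predecessor and successor of an entity have already been paired (for example by a refactoring detection tool) and it compares qualified names only, leaving out the source code comparison. The class and method names are hypothetical.

```java
// Hypothetical sketch of the definitions above, reduced to a name comparison.
class ChangeKindSketch {
    enum Kind { TRACKED, UNTRACKED }

    // nameBefore / nameAfter: qualified names of the same entity in two
    // consecutive versions, already paired by some external means (assumption).
    static Kind classify(String nameBefore, String nameAfter) {
        return nameBefore.equals(nameAfter) ? Kind.TRACKED : Kind.UNTRACKED;
    }

    public static void main(String[] args) {
        System.out.println(classify("foo", "foo")); // version 1 -> 2: TRACKED
        System.out.println(classify("foo", "bar")); // version 2 -> 3: UNTRACKED
    }
}
```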

  19. Change Graph
    version 1
    class Foo {
      mA() {…}
    }
    class Bar {
      mB() {…}
      mC() {…}
    }
    version 2
    class Foo {
      mA() {…}
    }
    class Bar {
      mX() {…}
      mC() {…}
    }
    version 3
    class Foo {
      mA() {…}
    }
    class Bar {
      mX() {…}
    }
    class Qux {
      mC() {…}
    }
    version 4
    class Foo {
      mA() {…}
    }
    class Baz {
      mY() {…}
    }
    class Qux {
      mC() {…}
      mE() {…}
    }
    Legend: tracked change, untracked change

  20. Outline
    1. Context
    2. Problem
    3. Background
    4. Study Design
    5. Results
    6. Final Remarks

  21. Research Questions
    • RQ1. What is the frequency of untracked
    changes?
    • RQ2. What is the extension of untracked
    changes?
    • RQ3. What is the impact of untracked
    changes in existing MSR-based
    approaches?

  22. Case Studies

  23. Tracked and Untracked
    Changes Computation
    Refactoring resolution
    • RefDiff [Silva et al., MSR 2017]
    • Precision: 85.6% - 100%
    • Recall: 89.8% - 93.9%
    1. Rename Class
    2. Move Class
    3. Extract Superclass
    4. Move and Rename Class
    5. Extract Interface
    6. Rename Method
    7. Move Method
    8. Extract Method
    9. Inline Method
    10. Pull Up Method
    11. Push Down Method

  24. Outline
    1. Context
    2. Problem
    3. Background
    4. Study Design
    5. Results
    6. Final Remarks

  25. RQ1
    What is the frequency of untracked
    changes?

  29. RQ1. What is the frequency of
    untracked changes? (example)
    17 changes
    12 tracked changes
    5 untracked changes
    Not desirable: relevant
    data may be missed!
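
The counts can be reproduced with a small sketch. The arrows of the change graph are not part of this transcript, so the edge list below is an assumed reconstruction chosen to be consistent with the four versions of the example and with the slide's totals. In particular, reading `mE` as extracted from `mC` is an assumption, as are the class and field names.

```java
// Sketch only: the edge list is an assumed reconstruction of the slide's example.
class ChangeGraphCounts {

    // {qualified name in version n, qualified name of its successor in version n+1}
    static final String[][] EDGES = {
        // version 1 -> version 2
        {"Foo", "Foo"}, {"Foo.mA", "Foo.mA"}, {"Bar", "Bar"},
        {"Bar.mB", "Bar.mX"},                 // rename method
        {"Bar.mC", "Bar.mC"},
        // version 2 -> version 3
        {"Foo", "Foo"}, {"Foo.mA", "Foo.mA"}, {"Bar", "Bar"},
        {"Bar.mX", "Bar.mX"},
        {"Bar.mC", "Qux.mC"},                 // move method
        // version 3 -> version 4
        {"Foo", "Foo"}, {"Foo.mA", "Foo.mA"},
        {"Bar", "Baz"},                       // rename class
        {"Bar.mX", "Baz.mY"},                 // rename method
        {"Qux", "Qux"}, {"Qux.mC", "Qux.mC"},
        {"Qux.mC", "Qux.mE"},                 // extract method (assumed)
    };

    // An edge is tracked when the qualified name is preserved.
    static int countTracked(String[][] edges) {
        int n = 0;
        for (String[] e : edges) if (e[0].equals(e[1])) n++;
        return n;
    }

    public static void main(String[] args) {
        int tracked = countTracked(EDGES);
        // prints "17 changes, 12 tracked, 5 untracked"
        System.out.println(EDGES.length + " changes, " + tracked + " tracked, "
                + (EDGES.length - tracked) + " untracked");
    }
}
```

A name-based analysis sees only the 12 tracked edges; the other 5 look like unrelated deletions and additions.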

  31. RQ1. What is the frequency of
    untracked changes?
    Untracked changes
    Classes: 2% to 15%
    Methods: 10% to 21%
    Untracked changes are frequent

  33. RQ1. What is the frequency of
    untracked changes?
    Untracked changes
    Rename method: 26%
    Extract method: 23%
    Move method: 22%
    Move class: 12%
    Keeping track of renamings is not enough

  34. RQ2
    What is the extension of untracked
    changes?

  38. RQ2. What is the extension of
    untracked changes? (example)
    7 paths
    3 paths: only tracked
    changes
    4 paths: at least one
    untracked change
    Not desirable: their
    histories may be split!
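
The path counts can be checked with a short sketch that walks the change graph from every entity without a predecessor to every entity without a successor. As the graph's arrows are not in this transcript, the edge list is an assumed reconstruction consistent with the four-version example (including the assumption that `mE` is extracted from `mC`); all identifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: counts source-to-sink paths in an assumed reconstruction of the example.
class ChangePathsSketch {

    // {source version, qualified name in that version, qualified name in the next version}
    static final String[][] EDGES = {
        {"1", "Foo", "Foo"}, {"1", "Foo.mA", "Foo.mA"}, {"1", "Bar", "Bar"},
        {"1", "Bar.mB", "Bar.mX"}, {"1", "Bar.mC", "Bar.mC"},
        {"2", "Foo", "Foo"}, {"2", "Foo.mA", "Foo.mA"}, {"2", "Bar", "Bar"},
        {"2", "Bar.mX", "Bar.mX"}, {"2", "Bar.mC", "Qux.mC"},
        {"3", "Foo", "Foo"}, {"3", "Foo.mA", "Foo.mA"}, {"3", "Bar", "Baz"},
        {"3", "Bar.mX", "Baz.mY"}, {"3", "Qux", "Qux"}, {"3", "Qux.mC", "Qux.mC"},
        {"3", "Qux.mC", "Qux.mE"},
    };

    // Returns {paths with only tracked changes, paths with at least one untracked change}.
    static int[] countPaths(String[][] edges) {
        Map<String, List<String>> next = new LinkedHashMap<>();
        Set<String> hasPredecessor = new HashSet<>();
        for (String[] e : edges) {
            int v = Integer.parseInt(e[0]);
            String from = v + ":" + e[1];
            String to = (v + 1) + ":" + e[2];
            next.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
            hasPredecessor.add(to);
        }
        int[] counts = new int[2];
        for (String node : next.keySet())
            if (!hasPredecessor.contains(node)) walk(node, false, next, counts);
        return counts;
    }

    private static void walk(String node, boolean sawUntracked,
                             Map<String, List<String>> next, int[] counts) {
        List<String> successors = next.get(node);
        if (successors == null) {            // end of a path
            counts[sawUntracked ? 1 : 0]++;
            return;
        }
        for (String s : successors) {
            boolean untracked = !name(node).equals(name(s));
            walk(s, sawUntracked || untracked, next, counts);
        }
    }

    // Strips the "version:" prefix from a node id.
    private static String name(String node) {
        return node.substring(node.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        int[] c = countPaths(EDGES);
        // prints "7 paths: 3 only tracked, 4 with at least one untracked change"
        System.out.println((c[0] + c[1]) + " paths: " + c[0] + " only tracked, "
                + c[1] + " with at least one untracked change");
    }
}
```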

  39. RQ2. What is the extension of
    untracked changes?
    18% to 41%
    entities with at least
    one untracked change
    in their histories

  41. RQ2. What is the extension of
    untracked changes?
    22% to 58%
    entities with at least
    one untracked change
    in their histories
    Only considering the
    most changed entities
    Untracked changes cause splits in entity histories

  43. RQ3. What is the impact of untracked changes
    in existing MSR-based approaches?
    • Approaches
    • API evolution mining rule (eg, Vector —> List)
    • API co-usage mining rule (eg, Map —> HashMap)
    • Results
    • Number of mined rules: usually increases when taking into
    account untracked changes (median: 0% to +7%)
    • Quality of mined rules: slightly improves when including
    untracked changes (median: -2% to +2%)
    The impact of untracked changes is difficult to predict,
    and needs to be evaluated on a case-by-case basis

  44. Outline
    1. Context
    2. Problem
    3. Background
    4. Study Design
    5. Results
    6. Final Remarks

  45. Untracked changes are frequent
    (10-21% at method level)
    MSR studies should resolve untracked changes to access potentially
    relevant new mining data
    Keeping track of renamings is not enough
    (≈26%)
    MSR studies should address “extraction” and “moving” for a more
    complete resolution of untracked changes
    Untracked changes cause splits in entity histories
    (18-41%)
    MSR studies should resolve untracked changes when performing
    traceability analysis, for more precise entity lifespans
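
A rough sketch of the lifespan effect, assuming an entity's history is available as the sequence of qualified names it carried in consecutive versions. The sequence used here follows method `mB` of the change graph example; the class and method names are hypothetical.

```java
// Hypothetical sketch: lifespan of one entity with and without resolving untracked changes.
class LifespanSketch {

    // Versions visible to a name-based analysis that starts from the latest name
    // and walks back until the name changes.
    static int lifespanByName(String[] names) {
        String last = names[names.length - 1];
        int n = 0;
        for (int i = names.length - 1; i >= 0 && names[i].equals(last); i--) n++;
        return n;
    }

    // Versions covered once untracked changes are resolved and the chain is kept whole.
    static int lifespanResolved(String[] names) {
        return names.length;
    }

    public static void main(String[] args) {
        String[] history = {"Bar.mB", "Bar.mX", "Bar.mX", "Baz.mY"};
        System.out.println(lifespanByName(history));   // 1
        System.out.println(lifespanResolved(history)); // 4
    }
}
```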
