Refactoring Graphs: Assessing Refactoring over Time

Refactoring is an essential activity during software evolution. Frequently, practitioners rely on such transformations to improve source code maintainability and quality. As a consequence, this process may produce new source code entities or change the structure of existing ones. Sometimes, the transformations are atomic, i.e., performed in a single commit. In other cases, they generate sequences of modifications performed over time. To study and reason about refactorings over time, in this paper, we propose a novel concept called refactoring graphs and provide an algorithm to build such graphs. Then, we investigate the history of 10 popular open-source Java-based projects. After eliminating trivial graphs, we characterize a large sample of 1,150 refactoring graphs, providing quantitative data on their size, commits, age, refactoring composition, and developers. We conclude by discussing applications and implications of refactoring graphs, for example, to improve code comprehension, detect refactoring patterns, and support software evolution studies.

ASERG, DCC, UFMG

February 19, 2020

Transcript

  1. Refactoring Graphs: Assessing Refactoring over Time. Aline Brito, Andre Hora, Marco Tulio Valente. IEEE SANER 2020
  2. Motivation: Refactoring is an essential activity during software evolution (refactoring engines; motivation; benefits and challenges; refactoring over time)
  8. “Refactoring takes time...” (Fowler, 1999)
  9. Refactoring Graph
  10. A refactoring graph is a set of disconnected subgraphs (diagram labels: refactoring graph, software history)
  11. Example of Refactoring Subgraph
  12. Example of Refactoring Subgraph: method A() from class Foo, class Foo{ A(){…} }
  13. Example of Refactoring Subgraph: two developers, Alice and Bob
  14. Example of Refactoring Subgraph: Alice moved method A() from class Foo to class Bar (MOVE), class Foo{ A(){…} } -> class Bar{ A(){…} }
  15. Example of Refactoring Subgraph: six days later, Bob renamed method A() to B() (RENAME), class Bar{ A(){…} } -> class Bar{ B(){…} }
  16. Example of Refactoring Subgraph: these operations create a refactoring subgraph over time
  17. Example of Refactoring Subgraph: the refactoring subgraph contains three vertices, class Foo{ A(){…} }, class Bar{ A(){…} }, and class Bar{ B(){…} }
  18. Example of Refactoring Subgraph: the refactoring subgraph contains two edges, MOVE and RENAME
  19. Example of Refactoring Subgraph: each edge represents a refactoring operation
  20. Example of Refactoring Subgraph: the vertices are the full signatures of the methods, util.Foo#A(), util.Bar#A(), and util.Bar#B()
  21. Example of Refactoring Subgraph: util.Foo#A() denotes method A() from class Foo in package util
  22. Example of Refactoring Subgraph: a refactoring subgraph can include refactorings performed by one or more developers
  23. Example of Refactoring Subgraph: this subgraph contains refactorings performed by two authors
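The example in slides 11 to 23 can be written down as a small data structure. What follows is a minimal Java sketch, not the authors' implementation: the names `RefactoringSubgraphExample`, `Edge`, and `fooBarExample` are hypothetical, and only the signatures, operations, and developers come from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: vertices are full method signatures, edges are refactorings. */
class RefactoringSubgraphExample {

    /** One directed edge: a refactoring that turns method `from` into method `to`. */
    static final class Edge {
        final String from;    // full signature before the refactoring
        final String to;      // full signature after the refactoring
        final String type;    // refactoring operation, e.g. "MOVE" or "RENAME"
        final String author;  // developer who performed the refactoring

        Edge(String from, String to, String type, String author) {
            this.from = from;
            this.to = to;
            this.type = type;
            this.author = author;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void addEdge(String from, String to, String type, String author) {
        edges.add(new Edge(from, to, type, author));
    }

    List<Edge> edges() {
        return edges;
    }

    /** Vertices are derived from the edge endpoints, in order of first appearance. */
    Set<String> vertices() {
        Set<String> result = new LinkedHashSet<>();
        for (Edge e : edges) {
            result.add(e.from);
            result.add(e.to);
        }
        return result;
    }

    /** Distinct developers who contributed at least one edge. */
    Set<String> authors() {
        Set<String> result = new LinkedHashSet<>();
        for (Edge e : edges) {
            result.add(e.author);
        }
        return result;
    }

    /** The subgraph from the slides: Alice moves A(), then Bob renames it to B(). */
    static RefactoringSubgraphExample fooBarExample() {
        RefactoringSubgraphExample g = new RefactoringSubgraphExample();
        g.addEdge("util.Foo#A()", "util.Bar#A()", "MOVE", "Alice");
        g.addEdge("util.Bar#A()", "util.Bar#B()", "RENAME", "Bob");
        return g;
    }

    public static void main(String[] args) {
        RefactoringSubgraphExample g = fooBarExample();
        System.out.println("vertices: " + g.vertices());
        System.out.println(g.edges().size() + " edges by " + g.authors());
    }
}
```

Storing only edges and deriving the vertex and author sets keeps the sketch small; a real tool would likely also record the commit of each edge.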
  24. Outline: 1. RefDiff Tool 2. Dataset 3. Building refactoring graphs 4. Results
  25. RefDiff Tool
  26. RefDiff: a multi-language refactoring detection tool that detects refactorings between two versions of a git-based project (RefDiff 2.0: A Multi-language Refactoring Detection Tool, TSE, 2020)
  27. RefDiff: we center on eight refactorings at the method level: Rename, Move, Move and Rename, Extract, Extract and Move, Inline, Pull Up, and Push Down
  28. Most trivial operations: Rename Method (util.Foo#a() -> util.Foo#b()), Move Method (util.Foo#m() -> util.Bar#m()), Move and Rename Method (util.Foo#a() -> util.Bar#b())
  29. Rename Method: change in the method's name (util.Foo#a() -> util.Foo#b())
  30. Move Method: change in the method's class (util.Foo#m() -> util.Bar#m())
  31. Move and Rename Method: change in the method's name and class (util.Foo#a() -> util.Bar#b())
  32. Push Down Method (util.SuperFoo#m() -> util.SubFoo1#m(), util.SubFoo2#m(), util.SubFooi#m()) and Pull Up Method (util.SubFoo1#m(), util.SubFoo2#m(), util.SubFooi#m() -> util.SuperFoo#m())
  33. Push Down Method: moving a method from a superclass to one or more subclasses
  34. Pull Up Method: moving one or more methods from subclasses to a superclass
  35. Extract Method: two diagrams over util.Foo#m(), util.Foo#m1(), util.Foo#m2(), and util.Foo#mi()
  36. Extract Method: extracting multiple methods from a single method
  37. Extract Method: extracting a single method from multiple methods
  38. Inline Method (util.Foo#m(), util.Bar1#m1(), util.Bar2#m2(), util.Bari#mi()) and Extract and Move Method (util.Bar#m(), util.Foo#m1(), util.Foo#m2(), util.Foo#mi())
  39. Extract and Move Method: extracting a method to a distinct class
  40. Inline Method: removal of trivial elements and replacement of the respective calls
  41. Dataset
  42. Dataset: 10 popular Java projects in terms of stars on GitHub
  44. Dataset: 10 popular Java projects in terms of stars on GitHub (+100 Java files, +1K commits)
  45. Building Refactoring Graphs
  46. Building Refactoring Graphs: we implement a set of scripts that turn an input into refactoring graphs as output
  47. Building Refactoring Graphs: the input comprises a list of refactorings
  48. Building Refactoring Graphs: identification of each refactoring and the two methods involved
  49. Building Refactoring Graphs: creation of a directed edge representing this refactoring
  50. Building Refactoring Graphs: the output includes sets of refactoring subgraphs in text format
  51. Building Refactoring Graphs: we detected a total of 8,926 refactoring subgraphs
  52. Building Refactoring Graphs: we assess the 1,150 (13%) refactoring subgraphs with more than one commit
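The steps in slides 46 to 52 can be sketched as follows. This is an illustrative Java sketch under stated assumptions, not the scripts used in the study. It assumes each detected refactoring has already been reduced to a type, the method signatures before and after, and a commit identifier. It groups edges into disconnected subgraphs with a union-find structure, which is only one possible way to do the grouping. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: groups detected refactorings into disconnected subgraphs. */
class RefactoringGraphBuilder {

    /** One detected refactoring, i.e. one directed edge before -> after. */
    static final class Refactoring {
        final String type;    // e.g. "MOVE", "RENAME"
        final String before;  // full method signature before the refactoring
        final String after;   // full method signature after the refactoring
        final String commit;  // commit identifier (assumed to be available)

        Refactoring(String type, String before, String after, String commit) {
            this.type = type;
            this.before = before;
            this.after = after;
            this.commit = commit;
        }
    }

    /** Union-find lookup with path compression. */
    private static String find(Map<String, String> parent, String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String current = x;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Returns one list of edges per disconnected subgraph, in order of first appearance. */
    static List<List<Refactoring>> buildSubgraphs(List<Refactoring> refactorings) {
        Map<String, String> parent = new HashMap<>();
        for (Refactoring r : refactorings) {
            // The two methods of an edge always belong to the same subgraph.
            parent.put(find(parent, r.before), find(parent, r.after));
        }
        Map<String, List<Refactoring>> byRoot = new LinkedHashMap<>();
        for (Refactoring r : refactorings) {
            byRoot.computeIfAbsent(find(parent, r.before), k -> new ArrayList<>()).add(r);
        }
        return new ArrayList<>(byRoot.values());
    }

    /** Keeps only the subgraphs whose edges come from more than one commit. */
    static List<List<Refactoring>> withMoreThanOneCommit(List<List<Refactoring>> subgraphs) {
        List<List<Refactoring>> result = new ArrayList<>();
        for (List<Refactoring> subgraph : subgraphs) {
            Set<String> commits = new HashSet<>();
            for (Refactoring r : subgraph) {
                commits.add(r.commit);
            }
            if (commits.size() > 1) {
                result.add(subgraph);
            }
        }
        return result;
    }
}
```

Calling `buildSubgraphs` on the full list of detected refactorings and then `withMoreThanOneCommit` on the result mirrors the filtering step described in slide 52.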
  53. Results
  54. RQ1: What is the size of refactoring subgraphs?
  55. Number of vertices by refactoring subgraph
  56. Number of vertices by refactoring subgraph: in 85% of the subgraphs, the number of vertices ranges from two to four
  57. Number of vertices by refactoring subgraph: the most frequent case is subgraphs with three vertices (639 occurrences, 56%)
  58. Number of edges by refactoring subgraph
  59. Number of edges by refactoring subgraph: in 83% of the subgraphs, the number of edges is two or three
  60. Number of edges by refactoring subgraph: the most frequent case is subgraphs with two edges (772 occurrences, 67%)
  61. Refactoring subgraph from MPAndroidChart: RENAME, EXTRACT, EXTRACT AND MOVE, EXTRACT AND MOVE
  62. Refactoring subgraph from MPAndroidChart: a developer renamed drawYLegend() to drawYLabels()
  63. Refactoring subgraph from MPAndroidChart: 13 days later, the same developer extracted a new method
  64. Refactoring subgraph from MPAndroidChart: two days later, the developer made new extractions to another class
  65. Refactoring subgraph from MPAndroidChart: only one developer
  66. Refactoring subgraph from MPAndroidChart: five vertices
  67. Refactoring subgraph from MPAndroidChart: four edges
  68. Refactoring subgraph from MPAndroidChart: three commits (C1, C2, C3)
  69. RQ2: How many commits are there in refactoring subgraphs?
  70. Number of commits by refactoring subgraph
  71. Number of commits by refactoring subgraph: most refactoring subgraphs are created in two or three commits (95%)
  72. Number of commits by refactoring subgraph: the most recurrent case has two commits (81%)
  73. Refactoring subgraph from Elasticsearch: MOVE, MOVE, EXTRACT, EXTRACT, EXTRACT, across commits C1 and C2
  74. Refactoring subgraph from Elasticsearch: in commit C1, a developer moved two methods to another class
  75. Refactoring subgraph from Elasticsearch: three months later, in commit C2, a second developer extracted duplicated code from three methods
  76. Refactoring subgraph from Elasticsearch: two of these methods are the ones moved earlier
  77. Refactoring subgraph from Elasticsearch: two authors were responsible for this refactoring subgraph
  78. Refactoring subgraph from Elasticsearch: two commits (C1, C2)
  79. RQ3: What is the age of refactoring subgraphs?
  80. Age of the refactoring subgraphs: the age is the time between the oldest commit and the most recent commit
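The age definition in slide 80 amounts to a single subtraction. The Java sketch below assumes that each subgraph carries the timestamps of its commits and that the age is reported in whole days; the class name `SubgraphAge`, the method `ageInDays`, and the sample dates are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: age of a subgraph = most recent commit minus oldest commit. */
class SubgraphAge {

    /** Returns the age in whole days; the list must contain at least one commit date. */
    static long ageInDays(List<Instant> commitDates) {
        Instant oldest = Collections.min(commitDates);
        Instant mostRecent = Collections.max(commitDates);
        return Duration.between(oldest, mostRecent).toDays();
    }

    public static void main(String[] args) {
        // Illustrative dates only: a first commit and a second one six days later.
        List<Instant> commits = Arrays.asList(
                Instant.parse("2020-01-01T10:00:00Z"),
                Instant.parse("2020-01-07T10:00:00Z"));
        System.out.println(ageInDays(commits) + " days");
    }
}
```

A subgraph built in a single commit has age zero under this definition.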
  81. Age of the refactoring subgraphs
  82. Age of the refactoring subgraphs: 67% of the refactoring subgraphs are more than one month old
  83. Age of the refactoring subgraphs: some subgraphs are only a few days old
  84. Age of the refactoring subgraphs: most subgraphs span weeks or even months
  85. Refactoring subgraph from Spring Framework: RENAME, RENAME
  86. Refactoring subgraph from Spring Framework: in commit C1, a developer renamed method before(...) to filterBefore(...)
  87. Refactoring subgraph from Spring Framework: six days later, the same developer reverted the operation, renaming filterBefore(...) to before(...)
  88. Refactoring subgraph from Spring Framework: a single developer was responsible for this refactoring subgraph
  89. Refactoring subgraph from Spring Framework: two commits (C1, C2)
  90. RQ4: Which refactorings compose the refactoring subgraphs?
  91. Frequency of refactoring operations
  92. Frequency of refactoring operations: the most common refactoring operations are rename (21%), extract and move (19%), and extract (17%)
  93. Frequency of refactoring operations: we detected only 83 occurrences of move and rename operations
  94. Frequency of refactoring operations: there are also few inheritance-based refactorings
  95. Two groups: homogeneous (subgraphs with a single type of refactoring operation) and heterogeneous (subgraphs with two or more distinct refactoring operations)
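The two groups in slide 95 can be told apart by counting distinct operation types among a subgraph's edges. The Java sketch below is one way to express that rule, assuming each edge is labelled with its refactoring type as a string; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical sketch: classifies a subgraph by the refactoring types on its edges. */
class SubgraphComposition {

    /** Number of distinct refactoring types among the edges of one subgraph. */
    static int distinctTypes(List<String> edgeTypes) {
        return new HashSet<>(edgeTypes).size();
    }

    /** Homogeneous: exactly one refactoring type, however many edges there are. */
    static boolean isHomogeneous(List<String> edgeTypes) {
        return distinctTypes(edgeTypes) == 1;
    }

    /** Heterogeneous: two or more distinct refactoring types. */
    static boolean isHeterogeneous(List<String> edgeTypes) {
        return distinctTypes(edgeTypes) >= 2;
    }

    public static void main(String[] args) {
        List<String> sameType = Arrays.asList("EXTRACT", "EXTRACT", "EXTRACT", "EXTRACT");
        List<String> mixed = Arrays.asList("RENAME", "EXTRACT", "EXTRACT AND MOVE", "EXTRACT AND MOVE");
        System.out.println("same type homogeneous? " + isHomogeneous(sameType));
        System.out.println("mixed heterogeneous? " + isHeterogeneous(mixed));
    }
}
```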
  96. Heterogeneous vs. homogeneous subgraphs
  97. Heterogeneous vs. homogeneous subgraphs: most refactoring subgraphs include more than one refactoring type (72%)
  98. Number of distinct refactoring operations: most heterogeneous subgraphs include two distinct refactoring types (84%)
  99. Homogeneous refactoring subgraph from Facebook Fresco: EXTRACT, EXTRACT, EXTRACT, EXTRACT
  100. Homogeneous refactoring subgraph from Facebook Fresco: four extract method operations
  101. Homogeneous refactoring subgraph from Facebook Fresco: a developer extracted method fetchDecodedImage(...)
  102. Homogeneous refactoring subgraph from Facebook Fresco: years later, a second developer made two new extract operations
  103. Homogeneous refactoring subgraph from Facebook Fresco: two developers
  104. Homogeneous refactoring subgraph from Facebook Fresco: three commits (C1, C2, C3)
  105. RQ5: Are the refactoring subgraphs created by the same or multiple developers?
  106. Two groups: subgraphs created by a single developer and subgraphs created by multiple developers
  107. Developers of refactoring subgraphs
  108. Developers of refactoring subgraphs: most refactoring subgraphs are created by a single developer (60%)
  109. Refactoring subgraph from Square OkHttp: RENAME, RENAME, RENAME, MOVE, EXTRACT, EXTRACT, EXTRACT
  110. Refactoring subgraph from Square OkHttp: in commit C1, a developer renamed three methods
  111. Refactoring subgraph from Square OkHttp: in commit C2, a second developer extracted method checkDuration(...)
  112. Refactoring subgraph from Square OkHttp: commits C2 and C3, … moving to a new class named Util
  113. Refactoring subgraph from Square OkHttp: two developers
  114. Refactoring subgraph from Square OkHttp: seven refactoring operations
  115. Refactoring subgraph from Square OkHttp: three commits (C1, C2, C3)
  116. Large Subgraph Example
  117. Large refactoring subgraph from Square OkHttp: 37 vertices
  118. Large refactoring subgraph from Square OkHttp: push down and move method operations
  119. Large refactoring subgraph from Square OkHttp: 24 extract and move method operations
  120. Large refactoring subgraph from Square OkHttp: 21 extract and move operations to a single method
  121. Large refactoring subgraph from Square OkHttp: a developer performed 21 extract method operations to move this duplicated code to a single method: public int readInt() throws IOException { require(4, Deadline.NONE); return buffer.readInt(); }
  122. Implications and Conclusions
  123. Refactoring-aware Software Evolution: refactoring graphs are a key data structure to improve the results of current software evolution tools
  124. Example: Git Blame. Shows the last author that changed each line of a file
  125. Example: Git Blame. Bob creates a method to calculate the area of a square: class Math{ + float squareArea(float l){ + return l * l; + } }
  126. Example: Git Blame. Git-blame shows Bob as the creator
  127. Example: Git Blame. Bob introduces a bug in a second commit: class Math{ float squareArea(float l){ + return l * l * l; } }
  128. Example: Git Blame. Git-blame shows Bob as responsible for the last change (the bug)
  129. Example: Git Blame. Alice moves the method to a utility class: class Math{ - float squareArea(float l){ - return l * l * l; - } } class Utility{ + float squareArea(float l){ + return l * l * l; + } }
  130. Example: Git Blame. Git-blame shows Alice as the creator of method squareArea in class Utility
  131. Example: Git Blame. Bob is the real creator of squareArea(). The three versions are: class Math{ float squareArea(float l){ return l * l; } }, then class Math{ float squareArea(float l){ return l * l * l; } }, then class Utility{ float squareArea(float l){ return l * l * l; } }
  132. Example: Git Blame. Bob is responsible for the bug
  133. Example: Git Blame. git-blame may miss relevant data due to refactoring operations
  134. Example: Git Blame. A refactoring history can improve existing tools and techniques
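To make the git-blame example concrete, the Java sketch below walks refactoring edges backwards from a method's current signature to its earlier ones, so that authorship can be looked up under the older names as well. It is a simplified illustration, not an existing tool: it assumes each signature has at most one incoming edge (which does not hold for operations such as extract from multiple methods), and the class, method, and signature strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: recovers earlier signatures of a method from refactoring edges. */
class RefactoringAwareBlame {

    /**
     * Follows edges backwards (after -> before) starting at `signature`.
     * Returns the chain of signatures, newest first. The `seen` set stops the
     * walk on cycles, e.g. a rename that is later reverted.
     */
    static List<String> history(String signature, Map<String, String> previousSignature) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = signature;
        while (current != null && seen.add(current)) {
            chain.add(current);
            current = previousSignature.get(current);
        }
        return chain;
    }

    public static void main(String[] args) {
        // Edge from the example: Alice's MOVE from class Math to class Utility.
        Map<String, String> previous = new HashMap<>();
        previous.put("Utility#squareArea(float)", "Math#squareArea(float)");

        List<String> chain = history("Utility#squareArea(float)", previous);
        // The last entry is the signature under which the method was first created.
        System.out.println("earlier signatures: " + chain);
    }
}
```

With the chain in hand, a blame-style tool could query the history of `Math#squareArea(float)` too, which is where Bob's commits live in the example.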
  135. Future Studies: other popular programming languages and ecosystems; refactoring graphs at the class and package level
  136. Refactoring subgraphs... are small (RQ1); have up to three commits (RQ2); span from a few days to months (RQ3); are often heterogeneous (RQ4); are mostly created by a single developer (RQ5)