RefDiff: Detecting Refactorings in Version Histories (MSR 2017)

Slide 1

Slide 1 text

RefDiff: Detecting Refactorings in Version Histories
Danilo Silva, Marco Tulio Valente
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Slide 2

Slide 2 text

Introduction
• Software components are in constant change
• One important kind of change is refactoring

Slide 3

Slide 3 text

Introduction
• Knowledge of the refactoring operations applied is valuable information
  – Analyze software evolution
  – Study refactoring practice
  – Review and merge code

Slide 4

Slide 4 text

Problem: Finding refactoring activity is a non-trivial task

Slide 5

Slide 5 text

Problem: Finding refactoring activity is a non-trivial task
Documentation?
• Refactorings are rarely documented

Slide 6

Slide 6 text

Problem: Finding refactoring activity is a non-trivial task
Instrumenting refactoring engines?
• Refactorings are not always performed using automated support

Slide 7

Slide 7 text

Problem: Finding refactoring activity is a non-trivial task
Source code analysis?
• Viable, but current approaches have precision and recall issues
  – Refactoring Miner: 63% precision
  – Ref-Finder: 35% precision and 24% recall

Slide 8

Slide 8 text

PROPOSED SOLUTION

Slide 9

Slide 9 text

RefDiff
• A refactoring detection approach
  – Employs a combination of heuristics based on static analysis and code similarity
  – Detects 13 well-known refactoring types
  – Uses a TF-IDF-based similarity index

Slide 10

Slide 10 text

RefDiff: Overview
[Diagram] Input: version before and version after

Slide 11

Slide 11 text

RefDiff: Overview
[Diagram] Source Code Analysis of the version before and the version after: types, methods, and fields

Slide 12

Slide 12 text

RefDiff: Overview
[Diagram] Relationship Analysis between the version before and the version after: Rename, Extract, Move

Slide 13

Slide 13 text

Relationships

Slide 14

Slide 14 text

Relationship Example: Rename Method

Slide 15

Slide 15 text

Relationship Example: Rename Method
The names of mb and ma should be different

Slide 16

Slide 16 text

Relationship Example: Rename Method
mb and ma are in “the same” class

Slide 17

Slide 17 text

Relationship Example: Rename Method
The similarity index between mb and ma should be greater than a threshold
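The three Rename Method conditions from the slides above can be combined into one predicate. The following is a minimal sketch, not RefDiff's actual code: the `Method` record, the `class_matches` mapping (used here to model “the same” class when the class itself was renamed or moved), and the function name are all hypothetical, and mb and ma are read as the method before and the method after.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Method:
    name: str   # simple method name
    owner: str  # qualified name of the enclosing class (hypothetical field)


def is_rename_method(mb: Method, ma: Method,
                     class_matches: Mapping[str, str],
                     similarity: float, threshold: float) -> bool:
    """Check the three slide conditions for a Rename Method relationship."""
    # Condition 1: the names of mb and ma should be different.
    names_differ = mb.name != ma.name
    # Condition 2: mb and ma are in "the same" class. Assumption: a class
    # counts as the same if it is unchanged or mapped to its new name.
    same_class = class_matches.get(mb.owner, mb.owner) == ma.owner
    # Condition 3: the similarity index should be greater than a threshold.
    similar_enough = similarity > threshold
    return names_differ and same_class and similar_enough
```

For example, `is_rename_method(Method("getUser", "app.Repo"), Method("findUser", "app.Repo"), {}, 0.8, 0.5)` satisfies all three conditions, while the same pair with a similarity of 0.3 does not.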

Slide 18

Slide 18 text

Computing Similarity
• Source code represented as a multiset (or bag) of tokens
• Similarity index based on Information Retrieval techniques (TF-IDF)
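To illustrate the bag-of-tokens representation, here is a small sketch that turns a code fragment into a multiset using `collections.Counter`. The regular-expression tokenizer and the function name `token_bag` are assumptions made for illustration; the slides do not describe how RefDiff tokenizes source code.

```python
import re
from collections import Counter

# Assumed tokenizer: identifiers/keywords, integer literals, or any single
# non-space character (operators and punctuation).
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_bag(source: str) -> Counter:
    """Represent a code fragment as a multiset (bag) of its tokens."""
    return Counter(TOKEN_PATTERN.findall(source))


bag = token_bag("return a + a;")
# The bag keeps multiplicities: the token "a" is counted twice.
print(bag["a"])
```

A multiset, unlike a set, preserves how often each token occurs, which is what a term-frequency weight needs.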

Slide 19

Slide 19 text

Calibration of Thresholds
• Oracle of known refactorings in 10 commits of a public dataset (Silva et al., 2016)
• Thresholds from 0.1 to 0.9 in increments of 0.1
• We chose the value that optimizes the F1 score
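The calibration procedure can be sketched as a simple grid search. This is an illustrative sketch under stated assumptions: `detect` is a hypothetical callback that returns the set of refactorings reported at a given threshold, and `oracle` is the set of known refactorings; neither name comes from RefDiff.

```python
from typing import Callable, Set, Tuple


def f1_score(predicted: Set[str], oracle: Set[str]) -> float:
    """Harmonic mean of precision and recall against the oracle."""
    true_positives = len(predicted & oracle)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(oracle)
    return 2 * precision * recall / (precision + recall)


def calibrate(detect: Callable[[float], Set[str]],
              oracle: Set[str]) -> Tuple[float, float]:
    """Try thresholds 0.1, 0.2, ..., 0.9 and keep the one with the best F1."""
    thresholds = [i / 10 for i in range(1, 10)]
    # max() returns the first maximal element, so ties go to the lower threshold.
    best = max(thresholds, key=lambda t: f1_score(detect(t), oracle))
    return best, f1_score(detect(best), oracle)


# Toy data (made up for illustration): candidate -> similarity score.
scores = {"r1": 0.9, "r2": 0.55, "x1": 0.35, "x2": 0.15}
oracle = {"r1", "r2"}
best_threshold, best_f1 = calibrate(
    lambda t: {name for name, s in scores.items() if s > t}, oracle)
print(best_threshold, best_f1)
```

A low threshold admits false positives and hurts precision; a high one drops true refactorings and hurts recall. The F1 score balances the two.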

Slide 20

Slide 20 text

Calibration Results

Slide 21

Slide 21 text

EVALUATION

Slide 22

Slide 22 text

Evaluation: Precision and Recall
• Oracle of known refactorings applied by students
  – 7 open-source systems
  – 448 refactoring relationships
• Compare RefDiff’s precision and recall with
  – Refactoring Miner
  – Refactoring Crawler
  – Ref-Finder

Slide 23

Slide 23 text

Evaluation: Precision and Recall

Slide 24

Slide 24 text

Conclusion
• RefDiff has better precision and recall than other approaches
• Execution time is acceptable (1.96 s per commit)

Slide 25

Slide 25 text

Future Work
• Extended evaluation of RefDiff using actual refactorings applied in open-source systems

Slide 26

Slide 26 text

THANK YOU! https://github.com/aserg-ufmg/RefDiff

Slide 27

Slide 27 text

Evaluation: Precision and Recall

Slide 28

Slide 28 text

Evaluation: Execution Time

Slide 29

Slide 29 text

Evaluation: Execution Time
• We analyzed each commit between January 1 and March 27, 2017, in 10 Java repositories
  – 1,990 commits
• We compared execution time with Refactoring Miner

Slide 30

Slide 30 text

Evaluation: Execution Time
Approach      Avg. time (s)   Total time (s)
RefDiff       1.96            3,893
Ref. Miner    0.89            1,779

Slide 31

Slide 31 text

Computing Similarity: Example

Slide 32

Slide 32 text

Computing Similarity: Example

Slide 33

Slide 33 text

Computing Similarity: Example

Slide 34

Slide 34 text

Computing Similarity
[Formulas; annotations on the slide:]
• Token frequency in entity e
• Inverse document frequency of the token in the collection
• Weighted Jaccard coefficient
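The formulas themselves did not survive in this transcript, so the following is only a sketch of how the three labeled ingredients could fit together. It weights each token's frequency by an inverse document frequency and compares two entities with a weighted Jaccard coefficient (sum of minimum weights over sum of maximum weights). The specific IDF form `log(1 + N / n_t)` and all function names are assumptions, not a transcription of the slide.

```python
import math
from collections import Counter
from typing import Sequence


def idf(token: str, collection: Sequence[Counter]) -> float:
    """Inverse document frequency of a token in the collection of entities.

    Assumed form: log(1 + N / n_t), where N is the number of entities and
    n_t is the number of entities containing the token.
    """
    n_t = sum(1 for bag in collection if bag[token] > 0)
    return math.log(1 + len(collection) / n_t) if n_t else 0.0


def similarity(e1: Counter, e2: Counter, collection: Sequence[Counter]) -> float:
    """Weighted Jaccard coefficient over TF-IDF weighted token bags."""
    tokens = set(e1) | set(e2)
    # Counter returns 0 for absent tokens, so min/max work on both bags.
    shared = sum(min(e1[t], e2[t]) * idf(t, collection) for t in tokens)
    total = sum(max(e1[t], e2[t]) * idf(t, collection) for t in tokens)
    return shared / total if total else 0.0


before = Counter({"a": 2, "b": 1})
after = Counter({"a": 1, "b": 1})
print(similarity(before, after, [before, after]))
```

Under this weighting, tokens that appear in few entities contribute more to the score than tokens that appear almost everywhere, so sharing a rare identifier counts for more than sharing common syntax.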