RefDiff: Detecting Refactorings in Version Histories (MSR 2017)

Danilo Silva, Marco Tulio Valente
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Introduction
• Software components are in constant change
• One important kind of change is refactoring
• Knowledge of the refactoring operations applied is valuable information
  – Analyze software evolution
  – Study refactoring practice
  – Review and merge code

Problem: Finding refactoring activity is a non-trivial task

Documentation?
• Refactorings are rarely documented

Instrumenting refactoring engines?
• Refactorings are not always performed using automated support

Source code analysis?
• Viable, but current approaches have precision and recall issues
  – Refactoring Miner: 63% precision
  – Ref-Finder: 35% precision and 24% recall

PROPOSED SOLUTION

RefDiff
• A refactoring detection approach
  – Employs a combination of heuristics based on static analysis and code similarity
  – 13 well-known refactoring types
  – TF-IDF based similarity index

RefDiff: Overview
• Input: version before and version after
• Source Code Analysis: types, methods, and fields
• Relationship Analysis: Rename, Extract, Move

Relationships

Relationship Example: Rename Method
• The names of mb and ma (the method before and after the change) should be different
• mb and ma are in “the same” class
• The similarity index between mb and ma should be greater than a threshold
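The three Rename Method conditions can be read as a single predicate. The sketch below is illustrative only: the `Method` record, its field names, the `token_overlap` stand-in for the similarity index, and the default threshold of 0.5 are all assumptions, and "the same" class is simplified here to equal class names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Method:
    name: str   # simple method name
    owner: str  # declaring class (hypothetical field name)
    body: str   # source text of the method body


def token_overlap(a: Method, b: Method) -> float:
    """Toy stand-in for the similarity index: plain Jaccard on token sets."""
    ta, tb = set(a.body.split()), set(b.body.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_rename_method(
    mb: Method,
    ma: Method,
    similarity: Callable[[Method, Method], float] = token_overlap,
    threshold: float = 0.5,  # placeholder value, not a calibrated threshold
) -> bool:
    """Check the three conditions for a Rename Method relationship."""
    names_differ = mb.name != ma.name                # condition 1
    same_class = mb.owner == ma.owner                # condition 2 (simplified)
    similar_enough = similarity(mb, ma) > threshold  # condition 3
    return names_differ and same_class and similar_enough


mb = Method("calc", "Invoice", "return price * qty ;")
ma = Method("computeTotal", "Invoice", "return price * qty ;")
print(is_rename_method(mb, ma))  # True
```

Passing the similarity function as a parameter keeps the predicate independent of how the index is actually computed.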

Computing Similarity
• Source code represented as a multiset (or bag) of tokens
• Similarity index based on Information Retrieval techniques (TF-IDF)
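As a rough sketch of the idea, the code below represents each entity as a bag of tokens and compares two bags with a Jaccard coefficient weighted by inverse document frequency. The tokenizer regex, the function names, and the smoothed `log(1 + N / n_t)` form of the idf are assumptions made for illustration, not details taken from the slides.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(source: str) -> Counter:
    """Turn source text into a multiset (bag) of tokens."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", source))


def idf(token: str, entities: List[Counter]) -> float:
    """Inverse document frequency of a token over the collection (assumed form)."""
    n_t = sum(1 for e in entities if token in e)
    return math.log(1 + len(entities) / max(n_t, 1))


def similarity(e1: Counter, e2: Counter, entities: List[Counter]) -> float:
    """Weighted Jaccard coefficient: sum of min weights over sum of max weights."""
    tokens = set(e1) | set(e2)
    num = sum(min(e1[t], e2[t]) * idf(t, entities) for t in tokens)
    den = sum(max(e1[t], e2[t]) * idf(t, entities) for t in tokens)
    return num / den if den else 0.0


a = tokenize("return price * qty;")
b = tokenize("return price * qty;")
c = tokenize("log(msg)")
collection = [a, b, c]

print(similarity(a, b, collection))  # identical bags
print(similarity(a, c, collection))  # no shared tokens
```

Weighting by idf means that tokens appearing in almost every entity (such as `return` or `;`) contribute less to the score than rare identifiers.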

Calibration of Thresholds
• Oracle of known refactorings in 10 commits of a public dataset (Silva et al., 2016)
• Thresholds from 0.1 to 0.9 in increments of 0.1
• We choose the value that maximizes the F1 score
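The calibration procedure amounts to a small grid search. The sketch below assumes each candidate relationship carries a similarity score and an oracle label; the `candidates` list is invented toy data for illustration and does not reproduce the calibration results.

```python
from typing import List, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in terms of the confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_at(threshold: float, candidates: List[Tuple[float, bool]]) -> float:
    """F1 obtained when accepting candidates whose score exceeds the threshold."""
    tp = sum(1 for score, real in candidates if score > threshold and real)
    fp = sum(1 for score, real in candidates if score > threshold and not real)
    fn = sum(1 for score, real in candidates if score <= threshold and real)
    return f1_score(tp, fp, fn)


def calibrate(candidates: List[Tuple[float, bool]]) -> float:
    """Try thresholds 0.1, 0.2, ..., 0.9 and keep the one with the best F1."""
    thresholds = [round(0.1 * i, 1) for i in range(1, 10)]
    return max(thresholds, key=lambda t: f1_at(t, candidates))


# Invented (similarity score, is a real refactoring) pairs, for illustration only.
candidates = [
    (0.95, True), (0.80, True), (0.55, True), (0.45, False),
    (0.35, True), (0.25, False), (0.15, False),
]
best = calibrate(candidates)
print(best)  # 0.3
```

A low threshold admits false positives and a high one misses real refactorings, which is why a balanced measure such as F1 is a natural selection criterion.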

Calibration Results

EVALUATION

Evaluation: Precision and Recall
• Oracle of known refactorings applied by students
  – 7 open-source systems
  – 448 refactoring relationships
• Compare RefDiff’s precision and recall with
  – Refactoring Miner
  – Refactoring Crawler
  – Ref-Finder
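Comparing a tool against an oracle of this kind reduces to set operations. In the sketch below a relationship is modelled as a `(type, before, after)` tuple; that encoding and the sample entries are hypothetical and are not drawn from the evaluation data.

```python
from typing import Set, Tuple

Relationship = Tuple[str, str, str]  # (refactoring type, entity before, entity after)


def precision_recall(detected: Set[Relationship],
                     oracle: Set[Relationship]) -> Tuple[float, float]:
    """Precision = TP / detected, recall = TP / oracle."""
    true_positives = len(detected & oracle)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(oracle) if oracle else 0.0
    return precision, recall


# Hypothetical oracle and tool output, for illustration only.
oracle = {
    ("Rename Method", "A.foo", "A.bar"),
    ("Move Method", "A.baz", "B.baz"),
    ("Extract Method", "A.run", "A.step"),
    ("Rename Class", "C", "D"),
}
detected = {
    ("Rename Method", "A.foo", "A.bar"),
    ("Move Method", "A.baz", "B.baz"),
    ("Rename Method", "A.x", "A.y"),  # not in the oracle: a false positive
}

precision, recall = precision_recall(detected, oracle)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```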

Evaluation: Precision and Recall

Conclusion
• RefDiff has better precision and recall than the other approaches
• Execution time is acceptable (1.96 s per commit on average)

Future Work
• Extended evaluation of RefDiff using actual refactorings applied in open-source systems

THANK YOU! https://github.com/aserg-ufmg/RefDiff

Evaluation: Precision and Recall

Evaluation: Execution Time
• We analyzed each commit between January 1 and March 27, 2017, in 10 Java repositories
  – 1990 commits
• We compared execution time with Refactoring Miner

Approach            Avg. time (s)   Total time (s)
RefDiff             1.96            3,893
Refactoring Miner   0.89            1,779

Computing Similarity: Example

Computing Similarity
• Token frequency in entity e
• Inverse document frequency of the token in the collection
• Weighted Jaccard coefficient
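The formula that these labels annotate did not survive in the transcript. A weighted Jaccard coefficient over TF-IDF weights is commonly written as below; the symbol names (sim, tf, idf, T) are assumptions, T stands for the set of tokens occurring in either entity, and the exact definition of idf is left abstract because the transcript does not give it.

```latex
\[
  \mathit{sim}(e_1, e_2) =
  \frac{\sum_{t \in T} \min\bigl(\mathit{tf}(t, e_1),\, \mathit{tf}(t, e_2)\bigr) \cdot \mathit{idf}(t)}
       {\sum_{t \in T} \max\bigl(\mathit{tf}(t, e_1),\, \mathit{tf}(t, e_2)\bigr) \cdot \mathit{idf}(t)}
\]
```

Under this form, two entities with identical token bags score 1 and two entities with no token in common score 0.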
