RefDiff: Detecting Refactorings in
Version Histories
Danilo Silva, Marco Tulio Valente
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil
Slide 2
Introduction
• Software components are in constant change
• One important kind of change is refactoring
Slide 3
Introduction
• Knowledge of the applied refactoring operations is valuable information
– Analyze software evolution
– Study refactoring practice
– Review and merge code
Slide 4
Problem: Finding refactoring activity is a non-trivial task
Slide 5
Problem: Finding refactoring activity is a non-trivial task
Documentation?
• Refactorings are rarely documented
Slide 6
Problem: Finding refactoring activity is a non-trivial task
Instrumenting refactoring engines?
• Refactorings are not always performed using
automated support
Slide 7
Problem: Finding refactoring activity is a non-trivial task
Source code analysis?
• Viable, but current approaches have precision
and recall issues
– Refactoring Miner: 63% precision
– Ref-Finder: 35% precision and 24% recall
Slide 8
PROPOSED SOLUTION
Slide 9
RefDiff
• A refactoring detection approach
– Employs a combination of heuristics based on
static analysis and code similarity
– 13 well-known refactoring types
– TF-IDF based similarity index
Slide 10
RefDiff: Overview
[Diagram: the input is a pair of versions, before and after]
Slide 11
RefDiff: Overview
[Diagram: source code analysis extracts types, methods, and fields from the versions before and after]
Slide 12
RefDiff: Overview
[Diagram: relationship analysis between the versions before and after finds relationships such as Rename, Extract, and Move]
Slide 13
Relationships
Slide 14
Relationship Example: Rename Method
Slide 15
Relationship Example: Rename Method
• The names of m_b and m_a should be different
Slide 16
Relationship Example: Rename Method
• m_b and m_a are in “the same” class
Slide 17
Relationship Example: Rename Method
• The similarity index between m_b and m_a should be greater than a threshold
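The three Rename Method conditions from the last slides can be combined into a single predicate. The following is only a minimal sketch under stated assumptions: the `Method` record, the `matched_classes` mapping (which stands in for the notion of “the same” class across versions), and `toy_similarity` are hypothetical names introduced for illustration, and the toy similarity is a plain Jaccard index, not RefDiff's actual similarity index.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Method:
    """Hypothetical record for a method in one version of the code."""
    name: str               # simple method name
    owner: str              # qualified name of the declaring class
    tokens: FrozenSet[str]  # body tokens, used only by the toy similarity


def toy_similarity(mb: Method, ma: Method) -> float:
    """Plain Jaccard over token sets; a stand-in, not RefDiff's index."""
    union = mb.tokens | ma.tokens
    return len(mb.tokens & ma.tokens) / len(union) if union else 0.0


def is_rename_method(
    mb: Method,
    ma: Method,
    matched_classes: Dict[str, str],
    similarity: Callable[[Method, Method], float],
    threshold: float,
) -> bool:
    """Check the three conditions listed on the slides."""
    # 1. The names of the method before and after should be different.
    different_names = mb.name != ma.name
    # 2. Both methods are in "the same" class, modeled here (an assumption)
    #    as a mapping from classes before to their matched classes after.
    same_class = matched_classes.get(mb.owner) == ma.owner
    # 3. The similarity index should be greater than a threshold.
    similar_enough = similarity(mb, ma) > threshold
    return different_names and same_class and similar_enough
```

Passing the similarity function and the threshold as parameters keeps the predicate independent of how similarity is computed and of how the threshold was calibrated.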
Slide 18
Computing Similarity
• Source code represented as a multiset (or bag)
of tokens
• Similarity index based on Information
Retrieval techniques (TF-IDF)
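To make the multiset representation concrete, here is a small sketch that turns a code fragment into a bag of tokens. The regular-expression tokenizer is an assumption chosen for brevity (the slides do not say how RefDiff tokenizes source code), and `bag_of_tokens` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed tokenizer: identifiers/keywords, integer literals, or any other
# single non-whitespace character (operators, punctuation).
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def bag_of_tokens(source: str) -> Counter:
    """Return the multiset of tokens: each token mapped to its count."""
    return Counter(TOKEN_PATTERN.findall(source))


bag = bag_of_tokens("return a + a;")
# The bag keeps multiplicities: the token "a" occurs twice.
print(bag["a"])
```

A multiset, unlike a set, records how often each token occurs, which is what a term-frequency weighting needs.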
Slide 19
Calibration of Thresholds
• Oracle of known refactorings in 10 commits of
a public dataset (Silva et al., 2016)
• Thresholds from 0.1 to 0.9 by 0.1 increments
• We choose the value that optimizes the F1 score
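The calibration procedure amounts to a small grid search. The sketch below assumes a hypothetical `evaluate(threshold)` callback that runs detection on the oracle commits and returns a `(precision, recall)` pair; that callback, and the tie-breaking rule (keep the lowest threshold among equal F1 scores), are illustrative assumptions rather than details given in the slides.

```python
from typing import Callable, Optional, Sequence, Tuple

# Candidate thresholds: 0.1 to 0.9 in 0.1 increments, as on the slide.
THRESHOLDS = [round(0.1 * i, 1) for i in range(1, 10)]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def calibrate(
    evaluate: Callable[[float], Tuple[float, float]],
    thresholds: Sequence[float] = THRESHOLDS,
) -> Tuple[Optional[float], float]:
    """Return the (threshold, F1) pair with the highest F1 score."""
    best_threshold, best_f1 = None, -1.0
    for threshold in thresholds:
        precision, recall = evaluate(threshold)
        f1 = f1_score(precision, recall)
        if f1 > best_f1:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1
```

In practice such a search would presumably be repeated for each relationship type, since each type may call for a different threshold.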
Slide 20
Calibration Results
Slide 21
EVALUATION
Slide 22
Evaluation: Precision and Recall
• Oracle of known refactorings applied by
students
– 7 open-source systems
– 448 refactoring relationships
• Compare RefDiff’s precision and recall with
– Refactoring Miner
– Refactoring Crawler
– Ref-Finder
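For reference, comparing a tool's output against an oracle reduces to set operations. This sketch assumes each refactoring relationship can be encoded as a `(type, entity before, entity after)` tuple; that encoding and the function name are hypothetical and not taken from the evaluation itself.

```python
from typing import Set, Tuple

# Assumed encoding of one relationship: (type, entity before, entity after).
Relationship = Tuple[str, str, str]


def precision_recall(
    detected: Set[Relationship], oracle: Set[Relationship]
) -> Tuple[float, float]:
    """Precision = TP / detected, recall = TP / oracle (0.0 for empty sets)."""
    true_positives = len(detected & oracle)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(oracle) if oracle else 0.0
    return precision, recall
```

Relationships reported by the tool but absent from the oracle lower precision; oracle entries the tool misses lower recall.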
Slide 23
Evaluation: Precision and Recall
Slide 24
Conclusion
• RefDiff has better precision and recall than
other approaches
• Execution time is acceptable (1.96s per commit)
Slide 25
Future Work
• Extended evaluation of RefDiff using
actual refactorings applied in open-source
systems
Slide 26
THANK YOU!
https://github.com/aserg-ufmg/RefDiff
Slide 27
Evaluation: Precision and Recall
Slide 28
Evaluation: Execution Time
Slide 29
Evaluation: Execution Time
• We analyzed each commit between January 1, 2017 and March 27 in 10 Java repositories
– 1990 commits
• We compared execution time with Refactoring
Miner
Slide 30
Evaluation: Execution Time
Approach      Avg. time (s)   Total time (s)
RefDiff       1.96            3,893
Ref. Miner    0.89            1,779
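As a quick consistency check, the averages in this table follow from the total times and the 1990 analyzed commits reported on the previous slide. The snippet only redoes that arithmetic; the variable names are illustrative.

```python
COMMITS = 1990  # number of analyzed commits, from the previous slide

# Total execution times in seconds, as reported in the table.
TOTAL_SECONDS = {"RefDiff": 3893, "Ref. Miner": 1779}

# Average time per commit for each approach.
average = {name: total / COMMITS for name, total in TOTAL_SECONDS.items()}

# How many times longer RefDiff ran than Refactoring Miner overall.
slowdown = TOTAL_SECONDS["RefDiff"] / TOTAL_SECONDS["Ref. Miner"]

for name, seconds in average.items():
    print(f"{name}: {seconds:.2f} s per commit")
print(f"RefDiff / Ref. Miner: {slowdown:.1f}x")
```

The recomputed averages round to the reported 1.96 s and 0.89 s, so RefDiff took roughly 2.2 times as long as Refactoring Miner on these commits.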
Slide 31
Computing Similarity: Example
Slide 32
Computing Similarity: Example
Slide 33
Computing Similarity: Example
Slide 34
Computing Similarity
[Formula annotations]
• Token frequency in entity e
• Inverse document frequency of the token in the collection
• Weighted Jaccard coefficient
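The three ingredients named on this slide can be combined as in the sketch below: a weighted Jaccard coefficient over token frequencies, with each token weighted by its inverse document frequency. The formula itself is not legible in the slide text, so the IDF variant used here, log(1 + N / n_t), and the function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def idf(token: str, collection: Sequence[Counter]) -> float:
    """Assumed IDF variant: log(1 + N / n_t), where n_t is the number of
    entities in the collection that contain the token."""
    containing = sum(1 for bag in collection if token in bag)
    # Guard for tokens absent from the collection (assumption: count as 1).
    return math.log(1 + len(collection) / max(containing, 1))


def similarity(e1: Counter, e2: Counter, collection: Sequence[Counter]) -> float:
    """Weighted Jaccard: sum of min weights over sum of max weights,
    where a token's weight is its frequency in the entity times its IDF."""
    tokens = set(e1) | set(e2)
    numerator = sum(min(e1[t], e2[t]) * idf(t, collection) for t in tokens)
    denominator = sum(max(e1[t], e2[t]) * idf(t, collection) for t in tokens)
    return numerator / denominator if denominator else 0.0
```

With this weighting, tokens that appear in many entities contribute less to the index than rare ones, so two methods sharing only ubiquitous tokens score lower than two methods sharing distinctive identifiers.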