Slide 1

Slide 1 text

Source code diff revolution

Slide 2

Slide 2 text

How are developers reviewing code?

Slide 3

Slide 3 text

[Screenshot]

Slide 4

Slide 4 text

Eugene Myers
• Language independent
• Super-fast and scalable
• Does not handle moves well
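A minimal sketch of the last point, using Python's standard difflib. Note that difflib's SequenceMatcher is not Myers' algorithm; it is used here only as a readily available line-based differ, and the two-function input is a made-up example. Because the edit vocabulary has no "move" operation, reordering two functions shows up as unrelated insertions and deletions.

```python
import difflib

# Hypothetical input: the same two functions, with b() moved above a().
OLD = ["def a():", "    return 1", "", "def b():", "    return 2"]
NEW = ["def b():", "    return 2", "", "def a():", "    return 1"]


def line_edits(old, new):
    """Return the non-equal opcodes of a line-based diff as (tag, old_lines, new_lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def removed_and_added(old, new):
    """Collect every line reported as removed and every line reported as added."""
    removed, added = [], []
    for _tag, old_lines, new_lines in line_edits(old, new):
        removed.extend(old_lines)
        added.extend(new_lines)
    return removed, added


if __name__ == "__main__":
    for tag, old_lines, new_lines in line_edits(OLD, NEW):
        print(tag, old_lines, new_lines)
```

The body of b() is reported once as removed and once as added; nothing in the output says that it is the same code in a new position.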

Slide 5

Slide 5 text

Abstract Syntax Tree diff
• Fine-grained diff between AST nodes (not just lines)
• Supports moves and updates (not just additions and deletions)
• Still has limitations (coming up soon…)
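A toy illustration of why working on trees makes moves visible, using Python's standard ast module. This is not GumTree or any tool from this talk: it only fingerprints top-level statements and pairs up identical ones, and the function name relocated_subtrees and the sample sources are made up for the example.

```python
import ast

# Hypothetical input: the same two functions in a different order.
OLD = """
def a():
    return 1

def b():
    return 2
"""

NEW = """
def b():
    return 2

def a():
    return 1
"""


def fingerprints(source):
    """Map a structural fingerprint of each top-level statement to (index, node)."""
    tree = ast.parse(source)
    # ast.dump leaves out line and column attributes by default, so the
    # fingerprint depends on the subtree's structure and labels only.
    # Identical duplicate statements would collide here; a real matcher
    # has to handle that case.
    return {ast.dump(node): (index, node) for index, node in enumerate(tree.body)}


def relocated_subtrees(old_source, new_source):
    """List (name, old_index, new_index) for identical subtrees whose position changed."""
    old_prints = fingerprints(old_source)
    new_prints = fingerprints(new_source)
    result = []
    for key, (old_index, node) in old_prints.items():
        if key in new_prints:
            new_index = new_prints[key][0]
            if new_index != old_index:
                name = getattr(node, "name", type(node).__name__)
                result.append((name, old_index, new_index))
    return result


if __name__ == "__main__":
    print(relocated_subtrees(OLD, NEW))
```

Both functions are matched to their counterparts as unchanged subtrees at a new position. The sketch reports every index change; an actual AST differ would additionally compute a small set of move actions for its edit script.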

Slide 6

Slide 6 text

[Screenshot]

Slide 7

Slide 7 text

Timeline of AST diff tools:
• 2007: Change Distiller (Fluri et al.)
• 2014: GumTree (Falleri et al.)
• 2016: MTDiff (Dotzler et al.)
• 2018: IJM (Frick et al.)
• 2023: iASTMapper (Zhang et al.)
• 2024: RMiner 3 (Alikhanifard et al.)
Annotations:
• Language aware, partial matching
• Language independent, largest identical subtrees
• Language independent, move action optimizations
Legend: Tree Matching, Statement Mapping

Slide 8

Slide 8 text

Limitation #1: No support for multi-mappings
“A given node can only belong to one mapping”
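A small sketch of what the quoted constraint means in practice. Both classes and the node labels ("foo:print" and so on) are hypothetical, and evicting the earlier mapping is just one simple way to enforce the constraint; the slide does not say how any particular tool resolves the conflict. The scenario is a statement duplicated in two methods and extracted into one helper, so two old nodes correspond to the same new node.

```python
from collections import defaultdict


class OneToOneMappings:
    """Each source node and each destination node takes part in at most one mapping."""

    def __init__(self):
        self.src_to_dst = {}
        self.dst_to_src = {}

    def add(self, src, dst):
        # Enforce the constraint by dropping any earlier mapping of either node.
        old_dst = self.src_to_dst.pop(src, None)
        if old_dst is not None:
            del self.dst_to_src[old_dst]
        old_src = self.dst_to_src.pop(dst, None)
        if old_src is not None:
            del self.src_to_dst[old_src]
        self.src_to_dst[src] = dst
        self.dst_to_src[dst] = src

    def pairs(self):
        return sorted(self.src_to_dst.items())


class MultiMappings:
    """A node may take part in any number of mappings."""

    def __init__(self):
        self.src_to_dsts = defaultdict(set)

    def add(self, src, dst):
        self.src_to_dsts[src].add(dst)

    def pairs(self):
        return sorted((s, d) for s, dsts in self.src_to_dsts.items() for d in dsts)


def record_extraction(store):
    """Two duplicated statements are both extracted into the same helper statement."""
    store.add("foo:print", "helper:print")
    store.add("bar:print", "helper:print")
    return store.pairs()


if __name__ == "__main__":
    print(record_extraction(OneToOneMappings()))
    print(record_extraction(MultiMappings()))
```

With the one-to-one store only one of the two origins survives, so the other duplicated statement has no counterpart and would be reported as deleted. The multi-mapping store keeps both.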

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

GumTree Simple

Slide 11

Slide 11 text

GumTree Greedy

Slide 12

Slide 12 text

RefactoringMiner: calls to extracted method

Slide 13

Slide 13 text

Limitation #2: Incorrect matching of program declarations
“The algorithm is independent of any language specificity”

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

GumTree Simple: renamed new method

Slide 16

Slide 16 text

RefactoringMiner: call to extracted method

Slide 17

Slide 17 text

Limitation #3: Semantic ignorance
“Mappings involve two nodes with identical labels”

Slide 18

Slide 18 text

GumTree Greedy
• Type matched with variable
• Variable matched with method call
• Variable matched with lambda parameter
• For body block matched with method body block

Slide 19

Slide 19 text

RefactoringMiner

Slide 20

Slide 20 text

Limitation #4: Refactoring unawareness

Slide 21

Slide 21 text

GumTree Simple
RefactoringMiner
• Rename object to item
• Extract variable itemKey
• object matched with itemKey
• object matched with item

Slide 22

Slide 22 text

Limitation #5: No support for commit-level analysis

Slide 23

Slide 23 text

GumTree Greedy

Slide 24

Slide 24 text

RefactoringMiner
• Pulled up to superclass AbstractDQLPlanNode
• Moved from class NestedLoop

Slide 25

Slide 25 text

Limitation #6: Poor evaluation standards

Slide 26

Slide 26 text

Is shorter (edit script) better?

Slide 27

Slide 27 text

The path of Virtue or Vice
• Benchmarks
• Shorter edit script

Slide 28

Slide 28 text

AST Diff benchmark
• Process (6 months):
  1. Run all AST diff tools (GumTree 3.0, GumTree 2.1, IJM, MTDiff, RMiner)
  2. Manually validate the diffs
  3. Construct the “perfect” diff
• Datasets:
  • 800 bug-fixing commits from Defects4J
  • 187 refactoring commits from Refactoring Oracle

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Approach

Slide 32

Slide 32 text

Approach overview (diagram labels):
• Inputs: version1, version2
• RefactoringMiner: statement mappings, program declaration mappings, import declaration mappings, refactoring mappings based on mechanics
• Tree Matcher (appears twice): AST mappings
• Overwrite conflicting mappings
• Final AST mappings
• Edit script

Slide 33

Slide 33 text

Evaluation results

Slide 34

Slide 34 text

AST mapping accuracy

Dataset       RMiner 3        GumTree greedy   GumTree simple   iASTMapper
              Prec.   Rec.    Prec.   Rec.     Prec.   Rec.     Prec.   Rec.
Defects4J     99.7    99.3    97.5    93.1     98.4    97.8     98.5    99
Refactoring   99.6    99.2    84.1    70.2     86.7    72.4     91.8    79.2
Overall       99.7    99.3    93.8    86.1     95.2    90       96.7    92.9

Ranking based on F-score (overall / refactoring only):
1. RMiner 3.0: 99.5% / 99.4%
2. iASTMapper: 94.8% / 85.0%
3. GumTree simple: 92.6% / 78.9%
4. GumTree greedy: 89.8% / 76.5%

Gap annotations: ±1-6%, ±8-29%
Legend: Tree Matching, Statement Mapping
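The ranking can be reproduced from the overall precision and recall values, assuming "F-score" here means the usual harmonic mean of the two (F1). The dictionary below simply copies the "Overall" row of the slide; the function and variable names are illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1), in the same unit as the inputs."""
    return 2 * precision * recall / (precision + recall)


# "Overall" row of the slide: tool -> (precision %, recall %).
OVERALL = {
    "RMiner 3": (99.7, 99.3),
    "GumTree greedy": (93.8, 86.1),
    "GumTree simple": (95.2, 90.0),
    "iASTMapper": (96.7, 92.9),
}


def ranking(scores):
    """Return (tool, F-score rounded to one decimal) sorted from best to worst."""
    ranked = sorted(scores.items(), key=lambda item: f_score(*item[1]), reverse=True)
    return [(tool, round(f_score(p, r), 1)) for tool, (p, r) in ranked]


if __name__ == "__main__":
    for position, (tool, score) in enumerate(ranking(OVERALL), start=1):
        print(f"{position}. {tool}: {score}%")
```

Recomputing from the rounded table values gives the same order as the slide. GumTree simple comes out at 92.5 instead of the 92.6 shown, which would be consistent with the slide's figure having been computed from unrounded precision and recall.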

Slide 35

Slide 35 text

How can I use your tool?
1. Dependency
2. Command line tool
3. Docker image
4. git rmd
APIs:
1. With a commit of a locally cloned git repository
2. With a commit fetched directly from GitHub
3. With the files changed in a GitHub Pull Request
4. With two directories
https://github.com/tsantalis/RefactoringMiner

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

File split to multiple files

Slide 38

Slide 38 text

Code from different files merged to a single file

Slide 39

Slide 39 text

https://github.com/tsantalis/RefactoringMiner
https://github.com/pouryafard75/DiffBenchmark