Slide 1

Slide 1 text

Source code diff revolution

Slide 2

Slide 2 text

How much time does a developer spend on coding and code reviewing per day?

Slide 3

Slide 3 text

Global Code Time Report (250K+ developers)
https://www.software.com/reports/code-time-report
41 minutes of code reviewing per day

Slide 4

Slide 4 text

How are developers reviewing code?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

AST Diff to the rescue
• GumTree (Falleri et al., 2014)
• First phase: top-down AST matching that iteratively finds the largest identical subtrees (the AST hash value is based on node label and value).
• Second phase: matching previously unmatched AST nodes that have a fair amount of their children matched (dice function: the ratio of common descendants between two nodes must be greater than or equal to 0.5).
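The second-phase test can be illustrated with a small sketch of the dice coefficient, computed as twice the number of already-mapped common descendants divided by the total number of descendants of both nodes. The `Node` and `DiceSketch` classes below are hypothetical and are not GumTree's own API; returning 0 for two childless nodes is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal AST node for illustration only (hypothetical, not GumTree's tree class). */
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node k : kids) {
            children.add(k);
        }
    }

    /** Collects all descendants of this node; the node itself is excluded. */
    List<Node> descendants() {
        List<Node> out = new ArrayList<>();
        for (Node c : children) {
            out.add(c);
            out.addAll(c.descendants());
        }
        return out;
    }
}

class DiceSketch {
    /** dice(t1, t2, M) = 2 * |mapped common descendants| / (|desc(t1)| + |desc(t2)|). */
    static double dice(Node t1, Node t2, Map<Node, Node> mappings) {
        List<Node> d1 = t1.descendants();
        List<Node> d2 = t2.descendants();
        if (d1.isEmpty() && d2.isEmpty()) {
            return 0.0; // assumption: two leaves share no descendants
        }
        int common = 0;
        for (Node n : d1) {
            Node partner = mappings.get(n);
            // A descendant counts only if its partner lies under t2.
            if (partner != null && d2.contains(partner)) {
                common++;
            }
        }
        return 2.0 * common / (d1.size() + d2.size());
    }

    /** A pair qualifies when its dice value reaches the threshold (0.5 on the slide). */
    static boolean isCandidate(Node t1, Node t2, Map<Node, Node> mappings, double threshold) {
        return dice(t1, t2, mappings) >= threshold;
    }

    public static void main(String[] args) {
        Node a1 = new Node("a"), b1 = new Node("b"), c1 = new Node("c");
        Node a2 = new Node("a"), b2 = new Node("b"), d2 = new Node("d");
        Node before = new Node("Block", a1, b1, c1);
        Node after = new Node("Block", a2, b2, d2);
        Map<Node, Node> mappings = new HashMap<>();
        mappings.put(a1, a2);
        mappings.put(b1, b2);
        System.out.println(dice(before, after, mappings));
        System.out.println(isCandidate(before, after, mappings, 0.5));
    }
}
```

In the example, two of the three children on each side are already mapped, so the two parent blocks pass the 0.5 threshold.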

Slide 8

Slide 8 text

GumTree clones
• MTDiff (Dotzler & Philippsen, 2016): introduces 5 optimizations to improve the accuracy of the generated edit script, specifically for the Move actions, which make the edit scripts shorter.
• IJM - Iterative Java Matcher (Frick et al., 2018): partial matching, application of GumTree to selected parts of the source code (import declarations, methods with same signature).
• All papers focus their evaluation on which tool generates the shorter edit script.

Slide 9

Slide 9 text

Is shorter (edit script) better?
Fan et al. 2021: “GumTree, MTDiff and IJM generate inaccurate mappings for 20%–29%, 25%–36% and 21%–30% of the file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvements.”

Slide 10

Slide 10 text

Limitation #1: Language agnostic

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

GumTree Simple renamed new method

Slide 13

Slide 13 text

RefactoringMiner call to extracted method

Slide 14

Slide 14 text

Limitation #2: No support for multi-mappings

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

GumTree Simple

Slide 17

Slide 17 text

GumTree Greedy

Slide 18

Slide 18 text

RefactoringMiner calls to extracted method

Slide 19

Slide 19 text

How common are multi-mappings?

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Limitation #3: Semantic ignorance

Slide 27

Slide 27 text

GumTree Greedy
• Type matched with variable
• Variable matched with method call
• Variable matched with lambda parameter

Slide 28

Slide 28 text

RefactoringMiner

Slide 29

Slide 29 text

Limitation #4: Matching only nodes of same AST type

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Limitation #5: Refactoring unawareness

Slide 33

Slide 33 text

GumTree Simple | RefactoringMiner
• Rename object to item
• Extract variable itemKey
• object matched with itemKey

Slide 34

Slide 34 text

GumTree Greedy | RefactoringMiner
• Extract variable onResource
• Extract variable offResource
• method call matched with type

Slide 35

Slide 35 text

Limitation #6: No support for commit-level analysis

Slide 36

Slide 36 text

GumTree Greedy

Slide 37

Slide 37 text

RefactoringMiner

Slide 38

Slide 38 text

Approach

Slide 39

Slide 39 text

Approach overview (text recovered from the diagram)
Inputs: method #1, method #2
Leaf statement mapping
• Round 1: Identical + same depth (iso-structural control flow)
• Round 2: Identical + different depth
• Round 3: Non-identical
• Candidate Sorting
• Inexact leaf mappings; unmatched leaves from #1 and #2
Composite statement mapping
• Round 1: Identical + different depth + # mapped children ≥ 1
• Round 2: Non-identical + # mapped children ≥ 1
• Candidate Sorting
• Inexact composite mappings; unmatched composites from #1 and #2
Extract Method detection (added methods)
• Leaf + composite statement mapping: unmatched/inexact leaves + composites from #1 matched with added method
Inline Method detection (removed methods)
• Leaf + composite statement mapping: unmatched/inexact leaves + composites from #2 matched with removed method
Mapping optimization
Arrow labels: remaining unmatched leaves; remaining unmatched composites
Step labels in the figure: 3.1, 3.2, 3.3, 3.4, 3.5
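The three leaf-matching rounds can be pictured with a minimal sketch. This is not RefactoringMiner's implementation. `LeafStmt`, `LeafRounds`, and the caller-supplied `similar` predicate for round 3 are hypothetical names, and the sketch takes the first candidate in source order rather than applying the candidate sorting shown in the diagram.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** A leaf statement: source text plus nesting depth (hypothetical model). */
class LeafStmt {
    final String text;
    final int depth;

    LeafStmt(String text, int depth) {
        this.text = text;
        this.depth = depth;
    }
}

class LeafRounds {
    /** Runs the three rounds in order; each round only sees what earlier rounds left unmatched. */
    static Map<LeafStmt, LeafStmt> match(List<LeafStmt> before, List<LeafStmt> after,
                                         BiPredicate<LeafStmt, LeafStmt> similar) {
        List<LeafStmt> left = new ArrayList<>(before);
        List<LeafStmt> right = new ArrayList<>(after);
        Map<LeafStmt, LeafStmt> mappings = new LinkedHashMap<>();
        // Round 1: identical text and same depth.
        runRound(left, right, mappings, (a, b) -> a.text.equals(b.text) && a.depth == b.depth);
        // Round 2: identical text, any depth.
        runRound(left, right, mappings, (a, b) -> a.text.equals(b.text));
        // Round 3: non-identical statements accepted by the supplied similarity test.
        runRound(left, right, mappings, similar);
        return mappings;
    }

    /** Pairs each remaining left statement with the first remaining right statement the rule accepts. */
    private static void runRound(List<LeafStmt> left, List<LeafStmt> right,
                                 Map<LeafStmt, LeafStmt> mappings,
                                 BiPredicate<LeafStmt, LeafStmt> rule) {
        Iterator<LeafStmt> it = left.iterator();
        while (it.hasNext()) {
            LeafStmt a = it.next();
            for (LeafStmt b : right) {
                if (rule.test(a, b)) {
                    mappings.put(a, b);
                    it.remove();
                    right.remove(b);
                    break; // leave the loop right after modifying the list
                }
            }
        }
    }

    public static void main(String[] args) {
        List<LeafStmt> before = Arrays.asList(
                new LeafStmt("x = 1;", 1), new LeafStmt("foo();", 2), new LeafStmt("int y = a + b;", 1));
        List<LeafStmt> after = Arrays.asList(
                new LeafStmt("foo();", 3), new LeafStmt("x = 1;", 1), new LeafStmt("int z = a + b;", 1));
        // Placeholder similarity test, an assumption made only for this demo.
        Map<LeafStmt, LeafStmt> m = match(before, after,
                (a, b) -> a.text.startsWith("int ") && b.text.startsWith("int "));
        m.forEach((a, b) -> System.out.println(a.text + " -> " + b.text));
    }
}
```

Ordering the rounds from strictest to loosest means an exact match at the same depth is never displaced by a weaker candidate.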

Slide 40

Slide 40 text

Improvements over RefactoringMiner 2.0
• Sorting criteria for leaf statement mappings
• Sorting criteria for composite statement mappings
• Multi-mapping support for duplicated code moved out of or moved into conditionals
• Statement mapping scope based on call sites
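To make the multi-mapping item concrete, here is a minimal sketch of a store that lets several statements on one side map to a single statement on the other, as happens when code duplicated in both branches of a conditional is moved after it. The `MultiMappingSketch` class and the string statement identifiers are hypothetical and do not reflect RefactoringMiner's data structures.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** One-to-many and many-to-one statement mappings, indexed in both directions. */
class MultiMappingSketch {
    private final Map<String, Set<String>> srcToDst = new LinkedHashMap<>();
    private final Map<String, Set<String>> dstToSrc = new LinkedHashMap<>();

    /** Records a mapping without discarding earlier mappings of either statement. */
    void add(String src, String dst) {
        srcToDst.computeIfAbsent(src, k -> new LinkedHashSet<>()).add(dst);
        dstToSrc.computeIfAbsent(dst, k -> new LinkedHashSet<>()).add(src);
    }

    Set<String> sourcesOf(String dst) {
        return dstToSrc.getOrDefault(dst, Collections.emptySet());
    }

    Set<String> targetsOf(String src) {
        return srcToDst.getOrDefault(src, Collections.emptySet());
    }

    /** True when more than one source statement maps to the given target. */
    boolean isMultiMapped(String dst) {
        return sourcesOf(dst).size() > 1;
    }

    public static void main(String[] args) {
        MultiMappingSketch store = new MultiMappingSketch();
        // The same call appeared in the if and else branches and was moved after the conditional.
        store.add("if-branch: log(msg);", "after-if: log(msg);");
        store.add("else-branch: log(msg);", "after-if: log(msg);");
        System.out.println(store.sourcesOf("after-if: log(msg);"));
        System.out.println(store.isMultiMapped("after-if: log(msg);"));
    }
}
```

A plain one-to-one map would keep only one of the two source statements and report the other as deleted.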

Slide 41

Slide 41 text

tachyon/worker/block/allocator/MaxFreeAllocator.java
@Override
public TempBlockMeta allocateBlock(long userId, long blockId, long blockSize
    BlockStoreLocation location) throws IOException {
  StorageDir candidateDir = null;
  long maxFreeBytes = blockSize;
  if (location.equals(BlockStoreLocation.anyTier())) {
    for (StorageTier tier : mMetaManager.getTiers()) {
      for (StorageDir dir : tier.getStorageDirs()) {
        if (dir.getAvailableBytes() >= maxFreeBytes) {
          maxFreeBytes = dir.getAvailableBytes();
          candidateDir = dir;
        }
      }
    }
  } else if (location.equals(BlockStoreLocation.anyDirInTier(location.tierAli
    StorageTier tier = mMetaManager.getTier(location.tierAlias());
    for (StorageDir dir : tier.getStorageDirs()) {
      if (dir.getAvailableBytes() >= maxFreeBytes) {
        maxFreeBytes = dir.getAvailableBytes();
        candidateDir = dir;
      }
    }
  }
  return candidateDir != null ? new TempBlockMeta(userId, blockId, blockSize, candidateDir) : null;
}
tachyon/worker/block/allocator/MaxFreeAllocator.java
@Override
public TempBlockMeta allocateBlock(long userId, long blockId, long blockSize
    BlockStoreLocation location) throws IOException {
  StorageDir candidateDir = null;
  if (location.equals(BlockStoreLocation.anyTier())) {
    for (StorageTier tier : mMetaManager.getTiers()) {
      candidateDir = getCandidateDirInTier(tier, blockSize);
      if (candidateDir != null) {
        return new TempBlockMeta(userId, blockId, blockSize, candidateDir);
      }
    }
  } else if (location.equals(BlockStoreLocation.anyDirInTier(location.tierAli
    StorageTier tier = mMetaManager.getTier(location.tierAlias());
    candidateDir = getCandidateDirInTier(tier, blockSize);
  }
  return candidateDir != null ? new TempBlockMeta(userId, blockId, blockSize, candidateDir) : null;
}
private StorageDir getCandidateDirInTier(StorageTier tier, long blockSize) {
  StorageDir candidateDir = null;
  long maxFreeBytes = blockSize - 1;
  for (StorageDir dir : tier.getStorageDirs()) {
    if (dir.getAvailableBytes() > maxFreeBytes) {
      maxFreeBytes = dir.getAvailableBytes();
      candidateDir = dir;
    }
  }
  return candidateDir;
}
Call to the extracted method
Moved to the extracted method
Parent block mapping

Slide 42

Slide 42 text

GumTree minHeight hyperparameter
• minHeight: length of the longest path from one leaf to the root of the subtree
• GumTree default threshold = 2
• Why? Avoids matching remaining leaf expressions with height 1 (e.g., SimpleName nodes), which coincidentally have the same value.
• Since we give as input a pair of matched statements, we configure minHeight = 1
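The effect of the threshold can be shown with a small sketch. It assumes a leaf has height 1, consistent with the slide's SimpleName example, and that a subtree is eligible for matching when its height is at least minHeight. `TreeNode` and `MinHeightSketch` are hypothetical names, not GumTree classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal AST node for illustration only (hypothetical, not GumTree's tree class). */
class TreeNode {
    final String type;
    final String value;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String type, String value, TreeNode... kids) {
        this.type = type;
        this.value = value;
        for (TreeNode k : kids) {
            children.add(k);
        }
    }

    /** Height counted in nodes: a leaf is 1, a parent is 1 + its tallest child. */
    int height() {
        int tallest = 0;
        for (TreeNode c : children) {
            tallest = Math.max(tallest, c.height());
        }
        return 1 + tallest;
    }
}

class MinHeightSketch {
    /** Returns every subtree whose height is at least minHeight (assumed eligibility rule). */
    static List<TreeNode> eligibleSubtrees(TreeNode root, int minHeight) {
        List<TreeNode> out = new ArrayList<>();
        collect(root, minHeight, out);
        return out;
    }

    private static void collect(TreeNode n, int minHeight, List<TreeNode> out) {
        if (n.height() >= minHeight) {
            out.add(n);
        }
        for (TreeNode c : n.children) {
            collect(c, minHeight, out);
        }
    }

    public static void main(String[] args) {
        // dir.getAvailableBytes() modelled as a call node with two SimpleName leaves.
        TreeNode call = new TreeNode("MethodInvocation", "",
                new TreeNode("SimpleName", "dir"),
                new TreeNode("SimpleName", "getAvailableBytes"));
        System.out.println(eligibleSubtrees(call, 2).size()); // only the call node qualifies
        System.out.println(eligibleSubtrees(call, 1).size()); // the leaves qualify as well
    }
}
```

With minHeight = 2 the SimpleName leaves are never matched on their own. With minHeight = 1 they are, which is safe only because the two statements being compared are already known to correspond.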

Slide 43

Slide 43 text

KubernetesListBuilder builder = new KubernetesListBuilder()
    .withLivenessProbe(getLivenessProbe())
    .endContainer()
    .withVolumes(getVolumes())
    .endSpec()
    .endTemplate()
    .endSpec()
    .endReplicationControllerItem();
KubernetesListBuilder builder = new KubernetesListBuilder()
    .withLivenessProbe(getLivenessProbe())
    .withReadinessProbe(getReadinessProbe())
    .endContainer()
    .withVolumes(getVolumes())
    .endSpec()
    .endTemplate()
    .endSpec()
    .endReplicationControllerItem();
The same pair of statements is shown three times, once per tool: RefactoringMiner, GT 3.0 simple, GT 3.0 greedy

Slide 44

Slide 44 text

Evaluation results

Slide 45

Slide 45 text

AST Diff benchmark
• Process:
  1. Run all ASTDiff tools
  2. Manually validate the diffs
  3. Construct the “perfect” diff
• Datasets:
  • 800 bug-fixing commits from Defects4J
  • 187 refactoring commits from Refactoring Oracle

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Novel AST Diff Quality Metrics
• Statement (program element) mapping accuracy:
  • True Positive: a mapping given by a tool that exists in the benchmark
  • False Positive: a mapping given by a tool that does not exist in the benchmark
  • False Negative: a mapping that exists in the benchmark, but was not reported by a tool
• Semantically incompatible mappings:
  • M = (m1, m2) returned by a tool
  • m1 and m2 have the same AST type
  • The parents of m1 and m2 have a different AST type
  • M is not included in the benchmark
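These definitions translate directly into set operations. The sketch below computes precision, recall, and F-measure with the standard formulas and checks the three conditions for a semantically incompatible mapping. `MappingAccuracy` and the string encoding of mappings are hypothetical choices, not taken from the benchmark's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Accuracy of a tool's mappings against a benchmark; each mapping is encoded as a string. */
class MappingAccuracy {
    final int tp; // reported by the tool and present in the benchmark
    final int fp; // reported by the tool but absent from the benchmark
    final int fn; // present in the benchmark but not reported by the tool

    MappingAccuracy(Set<String> tool, Set<String> benchmark) {
        Set<String> common = new HashSet<>(tool);
        common.retainAll(benchmark);
        this.tp = common.size();
        this.fp = tool.size() - tp;
        this.fn = benchmark.size() - tp;
    }

    double precision() {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    double recall() {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    double fMeasure() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    /** Same AST type, different parent AST types, and not part of the benchmark. */
    static boolean isSemanticallyIncompatible(String type1, String type2,
                                              String parentType1, String parentType2,
                                              boolean inBenchmark) {
        return type1.equals(type2) && !parentType1.equals(parentType2) && !inBenchmark;
    }

    public static void main(String[] args) {
        Set<String> tool = new HashSet<>(Arrays.asList("a->a", "b->b", "c->x"));
        Set<String> benchmark = new HashSet<>(Arrays.asList("a->a", "b->b", "c->c", "d->d"));
        MappingAccuracy acc = new MappingAccuracy(tool, benchmark);
        System.out.println(acc.precision() + " " + acc.recall() + " " + acc.fMeasure());
    }
}
```

In the example the tool reports three mappings, two of which are in a benchmark of four, giving TP = 2, FP = 1, and FN = 2.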

Slide 49

Slide 49 text

Accuracy – Defects4J +0.5-1% F-measure +1-3% F-measure

Slide 50

Slide 50 text

Accuracy – Defects4J +0.2-1.5% F-measure

Slide 51

Slide 51 text

Accuracy – Refactoring oracle +11-12% F-measure +5-7% F-measure

Slide 52

Slide 52 text

Accuracy – Refactoring oracle +13-16% F-measure +9-12% F-measure

Slide 53

Slide 53 text

Accuracy – Refactoring oracle +17-31% F-measure +1-17% F-measure

Slide 54

Slide 54 text

Execution time – Defects4J
• 3 times slower than GTS on median
• 4 times slower than GTS on average

Slide 55

Slide 55 text

Execution time – Refactoring oracle
• 5 times slower than GTS on median
• 8 times slower than GTS on average
• None of the other tools supports inter-file mappings

Slide 56

Slide 56 text

Conclusions
• RefactoringMiner has the best precision and recall in both benchmarks.
• The accuracy improvements are more evident in the Refactoring benchmark.
• GumTree 3.0 (simple) has better precision and recall than GumTree 3.0 (greedy) when considering sub-expression mappings.
• RefactoringMiner and IJM excel in matching program elements (i.e., method and field declarations) accurately.
• GumTree (greedy) and MTDiff generate the largest numbers of semantically incompatible mappings.
• RefactoringMiner’s execution time is of the same order of magnitude as that of the faster tools.

Slide 57

Slide 57 text

https://github.com/tsantalis/RefactoringMiner
https://github.com/pouryafard75/DiffBenchmark

Slide 58

Slide 58 text

No content