Slide 1

Slide 1 text

Accurate Method and Variable Tracking in Commit History
November 14-16, 2022
Nikolaos Tsantalis, Mehran Jodavi

Slide 2

Slide 2 text

Developers frequently track code change history
• Recover the rationale behind a snippet of code
• Find the commits that introduced a bug
• Find out who the knowledgeable peers are on certain modules
• Keep up with the code state evolution
• Reverse engineer requirements from code
• Apply changes from one branch to another

Mihai Codoban, Sruti Srinivasa Ragavan, Danny Dig, and Brian Bailey, Software History under the Lens: A Study on Why and How Developers Examine It (ICSME 2015)

Slide 3

Slide 3 text

Blame and text diff are not sufficient
Grund et al. 2021
• Developers prefer source code history information at the method/function level rather than the file level
• Current tools are unable to find the commit that introduced a method
LaToza & Myers 2010
• “Where was this variable last changed?”
• “When, how, by whom, and why was this code changed or inserted?”
• “How has this code changed over time?”

Felix Grund, Shaiful Alam Chowdhury, Nick Bradley, Braxton Hall, and Reid Holmes, CodeShovel: Constructing Method-Level Source Code Histories (ICSE 2021)
Thomas D. LaToza and Brad A. Myers, Hard-to-Answer Questions about Code (PLATEAU 2010)

Slide 4

Slide 4 text

Program element tracking tools
• Function/Method tracking at commit level
  • CodeShovel [Grund et al. 2021]
  • FinerGit [Higo et al. 2020] & Historage [Hata et al. 2011]
  • Sunghun Kim et al. 2005
• Function, file, subsystem tracking at release/version level
  • Beagle (origin analysis) [Godfrey and Zou 2005]
• Type, method, field tracking at commit level
  • Tempura [Lee et al. 2015]
  • Hora et al. 2018 (change graph)

Slide 5

Slide 5 text

Limitations of previous approaches
• Dependence on similarity thresholds
  • Thresholds need calibration for projects with different characteristics
• Partially refactoring-aware
  • Support method signature changes, method moves, file renames/moves
  • No support for Extract/Inline Method → mismatching of methods from which a significant part of their body (>75%) has been extracted to new methods
• No support for local variable tracking

Slide 6

Slide 6 text

Contributions
1. We improve both precision and recall in method tracking over CodeShovel [Grund et al. 2021]
2. We are the first to support variable tracking, with 99.7% precision and 99.8% recall
3. We fix and extend the Grund et al. oracle by adding the change history of 1345 variables
4. We provide evolution hooks to model the change history of methods extracted/inlined from/to the tracked method of interest

Slide 7

Slide 7 text

Challenge
How can we take advantage of RefactoringMiner's accuracy in a way that is not computationally expensive?
Solution
Partial and incremental commit analysis based on the location of the tracked program element

RefactoringMiner supports 90 refactoring types with very high precision (>99%) and recall (>94%)

Slide 8

Slide 8 text

Input
1. Git repository URL
2. Start commit SHA-1
3. File path
4. Program element name
5. Start line number

Step #1 git log --follow filePath
For each commit r in which filePath changed, locate e in the parent commit p (e_p)
Step #2 Using the e_r signature as is
Step #3 Using the e_r signature, omitting the hashed value of the method body
Step #4 Using RMiner to find the best matching method with signature changes
Step #5a Using RMiner to check if the container of e_r was renamed or moved
Step #5b Using RMiner to check if e_r itself was moved to another container

If e_p is located at any step, skip all subsequent steps and proceed with the next commit. If e_r is found to have been introduced, the process terminates.
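The control flow above (try the steps in order, stop at the first match, end when the element turns out to be introduced) can be sketched as follows. This is only an illustrative sketch, not CodeTracker's implementation: the names `Locator`, `locate_in_parent`, and `track` are hypothetical, elements are represented as plain strings, and each step is assumed to be a function that returns the matching element in the parent commit or `None`.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a step takes the element as it exists in commit r and
# returns its counterpart in the parent commit p, or None if it finds none.
Locator = Callable[[str], Optional[str]]


def locate_in_parent(element: str, steps: Sequence[Locator]) -> Optional[str]:
    """Try each step in order and stop at the first one that finds a match."""
    for step in steps:
        match = step(element)
        if match is not None:
            return match  # all subsequent steps are skipped
    return None  # no step matched: the element is treated as introduced


def track(element: str, commits: Sequence[Sequence[Locator]]) -> List[str]:
    """Walk the commits (newest first) and collect the element's ancestors."""
    history = [element]
    for steps in commits:
        previous = locate_in_parent(history[-1], steps)
        if previous is None:
            break  # the element was introduced here, so tracking terminates
        history.append(previous)
    return history
```

Ordering the steps from cheapest to most expensive means the costlier RMiner-based steps run only when the cheaper signature lookups fail.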

Slide 9

Slide 9 text

Input
1. Git repository URL
2. Start commit SHA-1
3. File path
4. Program element name
5. Start line number

Slide 10

Slide 10 text

Step #1 git log --follow filePath
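As a rough illustration of Step #1, the commit list could be collected by calling git from a script. The helper names `follow_log_command` and `commits_touching_file` are hypothetical, and the `--format=%H` option is an addition made here so that only commit hashes are printed; the slide itself shows only `git log --follow filePath`.

```python
import subprocess
from typing import List


def follow_log_command(file_path: str) -> List[str]:
    """Build the git command that lists the commits touching file_path."""
    # --follow keeps listing history across renames of the (single) file;
    # --format=%H prints only the full commit hash, one per line.
    return ["git", "log", "--follow", "--format=%H", "--", file_path]


def commits_touching_file(repo_dir: str, file_path: str) -> List[str]:
    """Run the command inside repo_dir and return hashes, newest first."""
    result = subprocess.run(
        follow_log_command(file_path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return result.stdout.split()
```

Calling `commits_touching_file` requires a local clone and a git executable on the PATH.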

Slide 11

Slide 11 text

Step #2 Using the e_r signature as is
Partial model for child & parent commits including only the filePath source file

Slide 12

Slide 12 text

Step #3 Using the e_r signature, omitting the hashed value of the method body
Partial model for child & parent commits including only the filePath source file
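One way to picture the difference between Steps #2 and #3 is a lookup key that either includes or omits a hash of the method body. The sketch below is an assumption for illustration only: the slides do not specify how CodeTracker builds its identifiers, and `MethodInfo`, `key_with_body`, `key_without_body`, and `match_in_parent` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class MethodInfo:
    container: str  # e.g. the declaring class
    signature: str  # e.g. "parse(String, int)"
    body: str       # source text of the method body

    def key_with_body(self) -> Tuple[str, str, str]:
        # Step #2 key: signature plus a hash of the body (unchanged method)
        digest = hashlib.sha256(self.body.encode("utf-8")).hexdigest()
        return (self.container, self.signature, digest)

    def key_without_body(self) -> Tuple[str, str]:
        # Step #3 key: same signature, body allowed to differ
        return (self.container, self.signature)


def match_in_parent(
    child: MethodInfo, parent_methods: Sequence[MethodInfo]
) -> Optional[Tuple[str, MethodInfo]]:
    """Return which step matched and the parent method, or None."""
    for parent in parent_methods:
        if parent.key_with_body() == child.key_with_body():
            return ("step 2", parent)
    for parent in parent_methods:
        if parent.key_without_body() == child.key_without_body():
            return ("step 3", parent)
    return None  # fall through to the RMiner-based steps
```

Under this reading, a Step #2 match means the method is unchanged in the commit, while a Step #3 match means only its body changed.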

Slide 13

Slide 13 text

Step #4 Using RMiner to find the best matching method with signature changes
Partial model for child & parent commits including only the filePath source file

Slide 14

Slide 14 text

Step #5a Using RMiner to check if the container of e_r was renamed or moved
Add all modified & deleted files in the parent commit, plus a few modified files in the child commit that match heuristics

Slide 15

Slide 15 text

Step #5b Using RMiner to check if e_r itself was moved to another container
Add all modified & deleted files in the parent commit, plus a few modified files in the child commit that match heuristics

Slide 16

Slide 16 text

If e_p is located at any step, skip all subsequent steps and proceed with the next commit. If e_r is found to have been introduced, the process terminates.

Slide 17

Slide 17 text

Evaluation
RQ1: What is the accuracy of CodeTracker in method tracking and how does it compare to that of CodeShovel?
RQ2: What is the accuracy of CodeTracker in variable tracking?
RQ3: How does the execution time of CodeTracker compare to that of CodeShovel?
RQ4: What is the execution time speedup of CodeTracker over the default operation mode of RefactoringMiner?

Slide 18

Slide 18 text

Oracle construction
• Extended the CodeShovel oracle [Grund et al., ICSE 2021 Distinguished Paper], which includes the change history of 200 methods from 20 open-source projects (10 methods from each project)
• Corrected discrepancies due to incorrect matches of methods with methods extracted from their body (27 cases)
• Added newly supported changes for annotations and documentation
• Added the change history of 1345 variables declared within these 200 methods

Slide 19

Slide 19 text

Method tracking precision & recall

Commit level
Tool         TP    FP   FN   Precision  Recall
CodeShovel   3527  287  139  92.48      96.21
CodeTracker  3664  1    0    99.97      100
Difference                   +7%        +4%

Change level
Tool         TP    FP   FN   Precision  Recall
CodeShovel   4327  440  270  90.77      94.13
CodeTracker  4594  6    3    99.87      99.93
Difference                   +9%        +6%
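The precision and recall columns follow from the TP/FP/FN counts using the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The short sketch below recomputes them as percentages; the function name `precision_recall` is only illustrative.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) as percentages rounded to two decimals."""
    precision = 100.0 * tp / (tp + fp)  # share of reported items that are correct
    recall = 100.0 * tp / (tp + fn)     # share of expected items that were reported
    return round(precision, 2), round(recall, 2)


# Commit-level counts from the slide
print(precision_recall(3527, 287, 139))  # CodeShovel
print(precision_recall(3664, 1, 0))      # CodeTracker
```

Running it on the counts in the tables reproduces the reported percentages, for example 92.48 and 96.21 for CodeShovel at the commit level.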

Slide 20

Slide 20 text

Variable tracking precision & recall

Commit level
Tool         TP    FP   FN   Precision  Recall
CodeTracker  1971  3    1    99.85      99.95

Change level
Tool         TP    FP   FN   Precision  Recall
CodeTracker  2037  7    3    99.66      99.85

Slide 21

Slide 21 text

Execution time
• CodeTracker is 1.6 times slower than CodeShovel
• Median: 3.35 seconds
• Average: 5.5 seconds

Slide 22

Slide 22 text

Speedup over RefactoringMiner
• Executed RefactoringMiner with its default operation mode
  • All modified/added files in child commit
  • All modified/removed files in parent commit
• Compared with CodeTracker execution time (steps 2-5)
  • 3 times faster on median
  • 7 times faster on average
• Despite the significant speedup, only a small number of FPs/FNs is introduced, due to MoveMethod changes misreported as FileMove

Slide 23

Slide 23 text

Conclusions
• CodeTracker: Fully refactoring-aware commit change history
  • No similarity thresholds
  • High accuracy: >99% precision and recall
  • Fast: 3.35 seconds on median for the entire commit change history
  • Better than competing tools (CodeShovel): +9% precision, +6% recall
• Future (current) work
  • We already extended the tool to support field and block (loops, try, if) tracking
  • We made a Chrome extension to show the change history on GitHub

https://github.com/jodavimehran/code-tracker
