Accurate Method and Variable Tracking in Commit History
30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'2022), Singapore, November 14–18, 2022
behind a snippet of code
• Find the commits that introduced a bug
• Find out who the knowledgeable peers are for certain modules
• Keep up with the code state evolution
• Reverse engineer requirements from code
• Apply changes from one branch to another
Mihai Codoban, Sruti Srinivasa Ragavan, Danny Dig, and Brian Bailey, Software History under the Lens: A Study on Why and How Developers Examine It (ICSME 2015)
2021
• Developers prefer source code history information at the method/function level rather than at the file level
• Current tools are unable to find the commit that introduced a method
LaToza & Myers 2010
• “Where was this variable last changed?”
• “When, how, by whom, and why was this code changed or inserted?”
• “How this code changed over time?”
Felix Grund, Shaiful Alam Chowdhury, Nick Bradley, Braxton Hall, and Reid Holmes, CodeShovel: Constructing Method-Level Source Code Histories (ICSE 2021)
Thomas D. LaToza and Brad A. Myers, Hard-to-Answer Questions about Code (PLATEAU 2010)
• CodeShovel [Grund et al. 2021]
• FinerGit [Higo et al. 2020] & Historage [Hata et al. 2011]
• Sunghun Kim et al. 2005
• Function, file, subsystem tracking at release/version level
• Beagle (origin analysis) [Godfrey and Zou 2005]
• Type, method, field tracking at commit level
• Tempura [Lee et al. 2015]
• Hora et al. 2018 (change graph)
thresholds need calibration for projects with different characteristics
• Partially refactoring-aware
• Supports method signature changes, method moves, and file renames/moves
• No support for Extract/Inline Method → mismatches methods from which a significant part of the body (>75%) has been extracted to new methods
• No support for local variable tracking
tracking over CodeShovel [Grund et al. 2021]
2. First to support variable tracking, with 99.7% precision and 99.8% recall
3. We fix and extend the Grund et al. oracle by adding the change history of 1345 variables
4. Evolution hooks to model the change history of methods extracted/inlined from/to the tracked method of interest
in a way that is not computationally expensive?
Solution
• Partial and incremental commit analysis based on the location of the tracked program element
• RefactoringMiner supports 90 refactoring types with very high precision (>99%) and recall (>94%)
e_r signature omitting method’s body hashed value
Step #4: Use RMiner to find the best matching method with signature changes
Step #5a: Use RMiner to check whether the container of e_r was renamed or moved
Step #5b: Use RMiner to check whether e_r itself was moved to another container
Step #1: git log --follow filePath
Input
1. Git repository URL
2. Start commit SHA-1
3. File path
4. Program element name
5. Start line number
For each commit r in which filePath changed, locate e in the parent commit p (e_p). If e_p is located at any step, skip all subsequent steps and proceed with the next commit. If e_r is found to be introduced, the process terminates.
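The control flow described above (try the steps in order for each commit, stop at the first step that locates the element in the parent commit, and terminate once the element is found to be introduced) can be sketched as follows. This is a minimal illustration and not CodeTracker's implementation: `track`, `INTRODUCED`, and the three toy matchers are hypothetical names, and the hard-coded commit list stands in for the commits returned by `git log --follow filePath`.

```python
from typing import Callable, Optional

# Sentinel returned by a step when the element has no counterpart in the parent.
INTRODUCED = object()
# A step takes (commit, element name) and returns the element's name in the
# parent commit, INTRODUCED, or None if the step could not decide.
Step = Callable[[str, str], Optional[object]]


def track(commits: list[str], element: str, steps: list[Step]) -> list[tuple[str, str]]:
    """Walk commits from newest to oldest and record the element's changes."""
    history = []
    for commit in commits:
        for step in steps:
            located = step(commit, element)
            if located is None:
                continue  # this step found nothing; fall through to the next step
            if located is INTRODUCED:
                history.append((commit, "introduced"))
                return history  # nothing older to track
            if located != element:
                history.append((commit, f"{located} -> {element}"))
            element = located  # keep tracking under the parent-commit name
            break  # skip all subsequent steps for this commit
    return history


# Toy data (assumptions for illustration only).
UNCHANGED = {("c3", "parse")}
RENAMES = {("c2", "parse"): "read"}
INTRODUCED_IN = {("c1", "read")}


def unchanged(commit, element):
    # Cheapest check: same element exists in the parent commit.
    return element if (commit, element) in UNCHANGED else None


def renamed(commit, element):
    # Stand-in for a more expensive refactoring-aware match.
    return RENAMES.get((commit, element))


def introduced(commit, element):
    return INTRODUCED if (commit, element) in INTRODUCED_IN else None


result = track(["c3", "c2", "c1"], "parse", [unchanged, renamed, introduced])
print(result)
```

Ordering the steps from cheapest to most expensive means the costly matching runs only for commits where the cheaper checks fail, which is the point of skipping subsequent steps once the element is located.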
method tracking and how does it compare to that of CodeShovel?
RQ2: What is the accuracy of CodeTracker in variable tracking?
RQ3: How does the execution time of CodeTracker compare to that of CodeShovel?
RQ4: What is the execution time speedup of CodeTracker over the default operation mode of RefactoringMiner?
ICSE’2021 Distinguished Paper], which includes the change history of 200 methods from 20 open-source projects (10 methods from each project)
• Corrected discrepancies due to incorrect matches of methods with methods extracted from their body (27 cases)
• Added newly supported changes for annotations and documentation
• Added the change history of 1345 variables declared within these 200 methods
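Precision and recall against an oracle of this kind follow the standard definitions: a reported change that also appears in the oracle is a true positive, a reported change absent from the oracle is a false positive, and an oracle change the tool missed is a false negative. The sketch below assumes each change is represented as a (commit, change kind) pair; the function name `precision_recall` and the sample records are made up for illustration and are not taken from the actual oracle.

```python
def precision_recall(reported: set, oracle: set) -> tuple[float, float]:
    """Compute precision and recall of reported changes against an oracle."""
    tp = len(reported & oracle)   # reported and confirmed by the oracle
    fp = len(reported - oracle)   # reported but not in the oracle
    fn = len(oracle - reported)   # in the oracle but missed by the tool
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical change history of one method (illustrative data only).
oracle = {("c1", "introduced"), ("c2", "rename"), ("c3", "body change"), ("c4", "moved")}
reported = {("c1", "introduced"), ("c2", "rename"), ("c3", "body change"), ("c5", "body change")}

precision, recall = precision_recall(reported, oracle)
print(f"precision={precision:.2f} recall={recall:.2f}")
```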
mode
• All modified/added files in child commit
• All modified/removed files in parent commit
• Compared with CodeTracker execution time (steps 2-5)
• 3 times faster on median
• 7 times faster on average
• Despite the significant speedup, only a small number of FPs/FNs is introduced, due to MoveMethod changes misreported as FileMove
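A median speedup well below the average speedup is what one would expect if a few commits benefit far more than the rest, since outliers pull the mean up but leave the median almost untouched. The sketch below illustrates this on made-up timings, assuming the speedup is computed per commit as default-mode time divided by partial-analysis time; the numbers are chosen for illustration and are not data from the evaluation.

```python
from statistics import mean, median

# Hypothetical per-commit execution times in seconds: (default mode, partial analysis).
timings = [(3.0, 1.5), (6.0, 2.0), (9.0, 3.0), (8.0, 2.0), (46.0, 2.0)]

# Per-commit speedup: how many times faster the partial analysis is.
speedups = [default / partial for default, partial in timings]

median_speedup = median(speedups)
mean_speedup = mean(speedups)
print(f"speedups={speedups} median={median_speedup} mean={mean_speedup}")
```

Here a single large commit (the last pair) is enough to make the mean more than twice the median, which is why reporting both figures gives a fuller picture than either alone.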
similarity thresholds
• High accuracy: >99% precision and recall
• Fast: 3.35 seconds on median for the entire commit change history
• Better than competing tools (CodeShovel): +9% precision, +6% recall
https://github.com/jodavimehran/code-tracker
• Future (current) work
• We have already extended the tool to support field and block (loops, try, if) tracking
• We made a Chrome extension to show the change history on GitHub