Accurate Method and Variable Tracking in Commit History

30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'2022), Singapore, November 14–18, 2022

Nikolaos Tsantalis

December 16, 2023

Transcript

  1. Developers frequently track code change history
    • Recover the rationale behind a snippet of code
    • Find the commits that introduced a bug
    • Find who are the knowledgeable peers on certain modules
    • Keep up with the code state evolution
    • Reverse engineer requirements from code
    • Apply changes from one branch to another
    Mihai Codoban, Sruti Srinivasa Ragavan, Danny Dig, and Brian Bailey, Software History under the Lens: A Study on Why and How Developers Examine It (ICSME 2015)
  2. Blame and text diff are not sufficient
    Grund et al. 2021
    • Developers prefer source code history information at the method/function level rather than the file level
    • Current tools are unable to find the commit that introduced a method
    LaToza & Myers 2010
    • “Where was this variable last changed?”
    • “When, how, by whom, and why was this code changed or inserted?”
    • “How has this code changed over time?”
    Felix Grund, Shaiful Alam Chowdhury, Nick Bradley, Braxton Hall, and Reid Holmes, CodeShovel: Constructing Method-Level Source Code Histories (ICSE 2021)
    Thomas D. LaToza and Brad A. Myers, Hard-to-Answer Questions about Code (PLATEAU 2010)
  3. Program element tracking tools
    • Function/method tracking at commit level
      • CodeShovel [Grund et al. 2021]
      • FinerGit [Higo et al. 2020] & Historage [Hata et al. 2011]
      • Sunghun Kim et al. 2005
    • Function, file, subsystem tracking at release/version level
      • Beagle (origin analysis) [Godfrey and Zou 2005]
    • Type, method, field tracking at commit level
      • Tempura [Lee et al. 2015]
      • Hora et al. 2018 (change graph)
  4. Limitations of previous approaches
    • Dependence on similarity thresholds
      • Thresholds need calibration for projects with different characteristics
    • Partially refactoring-aware
      • Support method signature changes, method moves, file renames/moves
      • No support for Extract/Inline Method → mismatching methods from which a significant part of their body (>75%) has been extracted to new methods
    • No support for local variable tracking
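To illustrate why a similarity threshold can lose a method after an Extract Method refactoring, here is a minimal sketch. It is not the metric used by CodeShovel or any other tool named above: the line-based ratio from Python's `difflib`, the 0.5 cut-off, and both toy method bodies are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off; real tools calibrate their own thresholds per project.
SIMILARITY_THRESHOLD = 0.5


def body_similarity(old_body: str, new_body: str) -> float:
    """Return a line-based similarity ratio in [0, 1] for two method bodies."""
    return SequenceMatcher(None, old_body.splitlines(), new_body.splitlines()).ratio()


# Body of the tracked method in the parent commit (toy example).
OLD_BODY = """validate(order)
total = 0
for item in order.items:
    total += item.price * item.quantity
tax = total * TAX_RATE
total += tax
log(total)
return total"""

# The same method in the child commit, after most of its statements were
# extracted into a new helper method.
NEW_BODY = """validate(order)
total = compute_total(order)
return total"""

similarity = body_similarity(OLD_BODY, NEW_BODY)
matched = similarity >= SIMILARITY_THRESHOLD
print(f"similarity={similarity:.2f}, matched={matched}")
```

Only the first and last statements survive the extraction, so the ratio falls below the cut-off and a purely threshold-based matcher would not link the two versions of the method.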
  5. Contributions
    1. We improve both precision and recall in method tracking over CodeShovel [Grund et al. 2021]
    2. First to support variable tracking, with 99.7% precision and 99.8% recall
    3. We fix and extend the Grund et al. oracle by adding the change history of 1345 variables
    4. Evolution hooks to model the change history of methods extracted/inlined from/to the tracked method of interest
  6. Challenge: How can we take advantage of RefactoringMiner's accuracy in a way that is not computationally expensive?
    Solution: Partial and incremental commit analysis based on the location of the tracked program element
    RefactoringMiner supports 90 refactoring types with very high precision (>99%) and recall (>94%)
  7. Input
    1. Git repository URL
    2. Start commit SHA-1
    3. File path
    4. Program element name
    5. Start line number
    Step #1: git log --follow filePath
    For each commit r in which filePath changed, locate e in the parent commit p (e_p):
    Step #2: Using e_r signature as is
    Step #3: Using e_r signature omitting the method’s body hashed value
    Step #4: Using RMiner to find the best matching method with signature changes
    Step #5a: Using RMiner to check if the e_r container was renamed or moved
    Step #5b: Using RMiner to check if e_r itself was moved to another container
    If at any step e_p is located, skip all subsequent steps and proceed with the next commit. If e_r is found to be introduced, the process terminates.
  8. Input
    1. Git repository URL
    2. Start commit SHA-1
    3. File path
    4. Program element name
    5. Start line number
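The five inputs, and the `git log --follow filePath` command of Step #1 that they feed, could be modelled roughly as follows. This is a sketch, not CodeTracker's API: the `TrackingInput` class, its field names, the helper function, and the example values are all hypothetical. Passing the start commit and `--format=%H` (print only commit hashes) to `git log` is likewise an assumption about how the file history might be listed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingInput:
    """The five inputs listed on the slide (class and field names are hypothetical)."""
    repository_url: str
    start_commit: str
    file_path: str
    element_name: str
    start_line: int


def git_log_follow_command(request: TrackingInput) -> list[str]:
    """Build a Step #1 style command listing the commits that changed the file."""
    return [
        "git", "log", "--follow", "--format=%H",
        request.start_commit, "--", request.file_path,
    ]


# Placeholder values, not taken from the talk.
request = TrackingInput(
    repository_url="https://example.org/project.git",
    start_commit="0123abc",
    file_path="src/main/java/Example.java",
    element_name="run",
    start_line=42,
)
print(" ".join(git_log_follow_command(request)))
```

The command is returned as an argument list so that it can be handed to `subprocess.run` inside a clone of the repository without shell quoting issues.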
  9. Step #2: Using e_r signature as is
    Partial model for child & parent commits, including only the filePath source file
  10. Step #3: Using e_r signature omitting the method’s body hashed value
    Partial model for child & parent commits, including only the filePath source file
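One way to read Steps #2 and #3 is that the full signature carries a hash of the method body, so an exact match means the method is unchanged, while a match without the hash means only the body changed. The sketch below follows that reading. The `Method` type, the signature layout, and the use of SHA-256 are assumptions made for illustration and are not taken from CodeTracker.

```python
import hashlib
from typing import List, NamedTuple, Optional, Tuple


class Method(NamedTuple):
    """Hypothetical, simplified view of a method declaration."""
    container: str
    name: str
    parameter_types: Tuple[str, ...]
    body: str


def signature(method: Method, include_body_hash: bool) -> str:
    """Build a signature string, optionally suffixed with a hash of the body."""
    base = f"{method.container}#{method.name}({', '.join(method.parameter_types)})"
    if include_body_hash:
        return base + "@" + hashlib.sha256(method.body.encode("utf-8")).hexdigest()
    return base


def locate_in_parent(child: Method, parent_methods: List[Method]) -> Optional[Tuple[Method, int]]:
    """Try Step #2 (signature as is), then Step #3 (signature without the body hash)."""
    for step, include_body_hash in ((2, True), (3, False)):
        wanted = signature(child, include_body_hash)
        for candidate in parent_methods:
            if signature(candidate, include_body_hash) == wanted:
                return candidate, step
    return None  # Later steps (not sketched here) would take over.


child = Method("Cart", "total", ("Order",), "return sum(order);")
unchanged = Method("Cart", "total", ("Order",), "return sum(order);")
body_changed = Method("Cart", "total", ("Order",), "return 0;")
renamed = Method("Cart", "compute", ("Order",), "return sum(order);")

print(locate_in_parent(child, [unchanged])[1])
print(locate_in_parent(child, [body_changed])[1])
print(locate_in_parent(child, [renamed]))
```

Both steps need only the one source file from each commit, which matches the partial model described on these slides.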
  11. Step #4: Using RMiner to find the best matching method with signature changes
    Partial model for child & parent commits, including only the filePath source file
  12. Step #5a: Using RMiner to check if the e_r container was renamed or moved
    Add all modified & deleted files in parent commit + a few modified files in child commit matching heuristics
  13. Step #5b: Using RMiner to check if e_r itself was moved to another container
    Add all modified & deleted files in parent commit + a few modified files in child commit matching heuristics
  14. If at any step e_p is located, skip all subsequent steps and proceed with the next commit. If e_r is found to be introduced, the process terminates.
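The control flow of this rule (try the steps in order, stop at the first one that locates the element in the parent commit, and end the walk when no step does) can be sketched as below. The locator functions and the toy history are hypothetical stand-ins for the signature lookups and RefactoringMiner calls; only the ordering and early-exit logic is the point.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# A locator takes (commit, element) and returns the element in the parent commit, or None.
Locator = Callable[[str, str], Optional[str]]

# Toy data: elements present in the parent of each commit, and one rename.
PARENT_ELEMENTS: Dict[str, Set[str]] = {"c3": {"A.run()"}, "c2": {"A.execute()"}, "c1": set()}
RENAMES: Dict[Tuple[str, str], str] = {("c2", "A.run()"): "A.execute()"}


def same_signature(commit: str, element: str) -> Optional[str]:
    """Stand-in for the cheap lookups: element exists unchanged in the parent."""
    return element if element in PARENT_ELEMENTS[commit] else None


def signature_change(commit: str, element: str) -> Optional[str]:
    """Stand-in for a refactoring-aware lookup that detects a rename."""
    return RENAMES.get((commit, element))


def track(element: str, commits: Sequence[str],
          steps: Sequence[Tuple[str, Locator]]) -> List[Tuple[str, str, str]]:
    """Walk commits from newest to oldest, applying the steps as a cascade."""
    history: List[Tuple[str, str, str]] = []
    current = element
    for commit in commits:
        for step_name, locate in steps:
            previous = locate(commit, current)
            if previous is not None:
                # Located in the parent: skip the remaining steps for this commit.
                history.append((commit, step_name, previous))
                current = previous
                break
        else:
            # No step located the element: it was introduced here, so stop.
            history.append((commit, "introduced", current))
            break
    return history


steps = [("same signature", same_signature), ("signature change", signature_change)]
for entry in track("A.run()", ["c3", "c2", "c1"], steps):
    print(entry)
```

Ordering the cheap checks first means the expensive refactoring detection runs only for commits where the simple lookups fail.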
  15. Evaluation
    RQ1: What is the accuracy of CodeTracker in method tracking and how does it compare to that of CodeShovel?
    RQ2: What is the accuracy of CodeTracker in variable tracking?
    RQ3: How does the execution time of CodeTracker compare to that of CodeShovel?
    RQ4: What is the execution time speedup of CodeTracker over the default operation mode of RefactoringMiner?
  16. Oracle construction
    • Extended the CodeShovel oracle [Grund et al., ICSE’2021 Distinguished Paper], which includes the change history of 200 methods from 20 open-source projects (10 methods from each project)
    • Corrected discrepancies due to incorrect matches of methods with methods extracted from their body (27 cases)
    • Added newly supported changes for annotations and documentation
    • Added the change history of 1345 variables declared within these 200 methods
  17. Method tracking precision & recall
    Commit level
      Tool         TP    FP   FN   Precision (%)  Recall (%)
      CodeShovel   3527  287  139  92.48          96.21
      CodeTracker  3664  1    0    99.97          100
      CodeTracker over CodeShovel: +7% precision, +4% recall
    Change level
      Tool         TP    FP   FN   Precision (%)  Recall (%)
      CodeShovel   4327  440  270  90.77          94.13
      CodeTracker  4594  6    3    99.87          99.93
      CodeTracker over CodeShovel: +9% precision, +6% recall
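The precision and recall figures follow from the TP/FP/FN counts by the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). A short sketch that recomputes the method tracking figures from the reported counts (the function name is hypothetical):

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) as percentages rounded to two decimals."""
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return round(precision, 2), round(recall, 2)


# (TP, FP, FN) counts as reported on the slide.
REPORTED = {
    ("commit", "CodeShovel"): (3527, 287, 139),
    ("commit", "CodeTracker"): (3664, 1, 0),
    ("change", "CodeShovel"): (4327, 440, 270),
    ("change", "CodeTracker"): (4594, 6, 3),
}

for (level, tool), counts in REPORTED.items():
    precision, recall = precision_recall(*counts)
    print(f"{level:6} {tool:11} precision={precision:.2f} recall={recall:.2f}")
```

The recomputed values agree with the table, and the differences between the two tools at each level correspond to the rounded gains shown on the slide.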
  18. Variable tracking precision & recall
    Commit level
      Tool         TP    FP  FN  Precision (%)  Recall (%)
      CodeTracker  1971  3   1   99.85          99.95
    Change level
      Tool         TP    FP  FN  Precision (%)  Recall (%)
      CodeTracker  2037  7   3   99.66          99.85
  19. Execution time
    • CodeTracker is 1.6 times slower than CodeShovel
    • Median: 3.35 seconds
    • Average: 5.5 seconds
  20. Speedup over RefactoringMiner
    • Executed RefactoringMiner with its default operation mode
      • All modified/added files in child commit
      • All modified/removed files in parent commit
    • Compared with CodeTracker execution time (steps 2-5)
      • 3 times faster on median
      • 7 times faster on average
    • Despite the significant speedup, only a small number of FPs/FNs is introduced, due to MoveMethod changes misreported as FileMove
  21. Conclusions
    • CodeTracker: Fully refactoring-aware commit change history
    • No similarity thresholds
    • High accuracy: >99% precision and recall
    • Fast: 3.35 seconds on median for the entire commit change history
    • Better than competitive tools (CodeShovel): +9% precision, +6% recall
    https://github.com/jodavimehran/code-tracker
    • Future (current) work
      • We already extended the tool to support field and block (loops, try, if) tracking
      • We made a Chrome extension to show the change history on GitHub