Master's Thesis - Usmi Mukherjee

Complementing Deficient Bug Reports With Missing Information Leveraging Neural Text Generation

Abstract:

Software bug reports often lack crucial information (e.g., steps to reproduce, expected behaviour), which makes bug resolution challenging. A recent study found that 78% of bug reports from open-source projects (e.g., Eclipse) contain fewer than 100 words each and thus require developers to spend more time on bug resolution. According to an existing survey, 77% of 327 professional developers from major technology companies (e.g., Google, Meta) consider missing information a major problem and emphasize complementing deficient reports with useful information (e.g., environment configuration). In this thesis, we propose and evaluate two novel approaches that complement deficient bug reports with relevant information using Generative AI. In our first study, we propose BugMentor, a novel approach that combines structured information retrieval and neural text generation (e.g., CodeT5) to generate appropriate answers to the follow-up questions from bug reports. Our approach identifies past bug reports relevant to a given bug report, constructs the context, and then leverages it to generate the answers. According to our evaluation, BugMentor generates good answers and significantly outperforms three existing baselines in terms of four appropriate metrics (e.g., BLEU, Semantic Similarity). We also conducted a developer study involving 10 participants, in which BugMentor’s answers were found to be more accurate, precise, concise, and useful. In our second study, we propose BugEnricher, a novel approach that enriches bug reports with meaningful explanations using neural text generation. We fine-tuned the T5 model on software-specific vocabulary (e.g., Stack Overflow tags) to generate explanations of software-specific terms and jargon, which have the potential to enrich a bug report. Our evaluation using three performance metrics shows that BugEnricher generates understandable to good explanations according to Google’s standards and outperforms two baselines from the literature. We also conducted a case study to demonstrate the benefit of our bug report enhancement and found that it improved an existing technique in detecting textually dissimilar duplicate bug reports, which has been reported as a major challenge. Given the empirical evidence above, our approaches have strong potential to support bug resolution and bug report management.

Thesis URL : http://hdl.handle.net/10222/83339

Usmi Mukherjee

January 04, 2024

Transcript

  1. Complementing Deficient Bug Reports with Missing Information Leveraging Neural Text Generation

    Usmi Mukherjee, MCS student, Faculty of Computer Science, Dalhousie University, Canada
    Supervisor: Dr. Masud Rahman
    18 December 2023
    RAISE Lab: Intelligent Automation in Software EngineeRing
  2. Outline of the Talk

    P1: Research Problem
    P2: BugMentor
    P3: BugEnricher
    P4: Conclusion + Q&A
  3. Software Bugs and Their Impact

    IEEE Definition: Human-made mistakes that prevent the software from working as expected [1]
    Providing radiation therapy to cancer patients, Therac-25 had malfunctions that resulted in 6 deaths. Who is accountable when technology causes harm?
    The Therac-25 had two prominent software errors, a failed microswitch, and a reduced number of safety features compared to earlier versions of the device. The problem of bugs in the software system causing errors in machines under certain conditions has been used as a cover for careless programming, lack of testing, and lack of safety features built into the system in the Therac-25 accident. After this incident, Leveson and Turner (1993) compiled public information from Atomic Energy Canada Limited (AECL), the Food and Drug Administration (FDA), and various regulatory agencies and concluded that there was inadequate record keeping when the software was designed. The software was inadequately tested, and “patches” were used from earlier versions of the machine. Furthermore, AECL had great difficulty reproducing the conditions under which the issues were experienced in the clinics. It was later found that there was an inadequate reporting structure in the company to follow up with reported accidents.
    Insufficient information in reported bugs could lead to non-reproducibility.
  4. Missing Information in a Bug Report and a Follow-up Question
  5. Domain-specific Terms in a Bug Report and Missing Information
  6. Research Problem

    Missing information in bug reports
    Bug not reproduced or resolved in time
    Lack of understandability
    Answering follow-up questions on bug reports
    Explaining domain-specific terms or jargon
  7. Part 2: BugMentor

    Answering Follow-up Questions from Bug Reports Leveraging Structured Information Retrieval with Neural Text Generation
  8. Motivating Example

    Question: Have you checked in nightly version too? Could you let us know your system details for replicating this issue (Xcode Version, OS details, lite c wheel)?
    Generated Answer: Adding two flags to Xcode’s ‘Other Linker Flags’ settings and modify the Podfile to use the nightly TensorFlow build, specifically ‘TensorFlowLiteSwift’ and ‘TensorFlowLiteSelectTfOps’
    https://stackoverflow.com/questions/2125079/adding-linker-flags-in-xcode
    Closed without resolution in 17 days
  9. Related Work

    Imran et al., MSR 2021 - BugAutoQ: Recommends follow-up questions
    Tian et al., ASE 2017 - APIBot: Answers API-related questions
    Xu et al., ASE 2017 - AnswerBot: Answers technical, non-factoid questions
    Abdellatif et al., ASE 2017 - MSRBot: Answers software development questions
  10. Constructing the Corpus

    Programming languages: Java, Python, C++, JavaScript
    GitHub: most starred, >= 500 issues, five repositories each
    Bug report labels: “crash”, “bug”, “defect”, “needs more info”
    Bug reports → comments → follow-up questions
    Bug reports → comments → candidate answers 1, 2, 3: first comment after the question (not authored by the bug reporter); first comment authored by the bug reporter; comment most similar to the question using BM25
    Data pre-processing: noisy elements; retain text and code; lemmatization
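    The BM25 step on this slide (picking the comment most similar to the follow-up question) can be illustrated with a small, self-contained sketch. It assumes whitespace tokenization and common default parameters (k1 = 1.5, b = 0.75); the function names and the sample comments are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; the slide also mentions lemmatization).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the tokenized query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical follow-up question and comments from one bug report thread.
question = "which xcode version and os are you using"
comments = [
    "thanks for the report we will look into it",
    "i am using xcode 12 on macos big sur",
    "same problem here",
]
scores = bm25_scores(tokenize(question), [tokenize(c) for c in comments])
best = max(range(len(comments)), key=scores.__getitem__)
```

    Here `best` is the index of the comment that would be kept as the BM25-based candidate answer.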
  11. Capturing the Relevant Q&A

    Bug report corpus
    DOI = position of answer (i) / total number of answers (n)
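    The ratio on this slide can be read as the normalized position of an answer within its thread. A minimal sketch, assuming 1-indexed positions (the slide does not state the indexing); `normalized_position` is a hypothetical name:

```python
def normalized_position(i, n):
    """Return i / n for the i-th answer (1-indexed, an assumption) out of n answers."""
    if n <= 0 or not 1 <= i <= n:
        raise ValueError("expected 1 <= i <= n")
    return i / n


# Ratios for a hypothetical thread with four answers.
ratios = [normalized_position(i, 4) for i in range(1, 5)]
```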
  12. Generating Relevant Answers

    Question: Have you checked in nightly version too? Could you let us know your system details for replicating this issue (Xcode Version, OS details, lite c wheel)?
    Generated Answer: Adding two flags to Xcode’s ‘Other Linker Flags’ settings and modify the Podfile to use the nightly TensorFlow build, specifically ‘TensorFlowLiteSwift’ and ‘TensorFlowLiteSelectTfOps’.
  13. Ground Truth Construction

    550 bug reports
    Divide data into 6 buckets
    90 bug reports per bucket
    Each bucket is annotated by 3 annotators
    Majority voting and conflict resolution
    Randomly sampled held-out dataset
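    The majority-voting step among three annotators can be sketched as follows. This is an illustration only: the report IDs and label names are hypothetical, and reports without a strict majority are simply flagged for the conflict-resolution step rather than resolved automatically.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by a strict majority, or None if there is a conflict."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Hypothetical annotations: three annotators pick the best candidate answer per report.
annotations = {
    "report-1": ["answer_1", "answer_1", "answer_3"],
    "report-2": ["answer_2", "answer_1", "answer_3"],
}
resolved = {rid: majority_label(votes) for rid, votes in annotations.items()}
needs_discussion = [rid for rid, label in resolved.items() if label is None]
```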
  14. Evaluation Metrics

    BLEU (Bilingual Evaluation Understudy): similarity between a candidate text and a reference text based on the matching of their n-grams.
    METEOR (Metric for Evaluation of Translation with Explicit ORdering): similarity between a candidate text and the reference text, computed by sequentially applying exact match, stemmed match, and WordNet-based synonym match.
    WMD (Word Mover's Distance): the minimum cost to transform the candidate text into the reference text, calculated from the Euclidean distance between their word embeddings.
    SS (Semantic Similarity): a candidate text is compared with the reference text based on their embeddings using cosine similarity; it has the highest correlation with human-evaluated similarity.
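    Two of the building blocks behind these metrics can be shown in a few lines. The sketch below computes clipped n-gram precision (the core of BLEU, without the brevity penalty or the geometric mean over n-gram orders) and cosine similarity between two embedding vectors. It is a simplified illustration, not the evaluation code used in the thesis, and the example sentences are made up.

```python
import math
from collections import Counter


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a tokenized candidate against a tokenized reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


candidate = "use the nightly build".split()
reference = "please use the nightly tensorflow build".split()
p1 = ngram_precision(candidate, reference, n=1)
p2 = ngram_precision(candidate, reference, n=2)
```

    In practice the vectors passed to `cosine_similarity` would come from a sentence-embedding model; plain lists of floats stand in for them here.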
  15. Research Questions

    RQ1: Performance of BugMentor. How does our technique perform in answering follow-up questions in terms of different automatic evaluation metrics?
    RQ2: Comparison with Existing Baselines. Can our technique outperform the existing baselines in terms of automatic evaluation metrics?
    RQ3: Role of different components in BugMentor. How do different components impact the overall performance of BugMentor?
    RQ4: Evaluation of BugMentor using a developer study. How accurate, precise, useful, and concise are the answers from BugMentor?
  16. RQ2: Comparison with Existing Baselines

    Baseline Lucene. Query: follow-up question. Corpus: all candidate answers.
    Baseline CodeT5. Question: follow-up question. Context: given bug reports.
    AnswerBot. Query: follow-up question. Corpus: all bug reports, candidate answers.
  17. RQ4: Evaluation of BugMentor using a developer study
  18. RQ4: Evaluation of BugMentor using a developer study – Manual Analysis
  19. Key Findings

    BugMentor's answers are understandable to good (e.g., BLEU score of 31.94).
    BugMentor outperforms all three baselines by a significant margin (p-value = 0.010 < 0.016).
    Developers found BugMentor's answers more accurate, precise, concise, and useful compared to the baseline (∼40% for accuracy).
  20. Part 3: BugEnricher

    Explaining Domain-specific Terms and Jargon from Bug Reports with Neural Machine Translation
  21. Motivating Example

    Fig.: An example of a bug report from BugZilla (ID #530801)
    Resolved in 1 year and 3 months
  22. Motivating Example

    Fig.: An example enriched bug report from BugZilla (ID #530801)
    Enriched Bug Report: When I enabled annotation (It is used to describe an annotation object) based null analysis (It is a Java library for analyzing null data), Javadoc (It is documentation generated) hovers use BindingLinkedLabelComposer (It is for composing labels). In that context, Javadoc hover for a module (It is a unit of Java code) does not show the module name, because the BindingLinkedLabelComposer knows nothing about modules
  23. Related Work

    Zhang et al., ICPC 2017: Enriches bug report
    Dit et al., RSSE 2008: Recommends relevant comments
    Xu et al., ASE 2017 - AnswerBot: Answers technical, non-factoid questions
    Correa et al., APSEC 2013 - Samekana: Web links to external knowledge source
  24. Proposed Technique - BugEnricher

    Dataset construction and pre-processing
    Step 1, Vocabulary construction: Stack Overflow tags, API documentation, glossary
    Tag: javafx-11. Explanation: The JavaFX platform enables developers to create client applications based on JavaSE that behave consistently across multiple platforms. Built on Java technology since JavaFX 2.0, it was part of the default JDK since JDK 1.8, but starting Java 11, JavaFX is offered as a component separate from the core JDK.
    Term: java.io. Explanation: Provides for system input and output through data streams, serialization and the file system.
    Term: immutable. Explanation: An object with a fixed value. Immutable objects include numbers, strings and tuples. Such an object cannot be altered. A new object has to be created if a different value has to be stored. They play an important role in places where a constant hash value is needed, for example, as a key in a dictionary.
    Step 2, Data cleaning: remove HTML tags and URLs; spellchecker using pyspellchecker; lemmatization, e.g., querying -> query
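    The first cleaning step on this slide (removing HTML tags and URLs) can be sketched with the standard library alone. The regular expressions below are simple approximations chosen for illustration, and the spell-checking and lemmatization steps are deliberately left out because they rely on external libraries; `clean_explanation` is a hypothetical name.

```python
import re

# Approximate patterns (assumptions): any <...> span is a tag, any http(s) token is a URL.
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")


def clean_explanation(text):
    """Strip HTML tags and URLs, then collapse the remaining whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


raw = "<p>Provides for system input and output. See https://example.com/docs</p>"
cleaned = clean_explanation(raw)
```

    Tags are removed before URLs so that a closing tag glued to the end of a URL is not mistaken for part of it.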
  25. Proposed Technique - BugEnricher

    Dataset splitting and model fine-tuning
    Data: Java – 28,760; Python – 21,365; Miscellaneous – 105,822
    Discarding duplicates: Dataset - 141,567
    Step 3, Data splitting: Training: 80; Testing: 10; Validation: 10
    Step 4, Fine-tuning: T5 (T5ForConditionalGeneration). Input: domain-specific terms or jargon. Target: explanation.
    Step 5, Fine-tuned model. Example: Domain-specific term: module. Explanation: It is a unit of Java code
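    The duplicate removal and 80/10/10 split on this slide can be sketched as below. The sketch assumes that duplicates are exact (term, explanation) pairs and that the split is a seeded random shuffle; neither detail is stated on the slide, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(pairs, seed=0):
    """Drop exact duplicates, shuffle, and split 80/10/10 into train/test/validation."""
    unique = list(dict.fromkeys(pairs))  # keeps the first occurrence of each pair
    random.Random(seed).shuffle(unique)
    n_train = len(unique) * 8 // 10
    n_test = len(unique) // 10
    train = unique[:n_train]
    test = unique[n_train:n_train + n_test]
    validation = unique[n_train + n_test:]
    return train, test, validation


# Hypothetical (term, explanation) pairs with five injected duplicates.
pairs = [(f"term-{i}", f"explanation {i}") for i in range(100)]
pairs += pairs[:5]
train, test, validation = split_dataset(pairs)
```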
  26. Enriching the bug reports: Use case of BugEnricher – Explaining Domain-specific Terms in a Bug Report

    Bug Report → Keyword Extraction → Keywords → Explanation Generation (Fine-tuned Model) → Enriched Bug Report
    Keywords: annotation, null-analysis, javadoc, module, BindingLinkedLabelComposer
    Keyword explanations:
    javadoc: It is documentation generated
    BindingLinkedLabelComposer: It is for composing labels
    annotation: It is used to describe an annotation object
    null-analysis: It is a Java library for analyzing null data
    module: It is a unit of Java code
    Enriched Bug Report: When I enabled annotation (It is used to describe an annotation object) based null analysis (It is a Java library for analyzing null data), Javadoc (It is documentation generated) hovers use BindingLinkedLabelComposer (It is for composing labels). In that context, Javadoc hover for a module (It is a unit of Java code) does not show the module name, because the BindingLinkedLabelComposer knows nothing about modules
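    The final enrichment step, inserting each explanation in parentheses after the first occurrence of its keyword, can be sketched as follows. Keyword extraction and the fine-tuned model are out of scope here: the explanations are passed in as a plain dictionary, and `enrich` is a hypothetical helper, not the thesis implementation.

```python
import re


def enrich(report, explanations):
    """Append '(explanation)' after the first case-insensitive occurrence of each keyword."""
    lookup = {keyword.lower(): text for keyword, text in explanations.items()}
    if not lookup:
        return report
    # Longer keywords first so that a longer term wins over a shorter one it contains.
    alternatives = "|".join(
        re.escape(k) for k in sorted(lookup, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    seen = set()

    def annotate(match):
        key = match.group(0).lower()
        if key in seen:
            return match.group(0)  # only the first occurrence is explained
        seen.add(key)
        return f"{match.group(0)} ({lookup[key]})"

    return pattern.sub(annotate, report)


report = "Javadoc hover for a module does not show the module name"
explanations = {
    "javadoc": "It is documentation generated",
    "module": "It is a unit of Java code",
}
enriched = enrich(report, explanations)
```

    A single pass over the original text is used so that keywords appearing inside an inserted explanation are never annotated themselves.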
  27. Test Dataset Construction for Experiments

    Test vocabulary for answering RQ1 and RQ2
    • 2,876 Java
    • 2,136 Python
    • 10,582 Miscellaneous
    Bug report keywords for answering RQ3 (case study)
    • 92,854 bug reports from Eclipse, Firefox and Mobile
    • Discard stopwords
    • Token splitting
    • Remove HTML tags and URLs
    • Lowercase
  28. Evaluation Metrics

    BLEU (Bilingual Evaluation Understudy): compares a candidate text to a reference text and determines how similar they are based on the matching of their n-grams.
    METEOR (Metric for Evaluation of Translation with Explicit ORdering): similarity between a candidate text and the reference text, computed by sequentially applying exact match, stemmed match, and WordNet-based synonym match between the texts.
    SS (Semantic Similarity): a candidate text is compared with the reference text based on their embeddings using cosine similarity; it has the highest correlation with human-evaluated similarity.
  29. Research Questions

    RQ1: Performance of BugEnricher. How does our technique perform in explaining domain-specific terms or jargon according to automatic evaluation metrics?
    RQ2: Comparison with Existing Baselines. Can our technique outperform the existing baseline techniques in generating explanations of domain-specific terms or jargon?
    RQ3: Case Study - Performance of an Existing Duplicate Bug Report Detection Technique. Does our enrichment of bug reports help improve an existing technique for duplicate bug report detection?
  30. RQ2: Comparison with Two Existing Baselines

    Baseline T5. Input: domain-specific terms or jargon.
    AnswerBot. Query: “What is” + domain-specific terms or jargon. Corpus: dataset provided by the author.
  31. RQ2: Comparison with Two Existing Baselines

    Performance gain, BugEnricher_Java and Baseline_T5: 72.12%
    Performance gain, BugEnricher_Java and AnswerBot: 88.34%
  32. RQ3: Case Study - Performance of an Existing Duplicate Bug Report Detection Technique
  33. Key Findings

    BugEnricher’s explanations are understandable to good for domain-specific terms or jargon (e.g., BLEU score of 28.85).
    BugEnricher outperforms both baselines (e.g., performance gains of 72.12% and 88.34%).
    BugEnricher’s explanations offer complementary information and improve a duplicate bug report detection technique.
  34. Take Home Messages

    Missing information in bug reports: bug not reproduced or resolved in time; lack of understandability
    BugMentor: answers to follow-up questions
    • Understandable to good answers to follow-up questions
    • Outperforms baselines
    • More accurate, precise, concise, and useful answers (developers)
    BugEnricher: explanations of domain-specific terms or jargon
    • Understandable to good explanations
    • Outperforms baselines
    • Improves duplicate bug report detection
  35. Questions

    RAISE Lab: Intelligent Automation in Software EngineeRing
    Contact: [email protected]
  36. RQ3: Role of different components in BugMentor
  37. RQ3: Role of different components in BugMentor
  38. RQ1: Performance of BugMentor – Cross Project
  39. Software Bugs and Their Impact

    IEEE Definition: Human-made mistakes that prevent the software from working as expected [1]
    https://www.it-cisq.org/the-cost-of-poor-quality-software-in-the-us-a-2022-report
  40. References

    [1] IEEE, “IEEE standard glossary of software engineering terminology,” IEEE Std 610.12-1990, pp. 1–84, 1990. doi: 10.1109/IEEESTD.1990.101064
    [2] “Therac-25,” Ethics Unwrapped, https://ethicsunwrapped.utexas.edu/case-study/therac-25 (accessed Dec. 15, 2023).
    [3] M. M. Imran, A. Ciborowska, and K. Damevski, “Automatically selecting follow-up questions for deficient bug reports,” in 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), IEEE, 2021, pp. 167–178.
    [4] B. Xu, Z. Xing, X. Xia, and D. Lo, “AnswerBot: Automated generation of answer summary to developers' technical questions,” in 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2017, pp. 706–716.
    [5] R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry, “Improving bug localization using structured information retrieval,” in 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2013, pp. 345–355.