Master's Thesis - Usmi Mukherjee

Master’s Thesis Defense

Complementing Deficient Bug Reports with Missing Information Leveraging Neural Text Generation
Usmi Mukherjee, MCS student, Faculty of Computer Science, Dalhousie University, Canada
Supervisor: Dr. Masud Rahman
18 December 2023
RAISE Lab: Intelligent Automation in Software EngineeRing

Outline of the Talk
• P1: Research Problem
• P2: BugMentor
• P3: BugEnricher
• P4: Conclusion + Q&A

Part 1: Research Problem

Software Bugs and Their Impact
IEEE definition: human-made mistakes that prevent the software from working as expected [1].
The Therac-25, a machine that delivered radiation therapy to cancer patients, was involved in six known overdose accidents, several of them fatal. Who is accountable when technology causes harm?
• The Therac-25 had two prominent software errors, a failed microswitch, and fewer safety features than earlier versions of the device.
• In the Therac-25 accident, the idea that software bugs cause machine errors only under certain conditions was used as a cover for careless programming, lack of testing, and lack of built-in safety features.
• Leveson and Turner (1993) compiled public information from Atomic Energy of Canada Limited (AECL), the Food and Drug Administration (FDA), and various regulatory agencies, and concluded that record keeping was inadequate when the software was designed.
• The software was inadequately tested, and "patches" were reused from earlier versions of the machine.
• AECL had great difficulty reproducing the conditions under which the issues were experienced in the clinics.
• It was later found that the company lacked an adequate reporting structure for following up on reported accidents.
Insufficient information in reported bugs could lead to non-reproducibility.

Missing Information in a Bug Report and a Follow-up Question

Domain-specific Terms in a Bug Report and Missing Information

Research Problem
Missing information in bug reports leads to:
• Bug not reproduced or resolved in time
• Lack of understandability
Tasks addressed:
• Answering follow-up questions on bug reports
• Explaining domain-specific terms or jargon

Part 2: BugMentor
Answering Follow-up Questions from Bug Reports Leveraging Structured Information Retrieval with Neural Text Generation

Motivating Example
Question: Have you checked in nightly version too? Could you let us know your system details for replicating this issue (Xcode Version, OS details, lite c wheel)?
Generated Answer: Adding two flags to Xcode’s ‘Other Linker Flags’ settings and modify the Podfile to use the nightly TensorFlow build, specifically ‘TensorFlowLiteSwift’ and ‘TensorFlowLiteSelectTfOps’
https://stackoverflow.com/questions/2125079/adding-linker-flags-in-xcode
Closed without resolution in 17 days

Related Work
• Imran et al., MSR 2021 - BugAutoQ: recommends follow-up questions
• Tian et al., ASE 2017 - APIBot: answers API-related questions
• Xu et al., ASE 2017 - AnswerBot: answers technical, non-factoid questions
• Abdellatif et al., ASE 2017 - MSRBot: answers software development questions

Proposed Technique - BugMentor

Constructing the Corpus
• Programming languages: Java, Python, C++, JavaScript
• GitHub: five most-starred repositories for each language, each with >= 500 issues
• Bug report labels: “crash”, “bug”, “defect”, “needs more info”
• Bug reports and comments: follow-up questions
• Bug reports and comments: candidate answers
  • Candidate Answer 1: first comment after the question (not authored by the bug reporter)
  • Candidate Answer 2: first comment authored by the bug reporter
  • Candidate Answer 3: comment most similar to the question using BM25
• Data pre-processing: remove noisy elements, retain text and code, lemmatization
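The BM25-based choice of a candidate answer can be illustrated with a minimal sketch. This is not the thesis implementation: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the helper names `bm25_scores` and `most_similar_comment` are assumptions made for illustration, and the sample comments are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    docs = [tokenize(d) for d in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def most_similar_comment(question, comments):
    """Return the comment with the highest BM25 score for the question."""
    scores = bm25_scores(question, comments)
    best = max(range(len(comments)), key=scores.__getitem__)
    return comments[best]


# Hypothetical follow-up question and comments, for illustration only.
question = "Which Xcode version and OS are you using?"
comments = [
    "Thanks for reporting this.",
    "I am using Xcode 12.4 on macOS Big Sur.",
    "Any update on this issue?",
]
best_comment = most_similar_comment(question, comments)
```

Here the second comment shares the terms "xcode" and "using" with the question, so it is the one selected.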

Capturing the Relevant Q&A
Bug Report Corpus
DOI = position of answer (i) / total number of answers (n)
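As a worked example of the DOI ratio, the sketch below computes it for one answer. The slide does not say whether positions start at 0 or 1; 1-based positions are assumed here, and the function name `doi` is hypothetical.

```python
def doi(position, total_answers):
    """Ratio of an answer's position (assumed 1-based) to the number of answers."""
    if total_answers <= 0 or not 1 <= position <= total_answers:
        raise ValueError("position must lie between 1 and total_answers")
    return position / total_answers


# The second of four answers gets a DOI of 2 / 4.
example_doi = doi(2, 4)
```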

Generating Relevant Answers
Question: Have you checked in nightly version too? Could you let us know your system details for replicating this issue (Xcode Version, OS details, lite c wheel)?
Generated Answer: Adding two flags to Xcode's 'Other Linker Flags' settings and modify the Podfile to use the nightly TensorFlow build, specifically 'TensorFlowLiteSwift' and 'TensorFlowLiteSelectTfOps'.

Experiment

Ground Truth Construction
• 550 bug reports
• Randomly sampled held-out dataset
• Divide data into 6 buckets, 90 bug reports per bucket
• Each bucket is annotated by 3 annotators
• Majority voting and conflict resolution
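The majority-voting step can be sketched as follows. The label values, the report identifiers, and the idea that each annotator picks one candidate answer per bug report are assumptions for illustration; items without a strict majority are set aside for conflict resolution.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def resolve(annotations):
    """Split items into those with a majority label and those needing discussion."""
    agreed, conflicts = {}, []
    for item_id, votes in annotations.items():
        label = majority_label(votes)
        if label is None:
            conflicts.append(item_id)
        else:
            agreed[item_id] = label
    return agreed, conflicts


# Hypothetical annotations: three annotators each pick a candidate answer.
annotations = {
    "report-1": ["answer-1", "answer-1", "answer-2"],
    "report-2": ["answer-1", "answer-2", "answer-3"],
}
agreed, conflicts = resolve(annotations)
```

With three annotators, two matching votes are enough for a majority; a three-way split goes to conflict resolution.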

Evaluation Metrics
• Bilingual Evaluation Understudy (BLEU): similarity between a candidate text and a reference text based on the matching of their n-grams.
• Metric for Evaluation of Translation with Explicit ORdering (METEOR): similarity between a candidate text and the reference text, computed by sequentially applying exact match, stemmed match, and WordNet-based synonym match.
• Word Mover's Distance (WMD): the minimum cost of transforming the candidate text into the reference text, calculated from the Euclidean distance between their word embeddings.
• Semantic Similarity (SS): the candidate text is compared with the reference text using the cosine similarity of their embeddings; it has the highest correlation with human-evaluated similarity.
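Two building blocks behind these metrics can be sketched in a few lines: the clipped n-gram precision at the core of BLEU, and the cosine similarity used for Semantic Similarity. This is a simplified illustration, not the evaluation code used in the thesis: full BLEU also combines several n-gram orders and applies a brevity penalty, and the embedding vectors here are assumed to come from some external model.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: candidate counts are capped by reference counts."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Four of the six candidate bigrams also occur in the reference.
bigram_precision = modified_precision(
    "it is a unit of java code", "it is a unit of code", 2
)
```

Clipping prevents a candidate from earning credit by repeating a matching word: "the the the" against "the cat" scores 1/3, not 1.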

Research Questions
• RQ1: Performance of BugMentor. How does our technique perform in answering follow-up questions in terms of different automatic evaluation metrics?
• RQ2: Comparison with Existing Baselines. Can our technique outperform the existing baselines in terms of automatic evaluation metrics?
• RQ3: Role of different components in BugMentor. How do different components impact the overall performance of BugMentor?
• RQ4: Evaluation of BugMentor using a developer study. How accurate, precise, useful, and concise are the answers from BugMentor?

RQ1: Performance of BugMentor

RQ1: Performance of BugMentor (continued)

RQ2: Comparison with Existing Baselines
• Baseline Lucene. Query: follow-up question. Corpus: all candidate answers.
• Baseline CodeT5. Question: follow-up question. Context: given bug reports.
• AnswerBot. Query: follow-up question. Corpus: all bug reports, candidate answers.

RQ2: Comparison with Existing Baselines (continued)

RQ2: Comparison with Existing Baselines (continued)

RQ4: Evaluation of BugMentor using a developer study

RQ4: Evaluation of BugMentor using a developer study – Manual Analysis

Key Findings
• BugMentor's answers are understandable to good (e.g., BLEU score of 31.94).
• BugMentor outperforms all three baselines by a significant margin (p-value = 0.010 < 0.016).
• Developers found BugMentor's answers more accurate, precise, concise, and useful than the baseline's (∼40% for accuracy).

Part 3: BugEnricher
Explaining Domain-specific Terms and Jargon from Bug Reports with Neural Machine Translation

Fig.: An example of a bug report from Bugzilla (ID #530801)
Resolved in 1 year and 3 months

Fig.: An example enriched bug report from Bugzilla (ID #530801)
Enriched Bug Report: When I enabled annotation (It is used to describe an annotation object) based null analysis (It is a Java library for analyzing null data), Javadoc (It is documentation generated) hovers use BindingLinkedLabelComposer (It is for composing labels). In that context, Javadoc hover for a module (It is a unit of Java code) does not show the module name, because the BindingLinkedLabelComposer knows nothing about modules.

Related Work
• Zhang et al., ICPC 2017: enriches bug reports
• Dit et al., RSSE 2008: recommends relevant comments
• Xu et al., ASE 2017 - AnswerBot: answers technical, non-factoid questions
• Correa et al., APSEC 2013 - Samekana: web links to external knowledge sources

Proposed Technique - BugEnricher

Proposed Technique - BugEnricher
Dataset construction and pre-processing
Step 1: Vocabulary Construction
• Sources: Stack Overflow tags, API documentation, glossary
• Tag: javafx-11. Explanation: The JavaFX platform enables developers to create client applications based on JavaSE that behave consistently across multiple platforms. Built on Java technology since JavaFX 2.0, it was part of the default JDK since JDK 1.8, but starting Java 11, JavaFX is offered as a component separate from the core JDK.
• Term: java.io. Explanation: Provides for system input and output through data streams, serialization and the file system.
• Term: immutable. Explanation: An object with a fixed value. Immutable objects include numbers, strings and tuples. Such an object cannot be altered. A new object has to be created if a different value has to be stored. They play an important role in places where a constant hash value is needed, for example, as a key in a dictionary.
Step 2: Data Cleaning
• Remove HTML tags and URLs
• Spellchecker using pyspellchecker
• Lemmatization, e.g. querying -> query
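The first data-cleaning step (removing HTML tags and URLs) might look like the sketch below. The regular expressions and the function name `clean_explanation` are assumptions, the sample input is invented, and the spellchecking and lemmatization steps are left out because they depend on external libraries.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")       # anything that looks like an HTML tag
URL_RE = re.compile(r"https?://\S+")  # http and https links


def clean_explanation(text):
    """Strip HTML tags and URLs, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = unescape(text)
    return " ".join(text.split())


# Hypothetical raw explanation scraped from a documentation page.
raw = (
    '<p>Provides for system input &amp; output. '
    'See <a href="https://example.com/io">docs</a> https://example.com/io</p>'
)
cleaned = clean_explanation(raw)
```

Tags are removed before entities are decoded so that escaped text such as `&lt;T&gt;` is kept as `<T>` rather than being mistaken for markup.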

Proposed Technique - BugEnricher
Dataset splitting and model fine-tuning
Step 3: Data Splitting
• Data: Java 28,760; Python 21,365; Miscellaneous 105,822
• Discarding duplicates: dataset of 141,567
• Split: training 80%, testing 10%, validation 10%
Step 4: Fine-tuning
• Model: T5 (T5ForConditionalGeneration)
• Input: domain-specific terms or jargon
• Target: explanation
Step 5: Fine-tuned Model
• Example. Domain-specific term: module. Explanation: It is a unit of Java code
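A minimal sketch of duplicate removal followed by an 80/10/10 split is given below. The exact procedure used in the thesis is not described on the slide, so the case-insensitive duplicate key, the fixed random seed, and the function names are all assumptions. Taking a tenth by integer division is likewise a guess, although it agrees with the test-vocabulary sizes reported later in the talk (e.g., 2,876 of 28,760 Java pairs).

```python
import random


def deduplicate(pairs):
    """Keep the first occurrence of each (term, explanation) pair."""
    seen, unique = set(), []
    for term, explanation in pairs:
        key = (term.lower(), explanation)
        if key not in seen:
            seen.add(key)
            unique.append((term, explanation))
    return unique


def split_80_10_10(pairs, seed=0):
    """Shuffle and split into training, validation, and testing portions."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    tenth = len(shuffled) // 10
    test = shuffled[:tenth]
    validation = shuffled[tenth:2 * tenth]
    train = shuffled[2 * tenth:]
    return train, validation, test


# Hypothetical vocabulary with one duplicate entry.
pairs = [(f"term{i}", "an explanation") for i in range(100)]
pairs.append(("TERM0", "an explanation"))
train, validation, test = split_80_10_10(deduplicate(pairs))
```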

Enriching the Bug Reports
Use case of BugEnricher: explaining domain-specific terms in a bug report
Pipeline: bug report, keyword extraction, explanation generation with the fine-tuned model, enriched bug report
Keywords: annotation, null-analysis, javadoc, module, BindingLinkedLabelComposer
Keywords and explanations:
• javadoc: It is documentation generated
• BindingLinkedLabelComposer: It is for composing labels
• annotation: It is used to describe an annotation object
• null-analysis: It is a Java library for analyzing null data
• module: It is a unit of Java code
Enriched Bug Report: When I enabled annotation (It is used to describe an annotation object) based null analysis (It is a Java library for analyzing null data), Javadoc (It is documentation generated) hovers use BindingLinkedLabelComposer (It is for composing labels). In that context, Javadoc hover for a module (It is a unit of Java code) does not show the module name, because the BindingLinkedLabelComposer knows nothing about modules.
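The final enrichment step, inserting each explanation after the first mention of its keyword, can be sketched as follows. In BugEnricher the keywords come from keyword extraction and the explanations from the fine-tuned model; here a plain dictionary stands in for both, and the function name `enrich` and the case-insensitive whole-word matching are assumptions.

```python
import re


def enrich(report, explanations):
    """Insert '(explanation)' after the first occurrence of each keyword."""
    insertions = []
    for term, explanation in explanations.items():
        # Whole-word, case-insensitive search in the original report text.
        pattern = r"\b" + re.escape(term) + r"\b"
        match = re.search(pattern, report, flags=re.IGNORECASE)
        if match:
            insertions.append((match.end(), explanation))
    pieces, last = [], 0
    for end, explanation in sorted(insertions):
        pieces.append(report[last:end])
        pieces.append(f" ({explanation})")
        last = end
    pieces.append(report[last:])
    return "".join(pieces)


report = "Javadoc hover for a module does not show the module name"
explanations = {
    "module": "It is a unit of Java code",
    "javadoc": "It is documentation generated",
}
enriched = enrich(report, explanations)
```

Match positions are collected on the original text before anything is inserted, so a keyword that happens to appear inside an inserted explanation is never annotated.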

Experiment

Test Dataset Construction for Experiments
Test Vocabulary for Answering RQ1 and RQ2
• 2,876 Java
• 2,136 Python
• 10,582 Miscellaneous
Bug Report Keywords for Answering RQ3 (Case study)
• 92,854 Bug Reports from Eclipse, Firefox and Mobile
• Discard Stopwords
• Token splitting
• Remove HTML tags, and URLs
• Lowercase
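The keyword pre-processing steps could be sketched as below, assuming HTML tags and URLs have already been stripped. The slide does not define "token splitting"; it is interpreted here as splitting camelCase and snake_case identifiers, which is an assumption, as are the tiny stopword list and the order of the steps.

```python
import re

# A deliberately small, hypothetical stopword list for illustration.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "for", "to", "and", "not", "use"}

# Acronym runs, capitalized or lowercase words, and digit runs.
PART_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_token(token):
    """Split an identifier such as 'HTMLParser' or 'null_analysis' into parts."""
    parts = []
    for piece in re.split(r"[_\W]+", token):
        parts.extend(PART_RE.findall(piece))
    return parts


def preprocess(text):
    """Split tokens, lowercase them, and discard stopwords."""
    tokens = []
    for raw in text.split():
        tokens.extend(part.lower() for part in split_token(raw))
    return [token for token in tokens if token not in STOPWORDS]


keywords = preprocess("Javadoc hovers use BindingLinkedLabelComposer")
```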

Evaluation Metrics
• Bilingual Evaluation Understudy (BLEU): compares a candidate text to a reference text and determines how similar they are based on the matching of their n-grams.
• Metric for Evaluation of Translation with Explicit ORdering (METEOR): similarity between a candidate text and the reference text, computed by sequentially applying exact match, stemmed match, and WordNet-based synonym match between the texts.
• Semantic Similarity (SS): the candidate text is compared with the reference text using the cosine similarity of their embeddings; it has the highest correlation with human-evaluated similarity.

Research Questions
• RQ1: Performance of BugEnricher. How does our technique perform in explaining domain-specific terms or jargon according to automatic evaluation metrics?
• RQ2: Comparison with Existing Baselines. Can our technique outperform the existing baseline techniques in generating explanations for domain-specific terms or jargon?
• RQ3: Case Study - Performance of an Existing Duplicate Bug Report Detection Technique. Does our enrichment of bug reports help improve an existing technique for duplicate bug report detection?

RQ1: Performance of BugEnricher

RQ2: Comparison with Two Existing Baselines
• Baseline T5. Input: domain-specific terms or jargon.
• AnswerBot. Query: “What is” + domain-specific terms or jargon. Corpus: dataset provided by the author.

RQ2: Comparison with Two Existing Baselines (continued)
• Performance gain, BugEnricher_Java and Baseline_T5: 72.12%
• Performance gain, BugEnricher_Java and AnswerBot: 88.34%

RQ3: Case Study - Performance of an Existing Duplicate Bug Report Detection Technique

Key Findings
• BugEnricher’s explanations of domain-specific terms or jargon are understandable to good (e.g., BLEU score of 28.85).
• BugEnricher outperforms both baselines (e.g., performance gains of 72.12% and 88.34%).
• BugEnricher’s explanations offer complementary information and improve a duplicate bug report detection technique.

Part 4: Conclusion

Take Home Messages
Missing information in bug reports
• Bug not reproduced or resolved in time
• Lack of understandability
BugMentor: answers to follow-up questions
• Understandable to good answers to follow-up questions
• Outperforms baselines
• More accurate, precise, concise, and useful answers, according to developers
BugEnricher: explanations of domain-specific terms or jargon
• Understandable to good explanations
• Outperforms baselines
• Improves duplicate bug report detection

THANK YOU

Questions
RAISE Lab: Intelligent Automation in Software EngineeRing
Contact: [email protected]

APPENDIX
• CHAT-GPT
• RQ3 - ABLATION
• RQ1 - CROSS LANGUAGE
• REFERENCES

RQ3: Role of different components in BugMentor

RQ3: Role of different components in BugMentor (continued)

RQ1: Performance of BugMentor – Cross Project

Software Bugs and Their Impact
IEEE definition: human-made mistakes that prevent the software from working as expected [1].
https://www.it-cisq.org/the-cost-of-poor-quality-software-in-the-us-a-2022-report

References
[1] IEEE, “IEEE standard glossary of software engineering terminology,” IEEE Std 610.12-1990, pp. 1–84, 1990. doi: 10.1109/IEEESTD.1990.101064
[2] “Therac-25,” Ethics Unwrapped, https://ethicsunwrapped.utexas.edu/case-study/therac-25 (accessed Dec. 15, 2023).
[3] M. M. Imran, A. Ciborowska, and K. Damevski, “Automatically selecting follow-up questions for deficient bug reports,” in 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), IEEE, 2021, pp. 167–178.
[4] B. Xu, Z. Xing, X. Xia, and D. Lo, “AnswerBot: Automated generation of answer summary to developers' technical questions,” in 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2017, pp. 706–716.
[5] R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry, “Improving bug localization using structured information retrieval,” in 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2013, pp. 345–355.

CHAT-GPT
