Slide 1

Slide 1 text

Defect report classification in accordance with areas of testing
Anna Gromova, Exactpro
Open Access Quality Assurance & Related Software Development for Financial Markets
Tel: +7 495 640 24 60, +1 415 830 38 49
www.exactpro.com

Slide 2

Slide 2 text

Defect Management
Areas of research in defect management:
• automatic defect fixing
• automatic defect detection
• metrics and predictions of defect reports
• quality of defect reports
• triaging defect reports

Slide 3

Slide 3 text

Metrics of testing
Examples of metrics:
• time to fix / time to resolve
• which defects get reopened
• which defects get fixed
• which defects get rejected

Slide 4

Slide 4 text

Area of testing: Component/s and Summary

Slide 5

Slide 5 text

Contribution
● Manual classification of 2,795 defect reports extracted from the bug tracking system.
● Answers to the following questions, based on this classification and natural language processing:
1. Does feature selection improve defect classification?
2. Which combinations of classifiers and feature selection methods give the best results?

Slide 6

Slide 6 text

Classification: related work
Text categorization has been used to solve the following tasks:
● classifying defects by features such as the type of issue, security or the configuration aspect;
● predicting which developer should be assigned to fix the bug;
● predicting the category of the software component connected to the defect, etc.

Slide 7

Slide 7 text

Techniques: preprocessing
● Natural language processing:
❖ Tokenization
❖ Removal of stop-words
❖ Stemming
● Bag of words (TF-IDF):
$$\mathrm{TF}(t,d) = \frac{\mathrm{freq}(t,d)}{\max_{w \in d} \mathrm{freq}(w,d)}$$
$$\mathrm{IDF}(t,D) = \log_2 \frac{|D|}{|\{d \in D : t \in d\}|}$$
$$\mathrm{TFIDF}(t,d,D) = \mathrm{TF}(t,d) \times \mathrm{IDF}(t,D)$$
where:
$\mathrm{freq}(t,d)$ is the term frequency, i.e. the number of times that term $t$ occurs in document $d$;
$\max_{w \in d} \mathrm{freq}(w,d)$ is the maximum frequency of any term in document $d$;
$|\{d \in D : t \in d\}|$ is the number of documents containing $t$;
$|D|$ is the total number of documents in the corpus.
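The TF-IDF weighting above can be sketched in a few lines of Python. This is only an illustration of the formulas on this slide, not the tooling used in the study: the function name `tf_idf` and the sample token lists are hypothetical, and the documents are assumed to be already tokenized, stripped of stop-words and stemmed.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (assumed to be already preprocessed).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        freq = Counter(tokens)
        # Normalize by the most frequent term of this document.
        max_freq = max(freq.values()) if freq else 1
        weights.append({
            term: (count / max_freq) * math.log2(n_docs / doc_freq[term])
            for term, count in freq.items()
        })
    return weights

# Hypothetical defect summaries, for illustration only.
docs = [
    ["order", "reject", "gateway"],
    ["order", "cancel", "gateway", "gateway"],
    ["report", "field", "missing"],
]
weights = tf_idf(docs)
```

A term that occurs in every document gets IDF = log2(1) = 0, so it carries no weight regardless of how often it appears.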

Slide 8

Slide 8 text

Techniques: feature selection
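Information gain is one of the feature selection methods named in the conclusions. A minimal sketch of the idea, assuming binary term presence as the feature and a binary area label; the function names and the toy data are hypothetical and do not reproduce the study's setup:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Drop in label entropy from knowing whether `term` occurs in a document."""
    with_term = [y for tokens, y in zip(docs, labels) if term in tokens]
    without_term = [y for tokens, y in zip(docs, labels) if term not in tokens]
    conditional = sum(
        (len(part) / len(labels)) * entropy(part)
        for part in (with_term, without_term) if part
    )
    return entropy(labels) - conditional

# Hypothetical data: 1 = the report belongs to the area, 0 = it does not.
docs = [
    ["order", "gateway"],
    ["cancel", "gateway"],
    ["order", "report"],
    ["field", "report"],
]
labels = [1, 1, 0, 0]

ig_gateway = information_gain(docs, labels, "gateway")
ig_order = information_gain(docs, labels, "order")
```

Here "gateway" separates the two classes perfectly, while "order" occurs equally often in both and tells the classifier nothing. Ranking terms by this score and keeping only the top ones is the usual way such a filter is applied.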

Slide 9

Slide 9 text

Techniques
Classifiers:
● Logistic regression
● SVM
● Decision tree
● Random forest
● Naive Bayes
● Bayes Net

Slide 10

Slide 10 text

Objects

Slide 11

Slide 11 text

Example
[Diagram labels, in slide order: CR; T1: Property1 = true; T2: Property1 = true; Market Structure Document; Ti: Property1 = false; Current situation; Market Structure Gateway; T1: Property1 = true; T1: Property1 = NULL]

Slide 12

Slide 12 text

Approach

Slide 13

Slide 13 text

Results: metrics
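The result tables report the F-measure per area. The formula is not preserved in this transcript, so the sketch below assumes the standard definition (the harmonic mean of precision and recall for the positive class); the function name and the sample labels are hypothetical.

```python
def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1) of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: the score is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = the report belongs to the area, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f_measure(y_true, y_pred)
```

With two true positives, one false positive and one false negative, precision and recall are both 2/3, and so is the F-measure.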

Slide 14

Slide 14 text

Results: hold out
F-measure for each classifier, feature selection (FS) method and area of testing.

Classifier  FS    AREA 1  AREA 2  AREA 3  AREA 4  AREA 5  AREA 6  AREA 7  AREA 8
LogReg      No    0.745   0.404   0.758   0.905   0.8     0.892   0.964   0.877
SVM         No    0.741   0       0.389   0.852   0.389   0.723   0.914   0.864
J48         No    0.898   0.832   0.739   0.953   0.931   0.955   0.991   0.952
RandFor     No    0.771   0.628   0.667   0.928   0.867   0.874   0.935   0.968
Bnet        No    0.716   0.864   0.764   0.912   0.92    0.862   0.982   0.917
Bayes       No    0.68    0.628   0.647   0.847   0.779   0.777   0.956   0.867
LogReg      IG    0.907   0.811   0.764   0.883   0.88    0.922   0.894   0.916
SVM         IG    0.948   0.862   0.836   0.924   0.938   0.95    0.991   0.938
J48         IG    0.822   0.867   0.739   0.943   0.931   0.955   0.991   0.973
RandFor     IG    0.959   0.887   0.897   0.938   0.948   0.936   0.991   0.98
Bnet        IG    0.716   0.864   0.764   0.912   0.92    0.862   0.982   0.917
Bayes       IG    0.701   0.633   0.688   0.846   0.815   0.784   0.956   0.861
LogReg      Cons  0.909   0.86    0.915   0.952   0.938   0.964   0.991   0.973
SVM         Cons  0.95    0.87    0.885   0.953   0.938   0.964   0.991   0.976
J48         Cons  0.804   0.829   0.739   0.921   0.931   0.955   0.991   0.902
RandFor     Cons  0.939   0.877   0.9     0.95    0.945   0.964   0.991   0.991
Bnet        Cons  0.86    0.862   0.792   0.941   0.939   0.964   0.991   0.962
Bayes       Cons  0.816   0.752   0.733   0.892   0.935   0.955   0.991   0.929
LogReg      Cfs   0.88    0.811   0.83    0.921   0.93    0.915   0.991   0.912
SVM         Cfs   0.941   0.862   0.836   0.915   0.938   0.936   0.957   0.91
J48         Cfs   0.821   0.821   0.739   0.916   0.931   0.931   0.991   0.838
RandFor     Cfs   0.941   0.842   0.815   0.93    0.938   0.936   0.991   0.918
Bnet        Cfs   0.782   0.862   0.815   0.926   0.945   0.847   0.982   0.903
Bayes       Cfs   0.714   0.782   0.881   0.914   0.925   0.8     0.991   0.889
LogReg      SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
SVM         SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
J48         SSF   0.821   0.829   0.739   0.916   0.931   0.955   0.991   0.894
RandFor     SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
Bnet        SSF   0.86    0.862   0.836   0.916   0.938   0.955   0.991   0.962
Bayes       SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.928

The red values correspond to the minimum values of the F-measure, the green values to the maximum.

Slide 15

Slide 15 text

Results: hold out

Slide 16

Slide 16 text

Results: cross-validation
F-measure for each classifier, feature selection (FS) method and area of testing.

Classifier  FS    AREA 1  AREA 2  AREA 3  AREA 4  AREA 5  AREA 6  AREA 7  AREA 8
LogReg      No    0.724   0.654   0.464   0.837   0.618   0.875   0.967   0.915
SVM         No    0.748   0.052   0.726   0.873   0.563   0.86    0.949   0.877
J48         No    0.925   0.821   0.743   0.925   0.927   0.963   0.991   0.957
RandFor     No    0.813   0.687   0.721   0.93    0.875   0.941   0.975   0.948
Bnet        No    0.717   0.856   0.691   0.913   0.89    0.911   0.982   0.911
Bayes       No    0.718   0.7     0.654   0.853   0.789   0.814   0.969   0.841
LogReg      IG    0.856   0.785   0.789   0.881   0.882   0.852   0.991   0.879
SVM         IG    0.948   0.854   0.825   0.933   0.954   0.971   0.991   0.943
J48         IG    0.931   0.868   0.752   0.947   0.944   0.969   0.991   0.957
RandFor     IG    0.954   0.859   0.918   0.939   0.943   0.964   0.985   0.974
Bnet        IG    0.717   0.856   0.691   0.913   0.818   0.911   0.982   0.911
Bayes       IG    0.718   0.776   0.631   0.849   0.89    0.827   0.973   0.844
LogReg      Cons  0.934   0.833   0.914   0.948   0.948   0.974   0.991   0.969
SVM         Cons  0.946   0.844   0.914   0.954   0.954   0.976   0.991   0.965
J48         Cons  0.931   0.809   0.789   0.923   0.934   0.968   0.991   0.952
RandFor     Cons  0.942   0.837   0.92    0.95    0.951   0.975   0.991   0.975
Bnet        Cons  0.818   0.855   0.757   0.946   0.93    0.975   0.991   0.964
Bayes       Cons  0.811   0.773   0.78    0.882   0.891   0.937   0.991   0.935
LogReg      Cfs   0.921   0.831   0.872   0.931   0.939   0.951   0.982   0.915
SVM         Cfs   0.941   0.844   0.841   0.937   0.952   0.962   0.982   0.92
J48         Cfs   0.933   0.791   0.748   0.917   0.933   0.963   0.991   0.905
RandFor     Cfs   0.929   0.858   0.88    0.938   0.949   0.958   0.988   0.922
Bnet        Cfs   0.797   0.856   0.815   0.931   0.93    0.935   0.988   0.903
Bayes       Cfs   0.739   0.78    0.865   0.909   0.912   0.879   0.988   0.849
LogReg      SSF   0.924   0.856   0.836   0.916   0.942   0.968   0.991   0.96
SVM         SSF   0.924   0.849   0.836   0.917   0.941   0.968   0.991   0.96
J48         SSF   0.927   0.794   0.748   0.917   0.933   0.968   0.991   0.942
RandFor     SSF   0.924   0.849   0.841   0.916   0.942   0.968   0.991   0.958
Bnet        SSF   0.866   0.856   0.823   0.915   0.942   0.968   0.991   0.958
Bayes       SSF   0.924   0.85    0.841   0.916   0.938   0.968   0.991   0.957

The red values correspond to the minimum values of the F-measure, the green values to the maximum.

Slide 17

Slide 17 text

Results: cross-validation

Slide 18

Slide 18 text

Results: hold-out vs cross-validation
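The difference between the two evaluation schemes can be sketched as follows. The slides do not state the split ratio or the number of folds, so the 30% test share and k = 10 below are assumptions, as are the function names; only the data set size of 2,795 reports comes from the presentation.

```python
import random

def hold_out_split(n_items, test_fraction=0.3, seed=0):
    """Hold-out: one random split into a training part and a test part."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_splits(n_items, k=10, seed=0):
    """k-fold cross-validation: every item is used for testing exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

n_reports = 2795  # size of the manually classified data set
train, test = hold_out_split(n_reports)
splits = list(k_fold_splits(n_reports, k=10))
```

Hold-out yields a single score per classifier, while cross-validation averages the score over k train/test rounds, which makes it less sensitive to one lucky or unlucky split.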

Slide 19

Slide 19 text

Conclusions
1. Manual classification of 2,795 defect reports extracted from the bug tracking system according to the area of testing.
2. Building classifiers for each area using different machine learning and natural language processing techniques.
Methods of feature selection: information gain, the consistency-based and correlation-based methods, and the simplified silhouette filter.
Methods of classification: logistic regression, support vector machines, decision tree, random forest, Bayes net and Naive Bayes.
❖ Feature selection is an integral part of a successful classification process.
❖ The following combinations of classifiers and feature selection methods give the best results with both ways of splitting the data set (hold-out and cross-validation):
- random forest and information gain;
- random forest and the consistency-based method;
- support vector machines and information gain;
- support vector machines and the consistency-based method.

Slide 20

Slide 20 text

Future work
● Clustering of defect reports
● Prediction of the "which defects get reopened" metric

Slide 21

Slide 21 text

Thank you!
