TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow

Defect Report Classification in Accordance with Areas of Testing
Anna Gromova, Exactpro

For video follow the link: https://youtu.be/UQwLbSnV_qU

Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa

Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro

Exactpro

March 23, 2017

Transcript

  1. Defect report classification in accordance with areas of testing

    Anna Gromova, Exactpro
    Open Access Quality Assurance & Related Software Development for Financial Markets
    Tel: +7 495 640 2460, +1 415 830 38 49
    www.exactpro.com
  2. Defect Management

    Areas of research in defect management:
    • automatic defect fixing
    • automatic defect detection
    • metrics and predictions of defect reports
    • quality of defect reports
    • triaging defect reports
  3. Metrics of testing

    Examples of metrics:
    • time to fix / time to resolve
    • which defects get reopened
    • which defects get fixed
    • which defects get rejected
  4. Area of testing: Component/s and Summary
  5. Contribution

    • Manual classification of 2,795 defect reports extracted from the bug tracking system.
    • Answers to the following questions, based on that classification and natural language processing:
      1. Does feature selection improve defect classification?
      2. Which combinations of classifiers and feature selection methods give the best results?
  6. Classification: related work

    Text categorization makes it possible to solve the following tasks:
    • classifying defects by different features, such as the type of issue, security or the configuration aspect;
    • predicting which developer should be assigned to fix the bug;
    • predicting the category of the software component that is connected to the defect, etc.
  7. Techniques: preprocessing

    • Natural language processing:
      ❖ Tokenization
      ❖ Removal of stop-words
      ❖ Stemming
    • Bag of words (TF-IDF)

    $$\mathrm{TF}(t,d) = \frac{\mathrm{freq}(t,d)}{\max_{w \in d} \mathrm{freq}(w,d)}$$

    $$\mathrm{IDF}(t,D) = \log_2 \frac{|D|}{|\{d \in D : t \in d\}|}$$

    $$\mathrm{TFIDF}(t,d,D) = \mathrm{TF}(t,d) \times \mathrm{IDF}(t,D)$$

    where:
    • freq(t,d) is the term frequency, i.e. the number of times that term t occurs in document d;
    • max_{w∈d} freq(w,d) is the maximum frequency of any term in document d;
    • |{d∈D : t∈d}| is the number of documents containing t;
    • |D| is the total number of documents in the corpus D.
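    The preprocessing steps and the TF-IDF weighting on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used in the study: the stop-word list, the crude suffix-stripping `stem` helper (a stand-in for a real stemmer) and the sample defect summaries are all assumptions made for the example.

    ```python
    import math
    import re
    from collections import Counter

    # Tiny illustrative stop-word list (assumption; real lists are much longer).
    STOP_WORDS = {"the", "a", "is", "in", "on", "to", "of", "and"}


    def stem(token):
        """Crude suffix stripping, standing in for a real stemming algorithm."""
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                return token[: -len(suffix)]
        return token


    def preprocess(text):
        """Tokenize, drop stop-words, then stem the remaining tokens."""
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return [stem(t) for t in tokens if t not in STOP_WORDS]


    def tf(term, doc_tokens):
        """freq(t, d) divided by the frequency of the most common term in d."""
        counts = Counter(doc_tokens)
        if not counts:
            return 0.0
        return counts[term] / max(counts.values())


    def idf(term, corpus):
        """log2 of (number of documents / number of documents containing t)."""
        containing = sum(1 for doc in corpus if term in doc)
        if containing == 0:
            return 0.0
        return math.log2(len(corpus) / containing)


    def tfidf(term, doc_tokens, corpus):
        return tf(term, doc_tokens) * idf(term, corpus)


    # Hypothetical defect summaries, invented for this example.
    reports = [
        "Gateway rejects orders after restart",
        "Order rejected by the gateway",
        "Market data feed is delayed",
    ]
    corpus = [preprocess(r) for r in reports]
    weights = {t: tfidf(t, corpus[0], corpus) for t in set(corpus[0])}
    ```

    Terms that occur in few reports (such as "restart" here) receive a higher weight than terms shared by several reports (such as "gateway").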
  8. Techniques: feature selection
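    Information gain is one of the feature selection methods named in the conclusions of this deck. A minimal sketch of how a term can be scored by information gain for a binary "belongs to this area" label is shown here. The function names, the toy documents and the labels are assumptions made for illustration; the deck does not describe the tooling or parameters actually used.

    ```python
    import math
    from collections import Counter


    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        total = len(labels)
        if total == 0:
            return 0.0
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())


    def information_gain(term, docs, labels):
        """Reduction in label entropy from knowing whether `term` occurs."""
        with_term = [y for d, y in zip(docs, labels) if term in d]
        without_term = [y for d, y in zip(docs, labels) if term not in d]
        n = len(labels)
        conditional = (len(with_term) / n) * entropy(with_term) \
            + (len(without_term) / n) * entropy(without_term)
        return entropy(labels) - conditional


    def select_top_terms(docs, labels, k):
        """Keep the k terms with the highest information gain."""
        vocabulary = sorted({t for d in docs for t in d})
        ranked = sorted(vocabulary,
                        key=lambda t: information_gain(t, docs, labels),
                        reverse=True)
        return ranked[:k]


    # Toy data: each document is a set of terms; the label says whether the
    # report belongs to a hypothetical "gateway" area of testing.
    docs = [{"gateway", "order"}, {"gateway", "reject"},
            {"market", "data"}, {"market", "feed"}]
    labels = [True, True, False, False]
    top_terms = select_top_terms(docs, labels, k=2)
    ```

    In this toy set, "gateway" and "market" each separate the two classes perfectly, so they score the maximum gain of 1 bit and are the two terms retained.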
  9. Techniques

    Classifiers:
    • Logistic regression
    • SVM
    • Decision tree
    • Random forest
    • Naive Bayes
    • Bayes Net
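    To illustrate one of the listed classifiers, here is a minimal multinomial Naive Bayes text classifier with add-one smoothing. It is a sketch under assumptions: the deck does not say which Naive Bayes variant or toolkit was used, and the class name, training tokens and labels are invented for the example.

    ```python
    import math
    from collections import Counter


    class NaiveBayesTextClassifier:
        """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

        def fit(self, docs, labels):
            self.classes = sorted(set(labels))
            self.vocabulary = {t for d in docs for t in d}
            self.class_counts = Counter(labels)
            self.term_counts = {c: Counter() for c in self.classes}
            for tokens, label in zip(docs, labels):
                self.term_counts[label].update(tokens)
            return self

        def predict(self, tokens):
            total_docs = sum(self.class_counts.values())
            best_class, best_score = None, -math.inf
            for c in self.classes:
                # Log prior plus smoothed log likelihood of each known token.
                score = math.log(self.class_counts[c] / total_docs)
                denom = sum(self.term_counts[c].values()) + len(self.vocabulary)
                for t in tokens:
                    if t in self.vocabulary:
                        score += math.log((self.term_counts[c][t] + 1) / denom)
                if score > best_score:
                    best_class, best_score = c, score
            return best_class


    # Toy training set: True means the report belongs to a hypothetical
    # "gateway" area, False means it does not.
    train_docs = [
        ["gateway", "reject", "order"],
        ["gateway", "timeout"],
        ["market", "data", "delay"],
        ["market", "feed", "stale"],
    ]
    train_labels = [True, True, False, False]

    model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
    in_area = model.predict(["gateway", "order", "delay"])
    ```

    One such binary model per area matches the "classifier for each area" setup described in the conclusions.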
  10. Objects
  11. Example

    CR T1: Property1 = true T2: Property1 = true Market Structure Document Ti: Property1 = false Current situation Market Structure Gateway T1: Property1 = true T1: Property1 = NULL
  12. Approach
  13. Results: metrics
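    The result tables in this deck report the F-measure for each area. As a reminder of how that metric is derived from precision and recall, a small sketch follows. The deck does not state the value of beta, so the balanced F1 (beta = 1) is assumed here, and the sample labels are invented.

    ```python
    def f_measure(y_true, y_pred, beta=1.0):
        """F-measure for a binary task; returns 0.0 when it is undefined."""
        tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            return 0.0
        b2 = beta * beta
        return (1 + b2) * precision * recall / (b2 * precision + recall)


    # Hypothetical labels: 1 means "report belongs to the area".
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    score = f_measure(y_true, y_pred)  # precision = recall = 2/3
    ```

    With this convention, a classifier that never predicts the positive class gets an F-measure of 0.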
  14. Results: hold-out

    All values are F-measures. On the slide, red values mark the minimum F-measure and green values the maximum.

    | Classifier | FS | Area 1 | Area 2 | Area 3 | Area 4 | Area 5 | Area 6 | Area 7 | Area 8 |
    |---|---|---|---|---|---|---|---|---|---|
    | LogReg | No | 0.745 | 0.404 | 0.758 | 0.905 | 0.8 | 0.892 | 0.964 | 0.877 |
    | SVM | No | 0.741 | 0 | 0.389 | 0.852 | 0.389 | 0.723 | 0.914 | 0.864 |
    | J48 | No | 0.898 | 0.832 | 0.739 | 0.953 | 0.931 | 0.955 | 0.991 | 0.952 |
    | RandFor | No | 0.771 | 0.628 | 0.667 | 0.928 | 0.867 | 0.874 | 0.935 | 0.968 |
    | Bnet | No | 0.716 | 0.864 | 0.764 | 0.912 | 0.92 | 0.862 | 0.982 | 0.917 |
    | Bayes | No | 0.68 | 0.628 | 0.647 | 0.847 | 0.779 | 0.777 | 0.956 | 0.867 |
    | LogReg | IG | 0.907 | 0.811 | 0.764 | 0.883 | 0.88 | 0.922 | 0.894 | 0.916 |
    | SVM | IG | 0.948 | 0.862 | 0.836 | 0.924 | 0.938 | 0.95 | 0.991 | 0.938 |
    | J48 | IG | 0.822 | 0.867 | 0.739 | 0.943 | 0.931 | 0.955 | 0.991 | 0.973 |
    | RandFor | IG | 0.959 | 0.887 | 0.897 | 0.938 | 0.948 | 0.936 | 0.991 | 0.98 |
    | Bnet | IG | 0.716 | 0.864 | 0.764 | 0.912 | 0.92 | 0.862 | 0.982 | 0.917 |
    | Bayes | IG | 0.701 | 0.633 | 0.688 | 0.846 | 0.815 | 0.784 | 0.956 | 0.861 |
    | LogReg | Cons | 0.909 | 0.86 | 0.915 | 0.952 | 0.938 | 0.964 | 0.991 | 0.973 |
    | SVM | Cons | 0.95 | 0.87 | 0.885 | 0.953 | 0.938 | 0.964 | 0.991 | 0.976 |
    | J48 | Cons | 0.804 | 0.829 | 0.739 | 0.921 | 0.931 | 0.955 | 0.991 | 0.902 |
    | RandFor | Cons | 0.939 | 0.877 | 0.9 | 0.95 | 0.945 | 0.964 | 0.991 | 0.991 |
    | Bnet | Cons | 0.86 | 0.862 | 0.792 | 0.941 | 0.939 | 0.964 | 0.991 | 0.962 |
    | Bayes | Cons | 0.816 | 0.752 | 0.733 | 0.892 | 0.935 | 0.955 | 0.991 | 0.929 |
    | LogReg | Cfs | 0.88 | 0.811 | 0.83 | 0.921 | 0.93 | 0.915 | 0.991 | 0.912 |
    | SVM | Cfs | 0.941 | 0.862 | 0.836 | 0.915 | 0.938 | 0.936 | 0.957 | 0.91 |
    | J48 | Cfs | 0.821 | 0.821 | 0.739 | 0.916 | 0.931 | 0.931 | 0.991 | 0.838 |
    | RandFor | Cfs | 0.941 | 0.842 | 0.815 | 0.93 | 0.938 | 0.936 | 0.991 | 0.918 |
    | Bnet | Cfs | 0.782 | 0.862 | 0.815 | 0.926 | 0.945 | 0.847 | 0.982 | 0.903 |
    | Bayes | Cfs | 0.714 | 0.782 | 0.881 | 0.914 | 0.925 | 0.8 | 0.991 | 0.889 |
    | LogReg | SSF | 0.923 | 0.87 | 0.836 | 0.916 | 0.938 | 0.955 | 0.991 | 0.962 |
    | SVM | SSF | 0.923 | 0.87 | 0.836 | 0.916 | 0.938 | 0.955 | 0.991 | 0.962 |
    | J48 | SSF | 0.821 | 0.829 | 0.739 | 0.916 | 0.931 | 0.955 | 0.991 | 0.894 |
    | RandFor | SSF | 0.923 | 0.87 | 0.836 | 0.916 | 0.938 | 0.955 | 0.991 | 0.962 |
    | Bnet | SSF | 0.86 | 0.862 | 0.836 | 0.916 | 0.938 | 0.955 | 0.991 | 0.962 |
    | Bayes | SSF | 0.923 | 0.87 | 0.836 | 0.916 | 0.938 | 0.955 | 0.991 | 0.928 |
  15. Results: hold-out
  16. Results: cross-validation

    All values are F-measures. On the slide, red values mark the minimum F-measure and green values the maximum.

    | Classifier | FS | Area 1 | Area 2 | Area 3 | Area 4 | Area 5 | Area 6 | Area 7 | Area 8 |
    |---|---|---|---|---|---|---|---|---|---|
    | LogReg | No | 0.724 | 0.654 | 0.464 | 0.837 | 0.618 | 0.875 | 0.967 | 0.915 |
    | SVM | No | 0.748 | 0.052 | 0.726 | 0.873 | 0.563 | 0.86 | 0.949 | 0.877 |
    | J48 | No | 0.925 | 0.821 | 0.743 | 0.925 | 0.927 | 0.963 | 0.991 | 0.957 |
    | RandFor | No | 0.813 | 0.687 | 0.721 | 0.93 | 0.875 | 0.941 | 0.975 | 0.948 |
    | Bnet | No | 0.717 | 0.856 | 0.691 | 0.913 | 0.89 | 0.911 | 0.982 | 0.911 |
    | Bayes | No | 0.718 | 0.7 | 0.654 | 0.853 | 0.789 | 0.814 | 0.969 | 0.841 |
    | LogReg | IG | 0.856 | 0.785 | 0.789 | 0.881 | 0.882 | 0.852 | 0.991 | 0.879 |
    | SVM | IG | 0.948 | 0.854 | 0.825 | 0.933 | 0.954 | 0.971 | 0.991 | 0.943 |
    | J48 | IG | 0.931 | 0.868 | 0.752 | 0.947 | 0.944 | 0.969 | 0.991 | 0.957 |
    | RandFor | IG | 0.954 | 0.859 | 0.918 | 0.939 | 0.943 | 0.964 | 0.985 | 0.974 |
    | Bnet | IG | 0.717 | 0.856 | 0.691 | 0.913 | 0.818 | 0.911 | 0.982 | 0.911 |
    | Bayes | IG | 0.718 | 0.776 | 0.631 | 0.849 | 0.89 | 0.827 | 0.973 | 0.844 |
    | LogReg | Cons | 0.934 | 0.833 | 0.914 | 0.948 | 0.948 | 0.974 | 0.991 | 0.969 |
    | SVM | Cons | 0.946 | 0.844 | 0.914 | 0.954 | 0.954 | 0.976 | 0.991 | 0.965 |
    | J48 | Cons | 0.931 | 0.809 | 0.789 | 0.923 | 0.934 | 0.968 | 0.991 | 0.952 |
    | RandFor | Cons | 0.942 | 0.837 | 0.92 | 0.95 | 0.951 | 0.975 | 0.991 | 0.975 |
    | Bnet | Cons | 0.818 | 0.855 | 0.757 | 0.946 | 0.93 | 0.975 | 0.991 | 0.964 |
    | Bayes | Cons | 0.811 | 0.773 | 0.78 | 0.882 | 0.891 | 0.937 | 0.991 | 0.935 |
    | LogReg | Cfs | 0.921 | 0.831 | 0.872 | 0.931 | 0.939 | 0.951 | 0.982 | 0.915 |
    | SVM | Cfs | 0.941 | 0.844 | 0.841 | 0.937 | 0.952 | 0.962 | 0.982 | 0.92 |
    | J48 | Cfs | 0.933 | 0.791 | 0.748 | 0.917 | 0.933 | 0.963 | 0.991 | 0.905 |
    | RandFor | Cfs | 0.929 | 0.858 | 0.88 | 0.938 | 0.949 | 0.958 | 0.988 | 0.922 |
    | Bnet | Cfs | 0.797 | 0.856 | 0.815 | 0.931 | 0.93 | 0.935 | 0.988 | 0.903 |
    | Bayes | Cfs | 0.739 | 0.78 | 0.865 | 0.909 | 0.912 | 0.879 | 0.988 | 0.849 |
    | LogReg | SSF | 0.924 | 0.856 | 0.836 | 0.916 | 0.942 | 0.968 | 0.991 | 0.96 |
    | SVM | SSF | 0.924 | 0.849 | 0.836 | 0.917 | 0.941 | 0.968 | 0.991 | 0.96 |
    | J48 | SSF | 0.927 | 0.794 | 0.748 | 0.917 | 0.933 | 0.968 | 0.991 | 0.942 |
    | RandFor | SSF | 0.924 | 0.849 | 0.841 | 0.916 | 0.942 | 0.968 | 0.991 | 0.958 |
    | Bnet | SSF | 0.866 | 0.856 | 0.823 | 0.915 | 0.942 | 0.968 | 0.991 | 0.958 |
    | Bayes | SSF | 0.924 | 0.85 | 0.841 | 0.916 | 0.938 | 0.968 | 0.991 | 0.957 |
  17. Results: cross-validation
  18. Results: hold-out vs cross-validation
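    Hold-out and cross-validation differ only in how the labelled reports are divided into training and test sets. The sketch below shows both schemes on index lists. The deck does not give the split ratio or the number of folds, so the 30% test fraction and k = 10 used here are assumptions, as are the function names.

    ```python
    import random


    def hold_out_split(n_items, test_fraction=0.3, seed=0):
        """Single random split into (train_indices, test_indices)."""
        indices = list(range(n_items))
        random.Random(seed).shuffle(indices)
        n_test = int(n_items * test_fraction)
        return indices[n_test:], indices[:n_test]


    def k_fold_splits(n_items, k=10, seed=0):
        """Yield k (train_indices, test_indices) pairs; each item is tested once."""
        indices = list(range(n_items))
        random.Random(seed).shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, test


    # 2,795 is the number of manually classified reports stated in the deck.
    train_idx, test_idx = hold_out_split(2795)
    cv_splits = list(k_fold_splits(2795, k=10))
    ```

    With hold-out, each classifier is scored once on a single test set; with k-fold cross-validation, the F-measure would typically be averaged over the k test folds.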
  19. Conclusions

    1. Manual classification of 2,795 defect reports extracted from the bug tracking system according to the area of testing.
    2. Building classifiers for each area using different machine learning and natural language processing techniques.
       Methods of feature selection: information gain, the consistency-based and correlation-based methods, and the simplified silhouette filter.
       Methods of classification: logistic regression, support vector machines, decision tree, random forest, Bayes net and Naive Bayes.

    ❖ Feature selection is an integral part of a successful classification process.
    ❖ The following combinations of classifiers and feature selection methods give the best results under both types of set division (hold-out and cross-validation):
      - random forest and information gain;
      - random forest and the consistency-based method;
      - support vector machines and information gain;
      - support vector machines and the consistency-based method.
  20. Future work

    • Clustering of defect reports
    • Prediction of the metric called “which defects get reopened”
  21. Thank you!
  22. Related work

    • Antoniol G., Ayari K., Di Penta M., Khomh F., Guéhéneuc Y.-G.: Is it a bug or an enhancement?: A text-based approach to classify change requests. In Proc. 2008 Conf. Center for Adv. Studies Collaborative Res.: Meeting Minds, 2008, ser. CASCON 08, Article No. 23. New York, NY, USA: ACM, 304–318
    • Xia X., Lo D., Qiu W., Wang B., Zhou B.: Automated Configuration Bug Report Prediction Using Text Mining. In 2014 IEEE 38th Annual Computer Software and Applications Conference, 2014, 107–116
    • Gegick M., Rotella P., Xie T.: Identifying security bug reports via text mining: An industrial case study. In Proc. 7th IEEE Working Conf. Mining Software Repositories (MSR), May 2010, IEEE Computer Society, 11–20
    • Zhou Y., Tong Y., Gu R., Gall H.C.: Combining Text Mining and Data Mining for Bug Report Classification. In Proc. of 30th International Conference on Software Maintenance and Evolution (ICSM/ICSME), IEEE, 2014, 311–320
    • Somasundaram K., Murphy G.C.: Automatic categorization of bug reports using latent Dirichlet allocation. In Proc. of the 5th India Software Engineering Conference, ISEC’12, New York, 2012, ACM, 125–130
    • Cubranic D., Murphy G.C.: Automatic bug triage using text categorization. In Proc. 16th Int. Conf. Software Eng. Knowledge Eng., KSI Press, 2004, 92–97
    • Sureka A., Indukuri K.V.: Linguistic analysis of bug report titles with respect to the dimension of bug importance. In Proceedings of the Third Annual ACM Bangalore Conference, Article No. 9, ACM, 2010, 1–6