Analysis of the characteristics and causes of underestimated bug reports

MACSPro'2019 - Modeling and Analysis of Complex Systems and Processes, Vienna
21 - 23 March 2019

Anna Gromova

Conference website http://macspro.club/

Website https://exactpro.com/
Linkedin https://www.linkedin.com/company/exactpro-systems-llc
Instagram https://www.instagram.com/exactpro/
Twitter https://twitter.com/exactpro
Facebook https://www.facebook.com/exactpro/
Youtube Channel https://www.youtube.com/c/exactprosystems

Exactpro

March 22, 2019

Transcript

  1. Underestimated Bug Reports

    Submit bug report → Close bug report (Resolution: Rejected, Won’t Fix, Not a defect, Non-issue) → Reopen bug report → Close bug report (Resolution: Done, Fixed)
  2. Contribution

    - Revealing the special characteristics of underestimated bug reports.
    - Proposing to use different methods of feature selection and ranking for determining the most significant terms of underestimated bug reports.
    - Conducting an analytical study to investigate the potential causes of the initial resolution of such bug reports via the most significant terms.
  3. Understanding the Nature of Different Types of Defects

    • invalid bug reports
    • high-impact bug reports
    • reopened bug reports
    • security and non-security bug reports
    • defects and non-defects
  4. Considered Datasets

    • Type 1: no change of resolution
    • Type 2: change of resolution

    Close bug report. Resolution: Done, Fixed
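The split into the two datasets can be sketched as a small labelling rule. This is only an illustration under stated assumptions: the input format (a chronological list of resolutions per report), the function name `report_type` and the sample keys are hypothetical, and the grouping of resolutions into "rejecting" and "fixing" sets is inferred from the slides rather than taken from the study's code.

```python
# Resolutions named on the slides; grouping them into two sets is an assumption.
REJECTING = {"Rejected", "Won't Fix", "Not a defect", "Non-issue"}
FIXING = {"Done", "Fixed"}


def report_type(resolution_history):
    """Return 1 or 2 for a report finally closed as Done/Fixed, else None.

    resolution_history: resolutions in chronological order (hypothetical format).
    """
    if not resolution_history or resolution_history[-1] not in FIXING:
        return None  # not part of either dataset
    earlier = resolution_history[:-1]
    if any(resolution in REJECTING for resolution in earlier):
        return 2  # resolution changed: rejected first, fixed later
    return 1  # no change of resolution


# Hypothetical sample reports.
examples = {
    "BUG-1": ["Fixed"],
    "BUG-2": ["Won't Fix", "Fixed"],
    "BUG-3": ["Rejected"],
}
labels = {key: report_type(history) for key, history in examples.items()}
print(labels)
```

Under this rule, type 2 reports are the underestimated ones: they were initially dismissed and only later closed as fixed.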
  5. Metrics of Comparison

    - time to resolve
    - count of comments and count of attachments
    - percentage of “Critical” and “Blocker” priority
    - the length of description
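A minimal sketch of how such metrics might be computed for one group of reports. The records, field names and helper functions below are hypothetical, and measuring time to resolve in days is an assumption; the slides do not state the unit.

```python
from datetime import date
from statistics import mean

# Hypothetical records; field names are illustrative, not from the datasets.
reports = [
    {"created": date(2018, 1, 1), "resolved": date(2018, 1, 11), "comments": 3,
     "attachments": 0, "priority": "Critical", "description": "Server fails on startup"},
    {"created": date(2018, 2, 1), "resolved": date(2018, 2, 1), "comments": 0,
     "attachments": 1, "priority": "Major", "description": "Typo"},
    {"created": date(2018, 3, 1), "resolved": date(2018, 3, 21), "comments": 6,
     "attachments": 2, "priority": "Blocker", "description": "NPE in parser"},
]


def summarize(values):
    """Return (min, max, mean) with the mean rounded to three decimals."""
    values = list(values)
    return min(values), max(values), round(mean(values), 3)


def priority_share(items, level):
    """Percentage of reports whose priority equals `level`."""
    return 100 * sum(1 for r in items if r["priority"] == level) / len(items)


# Time to resolve, assumed here to be measured in days.
ttr_days = [(r["resolved"] - r["created"]).days for r in reports]

print("TTR min/max/mean:", summarize(ttr_days))
print("Comments min/max/mean:", summarize(r["comments"] for r in reports))
print("Attachments min/max/mean:", summarize(r["attachments"] for r in reports))
print("Blocker %:", round(priority_share(reports, "Blocker"), 1))
print("Mean description length:", round(mean(len(r["description"]) for r in reports), 3))
```

Running the same summary separately on type 1 and type 2 reports gives the kind of side-by-side comparison the metrics are meant for.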
  6. Objects

    | Metric | JBOSS | Jenkins | Sakai |
    | --- | --- | --- | --- |
    | Number of type 1 / Number of type 2 | 11848 / 117 | 14280 / 150 | 18763 / 183 |
    | Resolution of type 2 | Won’t Fix, Reject | Won’t fix, Not a defect | Won’t fix, Non-issue |
    | TTR of type 1 (min / max / mean) | 0 / 3086 / 52.619 | 0 / 3762 / 180.142 | 0 / 3486 / 86.898 |
    | TTR of type 2 (min / max / mean) | 0 / 1847 / 93.394 | 0 / 2791 / 367.353 | 0 / 4708 / 378.525 |
    | Count of comments of type 1 (min / max / mean) | 0 / 92 / 2.772 | 0 / 174 / 5.665 | 0 / 80 / 4.426 |
    | Count of comments of type 2 (min / max / mean) | 0 / 35 / 4.324 | 1 / 127 / 14.7 | 1 / 66 / 8.749 |
    | Count of attachments of type 1 (min / max / mean) | 0 / 22 / 0.324 | 0 / 20 / 0.415 | 0 / 47 / 0.639 |
    | Count of attachments of type 2 (min / max / mean) | 0 / 8 / 0.451 | 0 / 23 / 1.247 | 0 / 13 / 0.836 |
    | Percentage of Blocker / Critical of type 1 | 7% / 12% | 9% / 12% | 10% / 13% |
    | Percentage of Blocker / Critical of type 2 | 4% / 8% | 12% / 13% | 6% / 16% |
    | Mean description length of type 1 / type 2 | 4982.13 / 6669.768 | 2082.092 / 2550.617 | 971.113 / 726.3289 |
  7. Text Preprocessing

    • Natural language processing:
      ❖ Tokenization
      ❖ Removal of stop-words
      ❖ Stemming
    • Bag of words (TF-IDF)
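The preprocessing steps above can be sketched in plain Python. Everything here is a simplified stand-in under stated assumptions: the stop-word list is a tiny illustrative sample, `crude_stem` is a toy suffix stripper rather than a real stemmer such as Porter's, and the TF-IDF weighting uses the plain `tf * log(N / df)` form, which may differ from the variant used in the study.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "i", "is", "to", "of", "and", "with", "in", "it"}


def crude_stem(token):
    """Strip a few common suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())         # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [crude_stem(t) for t in tokens]               # stemming


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical bug report descriptions.
docs = [
    "The plugin fails with an error",
    "I recommend adding the password option",
    "Error appears in the log",
]
vectors = tf_idf(docs)
print(preprocess(docs[0]))
print(vectors[0])
```

Terms that occur in fewer documents receive a higher weight, which is why a rarer term outweighs a common one such as "error" within the same report.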
  8. The Top List of Significant Terms of JBOSS

    • Chi2: 'cast', 'materi', 'osgi', 'busi', 'constructor', 'bundl', 'vdb', 'network', 'comment', 'request', 'jar', 'spec', 'lookup', 'redirect', 'lot'
    • RFE: 'busi', 'capabl', 'connector', 'consequ', 'day', 'determin', 'download', 'end', 'facet', 'includ', 'later', 'long', 'network', 'osgi', 'perform'
    • Random Forest: 'import', 'Color', 'normal', 'classexternallink', 'lineheight', 'condit', 'fonttyl', 'error', 'file', 'event', 'like', 'comment', 'fonteight', 'Properti', 'consol'
    • Logistic regression: 'cast', 'request', 'event', 'jar', 'bundl', 'error', 'busi', 'osgi', 'open', 'materi', 'comment', 'constructor', 'connect', 'server', 'vdb'
  9. The Top List of Significant Terms of Jenkins

    • Chi2: 'stdout', 'ssl', 'dynam', 'testsuit', 'password', 'wrapper', 'certif', 'stderr', 'git', 'larg', 'emailtext', 'perforc', 'upstream', 'setup', 'gitssh'
    • RFE: 'avoid', 'capac', 'correspond', 'detect', 'ensur', 'general', 'head', 'increment', 'introduc', 'jdk', 'listen', 'previous', 'provis', 'servic', 'strang'
    • Random Forest: 'job', 'build', 'error', 'configur', 'password', 'run', 'use', 'document', 'need', 'testsuit', 'jenkin', 'log', 'poll', 'follow', 'long'
    • Logistic regression: 'git', 'server', 'password', 'stdout', 'error', 'setup', 'copi', 'document', 'perforc', 'dynam', 'upstream', 'way', 'avail', 'findbug', 'log'
  10. The Top List of Significant Terms of Sakai

    • Chi2: 'recommend', 'desir', 'idea', 'addit', 'retract', 'exit', 'pool', 'random', 'uniqu', 'person', 'app', 'font', 'portfolio', 'audio', 'edit'
    • RFE: 'administr', 'app', 'applic', 'breadcrumb', 'exit', 'explicit', 'role', 'process', 'recommend', 'retract', 'situat', 'stay', 'trunk', 'write', 'uniqu'
    • Random Forest: 'user', 'tool', 'appear', 'error', 'recommend', 'question', 'use', 'classexternallink', 'make', 'chang', 'info', 'click', 'screen', 'follow', 'list'
    • Logistic regression: 'recommend', 'appear', 'addit', 'edit', 'resourc', 'call', 'desir', 'pool', 'mean', 'entri', 'inform', 'idea', 'gradebook', 'creat', 'requir'
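To illustrate how a ranking such as the Chi2 lists might be produced, here is a minimal sketch of chi-square term ranking. It uses the textbook 2x2 test on term presence versus class, which is an assumption and not necessarily the variant used in the study; the function names and the toy data are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def rank_terms(documents, labels):
    """Rank terms by chi-square between term presence and a binary label.

    documents: list of token lists; labels: 1 for type 2 reports, 0 for type 1.
    """
    vocabulary = sorted({term for tokens in documents for term in tokens})
    positives = sum(labels)
    negatives = len(labels) - positives
    scores = {}
    for term in vocabulary:
        a = sum(1 for tokens, y in zip(documents, labels) if y == 1 and term in tokens)
        c = sum(1 for tokens, y in zip(documents, labels) if y == 0 and term in tokens)
        b = positives - a  # type 2 reports without the term
        d = negatives - c  # type 1 reports without the term
        scores[term] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, already preprocessed reports.
documents = [
    ["recommend", "idea"],
    ["recommend", "error"],
    ["error", "crash"],
    ["crash", "log"],
]
labels = [1, 1, 0, 0]
ranked = rank_terms(documents, labels)
for term, score in ranked:
    print(f"{term}: {score:.3f}")
```

The statistic is symmetric: a term scores highly whether it is tied to type 2 or to type 1 reports, so the direction of the association has to be checked separately.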
  11. Groups of Terms

    • “Prejudiced” terms: idea, recommend, desir, larg, strange, long, comment, etc.
    • “False friend” terms: error, perform, appear, document, configur, etc.
    • Domain-related terms: osgi, gitssh, retract, gradebook, bundl, etc.
  12. “Prejudiced” Terms

    • I think the idea is to…
    • I recommend…
    • it would be more desirable…
    • unprofessional comments…
    • goes through after long time…
    • large logfile…
    • there is something strange with
  13. “False Friend” Terms

    • The documentation claims that...
    • Add the name again and continue but the name does not appear…
    • When trying to perform operation, the exception is thrown
    • Some plugins fail to startup with the following error.
  14. Domain-related Terms

    • Current git-client-plugin does not work with latest git release and non-standard ssh ports…
    • The building of the classloader for a component is dependent on the order that files are returned from the filesystem. This can lead to unpredictable deployments where one node in a cluster will load different classes to another node.
  15. The Accuracy Comparison

    | Project | Without using feature selection methods | Chi2 | RFE | Random Forest | Logistic regression |
    | --- | --- | --- | --- | --- | --- |
    | JBOSS | 0.8 | 0.82 | 0.81 | 0.96 | 0.88 |
    | Jenkins | 0.77 | 0.83 | 0.82 | 0.98 | 0.88 |
    | Sakai | 0.69 | 0.7 | 0.77 | 0.96 | 0.88 |
  16. Conclusion

    • Revealing the specifics of such bug reports.
    • Using methods of feature selection and ranking: chi-square, recursive feature elimination, feature importances of a random forest and coefficients of logistic regression.
    • Proposing three groups of terms: “prejudiced” terms, “false friend” terms and domain-related terms.
  17. Future Work

    • Analysing each potentially problematic resolution separately.
    • Comparing underestimated bug reports with defects that have the final resolution of “Rejected”, “Won’t Fix”, etc.