
Exactpro
December 01, 2015

TMPA-2015: Towards a Usable Defect Prediction Tool: Crossbreeding Machine Learning and Heuristics

Towards a Usable Defect Prediction Tool: Crossbreeding Machine Learning and Heuristics
Vladimir Kovalenko, Galina Alperovich, JetBrains

12 - 14 November 2015
Tools and Methods of Program Analysis in St. Petersburg

Transcript

  1. Defect Prediction
    • Common goal: identify defect-prone entities in advance
    • Why?
      • QA
      • Resource allocation (testing, review, etc.)
  2. Research papers (common points)
    • ML defect prediction models work in general
    • No universal model: projects are too different
    • Code metrics as features improve prediction quality
    • Typical precision/recall ~0.7
  3. Google case study
    • Collaborated with researchers to introduce defect prediction in internal code review system
    • Came up with a heuristic model: Google Bug Prediction Score (Time Weighted Risk)
  4. Tools
    • No defect prediction tools known to be used in industry
    • Why?
      • Too low accuracy
      • Too much effort to set up
  5. Tool usability criteria
    • Language independent
    • “entity” >= file
    • Little or no effort to set up
      • no plain supervised learning
    • Near real-time
    • Easy to use: VCS agnostic, etc.
    • Accurate!
  6. Implementation
    • CI server plugin
    • Only use VCS metrics
    • Automatic bugfix changes detection (heuristics)
    • Processing: detect bug-introducing changes
    • ML classifier: Naive Bayes / Decision Tree
    • Take prediction top, not absolute values
    • Automatic quality evaluation
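
Two of the steps on this slide, heuristic bugfix detection and taking the top of the prediction rather than absolute values, can be sketched as follows. The slides do not say which heuristics the tool uses, so the keyword pattern here is purely hypothetical, and reading "prediction top" as "the k highest-ranked files" is an assumption.

```python
import re

# Hypothetical keyword list; a real tool would tune it per project.
BUGFIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect|regression)\b", re.IGNORECASE)


def looks_like_bugfix(message):
    """Guess from the commit message alone whether a change is a bugfix."""
    return BUGFIX_PATTERN.search(message) is not None


def top_k(scores, k):
    """Return the k paths with the highest score.

    Only the ranking is used; the score values themselves are discarded.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ranked[:k]]
```

Reporting a ranked shortlist sidesteps the question of how well calibrated the classifier's scores are, since only their relative order matters.
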
  7. Features
    • Local change frequency
    • Number of authors
    • File age
    • Number of affecting commits
    • Google Score
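
The VCS-only features on this slide could be derived from a commit log roughly as follows. The `Commit` record, the function name, and the 30-day window used for "local change frequency" are assumptions for illustration; the slides do not define the features precisely, and the Google Score is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record."""
    author: str
    timestamp: float  # seconds since epoch
    files: tuple      # paths touched by the commit


def file_features(commits, path, now, window=30 * 24 * 3600):
    """Compute per-file features from commit history; None if the file is unknown."""
    touching = [c for c in commits if path in c.files]
    if not touching:
        return None
    first_seen = min(c.timestamp for c in touching)
    return {
        # Commits touching the file within the recent window (assumed definition).
        "recent_changes": sum(1 for c in touching if now - c.timestamp <= window),
        "authors": len({c.author for c in touching}),
        "age": now - first_seen,
        "commits": len(touching),
    }
```

All four values come from commit metadata alone, so nothing depends on the programming language of the files being analyzed.
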
  8. Quality Evaluation
    • Bug tracker integration: find bugfix changes
    • Quality metric: fraction of files from model predictions affected by bugfix changes in the future
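
The quality metric on this slide amounts to a simple set computation. A minimal sketch, assuming the predictions and the later bugfix-affected files are both available as collections of paths (the function name is hypothetical):

```python
def prediction_hit_rate(predicted_files, future_bugfix_files):
    """Fraction of predicted files that a later bugfix change touched."""
    predicted = set(predicted_files)
    if not predicted:
        return 0.0  # no predictions made, so nothing could be confirmed
    hits = predicted & set(future_bugfix_files)
    return len(hits) / len(predicted)
```

For example, if the model flags four files and two of them are later touched by bugfix changes, the metric is 0.5. Files that had bugs but were not predicted do not affect the value, so this behaves like precision over the predicted set rather than recall.
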
  9. Conclusions
    • It is possible to combine learning and heuristic approaches to get the best of both worlds
    • The accuracy is still not good enough
    • No wonder no prediction tools are widely used yet