TMPA-2015: Towards a Usable Defect Prediction Tool: Crossbreeding Machine Learning and Heuristics

Exactpro
December 01, 2015

Vladimir Kovalenko, Galina Alperovich, JetBrains

12-14 November 2015
Tools and Methods of Program Analysis in St. Petersburg

Transcript

  1. Defect Prediction
    • Common goal: identify defect-prone entities in advance
    • Why?
      • QA
      • Resource allocation (testing, review, etc.)
  2. Research papers (common points)
    • ML defect prediction models work in general
    • No universal model: projects are too different
    • Code metrics as features improve prediction quality
    • Typical precision/recall: ~0.7
  3. Google case study
    • Collaborated with researchers to introduce defect prediction into its internal code review system
    • Came up with a heuristic model: Google Bug Prediction Score (Time Weighted Risk)
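The slide names the Time Weighted Risk score without giving its formula. The Python sketch below assumes the form commonly cited from Google's public description of the score: every bug-fixing commit that touched a file adds 1 / (1 + e^(-12t + 12)), where t is that commit's timestamp normalized to [0, 1] over the repository history (0 = first commit, 1 = now). The function name `time_weighted_risk` and its parameters are illustrative, not taken from the deck or from any tool.

```python
import math
from datetime import datetime


def time_weighted_risk(fix_times, repo_start, repo_end):
    """Score one file from the timestamps of the bug-fixing commits that touched it.

    Assumed formula: sum over fixes of 1 / (1 + exp(-12 * t + 12)), with t the
    fix time normalized to [0, 1] between repo_start and repo_end. A fix at
    repo_end contributes 0.5; a fix at repo_start contributes about 6e-6.
    """
    span = (repo_end - repo_start).total_seconds()
    if span <= 0:
        raise ValueError("repo_end must be later than repo_start")
    score = 0.0
    for fix_time in fix_times:
        t = (fix_time - repo_start).total_seconds() / span  # normalized age of the fix
        score += 1.0 / (1.0 + math.exp(-12.0 * t + 12.0))
    return score


if __name__ == "__main__":
    start, end = datetime(2015, 1, 1), datetime(2015, 11, 1)
    print(time_weighted_risk([end], start, end))  # a fix at the end of history adds 0.5
```

The steep logistic weighting is what makes the score "time weighted": files fixed recently dominate the ranking, while files whose fixes are old fade out without any manual cutoff.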
  4. Tools
    • No defect prediction tools are known to be used in industry
    • Why?
      • Accuracy is too low
      • Too much effort to set up
  5. Tool usability criteria
    • Language-independent
    • “Entity” ≥ file (file-level or coarser)
    • Little or no effort to set up
      • No plain supervised learning
    • Near real-time
    • Easy to use: VCS-agnostic, etc.
    • Accurate!
  6. Implementation
    • CI server plugin
    • Uses only VCS metrics
    • Automatic detection of bugfix changes (heuristics)
    • Processing: detect bug-introducing changes
    • ML classifier: Naive Bayes / Decision Tree
    • Uses the top of the prediction ranking, not absolute score values
    • Automatic quality evaluation
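Two of the implementation points can be illustrated with a minimal Python sketch: flagging bugfix commits from their messages with a keyword heuristic, and taking the top of the prediction ranking instead of thresholding absolute scores. The deck does not say which heuristics the plugin uses, so the keyword pattern, `is_bugfix_commit`, and `top_predictions` are hypothetical. Detecting bug-introducing changes and training the classifier are not shown.

```python
import re

# Hypothetical keyword heuristic; the real plugin's rules are not described in the deck.
BUGFIX_PATTERN = re.compile(r"\b(fix(e[sd]|ing)?|bugs?|defect|hotfix)\b", re.IGNORECASE)


def is_bugfix_commit(message):
    """Heuristically flag a commit as a bug fix from its message alone."""
    return BUGFIX_PATTERN.search(message) is not None


def top_predictions(scores, n):
    """Return the n files with the highest predicted defect-proneness.

    Only the ordering of the classifier scores matters; their absolute values
    (which may be poorly calibrated, e.g. for Naive Bayes) are ignored.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [path for path, _ in ranked[:n]]
```

For example, `top_predictions({"a.py": 0.2, "b.py": 0.9, "c.py": 0.5}, 2)` returns `["b.py", "c.py"]`. Ranking sidesteps the need to choose a per-project probability threshold, which fits the "little or no effort to set up" criterion.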
  7. Features
    • Local change frequency
    • Number of authors
    • File age
    • Number of affecting commits
    • Google Score
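The VCS-only features can be computed from a plain commit log. The Python sketch below assumes a hypothetical `Commit` record (time, author, touched files) and approximates "local change frequency" as the number of commits touching a file within a recent window; the deck does not define that feature, so the window and its 30-day default are assumptions. The Google Score feature is left out of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commit:
    """Hypothetical minimal commit record extracted from the VCS log."""
    time: datetime
    author: str
    files: list  # paths touched by the commit


def file_features(commits, now, window_days=30):
    """Compute per-file VCS features as of `now` (commits are assumed to be no later than `now`).

    'Local change frequency' is approximated (an assumption) as the number of
    commits that touched the file during the last `window_days` days.
    """
    window_start = now - timedelta(days=window_days)
    authors = defaultdict(set)       # path -> distinct authors
    num_commits = defaultdict(int)   # path -> commits affecting the file
    first_seen = {}                  # path -> earliest commit time
    recent = defaultdict(int)        # path -> commits inside the window
    for commit in commits:
        for path in commit.files:
            authors[path].add(commit.author)
            num_commits[path] += 1
            if path not in first_seen or commit.time < first_seen[path]:
                first_seen[path] = commit.time
            if window_start <= commit.time <= now:
                recent[path] += 1
    return {
        path: {
            "local_change_frequency": recent[path],
            "num_authors": len(authors[path]),
            "file_age_days": (now - first_seen[path]).days,
            "num_commits": num_commits[path],
        }
        for path in num_commits
    }
```

Because every feature comes from the log alone, the same code works for any language in the repository, in line with the language-independence criterion.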
  8. Quality Evaluation
    • Bug tracker integration: find bugfix changes
    • Quality metric: fraction of predicted files that are affected by future bugfix changes
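The quality metric is a simple ratio: of the files the model put in its top predictions, how many were later touched by a bugfix change. A minimal Python sketch, assuming the future bugfix changes have already been matched through the bug tracker and are given as lists of file paths; the name `prediction_quality` is hypothetical.

```python
def prediction_quality(predicted_files, future_bugfix_commits):
    """Fraction of predicted files touched by at least one later bugfix change.

    `future_bugfix_commits` is an iterable of file-path collections, one per
    bugfix commit made after the prediction was produced.
    """
    predicted = set(predicted_files)
    if not predicted:
        return 0.0
    touched = set()
    for files in future_bugfix_commits:
        touched.update(files)
    return len(predicted & touched) / len(predicted)
```

For instance, if the prediction is `["a", "b", "c", "d"]` and later bugfix commits touch `["a", "x"]` and `["c"]`, the score is 2 / 4 = 0.5. This is effectively precision over the predicted set, so it can be computed automatically without hand-labeled data.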
  9. Conclusions
    • It is possible to combine learning and heuristic approaches to get the best of both worlds
    • The accuracy is still not good enough
    • No wonder prediction tools are not yet widely used