TMPA-2015: Towards a Usable Defect Prediction Tool: Crossbreeding Machine Learning and Heuristics

Towards a usable defect prediction tool: crossbreeding machine learning and heuristics
Vladimir Kovalenko, JetBrains

Defect Prediction
• Common goal: identify defect-prone entities in advance
• Why?
  • QA
  • Resource allocation (testing, review, etc.)

Previous research
• Academia
• Microsoft Research
• Google case study

Research papers (common points)
• ML defect prediction models work in general
• No universal model: projects are too different
• Code metrics as features improve prediction quality
• Typical precision/recall ~0.7

Google case study
• Collaborated with researchers to introduce defect prediction in the internal code review system
• Came up with a heuristic model: Google Bug Prediction Score (Time Weighted Risk)
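
The slide names the Time Weighted Risk score but does not give its formula. The sketch below assumes the form publicly described for Google's bug prediction heuristic: each bug-fixing commit contributes a logistic weight 1 / (1 + e^(-12t + 12)), where t is the commit time normalized to [0, 1] over the repository's lifetime. The function name, its parameters, and the constant 12 are assumptions taken from that description, not from this talk.

```python
import math


def time_weighted_risk(fix_times, start, now):
    """Score one file from the timestamps of its bug-fixing commits.

    fix_times: timestamps of bug-fix commits touching the file
    start:     timestamp of the earliest commit in the repository
    now:       current timestamp (must be later than start)
    """
    span = now - start
    if span <= 0:
        raise ValueError("now must be later than start")
    score = 0.0
    for ts in fix_times:
        t = (ts - start) / span  # normalize to [0, 1]; 1 means "just now"
        # Logistic weight: recent fixes count close to 0.5, old ones near 0.
        score += 1.0 / (1.0 + math.exp(-12.0 * t + 12.0))
    return score
```

Under this assumed weighting, a file fixed yesterday scores far higher than a file fixed once at the start of the project, so the ranking favors recent hot spots.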

Tools
• No defect prediction tools known to be used in industry
• Why?
  • Too low accuracy
  • Too much effort to set up

Tool usability criteria
• Language independent
• “entity” >= file
• Little or no effort to set up
• No plain supervised learning
• Near real-time
• Easy to use: VCS agnostic, etc.
• Accurate!

Implementation
• CI server plugin
• Only use VCS metrics
• Automatic bugfix changes detection (heuristics)
• Processing: detect bug-introducing changes
• ML classifier: Naive Bayes / Decision Tree
• Take prediction top, not absolute values
• Automatic quality evaluation
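
Two of the steps above can be illustrated with a minimal sketch. The slides do not say which heuristic detects bugfix changes, so keyword matching on commit messages is assumed here purely for illustration. The second function shows the "take prediction top" idea: files are ranked by classifier score and only the highest-ranked ones are reported, regardless of the absolute score values. All names are hypothetical.

```python
import re

# Assumed heuristic: a commit is a bugfix if its message mentions a fix or bug.
BUGFIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect|regression)\b", re.IGNORECASE)


def looks_like_bugfix(message):
    """Return True if the commit message matches the assumed bugfix keywords."""
    return BUGFIX_PATTERN.search(message) is not None


def top_predictions(scores, k):
    """Return the k file paths with the highest scores.

    scores: mapping of file path -> classifier score (any comparable number)
    Ties are broken by path so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ranked[:k]]
```

Ranking rather than thresholding means the tool always reports a fixed-size list, which avoids having to calibrate scores per project.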

Features
• Local change frequency
• Number of authors
• File age
• Number of affecting commits
• Google Score
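
As a rough illustration of how such features could be derived from version control history alone, the sketch below computes four of them per file from a list of commits. The `Commit` record, the function name, and the 30-day window used for "local change frequency" are assumptions; the slides do not define these precisely.

```python
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass(frozen=True)
class Commit:
    author: str
    timestamp: float  # seconds since epoch
    files: tuple      # paths touched by the commit


def vcs_features(commits, now, window=30 * SECONDS_PER_DAY):
    """Compute per-file features from commit history only."""
    touched = defaultdict(list)
    for commit in commits:
        for path in commit.files:
            touched[path].append(commit)

    features = {}
    for path, history in touched.items():
        first_seen = min(c.timestamp for c in history)
        features[path] = {
            "commits": len(history),                      # number of affecting commits
            "authors": len({c.author for c in history}),  # number of distinct authors
            "age": now - first_seen,                      # file age in seconds
            # Assumed reading of "local change frequency": commits in the recent window.
            "recent_commits": sum(1 for c in history if now - c.timestamp <= window),
        }
    return features
```

Because every input comes from the commit log, a sketch like this stays language independent, in line with the usability criteria above.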

Quality Evaluation
• Bug tracker integration: find bugfix changes
• Quality metric: fraction of files from model predictions affected by bugfix changes in the future
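
The quality metric above can be written down directly. This is a minimal sketch with hypothetical names: it takes the set of files the model flagged and the set of files later touched by bugfix changes, and returns the fraction of flagged files that were in fact fixed.

```python
def prediction_hit_rate(predicted_files, future_bugfix_files):
    """Fraction of predicted files that were later affected by a bugfix change."""
    predicted = set(predicted_files)
    if not predicted:
        return 0.0  # nothing predicted, so nothing to score
    hits = predicted & set(future_bugfix_files)
    return len(hits) / len(predicted)
```

For example, if the model flags four files and two of them receive bugfix commits afterwards, the metric is 0.5. It behaves like precision measured over the top of the ranking.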

Result samples
• Project A, 2 years
• Project B, 1 year

Conclusions
• It is possible to combine learning and heuristic approaches to get the best of both worlds
• The accuracy is still not good enough
• No wonder no prediction tools are widely used yet

Thank you!
