
A Comparative Evaluation of Static Analysis Actionable Alert Identification Techniques


by Sarah Heckman and Laurie Williams

PROMISE'13: The 9th International Conference on Predictive Models in Software Engineering


Transcript

  1. A Comparative Evaluation of Static Analysis Actionable Alert Identification Techniques
     Sarah Heckman and Laurie Williams
     Department of Computer Science
     North Carolina State University
  2. Motivation
     • Automated static analysis can find a large number of alerts
       – Empirically observed alert density of 40 alerts/KLOC [HW08]
     • Alert inspection required to determine if developer should (and could) fix
       – Developer may only fix 9% [HW08] to 65% [KAY04] of alerts
       – Suppose 1000 alerts at 5 minutes of inspection per alert: 10.4 work days to inspect all alerts
       – Potential savings of 3.6 to 9.5 days by only inspecting alerts the developer will fix
     • Fixing 3–4 alerts that could lead to field failures justifies the cost of static analysis [WDA08]
     PROMISE 2013 (c) Sarah Heckman
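The inspection-cost arithmetic above can be checked directly. A minimal sketch, assuming an 8-hour work day (the slide states only the day totals) and reading the 9%/65% fix rates as 90 and 650 inspected alerts:

```python
# Worked version of the slide's inspection-cost arithmetic.
# Assumptions: 8-hour work days; the 9%/65% fix rates map to
# inspecting 90 or 650 of the 1000 alerts.

ALERTS = 1000
MINUTES_PER_ALERT = 5
HOURS_PER_DAY = 8  # assumed standard work day


def work_days(num_alerts):
    """Inspection time for num_alerts, in 8-hour work days."""
    return num_alerts * MINUTES_PER_ALERT / 60 / HOURS_PER_DAY


print(round(work_days(ALERTS), 1))        # 10.4 days to inspect all alerts
# Time saved by skipping the alerts the developer will not fix:
print(round(work_days(ALERTS - 650), 1))  # 3.6 days saved (65% fixed)
print(round(work_days(ALERTS - 90), 1))   # 9.5 days saved (9% fixed)
```

The three printed values reproduce the 10.4-day total and the 3.6 to 9.5 day savings range quoted on the slide.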
  3. Coding Problem?
     • Actionable: alerts the developer wants to fix
       – Faults in the code
       – Conformance to coding standards
       – Developer action: fix the alert in the source code
     • Unactionable: alerts the developer does not want to fix
       – Static analysis false positive
       – Developer knowledge that the alert is not a problem
       – Inconsequential coding problems (style)
       – Fixing the alert may not be worth the effort
       – Developer action: suppress the alert
  4. Actionable Alert Identification Techniques
     • Supplement automated static analysis
       – Classification: predict actionability
       – Prioritization: order by predicted actionability
     • AAIT utilize additional information about the alert, code, and other artifacts
       – Artifact Characteristics
     • Can we determine a "best" AAIT?
  5. Related Work
     • Comparative evaluation of AAIT [AAH12]
       – Languages: Java and Smalltalk
       – ASA: PMD, FindBugs, SmallLint
       – Benchmark: FAULTBENCH
       – Evaluation Metrics
         • Effort: "average number of alerts one must inspect to find an actionable one"
         • Fault Detection Rate Curve: number of faults detected against number of alerts inspected
       – Selected AAIT: APM, FeedbackRank, LRM, ZRanking, ATL-D, EFindBugs
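One plausible reading of the quoted effort metric is sketched below: given alerts in ranked order, average the number of alerts inspected to reach each actionable one. This is an illustrative interpretation, not the [AAH12] implementation:

```python
# Hypothetical sketch of an "effort" metric: average number of
# alerts inspected (in ranked order) to reach each actionable alert.
# The exact averaging used in [AAH12] may differ.


def effort(ranking):
    """ranking: list of booleans (True = actionable), best-ranked first."""
    positions = [i + 1 for i, actionable in enumerate(ranking) if actionable]
    if not positions:
        return float("inf")  # no actionable alert is ever reached
    return sum(positions) / len(positions)


# A good ranking (actionable alerts first) requires less effort:
print(effort([True, True, False, False]))   # 1.5
print(effort([False, False, True, True]))   # 3.5
```

Lower values are better, so a prioritization that floats actionable alerts to the top scores well.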
  6. Comparative Evaluation
     • Considered AAIT in literature [HW11][SFZ11]
     • Selection Criteria
       – AAIT classify or prioritize alerts generated by automated static analysis for the Java programming language
       – An implementation of the AAIT is described, allowing for replication
       – The AAIT is fully automated and does not require manual intervention or inspection of alerts as part of the process
  7. Selected AAIT (1)
     • Actionable Prioritization Models (APM) [HW08]
       – ACs: code location, alert type
     • Alert Type Lifetime (ATL) [KE07a]
       – AC: alert type lifetime
       – ATL-D: measures the lifetime in days
       – ATL-R: measures the lifetime in revisions
     • Check 'n' Crash (CnC) [CSX08]
       – AC: test failures
       – Generates tests that try to cause RuntimeExceptions
  8. Selected AAIT (2)
     • History-Based Warning Prioritization (HWP) [KE07b]
       – ACs: commit messages that identify fault/non-fault fixes
     • Logistic Regression Models (LRM) [RPM08]
       – ACs: 33, including two proprietary/internal ACs
     • Systematic Actionable Alert Identification (SAAI) [HW09]
       – ACs: 42
       – Machine learning
  9. FAULTBENCH v0.3
     • 3 Subject Programs: jdom, runtime, logging
     • Procedure
       1. Gather Alert and Artifact Characteristic Data Sources
       2. Artifact Characteristic and Alert Oracle Generation
       3. Training and Test Sets
       4. Model Building
       5. Model Evaluation
 10. Gather Data
     • Download from repo
     • Compile
     • ASA: FindBugs & Check 'n' Crash (ESC/Java)
     • Source Metrics: JavaNCSS
     • Repository History: CVS & SVN
     • Difficulties
       – Libraries changed over time
       – Not every revision would build (especially early ones)
 11. Artifact Characteristics
     Independent Variables
     • Alert Identifier and History
       – Alert information (type, location)
       – Number of alert modifications
     • Source Code Metrics
       – Size and complexity metrics
     • Source Code History
       – Developers
       – File creation, deletion, and modification revisions
     • Source Code Churn
       – Added and deleted lines of code
     • Aggregate Characteristics
       – Alert lifetime, alert counts, staleness
     Dependent Variable: Alert Classification
     [Figure: alert information and surrounding code feed the classification of each alert as actionable or unactionable]
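The artifact characteristics above can be pictured as one feature row per alert. Every field name and value below is illustrative only; the actual AAIT use different sets (e.g. 33 ACs for LRM, 42 for SAAI):

```python
# Hypothetical feature row for one alert, grouped by the slide's
# characteristic categories. Field names are illustrative, not the
# paper's exact artifact characteristics.
alert_features = {
    # Alert identifier and history
    "alert_type": "NP_NULL_ON_SOME_PATH",
    "file": "src/Foo.java",
    "line": 42,
    "num_modifications": 3,
    # Source code metrics (size and complexity)
    "ncss": 120,                    # non-commenting source statements
    "cyclomatic_complexity": 7,
    # Source code history
    "num_developers": 2,
    "file_revisions": 15,
    # Source code churn
    "lines_added": 48,
    "lines_deleted": 12,
    # Aggregate characteristics
    "alert_lifetime_days": 30,
    "staleness": 5,
}
label = "actionable"  # dependent variable: the alert classification
print(len(alert_features))  # 12
```

A model then learns to predict `label` from rows like this one.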
 12. Alert Oracle Generation
     • Iterate through all revisions, starting with the earliest, and compare alerts between revisions
     • Closed → Actionable
     • Filtered → Unactionable
     • Deleted
     • Open
       – Inspection
       – All unactionable
     [Figure: alert state diagram with states Open, Deleted, Closed, and Filtered]
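The oracle rules above reduce to a mapping from an alert's final state to a label. A minimal sketch (the "uninspected open and deleted alerts default to unactionable" rule is stated later under threats to validity):

```python
# Sketch of the alert-oracle labeling: after walking all revisions
# oldest-first and tracking each alert, its final state determines
# the oracle label. State names mirror the slide's diagram.


def label_alert(final_state):
    if final_state == "closed":    # removed by a code change
        return "actionable"
    if final_state == "filtered":  # suppressed via a filter file
        return "unactionable"
    # "deleted" and still-"open" alerts are unactionable unless
    # manually inspected (a stated threat to validity)
    return "unactionable"


print(label_alert("closed"))   # actionable
print(label_alert("open"))     # unactionable
```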
 13. Training and Test Sets
     • Simulate how AAIT would be used in practice
     • Training set: first X% of revisions to train the models
       – 70%, 80%, and 90%
     • Test set: use remaining (100 − X)% of revisions to test the models
     • Overlapping alerts
       – Alerts open at the cutoff revision
     • Deleted alerts
       – If an alert is deleted, the alert is not considered UNLESS the alert isn't deleted in the training set. In that case the alert is used in model building.
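The chronological split described above (unlike a random split, it never trains on the future) can be sketched as:

```python
# Minimal sketch of the chronological training/test split: the first
# X% of revisions train the model, the remaining (100 - X)% test it.


def split_revisions(revisions, train_fraction):
    """revisions must be in chronological order, oldest first."""
    cutoff = int(len(revisions) * train_fraction)
    return revisions[:cutoff], revisions[cutoff:]


revs = list(range(1, 11))  # revisions 1..10, oldest first
train, test = split_revisions(revs, 0.7)
print(train)  # [1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9, 10]
```

The 70/80/90 treatments in the results tables correspond to `train_fraction` values of 0.7, 0.8, and 0.9.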
 14. Model Building & Model Evaluation
     • All AAIT are built using the training data and evaluated by predicting the actionability of the test data
     • Classification Statistics:
       – Precision = TP / (TP + FP)
       – Recall = TP / (TP + FN)
       – Accuracy = (TP + TN) / (TP + TN + FP + FN)

       Outcome               Predicted     Actual
       True Positive (TP)    Actionable    Actionable
       False Positive (FP)   Actionable    Unactionable
       False Negative (FN)   Unactionable  Actionable
       True Negative (TN)    Unactionable  Unactionable
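The three statistics above, computed from confusion-matrix counts; the counts in the example are illustrative, not taken from the paper's results:

```python
# The slide's classification statistics, computed from confusion-
# matrix counts of predictions on the test set.


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative counts (not from the paper):
tp, fp, fn, tn = 8, 2, 4, 36
print(precision(tp, fp))         # 0.8
print(recall(tp, fn))            # 0.6666666666666666
print(accuracy(tp, tn, fp, fn))  # 0.88
```

Note that with mostly-unactionable alerts (as in the logging results), a model can score high accuracy while finding no actionable alerts at all, which is why precision and recall are reported alongside it.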
 15. Results – jdom

              Accuracy (%)     Precision (%)    Recall (%)
     Rev. %   70   80   90     70   80   90     70   80   90
     APM      80   83   87     46   42    0      9   10    0
     ATL-D    72   83   88     26   20   20     22    2    3
     ATL-R    77   81   86     32   24   24     11    8   13
     CnC      73   80   95    100  100    0      6    9    0
     HWP      31   35   32     19   15    9     73   67   57
     LRM      72   76   83     37   35   32     64   55   59
     SAAI     83   86   90     92  100   67     16   13    7
 16. Results – runtime

              Accuracy (%)     Precision (%)    Recall (%)
     AAIT     70   80   90     70   80   90     70   80   90
     APM      36   23   50     88   70   47     32   17   57
     ATL-D    18   17   55     92   82  100      8    4    3
     ATL-R    34   43   59     93   94   55     27   36   60
     HWP      68   66   46     88   85   45     74   73   83
     LRM      88   87   53     88   87   49    100  100  100
     SAAI     49   65   83     90   91  100     48   66   63
 17. Results – logging

              Accuracy (%)     Precision (%)    Recall (%)
     AAIT     70   80   90     70   80   90     70   80   90
     APM      85   89   92      0    0    0      0    0    0
     ATL-D    92   97  100      0    0    0      0    0    0
     ATL-R    92   97  100      0    0    0      0    0    0
     CnC      67  100  100      0    0    0      0    0    0
     HWP      32   35   33      8    4    0    100  100    0
     LRM      77   84   83     25   14    0    100  100    0
     SAAI     90   97  100      0    0    0      0    0    0
 18. Threats to Validity
     • Internal Validity
       – Automation of data generation, collection, and artifact characteristic generation
       – Alert oracle: uninspected alerts are considered unactionable
       – Alert closure is not an explicit action by the developer
       – Alert continuity is not perfect
         • Close and open a new alert if both the line number and source hash of the alert change
       – Number of revisions
     • External Validity
       – Generalizability of results
       – Limitations of the AAIT in the comparative evaluation
     • Construct Validity
       – Calculations for artifact characteristics
 19. Future Work
     • Incorporate additional projects into FAULTBENCH
       – Emphasis on adding projects that actively use ASA and include filter files
       – Allow for evaluation of AAIT with different goals
     • Identification of the most predictive artifact characteristics
     • Evaluate different windows for generating test data
       – A full project history may not be as predictive as the most recent history
 20. Conclusions
     • SAAI found to be the best overall model when considering accuracy
       – Highest accuracy, or a tie, for 6 of 9 treatments
     • ATL-D, ATL-R, and LRM were also predictive when considering accuracy
       – CnC also performed well, but only considered alerts from one ASA
     • LRM and HWP had the highest recall
 21. References
     [AAH12] S. Allier, N. Anquetil, A. Hora, S. Ducasse, "A Framework to Compare Alert Ranking Algorithms," 19th Working Conference on Reverse Engineering, Kingston, Ontario, Canada, October 15–18, 2012, pp. 277–285.
     [CSX08] C. Csallner, Y. Smaragdakis, and T. Xie, "DSD-Crasher: A Hybrid Analysis Tool for Bug Finding," ACM Transactions on Software Engineering and Methodology, vol. 17, no. 2, pp. 1–36, April 2008.
     [HW08] S. Heckman and L. Williams, "On Establishing a Benchmark for Evaluating Static Analysis Alert Prioritization and Classification Techniques," Proceedings of the 2nd International Symposium on Empirical Software Engineering and Measurement, Kaiserslautern, Germany, October 9–10, 2008, pp. 41–50.
     [HW09] S. Heckman and L. Williams, "A Model Building Process for Identifying Actionable Static Analysis Alerts," Proceedings of the 2nd IEEE International Conference on Software Testing, Verification and Validation, Denver, CO, USA, 2009, pp. 161–170.
     [HW11] S. Heckman and L. Williams, "A Systematic Literature Review of Actionable Alert Identification Techniques for Automated Static Code Analysis," Information and Software Technology, vol. 53, no. 4, April 2011, pp. 363–387.
     [KE07a] S. Kim and M. D. Ernst, "Prioritizing Warning Categories by Analyzing Software History," Proceedings of the International Workshop on Mining Software Repositories, Minneapolis, MN, USA, May 19–20, 2007, p. 27.
     [KE07b] S. Kim and M. D. Ernst, "Which Warnings Should I Fix First?," Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Dubrovnik, Croatia, September 3–7, 2007, pp. 45–54.
     [KAY04] T. Kremenek, K. Ashcraft, J. Yang, and D. Engler, "Correlation Exploitation in Error Ranking," Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Newport Beach, CA, USA, 2004, pp. 83–93.
     [RPM08] J. R. Ruthruff, J. Penix, J. D. Morgenthaler, S. Elbaum, G. Rothermel, "Predicting Accurate and Actionable Static Analysis Warnings: An Experimental Approach," Proceedings of the 30th International Conference on Software Engineering, Leipzig, Germany, May 10–18, 2008, pp. 341–350.
     [SFZ11] H. Shen, J. Fang, J. Zhao, "EFindBugs: Effective Error Ranking for FindBugs," 2011 IEEE 4th International Conference on Software Testing, Verification and Validation, Berlin, Germany, March 21–25, 2011, pp. 299–308.
     [WDA08] S. Wagner, F. Deissenboeck, M. Aichner, J. Wimmer, M. Schwalb, "An Evaluation of Two Bug Pattern Tools for …