
Systematic Architecture Level Fault Diagnosis Using Statistical Techniques

Fabian Keller
November 11, 2014

In the past, various spectrum-based fault localization (SBFL) algorithms have been developed to pinpoint a fault location given a set of failing and passing test executions. Most of these algorithms use similarity coefficients and have only been evaluated on established benchmark programs such as the Siemens set or the space program from the Software-artifact Infrastructure Repository. Moreover, SBFL has not yet been adopted by developers in practice. This study evaluates the feasibility of applying SBFL to a real-world project, namely AspectJ. From an initial set of 110 manually classified faulty versions, a maximum of seven bugs can be found after examining the 1000 most suspicious lines produced by various SBFL techniques. To explain this result, the influence of program size is examined using different metrics and evaluations. In general, program size has a slight influence on some metrics, but it is not the primary explanation for the results. The results seem to originate from the metrics currently used throughout the research community to assess SBFL performance. The study showcases the limitations of SBFL with the help of different performance metrics and the insights gained during manual classification. Moreover, additional performance metrics that are better suited to evaluating fault localization performance are proposed.

Transcript

  1. Estimated Costs 2012 as reported by Britton et al. [2013]
  2. Agenda: 1. Automated Fault Diagnosis, 2. State of the Art, 3. Case Study: AspectJ, 4. Evaluation, 5. Conclusions
  3. Agenda: 1. Automated Fault Diagnosis, 2. State of the Art, 3. Case Study: AspectJ, 4. Evaluation, 5. Conclusions
  4. Fault Diagnosis: what is the current practice? Goal: pinpoint one or more failures. Commonly used techniques: System.out.println(), symbolic debugging, static/dynamic slicing. There is room for improvement!
  5. Automated Fault Diagnosis: is it possible? Program spectrum (one row per test; 1 = block executed, Error = 1 means the test failed):
             B1  B2  B3  B4  B5  Error
     Test1    1   0   0   0   0    0
     Test2    1   1   0   0   0    0
     Test3    1   1   1   1   1    0
     Test4    1   1   1   1   1    0
     Test5    1   1   1   1   1    1
     Test6    1   1   1   0   1    0
     By intuition, a block is more suspicious if it is involved in failing test cases and not involved in passing test cases.
  6. Ranking Metrics: it is possible. With IF/NF denoting the number of failing test cases that do/do not involve a block, and IP/NP the same for passing test cases:
     Tarantula = (IF / (IF + NF)) / (IF / (IF + NF) + IP / (IP + NP))
     Jaccard   = IF / (IF + NF + IP)
     Ochiai    = IF / sqrt((IF + NF) * (IF + IP))
     Applied to the spectrum from the previous slide (see also the sketch after the transcript):
               B1    B2    B3    B4    B5
     Tarantula 0.50  0.56  0.63  0.71  0.63
     Jaccard   0.17  0.20  0.25  0.33  0.25
     Ochiai    0.41  0.45  0.50  0.58  0.50
     Ranking: 1. B4, 2. B3 and B5, 3. B2, 4. B1
  7. Agenda: 1. Automated Fault Diagnosis, 2. State of the Art, 3. Case Study: AspectJ, 4. Evaluation, 5. Conclusions
  8. Commonly Used Data and its limiting factors: the Software-artifact Infrastructure Repository (Siemens set and space program).
     Program        Faulty versions   LOC    Test cases   Description
     print_tokens    7                 478    4130        Lexical analyzer
     print_tokens2   10                399    4115        Lexical analyzer
     replace         32                512    5542        Pattern recognition
     schedule        9                 292    2650        Priority scheduler
     schedule2       10                301    2710        Priority scheduler
     tcas            41                141    1608        Altitude separation
     tot_info        23                440    1052        Information measure
     space           38                6218   13585       Array definition language
  9. Performance Metrics: how can fault localization performance be evaluated?
     • Wasted Effort (WE): the number of elements inspected in vain before the fault is reached. Example ranking: L4, L3, L2, L7, L6, L1, L5, L9, L10, L8; wasted effort for the prominent bug: 2 (or 20%).
     • Proportion of Bugs Localized (PBL): the percentage of bugs localized with WE < p%.
     • Hit@X: the number of bugs localized after inspecting X elements.
     (A small sketch of WE and Hit@X follows the transcript.)
  10. Agenda: 1. Automated Fault Diagnosis, 2. State of the Art, 3. Case Study: AspectJ, 4. Evaluation, 5. Conclusions
  11. AspectJ – Lines of Code: nearly doubled in the examined time span.
  12. AspectJ – Commits: active development, with mostly 50+ commits per month.
  13. AspectJ – Bugs: nearly 2500 bugs reported in the examined time span.
  14. AspectJ – Data: less than 40% of the investigated bugs are applicable for SBFL.
                        AspectJ  AJDT   Sum
      All bugs           1544     886   2430
      Bugs in iBugs       285      65    350
      Classified bugs      99      11    110
      Applicable bugs      41       1     42
      Involved bugs        20       1     21
      What happened?
  15. Bug 36234: workarounds cannot be used as an evaluation oracle. Bug report: "Getting an out of memory error when compiling with Ajc 1.1 RC1 […]" (pre-fix and post-fix code shown on the slide).
  16. Bug 61411: platform-specific bugs are mostly not present in test suites. Bug report: "[…] highlights a problem that I've seen using ajdoc.bat on Windows […]" (pre-fix and post-fix code shown on the slide).
  17. Bug 151182: synchronization bugs are mostly not present in test suites. Bug report: "[…] recompiled the aspect using 1.5.2 and tried to run it […], but it fails with a NullPointerException. […]" (pre-fix and post-fix code shown on the slide).
  18. Agenda: 1. Automated Fault Diagnosis, 2. State of the Art, 3. Case Study: AspectJ, 4. Evaluation, 5. Conclusions
  19. Research Questions
      • RQ1: How does the program size influence fault localization performance?
      • RQ2: How many bugs can be found when examining a fixed number of ranked elements?
      • RQ3: How does the program size influence the suspiciousness scores produced by different ranking metrics?
      • RQ4: Are the fault localization performance metrics currently used by the research community valid?
  20. RQ1: Program Size vs. SBFL Performance? Multiple ranked elements are mapped to the same suspiciousness score.
  21. RQ4: Are the Performance Metrics Valid? On average, no bugs can be found in the first 100 lines.
  22. RQ4: Are the Performance Metrics Valid? With luck, 33% of all bugs can be found in the first 1000 lines.
  23. Agenda: 1. Automated Fault Diagnosis, 2. State of the Art, 3. Case Study: AspectJ, 4. Evaluation, 5. Conclusions
  24. Conclusions: there is still some work to be done.
      • Bugs need more context to be fully understood.
      • Current metrics cannot be applied to large projects.
      • SBFL is not feasible for large projects.
      • The new metrics are a starting point for future work.
  25. RQ2: Examining a fixed number of ranked elements: more than 100 files must be inspected to find 50% of all bugs.
  26. RQ3: Program Size vs. Suspiciousness: the mean suspiciousness drops for larger programs.
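
As a companion to slides 5 and 6, here is a minimal, self-contained sketch of how the three ranking metrics could be computed for the toy spectrum shown there. It is an illustration only, not the STARDUST implementation; the class name is made up, and the toy data avoids the degenerate cases (e.g. blocks never executed by any test) that a real implementation would have to guard against.

    // Computes Tarantula, Jaccard, and Ochiai suspiciousness scores for the
    // toy program spectrum from slides 5 and 6 (five blocks, six tests).
    public class SbflRankingSketch {

        public static void main(String[] args) {
            // Rows = Test1..Test6, columns = B1..B5; 1 = block executed in that test.
            int[][] spectrum = {
                {1, 0, 0, 0, 0},
                {1, 1, 0, 0, 0},
                {1, 1, 1, 1, 1},
                {1, 1, 1, 1, 1},
                {1, 1, 1, 1, 1},
                {1, 1, 1, 0, 1},
            };
            // 1 = test failed; only Test5 fails in the example.
            int[] error = {0, 0, 0, 0, 1, 0};

            for (int b = 0; b < spectrum[0].length; b++) {
                // IF/NF: failing tests that do/do not involve block b,
                // IP/NP: passing tests that do/do not involve block b.
                int IF = 0, NF = 0, IP = 0, NP = 0;
                for (int t = 0; t < spectrum.length; t++) {
                    boolean involved = spectrum[t][b] == 1;
                    boolean failed = error[t] == 1;
                    if (failed) { if (involved) IF++; else NF++; }
                    else        { if (involved) IP++; else NP++; }
                }
                double failRate = (double) IF / (IF + NF);
                double passRate = (double) IP / (IP + NP);
                double tarantula = failRate / (failRate + passRate);
                double jaccard = (double) IF / (IF + NF + IP);
                double ochiai = IF / Math.sqrt((double) (IF + NF) * (IF + IP));
                System.out.printf("B%d  Tarantula=%.2f  Jaccard=%.2f  Ochiai=%.2f%n",
                        b + 1, tarantula, jaccard, ochiai);
            }
        }
    }

Sorting the blocks by any of the three scores in descending order reproduces the ranking from slide 6: B4 first, then B3 and B5 tied, then B2, then B1.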
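
Similarly, a small sketch of the Wasted Effort and Hit@X performance metrics from slide 9, under the common reading that wasted effort counts the elements inspected in vain before the faulty element is reached; the faulty line (L2) and the class and method names are assumptions made for the sake of the example.

    import java.util.Arrays;
    import java.util.List;

    // Illustrates Wasted Effort (WE) and Hit@X on the example ranking from slide 9.
    public class SbflPerformanceSketch {

        // WE = number of elements ranked above the faulty element.
        static int wastedEffort(List<String> ranking, String faultyElement) {
            int position = ranking.indexOf(faultyElement);
            // If the fault does not appear in the ranking at all, charge maximal effort.
            return position < 0 ? ranking.size() : position;
        }

        // A bug counts as localized within budget X if fewer than X elements
        // must be inspected before reaching it.
        static boolean hitAtX(List<String> ranking, String faultyElement, int x) {
            return wastedEffort(ranking, faultyElement) < x;
        }

        public static void main(String[] args) {
            // Example ranking from slide 9; L2 is assumed to be the faulty line,
            // which yields the wasted effort of 2 (20% of 10 elements) quoted there.
            List<String> ranking = Arrays.asList(
                    "L4", "L3", "L2", "L7", "L6", "L1", "L5", "L9", "L10", "L8");
            int we = wastedEffort(ranking, "L2");
            System.out.printf("Wasted effort: %d (%.0f%%)%n",
                    we, 100.0 * we / ranking.size());
            System.out.println("Hit@5: " + hitAtX(ranking, "L2", 5));
        }
    }

The Proportion of Bugs Localized (PBL) from the same slide would then be the fraction of bugs in a data set whose relative wasted effort stays below the chosen threshold p%.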