The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation

Automated test generation tools have been widely investigated
with the goal of reducing the cost of testing activities.
However, generated tests have been shown not to help developers
in detecting and finding more bugs, even though they reach higher
structural coverage compared to manual testing. The main reason is
that generated tests are difficult to understand and maintain.
Our paper proposes an approach, coined TestDescriber, which
automatically generates test case summaries of the portion of code
exercised by each individual test, thereby improving understandability.
We argue that this approach can complement the current techniques
around automated unit test generation or search-based techniques
designed to generate a possibly minimal set of test cases.
In evaluating our approach we found that (1) developers find twice
as many bugs, and (2) test case summaries significantly improve the
comprehensibility of test cases, which is considered particularly
useful by developers.

Sebastiano Panichella

July 12, 2016

Transcript

  1. The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation

    Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman, Harald Gall
  2. Why?

    Class Name: Option.java
    Library: Apache Commons-Cli

    @Test
    public void test0() throws Throwable {
      Option option0 = new Option("aaabbb", true, "aaabbb");
      Option option1 = new Option("aaabbb", true, "aaabbb");
      boolean boolean0 = option1.equals((Object) option0);
      assertEquals("arg", option1.getArgName());
      assertTrue(option0.hasArg());
      assertTrue(boolean0);
    }

    @Test
    public void test1() throws Throwable {
      Option option0 = new Option("aaabbb", true, "aaabbb");
      Option option1 = new Option("aaabbb", true, "aaabbb");
      option0.setLongOpt("adafv");
      option1.setLongOpt("adafv");
      boolean boolean0 = option1.equals((Object) option0);
      assertEquals("arg", option1.getArgName());
      assertTrue(option0.hasArg());
      assertTrue(boolean0);
    }
  3. Why?

    (The two generated tests, test0 and test1, shown on slide 2.)
    Q1: What are the main differences?
    Q2: Do they cover different parts of the code?
  5. Why?

    (The two generated tests, test0 and test1, shown on slide 2, now labelled "Candidate Assertions".)
    Q1: What are the main differences?
    Q2: Do they cover different parts of the code?
  6. Why?

    (The two generated tests, test0 and test1, shown on slide 2.)
    Q1: What are the main differences?
    Q2: Do they cover different parts of the code?
    Q3: Are these assertions correct?
  7. Test Code Comprehension

    Generated Tests: the two tests, test0 and test1, shown on slide 2.

    Production Code:

    public class Options implements Serializable {
      private static final long serialVersionUID = 1L;
      /** a map of the options with the character key */
      private Map shortOpts = new HashMap();
      /** a map of the options with the long key */
      private Map longOpts = new HashMap();
      /** a map of the required options */
      private List requiredOpts = new ArrayList();

    Earl T. Barr, et al., “The Oracle Problem in Software Testing: A Survey”. IEEE Transactions on Software Engineering, 2015.
  8. Are Generated Tests Helpful?

    Generated tests do not lead to detection of more faults.
    [Chart: testing time, split into Testing and Comprehension; axis marks 0%, 75%, 100%]

    G. Fraser et al., “Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study”. TOSEM 2015.
  9. TestDescriber

    Diagram labels: Option.java, Test Suite Generation, Test Coverage Analysis (COBERTURA), Summary Generation

    @Test
    public void testProva() throws Throwable {
      Option option0 = new Option("aaa", true, "aaa");
      Option option1 = new Option("aaa", true, "aaa");
      boolean boolean0 = option1.equals((Object) option0);
      assertEquals("arg", option1.getArgName());
      assertTrue(option0.hasArg());
      assertTrue(boolean0);
    }

    @Test
    public void testProva2() throws Throwable {
      Option option0 = new Option("aaa", true, "aaa");
      Option option1 = new Option("aaa", true, "aaa");
      option0.setLongOpt("adafv");
      option1.setLongOpt("adafv");
      boolean boolean0 = option1.equals((Object) option0);
      assertEquals("arg", option1.getArgName());
      assertTrue(option0.hasArg());
      assertTrue(boolean0);
    }
  10. Summary Generator

    Software Word Usage Model (SWUM): deriving <actions>, <themes>, and <secondary arguments> from class, method, attribute and variable identifiers.

    E. Hill et al., “Automatically capturing source code context of NL-queries for software maintenance and reuse”. ICSE 2009.
  11. Summary Generator

    SWUM in TestDescriber:

    Covered Code:

    public class Option {
      public Option(String opt, String longOpt, boolean hasArg, String descr) throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this.opt = opt;
        this.longOpt = longOpt;
        if (hasArg) {
          this.numberOfArgs = 1;
        }
        this.description = descr;
      }
      ...
    }
  12. Summary Generator

    SWUM in TestDescriber:
    1) Select the covered statements

    Covered Code:

    public class Option {
      public Option(String opt, String longOpt, boolean hasArg, String descr) throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this.opt = opt;
        this.longOpt = longOpt;
        if (hasArg) { //FALSE
          this.numberOfArgs = 1;
        }
        this.description = descr;
      }
      ...
    }
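Step 1 can be pictured as a filter over source lines. The sketch below is an assumption, not TestDescriber's actual code: `CoveredStatementSelector` is a hypothetical helper, and the set of covered line numbers is assumed to come from a line-coverage report for a single test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: not TestDescriber's actual implementation.
final class CoveredStatementSelector {
    private CoveredStatementSelector() {}

    /**
     * Keeps only the source lines whose 1-based line number appears in
     * coveredLines (assumed to come from a line-coverage report).
     */
    static List<String> select(List<String> sourceLines, Set<Integer> coveredLines) {
        List<String> covered = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (coveredLines.contains(i + 1)) {
                covered.add(sourceLines.get(i).trim());
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        List<String> constructorBody = List.of(
                "OptionValidator.validateOption(opt);",
                "this.opt = opt;",
                "this.longOpt = longOpt;",
                "if (hasArg) {",
                "    this.numberOfArgs = 1;",
                "}",
                "this.description = descr;");
        // Assumed coverage for a test that passes hasArg = false:
        // the body of the if statement is never executed.
        Set<Integer> coveredLines = Set.of(1, 2, 3, 4, 7);
        select(constructorBody, coveredLines).forEach(System.out::println);
    }
}
```

With the assumed coverage set, the assignment to `numberOfArgs` is dropped, which matches the branch marked FALSE on the slide.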
  13. Summary Generator

    SWUM in TestDescriber:
    1) Select the covered statements
    2) Filter out Java keywords, etc.

    Covered Code:

    public class Option {
      public Option(String opt, String longOpt, boolean hasArg, String descr) throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this opt = opt;
        this longOpt = longOpt;
        if (hasArg) {false }
        this description = descr;
      }
      ...
    }
  14. Summary Generator

    SWUM in TestDescriber:
    1) Select the covered statements
    2) Filter out Java keywords, etc.
    3) Identifier Splitting (Camel case)

    Covered Code:

    public class Option {
      public Option(String opt, String long Opt, boolean has Arg, String descr) throws IllegalArgumentException {
        Option Validator.validate Option(opt);
        this opt = opt;
        this long Opt = long Opt;
        if (has Arg) {false ; }
        this description = descr;
      }
      ...
    }
  15. Summary Generator

    SWUM in TestDescriber:
    1) Select the covered statements
    2) Filter out Java keywords, etc.
    3) Identifier Splitting (Camel case)
    4) Abbreviation Expansion (using external vocabularies)

    Covered Code:

    public class Option {
      public Option(String option, String long Option, boolean has Argument, String description) throws IllegalArgumentException {
        Option Validator.validate Option(option);
        this option = option;
        this long Option = long Option;
        if (has Argument) {false }
        this description = description;
      }
      ...
    }
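Step 4 amounts to a dictionary lookup over the split words. In the sketch below, `AbbreviationExpander` and its three-entry vocabulary are assumptions that cover only the abbreviations visible on the slide; the external vocabularies TestDescriber actually uses are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the vocabulary is a stand-in for an external one.
final class AbbreviationExpander {
    private AbbreviationExpander() {}

    private static final Map<String, String> VOCABULARY = Map.of(
            "opt", "option",
            "arg", "argument",
            "descr", "description");

    /** Replaces known abbreviations and keeps the capitalisation of the first letter. */
    static List<String> expand(List<String> words) {
        List<String> expanded = new ArrayList<>();
        for (String word : words) {
            String full = VOCABULARY.get(word.toLowerCase());
            if (full == null) {
                expanded.add(word); // unknown word: leave untouched
                continue;
            }
            boolean capitalised = Character.isUpperCase(word.charAt(0));
            expanded.add(capitalised
                    ? Character.toUpperCase(full.charAt(0)) + full.substring(1)
                    : full);
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("long", "Opt"))); // [long, Option]
        System.out.println(expand(List.of("has", "Arg")));  // [has, Argument]
        System.out.println(expand(List.of("descr")));       // [description]
    }
}
```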
  16. Summary Generator

    SWUM in TestDescriber:
    1) Select the covered statements
    2) Filter out Java keywords, etc.
    3) Identifier Splitting (Camel case)
    4) Abbreviation Expansion (using external vocabularies)
    5) Part-of-Speech tagger

    <actions> = Verbs
    <themes> = Nouns / subjects
    <secondary arguments> = Nouns / objects, adjectives, etc.

    [Figure: the covered code after step 4, with each word annotated with a part-of-speech tag (NOUN, VERB, ADJ, CON)]
  17. Summary Generator

    Parsed Code:
    [Figure: the covered code with each word annotated with a part-of-speech tag (NOUN, VERB, ADJ, CON)]

    Natural Language Sentences:

    The test case instantiates an "Option" with:
    - option equal to “...”
    - long option equal to “...”
    - it has no argument
    - description equal to “...”
    An option validator validates it
    The test exercises the following condition:
    - "Option" has no argument
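The first natural-language sentence on this slide can be reproduced with a small template. This is a sketch only: `ConstructorSummary` is a hypothetical class, the parameter values are made up, and the slides do not say how TestDescriber fills its sentence templates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical template: the wording is copied from the slide,
// the code that fills it in is an assumption.
final class ConstructorSummary {
    private ConstructorSummary() {}

    /** Describes a constructor call given expanded parameter names and their values. */
    static String describe(String className, Map<String, Object> arguments) {
        StringBuilder summary = new StringBuilder();
        // The article "an" is hard-coded to match the "Option" example.
        summary.append("The test case instantiates an \"").append(className).append("\" with:\n");
        for (Map.Entry<String, Object> argument : arguments.entrySet()) {
            String name = argument.getKey();
            Object value = argument.getValue();
            if (value instanceof Boolean && name.startsWith("has ")) {
                // Boolean "has X" parameters read as "it has X" or "it has no X".
                String noun = name.substring("has ".length());
                summary.append("- it has ").append((Boolean) value ? "" : "no ").append(noun).append('\n');
            } else {
                summary.append("- ").append(name).append(" equal to \"").append(value).append("\"\n");
            }
        }
        return summary.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in declaration order.
        Map<String, Object> arguments = new LinkedHashMap<>();
        arguments.put("option", "aaabbb");      // made-up value
        arguments.put("long option", "adafv");  // made-up value
        arguments.put("has argument", false);
        arguments.put("description", "demo");   // made-up value
        System.out.print(describe("Option", arguments));
    }
}
```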
  18. Natural Language Sentences

    The test case instantiates an "Option" with:
    - option equal to “...”
    - long option equal to “...”
    - it has no argument
    - description equal to “...”
    An option validator validates it
    The test exercises the following condition:
    - "Option" has no argument

    Summarisation Levels:
    - Class Level
    - Method Level
    - Statement Level
    - Branch Level
  19. Summarisation Levels

    - Class Level
    - Method Level
    - Statement Level
    - Branch Level

    Do Test Summaries Improve Test Readability?
    Do Test Summaries Help Developers?
  20. Context

    Object: two Java classes, ArrayIntList.java and Rational.java, from Apache Commons Primitives and Math4J, that have been used in previous studies on search-based software testing [Fraser et al., TOSEM 2015].
    Subjects: 30 Developers
  21. Context

    Object: two Java classes, ArrayIntList.java and Rational.java, from Apache Commons Primitives and Math4J, that have been used in previous studies on search-based software testing [Fraser et al., TOSEM 2015].
    Subjects: 30 participants (23 researchers and 7 developers)
  22. Bug Fixing Tasks

    Experiment conducted offline via a survey platform.
    Each participant received the experiment package consisting of:
    1. A pretest questionnaire
    2. Instructions and materials to perform the experiment
    3. A post-test questionnaire
    We did not reveal the goal of the study.
    45 minutes of time for each task.
  23. RQ1: How do test case summaries impact the number of bugs fixed by developers?

    Participants WITHOUT TestDescriber summaries fixed 40% of injected bugs. None of them was able to fix all bugs.
  24. RQ1: How do test case summaries impact the number of bugs fixed by developers?

    Participants WITHOUT TestDescriber summaries fixed 40% of injected bugs. None of them was able to fix all bugs.
    Participants WITH TestDescriber summaries fixed 60%-80% of injected bugs. 31% of them fixed all the bugs.
  25. RQ1: How do test case summaries impact the number of bugs fixed by developers?

    With summaries, participants fixed up to twice as many bugs (+50% to +100%) in the same time window (45 minutes).
    The differences are statistically significant (Wilcoxon test, p-value < 0.05).
    The A12 effect size is always LARGE.
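The A12 effect size mentioned above is the Vargha-Delaney statistic: the estimated probability that a value from one group is larger than a value from the other. The sketch below shows how it is computed; the `VarghaDelaney` class and the sample data are made up for illustration and are not the study's data, and the magnitude thresholds are the commonly cited ones rather than values taken from the slides.

```java
// Hypothetical helper illustrating the Vargha-Delaney A12 statistic.
final class VarghaDelaney {
    private VarghaDelaney() {}

    /**
     * Estimates the probability that a value from 'treatment' is larger than
     * a value from 'control'; ties count as one half. 0.5 means no difference.
     */
    static double a12(double[] treatment, double[] control) {
        if (treatment.length == 0 || control.length == 0) {
            throw new IllegalArgumentException("both samples must be non-empty");
        }
        double score = 0.0;
        for (double t : treatment) {
            for (double c : control) {
                if (t > c) {
                    score += 1.0;
                } else if (t == c) {
                    score += 0.5;
                }
            }
        }
        return score / ((double) treatment.length * control.length);
    }

    /** Commonly cited thresholds (assumed here): 0.56 small, 0.64 medium, 0.71 large. */
    static String magnitude(double a12) {
        double scaled = Math.max(a12, 1.0 - a12); // treat both directions alike
        if (scaled >= 0.71) return "large";
        if (scaled >= 0.64) return "medium";
        if (scaled >= 0.56) return "small";
        return "negligible";
    }

    public static void main(String[] args) {
        // Made-up numbers of bugs fixed per participant.
        double[] withSummaries = {3, 2, 3, 2, 3};
        double[] withoutSummaries = {1, 2, 1, 1, 2};
        double effect = a12(withSummaries, withoutSummaries);
        System.out.println(effect + " " + magnitude(effect));
    }
}
```

For the made-up samples, 21 of the 25 pairs favour the first group and 4 are ties, so the statistic is 23/25 = 0.92, which falls in the "large" band.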
  26. RQ1: How do test case summaries impact the number of bugs fixed by developers?

    The differences are statistically significant (Wilcoxon test, p-value < 0.05). The A12 effect size is always LARGE.
    Results are not influenced by developers’ experience:
    (i) the number of bugs fixed is not significantly influenced by the programming experience;
    (ii) there is no significant interaction between the programming experience and the presence of test case summaries.
  27. RQ1: How do test case summaries impact the number of bugs fixed by developers?

    Results are not influenced by developers’ experience:
    (i) the number of bugs fixed is not significantly influenced by the programming experience;
    (ii) there is no significant interaction between the programming experience and the presence of test case summaries.
    Summary: Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.
  28. RQ2: How do test case summaries impact developers to change test cases in terms of structural and mutation coverage?
  29. RQ2: How do test case summaries impact developers to change test cases in terms of structural and mutation coverage?

    [Charts: ArrayIntList.java, Rational.java]
  30. RQ2: How do test case summaries impact developers to change test cases in terms of structural and mutation coverage?

    [Charts: ArrayIntList.java, Rational.java]
    ONLY for Rational is there an improvement of the mutation score (+10%) when tests are enriched with summaries.
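For readers unfamiliar with the metric, the mutation score is the share of seeded faults (mutants) that a test suite detects. The sketch below uses the usual killed-over-total definition; the `MutationScore` class and the mutant counts are made up, and whether the study excluded equivalent mutants is not stated on the slides.

```java
// Hypothetical helper: mutation score as killed mutants over all mutants.
final class MutationScore {
    private MutationScore() {}

    static double of(int killedMutants, int totalMutants) {
        if (totalMutants <= 0 || killedMutants < 0 || killedMutants > totalMutants) {
            throw new IllegalArgumentException("need 0 <= killed <= total and total > 0");
        }
        return (double) killedMutants / totalMutants;
    }

    public static void main(String[] args) {
        // Made-up counts: two more mutants killed out of twenty.
        double before = of(14, 20);
        double after = of(16, 20);
        System.out.println("improvement: " + Math.round((after - before) * 100) + " percentage points");
    }
}
```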
  31. RQ2: How do test case summaries impact developers to change test cases in terms of structural and mutation coverage?

    [Charts: ArrayIntList.java, Rational.java]
    ONLY for Rational is there an improvement of the mutation score (+10%) when tests are enriched with summaries.
    Summary: Test case summaries do not influence how the developers manage the test cases in terms of structural coverage.
  32. Test Case Summaries and Comprehension

    [Chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries, rated Very Low, Low, Medium, High or Very High]
  33. Test Case Summaries and Comprehension

    [Chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries, rated Very Low, Low, Medium, High or Very High]

    WITH Summaries:
    (i) 46% of participants consider the test cases as “easy to understand”.
    (iii) Only 18% of participants considered the test cases as incomprehensible.
  34. Test Case Summaries and Comprehension

    [Chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries, rated Very Low, Low, Medium, High or Very High]

    WITHOUT Summaries:
    (i) Only 15% of participants consider the test cases as “easy to understand”.
    (iii) 40% of participants considered the test cases as incomprehensible.

    WITH Summaries:
    (i) 46% of participants consider the test cases as “easy to understand”.
    (iii) Only 18% of participants considered the test cases as incomprehensible.
  35. Test Case Summaries and Comprehension

    WITHOUT Summaries:
    (i) Only 15% of participants consider the test cases as “easy to understand”.
    (iii) 40% of participants considered the test cases as incomprehensible.

    WITH Summaries:
    (i) 46% of participants consider the test cases as “easy to understand”.
    (iii) Only 18% of participants considered the test cases as incomprehensible.

    Summary: Test summaries significantly improve the comprehensibility of automatically generated test cases according to human judgments.
  36. Quality of TestDescriber's Summaries

    Expressiveness (chart values: 30%, 70%)
    - Is easy to read and understand
    - Is somewhat readable and understandable
    - Is hard to read and understand

    Conciseness (chart values: 10%, 52%, 38%)
    - Has no unnecessary information
    - Has some unnecessary information
    - Has a lot of unnecessary information

    Content adequacy (chart values: 13%, 37%, 50%)
    - Is not missing any information
    - Missing some information
    - Missing some very important information
  39. Conclusion

    1) Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.
    2) Test case summaries do not influence how the developers manage the test cases in terms of structural coverage.
    3) Test summaries significantly improve the comprehensibility of automatically generated test cases according to human judgments.

    Panichella et al. “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016