Test Amplification

An overview of the test amplification research conducted within the EU STAMP project, with a special focus on our work on search-based crash replication.

Presented at the "Nederlandse Testdag", Delft, The Netherlands, November 6, 2018.

https://www.stamp-project.eu/

Keywords: Search based testing, evolutionary algorithms, artificial intelligence, test automation, debugging, model inference, benchmarking.

Arie van Deursen

November 06, 2018

Transcript

  1. Test Amplification
    Arie van Deursen, TU Delft
    Dutch Testing Day, November 6, 2018
    Pouria Derakhshanfar, Xavier Devroey, Mozhan Soltani, Annibale Panichella, Andy Zaidman

  2.

  3. Test Amplification Tools Under Construction
    • DSPOT: detect and generate missing assertions for JUnit test cases
    • DESCARTES: speed up mutation testing by making bigger changes (drop the method body)
    • CAMP: environment test amplification through Docker mutations
    • EvoCrash / Botsing: test suite amplification via crash reproduction
    https://www.stamp-project.eu/
    https://github.com/STAMP-project
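
DESCARTES relies on extreme mutation: instead of small operator changes, the whole body of a method is dropped and replaced by a default return value. If the tests still pass on such a mutant, the method is "pseudo-tested". The hand-written Java sketch below only illustrates the idea with a hypothetical `Discount` class; the real tool is an engine for the PIT mutation testing framework and mutates bytecode, not source code.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical class under test, used only for illustration.
class Discount {
    // Original method: 10% off for amounts of at least 100.
    static int apply(int amount) {
        if (amount >= 100) {
            return amount - amount / 10;
        }
        return amount;
    }

    // Extreme mutant: the whole body is dropped and replaced by a default value.
    static int applyExtremeMutant(int amount) {
        return 0;
    }
}

class ExtremeMutationDemo {
    // Weak test: only checks that the result is not negative.
    static boolean weakTestPasses(IntUnaryOperator impl) {
        return impl.applyAsInt(200) >= 0;
    }

    // Strong test: asserts the exact expected result.
    static boolean strongTestPasses(IntUnaryOperator impl) {
        return impl.applyAsInt(200) == 180;
    }

    public static void main(String[] args) {
        IntUnaryOperator mutant = Discount::applyExtremeMutant;
        // The weak test also passes on the mutant, so the mutant survives.
        System.out.println("weak test kills mutant: " + !weakTestPasses(mutant));
        // The strong test fails on the mutant, so the mutant is killed.
        System.out.println("strong test kills mutant: " + !strongTestPasses(mutant));
    }
}
```

A surviving extreme mutant points at missing or weak assertions, which is exactly the gap DSPOT tries to fill.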

  4. java.lang.ClassCastException: […]
    at org…..SolrEntityReferenceResolver.getWikiReference(....java:93)
    at org…..SolrEntityReferenceResolver.getEntityReference(….java:70)
    at org…..SolrEntityReferenceResolver.resolve(….java:63)
    at org…..SolrDocumentReferenceResolver.resolve(….java:48)
    at …
    Crash!?!

  5. Crash Reproduction
    • File an issue
    • Discover steps to reproduce the crash
    • Example: XWIKI-13031
    Challenges:
    • Labor intensive
    • Hard to automate

  6. Java Stack Trace (Issue XWIKI-13031)
    java.lang.ClassCastException: […]
    at org…..SolrEntityReferenceResolver.getWikiReference(....java:93)
    at org…..SolrEntityReferenceResolver.getEntityReference(….java:70)
    at org…..SolrEntityReferenceResolver.resolve(….java:63)
    at org…..SolrDocumentReferenceResolver.resolve(….java:48)
    at …
    Annotations: the first line is the exception; the "at" lines are the frames; one of the frames is marked as the target.
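
A stack trace like the one above consists of the exception type on its first line followed by the frames. EvoCrash-style tools are given a target frame and aim to reproduce the frames from the top of the trace down to it. The Java 17 sketch below extracts both parts, assuming frames of the form `at class.method(File.java:line)`; `StackTraceParser` and its methods are hypothetical names, and the elided package names above are deliberately not filled in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StackTraceParser {
    // One frame: "at pkg.Class.method(File.java:93)".
    record Frame(String className, String methodName, int line) {}

    // Matches "at <class>.<method>(<file>:<line>)"; frames without a line number are skipped.
    private static final Pattern FRAME =
        Pattern.compile("^\\s*at\\s+([\\w$.]+)\\.([\\w$<>]+)\\([^:()]*:(\\d+)\\)\\s*$");

    // The first line holds the exception type, e.g. "java.lang.ClassCastException: ...".
    static String exceptionType(String trace) {
        String first = trace.strip().lines().findFirst().orElse("");
        int colon = first.indexOf(':');
        return (colon < 0 ? first : first.substring(0, colon)).strip();
    }

    // Returns frames 1..target (1-based, target >= 1): the part of the trace to reproduce.
    static List<Frame> framesUpTo(String trace, int target) {
        List<Frame> frames = new ArrayList<>();
        for (String line : trace.lines().toList()) {
            Matcher m = FRAME.matcher(line);
            if (m.matches()) {
                frames.add(new Frame(m.group(1), m.group(2), Integer.parseInt(m.group(3))));
            }
        }
        return new ArrayList<>(frames.subList(0, Math.min(target, frames.size())));
    }
}
```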

  7. Search-Based Crash Reproduction
    (Diagram: a random initial test suite, built from calls such as a(), b(), c(), d(), and e(), is refined by evolutionary search, guided by the given stack trace, into a crash-reproducing test case whose exception matches that stack trace.)
    Soltani, Panichella, van Deursen. “Search-Based Crash Reproduction and Its Impact on Debugging.”
    IEEE Transactions on Software Engineering, 2018. pure.tudelft.nl

  8. Crash-reproducing Test Case
    public void test0() throws Throwable {
        SolrEntityReferenceResolver solrEntityReferenceResolver0 = new …();
        EntityReferenceResolver entityReferenceResolver0 = … mock(…);
        solrDocument0.put("wiki", (Object) entityType0);
        Injector.inject(solrEntityReferenceResolver0, …);
        Injector.validateBean(solrEntityReferenceResolver0, …);

        // Undeclared exception!
        solrEntityReferenceResolver0.resolve(solrDocument0, entityType0, objectArray0);
    }
    Stack trace produced by the test:
    java.lang.ClassCastException: […]
    at org…..SolrEntityReferenceResolver.getWikiReference(....java:93)
    at org…..SolrEntityReferenceResolver.getEntityReference(….java:70)
    at org…..SolrEntityReferenceResolver.resolve(….java:63)

  9. EvoSuite
    • Search-based test generation
    • Many random JUnit tests
    • Optimized to maximize a coverage criterion (e.g., branch coverage)
    • Combines and improves tests to optimize overall fitness
    • http://www.evosuite.org/
    Genetic algorithm: Initialize population, then repeat Evaluate fitness → Next generation (Selection, Crossover, Mutation, Reinsertion) until [fitness == 0 or budget exhausted]
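
The loop above is a standard genetic algorithm. The minimal Java sketch below runs it on a toy problem: it evolves an integer array towards a hidden target, a hypothetical stand-in for a test suite, and it counts the budget in generations rather than seconds. It only illustrates the loop and is not EvoSuite's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Toy GA: individuals are int arrays; fitness is the distance to a hidden target (0 = solved).
class ToyGeneticAlgorithm {
    static final int[] TARGET = {3, 1, 4, 1, 5, 9, 2, 6};
    static final Random RND = new Random(42);

    static int fitness(int[] x) {                  // lower is better; 0 ends the search
        int d = 0;
        for (int i = 0; i < x.length; i++) d += Math.abs(x[i] - TARGET[i]);
        return d;
    }

    static int[] randomIndividual() {
        int[] x = new int[TARGET.length];
        for (int i = 0; i < x.length; i++) x[i] = RND.nextInt(10);
        return x;
    }

    static int[] tournament(List<int[]> pop) {     // selection: fitter of two random individuals
        int[] a = pop.get(RND.nextInt(pop.size())), b = pop.get(RND.nextInt(pop.size()));
        return fitness(a) <= fitness(b) ? a : b;
    }

    static int[] crossover(int[] p1, int[] p2) {   // single-point crossover
        int cut = RND.nextInt(p1.length);
        int[] child = p1.clone();
        System.arraycopy(p2, cut, child, cut, p2.length - cut);
        return child;
    }

    static void mutate(int[] x) {                  // change one gene at random
        x[RND.nextInt(x.length)] = RND.nextInt(10);
    }

    // Returns the best individual found; populationSize must be at least 2.
    static int[] run(int populationSize, int budget) {
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) pop.add(randomIndividual());  // initialize population
        Comparator<int[]> byFitness = Comparator.comparingInt(ToyGeneticAlgorithm::fitness);
        for (int gen = 0; gen < budget; gen++) {                               // [budget exhausted]
            pop.sort(byFitness);                                               // evaluate fitness
            if (fitness(pop.get(0)) == 0) break;                               // [fitness == 0]
            List<int[]> next = new ArrayList<>(pop.subList(0, 2));             // keep the two best
            while (next.size() < populationSize) {                             // next generation
                int[] child = crossover(tournament(pop), tournament(pop));
                mutate(child);
                next.add(child);                                               // reinsertion
            }
            pop = next;
        }
        pop.sort(byFitness);
        return pop.get(0);
    }
}
```

Because the two best individuals are carried over unchanged, the best fitness never gets worse from one generation to the next.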

  10. EvoCrash
    • Implemented on top of EvoSuite
    • Requires:
      • Stack trace
      • Binaries: .jar files
      • Time budget: set by user
    • Produces a single test case that reproduces the crash

  11. Initialize Population
    • Guided initialization
      • Random method calls
      • Guarantees that a target method call is inserted in each test at least once
        • Direct for public and protected methods
        • Indirect for private methods
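
A minimal sketch of guided initialization that models a test as a list of method-call names. This is a simplification, since real tests also create objects and arguments. Random calls are generated, then the target is inserted if it is missing; for a private target, a public method that calls it is inserted instead. The `callersOfPrivate` map is a hypothetical input that a tool would have to obtain, for example, from static analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

class GuidedInitialization {
    // publicMethods: callable (public or protected) methods of the class under test.
    // callersOfPrivate: hypothetical map from a private method to public methods that reach it.
    static List<String> randomTest(List<String> publicMethods, String target, boolean targetIsPrivate,
                                   Map<String, List<String>> callersOfPrivate, int length, Random rnd) {
        List<String> test = new ArrayList<>();
        for (int i = 0; i < length; i++) {                       // random method calls
            test.add(publicMethods.get(rnd.nextInt(publicMethods.size())));
        }
        String entry;
        if (targetIsPrivate) {                                   // indirect: go through a public caller
            List<String> callers = callersOfPrivate.getOrDefault(target, List.of());
            if (callers.isEmpty()) throw new IllegalArgumentException("no public caller of " + target);
            entry = callers.get(rnd.nextInt(callers.size()));
        } else {                                                 // direct: call the target itself
            entry = target;
        }
        if (!test.contains(entry)) {                             // guarantee at least one call
            test.add(rnd.nextInt(test.size() + 1), entry);
        }
        return test;
    }
}
```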

  12. Evaluate Fitness
    • A global fitness function guides the generation process. It combines:
      • Line coverage: how far are we from the line where the exception is thrown?
      • Exception coverage: is the exception thrown?
      • Stack trace similarity: how similar is the stack trace to the original (given) stack trace?
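
The three components can be folded into a single value to minimize, where 0 means the crash is reproduced. In the Java 17 sketch below, the weighted sum (weights 3, 2 and 1, with components that cannot be measured yet set to their maximum of 1) and the frame distance are assumptions made for illustration. The exact definitions used by EvoCrash are in the TSE paper cited on slide 7.

```java
import java.util.List;

// Sketch of a crash-reproduction fitness function (0 = crash reproduced).
// ASSUMPTION: weighted sum 3 * line + 2 * exception + 1 * trace, each component in [0, 1].
class CrashFitness {
    record Frame(String className, String methodName, int line) {}

    // Maps a non-negative distance into [0, 1).
    static double normalize(double d) { return d / (d + 1.0); }

    // 1 if class or method differ, otherwise the normalized line gap (assumed frame distance).
    static double frameDistance(Frame a, Frame b) {
        if (!a.className().equals(b.className()) || !a.methodName().equals(b.methodName())) return 1.0;
        return normalize(Math.abs(a.line() - b.line()));
    }

    // Stack trace similarity: each target frame is matched with its closest generated frame.
    static double traceDistance(List<Frame> target, List<Frame> generated) {
        if (target.isEmpty()) return 0.0;
        double sum = 0.0;
        for (Frame t : target) {
            double best = 1.0;
            for (Frame g : generated) best = Math.min(best, frameDistance(t, g));
            sum += best;
        }
        return sum / target.size();
    }

    // lineDistance: normalized distance to the target line in [0, 1] (0 = line reached).
    // generated: frames of the target exception thrown by the test, or null if it was not thrown.
    static double fitness(double lineDistance, List<Frame> target, List<Frame> generated) {
        boolean reachedAndThrown = lineDistance == 0.0 && generated != null;
        double dException = reachedAndThrown ? 0.0 : 1.0;
        double dTrace = reachedAndThrown ? traceDistance(target, generated) : 1.0;
        return 3 * lineDistance + 2 * dException + dTrace;
    }
}
```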

  13. Next Generation
    • Selection
      • Fittest tests according to the fitness function
    • Guided crossover
      • Single-point crossover
      • Check that the call to the target method is preserved
    • Guided mutation
      • Add/change/drop statements
      • Check that the call to the target method is preserved
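
A minimal sketch of the guided operators, again modelling a test as a list of call names (a simplification). Single-point crossover uses one cut point per parent, and both crossover and mutation reject an offspring that lost the call to the target method. Falling back to an unchanged parent is an assumption of this sketch; a tool might retry instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class GuidedVariation {
    static boolean callsTarget(List<String> test, String target) {
        return test.contains(target);
    }

    // Single-point crossover: head of p1 + tail of p2; keeps p1 if the target call is lost.
    static List<String> crossover(List<String> p1, List<String> p2, String target, Random rnd) {
        int cut1 = rnd.nextInt(p1.size() + 1);
        int cut2 = rnd.nextInt(p2.size() + 1);
        List<String> child = new ArrayList<>(p1.subList(0, cut1));
        child.addAll(p2.subList(cut2, p2.size()));
        return callsTarget(child, target) ? child : new ArrayList<>(p1);
    }

    // Mutation: add, change or drop one statement; undone if the target call disappears.
    static List<String> mutate(List<String> test, List<String> methods, String target, Random rnd) {
        List<String> m = new ArrayList<>(test);
        int op = m.isEmpty() ? 0 : rnd.nextInt(3);
        String call = methods.get(rnd.nextInt(methods.size()));
        switch (op) {
            case 0 -> m.add(rnd.nextInt(m.size() + 1), call);   // add a statement
            case 1 -> m.set(rnd.nextInt(m.size()), call);       // change a statement
            default -> m.remove(rnd.nextInt(m.size()));         // drop a statement
        }
        return callsTarget(m, target) ? m : new ArrayList<>(test);
    }
}
```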

  14. Does it Work?
    The JCrashPack Crash Replication Benchmark
    • 200 crashes from various open source projects:
      • XWiki
        • STAMP partner
        • 51 crashes from the XWiki issue tracking system
      • Defects4J applications
        • State-of-the-art fault localization benchmark
        • 73 crashes (with fixes)
      • Elasticsearch
        • Selected based on popularity
        • 76 crashes from the Elasticsearch issue tracking system
    • Filtered, verified, cleaned up, right jar versions, …
    https://github.com/STAMP-project/JCrashPack

  15. ExRunner: Running Crash Replication
    • Python tool to benchmark crash reproduction tools
    • Multithreaded execution
    • Monitors tool executions
    (Diagram: a Job Generator combines stack traces, tool configurations, and jar files into jobs 1 to n; threads 1 to n, supervised by an Observer, run the jobs, and each job produces logs, results, and a test case.)
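
ExRunner itself is written in Python. To keep one language, here is a hedged Java 17 analogue of the same architecture: a job generator builds one job per (stack trace, tool configuration) pair, a fixed pool of threads runs them, and a monitoring loop records results, crashes and timeouts. The `tool` function is a hypothetical stand-in for launching a crash reproduction tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.BiFunction;

class BenchmarkRunner {
    record Job(String stackTrace, String config) {}
    record Result(Job job, String outcome) {}

    // Job generator: one job per (stack trace, tool configuration) pair.
    static List<Job> generateJobs(List<String> traces, List<String> configs) {
        List<Job> jobs = new ArrayList<>();
        for (String t : traces) for (String c : configs) jobs.add(new Job(t, c));
        return jobs;
    }

    // Runs all jobs on a fixed number of threads. The waiting loop plays the observer role:
    // a stuck execution becomes a "timeout" result instead of blocking the whole benchmark.
    // Simplification: each timeout is measured from when the loop starts waiting on that job.
    static List<Result> runAll(List<Job> jobs, int threads, long timeoutMillis,
                               BiFunction<String, String, String> tool) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Job j : jobs) futures.add(pool.submit(() -> tool.apply(j.stackTrace(), j.config())));
            List<Result> results = new ArrayList<>();
            for (int i = 0; i < jobs.size(); i++) {
                String outcome;
                try {
                    outcome = futures.get(i).get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    futures.get(i).cancel(true);
                    outcome = "timeout";
                } catch (ExecutionException e) {
                    outcome = "crashed: " + e.getCause();
                }
                results.add(new Result(jobs.get(i), outcome));
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```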

  16. EvoCrash Applied to JCrashPack
    (Chart: results for Defects4J, XWiki, and Elasticsearch, per outcome: crashed, ex. thrown, failed, line reached, reproduced; axis: number of frames, log scale.)

  17. Identified 12 Key Challenges
    • Input data generation
      • For complex inputs, generic types, etc.
    • Environmental dependencies
      • Environment state is hard to manage at unit level
    • Complex code
      • Long methods with lots of nested predicates
    • Abstract classes and methods
      • Cannot be instantiated, so one concrete implementation is picked at random
    • […]
    Mozhan Soltani · Pouria Derakhshanfar · Xavier Devroey · Arie van Deursen
    An Empirical Evaluation of Search-Based Crash Reproduction. TU Delft, 2018. In preparation.

  18. Work in Progress: Improved Seeding
    Hypothesis: an initial test suite close to actual usage of the class is more likely to lead to a crash-reproducing test case than a random one.
    (Diagram: random initial test suite and the given stack trace.)
    Pouria Derakhshanfar, Xavier Devroey, Gilles Perrouin, Andy Zaidman and Arie van Deursen.
    Search-based Crash Reproduction using Behavioral Model Seeding. TU Delft, 2018. Submitted.

  19. Test seeding
    Use the existing tests to generate the initial test suite.
    (Diagram: instead of a purely random initial test suite, a subset of the existing tests is used to build the initial test suite for the given stack trace.)
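
Test seeding can be sketched as a biased initialization: each new individual is a copy of an existing test with some probability, and a random test otherwise. Here a test is again a list of call names, and `pClone` is a hypothetical name for that probability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

class SeededInitialization {
    // pClone = 0 gives plain random initialization; pClone = 1 copies existing tests only.
    static List<List<String>> initialPopulation(int size, double pClone, List<List<String>> existingTests,
                                                Supplier<List<String>> randomTest, Random rnd) {
        List<List<String>> population = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            if (!existingTests.isEmpty() && rnd.nextDouble() < pClone) {
                List<String> seed = existingTests.get(rnd.nextInt(existingTests.size()));
                population.add(new ArrayList<>(seed));   // clone an existing test
            } else {
                population.add(randomTest.get());        // random test, as without seeding
            }
        }
        return population;
    }
}
```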

  20. Model seeding
    Use a model of method usage to generate the initial test suite.
    (Diagram: a model over the calls a(), b(), c(), d(), and e() yields a model-driven initial test suite, which replaces the random initial test suite for the given stack trace.)

  21. Call Sequence Models
    • Model generated from sequences of method calls, coming from:
      • Source code (static analysis)
      • Test cases (dynamic analysis)
      • Operations logs (online analysis)
    • N-gram inference, e.g., from the sequences:
      [b(), a(), e()]
      [c(), d(), a(), e()]
      [b(), a(), d(), a(), d(), a(), e()]
    (Diagram: the resulting model over a(), b(), c(), d(), and e().)
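
N-gram inference can be illustrated with its smallest case, n = 2 (bigrams): count how often each call is directly followed by another and turn the counts into transition probabilities. The Java sketch below applies this to the three example sequences from the slide. The choice n = 2 is an assumption, and this is not the inference tool used in the work.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Bigram model: for each call, the calls observed right after it, with probabilities.
class BigramModel {
    static Map<String, Map<String, Double>> infer(List<List<String>> sequences) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (List<String> seq : sequences) {
            for (int i = 0; i + 1 < seq.size(); i++) {       // count each adjacent pair
                counts.computeIfAbsent(seq.get(i), k -> new TreeMap<>())
                      .merge(seq.get(i + 1), 1, Integer::sum);
            }
        }
        Map<String, Map<String, Double>> model = new TreeMap<>();
        counts.forEach((from, next) -> {                     // normalize counts per source call
            int total = next.values().stream().mapToInt(Integer::intValue).sum();
            Map<String, Double> probs = new TreeMap<>();
            next.forEach((to, c) -> probs.put(to, (double) c / total));
            model.put(from, probs);
        });
        return model;
    }

    public static void main(String[] args) {
        List<List<String>> seqs = List.of(                   // the slide's example sequences
            List.of("b", "a", "e"),
            List.of("c", "d", "a", "e"),
            List.of("b", "a", "d", "a", "d", "a", "e"));
        System.out.println(infer(seqs));
        // prints {a={d=0.4, e=0.6}, b={a=1.0}, c={d=1.0}, d={a=1.0}}
    }
}
```

In this data, a() is followed by e() three times and by d() twice, which gives the 0.6/0.4 split.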

  22. Example Transitions for java.util.LinkedList
    (Diagram: a state machine with states S0 to S5 and SX, whose transitions are labeled with calls such as size(), add(Object), iterator(), get(int), and remove(int).)
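
A hedged sketch of how a transition model like this could yield abstract test cases (call sequences) for seeding: a random walk that picks each next call according to the transition probabilities. Any probabilities attached to the LinkedList calls are hypothetical, since the slide's actual model cannot be recovered from the transcript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ModelWalk {
    // model: for each call, the possible next calls with probabilities that sum to 1.
    static List<String> walk(Map<String, Map<String, Double>> model, String start, int maxLength, Random rnd) {
        List<String> calls = new ArrayList<>();
        String current = start;
        while (calls.size() < maxLength) {
            calls.add(current);
            Map<String, Double> next = model.get(current);
            if (next == null || next.isEmpty()) break;       // no outgoing transition: stop the walk
            double r = rnd.nextDouble();
            double acc = 0.0;
            String chosen = null;
            for (Map.Entry<String, Double> e : next.entrySet()) {
                chosen = e.getKey();
                acc += e.getValue();
                if (r < acc) break;                          // roulette-wheel choice by probability
            }
            current = chosen;                                // the last entry absorbs rounding errors
        }
        return calls;
    }
}
```

Walks stop early at calls without outgoing transitions, so abstract test cases can be shorter than `maxLength`.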

  23. Model example

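  24. Model Seeding Approach
    • Guided genetic algorithm (inputs: app.jar and the stack trace): guided initialization, fitness evaluation, selection, guided crossover, guided mutation, and reinsertion, repeated until [fitness == 0 or budget exhausted]
    • Test seeding: instrumented execution of the test cases (tests.jar) populates an objects pool; test cases can be cloned into the initial population with probability Pr[clone]
    • Behavioral model seeding: call sequences obtained by static analysis and instrumented execution feed models inference; a selection of abstract test cases from the models populates the objects pool
    • The objects pool is used during initialization and mutation with probabilities Pr[pick init] and Pr[pick mut]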

  25. (Chart: outcome per crash (not started, failed, line reached, ex. thrown, reproduced) for each configuration: no seeding (no s.); test seeding (test s.) with 0.2, 0.5, 0.8, and 1.0; model seeding (model s.) with 0.2, 0.5, 0.8, and 1.0; axis: number of frames, 0 to 75.)

  26. Model Seeding: Results
    • Behavioral seeding outperforms test seeding and no seeding
      • 13 more crashes could be reproduced
      • No performance overhead
    • Extra crashes are more complex
      • Higher frame levels
      • Better at "industrial" cases
    • Model seeding outperforms test seeding.

  27. Implementation: Botsing
    • Open source implementation of the EvoCrash approach
    • Uses EvoSuite as a library for instrumentation
    • Extensible, modular, tested
    • Used as a test bed for new crash reproduction ideas (model seeding)
    • https://github.com/STAMP-project/botsing

  28.
    https://www.stamp-project.eu/

  29.
    https://www.stamp-project.eu/