Cloud9: Parallel Symbolic Execution for Automated Real-World Software Testing

Stefan Bucur

April 12, 2011

Transcript

  1. Parallel Symbolic Execution for
    Automated Real-World Software Testing
    Stefan Bucur, Vlad Ureche, Cristian Zamfir, George Candea
    Cloud9
    School of Computer and Communication Sciences

  2. Automated Software Testing
    [Diagram; dimensions: Scalability, Applicability, Usability]
    • Labels: Automated Techniques, Industrial SW Testing
    • Approaches shown: Symbolic Execution (λ), Model Checking, Manual Testing, Static Analysis, Fuzzing

  3. Cloud9 - The Big Picture
    • Parallel symbolic execution
    • Linear scalability on commodity clusters
    • Full symbolic POSIX support
    • Applicable on real-world systems
    • Platform for writing test cases
    • Easy-to-use platform API

  4. Automated Systems Testing
    • Symbolic execution (λ) is promising for systems testing: KLEE [*]
    • High-coverage test cases
    • Found new bugs
    • ... but applied only to small programs
    [*] C. Cadar, D. Dunbar, D. Engler, “KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs”, OSDI 2008

  5. Memcached, GNU Coreutils, Apache

  6. Symbolic Execution in a Nutshell
    Concrete input: [C9 A0 ... ]
    void proc_pkt(packet_t* pkt) {
      if (pkt->magic != 0xC9) {
        err(pkt);
        return;
      }
      if (pkt->cmd == GET) {
        ...
      } else if ...
      ...
    }

  8. Symbolic Execution in a Nutshell
    Concrete input: [C9 A0 ... ]
    Branch conditions evaluated so far: pkt->magic != 0xC9
    Code: proc_pkt(), as on slide 6

  9. Symbolic Execution in a Nutshell
    Concrete input: [C9 A0 ... ]
    Branch conditions evaluated so far: pkt->magic != 0xC9, then pkt->cmd == GET
    Code: proc_pkt(), as on slide 6

  11. Symbolic Execution in a Nutshell
    Symbolic input: λ
    Code: proc_pkt(), as on slide 6

  12. Symbolic Execution in a Nutshell
    Symbolic input: λ
    First branch: λ.magic == 0xC9 | λ.magic != 0xC9
    Code: proc_pkt(), as on slide 6

  13. Symbolic Execution in a Nutshell
    Symbolic input: λ
    First branch: λ.magic == 0xC9 | λ.magic != 0xC9
    Second branch: λ.cmd == GET | λ.cmd != GET
    Code: proc_pkt(), as on slide 6

  14. Symbolic Execution in a Nutshell
    Symbolic input: λ
    First branch: λ.magic == 0xC9 | λ.magic != 0xC9
    Second branch: λ.cmd == GET | λ.cmd != GET
    Code: proc_pkt(), as on slide 6
    Number of paths: ∼2^(program size)
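
As a rough illustration of what the engine derives from proc_pkt, the sketch below names the three path classes behind the first two branches and finds one concrete input for each. It brute-forces the two bytes instead of calling a constraint solver, and `CMD_GET = 0xA0`, `path_class_t`, `path_of`, and `find_witness` are hypothetical names and values chosen for this example; they are not part of Cloud9 or KLEE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; the slides do not define GET. */
enum { MAGIC = 0xC9, CMD_GET = 0xA0 };

/* The three path classes reachable through the first two branches. */
typedef enum { PATH_BAD_MAGIC, PATH_GET, PATH_OTHER_CMD } path_class_t;

/* Mirrors the branch structure of proc_pkt for a (magic, cmd) pair. */
path_class_t path_of(uint8_t magic, uint8_t cmd) {
    if (magic != MAGIC) return PATH_BAD_MAGIC;  /* magic != 0xC9 */
    if (cmd == CMD_GET) return PATH_GET;        /* cmd == GET    */
    return PATH_OTHER_CMD;                      /* cmd != GET    */
}

/* Stand-in for a constraint solver: search all inputs for one that
 * drives execution down `target`. Returns false if none exists. */
bool find_witness(path_class_t target, uint8_t *magic, uint8_t *cmd) {
    for (unsigned m = 0; m < 256; m++) {
        for (unsigned c = 0; c < 256; c++) {
            if (path_of((uint8_t)m, (uint8_t)c) == target) {
                *magic = (uint8_t)m;
                *cmd = (uint8_t)c;
                return true;
            }
        }
    }
    return false;
}
```

With n independent two-way branches the same enumeration would have to cover up to 2^n path classes, which is the growth the slide points at.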

  15. CPU Bottleneck
    Memory Exhaustion

  16. Parallel Tree Exploration
    Workers: W1, W2, W3

  17. Parallel Tree Exploration
    Workers: W1, W2, W3
    Key research problem: scalable parallel exploration

  18. Linear Solution to Exponential Problem
    [Plot: Time to Test vs. Program Size]

  19. Linear Solution to Exponential Problem
    [Plot: Time to Test vs. Program Size; testing target marked; curve for 1 worker]

  20. Linear Solution to Exponential Problem
    [Plot: Time to Test vs. Program Size; testing target marked; curves for 1, 2, 4, and 8 workers]
    Bring testing time down to practical values

  21. Throw Hardware at the Problem

  22. Scalability Challenges
    Tree structure not known a priori
    [Diagram: execution tree whose nodes are marked “?”]

  23. Scalability Challenges
    Static Allocation

  24. Scalability Challenges

  25. Scalability Challenges
    Anticipate Allocation

  26. Scalability Challenges

  27. Outline
    • Scalable Parallel Symbolic Execution
    • POSIX Environment Model
    • Evaluation

  28. Cloud9 Architecture
    Global Symbolic Tree

  29. Cloud9 Architecture
    W1’s Local Tree, W2’s Local Tree, W3’s Local Tree
    Each worker runs a local sequential symbolic execution engine (KLEE)

  30. Cloud9 Architecture
    [Diagram legend: Candidate nodes, Fence nodes]
    • Candidate nodes are selected for exploration
    • Fence nodes bound the local tree

  31. Load Balancing
    [Diagram: load balancer (LB) and workers W1, W2, W3]
    Hybrid distributed system: centralized reports, P2P work transfer
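
One way to picture the hybrid design: workers report their queue lengths to the load balancer, which only decides who should send work to whom, and the jobs then move directly between the two workers. The sketch below is a simplified policy under that assumption (pair the most and the least loaded worker and even them out). The names `transfer_req_t` and `lb_decide` and the "difference of at least 2" rule are hypothetical; the policy Cloud9 actually uses may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* A transfer request the load balancer would send to the source worker. */
typedef struct {
    size_t src;    /* index of the most loaded worker           */
    size_t dst;    /* index of the least loaded worker          */
    size_t njobs;  /* number of candidate nodes to hand over    */
} transfer_req_t;

/* Given the queue lengths reported by n workers, pick the most and the
 * least loaded one and ask the former to send half of the difference.
 * Returns false when there are fewer than two workers or the imbalance
 * is too small to act on. */
bool lb_decide(const size_t *queue_len, size_t n, transfer_req_t *req) {
    if (n < 2) return false;
    size_t max_i = 0, min_i = 0;
    for (size_t i = 1; i < n; i++) {
        if (queue_len[i] > queue_len[max_i]) max_i = i;
        if (queue_len[i] < queue_len[min_i]) min_i = i;
    }
    size_t diff = queue_len[max_i] - queue_len[min_i];
    if (diff < 2) return false;  /* already balanced */
    req->src = max_i;
    req->dst = min_i;
    req->njobs = diff / 2;
    return true;
}
```

Only the small `transfer_req_t` message passes through the central component in this sketch; the transferred work itself never does.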

  34. Work Transfer
    [Diagram: execution tree on worker W1; legend: Candidate, Fence]

  35. Work Transfer
    [Diagram: execution trees on workers W1 and W2; legend: Candidate, Fence]

  36. Work Transfer
    [Diagram: execution trees on workers W1 and W2; label: Virtual; legend: Candidate, Fence]

  38. Work Transfer
    [Diagram: execution trees on workers W1 and W2; label: Materialized; legend: Candidate, Fence]

  39. Work Transfer
    [Diagram: execution trees on workers W1 and W2; legend: Candidate, Fence]
    Exploration disjointness + completeness
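
The sequence of labels in these slides (Candidate, Fence, Virtual, Materialized) can be read as a small state machine. The sketch below encodes one such reading, stated as an assumption: the sender turns the transferred node into a fence, the receiver first holds it as a virtual candidate known only by its path, and materializes it before exploring. All type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { ROLE_CANDIDATE, ROLE_FENCE } node_role_t;
typedef enum { STATE_VIRTUAL, STATE_MATERIALIZED } node_state_t;

/* One worker's view of a frontier node of the execution tree. */
typedef struct {
    node_role_t  role;
    node_state_t state;
} node_view_t;

/* Source worker: once the node is handed over it becomes a fence,
 * so the source will not explore below it. */
void src_hand_over(node_view_t *n) { n->role = ROLE_FENCE; }

/* Destination worker: the received node is a candidate that exists
 * only as a path for now (virtual). */
node_view_t dst_receive(void) {
    node_view_t n = { ROLE_CANDIDATE, STATE_VIRTUAL };
    return n;
}

/* Destination worker: marks the node as materialized. A real engine
 * would rebuild the program state here (assumed: by replaying the
 * path from the root); this sketch only flips the flag. */
void dst_materialize(node_view_t *n) { n->state = STATE_MATERIALIZED; }

/* Disjointness + completeness for one node across two workers:
 * exactly one of them holds it as a candidate. */
bool exactly_one_candidate(const node_view_t *a, const node_view_t *b) {
    return (a->role == ROLE_CANDIDATE) != (b->role == ROLE_CANDIDATE);
}
```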

  40. Path-based Encoding
    [Diagram: tree with edges labeled 0 and 1]
    • Nodes are encoded as paths in tree
    • Compact binary representation
    • Two paths can share common prefix
    • Small encoding size
    • For a tree of 2^100 leaves, a path fits in <128 bits (16 bytes)
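
The bullets above can be sketched as a bit vector with one bit per branch decision. This is a minimal illustration rather than Cloud9's actual data structure; `tree_path_t`, the `path_*` helpers, and the convention that 0 means the false side of a branch are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH 128  /* 128 branch decisions fit in 16 bytes */

/* A tree node identified by the branch decisions leading to it from
 * the root: bit i is 0 for the false side, 1 for the true side. */
typedef struct {
    uint8_t bits[MAX_DEPTH / 8];
    size_t  depth;  /* number of valid bits */
} tree_path_t;

void path_init(tree_path_t *p) { memset(p, 0, sizeof *p); }

/* Append one branch decision; returns 0 on success, -1 if full. */
int path_push(tree_path_t *p, int taken) {
    if (p->depth >= MAX_DEPTH) return -1;
    if (taken) p->bits[p->depth / 8] |= (uint8_t)(1u << (p->depth % 8));
    p->depth++;
    return 0;
}

/* Read back the i-th branch decision (caller ensures i < depth). */
int path_get(const tree_path_t *p, size_t i) {
    return (p->bits[i / 8] >> (i % 8)) & 1;
}

/* Bytes needed to ship a path of the given depth to another worker. */
size_t path_encoded_bytes(size_t depth) { return (depth + 7) / 8; }

/* Length of the prefix two paths share, i.e. the depth of their
 * lowest common ancestor in the tree. */
size_t path_common_prefix(const tree_path_t *a, const tree_path_t *b) {
    size_t n = a->depth < b->depth ? a->depth : b->depth;
    size_t i = 0;
    while (i < n && path_get(a, i) == path_get(b, i)) i++;
    return i;
}
```

A balanced binary tree with 2^100 leaves has depth 100, so `path_encoded_bytes(100)` gives 13 bytes, within the 16-byte bound quoted on the slide.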

  41. Load Balancing in Practice
    [Plot: work done (% of total instructions) vs. time (0 to 10 minutes); curves: continuous load balancing, LB stops after 4 min, LB stops after 1 min]
    Load balancing necessary to ensure scalability

  42. Outline
    • Scalable Parallel Symbolic Execution
    • POSIX Environment Model
    • Evaluation

  43. Calls into the Environment
    if (fork() == 0) {
      ...
      if ((res = recv(sock, buff, size, 0)) > 0) {
        pthread_mutex_lock(&mutex);
        memcpy(gBuff, buff, res);
        pthread_mutex_unlock(&mutex);
      }
      ...
    } else {
      ...
      pid_t pid = wait(&stat);
      ...
    }

  44. Environment Model
    [Diagram: Program Under Test, fork(), Environment (C Library / OS)]
    The environment cannot be executed symbolically directly

  45. Environment Model
    [Diagram: Program Under Test, fork(), Environment (C Library / OS), Model Code, Symbolic Execution Engine]
    Model code: equivalent functionality, executable symbolically

  46. Starting Point
    [Diagram: POSIX model on top of the Symbolic Execution Engine]
    • Model components: Network (stubs), Files
    • Programs covered: single-threaded isolated nodes, single-threaded utilities

  47. POSIX Environment Model
    [Diagram: POSIX model on top of the Symbolic Execution Engine]
    • Model components: Network (TCP/UDP/UNIX), Files, Pipes, Threads (pthread_*), Processes, Signals
    • Programs and features covered: single-threaded utilities, message passing, servers and clients, multi-threaded programs, distributed systems, asynchronous events, IPC

  48. Key Changes in Symbolic Execution
    Multithreading and Scheduling
    • Deterministic or symbolic scheduling
    • Non-preemptive execution model
    Address Space Isolation
    • Copy on Write (CoW) between processes
    • CoW domains for memory sharing

  49. Symbolic Engine System Calls
    • Symbolic engine support needed for threads/processes
    1. Thread/process lifecycle: thread_create, thread_terminate, process_fork, process_terminate, get_context
    2. Synchronization: thread_preempt, thread_sleep, thread_notify, get_wait_list
    3. Shared memory: make_shared
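
To show how model code might sit on top of these primitives, here is a sketch of a mutex built only from `get_wait_list`, `thread_sleep`, and `thread_notify`. The primitive names come from the slide; their signatures, the `wlist_id_t` type, the counting stub bodies, and the `model_mutex_*` functions are assumptions made so the example compiles outside a symbolic execution engine.

```c
#include <stdint.h>

/* Stand-ins for the engine primitives (assumed signatures). The stubs
 * only count calls; a real engine would block and wake threads. */
typedef uint64_t wlist_id_t;
static wlist_id_t next_wlist = 1;
int sleep_calls = 0, notify_calls = 0;

wlist_id_t get_wait_list(void) { return next_wlist++; }
void thread_sleep(wlist_id_t wl)  { (void)wl; sleep_calls++; }
void thread_notify(wlist_id_t wl) { (void)wl; notify_calls++; }

/* Model code: mutex state kept in ordinary program memory. */
typedef struct {
    wlist_id_t wlist;   /* wait list that blocked threads sleep on */
    int        taken;   /* 1 while some thread holds the mutex     */
    unsigned   queued;  /* number of threads waiting for it        */
} model_mutex_t;

void model_mutex_init(model_mutex_t *m) {
    m->wlist = get_wait_list();
    m->taken = 0;
    m->queued = 0;
}

/* No atomic operations: with a non-preemptive execution model the
 * calling thread runs uninterrupted until it chooses to sleep. */
void model_mutex_lock(model_mutex_t *m) {
    if (m->taken || m->queued > 0) {
        m->queued++;
        thread_sleep(m->wlist);
        m->queued--;
    }
    m->taken = 1;
}

void model_mutex_unlock(model_mutex_t *m) {
    m->taken = 0;
    if (m->queued > 0) thread_notify(m->wlist);
}
```

Because the mutex is plain C over a few engine calls, the engine can execute it symbolically like the rest of the program under test.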

  50. Outline
    • Scalable Parallel Symbolic Execution
    • POSIX Environment Model
    • Evaluation

  51. Testing Real-World Software
    Memcached, GNU Coreutils, Apache

  52. Time to Reach Target Coverage
    [Plot for printf: time to achieve target coverage (minutes) vs. number of workers (1, 4, 8, 24, 48); curves for 60%, 70%, 80%, and 90% coverage]
    Faster time-to-cover, higher coverage values

  53. Increase in Code Coverage
    [Plot: additional code covered (% of program LOC) vs. index of tested Coreutil, sorted by additional coverage; Coreutils suite, 12 workers, 10 min.]
    Consistent code coverage increase

  54. Exhaustive Exploration
    [Plot for memcached (7.4×10^4 paths): time to complete exhaustive test (hours) vs. number of workers (2, 4, 6, 12, 24, 48)]
    Scalability of exhaustive path exploration

  55. Instruction Throughput
    [Plot for memcached: useful work done (# of instructions) vs. number of workers (1, 4, 6, 12, 24, 48); curves for 4, 6, 8, and 10 minutes]
    Linear scalability with number of workers

  56. Experimental Setup
    Execute the “whole world” symbolically
    [Diagram: symbolic state containing a client process and a server (memcached/Apache/lighttpd), joined by a TCP stream; labels: symbolic cmd., srv. response]

  57. Symbolic Test Cases
    • Easy-to-use API for developers to write symbolic test cases
    • Basic symbolic memory support
    • POSIX extensions for environment control
    • Network conditions, fault injection, symbolic scheduler

  58. Symbolic Test Cases
    Testing HTTP header extension
    make_symbolic(hdrData);
    // Append symbolic header to request
    strcat(req, "X-NewExtension: ");
    strcat(req, hdrData);
    // Enable fault injection on socket
    ioctl(ssock, SIO_FAULT_INJ, RD | WR);
    // Symbolic stream fragmentation
    ioctl(ssock, SIO_PKT_FRAGMENT, RD);

  59. Conclusions
    • Parallel symbolic execution
    • Linear scalability on commodity clusters
    • Full POSIX environment model
    • Real-world systems testing
    • Use cases
      • Increasing coverage
      • Exhaustive path exploration
      • Bug patch verification
