Slide 1

Slide 1 text

Parallel Symbolic Execution for Automated Real-World Software Testing
Stefan Bucur, Vlad Ureche, Cristian Zamfir, George Candea
Cloud9
School of Computer and Communication Sciences

Slide 2

Slide 2 text

Automated Software Testing
[Diagram: testing approaches (Symbolic Execution, Model Checking, Static Analysis, Fuzzing, Manual Testing), grouped as Automated Techniques and Industrial SW Testing, compared on Scalability, Applicability, Usability]

Slide 3

Slide 3 text

Cloud9 - The Big Picture
• Parallel symbolic execution
  • Linear scalability on commodity clusters
• Full symbolic POSIX support
  • Applicable on real-world systems
• Platform for writing test cases
  • Easy-to-use platform API

Slide 4

Slide 4 text

Automated Systems Testing
λ Symbolic Execution
• Promising for systems testing: KLEE [*]
• High-coverage test cases
• Found new bugs
• ... But applied only on small programs
[*] C. Cadar, D. Dunbar, D. Engler, “KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs”, OSDI 2008

Slide 5

Slide 5 text

Memcached, GNU Coreutils, Apache

Slide 6

Slide 6 text

Symbolic Execution in a Nutshell

    void proc_pkt(packet_t* pkt) {
      if (pkt->magic != 0xC9) {
        err(pkt);
        return;
      }
      if (pkt->cmd == GET) {
        ...
      } else if ...
      ...
    }

Input: [C9 A0 ... ]

Slide 8

Slide 8 text

Symbolic Execution in a Nutshell
Same proc_pkt code and input [C9 A0 ... ] as on Slide 6.
Execution tree so far: pkt->magic != 0xC9

Slide 9

Slide 9 text

Symbolic Execution in a Nutshell
Same proc_pkt code and input [C9 A0 ... ] as on Slide 6.
Execution tree so far: pkt->magic != 0xC9, then pkt->cmd == GET

Slide 11

Slide 11 text

Symbolic Execution in a Nutshell
Same proc_pkt code as on Slide 6; the input is now the symbolic packet λ.

Slide 12

Slide 12 text

Symbolic Execution in a Nutshell
Same proc_pkt code as on Slide 6, run on the symbolic packet λ.
Execution tree: the first branch forks into λ.magic != 0xC9 and λ.magic == 0xC9

Slide 13

Slide 13 text

Symbolic Execution in a Nutshell
Same proc_pkt code as on Slide 6, run on the symbolic packet λ.
Execution tree: the first branch forks into λ.magic != 0xC9 and λ.magic == 0xC9; the latter forks into λ.cmd == GET and λ.cmd != GET

Slide 14

Slide 14 text

Symbolic Execution in a Nutshell
Same proc_pkt code and execution tree as on Slide 13.
Number of paths ≈ 2^(program size)
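A minimal C sketch of the idea on these slides, not Cloud9 or KLEE code: each feasible path through proc_pkt corresponds to one path condition over the symbolic packet λ, and solving that condition yields one concrete test input. The value of GET, the two-field packet_t layout, the path_t enum, and the hand-written solve_path (standing in for a real constraint solver) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAGIC 0xC9
#define GET   0x01  /* assumed value; the slides do not define GET */

/* Assumed layout: only the two fields the slide code inspects. */
typedef struct { uint8_t magic; uint8_t cmd; } packet_t;

/* The three feasible paths through proc_pkt. */
typedef enum { PATH_BAD_MAGIC, PATH_GET, PATH_OTHER_CMD } path_t;

/* Concrete run: reports which path a given packet follows. */
path_t proc_pkt_path(packet_t pkt) {
    if (pkt.magic != MAGIC) return PATH_BAD_MAGIC;
    if (pkt.cmd == GET)     return PATH_GET;
    return PATH_OTHER_CMD;
}

/* Stand-in for the constraint solver: returns one packet that
 * satisfies the path condition of `path`. */
packet_t solve_path(path_t path) {
    packet_t pkt = { MAGIC, GET };
    switch (path) {
    case PATH_BAD_MAGIC:             /* λ.magic != 0xC9 */
        pkt.magic = (uint8_t)(MAGIC ^ 0xFF);
        break;
    case PATH_GET:                   /* λ.magic == 0xC9 && λ.cmd == GET */
        break;
    case PATH_OTHER_CMD:             /* λ.magic == 0xC9 && λ.cmd != GET */
        pkt.cmd = (uint8_t)(GET + 1);
        break;
    }
    /* The generated input must drive execution down the requested path. */
    assert(proc_pkt_path(pkt) == path);
    return pkt;
}
```

Calling proc_pkt_path(solve_path(p)) should return p for each of the three paths, which is the sense in which one generated input covers one path.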

Slide 15

Slide 15 text

• CPU Bottleneck
• Memory Exhaustion

Slide 16

Slide 16 text

Parallel Tree Exploration
Workers: W1, W2, W3

Slide 17

Slide 17 text

Parallel Tree Exploration
Workers: W1, W2, W3
Key research problem: Scalable parallel exploration

Slide 18

Slide 18 text

Linear Solution to Exponential Problem
[Plot: Time to Test vs. Program Size]

Slide 19

Slide 19 text

Linear Solution to Exponential Problem
[Plot: Time to Test vs. Program Size; testing target marked; curve: 1 worker]

Slide 20

Slide 20 text

Linear Solution to Exponential Problem
[Plot: Time to Test vs. Program Size; testing target marked; curves: 1 worker, 2 workers, 4 workers, 8 workers]
Bring testing time down to practical values

Slide 21

Slide 21 text

Throw Hardware at the Problem

Slide 22

Slide 22 text

Scalability Challenges
Tree structure not known a priori

Slide 23

Slide 23 text

Scalability Challenges
Static Allocation

Slide 24

Slide 24 text

Scalability Challenges

Slide 25

Slide 25 text

Scalability Challenges
Anticipate Allocation

Slide 26

Slide 26 text

Scalability Challenges

Slide 27

Slide 27 text

Outline
• Scalable Parallel Symbolic Execution
• POSIX Environment Model
• Evaluation

Slide 28

Slide 28 text

Cloud9 Architecture
Global Symbolic Tree

Slide 29

Slide 29 text

Cloud9 Architecture
W1’s Local Tree, W2’s Local Tree, W3’s Local Tree
Each worker runs a local sequential symbolic execution engine (KLEE)

Slide 30

Slide 30 text

Cloud9 Architecture
Legend: Candidate nodes, Fence nodes
• Candidate nodes are selected for exploration
• Fence nodes bound the local tree

Slide 31

Slide 31 text

Load Balancing
[Diagram: load balancer (LB) and workers W1, W2, W3]
Hybrid distributed system: centralized reports, P2P work transfer
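One way to picture "centralized reports, P2P work transfer" is the C sketch below: workers report their queue lengths to the LB, which only decides who should send work to whom, and the jobs themselves then move directly between the two workers. The pairing policy (most loaded worker gives half the gap to the least loaded one) and the names transfer_t and lb_plan_transfer are assumptions for illustration, not Cloud9's actual balancing algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* A transfer instruction the LB would send to worker `from`. */
typedef struct { size_t from, to, jobs; } transfer_t;

/* Given the queue lengths reported by each worker, plan one transfer.
 * Returns 1 and fills *out if a transfer is worthwhile, 0 otherwise. */
int lb_plan_transfer(const size_t *queue_len, size_t n_workers,
                     transfer_t *out) {
    assert(n_workers > 0);
    size_t hi = 0, lo = 0;
    for (size_t i = 1; i < n_workers; i++) {
        if (queue_len[i] > queue_len[hi]) hi = i;  /* most loaded  */
        if (queue_len[i] < queue_len[lo]) lo = i;  /* least loaded */
    }
    size_t gap = queue_len[hi] - queue_len[lo];
    if (gap < 2) return 0;   /* already balanced to within one job */
    out->from = hi;
    out->to   = lo;
    out->jobs = gap / 2;     /* halve the gap between the two workers */
    return 1;
}
```

With reported queue lengths {10, 0, 4}, this policy asks worker 0 to send 5 jobs to worker 1; the LB never touches the jobs themselves.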

Slide 34

Slide 34 text

Work Transfer
[Diagram: tree of W1; legend: Candidate, Fence]

Slide 35

Slide 35 text

Work Transfer
[Diagram: trees of W1 and W2; legend: Candidate, Fence]

Slide 36

Slide 36 text

Work Transfer
[Diagram: trees of W1 and W2; label: Virtual; legend: Candidate, Fence]

Slide 38

Slide 38 text

Work Transfer
[Diagram: trees of W1 and W2; label: Materialized; legend: Candidate, Fence]

Slide 39

Slide 39 text

Work Transfer
[Diagram: trees of W1 and W2; legend: Candidate, Fence]
Exploration disjointness + completeness

Slide 40

Slide 40 text

Path-based Encoding
• Nodes are encoded as paths in tree
• Compact binary representation
• Two paths can share common prefix
• Small encoding size
• For a tree of 2^100 leaves, a path fits in <128 bits (16 bytes)
[Diagram: tree with 0/1 labels]
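The encoding described above can be sketched in C as a bit string with one bit per branch decision from the root. This is an illustrative layout only: the type tree_path_t, the function names, and the choice of 0/1 for the two branch directions are assumptions, not Cloud9's wire format. The 16-byte buffer matches the bound quoted on the slide, since a tree with 2^100 leaves has paths of about 100 branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATH_MAX_BITS 128  /* 16 bytes, the bound quoted on the slide */

typedef struct {
    uint8_t bits[PATH_MAX_BITS / 8]; /* bit i = branch taken at depth i */
    size_t  len;                     /* number of branches from the root */
} tree_path_t;

/* Start with the empty path, which denotes the root node. */
void path_init(tree_path_t *p) { memset(p, 0, sizeof *p); }

/* Extend the path by one branch decision (0 or 1). */
void path_push(tree_path_t *p, int branch) {
    assert(p->len < PATH_MAX_BITS);
    if (branch)
        p->bits[p->len / 8] |= (uint8_t)(1u << (p->len % 8));
    p->len++;
}

/* Read back the branch decision taken at depth i. */
int path_get(const tree_path_t *p, size_t i) {
    assert(i < p->len);
    return (p->bits[i / 8] >> (i % 8)) & 1;
}

/* Depth of the deepest common ancestor of two nodes, i.e. the
 * length of the prefix the two paths share. */
size_t path_common_prefix(const tree_path_t *a, const tree_path_t *b) {
    size_t n = a->len < b->len ? a->len : b->len;
    size_t i = 0;
    while (i < n && path_get(a, i) == path_get(b, i))
        i++;
    return i;
}
```

Two nodes reached by the decisions 1,0,1 and 1,0,0,1 share a prefix of length 2, so a sender could transmit that prefix once and then only the differing suffixes.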

Slide 41

Slide 41 text

Load Balancing in Practice
[Plot: Work done [% of total instructions] vs. Time [minutes]; series: LB stops after 1 min, LB stops after 4 min, Continuous load balancing]
Load balancing necessary to ensure scalability

Slide 42

Slide 42 text

Outline
• Scalable Parallel Symbolic Execution
• POSIX Environment Model
• Evaluation

Slide 43

Slide 43 text

Calls into the Environment

    if (fork() == 0) {
      ...
      if ((res = recv(sock, buff, size, 0)) > 0) {
        pthread_mutex_lock(&mutex);
        memcpy(gBuff, buff, res);
        pthread_mutex_unlock(&mutex);
      }
      ...
    } else {
      ...
      pid_t pid = wait(&stat);
      ...
    }

Slide 44

Slide 44 text

Environment Model
• Program Under Test: calls fork()
• Environment (C Library / OS): cannot directly execute symbolically

Slide 45

Slide 45 text

Environment Model
• Program Under Test: calls fork()
• Environment (C Library / OS)
• Model Code: equivalent functionality, executable symbolically
• Symbolic Execution Engine

Slide 46

Slide 46 text

Starting Point
[Diagram labels: Symbolic Execution Engine; POSIX; Files; Network Stubs; Single-threaded utilities; Single-threaded isolated nodes]

Slide 47

Slide 47 text

POSIX Environment Model
POSIX model components (with the Symbolic Execution Engine):
• Network (TCP/UDP/UNIX)
• Files
• Pipes
• Threads (pthread_*)
• Processes
• Signals (asynchronous events, IPC)
Diagram annotations: Single-threaded utilities; Servers and clients; Message passing; Multi-threaded programs; Distributed systems

Slide 48

Slide 48 text

Key Changes in Symbolic Execution
Multithreading and Scheduling
• Deterministic or symbolic scheduling
• Non-preemptive execution model
Address Space Isolation
• Copy on Write (CoW) between processes
• CoW domains for memory sharing

Slide 49

Slide 49 text

Symbolic Engine System Calls
• Symbolic engine support needed for threads/processes
  1. Thread/process lifecycle
  2. Synchronization
  3. Shared memory
Symbolic engine system calls (tagged 1, 2, 3 on the slide by category): thread_create, thread_terminate, process_fork, process_terminate, get_context, thread_preempt, thread_sleep, thread_notify, get_wait_list, make_shared
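To make the role of the synchronization calls concrete, here is a hedged C sketch of how model code for a mutex might be layered on thread_sleep and thread_notify. The slide gives only the call names, so the signatures, the wlist_id_t type, and the model_mutex_t layout are assumptions, and the two primitives are replaced by stubs that merely count calls so the sketch compiles on its own. It also relies on the non-preemptive execution model mentioned on Slide 48: with no preemption between the check and the update, the model needs no atomic operations.

```c
#include <assert.h>

typedef unsigned wlist_id_t;   /* assumed handle for a wait list */

/* Hypothetical stand-ins for the engine primitives named on the
 * slide. A real engine would suspend and wake threads; these stubs
 * only count how often they are called. */
static unsigned sleep_calls, notify_calls;
static void thread_sleep(wlist_id_t wlist)  { (void)wlist; sleep_calls++; }
static void thread_notify(wlist_id_t wlist) { (void)wlist; notify_calls++; }

/* Model-side state of one mutex (assumed layout). */
typedef struct {
    wlist_id_t wlist;   /* wait list holding the blocked threads */
    int        taken;   /* 1 while some thread holds the mutex   */
    unsigned   queued;  /* number of threads waiting on wlist    */
} model_mutex_t;

int model_mutex_lock(model_mutex_t *m) {
    if (m->taken) {
        m->queued++;
        thread_sleep(m->wlist);  /* real engine: blocks until notified */
        m->queued--;
    }
    m->taken = 1;
    return 0;
}

int model_mutex_unlock(model_mutex_t *m) {
    m->taken = 0;
    if (m->queued > 0)
        thread_notify(m->wlist); /* wake one waiter, if any */
    return 0;
}
```

Because the model is ordinary C, the symbolic execution engine can run it like any other program code; only the two primitives need engine support.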

Slide 50

Slide 50 text

Outline
• Scalable Parallel Symbolic Execution
• POSIX Environment Model
• Evaluation

Slide 51

Slide 51 text

Testing Real-World Software
Memcached, GNU Coreutils, Apache

Slide 52

Slide 52 text

Time to Reach Target Coverage
[Plot (printf): Time to achieve target coverage [minutes] vs. Number of workers (1, 4, 8, 24, 48); series: 60% coverage, 70% coverage, 80% coverage, 90% coverage]
Faster time-to-cover, higher coverage values

Slide 53

Slide 53 text

Increase in Code Coverage
[Plot: Additional code covered [% of program LOC] vs. Index of tested Coreutil (sorted by additional coverage); Coreutils suite (12 workers, 10 min.)]
Consistent code coverage increase

Slide 54

Slide 54 text

Exhaustive Exploration
[Plot: Time to complete exhaustive test [hours] vs. Number of workers (2, 4, 6, 12, 24, 48); memcached (7.4×10^4 paths)]
Scalability of exhaustive path exploration

Slide 55

Slide 55 text

Instruction Throughput
[Plot (memcached): Useful work done [# of instructions] vs. Number of workers (1, 4, 6, 12, 24, 48); series: 4 minutes, 6 minutes, 8 minutes, 10 minutes]
Linear scalability with number of workers

Slide 56

Slide 56 text

Experimental Setup
Execute the “whole world” symbolically
[Diagram: Symbolic State containing a Client Process and memcached / Apache / lighttpd, connected by a TCP Stream carrying Symbolic cmd. and Srv. response]

Slide 57

Slide 57 text

Symbolic Test Cases
• Easy-to-use API for developers to write symbolic test cases
• Basic symbolic memory support
• POSIX extensions for environment control
• Network conditions, fault injection, symbolic scheduler

Slide 58

Slide 58 text

Symbolic Test Cases
Testing HTTP header extension

    make_symbolic(hdrData);

    // Append symbolic header to request
    strcat(req, "X-NewExtension: ");
    strcat(req, hdrData);

    // Enable fault injection on socket
    ioctl(ssock, SIO_FAULT_INJ, RD | WR);

    // Symbolic stream fragmentation
    ioctl(ssock, SIO_PKT_FRAGMENT, RD);

Slide 59

Slide 59 text

Conclusions
• Parallel symbolic execution
  • Linear scalability on commodity clusters
• Full POSIX environment model
  • Real-world systems testing
• Use cases
  • Increasing coverage
  • Exhaustive path exploration
  • Bug patch verification