ASPLOS'18: Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model

Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model
Manuel Rigger¹, Roland Schatz², René Mayrhofer¹, Matthias Grimmer², Hanspeter Mössenböck¹
¹ Johannes Kepler University Linz, Austria
² Oracle Labs, Austria
ASPLOS '18, March 27, 2018, Williamsburg, VA, USA

Claim: Unsafe Languages can be Executed Safely and Efficiently on the Java Virtual Machine

What are Unsafe Languages?
• Unsafe languages do not specify an operation for all inputs (e.g., C)
• Safe languages strictly define all operations (e.g., Java)
(Felleisen et al. 1999)

Buffer Overflows
C (Undefined Behavior):
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …
Java (ArrayIndexOutOfBoundsException):
    int[] arr = new int[3];
    arr[5] = …

Use-after-free Errors
C (Undefined Behavior):
    free(arr);
    arr[0] = …
Java (NullPointerException):
    arr = null;
    arr[0] = …

Idea
Map the semantics of a C program to Java to automatically detect memory safety errors

State of the Art
Diagram: C → Clang/GCC → a.out; running ./a.out prints "Hello world!"
Compile-time instrumentation
• AddressSanitizer (ASan) (Serebryany et al. 2012)
• SoftBound+CETS (Nagarakatte et al. 2009, 2010)
Run-time instrumentation
• Valgrind (Nethercote et al. 2007)
• Dr. Memory (Bruening et al. 2011)
Such tools were very helpful in finding bugs in widely used code

Can we do better?
• Static compilers: optimize code based on Undefined Behavior
• Bug-finding tools: find bugs assuming that violations are visible side effects
    struct sock *sk = tun->sk;
    if (!tun) return POLLERR;
  (Wang et al. 2012, D'Silva 2015)
• Current approaches do not abstract from the underlying machine/native execution model
• Manually adding instrumentation is error-prone
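
For contrast, here is a minimal Java sketch of the same dereference-then-check pattern. The classes Tun and Sock, the method poll, and the POLLERR value are hypothetical stand-ins for the C snippet, not code from the talk. Java defines a null dereference to throw NullPointerException, so the error stays visible instead of being treated as Undefined Behavior.

```java
// Hypothetical stand-ins for the structs in the C snippet.
class Sock {}

class Tun {
    Sock sk = new Sock();
}

class NullCheckDemo {
    static final int POLLERR = 0x008; // illustrative constant; value assumed

    // Same shape as the C code: dereference first, null check afterwards.
    static int poll(Tun tun) {
        Sock sk = tun.sk;        // throws NullPointerException if tun is null
        if (tun == null) {
            return POLLERR;      // never reached when tun is null
        }
        return sk != null ? 0 : POLLERR;
    }

    public static void main(String[] args) {
        try {
            poll(null);
            System.out.println("no error reported");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException at the dereference");
        }
    }
}
```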

System Overview
Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM
• We are currently using a custom libc implementation (without system calls)
• Executing LLVM IR (Lattner 2004) allows us to also execute other unsafe languages
• Unaware of the underlying machine, execution model, and ABI
• Truffle and Graal (Würthinger et al. 2013) allow Safe Sulong to reach "native speeds"
• All checks are automatically performed by the underlying JVM

Prevent Out-Of-Bounds Accesses
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …
Managed representation: ManagedAddress (offset=5, data) → I32Array (contents = {0, 0, 0})
contents[5] → ArrayIndexOutOfBoundsException
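
A minimal Java sketch of how such a managed pointer could be modeled. The class names follow the slide, but the fields, methods, and the mallocInts helper are assumptions for illustration and not the actual Safe Sulong implementation. The bounds check is not written by hand: it comes from the JVM's own array access check.

```java
// Assumed shape of the managed objects named on the slide.
class I32Array {
    int[] contents;

    I32Array(int length) {
        contents = new int[length]; // zero-initialized, like {0, 0, 0}
    }
}

class ManagedAddress {
    final I32Array data;
    final int offset; // element index in this sketch; byte offsets are not modeled

    ManagedAddress(I32Array data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    // Pointer arithmetic creates a new address into the same object.
    ManagedAddress plus(int elements) {
        return new ManagedAddress(data, offset + elements);
    }

    // The JVM checks the index on every array access.
    void store(int value) {
        data.contents[offset] = value;
    }

    int load() {
        return data.contents[offset];
    }
}

class OutOfBoundsDemo {
    // Hypothetical helper modeling: int *arr = malloc(3 * sizeof(int));
    static ManagedAddress mallocInts(int count) {
        return new ManagedAddress(new I32Array(count), 0);
    }

    public static void main(String[] args) {
        ManagedAddress arr = mallocInts(3);
        arr.plus(2).store(42); // in bounds
        try {
            arr.plus(5).store(1); // models arr[5] = …
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds store detected");
        }
    }
}
```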

Prevent Use-After-Free Errors
    free(arr);
    arr[0] = …
Managed representation: ManagedAddress (offset=0, data) → I32Array (contents=null)
contents[0] → NullPointerException
Safe Sulong can detect other categories of errors (e.g., double-free errors)
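
A minimal Java sketch of the idea, under stated assumptions: the class ManagedI32Array and its methods are hypothetical, and reporting a double free through IllegalStateException is one possible design, not necessarily what Safe Sulong does. Freeing clears the contents reference, so any later access fails with the JVM's own null check.

```java
// Hypothetical managed allocation; freeing sets contents to null.
class ManagedI32Array {
    private int[] contents;

    ManagedI32Array(int length) {
        contents = new int[length];
    }

    // Models free(arr). Detecting a second free this way is an assumption.
    void free() {
        if (contents == null) {
            throw new IllegalStateException("double free");
        }
        contents = null;
    }

    // After free(), contents is null and the access throws NullPointerException.
    void store(int offset, int value) {
        contents[offset] = value;
    }

    int load(int offset) {
        return contents[offset];
    }
}

class UseAfterFreeDemo {
    public static void main(String[] args) {
        ManagedI32Array arr = new ManagedI32Array(3);
        arr.store(0, 7);
        arr.free();
        try {
            arr.store(0, 8); // models arr[0] = … after free(arr)
        } catch (NullPointerException e) {
            System.out.println("use-after-free detected");
        }
        try {
            arr.free();
        } catch (IllegalStateException e) {
            System.out.println("double free detected");
        }
    }
}
```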

Evaluation
• Found 68 errors in small open-source projects
• Safe Sulong found 8 errors that were found by neither ASan nor Valgrind
• Compiler optimizations (ASan -O3) prevented the detection of 4 additional bugs
• Valgrind detected half of the errors

Evaluation: Example ASan
    #include <stdio.h>

    int main(int argc, char** argv) {
        /* reads far beyond the end of argv */
        printf("%d %s\n", argc, argv[100]);
    }
ASan does not instrument the main() arguments since they are allocated by libc
https://github.com/google/sanitizers/issues/762

Example Program
C:
    void processRequests() {
        int i = 0;
        do {
            processPacket();
            i++;
        } while (i < 10000);
    }
LLVM IR (produced by Clang):
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

Implementation of Operations
LLVM IR instruction:
    %2 = add nsw i32 %i, 1
Executable Abstract Syntax Tree:
    write %2
      add
        read %i
        1

Implementation of Basic Blocks
LLVM IR basic block 1 (from the phi instruction to the conditional branch) becomes the node Block1 in the Executable Abstract Syntax Tree
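
A minimal Java sketch of an executable tree for `%2 = add nsw i32 %i, 1`. The node classes and the int[] frame are assumptions for illustration; the real interpreter builds on Truffle's node and frame classes, which are not used here, and the overflow semantics of `nsw` are not modeled.

```java
// Hypothetical expression nodes; each node knows how to execute itself.
abstract class ExprNode {
    abstract int execute(int[] frame);
}

class ConstNode extends ExprNode {
    final int value;

    ConstNode(int value) {
        this.value = value;
    }

    @Override
    int execute(int[] frame) {
        return value;
    }
}

class ReadNode extends ExprNode {
    final int slot;

    ReadNode(int slot) {
        this.slot = slot;
    }

    @Override
    int execute(int[] frame) {
        return frame[slot];
    }
}

class AddNode extends ExprNode {
    final ExprNode left;
    final ExprNode right;

    AddNode(ExprNode left, ExprNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    int execute(int[] frame) {
        return left.execute(frame) + right.execute(frame);
    }
}

class WriteNode {
    final int slot;
    final ExprNode value;

    WriteNode(int slot, ExprNode value) {
        this.slot = slot;
        this.value = value;
    }

    void execute(int[] frame) {
        frame[slot] = value.execute(frame);
    }
}

class AstDemo {
    static final int SLOT_I = 0; // holds %i
    static final int SLOT_2 = 1; // holds %2

    // write %2 <- add(read %i, 1)
    static WriteNode buildIncrement() {
        return new WriteNode(SLOT_2, new AddNode(new ReadNode(SLOT_I), new ConstNode(1)));
    }

    public static void main(String[] args) {
        int[] frame = new int[2];
        frame[SLOT_I] = 41;
        buildIncrement().execute(frame);
        System.out.println(frame[SLOT_2]); // prints 42
    }
}
```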

Implementation of Control Flow Support
A Basic Block Dispatch Node has Block0, Block1, and Block2 as children. Each block yields the index of its successor:
• Block0 → 1
• Block1 → 1 (loop back) or 2 (loop exit)
• Block2 → -1 (return)
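
A minimal Java sketch of such a dispatch loop for processRequests. All class names, the int[] frame layout, and the call counter that stands in for processPacket() are assumptions for illustration. Each block returns the index of its successor, and -1 ends the function.

```java
// Hypothetical basic block node: runs its operations, returns the successor index.
abstract class BasicBlockNode {
    abstract int execute(int[] frame);
}

class BlockDispatchNode {
    final BasicBlockNode[] blocks;

    BlockDispatchNode(BasicBlockNode... blocks) {
        this.blocks = blocks;
    }

    // The interpreter loop: keep dispatching until a block returns -1.
    void execute(int[] frame) {
        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = blocks[blockIndex].execute(frame);
        }
    }
}

class DispatchDemo {
    static final int I = 0;     // %i
    static final int NEXT = 1;  // %2
    static final int CALLS = 2; // counts stand-in calls to processPacket()

    static int[] run(final int limit) {
        BasicBlockNode block0 = new BasicBlockNode() {
            @Override
            int execute(int[] f) {
                f[I] = 0; // phi input from block 0
                return 1;
            }
        };
        BasicBlockNode block1 = new BasicBlockNode() {
            @Override
            int execute(int[] f) {
                f[CALLS]++;           // stand-in for processPacket()
                f[NEXT] = f[I] + 1;   // %2 = add %i, 1
                if (f[NEXT] < limit) { // %3 = icmp slt %2, limit
                    f[I] = f[NEXT];   // phi input from block 1
                    return 1;
                }
                return 2;
            }
        };
        BasicBlockNode block2 = new BasicBlockNode() {
            @Override
            int execute(int[] f) {
                return -1; // ret void
            }
        };
        int[] frame = new int[3];
        new BlockDispatchNode(block0, block1, block2).execute(frame);
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(run(10000)[CALLS]); // prints 10000
    }
}
```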

Compilation
• For frequently executed functions
• Partial evaluation: inline execute methods of the graph (recursively)
• Further optimize the graph

Compiler
The compiler takes the Basic Block Dispatch Node with Block0, Block1, and Block2 as input.
Unrolling of the interpreter loop:
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                blockIndex = 2
    block2:
        blockIndex = -1
        return
Graal further optimizes the partially evaluated interpreter

Safe Semantics
• Safe by design: errors result in exceptions
• Invalid memory accesses are not optimized away

Evaluation: Peak Performance
[Figure: peak performance chart; lower is better]
• Small benchmarks since Safe Sulong failed executing SPEC → preliminary results
• Baseline is Clang -O0; Safe Sulong is faster in all but one case
• Safe Sulong is close to Clang -O3 in some cases
• Safe Sulong -O0 is mostly faster than ASan -O0

Future Work and Summary

Executing libc and Other System Libraries
• Inline assembly (Rigger et al. 2018):
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
• Compiler builtins:
    if (__builtin_expect(x, 0)) foo();
• System calls:
    getcwd(buf, size);
Implementing them will allow Safe Sulong to execute existing libcs (and SPEC)

Summary
• Approaches are based on "unsafe" compilers
• Safe Sulong automatically detects errors
• It reaches good peak performance
• We are still working on completeness
@RiggerManuel

Bibliography
• Matthias Felleisen and Shriram Krishnamurthi. 1999. Safety in Programming Languages. Technical Report. Rice University.
• Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. AddressSanitizer: a fast address sanity checker. In Proceedings of the 2012 USENIX conference on Annual Technical Conference (USENIX ATC'12). USENIX Association, Berkeley, CA, USA, 28-28.
• Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. 2009. SoftBound: highly compatible and complete spatial memory safety for c. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09). ACM, New York, NY, USA, 245-258.
• Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. 2010. CETS: compiler enforced temporal safety for C. In Proceedings of the 2010 international symposium on Memory management (ISMM '10). ACM, New York, NY, USA, 31-40.
• Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '07).
• Derek Bruening and Qin Zhao. 2011. Practical memory checking with Dr. Memory. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '11). IEEE Computer Society, Washington, DC, USA, 213-223.
• Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia, Nickolai Zeldovich, and M. Frans Kaashoek. 2012. Undefined behavior: what happened to my code?. In Proceedings of the Asia-Pacific Workshop on Systems (APSYS '12). ACM, New York, NY, USA, Article 9, 7 pages.
• Vijay D'Silva, Mathias Payer, and Dawn Song. 2015. The Correctness-Security Gap in Compiler Optimization. In Proceedings of the 2015 IEEE Security and Privacy Workshops (SPW '15). IEEE Computer Society, Washington, DC, USA, 73-87.
• Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario Wolczko. 2013. One VM to rule them all. In Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on programming & software (Onward! 2013). ACM, New York, NY, USA, 187-204.
• Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization (CGO '04). IEEE Computer Society, Washington, DC, USA.
• Manuel Rigger, Stefan Marr, Stephen Kell, David Leopoldseder, and Hanspeter Mössenböck. 2018. An Analysis of x86-64 Inline Assembly in C Programs. In 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 25 March 2018, Williamsburg, VA, USA.
