Safe and Efficient Execution of LLVM-based Languages on the Java Virtual Machine

Talk held at the Swiss LLVM Compiler and Code Generation Social. The talk was recorded and is available at https://www.youtube.com/watch?v=SMth9PN2sF4.

Manuel Rigger

March 14, 2019

Transcript

  1. Safe and Efficient Execution of LLVM-based Languages on the Java Virtual Machine

    Swiss LLVM Compiler and Code Generation Social, 14 March 2019
    Manuel Rigger, Advanced Software Technologies Lab (Zhendong Su), @RiggerManuel
  3. Unsafe Languages are Popular

    Rank  Programming Language
    1     Java
    2     C
    3     C++
    (TIOBE Index for November 2018)

    C and C++ are considered unsafe.
  4. C/C++ is Responsible for Dangerous Vulnerabilities

    Heartbleed and Cloudbleed were caused by buffer overflows, the most dangerous vulnerability in unsafe languages.
  5. What Makes a Language Unsafe?

    Undefined Behavior (UB): "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements" (C99 standard)
  9. Buffer Overflows: Leaking Sensitive Data

    long *arr = malloc(3 * sizeof(long));
    long dest[4];
    memcpy(dest, arr, sizeof(dest)); // UB: reads past the end of arr

    The memcpy() copies the secret value stored after arr in memory into dest. Heartbleed and Cloudbleed were such vulnerabilities.
  12. Buffer Overflows: Changing Control Flow

    long *arr = malloc(3 * sizeof(long));
    arr[4] = 0xfe…; // UB

    Before the write, the memory that arr[4] refers to holds &func; afterwards it holds 0xfe…. This allows attackers to change the program's control flow.
  13. Use-after-free Error

    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …; // UB

    Can overwrite another object if the memory was reallocated.
  14. Integer Overflow

    int a = 1, b = INT_MAX;
    int val = a + b; // UB

    Can result in inconsistent or surprising behavior if the UB is "optimized away".
  16. Integer Overflow

    void pause() {
      int a = 0;
      // run until overflow
      while (a < a + 1) {
        a++;
      }
    }

    What is the compilation output of Clang/GCC?
    1. The function works as expected by the programmer
    2. The function body is optimized away
    3. The function results in an endless loop
    4. It depends on the optimization level
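  19. Integer Overflow

    void pause() {
      int a = 0;
      // run until overflow
      while (a < a + 1) {
        a++;
      }
    }

    -O0:
        mov dword ptr [rsp - 4], 0
        jmp loop_header
      loop_body:
        add dword ptr [rsp - 4], 1
      loop_header:
        mov eax, dword ptr [rsp - 4]
        mov ecx, dword ptr [rsp - 4]
        add ecx, 1
        cmp eax, ecx
        jl loop_body
        ret

    -O3:
      loop:
        jmp loop

    The answer is 4: at -O0, a + 1 wraps around once a reaches INT_MAX, the comparison fails, and the function returns; at -O3, the compiler assumes that signed overflow cannot happen and emits an endless loop.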
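For contrast, Java defines int overflow as two's-complement wraparound, so a direct Java port of pause() has a single meaning at every optimization level. A minimal sketch; the class name and the start parameter are hypothetical additions that let the loop finish quickly:

```java
// Java port of pause(). Java defines int overflow as two's-complement
// wraparound (JLS 15.18.2), so `a < a + 1` becomes false once a reaches
// Integer.MAX_VALUE, and the loop terminates at every optimization level.
class PauseDemo {
    // `start` is a hypothetical parameter (not in the C version) that lets
    // callers skip most of the roughly 2^31 iterations.
    static int pause(int start) {
        int a = start;
        // run until overflow: when a == MAX_VALUE, a + 1 wraps to MIN_VALUE
        while (a < a + 1) {
            a++;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(pause(Integer.MAX_VALUE - 5)); // prints 2147483647
    }
}
```

A JIT compiler may not turn this loop into an endless one, because the Java semantics leave no room for assuming that overflow does not happen.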
  22. Goal of my PhD

    Tackle UB by safely and efficiently executing unsafe languages on the JVM, with well-defined semantics even for errors and corner cases.
  23. Safe Sulong and its Bug-finding Mode, Lenient C, Introspection

    • Safe Sulong and its bug-finding mode: terminate the program
    • Lenient C: continue execution
    • Introspection
  26. Existing Approaches

    Instrumentation-based bug-finding tools, symbolic execution, safe languages, hardware security, static analysis, attacker mitigation.
    Instrumentation-based bug-finding tools include:
    • LLVM's AddressSanitizer (Serebryany et al. 2012)
    • Memcheck (Nethercote et al. 2007)
    • SoftBound+CETS (Nagarakatte et al. 2009, 2010)
    • Dr. Memory (Bruening et al. 2011)
  27. State of the Art: Instrumentation-based Tools

    Compile-time instrumentation (C → Clang/GCC → a.out): AddressSanitizer, SoftBound+CETS
    Run-time instrumentation (./a.out → Hello world!): Memcheck, Dr. Memory
  28. Conundrum: Finding Bugs vs. Performance

    C → Clang/GCC → a.out → ./a.out → Hello world!
    Static compilers optimize code based on undefined behavior, whereas bug-finding tools find bugs assuming that violations are visible side effects (Wang et al. 2012, D'Silva 2015).
  29. Conundrum: Finding Bugs vs. Performance

    To find all bugs, developers need to disable compiler optimizations.
  30. Lack of Abstraction

    C → Clang/GCC → a.out → ./a.out → Hello world!
    Omitted or forgotten checks result in overlooked bugs.
  36. Map Data Structures and Operations to Java

    C code:
      long *arr = malloc(3 * sizeof(long));
      arr[4] = …;
    Mapped to Java code:
      long[] arr = new long[3];
      arr[4] = …; // ArrayIndexOutOfBoundsException

    The semantics of an out-of-bounds access are well specified: automatic bounds checks that cannot be optimized away.
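A minimal Java sketch of this mapping, with a hypothetical class and method name, shows the bounds check rejecting the write instead of letting it corrupt neighboring memory:

```java
// The Java mapping of
//   long *arr = malloc(3 * sizeof(long));
//   arr[4] = ...;
// Every Java array access is bounds-checked, so the out-of-bounds write is
// reported instead of silently overwriting neighboring memory.
class BoundsDemo {
    // Returns true if the JVM rejected the write to arr[index].
    static boolean writeRejected(int index) {
        long[] arr = new long[3];
        try {
            arr[index] = 42L; // placeholder value; the slide elides it
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeRejected(4)); // prints true
        System.out.println(writeRejected(2)); // prints false
    }
}
```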
  39. Execution of LLVM IR

    C, C++, Fortran, … → Clang, GCC, or another LLVM front end → LLVM IR (Lattner et al. 2004) → Safe Execution Platform
    We disable the compiler optimizations of the front ends.
  41. Execution of LLVM IR

    Targeting LLVM IR allows executing multiple unsafe languages.
  46. Execution of LLVM IR

    LLVM IR → LLVM IR interpreter (Truffle) → Graal → JVM (Würthinger et al. 2012, 2017)
    Using Truffle and Graal, we can minimize the instrumentation overhead.
  48. Execution of LLVM IR

    Safe Sulong can rely on the underlying JVM for:
    • automatic checks
    • safe optimizations
    • abstraction from the underlying machine and OS
  49. Prevent Out-Of-Bounds Accesses

    long *arr = malloc(3 * sizeof(long));

    arr is represented by an Address object (offset = 0) whose data field refers to an I64Array with contents {0, 0, 0}.
  51. Prevent Out-Of-Bounds Accesses

    long *arr = malloc(3 * sizeof(long));
    arr[4] = …;

    The access uses an Address with offset = 4 into the three-element I64Array, so contents[4] throws an ArrayIndexOutOfBoundsException.
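The Address/I64Array representation can be approximated in plain Java. This is a simplified sketch, not Safe Sulong's implementation: the class and field names follow the slide, but the plus/store/load methods are hypothetical and offsets are counted in elements rather than bytes.

```java
// Simplified model of a managed pointer: an Address refers to a managed
// array object plus an offset. Assumption: offsets are counted in elements,
// not bytes, to keep the sketch short.
class OutOfBoundsDemo {
    static final class I64Array {
        final long[] contents;

        I64Array(int length) {
            contents = new long[length]; // zero-initialized, e.g. {0, 0, 0}
        }
    }

    static final class Address {
        final I64Array data;
        final int offset;

        Address(I64Array data, int offset) {
            this.data = data;
            this.offset = offset;
        }

        // Pointer arithmetic only creates a new Address; nothing is checked yet.
        Address plus(int elements) {
            return new Address(data, offset + elements);
        }

        // The JVM's bounds check on contents[] catches out-of-bounds accesses.
        void store(long value) {
            data.contents[offset] = value;
        }

        long load() {
            return data.contents[offset];
        }
    }

    public static void main(String[] args) {
        Address arr = new Address(new I64Array(3), 0); // malloc(3 * sizeof(long))
        try {
            arr.plus(4).store(42); // arr[4] = ...
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds access detected");
        }
    }
}
```

Because the check happens on the Java array, no instrumentation can be "forgotten": every load or store through an Address goes through it.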
  55. Prevent Use-after-Free Errors

    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …;

    free() sets the I64Array's contents to null, so the subsequent access to contents[0] throws a NullPointerException.
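Continuing the simplified model, a hypothetical free() that nulls out the backing array turns a use-after-free into a NullPointerException. The helper names and element-based offsets are assumptions for illustration:

```java
// Sketch: free() drops the object's backing array, so a later access through
// any dangling pointer dereferences null and throws NullPointerException.
// The free() helper and the element-based offset are simplifying assumptions.
class UseAfterFreeDemo {
    static final class I64Array {
        long[] contents;

        I64Array(int length) {
            contents = new long[length];
        }
    }

    static final class Address {
        final I64Array data;
        final int offset; // in elements

        Address(I64Array data, int offset) {
            this.data = data;
            this.offset = offset;
        }

        void store(long value) {
            data.contents[offset] = value; // NPE if the object was freed
        }
    }

    // Invalidates the object for every Address that points to it.
    static void free(Address ptr) {
        ptr.data.contents = null;
    }

    // Returns true if the write after free() was detected.
    static boolean useAfterFreeDetected() {
        Address arr = new Address(new I64Array(3), 0); // malloc(3 * sizeof(long))
        free(arr);
        try {
            arr.store(1); // arr[0] = ...
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(useAfterFreeDetected()); // prints true
    }
}
```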
  57. Prevent Integer Overflows

    int a = 1, b = INT_MAX;
    int val = a + b;

    The addition is executed as Math.addExact(a, b), which throws an ArithmeticException on overflow.
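A small sketch of the checked addition, assuming the C int maps to a Java int; the class and helper names are hypothetical:

```java
// Checked 32-bit addition with Math.addExact, which throws
// ArithmeticException instead of wrapping around on overflow.
class AddExactDemo {
    static String add(int a, int b) {
        try {
            return "val = " + Math.addExact(a, b);
        } catch (ArithmeticException e) {
            return "overflow detected";
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2));                 // prints val = 3
        System.out.println(add(1, Integer.MAX_VALUE)); // prints overflow detected
    }
}
```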
  59. Example Program

    void processRequests() {
      int i = 0;
      do {
        processPacket();
        i++;
      } while (i < 10000);
    }

    Clang compiles it to the following LLVM IR:

    define void @processRequests() #0 {
      ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
  61. Implementation of Operations

    Each LLVM IR instruction of @processRequests is translated to an executable abstract syntax tree (AST). For example, %2 = add nsw i32 %i.0, 1 becomes:

      write %2
        add
          read %i.0
          1
  64. Implementation of Operations

    AST: write %2 (add (read %i.0) 1)

    Executable AST node for the literal:
      class LLVMI32LiteralNode extends LLVMExpressionNode {
        final int literal;

        public LLVMI32LiteralNode(int literal) {
          this.literal = literal;
        }

        @Override
        public int executeI32(VirtualFrame frame) {
          return literal;
        }
      }

    Nodes return their result in an execute() method (Würthinger et al. 2012).
  67. Implementation of Operations

    AST: write %2 (add (read %i.0) 1)

    Executable AST node for the addition:
      @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
      abstract class LLVMI32AddNode extends LLVMExpressionNode {
        @Specialization
        protected int executeI32(int left, int right) {
          return left + right;
        }
      }

    A DSL allows a declarative style of specifying and executing nodes (Humer et al. 2015).
  70. Implementation of Operations

    AST: write %2 (add (read %i.0) 1)

    Executable AST node for the write:
      @NodeChild("valueNode")
      abstract class LLVMWriteI32Node extends LLVMExpressionNode {
        final FrameSlot slot;

        public LLVMWriteI32Node(FrameSlot slot) {
          this.slot = slot;
        }

        @Specialization
        public void writeI32(VirtualFrame frame, int value) {
          frame.setInt(slot, value);
        }
      }

    Local variables are represented by an array-like VirtualFrame object.
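The execute-method style can be illustrated without the Truffle framework. The sketch below is a plain-Java approximation under stated assumptions: an int[] stands in for VirtualFrame, the slot numbers are hypothetical, and the write node returns the stored value instead of void. It evaluates the tree write %2 (add (read %i.0) 1):

```java
// Simplified AST interpreter without Truffle. Assumptions: a plain int[]
// stands in for VirtualFrame, slots are fixed indices, and the write node
// returns the stored value (the Truffle node on the slide returns void).
class AstDemo {
    abstract static class Node {
        abstract int executeI32(int[] frame);
    }

    static final class LiteralNode extends Node {
        final int literal;
        LiteralNode(int literal) { this.literal = literal; }
        @Override int executeI32(int[] frame) { return literal; }
    }

    static final class ReadNode extends Node {
        final int slot;
        ReadNode(int slot) { this.slot = slot; }
        @Override int executeI32(int[] frame) { return frame[slot]; }
    }

    static final class AddNode extends Node {
        final Node left, right;
        AddNode(Node left, Node right) { this.left = left; this.right = right; }
        // Plain Java addition (wraps around); a checked variant could use Math.addExact.
        @Override int executeI32(int[] frame) {
            return left.executeI32(frame) + right.executeI32(frame);
        }
    }

    static final class WriteNode extends Node {
        final int slot;
        final Node value;
        WriteNode(int slot, Node value) { this.slot = slot; this.value = value; }
        @Override int executeI32(int[] frame) {
            int v = value.executeI32(frame);
            frame[slot] = v;
            return v;
        }
    }

    static final int SLOT_I0 = 0; // hypothetical slot of %i.0
    static final int SLOT_2 = 1;  // hypothetical slot of %2

    // write %2 (add (read %i.0) 1)
    static Node buildAddInstruction() {
        return new WriteNode(SLOT_2, new AddNode(new ReadNode(SLOT_I0), new LiteralNode(1)));
    }

    public static void main(String[] args) {
        int[] frame = new int[2];
        frame[SLOT_I0] = 41;
        buildAddInstruction().executeI32(frame);
        System.out.println(frame[SLOT_2]); // prints 42
    }
}
```

In Truffle, the DSL generates much of this boilerplate from the @Specialization methods, and Graal can later compile such a tree into straight-line code.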
  72. Implementation of Basic Blocks

    Each basic block of the LLVM IR of @processRequests is translated to a block node (e.g., Block1 for basic block 1) that holds the executable ASTs of its instructions.
  74. Example Program

    The LLVM IR of @processRequests jumps between basic blocks (e.g., br i1 %3, label %1, label %4 jumps back to basic block 1). An AST interpreter cannot represent such goto statements.
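  75. Interpreter

    A basic block dispatch node holds Block0, Block1, and Block2. Each block returns the index of its successor: Block0 → 1, Block1 → 1 or 2, Block2 → -1 (return).

    Interpreter implementation:
      int blockIndex = 0;
      while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();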
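A runnable sketch of this dispatch loop for @processRequests follows. The block bodies are a hand translation of the LLVM IR under assumptions: an int[] serves as the frame, each predecessor writes %i.0 before branching to resolve the phi node, and a hypothetical counter stands in for the call to processPacket():

```java
// Sketch of basic block dispatch for @processRequests. Each block executes
// its instructions and returns the successor's index, or -1 for `ret`.
class DispatchDemo {
    interface Block {
        int execute(int[] frame);
    }

    static final int I0 = 0;      // slot of %i.0
    static final int PACKETS = 1; // hypothetical processPacket() call counter

    static final Block[] BLOCKS = {
        // basic block 0: br label %1 (the phi receives 0 from %0)
        frame -> {
            frame[I0] = 0;
            return 1;
        },
        // basic block 1
        frame -> {
            frame[PACKETS]++;       // call void @processPacket()
            int v2 = frame[I0] + 1; // %2 = add nsw i32 %i.0, 1
            if (v2 < 10000) {       // %3 = icmp slt i32 %2, 10000
                frame[I0] = v2;     // the phi receives %2 from %1
                return 1;           // br i1 %3, label %1, ...
            }
            return 2;               // ... label %4
        },
        // basic block 2: ret void
        frame -> -1
    };

    static int[] run() {
        int[] frame = new int[2];
        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = BLOCKS[blockIndex].execute(frame);
        }
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(run()[PACKETS]); // prints 10000
    }
}
```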
  78. Interpreter

    Program execution: the dispatch node starts with Block0, which branches to Block1; Block1 executes repeatedly until %3 is false and then branches to Block2, which returns (Rigger et al. 2016 VMIL).
  80. Compiler

    Partially evaluated interpreter (pseudo code):
      int blockIndex = 0;
      block0:
        blockIndex = 1
        %i.0 = 0
      block1:
        while (true):
          processPacket()
          %2 = %i.0 + 1
          %3 = %2 < 10000
          if %3:
            blockIndex = 1
            %i.0 = %2
            continue
          else:
            blockIndex = 2
            break
      block2:
        blockIndex = -1
        return

    Graal further optimizes the partially evaluated interpreter.
  81. Evaluation Hypotheses

    • Effectiveness: Safe Sulong detects bugs that are overlooked by other tools
    • Performance: Safe Sulong's performance overhead is "reasonable"
  82. Effectiveness: Errors in GitHub Projects

    • Valgrind detected half of the errors
    • 8 errors were found by neither LLVM's AddressSanitizer nor Valgrind
    • Compiler optimizations (ASan -O3) prevented the detection of 4 additional bugs
  83. Effectiveness: Errors in GitHub Projects

    int main(int argc, char** argv) {
      printf("%d %s\n", argc, argv[5]);
    }

    Out-of-bounds accesses to argv are not instrumented by ASan.
  84. Effectiveness: Errors in GitHub Projects

    8 errors, such as the out-of-bounds access argv[5], were found by neither LLVM's AddressSanitizer nor Valgrind. In Safe Sulong, instrumentation cannot be omitted by design.
  85. Peak Performance

    (Chart; lower is better.) Safe Sulong's performance is mostly between Clang -O0 and Clang -O3, and mostly faster than ASan -O0.
  86. Warmup Performance

    [Chart: iterations per second (0 to 60) over the first 37 seconds of the Meteor benchmark, for ASan (Clang -O0), Safe Sulong, and Valgrind.]
  87. Existing Approaches

    Safe Sulong improves upon aspects of existing instrumentation-based bug-finding tools:
    • safe optimizations
    • abstraction from the native execution model
  88. Existing Approaches

    Safe Sulong leverages a safe implementation language for its bug-finding capabilities.
  89. Limitations/Selected Threats to Validity

    • Lack of support for binary libraries
    • Generalizability of the benchmark results
    • Relied on a custom libc for the evaluation
    • Lacks common low-level features
  90. Defined Behavior in C

    Implementing the semantics described in the C11 standard is (often) relatively straightforward:
      int arr[3];
      int result = &arr[0] < &arr[2];
  91. Relational Comparison of Pointers

    Pointers a and b are Address objects (offset, pointee); a < b is evaluated as integer_rep(a) < integer_rep(b).
  92. Integer Representation: Safe Sulong

    integer_rep(a) = a.offset

      int arr[3];
      int result = &arr[0] < &arr[2];

    &arr[2] is an Address with offset = 2 whose data refers to the array contents {0, 0, 0}.
    Can anyone see where our implementation could break programs?
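A plain-Java sketch of integer_rep(a) = a.offset (class and method names are hypothetical, offsets in elements) shows one answer: pointers into separately allocated objects can both have offset 0, so the comparison operators cannot order them consistently:

```java
// Sketch of the Safe Sulong scheme integer_rep(a) = a.offset, applied to
// relational pointer comparison. Offsets are in elements (assumption).
class OffsetCompareDemo {
    static final class Address {
        final Object pointee;
        final long offset;

        Address(Object pointee, long offset) {
            this.pointee = pointee;
            this.offset = offset;
        }
    }

    static long integerRep(Address a) {
        return a.offset;
    }

    static boolean lessThan(Address a, Address b) {
        return integerRep(a) < integerRep(b);
    }

    static boolean lessOrEqual(Address a, Address b) {
        return integerRep(a) <= integerRep(b);
    }

    public static void main(String[] args) {
        int[] arr = new int[3];
        // Same object: &arr[0] < &arr[2] works as expected.
        System.out.println(lessThan(new Address(arr, 0), new Address(arr, 2))); // true

        // Separately allocated objects: both pointers have offset 0, so
        // p <= q and q <= p hold although p and q point to different objects.
        Address p = new Address(new int[3], 0);
        Address q = new Address(new int[3], 0);
        System.out.println(lessOrEqual(p, q) && lessOrEqual(q, p)); // true
    }
}
```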
  93. Code Relies on Undefined Behavior

    Survey question (Memarian et al. 2016): "[Do you know code that uses] relational comparison (with <, >, <=, or >=) of two pointers to separately allocated objects (of compatible object types)?"

    Response                        % of Respondents
    Yes                             33%
    Yes, but it shouldn't           12%
    No, but there might well be     29%
    No, that would be crazy         16%
    Don't know                       8%
  94. Idea

    Goal: continue execution in the presence of UB and make common, otherwise undefined patterns work.
  96. Integer Representation: Lenient C

    integer_rep(a) = (long) System.identityHashCode(a.pointee) << 32 | offset;

    This breaks antisymmetry, as different objects might have the same hash code.
  98. Integer Representation: Lenient C

    Each pointee object (e.g., the I64Array {0, 0, 0} that an Address with offset = 2 refers to) stores an address field:
      integer_rep(a) = a.pointee.address
    This requires assigning distinct addresses ☺
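One way to assign distinct addresses is a global bump counter that hands each allocation a non-overlapping range. This is a hypothetical sketch, not Lenient C's allocator; it also assumes byte offsets and adds the offset to the object's address so that pointers into the same object stay ordered:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of assigning distinct addresses to managed objects. Assumptions:
// a global bump counter hands out non-overlapping ranges, offsets are in
// bytes, and the offset is added to the object's address.
class DistinctAddressDemo {
    private static final AtomicLong NEXT_ADDRESS = new AtomicLong(0x1000);

    static final class ManagedObject {
        final long[] contents;
        final long address; // assigned once, at allocation time

        ManagedObject(int length) {
            contents = new long[length];
            // Reserve at least one 8-byte slot so even empty objects differ.
            address = NEXT_ADDRESS.getAndAdd(Math.max(1, length) * 8L);
        }
    }

    static final class Address {
        final ManagedObject pointee;
        final long offset; // in bytes

        Address(ManagedObject pointee, long offset) {
            this.pointee = pointee;
            this.offset = offset;
        }
    }

    static long integerRep(Address a) {
        return a.pointee.address + a.offset;
    }

    public static void main(String[] args) {
        ManagedObject x = new ManagedObject(3);
        ManagedObject y = new ManagedObject(3);
        System.out.println(integerRep(new Address(x, 0)) != integerRep(new Address(y, 0))); // true
    }
}
```

Unlike identity hash codes, addresses handed out this way never collide, so the induced order is antisymmetric.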
  102. Mitigate Use-after-Free Errors

    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …;

    The I64Array contents {0, 0, 0} remain reachable through arr, so the access is executed as contents[0] = …. The GC will collect the object when it is no longer referenced.
  104. Mitigate Integer Overflows

    int a = 1, b = INT_MAX;
    int val = a + b; // a + b wraps around to INT_MIN
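On the JVM, this lenient semantics comes for free, since Java's int addition wraps around. A minimal sketch with a hypothetical class name:

```java
// Lenient semantics for int addition: Java's + wraps around in two's
// complement, so 1 + INT_MAX yields INT_MIN instead of undefined behavior.
class WrapDemo {
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(1, Integer.MAX_VALUE)); // prints -2147483648
    }
}
```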
  105. Existing Approaches

    Lenient C assigns semantics to otherwise undefined behavior (cf. Friendly C).
  106. Existing Approaches

    Lenient C increases the robustness of programs without terminating execution.
  110. Idea

    The tool records metadata and checks accesses:
      int *arr = malloc(sizeof(int) * 10); // records arr.size = 40
      …
      arr[4] = …;                          // access is checked
    Programs can also query the metadata from the tool:
      int size = size_right(str);
  112. Introspection Functions

    int *arr = malloc(sizeof(int) * 10);
    int *ptr = &(arr[4]);
    printf("%ld\n", size_right(ptr)); // prints 24

    The object spans sizeof(int) * 10 = 40 bytes, and size_right() returns the bytes to the right of ptr: 40 - 4 * sizeof(int) = 24. We also designed introspection functions for other metadata.
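The arithmetic behind the value 24 can be sketched with a managed pointer. This is a hypothetical Java counterpart of size_right(), assuming sizeof(int) == 4 and element-based offsets:

```java
// Sketch of a size_right() introspection query on a managed pointer: the
// number of bytes between the pointer and the end of its object.
// Assumptions: sizeof(int) == 4 and offsets are counted in elements.
class SizeRightDemo {
    static final class Address {
        final int[] pointee;
        final int offset;

        Address(int[] pointee, int offset) {
            this.pointee = pointee;
            this.offset = offset;
        }
    }

    static long sizeRight(Address ptr) {
        return (long) (ptr.pointee.length - ptr.offset) * Integer.BYTES;
    }

    public static void main(String[] args) {
        int[] arr = new int[10];           // malloc(sizeof(int) * 10)
        Address ptr = new Address(arr, 4); // &(arr[4])
        System.out.println(sizeRight(ptr)); // prints 24
    }
}
```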
  116. Example: strlen()

    size_t strlen(const char *str) {
      size_t len = 0;
      while (*str != '\0') {
        len++;
        str++;
      }
      return len;
    }

    For the string "Programming" followed by '\0', strlen() returns 11.
  119. Example: strlen()

    If the string "Programming" is not terminated by '\0', strlen() reads past the end of the buffer, and AddressSanitizer reports:

      ==16497==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc59c0ef63
      READ of size 1 at 0x7ffc59c0ef63 thread T0
          #0 0x4e7442 in strlen /home/manuel/test.c:10:12
          #1 0x4e7392 in main /home/manuel/test.c:5:5
  123. Example: strlen()

    size_t strlen(const char *str) {
      size_t len = 0;
      while (size_right(str) > 0 && *str != '\0') {
        len++;
        str++;
      }
      return len;
    }

    For the unterminated string "Programming", this version stops at the end of the buffer and returns 11. We enhanced a libc to deal with unterminated strings.
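A Java sketch of the same bounded loop, assuming the string lives in a byte[] and using a hypothetical sizeRight() helper in place of size_right():

```java
import java.nio.charset.StandardCharsets;

// Java counterpart of the enhanced strlen(): it stops at the first '\0' or
// at the end of the buffer, whichever comes first. sizeRight() is a
// hypothetical stand-in for size_right() on a byte buffer.
class BoundedStrlenDemo {
    static int sizeRight(byte[] buf, int pos) {
        return buf.length - pos;
    }

    static int strlen(byte[] buf, int start) {
        int len = 0;
        int pos = start;
        while (sizeRight(buf, pos) > 0 && buf[pos] != 0) {
            len++;
            pos++;
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] terminated = "Programming\0".getBytes(StandardCharsets.US_ASCII);
        byte[] unterminated = "Programming".getBytes(StandardCharsets.US_ASCII);
        System.out.println(strlen(terminated, 0));   // prints 11
        System.out.println(strlen(unterminated, 0)); // prints 11
    }
}
```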
  124. CVE-2017-9047 (Libxml2)

    if (content->name != NULL)
      strcat(buf, (char *) content->name);

    The parser printed a truncated error message, similar to the fixed version.
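One way to obtain such a truncated message is the failure-oblivious approach (Rinard et al. 2004), in which out-of-bounds writes are discarded. The following sketch of a lenient strcat() is an assumption for illustration, not the libc used in the talk; buffers are byte arrays with C-style '\0' terminators, and all names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Sketch of strcat() under failure-oblivious semantics: writes outside the
// destination buffer are discarded, so an overly long string is truncated
// instead of overflowing the buffer.
class LenientStrcatDemo {
    // Failure-oblivious store: out-of-bounds writes are silently dropped.
    static void store(byte[] buf, int index, byte value) {
        if (index >= 0 && index < buf.length) {
            buf[index] = value;
        }
    }

    // Bounded strlen: stops at '\0' or at the end of the buffer.
    static int strlen(byte[] buf) {
        int len = 0;
        while (len < buf.length && buf[len] != 0) {
            len++;
        }
        return len;
    }

    static void strcat(byte[] dest, byte[] src) {
        int pos = strlen(dest);
        int n = strlen(src);
        for (int i = 0; i < n; i++) {
            store(dest, pos + i, src[i]);
        }
        store(dest, pos + n, (byte) 0); // also dropped if there is no room
    }

    static String asString(byte[] buf) {
        return new String(buf, 0, strlen(buf), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[10];
        byte[] prefix = "name: ".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(prefix, 0, buf, 0, prefix.length);
        strcat(buf, "abcdefgh".getBytes(StandardCharsets.US_ASCII));
        System.out.println(asString(buf)); // prints name: abcd
    }
}
```

The result may lack a terminator, which is why a bounded strlen() like the one on the previous slides is needed to read it back safely.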
  126. Existing Approaches

    Lenient C is an extension of failure-oblivious computing (Rinard et al. 2004).
  129. Sulong as Part of GraalVM

    GraalVM (https://www.graalvm.org/; Würthinger et al. 2016) runs TruffleRuby, Graal.js, Graal.python, and FastR on the Truffle framework and the Graal compiler inside the Java Virtual Machine. Native code is called through the Java Native Interface, which is an optimization boundary.
  130. Sulong as Part of GraalVM

    Sulong adds an LLVM IR interpreter to GraalVM, which executes LLVM IR produced by front ends such as Clang and Flang (Würthinger et al. 2016).
  132. Sulong Key Collaborators

    Jacob Kreindl, Raphael Mosaner, Roland Schatz, Josef Eisl, Christian Häubl, Matthias Grimmer, Thomas Pointhuber, Daniel Pekarek, Chris Seaton, Lukas Stadler, Florian Angerer, David Gnedt, Swapnil Gaikwad
    https://github.com/graalvm/sulong/graphs/contributors
    EuroLLVM 2019 talk: LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong
  133. Sulong Key Collaborators

    EuroLLVM 2019 talk: Sulong: An experience report of using the "other end" of LLVM in GraalVM.
  134. Summary

    • UB is problematic, and existing approaches can "optimize" UB "away"
    • Execute C/C++ on the JVM! Automatic checks detect UB
    • But: programs often invoke UB
    • Metadata for manual checks
    • GraalVM