Salzburg '18: Memory-safe and Efficient Execution of C/C++ on the GraalVM

Slide 1

Slide 1 text

Memory-safe and Efficient Execution of C/C++ on the GraalVM
Manuel Rigger
Johannes Kepler University Linz, Austria
Supervisor: Hanspeter Mössenböck
University of Salzburg, 17 May 2018

Slide 2

Slide 2 text

Show of Hands
Who has heard of GraalVM?
Credit: https://pngtree.com/

Slide 3

Slide 3 text

April 17: GraalVM 1.0 RC
http://www.graalvm.org/

Slide 4

Slide 4 text

GraalVM
• Multi-language VM
• Language interoperability
• Roots at JKU Linz, productized by Oracle
[Diagram components: Java Virtual Machine, Graal Compiler, Truffle Framework, TruffleRuby, Graal.js, Graal.python, FastR, Substrate VM]

Slide 5

Slide 5 text

Truffle and Graal
[Figure: AST Interpreter with Uninitialized Nodes → Node Specialization for Profiling Feedback → AST Interpreter with Specialized Nodes → Compilation using Partial Evaluation → Compiled Code]
Node Transitions: U = Uninitialized, I = Integer, D = Double, S = String, G = Generic
(Würthinger et al. 2012, 2013)

Slide 6

Slide 6 text

Truffle and Graal
[Figure: Compiled Code → Transfer back to AST Interpreter → Node Specialization to Update Profiling Feedback → Recompilation using Partial Evaluation]
(Würthinger et al. 2012, 2013)

Slide 7

Slide 7 text

Project Sulong 7 Execute LLVM-based languages on the GraalVM

Slide 8

Slide 8 text

Sulong as Part of GraalVM 8 Java Virtual Machine Graal Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR (Rigger et al. 2016 ICOOOLPS)

Slide 9

Slide 9 text

Sulong as Part of GraalVM 8 Java Virtual Machine Graal Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR Optimization Boundary (Rigger et al. 2016 ICOOOLPS)

Slide 10

Slide 10 text

Sulong as Part of GraalVM 9 Java Virtual Machine Graal Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR Optimization Boundary Java Native Interface (Rigger et al. 2016 ICOOOLPS)

Slide 11

Slide 11 text

Sulong as Part of GraalVM 10 Java Virtual Machine Graal Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR Optimization Boundary LLVM IR Interpreter LLVM IR Clang Flang (Rigger et al. 2016 ICOOOLPS)

Slide 12

Slide 12 text

Project Sulong 11 In GraalVM, Sulong is used to implement native function interfaces

Slide 13

Slide 13 text

Project Sulong
Can we execute C/C++ safely and efficiently by abstracting their semantics to Java?

Slide 14

Slide 14 text

Advantages of Execution on the JVM 13 Checked accesses

Slide 15

Slide 15 text

Buffer Overflows 14 int *arr = malloc(3 * sizeof(int)); arr[5] = …

Slide 16

Slide 16 text

Buffer Overflows 14 int *arr = malloc(3 * sizeof(int)); arr[5] = … C Undefined Behavior

Slide 17

Slide 17 text

Buffer Overflows
C:    int *arr = malloc(3 * sizeof(int)); arr[5] = …    → Undefined Behavior
Java: int[] arr = new int[3]; arr[5] = …                → ArrayIndexOutOfBoundsException

Slide 18

Slide 18 text

Use-after-free Errors 15 free(arr); arr[0] = …

Slide 19

Slide 19 text

Use-after-free Errors 15 free(arr); arr[0] = … C Undefined Behavior

Slide 20

Slide 20 text

Use-after-free Errors
C:    free(arr); arr[0] = …     → Undefined Behavior
Java: arr = null; arr[0] = …    → NullPointerException
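The two Java columns above (buffer overflow and use-after-free) can be tried directly. The following is a minimal sketch in plain Java, not Sulong code; the class and method names are made up for illustration, and `arr = null` only stands in for `free(arr)`.

```java
final class CheckedAccessDemo {
    /** Mirrors "arr[5] = …" on a 3-element array and reports the outcome. */
    static String outOfBoundsWrite() {
        int[] arr = new int[3];
        try {
            arr[5] = 42;                       // index 5 is outside 0..2
            return "no error";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    /** Mirrors "free(arr); arr[0] = …", with the freed pointer modeled as null. */
    static String useAfterFree() {
        int[] arr = new int[3];
        arr = null;                            // stands in for free(arr)
        try {
            arr[0] = 42;
            return "no error";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(outOfBoundsWrite());
        System.out.println(useAfterFree());
    }
}
```

In both cases the JVM stops the faulty access with an exception instead of continuing with undefined behavior.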

Slide 21

Slide 21 text

Advantages of Execution on the JVM 16 Checked accesses

Slide 22

Slide 22 text

Advantages of Execution on the JVM 16 Checked accesses Garbage collection

Slide 23

Slide 23 text

Garbage Collection 17 C int *arr = malloc(3 * sizeof(int)); free(arr); Memory Leak

Slide 24

Slide 24 text

Garbage Collection
C:    int *arr = malloc(3 * sizeof(int));    → Memory Leak unless free(arr); is called
Java: int[] arr = new int[3];                → Collected by the GC

Slide 25

Slide 25 text

Advantages of Execution on the JVM 18 Checked accesses Garbage collection

Slide 26

Slide 26 text

Advantages of Execution on the JVM 18 Checked accesses Garbage collection Well-defined semantics

Slide 27

Slide 27 text

Compiler Optimizations 19 int test(size_t i) { int arr[2] = {0}; if (i >= 2) { arr[2] = 0xcafe; } return arr[i]; }

Slide 28

Slide 28 text

Compiler Optimizations 19 int test(size_t i) { int arr[2] = {0}; if (i >= 2) { arr[2] = 0xcafe; } return arr[i]; } int test(size_t i) { return 0; } C

Slide 29

Slide 29 text

Compiler Optimizations
C source:
int test(size_t i) {
    int arr[2] = {0};
    if (i >= 2) { arr[2] = 0xcafe; }
    return arr[i];
}
After optimization by the C compiler:
int test(size_t i) { return 0; }
State-of-the-art detection tools fail to detect the out-of-bounds access

Slide 30

Slide 30 text

Compiler Optimizations 20 int test(size_t i) { int arr[2] = {0}; if (i >= 2) { arr[2] = 0xcafe; } return arr[i]; }

Slide 31

Slide 31 text

Compiler Optimizations
int test(size_t i) {
    int arr[2] = {0};
    if (i >= 2) { arr[2] = 0xcafe; }
    return arr[i];
}
Java: ArrayIndexOutOfBoundsException
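To make the contrast concrete, here is a hedged transliteration of `test()` into plain Java (an illustration, not Sulong's actual translation). `size_t` is replaced by `int`, and the class and helper names are assumptions.

```java
final class DefinedSemanticsDemo {
    /** Java transliteration of the slide's test() function. */
    static int test(int i) {
        int[] arr = new int[2];      // zero-initialized, like int arr[2] = {0}
        if (i >= 2) {
            arr[2] = 0xcafe;         // always out of bounds: the JVM must throw here
        }
        return arr[i];
    }

    /** Returns true if test(i) is stopped by a bounds check. */
    static boolean throwsFor(int i) {
        try {
            test(i);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(test(0));
        System.out.println(test(1));
        System.out.println(throwsFor(2));
    }
}
```

Because the out-of-bounds store has a defined outcome in Java, a JIT compiler may not fold the function to `return 0`; the exception for `i >= 2` has to be preserved.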

Slide 32

Slide 32 text

• Dynamic Optimizations & Memory Safety
• Implementation of the Interpreter
• Evaluation
• Unstandardized Elements in C Projects

Slide 33

Slide 33 text

22 Implementation of the Interpreter

Slide 34

Slide 34 text

Example Program 23 void processRequests () { int i = 0; do { processPacket (); i ++; } while (i < 10000) ; }

Slide 35

Slide 35 text

Example Program
C:
void processRequests() {
    int i = 0;
    do {
        processPacket();
        i++;
    } while (i < 10000);
}
LLVM IR (produced by Clang):
define void @processRequests() #0 {
; (basic block 0)
  br label %1
; <label>:1 (basic block 1)
  %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
  call void @processPacket()
  %2 = add nsw i32 %i.0, 1
  %3 = icmp slt i32 %2, 10000
  br i1 %3, label %1, label %4
; <label>:4 (basic block 2)
  ret void
}

Slide 36

Slide 36 text

24 define void @processRequests () #0 { ; ( basic block 0) br label %1 ; :1 ( basic block 1) %i .0 = phi i32 [ 0, %0 ], [ %2 , %1 ] call void @processPacket () %2 = add nsw i32 %i .0, 1 %3 = icmp slt i32 %2 , 10000 br i1 %3 , label %1 , label %4 ; :4 ( basic block 2) ret void } LLVM IR Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 37

Slide 37 text

24 define void @processRequests () #0 { ; ( basic block 0) br label %1 ; :1 ( basic block 1) %i .0 = phi i32 [ 0, %0 ], [ %2 , %1 ] call void @processPacket () %2 = add nsw i32 %i .0, 1 %3 = icmp slt i32 %2 , 10000 br i1 %3 , label %1 , label %4 ; :4 ( basic block 2) ret void } LLVM IR write %2 add read %i.0 1 Executable Abstract Syntax Tree Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 38

Slide 38 text

25 write %2 add read %i.0 1 Abstract Syntax Tree Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 39

Slide 39 text

25 write %2 add read %i.0 1 Abstract Syntax Tree class LLVMI32LiteralNode extends LLVMExpressionNode { final int literal; public LLVMI32LiteralNode(int literal) { this.literal = literal; } @Override public int executeI32(VirtualFrame frame) { return literal; } } Executable AST node Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 41

Slide 41 text

Implementation of Operations
Abstract Syntax Tree:
write %2
  add
    read %i.0
    1
Executable AST node:
class LLVMI32LiteralNode extends LLVMExpressionNode {
    final int literal;

    public LLVMI32LiteralNode(int literal) {
        this.literal = literal;
    }

    @Override
    public int executeI32(VirtualFrame frame) {
        return literal;
    }
}
Nodes return their result in an execute() method
(Rigger et al. 2016 VMIL)

Slide 42

Slide 42 text

26 Abstract Syntax Tree write %2 add read %i.0 1 Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 43

Slide 43 text

26 Abstract Syntax Tree @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")}) class LLVMI32AddNode extends LLVMExpressionNode { @Specialization protected int executeI32(int left, int right) { return left + right; } } Executable AST node write %2 add read %i.0 1 Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 45

Slide 45 text

Implementation of Operations
Abstract Syntax Tree:
write %2
  add
    read %i.0
    1
Executable AST node:
@NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
class LLVMI32AddNode extends LLVMExpressionNode {
    @Specialization
    protected int executeI32(int left, int right) {
        return left + right;
    }
}
A DSL allows a declarative style of specifying and executing nodes
(Rigger et al. 2016 VMIL)

Slide 46

Slide 46 text

27 Abstract Syntax Tree write %2 add read %i.0 1 Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 47

Slide 47 text

27 Abstract Syntax Tree @NodeChild("valueNode") class LLVMWriteI32Node extends LLVMExpressionNode { final FrameSlot slot; public LLVMWriteI32Node(FrameSlot slot) { this.slot = slot; } @Specialization public void writeI32(VirtualFrame frame, int value) { frame.setInt(slot, value); } } Executable AST node write %2 add read %i.0 1 Implementation of Operations (Rigger et al. 2016 VMIL)

Slide 49

Slide 49 text

Implementation of Operations
Abstract Syntax Tree:
write %2
  add
    read %i.0
    1
Executable AST node:
@NodeChild("valueNode")
class LLVMWriteI32Node extends LLVMExpressionNode {
    final FrameSlot slot;

    public LLVMWriteI32Node(FrameSlot slot) {
        this.slot = slot;
    }

    @Specialization
    public void writeI32(VirtualFrame frame, int value) {
        frame.setInt(slot, value);
    }
}
Local variables are represented by an array-like VirtualFrame object
(Rigger et al. 2016 VMIL)
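The three node kinds above compose into the tree for `%2 = add nsw i32 %i.0, 1`. The sketch below shows that composition in plain Java without the Truffle DSL. All class names are hypothetical, and an `int[]` stands in for the `VirtualFrame`.

```java
final class MiniAst {
    /** Every node computes an int result from the frame of local variables. */
    abstract static class Node {
        abstract int execute(int[] frame);
    }

    static final class Literal extends Node {
        final int value;
        Literal(int value) { this.value = value; }
        @Override int execute(int[] frame) { return value; }
    }

    static final class Read extends Node {
        final int slot;
        Read(int slot) { this.slot = slot; }
        @Override int execute(int[] frame) { return frame[slot]; }
    }

    static final class Add extends Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        @Override int execute(int[] frame) {
            return left.execute(frame) + right.execute(frame);
        }
    }

    static final class Write extends Node {
        final int slot;
        final Node value;
        Write(int slot, Node value) { this.slot = slot; this.value = value; }
        @Override int execute(int[] frame) {
            int result = value.execute(frame);
            frame[slot] = result;              // store into the local variable
            return result;
        }
    }

    /** Builds "write %2 (add (read %i.0) 1)" and runs it once. */
    static int[] runIncrement(int initial) {
        int[] frame = new int[2];              // slot 0: %i.0, slot 1: %2
        frame[0] = initial;
        Node tree = new Write(1, new Add(new Read(0), new Literal(1)));
        tree.execute(frame);
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(runIncrement(41)[1]);
    }
}
```

Executing the root node recursively executes its children, which is the execution model the slides describe for the AST interpreter.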

Slide 50

Slide 50 text

28 define void @processRequests () #0 { ; ( basic block 0) br label %1 ; :1 ( basic block 1) %i = phi i32 [ 0, %0 ], [ %2 , %1 ] call void @processPacket () %2 = add nsw i32 %i, 1 %3 = icmp slt i32 %2 , 10000 br i1 %3 , label %1 , label %4 ; :4 ( basic block 2) ret void } LLVM IR Implementation of Basic Blocks (Rigger et al. 2016 VMIL)

Slide 51

Slide 51 text

28 define void @processRequests () #0 { ; ( basic block 0) br label %1 ; :1 ( basic block 1) %i = phi i32 [ 0, %0 ], [ %2 , %1 ] call void @processPacket () %2 = add nsw i32 %i, 1 %3 = icmp slt i32 %2 , 10000 br i1 %3 , label %1 , label %4 ; :4 ( basic block 2) ret void } LLVM IR Executable Abstract Syntax Tree Implementation of Basic Blocks Block1 (Rigger et al. 2016 VMIL)

Slide 52

Slide 52 text

Example Program 29 define void @processRequests () #0 { ; ( basic block 0) br label %1 ; :1 ( basic block 1) %i .0 = phi i32 [ 0, %0 ], [ %2 , %1 ] call void @processPacket () %2 = add nsw i32 %i .0, 1 %3 = icmp slt i32 %2 , 10000 br i1 %3 , label %1 , label %4 ; :4 ( basic block 2) ret void } LLVM IR (Rigger et al. 2016 VMIL)

Slide 54

Slide 54 text

Interpreter
[Figure: Basic Block Dispatch Node with children Block0, Block1, Block2; edges labeled with successor indices 1, 2, -1, 1]
Interpreter implementation:
int blockIndex = 0;
while (blockIndex != -1)
    blockIndex = blocks[blockIndex].execute();
(Rigger et al. 2016 VMIL)
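The dispatch loop above can be exercised on the `processRequests` example. This is a minimal sketch under stated assumptions: the `Block` interface, the `int[]` state layout, and the `limit` parameter are hypothetical, and `processPacket()` is replaced by a call counter.

```java
final class BlockDispatchDemo {
    /** A basic block executes its body and returns the successor index (-1 = return). */
    interface Block {
        int execute(int[] state);
    }

    /** Runs the three blocks of processRequests; returns the number of processPacket() calls. */
    static int run(int limit) {
        // state[0] models %i.0, state[1] counts calls to processPacket()
        Block[] blocks = {
            s -> { s[0] = 0; return 1; },          // block 0: phi input 0, br label %1
            s -> {                                 // block 1: loop body
                s[1]++;                            // call void @processPacket()
                int next = s[0] + 1;               // %2 = add nsw i32 %i.0, 1
                s[0] = next;                       // phi input for the next iteration
                return next < limit ? 1 : 2;       // br i1 %3, label %1, label %4
            },
            s -> -1                                // block 2: ret void
        };
        int[] state = new int[2];
        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = blocks[blockIndex].execute(state);
        }
        return state[1];
    }

    public static void main(String[] args) {
        System.out.println(run(10000));
    }
}
```

Each block returns the index of its successor, so unstructured control flow in LLVM IR needs no `goto`; the loop ends once a block returns -1.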

Slide 55

Slide 55 text

Block0 Block1 Block2 Basic Block Dispatch Node 1 2 -1 1 Interpreter 31 define void @processRequests () #0 { ; ( basic block 0) br label %1 ; :1 ( basic block 1) %i .0 = phi i32 [ 0, %0 ], [ %2 , %1 ] call void @processPacket () %2 = add nsw i32 %i .0, 1 %3 = icmp slt i32 %2 , 10000 br i1 %3 , label %1 , label %4 ; :4 ( basic block 2) ret void } Program execution (Rigger et al. 2016 VMIL)

Slide 58

Slide 58 text

Truffle and Graal
[Figure: AST Interpreter with Uninitialized Nodes → Node Specialization for Profiling Feedback → AST Interpreter with Specialized Nodes → Compilation using Partial Evaluation → Compiled Code]
Node Transitions: U = Uninitialized, I = Integer, D = Double, S = String, G = Generic
(Würthinger et al. 2012, 2013)

Slide 59

Slide 59 text

Compiler
[Figure: Basic Block Dispatch Node with children Block0, Block1, Block2; edges labeled with successor indices 1, 2, -1, 1]
Partially evaluated interpreter (pseudo code):
int blockIndex = 0;
block0:
    blockIndex = 1
    %i.0 = 0
block1:
    while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
            blockIndex = 1
            %i.0 = %2
            continue;
        else:
            blockIndex = 2
block2:
    blockIndex = -1
    return
(Rigger et al. 2016 VMIL)

Slide 61

Slide 61 text

36 Dynamic Optimizations & Memory Safety

Slide 62

Slide 62 text

Truffle and Graal
[Figure: AST Interpreter with Uninitialized Nodes → Node Specialization for Profiling Feedback → AST Interpreter with Specialized Nodes → Compilation using Partial Evaluation → Compiled Code]
Node Transitions: U = Uninitialized, I = Integer, D = Double, S = String, G = Generic
(Würthinger et al. 2012, 2013)

Slide 63

Slide 63 text

Speculative Optimization: Value Profiling
public class LLVMI32LoadNode extends LLVMExpressionNode {
    final int expectedValue; // observed value

    @Specialization
    protected int doI32(Address addr) {
        int val = memory.getI32(addr);
        if (val == expectedValue) {
            return expectedValue;
        } else {
            CompilerDirectives.transferToInterpreter();
            replace(new LLVMI32LoadGenericNode());
            return val;
        }
    }
}
(Rigger et al. 2016 VMIL)
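The speculate-then-fall-back pattern of the load node can be modeled without Truffle. The following sketch is an assumption-laden stand-in: an `int[]` replaces native memory, swapping a lambda replaces `replace(...)`, and a counter replaces `transferToInterpreter()`. None of the names are Sulong APIs.

```java
final class ValueProfileDemo {
    /** One specialization of the load operation. */
    interface Load {
        int load(int[] memory, int addr);
    }

    static final class ProfilingLoad {
        private Load current;      // currently active specialization
        private int deopts;        // how often the speculation failed

        ProfilingLoad(int expectedValue) {
            current = (memory, addr) -> {
                int val = memory[addr];
                if (val == expectedValue) {
                    return expectedValue;          // a compiler could treat this as a constant
                }
                deopts++;                          // speculation failed
                current = (m, a) -> m[a];          // rewrite to the generic load
                return val;
            };
        }

        int execute(int[] memory, int addr) {
            return current.load(memory, addr);
        }

        int deoptCount() {
            return deopts;
        }
    }

    public static void main(String[] args) {
        int[] memory = {7, 7, 9};
        ProfilingLoad node = new ProfilingLoad(7);
        System.out.println(node.execute(memory, 0));
        System.out.println(node.execute(memory, 2));
        System.out.println(node.deoptCount());
    }
}
```

The speculation fails at most once per node: after the first unexpected value the node stays generic, which keeps repeated deoptimization from dominating run time.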

Slide 65

Slide 65 text

Polymorphic Inline Caches for Indirect Calls 39 int inc(int val) { return val + 1; } int dec(int val) { return val - 1; } int square(int val) { return val * val; } int (*func)(int); // ... result = func(4); uninit call (Rigger et al. 2016 VMIL)

Slide 67

Slide 67 text

int inc(int val) { return val + 1; } int dec(int val) { return val - 1; } int square(int val) { return val * val; } int (*func)(int); // ... result = func(4); Polymorphic Inline Caches for Indirect Calls 40 call inc uninit call inc (Rigger et al. 2016 VMIL)

Slide 70

Slide 70 text

int inc(int val) { return val + 1; } int dec(int val) { return val - 1; } int square(int val) { return val * val; } int (*func)(int); // ... result = func(4); call inc call dec uninit call Polymorphic Inline Caches for Indirect Calls 41 inc dec (Rigger et al. 2016 VMIL)

Slide 72

Slide 72 text

Polymorphic Inline Caches for Indirect Calls
int inc(int val) { return val + 1; }
int dec(int val) { return val - 1; }
int square(int val) { return val * val; }
int (*func)(int);
// ...
result = func(4);
[Figure: with inc, dec, and square all observed as targets, the call node becomes an indirect call]
(Rigger et al. 2016 VMIL)
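The progression on these slides (uninitialized call, cached `inc`, cached `inc` and `dec`, then indirect call) can be sketched as a small call-site cache. This is an illustration only: the cache limit of 2 is an assumption chosen to match the slides, and `CallSite` and its methods are hypothetical names.

```java
import java.util.function.IntUnaryOperator;

final class InlineCacheDemo {
    static final int MAX_CACHE = 2;    // assumed cache size, chosen to match the slides

    static final IntUnaryOperator INC = val -> val + 1;
    static final IntUnaryOperator DEC = val -> val - 1;
    static final IntUnaryOperator SQUARE = val -> val * val;

    static final class CallSite {
        private final IntUnaryOperator[] cachedTargets = new IntUnaryOperator[MAX_CACHE];
        private int cached;
        private boolean indirect;      // true once the cache has overflowed

        int call(IntUnaryOperator target, int arg) {
            if (!indirect) {
                for (int i = 0; i < cached; i++) {
                    if (cachedTargets[i] == target) {
                        // cache hit: a compiler could inline this known target
                        return cachedTargets[i].applyAsInt(arg);
                    }
                }
                if (cached < MAX_CACHE) {
                    cachedTargets[cached++] = target;   // specialize for a new target
                } else {
                    indirect = true;                    // too many targets: give up caching
                }
            }
            return target.applyAsInt(arg);
        }

        String state() {
            return indirect ? "indirect" : "cached:" + cached;
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        System.out.println(site.call(INC, 4) + " " + site.state());
        System.out.println(site.call(DEC, 4) + " " + site.state());
        System.out.println(site.call(SQUARE, 4) + " " + site.state());
    }
}
```

Targets are compared by identity, so the operators are kept in constants; a fresh lambda per call would never hit the cache.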

Slide 74

Slide 74 text

How to Represent Allocations 43 int *arr = malloc(sizeof(int) * 4); %1 = call i8* @malloc(i64 16) %2 = bitcast i8* %1 to i32* (Rigger et al. 2018 ASPLOS)

Slide 75

Slide 75 text

How to Represent Allocations 44 %1 = call i8* @malloc(i64 16) %2 = bitcast i8* %1 to i32* Address offset = 0 data UntypedAllocation size=16 (Rigger et al. 2018 ASPLOS)

Slide 76

Slide 76 text

How to Represent Allocations
%1 = call i8* @malloc(i64 16)
%2 = bitcast i8* %1 to i32*
[Figure: %1 is an Address (offset = 0) whose data is an UntypedAllocation (size = 16); %2 is an Address (offset = 0) whose data is an I32Array (contents {0, 0, 0, 0})]
(Rigger et al. 2018 ASPLOS)

Slide 77

Slide 77 text

Prevent Out-Of-Bounds Accesses 46 arr[5] = … Address offset = 5 data I32Array contents {0, 0, 0, 0} (Rigger et al. 2018 ASPLOS)

Slide 78

Slide 78 text

Prevent Out-Of-Bounds Accesses contents[5] → ArrayIndexOutOfBoundsException 46 arr[5] = … Address offset = 5 data I32Array contents {0, 0, 0, 0} (Rigger et al. 2018 ASPLOS)

Slide 79

Slide 79 text

Prevent Use-After-Free Errors 47 free(arr); arr[0] = … Address offset=0 data I32Array contents {0, 0, 0, 0} (Rigger et al. 2018 ASPLOS)

Slide 80

Slide 80 text

Address offset=0 data I32Array contents=null Prevent Use-After-Free Errors 48 free(arr); arr[0] = … (Rigger et al. 2018 ASPLOS)

Slide 81

Slide 81 text

Prevent Use-After-Free Errors
free(arr); arr[0] = …
[Figure: Address (offset = 0) whose data is an I32Array with contents = null]
contents[0] → NullPointerException
(Rigger et al. 2018 ASPLOS)
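The Address/I32Array model from the last slides can be sketched in a few lines of plain Java. This is a simplified illustration, not Sulong's implementation: the class shapes follow the figures, offsets count elements as on the slides, and the `classify` helper is hypothetical.

```java
final class ManagedMemoryDemo {
    /** Managed stand-in for a malloc'ed int buffer. */
    static final class I32Array {
        int[] contents;
        I32Array(int length) { contents = new int[length]; }
        void free() { contents = null; }           // free() drops the backing array
    }

    /** A pointer: a managed object plus an element offset into it. */
    static final class Address {
        final I32Array data;
        final int offset;
        Address(I32Array data, int offset) { this.data = data; this.offset = offset; }
        Address plus(int elements) { return new Address(data, offset + elements); }
        void store(int value) { data.contents[offset] = value; }
        int load() { return data.contents[offset]; }
    }

    /** Runs an access and names the error the JVM reports, if any. */
    static String classify(Runnable access) {
        try {
            access.run();
            return "ok";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "out-of-bounds";
        } catch (NullPointerException e) {
            return "use-after-free";
        }
    }

    public static void main(String[] args) {
        Address arr = new Address(new I32Array(4), 0);     // int *arr = malloc(sizeof(int) * 4)
        System.out.println(classify(() -> arr.plus(3).store(1)));
        System.out.println(classify(() -> arr.plus(5).store(1)));
        arr.data.free();                                   // free(arr)
        System.out.println(classify(() -> arr.store(1)));
    }
}
```

No explicit safety check appears in `store` or `load`; both error classes fall out of the JVM's built-in bounds and null checks, which is the point the slides make.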

Slide 82

Slide 82 text

50 Evaluation

Slide 83

Slide 83 text

Evaluation
• Effectiveness
• Bug-finding capabilities on open-source projects
• Performance
• Warm-up and peak performance
• Comparison to state-of-the-art approaches
(Rigger et al. 2018 ASPLOS)

Slide 84

Slide 84 text

Evaluation: State-of-the-art Approaches 52 a.out Clang/GCC C ./a.out Hello world! (Rigger et al. 2018 ASPLOS)

Slide 85

Slide 85 text

Evaluation: State-of-the-art Approaches
Compile-time instrumentation
• AddressSanitizer (ASan) (Serebryany et al. 2012)
• SoftBound+CETS (Nagarakatte et al. 2009, 2010)
[Figure: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]
(Rigger et al. 2018 ASPLOS)

Slide 87

Slide 87 text

Evaluation: Effectiveness • Found 68 errors 53 (Rigger et al. 2018 ASPLOS)

Slide 88

Slide 88 text

Evaluation: Effectiveness
• Found 68 errors
• 8 errors were not found by ASan and Valgrind
• Valgrind detected half of the errors
(Rigger et al. 2018 ASPLOS)

Slide 89

Slide 89 text

Evaluation: Effectiveness
int main(int argc, char** argv) {
    printf("%d %s\n", argc, argv[100]);
}
ASan does not instrument the main() arguments since they are allocated by libc
https://github.com/google/sanitizers/issues/762
(Rigger et al. 2018 ASPLOS)

Slide 90

Slide 90 text

Discussion: Effectiveness 56 a.out Clang/GCC C ./a.out Hello world! (Rigger et al. 2018 ASPLOS)

Slide 91

Slide 91 text

Discussion: Effectiveness 56 Manually adding instrumentation is error-prone a.out Clang/GCC C ./a.out Hello world! (Rigger et al. 2018 ASPLOS)

Slide 92

Slide 92 text

Discussion: Effectiveness • 4 errors not found when optimizations were turned on 57 (Rigger et al. 2018 ASPLOS)

Slide 93

Slide 93 text

Discussion: Effectiveness 58 Static compilers: optimize code based on Undefined Behavior Bug-finding tools: find bugs assuming that violations are visible side effects (Rigger et al. 2018 ASPLOS)

Slide 94

Slide 94 text

Discussion: Effectiveness
Static compilers: optimize code based on Undefined Behavior
Bug-finding tools: find bugs assuming that violations are visible side effects
Compile with Clang -O0 when using ASan or Sulong to detect bugs
(Rigger et al. 2018 ASPLOS)

Slide 95

Slide 95 text

Discussion: Effectiveness 59 ArrayIndexOutOfBoundsException int test(size_t i) { int arr[2] = {0}; if (i >= 2) { arr[2] = 0xcafe; } return arr[i]; } Java (Rigger et al. 2018 ASPLOS)

Slide 96

Slide 96 text

Evaluation: Warmup Performance
[Figure: Meteor benchmark; x-axis: Second (1 to 37); y-axis: Iterations per second (0 to 60); series: ASan (Clang O0), Sulong (Clang O0), Valgrind]
(Rigger et al. 2018 ASPLOS)

Slide 98

Slide 98 text

Evaluation: Peak Performance 61 lower is better (Rigger et al. 2018 ASPLOS)

Slide 99

Slide 99 text

Evaluation: Peak Performance 62 lower is better (Rigger et al. 2018 ASPLOS)

Slide 100

Slide 100 text

Evaluation: Peak Performance
Small benchmarks since Sulong failed to execute SPEC
[Figure: lower is better]
(Rigger et al. 2018 ASPLOS)

Slide 101

Slide 101 text

Evaluation: Peak Performance 63 lower is better (Rigger et al. 2018 ASPLOS)

Slide 102

Slide 102 text

Evaluation: Peak Performance
Baseline is Clang -O0; Sulong -O0 is faster in all but one case
[Figure: lower is better]
(Rigger et al. 2018 ASPLOS)

Slide 103

Slide 103 text

Evaluation: Peak Performance 64 lower is better (Rigger et al. 2018 ASPLOS)

Slide 104

Slide 104 text

Evaluation: Peak Performance
Sulong -O0 is close to Clang -O3 in some cases
[Figure: lower is better]
(Rigger et al. 2018 ASPLOS)

Slide 105

Slide 105 text

Evaluation: Peak Performance 65 lower is better (Rigger et al. 2018 ASPLOS)

Slide 106

Slide 106 text

Evaluation: Peak Performance
Sulong -O0 is mostly faster than ASan -O0
[Figure: lower is better]
(Rigger et al. 2018 ASPLOS)

Slide 107

Slide 107 text

66 Unstandardized Elements in C Projects

Slide 108

Slide 108 text

67 C projects consist of more than C code

Slide 109

Slide 109 text

68 C Projects Consist of More Than C Code

Slide 110

Slide 110 text

68 asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly C Projects Consist of More Than C Code

Slide 111

Slide 111 text

68 asm("rdtsc":"=a"(tickl),"=d"(tickh)); #pragma GCC poison printf Inline Assembly Compiler pragmas C Projects Consist of More Than C Code

Slide 112

Slide 112 text

68 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); #pragma GCC poison printf Inline Assembly Compiler builtins Compiler pragmas C Projects Consist of More Than C Code

Slide 113

Slide 113 text

C Projects Consist of More Than C Code
Inline Assembly:      asm("rdtsc":"=a"(tickl),"=d"(tickh));
Compiler pragmas:     #pragma GCC poison printf
Compiler builtins:    if (__builtin_expect(x, 0)) foo();
Preprocessor macros:  #define getmax(a,b) ((a)>(b)?(a):(b))

Slide 115

Slide 115 text

68 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly Compiler builtins C Projects Consist of More Than C Code

Slide 116

Slide 116 text

69 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly Compiler builtins C Projects Consist of More Than C Code

Slide 117

Slide 117 text

69 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly Compiler builtins Inline assembly and compiler builtins still exist on the LLVM IR level C Projects Consist of More Than C Code

Slide 118

Slide 118 text

Implementation in Sulong 70 public class LLVMAMD64RdtscReadNode extends LLVMExpressionNode { public long executeRdtsc() { return System.nanoTime(); } } asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly

Slide 119

Slide 119 text

Implementation in Sulong
Inline Assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
public class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
    public long executeRdtsc() {
        return System.nanoTime();
    }
}
Emulate the behavior of assembly
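The `rdtsc` instruction delivers a 64-bit counter split across two 32-bit registers, which the `"=a"(tickl)` and `"=d"(tickh)` constraints bind to the low and high halves. The sketch below shows how an emulated counter value could be split that way; the helper names are hypothetical, and whether Sulong splits the value exactly like this is an assumption.

```java
final class RdtscEmulationDemo {
    /** Low 32 bits of the counter, as an unsigned value (what "=a"(tickl) receives). */
    static long low32(long counter) {
        return counter & 0xFFFFFFFFL;
    }

    /** High 32 bits of the counter (what "=d"(tickh) receives). */
    static long high32(long counter) {
        return counter >>> 32;
    }

    /** Reassembles the 64-bit counter the way C code typically does. */
    static long combine(long tickl, long tickh) {
        return (tickh << 32) | tickl;
    }

    public static void main(String[] args) {
        long counter = System.nanoTime();          // stand-in for the time-stamp counter
        long tickl = low32(counter);
        long tickh = high32(counter);
        System.out.println(combine(tickl, tickh) == counter);
    }
}
```

`System.nanoTime()` counts nanoseconds rather than clock cycles, so the emulation keeps the counter monotonic but not its unit.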

Slide 120

Slide 120 text

71 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly Compiler builtins C Projects Consist of More Than C Code

Slide 121

Slide 121 text

71 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly Compiler builtins Which ones should be implemented in Sulong? How are they used? C Projects Consist of More Than C Code

Slide 122

Slide 122 text

Methodology
• Repository mining approach
• Analyzed >1000 GitHub C projects
• Different data sets
• Grep for usages of
  • inline assembly
  • GCC builtins
• Inserted usages into a database and analyzed them

Slide 123

Slide 123 text

Percentage of Projects
[Bar chart, % of projects: Popular projects with inline assembly: 28%; (Popular) projects with GCC builtins: 37%]
(Rigger et al. 2018 VEE)

Slide 124

Slide 124 text

Density (occurrence per KLOC)
[Bar chart, bar labels: Popular projects with inline assembly: 50k; (Popular) projects with GCC builtins: 6k]
(Rigger et al. 2018 VEE)

Slide 125

Slide 125 text

Average Number per Project
[Bar chart, Average Number Unique Builtins/Inline Assembly: Popular projects with inline assembly: 4; (Popular) projects with GCC builtins: 17]
(Rigger et al. 2018 VEE)

Slide 126

Slide 126 text

Use Cases of Inline Assembly
Instructions    In % of projects
rdtsc           27.4%
cpuid           25.4%
mov             24.9%
                21.8%
lock xchg       14.2%
…               …
(Rigger et al. 2018 VEE)

Slide 127

Slide 127 text

Use Cases of Inline Assembly
Instructions    In % of projects
rdtsc           27.4%
cpuid           25.4%
mov             24.9%
                21.8%
lock xchg       14.2%
…               …
Functionality not available in C (CPU feature detection, clock cycles)
(Rigger et al. 2018 VEE)

Slide 128

Slide 128 text

Use Cases of Inline Assembly
Instructions    In % of projects
rdtsc           27.4%
cpuid           25.4%
mov             24.9%
                21.8%
lock xchg       14.2%
…               …
Supporting instructions (data copying, arithmetic, control flow)
(Rigger et al. 2018 VEE)

Slide 130

Slide 130 text

Use Cases of Inline Assembly
Instructions    In % of projects
rdtsc           27.4%
cpuid           25.4%
mov             24.9%
                21.8%
lock xchg       14.2%
…               …
Instruction order (compiler barriers, memory barriers, atomics)
(Rigger et al. 2018 VEE)

Slide 131

Slide 131 text

Use Cases of Inline Assembly
Instructions    In % of projects
rdtsc           27.4%
cpuid           25.4%
mov             24.9%
                21.8%
lock xchg       14.2%
…               …
Performance optimization (SIMD, endianness conversion, bitscan)
(Rigger et al. 2018 VEE)

Slide 132

Slide 132 text

Use Cases of GCC Builtins
Builtins                In % of projects
__builtin_expect        48.2%
__builtin_clz           29.3%
__builtin_bswap32       26.2%
__builtin_constant_p    23.3%
__builtin_alloca        20.3%
…                       …
Similar use cases as for inline assembly

Slide 133

Slide 133 text

Use Cases of GCC Builtins
Builtins                In % of projects
__builtin_expect        48.2%
__builtin_clz           29.3%
__builtin_bswap32       26.2%
__builtin_constant_p    23.3%
__builtin_alloca        20.3%
…                       …
But also for compiler interaction and metaprogramming
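Several of the most common builtins in the table have direct counterparts in the Java standard library, which suggests how a JVM-based interpreter could emulate them. The mapping below is a sketch, not Sulong's implementation; the method names are hypothetical, while `Integer.numberOfLeadingZeros` and `Integer.reverseBytes` are standard Java methods.

```java
final class BuiltinEmulationDemo {
    /** __builtin_clz: count leading zero bits. GCC leaves x == 0 undefined; Java returns 32. */
    static int builtinClz(int x) {
        return Integer.numberOfLeadingZeros(x);
    }

    /** __builtin_bswap32: reverse the byte order of a 32-bit value. */
    static int builtinBswap32(int x) {
        return Integer.reverseBytes(x);
    }

    /** __builtin_expect: only a branch-prediction hint, so the value passes through unchanged. */
    static long builtinExpect(long value, long expected) {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(builtinClz(1));
        System.out.println(Integer.toHexString(builtinBswap32(0x11223344)));
        System.out.println(builtinExpect(5, 0));
    }
}
```

Builtins for compiler interaction, such as `__builtin_constant_p`, have no library equivalent like this, since their result depends on what the compiler knows at compile time.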

Slide 134

Slide 134 text

Do Projects use the Same Subset?
• How many projects can be supported by implementing 5% of x86-64’s ~1000 instructions?
• At least 64% of projects (we did not analyze some large projects)
[Figure: % of supported projects (y-axis, 0 to 90) against number of implemented instructions (x-axis, 2 to 50); labeled value: 77.9%]
(Rigger et al. 2018 VEE)

Slide 135

Slide 135 text

Do Projects use the Same Subset? 83

Slide 136

Slide 136 text

Do Projects use the Same Subset? 83 32 builtins to support half of projects

Slide 137

Slide 137 text

Do Projects use the Same Subset? 83 1600 builtins to support 99% of projects

Slide 138

Slide 138 text

Conclusion 84 @RiggerManuel Sulong and GraalVM

Slide 139

Slide 139 text

Conclusion 84 @RiggerManuel Sulong and GraalVM Sulong’s LLVM IR Interpreter

Slide 140

Slide 140 text

Conclusion 84 @RiggerManuel Sulong and GraalVM Sulong’s LLVM IR Interpreter Detection of Errors

Slide 141

Slide 141 text

Conclusion 84 @RiggerManuel Sulong and GraalVM Sulong’s LLVM IR Interpreter Detection of Errors Dynamic Optimizations

Slide 142

Slide 142 text

Conclusion 84 @RiggerManuel Sulong and GraalVM Sulong’s LLVM IR Interpreter Detection of Errors Dynamic Optimizations Usage of Unstandardized C Elements

Slide 143

Slide 143 text

Bibliography
• Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. 2012. Self-optimizing AST interpreters. In Proceedings of the 8th symposium on Dynamic languages (DLS '12). ACM, New York, NY, USA, 73-82. DOI: https://doi.org/10.1145/2384577.2384587
• Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario Wolczko. 2013. One VM to rule them all. In Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on programming & software (Onward! 2013). ACM, New York, NY, USA, 187-204. DOI: https://doi.org/10.1145/2509578.2509581
• Manuel Rigger, Matthias Grimmer, and Hanspeter Mössenböck. 2016. Sulong - execution of LLVM-based languages on the JVM: position paper. In Proceedings of the 11th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS '16). ACM, New York, NY, USA, Article 7, 4 pages. DOI: https://doi.org/10.1145/3012408.3012416
• Manuel Rigger, Matthias Grimmer, Christian Wimmer, Thomas Würthinger, and Hanspeter Mössenböck. 2016. Bringing low-level languages to the JVM: efficient execution of LLVM IR on Truffle. In Proceedings of the 8th International Workshop on Virtual Machines and Intermediate Languages (VMIL 2016). ACM, New York, NY, USA, 6-15. DOI: https://doi.org/10.1145/2998415.2998416
• Manuel Rigger, Roland Schatz, René Mayrhofer, Matthias Grimmer, and Hanspeter Mössenböck. 2018. Sulong, and Thanks for All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18). ACM, New York, NY, USA, 377-391. DOI: https://doi.org/10.1145/3173162.3173174