Salzburg '18: Memory-safe and Efficient Execution of C/C++ on the GraalVM

Talk hosted by Christoph Kirsch at University of Salzburg

Manuel Rigger

May 17, 2018

Transcript

  1. Memory-safe and Efficient Execution of C/C++ on the GraalVM

    Manuel Rigger, Johannes Kepler University Linz, Austria. Supervisor: Hanspeter Mössenböck. University of Salzburg, 17 May 2018
  2. GraalVM

    • Multi-language VM
    • Language interoperability
    • Roots at JKU Linz, productized by Oracle

    Diagram: Java Virtual Machine, Graal Compiler, Truffle Framework, TruffleRuby, Graal.js, Graal.python, FastR, Substrate VM
  3. Truffle and Graal (Würthinger et al. 2012, 2013)

    Figure: an AST interpreter starts with uninitialized nodes; node specialization for profiling feedback yields specialized nodes; compilation using partial evaluation yields compiled code. Node states: U = Uninitialized, I = Integer, D = Double, S = String, G = Generic.
  4. Truffle and Graal (Würthinger et al. 2012, 2013)

    Figure: execution transfers back to the AST interpreter; node specialization updates the profiling feedback; the tree is recompiled using partial evaluation.
  5. Sulong as Part of GraalVM (Rigger et al. 2016 ICOOOLPS)

    Diagram: Java Virtual Machine, Graal Compiler, Truffle Framework, TruffleRuby, Graal.js, Graal.python, FastR
  6. Sulong as Part of GraalVM (Rigger et al. 2016 ICOOOLPS)

    Diagram adds: Optimization Boundary
  7. Sulong as Part of GraalVM (Rigger et al. 2016 ICOOOLPS)

    Diagram adds: Java Native Interface
  8. Sulong as Part of GraalVM (Rigger et al. 2016 ICOOOLPS)

    Diagram adds: LLVM IR Interpreter, LLVM IR, Clang, Flang
  9. Project Sulong

    Can we execute C/C++ safely and efficiently by abstracting their semantics to Java?
  10. Buffer Overflows

    C: int *arr = malloc(3 * sizeof(int)); arr[5] = … → Undefined Behavior
    Java: int[] arr = new int[3]; arr[5] = … → ArrayIndexOutOfBoundsException
  11. Use-after-free Errors

    C: free(arr); arr[0] = … → Undefined Behavior
    Java: arr = null; arr[0] = … → NullPointerException
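The two comparisons above can be tried directly in plain Java. The following is a minimal sketch, not Sulong code; the class and method names are made up for illustration, and dropping the reference only stands in for free().

```java
final class MemorySafetyDemo {
    // Writes past the end of a three-element array and reports what happens.
    static String outOfBoundsWrite() {
        int[] arr = new int[3];
        try {
            arr[5] = 42; // index 5 is outside [0, 3)
            return "no exception";
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Models free(arr) by dropping the reference, then accesses the array.
    static String accessAfterFree() {
        int[] arr = new int[3];
        arr = null; // stands in for free(arr)
        try {
            arr[0] = 42;
            return "no exception";
        } catch (NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outOfBoundsWrite());
        System.out.println(accessAfterFree());
    }
}
```

In both cases the JVM raises a well-defined exception at the faulty access instead of continuing with undefined behavior.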
  12. Garbage Collection

    C: int *arr = malloc(3 * sizeof(int)); free(arr); → Memory Leak if free() is forgotten
    Java: int[] arr = new int[3]; → Collected by the GC
  13. Compiler Optimizations

          int test(size_t i) {
              int arr[2] = {0};
              if (i >= 2) { arr[2] = 0xcafe; }
              return arr[i];
          }
  14. Compiler Optimizations

    C: an optimizing compiler may reduce test() to

          int test(size_t i) { return 0; }
  15. Compiler Optimizations

    C: State-of-the-art detection tools fail to detect the out-of-bounds access in the optimized code.
  16. Compiler Optimizations

    The test() function from slide 13 again.
  17. Compiler Optimizations

    Java: executing test() with i >= 2 throws an ArrayIndexOutOfBoundsException.
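To make the Java side of this comparison concrete, here is a hedged transliteration of test() into Java. It is only an illustration of Java array semantics, not how Sulong executes the function; the class name is hypothetical and the unsigned size_t parameter is approximated by int.

```java
final class UndefinedBehaviorDemo {
    // Transliteration of the C function test() into Java.
    static int test(int i) {
        int[] arr = new int[2]; // zero-initialized, like int arr[2] = {0}
        if (i >= 2) {
            arr[2] = 0xcafe; // out of bounds: Java must throw here
        }
        return arr[i];
    }

    public static void main(String[] args) {
        System.out.println(test(0));
        try {
            test(2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

Because the bounds check is part of the language semantics, a compiler for Java is not allowed to remove the faulty store the way a C compiler may.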
  18. Dynamic Optimizations & Memory Safety • Implementation of the Interpreter • Evaluation • Unstandardized Elements in C Projects
  19. Example Program

          void processRequests() {
              int i = 0;
              do {
                  processPacket();
                  i++;
              } while (i < 10000);
          }
  20. Example Program

    Clang compiles the C function above to LLVM IR:

          define void @processRequests() #0 {
          ; (basic block 0)
            br label %1
          ; <label>:1 (basic block 1)
            %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
            call void @processPacket()
            %2 = add nsw i32 %i.0, 1
            %3 = icmp slt i32 %2, 10000
            br i1 %3, label %1, label %4
          ; <label>:4 (basic block 2)
            ret void
          }
  21. Implementation of Operations (Rigger et al. 2016 VMIL)

    LLVM IR of @processRequests, as on slide 20.
  22. Implementation of Operations (Rigger et al. 2016 VMIL)

    The LLVM IR instruction %2 = add nsw i32 %i.0, 1 becomes an executable abstract syntax tree: write %2 ← add(read %i.0, 1).
  23. Implementation of Operations (Rigger et al. 2016 VMIL)

    Abstract syntax tree: write %2 ← add(read %i.0, 1).
  24. Implementation of Operations (Rigger et al. 2016 VMIL)

    Abstract syntax tree: write %2 ← add(read %i.0, 1). Executable AST node for the literal 1:

          class LLVMI32LiteralNode extends LLVMExpressionNode {
              final int literal;

              public LLVMI32LiteralNode(int literal) {
                  this.literal = literal;
              }

              @Override
              public int executeI32(VirtualFrame frame) {
                  return literal;
              }
          }
  25. Implementation of Operations (Rigger et al. 2016 VMIL)

    LLVMI32LiteralNode, the executable AST node for the literal.
  26. Implementation of Operations (Rigger et al. 2016 VMIL)

    Nodes return their result in an execute() method.
  27. Implementation of Operations (Rigger et al. 2016 VMIL)

    Abstract syntax tree: write %2 ← add(read %i.0, 1).
  28. Implementation of Operations (Rigger et al. 2016 VMIL)

    Executable AST node for add:

          @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
          class LLVMI32AddNode extends LLVMExpressionNode {
              @Specialization
              protected int executeI32(int left, int right) {
                  return left + right;
              }
          }
  29. Implementation of Operations (Rigger et al. 2016 VMIL)

    LLVMI32AddNode, the executable AST node for add.
  30. Implementation of Operations (Rigger et al. 2016 VMIL)

    A DSL allows a declarative style of specifying and executing nodes.
  31. Implementation of Operations (Rigger et al. 2016 VMIL)

    Abstract syntax tree: write %2 ← add(read %i.0, 1).
  32. Implementation of Operations (Rigger et al. 2016 VMIL)

    Executable AST node for write:

          @NodeChild("valueNode")
          class LLVMWriteI32Node extends LLVMExpressionNode {
              final FrameSlot slot;

              public LLVMWriteI32Node(FrameSlot slot) {
                  this.slot = slot;
              }

              @Specialization
              public void writeI32(VirtualFrame frame, int value) {
                  frame.setInt(slot, value);
              }
          }
  33. Implementation of Operations (Rigger et al. 2016 VMIL)

    LLVMWriteI32Node, the executable AST node for write.
  34. Implementation of Operations (Rigger et al. 2016 VMIL)

    Local variables are represented by an array-like VirtualFrame object.
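The literal, add, and write nodes shown on these slides can be put together in a dependency-free sketch. The version below is an assumption-laden simplification: it uses a plain int[] in place of VirtualFrame, a hand-written Node interface in place of LLVMExpressionNode and the Truffle DSL, and all names (MiniAst, Read, SLOT_I0, SLOT_2) are hypothetical.

```java
final class MiniAst {
    // Hypothetical stand-in for LLVMExpressionNode: every node yields an int.
    interface Node {
        int execute(int[] frame);
    }

    // Returns a constant, like LLVMI32LiteralNode.
    static final class Literal implements Node {
        private final int value;
        Literal(int value) { this.value = value; }
        public int execute(int[] frame) { return value; }
    }

    // Loads a local variable from its frame slot.
    static final class Read implements Node {
        private final int slot;
        Read(int slot) { this.slot = slot; }
        public int execute(int[] frame) { return frame[slot]; }
    }

    // Executes both children, then adds their results.
    static final class Add implements Node {
        private final Node left;
        private final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public int execute(int[] frame) {
            return left.execute(frame) + right.execute(frame);
        }
    }

    // Executes its child and stores the result in a frame slot.
    static final class Write implements Node {
        private final int slot;
        private final Node value;
        Write(int slot, Node value) { this.slot = slot; this.value = value; }
        public int execute(int[] frame) {
            int result = value.execute(frame);
            frame[slot] = result;
            return result;
        }
    }

    static final int SLOT_I0 = 0; // holds %i.0
    static final int SLOT_2 = 1;  // holds %2

    // Builds and runs the tree for: %2 = add nsw i32 %i.0, 1
    static int[] run(int initialI0) {
        int[] frame = new int[2];
        frame[SLOT_I0] = initialI0;
        Node tree = new Write(SLOT_2, new Add(new Read(SLOT_I0), new Literal(1)));
        tree.execute(frame);
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(run(41)[SLOT_2]); // prints 42
    }
}
```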
  35. Implementation of Basic Blocks (Rigger et al. 2016 VMIL)

    LLVM IR of @processRequests with its basic blocks 0, 1, and 2.
  36. Implementation of Basic Blocks (Rigger et al. 2016 VMIL)

    Each basic block of the LLVM IR (for example Block1) becomes an executable abstract syntax tree.
  37. Example Program (Rigger et al. 2016 VMIL)

    LLVM IR of @processRequests.
  38. Example Program (Rigger et al. 2016 VMIL)

    An AST interpreter cannot represent goto statements.
  39. Interpreter (Rigger et al. 2016 VMIL)

    A Basic Block Dispatch Node holds Block0, Block1, and Block2. Each block returns the index of its successor (edge labels: 1, 2, -1, 1). Interpreter implementation:

          int blockIndex = 0;
          while (blockIndex != -1)
              blockIndex = blocks[blockIndex].execute();
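The dispatch loop can be exercised on the processRequests example with a small stand-alone sketch. It assumes that each block is a function returning its successor index and that -1 means "return"; the IntSupplier representation, the static fields, and the packet counter are illustrative choices, not Sulong's data structures.

```java
import java.util.function.IntSupplier;

final class BlockDispatch {
    static int i;       // models the LLVM IR value %i.0
    static int packets; // counts calls to the stand-in processPacket()

    static void processPacket() {
        packets++;
    }

    // Runs processRequests() via a basic block dispatch loop.
    static int processRequests() {
        packets = 0;
        IntSupplier[] blocks = {
            // basic block 0: phi input 0, then br label %1
            () -> { i = 0; return 1; },
            // basic block 1: loop body, branch back to 1 or on to 2
            () -> { processPacket(); i++; return i < 10000 ? 1 : 2; },
            // basic block 2: ret void
            () -> -1
        };
        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = blocks[blockIndex].getAsInt();
        }
        return packets;
    }

    public static void main(String[] args) {
        System.out.println(processRequests()); // prints 10000
    }
}
```

The loop body never needs a goto: unstructured control flow is encoded entirely in the returned successor index.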
  40. Interpreter: Program execution (Rigger et al. 2016 VMIL)

    The Basic Block Dispatch Node steps through the LLVM IR of @processRequests, one basic block at a time.
  41. Interpreter: Program execution, continued (Rigger et al. 2016 VMIL)
  42. Interpreter: Program execution, continued (Rigger et al. 2016 VMIL)
  43. Truffle and Graal (Würthinger et al. 2012, 2013)

    Figure as on slide 3: uninitialized nodes, node specialization for profiling feedback, compilation using partial evaluation.
  44. Compiler (Rigger et al. 2016 VMIL)

    Partially evaluated interpreter (pseudo code):

          int blockIndex = 0;
          block0:
              blockIndex = 1
              %i.0 = 0
          block1:
              while (true):
                  processPacket()
                  %2 = %i.0 + 1
                  %3 = %2 < 10000
                  if %3:
                      blockIndex = 1
                      %i.0 = %2
                      continue;
                  else:
                      blockIndex = 2
          block2:
              blockIndex = -1
              return
  45. Compiler (Rigger et al. 2016 VMIL)

    Graal further optimizes the partially evaluated interpreter.
  46. Truffle and Graal (Würthinger et al. 2012, 2013)

    Figure as on slide 3: uninitialized nodes, node specialization for profiling feedback, compilation using partial evaluation.
  47. Speculative Optimization: Value Profiling (Rigger et al. 2016 VMIL)

          public class LLVMI32LoadNode extends LLVMExpressionNode {
              final int expectedValue; // observed value

              @Specialization
              protected int doI32(Address addr) {
                  int val = memory.getI32(addr);
                  if (val == expectedValue) {
                      return expectedValue;
                  } else {
                      CompilerDirectives.transferToInterpreter();
                      replace(new LLVMI32LoadGenericNode());
                      return val;
                  }
              }
          }
  48. Speculative Optimization: Value Profiling (Rigger et al. 2016 VMIL)

    The compiler can assume that the loaded value is constant.
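The speculate-then-rewrite pattern of the load node can be imitated without Truffle. In the sketch below, a LoadSite object plays the role of the parent that allows a node to replace itself, and a counter records failed speculations; these names, the int[] memory, and the counter are assumptions for illustration and are not part of the Truffle API.

```java
final class ValueProfilingDemo {
    interface LoadNode {
        int load(int[] memory, int addr);
    }

    // Fallback node: always reads memory, never speculates.
    static final class GenericLoad implements LoadNode {
        public int load(int[] memory, int addr) {
            return memory[addr];
        }
    }

    // Holds the current node so that it can be swapped, loosely like replace().
    static final class LoadSite {
        LoadNode node;
        int deopts; // number of failed speculations

        LoadSite(int expectedValue) {
            this.node = new SpeculatingLoad(expectedValue);
        }

        int load(int[] memory, int addr) {
            return node.load(memory, addr);
        }

        // Speculates that every load yields the previously observed value.
        final class SpeculatingLoad implements LoadNode {
            private final int expectedValue;

            SpeculatingLoad(int expectedValue) {
                this.expectedValue = expectedValue;
            }

            public int load(int[] memory, int addr) {
                int val = memory[addr];
                if (val == expectedValue) {
                    return expectedValue; // a compiler could fold this constant
                }
                deopts++;                 // stands in for transferToInterpreter()
                node = new GenericLoad(); // rewrite to the generic version
                return val;
            }
        }
    }

    public static void main(String[] args) {
        int[] memory = {7, 7, 9};
        LoadSite site = new LoadSite(7);
        System.out.println(site.load(memory, 0));
        System.out.println(site.load(memory, 2));
        System.out.println("failed speculations: " + site.deopts);
    }
}
```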
  49. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

          int inc(int val) { return val + 1; }
          int dec(int val) { return val - 1; }
          int square(int val) { return val * val; }
          int (*func)(int);
          // ...
          result = func(4);

    Call node chain: uninit call
  50. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    func points to inc. Call node chain: uninit call
  51. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    Call node chain: call inc → uninit call
  52. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    Call node chain: call inc → uninit call. Enables inlining of indirect calls.
  53. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    func points to dec. Call node chain: call inc → uninit call
  54. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    Call node chain: call inc → call dec → uninit call
  55. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    func points to square. Call node chain: call inc → call dec → uninit call
  56. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    After inc, dec, and square have been observed, the call node chain is replaced by: indirect call
  57. Polymorphic Inline Caches for Indirect Calls (Rigger et al. 2016 VMIL)

    Can be used to optimize virtual calls in C++.
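The cache transitions on these slides (uninitialized, one cached target, two cached targets, generic indirect call) can be sketched in plain Java. The cache limit of two entries is inferred from the slides, where the third target switches the call site to an indirect call; the CallSite class, the use of IntUnaryOperator for function pointers, and the state() helper are hypothetical.

```java
import java.util.function.IntUnaryOperator;

final class InlineCacheDemo {
    static int inc(int val) { return val + 1; }
    static int dec(int val) { return val - 1; }
    static int square(int val) { return val * val; }

    // One object per function so that targets can be compared by identity.
    static final IntUnaryOperator INC = InlineCacheDemo::inc;
    static final IntUnaryOperator DEC = InlineCacheDemo::dec;
    static final IntUnaryOperator SQUARE = InlineCacheDemo::square;

    static final class CallSite {
        static final int MAX_ENTRIES = 2; // assumed cache limit
        private final IntUnaryOperator[] cached = new IntUnaryOperator[MAX_ENTRIES];
        private int size;
        private boolean megamorphic;

        int call(IntUnaryOperator target, int arg) {
            if (!megamorphic) {
                for (int k = 0; k < size; k++) {
                    if (cached[k] == target) {
                        // Cache hit: the callee is known and could be inlined.
                        return cached[k].applyAsInt(arg);
                    }
                }
                if (size < MAX_ENTRIES) {
                    cached[size++] = target; // extend the chain by one entry
                } else {
                    megamorphic = true;      // give up caching
                }
            }
            return target.applyAsInt(arg);   // plain indirect call
        }

        String state() {
            return megamorphic ? "indirect call" : size + " cached";
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        System.out.println(site.call(INC, 4) + " / " + site.state());
        System.out.println(site.call(DEC, 4) + " / " + site.state());
        System.out.println(site.call(SQUARE, 4) + " / " + site.state());
    }
}
```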
  58. How to Represent Allocations (Rigger et al. 2018 ASPLOS)

    C: int *arr = malloc(sizeof(int) * 4);

    LLVM IR:

          %1 = call i8* @malloc(i64 16)
          %2 = bitcast i8* %1 to i32*
  59. How to Represent Allocations (Rigger et al. 2018 ASPLOS)

    Object diagram for the LLVM IR above: Address {offset = 0, data → UntypedAllocation {size = 16}}
  60. How to Represent Allocations (Rigger et al. 2018 ASPLOS)

    Object diagram adds: Address {offset = 0, data → I32Array {contents = {0, 0, 0, 0}}}
  61. Prevent Out-Of-Bounds Accesses (Rigger et al. 2018 ASPLOS)

    arr[5] = …
    Address {offset = 5, data → I32Array {contents = {0, 0, 0, 0}}}
  62. Prevent Out-Of-Bounds Accesses (Rigger et al. 2018 ASPLOS)

    arr[5] = …
    Address {offset = 5, data → I32Array {contents = {0, 0, 0, 0}}}
    contents[5] → ArrayIndexOutOfBoundsException
  63. Prevent Use-After-Free Errors (Rigger et al. 2018 ASPLOS)

    free(arr); arr[0] = …
    Address {offset = 0, data → I32Array {contents = {0, 0, 0, 0}}}
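The object model on these slides (an Address with an offset and a data reference to an I32Array) can be approximated as follows. This is a sketch under assumptions: the offset counts elements as on slide 61, malloc directly creates a typed I32Array, and free is modeled by setting contents to null so that later accesses fail with a NullPointerException, which matches the Java analogy from slide 11 but is not spelled out on slide 63. All method names are hypothetical.

```java
final class ManagedMemoryDemo {
    // Managed object that backs a malloc'ed array of 32-bit integers.
    static final class I32Array {
        int[] contents;
        I32Array(int length) { contents = new int[length]; }
    }

    // A pointer is a managed object plus an offset into it.
    static final class Address {
        final I32Array data;
        final int offset; // in elements, as on the slide
        Address(I32Array data, int offset) {
            this.data = data;
            this.offset = offset;
        }
        Address plus(int elements) { return new Address(data, offset + elements); }
        void store(int value) { data.contents[offset] = value; }
        int load() { return data.contents[offset]; }
    }

    static Address malloc(int elements) {
        return new Address(new I32Array(elements), 0);
    }

    // Assumed model of free(): drop the backing array.
    static void free(Address address) {
        address.data.contents = null;
    }

    // Runs an access and reports either "ok" or the exception name.
    static String attempt(Runnable access) {
        try {
            access.run();
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Address arr = malloc(4);
        System.out.println(attempt(() -> arr.plus(3).store(1)));
        System.out.println(attempt(() -> arr.plus(5).store(1)));
        free(arr);
        System.out.println(attempt(() -> arr.store(1)));
    }
}
```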
  64. Evaluation (Rigger et al. 2018 ASPLOS)

    • Effectiveness: bug-finding capabilities on open-source projects
    • Performance: warm-up and peak performance
    • Comparison to state-of-the-art approaches
  65. Evaluation: State-of-the-art Approaches (Rigger et al. 2018 ASPLOS)

    Compile-time instrumentation
    • AddressSanitizer (ASan) (Serebryany et al. 2012)
    • SoftBound+CETS (Nagarakatte et al. 2009, 2010)

    Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"
  66. Evaluation: State-of-the-art Approaches (Rigger et al. 2018 ASPLOS)

    Run-time instrumentation
    • Valgrind (Nethercote et al. 2007)
    • Dr. Memory (Bruening et al. 2011)
  67. Evaluation: Effectiveness (Rigger et al. 2018 ASPLOS)

    • Found 68 errors
    • 8 errors were not found by ASan and Valgrind
    • Valgrind detected half of the errors
  68. Evaluation: Effectiveness (Rigger et al. 2018 ASPLOS)

          int main(int argc, char** argv) {
              printf("%d %s\n", argc, argv[100]);
          }

    ASan does not instrument the main() arguments since they are allocated by libc. https://github.com/google/sanitizers/issues/762
  69. Discussion: Effectiveness (Rigger et al. 2018 ASPLOS)

    Static compilers: optimize code based on Undefined Behavior
    Bug-finding tools: find bugs assuming that violations are visible side effects
  70. Discussion: Effectiveness (Rigger et al. 2018 ASPLOS)

    Compile with Clang -O0 when using ASan or Sulong to detect bugs.
  71. Discussion: Effectiveness (Rigger et al. 2018 ASPLOS)

    Java: the test() function from slide 13 throws an ArrayIndexOutOfBoundsException.
  72. Evaluation: Warmup Performance (Rigger et al. 2018 ASPLOS)

    Figure: Meteor benchmark. Iterations per second (y-axis, 0 to 60) over elapsed seconds (x-axis, 1 to 37) for ASan (Clang O0), Sulong (Clang O0), and Valgrind.
  73. Evaluation: Warmup Performance (Rigger et al. 2018 ASPLOS)

    We are working on On-stack Replacement to reduce the warmup time.
  74. Evaluation: Peak Performance (Rigger et al. 2018 ASPLOS)

    Figure (lower is better): baseline is Clang -O0; Sulong -O0 is faster in all but one case.
  75. Evaluation: Peak Performance (Rigger et al. 2018 ASPLOS)

    Figure (lower is better): Sulong -O0 is close to Clang -O3 in some cases.
  76. Evaluation: Peak Performance (Rigger et al. 2018 ASPLOS)

    Figure (lower is better): Sulong -O0 is mostly faster than ASan -O0.
  77. C Projects Consist of More Than C Code

    Compiler builtins: if (__builtin_expect(x, 0)) foo();
    Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
    Compiler pragmas: #pragma GCC poison printf
  78. C Projects Consist of More Than C Code

    Preprocessor macros: #define getmax(a,b) ((a)>(b)?(a):(b))
  79. C Projects Consist of More Than C Code

    Attributes: void fatal() __attribute__ ((noreturn));
  80. C Projects Consist of More Than C Code

    Inline assembly and compiler builtins still exist on the LLVM IR level.
  81. Implementation in Sulong

    Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));

          public class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
              public long executeRdtsc() {
                  return System.nanoTime();
              }
          }
  82. Implementation in Sulong

    Emulate the behavior of assembly.
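The asm statement binds the result of rdtsc to two 32-bit outputs ("=a" for EAX, "=d" for EDX), while the emulating node produces one 64-bit value. A hedged sketch of how such a value could be split into the two halves is shown below; the class and helper names are hypothetical, and how Sulong actually distributes the result to the output operands is not shown on the slide.

```java
final class RdtscEmulation {
    // Emulated time stamp counter, following the slide's use of nanoTime().
    static long readTimestamp() {
        return System.nanoTime();
    }

    // Low 32 bits, which rdtsc places in EAX (the "=a" output).
    static int low(long ticks) {
        return (int) ticks;
    }

    // High 32 bits, which rdtsc places in EDX (the "=d" output).
    static int high(long ticks) {
        return (int) (ticks >>> 32);
    }

    // Reassembles the 64-bit value, as C code typically does with tickh and tickl.
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long ticks = readTimestamp();
        int tickl = low(ticks);
        int tickh = high(ticks);
        System.out.println(combine(tickh, tickl) == ticks); // prints true
    }
}
```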
  83. C Projects Consist of More Than C Code

    Compiler builtins: if (__builtin_expect(x, 0)) foo();
    Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));

    Which ones should be implemented in Sulong? How are they used?
  84. Methodology

    • Repository mining approach
    • Analyzed >1000 GitHub C projects
    • Different data sets
    • Grep for usages of inline assembly and GCC builtins
    • Inserted usages into a database and analyzed them
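The "grep for usages" step can be approximated with two regular expressions. The patterns below are an assumption about what such a search might look like, not the ones used in the study; they also count matches inside comments and string literals, which a real analysis would need to filter out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UsageScanner {
    // Assumed pattern: any identifier that starts with __builtin_.
    private static final Pattern BUILTIN = Pattern.compile("\\b__builtin_\\w+");
    // Assumed pattern: the asm keyword and its GCC alternative spellings.
    private static final Pattern INLINE_ASM = Pattern.compile("\\b(?:asm|__asm|__asm__)\\b");

    private static int count(Pattern pattern, String source) {
        Matcher matcher = pattern.matcher(source);
        int occurrences = 0;
        while (matcher.find()) {
            occurrences++;
        }
        return occurrences;
    }

    static int countBuiltins(String source) {
        return count(BUILTIN, source);
    }

    static int countInlineAsm(String source) {
        return count(INLINE_ASM, source);
    }

    public static void main(String[] args) {
        String source = "if (__builtin_expect(x, 0)) foo();\n"
                + "asm(\"rdtsc\":\"=a\"(tickl),\"=d\"(tickh));\n";
        System.out.println("builtins: " + countBuiltins(source));
        System.out.println("inline asm: " + countInlineAsm(source));
    }
}
```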
  85. Percentage of Projects (Rigger et al. 2018 VEE)

    Bar chart (% of projects): 28% for popular projects with inline assembly, 37% for (popular) projects with GCC builtins.
  86. Density (occurrence per KLOC) (Rigger et al. 2018 VEE)

    Bar chart: 50k for popular projects with inline assembly, 6k for (popular) projects with GCC builtins.
  87. Average Number per Project (Rigger et al. 2018 VEE)

    Bar chart (average number of unique builtins/inline assembly): 4 for popular projects with inline assembly, 17 for (popular) projects with GCC builtins.
  88. Use Cases of Inline Assembly (Rigger et al. 2018 VEE)

          Instructions          In % of projects
          rdtsc                 27.4%
          cpuid                 25.4%
          mov                   24.9%
          <compiler barrier>    21.8%
          lock xchg             14.2%
          …                     …
  89. Use Cases of Inline Assembly (Rigger et al. 2018 VEE)

    Functionality not available in C (CPU feature detection, clock cycles)
  90. Use Cases of Inline Assembly (Rigger et al. 2018 VEE)

    Supporting instructions (data copying, arithmetic, control flow)
  91. Use Cases of Inline Assembly (Rigger et al. 2018 VEE)

    Instruction table as on slide 88.
  92. Use Cases of Inline Assembly (Rigger et al. 2018 VEE)

    Instruction order (compiler barriers, memory barriers, atomics)
  93. Use Cases of Inline Assembly (Rigger et al. 2018 VEE)

    Performance optimization (SIMD, endianness conversion, bitscan)
  94. Use Cases of GCC Builtins

          Builtins                In % of projects
          __builtin_expect        48.2%
          __builtin_clz           29.3%
          __builtin_bswap32       26.2%
          __builtin_constant_p    23.3%
          __builtin_alloca        20.3%
          …                       …

    Similar use cases as for inline assembly
  95. Use Cases of GCC Builtins

    But also for compiler interaction and metaprogramming
  96. Do Projects Use the Same Subset? (Rigger et al. 2018 VEE)

    • How many projects can be supported by implementing 5% of x86-64's ~1000 instructions?
    • At least 64% of projects (we did not analyze some large projects)

    Figure: % of supported projects (y-axis, 0 to 90) over the number of implemented instructions (x-axis ticks: 2, 4, 13, 22, 28, 31, 32, 36, 46, 47, 49, 50); annotated value: 77.9%.
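The question on this slide amounts to a coverage computation: a project counts as supported when every instruction it uses is implemented. A minimal sketch of that computation follows; the project data in main is invented for illustration and does not come from the study, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

final class InstructionCoverage {
    // Percentage of projects whose used instructions are all implemented.
    static double supportedPercent(List<Set<String>> projects, Set<String> implemented) {
        if (projects.isEmpty()) {
            return 0.0;
        }
        long supported = projects.stream()
                .filter(implemented::containsAll)
                .count();
        return 100.0 * supported / projects.size();
    }

    public static void main(String[] args) {
        // Made-up example data: the instructions used by four projects.
        List<Set<String>> projects = List.of(
                Set.of("rdtsc"),
                Set.of("rdtsc", "cpuid"),
                Set.of("mov", "lock xchg"),
                Set.of("cpuid"));
        Set<String> implemented = Set.of("rdtsc", "cpuid");
        System.out.println(supportedPercent(projects, implemented) + "% of projects supported");
    }
}
```

Repeating the computation while adding instructions in order of popularity would yield a curve like the one in the figure.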
  97. Conclusion

    • Sulong and GraalVM
    • Sulong's LLVM IR Interpreter
    • Detection of Errors
    • Dynamic Optimizations
    • Usage of Unstandardized C Elements

    @RiggerManuel
  98. Bibliography

    • Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. 2012. Self-optimizing AST interpreters. In Proceedings of the 8th symposium on Dynamic languages (DLS '12). ACM, New York, NY, USA, 73-82. DOI: http://dx.doi.org/10.1145/2384577.2384587
    • Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario Wolczko. 2013. One VM to rule them all. In Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on programming & software (Onward! 2013). ACM, New York, NY, USA, 187-204. DOI: http://dx.doi.org/10.1145/2509578.2509581
    • Manuel Rigger, Matthias Grimmer, and Hanspeter Mössenböck. 2016. Sulong - execution of LLVM-based languages on the JVM: position paper. In Proceedings of the 11th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS '16). ACM, New York, NY, USA, Article 7, 4 pages. DOI: https://doi.org/10.1145/3012408.3012416
    • Manuel Rigger, Matthias Grimmer, Christian Wimmer, Thomas Würthinger, and Hanspeter Mössenböck. 2016. Bringing low-level languages to the JVM: efficient execution of LLVM IR on Truffle. In Proceedings of the 8th International Workshop on Virtual Machines and Intermediate Languages (VMIL 2016). ACM, New York, NY, USA, 6-15. DOI: https://doi.org/10.1145/2998415.2998416
    • Manuel Rigger, Roland Schatz, René Mayrhofer, Matthias Grimmer, and Hanspeter Mössenböck. 2018. Sulong, and Thanks for All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18). ACM, New York, NY, USA, 377-391. DOI: https://doi.org/10.1145/3173162.3173174