Cambridge'18

Executing C, C++ and Fortran Efficiently on the Java Virtual Machine via LLVM IR

Manuel Rigger

March 02, 2018

Transcript

  1. 1.

    Executing C, C++ and Fortran Efficiently on the Java Virtual Machine via LLVM IR

    Manuel Rigger, Johannes Kepler University Linz, Austria. Computer Laboratory Programming Research Group Seminar, University of Cambridge, 2 March 2018
  2. 2.

    What is Sulong?

    Execute low-level/unsafe languages (C, C++, Fortran, ...) on the Java Virtual Machine (JVM)
  3. 3.

    Why?
    • Unchecked accesses: buffer overflows are still a serious problem
    • Manual memory management: use-after-free errors, double-free errors, …
    • Undefined behavior: “A sufficiently advanced compiler is indistinguishable from an adversary.” – John Regehr (https://blog.regehr.org)
    • Many existing safer alternatives are based on “unsafe” compilers or binary code: LLVM’s ASan, Valgrind, SoftBound, …
  9. 12.

    Why the Java Virtual Machine?
    • Sandboxed execution
    • Garbage collection
    • Existing JIT compiler
    • Safe implementation language
    • Part of the multi-lingual GraalVM
  10. 13.

    Sulong as Part of GraalVM

    Figure labels: Truffle Framework, Graal Compiler, JVM Compiler Interface (JVMCI, JEP 243), Java HotSpot VM, Substrate VM. http://www.oracle.com/technetwork/oracle-labs/program-languages
  13. 18.

    Truffle and Graal Contributors

    Oracle: Danilo Ansaloni, Stefan Anzinger, Cosmin Basca, Daniele Bonetta, Matthias Brantner, Petr Chalupa, Jürgen Christ, Laurent Daynès, Gilles Duboscq, Martin Entlicher, Bastian Hossbach, Christian Humer, Mick Jordan, Vojin Jovanovic, Peter Kessler, David Leopoldseder, Kevin Menard, Jakub Podlešák, Aleksandar Prokopec, Tom Rodriguez, Roland Schatz, Chris Seaton, Doug Simon, Štěpán Šindelář, Zbyněk Šlajchrt, Lukas Stadler, Codrut Stancu, Jan Štola, Jaroslav Tulach, Michael Van De Vanter, Adam Welc, Christian Wimmer, Christian Wirth, Paul Wögerer, Mario Wolczko, Andreas Wöß, Thomas Würthinger

    JKU Linz: Prof. Hanspeter Mössenböck, Benoit Daloze, Josef Eisl, Thomas Feichtinger, Matthias Grimmer, Christian Häubl, Josef Haider, Christian Huber, Stefan Marr, Manuel Rigger, Stefan Rumzucker, Bernhard Urban, Thomas Pointhuber, Daniel Pekarek, Jacob Kreindl, Mario Kahlhofer

    University of Edinburgh: Christophe Dubach, Juan José Fumero Alfonso, Ranjeet Singh, Toomas Remmelg

    LaBRI: Floréal Morandat

    University of California, Irvine: Prof. Michael Franz, Gulfem Savrun Yeniceri, Wei Zhang

    Purdue University: Prof. Jan Vitek, Tomas Kalibera, Petr Maj, Lei Zhao

    T. U. Dortmund: Prof. Peter Marwedel, Helena Kotthaus, Ingo Korb

    University of California, Davis: Prof. Duncan Temple Lang, Nicholas Ulle

    University of Lugano, Switzerland: Prof. Walter Binder, Sun Haiyang, Yudi Zheng

    Oracle Interns: Brian Belleville, Miguel Garcia, Shams Imam, Alexey Karyakin, Stephen Kell, Andreas Kunft, Volker Lanting, Gero Leinemann, Julian Lettner, David Piorkowski, Gregor Richards, Robert Seilbeck, Rifat Shariyar

    Oracle Alumni: Erik Eckstein, Michael Haupt, Christos Kotselidis, Hyunjin Lee, David Leibs, Chris Thalinger, Till Westmann
  14. 19.

    Structure of the Talk
    • Execution and compilation of LLVM IR (Sulong)
    • Memory safety (Safe Sulong) and performance evaluation
    • Introspection to increase the robustness of libraries
    • Challenges of executing C on the Java Virtual Machine
  15. 21.

    System Overview

    Figure: C, C++, Fortran, ... → Clang, GCC, other LLVM frontend → LLVM IR → LLVM tools → LLVM IR → LLVM IR Interpreter (Truffle, Graal compiler) on the JVM.

    Manuel Rigger, et al. Bringing low-level languages to the JVM: efficient execution of LLVM IR on Truffle. In Proceedings of VMIL 2016
  20. 26.

    Example Program

    C source:

        void processRequests() {
            int i = 0;
            do {
                processPacket();
                i++;
            } while (i < 10000);
        }

    LLVM IR (produced by Clang):

        define void @processRequests() #0 {
        ; (basic block 0)
          br label %1
        ; <label>:1 (basic block 1)
          %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
          call void @processPacket()
          %2 = add nsw i32 %i.0, 1
          %3 = icmp slt i32 %2, 10000
          br i1 %3, label %1, label %4
        ; <label>:4 (basic block 2)
          ret void
        }
  21. 27.

    Executing LLVM IR with Sulong

    Figure: LLVM IR program → create executable interpreter nodes → interpret the program → compile often executed function → execute the compiled code → deoptimize (back to interpreting the program).
  22. 28.

    Implementation of Operations

    LLVM IR instructions of @processRequests are turned into an executable abstract syntax tree. For example, %2 = add nsw i32 %i.0, 1 becomes:

        write %2
          add
            read %i.0
            1
  23. 29.

    Implementation of Operations: executable AST node for the literal in write %2 (add (read %i.0) 1)

    Nodes return their result in an execute() method.

        class LLVMI32LiteralNode extends LLVMExpressionNode {
            final int literal;

            public LLVMI32LiteralNode(int literal) {
                this.literal = literal;
            }

            @Override
            public int executeI32(VirtualFrame frame) {
                return literal;
            }
        }
  24. 30.

    Implementation of Operations: executable AST node for add in write %2 (add (read %i.0) 1)

    A DSL allows a declarative style of specifying and executing nodes.

        @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
        class LLVMI32AddNode extends LLVMExpressionNode {
            @Specialization
            protected int executeI32(int left, int right) {
                return left + right;
            }
        }
  25. 31.

    Implementation of Operations: executable AST node for write in write %2 (add (read %i.0) 1)

    Local variables are represented by an array-like VirtualFrame object.

        @NodeChild("valueNode")
        class LLVMWriteI32Node extends LLVMExpressionNode {
            final FrameSlot slot;

            public LLVMWriteI32Node(FrameSlot slot) {
                this.slot = slot;
            }

            @Specialization
            public void writeI32(VirtualFrame frame, int value) {
                frame.setInt(slot, value);
            }
        }
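
    The literal, add, and write nodes shown in the slides depend on the Truffle API and its DSL. The following is a minimal, Truffle-free sketch of the same tree, write %2 (add (read %i.0) 1). It assumes that a plain int[] stands in for VirtualFrame; the names ExprNode and AstSketch and the slot indices 0 and 1 are hypothetical and not taken from Sulong.

```java
// Plain-Java sketch of an executable AST; it does not use the Truffle API.
// The int[] frame and the slot indices are assumptions for illustration.
interface ExprNode {
    int execute(int[] frame);
}

final class AstSketch {
    // Leaf node that returns a constant.
    static ExprNode literal(int value) {
        return frame -> value;
    }

    // Leaf node that reads a local variable from the frame.
    static ExprNode read(int slot) {
        return frame -> frame[slot];
    }

    // Inner node that executes both children and adds their results.
    static ExprNode add(ExprNode left, ExprNode right) {
        return frame -> left.execute(frame) + right.execute(frame);
    }

    // Inner node that stores its child's result in the frame and returns it.
    static ExprNode write(int slot, ExprNode value) {
        return frame -> {
            int v = value.execute(frame);
            frame[slot] = v;
            return v;
        };
    }

    // Builds and runs: write %2 (add (read %i.0) 1)
    static int runIncrement(int initial) {
        final int slotI0 = 0; // hypothetical slot for %i.0
        final int slot2 = 1;  // hypothetical slot for %2
        int[] frame = new int[2];
        frame[slotI0] = initial;
        ExprNode tree = write(slot2, add(read(slotI0), literal(1)));
        tree.execute(frame);
        return frame[slot2];
    }

    public static void main(String[] args) {
        System.out.println(runIncrement(41)); // prints 42
    }
}
```

    Executing the root node walks the tree bottom-up, which is the behavior that partial evaluation later inlines into straight-line code.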
  26. 32.

    Example Program

    The LLVM IR of @processRequests (basic block 0, basic block 1 with its phi and conditional branch, basic block 2) contains unstructured control flow.
  27. 33.

    Interpreter: Basic Block Dispatch Node

    An AST interpreter cannot represent goto statements. The dispatch node holds Block0, Block1, and Block2; each block returns the index of its successor (Block0 → 1, Block1 → 1 or 2, Block2 → -1).

    Interpreter implementation:

        int blockIndex = 0;
        while (blockIndex != -1)
            blockIndex = blocks[blockIndex].execute();
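
    The dispatch loop can be tried out without Truffle. The sketch below models the three basic blocks of @processRequests as objects that return their successor index, following the numbering on the slide. The names BasicBlock and DispatchSketch are hypothetical, and processPacket() is replaced by a counter so that the result can be observed.

```java
// Sketch of basic-block dispatch; not Sulong's actual classes.
interface BasicBlock {
    int execute(); // returns the index of the successor block, or -1 to return
}

final class DispatchSketch {
    private int i;       // models the loop counter (%i.0 / %2)
    private int packets; // number of processPacket() calls (stand-in for real work)

    private void processPacket() {
        packets++;
    }

    int processRequests() {
        BasicBlock[] blocks = {
            () -> 1,                      // block 0: br label %1
            () -> {                       // block 1: loop body and conditional branch
                processPacket();
                i++;
                return i < 10000 ? 1 : 2;
            },
            () -> -1                      // block 2: ret void
        };
        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = blocks[blockIndex].execute();
        }
        return packets;
    }

    public static void main(String[] args) {
        System.out.println(new DispatchSketch().processRequests()); // prints 10000
    }
}
```

    The loop needs no goto: arbitrary control flow between blocks is encoded in the returned index.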
  28. 34.

    Interpreter: Program execution

    Animation over three slides: the basic block dispatch node (Block0 → 1, Block1 → 1 or 2, Block2 → -1) steps through the basic blocks of @processRequests.
  32. 38.

    Partial evaluation
    • Assume that nodes are constant
    • Assumption allows inlining of the execute() methods
  33. 39.

    Compiler: Unrolling of the interpreter loop

    Basic block dispatch node: Block0 → 1, Block1 → 1 or 2, Block2 → -1. Partial evaluation starts from the interpreter loop:

        int blockIndex = 0;
        while (blockIndex != -1)
            blockIndex = blocks[blockIndex].execute();

    1st iteration: blocks[0].execute() is expanded. Block 0 is only "br label %1", so its result is the constant 1:

        int blockIndex = 0;
        block0:
            blockIndex = 1
        while (blockIndex != -1)
            blockIndex = blocks[blockIndex].execute();

    2nd iteration: blocks[1].execute() is expanded. Nodes in predecessor blocks assign values used in phis:

        int blockIndex = 0;
        block0:
            blockIndex = 1
            %i.0 = 0
        block1:
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                while (blockIndex != -1)
                    blockIndex = blocks[blockIndex].execute();
            else:
                // …

    3rd iteration: the successor of the true branch is block 1, which is already expanded, so the path is merged (goto block1) and becomes a loop. Merging already expanded paths makes the compilation work! The else branch continues with block 2 (ret void), which yields -1:

        int blockIndex = 0;
        block0:
            blockIndex = 1
            %i.0 = 0
        block1:
            while (true):
                processPacket()
                %2 = %i.0 + 1
                %3 = %2 < 10000
                if %3:
                    blockIndex = 1
                    %i.0 = %2
                    continue;
                else:
                    blockIndex = 2
        block2:
            blockIndex = -1
            return

    Graal further optimizes the partially evaluated interpreter.
  57. 63.

    Deoptimization
    • Truffle nodes can implement speculative assumptions
    • A failed assumption requires discarding the machine code and continuing execution in the interpreter
  58. 64.

    Node Rewriting in Truffle

    Figure: AST interpreter with uninitialized nodes (U) → node specialization for profiling feedback → AST interpreter with specialized nodes (I, G) → compilation using partial evaluation → compiled code.

    Node transitions legend: U = Uninitialized, I = Integer, D = Double, S = String, G = Generic
  59. 65.

    Node Rewriting in Truffle

    Figure: compiled code with specialized nodes (I, G) → transfer back to AST interpreter → node specialization to update profiling feedback (some I nodes become D) → recompilation using partial evaluation.
  60. 66.

    Speculative Optimization: Value Profiling

    The compiler can assume that the loaded value is constant.

        public class LLVMI32LoadNode extends LLVMExpressionNode {
            final int expectedValue; // observed value

            @Specialization
            protected int doI32(Address addr) {
                int val = memory.getI32(addr);
                if (val == expectedValue) {
                    return expectedValue;
                } else {
                    CompilerDirectives.transferToInterpreter();
                    replace(new LLVMI32LoadGenericNode());
                    return val;
                }
            }
        }
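
    The value-profiling node relies on Truffle's node replacement and on transferToInterpreter(). As a rough, Truffle-free sketch of the same idea, the class below speculates that a load always yields the first observed value and switches to a generic path once the speculation fails. The class name, the int[] memory, and the deopts counter (a stand-in for the deoptimization) are assumptions for illustration.

```java
// Sketch of value profiling without Truffle. A counter stands in for
// CompilerDirectives.transferToInterpreter() and the node replacement.
final class ValueProfileSketch {
    private boolean initialized; // true once a value has been observed
    private boolean generic;     // true once the speculation has failed
    private int expectedValue;   // the observed value that is speculated on
    int deopts;                  // how often the speculation was invalidated

    int load(int[] memory, int index) {
        int val = memory[index];
        if (generic) {
            return val;           // generic path: no speculation any more
        }
        if (!initialized) {
            initialized = true;   // first execution: record the observed value
            expectedValue = val;
            return val;
        }
        if (val == expectedValue) {
            return expectedValue; // a compiler could treat this as a constant
        }
        deopts++;                 // speculation failed: fall back permanently
        generic = true;
        return val;
    }

    public static void main(String[] args) {
        ValueProfileSketch profile = new ValueProfileSketch();
        int[] memory = {7, 7, 9};
        System.out.println(profile.load(memory, 0)); // prints 7
        System.out.println(profile.load(memory, 2)); // prints 9
        System.out.println(profile.deopts);          // prints 1
    }
}
```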
  61. 67.

    Polymorphic Inline Caches for Indirect Calls

        int inc(int val) { return val + 1; }
        int dec(int val) { return val - 1; }
        int square(int val) { return val * val; }
        int (*func)(int);
        // ...
        result = func(4);

    Animation over four slides:
    • The call node starts as "uninit call"; after observing inc it becomes "call inc" followed by "uninit call"
    • After observing dec: "call inc", "call dec", "uninit call". Enables inlining of indirect calls
    • After observing inc, dec, and square, the node becomes a generic "indirect call"
    • Can be used to optimize virtual calls in C++
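
    A polymorphic inline cache for func(4) can be sketched in plain Java as follows. The cache limit of two entries mirrors the slide sequence (inc, dec, then a generic indirect call for square) but is an assumption, as are the class name and the counters, which exist only to make the behavior observable.

```java
// Sketch of a polymorphic inline cache for indirect calls; not Sulong's code.
final class InlineCacheSketch {
    interface IntFn {
        int apply(int val);
    }

    static final IntFn INC = val -> val + 1;
    static final IntFn DEC = val -> val - 1;
    static final IntFn SQUARE = val -> val * val;

    static final int MAX_ENTRIES = 2; // hypothetical cache limit

    private final IntFn[] cachedTargets = new IntFn[MAX_ENTRIES];
    private int size;
    int cacheHits;    // calls that matched a cached target
    int genericCalls; // calls that took the generic indirect-call path

    int call(IntFn target, int arg) {
        // Compare against known targets; a compiler can inline each cached call.
        for (int k = 0; k < size; k++) {
            if (cachedTargets[k] == target) {
                cacheHits++;
                return cachedTargets[k].apply(arg);
            }
        }
        if (size < MAX_ENTRIES) {
            cachedTargets[size++] = target; // "uninit call" becomes "call <target>"
            return target.apply(arg);
        }
        genericCalls++; // too many targets: plain indirect call
        return target.apply(arg);
    }

    public static void main(String[] args) {
        InlineCacheSketch cache = new InlineCacheSketch();
        System.out.println(cache.call(INC, 4));    // prints 5
        System.out.println(cache.call(DEC, 4));    // prints 3
        System.out.println(cache.call(SQUARE, 4)); // prints 16
    }
}
```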
  65. 72.

    Handling of Allocations in the User Program

        int *arr = malloc(4 * sizeof(int))

    • Native Sulong: unmanaged allocations (sun.misc.Unsafe), for example unsafe.allocateMemory(16); https://github.com/graalvm/sulong
    • Safe Sulong: managed allocations, an Address (offset=0, data) that refers to an I32Array (contents {0, 0, 0})

    Rigger, et al. Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model. In Proceedings of ASPLOS 2018
  68. 75.

    Allocations in the User Program

    Unmanaged allocations
    + Interoperability with native libraries
    + Fallback for programs that make assumptions about the memory layout
    - No safety guarantees

    Managed allocations
    + Sandboxed execution
    - Native interoperability
  69. 76.

    Type Hierarchy for Managed Objects

    Automatic bounds, type, and null pointer checks!

    ManagedObject
      ManagedAddress (pointee: ManagedObject, pointerOffset: int)
      I32Array (values: int[])
      Function (functionIndex: int)
      I32 (value: int)
      Struct (values: Dictionary)
  70. 77.

    Prevent Out-Of-Bounds Accesses

        int *arr = malloc(3 * sizeof(int))
        arr[5] = …

    The access goes through a ManagedAddress (offset=20, data) that refers to an I32Array (contents {1, 2, 3}): contents[20 / 4] → ArrayIndexOutOfBoundsException
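
    How the bounds check falls out of Java's array semantics can be sketched as follows. The class is a simplified, hypothetical stand-in for a ManagedAddress pointing at an I32Array. It handles only 4-byte int accesses, and it uses Math.floorDiv so that negative offsets also map to an invalid index.

```java
// Sketch of a managed pointer into an int array; not Safe Sulong's classes.
final class ManagedAddressSketch {
    private final int[] contents; // models I32Array.contents
    private final int offset;     // byte offset of the pointer into the array

    ManagedAddressSketch(int[] contents, int offset) {
        this.contents = contents;
        this.offset = offset;
    }

    // Pointer arithmetic only changes the offset; no check is needed here.
    ManagedAddressSketch plus(int bytes) {
        return new ManagedAddressSketch(contents, offset + bytes);
    }

    // The JVM's array bounds check rejects invalid accesses on dereference.
    int loadI32() {
        return contents[Math.floorDiv(offset, Integer.BYTES)];
    }

    void storeI32(int value) {
        contents[Math.floorDiv(offset, Integer.BYTES)] = value;
    }

    public static void main(String[] args) {
        ManagedAddressSketch arr = new ManagedAddressSketch(new int[] {1, 2, 3}, 0);
        System.out.println(arr.plus(2 * Integer.BYTES).loadI32()); // prints 3
        try {
            arr.plus(5 * Integer.BYTES).storeI32(42); // models arr[5] = ...
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds store rejected");
        }
    }
}
```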
  71. 79.

    Safe Semantics
    • We assign semantics to otherwise undefined behavior → Java semantics
    • Invalid memory accesses are not optimized away

        int a = 1, b = INT_MAX;
        int val = a + b; // UB in C
        printf("%d\n", val);

    Rigger, et al. Lenient Execution of C on a Java Virtual Machine: or: How I Learned to Stop Worrying and Run the Code. In Proceedings of ManLang 2017
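
    For the signed-overflow example, "Java semantics" means two's-complement wraparound, which the Java language defines for int addition. A small illustration (the class and method names are hypothetical):

```java
// Java defines int overflow as two's-complement wraparound.
final class OverflowSketch {
    static int add(int a, int b) {
        return a + b; // wraps on overflow instead of being undefined
    }

    public static void main(String[] args) {
        int a = 1;
        int b = Integer.MAX_VALUE; // corresponds to INT_MAX in the C example
        System.out.println(add(a, b)); // prints -2147483648
    }
}
```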
  72. 80.

    Found Errors
    • 68 errors in small open-source projects
    • Some of these are not found by LLVM’s AddressSanitizer and Valgrind

    Out-of-bounds accesses to argv:

        int main(int argc, char** argv) {
            printf("%d %s\n", argc, argv[5]);
        }
  73. 81.

    Performance During Warmup (higher is better)

    Plot: iterations per second (0 to 60) over time in seconds (1 to 37) for the Meteor benchmark, comparing ASan (Clang O0), Safe Sulong, and Valgrind.

    We are working on On-stack Replacement to reduce the warmup time
  74. 84.

    Introspection Functions

        int *arr = malloc(sizeof(int) * 10);
        int *ptr = &(arr[4]);
        printf("%ld\n", size_left(ptr));  // prints 16
        printf("%ld\n", size_right(ptr)); // prints 24

    Figure: within the allocation of sizeof(int) * 10 bytes, size_left covers the bytes to the left of ptr and size_right the bytes to its right.

    We also expose other meta data such as object types

    Rigger, et al. Introspection for C and its Applications to Library Robustness. In Programming 2018
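
    With managed allocations, both introspection values follow directly from the array length and the pointer offset. The sketch below reproduces the numbers from the example (16 and 24 for a pointer to element 4 of a 10-element int array). The class is a hypothetical simplification, not Safe Sulong's implementation.

```java
// Sketch: size_left and size_right derived from a managed pointer.
final class IntrospectionSketch {
    private final int[] data; // the managed allocation
    private final int offset; // byte offset of the pointer into the allocation

    IntrospectionSketch(int[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    // Bytes between the start of the allocation and the pointer.
    long sizeLeft() {
        return offset;
    }

    // Bytes between the pointer and the end of the allocation.
    long sizeRight() {
        return (long) data.length * Integer.BYTES - offset;
    }

    public static void main(String[] args) {
        int[] arr = new int[10];                                      // malloc(sizeof(int) * 10)
        IntrospectionSketch ptr = new IntrospectionSketch(arr, 4 * Integer.BYTES); // &(arr[4])
        System.out.println(ptr.sizeLeft());  // prints 16
        System.out.println(ptr.sizeRight()); // prints 24
    }
}
```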
  75. 85.

    Usage of Introspection
    • Improve availability of the system
    • Fix incomplete APIs
    • Improve bug-finding capabilities
  76. 86.

    Improve availability of the system

    Make libc robust against missing NUL terminators:

        size_t strlen(const char *str) {
            size_t len = 0;
            while (size_right(str) > 0 && *str != '\0') {
                len++;
                str++;
            }
            return len;
        }
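
    The effect of the extra size_right check can be reproduced in plain Java, where the remaining buffer size is buf.length - pos. This is only an analogy under that assumption; the class name is hypothetical and start is assumed to be non-negative.

```java
// Java analogue of the robust strlen: stop at NUL or at the end of the buffer.
final class RobustStrlenSketch {
    static int strlen(byte[] buf, int start) {
        int len = 0;
        int pos = start;
        // buf.length - pos plays the role of size_right(str).
        while (buf.length - pos > 0 && buf[pos] != 0) {
            len++;
            pos++;
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] terminated = {'a', 'b', 0, 'c'};
        byte[] unterminated = {'a', 'b', 'c'}; // missing NUL terminator
        System.out.println(strlen(terminated, 0));   // prints 2
        System.out.println(strlen(unterminated, 0)); // prints 3
    }
}
```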
  77. 87.

    Improve availability of the system
    • Case study on real-world bugs (Dnsmasq, Libxml2, GraphicsMagick)
    • Insight: most applications stay fully functional when the buffer overflow is mitigated
    • Drawback: Sulong still aborts execution for missing introspection checks.
  78. 88.

    Fix incomplete APIs

    Make gets() robust against input that would overflow the buffer:

        char* gets(char *str) {
            int size = size_right(str);
            return gets_s(str, size == -1 ? 0 : size);
        }
  79. 89.

    Improve bug-finding capabilities

    Find "lurking" bugs:

        char* gets_s(char *str, rsize_t n) {
            if (size_right(str) < n) {
                abort();
            } else {
                // original code
            }
        }
  80. 90.

    Introspection is applicable for many other bug-finding tools
    • We also implemented it in
      • GCC’s Intel MPX based bounds checks instrumentation
      • LLVM’s ASan
      • SoftBound

        ssize_t _size_right(void* p) {
            ssize_t upper_bounds = (ssize_t)__builtin___bnd_get_ptr_ubound(p);
            size_t size = (size_t) (upper_bounds + 1) - (size_t) p;
            return (ssize_t) size;
        }
  81. 92.

    C Projects Consist of More Than C Code

    Inline assembly in the C program:

        asm("rdtsc":"=a"(tickl),"=d"(tickh));

    Node that implements it in Sulong:

        public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
            public long executeRdtsc() {
                return System.nanoTime();
            }
        }
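
    The rdtsc instruction delivers a 64-bit counter split across two 32-bit registers, which the C code receives as tickl and tickh. How a 64-bit value such as the result of System.nanoTime() maps to these two halves can be sketched as follows. The class and method names are hypothetical, and the sketch does not show how Sulong itself wires the result to the asm operands.

```java
// Sketch: splitting a 64-bit tick value into the low and high 32-bit halves.
final class RdtscSketch {
    static int low(long ticks) {
        return (int) ticks;          // lower 32 bits (tickl)
    }

    static int high(long ticks) {
        return (int) (ticks >>> 32); // upper 32 bits (tickh)
    }

    // Recombines both halves; the mask avoids sign extension of the low half.
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long ticks = System.nanoTime(); // stand-in for the time stamp counter
        int tickl = low(ticks);
        int tickh = high(ticks);
        System.out.println(combine(tickh, tickl) == ticks); // prints true
    }
}
```

    Note that System.nanoTime() measures elapsed nanoseconds rather than CPU cycles, so programs only see a monotonic counter with similar usage patterns.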
  82. 93.

    C Projects Consist of More Than C Code

    Instructions         In % of projects
    rdtsc                27.4%
    cpuid                25.4%
    mov                  24.9%
    <compiler barrier>   21.8%
    lock xchg            14.2%
    …                    …

    We determined the usage of inline assembly to prioritize the implementation in Sulong

    Rigger, et al. An Analysis of x86-64 Inline Assembly in C Programs. In VEE 2018
  83. 94.

    C Projects Consist of More Than C Code

    Compiler builtin in the C program:

        __builtin_clz(num);

    Node that implements it in Sulong:

        public abstract static class CountLeadingZeroesI64Node extends LLVMExpressionNode {
            public long executeRdtsc(long val) {
                return Long.numberOfLeadingZeros(val);
            }
        }
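
    The Java standard library provides the count-leading-zeros operation for both widths, so the mapping of the builtin can be checked directly. One difference worth noting: GCC documents the result of __builtin_clz(0) as undefined, whereas Java returns the bit width. The class name below is hypothetical.

```java
// Count leading zeros with the Java standard library.
final class ClzSketch {
    static int clz32(int value) {
        return Integer.numberOfLeadingZeros(value); // 32-bit operand
    }

    static int clz64(long value) {
        return Long.numberOfLeadingZeros(value);    // 64-bit operand
    }

    public static void main(String[] args) {
        System.out.println(clz32(1));  // prints 31
        System.out.println(clz32(16)); // prints 27
        System.out.println(clz64(1L)); // prints 63
        System.out.println(clz32(0));  // prints 32
    }
}
```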
  84. 95.

    GCC builtins

    We are currently investigating the usage of GCC builtins

    Builtins               In % of projects
    __builtin_expect       48.2%
    __builtin_clz          29.3%
    __builtin_bswap32      26.2%
    __builtin_constant_p   23.3%
    __builtin_alloca       20.3%
    …                      …
  85. 96.

    Native Interoperability

    Figure: program.c calls process(object) in lib.so.
    • Native Sulong: object is a native allocation
    • Safe Sulong: object is a Java object
  86. 97.

    Native Interoperability

    Hybrid Sulong version
    • Native interoperability where needed
    • Memory safety where possible

    Figure labels: x86, Truffle Interpreter
  87. 98.

    Running a complete libc

    Emulate the Linux syscall API:

        public class LLVMAMD64SyscallGetcwdNode {
            @Specialization
            protected long doOp(LLVMAddress buf, long size) {
                String cwd = LLVMPath.getcwd();
                if (cwd.length() >= size) {
                    return -LLVMAMD64Error.ERANGE;
                } else {
                    LLVMString.strcpy(buf, cwd);
                    return cwd.length() + 1;
                }
            }
        }
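
    The same contract (copy the path plus a NUL terminator, or return a negated ERANGE if the buffer is too small) can be sketched without Sulong's helper classes. In this sketch a byte[] stands in for the native buffer, the current directory is passed in as a parameter, and ERANGE uses the Linux errno value 34. The class and method names are hypothetical.

```java
// Sketch of emulating the getcwd syscall contract on a byte buffer.
final class GetcwdSketch {
    static final int ERANGE = 34; // Linux errno value for "result too large"

    static long getcwd(byte[] buf, String cwd) {
        byte[] bytes = cwd.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        if (bytes.length >= buf.length) {
            return -ERANGE;            // no room for the path and its terminator
        }
        System.arraycopy(bytes, 0, buf, 0, bytes.length);
        buf[bytes.length] = 0;         // NUL terminator expected by C callers
        return bytes.length + 1;       // number of bytes written, including NUL
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        System.out.println(getcwd(buf, "/tmp"));         // prints 5
        System.out.println(getcwd(new byte[4], "/tmp")); // prints -34
    }
}
```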
  88. 100.

    Summary
    • Sulong executes LLVM IR on the JVM
    • Speculative optimizations
    • Native Sulong: allocates unmanaged memory
    • Safe Sulong: allocates Java objects to provide memory safety
    • Introspection exposes metadata to library writers
    • Sulong partially supports inline assembly and compiler builtins

    Figure (System Overview, repeated): C, C++, Fortran, ... → Clang, GCC, other LLVM frontend → LLVM IR → LLVM tools → LLVM IR → LLVM IR Interpreter (Truffle, Graal compiler) on the JVM.