Slide 1

Slide 1 text

Executing C, C++ and Fortran Efficiently on the Java Virtual Machine via LLVM IR
Manuel Rigger
Johannes Kepler University Linz, Austria
Computer Laboratory Programming Research Group Seminar, University of Cambridge, 2 March 2018

Slide 2

Slide 2 text

What is Sulong?
Execute low-level/unsafe languages (C, C++, Fortran, ...) on the Java Virtual Machine (JVM)

Slide 3

Slide 3 text

Why?
• Unchecked accesses
• Manual memory management
• Undefined behavior
• Many existing safer alternatives are based on “unsafe” compilers or binary code

Slide 4

Slide 4 text

Why?
• Unchecked accesses: buffer overflows are still a serious problem

Slide 5

Slide 5 text

Why?
• Manual memory management: use-after-free errors, double-free errors, …

Slide 6

Slide 6 text

Why?
• Undefined behavior: “A sufficiently advanced compiler is indistinguishable from an adversary.” – John Regehr (https://blog.regehr.org)

Slide 7

Slide 7 text

Why?
• Many existing safer alternatives are based on “unsafe” compilers or binary code: LLVM’s ASan, Valgrind, SoftBound, …

Slide 12

Slide 12 text

Why the Java Virtual Machine?
• Sandboxed execution
• Garbage collection
• Existing JIT compiler
• Safe implementation language
• Part of the multi-lingual GraalVM

Slide 13

Slide 13 text

Sulong as Part of GraalVM
[Diagram components: Truffle Framework, Graal Compiler, JVM Compiler Interface (JVMCI, JEP 243), Java HotSpot VM, Substrate VM]
http://www.oracle.com/technetwork/oracle-labs/program-languages

Slide 15

Slide 15 text

The call stack contains both Ruby and C function stack frames

Slide 17

Slide 17 text

Truffle interop allows calling functions of other languages and accessing their data

Slide 18

Slide 18 text

Truffle and Graal Contributors
Oracle: Danilo Ansaloni, Stefan Anzinger, Cosmin Basca, Daniele Bonetta, Matthias Brantner, Petr Chalupa, Jürgen Christ, Laurent Daynès, Gilles Duboscq, Martin Entlicher, Bastian Hossbach, Christian Humer, Mick Jordan, Vojin Jovanovic, Peter Kessler, David Leopoldseder, Kevin Menard, Jakub Podlešák, Aleksandar Prokopec, Tom Rodriguez, Roland Schatz, Chris Seaton, Doug Simon, Štěpán Šindelář, Zbyněk Šlajchrt, Lukas Stadler, Codrut Stancu, Jan Štola, Jaroslav Tulach, Michael Van De Vanter, Adam Welc, Christian Wimmer, Christian Wirth, Paul Wögerer, Mario Wolczko, Andreas Wöß, Thomas Würthinger
JKU Linz: Prof. Hanspeter Mössenböck, Benoit Daloze, Josef Eisl, Thomas Feichtinger, Matthias Grimmer, Christian Häubl, Josef Haider, Christian Huber, Stefan Marr, Manuel Rigger, Stefan Rumzucker, Bernhard Urban, Thomas Pointhuber, Daniel Pekarek, Jacob Kreindl, Mario Kahlhofer
University of Edinburgh: Christophe Dubach, Juan José Fumero Alfonso, Ranjeet Singh, Toomas Remmelg
LaBRI: Floréal Morandat
University of California, Irvine: Prof. Michael Franz, Gulfem Savrun Yeniceri, Wei Zhang
Purdue University: Prof. Jan Vitek, Tomas Kalibera, Petr Maj, Lei Zhao
T. U. Dortmund: Prof. Peter Marwedel, Helena Kotthaus, Ingo Korb
University of California, Davis: Prof. Duncan Temple Lang, Nicholas Ulle
University of Lugano, Switzerland: Prof. Walter Binder, Sun Haiyang, Yudi Zheng
Oracle Interns: Brian Belleville, Miguel Garcia, Shams Imam, Alexey Karyakin, Stephen Kell, Andreas Kunft, Volker Lanting, Gero Leinemann, Julian Lettner, David Piorkowski, Gregor Richards, Robert Seilbeck, Rifat Shariyar
Oracle Alumni: Erik Eckstein, Michael Haupt, Christos Kotselidis, Hyunjin Lee, David Leibs, Chris Thalinger, Till Westmann

Slide 19

Slide 19 text

Structure of the Talk
• Execution and compilation of LLVM IR (Sulong)
• Memory safety (Safe Sulong) and performance evaluation
• Introspection to increase the robustness of libraries
• Challenges of executing C on the Java Virtual Machine

Slide 20

Slide 20 text

Execution and compilation of LLVM IR

Slide 21

Slide 21 text

System Overview
[Diagram components: C and C++ (Clang), Fortran (GCC), other LLVM frontends, LLVM IR, LLVM tools, LLVM IR Interpreter, Truffle, Graal compiler, JVM]
Manuel Rigger, et al. Bringing low-level languages to the JVM: efficient execution of LLVM IR on Truffle. In Proceedings of VMIL 2016

Slide 26

Slide 26 text

Example Program
C source:
    void processRequests() {
        int i = 0;
        do {
            processPacket();
            i++;
        } while (i < 10000);
    }
LLVM IR (produced by Clang):
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

Slide 27

Slide 27 text

Executing LLVM IR with Sulong
[Diagram steps: LLVM IR program; create executable interpreter nodes; interpret the program; compile often executed function; execute the compiled code; deoptimize]

Slide 28

Slide 28 text

Implementation of Operations
LLVM IR instruction (from the example program): %2 = add nsw i32 %i.0, 1
Executable abstract syntax tree:
    write %2
        add
            read %i.0
            1

Slide 29

Slide 29 text

Implementation of Operations
Abstract syntax tree: write %2 (add (read %i.0, 1))
Executable AST node:
    class LLVMI32LiteralNode extends LLVMExpressionNode {
        final int literal;

        public LLVMI32LiteralNode(int literal) {
            this.literal = literal;
        }

        @Override
        public int executeI32(VirtualFrame frame) {
            return literal;
        }
    }
Nodes return their result in an execute() method.

Slide 30

Slide 30 text

Implementation of Operations
Abstract syntax tree: write %2 (add (read %i.0, 1))
Executable AST node:
    @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
    class LLVMI32AddNode extends LLVMExpressionNode {
        @Specialization
        protected int executeI32(int left, int right) {
            return left + right;
        }
    }
A DSL allows a declarative style of specifying and executing nodes.

Slide 31

Slide 31 text

Implementation of Operations
Abstract syntax tree: write %2 (add (read %i.0, 1))
Executable AST node:
    @NodeChild("valueNode")
    class LLVMWriteI32Node extends LLVMExpressionNode {
        final FrameSlot slot;

        public LLVMWriteI32Node(FrameSlot slot) {
            this.slot = slot;
        }

        @Specialization
        public void writeI32(VirtualFrame frame, int value) {
            frame.setInt(slot, value);
        }
    }
Local variables are represented by an array-like VirtualFrame object.

Slide 32

Slide 32 text

Example Program
The LLVM IR of processRequests contains unstructured control flow: the conditional branch at the end of basic block 1 (br i1 %3, label %1, label %4) jumps either back to label %1 or on to label %4.

Slide 33

Slide 33 text

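Interpreter
An AST interpreter cannot represent goto statements. A basic block dispatch node therefore executes the blocks in a loop: each block returns the index of its successor (Block0 returns 1, Block1 returns 1 or 2, Block2 returns -1 to leave the function).
Interpreter implementation:
    int blockIndex = 0;
    while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();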
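The dispatch loop on this slide can be sketched in plain Java. This is a minimal, self-contained sketch and not Sulong's implementation: the class name BlockDispatchSketch, the BLOCKS array, and the packets counter (a stand-in for processPacket()) are assumptions made for illustration. Each block is modeled as a function that returns the index of its successor block.

```java
import java.util.function.IntSupplier;

// Hypothetical stand-in for a basic block dispatch node; names are illustrative only.
final class BlockDispatchSketch {
    static int packets = 0; // counts calls to the stand-in for processPacket()
    static int i;           // models the phi value %i.0

    // Each block runs its instructions and returns the successor index (-1 = return).
    static final IntSupplier[] BLOCKS = {
        () -> { i = 0; return 1; },                          // block 0: br label %1
        () -> { packets++; i++; return i < 10000 ? 1 : 2; }, // block 1: loop body
        () -> -1                                             // block 2: ret void
    };

    static void processRequests() {
        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = BLOCKS[blockIndex].getAsInt();
        }
    }
}
```

Calling BlockDispatchSketch.processRequests() runs block 1 ten thousand times before block 2 ends the loop, mirroring the do/while loop of the C source.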

Slide 34

Slide 34 text

Interpreter: program execution
[Diagram: basic block dispatch node with Block0, Block1, and Block2, shown next to the LLVM IR of processRequests]

Slide 38

Slide 38 text

Partial evaluation
• Assume that nodes are constant
• This assumption allows inlining of the execute() methods

Slide 39

Slide 39 text

Compiler: unrolling of the interpreter loop
[Diagram: basic block dispatch node with Block0, Block1, and Block2]
    int blockIndex = 0;
    while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();

Slide 42

Slide 42 text

Compiler: unrolling of the interpreter, 1st iteration
LLVM IR of Block0:
    ; (basic block 0)
    br label %1
Unrolled interpreter:
    int blockIndex = 0;
    block0:
        blockIndex = blocks[0].execute();
    while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();

Slide 43

Slide 43 text

Compiler: unrolling of the interpreter, 1st iteration
LLVM IR of Block0:
    ; (basic block 0)
    br label %1
Unrolled interpreter:
    int blockIndex = 0;
    block0:
        blockIndex = 1
    while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();

Slide 47

Slide 47 text

Compiler: unrolling of the interpreter, 2nd iteration
LLVM IR of Block1:
    ; <label>:1 (basic block 1)
    %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
    call void @processPacket()
    %2 = add nsw i32 %i.0, 1
    %3 = icmp slt i32 %2, 10000
    br i1 %3, label %1, label %4
Unrolled interpreter:
    int blockIndex = 0;
    block0:
        blockIndex = 1
    block1:
        blockIndex = blocks[1].execute();
    while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();

Slide 49

Slide 49 text

Compiler: unrolling of the interpreter, 2nd iteration
Nodes in predecessor blocks assign the values used in phis:
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        blockIndex = blocks[1].execute();
        if blockIndex == 1:
            %i.0 = %2
    while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();

Slide 51

Slide 51 text

Compiler: unrolling of the interpreter, 2nd iteration
Block1 is expanded in place:
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
            blockIndex = 1
            %i.0 = %2
            while (blockIndex != -1)
                blockIndex = blocks[blockIndex].execute();
        else:
            // …

Slide 52

Slide 52 text

Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
            blockIndex = 1
            %i.0 = %2
            goto block1
        else:
            // …

Slide 53

Slide 53 text

Compiler: unrolling of the interpreter, 3rd iteration
Merging already expanded paths makes the compilation work!
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                // …

Slide 54

Slide 54 text

Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                blockIndex = 2
                while (blockIndex != -1)
                    blockIndex = blocks[blockIndex].execute();

Slide 57

Slide 57 text

Compiler: unrolling of the interpreter, 3rd iteration
LLVM IR of Block2:
    ; <label>:4 (basic block 2)
    ret void
Unrolled interpreter:
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                blockIndex = 2
                block2:
                    blockIndex = blocks[2].execute();
                while (blockIndex != -1)
                    blockIndex = blocks[blockIndex].execute();

Slide 58

Slide 58 text

Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                blockIndex = 2
                block2:
                    blockIndex = -1
                while (blockIndex != -1)
                    blockIndex = blocks[blockIndex].execute();

Slide 61

Slide 61 text

Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                blockIndex = 2
                block2:
                    blockIndex = -1
                    return
Graal further optimizes the partially evaluated interpreter.

Slide 63

Slide 63 text

Deoptimization
• Truffle nodes can implement speculative assumptions
• A failed assumption requires discarding the machine code and continuing execution in the interpreter

Slide 64

Slide 64 text

Node Rewriting in Truffle
[Figure: AST interpreter with uninitialized nodes; node specialization for profiling feedback; AST interpreter with specialized nodes; compilation using partial evaluation; compiled code. Node transitions: U = Uninitialized, I = Integer, D = Double, S = String, G = Generic]

Slide 65

Slide 65 text

Node Rewriting in Truffle
[Figure: transfer back to AST interpreter; node specialization to update profiling feedback; recompilation using partial evaluation]

Slide 66

Slide 66 text

Speculative Optimization: Value Profiling
    public class LLVMI32LoadNode extends LLVMExpressionNode {
        final int expectedValue; // observed value

        @Specialization
        protected int doI32(Address addr) {
            int val = memory.getI32(addr);
            if (val == expectedValue) {
                return expectedValue;
            } else {
                CompilerDirectives.transferToInterpreter();
                replace(new LLVMI32LoadGenericNode());
                return val;
            }
        }
    }
The compiler can assume that the loaded value is constant.
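The speculation on this slide can be modeled without the Truffle framework. The following is a rough sketch under stated assumptions: ValueProfileSketch and its methods are hypothetical names, and the generic flag only models what deoptimization and node replacement achieve in Truffle. The first observed value becomes the expected value; as long as later loads match, the expected value is returned, and the first mismatch switches permanently to the generic path.

```java
// Hypothetical, framework-free model of value profiling; this is not Truffle's API.
final class ValueProfileSketch {
    private boolean initialized = false; // has a value been observed yet?
    private boolean generic = false;     // has the speculation failed?
    private int expectedValue;           // the observed value

    int profile(int loaded) {
        if (generic) {
            return loaded;               // generic path: no speculation
        }
        if (!initialized) {
            expectedValue = loaded;      // record the first observed value
            initialized = true;
            return loaded;
        }
        if (loaded == expectedValue) {
            return expectedValue;        // a compiler could treat this as a constant
        }
        generic = true;                  // models deoptimization and node replacement
        return loaded;
    }

    boolean isGeneric() {
        return generic;
    }
}
```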

Slide 67

Slide 67 text

Polymorphic Inline Caches for Indirect Calls
    int inc(int val) { return val + 1; }
    int dec(int val) { return val - 1; }
    int square(int val) { return val * val; }
    int (*func)(int);
    // ...
    result = func(4);
[Figure: call chain "uninit call"; call target: inc]

Slide 68

Slide 68 text

Polymorphic Inline Caches for Indirect Calls
[Figure: call chain "call inc, uninit call"; call targets: inc, dec]
Enables inlining of indirect calls.

Slide 69

Slide 69 text

Polymorphic Inline Caches for Indirect Calls
[Figure: call chain "call inc, call dec, uninit call"; call targets: inc, dec, square]

Slide 70

Slide 70 text

Polymorphic Inline Caches for Indirect Calls
[Figure: call chain "indirect call"; call targets: inc, dec, square]
Can be used to optimize virtual calls in C++.
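The cache states shown on these slides can be sketched in plain Java. This is an illustrative model only: InlineCacheSketch, MAX_ENTRIES, and the limit of two cached targets are assumptions chosen to match the slides, where the third target (square) turns the chain into a plain indirect call. It is not how Sulong or Truffle implement the cache.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical model of a polymorphic inline cache for an indirect call site.
final class InlineCacheSketch {
    static final int MAX_ENTRIES = 2; // assumed cache limit, chosen to match the slides
    private final IntUnaryOperator[] cached = new IntUnaryOperator[MAX_ENTRIES];
    private int size = 0;
    private boolean megamorphic = false; // true once the site is a plain indirect call

    int call(IntUnaryOperator target, int arg) {
        if (!megamorphic) {
            for (int k = 0; k < size; k++) {
                if (cached[k] == target) {
                    // Cache hit: the target is known here, so a compiler could inline it.
                    return cached[k].applyAsInt(arg);
                }
            }
            if (size < MAX_ENTRIES) {
                cached[size++] = target; // the "uninit call" adds a new cache entry
            } else {
                megamorphic = true;      // too many targets: give up caching
            }
        }
        return target.applyAsInt(arg);   // uncached indirect call
    }

    int entries() {
        return size;
    }

    boolean isMegamorphic() {
        return megamorphic;
    }
}
```

Calling the site with inc and then dec fills both cache entries; a later call with square switches the site to the uncached indirect call.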

Slide 71

Slide 71 text

Memory safety

Slide 72

Slide 72 text

Handling of Allocations in the User Program
    int *arr = malloc(4 * sizeof(int))
• Native Sulong: unmanaged allocations (sun.misc.Unsafe), https://github.com/graalvm/sulong
    unsafe.allocateMemory(16);
• Safe Sulong: managed allocations
    Address (offset=0, data) pointing to I32Array (contents={0, 0, 0, 0})
Rigger, et al. Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model. In Proceedings of ASPLOS 2018

Slide 75

Slide 75 text

Allocations in the User Program
Unmanaged allocations
+ Interoperability with native libraries
+ Fallback for programs that make assumptions about the memory layout
- No safety guarantees
Managed allocations
+ Sandboxed execution
- No native interoperability

Slide 76

Slide 76 text

Type Hierarchy for Managed Objects
ManagedObject
    ManagedAddress (pointee: ManagedObject, pointerOffset: int)
    I32Array (values: int[])
    Function (functionIndex: int)
    I32 (value: int)
    Struct (values: Dictionary)
Automatic bounds, type, and null pointer checks!

Slide 77

Slide 77 text

Prevent Out-Of-Bounds Accesses
    int *arr = malloc(3 * sizeof(int))
    arr[5] = …
ManagedAddress (offset=20, data) pointing to I32Array (contents={1, 2, 3})
contents[20 / 4] → ArrayIndexOutOfBoundsException

Slide 78

Slide 78 text

Prevent Use-After-Free Errors
    free(arr);
    arr[0] = …
ManagedAddress (offset=0, data) pointing to I32Array (contents=null)
contents[0] → NullPointerException
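Both checks fall out of ordinary Java array semantics, which the following sketch illustrates. It is a simplification under stated assumptions: I32ArraySketch, ManagedAddressSketch, and the method names are hypothetical and only loosely follow the ManagedAddress and I32Array types named on the slides. A pointer is modeled as an object reference plus a byte offset, and free() is modeled by dropping the backing array.

```java
// Hypothetical simplification of managed allocations; names are illustrative only.
final class I32ArraySketch {
    int[] contents;

    I32ArraySketch(int length) {
        contents = new int[length];
    }

    void free() {
        contents = null; // models free(): later accesses fail with a NullPointerException
    }
}

final class ManagedAddressSketch {
    final I32ArraySketch data;
    final int offset; // offset in bytes, as on the slides

    ManagedAddressSketch(I32ArraySketch data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    // Pointer arithmetic: advance by n int elements (4 bytes each).
    ManagedAddressSketch plusInts(int n) {
        return new ManagedAddressSketch(data, offset + 4 * n);
    }

    void storeI32(int value) {
        data.contents[offset / 4] = value; // the JVM checks bounds and null here
    }

    int loadI32() {
        return data.contents[offset / 4];
    }
}
```

With a three-element array, plusInts(5).storeI32(…) raises an ArrayIndexOutOfBoundsException, and any store after free() raises a NullPointerException, matching the two slides.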

Slide 79

Slide 79 text

Safe Semantics
• We assign semantics to otherwise undefined behavior → Java semantics
• Invalid memory accesses are not optimized away
    int a = 1, b = INT_MAX;
    int val = a + b; // UB in C (signed integer overflow)
    printf("%d\n", val);
Rigger, et al. Lenient Execution of C on a Java Virtual Machine: or: How I Learned to Stop Worrying and Run the Code. In Proceedings of ManLang 2017
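For the overflow example on this slide, Java semantics means two's-complement wrap-around, which the Java language defines for int addition. A tiny sketch (the class name WrapAroundSketch is made up for illustration):

```java
// Illustrative only: Java defines int addition to wrap around on overflow.
final class WrapAroundSketch {
    static int add(int a, int b) {
        return a + b; // never undefined in Java; the result wraps modulo 2^32
    }
}
```

WrapAroundSketch.add(1, Integer.MAX_VALUE) evaluates to Integer.MIN_VALUE, so a program executed with these semantics gets a deterministic result instead of undefined behavior.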

Slide 80

Slide 80 text

Found Errors
• 68 errors in small open-source projects
• Some of these are not found by LLVM’s AddressSanitizer and Valgrind
Example: out-of-bounds accesses to argv
    int main(int argc, char** argv) {
        printf("%d %s\n", argc, argv[5]);
    }

Slide 81

Slide 81 text

Performance During Warmup (higher is better)
[Plot: Meteor benchmark; x-axis: second (1 to 37); y-axis: iterations per second (0 to 60); series: ASan (Clang O0), Safe Sulong, Valgrind]
We are working on on-stack replacement to reduce the warmup time.

Slide 82

Slide 82 text

Evaluation: Peak Performance (lower is better)

Slide 83

Slide 83 text

Introspection to increase the robustness of libraries

Slide 84

Slide 84 text

Introspection Functions
    int *arr = malloc(sizeof(int) * 10);
    int *ptr = &(arr[4]);
    printf("%ld\n", size_left(ptr));  // prints 16
    printf("%ld\n", size_right(ptr)); // prints 24
[Figure: ptr divides the sizeof(int) * 10 bytes of the allocation into size_left and size_right]
We also expose other metadata such as object types.
Rigger, et al. Introspection for C and its Applications to Library Robustness. In Programming 2018
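On managed allocations, both values can be derived from the array length and the pointer offset. The sketch below is an assumption-laden illustration, not Sulong's code: IntrospectionSketch and its parameters are hypothetical, and returning -1 for a freed or out-of-range pointer is an assumed convention (the gets() wrapper later in the talk checks for -1).

```java
// Hypothetical model of size_left/size_right for a pointer into an int array.
final class IntrospectionSketch {
    // byteOffset is the pointer's offset in bytes from the start of the allocation.
    static long sizeLeft(int[] contents, int byteOffset) {
        if (!valid(contents, byteOffset)) {
            return -1; // assumed convention for invalid pointers
        }
        return byteOffset; // bytes between the start of the object and the pointer
    }

    static long sizeRight(int[] contents, int byteOffset) {
        if (!valid(contents, byteOffset)) {
            return -1; // assumed convention for invalid pointers
        }
        return (long) contents.length * Integer.BYTES - byteOffset; // bytes to the end
    }

    private static boolean valid(int[] contents, int byteOffset) {
        return contents != null
                && byteOffset >= 0
                && byteOffset <= (long) contents.length * Integer.BYTES;
    }
}
```

For a ten-element array and a pointer to element 4 (byte offset 16), sizeLeft yields 16 and sizeRight yields 24, the same numbers as in the C snippet.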

Slide 85

Slide 85 text

Usage of Introspection
• Improve availability of the system
• Fix incomplete APIs
• Improve bug-finding capabilities

Slide 86

Slide 86 text

Improve availability of the system
Make libc robust against missing NUL terminators:
    size_t strlen(const char *str) {
        size_t len = 0;
        while (size_right(str) > 0 && *str != '\0') {
            len++;
            str++;
        }
        return len;
    }
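The same bounded loop can be mimicked in Java, where the array length plays the role of size_right. This is a sketch for illustration only; BoundedStrlenSketch and its parameters are hypothetical, with buffer modeling the allocation and start the pointer's byte offset into it.

```java
// Hypothetical Java analogue of the bounds-aware strlen shown on the slide.
final class BoundedStrlenSketch {
    static int strlen(byte[] buffer, int start) {
        int len = 0;
        // buffer.length - (start + len) corresponds to size_right(str) in the C version.
        while (buffer.length - (start + len) > 0 && buffer[start + len] != 0) {
            len++;
        }
        return len; // stops at the NUL byte or at the end of the allocation
    }
}
```

For a buffer without a terminating zero byte, the loop stops at the end of the array instead of reading past it.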

Slide 87

Slide 87 text

Improve availability of the system
• Case study on real-world bugs (Dnsmasq, Libxml2, GraphicsMagick)
• Insight: most applications stay fully functional when the buffer overflow is mitigated
• Drawback: Sulong still aborts execution for missing introspection checks

Slide 88

Slide 88 text

Fix incomplete APIs
Make gets() robust against input that would overflow the buffer:
    char* gets(char *str) {
        int size = size_right(str);
        return gets_s(str, size == -1 ? 0 : size);
    }

Slide 89

Slide 89 text

Improve bug-finding capabilities
Find "lurking" bugs:
    char* gets_s(char *str, rsize_t n) {
        if (size_right(str) < n) {
            abort();
        } else {
            // original code
        }
    }

Slide 90

Slide 90 text

Introspection is applicable for many other bug-finding tools
• We also implemented it in
  • GCC’s Intel MPX based bounds checks instrumentation
  • LLVM’s ASan
  • SoftBound
    ssize_t _size_right(void* p) {
        ssize_t upper_bounds = (ssize_t) __builtin___bnd_get_ptr_ubound(p);
        size_t size = (size_t) (upper_bounds + 1) - (size_t) p;
        return (ssize_t) size;
    }

Slide 91

Slide 91 text

Challenges of Executing C on a JVM

Slide 92

Slide 92 text

C Projects Consist of More Than C Code
Inline assembly in the C program:
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
Node that implements it in Sulong:
    public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
        public long executeRdtsc() {
            return System.nanoTime();
        }
    }

Slide 93

Slide 93 text

C Projects Consist of More Than C Code
Instructions        In % of projects
rdtsc               27.4%
cpuid               25.4%
mov                 24.9%
(not shown)         21.8%
lock xchg           14.2%
…                   …
We determined the usage of inline assembly to prioritize the implementation in Sulong.
Rigger, et al. An Analysis of x86-64 Inline Assembly in C Programs. In VEE 2018

Slide 94

Slide 94 text

C Projects Consist of More Than C Code
Compiler builtin in the C program:
    __builtin_clz(num);
Node that implements it in Sulong:
    public abstract static class CountLeadingZeroesI64Node extends LLVMExpressionNode {
        public long executeRdtsc(long val) {
            return Long.numberOfLeadingZeros(val);
        }
    }

Slide 95

Slide 95 text

GCC builtins
We are currently investigating the usage of GCC builtins.
Builtins                In % of projects
__builtin_expect        48.2%
__builtin_clz           29.3%
__builtin_bswap32       26.2%
__builtin_constant_p    23.3%
__builtin_alloca        20.3%
…                       …

Slide 96

Slide 96 text

Native Interoperability
[Diagram: program.c calls process(object) in lib.so]
• Native Sulong: object is a native allocation
• Safe Sulong: object is a Java object

Slide 97

Slide 97 text

Native Interoperability
Hybrid Sulong version
• Native interoperability where needed
• Memory safety where possible
[Diagram labels: x86, Truffle Interpreter]

Slide 98

Slide 98 text

Running a complete libc
Emulate the Linux syscall API:
    public class LLVMAMD64SyscallGetcwdNode {
        @Specialization
        protected long doOp(LLVMAddress buf, long size) {
            String cwd = LLVMPath.getcwd();
            if (cwd.length() >= size) {
                return -LLVMAMD64Error.ERANGE;
            } else {
                LLVMString.strcpy(buf, cwd);
                return cwd.length() + 1;
            }
        }
    }

Slide 99

Slide 99 text

Thanks for listening!
https://github.com/graalvm/sulong/
@RiggerManuel

Slide 100

Slide 100 text

• Sulong executes LLVM IR on the JVM
• Speculative optimizations
• Native Sulong: allocates unmanaged memory
• Safe Sulong: allocates Java objects to provide memory safety
• Introspection exposes metadata to library writers
• Sulong partially supports inline assembly and compiler builtins
[Diagram: System Overview, as shown earlier in the talk]