Machine via LLVM IR
Manuel Rigger
Johannes Kepler University Linz, Austria
Computer Laboratory, Programming Research Group Seminar, University of Cambridge, 2 March 2018
behavior
• Many existing safer alternatives are based on “unsafe” compilers or binary code

“A sufficiently advanced compiler is indistinguishable from an adversary.” – John Regehr (https://blog.regehr.org)
Cosmin Basca, Daniele Bonetta, Matthias Brantner, Petr Chalupa, Jürgen Christ, Laurent Daynès, Gilles Duboscq, Martin Entlicher, Bastian Hossbach, Christian Humer, Mick Jordan, Vojin Jovanovic, Peter Kessler, David Leopoldseder, Kevin Menard, Jakub Podlešák, Aleksandar Prokopec, Tom Rodriguez
Oracle (continued): Roland Schatz, Chris Seaton, Doug Simon, Štěpán Šindelář, Zbyněk Šlajchrt, Lukas Stadler, Codrut Stancu, Jan Štola, Jaroslav Tulach, Michael Van De Vanter, Adam Welc, Christian Wimmer, Christian Wirth, Paul Wögerer, Mario Wolczko, Andreas Wöß, Thomas Würthinger
JKU Linz: Prof. Hanspeter Mössenböck, Benoit Daloze, Josef Eisl, Thomas Feichtinger, Matthias Grimmer, Christian Häubl, Josef Haider, Christian Huber, Stefan Marr, Manuel Rigger, Stefan Rumzucker, Bernhard Urban, Thomas Pointhuber, Daniel Pekarek, Jacob Kreindl, Mario Kahlhofer
University of Edinburgh: Christophe Dubach, Juan José Fumero Alfonso, Ranjeet Singh, Toomas Remmelg
LaBRI: Floréal Morandat
University of California, Irvine: Prof. Michael Franz, Gulfem Savrun Yeniceri, Wei Zhang
Purdue University: Prof. Jan Vitek, Tomas Kalibera, Petr Maj, Lei Zhao
T. U. Dortmund: Prof. Peter Marwedel, Helena Kotthaus, Ingo Korb
University of California, Davis: Prof. Duncan Temple Lang, Nicholas Ulle
University of Lugano, Switzerland: Prof. Walter Binder, Sun Haiyang, Yudi Zheng
Oracle Interns: Brian Belleville, Miguel Garcia, Shams Imam, Alexey Karyakin, Stephen Kell, Andreas Kunft, Volker Lanting, Gero Leinemann, Julian Lettner, David Piorkowski, Gregor Richards, Robert Seilbeck, Rifat Shariyar
Oracle Alumni: Erik Eckstein, Michael Haupt, Christos Kotselidis, Hyunjin Lee, David Leibs, Chris Thalinger, Till Westmann
(Sulong)
• Memory safety (Safe Sulong) and performance evaluation
• Introspection to increase the robustness of libraries
• Challenges of executing C on the Java Virtual Machine
System Overview

[Figure: Fortran, other LLVM frontends, ..., LLVM tools, Graal compiler, JVM]

Manuel Rigger, et al. Bringing low-level languages to the JVM: efficient execution of LLVM IR on Truffle. In Proceedings of VMIL 2016.
Implementation of Operations

    class LLVMI32LiteralNode extends LLVMExpressionNode {
        final int literal;

        public LLVMI32LiteralNode(int literal) {
            this.literal = literal;
        }

        @Override
        public int executeI32(VirtualFrame frame) {
            return literal;
        }
    }

Executable AST node: nodes return their result in an execute() method (here, executeI32).
    {
        @Specialization
        protected int executeI32(int left, int right) {
            return left + right;
        }
    }

[Figure: executable AST write %2 ← add(read %i.0, 1)]

A DSL (the Truffle DSL) allows a declarative style of specifying and executing nodes.
        final FrameSlot slot;

        public LLVMWriteI32Node(FrameSlot slot) {
            this.slot = slot;
        }

        @Specialization
        public void writeI32(VirtualFrame frame, int value) {
            frame.setInt(slot, value);
        }
    }

[Figure: executable AST write %2 ← add(read %i.0, 1)]

Local variables are represented by an array-like VirtualFrame object.
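The following is a minimal, Truffle-free sketch of the node kinds above (literal, add, local-variable read and write), evaluating the AST write %2 ← add(read %i.0, 1). Assumptions: the class names are hypothetical simplifications of the Sulong nodes, the VirtualFrame is replaced by a plain int[] indexed by slot number, and there is no DSL or specialization.

```java
// Hypothetical, Truffle-free stand-ins for the nodes above; the frame is a plain int[].
abstract class ExprNode {
    abstract int executeI32(int[] frame);
}

final class I32LiteralNode extends ExprNode {
    private final int literal;

    I32LiteralNode(int literal) { this.literal = literal; }

    @Override
    int executeI32(int[] frame) { return literal; }
}

final class ReadI32Node extends ExprNode {
    private final int slot; // index of the local variable in the frame

    ReadI32Node(int slot) { this.slot = slot; }

    @Override
    int executeI32(int[] frame) { return frame[slot]; }
}

final class AddI32Node extends ExprNode {
    private final ExprNode left, right;

    AddI32Node(ExprNode left, ExprNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    int executeI32(int[] frame) {
        // Evaluate the children, then combine their results.
        return left.executeI32(frame) + right.executeI32(frame);
    }
}

final class WriteI32Node {
    private final int slot;
    private final ExprNode value;

    WriteI32Node(int slot, ExprNode value) {
        this.slot = slot;
        this.value = value;
    }

    void execute(int[] frame) { frame[slot] = value.executeI32(frame); }
}

class AstDemo {
    // Slot assignment is arbitrary here: slot 0 holds %i.0, slot 1 holds %2.
    static int run(int i0) {
        int[] frame = new int[2];
        frame[0] = i0;
        WriteI32Node write = new WriteI32Node(1,
                new AddI32Node(new ReadI32Node(0), new I32LiteralNode(1)));
        write.execute(frame);
        return frame[1];
    }

    public static void main(String[] args) {
        System.out.println(run(41)); // prints 42
    }
}
```

Each node only knows how to compute its own result from its children, which is the property Truffle later exploits when it partially evaluates the tree.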
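Node Specialization for Profiling Feedback

[Figure: an AST interpreter starts with uninitialized nodes, rewrites them into specialized nodes based on profiling feedback, and the specialized tree is then compiled using partial evaluation into compiled code. Node states: U (Uninitialized), I (Integer), D (Double), S (String), G (Generic); the example trees contain nodes in states I and G.]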
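A minimal sketch of the node rewriting shown in the figure, again without Truffle: a hypothetical add node starts uninitialized, specializes on the operand types it first observes, and falls back to a generic node when that speculation fails. For brevity, this sketch moves every failed specialization straight to Generic and uses made-up semantics for mixed operands (numeric sum as double, otherwise string concatenation).

```java
// Hypothetical sketch of a self-specializing "add" node (no Truffle).
abstract class AddNode {
    abstract Object execute(AddSite site, Object a, Object b);

    abstract String state();

    // Speculation failed: rewrite the site to the generic node and retry.
    static Object generalize(AddSite site, Object a, Object b) {
        site.node = new GenericAdd();
        return site.node.execute(site, a, b);
    }
}

// Holds the current node; assigning site.node plays the role of replace().
final class AddSite {
    AddNode node = new UninitializedAdd();

    Object call(Object a, Object b) { return node.execute(this, a, b); }
}

final class UninitializedAdd extends AddNode {
    Object execute(AddSite site, Object a, Object b) {
        AddNode specialized;
        if (a instanceof Integer && b instanceof Integer) {
            specialized = new IntegerAdd();
        } else if (a instanceof Double && b instanceof Double) {
            specialized = new DoubleAdd();
        } else if (a instanceof String && b instanceof String) {
            specialized = new StringAdd();
        } else {
            specialized = new GenericAdd();
        }
        site.node = specialized; // rewrite based on the observed operand types
        return specialized.execute(site, a, b);
    }

    String state() { return "U"; }
}

final class IntegerAdd extends AddNode {
    Object execute(AddSite site, Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) return (Integer) a + (Integer) b;
        return generalize(site, a, b);
    }

    String state() { return "I"; }
}

final class DoubleAdd extends AddNode {
    Object execute(AddSite site, Object a, Object b) {
        if (a instanceof Double && b instanceof Double) return (Double) a + (Double) b;
        return generalize(site, a, b);
    }

    String state() { return "D"; }
}

final class StringAdd extends AddNode {
    Object execute(AddSite site, Object a, Object b) {
        if (a instanceof String && b instanceof String) return (String) a + (String) b;
        return generalize(site, a, b);
    }

    String state() { return "S"; }
}

// Handles every combination; a site that reaches this state stays here.
final class GenericAdd extends AddNode {
    Object execute(AddSite site, Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) return (Integer) a + (Integer) b;
        if (a instanceof Number && b instanceof Number) {
            return ((Number) a).doubleValue() + ((Number) b).doubleValue();
        }
        return String.valueOf(a) + String.valueOf(b);
    }

    String state() { return "G"; }
}

class SpecializationDemo {
    public static void main(String[] args) {
        AddSite site = new AddSite();
        System.out.println(site.call(1, 2) + " " + site.node.state());     // 3 I
        System.out.println(site.call(1.5, 2.0) + " " + site.node.state()); // 3.5 G
    }
}
```

In Truffle, the DSL generates such transitions from @Specialization methods; once the tree stops changing, it is the specialized tree that partial evaluation compiles.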
    {
        final int expectedValue; // observed value

        @Specialization
        protected int doI32(Address addr) {
            int val = memory.getI32(addr);
            if (val == expectedValue) {
                return expectedValue;
            } else {
                CompilerDirectives.transferToInterpreter();
                replace(new LLVMI32LoadGenericNode());
                return val;
            }
        }
    }

As long as this speculation holds, the compiler can assume that the loaded value is constant. If a different value is observed, execution transfers back to the interpreter and the node is replaced by a generic load.
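The value-profiling idea can be sketched in plain Java as follows. Memory is modeled as an int array indexed by word, and all class names are hypothetical; a real Truffle node would additionally call transferToInterpreter() to discard compiled code, which this interpreter-only sketch cannot show.

```java
// Hypothetical sketch of value profiling for loads (no Truffle).
final class Memory {
    final int[] words;

    Memory(int sizeInWords) { words = new int[sizeInWords]; }

    int getI32(int addr) { return words[addr]; } // addr is a word index in this sketch
}

abstract class LoadNode {
    abstract int execute(LoadSite site, int addr);

    abstract String state();
}

// Holds the current node; assigning site.node plays the role of replace().
final class LoadSite {
    final Memory memory;
    LoadNode node = new UninitializedLoad();

    LoadSite(Memory memory) { this.memory = memory; }

    int load(int addr) { return node.execute(this, addr); }
}

final class UninitializedLoad extends LoadNode {
    int execute(LoadSite site, int addr) {
        int val = site.memory.getI32(addr);
        site.node = new ConstantLoad(val); // speculate that later loads see the same value
        return val;
    }

    String state() { return "uninitialized"; }
}

final class ConstantLoad extends LoadNode {
    final int expectedValue; // observed value

    ConstantLoad(int expectedValue) { this.expectedValue = expectedValue; }

    int execute(LoadSite site, int addr) {
        int val = site.memory.getI32(addr);
        if (val == expectedValue) {
            // A JIT could fold this constant into later code; the load and
            // comparison remain as a guard.
            return expectedValue;
        }
        site.node = new GenericLoad(); // speculation failed: stop speculating here
        return val;
    }

    String state() { return "constant"; }
}

final class GenericLoad extends LoadNode {
    int execute(LoadSite site, int addr) { return site.memory.getI32(addr); }

    String state() { return "generic"; }
}

class ValueProfileDemo {
    public static void main(String[] args) {
        Memory memory = new Memory(4);
        memory.words[2] = 7;
        LoadSite site = new LoadSite(memory);
        site.load(2);
        System.out.println(site.node.state()); // constant
        memory.words[2] = 8;
        System.out.println(site.load(2) + " " + site.node.state()); // 8 generic
    }
}
```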
Polymorphic Inline Caches for Indirect Calls

    int inc(int val) { return val + 1; }
    int dec(int val) { return val - 1; }
    int square(int val) { return val * val; }

    int (*func)(int);
    // ...
    result = func(4);

[Figure: the call site func(4) starts as an uninitialized call node; as targets are observed, it grows into a chain of cached direct calls ending in an uninitialized call node (call inc → call dec → uninit call). Possible targets: inc, dec, square.]

This technique can also be used to optimize virtual calls in C++.
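A sketch of a polymorphic inline cache for the indirect call result = func(4). C functions are modeled as Java IntUnaryOperator values wrapped in a hypothetical CFunction class, and the cache limit is an arbitrary constructor parameter; Sulong's actual limit and node structure may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical model of a C function that a function pointer can refer to.
final class CFunction {
    final String name;
    final IntUnaryOperator body;

    CFunction(String name, IntUnaryOperator body) {
        this.name = name;
        this.body = body;
    }
}

final class IndirectCallSite {
    private final int limit; // assumed maximum number of cached targets
    private final List<CFunction> cached = new ArrayList<>();
    private boolean megamorphic = false;

    IndirectCallSite(int limit) { this.limit = limit; }

    int call(CFunction target, int arg) {
        if (!megamorphic) {
            for (CFunction f : cached) {
                if (f == target) {                 // identity check guards each entry
                    return f.body.applyAsInt(arg); // a JIT could inline this direct call
                }
            }
            if (cached.size() < limit) {
                cached.add(target);                // "uninit call" becomes "call <target>"
                return target.body.applyAsInt(arg);
            }
            megamorphic = true;                    // too many targets: stop caching
            cached.clear();
        }
        return target.body.applyAsInt(arg);        // generic indirect call
    }

    int cacheSize() { return cached.size(); }

    boolean isMegamorphic() { return megamorphic; }
}

class PicDemo {
    static final CFunction INC = new CFunction("inc", v -> v + 1);
    static final CFunction DEC = new CFunction("dec", v -> v - 1);
    static final CFunction SQUARE = new CFunction("square", v -> v * v);

    public static void main(String[] args) {
        IndirectCallSite site = new IndirectCallSite(2);
        System.out.println(site.call(INC, 4));    // 5
        System.out.println(site.call(DEC, 4));    // 3
        System.out.println(site.call(SQUARE, 4)); // 16; the site is now megamorphic
    }
}
```

Because each cached entry is guarded by a cheap identity check, the common monomorphic or lightly polymorphic case can be compiled into direct, inlinable calls; only call sites with many targets pay for a fully indirect call.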
= malloc(4 * sizeof(int))

• Native Sulong: unmanaged allocations (sun.misc.Unsafe): unsafe.allocateMemory(16); https://github.com/graalvm/sulong
• Safe Sulong: managed allocations [Figure: an Address object (offset=0) whose data field refers to an I32Array with contents {0, 0, 0}]

Rigger, et al. Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model. In Proceedings of ASPLOS 2018.
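The managed representation from the figure can be sketched as follows: a pointer is an object holding a reference to its allocation plus a byte offset, so every access can be checked against the bounds of that allocation. The classes I32Array, Address, and ManagedHeap here are simplified, hypothetical stand-ins, and only 4-byte-aligned int accesses are supported.

```java
// Simplified, hypothetical stand-ins for the managed objects in the figure.
final class I32Array {
    final int[] contents;

    I32Array(int length) { contents = new int[length]; } // Java zero-initializes arrays
}

final class Address {
    final I32Array data; // the allocation this pointer refers to
    final int offset;    // byte offset into the allocation

    Address(I32Array data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    // Pointer arithmetic only changes the offset; the pointer keeps its object.
    Address plus(int bytes) { return new Address(data, offset + bytes); }

    int loadI32() { return data.contents[index()]; }

    void storeI32(int value) { data.contents[index()] = value; }

    private int index() {
        if (offset % Integer.BYTES != 0) {
            throw new IllegalStateException("unaligned i32 access at byte offset " + offset);
        }
        // May be out of range: the Java array access then throws
        // ArrayIndexOutOfBoundsException instead of touching unrelated memory.
        return offset / Integer.BYTES;
    }
}

final class ManagedHeap {
    // Stands in for malloc(count * sizeof(int)) in this sketch.
    static Address mallocI32(int count) { return new Address(new I32Array(count), 0); }
}

class ManagedDemo {
    public static void main(String[] args) {
        Address arr = ManagedHeap.mallocI32(4);
        arr.plus(12).storeI32(9);                   // arr[3] = 9
        System.out.println(arr.plus(12).loadI32()); // 9
        try {
            arr.plus(16).loadI32();                 // arr[4]: one element past the end
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds read detected");
        }
    }
}
```

The trade-off matches the comparison below: the JVM's own bounds checks provide the safety, but such objects have no native address that could be handed to native code.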
native libraries
+ Fallback for programs that make assumptions about the memory layout
- No safety guarantees

Managed Allocations
+ Sandboxed execution
- Native interoperability
Java semantics
• Invalid memory accesses are not optimized away

    int a = 1, b = INT_MAX;
    int val = a + b; // UB: signed integer overflow
    printf("%d\n", val);

Rigger, et al. Lenient Execution of C on a Java Virtual Machine: or: How I Learned to Stop Worrying and Run the Code. In Proceedings of ManLang 2017.
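Assuming, as the slide suggests, that signed integer addition is given Java semantics, the overflowing addition from the example has a defined, wrapped-around result. This minimal sketch shows that behavior; the class name is hypothetical.

```java
class LenientAdd {
    // Java int addition is defined to wrap around in two's complement,
    // so INT_MAX + 1 has a well-defined result instead of being UB.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int a = 1, b = Integer.MAX_VALUE;
        System.out.println(add(a, b)); // prints -2147483648
        // Alternative design: trap on overflow instead of wrapping.
        try {
            Math.addExact(a, b);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

Either way, the program's behavior is fixed, so the compiler cannot use the overflow as a license to remove or reorder code.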
Some of these are not found by LLVM’s AddressSanitizer and Valgrind.

    #include <stdio.h>

    int main(int argc, char** argv) {
        printf("%d %s\n", argc, argv[5]); // out of bounds if argc < 5
    }

Out-of-bounds accesses to argv
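    = malloc(sizeof(int) * 10);
    int *ptr = &(arr[4]);
    printf("%ld\n", size_left(ptr));  // prints 16
    printf("%ld\n", size_right(ptr)); // prints 24

size_left and size_right return the number of bytes between the pointer and the start and the end of its object, respectively (4 and 6 ints of 4 bytes each). We also expose other metadata, such as object types.

Rigger, et al. Introspection for C and its Applications to Library Robustness. In Programming 2018.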
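With managed pointers, size_left and size_right can be derived directly from the pointer's offset and the size of the object it refers to. The following hypothetical sketch reproduces the numbers from the example, assuming 4-byte ints; ManagedPointer and its methods are illustrative names, not the library's actual implementation.

```java
// Hypothetical managed pointer: the allocated object plus a byte offset into it.
final class ManagedPointer {
    final int[] object;    // the allocated int array
    final long byteOffset; // offset from the start of the object

    ManagedPointer(int[] object, long byteOffset) {
        this.object = object;
        this.byteOffset = byteOffset;
    }

    // Models &arr[index] (assumes 4-byte ints).
    static ManagedPointer elementAddress(int[] arr, int index) {
        return new ManagedPointer(arr, (long) index * Integer.BYTES);
    }

    // Bytes between the start of the object and the pointer.
    long sizeLeft() { return byteOffset; }

    // Bytes between the pointer and the end of the object.
    long sizeRight() { return (long) object.length * Integer.BYTES - byteOffset; }
}

class IntrospectionDemo {
    public static void main(String[] args) {
        int[] arr = new int[10];                                    // malloc(sizeof(int) * 10)
        ManagedPointer ptr = ManagedPointer.elementAddress(arr, 4); // &(arr[4])
        System.out.println(ptr.sizeLeft());  // 16
        System.out.println(ptr.sizeRight()); // 24
    }
}
```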
bugs (Dnsmasq, Libxml2, GraphicsMagick)
• Insight: most applications stay fully functional when the buffer overflow is mitigated
• Drawback: Sulong still aborts execution for missing introspection checks
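One way a library function could use such introspection to mitigate an overflow instead of aborting is to truncate its output to the space actually left in the destination buffer. The sketch below is hypothetical (BytePtr and strcpyTruncating are made-up names) and only illustrates the idea; it is not the code used for the bugs listed above.

```java
// Hypothetical byte pointer: an allocation plus an offset into it.
final class BytePtr {
    final byte[] object;
    final int offset;

    BytePtr(byte[] object, int offset) {
        this.object = object;
        this.offset = offset;
    }

    // Bytes from this pointer to the end of its allocation (like size_right).
    int sizeRight() { return object.length - offset; }
}

final class RobustLib {
    // Hypothetical "robust strcpy": copies at most sizeRight(dest) - 1 bytes,
    // always writes the terminating NUL, and never reads past the end of src.
    // Returns the number of bytes copied, excluding the terminator.
    static int strcpyTruncating(BytePtr dest, BytePtr src) {
        int capacity = dest.sizeRight();
        if (capacity <= 0) {
            throw new IllegalArgumentException("destination has no space left");
        }
        int limit = Math.min(capacity - 1, src.sizeRight());
        int i = 0;
        while (i < limit && src.object[src.offset + i] != 0) {
            dest.object[dest.offset + i] = src.object[src.offset + i];
            i++;
        }
        dest.object[dest.offset + i] = 0;
        return i;
    }
}

class RobustDemo {
    public static void main(String[] args) {
        byte[] src = {'h', 'e', 'l', 'l', 'o', 0};
        byte[] dst = new byte[4]; // too small for "hello"
        int copied = RobustLib.strcpyTruncating(new BytePtr(dst, 0), new BytePtr(src, 0));
        System.out.println(copied + " " + new String(dst, 0, copied)); // 3 hel
    }
}
```

Truncation keeps the program running with a shortened string, which is the kind of mitigation under which, per the slide, most applications remain functional.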
Inline assembly fragment    In % of projects
rdtsc                       27.4%
cpuid                       25.4%
mov                         24.9%
<compiler barrier>          21.8%
lock xchg                   14.2%
…                           …

We analyzed how inline assembly is used in C projects to prioritize which fragments to implement in Sulong.

Rigger, et al. An Analysis of x86-64 Inline Assembly in C Programs. In VEE 2018.
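To illustrate what supporting some of the most frequent fragments might involve, here is a hypothetical sketch mapping three of them to Java equivalents. The mappings are assumptions, not Sulong's implementation: rdtsc is approximated by System.nanoTime() (a monotonic timer, not a cycle counter), a compiler barrier such as asm volatile("" ::: "memory") is conservatively mapped to a full memory fence, and lock xchg becomes an atomic exchange.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical Java stand-ins for frequent inline assembly fragments.
final class InlineAsmSketch {
    // rdtsc: approximated by a monotonic nanosecond timer (not a cycle counter).
    static long rdtsc() {
        return System.nanoTime();
    }

    // asm volatile("" ::: "memory"): conservatively mapped to a full fence,
    // which is stronger than a pure compiler barrier.
    static void compilerBarrier() {
        VarHandle.fullFence();
    }

    // lock xchg: atomically stores newValue and returns the previous value.
    static long lockXchg(AtomicLong location, long newValue) {
        return location.getAndSet(newValue);
    }

    public static void main(String[] args) {
        AtomicLong lock = new AtomicLong(0);
        long previous = lockXchg(lock, 1); // e.g., a spinlock acquire attempt
        compilerBarrier();
        System.out.println(previous + " " + lock.get()); // 0 1
    }
}
```

cpuid and mov are left out: mov depends on its operands, and a faithful cpuid would require choosing which processor features to report.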