Cambridge'18

Executing C, C++ and Fortran Efficiently on the Java Virtual Machine via LLVM IR

Manuel Rigger

March 02, 2018

Transcript

  1. Executing C, C++ and Fortran Efficiently on the
    Java Virtual Machine via LLVM IR
    Manuel Rigger
    Johannes Kepler University Linz, Austria
    Computer Laboratory Programming Research Group Seminar,
    University of Cambridge, 2 March 2018

  2. What is Sulong?
    Execute low-level/unsafe languages (C, C++, Fortran, ...) on
    the Java Virtual Machine (JVM)

  7. Why?
    • Unchecked accesses: buffer overflows are still a serious problem
    • Manual memory management: use-after-free errors, double-free errors, …
    • Undefined behavior: "A sufficiently advanced compiler is indistinguishable
      from an adversary." – John Regehr (https://blog.regehr.org)
    • Many existing safer alternatives (LLVM’s ASan, Valgrind, SoftBound, …)
      are based on “unsafe” compilers or binary code

  12. Why the Java Virtual Machine?
    Sandboxed execution
    Garbage collection
    Existing JIT compiler
    Safe implementation language
    Part of the multi-lingual GraalVM

  13. Sulong as Part of GraalVM
    Substrate VM
    Java HotSpot VM
    JVM Compiler Interface (JVMCI) JEP 243
    Graal Compiler
    Truffle Framework
    http://www.oracle.com/technetwork/oracle-labs/program-languages

  15. The call stack contains both Ruby and C function
    stack frames

  17. Truffle interop allows calling functions of other
    languages and accessing their data

  18. Truffle and Graal Contributors
    Oracle
    Danilo Ansaloni
    Stefan Anzinger
    Cosmin Basca
    Daniele Bonetta
    Matthias Brantner
    Petr Chalupa
    Jürgen Christ
    Laurent Daynès
    Gilles Duboscq
    Martin Entlicher
    Bastian Hossbach
    Christian Humer
    Mick Jordan
    Vojin Jovanovic
    Peter Kessler
    David Leopoldseder
    Kevin Menard
    Jakub Podlešák
    Aleksandar Prokopec
    Tom Rodriguez
    Oracle (continued)
    Roland Schatz
    Chris Seaton
    Doug Simon
    Štěpán Šindelář
    Zbyněk Šlajchrt
    Lukas Stadler
    Codrut Stancu
    Jan Štola
    Jaroslav Tulach
    Michael Van De Vanter
    Adam Welc
    Christian Wimmer
    Christian Wirth
    Paul Wögerer
    Mario Wolczko
    Andreas Wöß
    Thomas Würthinger
    JKU Linz
    Prof. Hanspeter Mössenböck
    Benoit Daloze
    Josef Eisl
    Thomas Feichtinger
    Matthias Grimmer
    Christian Häubl
    Josef Haider
    Christian Huber
    Stefan Marr
    Manuel Rigger
    Stefan Rumzucker
    Bernhard Urban
    Thomas Pointhuber
    Daniel Pekarek
    Jacob Kreindl
    Mario Kahlhofer
    University of Edinburgh
    Christophe Dubach
    Juan José Fumero Alfonso
    Ranjeet Singh
    Toomas Remmelg
    LaBRI
    Floréal Morandat
    University of California, Irvine
    Prof. Michael Franz
    Gulfem Savrun Yeniceri
    Wei Zhang
    Purdue University
    Prof. Jan Vitek
    Tomas Kalibera
    Petr Maj
    Lei Zhao
    T. U. Dortmund
    Prof. Peter Marwedel
    Helena Kotthaus
    Ingo Korb
    University of California, Davis
    Prof. Duncan Temple Lang
    Nicholas Ulle
    University of Lugano, Switzerland
    Prof. Walter Binder
    Sun Haiyang
    Yudi Zheng
    Oracle Interns
    Brian Belleville
    Miguel Garcia
    Shams Imam
    Alexey Karyakin
    Stephen Kell
    Andreas Kunft
    Volker Lanting
    Gero Leinemann
    Julian Lettner
    David Piorkowski
    Gregor Richards
    Robert Seilbeck
    Rifat Shariyar
    Oracle Alumni
    Erik Eckstein
    Michael Haupt
    Christos Kotselidis
    Hyunjin Lee
    David Leibs
    Chris Thalinger
    Till Westmann

  19. Structure of the Talk
    Execution and compilation of LLVM IR (Sulong)
    Memory safety (Safe Sulong) and performance evaluation
    Introspection to increase the robustness of libraries
    Challenges of executing C on the Java Virtual Machine

  20. Execution and compilation of LLVM IR

  21. System Overview
    [Diagram components: C and C++ via Clang; Fortran via GCC; other languages
    via other LLVM frontends; LLVM IR; LLVM tools; LLVM IR Interpreter on
    Truffle; Graal compiler; JVM]
    Manuel Rigger, et al. Bringing low-level languages to the
    JVM: efficient execution of LLVM IR on Truffle.
    In Proceedings of VMIL 2016

  26. Example Program
    C source:
    void processRequests() {
      int i = 0;
      do {
        processPacket();
        i++;
      } while (i < 10000);
    }
    LLVM IR (produced by Clang):
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

  27. Executing LLVM IR with Sulong
    [Diagram: LLVM IR program → create executable interpreter nodes →
    interpret the program → compile often executed function → execute the
    compiled code → deoptimize (back to interpreting the program)]

  28. Implementation of Operations
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    Executable Abstract Syntax Tree for the add instruction:
    write %2
      add
        read %i.0
        1

  29. Implementation of Operations
    Abstract Syntax Tree:
    write %2
      add
        read %i.0
        1
    Executable AST node:
    class LLVMI32LiteralNode extends LLVMExpressionNode {
      final int literal;
      public LLVMI32LiteralNode(int literal) {
        this.literal = literal;
      }
      @Override
      public int executeI32(VirtualFrame frame) {
        return literal;
      }
    }
    Nodes return their result in an execute() method

  30. Implementation of Operations
    Abstract Syntax Tree:
    write %2
      add
        read %i.0
        1
    Executable AST node:
    @NodeChildren({@NodeChild("leftNode"),
                   @NodeChild("rightNode")})
    class LLVMI32AddNode extends LLVMExpressionNode {
      @Specialization
      protected int executeI32(int left, int right) {
        return left + right;
      }
    }
    A DSL allows a declarative style of specifying and executing nodes

  31. Implementation of Operations
    Abstract Syntax Tree:
    write %2
      add
        read %i.0
        1
    Executable AST node:
    @NodeChild("valueNode")
    class LLVMWriteI32Node extends LLVMExpressionNode {
      final FrameSlot slot;
      public LLVMWriteI32Node(FrameSlot slot) {
        this.slot = slot;
      }
      @Specialization
      public void writeI32(VirtualFrame frame, int value) {
        frame.setInt(slot, value);
      }
    }
    Local variables are represented by an array-like VirtualFrame object
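
The three node kinds above can be tried out without Truffle. The following is a minimal sketch in plain Java, not Sulong code: the class names (AstSketch, Literal, Read, Add, Write) and the use of an int[] in place of VirtualFrame are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical plain-Java analogue of the slides' nodes; no Truffle classes are used.
final class AstSketch {
    /** Every node computes its result in execute(); the int[] stands in for the VirtualFrame. */
    interface Node {
        int execute(int[] frame);
    }

    /** Constant operand, comparable to LLVMI32LiteralNode. */
    static final class Literal implements Node {
        private final int value;
        Literal(int value) { this.value = value; }
        @Override public int execute(int[] frame) { return value; }
    }

    /** Reads a local variable from its frame slot. */
    static final class Read implements Node {
        private final int slot;
        Read(int slot) { this.slot = slot; }
        @Override public int execute(int[] frame) { return frame[slot]; }
    }

    /** Adds the results of its two children, comparable to LLVMI32AddNode. */
    static final class Add implements Node {
        private final Node left;
        private final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        @Override public int execute(int[] frame) {
            return left.execute(frame) + right.execute(frame);
        }
    }

    /** Evaluates its child and stores the result in a frame slot, comparable to LLVMWriteI32Node. */
    static final class Write implements Node {
        private final int slot;
        private final Node value;
        Write(int slot, Node value) { this.slot = slot; this.value = value; }
        @Override public int execute(int[] frame) {
            int result = value.execute(frame);
            frame[slot] = result;
            return result;
        }
    }

    static final int SLOT_I0 = 0; // frame slot for %i.0
    static final int SLOT_2 = 1;  // frame slot for %2

    /** Builds and runs the tree for "%2 = add nsw i32 %i.0, 1" and returns the frame. */
    static int[] runIncrement(int initial) {
        int[] frame = new int[2];
        frame[SLOT_I0] = initial;
        Node tree = new Write(SLOT_2, new Add(new Read(SLOT_I0), new Literal(1)));
        tree.execute(frame);
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runIncrement(41))); // [41, 42]
    }
}
```

Each node only knows its children and its frame slot, which is what later allows a compiler to inline the execute() calls of a fixed tree.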

  32. Example Program
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    Contains unstructured control flow

  33. Interpreter
    Basic Block Dispatch Node: Block0 (successor 1), Block1 (successors 1 and 2),
    Block2 (successor -1, exit)
    Interpreter implementation:
    int blockIndex = 0;
    while (blockIndex != -1)
      blockIndex = blocks[blockIndex].execute();
    An AST interpreter cannot represent goto statements, so a dispatch loop
    selects the next basic block by index
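
The dispatch loop can be exercised on the example function with a few lines of plain Java. This is a sketch under assumptions, not Sulong's implementation: Block, DispatchSketch, and the slot constants are hypothetical names, the frame is an int[], and a counter stands in for the call to processPacket().

```java
// Hypothetical plain-Java sketch; Block and the slot constants are illustrative names.
final class DispatchSketch {
    /** A basic block executes its instructions and returns the index of its successor. */
    interface Block {
        int execute(int[] frame);
    }

    static final int EXIT = -1;                 // successor index that ends the function
    static final int I0 = 0, T2 = 1, CALLS = 2; // frame slots: %i.0, %2, call counter

    /** Interprets processRequests() with a configurable loop bound and returns the final frame. */
    static int[] processRequests(int limit) {
        Block[] blocks = {
            // basic block 0: "br label %1"; also writes the phi input for the edge %0 -> %1
            f -> { f[I0] = 0; return 1; },
            // basic block 1: call, add, compare, conditional branch
            f -> {
                f[CALLS]++;                     // stands in for "call void @processPacket()"
                f[T2] = f[I0] + 1;
                if (f[T2] < limit) {
                    f[I0] = f[T2];              // phi input for the back edge %1 -> %1
                    return 1;
                }
                return 2;
            },
            // basic block 2: "ret void"
            f -> EXIT
        };
        int[] frame = new int[3];
        int blockIndex = 0;
        while (blockIndex != EXIT) {            // the dispatch loop from the slide
            blockIndex = blocks[blockIndex].execute(frame);
        }
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(processRequests(10000)[CALLS]); // 10000
    }
}
```

The phi node is not executed in block 1 itself; each predecessor writes the value that the phi would select, which matches the later slide on phis.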

  34. Interpreter: program execution
    [Diagram: Basic Block Dispatch Node with Block0, Block1, and Block2, as on
    the previous slide]
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

  38. Partial evaluation
    • Assume that nodes are constant
    • Assumption allows inlining of the execute() methods

  39. Compiler: unrolling of the interpreter loop
    int blockIndex = 0;
    while (blockIndex != -1)
      blockIndex = blocks[blockIndex].execute();

  42. Compiler: unrolling of the interpreter, 1st iteration
    ; (basic block 0)
      br label %1
    int blockIndex = 0;
    block0:
      blockIndex = blocks[0].execute();
    while (blockIndex != -1)
      blockIndex = blocks[blockIndex].execute();

  43. Compiler: unrolling of the interpreter, 1st iteration
    ; (basic block 0)
      br label %1
    int blockIndex = 0;
    block0:
      blockIndex = 1
    while (blockIndex != -1)
      blockIndex = blocks[blockIndex].execute();

  47. Compiler: unrolling of the interpreter, 2nd iteration
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    int blockIndex = 0;
    block0:
      blockIndex = 1
    block1:
      blockIndex = blocks[1].execute();
    while (blockIndex != -1)
      blockIndex = blocks[blockIndex].execute();

  49. Compiler: unrolling of the interpreter, 2nd iteration
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      blockIndex = blocks[1].execute();
      if blockIndex == 1:
        %i.0 = %2
    while (blockIndex != -1)
      blockIndex = blocks[blockIndex].execute();
    Nodes in predecessor blocks assign values used in phis

  51. Compiler: unrolling of the interpreter, 2nd iteration
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      processPacket()
      %2 = %i.0 + 1
      %3 = %2 < 10000
      if %3:
        blockIndex = 1
        %i.0 = %2
        while (blockIndex != -1)
          blockIndex = blocks[blockIndex].execute();
      else:
        // …

  52. Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      processPacket()
      %2 = %i.0 + 1
      %3 = %2 < 10000
      if %3:
        blockIndex = 1
        %i.0 = %2
        goto block1
      else:
        // …

  53. Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
          blockIndex = 1
          %i.0 = %2
          continue;
        else:
          // …
    Merging already expanded paths makes the compilation work!

  54. Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
          blockIndex = 1
          %i.0 = %2
          continue;
        else:
          blockIndex = 2
          while (blockIndex != -1)
            blockIndex = blocks[blockIndex].execute();

  57. Compiler: unrolling of the interpreter, 3rd iteration
    ; <label>:4 (basic block 2)
      ret void
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
          blockIndex = 1
          %i.0 = %2
          continue;
        else:
          blockIndex = 2
          block2:
            blockIndex = blocks[2].execute();
          while (blockIndex != -1)
            blockIndex = blocks[blockIndex].execute();

  58. Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
          blockIndex = 1
          %i.0 = %2
          continue;
        else:
          blockIndex = 2
          block2:
            blockIndex = -1
          while (blockIndex != -1)
            blockIndex = blocks[blockIndex].execute();

  61. Compiler: unrolling of the interpreter, 3rd iteration
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
          blockIndex = 1
          %i.0 = %2
          continue;
        else:
          blockIndex = 2
          block2:
            blockIndex = -1
            return
    Graal further optimizes the partially evaluated interpreter

  63. Deoptimization
    • Truffle nodes can implement speculative assumptions
    • A failed assumption requires discarding the machine code and
    continuing execution in the interpreter

  64. Node Rewriting in Truffle
    [Diagram: AST interpreter with uninitialized nodes → node specialization
    for profiling feedback → AST interpreter with specialized nodes →
    compilation using partial evaluation → compiled code.
    Node transitions legend: U = Uninitialized, I = Integer, D = Double,
    S = String, G = Generic]

  65. Node Rewriting in Truffle
    [Diagram: transfer back to AST interpreter → node specialization to
    update profiling feedback (integer nodes (I) rewritten to double nodes (D))
    → recompilation using partial evaluation]

  66. Speculative Optimization: Value Profiling
    public class LLVMI32LoadNode extends LLVMExpressionNode {
      final int expectedValue; // observed value
      @Specialization
      protected int doI32(Address addr) {
        int val = memory.getI32(addr);
        if (val == expectedValue) {
          return expectedValue;
        } else {
          CompilerDirectives.transferToInterpreter();
          replace(new LLVMI32LoadGenericNode());
          return val;
        }
      }
    }
    The compiler can assume that the loaded value is constant
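
The speculate-then-rewrite pattern can be simulated in plain Java. The sketch below is an assumption-laden illustration rather than Truffle code: LoadSite, Uninitialized, Speculating, and Generic are hypothetical names, an int[] stands in for memory, and a counter stands in for the deoptimization that transferToInterpreter() would trigger.

```java
// Hypothetical simulation of value profiling; none of these classes are Truffle or Sulong APIs.
final class ValueProfileSketch {
    interface LoadNode {
        int execute(int[] memory, int address);
    }

    /** Holds the node currently installed at one load site (stands in for the parent AST node). */
    static final class LoadSite {
        private LoadNode node = new Uninitialized();
        private int failedSpeculations; // counts how often the speculation was invalidated

        int load(int[] memory, int address) { return node.execute(memory, address); }
        int failedSpeculations() { return failedSpeculations; }
        boolean isGeneric() { return node instanceof Generic; }

        /** First execution: record the observed value and start speculating on it. */
        private final class Uninitialized implements LoadNode {
            @Override public int execute(int[] memory, int address) {
                int val = memory[address];
                node = new Speculating(val);
                return val;
            }
        }

        /** Returns the expected constant while the speculation holds, else rewrites to Generic. */
        private final class Speculating implements LoadNode {
            private final int expectedValue;
            Speculating(int expectedValue) { this.expectedValue = expectedValue; }
            @Override public int execute(int[] memory, int address) {
                int val = memory[address];
                if (val == expectedValue) {
                    return expectedValue;   // a compiler could treat this as a constant
                }
                failedSpeculations++;       // stands in for the transfer to the interpreter
                node = new Generic();       // stands in for replace(...)
                return val;
            }
        }

        /** No speculation: always returns what is in memory. */
        private static final class Generic implements LoadNode {
            @Override public int execute(int[] memory, int address) {
                return memory[address];
            }
        }
    }

    public static void main(String[] args) {
        int[] memory = {7};
        LoadSite site = new LoadSite();
        System.out.println(site.load(memory, 0)); // 7
        memory[0] = 9;
        System.out.println(site.load(memory, 0)); // 7 was profiled on the first load, so this rewrites and prints 9
        System.out.println(site.isGeneric());     // true
    }
}
```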

  70. Polymorphic Inline Caches for Indirect Calls
    int inc(int val) { return val + 1; }
    int dec(int val) { return val - 1; }
    int square(int val) { return val * val; }
    int (*func)(int);
    // ...
    result = func(4);
    Call-site node chain as targets are observed:
    • before the first call: uninit call
    • after calling inc: call inc → uninit call
    • after calling dec: call inc → call dec → uninit call
    • after calling square: indirect call
    Enables inlining of indirect calls
    Can be used to optimize virtual calls in C++
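
A call site with a small inline cache can be sketched in plain Java as follows. The names (InlineCacheSketch, CallSite, MAX_CACHED) and the cache limit of two entries are assumptions chosen to mirror the slides, where the third target turns the site into a plain indirect call; this is not Sulong's actual node implementation.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of a polymorphic inline cache; the limit of two entries is an assumption.
final class InlineCacheSketch {
    static final int MAX_CACHED = 2;

    static final IntUnaryOperator INC = val -> val + 1;
    static final IntUnaryOperator DEC = val -> val - 1;
    static final IntUnaryOperator SQUARE = val -> val * val;

    static final class CallSite {
        private final IntUnaryOperator[] cachedTargets = new IntUnaryOperator[MAX_CACHED];
        private int cachedCount;
        private boolean megamorphic; // true once the site has given up caching

        int call(IntUnaryOperator target, int arg) {
            if (!megamorphic) {
                for (int i = 0; i < cachedCount; i++) {
                    if (cachedTargets[i] == target) {
                        // Cache hit: the callee is known here, so a compiler could inline it.
                        return cachedTargets[i].applyAsInt(arg);
                    }
                }
                if (cachedCount < MAX_CACHED) {
                    cachedTargets[cachedCount++] = target; // "uninit call" becomes "call <target>"
                } else {
                    megamorphic = true;                    // chain is replaced by "indirect call"
                }
            }
            return target.applyAsInt(arg);                 // plain indirect call
        }

        int cachedCount() { return cachedCount; }
        boolean isMegamorphic() { return megamorphic; }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        System.out.println(site.call(INC, 4));    // 5
        System.out.println(site.call(DEC, 4));    // 3
        System.out.println(site.call(SQUARE, 4)); // 16
        System.out.println(site.isMegamorphic()); // true
    }
}
```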

  71. Memory safety

  72. Handling of Allocations in the User Program
    int *arr = malloc(4 * sizeof(int));
    Native Sulong: unmanaged allocations (sun.misc.Unsafe)
      unsafe.allocateMemory(16);
    https://github.com/graalvm/sulong
    Safe Sulong: managed allocations
      Address (offset=0, data) → I32Array (contents {0, 0, 0, 0})
    Rigger, et al. Sulong, and Thanks For All the Bugs: Finding Errors in C
    Programs by Abstracting from the Native Execution Model
    In Proceedings of ASPLOS 2018

  75. Allocations in the User Program
    Unmanaged allocations
    + Interoperability with native libraries
    + Fallback for programs that make assumptions about the memory layout
    - No safety guarantees
    Managed allocations
    + Sandboxed execution
    - Native interoperability

  76. Type Hierarchy for Managed Objects
    Automatic bounds, type, and null pointer checks!
    ManagedObject
      ManagedAddress (pointee: ManagedObject, pointerOffset: int)
      I32Array (values: int[])
      Function (functionIndex: int)
      I32 (value: int)
      Struct (values: Dictionary)

  77. Prevent Out-Of-Bounds Accesses
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …
    ManagedAddress (offset=20, data) → I32Array (contents {1, 2, 3})
    contents[20 / 4] → ArrayIndexOutOfBoundsException

  78. Prevent Use-After-Free Errors
    free(arr);
    arr[0] = …
    ManagedAddress (offset=0, data) → I32Array (contents=null)
    contents[0] → NullPointerException
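
Both checks fall out of ordinary Java array semantics, which the following plain-Java sketch tries to show. It is a simplified illustration, not Safe Sulong's code: ManagedMemorySketch, the malloc/free helpers, and tryStore are hypothetical, and only 4-byte-aligned, non-negative offsets into int arrays are modeled.

```java
// Hypothetical sketch of managed allocations; only aligned int accesses are modeled.
final class ManagedMemorySketch {
    /** Managed stand-in for a C int array; contents becomes null once the array is freed. */
    static final class I32Array {
        int[] contents;
        I32Array(int length) { contents = new int[length]; }
    }

    /** Pointer represented as a managed object plus a byte offset. */
    static final class ManagedAddress {
        final I32Array data;
        final int offset;
        ManagedAddress(I32Array data, int offset) { this.data = data; this.offset = offset; }

        /** Pointer arithmetic only changes the offset; nothing is checked yet. */
        ManagedAddress plus(int bytes) { return new ManagedAddress(data, offset + bytes); }

        /** The JVM checks the array reference and the index on every access. */
        void store(int value) { data.contents[offset / Integer.BYTES] = value; }
    }

    static ManagedAddress malloc(int bytes) {
        return new ManagedAddress(new I32Array(bytes / Integer.BYTES), 0);
    }

    static void free(ManagedAddress address) {
        address.data.contents = null;
    }

    /** Performs the store and reports which Java exception, if any, caught the error. */
    static String tryStore(ManagedAddress address, int value) {
        try {
            address.store(value);
            return "ok";
        } catch (ArrayIndexOutOfBoundsException | NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ManagedAddress arr = malloc(3 * Integer.BYTES);
        System.out.println(tryStore(arr.plus(5 * Integer.BYTES), 1)); // ArrayIndexOutOfBoundsException
        free(arr);
        System.out.println(tryStore(arr, 1));                          // NullPointerException
    }
}
```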

  79. Safe Semantics
    • We assign semantics to otherwise undefined behavior → Java semantics
    • Invalid memory accesses are not optimized away
    int a = 1, b = INT_MAX;
    int val = a + b; // undefined behavior (UB) in C
    printf("%d\n", val);
    Rigger, et al. Lenient Execution of C on a Java Virtual Machine: or:
    How I Learned to Stop Worrying and Run the Code. In Proceedings of
    ManLang 2017
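
For the addition above, Java semantics means two's-complement wrap-around of int arithmetic. A minimal Java illustration (the class and method names are made up for this example):

```java
// Illustrates Java's defined behavior for int overflow; names are illustrative only.
final class WrapAroundSketch {
    /** Java int addition wraps around silently instead of being undefined. */
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(1, Integer.MAX_VALUE)); // -2147483648
    }
}
```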

  80. Found Errors
    • 68 errors in small open-source projects
    • Some of these are not found by LLVM’s AddressSanitizer and Valgrind
    Example: out-of-bounds accesses to argv
    int main(int argc, char** argv) {
      printf("%d %s\n", argc, argv[5]);
    }

  81. Performance During Warmup (higher is better)
    [Plot: Meteor benchmark; y-axis: iterations per second (0 to 60); x-axis:
    second (1 to 37); series: Asan (Clang O0), Safe Sulong, Valgrind]
    We are working on On-stack Replacement to reduce the warmup time

  82. Evaluation: Peak Performance
    (lower is better)

  83. Introspection to increase the
    robustness of libraries

  84. Introspection Functions
    int *arr = malloc(sizeof(int) * 10);
    int *ptr = &(arr[4]);
    printf("%ld\n", size_left(ptr));  // prints 16
    printf("%ld\n", size_right(ptr)); // prints 24
    [Diagram: the allocation spans sizeof(int) * 10 bytes; size_left covers the
    bytes before ptr, size_right the bytes from ptr to the end]
    We also expose other metadata such as object types
    Rigger, et al. Introspection for C and its Applications to Library
    Robustness. In Programming 2018
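
With a managed pointer representation, both functions reduce to simple arithmetic on the offset and the array length. A minimal sketch, assuming a hypothetical ManagedPointer class that pairs an int[] with a byte offset (not the actual Safe Sulong classes):

```java
// Hypothetical sketch of size_left/size_right over a managed pointer.
final class IntrospectionSketch {
    /** Pointer into a managed int array, expressed as a byte offset from its start. */
    static final class ManagedPointer {
        final int[] pointee;
        final int byteOffset;
        ManagedPointer(int[] pointee, int byteOffset) {
            this.pointee = pointee;
            this.byteOffset = byteOffset;
        }
    }

    /** Number of bytes between the start of the object and the pointer. */
    static long sizeLeft(ManagedPointer p) {
        return p.byteOffset;
    }

    /** Number of bytes between the pointer and the end of the object. */
    static long sizeRight(ManagedPointer p) {
        return (long) p.pointee.length * Integer.BYTES - p.byteOffset;
    }

    public static void main(String[] args) {
        int[] arr = new int[10];                                        // malloc(sizeof(int) * 10)
        ManagedPointer ptr = new ManagedPointer(arr, 4 * Integer.BYTES); // &(arr[4])
        System.out.println(sizeLeft(ptr));  // 16
        System.out.println(sizeRight(ptr)); // 24
    }
}
```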

  85. Usage of Introspection
    • Improve availability of the system
    • Fix incomplete APIs
    • Improve bug-finding capabilities

  86. Improve availability of the system
    Make libc robust against missing NUL terminators
    size_t strlen(const char *str) {
      size_t len = 0;
      while (size_right(str) > 0 && *str != '\0') {
        len++;
        str++;
      }
      return len;
    }
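
The effect of the extra size_right check can be reproduced in Java, where a byte[] knows its own length. This is only an analogue of the C function above: BoundedStrlenSketch and its sizeRight helper are hypothetical names, and a byte[] plus start index stands in for the char pointer.

```java
// Hypothetical Java analogue of the bounded strlen; a byte[] plus index stands in for char*.
final class BoundedStrlenSketch {
    /** Bytes remaining between the position and the end of the buffer. */
    static int sizeRight(byte[] buffer, int position) {
        return buffer.length - position;
    }

    /** Counts bytes up to the terminator, but never reads past the end of the buffer. */
    static int strlen(byte[] buffer, int start) {
        int len = 0;
        int position = start;
        while (sizeRight(buffer, position) > 0 && buffer[position] != 0) {
            len++;
            position++;
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] terminated = {'a', 'b', 'c', 0, 'x'};
        byte[] unterminated = {'a', 'b', 'c'};
        System.out.println(strlen(terminated, 0));   // 3
        System.out.println(strlen(unterminated, 0)); // 3, instead of reading out of bounds
    }
}
```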

  87. Improve availability of the system
    • Case study on real-world bugs (Dnsmasq, Libxml2, GraphicsMagick)
    • Insight: most applications stay fully functional when the buffer
    overflow is mitigated
    • Drawback: Sulong still aborts execution for missing introspection
    checks.

  88. Fix incomplete APIs
    Make gets() robust against input that would overflow the buffer
    char* gets(char *str) {
      int size = size_right(str);
      return gets_s(str, size == -1 ? 0 : size);
    }

  89. Improve bug-finding capabilities
    Find "lurking" bugs
    char* gets_s(char *str, rsize_t n) {
      if (size_right(str) < n) {
        abort();
      } else {
        // original code
      }
    }

  90. Introspection is applicable to many other bug-finding tools
    • We also implemented it in
      • GCC’s Intel MPX based bounds checks instrumentation
      • LLVM’s Asan
      • SoftBound
    ssize_t _size_right(void* p){
      ssize_t upper_bounds = (ssize_t)__builtin___bnd_get_ptr_ubound(p);
      size_t size = (size_t) (upper_bounds + 1) - (size_t) p;
      return (ssize_t) size;
    }

  91. Challenges of Executing C on a JVM

  92. C Projects Consist of More Than C Code
    Inline assembly:
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
    public abstract static class LLVMAMD64RdtscReadNode
        extends LLVMExpressionNode {
      public long executeRdtsc() {
        return System.nanoTime();
      }
    }

  93. C Projects Consist of More Than C Code
    Instructions    In % of projects
    rdtsc           27.4%
    cpuid           25.4%
    mov             24.9%
                    21.8%
    lock xchg       14.2%
    …               …
    We determined the usage of inline assembly to prioritize the
    implementation in Sulong
    Rigger, et al. An Analysis of x86-64 Inline Assembly in C
    Programs. In VEE 2018

  94. C Projects Consist of More Than C Code
    GCC builtin:
    __builtin_clz(num);
    public abstract static class CountLeadingZeroesI64Node
        extends LLVMExpressionNode {
      public long executeRdtsc(long val) {
        return Long.numberOfLeadingZeros(val);
      }
    }
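
When mapping such builtins to Java, the operand width matters: GCC's __builtin_clz takes an unsigned int, while the long variants take wider types, and GCC leaves the result for 0 undefined. A small sketch of the corresponding Java library calls (ClzSketch, clz32, and clz64 are names invented for this example):

```java
// Shows the 32-bit and 64-bit Java counterparts of count-leading-zeros; names are illustrative.
final class ClzSketch {
    /** Counterpart for a 32-bit operand, as taken by __builtin_clz. */
    static int clz32(int value) {
        return Integer.numberOfLeadingZeros(value);
    }

    /** Counterpart for a 64-bit operand. */
    static int clz64(long value) {
        return Long.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        System.out.println(clz32(1));  // 31
        System.out.println(clz64(1L)); // 63
        System.out.println(clz32(0));  // 32: defined in Java, undefined for the GCC builtin
    }
}
```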

  95. GCC builtins
    We are currently investigating the usage of GCC builtins
    Builtins              In % of projects
    __builtin_expect      48.2%
    __builtin_clz         29.3%
    __builtin_bswap32     26.2%
    __builtin_constant_p  23.3%
    __builtin_alloca      20.3%
    …                     …

  96. Native Interoperability
    program.c calls process(object) in lib.so
    • Native Sulong: object is a native allocation
    • Safe Sulong: object is a Java object

  97. Native Interoperability
    Hybrid Sulong version
    • Native interoperability where needed
    • Memory safety where possible
    x86 Truffle Interpreter

  98. Running a complete libc
    Emulate the Linux syscall API
    public class LLVMAMD64SyscallGetcwdNode {
      @Specialization
      protected long doOp(LLVMAddress buf, long size) {
        String cwd = LLVMPath.getcwd();
        if (cwd.length() >= size) {
          return -LLVMAMD64Error.ERANGE;
        } else {
          LLVMString.strcpy(buf, cwd);
          return cwd.length() + 1;
        }
      }
    }
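
The return convention of the emulated syscall (negative errno on failure, number of bytes written including the terminator on success) can be shown in isolation. The sketch below is an assumption-based stand-in: GetcwdSketch is a hypothetical class, a byte[] replaces the native buffer, the working directory is passed in as a parameter, and ERANGE is hard-coded to its Linux value 34.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the getcwd emulation; a byte[] replaces the native buffer.
final class GetcwdSketch {
    static final int ERANGE = 34; // Linux errno value for "result too large"

    /** Copies cwd plus a NUL terminator into buf; returns bytes written or -ERANGE. */
    static long getcwd(String cwd, byte[] buf) {
        byte[] bytes = cwd.getBytes(StandardCharsets.UTF_8);
        if (bytes.length >= buf.length) {
            return -ERANGE; // no room for the path and its terminator
        }
        System.arraycopy(bytes, 0, buf, 0, bytes.length);
        buf[bytes.length] = 0;
        return bytes.length + 1;
    }

    public static void main(String[] args) {
        System.out.println(getcwd("/tmp", new byte[16])); // 5
        System.out.println(getcwd("/tmp", new byte[4]));  // -34
    }
}
```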

  99. Thanks for listening!
    https://github.com/graalvm/sulong/
    @RiggerManuel

  100. • Sulong executes LLVM IR on the JVM
    • Speculative optimizations
    • Native Sulong: allocates unmanaged
    memory
    • Safe Sulong: allocates Java objects to
    provide memory safety
    • Introspection exposes metadata to
    library writers
    • Sulong partially supports inline assembly
    and compiler builtins
    [Diagram: System Overview, as on slide 21]