Safe and Efficient Execution of LLVM-based Languages on the Java Virtual Machine

Talk held at the Swiss LLVM Compiler and Code Generation Social. The talk was recorded and is available at https://www.youtube.com/watch?v=SMth9PN2sF4.

Manuel Rigger

March 14, 2019

Transcript

  1. Safe and Efficient Execution
    of LLVM-based Languages on
    the Java Virtual Machine
    Swiss LLVM Compiler and Code Generation Social
    14 March 2019
    Manuel Rigger
    Advanced Software Technologies Lab (Zhendong Su)
    @RiggerManuel


  2. Bachelor and Master Thesis Topics
    2
    https://people.inf.ethz.ch/suz/mstopics.html


  3. Unsafe Languages are Popular
    Rank  Programming Language
    1     Java
    2     C
    3     C++
    3
    (TIOBE Index for November 2018)


  4. Unsafe Languages are Popular
    Rank  Programming Language
    1     Java
    2     C
    3     C++
    3
    C and C++ are considered unsafe
    (TIOBE Index for November 2018)


  5. C/C++ is Widespread
    4
    Important software is written
    in unsafe languages


  6. C/C++ is Responsible for Dangerous Vulnerabilities
    5
    Heartbleed
    Cloudbleed


  7. C/C++ is Responsible for Dangerous Vulnerabilities
    5
    Heartbleed
    Cloudbleed
    Caused by buffer overflows, the
    most dangerous vulnerability in
    unsafe languages


  8. What Makes a Language Unsafe?
    6
    Undefined Behavior (UB)
    “behavior, upon use of a nonportable or erroneous program construct or of
    erroneous data, for which this International Standard imposes no
    requirements”
    (C99 standard)


  9. Examples for Undefined Behavior
    Buffer overflow
    Use-after-free error
    Integer overflow
    7


  10. Buffer Overflows: Leaking Sensitive Data
    8
    long *arr = malloc(3 * sizeof(long));
    arr: secret


  11. Buffer Overflows: Leaking Sensitive Data
    9
    long *arr = malloc(3 * sizeof(long));
    long dest[4];
    memcpy(dest, arr, sizeof(dest));
    arr:
    dest:
    secret


  12. Buffer Overflows: Leaking Sensitive Data
    9
    long *arr = malloc(3 * sizeof(long));
    long dest[4];
    memcpy(dest, arr, sizeof(dest));
    arr:
    dest:
    secret
    UB


  13. Buffer Overflows: Leaking Sensitive Data
    10
    long *arr = malloc(3 * sizeof(long));
    long dest[4];
    memcpy(dest, arr, sizeof(dest));
    arr:
    dest:
    secret
    secret


  14. Buffer Overflows: Leaking Sensitive Data
    10
    long *arr = malloc(3 * sizeof(long));
    long dest[4];
    memcpy(dest, arr, sizeof(dest));
    arr:
    dest:
    Heartbleed and Cloudbleed
    were such vulnerabilities
    secret
    secret


  15. Buffer Overflows: Changing Control Flow
    11
    long *arr = malloc(3 * sizeof(long));
    arr (before): &func


  16. Buffer Overflows: Changing Control Flow
    12
    long *arr = malloc(3 * sizeof(long));
    arr[4] = 0xfe…;
    arr (before): &func
    arr (after): 0xfe...


  17. Buffer Overflows: Changing Control Flow
    12
    long *arr = malloc(3 * sizeof(long));
    arr[4] = 0xfe…;
    arr (before): &func
    arr (after): 0xfe...
    UB


  18. Buffer Overflows: Changing Control Flow
    12
    long *arr = malloc(3 * sizeof(long));
    arr[4] = 0xfe…;
    arr (before): &func
    arr (after): 0xfe...
    UB
    Allows attackers to change the program’s control flow


  19. Use-after-free Error
    13
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …;
    UB


  20. Use-after-free Error
    13
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …;
    UB
    Can overwrite another object if the
    memory was reallocated


  21. Integer Overflow
    14
    int a = 1, b = INT_MAX;
    int val = a + b;
    UB


  22. Integer Overflow
    14
    int a = 1, b = INT_MAX;
    int val = a + b;
    Can result in inconsistent/surprising
    behavior if UB is “optimized away”
    UB


  23. Integer Overflow
    15
    void pause() {
        int a = 0;
        // run until overflow
        while (a < a + 1) {
            a++;
        }
    }


  24. Integer Overflow
    15
    void pause() {
        int a = 0;
        // run until overflow
        while (a < a + 1) {
            a++;
        }
    }
    What’s the compilation output of Clang/GCC?
    1. The function works as expected by the programmer
    2. The function body is optimized away
    3. The function results in an endless loop
    4. It depends on the optimization level


  25. Integer Overflow
    16
    void pause() {
        int a = 0;
        // run until overflow
        while (a < a + 1) {
            a++;
        }
    }


  26. Integer Overflow
    16
    void pause() {
        int a = 0;
        // run until overflow
        while (a < a + 1) {
            a++;
        }
    }
    -O0
        mov dword ptr [rsp - 4], 0
        jmp loop_header
    loop_body:
        add dword ptr [rsp - 4], 1
    loop_header:
        mov eax, dword ptr [rsp - 4]
        mov ecx, dword ptr [rsp - 4]
        add ecx, 1
        cmp eax, ecx
        jl loop_body
        ret


  27. Integer Overflow
    16
    void pause() {
        int a = 0;
        // run until overflow
        while (a < a + 1) {
            a++;
        }
    }
    -O3
    loop:
        jmp loop
    -O0
        mov dword ptr [rsp - 4], 0
        jmp loop_header
    loop_body:
        add dword ptr [rsp - 4], 1
    loop_header:
        mov eax, dword ptr [rsp - 4]
        mov ecx, dword ptr [rsp - 4]
        add ecx, 1
        cmp eax, ecx
        jl loop_body
        ret


  28. Ticking Timebombs
    17


  29. Ticking Timebombs
    17
    A future compiler optimization
    might exploit additional UB


  30. Ticking Timebombs
    17
    https://blog.regehr.org/
    A future compiler optimization
    might exploit additional UB


  31. Goal of my PhD
    18
    Tackle UB by
    safely and efficiently executing
    unsafe languages on the JVM


  32. Goal of my PhD
    19
    Tackle UB by
    safely and efficiently executing
    unsafe languages on the JVM


  33. Goal of my PhD
    19
    Tackle UB by
    safely and efficiently executing
    unsafe languages on the JVM
    Well-defined semantics even for errors
    and corner cases


  34. Idea
    20


  35. 21
    Lenient C
    Safe Sulong and its Bug-finding Mode
    Introspection


  36. 22
    Lenient C
    Safe Sulong and its
    Bug-finding Mode
    Automatic approaches
    Introspection


  37. 23
    Lenient C
    Safe Sulong and its
    Bug-finding Mode
    Introspection
    Terminate the program / Continue execution


  38. 24
    Lenient C
    Safe Sulong and its
    Bug-finding Mode
    Introspection: manual approach


  39. 25
    Safe Sulong and its Bug-finding Mode


  40. Existing Approaches
    26
    Instrumentation-based bug-finding tools
    Symbolic execution
    Safe languages
    Hardware security
    Static analysis
    Attacker mitigation


  41. Existing Approaches
    27
    Instrumentation-based bug-finding tools
    Symbolic execution
    Safe languages
    Hardware security
    Static analysis
    Attacker mitigation


  42. Existing Approaches
    27
    Instrumentation-based bug-finding tools
    • LLVM’s AddressSanitizer (Serebryany et al. 2012)
    • Memcheck (Nethercote et al. 2007)
    • SoftBound+CETS (Nagarakatte et al. 2009, 2010)
    • Dr. Memory (Bruening et al. 2011)
    Symbolic execution
    Safe languages
    Hardware security
    Static analysis
    Attacker mitigation


  43. State of the Art: Instrumentation-based Tools
    28
    C → Clang/GCC → a.out
    ./a.out
    Hello world!


  44. State of the Art: Instrumentation-based Tools
    28
    C → Clang/GCC → a.out
    Compile-time instrumentation
    • AddressSanitizer
    • SoftBound+CETS
    ./a.out
    Hello world!


  45. State of the Art: Instrumentation-based Tools
    28
    C → Clang/GCC → a.out
    Compile-time instrumentation
    • AddressSanitizer
    • SoftBound+CETS
    ./a.out
    Run-time instrumentation
    • Memcheck
    • Dr. Memory
    Hello world!


  46. Conundrum: Finding Bugs vs. Performance
    29
    C → Clang/GCC → a.out
    ./a.out
    Hello world!


  47. Conundrum: Finding Bugs vs. Performance
    29
    C → Clang/GCC → a.out
    ./a.out
    Hello world!
    Static compilers: optimize code based
    on Undefined Behavior
    Bug-finding tools: find bugs assuming
    that violations are visible side effects
    (Wang et al. 2012, D'Silva 2015)


  48. Conundrum: Finding Bugs vs. Performance
    30
    To find all bugs, developers need to
    disable compiler optimizations


  49. Lack of Abstraction
    31
    C → Clang/GCC → a.out
    ./a.out
    Hello world!
    Omitted or forgotten checks result in overlooked bugs


  50. Map Data Structures and Operations to Java
    32
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …


  51. Map Data Structures and Operations to Java
    32
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …
    Map to Java Code


  52. Map Data Structures and Operations to Java
    32
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …
    Map to Java Code
    long[] arr = new long[3];
    arr[4] = …


  53. Map Data Structures and Operations to Java
    32
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …
    Map to Java Code
    long[] arr = new long[3];
    arr[4] = …
    The semantics of an out-of-bounds access are well specified


  54. Map Data Structures and Operations to Java
    32
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …
    Map to Java Code
    long[] arr = new long[3];
    arr[4] = …
    ArrayIndexOutOfBoundsException
    The semantics of an out-of-bounds access are well specified


  55. Map Data Structures and Operations to Java
    32
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …
    Map to Java Code
    long[] arr = new long[3];
    arr[4] = …
    ArrayIndexOutOfBoundsException
    The semantics of an out-of-bounds access are well specified
    Automatic bounds checks that cannot be optimized away
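The mapping on this slide can be tried out directly in Java. The following is a minimal sketch, assuming a hypothetical demo class `OutOfBoundsDemo` that is not part of Safe Sulong; it only illustrates that the JVM checks every array store and reports a violation as an exception.

```java
// Hypothetical demo class; the names are illustrative and not taken from Safe Sulong.
final class OutOfBoundsDemo {

    // Mirrors `long *arr = malloc(3 * sizeof(long)); arr[4] = value;` on a Java array.
    static String storeAtIndexFour(long value) {
        long[] arr = new long[3];
        int index = 4;
        try {
            arr[index] = value; // bounds-checked by the JVM
            return "no error";
        } catch (ArrayIndexOutOfBoundsException e) {
            // The Java language specifies this exception for out-of-bounds accesses.
            return "ArrayIndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(storeAtIndexFour(42L));
    }
}
```

Because the exception is an observable effect of the program, a JIT compiler may only remove the check when it can prove that the index is in bounds.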


  56. 33
    (Rigger et al. 2018)


  57. 34
    速 龙


  58. 34
    速 龙
    Fast/rapid


  59. 34
    速 龙
    Dragon
    Fast/rapid


  60. 34
    速 龙
    Velocisaurus
    Dragon
    Fast/rapid


  61. Execution of LLVM IR
    35
    Front ends: Clang (C, C++), GCC (Fortran), other LLVM front ends, ...
    Front ends → LLVM IR → Safe Execution Platform
    [Languages other than C?]


  62. Execution of LLVM IR
    35
    Front ends: Clang (C, C++), GCC (Fortran), other LLVM front ends, ...
    Front ends → LLVM IR → Safe Execution Platform
    (Lattner et al. 2004)
    [Languages other than C?]


  63. Execution of LLVM IR
    35
    Front ends: Clang (C, C++), GCC (Fortran), other LLVM front ends, ...
    Front ends → LLVM IR → Safe Execution Platform
    (Lattner et al. 2004)
    We disable compiler optimizations of the front ends
    [Languages other than C?]


  64. Execution of LLVM IR
    35
    Front ends: Clang (C, C++), GCC (Fortran), other LLVM front ends, ...
    Front ends → LLVM IR → Safe Execution Platform
    (Lattner et al. 2004)
    We disable compiler optimizations of the front ends
    [Languages other than C?]


  65. Execution of LLVM IR
    35
    Front ends: Clang (C, C++), GCC (Fortran), other LLVM front ends, ...
    Front ends → LLVM IR → Safe Execution Platform
    (Lattner et al. 2004)
    Targeting LLVM IR allows executing multiple unsafe languages
    [Languages other than C?]


  66. Execution of LLVM IR
    35
    Front ends: Clang (C, C++), GCC (Fortran), other LLVM front ends, ...
    Front ends → LLVM IR → Safe Execution Platform
    (Lattner et al. 2004)
    Targeting LLVM IR allows executing multiple unsafe languages
    [Languages other than C?]


  67. Execution of LLVM IR
    36
    Diagram: LLVM IR as input to a stack of LLVM IR Interpreter, Truffle, Graal, and JVM
    [How does the compilation work?]
    [Array bounds check elimination]
    [Optimizations Overview]
    [Completeness vs. Soundness]
    [Languages other than C?]


  68. Execution of LLVM IR
    36
    Diagram: LLVM IR as input to a stack of LLVM IR Interpreter, Truffle, Graal, and JVM
    [How does the compilation work?]
    [Array bounds check elimination]
    [Optimizations Overview]
    [Completeness vs. Soundness]
    [Languages other than C?]


  69. Execution of LLVM IR
    36
    Diagram: LLVM IR as input to a stack of LLVM IR Interpreter, Truffle, Graal, and JVM
    (Würthinger et al. 2012, 2017)
    [How does the compilation work?]
    [Array bounds check elimination]
    [Optimizations Overview]
    [Completeness vs. Soundness]
    [Languages other than C?]


  70. Execution of LLVM IR
    36
    Diagram: LLVM IR as input to a stack of LLVM IR Interpreter, Truffle, Graal, and JVM
    (Würthinger et al. 2012, 2017)
    Using Truffle and Graal, we can minimize the instrumentation overhead
    [How does the compilation work?]
    [Array bounds check elimination]
    [Optimizations Overview]
    [Completeness vs. Soundness]
    [Languages other than C?]


  71. Execution of LLVM IR
    36
    Diagram: LLVM IR as input to a stack of LLVM IR Interpreter, Truffle, Graal, and JVM
    (Würthinger et al. 2012, 2017)
    [How does the compilation work?]
    [Array bounds check elimination]
    [Optimizations Overview]
    [Completeness vs. Soundness]
    [Languages other than C?]


  72. Execution of LLVM IR
    36
    Diagram: LLVM IR as input to a stack of LLVM IR Interpreter, Truffle, Graal, and JVM
    (Würthinger et al. 2012, 2017)
    Safe Sulong can rely on the underlying JVM
    • Automatic checks
    • Safe optimizations
    • Abstraction from the underlying machine and OS
    [How does the compilation work?]
    [Array bounds check elimination]
    [Optimizations Overview]
    [Completeness vs. Soundness]
    [Languages other than C?]


  73. Prevent Out-Of-Bounds Accesses
    37
    long *arr = malloc(3 * sizeof(long));
    Address (offset = 0, data) → I64Array (contents) → {0, 0, 0}
    [How do we know the type?]
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
    [Array bounds check elimination]
    [Strict-aliasing rule]


  74. Prevent Out-Of-Bounds Accesses
    38
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …
    Address (offset = 4, data) → I64Array (contents) → {0, 0, 0}
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
    [Array bounds check elimination]
    [Strict-aliasing rule]


  75. Prevent Out-Of-Bounds Accesses
    38
    long *arr = malloc(3 * sizeof(long));
    arr[4] = …
    Address (offset = 4, data) → I64Array (contents) → {0, 0, 0}
    contents[4] → ArrayIndexOutOfBoundsException
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
    [Array bounds check elimination]
    [Strict-aliasing rule]
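The object diagram above can be approximated in plain Java. This is a simplified sketch under stated assumptions: the class names `ManagedAddress`, `ManagedI64Array`, and `ManagedPointerDemo` are hypothetical, the offset is counted in elements rather than bytes, and the real Safe Sulong classes may look different.

```java
// Hypothetical, simplified object model that follows the slide's diagram.
final class ManagedI64Array {
    final long[] contents;

    ManagedI64Array(int length) {
        this.contents = new long[length];
    }
}

final class ManagedAddress {
    final ManagedI64Array data; // the pointee
    final int offset;           // element index (an assumption of this sketch)

    ManagedAddress(ManagedI64Array data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    // Pointer arithmetic only creates a new address object; no memory is touched.
    ManagedAddress plus(int elements) {
        return new ManagedAddress(data, offset + elements);
    }

    // The JVM bounds-checks this array store on every execution.
    void store(long value) {
        data.contents[offset] = value;
    }
}

final class ManagedPointerDemo {
    // Stand-in for `malloc(count * sizeof(long))`.
    static ManagedAddress mallocLongs(int count) {
        return new ManagedAddress(new ManagedI64Array(count), 0);
    }

    // Returns true if storing through base[index] raises the JVM's bounds error.
    static boolean storeIsRejected(ManagedAddress base, int index) {
        try {
            base.plus(index).store(1L);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ManagedAddress arr = mallocLongs(3);
        System.out.println("arr[2] rejected: " + storeIsRejected(arr, 2));
        System.out.println("arr[4] rejected: " + storeIsRejected(arr, 4));
    }
}
```

Computing an out-of-bounds address is harmless in this model; only the access through it fails.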


  76. Prevent Use-after-Free Errors
    39
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    Address (offset = 0, data) → I64Array (contents) → {0, 0, 0}
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
    [Strict-aliasing rule]


  77. Prevent Use-after-Free Errors
    40
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    Address (offset = 0, data) → I64Array (contents = null)
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
    [Strict-aliasing rule]


  78. Prevent Use-after-Free Errors
    41
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …
    Address (offset = 0, data) → I64Array (contents = null)
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
    [Strict-aliasing rule]


  79. Prevent Use-after-Free Errors
    42
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …
    Address (offset = 0, data) → I64Array (contents = null)
    contents[0] → NullPointerException
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
    [Strict-aliasing rule]
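The use-after-free detection shown on these slides can be sketched as follows. The classes `FreeableI64Array` and `UseAfterFreeDemo` are hypothetical names chosen for illustration; the only idea taken from the slides is that `free` sets `contents` to `null`, so any later access fails with a `NullPointerException`.

```java
// Hypothetical sketch; not the actual Safe Sulong implementation.
final class FreeableI64Array {
    long[] contents; // set to null once the object has been freed

    FreeableI64Array(int length) {
        this.contents = new long[length];
    }
}

final class UseAfterFreeDemo {
    // Stand-in for `malloc(count * sizeof(long))`.
    static FreeableI64Array mallocLongs(int count) {
        return new FreeableI64Array(count);
    }

    // Stand-in for `free(ptr)`: drop the backing storage.
    static void free(FreeableI64Array object) {
        object.contents = null;
    }

    // Mirrors `free(arr); arr[0] = 1;` and reports what happened.
    static String storeAfterFree() {
        FreeableI64Array arr = mallocLongs(3);
        free(arr);
        try {
            arr.contents[0] = 1L;
            return "no error";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(storeAfterFree());
    }
}
```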


  80. Prevent Integer Overflows
    43
    int a = 1, b = INT_MAX;
    int val = a + b;
    Math.addExact(a, b);
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]


  81. Prevent Integer Overflows
    43
    int a = 1, b = INT_MAX;
    int val = a + b;
    Math.addExact(a, b);
    ArithmeticException
    [What other errors can Safe Sulong detect?]
    [Pointer to an integer?]
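A short sketch of the two Java behaviors that matter here, using a hypothetical demo class `OverflowDemo`: plain `int` addition is defined to wrap around, while `Math.addExact` throws an `ArithmeticException` on overflow.

```java
// Hypothetical demo class; only Math.addExact and the int semantics are standard Java.
final class OverflowDemo {

    // Plain Java addition wraps around in two's complement; this is defined behavior.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Math.addExact reports an overflow as an ArithmeticException.
    static boolean additionOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(1, Integer.MAX_VALUE) == Integer.MIN_VALUE);
        System.out.println(additionOverflows(1, Integer.MAX_VALUE));
    }
}
```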


  82. Safe Optimizations
    44
    ArrayIndexOutOfBoundsException
    NullPointerException
    ArithmeticException
    Exceptions are visible side effects
    and cannot be optimized away


  83. Example Program
    45
    void processRequests() {
        int i = 0;
        do {
            processPacket();
            i++;
        } while (i < 10000);
    }


  84. Example Program
    45
    void processRequests() {
        int i = 0;
        do {
            processPacket();
            i++;
        } while (i < 10000);
    }
    Clang
    LLVM IR
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }


  85. Implementation of Operations
    46
    LLVM IR
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }


  86. Implementation of Operations
    46
    LLVM IR
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    Executable Abstract Syntax Tree
    write %2
      add
        read %i.0
        1


  87. Implementation of Operations
    47
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1


  88. Implementation of Operations
    47
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    class LLVMI32LiteralNode extends LLVMExpressionNode {
        final int literal;
        public LLVMI32LiteralNode(int literal) {
            this.literal = literal;
        }
        @Override
        public int executeI32(VirtualFrame frame) {
            return literal;
        }
    }


  89. Implementation of Operations
    47
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    class LLVMI32LiteralNode extends LLVMExpressionNode {
        final int literal;
        public LLVMI32LiteralNode(int literal) {
            this.literal = literal;
        }
        @Override
        public int executeI32(VirtualFrame frame) {
            return literal;
        }
    }


  90. Implementation of Operations
    47
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    class LLVMI32LiteralNode extends LLVMExpressionNode {
        final int literal;
        public LLVMI32LiteralNode(int literal) {
            this.literal = literal;
        }
        @Override
        public int executeI32(VirtualFrame frame) {
            return literal;
        }
    }
    Nodes return their result in an execute() method
    (Würthinger et al. 2012)


  91. Implementation of Operations
    48
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1


  92. Implementation of Operations
    48
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
    class LLVMI32AddNode extends LLVMExpressionNode {
        @Specialization
        protected int executeI32(int left, int right) {
            return left + right;
        }
    }


  93. Implementation of Operations
    48
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
    class LLVMI32AddNode extends LLVMExpressionNode {
        @Specialization
        protected int executeI32(int left, int right) {
            return left + right;
        }
    }


  94. Implementation of Operations
    48
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    @NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
    class LLVMI32AddNode extends LLVMExpressionNode {
        @Specialization
        protected int executeI32(int left, int right) {
            return left + right;
        }
    }
    A DSL allows a declarative style of specifying and executing nodes
    (Humer et al. 2015)


  95. Implementation of Operations
    49
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1


  96. Implementation of Operations
    49
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    @NodeChild("valueNode")
    class LLVMWriteI32Node extends LLVMExpressionNode {
        final FrameSlot slot;
        public LLVMWriteI32Node(FrameSlot slot) {
            this.slot = slot;
        }
        @Specialization
        public void writeI32(VirtualFrame frame, int value) {
            frame.setInt(slot, value);
        }
    }


  97. Implementation of Operations
    49
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    @NodeChild("valueNode")
    class LLVMWriteI32Node extends LLVMExpressionNode {
        final FrameSlot slot;
        public LLVMWriteI32Node(FrameSlot slot) {
            this.slot = slot;
        }
        @Specialization
        public void writeI32(VirtualFrame frame, int value) {
            frame.setInt(slot, value);
        }
    }


  98. Implementation of Operations
    49
    Abstract Syntax Tree
    write %2
      add
        read %i.0
        1
    Executable AST node
    @NodeChild("valueNode")
    class LLVMWriteI32Node extends LLVMExpressionNode {
        final FrameSlot slot;
        public LLVMWriteI32Node(FrameSlot slot) {
            this.slot = slot;
        }
        @Specialization
        public void writeI32(VirtualFrame frame, int value) {
            frame.setInt(slot, value);
        }
    }
    Local variables are represented by an array-like VirtualFrame object
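To see how the literal, read, add, and write nodes fit together without the Truffle dependency, here is a plain-Java sketch of the tree for `%2 = add nsw i32 %i.0, 1`. All class names (`DemoExprNode` and its subclasses, `AstInterpreterDemo`) are hypothetical stand-ins, and an `int[]` takes the place of the `VirtualFrame`; the Truffle DSL, specializations, and the overflow handling of the real nodes are deliberately left out.

```java
// Plain-Java stand-ins for Truffle's node classes; all names here are hypothetical.
abstract class DemoExprNode {
    // Each node computes its result from the frame that holds the local variables.
    abstract int executeI32(int[] frame);
}

final class DemoLiteralNode extends DemoExprNode {
    private final int literal;

    DemoLiteralNode(int literal) {
        this.literal = literal;
    }

    @Override
    int executeI32(int[] frame) {
        return literal;
    }
}

final class DemoReadNode extends DemoExprNode {
    private final int slot;

    DemoReadNode(int slot) {
        this.slot = slot;
    }

    @Override
    int executeI32(int[] frame) {
        return frame[slot];
    }
}

final class DemoAddNode extends DemoExprNode {
    private final DemoExprNode left;
    private final DemoExprNode right;

    DemoAddNode(DemoExprNode left, DemoExprNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    int executeI32(int[] frame) {
        // Children are executed first, then their results are combined.
        return left.executeI32(frame) + right.executeI32(frame);
    }
}

final class DemoWriteNode extends DemoExprNode {
    private final int slot;
    private final DemoExprNode valueNode;

    DemoWriteNode(int slot, DemoExprNode valueNode) {
        this.slot = slot;
        this.valueNode = valueNode;
    }

    @Override
    int executeI32(int[] frame) {
        int value = valueNode.executeI32(frame);
        frame[slot] = value;
        return value;
    }
}

final class AstInterpreterDemo {
    // Builds and runs the tree write(%2, add(read(%i.0), 1)).
    static int[] run(int initialI) {
        int[] frame = new int[2]; // slot 0 holds %i.0, slot 1 holds %2
        frame[0] = initialI;
        DemoExprNode tree =
            new DemoWriteNode(1, new DemoAddNode(new DemoReadNode(0), new DemoLiteralNode(1)));
        tree.executeI32(frame);
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(run(41)[1]); // value stored in the slot for %2
    }
}
```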


  99. Implementation of Basic Blocks
    50
    LLVM IR
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }


  100. Implementation of Basic Blocks
    50
    LLVM IR
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    Executable Abstract Syntax Tree
    Block1


  101. Example Program
    51
    LLVM IR
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }


  102. Example Program
    51
    LLVM IR
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    An AST interpreter cannot represent goto statements


  103. Interpreter
    52
    Basic Block Dispatch Node
    Block0 (successor: 1), Block1 (successors: 1 or 2), Block2 (successor: -1)
    int blockIndex = 0;
    while (blockIndex != -1)
        blockIndex = blocks[blockIndex].execute();
    Interpreter implementation
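The dispatch loop can be made concrete for `processRequests`. The sketch below is a plain-Java approximation with hypothetical names (`DemoBasicBlock`, `BlockDispatchDemo`): each block returns the index of its successor, -1 ends the function, and a counter in the frame stands in for the call to `processPacket()`.

```java
// Hypothetical sketch of a basic block dispatch loop; not the Safe Sulong code.
interface DemoBasicBlock {
    // Executes the block and returns the index of the successor block, or -1 to return.
    int execute(int[] frame);
}

final class BlockDispatchDemo {

    // Runs the three blocks of processRequests with a configurable loop limit.
    static int run(int limit) {
        int[] frame = new int[2]; // slot 0 holds i, slot 1 counts processed packets
        DemoBasicBlock[] blocks = {
            // Block0: entry block, initializes i and jumps to the loop body.
            f -> { f[0] = 0; return 1; },
            // Block1: loop body; "processes a packet", increments i, and branches.
            f -> { f[1]++; f[0]++; return f[0] < limit ? 1 : 2; },
            // Block2: return from the function.
            f -> -1
        };

        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = blocks[blockIndex].execute(frame);
        }
        return frame[1];
    }

    public static void main(String[] args) {
        System.out.println(run(10000)); // prints 10000
    }
}
```

Since the loop body is a do-while in the C source, the sketch processes at least one packet even for a limit of 0.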


  104. Interpreter
    53
    Basic Block Dispatch Node
    Block0 (successor: 1), Block1 (successors: 1 or 2), Block2 (successor: -1)
    Program execution
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }


  105. Interpreter
    54
    Basic Block Dispatch Node
    Block0 (successor: 1), Block1 (successors: 1 or 2), Block2 (successor: -1)
    Program execution
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }


  106. Interpreter
    55
    Basic Block Dispatch Node
    Block0 (successor: 1), Block1 (successors: 1 or 2), Block2 (successor: -1)
    Program execution
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i.0 = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i.0, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    (Rigger et al. 2016 VMIL)


  107. Compiler
    56
    Basic Block Dispatch Node
    Block0 (successor: 1), Block1 (successors: 1 or 2), Block2 (successor: -1)
    Graal
    Partially evaluated interpreter (pseudo code)
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                blockIndex = 2
                break;
    block2:
        blockIndex = -1
        return


  108. Compiler
    56
    Basic Block Dispatch Node
    Block0 (successor: 1), Block1 (successors: 1 or 2), Block2 (successor: -1)
    Graal
    Partially evaluated interpreter (pseudo code)
    int blockIndex = 0;
    block0:
        blockIndex = 1
        %i.0 = 0
    block1:
        while (true):
            processPacket()
            %2 = %i.0 + 1
            %3 = %2 < 10000
            if %3:
                blockIndex = 1
                %i.0 = %2
                continue;
            else:
                blockIndex = 2
                break;
    block2:
        blockIndex = -1
        return
    Graal further optimizes the partially evaluated interpreter


  109. Evaluation Hypotheses
    57
    • Effectiveness: Safe Sulong detects bugs that are overlooked by other tools
    • Performance: Safe Sulong’s performance overhead is “reasonable”


  110. Effectiveness: Errors in GitHub Projects
    58
    http://ssw.jku.at/General/Staff/ManuelRigger/ASPLOS18-SafeSulong-Bugs.csv
    68 errors in (small) open-source projects


  111. Effectiveness: Errors in GitHub Projects
    59
    • Valgrind detected half of the errors
    • 8 errors not found by LLVM’s AddressSanitizer (and Valgrind)
    • Compiler optimizations (ASan -O3) prevented the detection of 4 additional bugs
    [What are the other errors?]
    [Completeness vs. Soundness]
    [Comparison tools]


  112. Effectiveness: Errors in GitHub Projects
    60
    int main(int argc, char** argv) {
        printf("%d %s\n", argc, argv[5]);
    }
    Out-of-bounds accesses to argv are not instrumented by ASan
    [What are the other errors?]
    [Comparison tools]


  113. Effectiveness: Errors in GitHub Projects
    61
    https://github.com/google/sanitizers/issues/762


  114. Effectiveness: Errors in GitHub Projects
    • 8 errors not found by LLVM’s AddressSanitizer and Valgrind
    62
    int main(int argc, char** argv) {
        printf("%d %s\n", argc, argv[5]);
    }
    In Safe Sulong, instrumentation cannot be omitted by design
    [What are the other errors?]
    [Completeness vs. Soundness]
    [Comparison tools]


  115. Peak Performance
    63
    lower is better


  116. Peak Performance
    63
    lower is better
    Safe Sulong’s performance is mostly between
    Clang -O0 and Clang -O3, and mostly faster
    than ASan -O0


  117. Warmup Performance
    64
    Chart: Meteor benchmark, iterations per second (y-axis, 0 to 60) per second of run time (x-axis, 1 to 37)
    Series: ASan (Clang O0), Safe Sulong, Valgrind


  118. Existing Approaches
    65
    Instrumentation-based bug-finding tools
    Symbolic execution
    Safe languages
    Hardware security
    Static analysis
    Attacker mitigation
    Safe Sulong improves upon aspects of existing bug-finding tools
    • Safe optimizations
    • Abstraction from the native execution model


  119. Existing Approaches
    66
    Instrumentation-based bug-finding tools
    Symbolic execution
    Safe languages
    Hardware security
    Static analysis
    Attacker mitigation
    Safe Sulong leverages a safe implementation language for its bug-finding capabilities


  120. Limitations/Selected Threats to Validity
    • Lack of support for binary libraries
    • Generalizability of the benchmark results
    • Relied on a custom libc for evaluation
    • Lacks common low-level features
    67


  121. 68
    Lenient C


  122. Defined Behavior in C
    69
    C11
    Implementing the semantics
    described in the standard is (often)
    relatively straightforward
    int arr[3];
    int result = &arr[0] < &arr[2];


  123. Relational Comparison of Pointers
    70
    a: Address (offset, pointee) < b: Address (offset, pointee)
    integer_rep(a) < integer_rep(b)


  124. Integer Representation: Safe Sulong
    71
    integer_rep(a) = a.offset
    int arr[3];
    int result = &arr[0] < &arr[2];
    Can anyone see where our
    implementation could break programs?
    {0, 0, 0}
    Address
    offset = 2
    data I64Array
    contents
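
The offset-only representation can be sketched in Java as follows. This is a minimal model under stated assumptions, not Safe Sulong's actual code: the class names `OffsetPointer` and `OffsetPointerDemo` and the use of a plain `long[]` as the pointee are hypothetical. It shows that comparisons within one object behave as expected, while pointers into different objects are compared by offset alone.

```java
// Hypothetical model of a managed pointer: a reference to the pointee plus an offset.
final class OffsetPointer {
    final long[] pointee; // stands in for the I64Array contents
    final long offset;    // offset into the pointee, as on the slide (offset = 2)

    OffsetPointer(long[] pointee, long offset) {
        this.pointee = pointee;
        this.offset = offset;
    }

    // integer_rep(a) = a.offset, as stated on the slide
    long integerRep() {
        return offset;
    }

    // a < b is evaluated as integer_rep(a) < integer_rep(b)
    static boolean lessThan(OffsetPointer a, OffsetPointer b) {
        return a.integerRep() < b.integerRep();
    }
}

final class OffsetPointerDemo {
    public static void main(String[] args) {
        long[] arr = new long[3];
        OffsetPointer first = new OffsetPointer(arr, 0);
        OffsetPointer last = new OffsetPointer(arr, 2);
        // Same object: the offsets order the pointers correctly.
        System.out.println(OffsetPointer.lessThan(first, last)); // true

        long[] other = new long[3];
        OffsetPointer otherFirst = new OffsetPointer(other, 0);
        // Different objects, same offset: neither pointer compares less than the other.
        System.out.println(OffsetPointer.lessThan(first, otherFirst)); // false
        System.out.println(OffsetPointer.lessThan(otherFirst, first)); // false
    }
}
```

Code that sorts or partitions pointers to separately allocated objects would therefore see them as unordered under this representation.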

  125. Code Relies on Undefined Behavior
    [Do you know code that uses] relational comparison (with <, >, <=, or >=)
    of two pointers to separately allocated objects
    (of compatible object types)?
    Response                      % of Respondents
    Yes                           33%
    Yes, but it shouldn’t         12%
    No, but there might well be   29%
    No, that would be crazy       16%
    Don’t know                    8%
    (Memarian et al. 2016)

  126. Problem
    73
    Programmers often rely on
    Undefined Behavior being defined
    C11

  128. Idea
    74
    Goal: Continue execution in the
    presence of UB and make common
    otherwise undefined patterns work

  129. Lenient C
    Programs written in C
    Valid Lenient C Programs
    Valid C Programs

  130. Integer Representation: Lenient C
    Address (offset = 2, data) → I64Array (contents = {0, 0, 0})
    integer_rep(a) =
    (long) System.identityHashCode(a.pointee) << 32 | a.offset;

  131. Integer Representation: Lenient C
    Address (offset = 2, data) → I64Array (contents = {0, 0, 0})
    integer_rep(a) =
    (long) System.identityHashCode(a.pointee) << 32 | a.offset;
    Breaks antisymmetry as different objects
    might have the same hash code

  132. Integer Representation: Lenient C
    Address (offset = 2, data) → I64Array (contents = {0, 0, 0}, address)
    integer_rep(a) = a.pointee.address

  133. Integer Representation: Lenient C
    Address (offset = 2, data) → I64Array (contents = {0, 0, 0}, address)
    integer_rep(a) = a.pointee.address
    Need to assign
    distinct addresses ☺
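
One way to hand out distinct addresses is a monotonically increasing counter, sketched below. This is an illustrative assumption, not the scheme the talk describes: the names `AddressedArray`, `AddressedPointer` and `NEXT_ADDRESS`, the starting value 4096, and the decision to add the offset to the pointee's address are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pointee that carries a virtual base address.
final class AddressedArray {
    // Next unassigned virtual address; starts above 0 so no object sits at NULL.
    private static final AtomicLong NEXT_ADDRESS = new AtomicLong(4096);

    final long[] contents;
    final long address; // distinct for every object

    AddressedArray(int length) {
        this.contents = new long[length];
        // Reserve a byte range of the object's size (8 bytes per long), at least 8 bytes,
        // so that even empty objects get an address of their own.
        long sizeInBytes = Math.max(8L, 8L * length);
        this.address = NEXT_ADDRESS.getAndAdd(sizeInBytes);
    }
}

final class AddressedPointer {
    final AddressedArray pointee;
    final long offset; // byte offset into the pointee

    AddressedPointer(AddressedArray pointee, long offset) {
        this.pointee = pointee;
        this.offset = offset;
    }

    // Assumption: base address plus offset, so comparisons inside one object still work.
    long integerRep() {
        return pointee.address + offset;
    }

    static boolean lessThan(AddressedPointer a, AddressedPointer b) {
        return a.integerRep() < b.integerRep();
    }
}

final class AddressedPointerDemo {
    public static void main(String[] args) {
        AddressedPointer a = new AddressedPointer(new AddressedArray(3), 0);
        AddressedPointer b = new AddressedPointer(new AddressedArray(3), 0);
        // Exactly one of the two comparisons holds for pointers to different objects.
        System.out.println(AddressedPointer.lessThan(a, b) != AddressedPointer.lessThan(b, a)); // true
    }
}
```

Unlike the hash-code variant, two live objects never share an address here, so pointers to the starts of different objects always compare consistently.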

  134. Mitigate Use-after-Free Errors
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    Address (offset = 0, data) → I64Array (contents = {0, 0, 0})

  135. Mitigate Use-after-Free Errors
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …
    Address (offset = 0, data) → I64Array (contents = {0, 0, 0})

  136. Mitigate Use-after-Free Errors
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …
    Address (offset = 0, data) → I64Array (contents = {0, 0, 0})
    contents[0] = …

  137. Mitigate Use-after-Free Errors
    long *arr = malloc(3 * sizeof(long));
    free(arr);
    arr[0] = …
    Address (offset = 0, data) → I64Array (contents = {0, 0, 0})
    contents[0] = …
    The GC will collect the object
    when it is no longer referenced
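
The effect can be sketched in Java as follows, assuming a hypothetical `LenientHeap` class whose `malloc` returns a managed array and whose `free` does nothing. These names are illustrative and not taken from the Sulong code base.

```java
// Hypothetical model of a lenient heap backed by managed Java arrays.
final class LenientHeap {
    // Stand-in for malloc(n * sizeof(long)): allocate a managed array of n longs.
    static long[] malloc(int elements) {
        return new long[elements];
    }

    // Stand-in for a lenient free(): intentionally a no-op. The array stays valid
    // while the program holds a reference; the GC reclaims it afterwards.
    static void free(long[] object) {
    }
}

final class LenientHeapDemo {
    public static void main(String[] args) {
        long[] arr = LenientHeap.malloc(3);
        LenientHeap.free(arr);
        arr[0] = 42; // use-after-free in C; here it writes to a still-live array
        System.out.println(arr[0]); // 42
    }
}
```

Because the write lands in an object that is still reachable, it cannot corrupt memory that has since been handed to another allocation.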

  138. Mitigate Integer Overflows
    80
    int a = 1, b = INT_MAX;
    int val = a + b;
    a + b

  139. Mitigate Integer Overflows
    80
    int a = 1, b = INT_MAX;
    int val = a + b;
    a + b
    INT_MIN
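
Java's `int` addition wraps around in two's complement, which is the result the slide shows. A minimal sketch follows; the class `WrapAroundDemo` and its helper methods are hypothetical, and `Math.addExact` is included only to contrast wrapping with a checked addition.

```java
// Hypothetical helpers contrasting wrapping and checked addition.
final class WrapAroundDemo {
    // Plain Java int addition: wraps around on overflow instead of being undefined.
    static int lenientAdd(int a, int b) {
        return a + b;
    }

    // Reports whether a + b would overflow, using the standard Math.addExact.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int a = 1, b = Integer.MAX_VALUE;
        System.out.println(lenientAdd(a, b) == Integer.MIN_VALUE); // true
        System.out.println(overflows(a, b)); // true
    }
}
```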

  140. Existing Approaches
    • Instrumentation-based bug-finding tools
    • Symbolic execution
    • Hardware security
    • Static analysis
    • Safe languages
    • Attacker mitigation
    Lenient C assigns semantics to otherwise undefined
    behavior (cf. Friendly C)

  141. Existing Approaches
    • Instrumentation-based bug-finding tools
    • Symbolic execution
    • Hardware security
    • Static analysis
    • Safe languages
    • Attacker mitigation
    Increases robustness of programs without
    terminating execution

  142. Introspection

  143. Idea
    84
    int *arr = malloc(sizeof (int) * 10);

    arr[4] = … ;

  145. Idea
    84
    Records metadata
    int *arr = malloc(sizeof (int) * 10);

    arr[4] = … ;
    arr.size = 40

  146. Idea
    84
    Records metadata
    int *arr = malloc(sizeof (int) * 10);

    arr[4] = … ;
    arr.size = 40
    Checks accesses

  147. Idea
    84
    Records metadata
    int *arr = malloc(sizeof (int) * 10);

    arr[4] = … ;
    arr.size = 40
    Checks accesses
    int size = size_right(str);

  148. Idea
    84
    Records metadata
    int *arr = malloc(sizeof (int) * 10);

    arr[4] = … ;
    arr.size = 40
    Checks accesses
    Query metadata from the tool
    int size = size_right(str);

  149. Introspection Functions
    85
    int *arr = malloc(sizeof (int) * 10) ;
    int *ptr = &(arr[4]);
    printf ("%ld\n", size_right(ptr)); // prints 24
    _size_right()
    sizeof(int) * 10

  150. Introspection Functions
    85
    int *arr = malloc(sizeof (int) * 10) ;
    int *ptr = &(arr[4]);
    printf ("%ld\n", size_right(ptr)); // prints 24
    _size_right()
    sizeof(int) * 10
    We also designed
    introspection functions for
    other metadata
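
The arithmetic behind the "prints 24" comment can be sketched with a managed pointer model in Java. The class `IntPointer` and its method `sizeRight` are hypothetical stand-ins for the introspection function, and the sketch does not model what the real function returns for out-of-bounds pointers.

```java
// Hypothetical managed pointer into an int array, with a byte offset.
final class IntPointer {
    final int[] pointee;
    final int byteOffset;

    IntPointer(int[] pointee, int byteOffset) {
        this.pointee = pointee;
        this.byteOffset = byteOffset;
    }

    // Bytes between the pointer and the end of the object it points into.
    long sizeRight() {
        long objectSize = (long) pointee.length * Integer.BYTES; // sizeof(int) * 10 = 40
        return objectSize - byteOffset;
    }

    public static void main(String[] args) {
        int[] arr = new int[10];                               // malloc(sizeof(int) * 10)
        IntPointer ptr = new IntPointer(arr, 4 * Integer.BYTES); // &arr[4], byte offset 16
        System.out.println(ptr.sizeRight()); // 24
    }
}
```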

  151. Example: strlen()
    86
    size_t strlen(const char *str) {
    size_t len = 0;
    while (*str != '\0') {
    len++;
    str++;
    }
    return len;
    }

  152. Example: strlen()
    86
    size_t strlen(const char *str) {
    size_t len = 0;
    while (*str != '\0') {
    len++;
    str++;
    }
    return len;
    }
    P r o g r a m m i n g \0
    ... ...

  154. Example: strlen()
    86
    size_t strlen(const char *str) {
    size_t len = 0;
    while (*str != '\0') {
    len++;
    str++;
    }
    return len;
    }
    11
    P r o g r a m m i n g \0
    ... ...

  155. Example: strlen()
    87
    size_t strlen(const char *str) {
    size_t len = 0;
    while (*str != '\0') {
    len++;
    str++;
    }
    return len;
    }
    P r o g r a m m i n g
    ... ...

  157. Example: strlen()
    87
    size_t strlen(const char *str) {
    size_t len = 0;
    while (*str != '\0') {
    len++;
    str++;
    }
    return len;
    }
    P r o g r a m m i n g
    ... ...
    ==16497==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc59c0ef63
    READ of size 1 at 0x7ffc59c0ef63 thread T0
    #0 0x4e7442 in strlen /home/manuel/test.c:10:12
    #1 0x4e7392 in main /home/manuel/test.c:5:5

  158. Mitigate Errors
    88
    What about systems with high-
    availability requirements?

  159. Idea
    89
    Goal: Allow programmers to manually
    implement a failure-oblivious computation logic

  160. size_t strlen(const char *str) {
    size_t len = 0;
    while ( size_right(str) > 0 && *str != '\0') {
    len++;
    str++;
    }
    return len;
    }
    Example: strlen()
    90
    P r o g r a m m i n g
    ... ...

  162. size_t strlen(const char *str) {
    size_t len = 0;
    while ( size_right(str) > 0 && *str != '\0') {
    len++;
    str++;
    }
    return len;
    }
    Example: strlen()
    90
    11
    P r o g r a m m i n g
    ... ...

  163. size_t strlen(const char *str) {
    size_t len = 0;
    while ( size_right(str) > 0 && *str != '\0') {
    len++;
    str++;
    }
    return len;
    }
    Example: strlen()
    90
    11
    P r o g r a m m i n g
    ... ...
    We enhanced a libc to deal
    with unterminated strings
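
The guarded loop can be modeled in Java over a byte array whose length plays the role of the object bounds. This is a sketch under assumptions: `SafeStrlen` and its `sizeRight` helper are hypothetical names that mirror the C code on the slide, not the enhanced libc itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical Java model of the bounds-aware strlen shown on the slide.
final class SafeStrlen {
    // Stand-in for size_right(str): bytes from the current position to the end of the buffer.
    static int sizeRight(byte[] buffer, int position) {
        return buffer.length - position;
    }

    // Stops at the terminator or at the end of the buffer, whichever comes first.
    static int strlen(byte[] buffer) {
        int len = 0;
        while (sizeRight(buffer, len) > 0 && buffer[len] != 0) {
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] unterminated = "Programming".getBytes(StandardCharsets.US_ASCII);
        byte[] terminated = "Programming\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(strlen(unterminated)); // 11
        System.out.println(strlen(terminated));   // 11
    }
}
```

Checking the remaining size before dereferencing matters: the bounds check comes first in the condition, so the loop never reads past the end of the buffer.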

  164. Implementation in Tools
    • LLVM’s AddressSanitizer
    • SoftBound
    • Intel MPX-based bounds instrumentation
    • Safe Sulong

  165. Evaluation: Effectiveness
    92
    Dnsmasq
    CVE-2017-14493
    CVE-2017-14496
    CVE-2017-9047
    Libxml2
    CVE-2017-16352
    LightFTP
    CVE-2017-1000218

  166. Evaluation: Effectiveness
    92
    Dnsmasq
    CVE-2017-14493
    CVE-2017-14496
    CVE-2017-9047
    Libxml2
    CVE-2017-16352
    LightFTP
    CVE-2017-1000218
    Execution # CVEs
    Could continue 4
    Terminated 1

  167. CVE-2017-9047 (Libxml2)
    93
    if (content->name != NULL)
    strcat(buf, (char *) content->name);

  168. CVE-2017-9047 (Libxml2)
    93
    if (content->name != NULL)
    strcat(buf, (char *) content->name);
    The parser printed a
    truncated error message,
    similar to the fixed version

  169. Existing Approaches
    • Hardware security
    • Instrumentation-based bug-finding tools
    • Symbolic execution
    • Safe languages
    • Static analysis
    • Attacker mitigation

  170. Existing Approaches
    • Hardware security
    • Instrumentation-based bug-finding tools
    • Symbolic execution
    • Safe languages
    • Static analysis
    • Attacker mitigation
    Extension of Failure-oblivious
    Computing (Rinard et al. 2004)

  171. GraalVM

  172. GraalVM
    96

  173. Sulong as Part of GraalVM
    97
    Java Virtual Machine
    Graal Compiler
    Truffle Framework
    https://www.graalvm.org/
    TruffleRuby Graal.js Graal.python FastR
    (Würthinger et al. 2016)

  174. Sulong as Part of GraalVM
    97
    Java Virtual Machine
    Graal Compiler
    Truffle Framework
    https://www.graalvm.org/
    TruffleRuby Graal.js Graal.python FastR
    Optimization Boundary
    (Würthinger et al. 2016)

  175. Sulong as Part of GraalVM
    98
    Java Virtual Machine
    Graal Compiler
    Truffle Framework
    https://www.graalvm.org/
    TruffleRuby Graal.js Graal.python FastR
    Optimization Boundary
    Java Native Interface
    (Würthinger et al. 2016)

  176. Sulong as Part of GraalVM
    99
    Java Virtual Machine
    Graal Compiler
    Truffle Framework
    https://www.graalvm.org/
    TruffleRuby Graal.js Graal.python FastR
    Optimization Boundary
    LLVM IR Interpreter
    LLVM IR
    Clang Flang
    (Würthinger et al. 2016)

  177. Sulong and GraalVM
    100

  179. Sulong Key Collaborators
    Jacob Kreindl, Raphael Mosaner, Roland Schatz, Josef Eisl, Christian Häubl,
    Matthias Grimmer, Thomas Pointhuber, Daniel Pekarek, Chris Seaton, Lukas Stadler,
    Florian Angerer, David Gnedt, Swapnil Gaikwad
    https://github.com/graalvm/sulong/graphs/contributors

  180. Sulong Key Collaborators
    Jacob Kreindl, Raphael Mosaner, Roland Schatz, Josef Eisl, Christian Häubl,
    Matthias Grimmer, Thomas Pointhuber, Daniel Pekarek, Chris Seaton, Lukas Stadler,
    Florian Angerer, David Gnedt, Swapnil Gaikwad
    EuroLLVM 2019 Talk
    LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong
    https://github.com/graalvm/sulong/graphs/contributors

  181. Sulong Key Collaborators
    Jacob Kreindl, Raphael Mosaner, Roland Schatz, Josef Eisl, Christian Häubl,
    Matthias Grimmer, Thomas Pointhuber, Daniel Pekarek, Chris Seaton, Lukas Stadler,
    Florian Angerer, David Gnedt, Swapnil Gaikwad
    EuroLLVM 2019 Talk
    Sulong: An experience report of using
    the "other end" of LLVM in GraalVM.
    https://github.com/graalvm/sulong/graphs/contributors

  182. Summary
    104
    UB is problematic
    Existing approaches can “optimize” UB “away”
    Execute C/C++ on the JVM!
    Automatic checks detect UB
    But: Programs often invoke UB
    Metadata for manual checks
    GraalVM
