ASPLOS'18: Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model

Talk at the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (https://www.asplos2018.org/)

Manuel Rigger

March 27, 2018

Transcript

  1. Sulong, and Thanks For All the Bugs
    Finding Errors in C Programs by Abstracting from the Native
    Execution Model
    Manuel Rigger¹, Roland Schatz², René Mayrhofer¹,
    Matthias Grimmer², Hanspeter Mössenböck¹
    ¹ Johannes Kepler University Linz, Austria
    ² Oracle Labs, Austria
    ASPLOS ’18, March 27, 2018, Williamsburg, VA, USA

  2. Claim: Unsafe Languages can be Executed
    Safely and Efficiently on the Java Virtual
    Machine
    2

  4. What are Unsafe Languages?
    4
    Unsafe Languages: Do not specify an operation for all inputs (e.g., C)
    Safe Languages: Strictly define all operations (e.g., Java)
    (Felleisen and Krishnamurthi 1999)

  5. Buffer Overflows
    5
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …

  6. Buffer Overflows
    5
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …
    C: Undefined Behavior

  7. Buffer Overflows
    5
    C:
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …
    Undefined Behavior
    Java:
    int[] arr = new int[3];
    arr[5] = …
    ArrayIndexOutOfBoundsException

  8. Use-after-free Errors
    6
    free(arr);
    arr[0] = …

  9. Use-after-free Errors
    6
    free(arr);
    arr[0] = …
    C: Undefined Behavior

  10. Use-after-free Errors
    6
    C:
    free(arr);
    arr[0] = …
    Undefined Behavior
    Java:
    arr = null;
    arr[0] = …
    NullPointerException

  11. Idea
    7
    Map the semantics of a C Program to Java
    to automatically detect memory safety errors
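
As a minimal illustration of this idea in plain Java (not Safe Sulong code), the two Java snippets from the preceding slides can be wrapped in methods that report which exception the JVM raises. The class and method names are hypothetical.

```java
// Illustrative sketch only: shows that Java defines the two accesses that C leaves undefined.
final class JavaSafetySketch {

    /** Java analogue of storing to arr[5] in a three-element buffer. */
    static String outOfBoundsStore() {
        int[] arr = new int[3];
        try {
            arr[5] = 1; // the JVM checks the index on every array access
            return "no error";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    /** Java analogue of storing through a pointer after free(). */
    static String storeAfterRelease() {
        int[] arr = new int[3];
        arr = null; // stands in for free(arr)
        try {
            arr[0] = 1; // the JVM checks for null before the store
            return "no error";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(outOfBoundsStore());
        System.out.println(storeAfterRelease());
    }
}
```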

  12. State of the Art
    8
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]

  13. State of the Art
    Compile-time instrumentation
    • AddressSanitizer (ASan) (Serebryany et al. 2012)
    • SoftBound+CETS (Nagarakatte et al. 2009, 2010)
    8
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]

  14. State of the Art
    Compile-time instrumentation
    • AddressSanitizer (ASan) (Serebryany et al. 2012)
    • SoftBound+CETS (Nagarakatte et al. 2009, 2010)
    Run-time instrumentation
    • Valgrind (Nethercote and Seward 2007)
    • Dr. Memory (Bruening and Zhao 2011)
    8
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]

  15. State of the Art
    9
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]
    Such tools were very helpful in finding bugs in widely used code

  16. Can we do better?
    10
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]
    Static compilers: optimize code based on Undefined Behavior
    Bug-finding tools: find bugs assuming that violations are visible side effects
    (Wang et al. 2012, D'Silva et al. 2015)

  18. Can we do better?
    10
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]
    Static compilers: optimize code based on Undefined Behavior
    Bug-finding tools: find bugs assuming that violations are visible side effects
    struct sock *sk = tun->sk;
    if (!tun)
        return POLLERR;
    (Wang et al. 2012, D'Silva et al. 2015)
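
The snippet on this slide dereferences tun before testing it for null. In C that dereference is undefined for a null pointer, so an optimizing compiler may drop the later check. The following sketch is an illustrative Java analogue, not code from the talk; the Tun and Sock classes and the string results are hypothetical. On the JVM the dereference itself raises an exception, so the error stays visible.

```java
// Illustrative Java analogue of the "dereference before null check" pattern.
final class NullCheckSketch {

    static final class Sock { }

    static final class Tun {
        Sock sk = new Sock();
    }

    /** Mirrors the slide: the field is read first, the null check comes too late. */
    static String poll(Tun tun) {
        try {
            Sock sk = tun.sk;   // throws NullPointerException when tun is null
            if (tun == null) {  // never reached with a null tun
                return "POLLERR";
            }
            return "ok";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(poll(new Tun()));
        System.out.println(poll(null));
    }
}
```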

  20. Can we do better?
    11
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]
    Current approaches do not abstract from the underlying machine/native execution model

  21. Can we do better?
    11
    [Diagram: C source → Clang/GCC → a.out; running ./a.out prints "Hello world!"]
    Current approaches do not abstract from the underlying machine/native execution model
    Manually adding instrumentation is error-prone

  22. Claim: Unsafe Languages can be Executed
    Safely and Efficiently on the Java Virtual
    Machine
    12

  23. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]

  25. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]
    We are currently using a custom libc implementation (without system calls)

  26. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]
    (Lattner and Adve 2004)

  27. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]
    (Lattner and Adve 2004)
    Executing LLVM IR allows us to also execute other unsafe languages

  30. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]
    Unaware of the underlying machine, execution model, and ABI

  31. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]
    (Würthinger et al. 2013)

  32. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]
    (Würthinger et al. 2013)
    Truffle and Graal allow Safe Sulong to reach “native speeds”

  34. System Overview
    13
    [Diagram: program.c and libc.c → Clang (-O0) → LLVM IR → LLVM IR Interpreter, running on Truffle, Graal, and the JVM]
    All checks are automatically performed by the underlying JVM

  35. Prevent Out-Of-Bounds Accesses
    14
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …
    [Diagram: ManagedAddress (offset=5, data) → I32Array (contents={0, 0, 0})]

  36. Prevent Out-Of-Bounds Accesses
    14
    int *arr = malloc(3 * sizeof(int));
    arr[5] = …
    [Diagram: ManagedAddress (offset=5, data) → I32Array (contents={0, 0, 0})]
    contents[5] → ArrayIndexOutOfBoundsException
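
The object layout on this slide can be sketched in plain Java. This is a simplified model under stated assumptions, not Safe Sulong's actual classes: the names ManagedAddress and I32Array come from the slide, while the methods, the element-based offset, and the mallocInts helper are hypothetical.

```java
// Simplified, hypothetical model of a managed pointer: a reference to the pointee plus an offset.
final class ManagedPointerSketch {

    /** Heap object backing a C int array; contents is an ordinary Java array. */
    static final class I32Array {
        int[] contents;

        I32Array(int length) {
            this.contents = new int[length];
        }
    }

    /** Pointer value: the pointee and an offset (counted in elements here, for simplicity). */
    static final class ManagedAddress {
        final I32Array data;
        final int offset;

        ManagedAddress(I32Array data, int offset) {
            this.data = data;
            this.offset = offset;
        }

        /** Pointer arithmetic only changes the offset; nothing is checked yet. */
        ManagedAddress plus(int elements) {
            return new ManagedAddress(data, offset + elements);
        }

        /** The store indexes the Java array, so the JVM performs the bounds check. */
        void store(int value) {
            data.contents[offset] = value;
        }
    }

    /** Hypothetical stand-in for malloc(n * sizeof(int)). */
    static ManagedAddress mallocInts(int n) {
        return new ManagedAddress(new I32Array(n), 0);
    }

    /** Models arr[index] = 1 on a three-element allocation and reports the outcome. */
    static String storeAt(int index) {
        ManagedAddress arr = mallocInts(3);
        try {
            arr.plus(index).store(1);
            return "ok";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }
}
```

In this model no explicit check is written anywhere: the error surfaces because the underlying Java array access is checked by the JVM.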

  37. Prevent Use-After-Free Errors
    15
    free(arr);
    arr[0] = …
    [Diagram: ManagedAddress (offset=0, data) → I32Array (contents=null)]

  39. Prevent Use-After-Free Errors
    15
    free(arr);
    arr[0] = …
    [Diagram: ManagedAddress (offset=0, data) → I32Array (contents=null)]
    contents[0] → NullPointerException

  40. Prevent Use-After-Free Errors
    15
    free(arr);
    arr[0] = …
    [Diagram: ManagedAddress (offset=0, data) → I32Array (contents=null)]
    contents[0] → NullPointerException
    Safe Sulong can detect other categories of errors (e.g., double-free errors)
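
A minimal sketch of how free() could be modeled so that both use-after-free and double-free errors surface. The slide only shows that contents becomes null; the free method, the double-free policy, and the exception type used for it are assumptions made for illustration.

```java
// Hypothetical model: free() clears the backing array, so later accesses hit a null reference.
final class FreeSketch {

    static final class I32Array {
        int[] contents;

        I32Array(int length) {
            this.contents = new int[length];
        }
    }

    static final class ManagedAddress {
        final I32Array data;
        final int offset;

        ManagedAddress(I32Array data, int offset) {
            this.data = data;
            this.offset = offset;
        }
    }

    /** Stand-in for free(): a second call on the same object is reported (assumed policy). */
    static void free(ManagedAddress p) {
        if (p.data.contents == null) {
            throw new IllegalStateException("double free");
        }
        p.data.contents = null;
    }

    /** Models free(arr); arr[0] = 1; */
    static String useAfterFree() {
        ManagedAddress arr = new ManagedAddress(new I32Array(3), 0);
        free(arr);
        try {
            arr.data.contents[arr.offset] = 1; // contents is null after free
            return "ok";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    /** Models free(arr); free(arr); */
    static String doubleFree() {
        ManagedAddress arr = new ManagedAddress(new I32Array(3), 0);
        free(arr);
        try {
            free(arr);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```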

  41. Evaluation
    • Found 68 errors in small open-source projects
    • Safe Sulong found 8 errors that neither ASan nor Valgrind found
    • Compiler optimizations (ASan -O3) prevented the detection of 4 additional bugs
    • Valgrind detected half of the errors
    16

  42. Evaluation: Example ASan
    17
    int main(int argc, char** argv) {
        printf("%d %s\n", argc, argv[100]);
    }
    ASan does not instrument the main() arguments since they are allocated by libc
    https://github.com/google/sanitizers/issues/762

  43. Claim: Unsafe Languages can be Executed
    Safely and Efficiently on the Java Virtual
    Machine
    18

  44. Example Program
    19
    C:
    void processRequests() {
        int i = 0;
        do {
            processPacket();
            i++;
        } while (i < 10000);
    }

  45. Example Program
    19
    C:
    void processRequests() {
        int i = 0;
        do {
            processPacket();
            i++;
        } while (i < 10000);
    }
    LLVM IR (produced by Clang):
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

  46. Implementation of Operations
    20
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

  47. Implementation of Operations
    20
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    Executable Abstract Syntax Tree:
    [Diagram: tree for %2 = add nsw i32 %i, 1, i.e. write(%2, add(read(%i), 1))]
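
The tree on this slide (write %2, add, read %i, constant 1) can be mimicked with a tiny self-interpreting AST. This is a sketch in plain Java with hypothetical Node and Frame types; it does not use the Truffle API, and it ignores the nsw overflow semantics of the add instruction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of an executable AST: every node knows how to execute itself.
final class AstSketch {

    /** Holds the values of LLVM IR locals such as %i and %2, keyed by name. */
    static final class Frame {
        final Map<String, Integer> locals = new HashMap<>();
    }

    /** A node evaluates itself against a frame and returns an i32 value. */
    interface Node {
        int execute(Frame frame);
    }

    static Node constant(int value) {
        return frame -> value;
    }

    static Node read(String name) {
        return frame -> frame.locals.get(name);
    }

    static Node add(Node left, Node right) {
        // Plain Java addition; overflow handling for "nsw" is left out of this sketch.
        return frame -> left.execute(frame) + right.execute(frame);
    }

    static Node write(String name, Node value) {
        return frame -> {
            int result = value.execute(frame);
            frame.locals.put(name, result);
            return result;
        };
    }

    /** Executes the tree for "%2 = add nsw i32 %i, 1" with the given value of %i. */
    static int run(int i) {
        Frame frame = new Frame();
        frame.locals.put("%i", i);
        Node tree = write("%2", add(read("%i"), constant(1)));
        tree.execute(frame);
        return frame.locals.get("%2");
    }
}
```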

  48. Implementation of Basic Blocks
    21
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

  49. Implementation of Basic Blocks
    21
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    Executable Abstract Syntax Tree:
    [Diagram: Block1 node]

  50. Implementation of Control Flow Support
    22
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }

  51. Implementation of Control Flow Support
    22
    LLVM IR:
    define void @processRequests() #0 {
    ; (basic block 0)
      br label %1
    ; <label>:1 (basic block 1)
      %i = phi i32 [ 0, %0 ], [ %2, %1 ]
      call void @processPacket()
      %2 = add nsw i32 %i, 1
      %3 = icmp slt i32 %2, 10000
      br i1 %3, label %1, label %4
    ; <label>:4 (basic block 2)
      ret void
    }
    [Diagram: Basic Block Dispatch Node over Block0, Block1, and Block2, annotated with successor indices 1, 2, -1, and 1]
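
The dispatch scheme can be sketched as a loop over an array of blocks, where each block returns the index of its successor. This is a hypothetical plain-Java model: it assumes the annotations on the slide are successor indices and that -1 means "return from the function", and it replaces processPacket() with a counter.

```java
// Hypothetical model of a basic block dispatch loop for processRequests().
final class DispatchSketch {

    /** A block runs its operations and returns the index of the next block (-1 = return). */
    interface Block {
        int execute(int[] state);
    }

    /** Runs the three blocks of processRequests() and returns how many packets were processed. */
    static int processRequests() {
        // state[0] models %i, state[1] counts calls to the stand-in for processPacket().
        int[] state = new int[2];
        Block[] blocks = {
            s -> { s[0] = 0; return 1; },                           // Block0: br label %1
            s -> { s[1]++; s[0]++; return s[0] < 10000 ? 1 : 2; },  // Block1: loop body and branch
            s -> -1                                                 // Block2: ret void
        };

        int blockIndex = 0;
        while (blockIndex != -1) {
            blockIndex = blocks[blockIndex].execute(state);
        }
        return state[1];
    }
}
```

Dispatching on an integer index lets the interpreter handle arbitrary control flow between blocks without needing structured loops in the source IR.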

  52. Compilation
    • For frequently executed functions
    • Partial evaluation: inline execute methods of the graph (recursively)
    • Further optimize the graph
    23
    [Diagram: Basic Block Dispatch Node over Block0, Block1, and Block2, annotated with successor indices 1, 2, -1, and 1]

  53. 24
    [Diagram: Basic Block Dispatch Node over Block0, Block1, and Block2, annotated with successor indices 1, 2, -1, and 1]
    Compiler
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
          blockIndex = 1
          %i.0 = %2
          continue;
        else:
          blockIndex = 2
    block2:
      blockIndex = -1
      return
    Unrolling of the interpreter loop

  54. 24
    [Diagram: Basic Block Dispatch Node over Block0, Block1, and Block2, annotated with successor indices 1, 2, -1, and 1]
    Compiler
    int blockIndex = 0;
    block0:
      blockIndex = 1
      %i.0 = 0
    block1:
      while (true):
        processPacket()
        %2 = %i.0 + 1
        %3 = %2 < 10000
        if %3:
          blockIndex = 1
          %i.0 = %2
          continue;
        else:
          blockIndex = 2
    block2:
      blockIndex = -1
      return
    Unrolling of the interpreter loop
    Graal further optimizes the partially evaluated interpreter

  55. Safe Semantics
    • Safe by design: errors result in exceptions
    • Invalid memory accesses are not optimized away
    25

  56. Evaluation: Peak Performance
    26
    lower is better

  58. Evaluation: Peak Performance
    27
    Small benchmarks, since Safe Sulong failed to execute SPEC → preliminary results
    lower is better

  60. Evaluation: Peak Performance
    28
    Baseline is Clang -O0; Safe Sulong is faster in all but one case
    lower is better

  62. Evaluation: Peak Performance
    29
    Safe Sulong is close to Clang -O3 in some cases
    lower is better

  64. Evaluation: Peak Performance
    30
    Safe Sulong -O0 is mostly faster than ASan -O0
    lower is better

  65. Future Work and Summary
    31

  66. Executing libc and Other System Libraries
    32

  67. Executing libc and Other System Libraries
    32
    Inline assembly:
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
    (Rigger et al. 2018)

  68. Executing libc and Other System Libraries
    32
    Inline assembly:
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
    Compiler builtins:
    if (__builtin_expect(x, 0))
        foo();
    (Rigger et al. 2018)

  69. Executing libc and Other System Libraries
    32
    Inline assembly:
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
    Compiler builtins:
    if (__builtin_expect(x, 0))
        foo();
    System calls:
    getcwd(buf, size);
    (Rigger et al. 2018)

  70. Executing libc and Other System Libraries
    32
    Inline assembly:
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
    Compiler builtins:
    if (__builtin_expect(x, 0))
        foo();
    System calls:
    getcwd(buf, size);
    Implementing them will allow Safe Sulong to execute existing libcs (and SPEC)
    (Rigger et al. 2018)

  71. Summary
    33
    • Current approaches are based on “unsafe” compilers
    • Safe Sulong automatically detects errors
    • It reaches good peak performance
    • We are still working on completeness
    @RiggerManuel

  72. Bibliography
    • Matthias Felleisen and Shriram Krishnamurthi. 1999. Safety in Programming Languages. Technical Report. Rice University.
    • Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. AddressSanitizer: a fast address sanity checker.
    In Proceedings of the 2012 USENIX conference on Annual Technical Conference (USENIX ATC'12). USENIX Association, Berkeley, CA, USA, 28-28.
    • Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. 2009. SoftBound: highly compatible and complete spatial memory safety
    for C. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09). ACM, New York, NY,
    USA, 245-258.
    • Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. 2010. CETS: compiler enforced temporal safety for C. In Proceedings of
    the 2010 international symposium on Memory management (ISMM '10). ACM, New York, NY, USA, 31-40.
    • Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 28th
    ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '07).
    • Derek Bruening and Qin Zhao. 2011. Practical memory checking with Dr. Memory. In Proceedings of the 9th Annual IEEE/ACM International
    Symposium on Code Generation and Optimization (CGO '11). IEEE Computer Society, Washington, DC, USA, 213-223.
    • Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia, Nickolai Zeldovich, and M. Frans Kaashoek. 2012. Undefined behavior: what happened to my code?.
    In Proceedings of the Asia-Pacific Workshop on Systems (APSYS '12). ACM, New York, NY, USA, Article 9, 7 pages.
    • Vijay D'Silva, Mathias Payer, and Dawn Song. 2015. The Correctness-Security Gap in Compiler Optimization. In Proceedings of the 2015 IEEE Security
    and Privacy Workshops (SPW '15). IEEE Computer Society, Washington, DC, USA, 73-87.
    • Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario
    Wolczko. 2013. One VM to rule them all. In Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on
    programming & software (Onward! 2013). ACM, New York, NY, USA, 187-204.
    • Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the
    international symposium on Code generation and optimization: feedback-directed and runtime optimization (CGO '04). IEEE Computer Society,
    Washington, DC, USA.
    • Manuel Rigger, Stefan Marr, Stephen Kell, David Leopoldseder, and Hanspeter Mössenböck. 2018. An Analysis of x86-64 Inline Assembly in
    C Programs. In Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '18), March 25, 2018, Williamsburg, VA, USA.
    34
