
Sulong: Executing Low-level Languages on Truffle

Invited talk held at the ICW 2019 Interconnecting Code Workshop, co-located with <Programming> 2019

Manuel Rigger

April 01, 2019

Transcript

  1. Sulong: Executing Low-level Languages on Truffle Manuel Rigger Advanced Software

    Technologies Lab (Zhendong Su) ETH Zurich 1. April 2019 Interconnecting Code Workshop @ <Programming> 2019 @RiggerManuel
  2. PhD Topic 2 Safe and Efficient Execution of Unsafe Languages

    on the Java Virtual Machine
  3. How is this Relevant for ICW? 3 An improved version

    of Sulong is used within GraalVM as a native function interface
  4. 4 GraalVM, its language interoperability mechanism, and Sulong’s role

  5. 4 GraalVM, its language interoperability mechanism, and Sulong’s role I

    have not been working on language interoperability myself.
  6. Unsafe languages 5 Heartbleed Cloudbleed

  7. Unsafe languages 5 Heartbleed Cloudbleed Graalbleed

  8. 6 GraalVM, its language interoperability mechanism, and Sulong’s role Safe

    Sulong and how it safely executes LLVM-based languages
  9. Sulong Interacts also with Other Code 7 Compiler builtins System

    calls External Libraries Low-level libc/POSIX functions Linkage features Compiler extensions Inline assembly
  10. 8 The importance of inline assembly and compiler builtins GraalVM,

    its language interoperability mechanism, and Sulong’s role Safe Sulong and how it safely executes LLVM-based languages
  11. 9 GraalVM, its language interoperability mechanism, and Sulong’s role

  12. GraalVM 10 https://www.graalvm.org/

  13. GraalVM 11 (Würthinger et al. 2016) GraalVM supports the execution

    of various languages TruffleRuby Graal.js Graal.python FastR
  14. GraalVM 12 (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR

    Truffle Truffle is a language-implementation framework • Written in Java • Optimization primitives • Debugging and profiling • Language interoperability!
  15. GraalVM 13 (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR

    Graal Truffle Graal is the compiler used by Truffle
  16. GraalVM 14 (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR

    Graal Truffle Can execute on the JVM, be compiled to a standalone executable, … JVM
  17. GraalVM 15 (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR

    The languages are implemented as Abstract Syntax Tree (AST) interpreters
  18. AST Interpreters 16 = a b 3 + a =

    b + 3 Parse input program
  19. AST Interpreters 17 Set up input 2 a b =

    a b 3 +
  20. AST Interpreters 18 Execute = a b 3 + 2

    a b
  21. AST Interpreters 18 Execute = a b 3 + 2

    a b 3 2
  22. AST Interpreters 18 Execute = a b 3 + 2

    a b 3 2 5 a
  23. AST Interpreters 18 Execute = a b 3 + 5

    2 a b 3 2 5 a
  24. AST Interpreters Optimization 19 = a b 3 + Truffle

    AST Interpreters specialize for their input 5 2 a b Variable Integer
  25. AST Interpreters Optimization 19 = a b 3 + Truffle

    AST Interpreters specialize for their input 5 2 a b if (input is as expected) { execute specialized operation } else { rewrite node } Variable Integer
  26. AST Interpreters Optimization 20 = a b 3 + Partial

    Evaluation = a + b 3 Variable Integer
  27. AST Interpreters Optimization 21 Compilation = a + b 3

    if (b is an Integer) { a = b + 3 } else { deoptimize and rewrite node } Variable Integer
  28. AST Interpreters Optimization 22 = a + b 3 5

    “icw” a b = a b 3 + Variable Integer Deoptimize
  29. AST Interpreters Optimization 23 = a b 3 + =

    a b 3 + Respecialize Variable Integer Generic
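
The specialize, deoptimize, and respecialize cycle from the AST Interpreters Optimization slides can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Truffle code: the classes `AstNode`, `ReadVar`, `Const`, and `AddNode` are hypothetical names, and the "rewrite" is modelled as a state change inside one node rather than a real node replacement.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch; every class name here is hypothetical and none of it is Truffle API.
final class SpecializationDemo {
    public static void main(String[] args) {
        AddNode add = new AddNode(new ReadVar("b"), new Const(3));
        Map<String, Object> frame = new HashMap<>();

        frame.put("b", 2);
        frame.put("a", add.execute(frame));   // a = b + 3 with an Integer b
        System.out.println(frame.get("a") + " " + add.state());

        frame.put("b", "icw");
        frame.put("a", add.execute(frame));   // guard fails, node respecializes
        System.out.println(frame.get("a") + " " + add.state());
    }
}

abstract class AstNode {
    abstract Object execute(Map<String, Object> frame);
}

final class ReadVar extends AstNode {
    private final String name;
    ReadVar(String name) { this.name = name; }
    @Override Object execute(Map<String, Object> frame) { return frame.get(name); }
}

final class Const extends AstNode {
    private final Object value;
    Const(Object value) { this.value = value; }
    @Override Object execute(Map<String, Object> frame) { return value; }
}

final class AddNode extends AstNode {
    enum State { UNINITIALIZED, INTEGER, GENERIC }

    private final AstNode left;
    private final AstNode right;
    private State state = State.UNINITIALIZED;

    AddNode(AstNode left, AstNode right) { this.left = left; this.right = right; }

    State state() { return state; }

    @Override
    Object execute(Map<String, Object> frame) {
        Object l = left.execute(frame);
        Object r = right.execute(frame);
        if (state == State.UNINITIALIZED) {
            // First execution: specialize for the operand types that were observed.
            state = bothIntegers(l, r) ? State.INTEGER : State.GENERIC;
        }
        if (state == State.INTEGER) {
            if (bothIntegers(l, r)) {
                return ((Integer) l) + ((Integer) r);   // specialized operation
            }
            state = State.GENERIC;                       // guard failed: rewrite
        }
        return addGeneric(l, r);
    }

    private static boolean bothIntegers(Object l, Object r) {
        return l instanceof Integer && r instanceof Integer;
    }

    // Generic fallback: integer addition if possible, string concatenation otherwise.
    private static Object addGeneric(Object l, Object r) {
        if (bothIntegers(l, r)) {
            return ((Integer) l) + ((Integer) r);
        }
        return String.valueOf(l) + String.valueOf(r);
    }
}
```

Running `main` prints `5 INTEGER` followed by `icw3 GENERIC`: the node stays specialized while `b` is an integer and falls back to the generic case once the string "icw" shows up.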
  30. GraalVM 24 (Grimmer et al. 2015) TruffleRuby Graal.js Graal.python FastR

  31. GraalVM 24 (Grimmer et al. 2015) TruffleRuby Graal.js Graal.python FastR

    Language interoperability support for individual language pairs would not scale
  32. GraalVM 25 TruffleRuby Graal.js Graal.python FastR (Grimmer et al. 2015)

  33. GraalVM 25 TruffleRuby Graal.js Graal.python FastR Idea: Implement a

    language-independent mechanism based on messages (Grimmer et al. 2015)
  34. Message-Based Foreign Access 26 a = b + 3 =

    a READ b 3 + 2 b
  35. Message-Based Foreign Access 26 a = b + 3 =

    a READ b 3 + 2 b Foreign objects can be accessed by sending a message to the foreign language implementation
  36. Message-Based Foreign Accesses 27 = a READ B 3 +

    2 Execute = a 3 + b
  37. Message-Based Foreign Accesses 27 = a READ B 3 +

    2 Execute = a 3 + Subsequent reads do not need to send a message b
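
The idea on the Message-Based Foreign Access slides (send a READ message once, then access the foreign object directly) might be sketched as follows. The names `ForeignLanguage`, `MapLanguage`, and `ReadMessageNode` are assumptions made for this illustration and are not the Truffle interop API; the foreign object is simply a `java.util.Map`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of message resolution with caching; not the Truffle interop API.
final class MessageDemo {
    public static void main(String[] args) {
        ReadMessageNode readB = new ReadMessageNode("b", List.of(new MapLanguage()));
        Map<String, Object> foreignObject = Map.of("b", 2);
        for (int i = 0; i < 3; i++) {
            int a = (Integer) readB.execute(foreignObject) + 3;   // a = b + 3
            System.out.println("a = " + a + ", messages sent: " + readB.messagesSent());
        }
    }
}

/** Stand-in for a language implementation that can answer a READ message. */
interface ForeignLanguage {
    boolean owns(Object receiver);

    /** Resolves READ of the given member to a direct accessor for this language's objects. */
    Function<Object, Object> resolveRead(String member);
}

/** Toy "language" whose objects are plain maps. */
final class MapLanguage implements ForeignLanguage {
    @Override public boolean owns(Object receiver) { return receiver instanceof Map; }

    @Override public Function<Object, Object> resolveRead(String member) {
        return receiver -> ((Map<?, ?>) receiver).get(member);
    }
}

final class ReadMessageNode {
    private final String member;
    private final List<ForeignLanguage> languages;
    private ForeignLanguage cachedLanguage;
    private Function<Object, Object> cachedAccess;
    private int messagesSent;

    ReadMessageNode(String member, List<ForeignLanguage> languages) {
        this.member = member;
        this.languages = languages;
    }

    int messagesSent() { return messagesSent; }

    Object execute(Object receiver) {
        // Only send a message if nothing is cached or the receiver belongs to another language.
        if (cachedLanguage == null || !cachedLanguage.owns(receiver)) {
            resolve(receiver);
        }
        return cachedAccess.apply(receiver);
    }

    private void resolve(Object receiver) {
        for (ForeignLanguage language : languages) {
            if (language.owns(receiver)) {
                messagesSent++;
                cachedLanguage = language;
                cachedAccess = language.resolveRead(member);
                return;
            }
        }
        throw new IllegalArgumentException("no language owns " + receiver);
    }
}
```

In this sketch the message counter stays at 1 across all three reads, which mirrors the slide's point that subsequent reads do not need to send a message.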
  38. Sulong as Part of GraalVM 28 Java Virtual Machine Graal

    Compiler Truffle Framework https://www.graalvm.org/ TruffleRuby Graal.js Graal.python FastR Native Extension
  39. Sulong as Part of GraalVM 29 Java Virtual Machine Graal

    Compiler Truffle Framework https://www.graalvm.org/ TruffleRuby Graal.js Graal.python FastR Java Native Interface
  40. Sulong as Part of GraalVM 30 Java Virtual Machine Graal

    Compiler Truffle Framework https://www.graalvm.org/ TruffleRuby Graal.js Graal.python FastR Optimization Boundary Java Native Interface
  41. Sulong as Part of GraalVM 31 Java Virtual Machine Graal

    Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR LLVM IR Interpreter LLVM IR Clang Flang Optimization Boundary
  42. How to Deal with C Code Accessing VM Internals? 32

    Native Extension VM Native Extension API
  43. How to Deal with C Code Accessing VM Internals? 32

    Native Extension VM Native Extension API Native extension APIs allow access to VM internals
  44. Example: Ruby C Extension 33 # Ruby Code: array.rb s

    = CArray.new puts s.arraySum([1,2,3]) // The C extension: array.c #include "ruby.h" VALUE c_arraySum(VALUE self, VALUE array) { int sum = 0; for (int i = 0; i < RARRAY_LEN(array); i++) { sum += FIX2INT(rb_ary_entry(array, i)); } return INT2FIX(sum); } Slide modified from Matthias Grimmer, with permission
  45. Example: Ruby C Extension 34 // The C extension: array.c

    #include "ruby.h" VALUE c_arraySum(VALUE self, VALUE array) { int sum = 0; for (int i = 0; i < RARRAY_LEN(array); i++) { sum += FIX2INT(rb_ary_entry(array, i)); } return INT2FIX(sum); } // ruby.h typedef void *VALUE; typedef void *ID; VALUE rb_ary_entry(VALUE ary, long idx); Slide modified from Matthias Grimmer, with permission Programmers write their native extensions using the API provided by MRI
  46. Example: Ruby C Extension 35 // The C extension: array.c

    #include "ruby.h" VALUE c_arraySum(VALUE self, VALUE array) { int sum = 0; for (int i = 0; i < RARRAY_LEN(array); i++) { sum += FIX2INT(rb_ary_entry(array, i)); } return INT2FIX(sum); } Slide modified from Matthias Grimmer, with permission // ruby.c #include "ruby.h" #include "truffle.h" VALUE rb_ary_entry(VALUE ary, long idx) { return truffle_read_idx(ary, (int) idx); } int FIX2INT(VALUE value) { return truffle_invoke_i(RUBY_CEXT, "rb_fix2int", value); } truffle_read_idx and truffle_invoke_i are Sulong intrinsics that send messages
  47. Example: Ruby C Extension 36 // The C extension: array.c

    #include "ruby.h" VALUE c_arraySum(VALUE self, VALUE array) { int sum = 0; for (int i = 0; i < RARRAY_LEN(array); i++) { sum += FIX2INT(rb_ary_entry(array, i)); } return INT2FIX(sum); } Slide modified from Matthias Grimmer, with permission // ruby.c #include "ruby.h" #include "truffle.h" VALUE rb_ary_entry(VALUE ary, long idx) { return truffle_read_idx(ary, (int) idx); } int FIX2INT(VALUE value) { return truffle_invoke_i(RUBY_CEXT, "rb_fix2int", value); }
  48. Example: Ruby C Extension 36 // The C extension: array.c

    #include "ruby.h" VALUE c_arraySum(VALUE self, VALUE array) { int sum = 0; for (int i = 0; i < RARRAY_LEN(array); i++) { sum += FIX2INT(rb_ary_entry(array, i)); } return INT2FIX(sum); } Slide modified from Matthias Grimmer, with permission // ruby.c #include "ruby.h" #include "truffle.h" VALUE rb_ary_entry(VALUE ary, long idx) { return truffle_read_idx(ary, (int) idx); } int FIX2INT(VALUE value) { return truffle_invoke_i(RUBY_CEXT, "rb_fix2int", value); } # ruby.rb def rb_fix2int(value) if value.nil? raise TypeError else int = value.to_int raise RangeError if int >= 2**32 int end end
  49. Performance 37 Bar chart: peak performance relative to MRI running pure Ruby

    (axis 0 to 35). MRI with C Extensions: 11; GraalVM with C Extensions: 32. Slide modified from Matthias Grimmer, with permission
  50. Performance 37 Bar chart: peak performance relative to MRI running pure Ruby

    (axis 0 to 35). MRI with C Extensions: 11; GraalVM with C Extensions: 32. Slide modified from Matthias Grimmer, with permission Truffle can inline the function call from Ruby to C!
  51. 38 Safe Sulong and how it safely executes LLVM-based Languages

  52. Problem: C/C++ are unsafe languages 39 Undefined Behavior (UB) “behavior,

    upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements” (C99 standard)
  53. Examples for Undefined Behavior Buffer overflow Use-after-free error Integer overflow

    40
  54. Buffer Overflows: Leaking Sensitive Data 41 long *arr = malloc(3

    * sizeof(long)); arr: secret
  55. Buffer Overflows: Leaking Sensitive Data 42 long *arr = malloc(3

    * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret
  56. Buffer Overflows: Leaking Sensitive Data 42 long *arr = malloc(3

    * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret UB
  57. Buffer Overflows: Leaking Sensitive Data 43 long *arr = malloc(3

    * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret secret
  58. Buffer Overflows: Leaking Sensitive Data 43 long *arr = malloc(3

    * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret secret Heartbleed and Cloudbleed were such vulnerabilities
  59. Buffer Overflows: Leaking Sensitive Data 43 long *arr = malloc(3

    * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret secret Heartbleed and Cloudbleed were such vulnerabilities Writes can allow attackers to change a program’s control flow
  60. Use-after-free Error 44 long *arr = malloc(3 * sizeof(long)); free(arr);

    arr[0] = …; UB
  61. Use-after-free Error 44 long *arr = malloc(3 * sizeof(long)); free(arr);

    arr[0] = …; UB Another object can be overwritten if the memory has been reallocated
  62. Integer Overflow 45 int a = 1, b = INT_MAX;

    int val = a + b; UB
  63. Integer Overflow 45 int a = 1, b = INT_MAX;

    int val = a + b; UB Can result in inconsistent or surprising behavior if UB is “optimized away”
  64. Integer Overflow 46 void pause() { int a = 0;

    // run until overflow while (a < a + 1) { a++; } }
  65. Integer Overflow 46 void pause() { int a = 0;

    // run until overflow while (a < a + 1) { a++; } } What’s the compilation output of Clang/GCC? 1. The function works as expected by the programmer 2. The function body is optimized away 3. The function results in an endless loop 4. It depends on the optimization level
  66. Integer Overflow 47 void pause() { int a = 0;

    // run until overflow while (a < a + 1) { a++; } }
  67. Integer Overflow 47 void pause() { int a = 0;

    // run until overflow while (a < a + 1) { a++; } } -O0: mov dword ptr [rsp - 4], 0 jmp loop_header loop_body: add dword ptr [rsp - 4], 1 loop_header: mov eax, dword ptr [rsp - 4] mov ecx, dword ptr [rsp - 4] add ecx, 1 cmp eax, ecx jl loop_body ret
  68. Integer Overflow 47 void pause() { int a = 0;

    // run until overflow while (a < a + 1) { a++; } } -O0: mov dword ptr [rsp - 4], 0 jmp loop_header loop_body: add dword ptr [rsp - 4], 1 loop_header: mov eax, dword ptr [rsp - 4] mov ecx, dword ptr [rsp - 4] add ecx, 1 cmp eax, ecx jl loop_body ret -O3: loop: jmp loop
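
For contrast, here is a small Java counterpart of the `pause()` loop. It is only an illustration of a language with defined overflow semantics, not something shown in the talk: in Java, `int` addition wraps around, so `a < a + 1` becomes false at `Integer.MAX_VALUE` and a compiler may not assume the condition is always true. The `start` parameter is an addition for this sketch so that the demo finishes quickly.

```java
// Illustration only: Java counterpart of the slide's pause() loop.
final class PauseDemo {
    /** Runs the loop from the given start value and returns the value at which it stops. */
    static int pause(int start) {
        int a = start;
        // a + 1 wraps to Integer.MIN_VALUE when a == Integer.MAX_VALUE,
        // so the comparison is false there and the loop terminates.
        while (a < a + 1) {
            a++;
        }
        return a;
    }

    public static void main(String[] args) {
        int stoppedAt = pause(Integer.MAX_VALUE - 5);
        System.out.println(stoppedAt == Integer.MAX_VALUE);
    }
}
```

The program prints `true`. Starting from 0, as on the slide, would behave the same way and simply take longer.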
  69. Goal of my PhD 48 Tackle UB by safely and

    efficiently executing unsafe languages on the JVM
  70. Goal of my PhD 49 Tackle UB by safely and

    efficiently executing unsafe languages on the JVM
  71. Goal of my PhD 49 Tackle UB by safely and

    efficiently executing unsafe languages on the JVM Well-defined semantics even for errors and corner cases
  72. 50 Existing Approaches Instrumentation- based bug-finding tools Symbolic execution Safe

    languages Hardware security Static analysis Attacker mitigation
  73. 51 Existing Approaches Instrumentation- based bug-finding tools Symbolic execution Safe

    languages Hardware security Static analysis Attacker mitigation
  74. State of the Art: Instrumentation-based Tools 52 a.out Clang/GCC C

    ./a.out Hello world!
  75. State of the Art: Instrumentation-based Tools Compile-time instrumentation • AddressSanitizer

    • SoftBound+CETS 52 a.out Clang/GCC C ./a.out Hello world!
  76. State of the Art: Instrumentation-based Tools Compile-time instrumentation • AddressSanitizer

    • SoftBound+CETS 52 a.out Clang/GCC C ./a.out Hello world! Run-time instrumentation • Memcheck • Dr. Memory
  77. Conundrum: Finding Bugs vs. Performance 53 a.out Clang/GCC C ./a.out

    Hello world!
  78. Conundrum: Finding Bugs vs. Performance 53 a.out Clang/GCC C ./a.out

    Hello world! Static compilers optimize code based on Undefined Behavior. Bug-finding tools find bugs assuming that violations are visible side effects (Wang et al. 2012, D'Silva 2015)
  79. Conundrum: Finding Bugs vs. Performance 54 To find all bugs,

    developers need to disable compiler optimizations
  80. Map Data Structures and Operations to Java 55 long *arr

    = malloc(3 * sizeof(long)); arr[4] = …
  81. Map Data Structures and Operations to Java 55 long *arr

    = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code
  82. Map Data Structures and Operations to Java 55 long[] arr

    = new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code
  83. Map Data Structures and Operations to Java 55 long[] arr

    = new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code The semantics of an out-of-bounds access are well specified
  84. Map Data Structures and Operations to Java 55 long[] arr

    = new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code ArrayIndexOutOfBoundsException The semantics of an out-of-bounds access are well specified
  85. Map Data Structures and Operations to Java 55 long[] arr

    = new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code ArrayIndexOutOfBoundsException The semantics of an out-of-bounds access are well specified The JVM’s compiler optimizes the program, but without optimizing Undefined Behavior away
  86. Sulong 56 Sulong is a Truffle-based LLVM IR Interpreter LLVM

    IR Interpreter LLVM IR Clang program.c libc.c Truffle Graal JVM
  87. Sulong 56 Sulong is a Truffle-based LLVM IR Interpreter LLVM

    IR Interpreter LLVM IR Clang program.c libc.c Truffle Graal JVM We need to disable Clang’s optimizations
  88. Prevent Out-Of-Bounds Accesses 57 long *arr = malloc(3 * sizeof(long));

    {0, 0, 0} Address offset = 0 data I64Array contents [How do we know the type?] [Pointer to an integer?] [Array bounds check elimination] [Strict-aliasing rule]
  89. Prevent Out-Of-Bounds Accesses 58 long *arr = malloc(3 * sizeof(long));

    arr[4] = … {0, 0, 0} Address offset = 4 data I64Array contents [Pointer to an integer?] [Array bounds check elimination] [Strict-aliasing rule]
  90. Prevent Out-Of-Bounds Accesses contents[4] → ArrayIndexOutOfBoundsException 58 long *arr =

    malloc(3 * sizeof(long)); arr[4] = … {0, 0, 0} Address offset = 4 data I64Array contents [Pointer to an integer?] [Array bounds check elimination] [Strict-aliasing rule]
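
The object layout on the Prevent Out-Of-Bounds Accesses slides (an `Address` with `offset` and `data`, pointing to an `I64Array` with `contents`) can be approximated in a few lines of Java. This is a sketch under assumptions, not Safe Sulong's actual classes: the method names are made up, and `offset` is treated as an element index because the slide shows `offset = 4` for `arr[4]`.

```java
// Sketch of a managed pointer; class shapes follow the slide, method names are hypothetical.
final class BoundsDemo {
    public static void main(String[] args) {
        Address arr = Address.mallocLongs(3);      // long *arr = malloc(3 * sizeof(long));
        arr.store(2, 42L);                         // arr[2] = 42; in bounds
        System.out.println(arr.load(2));
        try {
            arr.store(4, 1L);                      // arr[4] = ...; out of bounds
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds write detected");
        }
    }
}

/** Managed allocation: the C heap block becomes a Java array. */
final class I64Array {
    final long[] contents;

    I64Array(int length) { contents = new long[length]; }
}

/** Managed pointer: a reference to the allocation plus an element offset. */
final class Address {
    final I64Array data;
    final int offset;

    Address(I64Array data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    static Address mallocLongs(int count) {
        return new Address(new I64Array(count), 0);
    }

    /** Pointer arithmetic yields a new pointer into the same allocation. */
    Address plus(int elements) {
        return new Address(data, offset + elements);
    }

    // The JVM's own array bounds check rejects invalid indices in both accessors.
    long load(int index) { return data.contents[offset + index]; }

    void store(int index, long value) { data.contents[offset + index] = value; }
}
```

Because the check is the JVM's built-in array bounds check, no separate instrumentation pass is needed in this sketch, and the exception is a visible side effect.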
  91. Prevent Use-after-Free Errors 59 long *arr = malloc(3 * sizeof(long));

    free(arr); {0, 0, 0} Address offset = 0 data I64Array contents [Pointer to an integer?] [Strict-aliasing rule]
  92. Prevent Use-after-Free Errors 60 long *arr = malloc(3 * sizeof(long));

    free(arr); Address offset = 0 data I64Array contents=null [Pointer to an integer?] [Strict-aliasing rule]
  93. Prevent Use-after-Free Errors 61 long *arr = malloc(3 * sizeof(long));

    free(arr); arr[0] = … Address offset = 0 data I64Array contents=null [Pointer to an integer?] [Strict-aliasing rule]
  94. Prevent Use-after-Free Errors contents[0] → NullPointerException 62 long *arr =

    malloc(3 * sizeof(long)); free(arr); arr[0] = … Address offset = 0 data I64Array contents=null [Pointer to an integer?] [Strict-aliasing rule]
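
The use-after-free slides set `contents=null` on `free` so that a later access fails with a `NullPointerException`. A minimal Java sketch of that idea follows; the class and method names (`FreeableI64Array`, `ManagedPointer`, `mallocLongs`) are hypothetical and chosen only for this example.

```java
// Sketch: free() drops the backing array, so later accesses fail loudly.
final class UseAfterFreeDemo {
    public static void main(String[] args) {
        ManagedPointer arr = ManagedPointer.mallocLongs(3);  // long *arr = malloc(3 * sizeof(long));
        arr.store(0, 7L);
        arr.free();                                          // free(arr);
        try {
            arr.store(0, 8L);                                // arr[0] = ...; use after free
        } catch (NullPointerException e) {
            System.out.println("use-after-free detected");
        }
    }
}

final class FreeableI64Array {
    long[] contents;   // set to null once the allocation has been freed

    FreeableI64Array(int length) { contents = new long[length]; }
}

final class ManagedPointer {
    final FreeableI64Array data;
    final int offset;

    ManagedPointer(FreeableI64Array data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    static ManagedPointer mallocLongs(int count) {
        return new ManagedPointer(new FreeableI64Array(count), 0);
    }

    /** Every pointer to the same allocation observes the free, because they share `data`. */
    void free() { data.contents = null; }

    long load(int index) { return data.contents[offset + index]; }

    void store(int index, long value) { data.contents[offset + index] = value; }
}
```

Since the freed array becomes unreachable, the garbage collector reclaims the memory, and the stale pointer can never alias a newly allocated object.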
  95. Prevent Integer Overflows 63 int a = 1, b =

    INT_MAX; int val = a + b; Math.addExact(a, b); [Pointer to an integer?]
  96. Prevent Integer Overflows 63 int a = 1, b =

    INT_MAX; int val = a + b; Math.addExact(a, b); ArithmeticException [Pointer to an integer?]
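
The slide's `Math.addExact(a, b)` can be tried in isolation. The wrapper name `checkedAdd` is an assumption for this sketch; `Math.addExact` itself is part of the Java standard library and throws `ArithmeticException` when the `int` result overflows.

```java
// Sketch: signed addition that traps on overflow instead of invoking undefined behavior.
final class OverflowDemo {
    /** Hypothetical helper standing in for the interpreter's signed 32-bit add. */
    static int checkedAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        System.out.println(checkedAdd(1, 2));
        try {
            int a = 1, b = Integer.MAX_VALUE;     // int a = 1, b = INT_MAX;
            int val = checkedAdd(a, b);           // int val = a + b;
            System.out.println(val);              // not reached
        } catch (ArithmeticException e) {
            System.out.println("integer overflow detected");
        }
    }
}
```

The first call prints `3`; the second raises the exception before any wrapped value can be observed.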
  97. Safe Optimizations 64 ArrayIndexOutOfBoundsException NullPointerException ArithmeticException Exceptions are visible side

    effects and cannot be optimized away
  98. Evaluation Hypotheses • Effectiveness: Safe Sulong detects bugs that are

    overlooked by other tools • Performance: Safe Sulong’s performance overhead is “reasonable” 65
  99. Effectiveness: Errors in GitHub Projects 66 http://ssw.jku.at/General/Staff/ManuelRigger/ASPLOS18-SafeSulong-Bugs.csv

  100. Effectiveness: Errors in GitHub Projects 66 http://ssw.jku.at/General/Staff/ManuelRigger/ASPLOS18-SafeSulong-Bugs.csv 68 errors in

    (small) open-source projects
  101. Effectiveness: Errors in GitHub Projects • Valgrind detected half of

    the errors • 8 errors not found by LLVM’s AddressSanitizer (and Valgrind) • Compiler optimizations (ASan -O3) prevented the detection of 4 additional bugs 67 [Comparison tools]
  102. Effectiveness: Errors in GitHub Projects 68 int main(int argc, char**

    argv) { printf("%d %s\n", argc, argv[5]); } [Comparison tools] Out-of-bounds accesses to argv are not instrumented by ASan
  103. Effectiveness: Errors in GitHub Projects 69 https://github.com/google/sanitizers/issues/762

  104. Effectiveness: Errors in GitHub Projects • 8 errors not found

    by LLVM’s AddressSanitizer and Valgrind 70 int main(int argc, char** argv) { printf("%d %s\n", argc, argv[5]); } [Comparison tools] In Safe Sulong instrumentation cannot be omitted by design
  105. Peak Performance 71 lower is better

  106. Peak Performance 71 lower is better Safe Sulong's performance is

    mostly between Clang -O0 and Clang -O3, and mostly faster than ASan -O0
  107. Sulong as Part of GraalVM 72 Java Virtual Machine Graal

    Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR LLVM IR Interpreter LLVM IR Clang Flang Optimization Boundary
  108. Sulong as Part of GraalVM 72 Java Virtual Machine Graal

    Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR LLVM IR Interpreter LLVM IR Clang Flang Optimization Boundary Managed Sulong, derived from Safe Sulong, is available in GraalVM
  109. Sulong Key Collaborators 73 Jacob Kreindl Raphael Mosaner Roland Schatz

    Josef Eisl Christian Häubl Matthias Grimmer Thomas Pointhuber Daniel Pekarek Chris Seaton Lukas Stadler Florian Angerer David Gnedt https://github.com/graalvm/sulong/graphs/contributors Swapnil Gaikwad
  110. 74 The importance of inline assembly and compiler builtins

  111. C/C++ Fortran

  112. What about inline assembly? 76

  113. What about GCC builtins? 77

  114. What about linkage features? 78

  115. Inline Assembly Compiler builtins System calls External Libraries Low-level libc/POSIX

    functions Linkage features C/C++ Fortran Compiler extensions Non-standard-compliant code
  116. Inline Assembly Compiler builtins System calls External Libraries Low-level libc/POSIX

    functions Linkage features C/C++ Fortran Compiler extensions Non-standard-compliant code
  117. Collaborators 81 Stefan Marr Stephen Kell David Leopoldseder Hanspeter Mössenböck

    Bram Adams
  118. 82 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly C

    Projects Consist of More Than C Code Compiler builtins [Inline assembly details] [Inline Assembly and GCC Builtins in Sulong]
  119. 83 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly C

    Projects Consist of More Than C Code Compiler builtins [Inline assembly details] [Inline Assembly and GCC Builtins in Sulong] ~1,000 instructions for a single complex ISA like x86-64
  120. if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly C Projects

    Consist of More Than C Code Compiler builtins [Inline assembly details] [Inline Assembly and GCC Builtins in Sulong] Over 1,000 GCC builtins 84
  121. C Projects Consist of More Than C Code 85 How

    frequently are these used? How are they used? What is the implementation effort to cover most programs? How well do comparable tools support them?
  122. C Projects Consist of More Than C Code 85 How

    frequently are these used? How are they used? What is the implementation effort to cover most programs? How well do comparable tools support them? Goal: an informed decision on whether and to what extent to implement them in Sulong!
  123. Mining of C GitHub Projects 86 GCC Builtins Inline Assembly

    # studied projects ~5,000 ~1,300 Considered projects All C projects C Client Applications Identification grep <builtin name> grep asm
  124. Mining of C GitHub Projects 86 GCC Builtins Inline Assembly

    # studied projects ~5,000 ~1,300 Considered projects All C projects C Client Applications Identification grep <builtin name> grep asm Different setups, so the comparison should be taken with a grain of salt
  125. How widespread are GCC builtins and inline assembly fragments? 87

  126. In How Many Projects are They Used? Bar chart (% of projects): popular projects with

    inline assembly: 28%; (popular) projects with GCC builtins: 37%. Both GCC builtins and inline assembly are frequently used by projects 88
  127. How Often are They Used Within a Project? Bar chart, density (occurrence per KLOC), bar labels in slide order:

    popular projects with inline assembly: 50k; (popular) projects with GCC builtins: 6k. They are infrequently used within a project 89
  128. How are inline assembly and GCC builtins used? 90

  129. Inline Assembly 91 Inline assembly fragments can contain an arbitrary

    number of instructions; how many do they typically contain?
  130. Inline Assembly 91 Inline assembly fragments can contain an arbitrary

    number of instructions; how many do they typically contain? uint64 sqlite3Hwtime(void){ unsigned long val; __asm__ ("rdtsc" : "=A" (val)); return val; }
  131. Inline Assembly 91 Inline assembly fragments can contain an arbitrary

    number of instructions; how many do they typically contain? uint64 sqlite3Hwtime(void){ unsigned long val; __asm__ ("rdtsc" : "=A" (val)); return val; } __asm__ __volatile__ ( " leaq %0, %%rax\n" " movq %%rbp, 8(%%rax)\n" /* save regs rbp and rsp " movq %%rsp, (%%rax)\n" " movq %%rax, %%rsp\n" /* make rsp point to &ar " movq 16(%%rsp), %%rsi\n" /* rsi = in */ " movq 32(%%rsp), %%rdi\n" /* rdi = out */ " movq 24(%%rsp), %%r9\n" /* r9 = last */ " movq 48(%%rsp), %%r10\n" /* r10 = end */ " movq 64(%%rsp), %%rbp\n" /* rbp = lcode */ " movq 72(%%rsp), %%r11\n" /* r11 = dcode */ " movq 80(%%rsp), %%rdx\n" /* rdx = hold */ " movl 88(%%rsp), %%ebx\n" /* ebx = bits */ " movl 100(%%rsp), %%r12d\n" /* r12d = lmask */ " movl 104(%%rsp), %%r13d\n" /* r13d = dmask */ /* r14d = len */ /* r15d = dist */ " cld\n" " cmpq %%rdi, %%r10\n" " je .L_one_time\n" /* if only one decode le " cmpq %%rsi, %%r9\n" " je .L_one_time\n" " jmp .L_do_loop\n" ".L_one_time:\n" " movq %%r12, %%r8\n" /* r8 = lmask */ " cmpb $32, %%bl\n" " ja .L_get_length_code_one_time\n" " lodsl\n" /* eax = *(uint *)in++ * " movb %%bl, %%cl\n" /* cl = bits, needs it f " addb $32, %%bl\n" /* bits += 32 */ " shlq %%cl, %%rax\n" " orq %%rax, %%rdx\n" /* hold |= *((uint *)in) " jmp .L_get_length_code_one_time\n"
  132. How are Inline Assembly Fragments Used? 92 Plot: cumulative percentage against the

    number of unique fragments per project. 36%: A number of projects use only a single inline assembly fragment
  133. How are Inline Assembly Fragments Used? 93 Plot: cumulative percentage against the

    number of unique fragments per project. 99%: Almost all projects use fewer than 25 inline assembly fragments
  134. How are Inline Assembly Fragments Used? 94 Plot: cumulative percentage against the

    number of instructions per unique fragment; annotations: 100%, 438 … We also found fragments with several hundred instructions
  135. How are Inline Assembly Fragments Used? 95 Inline assembly fragments

    typically consist of a low number of instructions.
  136. How are GCC Builtins Used? 96 if (__builtin_expect(x, 0)) foo

    (); Architecture-independent builtin c = __builtin_ia32_paddb(a, b); Architecture-specific builtin Architecture-specific builtins are similar to inline assembly. Are they used?
  137. How are GCC Builtins Used? 97 Bar chart: number of projects (axis 0 to 2000) for used builtins,

    machine-independent builtins, and machine-specific builtins; bar labels 38%, 36%, 8%. Mainly machine-independent GCC builtins are used.
  138. Machine-specific vs. Machine-independent Builtins 98 17 3 4 A project

    that uses machine-specific builtins uses them in larger numbers.
  139. How well do tools support them and how much effort

    needs to be invested to support them? 99
  140. Tool Support for Inline Assembly 100 c2go transpile test.c panic:

    unknown node type: 'GCCAsmStmt 0x3a991f8 <line:5:3, col:38>' goroutine 1 [running]: github_com_elliotchance_c2go_ast.Parse go/src/github.com/elliotchance/c2go/ast/ast.go:211 main.convertLinesToNodes go/src/github.com/elliotchance/c2go/main.go:81 main.Start go/src/github.com/elliotchance/c2go/main.go:219 main.runCommand go/src/github.com/elliotchance/c2go/main.go:350 main.main go/src/github.com/elliotchance/c2go/main.go:277 goroutine 6 [finalizer wait]: Splint 3.1.2 --- 03 May 2009 test.c: (in function rdtsc) test.c:5:3: Unrecognized identifier: asm Identifier used in code has not been declared. (Use -unrecog to inhibit warning) test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.) *** Cannot continue.
  141. Tool Support 101 Test suite for the most commonly-used 100

    builtins
  142. Bugs in CompCert 102 https://github.com/AbsInt/CompCert/issues/243 [Details bug]

  143. 103 Tool support is lagging behind

  144. How much effort is needed to implement GCC Builtins? 104

    [Details]
  145. How much effort is needed to implement GCC Builtins? 104

    32 builtins to support half of projects [Details]
  146. How much effort is needed to implement GCC Builtins? 104

    1600 builtins to support 99% of projects 32 builtins to support half of projects [Details]
  147. How much effort is needed to implement GCC Builtins? 104

    1600 builtins to support 99% of projects 32 builtins to support half of projects [Details] Machine-independent builtins are the “low-hanging fruits”
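
To make the "low-hanging fruit" remark concrete, here is one way a managed interpreter could map a few machine-independent GCC builtins onto existing Java library methods. This is a hedged sketch and not how Sulong implements builtins: the `Builtin` interface, the `Builtins` table, and the restriction to `int` arguments are assumptions for illustration (for example, GCC declares `__builtin_expect` on `long`). The Java methods used (`Integer.bitCount`, `Integer.reverseBytes`, `Integer.numberOfLeadingZeros`) are standard.

```java
import java.util.Map;

// Hypothetical dispatch table from builtin names to Java implementations.
final class BuiltinDemo {
    public static void main(String[] args) {
        System.out.println(Builtins.call("__builtin_expect", 7, 0));
        System.out.println(Builtins.call("__builtin_popcount", 0xFF));
        System.out.println(Integer.toHexString(Builtins.call("__builtin_bswap32", 0x11223344)));
        System.out.println(Builtins.call("__builtin_clz", 1));
    }
}

interface Builtin {
    int call(int... args);
}

final class Builtins {
    private static final Map<String, Builtin> TABLE = Map.of(
        // Branch-prediction hint: semantically just returns its first argument.
        "__builtin_expect", args -> args[0],
        "__builtin_popcount", args -> Integer.bitCount(args[0]),
        "__builtin_bswap32", args -> Integer.reverseBytes(args[0]),
        "__builtin_clz", args -> {
            // GCC leaves the result for 0 undefined; this sketch reports it instead.
            if (args[0] == 0) {
                throw new IllegalArgumentException("__builtin_clz(0) is undefined");
            }
            return Integer.numberOfLeadingZeros(args[0]);
        });

    static int call(String name, int... args) {
        Builtin builtin = TABLE.get(name);
        if (builtin == null) {
            throw new UnsupportedOperationException("unsupported builtin: " + name);
        }
        return builtin.call(args);
    }
}
```

Machine-specific builtins such as `__builtin_ia32_paddb` have no such one-line library counterpart, which fits the slide's observation that the machine-independent ones are the cheaper place to start.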
  148. Are they a legacy feature that has survived until today?

    105
  149. GCC Builtin Usage Over Time 106 [Details] We analyzed the

    commit history of the GCC builtin projects
  150. GCC Builtin Usage Over Time Trend Projects Increasing 38% Stagnant

    26% Decreasing 14% Inconclusive 22% 107 64% of projects have been mainly adding builtins
  151. Research Opportunities • Other elements, such as compiler pragmas and

    function attributes, are not widely understood • Testing the correct usage of inline assembly and GCC builtins • Support in formal models and static analysis tools • Automatic approaches? 108
  152. Inline Assembly Compiler builtins System calls External Libraries Low-level libc/POSIX

    functions Linkage features C/C++ Fortran Compiler extensions Non-standard-compliant code
  153. 110 Addressing the last 20% of the problem took 80%

    of the time
  154. Pareto Principle 111 80% of the effects come from 20%

    of the causes
  155. Pareto Principle 112 It is useful to consider the “seemingly”

    less-important 20% of a problem • Avoids oversimplifications • Helps design holistic solutions • Leads to new research questions
  156. Discussion: What About Other Overlooked Problems? 113 In which 20%

    of important use cases do current language interoperability approaches fail? Which 20% of important use cases cannot be expressed with <name of DSL> and how does it affect users? Which 20% of an approach for connecting heterogeneous code provides bad usability and how can we improve on it?
  157. Summary 114