Slide 1

Slide 1 text

Sulong: Executing Low-level Languages on Truffle Manuel Rigger Advanced Software Technologies Lab (Zhendong Su) ETH Zurich 1. April 2019 Interconnecting Code Workshop @ 2019 @RiggerManuel

Slide 2

Slide 2 text

PhD Topic 2 Safe and Efficient Execution of Unsafe Languages on the Java Virtual Machine

Slide 3

Slide 3 text

How is this Relevant for ICW? 3 An improved version of Sulong is used within GraalVM as a native function interface

Slide 4

Slide 4 text

4 GraalVM, its language interoperability mechanism, and Sulong’s role

Slide 5

Slide 5 text

4 GraalVM, its language interoperability mechanism, and Sulong’s role I have not been working on language interoperability myself.

Slide 6

Slide 6 text

Unsafe languages 5 Heartbleed Cloudbleed

Slide 7

Slide 7 text

Unsafe languages 5 Heartbleed Cloudbleed Graalbleed

Slide 8

Slide 8 text

GraalVM, its language interoperability mechanism, and Sulong’s role
Safe Sulong and how it safely executes LLVM-based languages

Slide 9

Slide 9 text

Sulong Interacts also with Other Code 7 Compiler builtins System calls External Libraries Low-level libc/POSIX functions Linkage features Compiler extensions Inline assembly

Slide 10

Slide 10 text

GraalVM, its language interoperability mechanism, and Sulong’s role
Safe Sulong and how it safely executes LLVM-based languages
The importance of inline assembly and compiler builtins

Slide 11

Slide 11 text

9 GraalVM, its language interoperability mechanism, and Sulong’s role

Slide 12

Slide 12 text

GraalVM 10 https://www.graalvm.org/

Slide 13

Slide 13 text

GraalVM 11 (Würthinger et al. 2016) GraalVM supports the execution of various languages TruffleRuby Graal.js Graal.python FastR

Slide 14

Slide 14 text

GraalVM (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR Truffle
Truffle is a language-implementation framework
• Written in Java
• Optimization primitives
• Debugging and profiling
• Language interoperability!

Slide 15

Slide 15 text

GraalVM 13 (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR Graal Truffle Graal is the compiler used by Truffle

Slide 16

Slide 16 text

GraalVM 14 (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR Graal Truffle Can execute on the JVM, be compiled to a standalone executable, … JVM

Slide 17

Slide 17 text

GraalVM 15 (Würthinger et al. 2016) TruffleRuby Graal.js Graal.python FastR The languages are implemented as Abstract Syntax Tree (AST) interpreters

Slide 18

Slide 18 text

AST Interpreters 16 = a b 3 + a = b + 3 Parse input program

Slide 19

Slide 19 text

AST Interpreters 17 Set up input 2 a b = a b 3 +

Slide 20

Slide 20 text

AST Interpreters 18 Execute = a b 3 + 2 a b

Slide 21

Slide 21 text

AST Interpreters 18 Execute = a b 3 + 2 a b 3 2

Slide 22

Slide 22 text

AST Interpreters 18 Execute = a b 3 + 2 a b 3 2 5 a

Slide 23

Slide 23 text

AST Interpreters 18 Execute = a b 3 + 5 2 a b 3 2 5 a
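
The parse, set-up, and execute steps on these slides can be sketched as a tiny tree-walking interpreter. This is a minimal Java sketch and not Truffle's API: the node classes (`Const`, `ReadVar`, `Add`, `Assign`) and the map-based frame are names assumed for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

final class AstInterpreterSketch {
    /** Every AST node evaluates itself against a frame of local variables. */
    interface Node {
        int execute(Map<String, Integer> frame);
    }

    /** Leaf node: an integer literal such as 3. */
    static final class Const implements Node {
        private final int value;
        Const(int value) { this.value = value; }
        public int execute(Map<String, Integer> frame) { return value; }
    }

    /** Leaf node: reads a variable such as b from the frame. */
    static final class ReadVar implements Node {
        private final String name;
        ReadVar(String name) { this.name = name; }
        public int execute(Map<String, Integer> frame) { return frame.get(name); }
    }

    /** Inner node: executes both children, then adds their results. */
    static final class Add implements Node {
        private final Node left;
        private final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public int execute(Map<String, Integer> frame) {
            return left.execute(frame) + right.execute(frame);
        }
    }

    /** Root node: executes the right-hand side and stores the result in the target. */
    static final class Assign implements Node {
        private final String target;
        private final Node value;
        Assign(String target, Node value) { this.target = target; this.value = value; }
        public int execute(Map<String, Integer> frame) {
            int result = value.execute(frame);
            frame.put(target, result);
            return result;
        }
    }

    /** "Parse input program": the tree for a = b + 3, built by hand instead of by a parser. */
    static Node parse() {
        return new Assign("a", new Add(new ReadVar("b"), new Const(3)));
    }

    /** "Set up input" (b = 2), "Execute", then return the value stored in a. */
    static int run() {
        Map<String, Integer> frame = new HashMap<>();
        frame.put("b", 2);
        parse().execute(frame);
        return frame.get("a");
    }

    public static void main(String[] args) {
        System.out.println("a = " + run());
    }
}
```

Executing the root node recursively executes its children, which mirrors how the values 2, 3, and 5 propagate upward through the tree on the slides.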

Slide 24

Slide 24 text

AST Interpreters Optimization 19 = a b 3 + Truffle AST Interpreters specialize for their input 5 2 a b Variable Integer

Slide 25

Slide 25 text

AST Interpreters Optimization 19 = a b 3 + Truffle AST Interpreters specialize for their input 5 2 a b if (input is as expected) { execute specialized operation } else { rewrite node } Variable Integer

Slide 26

Slide 26 text

AST Interpreters Optimization 20 = a b 3 + Partial Evaluation = a + b 3 Variable Integer

Slide 27

Slide 27 text

AST Interpreters Optimization 21 Compilation = a + b 3 if (b is an Integer) { a = b + 3 } else { deoptimize and rewrite node } Variable Integer

Slide 28

Slide 28 text

AST Interpreters Optimization 22 = a + b 3 5 “icw” a b = a b 3 + Variable Integer Deoptimize

Slide 29

Slide 29 text

AST Interpreters Optimization 23 = a b 3 + = a b 3 + Respecialize Variable Integer Generic
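
The specialize, deoptimize, and respecialize cycle can be sketched as a node that starts out assuming integer operands and switches to a generic version once that assumption fails. This is a hedged Java sketch, not Truffle code: a real Truffle node replaces itself in the tree, which is modelled here by a simple flag, and the concatenation semantics of the generic case are an assumption (JavaScript-like).

```java
final class SpecializationSketch {
    static final class AddNode {
        // false: specialized for Integer operands; true: rewritten to the generic version.
        private boolean generic = false;
        private int rewrites = 0;

        Object execute(Object left, Object right) {
            if (!generic) {
                if (left instanceof Integer && right instanceof Integer) {
                    // Input is as expected: execute the specialized operation.
                    return (Integer) left + (Integer) right;
                }
                // Input is not as expected: "rewrite" the node to its generic form.
                generic = true;
                rewrites++;
            }
            return executeGeneric(left, right);
        }

        private static Object executeGeneric(Object left, Object right) {
            if (left instanceof Integer && right instanceof Integer) {
                return (Integer) left + (Integer) right;
            }
            // Assumption for illustration: mixed operands are concatenated as strings.
            return String.valueOf(left) + right;
        }

        boolean isGeneric() { return generic; }
        int rewriteCount() { return rewrites; }
    }

    public static void main(String[] args) {
        AddNode add = new AddNode();
        System.out.println(add.execute(2, 3));      // specialized integer path
        System.out.println(add.execute("icw", 3));  // triggers the rewrite
        System.out.println(add.isGeneric());
    }
}
```

In compiled code, the failing type check corresponds to the deoptimization point: execution falls back to the interpreter, which performs the rewrite before the tree is compiled again.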

Slide 30

Slide 30 text

GraalVM 24 (Grimmer et al. 2015) TruffleRuby Graal.js Graal.python FastR

Slide 31

Slide 31 text

GraalVM 24 (Grimmer et al. 2015) TruffleRuby Graal.js Graal.python FastR Language interoperability support for individual language pairs would not scale

Slide 32

Slide 32 text

GraalVM 25 TruffleRuby Graal.js Graal.python FastR (Grimmer et al. 2015)

Slide 33

Slide 33 text

GraalVM (Grimmer et al. 2015) TruffleRuby Graal.js Graal.python FastR
Idea: Implement a language-independent mechanism based on messages

Slide 34

Slide 34 text

Message-Based Foreign Access 26 a = b + 3 = a READ b 3 + 2 b

Slide 35

Slide 35 text

Message-Based Foreign Access 26 a = b + 3 = a READ b 3 + 2 b Foreign objects can be accessed by sending a message to the foreign language implementation

Slide 36

Slide 36 text

Message-Based Foreign Accesses 27 = a READ B 3 + 2 Execute = a 3 + b

Slide 37

Slide 37 text

Message-Based Foreign Accesses 27 = a READ B 3 + 2 Execute = a 3 + Subsequent reads do not need to send a message b
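
The idea that only the first access sends a message can be sketched as a read node that resolves the READ message once and reuses the resolved accessor afterwards. This is a simplified Java model under stated assumptions, not the Truffle interop API: `ForeignLanguage`, `MapLanguage`, and `ReadNode` are hypothetical names, and foreign objects are modelled as plain maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class MessageResolutionSketch {
    /** Hypothetical: a language implementation answers a READ message with an accessor. */
    interface ForeignLanguage {
        Function<Object, Object> resolveRead(String member);
    }

    /** Toy foreign language whose objects are plain maps. */
    static final class MapLanguage implements ForeignLanguage {
        public Function<Object, Object> resolveRead(String member) {
            return receiver -> ((Map<?, ?>) receiver).get(member);
        }
    }

    /** Language-independent READ node that resolves its message on first execution. */
    static final class ReadNode {
        private final ForeignLanguage language;
        private final String member;
        private Function<Object, Object> resolved; // null until the first execution
        private int messagesSent = 0;

        ReadNode(ForeignLanguage language, String member) {
            this.language = language;
            this.member = member;
        }

        Object execute(Object receiver) {
            if (resolved == null) {
                // First execution: send the message and keep the resolved access.
                messagesSent++;
                resolved = language.resolveRead(member);
            }
            // Subsequent executions use the resolved access directly.
            return resolved.apply(receiver);
        }

        int messagesSent() { return messagesSent; }
    }

    /** Models a = b + 3, where b is read from a foreign object holding the value 2. */
    static int run() {
        Map<String, Object> foreign = new HashMap<>();
        foreign.put("b", 2);
        ReadNode readB = new ReadNode(new MapLanguage(), "b");
        return (Integer) readB.execute(foreign) + 3;
    }

    public static void main(String[] args) {
        System.out.println("a = " + run());
    }
}
```

Because the resolved access is ordinary interpreter code of the foreign language, a compiler can optimize across the language boundary instead of treating it as an opaque call.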

Slide 38

Slide 38 text

Sulong as Part of GraalVM 28 Java Virtual Machine Graal Compiler Truffle Framework https://www.graalvm.org/ TruffleRuby Graal.js Graal.python FastR Native Extension

Slide 39

Slide 39 text

Sulong as Part of GraalVM 29 Java Virtual Machine Graal Compiler Truffle Framework https://www.graalvm.org/ TruffleRuby Graal.js Graal.python FastR Java Native Interface

Slide 40

Slide 40 text

Sulong as Part of GraalVM 30 Java Virtual Machine Graal Compiler Truffle Framework https://www.graalvm.org/ TruffleRuby Graal.js Graal.python FastR Optimization Boundary Java Native Interface

Slide 41

Slide 41 text

Sulong as Part of GraalVM 31 Java Virtual Machine Graal Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR LLVM IR Interpreter LLVM IR Clang Flang Optimization Boundary

Slide 42

Slide 42 text

How to Deal with C Code Accessing VM Internals? 32 Native Extension VM Native Extension API

Slide 43

Slide 43 text

How to Deal with C Code Accessing VM Internals? Native Extension VM Native Extension API
Native extension APIs allow C code to access VM internals

Slide 44

Slide 44 text

Example: Ruby C Extension
# Ruby Code: array.rb
s = CArray.new
puts s.arraySum([1,2,3])
// The C extension: array.c
#include "ruby.h"
VALUE c_arraySum(VALUE self, VALUE array) {
  int sum = 0;
  for (int i = 0; i < RARRAY_LEN(array); i++) {
    sum += FIX2INT(rb_ary_entry(array, i));
  }
  return INT2FIX(sum);
}
Slide modified from Matthias Grimmer, with permission

Slide 45

Slide 45 text

Example: Ruby C Extension
// The C extension: array.c
#include "ruby.h"
VALUE c_arraySum(VALUE self, VALUE array) {
  int sum = 0;
  for (int i = 0; i < RARRAY_LEN(array); i++) {
    sum += FIX2INT(rb_ary_entry(array, i));
  }
  return INT2FIX(sum);
}
// ruby.h
typedef void *VALUE;
typedef void *ID;
VALUE rb_ary_entry(VALUE ary, long idx);
Programmers write their native extensions using the API provided by MRI
Slide modified from Matthias Grimmer, with permission

Slide 46

Slide 46 text

Example: Ruby C Extension
// The C extension: array.c
#include "ruby.h"
VALUE c_arraySum(VALUE self, VALUE array) {
  int sum = 0;
  for (int i = 0; i < RARRAY_LEN(array); i++) {
    sum += FIX2INT(rb_ary_entry(array, i));
  }
  return INT2FIX(sum);
}
// ruby.c
#include "ruby.h"
#include "truffle.h"
VALUE rb_ary_entry(VALUE ary, long idx) {
  return truffle_read_idx(ary, (int) idx);
}
int FIX2INT(VALUE value) {
  return truffle_invoke_i(RUBY_CEXT, "rb_fix2int", value);
}
truffle_read_idx and truffle_invoke_i are Sulong intrinsics that send messages
Slide modified from Matthias Grimmer, with permission

Slide 47

Slide 47 text

Example: Ruby C Extension
// The C extension: array.c
#include "ruby.h"
VALUE c_arraySum(VALUE self, VALUE array) {
  int sum = 0;
  for (int i = 0; i < RARRAY_LEN(array); i++) {
    sum += FIX2INT(rb_ary_entry(array, i));
  }
  return INT2FIX(sum);
}
// ruby.c
#include "ruby.h"
#include "truffle.h"
VALUE rb_ary_entry(VALUE ary, long idx) {
  return truffle_read_idx(ary, (int) idx);
}
int FIX2INT(VALUE value) {
  return truffle_invoke_i(RUBY_CEXT, "rb_fix2int", value);
}
Slide modified from Matthias Grimmer, with permission

Slide 48

Slide 48 text

Example: Ruby C Extension
// The C extension: array.c
#include "ruby.h"
VALUE c_arraySum(VALUE self, VALUE array) {
  int sum = 0;
  for (int i = 0; i < RARRAY_LEN(array); i++) {
    sum += FIX2INT(rb_ary_entry(array, i));
  }
  return INT2FIX(sum);
}
// ruby.c
#include "ruby.h"
#include "truffle.h"
VALUE rb_ary_entry(VALUE ary, long idx) {
  return truffle_read_idx(ary, (int) idx);
}
int FIX2INT(VALUE value) {
  return truffle_invoke_i(RUBY_CEXT, "rb_fix2int", value);
}
# ruby.rb
def rb_fix2int(value)
  if value.nil?
    raise TypeError
  else
    int = value.to_int
    raise RangeError if int >= 2**32
    int
  end
end
Slide modified from Matthias Grimmer, with permission

Slide 49

Slide 49 text

Performance 37 11 32 0 5 10 15 20 25 30 35 Peak performance relative to MRI running pure Ruby MRI with C Extensions GraalVM with C Extensions Slide modified from Matthias Grimmer, with permission

Slide 50

Slide 50 text

Performance 37 11 32 0 5 10 15 20 25 30 35 Peak performance relative to MRI running pure Ruby MRI with C Extensions GraalVM with C Extensions Slide modified from Matthias Grimmer, with permission Truffle can inline the function call from Ruby to C!

Slide 51

Slide 51 text

38 Safe Sulong and how it safely executes LLVM-based Languages

Slide 52

Slide 52 text

Problem: C/C++ are unsafe languages
Undefined Behavior (UB): “behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements” (C99 standard)

Slide 53

Slide 53 text

Examples for Undefined Behavior Buffer overflow Use-after-free error Integer overflow 40

Slide 54

Slide 54 text

Buffer Overflows: Leaking Sensitive Data 41 long *arr = malloc(3 * sizeof(long)); arr: secret

Slide 55

Slide 55 text

Buffer Overflows: Leaking Sensitive Data 42 long *arr = malloc(3 * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret

Slide 56

Slide 56 text

Buffer Overflows: Leaking Sensitive Data 42 long *arr = malloc(3 * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret UB

Slide 57

Slide 57 text

Buffer Overflows: Leaking Sensitive Data 43 long *arr = malloc(3 * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret secret

Slide 58

Slide 58 text

Buffer Overflows: Leaking Sensitive Data 43 long *arr = malloc(3 * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret secret Heartbleed and Cloudbleed were such vulnerabilities

Slide 59

Slide 59 text

Buffer Overflows: Leaking Sensitive Data 43 long *arr = malloc(3 * sizeof(long)); long dest[4]; memcpy(dest, arr, sizeof(dest)); arr: dest: secret secret Heartbleed and Cloudbleed were such vulnerabilities Writes can allow attackers to change a program’s control flow

Slide 60

Slide 60 text

Use-after-free Error 44 long *arr = malloc(3 * sizeof(long)); free(arr); arr[0] = …; UB

Slide 61

Slide 61 text

Use-after-free Error 44 long *arr = malloc(3 * sizeof(long)); free(arr); arr[0] = …; UB Another object can be overwritten if the memory has been reallocated

Slide 62

Slide 62 text

Integer Overflow 45 int a = 1, b = INT_MAX; int val = a + b; UB

Slide 63

Slide 63 text

Integer Overflow 45 int a = 1, b = INT_MAX; int val = a + b; UB Can result in inconsistent or surprising behavior if UB is “optimized away”

Slide 64

Slide 64 text

Integer Overflow 46 void pause() { int a = 0; // run until overflow while (a < a + 1) { a++; } }

Slide 65

Slide 65 text

Integer Overflow
void pause() {
  int a = 0;
  // run until overflow
  while (a < a + 1) {
    a++;
  }
}
What’s the compilation output of Clang/GCC?
1. The function works as expected by the programmer
2. The function body is optimized away
3. The function results in an endless loop
4. It depends on the optimization level

Slide 66

Slide 66 text

Integer Overflow 47 void pause() { int a = 0; // run until overflow while (a < a + 1) { a++; } }

Slide 67

Slide 67 text

Integer Overflow
void pause() {
  int a = 0;
  // run until overflow
  while (a < a + 1) {
    a++;
  }
}
-O0:
  mov dword ptr [rsp - 4], 0
  jmp loop_header
loop_body:
  add dword ptr [rsp - 4], 1
loop_header:
  mov eax, dword ptr [rsp - 4]
  mov ecx, dword ptr [rsp - 4]
  add ecx, 1
  cmp eax, ecx
  jl loop_body
  ret

Slide 68

Slide 68 text

Integer Overflow
void pause() {
  int a = 0;
  // run until overflow
  while (a < a + 1) {
    a++;
  }
}
-O3:
loop:
  jmp loop
-O0:
  mov dword ptr [rsp - 4], 0
  jmp loop_header
loop_body:
  add dword ptr [rsp - 4], 1
loop_header:
  mov eax, dword ptr [rsp - 4]
  mov ecx, dword ptr [rsp - 4]
  add ecx, 1
  cmp eax, ecx
  jl loop_body
  ret

Slide 69

Slide 69 text

Goal of my PhD 48 Tackle UB by safely and efficiently executing unsafe languages on the JVM

Slide 71

Slide 71 text

Goal of my PhD 49 Tackle UB by safely and efficiently executing unsafe languages on the JVM Well-defined semantics even for errors and corner cases

Slide 72

Slide 72 text

Existing Approaches
• Instrumentation-based bug-finding tools
• Symbolic execution
• Safe languages
• Hardware security
• Static analysis
• Attacker mitigation

Slide 74

Slide 74 text

State of the Art: Instrumentation-based Tools 52 a.out Clang/GCC C ./a.out Hello world!

Slide 75

Slide 75 text

State of the Art: Instrumentation-based Tools Compile-time instrumentation • AddressSanitizer • SoftBound+CETS 52 a.out Clang/GCC C ./a.out Hello world!

Slide 76

Slide 76 text

State of the Art: Instrumentation-based Tools Compile-time instrumentation • AddressSanitizer • SoftBound+CETS 52 a.out Clang/GCC C ./a.out Hello world! Run-time instrumentation • Memcheck • Dr. Memory

Slide 77

Slide 77 text

Conundrum: Finding Bugs vs. Performance 53 a.out Clang/GCC C ./a.out Hello world!

Slide 78

Slide 78 text

Conundrum: Finding Bugs vs. Performance 53 a.out Clang/GCC C ./a.out Hello world! Static compilers: optimize code based on Undefined Behavior Bug-finding tools: find bugs assuming that violations are visible side effects (Wang et al. 2012, D'Silva 2015)

Slide 79

Slide 79 text

Conundrum: Finding Bugs vs. Performance 54 To find all bugs, developers need to disable compiler optimizations

Slide 80

Slide 80 text

Map Data Structures and Operations to Java 55 long *arr = malloc(3 * sizeof(long)); arr[4] = …

Slide 81

Slide 81 text

Map Data Structures and Operations to Java 55 long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code

Slide 82

Slide 82 text

Map Data Structures and Operations to Java 55 long[] arr = new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code

Slide 83

Slide 83 text

Map Data Structures and Operations to Java
long *arr = malloc(3 * sizeof(long)); arr[4] = …
Map to Java Code
long[] arr = new long[3]; arr[4] = …
The semantics of an out-of-bounds access are well specified

Slide 84

Slide 84 text

Map Data Structures and Operations to Java
long *arr = malloc(3 * sizeof(long)); arr[4] = …
Map to Java Code
long[] arr = new long[3]; arr[4] = …
ArrayIndexOutOfBoundsException
The semantics of an out-of-bounds access are well specified

Slide 85

Slide 85 text

Map Data Structures and Operations to Java
long *arr = malloc(3 * sizeof(long)); arr[4] = …
Map to Java Code
long[] arr = new long[3]; arr[4] = …
ArrayIndexOutOfBoundsException
The semantics of an out-of-bounds access are well specified
The JVM’s compiler optimizes the program, but without optimizing Undefined Behavior away

Slide 86

Slide 86 text

Sulong 56 Sulong is a Truffle-based LLVM IR Interpreter LLVM IR Interpreter LLVM IR Clang program.c libc.c Truffle Graal JVM

Slide 87

Slide 87 text

Sulong 56 Sulong is a Truffle-based LLVM IR Interpreter LLVM IR Interpreter LLVM IR Clang program.c libc.c Truffle Graal JVM We need to disable Clang’s optimizations

Slide 88

Slide 88 text

{0, 0, 0} Address offset = 0 data I64Array contents Prevent Out-Of-Bounds Accesses 57 long *arr = malloc(3 * sizeof(long)); [How do we know the type?] [Pointer to an integer?] [Array bounds check elimination] [Strict-aliasing rule]

Slide 89

Slide 89 text

Prevent Out-Of-Bounds Accesses 58 long *arr = malloc(3 * sizeof(long)); arr[4] = … {0, 0, 0} Address offset = 4 data I64Array contents [Pointer to an integer?] [Array bounds check elimination] [Strict-aliasing rule]

Slide 90

Slide 90 text

Prevent Out-Of-Bounds Accesses contents[4] → ArrayIndexOutOfBoundsException 58 long *arr = malloc(3 * sizeof(long)); arr[4] = … {0, 0, 0} Address offset = 4 data I64Array contents [Pointer to an integer?] [Array bounds check elimination] [Strict-aliasing rule]
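
The Address and I64Array boxes in the diagram can be sketched as two small Java classes. This is an illustrative sketch only: the class and field names follow the slide, but the methods (`plus`, `store`, `load`, `storeAt`) are assumptions and not Safe Sulong's actual implementation.

```java
final class ManagedAddressSketch {
    /** Managed stand-in for a malloc'ed block of longs. */
    static final class I64Array {
        long[] contents;
        I64Array(int length) { contents = new long[length]; }
    }

    /** Managed pointer: a reference to the allocation plus an element offset. */
    static final class Address {
        final I64Array data;
        final int offset;

        Address(I64Array data, int offset) {
            this.data = data;
            this.offset = offset;
        }

        /** Pointer arithmetic only changes the offset; nothing is checked yet. */
        Address plus(int elements) { return new Address(data, offset + elements); }

        /** The access itself is bounds-checked by the JVM. */
        void store(long value) { data.contents[offset] = value; }
        long load() { return data.contents[offset]; }
    }

    /** Models: long *arr = malloc(3 * sizeof(long)); arr[index] = 42; */
    static String storeAt(int index) {
        Address arr = new Address(new I64Array(3), 0);
        try {
            arr.plus(index).store(42L);
            return "ok";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(storeAt(2)); // in bounds
        System.out.println(storeAt(4)); // out of bounds
    }
}
```

Computing the pointer `arr + 4` succeeds in this sketch; only the store through it fails, because the JVM checks the index against the array length.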

Slide 91

Slide 91 text

Prevent Use-after-Free Errors 59 long *arr = malloc(3 * sizeof(long)); free(arr); {0, 0, 0} Address offset = 0 data I64Array contents [Pointer to an integer?] [Strict-aliasing rule]

Slide 92

Slide 92 text

Prevent Use-after-Free Errors 60 long *arr = malloc(3 * sizeof(long)); free(arr); Address offset = 0 data I64Array contents=null [Pointer to an integer?] [Strict-aliasing rule]

Slide 93

Slide 93 text

Prevent Use-after-Free Errors 61 long *arr = malloc(3 * sizeof(long)); free(arr); arr[0] = … Address offset = 0 data I64Array contents=null [Pointer to an integer?] [Strict-aliasing rule]

Slide 94

Slide 94 text

Prevent Use-after-Free Errors contents[0] → NullPointerException 62 long *arr = malloc(3 * sizeof(long)); free(arr); arr[0] = … Address offset = 0 data I64Array contents=null [Pointer to an integer?] [Strict-aliasing rule]
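
The `contents=null` step in the diagram can be sketched as follows. This is a minimal Java sketch under assumptions: `free()` is a hypothetical method name, and the two local variables stand for two C pointers that alias the same allocation.

```java
final class UseAfterFreeSketch {
    /** Managed stand-in for a malloc'ed block of longs. */
    static final class I64Array {
        long[] contents;
        I64Array(int length) { contents = new long[length]; }

        /** Models free(): drop the backing storage so every alias sees it is gone. */
        void free() { contents = null; }
    }

    /** Models: arr = malloc(...); alias = arr; free(arr); alias[0] = 2; */
    static String accessAfterFree() {
        I64Array arr = new I64Array(3);
        I64Array alias = arr;      // a second pointer to the same allocation
        arr.contents[0] = 1L;      // valid access before free
        arr.free();
        try {
            alias.contents[0] = 2L; // use after free through the alias
            return "ok";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(accessAfterFree());
    }
}
```

Since the storage is released to the garbage collector rather than reused in place, a stale pointer in this model can never observe or overwrite another object.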

Slide 95

Slide 95 text

Prevent Integer Overflows 63 int a = 1, b = INT_MAX; int val = a + b; Math.addExact(a, b); [Pointer to an integer?]

Slide 96

Slide 96 text

Prevent Integer Overflows 63 int a = 1, b = INT_MAX; int val = a + b; Math.addExact(a, b); ArithmeticException [Pointer to an integer?]
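
The difference between plain and checked addition can be shown in a few lines of Java. `Math.addExact` is a standard JDK method; the wrapper names `wrappingAdd` and `checkedAdd` are assumptions for this sketch.

```java
final class CheckedAddSketch {
    /** Plain Java addition: overflow is defined and wraps around (two's complement). */
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    /** Checked addition: overflow raises an ArithmeticException instead of wrapping. */
    static String checkedAdd(int a, int b) {
        try {
            return String.valueOf(Math.addExact(a, b));
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(1, Integer.MAX_VALUE));
        System.out.println(checkedAdd(1, 2));
        System.out.println(checkedAdd(1, Integer.MAX_VALUE));
    }
}
```

Either behavior is well defined, so a JVM compiler may not assume that the overflow never happens; the checked variant additionally surfaces the error to the programmer.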

Slide 97

Slide 97 text

Safe Optimizations 64 ArrayIndexOutOfBoundsException NullPointerException ArithmeticException Exceptions are visible side effects and cannot be optimized away

Slide 98

Slide 98 text

Evaluation Hypotheses
• Effectiveness: Safe Sulong detects bugs that are overlooked by other tools
• Performance: Safe Sulong’s performance overhead is “reasonable”

Slide 99

Slide 99 text

Effectiveness: Errors in GitHub Projects 66 http://ssw.jku.at/General/Staff/ManuelRigger/ASPLOS18-SafeSulong-Bugs.csv

Slide 100

Slide 100 text

Effectiveness: Errors in GitHub Projects 66 http://ssw.jku.at/General/Staff/ManuelRigger/ASPLOS18-SafeSulong-Bugs.csv 68 errors in (small) open-source projects

Slide 101

Slide 101 text

Effectiveness: Errors in GitHub Projects
• Valgrind detected half of the errors
• 8 errors not found by LLVM’s AddressSanitizer (and Valgrind)
• Compiler optimizations (ASan -O3) prevented the detection of 4 additional bugs
[Comparison tools]

Slide 102

Slide 102 text

Effectiveness: Errors in GitHub Projects 68 int main(int argc, char** argv) { printf("%d %s\n", argc, argv[5]); } [Comparison tools] Out-of-bounds accesses to argv are not instrumented by ASan

Slide 103

Slide 103 text

Effectiveness: Errors in GitHub Projects 69 https://github.com/google/sanitizers/issues/762

Slide 104

Slide 104 text

Effectiveness: Errors in GitHub Projects • 8 errors not found by LLVM’s AddressSanitizer and Valgrind 70 int main(int argc, char** argv) { printf("%d %s\n", argc, argv[5]); } [Comparison tools] In Safe Sulong instrumentation cannot be omitted by design

Slide 105

Slide 105 text

Peak Performance 71 lower is better

Slide 106

Slide 106 text

Peak Performance (lower is better)
Safe Sulong’s performance is mostly between Clang -O0 and Clang -O3, and mostly faster than ASan -O0

Slide 107

Slide 107 text

Sulong as Part of GraalVM 72 Java Virtual Machine Graal Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR LLVM IR Interpreter LLVM IR Clang Flang Optimization Boundary

Slide 108

Slide 108 text

Sulong as Part of GraalVM 72 Java Virtual Machine Graal Compiler Truffle Framework TruffleRuby Graal.js Graal.python FastR LLVM IR Interpreter LLVM IR Clang Flang Optimization Boundary Managed Sulong, derived from Safe Sulong, is available in GraalVM

Slide 109

Slide 109 text

Sulong Key Collaborators 73 Jacob Kreindl Raphael Mosaner Roland Schatz Josef Eisl Christian Häubl Matthias Grimmer Thomas Pointhuber Daniel Pekarek Chris Seaton Lukas Stadler Florian Angerer David Gnedt https://github.com/graalvm/sulong/graphs/contributors Swapnil Gaikwad

Slide 110

Slide 110 text

74 The importance of inline assembly and compiler builtins

Slide 111

Slide 111 text

C/C++ Fortran

Slide 112

Slide 112 text

What about inline assembly? 76

Slide 113

Slide 113 text

What about GCC builtins? 77

Slide 114

Slide 114 text

What about linkage features? 78

Slide 115

Slide 115 text

Inline Assembly Compiler builtins System calls External Libraries Low-level libc/POSIX functions Linkage features C/C++ Fortran Compiler extensions Non-standard-compliant code

Slide 117

Slide 117 text

Collaborators 81 Stefan Marr Stephen Kell David Leopoldseder Hanspeter Mössenböck Bram Adams

Slide 118

Slide 118 text

82 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly C Projects Consist of More Than C Code Compiler builtins [Inline assembly details] [Inline Assembly and GCC Builtins in Sulong]

Slide 119

Slide 119 text

83 if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly C Projects Consist of More Than C Code Compiler builtins [Inline assembly details] [Inline Assembly and GCC Builtins in Sulong] ~1,000 instructions for a single complex ISA like x86-64

Slide 120

Slide 120 text

if (__builtin_expect(x, 0)) foo (); asm("rdtsc":"=a"(tickl),"=d"(tickh)); Inline Assembly C Projects Consist of More Than C Code Compiler builtins [Inline assembly details] [Inline Assembly and GCC Builtins in Sulong] Over 1,000 GCC builtins 84

Slide 121

Slide 121 text

C Projects Consist of More Than C Code 85 How frequently are these used? How are they used? What is the implementation effort to cover most programs? How well do comparable tools support them?

Slide 122

Slide 122 text

C Projects Consist of More Than C Code
How frequently are these used?
How are they used?
What is the implementation effort to cover most programs?
How well do comparable tools support them?
An informed decision on whether and to what extent to implement them in Sulong!

Slide 123

Slide 123 text

Mining of C GitHub Projects
                      GCC Builtins      Inline Assembly
# studied projects    ~5,000            ~1,300
Considered projects   All C projects    C Client Applications
Identification        grep              grep asm

Slide 124

Slide 124 text

Mining of C GitHub Projects
                      GCC Builtins      Inline Assembly
# studied projects    ~5,000            ~1,300
Considered projects   All C projects    C Client Applications
Identification        grep              grep asm
Different setups, so the comparison should be taken with a grain of salt

Slide 125

Slide 125 text

How widespread are GCC builtins and inline assembly fragments? 87

Slide 126

Slide 126 text

In How Many Projects are They Used?
[Bar chart, % of projects: popular projects with inline assembly 28%; (popular) projects with GCC builtins 37%]
Both GCC builtins and inline assembly are frequently used by projects

Slide 127

Slide 127 text

How Often are They Used Within a Project?
[Bar chart, density (occurrence per KLOC); bar labels: 50k, 6k; categories: popular projects with inline assembly, (popular) projects with GCC builtins]
They are infrequently used within a project

Slide 128

Slide 128 text

How are inline assembly and GCC builtins used? 90

Slide 129

Slide 129 text

Inline Assembly 91 Inline assembly fragments can contain an arbitrary number of instructions; how many do they typically contain?

Slide 130

Slide 130 text

Inline Assembly 91 Inline assembly fragments can contain an arbitrary number of instructions; how many do they typically contain? uint64 sqlite3Hwtime(void){ unsigned long val; __asm__ ("rdtsc" : "=A" (val)); return val; }

Slide 131

Slide 131 text

Inline Assembly
Inline assembly fragments can contain an arbitrary number of instructions; how many do they typically contain?
uint64 sqlite3Hwtime(void){
  unsigned long val;
  __asm__ ("rdtsc" : "=A" (val));
  return val;
}
__asm__ __volatile__ (
" leaq %0, %%rax\n"
" movq %%rbp, 8(%%rax)\n" /* save regs rbp and rsp
" movq %%rsp, (%%rax)\n"
" movq %%rax, %%rsp\n" /* make rsp point to &ar
" movq 16(%%rsp), %%rsi\n" /* rsi = in */
" movq 32(%%rsp), %%rdi\n" /* rdi = out */
" movq 24(%%rsp), %%r9\n" /* r9 = last */
" movq 48(%%rsp), %%r10\n" /* r10 = end */
" movq 64(%%rsp), %%rbp\n" /* rbp = lcode */
" movq 72(%%rsp), %%r11\n" /* r11 = dcode */
" movq 80(%%rsp), %%rdx\n" /* rdx = hold */
" movl 88(%%rsp), %%ebx\n" /* ebx = bits */
" movl 100(%%rsp), %%r12d\n" /* r12d = lmask */
" movl 104(%%rsp), %%r13d\n" /* r13d = dmask */
/* r14d = len */
/* r15d = dist */
" cld\n"
" cmpq %%rdi, %%r10\n"
" je .L_one_time\n" /* if only one decode le
" cmpq %%rsi, %%r9\n"
" je .L_one_time\n"
" jmp .L_do_loop\n"
".L_one_time:\n"
" movq %%r12, %%r8\n" /* r8 = lmask */
" cmpb $32, %%bl\n"
" ja .L_get_length_code_one_time\n"
" lodsl\n" /* eax = *(uint *)in++ *
" movb %%bl, %%cl\n" /* cl = bits, needs it f
" addb $32, %%bl\n" /* bits += 32 */
" shlq %%cl, %%rax\n"
" orq %%rax, %%rdx\n" /* hold |= *((uint *)in)
" jmp .L_get_length_code_one_time\n"

Slide 132

Slide 132 text

How are Inline Assembly Fragments Used?
[Chart: cumulative percentage of projects (0 to 100) over the number of unique fragments per project (1 to 33); annotated value: 36%]
A number of projects use only a single inline assembly fragment

Slide 133

Slide 133 text

How are Inline Assembly Fragments Used?
[Chart: cumulative percentage of projects (0 to 100) over the number of unique fragments per project (1 to 33); annotated value: 99%]
Almost all projects use fewer than 25 inline assembly fragments

Slide 134

Slide 134 text

How are Inline Assembly Fragments Used?
[Chart: cumulative percentage (0 to 100) over the number of instructions per unique fragment (1 to 12, …, 438); annotated value: 100%]
We also found fragments with several hundred instructions

Slide 135

Slide 135 text

How are Inline Assembly Fragments Used? 95 Inline assembly fragments typically consist of a low number of instructions.

Slide 136

Slide 136 text

How are GCC Builtins Used? 96 if (__builtin_expect(x, 0)) foo (); Architecture-independent builtin c = __builtin_ia32_paddb(a, b); Architecture-specific builtin Architecture-specific builtins are similar to inline assembly. Are they used?

Slide 137

Slide 137 text

How are GCC Builtins Used?
[Bar chart: number of projects (0 to 2000) for used builtins, machine-independent, and machine-specific; bar labels: 38%, 36%, 8%]
Mainly machine-independent GCC builtins are used.

Slide 138

Slide 138 text

Machine-specific vs. Machine-independent Builtins 98 17 3 4 A project that uses machine-specific builtins uses them in a larger number.

Slide 139

Slide 139 text

How well do tools support them and how much effort needs to be invested to support them? 99

Slide 140

Slide 140 text

Tool Support for Inline Assembly
c2go transpile test.c
panic: unknown node type: 'GCCAsmStmt 0x3a991f8 '
goroutine 1 [running]:
github_com_elliotchance_c2go_ast.Parse
  go/src/github.com/elliotchance/c2go/ast/ast.go:211
main.convertLinesToNodes
  go/src/github.com/elliotchance/c2go/main.go:81
main.Start
  go/src/github.com/elliotchance/c2go/main.go:219
main.runCommand
  go/src/github.com/elliotchance/c2go/main.go:350
main.main
  go/src/github.com/elliotchance/c2go/main.go:277
goroutine 6 [finalizer wait]:
Splint 3.1.2 --- 03 May 2009
test.c: (in function rdtsc)
test.c:5:3: Unrecognized identifier: asm
  Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
*** Cannot continue.

Slide 141

Slide 141 text

Tool Support 101 Test suite for the most commonly-used 100 builtins

Slide 142

Slide 142 text

Bugs in CompCert 102 https://github.com/AbsInt/CompCert/issues/243 [Details bug]

Slide 143

Slide 143 text

Tool support is lagging behind

Slide 144

Slide 144 text

How much effort is needed to implement GCC Builtins? 104 [Details]

Slide 145

Slide 145 text

How much effort is needed to implement GCC Builtins? 104 32 builtins to support half of projects [Details]

Slide 146

Slide 146 text

How much effort is needed to implement GCC Builtins? 104 1600 builtins to support 99% of projects 32 builtins to support half of projects [Details]

Slide 147

Slide 147 text

How much effort is needed to implement GCC Builtins?
32 builtins to support half of projects
1600 builtins to support 99% of projects
[Details]
Machine-independent builtins are the “low-hanging fruit”
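
One way to see why machine-independent builtins are cheap to support on a JVM is that several of them map directly onto existing JDK methods. The table below is a hypothetical Java sketch, not Sulong's implementation: the class, the `call` helper, and the restriction to 32-bit integer arguments are assumptions; the GCC builtin names and the `java.lang.Integer` methods are real.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

final class BuiltinTableSketch {
    // Hypothetical dispatch table from GCC builtin names to Java implementations.
    private static final Map<String, IntBinaryOperator> BUILTINS = new HashMap<>();

    static {
        // __builtin_expect(exp, c) evaluates to exp; c is only an optimizer hint.
        BUILTINS.put("__builtin_expect", (value, expected) -> value);
        // Single-argument builtins ignore the second operand in this sketch.
        BUILTINS.put("__builtin_popcount", (x, unused) -> Integer.bitCount(x));
        BUILTINS.put("__builtin_bswap32", (x, unused) -> Integer.reverseBytes(x));
    }

    /** Looks up a builtin by name and applies it; unknown builtins are rejected. */
    static int call(String name, int arg0, int arg1) {
        IntBinaryOperator impl = BUILTINS.get(name);
        if (impl == null) {
            throw new UnsupportedOperationException("unsupported builtin: " + name);
        }
        return impl.applyAsInt(arg0, arg1);
    }

    public static void main(String[] args) {
        System.out.println(call("__builtin_expect", 7, 0));
        System.out.println(call("__builtin_popcount", 0b1011, 0));
        System.out.println(Integer.toHexString(call("__builtin_bswap32", 0x11223344, 0)));
    }
}
```

Machine-specific builtins such as `__builtin_ia32_paddb` have no such direct counterpart and would need hand-written semantics, much like inline assembly.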

Slide 148

Slide 148 text

Are they a legacy feature that has survived until today? 105

Slide 149

Slide 149 text

GCC Builtin Usage Over Time 106 [Details] We analyzed the commit history of the GCC builtin projects

Slide 150

Slide 150 text

GCC Builtin Usage Over Time
Trend          Projects
Increasing     38%
Stagnant       26%
Decreasing     14%
Inconclusive   22%
64% of projects have been mainly adding builtins

Slide 151

Slide 151 text

Research Opportunities
• Other elements, such as compiler pragmas and function attributes, are not widely understood
• Testing the correct usage of inline assembly and GCC builtins
• Support in formal models and static analysis tools
• Automatic approaches?

Slide 152

Slide 152 text

Inline Assembly Compiler builtins System calls External Libraries Low-level libc/POSIX functions Linkage features C/C++ Fortran Compiler extensions Non-standard-compliant code

Slide 153

Slide 153 text

110 Addressing the last 20% of the problem took 80% of the time

Slide 154

Slide 154 text

Pareto Principle 111 80% of the effects come from 20% of the causes

Slide 155

Slide 155 text

Pareto Principle
It is useful to consider the “seemingly” less-important 20% of a problem
• Avoids oversimplifications
• Helps design holistic solutions
• Leads to new research questions

Slide 156

Slide 156 text

Discussion: What About Other Overlooked Problems?
In which 20% of important use cases do current language interoperability approaches fail?
Which 20% of important use cases cannot be expressed with and how does it affect users?
Which 20% of an approach for connecting heterogeneous code provides bad usability and how can we improve on it?

Slide 157

Slide 157 text

Summary 114