Slide 2

Slide 2 text

Understanding GCC Builtins to Develop Better Tools
Manuel Rigger, Stefan Marr, Bram Adams, Hanspeter Mössenböck
@RiggerManuel
28 August 2019, Session Empirical Studies I @ ESEC/FSE 2019

Slide 3

Slide 3 text

Motivating Example
Task: Print the number of leading zero bits in a 64-bit integer (in a C program)

Slide 4

Slide 4 text

Motivating Example
Task: Print the number of leading zero bits in a 64-bit integer (in a C program)
Example: 7 (decimal)

Slide 5

Slide 5 text

Motivating Example
Task: Print the number of leading zero bits in a 64-bit integer (in a C program)
Example: 7 (decimal) = 000…000111 (binary)

Slide 6

Slide 6 text

Motivating Example
Task: Print the number of leading zero bits in a 64-bit integer (in a C program)
Example: 7 (decimal) = 000…000111 (binary), which has 61 leading zero bits

Slide 7

Slide 7 text

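Approach 1: Implement in Plain C

    #include <stdio.h>

    /* Assumes that unsigned long is 64 bits wide (LP64, e.g., Linux on x86-64). */
    int count_leading_zeroes(unsigned long x) {
        int count = 0;                /* number of significant bits in x */
        unsigned long cur = x;
        while (cur > 0) {
            cur = cur >> 1;
            count++;
        }
        return 64 - count;
    }

    int main() {
        unsigned long value = 0b111;  /* binary literal: GCC extension, standard since C23 */
        int bits = count_leading_zeroes(value);
        printf("%d\n", bits);         /* prints 61 on LP64 targets */
        return 0;
    }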
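For comparison, here is a minimal sketch of a portable alternative to the loop above. It assumes a C99 compiler with <stdint.h>. It uses the fixed-width uint64_t instead of relying on unsigned long being 64 bits, and it replaces the bit-by-bit loop with a binary search. The name clz64_portable is our own; it is not part of any library.

```c
#include <stdint.h>

/* Counts the leading zero bits of a 64-bit value by binary search:
   each test halves the range in which the most significant set bit can lie.
   Unlike __builtin_clzll, the result for x == 0 is defined (64). */
int clz64_portable(uint64_t x) {
    if (x == 0)
        return 64;
    int n = 0;
    if ((x & 0xFFFFFFFF00000000ULL) == 0) { n += 32; x <<= 32; }
    if ((x & 0xFFFF000000000000ULL) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF00000000000000ULL) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF000000000000000ULL) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC000000000000000ULL) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x8000000000000000ULL) == 0) { n += 1; }
    return n;
}
```

For example, clz64_portable(7) returns 61, which matches the motivating example. The binary search needs at most six tests, while the loop in Approach 1 iterates once per significant bit.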

Slide 8

Slide 8 text

Approach 2: Reuse Existing Functionality

Slide 9

Slide 9 text

Approach 2: Reuse Existing Functionality
The C standard library (libc) lacks such functionality

Slide 10

Slide 10 text

Approach 2: Reuse Existing Functionality

    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);  /* GCC builtin: count leading zeros */
        printf("%d\n", bits);
        return 0;
    }

Slide 11

Slide 11 text

Approach 2: Reuse Existing Functionality

    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);  /* GCC builtin: count leading zeros */
        printf("%d\n", bits);
        return 0;
    }

GCC builtins are provided directly by the compiler (GCC)
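The snippet above skips over one caveat. The GCC manual states that the result of __builtin_clz (and therefore __builtin_clzl) is undefined when the argument is 0, whereas the plain-C version in Approach 1 returns 64. Below is a minimal guarded wrapper. It assumes a compiler that provides __builtin_clzl (e.g., GCC or Clang), and the name clz_ulong is hypothetical.

```c
#include <limits.h>

/* Leading-zero count for unsigned long that is also defined for 0.
   Requires a compiler that provides __builtin_clzl (e.g., GCC or Clang). */
int clz_ulong(unsigned long x) {
    const int width = (int)(sizeof x * CHAR_BIT);  /* 64 on LP64 targets */
    return x == 0 ? width : __builtin_clzl(x);
}
```

The wrapper costs one comparison. The benefit is that callers, and tools that reason about the code, never have to handle the undefined case.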

Slide 12

Slide 12 text

Approach 2: Reuse Existing Functionality

    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);  /* GCC builtin: count leading zeros */
        printf("%d\n", bits);
        return 0;
    }

Compiles to (x86-64, Intel syntax):
    bsr rsi, rsi
    xor rsi, 63

Slide 13

Slide 13 text

Approach 2: Reuse Existing Functionality

    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);  /* GCC builtin: count leading zeros */
        printf("%d\n", bits);
        return 0;
    }

Compiles to (x86-64, Intel syntax):
    bsr rsi, rsi
    xor rsi, 63
GCC builtins typically result in efficient machine code
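We can check in plain C why bsr followed by xor rsi, 63 yields the leading-zero count. bsr stores the index of the most significant set bit and leaves the destination undefined for a zero source. For an index i between 0 and 63, 63 XOR i equals 63 - i because 63 has all six low bits set. The sketch below models this with the hypothetical helpers msb_index and clz_via_bsr. It only illustrates the arithmetic and does not show how GCC generates the code.

```c
#include <stdint.h>

/* Index of the most significant set bit, i.e., what x86 BSR stores.
   Precondition: x != 0 (BSR leaves its destination undefined for 0). */
int msb_index(uint64_t x) {
    int i = 0;
    while (x >>= 1)
        i++;
    return i;
}

/* Leading zeros computed the way the compiled code does: 63 XOR msb_index.
   For 0 <= i <= 63, 63 ^ i == 63 - i since 63 has all six low bits set. */
int clz_via_bsr(uint64_t x) {
    return 63 ^ msb_index(x);
}
```

For 7, msb_index returns 2 and 63 ^ 2 is 61, which matches the motivating example.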

Slide 14

Slide 14 text

Tools
While GCC builtins are convenient for users, supporting them in other tools is challenging

Slide 15

Slide 15 text

Why Is Implementing GCC Builtins Challenging?
Finding: Over 12,000 GCC builtins exist!

Slide 16

Slide 16 text

Why Should Tool Developers Care?

    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);  /* GCC builtin: count leading zeros */
        printf("%d\n", bits);
        return 0;
    }

Finding: Builtins are used by 37% of the projects that we analyzed

Slide 17

Slide 17 text

Many Tools Exist that Support Developing C Programs
Mature compilers: Clang (LLVM), GCC, ICC

Slide 18

Slide 18 text

Many Tools Exist that Support Developing C Programs
Mature compilers: Clang (LLVM), GCC, ICC
Preliminary experiments suggested that these compilers support common GCC builtins

Slide 19

Slide 19 text

Many Tools Exist that Support Developing C Programs
Mature compilers: Clang (LLVM), GCC, ICC
Special-purpose compilers: CompCert, Tiny C Compiler

Slide 20

Slide 20 text

Many Tools Exist that Support Developing C Programs
Mature compilers: Clang (LLVM), GCC, ICC
Special-purpose compilers: CompCert, Tiny C Compiler
Static analyzers: KCC, Frama-C, CIL

Slide 21

Slide 21 text

Many Tools Exist that Support Developing C Programs
Mature compilers: Clang (LLVM), GCC, ICC
Special-purpose compilers: CompCert, Tiny C Compiler
Static analyzers: KCC, Frama-C, CIL
Various other tools: KLEE, DragonEgg, Sulong

Slide 22

Slide 22 text

Goal of our work
Investigate the usage of builtins to inform tool developers about adding support for builtins to their tools

Slide 24

Slide 24 text

Broader Implications of our Study
First study on compiler builtins
• Compiler builtins are an important feature, but not widely understood

Slide 25

Slide 25 text

Broader Implications of our Study
• Compiler builtins are an important feature, but not widely understood
Emerging languages like Rust also provide builtins

Slide 26

Slide 26 text

Broader Implications of our Study
• Compiler builtins are an important feature, but not widely understood
• Language design: demonstrates which features programming languages like C lack

Slide 27

Slide 27 text

Broader Implications of our Study
• Compiler builtins are an important feature, but not widely understood
• Language design
• Developer feedback: informs developers how builtin usage affects how well their code can be analyzed

Slide 28

Slide 28 text

Broader Implications of our Study
• Compiler builtins are an important feature, but not widely understood
• Language design
• Developer feedback
• Implementation and maintenance of compilers: informs compiler developers about which builtins are frequently used

Slide 29

Slide 29 text

Research Questions
1. How frequently are builtins used?

Slide 30

Slide 30 text

Research Questions
1. How frequently are builtins used?
2. How well do tools that process C code support builtins?

Slide 31

Slide 31 text

Research Questions
1. How frequently are builtins used?
2. How well do tools that process C code support builtins?
3. How many builtins must be implemented to support most projects?

Slide 32

Slide 32 text

Research Questions
1. How frequently are builtins used?
2. How well do tools that process C code support builtins?
3. How many builtins must be implemented to support most projects?
4. (How does builtin usage vary over a project’s lifetime?)

Slide 33

Slide 33 text

Research Questions
1. How frequently are builtins used?
2. How well do tools that process C code support builtins?
3. How many builtins must be implemented to support most projects?
4. (How does builtin usage vary over a project’s lifetime?)
5. (For what purposes are builtins used?)

Slide 34

Slide 34 text

Methodology
1. Obtain C projects: 5,000 projects from GitHub with >= 80 stars

Slide 35

Slide 35 text

Methodology
1. Obtain C projects
2. Filter C projects: keep those with > 100 LOC (~4,900 projects)

Slide 36

Slide 36 text

Methodology
1. Obtain C projects
2. Filter C projects
3. Extract builtin uses: grep for builtin names (e.g., __builtin_clzl) and store the matches in an SQLite3 database
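The study extracts builtin uses with grep. The minimal C sketch below illustrates what such a textual match counts. It assumes that builtins are recognized purely by the name prefixes __builtin_ and __sync_. The study's actual name list may differ: machine-specific builtins such as vec_perm, for instance, share no common prefix. count_builtin_uses is a hypothetical helper and is not taken from the study's repository.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if c can appear inside a C identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Counts identifiers in src that start with a known builtin prefix.
   Like grep, it does not skip comments, string literals, or code
   disabled by the preprocessor. */
size_t count_builtin_uses(const char *src) {
    static const char *const prefixes[] = {"__builtin_", "__sync_"};
    size_t count = 0;
    for (const char *p = src; *p != '\0'; p++) {
        if (p != src && is_ident_char(p[-1]))
            continue;  /* p is not at the start of an identifier */
        for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
            size_t len = strlen(prefixes[i]);
            /* require at least one identifier character after the prefix */
            if (strncmp(p, prefixes[i], len) == 0 && is_ident_char(p[len])) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

The sketch also counts the name in a comment such as // __builtin_clzl as a use. Records of exactly that kind have to be filtered out afterwards.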

Slide 37

Slide 37 text

Methodology
1. Obtain C projects
2. Filter C projects
3. Extract builtin uses
4. Filter builtin name records: discard matches in comments, such as // __builtin_clzl or /* __builtin_clzl */
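A plain textual match also counts names that appear in comments. One way to filter such records is to strip comments before matching, while leaving string and character literals untouched. Below is a minimal sketch of that approach. It assumes that ignoring line splices and trigraphs is acceptable. The function strip_comments is hypothetical and does not describe how the study implements the filter.

```c
// Copies src to out, replacing every line comment and block comment with a
// single space. String and character literals are copied verbatim so that a
// "//" inside them is not taken for a comment. Line splices (backslash-newline)
// and trigraphs are not handled. out needs room for strlen(src) + 1 bytes.
void strip_comments(const char *src, char *out) {
    const char *p = src;
    char *o = out;
    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '/') {          // line comment
            while (*p != '\0' && *p != '\n')
                p++;                               // stop at (and keep) the newline
            *o++ = ' ';
        } else if (p[0] == '/' && p[1] == '*') {   // block comment
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                            // skip the closing delimiter
            *o++ = ' ';
        } else if (*p == '"' || *p == '\'') {      // string or character literal
            char quote = *p;
            *o++ = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    *o++ = *p++;                   // copy the backslash of an escape
                *o++ = *p++;
            }
            if (*p != '\0')
                *o++ = *p++;                       // closing quote
        } else {
            *o++ = *p++;
        }
    }
    *o = '\0';
}
```

Each comment becomes a single space so that tokens on either side of it stay separated, which keeps a later identifier match from gluing two names together.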

Slide 38

Slide 38 text

Methodology
1. Obtain C projects
2. Filter C projects
3. Extract builtin uses
4. Filter builtin name records
5. Analyze the results

Slide 39

Slide 39 text

Methodology
1. Obtain C projects
2. Filter C projects
3. Extract builtin uses
4. Filter builtin name records
5. Analyze the results
All steps are replicable; see https://github.com/jku-ssw/gcc-builtin-study

Slide 40

Slide 40 text

Methodology
1. Obtain C projects
2. Filter C projects
3. Extract builtin uses
4. Filter builtin name records
5. Analyze the results
All steps are replicable; see https://github.com/jku-ssw/gcc-builtin-study
Repeating the study for newly added builtins requires little effort

Slide 41

Slide 41 text

How Frequently Are Builtins Used?

Slide 42

Slide 42 text

Overview
[Figure: number of projects (0 to 2,000) that use builtins; 37% use at least one builtin]
GCC builtins are used by many projects

Slide 43

Slide 43 text

Overview
[Figure: number of projects (0 to 2,000) that use builtins; 37% use at least one builtin]
GCC builtins are used by many projects
~3,000 different builtins were used

Slide 44

Slide 44 text

Overview
[Figure: number of projects (0 to 2,000) that use builtins; 37% use at least one builtin]
GCC builtins are used by many projects
~3,000 different builtins were used
Builtins are used infrequently within a project: one builtin use per ~6K LOC

Slide 45

Slide 45 text

Overview
[Figure: number of projects (0 to 2,000); any builtin: 37%, machine-independent builtins: 36%]
Many projects rely on architecture-independent builtins

Slide 46

Slide 46 text

Architecture-independent Builtins
Example: __builtin_clzl() compiles to [figure omitted]
Definition: Architecture-independent builtins are typically supported on all common architectures

Slide 47

Slide 47 text

Architecture-independent Builtins

Slide 48

Slide 48 text

Architecture-independent Builtins
The frequently used builtins come mainly from the “other” and “sync” categories

Slide 49

Slide 49 text

Architecture-independent Builtins
Builtin                 Category                        Projects
__builtin_expect        other (compiler interaction)    890 / 48.3%
__builtin_clz           other (bitwise operation)       536 / 29.1%
__builtin_bswap32       other (bitwise operation)       483 / 26.2%
__builtin_constant_p    other (compiler interaction)    430 / 23.3%
__builtin_alloca        other (stack allocation)        373 / 20.2%
__sync_synchronize      sync                            356 / 19.3%
__builtin_bswap64       other (bitwise operation)       347 / 18.8%
__sync_fetch_and_add    sync                            332 / 18.0%
__builtin_ctz           other (bitwise operation)       324 / 17.6%
__builtin_bswap16       other (bitwise operation)       304 / 16.5%
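A tool developer who wants to add one of the builtins in this table first needs its reference semantics. For __builtin_bswap32, the GCC manual specifies that it returns its argument with the order of the bytes reversed. The following sketch pairs a portable reference implementation with the builtin. The names bswap32 and bswap32_ref are hypothetical, and the __GNUC__ check assumes that a compiler defining that macro also provides the builtin, which holds for GCC and Clang.

```c
#include <stdint.h>

/* Reference semantics of __builtin_bswap32: reverse the order of the four bytes. */
uint32_t bswap32_ref(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Uses the builtin where the compiler is GCC-compatible, the reference otherwise. */
uint32_t bswap32(uint32_t x) {
#if defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return bswap32_ref(x);
#endif
}
```

A tool could use a reference function like bswap32_ref both as its own implementation of the builtin and as an oracle when testing that implementation.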

Slide 50

Slide 50 text

Architecture-independent Builtins
Builtin                 Category                        Projects
__builtin_expect        other (compiler interaction)    890 / 48.3%
__builtin_clz           other (bitwise operation)       536 / 29.1%
__builtin_bswap32       other (bitwise operation)       483 / 26.2%
__builtin_constant_p    other (compiler interaction)    430 / 23.3%
__builtin_alloca        other (stack allocation)        373 / 20.2%
__sync_synchronize      sync                            356 / 19.3%
__builtin_bswap64       other (bitwise operation)       347 / 18.8%
__sync_fetch_and_add    sync                            332 / 18.0%
__builtin_ctz           other (bitwise operation)       324 / 17.6%
__builtin_bswap16       other (bitwise operation)       304 / 16.5%
Example:
    if (__builtin_expect(x, 0)) foo();
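The example above uses __builtin_expect. Its value is its first argument, and the second argument only tells the compiler which value to expect. Projects commonly wrap the builtin in macros. The following sketch shows that idiom; the macro names LIKELY and UNLIKELY and the function sum_array are our own illustration.

```c
#include <stddef.h>

/* Branch-hint macros in the style many projects use (the names are ours).
   __builtin_expect(exp, c) evaluates to exp and tells the compiler that
   exp is expected to equal c; !!(x) normalizes any truthy value to 1. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sums n ints; a NULL array is treated as the unlikely error path. */
long sum_array(const int *data, size_t n) {
    if (UNLIKELY(data == NULL))
        return -1;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Because the builtin's value is simply its first argument, a tool that does not yet support __builtin_expect could treat it as returning that argument. The only cost would be the lost branch-prediction hint.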

Slide 53

Slide 53 text

Architecture-specific Builtins
[Figure: number of projects (0 to 2,000); any builtin: 37%, machine-independent: 36%, machine-specific: 8%]
Architecture-specific builtins are used less frequently

Slide 54

Slide 54 text

Architecture-specific Builtins
[Figure: number of projects (0 to 2,000); any builtin: 37%, machine-independent: 36%, machine-specific: 8%]
A project can rely on both architecture-specific and architecture-independent builtins

Slide 55

Slide 55 text

Architecture-specific Builtins
Example: vec_perm() compiles to [figure omitted]
Definition: Architecture-specific builtins are typically supported only on a specific architecture
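The slide's example, vec_perm(), is specific to PowerPC. The usage pattern around architecture-specific facilities can be illustrated on x86 instead; this substitution is ours and does not come from the slide. Such code is usually guarded by a predefined macro and falls back to portable C on other targets. The sketch uses the SSE2 intrinsics from <emmintrin.h> and assumes that the compiler defines __SSE2__ when they are available, as GCC and Clang do on x86-64. add4_i32 is a hypothetical helper.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Adds two arrays of four int32_t values element-wise.
   On x86 with SSE2 this uses architecture-specific intrinsics;
   elsewhere it falls back to a portable loop. */
void add4_i32(const int32_t *a, const int32_t *b, int32_t *out) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned 128-bit load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

A tool that lacks support for the architecture-specific path can often still process such a project by building it for a configuration in which the guard macro is undefined.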

Slide 56

Slide 56 text

Architecture-specific Builtins
PowerPC and ARM builtins are used most frequently

Slide 57

Slide 57 text

Unique Uses per Project that Uses Builtins
[Figure: number of unique builtins used per project; values shown: 17, 3, 4]
A project that uses machine-specific builtins uses a larger number of them

Slide 58

Slide 58 text

How Well Are Builtins Supported by Tools?

Slide 59

Slide 59 text

Test Suite for the 100 Most Frequently Used Builtins

    #include <assert.h>

    int main() {
        volatile unsigned long value = -1;
        assert(__builtin_clzl(value) == 0);
        value = (unsigned long)-1 >> 3;
        assert(__builtin_clzl(value) == 3);
        value = (long)((unsigned long)-1 >> 5) - 4;
        assert(__builtin_clzl(value) == 5);
        return 0;
    }

Goal: test common cases and corner cases

Slide 60

Slide 60 text

Test Suite for the 100 Most Frequently Used Builtins

    #include <assert.h>

    int main() {
        volatile unsigned long value = -1;
        assert(__builtin_clzl(value) == 0);
        value = (unsigned long)-1 >> 3;
        assert(__builtin_clzl(value) == 3);
        value = (long)((unsigned long)-1 >> 5) - 4;
        assert(__builtin_clzl(value) == 5);
        return 0;
    }

Each tool compiles, analyzes, or parses the test: does it report a warning or an error?
Goal: test common cases and corner cases
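Here is one more test in the same style, for __builtin_ctz. It is hypothetical and is not part of the study's test suite. It covers a common case and several corner cases, and it deliberately skips 0 because the GCC manual leaves the result undefined for that input. The test is written as a function rather than main so that it can be combined with other tests.

```c
#include <assert.h>
#include <limits.h>

/* __builtin_ctz counts the trailing zero bits of an unsigned int.
   The result for 0 is undefined, so 0 is not tested. */
void test_builtin_ctz(void) {
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    volatile unsigned int value = 1;
    assert(__builtin_ctz(value) == 0);
    value = 8;
    assert(__builtin_ctz(value) == 3);
    value = (unsigned int)-1 << 4;         /* all ones except the low four bits */
    assert(__builtin_ctz(value) == 4);
    value = 1u << (width - 1);             /* only the most significant bit set */
    assert(__builtin_ctz(value) == width - 1);
}
```

As in the study's example, the volatile variable is meant to keep the compiler from folding the builtin calls into constants, so the tool under test has to evaluate them at run time.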

Slide 61

Slide 61 text

Tool Support

Slide 63

Slide 63 text

Tool Support
Mature compilers support GCC builtins well

Slide 65

Slide 65 text

Tool Support
Tools that execute code concretely or abstractly also support builtins well

Slide 67

Slide 67 text

Tool Support
Static analysis tools and special-purpose compilers have limited or incorrect support

Slide 68

Slide 68 text

Example: Bugs in CompCert (https://github.com/AbsInt/CompCert/issues/243)
__builtin_clzl and __builtin_clzll (the variants for long and long long) incorrectly assumed a 32-bit integer

    #include <assert.h>

    int main() {
        volatile unsigned long value = -1;
        assert(__builtin_clzl(value) == 0);
        value = (unsigned long)-1 >> 3;
        assert(__builtin_clzl(value) == 3);
        value = (long)((unsigned long)-1 >> 5) - 4;
        assert(__builtin_clzl(value) == 5);
        return 0;
    }

Output:
    a.out: test.c:7: int main(): Assertion `__builtin_clzl(value) == 3' failed.
    Aborted
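When only a 32-bit leading-zero count is available, a tool could implement the 64-bit variant correctly by splitting the argument into two halves. This sketch assumes a 64-bit operand (uint64_t) and is not CompCert's actual fix; clz32 and clz64_from_32 are hypothetical helpers. An implementation that counted only the low 32 bits would return 0 rather than 3 for (unsigned long)-1 >> 3, which is consistent with the failing assertion on the slide.

```c
#include <stdint.h>

/* Portable 32-bit leading-zero count, defined as 32 for x == 0. */
static int clz32(uint32_t x) {
    if (x == 0)
        return 32;
    int n = 0;
    while ((x & 0x80000000u) == 0) {
        n++;
        x <<= 1;
    }
    return n;
}

/* 64-bit count built from two 32-bit counts: if the high half has a set bit,
   its count is the answer; otherwise add 32 to the count of the low half. */
int clz64_from_32(uint64_t x) {
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;
    return hi != 0 ? clz32(hi) : 32 + clz32(lo);
}
```

The three inputs from the test suite map to 0, 3, and 5 here, as the assertions in the test expect.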

Slide 69

Slide 69 text

Feedback

Slide 75

Slide 75 text

How Many Builtins Must Be Implemented to Support Most Projects?

Slide 76

Slide 76 text

Greedy Implementation Strategy

Slide 77

Slide 77 text

Greedy Implementation Strategy
Builtins are implemented in the order that greedily maximizes the number of supported projects
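The greedy strategy can be sketched as follows. The sketch rests on two assumptions of ours: a project counts as supported only once every builtin it uses is implemented, and ties are broken by how many projects use a builtin. The project data below is a hypothetical toy example, not data from the study. Real inputs would come from the extracted builtin records.

```c
#include <stdint.h>

#define NUM_BUILTINS 4
#define NUM_PROJECTS 5

/* Hypothetical toy data: bit b of uses[p] is set if project p uses builtin b. */
static const uint32_t uses[NUM_PROJECTS] = {
    0x1,        /* project 0 uses builtin 0 */
    0x1 | 0x2,  /* project 1 uses builtins 0 and 1 */
    0x2,        /* project 2 uses builtin 1 */
    0x4 | 0x8,  /* project 3 uses builtins 2 and 3 */
    0x1,        /* project 4 uses builtin 0 */
};

/* A project counts as supported once every builtin it uses is implemented. */
int supported(uint32_t implemented) {
    int n = 0;
    for (int p = 0; p < NUM_PROJECTS; p++)
        if ((uses[p] & ~implemented) == 0)
            n++;
    return n;
}

/* Number of projects that use builtin b (used only to break ties). */
static int users(int b) {
    int n = 0;
    for (int p = 0; p < NUM_PROJECTS; p++)
        if (uses[p] & (1u << b))
            n++;
    return n;
}

/* Writes builtin indices to order[] in greedy implementation order: each step
   picks the builtin whose addition supports the most projects. */
void greedy_order(int order[NUM_BUILTINS]) {
    uint32_t implemented = 0;
    for (int step = 0; step < NUM_BUILTINS; step++) {
        int best = -1, best_score = -1, best_users = -1;
        for (int b = 0; b < NUM_BUILTINS; b++) {
            if (implemented & (1u << b))
                continue;
            int score = supported(implemented | (1u << b));
            int u = users(b);
            if (score > best_score || (score == best_score && u > best_users)) {
                best = b;
                best_score = score;
                best_users = u;
            }
        }
        implemented |= 1u << best;
        order[step] = best;
    }
}
```

On this toy input the order is builtins 0, 1, 2, 3. The third step adds no supported project because project 3 needs both builtins 2 and 3, and a one-step greedy choice cannot see such combined gains.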

Slide 78

Slide 78 text

Greedy Implementation Strategy
~30 builtins are needed to support half of the projects

Slide 79

Slide 79 text

Greedy Implementation Strategy
#     Builtin                   Supported Projects
…     …                         …
24    __sync_fetch_and_sub      43.16%
25    __sync_lock_release       44.47%
26    __builtin_clzl            45.45%
27    __builtin_choose_expr     46.23%
28    __builtin_frame_address   47.14%
29    __builtin_clzll           47.85%
30    __builtin_ctzll           50.79%
…     …                         …

Slide 80

Slide 80 text

Greedy Implementation Strategy
#     Builtin                   Supported Projects
…     …                         …
24    __sync_fetch_and_sub      43.16%
25    __sync_lock_release       44.47%
26    __builtin_clzl            45.45%
27    __builtin_choose_expr     46.23%
28    __builtin_frame_address   47.14%
29    __builtin_clzll           47.85%
30    __builtin_ctzll           50.79%
…     …                         …
Machine-independent builtins are the “low-hanging fruit” to implement

Slide 81

Slide 81 text

Greedy Implementation Strategy
~600 builtins are needed to support 90% of the projects

Slide 82

Slide 82 text

Greedy Implementation Strategy
#     Builtin                  Supported Projects
…     …                        …
609   vreinterpret_u32_f32     89.37%
610   vld1_dup_s32             89.59%
611   vst1_s32                 89.65%
612   vshrn_n_u32              89.65%
613   vshrq_n_u64              89.65%
…     …                        …

Slide 83

Slide 83 text

Greedy Implementation Strategy
#     Builtin                  Supported Projects
…     …                        …
609   vreinterpret_u32_f32     89.37%
610   vld1_dup_s32             89.59%
611   vst1_s32                 89.65%
612   vshrn_n_u32              89.65%
613   vshrq_n_u64              89.65%
…     …                        …
Machine-specific builtins represent the “long tail of the distribution”

Slide 84

Slide 84 text

Summary and Discussion
GCC builtins are a challenge for tool developers
• 37% of projects use GCC builtins (mostly machine-independent ones)
• Many tools lack support for GCC builtins
• The number of builtins that must be implemented grows exponentially with the share of projects to be supported (~30 for half of the projects, ~600 for 90%)
@RiggerManuel
https://github.com/jku-ssw/gcc-builtin-study
