Understanding GCC Builtins to Develop Better Tools

Talk at ESEC/FSE 2019

Manuel Rigger

August 28, 2019
Transcript

  2. 2.

    Understanding GCC Builtins to Develop Better Tools

    Manuel Rigger, Stefan Marr, Bram Adams, Hanspeter Mössenböck
    @RiggerManuel
    28 August 2019, Session Empirical Studies I @ ESEC/FSE 2019
  3. 3.

    Motivating Example

    Task: Print the number of leading zero bits in a 64-bit integer (implementing a C program)
  6. 6.

    Motivating Example

    Task: Print the number of leading zero bits in a 64-bit integer (implementing a C program)

    Example: 7 (decimal) is 000…000111 (binary), which has 61 leading zero bits
  7. 7.

    Approach 1: Implement in Plain C

    #include <stdio.h>

    /* Counts the leading zero bits; assumes a 64-bit unsigned long. */
    int count_leading_zeroes(unsigned long x) {
        int count = 0;
        unsigned long cur = x;
        while (cur > 0) { /* count the bits up to the highest set bit */
            cur = cur >> 1;
            count++;
        }
        return 64 - count;
    }

    int main() {
        unsigned long value = 0b111;
        int bits = count_leading_zeroes(value);
        printf("%d", bits);
    }
  9. 11.

    Approach 2: Reuse Existing Functionality

    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);
        printf("%d", bits);
    }

    GCC builtins are provided directly by the (GCC) compiler.
  11. 13.

    Approach 2: Reuse Existing Functionality

    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);
        printf("%d", bits);
    }

    Compiles to:

    bsr rsi, rsi
    xor rsi, 63

    GCC builtins typically result in efficient machine code.
  12. 16.

    Why Should Tool Developers Care?

    Finding: builtins are used by 37% of projects that we analyzed.
  13. 18.

    Many Tools Exist that Support Developing C Programs

    Clang (LLVM), GCC, ICC

    Preliminary experimentation suggested that these compilers support common GCC builtins.
  14. 19.

    Many Tools Exist that Support Developing C Programs

    Clang (LLVM), GCC, ICC

    Special-purpose compilers: CompCert, Tiny C Compiler
  15. 20.

    Many Tools Exist that Support Developing C Programs

    Clang (LLVM), GCC, ICC, CompCert, Tiny C Compiler

    Static analyzers: KCC, Frama-C, CIL
  16. 21.

    Many Tools Exist that Support Developing C Programs

    Clang (LLVM), GCC, ICC, CompCert, Tiny C Compiler, KCC, Frama-C, CIL

    Various other tools: KLEE, DragonEgg, Sulong
  17. 22.

    Goal of our work

    Investigate the usage of builtins to inform tool developers about adding support for builtins in their tools.
  19. 24.

    Broader Implications of our Study

    • Compiler builtins are an important feature, but not widely understood
      First study on compiler builtins
  20. 25.

    Broader Implications of our Study

    • Compiler builtins are an important feature, but not widely understood
      Emerging languages like Rust also provide builtins
  21. 26.

    Broader Implications of our Study

    • Compiler builtins are an important feature, but not widely understood
    • Language design
      Demonstrates what features programming languages like C lack
  22. 27.

    Broader Implications of our Study

    • Compiler builtins are an important feature, but not widely understood
    • Language design
    • Developer feedback
      Informs developers on how builtin usage affects how well their code can be analyzed
  23. 28.

    Broader Implications of our Study

    • Compiler builtins are an important feature, but not widely understood
    • Language design
    • Developer feedback
    • Implementation and maintenance of compilers
      Informs compiler developers about which builtins are often used
  27. 33.

    Research Questions

    1. How frequently are builtins used?
    2. How well do tools that process C code support builtins?
    3. How many builtins must be implemented to support most projects?
    4. (How does builtin usage vary over a project’s lifetime?)
    5. (For what purposes are builtins used?)
  28. 36.

    Methodology

    Steps: Obtain C Projects → Filter C Projects → Extracting Builtin Uses

    Extracting builtin uses: grep for builtin names (e.g., __builtin_clzl), with the matches stored in an SQLite3 database
  29. 37.

    Methodology

    Steps: Obtain C Projects → Filter C Projects → Extracting Builtin Uses → Filter Builtin Name Records

    Filter builtin name records: matches that only occur in comments, such as // __builtin_clzl or /* __builtin_clzl */
  32. 40.

    Methodology

    Steps: Obtain C Projects → Filter C Projects → Extracting Builtin Uses → Filter Builtin Name Records → Analyze the results

    All steps are replicable, see https://github.com/jku-ssw/gcc-builtin-study

    Repeating the study on newly added builtins requires little effort.
  35. 44.

    Overview

    [Bar chart: number of projects (axis from 0 to 2000); labels: Used builtins, Machine-independent; value shown: 37%]

    GCC builtins are used by many projects.
    ~3,000 different builtins were used.
    Builtins are used infrequently within a project: 1 builtin every ~6K LOC.
  36. 45.

    Overview

    [Bar chart: number of projects (axis from 0 to 2000); labels: Used builtins, Machine-independent, Machine-specific; values shown: 37%, 36%]

    Many projects rely on architecture-independent builtins.
  38. 50.

    Architecture-independent Builtins

    Builtin                 Category                      Projects
    __builtin_expect        other (compiler interaction)  890 / 48.3%
    __builtin_clz           other (bitwise operation)     536 / 29.1%
    __builtin_bswap32       other (bitwise operation)     483 / 26.2%
    __builtin_constant_p    other (compiler interaction)  430 / 23.3%
    __builtin_alloca        other (stack allocation)      373 / 20.2%
    __sync_synchronize      sync                          356 / 19.3%
    __builtin_bswap64       other (bitwise operation)     347 / 18.8%
    __sync_fetch_and_add    sync                          332 / 18.0%
    __builtin_ctz           other (bitwise operation)     324 / 17.6%
    __builtin_bswap16       other (bitwise operation)     304 / 16.5%

    Example: if (__builtin_expect(x, 0)) foo();
  41. 53.

    Architecture-specific Builtins

    [Bar chart: number of projects (axis from 0 to 2000); labels: Used builtins, Machine-independent, Machine-specific; values shown: 37%, 36%, 8%]

    Architecture-specific builtins are less frequently used.
  42. 54.

    Architecture-specific Builtins

    [Bar chart: number of projects (axis from 0 to 2000); labels: Used builtins, Machine-independent, Machine-specific; values shown: 37%, 36%, 8%]

    A project can rely on both architecture-specific and architecture-independent builtins.
  43. 57.

    Unique Uses Per Project that Uses Builtins

    [Chart; values shown: 17, 3, 4]

    A project that uses machine-specific builtins tends to use a larger number of them.
  45. 60.

    Test Suite for the 100 most frequently used builtins

    #include <assert.h>

    int main() {
        volatile unsigned long value = -1;
        assert(__builtin_clzl(value) == 0);
        value = (unsigned long)-1 >> 3;
        assert(__builtin_clzl(value) == 3);
        value = (long)((unsigned long)-1 >> 5) - 4;
        assert(__builtin_clzl(value) == 5);
        return 0;
    }

    The goal was to test common and corner cases.

    Tool actions: Compile, Analyze, Parse. Checked outcomes: Warning? Error?
  46. 68.

    Example: Bugs in CompCert

    https://github.com/AbsInt/CompCert/issues/243

    __builtin_clz for long and long long incorrectly assumed a 32-bit integer.

    #include <assert.h>

    int main() {
        volatile unsigned long value = -1;
        assert(__builtin_clzl(value) == 0);
        value = (unsigned long)-1 >> 3;
        assert(__builtin_clzl(value) == 3);
        value = (long)((unsigned long)-1 >> 5) - 4;
        assert(__builtin_clzl(value) == 5);
        return 0;
    }

    a.out: test.c:7: int main(): Assertion `__builtin_clzl(value) == 3' failed.
    Aborted
  48. 80.

    Greedy Implementation Strategy

    #    Builtin                  Supported Projects
    …    …                        …
    24   __sync_fetch_and_sub     43.16%
    25   __sync_lock_release      44.47%
    26   __builtin_clzl           45.45%
    27   __builtin_choose_expr    46.23%
    28   __builtin_frame_address  47.14%
    29   __builtin_clzll          47.85%
    30   __builtin_ctzll          50.79%
    …    …                        …

    Machine-independent builtins are the “low-hanging fruit” to implement.
  50. 83.

    Greedy Implementation Strategy

    #    Builtin               Supported Projects
    …    …                     …
    609  vreinterpret_u32_f32  89.37%
    610  vld1_dup_s32          89.59%
    611  vst1_s32              89.65%
    612  vshrn_n_u32           89.65%
    613  vshrq_n_u64           89.65%
    …    …                     …

    Machine-specific builtins represent the “long tail of the distribution”.
  51. 84.

    Summary and Discussion

    GCC builtins are a challenge for tool developers.

    • 37% of projects use GCC builtins (mostly machine-independent ones)
    • Many tools lack support for GCC builtins
    • Supporting a larger share of projects requires an exponentially growing number of builtins

    @RiggerManuel
    https://github.com/jku-ssw/gcc-builtin-study
  57. 90.