Understanding GCC Builtins to Develop Better Tools

Talk at ESEC/FSE 2019

Manuel Rigger

August 28, 2019

Transcript

  2. Understanding GCC Builtins to Develop Better Tools. Manuel Rigger, Stefan Marr, Bram Adams, Hanspeter Mössenböck. @RiggerManuel. 28 August 2019, Session Empirical Studies I @ ESEC/FSE 2019
  3. Motivating Example. Task: Print the number of leading zero bits in a 64-bit integer (implementing a C program).
  4. Motivating Example. Task: Print the number of leading zero bits in a 64-bit integer (implementing a C program). Input: 7 (decimal).
  5. Motivating Example. Task: Print the number of leading zero bits in a 64-bit integer (implementing a C program). Input: 7 (decimal) = 000…000111 (binary).
  6. Motivating Example. Task: Print the number of leading zero bits in a 64-bit integer (implementing a C program). Input: 7 (decimal) = 000…000111 (binary), which has 61 leading zero bits.
  7. Approach 1: Implement in Plain C

        #include <stdio.h>

        /* Counts the leading zero bits of x; assumes a 64-bit unsigned long. */
        int count_leading_zeroes(unsigned long x)
        {
            int count = 0;              /* bits up to and including the highest set bit */
            unsigned long cur = x;
            while (cur > 0) {
                cur = cur >> 1;
                count++;
            }
            return 64 - count;
        }

        int main()
        {
            unsigned long value = 0b111;
            int bits = count_leading_zeroes(value);
            printf("%d", bits);
        }

  8. Approach 2: Reuse Existing Functionality

  9. Approach 2: Reuse Existing Functionality. Libc lacks such functionality.

  10. Approach 2: Reuse Existing Functionality

        int main() { unsigned long value = 0b111; int bits = __builtin_clzl(value); printf("%d", bits); }

  11. Approach 2: Reuse Existing Functionality. GCC builtins are provided directly by the (GCC) compiler.

        int main() { unsigned long value = 0b111; int bits = __builtin_clzl(value); printf("%d", bits); }

  12. Approach 2: Reuse Existing Functionality

        int main() { unsigned long value = 0b111; int bits = __builtin_clzl(value); printf("%d", bits); }

      Compiles to:

        bsr rsi, rsi
        xor rsi, 63

  13. Approach 2: Reuse Existing Functionality. GCC builtins typically result in efficient machine code.

        int main() { unsigned long value = 0b111; int bits = __builtin_clzl(value); printf("%d", bits); }

      Compiles to:

        bsr rsi, rsi
        xor rsi, 63

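To make the relationship between the two approaches concrete, here is a minimal sketch that wraps the builtin and falls back to the plain C loop on compilers that do not define `__GNUC__`. The function names `clz64_loop` and `clz64` are hypothetical and do not come from the talk. GCC documents the result of the `__builtin_clz` family as undefined for an input of 0, so the wrapper handles zero explicitly; returning 64 in that case is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C loop, same idea as Approach 1, with a fixed 64-bit width. */
int clz64_loop(uint64_t x)
{
    int count = 0;
    while (x > 0) {
        x >>= 1;
        count++;
    }
    return 64 - count;
}

/* Wrapper around the builtin (hypothetical helper, not from the talk).
 * Zero is handled before calling the builtin because the builtin's
 * result is undefined for 0; returning 64 is this sketch's convention. */
int clz64(uint64_t x)
{
    if (x == 0)
        return 64;
#if defined(__GNUC__)
    return __builtin_clzll(x);
#else
    return clz64_loop(x);
#endif
}

/* Cross-checks the wrapper against the loop for one value. */
void clz64_check(uint64_t x)
{
    assert(clz64(x) == clz64_loop(x));
}
```

Using `uint64_t` with `__builtin_clzll` avoids the assumption, present in the slide code, that `unsigned long` is 64 bits wide.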
  14. Tools: While GCC builtins provide convenience to users, adding support in other tools is challenging.
  15. Why is Implementing GCC Builtins Challenging? Finding: Over 12,000 GCC builtins exist!
  16. Why Should Tool Developers Care? Finding: builtins are used by 37% of projects that we analyzed.

        int main() { unsigned long value = 0b111; int bits = __builtin_clzl(value); printf("%d", bits); }

  17. Many Tools Exist that Support Developing C Programs. Mature compilers: Clang (LLVM), GCC, ICC.
  18. Many Tools Exist that Support Developing C Programs. Mature compilers: Clang (LLVM), GCC, ICC. Preliminary experimentation suggested that these compilers support common GCC builtins.
  19. Many Tools Exist that Support Developing C Programs. Mature compilers: Clang (LLVM), GCC, ICC. Special-purpose compilers: CompCert, Tiny C Compiler.
  20. Many Tools Exist that Support Developing C Programs. Mature compilers: Clang (LLVM), GCC, ICC. Special-purpose compilers: CompCert, Tiny C Compiler. Static analyzers: KCC, Frama-C, CIL.
  21. Many Tools Exist that Support Developing C Programs. Mature compilers: Clang (LLVM), GCC, ICC. Special-purpose compilers: CompCert, Tiny C Compiler. Static analyzers: KCC, Frama-C, CIL. Various other tools: KLEE, DragonEgg, Sulong.
  22. Goal of our work: Investigate the usage of builtins to inform tool developers about adding support for builtins in their tools.
  24. Broader Implications of our Study. First study on compiler builtins.
      • Compiler builtins are an important feature, but not widely understood
  25. Broader Implications of our Study
      • Compiler builtins are an important feature, but not widely understood: emerging languages like Rust also provide builtins
  26. Broader Implications of our Study
      • Compiler builtins are an important feature, but not widely understood
      • Language design: demonstrates what features programming languages like C lack
  27. Broader Implications of our Study
      • Compiler builtins are an important feature, but not widely understood
      • Language design
      • Developer feedback: informs developers on how builtin usage affects how well their code can be analyzed
  28. Broader Implications of our Study
      • Compiler builtins are an important feature, but not widely understood
      • Language design
      • Developer feedback
      • Implementation and maintenance of compilers: informs compiler developers about which builtins are often used
  29. Research Questions
      1. How frequently are builtins used?
  30. Research Questions
      1. How frequently are builtins used?
      2. How well do tools that process C code support builtins?
  31. Research Questions
      1. How frequently are builtins used?
      2. How well do tools that process C code support builtins?
      3. How many builtins must be implemented to support most projects?
  32. Research Questions
      1. How frequently are builtins used?
      2. How well do tools that process C code support builtins?
      3. How many builtins must be implemented to support most projects?
      4. (How does builtin usage vary over a project’s lifetime?)
  33. Research Questions
      1. How frequently are builtins used?
      2. How well do tools that process C code support builtins?
      3. How many builtins must be implemented to support most projects?
      4. (How does builtin usage vary over a project’s lifetime?)
      5. (For what purposes are builtins used?)
  34. Methodology. Steps: (1) Obtain C Projects: 5,000 projects from GitHub (>= 80 GitHub stars).
  35. Methodology. Steps: (1) Obtain C Projects, (2) Filter C Projects (> 100 LOC?): ~4,900 projects.
  36. Methodology. Steps: (1) Obtain C Projects, (2) Filter C Projects, (3) Extracting Builtin Uses (grep __builtin_clzl; SQLite3 database).
  37. Methodology. Steps: (1) Obtain C Projects, (2) Filter C Projects, (3) Extracting Builtin Uses, (4) Filter Builtin Name Records (e.g., // __builtin_clzl and /* __builtin_clzl */).
  38. Methodology. Steps: (1) Obtain C Projects, (2) Filter C Projects, (3) Extracting Builtin Uses, (4) Filter Builtin Name Records, (5) Analyze the results.
  39. Methodology. Steps: (1) Obtain C Projects, (2) Filter C Projects, (3) Extracting Builtin Uses, (4) Filter Builtin Name Records, (5) Analyze the results. All steps are replicable, see https://github.com/jku-ssw/gcc-builtin-study
  40. Methodology. Steps: (1) Obtain C Projects, (2) Filter C Projects, (3) Extracting Builtin Uses, (4) Filter Builtin Name Records, (5) Analyze the results. All steps are replicable, see https://github.com/jku-ssw/gcc-builtin-study. Repeating the study on newly-added builtins requires little effort.
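The extraction and filtering steps can be illustrated with a small sketch that counts uses of one builtin name in a C source string while ignoring occurrences inside `//` and `/* */` comments. This is not the study's tooling (the slides mention grep and an SQLite3 database). The function name `count_builtin_uses` is hypothetical, and string literals are deliberately not treated specially to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Counts occurrences of the identifier `name` in the C source text `src`,
 * skipping line comments and block comments. A token is a maximal run of
 * letters, digits and underscores, so "__builtin_clzll" does not count as
 * a use of "__builtin_clzl". */
size_t count_builtin_uses(const char *src, const char *name)
{
    size_t len = strlen(name);
    size_t count = 0;
    const char *p = src;

    assert(len > 0);
    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '/') {
            /* line comment: skip to end of line */
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            /* block comment: skip to the closing delimiter */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;
        } else if (isalnum((unsigned char)*p) || *p == '_') {
            /* read one whole token and compare it with the name */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            if ((size_t)(p - start) == len && strncmp(start, name, len) == 0)
                count++;
        } else {
            p++;
        }
    }
    return count;
}
```

Matching whole tokens rather than substrings is one way to avoid counting `__builtin_clzll` as a use of `__builtin_clzl`, which a plain substring search would do.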
  41. How Frequently Are Builtins Used?

  42. Overview. [Bar chart, x-axis: number of projects (0 to 2000). Used builtins: 37%.] GCC builtins are used by many projects.
  43. Overview. [Bar chart, x-axis: number of projects (0 to 2000). Used builtins: 37%.] GCC builtins are used by many projects. ~3,000 different builtins were used.
  44. Overview. [Bar chart, x-axis: number of projects (0 to 2000). Used builtins: 37%.] GCC builtins are used by many projects. ~3,000 different builtins were used. Builtins are used infrequently within a project: 1 builtin every ~6K LOC.
  45. Overview. [Bar chart, x-axis: number of projects (0 to 2000). Used builtins: 37%; machine-independent: 36%.] Many projects rely on architecture-independent builtins.
  46. Architecture-independent Builtins. Example: __builtin_clzl() [figure: what it compiles to]. Definition: Architecture-independent builtins are typically supported on all common architectures.
  47. Architecture-independent Builtins

  48. Architecture-independent Builtins. Mainly builtins from the “other” and “sync” categories are frequently used.
  49. Architecture-independent Builtins

        Builtin                 Category                       Projects
        __builtin_expect        other (compiler interaction)   890 / 48.3%
        __builtin_clz           other (bitwise operation)      536 / 29.1%
        __builtin_bswap32       other (bitwise operation)      483 / 26.2%
        __builtin_constant_p    other (compiler interaction)   430 / 23.3%
        __builtin_alloca        other (stack allocation)       373 / 20.2%
        __sync_synchronize      sync                           356 / 19.3%
        __builtin_bswap64       other (bitwise operation)      347 / 18.8%
        __sync_fetch_and_add    sync                           332 / 18.0%
        __builtin_ctz           other (bitwise operation)      324 / 17.6%
        __builtin_bswap16       other (bitwise operation)      304 / 16.5%

  50. Architecture-independent Builtins (same table as on slide 49). Example use of the most frequently used builtin:

        if (__builtin_expect(x, 0)) foo();

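As a hedged illustration of how three of the builtins in the table are commonly used, the following sketch wraps `__builtin_expect` in branch-hint macros, swaps byte order with `__builtin_bswap32`, and increments a counter atomically with `__sync_fetch_and_add`. All macro and function names (`LIKELY`, `UNLIKELY`, `sum_or_zero`, `to_other_endian`, `next_ticket`) are hypothetical and not taken from the talk or from any analyzed project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-prediction hints; the !! normalizes any truthy value to 1. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sums n values; the NULL check is marked as the unlikely path. */
long sum_or_zero(const int *values, int n)
{
    long sum = 0;

    assert(n >= 0);
    if (UNLIKELY(values == NULL))
        return 0;
    for (int i = 0; i < n; i++)
        sum += values[i];
    return sum;
}

/* Reverses the byte order of a 32-bit value. */
uint32_t to_other_endian(uint32_t v)
{
    return __builtin_bswap32(v);
}

/* Atomically adds 1 to *counter and returns the previous value. */
int next_ticket(int *counter)
{
    return __sync_fetch_and_add(counter, 1);
}
```

The hint passed to `__builtin_expect` only influences optimization; the value of the condition itself is unchanged, which is why a tool that lacks the builtin can treat the call as returning its first argument.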
  53. Architecture-specific Builtins. [Bar chart, x-axis: number of projects (0 to 2000). Used builtins: 37%; machine-independent: 36%; machine-specific: 8%.] Architecture-specific builtins are less frequently used.
  54. Architecture-specific Builtins. [Bar chart, x-axis: number of projects (0 to 2000). Used builtins: 37%; machine-independent: 36%; machine-specific: 8%.] A project can rely on both architecture-specific and architecture-independent builtins.
  55. Architecture-specific Builtins. Example: vec_perm() [figure: what it compiles to]. Definition: Architecture-specific builtins are typically supported only on a specific architecture.
  56. Architecture-specific Builtins. PowerPC and ARM builtins are used most frequently.

  57. Unique Uses Per Project that Uses Builtins. [Figure values: 17, 3, 4.] A project that uses machine-specific builtins typically uses a larger number of them.
  58. How Well are They Supported by Tools?

  59. Test Suite for the 100 most frequently used builtins. The goal was to test both common and corner cases.

        #include <assert.h>

        int main() {
            volatile unsigned long value = -1;
            assert(__builtin_clzl(value) == 0);
            value = (unsigned long)-1 >> 3;
            assert(__builtin_clzl(value) == 3);
            value = (long)((unsigned long)-1 >> 5) - 4;
            assert(__builtin_clzl(value) == 5);
            return 0;
        }

  60. Test Suite for the 100 most frequently used builtins. The goal was to test both common and corner cases. Checks per tool: Compile, Analyze, Parse; Warning? Error?

        #include <assert.h>

        int main() {
            volatile unsigned long value = -1;
            assert(__builtin_clzl(value) == 0);
            value = (unsigned long)-1 >> 3;
            assert(__builtin_clzl(value) == 3);
            value = (long)((unsigned long)-1 >> 5) - 4;
            assert(__builtin_clzl(value) == 5);
            return 0;
        }

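A test for another builtin can follow the same pattern. The sketch below is written in the style of the slide's `__builtin_clzl` test but is not taken from the study's test suite; the function names are hypothetical, and the last `__builtin_ctz` case assumes a 32-bit `unsigned int`. As in the slide, `volatile` is used so that the compiler cannot fold the builtin calls at compile time.

```c
#include <assert.h>
#include <stdint.h>

/* Common and corner cases for counting trailing zero bits.
 * Zero is not tested because the result is undefined for 0. */
void test_builtin_ctz(void)
{
    volatile unsigned int value = 1;
    assert(__builtin_ctz(value) == 0);
    value = 8;
    assert(__builtin_ctz(value) == 3);
    value = 1u << 31;               /* only the top bit set (32-bit int assumed) */
    assert(__builtin_ctz(value) == 31);
}

/* Common and corner cases for 32-bit byte swapping. */
void test_builtin_bswap32(void)
{
    volatile uint32_t value = 0x11223344u;
    assert(__builtin_bswap32(value) == 0x44332211u);
    value = 0;
    assert(__builtin_bswap32(value) == 0);
    value = 0xFFFFFFFFu;
    assert(__builtin_bswap32(value) == 0xFFFFFFFFu);
}
```

Calling both functions from a `main` that returns 0 gives a program that a tool under test can compile, analyze, or execute, with a failed assertion indicating incorrect builtin support.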
  61. Tool Support

  63. Tool Support. Mature compilers support GCC builtins well.

  65. Tool Support. Tools that concretely or abstractly execute code also support builtins well.
  67. Tool Support. Static analysis tools and special-purpose compilers have limited or incorrect support.
  68. Example: Bugs in CompCert. https://github.com/AbsInt/CompCert/issues/243. __builtin_clz for long and long long incorrectly assumed a 32-bit integer.

        #include <assert.h>

        int main() {
            volatile unsigned long value = -1;
            assert(__builtin_clzl(value) == 0);
            value = (unsigned long)-1 >> 3;
            assert(__builtin_clzl(value) == 3);
            value = (long)((unsigned long)-1 >> 5) - 4;
            assert(__builtin_clzl(value) == 5);
            return 0;
        }

      Output:

        a.out: test.c:7: int main(): Assertion `__builtin_clzl(value) == 3' failed.
        Aborted

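To show why a 32-bit assumption breaks the 64-bit variants, here is a hypothetical sketch. It is not CompCert's code and does not claim to reproduce the reported bug exactly; it only contrasts one faulty way of handling a 64-bit argument (looking at the low 32 bits only) with one correct way of composing a 64-bit count from a 32-bit one. All function names are made up for this example, and `clz32` assumes that `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit leading-zero count; the argument must be nonzero. */
int clz32(uint32_t x)
{
    assert(x != 0);
    return __builtin_clz(x);
}

/* Faulty sketch: truncates the argument to 32 bits before counting,
 * so the upper half of a 64-bit value is ignored. */
int clz64_assuming_32(uint64_t x)
{
    uint32_t low = (uint32_t)x;
    return low != 0 ? clz32(low) : 32;
}

/* Correct sketch: inspect the upper half first, then the lower half. */
int clz64_from_32(uint64_t x)
{
    uint32_t high = (uint32_t)(x >> 32);

    assert(x != 0);
    if (high != 0)
        return clz32(high);
    return 32 + clz32((uint32_t)x);
}
```

For the value `(uint64_t)-1 >> 3`, the correct variant returns 3, while the truncating variant sees only set bits in the low half and returns 0, so an assertion like the second one on the slide would fail.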
  69. Feedback

  75. How Many Builtins Must be Implemented to Support Most Projects?
  76. Greedy Implementation Strategy

  77. Greedy Implementation Strategy. Greedily maximizes the number of supported projects.
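A greedy ordering of this kind could be sketched as follows. The sketch assumes that a project counts as supported once every builtin it uses has been implemented, and it represents each project as a bitmask of the builtins it uses (at most 32 builtins). The tie-breaking rule (prefer the builtin used by more projects, then the lower index) is an assumption of this sketch, as are all names; the study's actual procedure may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of projects whose set of used builtins is contained in `implemented`. */
size_t count_supported(const uint32_t *projects, size_t n, uint32_t implemented)
{
    size_t supported = 0;
    for (size_t i = 0; i < n; i++)
        if ((projects[i] & ~implemented) == 0)
            supported++;
    return supported;
}

/* Fills order[step] with the builtin chosen at each step and supported[step]
 * with the number of projects supported after implementing it. */
void greedy_order(const uint32_t *projects, size_t n, int num_builtins,
                  int *order, size_t *supported)
{
    uint32_t implemented = 0;

    assert(num_builtins > 0 && num_builtins <= 32);
    for (int step = 0; step < num_builtins; step++) {
        int best = -1;
        size_t best_supported = 0, best_users = 0;

        for (int b = 0; b < num_builtins; b++) {
            uint32_t bit = (uint32_t)1 << b;
            if (implemented & bit)
                continue;                      /* already implemented */
            size_t s = count_supported(projects, n, implemented | bit);
            size_t users = 0;                  /* projects that use builtin b */
            for (size_t i = 0; i < n; i++)
                if (projects[i] & bit)
                    users++;
            if (best < 0 || s > best_supported ||
                (s == best_supported && users > best_users)) {
                best = b;
                best_supported = s;
                best_users = users;
            }
        }
        implemented |= (uint32_t)1 << best;
        order[step] = best;
        supported[step] = best_supported;
    }
}
```

Dividing each entry of `supported` by the number of projects gives a cumulative percentage column of the kind shown in the slides' tables.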
  78. Greedy Implementation Strategy. ~30 builtins to support half of projects.
  79. Greedy Implementation Strategy

        #     Builtin                    Supported Projects
        …     …                          …
        24    __sync_fetch_and_sub       43.16%
        25    __sync_lock_release        44.47%
        26    __builtin_clzl             45.45%
        27    __builtin_choose_expr      46.23%
        28    __builtin_frame_address    47.14%
        29    __builtin_clzll            47.85%
        30    __builtin_ctzll            50.79%
        …     …                          …

  80. Greedy Implementation Strategy. Machine-independent builtins are the “low-hanging fruit” to implement.

        #     Builtin                    Supported Projects
        …     …                          …
        24    __sync_fetch_and_sub       43.16%
        25    __sync_lock_release        44.47%
        26    __builtin_clzl             45.45%
        27    __builtin_choose_expr      46.23%
        28    __builtin_frame_address    47.14%
        29    __builtin_clzll            47.85%
        30    __builtin_ctzll            50.79%
        …     …                          …

  81. Greedy Implementation Strategy. 600 builtins to support 90% of projects.
  82. Greedy Implementation Strategy

        #      Builtin                 Supported Projects
        …      …                       …
        609    vreinterpret_u32_f32    89.37%
        610    vld1_dup_s32            89.59%
        611    vst1_s32                89.65%
        612    vshrn_n_u32             89.65%
        613    vshrq_n_u64             89.65%
        …      …                       …

  83. Greedy Implementation Strategy. Machine-specific builtins represent the “long tail of the distribution”.

        #      Builtin                 Supported Projects
        …      …                       …
        609    vreinterpret_u32_f32    89.37%
        610    vld1_dup_s32            89.59%
        611    vst1_s32                89.65%
        612    vshrn_n_u32             89.65%
        613    vshrq_n_u64             89.65%
        …      …                       …

  84. Summary and Discussion
      • GCC builtins are a challenge for tool developers
      • 37% of projects use GCC builtins (mostly machine-independent ones)
      • Many tools lack support for GCC builtins
      • An exponentially growing number of builtins is needed to support an increasing share of projects
      @RiggerManuel https://github.com/jku-ssw/gcc-builtin-study