Understanding GCC Builtins to Develop Better Tools

Talk at ESEC/FSE 2019

Manuel Rigger

August 28, 2019

Transcript

  2. Understanding GCC Builtins
    to Develop Better Tools
    Manuel Rigger Stefan Marr Bram Adams Hanspeter Mössenböck
    @RiggerManuel
    28 August 2019
    Session Empirical Studies I @ ESEC/FSE 2019

  6. Motivating Example
    Task: Print the number of leading zero bits in a 64-bit
    integer (implementing a C program)
    Example: 7 (decimal) = 000…000111 (binary), which has 61 leading zero bits

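  7. Approach 1: Implement in Plain C
    #include <stdio.h>

    /* Counts the leading zero bits, assuming a 64-bit unsigned long. */
    int count_leading_zeroes(unsigned long x) {
        int count = 0;  /* number of bits up to the highest set bit */
        unsigned long cur = x;
        while (cur > 0) {
            cur = cur >> 1;
            count++;
        }
        return 64 - count;
    }

    int main() {
        unsigned long value = 0b111;  /* binary literal (GCC extension before C23) */
        int bits = count_leading_zeroes(value);
        printf("%d", bits);  /* prints 61 */
        return 0;
    }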

  9. Approach 2: Reuse Existing Functionality
    Libc lacks such functionality

  11. Approach 2: Reuse Existing Functionality
    #include <stdio.h>

    int main() {
        unsigned long value = 0b111;
        int bits = __builtin_clzl(value);
        printf("%d", bits);
        return 0;
    }
    GCC builtins are provided directly by the (GCC) compiler

  13. Approach 2: Reuse Existing Functionality
    The call __builtin_clzl(value) compiles to:
        bsr rsi, rsi
        xor rsi, 63
    GCC builtins typically result in efficient machine code
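
The two approaches can be put side by side. The following is a minimal sketch, not code from the talk: the names `clzl_portable`, `clzl_checked`, `clzl_self_check`, and `ULONG_BITS` are hypothetical. It assumes a GCC-compatible compiler for the builtin path and guards the zero input, for which the GCC documentation leaves the result of `__builtin_clzl` undefined.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned long in bits (64 on LP64 platforms). */
#define ULONG_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Portable fallback: shift until the value is zero, as in Approach 1. */
int clzl_portable(unsigned long x) {
    int used = 0;
    while (x > 0) {
        x >>= 1;
        used++;
    }
    return ULONG_BITS - used;
}

/* Wrapper around the builtin; handles 0 explicitly because the
   builtin's result is undefined for that input. */
int clzl_checked(unsigned long x) {
    if (x == 0) {
        return ULONG_BITS;
    }
#if defined(__GNUC__)
    return __builtin_clzl(x);
#else
    return clzl_portable(x);
#endif
}

/* Hypothetical self-check in the style of the talk's assert-based tests. */
void clzl_self_check(void) {
    assert(clzl_checked(0x7UL) == ULONG_BITS - 3);
    assert(clzl_checked(0UL) == clzl_portable(0UL));
    assert(clzl_checked(~0UL) == 0);
}
```

A tool that lacks the builtin could substitute something like `clzl_portable`, at the cost of the efficient two-instruction sequence.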

  14. Tools
    While GCC builtins provide
    convenience to users, adding support
    in other tools is challenging

  15. Why is Implementing GCC Builtins Challenging?
    Finding: Over 12,000 GCC
    builtins exist!

  16. Why Should Tool Developers Care?
    Finding: builtins are used by 37%
    of projects that we analyzed

  17. Many Tools Exist that Support Developing C Programs
    Mature compilers: Clang (LLVM), GCC, ICC

  18. Many Tools Exist that Support Developing C Programs
    Mature compilers: Clang (LLVM), GCC, ICC
    Preliminary experimentation suggested that these
    compilers support common GCC builtins

  19. Many Tools Exist that Support Developing C Programs
    Special-purpose compilers: CompCert, Tiny C Compiler

  20. Many Tools Exist that Support Developing C Programs
    Static analyzers: KCC, Frama-C, CIL

  21. Many Tools Exist that Support Developing C Programs
    Various other tools: KLEE, DragonEgg, Sulong

  22. Goal of our work
    Investigate the usage of builtins to inform
    tool developers about adding support for
    builtins in their tools

  24. Broader Implications of our Study
    • Compiler builtins are an important feature, but not widely understood
      First study on compiler builtins

  25. Broader Implications of our Study
    • Compiler builtins are an important feature, but not widely understood
      Emerging languages like Rust also provide builtins

  26. Broader Implications of our Study
    • Compiler builtins are an important feature, but not widely understood
    • Language design
      Demonstrates what features programming languages like C lack

  27. Broader Implications of our Study
    • Compiler builtins are an important feature, but not widely understood
    • Language design
    • Developer feedback
      Informs developers on how builtin usage affects how well
      their code can be analyzed

  28. Broader Implications of our Study
    • Compiler builtins are an important feature, but not widely understood
    • Language design
    • Developer feedback
    • Implementation and maintenance of compilers
      Informs compiler developers about which
      builtins are often used

  33. Research Questions
    1. How frequently are builtins used?
    2. How well do tools that process C code support builtins?
    3. How many builtins must be implemented to support most
    projects?
    4. (How does builtin usage vary over a project’s lifetime?)
    5. (For what purposes are builtins used?)

  34. Methodology
    Obtain C Projects: 5,000 projects with >= 80 GitHub stars

  35. Methodology
    Filter C Projects: > 100 LOC? ~4,900 projects remain

  36. Methodology
    Extracting Builtin Uses: grep __builtin_clzl → SQLite3 Database

  37. Methodology
    Filter Builtin Name Records, e.g., names that occur only in comments:
    // __builtin_clzl
    /* __builtin_clzl */
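
The extraction and filtering steps can be illustrated with a small scanner. This is a sketch under stated assumptions, not the study's tooling (the slides mention grep and a SQLite3 database): the function names `count_uses_outside_comments` and `scan_self_check` are hypothetical, and string literals and identifier boundaries are deliberately not handled.

```c
#include <assert.h>
#include <string.h>

/* Counts occurrences of `name` in the C source text `src`, ignoring
   line comments and block comments. Simplification: string literals
   and identifier boundaries are not handled, so a search for
   "__builtin_clzl" would also match inside "__builtin_clzll". */
int count_uses_outside_comments(const char *src, const char *name) {
    size_t name_len = strlen(name);
    int count = 0;
    const char *p = src;
    if (name_len == 0) {
        return 0;
    }
    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '/') {
            /* Line comment: skip to the end of the line. */
            while (*p != '\0' && *p != '\n') {
                p++;
            }
        } else if (p[0] == '/' && p[1] == '*') {
            /* Block comment: skip past the closing delimiter. */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) {
                p++;
            }
            if (*p != '\0') {
                p += 2;
            }
        } else if (strncmp(p, name, name_len) == 0) {
            count++;
            p += name_len;
        } else {
            p++;
        }
    }
    return count;
}

/* Hypothetical self-check using the two comment forms from the slide. */
void scan_self_check(void) {
    const char *src =
        "// __builtin_clzl\n"
        "/* __builtin_clzl */\n"
        "int bits = __builtin_clzl(value);\n";
    assert(count_uses_outside_comments(src, "__builtin_clzl") == 1);
}
```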

  38. Methodology
    Obtain C Projects → Filter C Projects → Extracting Builtin Uses →
    Filter Builtin Name Records → Analyze the results

  39. Methodology
    All steps are replicable, see
    https://github.com/jku-ssw/gcc-builtin-study

  40. Methodology
    Repeating the study on newly-added builtins requires little effort

  41. How Frequently Are Builtins
    Used?

  42. Overview
    [Bar chart: number of projects (axis 0 to 2000); bar "Used builtins": 37%]
    GCC builtins are used by
    many projects

  43. Overview
    ~3,000 different builtins
    were used

  44. Overview
    Builtins are used infrequently
    within a project:
    1 builtin every ~6K LOC

  45. Overview
    [Bar chart: number of projects (axis 0 to 2000); bars: Used builtins 37%, Machine-independent 36%]
    Many projects rely on architecture-independent builtins

  46. Architecture-independent Builtins
    Definition: Architecture-independent
    builtins are typically supported on all
    common architectures
    Example: __builtin_clzl()
    [Figure: what __builtin_clzl() compiles to]

  48. Architecture-independent Builtins
    Mainly builtins from the “other” and “sync”
    categories are frequently used

  49. Architecture-independent Builtins
    Builtin                 Category                       Projects
    __builtin_expect        other (compiler interaction)   890 / 48.3%
    __builtin_clz           other (bitwise operation)      536 / 29.1%
    __builtin_bswap32       other (bitwise operation)      483 / 26.2%
    __builtin_constant_p    other (compiler interaction)   430 / 23.3%
    __builtin_alloca        other (stack allocation)       373 / 20.2%
    __sync_synchronize      sync                           356 / 19.3%
    __builtin_bswap64       other (bitwise operation)      347 / 18.8%
    __sync_fetch_and_add    sync                           332 / 18.0%
    __builtin_ctz           other (bitwise operation)      324 / 17.6%
    __builtin_bswap16       other (bitwise operation)      304 / 16.5%
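
To give a feel for what a few of these frequently used builtins do, here is a brief sketch. The wrapper names (`swap_bytes32`, `swap_bytes32_portable`, `trailing_zeros32`, `next_ticket`, `builtin_demo_self_check`) are hypothetical, the code assumes a GCC-compatible compiler and a 32-bit `unsigned int`, and the portable byte swap is only one possible fallback a tool without builtin support might use.

```c
#include <assert.h>
#include <stdint.h>

/* Byte order reversal via the builtin: 0x11223344 becomes 0x44332211. */
uint32_t swap_bytes32(uint32_t x) {
    return __builtin_bswap32(x);
}

/* Portable equivalent written with shifts and masks. */
uint32_t swap_bytes32_portable(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

/* Trailing zero count; __builtin_ctz is undefined for 0, so guard it.
   Assumes a 32-bit unsigned int. */
int trailing_zeros32(unsigned int x) {
    return x == 0 ? 32 : __builtin_ctz(x);
}

/* Atomic increment that returns the value before the addition. */
int next_ticket(int *counter) {
    return __sync_fetch_and_add(counter, 1);
}

/* Hypothetical self-check in the style of the talk's assert-based tests. */
void builtin_demo_self_check(void) {
    int counter = 5;
    assert(swap_bytes32(0x11223344u) == swap_bytes32_portable(0x11223344u));
    assert(trailing_zeros32(8u) == 3);
    assert(next_ticket(&counter) == 5);
    assert(counter == 6);
}
```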

  50. Architecture-independent Builtins
    Example use of __builtin_expect:
        if (__builtin_expect(x, 0))
            foo();
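
`__builtin_expect` is commonly wrapped in branch-hint macros. The following is a minimal sketch; the macro names `LIKELY` and `UNLIKELY` and the functions `sum_values` and `expect_self_check` are hypothetical and follow a widespread convention rather than anything shown in the talk. The hint does not change the value of the condition, it only tells the compiler which branch to expect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macros; fall back to the plain condition on
   compilers without the builtin. */
#if defined(__GNUC__)
#define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define LIKELY(cond)   (!!(cond))
#define UNLIKELY(cond) (!!(cond))
#endif

/* Sums an array; the NULL check is marked as the unlikely path. */
long sum_values(const int *values, size_t n) {
    if (UNLIKELY(values == NULL)) {
        return -1;  /* error path, expected to be rare */
    }
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Hypothetical self-check: results are the same with or without hints. */
void expect_self_check(void) {
    int values[] = {1, 2, 3};
    assert(sum_values(values, 3) == 6);
    assert(sum_values(NULL, 0) == -1);
}
```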

  53. Architecture-specific Builtins
    [Bar chart: number of projects (axis 0 to 2000); bars: Used builtins 37%, Machine-independent 36%, Machine-specific 8%]
    Architecture-specific builtins are
    less frequently used

  54. Architecture-specific Builtins
    A project can rely on both
    architecture-specific and
    architecture-independent builtins

  55. Architecture-specific Builtins
    Definition: Architecture-specific builtins are
    typically supported only on a specific
    architecture
    Example: vec_perm()
    [Figure: what vec_perm() compiles to]

  56. Architecture-specific Builtins
    PowerPC and ARM Builtins
    used most frequently

  57. Unique Uses Per Project that Uses Builtins
    [Figure: unique builtin uses per project; values shown: 17, 3, 4]
    A project that uses machine-specific builtins
    uses a larger number of them

  58. How Well are They Supported by
    Tools?

  59. Test Suite for the 100 Most Frequently Used Builtins
    #include <assert.h>

    int main() {
        volatile unsigned long value = -1;
        assert(__builtin_clzl(value) == 0);
        value = (unsigned long)-1 >> 3;
        assert(__builtin_clzl(value) == 3);
        value = (long)((unsigned long)-1 >> 5) - 4;
        assert(__builtin_clzl(value) == 5);
        return 0;
    }
    The goal was to test both common and corner cases

  60. Test Suite for the 100 Most Frequently Used Builtins
    Each test case is given to the tool under test: Compile, Analyze, or Parse
    Warning? Error?

  63. Tool Support
    Mature compilers support GCC
    builtins well

  65. Tool Support
    Tools that concretely or
    abstractly execute code also
    support builtins well

  67. Tool Support
    Static analysis tools and special-purpose
    compilers have limited/incorrect support

  68. Example: Bugs in CompCert
    https://github.com/AbsInt/CompCert/issues/243
    __builtin_clz for long and long long incorrectly
    assumed a 32-bit integer
    #include <assert.h>

    int main() {
        volatile unsigned long value = -1;
        assert(__builtin_clzl(value) == 0);
        value = (unsigned long)-1 >> 3;
        assert(__builtin_clzl(value) == 3);
        value = (long)((unsigned long)-1 >> 5) - 4;
        assert(__builtin_clzl(value) == 5);
        return 0;
    }
    Output:
    a.out: test.c:7: int main(): Assertion
    `__builtin_clzl(value) == 3' failed.
    Aborted
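
The class of bug described here can be illustrated with a sketch that builds a 64-bit leading-zero count from the 32-bit builtin. This is not CompCert's code: the names `clz64_from_clz32`, `clz64_truncating`, and `clz64_self_check` are hypothetical, `clz64_truncating` is deliberately wrong to show the effect of treating the operand as a 32-bit integer, and a 32-bit `unsigned int` is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Counts leading zeros of a 64-bit value using only the 32-bit builtin.
   Returns 64 for 0, since __builtin_clz is undefined for that input. */
int clz64_from_clz32(uint64_t x) {
    uint32_t high = (uint32_t)(x >> 32);
    uint32_t low = (uint32_t)x;
    if (high != 0) {
        return __builtin_clz(high);
    }
    if (low != 0) {
        return 32 + __builtin_clz(low);
    }
    return 64;
}

/* Deliberately wrong variant: truncates the operand to 32 bits and
   therefore ignores the upper half of the value. */
int clz64_truncating(uint64_t x) {
    uint32_t low = (uint32_t)x;
    return low != 0 ? __builtin_clz(low) : 32;
}

/* Hypothetical self-check using the value from the failing assertion. */
void clz64_self_check(void) {
    uint64_t value = UINT64_MAX >> 3;
    assert(clz64_from_clz32(value) == 3);   /* correct result */
    assert(clz64_truncating(value) == 0);   /* wrong: upper bits ignored */
}
```

A test with all bits set cannot distinguish the two variants, since both return 0. The shifted value exposes the difference, which matches the second assertion being the one that failed.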

  69. Feedback

  75. How Many Builtins Must be
    Implemented to Support Most
    Projects?

  77. Greedy Implementation Strategy
    Greedily maximizes the number
    of supported projects
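
One way to read this strategy is sketched below on toy data. Everything here is an assumption made for illustration: the `uses` matrix, the function names, the rule that a project counts as supported once all builtins it uses are implemented, and the tie-break (lowest index wins). The study's actual procedure may differ in these details.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PROJECTS 4
#define NUM_BUILTINS 3

/* Hypothetical toy data: uses[p][b] is true if project p uses builtin b. */
static const bool uses[NUM_PROJECTS][NUM_BUILTINS] = {
    {false, false, true },
    {false, false, true },
    {true,  false, true },
    {false, true,  false},
};

/* Number of projects whose builtins are all implemented. */
int supported_projects(const bool implemented[NUM_BUILTINS]) {
    int supported = 0;
    for (int p = 0; p < NUM_PROJECTS; p++) {
        bool ok = true;
        for (int b = 0; b < NUM_BUILTINS; b++) {
            if (uses[p][b] && !implemented[b]) {
                ok = false;
            }
        }
        if (ok) {
            supported++;
        }
    }
    return supported;
}

/* Fills order[] with builtin indices. Each step picks the not yet
   implemented builtin that yields the most supported projects. */
void greedy_order(int order[NUM_BUILTINS]) {
    bool implemented[NUM_BUILTINS] = {false};
    for (int step = 0; step < NUM_BUILTINS; step++) {
        int best = -1;
        int best_supported = -1;
        for (int b = 0; b < NUM_BUILTINS; b++) {
            if (implemented[b]) {
                continue;
            }
            implemented[b] = true;   /* tentatively implement b */
            int s = supported_projects(implemented);
            implemented[b] = false;
            if (s > best_supported) {
                best_supported = s;
                best = b;
            }
        }
        implemented[best] = true;
        order[step] = best;
    }
}

/* Hypothetical self-check: builtin 2 alone supports two projects. */
void greedy_self_check(void) {
    int order[NUM_BUILTINS];
    greedy_order(order);
    assert(order[0] == 2);
}
```

On this toy data the resulting order is builtin 2, then 0, then 1, supporting 2, 3, and 4 projects after each step.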

  78. Greedy Implementation Strategy
    ~30 builtins to support
    half of projects

  79. Greedy Implementation Strategy
    #    Builtin                   Supported Projects
    …    …                         …
    24   __sync_fetch_and_sub      43.16%
    25   __sync_lock_release       44.47%
    26   __builtin_clzl            45.45%
    27   __builtin_choose_expr     46.23%
    28   __builtin_frame_address   47.14%
    29   __builtin_clzll           47.85%
    30   __builtin_ctzll           50.79%
    …    …                         …

  80. Greedy Implementation Strategy
    Machine-independent builtins
    are the “low-hanging fruit” to
    implement

  81. Greedy Implementation Strategy
    600 builtins to support 90% of projects

  82. Greedy Implementation Strategy
    #     Builtin                Supported Projects
    …     …                      …
    609   vreinterpret_u32_f32   89.37%
    610   vld1_dup_s32           89.59%
    611   vst1_s32               89.65%
    612   vshrn_n_u32            89.65%
    613   vshrq_n_u64            89.65%
    …     …                      …

  83. Greedy Implementation Strategy
    Machine-specific builtins
    represent the “long tail of the
    distribution”

  84. Summary and Discussion
    GCC builtins are a challenge
    for tool developers
    • 37% of projects use GCC builtins
      (mostly machine-independent ones)
    • Many tools lack support for
      GCC builtins
    • The number of builtins that must be implemented grows
      exponentially with the share of projects to be supported
    @RiggerManuel
    https://github.com/jku-ssw/gcc-builtin-study
