VEE'18: An Analysis of x86-64 Inline Assembly in C Programs

Talk at the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’18) https://conf.researchr.org/home/vee-2018

Manuel Rigger

March 25, 2018

Transcript

  1. An Analysis of x86-64 Inline Assembly in C Programs

    Manuel Rigger¹, Stefan Marr², Stephen Kell³, David Leopoldseder¹, Hanspeter Mössenböck¹
    VEE, 25 March 2018
    ¹ Johannes Kepler University Linz, Austria; ² University of Kent, UK; ³ University of Cambridge, UK
    Partly funded by
  2. C projects consist of more than C code

    • Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
    • Compiler pragmas: #pragma GCC poison printf
    • Compiler builtins: if (__builtin_expect(x, 0)) foo ();
    • Preprocessor macros: #define getmax(a,b) ((a)>(b)?(a):(b))
    • Attributes: void fatal() __attribute__ ((noreturn));
  8. C projects consist of more than C code: inline assembly

    asm("rdtsc":"=a"(tickl),"=d"(tickh));
    • Dependent on the compiler and the machine
    • Usage not widely understood
    • What should tools do with it?
  10. An Analysis of x86-64 Inline Assembly in C Programs

  11. Inline Assembly in C Projects

    uint64_t clock_cycles() {
      unsigned int tickl, tickh;
      asm("rdtsc":"=a"(tickl),"=d"(tickh));
      return ((uint64_t)tickh << 32)|tickl;
    }

    • Instructions: "rdtsc"
    • Output operands: tickl, tickh
    • Output operand constraints: "=a" (eax) and "=d" (edx)
    • Further parts: input operands, side effects, …
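
The slide's function can be made self-contained as follows. This is a minimal sketch assuming GCC or Clang on x86-64; the `combine_halves` helper and the fallback that returns 0 on other targets are hypothetical additions. Unlike the slide, it marks the statement `__volatile__`: an asm statement that has outputs but no `volatile` is treated as a pure computation, so the optimizer may merge or hoist two counter reads.

```c
#include <stdint.h>

/* Combine the two 32-bit halves that rdtsc leaves in edx:eax. */
static uint64_t combine_halves(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

/* Read the time-stamp counter. __volatile__ keeps the compiler from
   merging or hoisting calls, because the statement has no inputs and
   would otherwise look like a pure computation. */
static uint64_t clock_cycles(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    uint32_t tickl, tickh;
    __asm__ __volatile__("rdtsc" : "=a"(tickl), "=d"(tickh));
    return combine_halves(tickh, tickl);
#else
    return 0; /* hypothetical fallback: no cycle counter on this target */
#endif
}
```
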
  17. Inline Assembly in C Projects

    The clock_cycles() function from slide 11 compiles to (Intel syntax):

    clock_cycles():
      rdtsc
      shl rdx, 32
      mov eax, eax
      or rax, rdx
      ret

    What about C tools that cannot hand this work off to an assembler?
  19. Sulong (an LLVM IR interpreter built on GraalVM)

    Emulate the behavior of the assembly, e.g., rdtsc:

    public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
        public long executeRdtsc() {
            return System.nanoTime();
        }
    }
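
The same emulation idea can be sketched in C for a tool that interprets programs: recognize the instruction string and reproduce its effect on a modeled register file. The `struct regs` type, the `emulate_asm` function, and the explicit `counter` parameter (standing in for Sulong's `System.nanoTime()`) are hypothetical; passing the counter in keeps the sketch deterministic.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical emulator state: only the registers rdtsc writes. */
struct regs {
    uint64_t rax, rdx;
};

/* Emulate an inline assembly template on the modeled registers.
   Returns 0 if the template was recognized, -1 otherwise. */
static int emulate_asm(const char *tmpl, struct regs *r, uint64_t counter) {
    if (strcmp(tmpl, "rdtsc") == 0) {
        r->rax = counter & 0xFFFFFFFFu; /* low half, zero-extended like eax */
        r->rdx = counter >> 32;         /* high half, zero-extended like edx */
        return 0;
    }
    return -1; /* unrecognized: report it instead of guessing */
}
```
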
  22. Splint

    Splint 3.1.2 --- 03 May 2009

    test.c: (in function rdtsc)
    test.c:5:3: Unrecognized identifier: asm
      Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
    test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
    *** Cannot continue.

    Many analysis tools ignore inline assembly. But they could approximate it by analyzing its side effects: in clock_cycles() (slide 11), the asm statement only writes tickl and tickh.
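
A minimal sketch of how a tool could derive such side effects from the operand constraints alone, without understanding the instruction itself. It relies on the GNU extended asm rules that output constraints begin with `=` (write-only) or `+` (read-write) and that a `"memory"` clobber means arbitrary memory may be accessed; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum access { ACC_READ, ACC_WRITE, ACC_READ_WRITE };

/* Classify one operand of a GNU extended asm statement. Output
   constraints start with '=' (write-only) or '+' (read-write);
   operands in the input list are only read. */
static enum access classify_operand(const char *constraint, bool is_output) {
    if (!is_output)
        return ACC_READ;
    return constraint[0] == '+' ? ACC_READ_WRITE : ACC_WRITE;
}

/* A "memory" clobber means the statement may read or write any memory,
   so a tool must discard what it knows about memory contents. */
static bool clobbers_memory(const char *const clobbers[], size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(clobbers[i], "memory") == 0)
            return true;
    return false;
}
```

For `clock_cycles()`, both outputs classify as write-only and there is no memory clobber, so a tool may assume that `tickl` and `tickh` receive unknown values and that nothing else changes.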
  26. Goal: characterize inline assembly to help tool developers who want to support it
  27. Methodology

    • Repository mining approach
    • Analyzed 1264 GitHub C projects
    • Qualitative and quantitative analysis
    • Created a database of inline assembly, available at https://github.com/jku-ssw/inline-assembly
  29. Methodology

    • Filtered out non-application-level projects
    • Two selection strategies to obtain a diverse set:
      • 327 popular projects (more than 850 GitHub stars)
      • 937 keyword-search projects (e.g., bitcoin, web server, parser)
    • Grep for "asm" and extraction of the fragments
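
A plain substring search for "asm" also matches identifiers such as `plasma`. The sketch below is a hypothetical refinement, not the study's actual tooling: it only accepts `asm`, `__asm`, and `__asm__` as whole identifiers. It still matches occurrences inside comments and string literals, so its hits would need manual checking.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* True if `line` contains asm, __asm or __asm__ as a whole identifier. */
static bool has_asm_keyword(const char *line) {
    static const char *const kws[] = { "__asm__", "__asm", "asm" };
    for (const char *p = line; *p; p++) {
        if (p != line && is_ident_char(p[-1]))
            continue; /* not at the start of an identifier */
        for (size_t k = 0; k < sizeof kws / sizeof kws[0]; k++) {
            size_t n = strlen(kws[k]);
            /* Match the keyword and require that the identifier ends there. */
            if (strncmp(p, kws[k], n) == 0 && !is_ident_char(p[n]))
                return true;
        }
    }
    return false;
}
```
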
  30. Research Questions

    • RQ1: How frequent is inline assembly?
    • RQ2: What does the average inline assembly fragment look like?
    • RQ3: In which domains is inline assembly used?
    • RQ4: What is inline assembly used for?
    • RQ5: Do projects use the same subset of instructions?
  36. Analysis

    • 1026 fragments in total
    • 607 fragments that are unique per project
    • 197 unique fragments
    We only considered instructions without size prefixes.
  38. Analysis

    197 projects with assembly; 163 analyzed projects with assembly
    For the other 34 projects, manual analysis was infeasible:
    • Macro metaprogramming and/or a large number of inline assembly fragments
    • Several SIMD instruction set extensions
  40. RQ1: How frequent is inline assembly?

  41. RQ1: How frequent is it?

    % of projects with inline assembly:
    • Keyword projects: 11%
    • Popular projects: 28%
    • All projects: 16%
  42. RQ1: How frequent is it?

    Average project size in KLOC:
    • Keyword projects: 13
    • Popular projects: 69
  43. RQ1: How frequent is it?

    Average inline assembly density (KLOC per fragment):
    • Keyword projects: 31
    • Popular projects: 50
    • All projects: 40
  44. RQ1: How frequent is it?

    [Chart: cumulative percentage of projects by number of unique fragments per project, 1 to 33]
    • 36% of projects use only a single inline assembly fragment.
    • Almost all projects (99%) use fewer than 25 inline assembly fragments.
  47. RQ2: What does the average inline assembly fragment look like?

    • Number of instructions?
    • File duplication?
  48. RQ2: What does the average fragment look like?

    [Chart: cumulative percentage of unique fragments by number of instructions per fragment, 1 to 12]
    • Fragments typically consist of a single instruction (64%).
    • Fragments rarely exceed 12 instructions (90% have at most 12).
    • We also found fragments with several hundred instructions; the largest has 438.
  52. RQ2: What does the average fragment look like?

    Duplicate file example    # projects
    sqlite3.c                 10
    SDL_endian.h               4
    inffas86.c                 4

    Inline assembly fragments are often included by importing third-party code.
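  53. Duplicate file example: sqlite3.c

    uint64 sqlite3Hwtime(void){
      unsigned long val;
      __asm__ __volatile__ ("rdtsc" : "=A" (val));
      return val;
    }

    Note: on x86-64, "=A" does not denote the edx:eax pair; a 64-bit value is placed in either rax or rdx, so this variant returns only one 32-bit half of the counter.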
  54. Duplicate file example: SDL_endian.h

    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("xchgb %b0,%h0": "=Q"(x):"0"(x));
      return x;
    }
  55. Duplicate file example: inffas86.c (excerpt)

    __asm__ __volatile__ (
    " leaq %0, %%rax\n"
    " movq %%rbp, 8(%%rax)\n"       /* save regs rbp and rsp */
    " movq %%rsp, (%%rax)\n"
    " movq %%rax, %%rsp\n"          /* make rsp point to &ar */
    " movq 16(%%rsp), %%rsi\n"      /* rsi = in */
    " movq 32(%%rsp), %%rdi\n"      /* rdi = out */
    " movq 24(%%rsp), %%r9\n"       /* r9 = last */
    " movq 48(%%rsp), %%r10\n"      /* r10 = end */
    " movq 64(%%rsp), %%rbp\n"      /* rbp = lcode */
    " movq 72(%%rsp), %%r11\n"      /* r11 = dcode */
    " movq 80(%%rsp), %%rdx\n"      /* rdx = hold */
    " movl 88(%%rsp), %%ebx\n"      /* ebx = bits */
    " movl 100(%%rsp), %%r12d\n"    /* r12d = lmask */
    " movl 104(%%rsp), %%r13d\n"    /* r13d = dmask */
                                    /* r14d = len */
                                    /* r15d = dist */
    " cld\n"
    " cmpq %%rdi, %%r10\n"
    " je .L_one_time\n"             /* if only one decode left … */
    " cmpq %%rsi, %%r9\n"
    " je .L_one_time\n"
    " jmp .L_do_loop\n"
    ".L_one_time:\n"
    " movq %%r12, %%r8\n"           /* r8 = lmask */
    " cmpb $32, %%bl\n"
    " ja .L_get_length_code_one_time\n"
    " lodsl\n"                      /* eax = *(uint *)in++ */
    " movb %%bl, %%cl\n"            /* cl = bits, needs it for … */
    " addb $32, %%bl\n"             /* bits += 32 */
    " shlq %%cl, %%rax\n"
    " orq %%rax, %%rdx\n"           /* hold |= *((uint *)in)++ … */
    " jmp .L_get_length_code_one_time\n"
  56. RQ3: In which domains is inline assembly used?

  57. RQ3: In which domains is it used?

    Domain                     # projects   % projects
    Crypto                     23           11.7%
    Networking                 20           10.2%
    Media                      17            8.6%
    Database                   16            8.1%
    Language implementation    15            7.6%
    Misc                       13            6.6%
    Concurrency                 9            4.6%
    SSL                         8            4.1%
    Text processing             8            4.1%
    Math library                7            3.6%
    Web server                  7            3.6%

    The domains of inline assembly are diverse.
  58. RQ4: What is inline assembly used for?

  59. RQ4: What is it used for?

    • Instruction order
    • Performance optimization
    • Functionality not available in C
    • Supporting instructions
  60. RQ4: What is it used for? Instruction order

    • Compiler barriers
    • Atomics
    • Memory barriers
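
A sketch of the first two items, assuming GCC or Clang: an empty asm statement with a `"memory"` clobber acts as a compiler barrier (it emits no fence instruction, so it does not order accesses on the CPU), and `lock xchg` gives an atomic exchange. The names `compiler_barrier` and `atomic_xchg` are hypothetical; the non-x86 branch uses the `__atomic_exchange_n` builtin, and C11's `atomic_exchange` from `<stdatomic.h>` would be a portable alternative.

```c
/* Compiler barrier: no instruction is emitted, but the "memory" clobber
   stops the compiler from reordering or caching memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Atomically store `val` into *ptr and return the previous value. */
static int atomic_xchg(int *ptr, int val) {
#if defined(__x86_64__) || defined(__i386__)
    /* xchg with a memory operand is implicitly locked; the explicit
       lock prefix mirrors the "lock xchg" fragments in the study. */
    __asm__ __volatile__("lock xchgl %0, %1"
                         : "+r"(val), "+m"(*ptr)
                         :
                         : "memory");
    return val; /* now holds the old value of *ptr */
#else
    return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
#endif
}
```
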
  61. RQ4: What is it used for? Performance optimization

    • SIMD
    • Bitscans
    • Endianness conversion
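
A bitscan sketch, assuming GCC or Clang: on x86 it uses `bsf`, elsewhere the `__builtin_ctz` builtin. The function name is hypothetical. Compilers typically translate `__builtin_ctz` to `bsf` or `tzcnt` on x86 themselves, which is one reason such fragments often come with builtin fallbacks.

```c
#include <stdint.h>

/* Index of the least significant set bit. `x` must be nonzero: bsf
   leaves its destination undefined for 0, and __builtin_ctz(0) is
   undefined as well. */
static unsigned bitscan_forward(uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t idx;
    __asm__("bsfl %1, %0" : "=r"(idx) : "rm"(x) : "cc");
    return idx;
#else
    return (unsigned)__builtin_ctz(x);
#endif
}
```
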
  62. RQ4: What is it used for? Functionality unavailable in C

    • CPU feature detection
    • Data prefetching
    • Clock cycles
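
CPU feature detection usually starts with `cpuid`. This sketch reads leaf 0, which returns the highest supported basic leaf in `eax` and the 12-byte vendor string in `ebx`, `edx`, `ecx`. It assumes GCC or Clang on x86-64 (other targets return -1), and `cpu_vendor` is a hypothetical name; GCC and Clang also ship a `<cpuid.h>` header with a `__get_cpuid` helper.

```c
#include <stdint.h>
#include <string.h>

/* Fill `vendor` with the CPU vendor string and return the highest
   supported basic cpuid leaf, or -1 where cpuid is not available. */
static int cpu_vendor(char vendor[13]) {
#if defined(__GNUC__) && defined(__x86_64__)
    uint32_t eax, ebx, ecx, edx;
    /* "0" and "2" tie the inputs (leaf 0, subleaf 0) to eax and ecx. */
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "0"(0u), "2"(0u));
    /* The vendor string is stored in ebx, edx, ecx order. */
    memcpy(vendor, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return (int)eax;
#else
    vendor[0] = '\0';
    return -1;
#endif
}
```

On Intel processors the buffer holds "GenuineIntel", on AMD processors "AuthenticAMD".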
  63. RQ4: What is it used for? Supporting instructions

    • Moves
    • Control flow
    • Arithmetic
  64. RQ5: Do projects use the same subset of instructions?

  65. RQ5: Do projects use the same subset?

    • How many projects can be supported by implementing 5% (about 50) of x86-64's ~1000 instructions?
    • At least 64% of all projects with inline assembly, including the unanalyzed projects with large fragments
    [Chart: % of supported projects by number of implemented instructions, from 2 to 50; annotated value: 77.9%]
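
A point on such a curve can be computed with a subset test: a project counts as supported if every instruction its fragments use is implemented. This sketch encodes instruction sets as bitmasks over a hypothetical numbering of at most 64 instructions; it illustrates the idea and is not the study's tooling.

```c
#include <stddef.h>
#include <stdint.h>

/* Count projects whose instructions are all implemented. Bit i of a
   project mask means "uses instruction i"; bit i of `implemented`
   means "instruction i is supported by the tool". */
static size_t supported_projects(const uint64_t *projects, size_t n,
                                 uint64_t implemented) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((projects[i] & ~implemented) == 0) /* nothing unimplemented */
            count++;
    return count;
}
```

Adding instructions one at a time, for example the most frequently used ones first, and recording the count after each step yields a cumulative curve of the kind shown on the slide.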
  66. RQ5: Do projects use the same subset?

    Instruction           In % of projects
    rdtsc                 27.4%
    cpuid                 25.4%
    mov                   24.9%
    <compiler barrier>    21.8%
    lock xchg             14.2%
    …                     …
  67. Declarative Inline Assembly

  68. Specifying assembler names

    AES_ECB_encrypt(...) asm("AES_ECB_encrypt");
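
A self-contained sketch of an assembler name (asm label), assuming GCC or Clang; both names are hypothetical. The label sets the symbol name that the assembler and linker see, independent of the C name. One plausible reason to repeat the C name, as on the slide, is to pin the exact symbol, for example on platforms that otherwise prepend an underscore, so that it matches a function implemented in a separate assembly file.

```c
/* The C name is add_ints; the emitted symbol is my_renamed_add. */
int add_ints(int a, int b) __asm__("my_renamed_add");

int add_ints(int a, int b) {
    return a + b;
}
```
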

  69. Linker warnings

    asm(".section .gnu.warning.gets; .ascii \"Please do not use gets!\"; .text");
  70. Symbol versioning

    asm(".symver memcpy,memcpy@GLIBC_2.2.5");

  71. Register variables

    register unsigned char *pc asm("%rsi");
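
A sketch of a local register variable, assuming GCC or Clang on x86-64; the function name is hypothetical. The GCC manual documents only one supported use for local register variables: fixing the register of an extended asm operand. Outside such statements, the compiler may keep the value in any register.

```c
/* Pin a local variable to rsi for the duration of an asm statement. */
static long add_one_via_rsi(long v) {
#if defined(__GNUC__) && defined(__x86_64__)
    register long r __asm__("rsi") = v;
    __asm__("addq $1, %0" : "+r"(r) : : "cc"); /* %0 is %rsi here */
    return r;
#else
    return v + 1; /* portable fallback with the same result */
#endif
}
```
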

  72. Non-mnemonic representations

    __asm__ __volatile__(" rep; nop\n");

    "rep; nop" is encoded as 0xF3 0x90, the same bytes as the pause instruction. Programmers sometimes have to work around old assemblers that do not support a newer mnemonic.
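
A spin-wait sketch built on the same fragment, assuming GCC or Clang; `cpu_relax` and `spin_until` are hypothetical names. With a current toolchain, the `_mm_pause()` intrinsic from `<immintrin.h>` emits the same instruction.

```c
/* Spin-wait hint: "rep; nop" assembles to F3 90, the encoding of pause. */
static inline void cpu_relax(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Spin until *flag becomes nonzero or max_iters hints have been issued;
   returns the number of hints issued. */
static int spin_until(const volatile int *flag, int max_iters) {
    int i = 0;
    while (!*flag && i < max_iters) {
        cpu_relax();
        i++;
    }
    return i;
}
```
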
  75. Non-mnemonic representations

    asm volatile(".byte 0x66; clflush %0":"+m"(addr));

    Prefixing clflush (0x0F 0xAE) with the byte 0x66 yields clflushopt.

  77. Threats to Validity

  78. Selected Threats to Validity

    Manual classification of x86-64 inline assembly: error-prone, but double-checked

    #if defined(__GNUC__) && defined(__i386__) && \
        !(__GNUC__ == 2 && __GNUC_MINOR__ == 95 )
    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("xchgb %b0,%h0": "=q"(x):"0"(x));
      return x;
    }
    #elif defined(__GNUC__) && defined(__x86_64__)
    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("xchgb %b0,%h0": "=Q"(x):"0"(x));
      return x;
    }
  79. Selected Threats to Validity

    #elif defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))
    Uint16 SDL_Swap16(Uint16 x) {
      int result;
      __asm__("rlwimi %0,%2,8,16,23": "=&r"(result):"0"(x >> 8), "r"(x));
      return (Uint16)result;
    }
    #elif defined(__GNUC__) && (defined(__M68000__) || defined(__M68020__)) && !defined(__mcoldfire__)
    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("rorw #8,%0": "=d"(x): "0"(x):"cc");
      return x;
    }

    Our results are not generalizable to other architectures.
  80. Selected Threats to Validity

    #else
    Uint16 SDL_Swap16(Uint16 x) {
      return SDL_static_cast(Uint16, ((x << 8) | (x >> 8)));
    }
    #endif

    Inline assembly often has C and/or GCC builtin fallbacks.
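
A runnable sketch of this pattern, assuming GCC or Clang: x86-64 uses the `xchgb` fragment from `SDL_endian.h`, every other target the C fallback. `swap16` is a hypothetical name. The `Q` constraint restricts the operand to `a`, `b`, `c`, or `d`, the registers whose high byte (`%h0`) is addressable; GCC and Clang also offer `__builtin_bswap16` for the same job.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value, SDL_Swap16-style. */
static uint16_t swap16(uint16_t x) {
#if defined(__GNUC__) && defined(__x86_64__)
    /* "0" makes the input share the output's register; the asm then
       exchanges its low (%b0) and high (%h0) bytes. */
    __asm__("xchgb %b0, %h0" : "=Q"(x) : "0"(x));
    return x;
#else
    return (uint16_t)((x << 8) | (x >> 8));
#endif
}
```
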
  81. Future Work

    • Improved tool support
    • Tools that analyze the correctness of inline assembly
    • Compiler testing
    • Programming language improvements
    • Study other unstandardized non-C elements
  82. Ongoing work: GCC builtins

  83. GCC builtins: percentage of projects

    • Popular projects with inline assembly: 28%
    • (Popular) projects with GCC builtins: 37%
    GCC builtins are used in more than a third of (popular) projects, more often than inline assembly.
  84. GCC builtins: density

    Density (KLOC per occurrence; lower means more frequent):
    • Popular projects with inline assembly: 50
    • (Popular) projects with GCC builtins: 6
  85. GCC builtins

    Builtin                  In % of projects
    __builtin_expect         48.2%
    __builtin_clz            29.3%
    __builtin_bswap32        26.2%
    __builtin_constant_p     23.3%
    __builtin_alloca         20.3%
    …                        …

    Builtins are used for similar purposes as inline assembly, but also to interact with the compiler.
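
A sketch of typical uses of the three most common builtins, assuming GCC or Clang; the macro and function names are hypothetical. `__builtin_expect` returns its first argument unchanged and only tells the compiler which outcome to expect; `__builtin_clz` is undefined for 0, so the zero case is handled first.

```c
#include <stdint.h>

/* Branch hints: the value is unchanged, only the expected outcome is passed on. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* floor(log2(x)) for x > 0, via count-leading-zeros. */
static unsigned ilog2(uint32_t x) {
    return 31u - (unsigned)__builtin_clz(x);
}

/* Reverse the byte order, e.g. to convert between big and little endian. */
static uint32_t to_other_endian(uint32_t x) {
    return __builtin_bswap32(x);
}

/* Returns -1 for the (expected to be rare) input 0, else stores the result. */
static int checked_ilog2(uint32_t x, unsigned *out) {
    if (unlikely(x == 0))
        return -1;
    *out = ilog2(x);
    return 0;
}
```
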
  86. Summary

    • 28% of popular C GitHub projects contain inline assembly
    • Few fragments per project; typically a single instruction
    • It is used in diverse domains
    • There are four different usage categories
    • Projects rely on a common subset of instructions

    @RiggerManuel @smarr @stephenrkell @davleopo