

VEE'18: An Analysis of x86-64 Inline Assembly in C Programs

Talk at the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’18) https://conf.researchr.org/home/vee-2018

Manuel Rigger

March 25, 2018


Transcript

  1. An Analysis of x86-64 Inline Assembly in C Programs

    Manuel Rigger¹, Stefan Marr², Stephen Kell³, David Leopoldseder¹, Hanspeter Mössenböck¹
    VEE, 25 March 2018
    ¹ Johannes Kepler University Linz, Austria
    ² University of Kent, UK
    ³ University of Cambridge, UK
    Partly funded by
  4. C projects consist of more than C code

    • Compiler builtins: if (__builtin_expect(x, 0)) foo ();
    • Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
    • Compiler pragmas: #pragma GCC poison printf
    • Preprocessor macros: #define getmax(a,b) ((a)>(b)?(a):(b))
    • Attributes: void fatal() __attribute__ ((noreturn));
  5. C projects consist of more than C code

    Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
    • Dependent on the compiler and machine
    • Usage not widely understood
    • What should tools do with it?
  11. Inline Assembly in C Projects

    uint64_t clock_cycles() {
      unsigned int tickl, tickh;
      asm("rdtsc":"=a"(tickl),"=d"(tickh));
      return ((uint64_t)tickh << 32)|tickl;
    }

    Parts of the asm statement:
    • Instructions: "rdtsc"
    • Output operands: (tickl) and (tickh)
    • Output operand constraints: "=a" and "=d"
    • Input operands, side effects, …: optional further sections, absent in this fragment
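The parts named on these slides can be seen together in one statement. The following is a minimal sketch, not taken from the talk, of a GCC-style extended asm statement that has an instruction template, an output operand, input operands, and a clobber list. The names `add_u32` and `add_u32_demo` are hypothetical, and the sketch assumes GCC or Clang; other targets fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adds b to a with a single x86 instruction. */
static inline uint32_t add_u32(uint32_t a, uint32_t b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t sum;
    __asm__("addl %2, %0"     /* instruction template (AT&T syntax) */
            : "=r"(sum)       /* output operand: "=r" is write-only, any general register */
            : "0"(a), "r"(b)  /* input operands: "0" reuses the location of operand 0 */
            : "cc");          /* side effect: the flags register is modified */
    return sum;
#else
    return a + b;             /* plain C fallback for other targets */
#endif
}

/* Usage sketch: the asm path should agree with unsigned C arithmetic. */
static inline void add_u32_demo(void) {
    assert(add_u32(2u, 3u) == 5u);
    assert(add_u32(UINT32_MAX, 1u) == 0u);  /* wraps around like unsigned addition */
}
```

The `rdtsc` fragment on the slide needs only the first two sections, because the instruction takes no inputs and its outputs are fixed to the `eax` and `edx` registers.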
  13. Inline Assembly in C Projects

    uint64_t clock_cycles() {
      unsigned int tickl, tickh;
      asm("rdtsc":"=a"(tickl),"=d"(tickh));
      return ((uint64_t)tickh << 32)|tickl;
    }

    Compiled output:

    clock_cycles():
      rdtsc
      shl rdx, 32
      mov eax, eax
      or rax, rdx
      ret

    What about C tools that cannot defer the work to an assembler?
  15. Sulong

    public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
      public long executeRdtsc() {
        return System.nanoTime();
      }
    }

    Emulate the behavior of assembly
  17. Splint

    Splint 3.1.2 --- 03 May 2009

    test.c: (in function rdtsc)
    test.c:5:3: Unrecognized identifier: asm
      Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
    test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
    *** Cannot continue.

    Many analysis tools ignore inline assembly
  19. Splint

    Many analysis tools ignore inline assembly

    uint64_t clock_cycles() {
      unsigned int tickl, tickh;
      asm("rdtsc":"=a"(tickl),"=d"(tickh));
      return ((uint64_t)tickh << 32)|tickl;
    }

    But they could approximate it by analyzing side effects
  21. Methodology

    • Repository mining approach
    • Analyzed 1264 GitHub C projects
    • Qualitative and quantitative analysis
    • Created a database of inline assembly
    Available at https://github.com/jku-ssw/inline-assembly
  22. Methodology

    • Filtered non-application-level projects
    • Two selection strategies to obtain a diverse set
    • 327 popular projects (>850 GitHub stars)
    • 937 keyword-search projects (e.g., bitcoin, web server, parser)
    • Grep for “asm” and extraction of the fragments
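The last step, searching sources for "asm", can be illustrated with a small scanner. This is a rough sketch and not the authors' tooling: the hypothetical `count_asm_keywords` counts whole identifiers spelled `asm`, `__asm`, or `__asm__`, so that words such as `plasma` do not match. It does not skip comments or string literals, which a real extraction step would likely have to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that may appear inside a C identifier. */
static inline int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: counts identifiers spelled asm, __asm or __asm__. */
static inline size_t count_asm_keywords(const char *src) {
    size_t count = 0;
    const char *p = src;
    while (*p != '\0') {
        if (!is_ident_char(*p)) {
            p++;
            continue;
        }
        const char *start = p;          /* beginning of an identifier-like token */
        while (is_ident_char(*p)) {
            p++;
        }
        size_t len = (size_t)(p - start);
        if ((len == 3 && strncmp(start, "asm", 3) == 0) ||
            (len == 5 && strncmp(start, "__asm", 5) == 0) ||
            (len == 7 && strncmp(start, "__asm__", 7) == 0)) {
            count++;
        }
    }
    return count;
}

/* Usage sketch on a few one-line inputs. */
static inline void count_asm_demo(void) {
    assert(count_asm_keywords("asm(\"rdtsc\":\"=a\"(lo),\"=d\"(hi));") == 1);
    assert(count_asm_keywords("__asm__ __volatile__(\"pause\");") == 1);
    assert(count_asm_keywords("int plasma; char chasm_x;") == 0);
}
```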
  26. Research Questions

    • RQ1: How frequent is inline assembly?
    • RQ2: What does the average inline assembly fragment look like?
    • RQ3: In which domains is inline assembly used?
    • RQ4: What is inline assembly used for?
    • RQ5: Do projects use the same subset of instructions?
  27. Analysis

    • 1026 fragments
    • 607 fragments unique per project
    • 197 unique fragments
    We only considered instructions without size prefixes
  28. Analysis

    197 projects with assembly
    163 analyzed projects with assembly
    • Other projects: manual analysis was infeasible
    • Macro-metaprogramming and/or large number of inline assembly fragments
    • Several SIMD instruction set extensions
  29. RQ1: How frequent is it?

    Bar chart, % of projects with inline assembly: keyword projects 11%, popular projects 28%, all projects 16%
  30. RQ1: How frequent is it?

    Bar chart, average project size in KLOC: keyword projects 13, popular projects 69
  31. RQ1: How frequent is it?

    Bar chart, average inline assembly density in KLOC: keyword projects 31, popular projects 50, all projects 40
  34. RQ1: How frequent is it?

    Plot: cumulative percentage (0 to 100) against the number of unique fragments per project (1 to 33)
    • 36%: a number of projects use only a single inline assembly fragment
    • 99%: almost all projects use fewer than 25 inline assembly fragments
  35. RQ2: What does the average inline assembly fragment look like?

    • Number of instructions?
    • File duplication?
  39. RQ2: What does the average fragment look like?

    Plot: cumulative percentage (0 to 100) against the number of instructions per unique fragment (1 to 12, … 438)
    • 64%: fragments typically consist of a single instruction
    • 90%: fragments rarely exceeded 12 instructions
    • 100%: we also found fragments with several hundred instructions
  40. RQ2: What does the average fragment look like?

    Duplicate file example    # projects
    sqlite3.c                 10
    SDL_endian.h              4
    inffas86.c                4

    Inline assembly fragments are often included by importing third-party code
  41. sqlite3.c

    uint64 sqlite3Hwtime(void){
      unsigned long val;
      __asm__ __volatile__ ("rdtsc" : "=A" (val));
      return val;
    }
  42. SDL_endian.h

    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("xchgb %b0,%h0": "=Q"(x):"0"(x));
      return x;
    }
  43. inffas86.c

    __asm__ __volatile__ (
    " leaq %0, %%rax\n"
    " movq %%rbp, 8(%%rax)\n" /* save regs rbp and rsp */
    " movq %%rsp, (%%rax)\n"
    " movq %%rax, %%rsp\n" /* make rsp point to &ar */
    " movq 16(%%rsp), %%rsi\n" /* rsi = in */
    " movq 32(%%rsp), %%rdi\n" /* rdi = out */
    " movq 24(%%rsp), %%r9\n" /* r9 = last */
    " movq 48(%%rsp), %%r10\n" /* r10 = end */
    " movq 64(%%rsp), %%rbp\n" /* rbp = lcode */
    " movq 72(%%rsp), %%r11\n" /* r11 = dcode */
    " movq 80(%%rsp), %%rdx\n" /* rdx = hold */
    " movl 88(%%rsp), %%ebx\n" /* ebx = bits */
    " movl 100(%%rsp), %%r12d\n" /* r12d = lmask */
    " movl 104(%%rsp), %%r13d\n" /* r13d = dmask */
    /* r14d = len */
    /* r15d = dist */
    " cld\n"
    " cmpq %%rdi, %%r10\n"
    " je .L_one_time\n" /* if only one decode left
    " cmpq %%rsi, %%r9\n"
    " je .L_one_time\n"
    " jmp .L_do_loop\n"
    ".L_one_time:\n"
    " movq %%r12, %%r8\n" /* r8 = lmask */
    " cmpb $32, %%bl\n"
    " ja .L_get_length_code_one_time\n"
    " lodsl\n" /* eax = *(uint *)in++ */
    " movb %%bl, %%cl\n" /* cl = bits, needs it for
    " addb $32, %%bl\n" /* bits += 32 */
    " shlq %%cl, %%rax\n"
    " orq %%rax, %%rdx\n" /* hold |= *((uint *)in)++
    " jmp .L_get_length_code_one_time\n"
  44. RQ3: In which domains is it used?

    Domain                     # projects    % projects
    Crypto                     23            11.7%
    Networking                 20            10.2%
    Media                      17            8.6%
    Database                   16            8.1%
    Language implementation    15            7.6%
    Misc                       13            6.6%
    Concurrency                9             4.6%
    SSL                        8             4.1%
    Text processing            8             4.1%
    Math library               7             3.6%
    Web server                 7             3.6%

    The domains of inline assembly are diverse
  45. RQ4: What is it used for?

    • Instruction order
    • Performance optimization
    • Functionality not available in C
    • Supporting instructions
  46. RQ4: What is it used for?

    Functionality unavailable in C:
    • CPU feature detection
    • Data prefetching
    • Clock cycles
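Each of the three uses listed here maps to an instruction that ISO C cannot express directly. The following sketch, which is illustrative and not taken from the study's database, wraps `cpuid`, `prefetcht0`, and `rdtsc` in GCC-style inline assembly. All function names and the `HAVE_X86_64_ASM` macro are hypothetical; on targets other than x86-64 with GCC or Clang the functions degrade to stubs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && defined(__x86_64__)
#define HAVE_X86_64_ASM 1
#else
#define HAVE_X86_64_ASM 0
#endif

/* CPU feature detection: cpuid leaf 0 returns the vendor string in ebx, edx, ecx.
   Writes up to 12 characters plus a terminator; returns 1 on success, 0 if unsupported. */
static inline int cpu_vendor(char out[13]) {
#if HAVE_X86_64_ASM
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
#else
    out[0] = '\0';
    return 0;
#endif
}

/* Data prefetching: hints that the cache line containing p will be read soon. */
static inline void prefetch_read(const void *p) {
#if HAVE_X86_64_ASM
    __asm__ __volatile__("prefetcht0 %0" : : "m"(*(const char *)p));
#else
    (void)p;
#endif
}

/* Clock cycles: reads the time stamp counter, as on the earlier slides. */
static inline uint64_t clock_cycles(void) {
#if HAVE_X86_64_ASM
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;  /* stub: no portable equivalent in ISO C */
#endif
}

/* Usage sketch. */
static inline void unavailable_in_c_demo(void) {
    char vendor[13];
    int data[4] = {1, 2, 3, 4};
    prefetch_read(data);
    assert(data[0] == 1);  /* a prefetch is only a hint and never changes data */
    if (cpu_vendor(vendor)) {
        assert(vendor[0] != '\0' && strlen(vendor) <= 12);
    }
    (void)clock_cycles();
}
```

GCC and Clang also offer `__builtin_prefetch` and the `<cpuid.h>` helpers for two of these cases, which relates to the observation later in the deck that inline assembly often has builtin fallbacks.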
  47. RQ5: Do projects use the same subset?

    • How many projects can be supported by implementing 5% of x86-64’s ~1000 instructions?
    • At least 64% of projects (including the large-fragment ones)
    Plot: % of supported projects (0 to 90) against the number of implemented instructions (2 to 50); annotated value: 77.9%
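The question on this slide treats a project as supported only when every instruction it uses has been implemented. A minimal sketch of that counting rule follows, with each project modelled as a bitmask of instructions. The function `supported_projects`, the `INSN_*` constants, and the toy data are all invented for illustration and are not taken from the study.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each project is a bitmask: bit i is set if the project uses instruction i. */
static inline size_t supported_projects(const uint64_t *projects, size_t n,
                                        uint64_t implemented) {
    size_t supported = 0;
    for (size_t i = 0; i < n; i++) {
        /* Supported only if no used instruction is missing from the tool. */
        if ((projects[i] & ~implemented) == 0) {
            supported++;
        }
    }
    return supported;
}

/* Hypothetical instruction identifiers for the toy data below. */
enum { INSN_RDTSC = 1 << 0, INSN_CPUID = 1 << 1, INSN_MOV = 1 << 2, INSN_XCHG = 1 << 3 };

/* Usage sketch with four made-up projects. */
static inline void coverage_demo(void) {
    const uint64_t projects[] = {
        INSN_RDTSC, INSN_RDTSC | INSN_CPUID, INSN_MOV | INSN_XCHG, INSN_CPUID
    };
    assert(supported_projects(projects, 4, INSN_RDTSC) == 1);
    assert(supported_projects(projects, 4, INSN_RDTSC | INSN_CPUID) == 3);
    assert(supported_projects(projects, 4,
                              INSN_RDTSC | INSN_CPUID | INSN_MOV | INSN_XCHG) == 4);
}
```

Under this rule, coverage grows in steps: adding one instruction helps only those projects whose remaining instructions are already implemented.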
  48. RQ5: Do projects use the same subset?

    Instructions          In % of projects
    rdtsc                 27.4%
    cpuid                 25.4%
    mov                   24.9%
    <compiler barrier>    21.8%
    lock xchg             14.2%
    …                     …
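Two entries in this table are not ordinary computations: the compiler barrier and the atomic exchange. A hedged sketch of how such fragments are commonly written with GCC-style inline assembly follows. The names `compiler_barrier`, `atomic_xchg_int`, and `xchg_demo` are hypothetical, and GCC or Clang is assumed.

```c
#include <assert.h>

/* Compiler barrier: emits no instruction, but the "memory" clobber stops the
   compiler from reordering or caching memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Atomically stores v into *p and returns the previous value. */
static inline int atomic_xchg_int(volatile int *p, int v) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* xchg with a memory operand is always locked, so the lock prefix is implicit. */
    __asm__ __volatile__("xchgl %0, %1" : "+r"(v), "+m"(*p) : : "memory");
    return v;
#else
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);  /* GCC/Clang builtin */
#endif
}

/* Usage sketch: a single-threaded walk through a test-and-set lock. */
static inline void xchg_demo(void) {
    volatile int lock = 0;
    assert(atomic_xchg_int(&lock, 1) == 0);  /* lock was free, now taken */
    assert(atomic_xchg_int(&lock, 1) == 1);  /* second attempt sees it taken */
    compiler_barrier();
    lock = 0;                                /* release */
    assert(lock == 0);
}
```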
  49. Non-mnemonic representations

    pause, encoded as 0xF3 0x90
    __asm__ __volatile__(" rep; nop\n");
    Programmers sometimes have to work around old assemblers
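The workaround on this slide typically appears inside spin-wait loops. The sketch below shows that pattern under the assumption of GCC or Clang; `cpu_relax`, `spin_wait`, and `spin_demo` are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Spin-wait hint. "rep; nop" assembles to the bytes 0xF3 0x90, which CPUs that
   know pause decode as pause and older CPUs execute as a plain nop. */
static inline void cpu_relax(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
#endif
}

/* Hypothetical helper: spins until *flag is nonzero or max_spins is reached;
   returns the number of spins performed. */
static inline unsigned spin_wait(const volatile int *flag, unsigned max_spins) {
    unsigned spins = 0;
    while (*flag == 0 && spins < max_spins) {
        cpu_relax();
        spins++;
    }
    return spins;
}

/* Usage sketch. */
static inline void spin_demo(void) {
    volatile int ready = 1, never = 0;
    assert(spin_wait(&ready, 1000) == 0);  /* flag already set: no spinning */
    assert(spin_wait(&never, 5) == 5);     /* flag never set: gives up at the limit */
}
```

Writing the instruction as `rep; nop` keeps the source buildable with assemblers that predate the `pause` mnemonic, while producing the same machine code.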
  50. Selected Threats to Validity

    Manual classification of x86-64 inline assembly → error prone, but double-checked

    #if defined(__GNUC__) && defined(__i386__) && \
       !(__GNUC__ == 2 && __GNUC_MINOR__ == 95 )
    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("xchgb %b0,%h0": "=q"(x):"0"(x));
      return x;
    }
    #elif defined(__GNUC__) && defined(__x86_64__)
    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("xchgb %b0,%h0": "=Q"(x):"0"(x));
      return x;
    }
  51. Selected Threats to Validity

    #elif defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))
    Uint16 SDL_Swap16(Uint16 x) {
      int result;
      __asm__("rlwimi %0,%2,8,16,23": "=&r"(result):"0"(x >> 8), "r"(x));
      return (Uint16)result;
    }
    #elif defined(__GNUC__) && (defined(__M68000__) || defined(__M68020__)) && !defined(__mcoldfire__)
    Uint16 SDL_Swap16(Uint16 x) {
      __asm__("rorw #8,%0": "=d"(x): "0"(x):"cc");
      return x;
    }

    Our results are not generalizable to other architectures
  52. Selected Threats to Validity

    #else
    Uint16 SDL_Swap16(Uint16 x) {
      return SDL_static_cast(Uint16, ((x << 8) | (x >> 8)));
    }
    #endif

    Inline assembly often has C and/or GCC builtin fallbacks
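The fallback pattern from these three slides can be condensed into a self-contained sketch. It is modelled on the SDL code shown above but is not SDL's actual source: the names `swap16`, `swap16_c`, and `swap16_demo` are hypothetical, only the x86-64 branch is kept, and GCC or Clang is assumed for the asm path.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback, equivalent to the #else branch on the slide. */
static inline uint16_t swap16_c(uint16_t x) {
    return (uint16_t)((x << 8) | (x >> 8));
}

static inline uint16_t swap16(uint16_t x) {
#if defined(__GNUC__) && defined(__x86_64__)
    /* %b0 and %h0 name the low and high byte of operand 0; "Q" restricts it
       to a register that has a high-byte form (a, b, c or d). */
    __asm__("xchgb %b0, %h0" : "=Q"(x) : "0"(x));
    return x;
#else
    return swap16_c(x);
#endif
}

/* Usage sketch: the asm path and the C path should always agree. */
static inline void swap16_demo(void) {
    assert(swap16(0x1234) == 0x3412);
    for (uint32_t v = 0; v <= 0xFFFF; v += 251) {
        assert(swap16((uint16_t)v) == swap16_c((uint16_t)v));
    }
}
```

A tool that cannot handle the asm branch can still analyze such code by selecting the C branch, which is one reason the presence of fallbacks matters for the study's conclusions.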
  53. Future Work

    • Improved tool support
    • Tools that analyze the correctness of inline assembly
    • Compiler testing
    • Programming language improvements
    • Study other unstandardized non-C elements
  54. GCC builtins: percentage of projects

    Bar chart, % of projects: popular projects with inline assembly 28%, (popular) projects with GCC builtins 37%
    GCC builtins are used in almost every second (popular) project
  55. GCC builtins: density (occurrence per KLOC)

    Bar chart, density (occurrence per KLOC): popular projects with inline assembly 50, (popular) projects with GCC builtins 6
  56. GCC builtins

    Builtins                In % of projects
    __builtin_expect        48.2%
    __builtin_clz           29.3%
    __builtin_bswap32       26.2%
    __builtin_constant_p    23.3%
    __builtin_alloca        20.3%
    …                       …

    Used for purposes similar to inline assembly, but also to interact with the compiler
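Several of the builtins in this table replace what would otherwise be a one-instruction asm fragment, while others only pass information to the compiler. The following sketch assumes GCC or Clang and a 32-bit `unsigned int`; the wrapper names `bit_width32`, `to_other_endian`, and `builtins_demo` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes GCC or Clang; these builtins are compiler extensions, not ISO C. */

/* Number of bits needed to represent x (0 for x == 0). */
static inline unsigned bit_width32(uint32_t x) {
    /* __builtin_clz(0) is undefined, so zero is handled separately;
       __builtin_expect marks that case as unlikely for code layout. */
    if (__builtin_expect(x == 0, 0)) {
        return 0;
    }
    return 32u - (unsigned)__builtin_clz(x);
}

/* Byte-order reversal without inline assembly. */
static inline uint32_t to_other_endian(uint32_t x) {
    return __builtin_bswap32(x);
}

/* Usage sketch. */
static inline void builtins_demo(void) {
    assert(bit_width32(0) == 0);
    assert(bit_width32(1) == 1);
    assert(bit_width32(0x80000000u) == 32);
    assert(to_other_endian(0x11223344u) == 0x44332211u);
    assert(__builtin_constant_p(42));  /* a literal is a compile-time constant */
}
```

Here `__builtin_clz` and `__builtin_bswap32` stand in for machine instructions, whereas `__builtin_expect` and `__builtin_constant_p` are examples of interacting with the compiler itself.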
  57. Summary

    • 28% of popular C GitHub projects contain it
    • Few fragments per project; typically a single instruction
    • It is used in diverse domains
    • There are four different usage categories
    • Projects rely on a common subset
    @RiggerManuel @smarr @stephenrkell @davleopo