Slide 1

Slide 1 text

An Analysis of x86-64 Inline Assembly in C Programs
Manuel Rigger (1), Stefan Marr (2), Stephen Kell (3), David Leopoldseder (1), Hanspeter Mössenböck (1)
VEE, 25 March 2018
(1) Johannes Kepler University Linz, Austria
(2) University of Kent, UK
(3) University of Cambridge, UK
Partly funded by [sponsor logos]

Slide 2

Slide 2 text

C projects consist of more than C code

Slide 3

Slide 3 text

C projects consist of more than C code
Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));

Slide 4

Slide 4 text

C projects consist of more than C code
Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
Compiler pragmas: #pragma GCC poison printf

Slide 5

Slide 5 text

C projects consist of more than C code
Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
Compiler pragmas: #pragma GCC poison printf
Compiler builtins: if (__builtin_expect(x, 0)) foo ();

Slide 6

Slide 6 text

C projects consist of more than C code
Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
Compiler pragmas: #pragma GCC poison printf
Compiler builtins: if (__builtin_expect(x, 0)) foo ();
Preprocessor macros: #define getmax(a,b) ((a)>(b)?(a):(b))

Slide 7

Slide 7 text

C projects consist of more than C code
Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
Compiler pragmas: #pragma GCC poison printf
Compiler builtins: if (__builtin_expect(x, 0)) foo ();
Preprocessor macros: #define getmax(a,b) ((a)>(b)?(a):(b))
Attributes: void fatal() __attribute__ ((noreturn));

Slide 9

Slide 9 text

C projects consist of more than C code
Inline assembly: asm("rdtsc":"=a"(tickl),"=d"(tickh));
• Dependent on the compiler and machine
• Usage not widely understood
• What should tools do with it?

Slide 10

Slide 10 text

An Analysis of x86-64 Inline Assembly in C Programs 4

Slide 11

Slide 11 text

Inline Assembly in C Projects
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }

Slide 13

Slide 13 text

Inline Assembly in C Projects
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }
Instructions

Slide 14

Slide 14 text

Inline Assembly in C Projects
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }
Output operands

Slide 15

Slide 15 text

Inline Assembly in C Projects
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }
Output operand constraints

Slide 16

Slide 16 text

Inline Assembly in C Projects
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }
Input operands, side effects, …
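
The rdtsc fragment has outputs only, so it does not show the input operands and side effects named on this slide. As a minimal sketch (the function name add_u32 and the plain C fallback branch are illustrative assumptions, not taken from the talk), an extended asm statement with one output, two inputs, and a clobber list could look like this:

```c
#include <stdint.h>

/* Hypothetical helper, not from the talk: adds two 32-bit values. */
static uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint32_t sum;
    __asm__("addl %2, %0"      /* instruction template (AT&T syntax)        */
            : "=r"(sum)        /* output: any general-purpose register      */
            : "0"(a), "r"(b)   /* inputs: a starts in operand 0's register  */
            : "cc");           /* side effect: the flags register changes   */
    return sum;
#else
    return a + b;              /* plain C fallback for other targets        */
#endif
}
```

The three colon-separated sections after the template are, in order, output operands, input operands, and clobbers. The "0" constraint ties the first input to the register chosen for output operand 0, so the instruction can update it in place.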

Slide 17

Slide 17 text

Inline Assembly in C Projects
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }
Compiled output:
    clock_cycles():
        rdtsc
        shl rdx, 32
        mov eax, eax
        or rax, rdx
        ret

Slide 18

Slide 18 text

Inline Assembly in C Projects
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }
Compiled output:
    clock_cycles():
        rdtsc
        shl rdx, 32
        mov eax, eax
        or rax, rdx
        ret
What about C tools that cannot use an assembler to defer the work?

Slide 19

Slide 19 text

Sulong 12

Slide 20

Slide 20 text

Sulong
    public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
        public long executeRdtsc() {
            return System.nanoTime();
        }
    }

Slide 21

Slide 21 text

Sulong
    public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
        public long executeRdtsc() {
            return System.nanoTime();
        }
    }
Emulate the behavior of assembly
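
The same emulation idea can be sketched in C for a tool that interprets code rather than assembling it. Everything below is a hypothetical illustration: the emulated_regs type and both function names are assumptions, and the C11 wall clock from timespec_get stands in for the cycle counter, much as System.nanoTime() does in the Sulong node.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical register file of an interpreter (an assumption for
 * illustration, not Sulong's actual data structure). */
typedef struct {
    uint64_t rax;
    uint64_t rdx;
} emulated_regs;

/* Emulates rdtsc: writes the low 32 bits of a time value to RAX and the
 * high 32 bits to RDX, as the real instruction does with EAX and EDX. */
static void emulate_rdtsc(emulated_regs *regs)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);   /* wall-clock time, not CPU cycles */
    uint64_t ticks = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    regs->rax = ticks & 0xffffffffull;
    regs->rdx = ticks >> 32;
}

/* Mirrors clock_cycles() from the slides on top of the emulated registers. */
static uint64_t emulated_clock_cycles(void)
{
    emulated_regs regs;
    emulate_rdtsc(&regs);
    return (regs.rdx << 32) | regs.rax;
}
```

The emulation keeps the register-level contract of the instruction (two 32-bit halves), so the surrounding C code that recombines tickh and tickl keeps working unchanged.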

Slide 22

Slide 22 text

Splint
    Splint 3.1.2 --- 03 May 2009
    test.c: (in function rdtsc)
    test.c:5:3: Unrecognized identifier: asm
      Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
    test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
    *** Cannot continue.

Slide 23

Slide 23 text

Splint
    Splint 3.1.2 --- 03 May 2009
    test.c: (in function rdtsc)
    test.c:5:3: Unrecognized identifier: asm
      Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
    test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
    *** Cannot continue.
Many analysis tools ignore inline assembly

Slide 24

Slide 24 text

Splint
Many analysis tools ignore inline assembly
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }

Slide 25

Slide 25 text

Splint
Many analysis tools ignore inline assembly
    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc":"=a"(tickl),"=d"(tickh));
        return ((uint64_t)tickh << 32)|tickl;
    }
But they could approximate it by analyzing its side effects
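
One way to approximate side effects without understanding the instruction itself is to read the operand constraints. In GCC's extended asm syntax an output constraint starts with '=' (write-only) or '+' (read and written), while input constraints carry neither modifier. The sketch below is an assumption about how a tool might use this; the function names are hypothetical.

```c
#include <stdbool.h>

/* True if the operand with this constraint string is written by the
 * fragment: '=' marks write-only, '+' marks read-write operands. */
static bool operand_is_written(const char *constraint)
{
    return constraint[0] == '=' || constraint[0] == '+';
}

/* True if the fragment may read the operand: every operand except a
 * write-only ('=') output counts as read. */
static bool operand_is_read(const char *constraint)
{
    return constraint[0] != '=';
}
```

Applied to the rdtsc fragment, "=a" and "=d" both classify as written but not read, so a checker could treat tickl and tickh as initialized after the asm statement even though it never models rdtsc.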

Slide 26

Slide 26 text

Goal: characterize inline assembly to help tool developers who want to support it

Slide 27

Slide 27 text

Methodology
• Repository mining approach
• Analyzed 1264 GitHub C projects
• Qualitative and quantitative analysis
• Created a database of inline assembly

Slide 28

Slide 28 text

Methodology
• Repository mining approach
• Analyzed 1264 GitHub C projects
• Qualitative and quantitative analysis
• Created a database of inline assembly
Available at https://github.com/jku-ssw/inline-assembly

Slide 29

Slide 29 text

Methodology
• Filtered out non-application-level projects
• Two selection strategies to obtain a diverse set
  • 327 popular projects
    • >850 GitHub stars
  • 937 keyword-search projects
    • E.g., bitcoin, web server, parser
• Grep for “asm” and extraction of the fragments
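
The slide only says that the sources were searched for "asm" before the fragments were extracted. As a rough sketch of such a first-pass filter (the function names and the whole-token matching rule are assumptions, not the authors' tooling), a line can be flagged when it contains asm, __asm, or __asm__ as a complete identifier:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True for characters that can appear inside a C identifier. */
static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical filter: true if the line contains asm, __asm, or __asm__
 * as a whole token, so that words such as "plasma" do not match. */
static bool line_mentions_asm(const char *line)
{
    static const char *const keywords[] = { "__asm__", "__asm", "asm" };
    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
        size_t len = strlen(keywords[k]);
        for (const char *p = strstr(line, keywords[k]); p != NULL;
             p = strstr(p + 1, keywords[k])) {
            bool start_ok = (p == line) || !is_ident_char(p[-1]);
            bool end_ok = !is_ident_char(p[len]);
            if (start_ok && end_ok)
                return true;
        }
    }
    return false;
}
```

A plain substring search is simpler but also matches identifiers that merely contain the letters, so some manual filtering would still be needed either way.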

Slide 30

Slide 30 text

Research Questions 19

Slide 31

Slide 31 text

Research Questions
• RQ1: How frequent is inline assembly?

Slide 32

Slide 32 text

Research Questions
• RQ1: How frequent is inline assembly?
• RQ2: What does the average inline assembly fragment look like?

Slide 33

Slide 33 text

Research Questions
• RQ1: How frequent is inline assembly?
• RQ2: What does the average inline assembly fragment look like?
• RQ3: In which domains is inline assembly used?

Slide 34

Slide 34 text

Research Questions
• RQ1: How frequent is inline assembly?
• RQ2: What does the average inline assembly fragment look like?
• RQ3: In which domains is inline assembly used?
• RQ4: What is inline assembly used for?

Slide 35

Slide 35 text

Research Questions
• RQ1: How frequent is inline assembly?
• RQ2: What does the average inline assembly fragment look like?
• RQ3: In which domains is inline assembly used?
• RQ4: What is inline assembly used for?
• RQ5: Do projects use the same subset of instructions?

Slide 36

Slide 36 text

Analysis
1026 fragments
607 fragments unique per project
197 unique fragments

Slide 37

Slide 37 text

Analysis
1026 fragments
607 fragments unique per project
197 unique fragments
We only considered instructions without size suffixes

Slide 38

Slide 38 text

Analysis
197 projects with assembly
163 analyzed projects with assembly

Slide 39

Slide 39 text

Analysis
197 projects with assembly
163 analyzed projects with assembly
• Other projects: manual analysis was infeasible
  • Macro-metaprogramming and/or large number of inline assembly fragments
  • Several SIMD instruction set extensions

Slide 40

Slide 40 text

RQ1: How frequent is inline assembly? 22

Slide 41

Slide 41 text

RQ1: How frequent is it?
[Bar chart: % of projects with inline assembly. Keyword projects: 11%; popular projects: 28%; all projects: 16%]

Slide 42

Slide 42 text

RQ1: How frequent is it?
[Bar chart: average project size in KLOC. Keyword projects: 13; popular projects: 69]

Slide 43

Slide 43 text

RQ1: How frequent is it?
[Bar chart: average inline assembly density in KLOC. Keyword projects: 31; popular projects: 50; all projects: 40]

Slide 44

Slide 44 text

RQ1: How frequent is it?
[Chart: cumulative percentage of projects by number of unique fragments per project (x-axis: 1 to 33)]

Slide 45

Slide 45 text

RQ1: How frequent is it?
[Chart: cumulative percentage of projects by number of unique fragments per project (x-axis: 1 to 33)]
A number of projects only use a single inline assembly fragment (36%)

Slide 46

Slide 46 text

RQ1: How frequent is it?
[Chart: cumulative percentage of projects by number of unique fragments per project (x-axis: 1 to 33)]
Almost all projects use fewer than 25 inline assembly fragments (99%)

Slide 47

Slide 47 text

RQ2: What does the average inline assembly fragment look like?
• Number of instructions?
• File duplication?

Slide 48

Slide 48 text

RQ2: What does the average fragment look like?
[Chart: cumulative percentage of unique fragments by number of instructions per unique fragment (x-axis: 1 to 12)]

Slide 49

Slide 49 text

RQ2: What does the average fragment look like?
[Chart: cumulative percentage of unique fragments by number of instructions per unique fragment (x-axis: 1 to 12)]
Fragments typically consist of a single instruction (64%)

Slide 50

Slide 50 text

RQ2: What does the average fragment look like?
[Chart: cumulative percentage of unique fragments by number of instructions per unique fragment (x-axis: 1 to 12)]
Fragments rarely exceed 12 instructions (90%)

Slide 51

Slide 51 text

RQ2: What does the average fragment look like?
[Chart: cumulative percentage of unique fragments by number of instructions per unique fragment (x-axis: 1 to 12, continuing to 438, where the curve reaches 100%)]
We also found fragments with several hundred instructions

Slide 52

Slide 52 text

RQ2: What does the average fragment look like?
    Duplicate file example    # projects
    sqlite3.c                 10
    SDL_endian.h              4
    inffas86.c                4
Inline assembly fragments are often included by importing third-party code

Slide 53

Slide 53 text

sqlite3.c
    uint64 sqlite3Hwtime(void){
        unsigned long val;
        __asm__ __volatile__ ("rdtsc" : "=A" (val));
        return val;
    }

Slide 54

Slide 54 text

SDL_endian.h
    Uint16 SDL_Swap16(Uint16 x) {
        __asm__("xchgb %b0,%h0": "=Q"(x):"0"(x));
        return x;
    }

Slide 55

Slide 55 text

inffas86.c
    __asm__ __volatile__ (
    " leaq %0, %%rax\n"
    " movq %%rbp, 8(%%rax)\n" /* save regs rbp and rsp */
    " movq %%rsp, (%%rax)\n"
    " movq %%rax, %%rsp\n" /* make rsp point to &ar */
    " movq 16(%%rsp), %%rsi\n" /* rsi = in */
    " movq 32(%%rsp), %%rdi\n" /* rdi = out */
    " movq 24(%%rsp), %%r9\n" /* r9 = last */
    " movq 48(%%rsp), %%r10\n" /* r10 = end */
    " movq 64(%%rsp), %%rbp\n" /* rbp = lcode */
    " movq 72(%%rsp), %%r11\n" /* r11 = dcode */
    " movq 80(%%rsp), %%rdx\n" /* rdx = hold */
    " movl 88(%%rsp), %%ebx\n" /* ebx = bits */
    " movl 100(%%rsp), %%r12d\n" /* r12d = lmask */
    " movl 104(%%rsp), %%r13d\n" /* r13d = dmask */
    /* r14d = len */
    /* r15d = dist */
    " cld\n"
    " cmpq %%rdi, %%r10\n"
    " je .L_one_time\n" /* if only one decode left
    " cmpq %%rsi, %%r9\n"
    " je .L_one_time\n"
    " jmp .L_do_loop\n"
    ".L_one_time:\n"
    " movq %%r12, %%r8\n" /* r8 = lmask */
    " cmpb $32, %%bl\n"
    " ja .L_get_length_code_one_time\n"
    " lodsl\n" /* eax = *(uint *)in++ */
    " movb %%bl, %%cl\n" /* cl = bits, needs it for
    " addb $32, %%bl\n" /* bits += 32 */
    " shlq %%cl, %%rax\n"
    " orq %%rax, %%rdx\n" /* hold |= *((uint *)in)++
    " jmp .L_get_length_code_one_time\n"

Slide 56

Slide 56 text

RQ3: In which domains is inline assembly used? 38

Slide 57

Slide 57 text

RQ3: In which domains is it used?
    Domain                     # projects    % projects
    Crypto                     23            11.7%
    Networking                 20            10.2%
    Media                      17            8.6%
    Database                   16            8.1%
    Language implementation    15            7.6%
    Misc                       13            6.6%
    Concurrency                9             4.6%
    SSL                        8             4.1%
    Text processing            8             4.1%
    Math library               7             3.6%
    Web server                 7             3.6%
The domains of inline assembly are diverse

Slide 58

Slide 58 text

RQ4: What is inline assembly used for? 40

Slide 59

Slide 59 text

RQ4: What is it used for?
• Instruction order
• Performance optimization
• Functionality not available in C
• Supporting instructions

Slide 60

Slide 60 text

RQ4: What is it used for?
Instruction order: compiler barriers, atomics, memory barriers
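
The three instruction-order uses can be illustrated with short fragments. This is a sketch under assumptions: the function names are hypothetical, the x86-64 branches use common idioms (an empty fragment with a "memory" clobber, mfence, and lock xadd), and other targets fall back to GCC's __sync builtins or plain C.

```c
/* Compiler barrier: emits no instruction, but the "memory" clobber keeps
 * the compiler from reordering memory accesses across this point. */
static inline void compiler_barrier(void)
{
#if defined(__GNUC__)
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Memory barrier: additionally orders loads and stores in hardware. */
static inline void memory_barrier(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#elif defined(__GNUC__)
    __sync_synchronize();
#endif
}

/* Atomic fetch-and-add: returns the old value of *ptr. */
static inline int fetch_add_int(int *ptr, int value)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* xadd swaps the register with memory and stores the sum in memory;
     * the lock prefix makes the read-modify-write atomic. */
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(value), "+m"(*ptr)
                         :
                         : "memory", "cc");
    return value;
#elif defined(__GNUC__)
    return __sync_fetch_and_add(ptr, value);
#else
    int old = *ptr;            /* non-atomic fallback, for illustration only */
    *ptr = old + value;
    return old;
#endif
}
```

For an analysis tool, the compiler barrier is the interesting case: it contains no instruction at all, so its entire meaning lives in the clobber list.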

Slide 61

Slide 61 text

RQ4: What is it used for?
Performance optimization: SIMD, bitscans, endianness conversion
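
A bit scan is a compact example of the performance category. The sketch below is illustrative only (the function name is hypothetical): it uses the x86 bsr instruction where available and otherwise falls back to a GCC builtin or a portable loop.

```c
#include <stdint.h>

/* Returns the index (0..31) of the highest set bit of x.
 * The result is undefined for x == 0, as it is for bsr itself. */
static inline unsigned highest_set_bit(uint32_t x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint32_t index;
    __asm__("bsrl %1, %0" : "=r"(index) : "rm"(x) : "cc");
    return index;
#elif defined(__GNUC__)
    return 31u - (unsigned)__builtin_clz(x);
#else
    unsigned index = 0;
    while (x >>= 1)            /* shift until only the top bit is consumed */
        index++;
    return index;
#endif
}
```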

Slide 62

Slide 62 text

RQ4: What is it used for?
Functionality unavailable in C: CPU feature detection, data prefetching, clock cycles
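
CPU feature detection relies on cpuid, which has no C equivalent. As a hedged sketch (both function names are hypothetical, and on targets without cpuid the wrapper simply reports zeros), a feature test could be written as:

```c
#include <stdint.h>

/* Hypothetical wrapper, not from the talk. Fills out[0..3] with EAX, EBX,
 * ECX, EDX for the given leaf; yields zeros where cpuid is unavailable. */
static void cpuid_query(uint32_t leaf, uint32_t subleaf, uint32_t out[4])
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__("cpuid"
                         : "=a"(out[0]), "=b"(out[1]), "=c"(out[2]), "=d"(out[3])
                         : "a"(leaf), "c"(subleaf));
#else
    (void)leaf;
    (void)subleaf;
    out[0] = out[1] = out[2] = out[3] = 0;
#endif
}

/* Returns 1 if the CPU reports SSE2 support, otherwise 0. */
static int cpu_has_sse2(void)
{
    uint32_t regs[4];
    cpuid_query(1, 0, regs);
    return (int)((regs[3] >> 26) & 1u);   /* CPUID leaf 1, EDX bit 26 */
}
```

The fragment uses four output registers and two inputs, which makes cpuid a good test case for a tool's handling of fixed-register constraints.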

Slide 63

Slide 63 text

RQ4: What is it used for?
Supporting instructions: moves, control flow, arithmetic

Slide 64

Slide 64 text

RQ5: Do projects use the same subset of instructions? 46

Slide 65

Slide 65 text

RQ5: Do projects use the same subset?
• How many projects can be supported by implementing 5% of x86-64’s ~1000 instructions?
• At least 64% of projects (including the large-fragment ones)
[Chart: % of supported projects by number of implemented instructions (x-axis: 2 to 50); annotated value: 77.9%]

Slide 66

Slide 66 text

RQ5: Do projects use the same subset?
    Instructions      In % of projects
    rdtsc             27.4%
    cpuid             25.4%
    mov               24.9%
    [name missing]    21.8%
    lock xchg         14.2%
    …                 …

Slide 67

Slide 67 text

Declarative Inline Assembly 49

Slide 68

Slide 68 text

Specifying assembler names 50 AES_ECB_encrypt(...) asm("AES_ECB_encrypt");
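
An asm label on a declaration sets the symbol name that the compiler emits for that function, independently of its C name. A minimal sketch, assuming GCC or Clang (the ASM_NAME macro, the function, and the symbol name are hypothetical):

```c
/* Expands to an asm label on GCC-compatible compilers, to nothing elsewhere. */
#if defined(__GNUC__)
#define ASM_NAME(name) __asm__(name)
#else
#define ASM_NAME(name)
#endif

/* The label goes on a declaration, not on the definition itself. */
int add_numbers(int a, int b) ASM_NAME("add_numbers_v1");

int add_numbers(int a, int b)
{
    return a + b;   /* callable as add_numbers in C, emitted as add_numbers_v1 */
}
```

No instruction is involved here, which is why the talk groups this use under declarative inline assembly.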

Slide 69

Slide 69 text

Linker warnings 51 asm(".section .gnu.warning.gets; .ascii \"Please do not use gets!\"; .text");

Slide 70

Slide 70 text

Symbol versioning 52 asm(".symver memcpy,memcpy@GLIBC_2.2.5");

Slide 71

Slide 71 text

Register variables
    register unsigned char *pc asm("%rsi");

Slide 72

Slide 72 text

Non-mnemonic representations 54 __asm__ __volatile__(" rep; nop\n");

Slide 73

Slide 73 text

Non-mnemonic representations
    __asm__ __volatile__(" rep; nop\n");
rep; nop assembles to 0xF3 0x90, which is the encoding of pause

Slide 74

Slide 74 text

Non-mnemonic representations
    __asm__ __volatile__(" rep; nop\n");
rep; nop assembles to 0xF3 0x90, which is the encoding of pause
Programmers sometimes have to work around old assemblers
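
In practice this fragment usually sits inside a spin-wait loop. The sketch below shows that context; the names cpu_relax and spin_until_set are assumptions for illustration, and on non-x86 targets the hint compiles to nothing.

```c
/* Spin-wait hint. "rep; nop" has the same encoding as pause, so it also
 * assembles with assemblers that do not know the pause mnemonic. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Hypothetical bounded spin: polls *flag up to max_spins times and
 * returns 1 if it was observed to be nonzero, otherwise 0. */
static int spin_until_set(const volatile int *flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (*flag)
            return 1;
        cpu_relax();
    }
    return 0;
}
```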

Slide 75

Slide 75 text

Non-mnemonic Representations 55 asm volatile(".byte 0x66; clflush %0":"+m"(addr));

Slide 76

Slide 76 text

Non-mnemonic Representations
    asm volatile(".byte 0x66; clflush %0":"+m"(addr));
The 0x66 prefix in front of clflush (opcode 0x0F 0xAE) yields the encoding of clflushopt

Slide 77

Slide 77 text

Threats to Validity 56

Slide 78

Slide 78 text

Selected Threats to Validity
Manual classification of x86-64 inline assembly: error-prone, but double-checked
    #if defined(__GNUC__) && defined(__i386__) && \
        !(__GNUC__ == 2 && __GNUC_MINOR__ == 95 )
    Uint16 SDL_Swap16(Uint16 x) {
        __asm__("xchgb %b0,%h0": "=q"(x):"0"(x));
        return x;
    }
    #elif defined(__GNUC__) && defined(__x86_64__)
    Uint16 SDL_Swap16(Uint16 x) {
        __asm__("xchgb %b0,%h0": "=Q"(x):"0"(x));
        return x;
    }

Slide 79

Slide 79 text

Selected Threats to Validity
    #elif defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))
    Uint16 SDL_Swap16(Uint16 x) {
        int result;
        __asm__("rlwimi %0,%2,8,16,23": "=&r"(result):"0"(x >> 8), "r"(x));
        return (Uint16)result;
    }
    #elif defined(__GNUC__) && (defined(__M68000__) || defined(__M68020__)) && !defined(__mcoldfire__)
    Uint16 SDL_Swap16(Uint16 x) {
        __asm__("rorw #8,%0": "=d"(x): "0"(x):"cc");
        return x;
    }
Our results are not generalizable to other architectures

Slide 80

Slide 80 text

Selected Threats to Validity
    #else
    Uint16 SDL_Swap16(Uint16 x) {
        return SDL_static_cast(Uint16, ((x << 8) | (x >> 8)));
    }
    #endif
Inline assembly often has C and/or GCC builtin fallbacks
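
The fallback chain from the SDL_endian.h excerpt can be condensed into one self-contained function. This is a sketch rather than SDL's code: the name swap16 is hypothetical, and the middle branch uses GCC's __builtin_bswap16 to show the builtin kind of fallback that the slide mentions.

```c
#include <stdint.h>

/* Swaps the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* "Q" restricts x to a, b, c, or d, whose low (%b0) and high (%h0)
     * bytes are addressable; xchgb exchanges them. */
    __asm__("xchgb %b0,%h0" : "=Q"(x) : "0"(x));
    return x;
#elif defined(__GNUC__)
    return __builtin_bswap16(x);               /* GCC builtin fallback */
#else
    return (uint16_t)((x << 8) | (x >> 8));    /* plain C fallback */
#endif
}
```

A tool that cannot handle the first branch can still analyze the project by compiling with the macros undefined, which is one reason the presence of fallbacks matters for tool developers.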

Slide 81

Slide 81 text

Future Work
• Improved tool support
• Tools that analyze the correctness of inline assembly
• Compiler testing
• Programming language improvements
• Study other unstandardized non-C elements

Slide 82

Slide 82 text

Ongoing work: GCC builtins 61

Slide 83

Slide 83 text

GCC builtins: percentage of projects
[Bar chart: % of projects. Popular projects with inline assembly: 28%; (popular) projects with GCC builtins: 37%]
GCC builtins are used in almost every second (popular) project

Slide 84

Slide 84 text

GCC builtins: density (occurrence per KLOC)
[Bar chart: density (occurrence per KLOC). Popular projects with inline assembly: 50; (popular) projects with GCC builtins: 6]

Slide 85

Slide 85 text

GCC builtins
    Builtins                In % of projects
    __builtin_expect        48.2%
    __builtin_clz           29.3%
    __builtin_bswap32       26.2%
    __builtin_constant_p    23.3%
    __builtin_alloca        20.3%
    …                       …
Used for purposes similar to those of inline assembly, but also to interact with the compiler
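
Both roles of builtins can be seen in a few lines. In this sketch (the UNLIKELY macro and the function name are assumptions), __builtin_expect only passes a hint to the compiler, while __builtin_clz stands in for an instruction that would otherwise need inline assembly:

```c
#include <stdint.h>

/* Branch hint on GCC-compatible compilers, a no-op elsewhere. */
#if defined(__GNUC__)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define UNLIKELY(cond) (cond)
#endif

/* Counts leading zero bits of x; defined as 32 for x == 0. */
static unsigned leading_zeros32(uint32_t x)
{
    if (UNLIKELY(x == 0))
        return 32;             /* __builtin_clz(0) is undefined */
#if defined(__GNUC__)
    return (unsigned)__builtin_clz(x);
#else
    unsigned count = 0;
    while (!(x & 0x80000000u)) {   /* portable fallback loop */
        x <<= 1;
        count++;
    }
    return count;
#endif
}
```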

Slide 86

Slide 86 text

Summary
• 28% of popular C GitHub projects contain inline assembly
• Few fragments per project; typically a single instruction
• It is used in diverse domains
• There are four different usage categories
• Projects rely on a common subset
@RiggerManuel @smarr @stephenrkell @davleopo