Rigger¹, Stefan Marr², Stephen Kell³, David Leopoldseder¹, Hanspeter Mössenböck¹
VEE, 25 March 2018
¹ Johannes Kepler University Linz, Austria
² University of Kent, UK
³ University of Cambridge, UK
Partly funded by
printf
C projects consist of more than C code
• Inline assembly
• Compiler builtins
• Compiler pragmas
• Preprocessor macros: #define getmax(a,b) ((a)>(b)?(a):(b))
• Attributes: void fatal() __attribute__ ((noreturn));
<< 32)|tickl; }
Inline Assembly in C Projects
What about C tools that could not use an assembler to defer the work?
clock_cycles():
    rdtsc
    shl rdx, 32
    mov eax, eax
    ret
function rdtsc)
test.c:5:3: Unrecognized identifier: asm
  Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
*** Cannot continue.
Many analysis tools ignore inline assembly
{
    unsigned int tickl, tickh;
    asm("rdtsc":"=a"(tickl),"=d"(tickh));
    return ((uint64_t)tickh << 32)|tickl;
}
But tools could approximate it by analyzing its side effects
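The side effects of the fragment above are declared in its operand list: in GCC's extended asm syntax, a constraint starting with `=` marks a write-only output and one starting with `+` marks a read-write operand. A tool that cannot model `rdtsc` itself could, as a rough approximation, treat every such operand as being assigned an unknown value. A minimal sketch of that idea follows; `asm_operand`, `is_written`, and `count_written` are hypothetical names for illustration and are not taken from any existing tool.

```c
#include <stddef.h>

/* Hypothetical model of one operand of an extended asm statement. */
struct asm_operand {
    const char *constraint; /* e.g. "=a", "+r", "r" */
    const char *variable;   /* C expression bound to the operand */
};

/* An operand is written by the fragment if its constraint starts
   with '=' (write-only) or '+' (read-write). */
static int is_written(const struct asm_operand *op) {
    return op->constraint[0] == '=' || op->constraint[0] == '+';
}

/* Count the operands an analysis would have to treat as
   "assigned an unknown value" after the fragment executes. */
static size_t count_written(const struct asm_operand *ops, size_t n) {
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_written(&ops[i])) {
            written++;
        }
    }
    return written;
}
```

For the `rdtsc` fragment, both `"=a"(tickl)` and `"=d"(tickh)` are outputs, so such a tool would mark `tickl` and `tickh` as written and could continue analyzing the surrounding C code. Clobber lists and memory effects are not covered by this sketch.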
obtain a diverse set
• 327 popular projects
  • >850 GitHub stars
• 937 keyword-search projects
  • E.g., bitcoin, web server, parser
• Grep for “asm” and extraction of the fragments
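The "grep for asm" step can be illustrated with a small keyword scan. This is only a sketch under stated assumptions, not the extraction procedure used in the study: `count_keyword` and `is_ident_char` are hypothetical helpers, and the scan does not skip comments or string literals.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters that may appear inside a C identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Count occurrences of `keyword` in `src` that are not part of a
   longer identifier (so "asm" does not match inside "plasma"). */
static size_t count_keyword(const char *src, const char *keyword) {
    size_t count = 0;
    size_t len = strlen(keyword);
    if (len == 0) {
        return 0; /* an empty keyword never matches */
    }
    for (const char *p = strstr(src, keyword); p != NULL;
         p = strstr(p + len, keyword)) {
        int starts_word = (p == src) || !is_ident_char(p[-1]);
        int ends_word = !is_ident_char(p[len]);
        if (starts_word && ends_word) {
            count++;
        }
    }
    return count;
}
```

Because `asm` also appears as `__asm__` and `__asm` in GCC-style code, a scan like this would have to be run once per spelling, for example `count_keyword(src, "asm")` and `count_keyword(src, "__asm__")`.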
• RQ2: What does the average inline assembly fragment look like?
• RQ3: In which domains is inline assembly used?
• RQ4: What is inline assembly used for?
• RQ5: Do projects use the same subset of instructions?
large number of inline assembly fragments
• Several SIMD instruction set extensions
Analysis
197 projects with assembly
163 analyzed projects with assembly
A number of projects only use a single inline assembly fragment: 36%
[Figure: cumulative percentage over the number of unique fragments per project]
Fragments typically consist of a single instruction: 64%
[Figure: cumulative percentage over the number of instructions per unique fragment]
We also found fragments with several hundred instructions
[Figure: cumulative percentage over the number of instructions per unique fragment; annotations: 100%, 438 …]
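Measuring "instructions per fragment" requires some way to count the instructions in an asm template string. The sketch below assumes that instructions are separated by `;` or a newline, which is the usual convention for x86 GCC-style inline assembly; `count_instructions` is a hypothetical helper, not the counting method used in the study.

```c
#include <ctype.h>
#include <stddef.h>

/* Count instructions in an asm template string, assuming that
   instructions are separated by ';' or '\n' and that segments
   containing only whitespace are not instructions. */
static size_t count_instructions(const char *tmpl) {
    size_t count = 0;
    int in_instruction = 0;
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (*p == ';' || *p == '\n') {
            in_instruction = 0; /* a separator ends the segment */
        } else if (!isspace((unsigned char)*p) && !in_instruction) {
            in_instruction = 1; /* first non-blank character */
            count++;
        }
    }
    return count;
}
```

With this definition the template `"rdtsc"` counts as one instruction. Labels, assembler directives, and prefixes written as separate statements (such as `lock; cmpxchg`) would be overcounted, so a real analysis would need additional rules.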
The domains of inline assembly are diverse

Domain                     projects   % projects
Crypto                     23         11.7%
Networking                 20         10.2%
Media                      17         8.6%
Database                   16         8.1%
Language implementation    15         7.6%
Misc                       13         6.6%
Concurrency                9          4.6%
SSL                        8          4.1%
Text processing            8          4.1%
Math library               7          3.6%
Web server                 7          3.6%
many projects can be supported by implementing 5% of x86-64’s ~1000 instructions?
• At least 64% of projects (including the large-fragment ones)
[Figure: % of supported projects over the number of implemented instructions; annotation: 77.9%]
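The "supported projects" metric can be read as follows: a project is supported if every instruction it uses is in the implemented set. The sketch below computes that percentage under this assumption; `struct project`, `is_implemented`, `is_supported`, and `supported_percentage` are hypothetical names, and any data passed in is up to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: the instruction mnemonics one project uses. */
struct project {
    const char *const *instructions;
    size_t count;
};

/* Linear search for a mnemonic in the implemented set. */
static int is_implemented(const char *insn,
                          const char *const *impl, size_t n_impl) {
    for (size_t i = 0; i < n_impl; i++) {
        if (strcmp(insn, impl[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* A project is supported only if all of its instructions are implemented. */
static int is_supported(const struct project *p,
                        const char *const *impl, size_t n_impl) {
    for (size_t i = 0; i < p->count; i++) {
        if (!is_implemented(p->instructions[i], impl, n_impl)) {
            return 0;
        }
    }
    return 1;
}

/* Percentage of projects that are fully covered by the implemented set. */
static double supported_percentage(const struct project *projects, size_t n,
                                   const char *const *impl, size_t n_impl) {
    size_t supported = 0;
    if (n == 0) {
        return 0.0;
    }
    for (size_t i = 0; i < n; i++) {
        supported += (size_t)is_supported(&projects[i], impl, n_impl);
    }
    return 100.0 * (double)supported / (double)n;
}
```

Evaluating `supported_percentage` for growing implemented sets, ordered by how many projects use each instruction, would yield a curve of the kind described on the slide.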
GCC builtins are used in almost every second (popular) project
[Figure: % of projects; bars: "Popular projects with inline assembly" and "(Popular) projects with GCC builtins"]
__builtin_clz          29.3%
__builtin_bswap32      26.2%
__builtin_constant_p   23.3%
__builtin_alloca       20.3%
…                      …
Builtins are used for purposes similar to inline assembly, but also to interact with the compiler
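To make the two kinds of use concrete, the sketch below wraps three of the builtins listed above. `__builtin_clz` and `__builtin_bswap32` stand in for machine-level operations, while `__builtin_constant_p` asks the compiler a question about the code itself. The wrapper names (`leading_zeros32`, `swap_bytes32`, `IS_COMPILE_TIME_CONSTANT`) are hypothetical, and the code assumes GCC or Clang with a 32-bit `unsigned int`.

```c
#include <stdint.h>

/* Number of leading zero bits. __builtin_clz is undefined for 0,
   so that case is handled explicitly. Assumes 32-bit unsigned int. */
static int leading_zeros32(uint32_t x) {
    return x == 0 ? 32 : __builtin_clz(x);
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap_bytes32(uint32_t x) {
    return __builtin_bswap32(x);
}

/* Ask the compiler whether an expression is known to be a
   compile-time constant; evaluates to 1 or 0. */
#define IS_COMPILE_TIME_CONSTANT(x) __builtin_constant_p(x)
```

A tool that does not recognize these builtins runs into the same problem as with inline assembly: it sees calls to functions that are never declared in the source code.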
• Few fragments per project; typically a single instruction
• It is used in diverse domains
• There are four different usage categories
• Projects rely on a common subset
@RiggerManuel @smarr @stephenrkell @davleopo