Slide 1

Slide 1 text

An Analysis of Inline Assembly in C Projects
Manuel Rigger, Stefan Marr, Hanspeter Mössenböck
VMM, September 29, 2017
Johannes Kepler University Linz

Slide 2

Slide 2 text

Inline Assembly in C Projects 2

    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc" : "=a"(tickl), "=d"(tickh));
        return ((uint64_t)tickh << 32) | tickl;
    }

Slide 3

Slide 3 text

Inline Assembly in C Projects 3

    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc" : "=a"(tickl), "=d"(tickh));
        return ((uint64_t)tickh << 32) | tickl;
    }

Instructions: "rdtsc"

Slide 4

Slide 4 text

Inline Assembly in C Projects 4

    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc" : "=a"(tickl), "=d"(tickh));
        return ((uint64_t)tickh << 32) | tickl;
    }

Output operands: (tickl), (tickh)

Slide 5

Slide 5 text

Inline Assembly in C Projects 5

    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc" : "=a"(tickl), "=d"(tickh));
        return ((uint64_t)tickh << 32) | tickl;
    }

Output operand constraints: "=a", "=d"

Slide 6

Slide 6 text

Inline Assembly in C Projects 6

    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc" : "=a"(tickl), "=d"(tickh));
        return ((uint64_t)tickh << 32) | tickl;
    }

Input operands, side effects, …
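The clock_cycles example only uses output operands. As a hedged sketch of the remaining sections of a GCC-style extended asm statement (input operands and the clobber list that declares side effects), the helper below adds two integers with the x86 addl instruction. The name add_u32 and the plain C fallback for other compilers and architectures are assumptions made for illustration; they do not come from the talk.

```c
#include <stdint.h>

/* Hypothetical helper: adds two 32-bit values.
 * Statement layout: asm(instructions : outputs : inputs : clobbers). */
static uint32_t add_u32(uint32_t a, uint32_t b) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    uint32_t sum;
    __asm__("addl %2, %0"      /* instructions: sum += b (AT&T operand order) */
            : "=r"(sum)        /* output operand: any general-purpose register */
            : "0"(a), "r"(b)   /* input operands: a shares the register of operand 0 */
            : "cc");           /* side effect: the flags register is modified */
    return sum;
#else
    return a + b;              /* plain C fallback where this asm syntax is unavailable */
#endif
}
```

A call such as add_u32(2, 3) yields 5 on either path; unsigned overflow wraps around in both the instruction and the C fallback.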

Slide 7

Slide 7 text

Inline Assembly in C Projects 7

    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc" : "=a"(tickl), "=d"(tickh));
        return ((uint64_t)tickh << 32) | tickl;
    }

    clock_cycles():
        rdtsc
        shl rdx, 32
        mov eax, eax
        or rax, rdx
        ret

Slide 8

Slide 8 text

Inline Assembly in C Projects 7

    uint64_t clock_cycles() {
        unsigned int tickl, tickh;
        asm("rdtsc" : "=a"(tickl), "=d"(tickh));
        return ((uint64_t)tickh << 32) | tickl;
    }

    clock_cycles():
        rdtsc
        shl rdx, 32
        mov eax, eax
        or rax, rdx
        ret

What about C tools that cannot use an assembler to defer the work?

Slide 9

Slide 9 text

Sulong 8

Slide 10

Slide 10 text

Sulong 8 Ad-hoc approach to adding unsupported instructions

Slide 11

Slide 11 text

Sulong 9

    public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
        public long executeRdtsc() {
            return System.currentTimeMillis();
        }
    }

Slide 12

Slide 12 text

Sulong 9

    public abstract static class LLVMAMD64RdtscReadNode extends LLVMExpressionNode {
        public long executeRdtsc() {
            return System.currentTimeMillis();
        }
    }

Emulate the behavior of assembly

Slide 13

Slide 13 text

Splint 10

    Splint 3.1.2 --- 03 May 2009

    test.c: (in function rdtsc)
    test.c:5:3: Unrecognized identifier: asm
      Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
    test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
    *** Cannot continue.

Slide 14

Slide 14 text

Splint 10

    Splint 3.1.2 --- 03 May 2009

    test.c: (in function rdtsc)
    test.c:5:3: Unrecognized identifier: asm
      Identifier used in code has not been declared. (Use -unrecog to inhibit warning)
    test.c:5:15: Parse Error. (For help on parse errors, see splint -help parseerrors.)
    *** Cannot continue.

Many analysis tools ignore inline assembly

Slide 15

Slide 15 text

c2go 11

    c2go transpile test.c
    panic: unknown node type: 'GCCAsmStmt 0x3a991f8 '

    goroutine 1 [running]:
    github_com_elliotchance_c2go_ast.Parse
            go/src/github.com/elliotchance/c2go/ast/ast.go:211
    main.convertLinesToNodes
            go/src/github.com/elliotchance/c2go/main.go:81
    main.Start
            go/src/github.com/elliotchance/c2go/main.go:219
    main.runCommand
            go/src/github.com/elliotchance/c2go/main.go:350
    main.main
            go/src/github.com/elliotchance/c2go/main.go:277

    goroutine 6 [finalizer wait]:

Slide 16

Slide 16 text

c2go 11

    c2go transpile test.c
    panic: unknown node type: 'GCCAsmStmt 0x3a991f8 '

    goroutine 1 [running]:
    github_com_elliotchance_c2go_ast.Parse
            go/src/github.com/elliotchance/c2go/ast/ast.go:211
    main.convertLinesToNodes
            go/src/github.com/elliotchance/c2go/main.go:81
    main.Start
            go/src/github.com/elliotchance/c2go/main.go:219
    main.runCommand
            go/src/github.com/elliotchance/c2go/main.go:350
    main.main
            go/src/github.com/elliotchance/c2go/main.go:277

    goroutine 6 [finalizer wait]:

Many source-to-source translators ignore inline assembly

Slide 17

Slide 17 text

Current assumptions 12
“Inline assembly is rare in most programs” (Johnson 2004)
“[…] programmers could provide C implementations of inline assembly blocks” (Johnson 2004)

Slide 18

Slide 18 text

Current assumptions 13 “…” [Other papers]

Slide 19

Slide 19 text

Open questions 14
• How frequent is inline assembly?
• How “complex” is the usage of inline assembly?
• Which domains use inline assembly?
• How diverse is the usage of inline assembly?

Slide 20

Slide 20 text

Open questions 14
• How frequent is inline assembly?
• How “complex” is the usage of inline assembly?
• Which domains use inline assembly?
• How diverse is the usage of inline assembly?
Survey of inline assembly in C projects

Slide 21

Slide 21 text

Approach 15
• grep-based analysis of GitHub projects (excluding OS-level projects)
• Most popular* 327 C projects
• 937 C projects by keywords
• Quantitative analysis → database for AMD64 instructions
• Qualitative analysis
*C projects >= 850 GitHub stars
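To make the idea of a grep-based analysis concrete, the following is a minimal sketch of how a scanner might count inline assembly keywords in C source text. The talk does not state the actual search patterns, so the keyword list (asm, __asm, __asm__) and the function name count_asm_keywords are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if c can be part of a C identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Counts whole-word occurrences of asm, __asm, or __asm__ in src.
 * Like a plain grep, it does not skip comments or string literals. */
static size_t count_asm_keywords(const char *src) {
    static const char *const keywords[] = { "__asm__", "__asm", "asm" };
    size_t count = 0;
    const char *p = src;
    while (*p != '\0') {
        if (!is_ident_char(*p)) {
            p++;
            continue;
        }
        /* p is at the start of an identifier-like token; find its end. */
        const char *end = p;
        while (is_ident_char(*end)) {
            end++;
        }
        size_t len = (size_t)(end - p);
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
            if (strlen(keywords[i]) == len && strncmp(p, keywords[i], len) == 0) {
                count++;
                break;
            }
        }
        p = end;
    }
    return count;
}
```

Matching whole tokens avoids false positives such as an identifier named plasma. A textual search of this kind still sees code on both sides of a preprocessor conditional.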

Slide 22

Slide 22 text

How frequent is inline assembly? 16
• ∼28% of the most popular C projects
• ∼11% of the other C projects

Slide 23

Slide 23 text

How frequent is inline assembly? 16
• ∼28% of the most popular C projects
• ∼11% of the other C projects
Sulong and other tools that process C cannot ignore inline assembly!

Slide 24

Slide 24 text

How “complex” is its usage? 17
36 projects (18%) with inline assembly used complicated macro-metaprogramming → ignored

Slide 25

Slide 25 text

How “complex” is its usage? 18 The majority of inline assembly fragments contain one or two instructions

Slide 26

Slide 26 text

How “complex” is its usage? 18 The majority of inline assembly fragments contain one or two instructions 1 instruction

Slide 27

Slide 27 text

How “complex” is its usage? 18 The majority of inline assembly fragments contain one or two instructions 2 instructions

Slide 28

Slide 28 text

How “complex” is its usage? 19 The majority of projects with inline assembly contain only a few fragments

Slide 29

Slide 29 text

How “complex” is its usage? 19 The majority of projects with inline assembly contain only a few fragments 10 fragments

Slide 30

Slide 30 text

How “complex” is its usage? 20 rep nop

Slide 31

Slide 31 text

How “complex” is its usage? 20 rep nop 0xF3 0x90 pause

Slide 32

Slide 32 text

How “complex” is its usage? 20 rep nop 0xF3 0x90 pause Programmers sometimes have to work around old assemblers
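A hedged sketch of this workaround: the two-instruction spelling "rep; nop" assembles to the bytes 0xF3 0x90, which is the encoding of pause, so it is accepted by assemblers that do not know the pause mnemonic. The names cpu_relax and spin_until_set are hypothetical, and on targets other than x86 with GCC or Clang the hint compiles to nothing.

```c
/* Hypothetical spin-wait hint. "rep; nop" is encoded as 0xF3 0x90, the same
 * bytes as pause, so older assemblers accept it. */
static inline void cpu_relax(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Spins until *flag becomes nonzero or max_spins is reached.
 * Returns the number of spins performed. */
static unsigned spin_until_set(const volatile int *flag, unsigned max_spins) {
    unsigned spins = 0;
    while (!*flag && spins < max_spins) {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

The "memory" clobber additionally makes the statement a compiler barrier, so the flag is reloaded on every iteration even if it were not declared volatile.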

Slide 33

Slide 33 text

How “complex” is its usage? 21 .byte 0x66; clflush %0

Slide 34

Slide 34 text

How “complex” is its usage? 21 .byte 0x66; clflush %0 0x0FAE clflushopt

Slide 35

Slide 35 text

Which domains use inline assembly? 22

Domain                    # projects   % projects
Crypto                    32           11.7%
Networking                20           10.2%
Media                     17           8.6%
Database                  16           8.1%
Language implementation   15           7.6%
Misc                      14           6.6%
Concurrency               9            4.6%
SSL                       8            4.1%
Text processing           8            4.1%
Math library              7            3.6%
Web server                7            3.6%

The domains of inline assembly are diverse

Slide 36

Slide 36 text

How diverse is the usage of inline assembly? 23
• ∼190 unique inline assembly fragments
• ∼170 unique instructions
• Implement 46 instructions → support 76% of projects

Slide 37

Slide 37 text

How diverse is the usage of inline assembly? 23
• ∼190 unique inline assembly fragments
• ∼170 unique instructions
• Implement 46 instructions → support 76% of projects
Only a small subset of the ∼1000 AMD64 instructions (Heule 2016) is needed

Slide 38

Slide 38 text

How diverse is the usage of inline assembly? 24

Instructions   Contained in % projects (with inline assembly)
rdtsc          26.9%
cpuid          24.9%
mov            23.9%
               21.3%
lock xchg      14.2%
…              …

Slide 39

Slide 39 text

How diverse is the usage of inline assembly? 25
• Instruction ordering and SMP programming: compiler barriers, memory barriers, atomics, …
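For readers unfamiliar with these three idioms, here is a minimal sketch of what they do, written with C11 <stdatomic.h> instead of inline assembly. This is a substitution for illustration, not what the surveyed projects use; all function and type names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compiler barrier: forbids the compiler from reordering memory accesses
 * across this point. In inline assembly this is commonly written as
 * asm volatile("" ::: "memory"). */
static inline void compiler_barrier(void) {
    atomic_signal_fence(memory_order_seq_cst);
}

/* Memory barrier: additionally orders memory accesses as observed by
 * other cores. */
static inline void memory_barrier(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical test-and-set lock built on an atomic exchange, the typical
 * use of an instruction such as lock xchg. */
typedef struct {
    atomic_int locked;
} demo_spinlock;

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_trylock(demo_spinlock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void demo_unlock(demo_spinlock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

With a lock initialized as demo_spinlock lock = { 0 }, the first demo_trylock succeeds and a second attempt fails until demo_unlock is called.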

Slide 40

Slide 40 text

How diverse is the usage of inline assembly? 26
• Instruction ordering and SMP programming
• Performance optimizations: SIMD, endianness conversions, bitscans, …
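Endianness conversions and bitscans have straightforward plain C equivalents, which is one way a tool that cannot run an assembler could model them. The sketch below shows the behavior of a 32-bit byte swap (as performed by bswap) and of forward and reverse bit scans (as performed by bsf and bsr). The function names are hypothetical, and returning -1 for a zero input is a choice of this sketch; the hardware instructions leave the destination undefined in that case.

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit value, like the x86 bswap instruction. */
static uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Index of the least significant set bit, like bsf. Returns -1 for x == 0. */
static int bit_scan_forward(uint32_t x) {
    if (x == 0) {
        return -1;
    }
    int index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        index++;
    }
    return index;
}

/* Index of the most significant set bit, like bsr. Returns -1 for x == 0. */
static int bit_scan_reverse(uint32_t x) {
    if (x == 0) {
        return -1;
    }
    int index = 0;
    while ((x >>= 1) != 0) {
        index++;
    }
    return index;
}
```

The loops are far slower than the single instructions, which is why projects reach for inline assembly or compiler builtins here.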

Slide 41

Slide 41 text

How diverse is the usage of inline assembly? 27
• Instruction ordering and SMP programming
• Performance optimizations
• Functionality unavailable in C: elapsed clock cycles, CPU features, data prefetching, …
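Data prefetching illustrates why such functionality needs something beyond standard C. The sketch below uses the GCC/Clang builtin __builtin_prefetch as a stand-in for a prefetch instruction in inline assembly; the function name sum_with_prefetch and the prefetch distance of 16 elements are arbitrary assumptions, and on other compilers the hint is simply omitted.

```c
#include <stddef.h>

/* Sums an array while hinting upcoming elements into the cache.
 * The prefetch is only a hint: the result is the same with or without it. */
static long sum_with_prefetch(const int *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + 16 < n) {
            /* Arguments: address, 0 = read access, 1 = low temporal locality. */
            __builtin_prefetch(&data[i + 16], 0, 1);
        }
#endif
        total += data[i];
    }
    return total;
}
```

Because the builtin has no observable effect on program state, a tool that analyzes or interprets C could treat it as a no-op, which is not generally true of other inline assembly.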

Slide 42

Slide 42 text

How diverse is the usage of inline assembly? 28
• Instruction ordering and SMP programming
• Performance optimizations
• Functionality unavailable in C
• “Supporting” instructions (moves etc.)

Slide 43

Slide 43 text

Selected Threat to Validity 29

    #ifdef SOME_CONDITION
    // use inline assembly implementations
    #else
    // use other implementation
    #endif

Instructions are not necessarily included in the binary
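A concrete instance of this pattern, sketched under assumptions: the macro USE_INLINE_ASM stands in for SOME_CONDITION, and the fallback returns wall-clock nanoseconds rather than clock cycles, so the two branches are not equivalent measurements. A textual search finds the rdtsc fragment in this file even when the build selects the fallback.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical switch standing in for SOME_CONDITION. */
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define USE_INLINE_ASM 1
#endif

static uint64_t clock_cycles(void) {
#ifdef USE_INLINE_ASM
    /* Inline assembly implementation: read the time stamp counter. */
    unsigned int tickl, tickh;
    __asm__ __volatile__("rdtsc" : "=a"(tickl), "=d"(tickh));
    return ((uint64_t)tickh << 32) | tickl;
#else
    /* Other implementation: wall-clock nanoseconds, not clock cycles. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Reports which implementation the preprocessor selected: 1 for the
 * inline assembly branch, 0 for the C fallback. */
static int uses_inline_asm(void) {
#ifdef USE_INLINE_ASM
    return 1;
#else
    return 0;
#endif
}
```

Only one of the two branches ends up in the compiled binary, which is the threat to validity named on the slide.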

Slide 44

Slide 44 text

Conclusions 30
• We cannot ignore inline assembly
• Majority of projects use few and short inline assembly fragments
• <200 different instructions
• But:
  • Projects that provide different SIMD implementations
  • Non-mnemonic instructions

Slide 45

Slide 45 text

Thanks for listening! 31 https://github.com/graalvm/sulong/ @RiggerManuel

Slide 46

Slide 46 text

Bibliography 32
• Stefan Heule, Eric Schkufza, Rahul Sharma, and Alex Aiken. 2016. Stratified synthesis: automatically learning the x86-64 instruction set. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16).
• Rob Johnson and David Wagner. 2004. Finding user/kernel pointer bugs with type inference. In Proceedings of the 13th conference on USENIX Security Symposium - Volume 13 (SSYM'04), Vol. 13. USENIX Association, Berkeley, CA, USA, 9-9.

Slide 47

Slide 47 text

Images 33
• Page 13: Faras Saint Anne, Public Domain, https://en.wikipedia.org/wiki/File:Faras_Saint_Anne_(detail).jpg
• Page 15: Cessna 172 Airplane: Nick Dean, Frze, Creative Commons Attribution-Share Alike 3.0 Unported license, https://commons.wikimedia.org/wiki/File:WRM_Airplane_-_Flugzeug_Cessna_172_m.jpg
• Page 15: Under construction, Public Domain, https://commons.wikimedia.org/wiki/Category:Under_construction_icons#/media/File:UnderCon_icon_black.svg
• Page 18: Stein der fünften Sonne, sog. Aztekenkalender: Anagoria, GNU Free Documentation License, https://commons.wikimedia.org/wiki/File:1479_Stein_der_f%C3%BCnften_Sonne,_sog._Aztekenkalender,_Ollin_Tonatiuh_anagoria.JPG