How is the a.out produced? What does it contain? What is a .o file? What is a .so file?
This presentation should give you a starting point to think about all this, with some digressions and live coding where relevant.
First we'll shoot through the process. Then we'll rewind and go through it step by step.
A sneak peek: try gcc -v program.c.
Noufal Ibrahim Compilation: Blow by Blow March 11, 2017 3 / 58
- #define: textual macros
- #include: include other files
- #warning, #error: messages for the user (#error also stops compilation)
- #line: change the line number and file name the compiler reports
- __LINE__ and __FILE__: expand to the current line number and file name
- Pragmas (#pragma): compiler-specific directives, e.g. to request particular kinds of instructions
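Here is a small file that exercises these directives. It is a sketch: the function names are hypothetical, and #warning and #error sit inside a comment so the file still compiles.

```c
#include <string.h>

/* #define creates a purely textual macro. The extra parentheses keep
   SQUARE(1 + 2) from expanding to 1 + 2 * 1 + 2. */
#define SQUARE(x) ((x) * (x))

/* #warning would print a diagnostic and #error would stop compilation.
   Both are commented out so this sketch builds cleanly:
   #warning "just a demo"
   #error "refusing to build"                                       */

int square(int x) { return SQUARE(x); }

/* #line changes what __LINE__ and __FILE__ report from the next line on.
   Code generators such as yacc/bison use it to point diagnostics back at
   the original grammar file. */
#line 100 "renamed.c"
int reported_line(void) { return __LINE__; }   /* reports 100 */
int file_is_renamed(void) { return strcmp(__FILE__, "renamed.c") == 0; }
```

Running gcc -E on this file shows the output after preprocessing, with SQUARE already expanded and __LINE__ replaced by a number.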
    .file "sample.c"
    .text
    .globl main
    .type main, @function
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $7, %eax
    popq %rbp
    ret
    .size main, .-main
    .ident "GCC: (Debian 4.9.2-10) 4.9.2"
    .section .note.GNU-stack,"",@progbits
… the . statements? What is all the %rbp/%rsp trickery?
A little note on g++.
We will revisit this whole thing.
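The pushq %rbp / movq %rsp, %rbp prologue gives each call its own stack frame. This sketch makes that visible. It assumes GCC or Clang on x86-64, which both provide the builtin __builtin_frame_address(0). GCC keeps a real frame pointer in any function that uses this builtin. The function names are hypothetical.

```c
#include <stdint.h>

/* __builtin_frame_address(0) returns the current function's frame
   address, i.e. the value %rbp holds after the prologue. */
__attribute__((noinline))
uintptr_t inner_frame(void) {
    return (uintptr_t)__builtin_frame_address(0);
}

/* A callee's frame sits below its caller's, because the x86-64 stack
   grows towards lower addresses. Returns 1 if that holds here. */
__attribute__((noinline))
int callee_frame_is_below_caller(void) {
    uintptr_t outer = (uintptr_t)__builtin_frame_address(0);
    uintptr_t inner = inner_frame();
    return inner < outer;
}
```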
… libraries
- Core dumps
- Replaces a.out, COFF, etc.
- Programs intended to execute directly on a processor.
… use it for.
- The ELF header (at a fixed position, the start of the file) holds a roadmap to the sections.
- Sections hold things like the symbol table, instructions, data, etc.
[Figure: ELF file layout: Section 1 … Section n, then the section header table]
The section header table tells you where the various sections are: name, size, etc.
- 4 bytes: magic number (0x7f 'E' 'L' 'F')
- 1 byte: 32- or 64-bit
- 1 byte: endianness
- 1 byte: ELF version
- 1 byte: OS ABI
Rest at https://en.wikipedia.org/wiki/Executable_and_Linkable_Forma
You can actually twiddle this: the 8th byte is the OS ABI. 0x03 is Linux, 0x07 is AIX.
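Here is a sketch of decoding those identification bytes. It assumes glibc's <elf.h> for the index and value names (EI_CLASS, EI_OSABI and so on). The names struct ident_summary, read_ident and set_osabi are hypothetical.

```c
#include <elf.h>
#include <string.h>

/* What the first eight bytes of e_ident say about a file. */
struct ident_summary {
    int bits;          /* 32 or 64 (from EI_CLASS), 0 if unknown */
    int little_endian; /* 1 for ELFDATA2LSB (EI_DATA)            */
    int version;       /* EI_VERSION, normally EV_CURRENT (1)     */
    int osabi;         /* EI_OSABI: index 7, i.e. the 8th byte    */
};

/* Returns 0 if the bytes do not start with the 0x7f 'E' 'L' 'F' magic. */
int read_ident(const unsigned char ident[EI_NIDENT], struct ident_summary *out) {
    if (memcmp(ident, ELFMAG, SELFMAG) != 0)
        return 0;
    out->bits = ident[EI_CLASS] == ELFCLASS64 ? 64
              : ident[EI_CLASS] == ELFCLASS32 ? 32 : 0;
    out->little_endian = ident[EI_DATA] == ELFDATA2LSB;
    out->version = ident[EI_VERSION];
    out->osabi = ident[EI_OSABI];
    return 1;
}

/* The "twiddle": overwrite the OS ABI byte in place. */
void set_osabi(unsigned char ident[EI_NIDENT], unsigned char abi) {
    ident[EI_OSABI] = abi;
}
```

In <elf.h>, ELFOSABI_LINUX (3) and ELFOSABI_AIX (7) are the names for the two values mentioned above. To inspect a real file, read its first EI_NIDENT (16) bytes into a buffer and pass that buffer in.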
… extra instructions.
gcc -S sample1.c
    .file "sample1.c"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $9, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Debian 4.9.2-10) 4.9.2"
    .section .note.GNU-stack,"",@progbits
Lines starting with a . are assembler directives. They tell the assembler how to assemble the file.
CFI stands for "Call Frame Information". These directives tell the assembler to add extra information used for unwinding the call stack.
We can remove them using -fno-asynchronous-unwind-tables.
We'll come back to this when we discuss linking.
    .file "sample1.c"
    .text
    .globl main
    .type main, @function
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $9, %eax
    popq %rbp
    ret
    .size main, .-main
    .ident "GCC: (Debian 4.9.2-10) 4.9.2"
    .section .note.GNU-stack,"",@progbits
… a new file.
- .text assembles what follows into the text section (executable code).
- .globl makes the symbol visible to ld (the linker), not just internal to this file.
- .type marks the given symbol as the given type (in this case a function).
- main: is the label where the main function starts.
- The initial pushq and movq save the caller's frame pointer (%rbp) and set up a new stack frame.
- movl puts 9 into %eax, which holds the return value.
- popq restores the caller's frame pointer and ret returns.
- .size sets the size of the main symbol: .-main is the current address minus the address of main.
- .ident records the compiler's identification string (it ends up in the .comment section).
- .section switches to the named section. Here the empty .note.GNU-stack section marks the stack as not executable. @progbits is the section type (a section that holds data).
… main
nm prints the symbols in object files: value, type, name.
The T tells you that the symbol is in the text section. main is the name.
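nm's listing comes from the symbol table section (.symtab). Here is a rough sketch of what nm does for one symbol: it walks the section header table to the symbol table and its string table, then classifies the symbol. The sketch makes several assumptions. It handles a 64-bit ELF file only. It uses glibc's <elf.h>. It relies on Linux's /proc/self/exe to locate the running program's own file. The helper names read_file and elf64_symbol_kind are hypothetical. Real nm handles many more cases, such as data symbols, weak symbols and 32-bit files.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer (caller frees). NULL on error. */
unsigned char *read_file(const char *path, size_t *len) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    unsigned char *buf = NULL;
    long n;
    if (fseek(f, 0, SEEK_END) == 0 && (n = ftell(f)) > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)n);
        if (buf && fread(buf, 1, (size_t)n, f) == (size_t)n) {
            *len = (size_t)n;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}

/* Classify `name` roughly the way nm would: 'T' for a global function,
   't' for a local (static) one, 'U' for an undefined symbol, and 0 if the
   symbol, or the whole .symtab (e.g. after strip), is missing.
   Weak symbols (nm's W) are simply reported as 'T' here. */
char elf64_symbol_kind(const unsigned char *buf, size_t len, const char *name) {
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shoff + (size_t)eh->e_shnum * sizeof(Elf64_Shdr) > len)
        return 0;

    /* The ELF header says where the section header table lives. */
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
            continue;
        const Elf64_Shdr *str = &sh[sh[i].sh_link]; /* the symbols' string table */
        if (sh[i].sh_offset + sh[i].sh_size > len || str->sh_offset + str->sh_size > len)
            continue;
        const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
        const char *names = (const char *)buf + str->sh_offset;
        size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < count; j++) {
            if (syms[j].st_name >= str->sh_size || strcmp(names + syms[j].st_name, name) != 0)
                continue;
            if (syms[j].st_shndx == SHN_UNDEF)
                return 'U';
            if (ELF64_ST_TYPE(syms[j].st_info) != STT_FUNC)
                continue;
            return ELF64_ST_BIND(syms[j].st_info) == STB_LOCAL ? 't' : 'T';
        }
    }
    return 0;
}
```

On a stripped file the loop finds no SHT_SYMTAB section and returns 0. That mirrors nm reporting no symbols.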
… file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Notice the HAS_SYMS: the file is not stripped.
Also notice the start address (of main).
If you built it with -g, you can ask for the debugging info too.
objdump -x will give you all the header information.
… the file header. -S gives you the sections in the file.
Notice the symbol table: that's where nm gets its information.
We can strip it. After that, nm will not work (it reports no symbols).
    .file "sample2.c"
    .text
    .globl add
    .type add, @function
add:
    pushq %rbp
    movq %rsp, %rbp
    movl %edi, -4(%rbp)
    movl %esi, -8(%rbp)
    movl -4(%rbp), %edx
    movl -8(%rbp), %eax
    addl %edx, %eax
    popq %rbp
    ret
    .size add, .-add
    .globl main
    .type main, @function
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $5, %esi
    movl $4, %edi
    call add
    popq %rbp
    ret
    .size main, .-main
    .ident "GCC: (Debian 4.9.2-10) 4.9.2"
    .section .note.GNU-stack,"",@progbits
    pushq %rbp
    movq %rsp, %rbp
    movl $5, %esi
    movl $4, %edi
    call add
    popq %rbp
    ret
Loads the two arguments into %edi (first argument, 4) and %esi (second argument, 5), as the x86-64 calling convention requires.
Then calls add.
The rest just creates and tears down the activation record (stack frame) for main itself.
… the stack using movl.
Then they're loaded into %edx and %eax.
They're added and the result is stored in %eax.
The stack is restored. The result is already in %eax, where a function's return value goes, so main can return it without any extra instruction.
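The C source of sample2.c isn't shown. Judging from the assembly above, it was presumably something like the following. This is a reconstruction, not the author's file. call_add is a hypothetical stand-in for main, so the sketch can be linked into another program. The comments map each line to the unoptimized (-O0) instructions.

```c
/* First two integer arguments arrive in %edi and %esi. At -O0 they are
   spilled to -4(%rbp) and -8(%rbp), reloaded into %edx and %eax, and
   added with addl. The sum is left in %eax as the return value. */
int add(int a, int b) {
    return a + b;
}

/* Stands in for main: movl $4, %edi; movl $5, %esi; call add.
   add leaves 9 in %eax, so returning it needs no extra instruction. */
int call_add(void) {
    return add(4, 5);
}
```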
0000000000000000 T add
0000000000000014 T main
The symbols change if you use g++: C++ name mangling turns add(int, int) into _Z3addii (main is left alone).
Notice that all the functions are T. Let's change that.
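One way to change that is to mark a function static. It then gets a local symbol, which nm typically shows with a lowercase t. This sketch assumes compilation with gcc -c at -O0. At higher optimization levels a static helper may be inlined and disappear from nm's output entirely. The function names are hypothetical.

```c
/* File-local: `static` keeps this out of the global symbol table, so
   nm typically lists it as "t add_local" rather than "T". */
static int add_local(int a, int b) {
    return a + b;
}

/* Externally visible: nm lists it with T, and other object files may call it. */
int add_exported(int a, int b) {
    return add_local(a, b);
}
```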
… ) { return printf("Hello, world\n"); }
nm hello.o
0000000000000000 T main
                 U printf
Notice the U with printf: this is an undefined symbol. It should get resolved when we link.
There are no compilation errors, but linking fails if nothing provides printf.
We'll touch on this again when we discuss linking and shared libs.
cat hello1.c
#include <stdio.h>
int main() {
    printf("Hello, world\n");
    return 0;
}
nm hello1.o
0000000000000000 T main
                 U puts
Notice how gcc replaces the printf with puts now. Who knew? It can do this here because the format string has no conversions, it ends in a newline, and the return value is ignored (unlike in hello.c).
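The two cases side by side, as a sketch assuming GCC. Compiling this with gcc -c and running nm normally shows both U printf and U puts. The function names are hypothetical.

```c
#include <stdio.h>

/* Result ignored, no conversions, trailing '\n': GCC normally turns
   this into puts("Hello, world"). */
void greet_discarding_result(void) {
    printf("Hello, world\n");
}

/* Result used: printf must return the number of characters written,
   while the C standard only promises a nonnegative value from puts,
   so GCC has to keep the printf call (as in hello.c). */
int greet_using_result(void) {
    return printf("Hello, world\n");
}
```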
… run it. To do that, you need to link it: linking takes multiple object files and combines them into one.
Let's link our file.
…: cannot find entry symbol _start; defaulting to 00000000004000b0
zsh: segmentation fault  ./sample
… with gcc sample.c -o sample-good
Look at the files:
file sample
sample: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
file sample-good
sample-good: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, ...
    .section .data
    .section .text
    .globl _start
_start:
    mov $1, %eax
    mov $9, %ebx
    int $0x80
    .section .data
    .section .text
    .globl _start
_start:
    mov $60, %rax
    mov $0, %rdi
    syscall
The .o file we've generated above has just a _start symbol.
When we link, ld looks for a _start symbol. (Without it, ld produces a warning.)
That's the piece of code that will run when you run the program.
It simply sets a few registers and makes a system call (int $0x80 on 32-bit, syscall on x86-64) to exit the program. Can't get much smaller.
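The same system-call mechanism is reachable from C. This sketch assumes Linux and glibc, which provide syscall() and the SYS_* numbers from <sys/syscall.h>. It compares a raw system call with the libc wrapper and exposes the exit number that the assembly loads into %rax. It uses getpid because calling syscall(SYS_exit, 0) would end the process. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Generic entry point: the first argument is the system call number,
   the same number the assembly puts in %rax (64-bit) or %eax (32-bit). */
long pid_via_syscall_number(void) {
    return syscall(SYS_getpid);
}

/* The ordinary libc wrapper for the same kernel service. */
long pid_via_libc(void) {
    return (long)getpid();
}

/* SYS_exit is 60 on x86-64 Linux (mov $60, %rax) and 1 on 32-bit x86
   (mov $1, %eax). */
long exit_syscall_number(void) {
    return SYS_exit;
}
```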
… we don't have a _start. Heck! I have main. I'll just use that.
ld -e main sample.o -o sample; ./sample
zsh: segmentation fault  ./sample
Nothing called main, so there is no return address on the stack, and its ret jumps to garbage.
… binary anymore. C has a runtime.
crt*.o, e.g. /usr/lib/x86_64-linux-gnu/crt1.o, has our _start symbol.
Stuff happens here before the handover to main.
Let's link this too.
ld /usr/lib/x86_64-linux-gnu/crt1.o sample.o -o sample
/usr/lib/x86_64-linux-gnu/crt1.o: In function ‘_start’:
/build/glibc-qK83Be/glibc-2.19/csu/../sysdeps/x86_64/start.S:115: undefined reference to ‘__libc_csu_fini’
/build/glibc-qK83Be/glibc-2.19/csu/../sysdeps/x86_64/start.S:116: undefined reference to ‘__libc_csu_init’
/build/glibc-qK83Be/glibc-2.19/csu/../sysdeps/x86_64/start.S:122: undefined reference to ‘__libc_start_main’
… -o sample -lc
/usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS): In function ‘__libc_csu_init’:
(.text+0x2f): undefined reference to ‘_init’
… linker invocation and try again:
ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o sample.o -o sample -lc
Links, but can't run.
file sample
sample: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld64.so.1, ...
That interpreter, ld's default, usually does not exist on x86-64 Linux.
Although we've made it a dynamic executable:
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o sample.o -o sample -lc
Still segfaults.
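The "interpreter" that file reports comes from the program's PT_INTERP program header. This sketch shows how a running program can look up its own interpreter. It assumes Linux with glibc, where dl_iterate_phdr from <link.h> is a GNU extension (hence _GNU_SOURCE). The names find_interp and program_interpreter are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Callback for dl_iterate_phdr. The first object reported is the main
   program, so we inspect its program headers and then stop. */
static int find_interp(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    const char **out = data;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_INTERP) {
            /* The path string is mapped at load address + p_vaddr. */
            *out = (const char *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
            break;
        }
    }
    return 1; /* non-zero stops the iteration after the main program */
}

/* Returns the interpreter path, or NULL for a statically linked program. */
const char *program_interpreter(void) {
    const char *interp = NULL;
    dl_iterate_phdr(find_interp, &interp);
    return interp;
}
```

For a default gcc build on x86-64 Linux this typically returns /lib64/ld-linux-x86-64.so.2, the same path file printed for sample-good.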
… not there yet. For that, we need a few more crt* object files.
This becomes doubly obvious when we use a printf.
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    /usr/lib/x86_64-linux-gnu/crt1.o \
    /usr/lib/x86_64-linux-gnu/crti.o \
    /usr/lib/x86_64-linux-gnu/crtn.o \
    -o sample sample.o -lc
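One visible piece of the work the C runtime does before main is running constructor functions. This sketch assumes GCC or Clang: the constructor attribute is a compiler extension, not standard C. The names are hypothetical.

```c
/* Flag flipped by a constructor. GCC normally records a pointer to
   before_main in .init_array, and glibc's start-up path (entered from
   _start in crt1.o through __libc_start_main) runs those entries before
   it calls main. */
static int ran_before_main = 0;

__attribute__((constructor))
static void before_main(void) {
    ran_before_main = 1;
}

/* Reports whether the start-up code ran before_main. */
int constructor_ran(void) {
    return ran_before_main;
}
```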