Slide 1

Slide 1 text

INTRODUCTION TO LLVM IR & OPTIMIZATION TECHNIQUES

Slide 2

Slide 2 text

LLVM AND ITS ARCHITECTURE: WHAT IS LLVM

• LLVM is a library for constructing, optimising and emitting intermediate and/or binary machine code.
• LLVM can be used as a compiler framework, where you provide the "front end" (lexer and parser) and the "back end" (code that converts LLVM's representation to actual machine code).
• LLVM can also act as a JIT compiler: it supports x86/x86_64 and PPC/PPC64 assembly generation, with fast code optimisations tuned for compilation speed.

Slide 3

Slide 3 text

LLVM AND ITS ARCHITECTURE: LLVM’S ARCHITECTURE

[Diagram: source languages (C, Fortran, Haskell) enter through the front ends (Clang C/C++/ObjC front end, llvm-gcc front end, GCC front end), pass through the common LLVM optimizer, and leave through the LLVM x86, PowerPC and ARM back ends.]

Slide 4

Slide 4 text

LLVM OPTIMIZER: IR AND PASSES

Slide 5

Slide 5 text

LLVM IR: WHAT IS LLVM IR

• LLVM IR is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing ‘all’ high-level languages cleanly.
• The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human-readable assembly language representation.
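
To make the SSA property concrete, here is a hand-written C sketch (not compiler output). The function names `max_plain` and `max_ssa` are hypothetical, and the ternary expression only stands in for what the IR would express with a phi node.

```c
/* Ordinary C: the variable r is assigned on two different paths. */
int max_plain(int a, int b) {
    int r;
    if (a > b) {
        r = a;
    } else {
        r = b;
    }
    return r;
}

/* SSA-style rewrite: every name is assigned exactly once.
 * r1 and r2 are the two "versions" of r, and r3 merges them. */
int max_ssa(int a, int b) {
    const int cond = a > b;
    const int r1 = a;               /* value of r on the "then" path */
    const int r2 = b;               /* value of r on the "else" path */
    const int r3 = cond ? r1 : r2;  /* plays the role of a phi node */
    return r3;
}
```

Both functions compute the same result; the second only makes each definition unique, which is the property SSA-based analyses rely on.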

Slide 6

Slide 6 text

LLVM IR: AN EXAMPLE “HELLO WORLD” IR

// hello.c
#include <stdio.h>

void print() {
    printf("Hello World!\n");
}

int main(int argc, char** argv) {
    print();
}

$ clang -S -emit-llvm -o hello.ll -c hello.c

Slide 7

Slide 7 text

LLVM IR: AN EXAMPLE “HELLO WORLD” IR

@.str = private unnamed_addr constant [14 x i8] c"Hello World!\0A\00", align 1

; Function Attrs: nounwind uwtable
define void @print() #0 {
entry:
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i32 0, i32 0))
  ret void
}

declare i32 @printf(i8*, ...) #1

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
entry:
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 8
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 8
  call void @print()
  ret i32 0
}

Slide 8

Slide 8 text

LLVM PASSES: WHAT ARE LLVM PASSES

• An LLVM pass is an operation on a unit of LLVM IR that can compute something about the IR (analysis) or mutate the IR (transformation).
• Passes are used to analyse the LLVM IR and to perform code transformations or optimisations on it.
• All LLVM passes are subclasses of the Pass class and implement their functionality by overriding virtual methods inherited from Pass. Depending on how your pass works, you should inherit from the ModulePass, CallGraphSCCPass, FunctionPass, LoopPass, RegionPass, or BasicBlockPass class, which gives the system more information about what your pass does and how it can be combined with other passes.

Slide 9

Slide 9 text

LLVM PASSES: WHAT ARE LLVM PASSES

[Diagram: the front end, the LLVM IR, the analysis passes, the transform passes, the back end, and disk.]

Slide 10

Slide 10 text

APPLYING PASSES TO THE LLVM IR: AN EXAMPLE “HELLO WORLD” IR

@str = private unnamed_addr constant [13 x i8] c"Hello World!\00"

; Function Attrs: nounwind uwtable
define void @print() #0 {
entry:
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @str, i64 0, i64 0))
  ret void
}

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
entry:
  %puts.i = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @str, i64 0, i64 0)) #1
  ret i32 0
}

$ clang -S -emit-llvm -O3 -o hello.ll -c hello.c

Slide 11

Slide 11 text

LLVM OPTIMISATION FLAGS: INTRODUCTION

• LLVM enables a default set of passes when producing machine code. The -ON optimisation flag given to the compiler is forwarded to the back-end IR generator, and each -ON level defines a set of passes.
• Command to dump the passes in an optimisation group:

$ llvm-as < /dev/null | opt -ON -disable-output -debug-pass=Arguments

Replace -ON with -O1, -O2 or -O3.

Slide 12

Slide 12 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: ALIAS ANALYSIS

Alias analysis is a technique in compiler theory used to determine whether a storage location may be accessed in more than one way. Two pointers are said to be aliased if they point to the same location.

Passes: Type-Based Alias Analysis, Globals Alias Analysis
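
A small C sketch of why the question matters. All names here (`store_then_load`, `demo_distinct`, `demo_aliased`) are hypothetical. The compiler may replace the final load with the constant 1 only if it can prove that the two pointers never alias.

```c
/* Stores 1 through a, 2 through b, then reads *a back. */
int store_then_load(int *a, int *b) {
    *a = 1;
    *b = 2;
    return *a;   /* 1 if a and b are distinct, 2 if they alias */
}

/* The two pointers refer to different objects. */
int demo_distinct(void) {
    int x = 0, y = 0;
    return store_then_load(&x, &y);
}

/* The two pointers refer to the same object. */
int demo_aliased(void) {
    int x = 0;
    return store_then_load(&x, &x);
}
```

The two demo calls return different values, which is exactly the case an alias analysis has to rule out before the load can be folded.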

Slide 13

Slide 13 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: SPARSE CONDITIONAL CONSTANT PROPAGATION

These passes provide Sparse Conditional Constant Propagation, an optimisation technique based on constant folding and constant propagation. Values are assumed constant and basic blocks dead until proven otherwise, so a branch whose condition is a known constant is treated as unconditional and the untaken side is ignored.
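
A hand-written C sketch of the kind of code this technique can collapse. The names `pick_before` and `pick_after` are hypothetical, and the second function is an illustration of a possible result, not actual compiler output.

```c
/* Before: x is a known constant, so the condition is known too. */
int pick_before(void) {
    int x = 4;
    int y;
    if (x > 3) {        /* always true once x = 4 is propagated */
        y = x * 2;      /* folds to 8 */
    } else {
        y = x - 1;      /* never executed */
    }
    return y + 1;
}

/* After: the branch is resolved and the arithmetic is folded. */
int pick_after(void) {
    return 9;
}
```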

Slide 14

Slide 14 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: GLOBAL VARIABLE OPTIMISER

This pass transforms simple global variables that never have their address taken. Where it is obviously safe, it marks read/write globals as constant, deletes variables that are only stored to, and so on.
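
A hand-written C sketch of the two cases named above. The names (`scale`, `last_input`, `scaled_before`, `scaled_after`) are hypothetical, and the "after" function only illustrates what the transformation permits.

```c
static int scale = 3;     /* never written after initialisation, address never taken */
static int last_input;    /* only ever stored to, never read */

/* Before: reads one global and writes another. */
int scaled_before(int x) {
    last_input = x;       /* store that nobody observes */
    return x * scale;
}

/* After: scale is treated as the constant 3 and the
 * store-only global has disappeared. */
int scaled_after(int x) {
    return x * 3;
}
```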

Slide 15

Slide 15 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: DEAD ARGUMENT ELIMINATION

This pass deletes dead arguments from internal functions. Dead argument elimination removes arguments that are directly dead, as well as arguments that are only passed into function calls as dead arguments of other functions. This pass also deletes dead return values in a similar way.
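
A hand-written C sketch of a directly dead argument on an internal (`static`) function. All names are hypothetical, and the "after" pair only illustrates the intended effect.

```c
/* Before: unused_flags is never read inside the function. */
static int area_before(int width, int height, int unused_flags) {
    return width * height;
}

int room_area_before(void) {
    return area_before(4, 5, 0xFF);
}

/* After: the dead parameter is gone, and so is the work of
 * passing a value for it at the call site. */
static int area_after(int width, int height) {
    return width * height;
}

int room_area_after(void) {
    return area_after(4, 5);
}
```

The function has to be internal for this to be safe: an externally visible function may have callers the compiler cannot see.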

Slide 16

Slide 16 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: COMBINE REDUNDANT INSTRUCTIONS

Combine instructions to form fewer, simpler instructions. This pass does not modify the CFG. This pass is where algebraic simplification happens.

Before:
    y = x + 1;
    z = y + 1;

After:
    z = x + 2;

Slide 17

Slide 17 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: SIMPLIFY THE CFG

Performs dead code elimination and basic block merging. Specifically:
1. Removes basic blocks with no predecessors.
2. Merges a basic block into its predecessor if there is only one and the predecessor only has one successor.
3. Eliminates PHI nodes for basic blocks with a single predecessor.
4. Eliminates a basic block that only contains an unconditional branch.
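
The CFG lives in the IR rather than in the source, so the following C sketch only approximates the steps above using `goto`. The names `sign_before` and `sign_after` are hypothetical, and the second function is a hand-written illustration of a simplified control flow.

```c
/* Before: contains a statement that no path reaches and
 * jumps whose only purpose is to reach the shared exit. */
int sign_before(int x) {
    int r;
    if (x < 0) {
        r = -1;
        goto done;
    }
    r = (x > 0);
    goto done;
    r = 99;          /* unreachable: a block with no predecessors */
done:
    return r;
}

/* After: the unreachable block is removed and the trivial
 * jump-only paths are folded into their neighbours. */
int sign_after(int x) {
    if (x < 0) {
        return -1;
    }
    return x > 0;
}
```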

Slide 18

Slide 18 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: DEDUCE FUNCTION ATTRIBUTES

A simple interprocedural pass which walks the call graph, looking for functions which do not access or only read non-local memory, and marking them readnone/readonly. In addition, it marks function arguments (of pointer type) “nocapture” if a call to the function does not create any copies of the pointer value that outlive the call. This more or less means that the pointer is only dereferenced, and not returned from the function or stored in a global. This pass is implemented as a bottom-up traversal of the call graph.
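
A C sketch of functions that fall into each category. All names are hypothetical, and the comments describe which attribute each function would be a candidate for; what a given compiler version actually infers is not shown here.

```c
/* Touches no memory outside its own argument: a readnone candidate. */
int square_of(int x) {
    return x * x;
}

/* Only reads through p and keeps no copy of it:
 * a readonly candidate, with p a nocapture candidate. */
int sum3(const int *p) {
    return p[0] + p[1] + p[2];
}

/* Stores the pointer in a global, so the pointer outlives
 * the call: p is captured here. */
static const int *remembered;

void remember(const int *p) {
    remembered = p;
}

/* Small driver that exercises all three functions. */
int attrs_demo(void) {
    static const int values[3] = {1, 2, 3};
    remember(values);
    return square_of(2) + sum3(values) + (remembered == values);
}
```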

Slide 19

Slide 19 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: LOOP OPTIMISATION

Loop Invariant Code Motion
This pass performs loop invariant code motion, attempting to remove as much code from the body of a loop as possible. It does this by either hoisting code into the preheader block, or by sinking code to the exit blocks if it is safe. This pass also promotes must-aliased memory locations in the loop to live in registers, thus hoisting and sinking “invariant” loads and stores.
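
A hand-written C sketch of hoisting. The names `weighted_sum_before` and `weighted_sum_after` are hypothetical, and the second function illustrates the effect rather than showing real compiler output.

```c
/* Before: a * b does not depend on i but is recomputed
 * on every iteration. */
int weighted_sum_before(int n, int a, int b) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i * (a * b);
    }
    return total;
}

/* After: the invariant product is computed once, in what
 * corresponds to the loop preheader. */
int weighted_sum_after(int n, int a, int b) {
    const int scale = a * b;   /* hoisted out of the loop */
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i * scale;
    }
    return total;
}
```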

Slide 20

Slide 20 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: LOOP OPTIMISATION

Canonicalize Induction Variables
This transformation analyses and transforms the induction variables (and computations derived from them) into simpler forms suitable for subsequent analysis and transformation.

Before:
    for (i = 7; i*i < 1000; ++i)

After:
    for (i = 0; i != 25; ++i)
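
A quick C check that the two loop headers run the same number of times. The function names are hypothetical; the loop bounds are the ones from the slide.

```c
/* Original form: starts at 7 and tests i*i against 1000. */
int trips_original(void) {
    int count = 0;
    for (int i = 7; i * i < 1000; ++i) {
        ++count;
    }
    return count;
}

/* Canonical form: starts at 0 and compares against the trip count. */
int trips_canonical(void) {
    int count = 0;
    for (int i = 0; i != 25; ++i) {
        ++count;
    }
    return count;
}
```

The original loop runs for i = 7 through i = 31 (31 * 31 = 961, 32 * 32 = 1024), which is 25 iterations, matching the canonical loop.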

Slide 21

Slide 21 text

OPTIMISATION TECHNIQUES FOR O1 OPTIMISATION LEVEL: DEAD CODE ELIMINATION

Dead Store Elimination
A trivial dead store elimination that only considers basic-block local redundant stores.

Strip Unused Function Prototypes
This pass loops over all of the functions in the input module, looking for dead declarations, and removes them.
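
A hand-written C sketch of a dead store inside a single basic block. The names (`result_slot`, `set_before`, `set_after`) are hypothetical, and the "after" function only illustrates the intended effect.

```c
static int result_slot;

/* Before: the first store is overwritten before anything reads it. */
int set_before(int v) {
    result_slot = 0;        /* dead store */
    result_slot = v * 2;    /* the store that matters */
    return result_slot;
}

/* After: only the final store remains. */
int set_after(int v) {
    result_slot = v * 2;
    return result_slot;
}
```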

Slide 23

Slide 23 text

OPTIMISATION TECHNIQUES FOR O2 OPTIMISATION LEVEL

O2 IS BASED ON O1 AND ADDS
• Function Integration/Inlining
• Global Value Numbering
• Global Dead Code Elimination
• Merge Duplicate Global Constants

Slide 24

Slide 24 text

OPTIMISATION TECHNIQUES FOR O2 OPTIMISATION LEVEL

FUNCTION INTEGRATION/INLINING
Bottom-up inlining of called functions into their callers.

GLOBAL VALUE NUMBERING
This pass performs global value numbering to eliminate fully and partially redundant instructions. It also performs redundant load elimination.
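
A hand-written C sketch of both techniques. All names are hypothetical, and the "after" functions illustrate the effect rather than showing real compiler output.

```c
/* Inlining: a small internal helper and its caller. */
static int sq(int x) {
    return x * x;
}

int hyp2_before(int a, int b) {
    return sq(a) + sq(b);
}

/* After inlining, the body of sq replaces each call. */
int hyp2_inlined(int a, int b) {
    return a * a + b * b;
}

/* Value numbering: a * b is computed twice with the same operands. */
int gvn_before(int a, int b) {
    int x = a * b + 1;
    int y = a * b + 2;
    return x + y;
}

/* After: the redundant multiplication is computed once and reused. */
int gvn_after(int a, int b) {
    const int t = a * b;
    return (t + 1) + (t + 2);
}
```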

Slide 25

Slide 25 text

OPTIMISATION TECHNIQUES FOR O2 OPTIMISATION LEVEL: GLOBAL DEAD CODE ELIMINATION

This transform is designed to eliminate unreachable internal globals from the program. It uses an aggressive algorithm, searching out globals that are known to be alive. After it finds all of the globals which are needed, it deletes whatever is left over. This allows it to delete recursive chunks of the program which are unreachable.
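
A C sketch of a "recursive chunk" that nothing live refers to. The names `ping`, `pong` and `entry_point` are hypothetical. The two internal functions use each other, so a per-function "has any use" check would keep both, whereas starting from the live roots finds neither.

```c
/* Two internal functions that only call each other. */
static int ping(int n);
static int pong(int n);

static int ping(int n) {
    return n <= 0 ? 0 : pong(n - 1);
}

static int pong(int n) {
    return n <= 0 ? 1 : ping(n - 1);
}

/* The only externally visible function; it reaches neither
 * ping nor pong, so both are candidates for deletion. */
int entry_point(int x) {
    return x + 1;
}
```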

Slide 26

Slide 26 text

OPTIMISATION TECHNIQUES FOR O2 OPTIMISATION LEVEL: MERGE DUPLICATE GLOBAL CONSTANTS

Merges duplicate global constants together into a single constant that is shared. This is useful because some passes (e.g., TraceValues) insert a lot of string constants into the program, regardless of whether or not an existing string is available.

Slide 27

Slide 27 text

OPTIMISATION TECHNIQUES FOR O3 OPTIMISATION LEVEL

O3 IS BASED ON O2 AND ADDS
• Promote ‘by reference’ arguments to scalars

Slide 28

Slide 28 text

OPTIMISATION TECHNIQUES FOR O3 OPTIMISATION LEVEL: PROMOTE ‘BY REFERENCE’ ARGUMENTS TO SCALARS

This pass promotes “by reference” arguments to be “by value” arguments. In practice, this means looking for internal functions that have pointer arguments. If it can prove, through the use of alias analysis, that an argument is only loaded, then it can pass the value into the function instead of the address of the value.
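
A hand-written C sketch of the promotion. All names are hypothetical, and the "after" pair illustrates the effect rather than showing real compiler output.

```c
/* Before: the internal function takes a pointer but only loads from it. */
static int add_one_before(const int *p) {
    return *p + 1;
}

int caller_before(int v) {
    return add_one_before(&v);
}

/* After: the value itself is passed, so the callee no longer
 * needs a load and v no longer needs an address. */
static int add_one_after(int p) {
    return p + 1;
}

int caller_after(int v) {
    return add_one_after(v);
}
```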