LLVM IR & Optimisation Techniques

Ciel
October 28, 2015

Transcript

  1. WHAT IS LLVM / LLVM AND ITS ARCHITECTURE

    • LLVM is a library used to construct, optimise and produce intermediate and/or binary machine code.
    • LLVM can be used as a compiler framework, where you provide the "front end" (parser and lexer) and the "back end" (code that converts LLVM's representation to actual machine code).
    • LLVM can also act as a JIT compiler: it supports x86/x86_64 and PPC/PPC64 assembly generation, with fast code optimisations aimed at compilation speed.
  2. LLVM’S ARCHITECTURE / LLVM AND ITS ARCHITECTURE

    [Diagram: C, Fortran and Haskell sources enter through the Clang C/C++/ObjC, llvm-gcc and GHC frontends, pass through the LLVM optimizer, and are emitted by the LLVM x86, PowerPC and ARM backends.]
  3. WHAT IS LLVM IR / LLVM IR

    • LLVM IR is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing ‘all’ high-level languages cleanly.
    • The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human-readable assembly language representation.
  4. AN EXAMPLE “HELLO WORLD” IR / LLVM IR

    // hello.c
    #include <stdio.h>

    void print() {
        printf("Hello World!\n");
    }

    int main(int argc, char** argv) {
        print();
    }

    $ clang -S -emit-llvm -o hello.ll -c hello.c
  5. AN EXAMPLE “HELLO WORLD” IR / LLVM IR

    @.str = private unnamed_addr constant [14 x i8] c"Hello World!\0A\00", align 1

    ; Function Attrs: nounwind uwtable
    define void @print() #0 {
    entry:
      %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i32 0, i32 0))
      ret void
    }

    declare i32 @printf(i8*, ...) #1

    ; Function Attrs: nounwind uwtable
    define i32 @main(i32 %argc, i8** %argv) #0 {
    entry:
      %argc.addr = alloca i32, align 4
      %argv.addr = alloca i8**, align 8
      store i32 %argc, i32* %argc.addr, align 4
      store i8** %argv, i8*** %argv.addr, align 8
      call void @print()
      ret i32 0
    }
  6. WHAT ARE LLVM PASSES / LLVM PASSES

    • An LLVM pass is an operation on a unit of LLVM IR that can compute something about the IR or mutate the IR.
    • Passes are used to analyse the LLVM IR and to perform code transformations or optimisations on it.
    • All LLVM passes are subclasses of the Pass class and implement functionality by overriding virtual methods inherited from Pass. Depending on how your pass works, you should inherit from the ModulePass, CallGraphSCCPass, FunctionPass, LoopPass, RegionPass, or BasicBlockPass class, which gives the system more information about what your pass does and how it can be combined with other passes.
  7. WHAT ARE LLVM PASSES / LLVM PASSES

    [Diagram with the components: frontend, LLVM IR, transform passes, analysis passes, disk, backend.]
  8. AN EXAMPLE “HELLO WORLD” IR / APPLYING PASSES TO THE LLVM IR

    @str = private unnamed_addr constant [13 x i8] c"Hello World!\00"

    ; Function Attrs: nounwind uwtable
    define void @print() #0 {
    entry:
      %puts = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @str, i64 0, i64 0))
      ret void
    }

    ; Function Attrs: nounwind uwtable
    define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
    entry:
      %puts.i = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @str, i64 0, i64 0)) #1
      ret i32 0
    }

    $ clang -S -emit-llvm -O3 -o hello.ll -c hello.c
  9. INTRODUCTION / LLVM OPTIMISATION FLAGS

    • LLVM enables a default set of passes when producing machine code. The optimisation flag -ON passed to the compiler is forwarded to the back-end IR generator, and each -ON level defines a set of passes.
    • To dump the group of optimisation passes for a level:

        $ llvm-as < /dev/null | opt -ON -disable-output -debug-pass=Arguments

      Replace -ON with -O1, -O2 or -O3.
  10. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / ALIAS ANALYSIS

    Alias analysis is a technique in compiler theory used to determine whether a storage location may be accessed in more than one way. Two pointers are said to be aliased if they point to the same location.

    TYPE BASED ALIAS ANALYSIS, GLOBALS ALIAS ANALYSIS
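The effect of aliasing on optimisation can be illustrated with a minimal C sketch. The function names are hypothetical and not taken from the slides; the comments describe what an alias analysis would need to know, not what any particular compiler is guaranteed to do.

```c
/* Hypothetical example: the result depends on whether a and b alias. */
int store_then_load(int *a, int *b) {
    *a = 1;
    *b = 2;      /* if b aliases a, this overwrites the 1 stored above */
    return *a;   /* can only be folded to 1 if a and b cannot alias */
}

/* Type-based alias analysis relies on the language rule that an int
   object is not accessed through a float lvalue, so a compiler may
   treat *i and *f as distinct locations here. */
int store_int_then_float(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}
```

Calling store_then_load with two distinct variables yields 1, while passing the same address twice yields 2, which is why the load cannot be removed without alias information.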
  11. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / SPARSE CONDITIONAL CONSTANT PROPAGATION

    These passes provide Sparse Conditional Constant Propagation (SCCP), an optimisation technique based on constant folding and constant propagation. Values proven to be constant are replaced by constants, and conditional branches whose condition is proven constant become unconditional.
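As a rough illustration in C (hypothetical function names, and a hand-written "after" version rather than real compiler output), constant propagation combined with branch information can reduce a whole function to one constant:

```c
/* Before: as a programmer might write it. */
int before_sccp(void) {
    int x = 4;
    int y;
    if (x > 3) {        /* always true, because x is known to be 4 */
        y = x * 2;
    } else {
        y = 100;        /* unreachable once the condition is known */
    }
    return y + 1;
}

/* After: what propagating and folding the constants can reduce it to. */
int after_sccp(void) {
    return 9;
}
```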
  12. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / GLOBAL VARIABLE OPTIMISER

    This pass transforms simple global variables that never have their address taken. Where it can prove it is safe, it marks read/write globals as constant, deletes variables that are only stored to, and so on.
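A small C sketch of the kind of globals this applies to. The variable and function names are made up for illustration, and the comments describe opportunities rather than guaranteed results.

```c
/* 'limit' is internal to this file, its address is never taken, and it
   is never written after initialisation, so it can be treated as a
   constant. */
static int limit = 10;

/* 'written_only' is stored to but never read, so the variable and the
   stores to it can be deleted. */
static int written_only;

int clamp(int v) {
    written_only = v;              /* store with no reader */
    return v > limit ? limit : v;  /* can become v > 10 ? 10 : v */
}
```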
  13. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / DEAD ARGUMENT ELIMINATION

    This pass deletes dead arguments from internal functions. Dead argument elimination removes arguments which are directly dead, as well as arguments only passed into function calls as dead arguments of other functions. This pass also deletes dead return values in a similar way.
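A before/after sketch in C, using hypothetical function names. The "after" pair is written by hand to show the intended effect on an internal (static) function and its call site.

```c
/* Before: 'unused' is never read by scale_before, so it is directly dead. */
static int scale_before(int value, int unused) {
    return value * 3;
}

int caller_before(int v) {
    return scale_before(v, v + 42);   /* v + 42 is computed for nothing */
}

/* After: the dead argument is removed from the internal function and
   from its call site. */
static int scale_after(int value) {
    return value * 3;
}

int caller_after(int v) {
    return scale_after(v);
}
```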
  14. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / COMBINE REDUNDANT INSTRUCTIONS

    Combines instructions to form fewer, simpler instructions. This pass does not modify the CFG. This pass is where algebraic simplification happens. For example:

        y = x + 1;
        z = y + 1;

    becomes

        z = x + 2;
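The same example as two C functions (names are hypothetical), so the equivalence of the two forms can be checked directly:

```c
/* Before: two dependent additions. */
int add_twice(int x) {
    int y = x + 1;
    int z = y + 1;
    return z;
}

/* After: a single addition with the constants combined. */
int add_combined(int x) {
    return x + 2;
}
```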
  15. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / SIMPLIFY THE CFG

    Performs dead code elimination and basic block merging. Specifically:
    1. Removes basic blocks with no predecessors.
    2. Merges a basic block into its predecessor if there is only one and the predecessor only has one successor.
    3. Eliminates PHI nodes for basic blocks with a single predecessor.
    4. Eliminates a basic block that only contains an unconditional branch.
  16. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / DEDUCE FUNCTION ATTRIBUTES

    A simple interprocedural pass which walks the call-graph, looking for functions which do not access or only read non-local memory, and marking them readnone/readonly. In addition, it marks function arguments (of pointer type) “nocapture” if a call to the function does not create any copies of the pointer value that outlive the call. This more or less means that the pointer is only dereferenced, and not returned from the function or stored in a global. This pass is implemented as a bottom-up traversal of the call-graph.
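The three situations described above can be sketched in C. All names here are hypothetical, and the comments state which attribute each function would be a candidate for, not what a given compiler version will actually infer.

```c
#include <stddef.h>

/* Touches no memory other than its argument: a candidate for readnone. */
int square(int x) {
    return x * x;
}

/* Only reads through p, and p does not outlive the call: a candidate
   for readonly, with p marked nocapture. */
int sum(const int *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

static const int *saved;

/* Stores p in a global, so a copy of the pointer outlives the call:
   p is captured. */
void remember(const int *p) {
    saved = p;
}

/* Returns the pointer stored by remember(). */
const int *recall(void) {
    return saved;
}
```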
  17. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / LOOP OPTIMISATION

    Loop Invariant Code Motion

    This pass performs loop invariant code motion, attempting to remove as much code from the body of a loop as possible. It does this by either hoisting code into the preheader block, or by sinking code to the exit blocks if it is safe. This pass also promotes must-aliased memory locations in the loop to live in registers, thus hoisting and sinking “invariant” loads and stores.
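Hoisting can be shown with a short C sketch. The function names are hypothetical and the second version is a hand-written picture of the transformed loop, not compiler output.

```c
/* Before: a * b is recomputed on every iteration although neither a
   nor b changes inside the loop. */
void fill_before(int *out, int n, int a, int b) {
    for (int i = 0; i < n; ++i)
        out[i] = a * b + i;
}

/* After: the invariant product is computed once, before the loop. */
void fill_after(int *out, int n, int a, int b) {
    int product = a * b;
    for (int i = 0; i < n; ++i)
        out[i] = product + i;
}
```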
  18. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / LOOP OPTIMISATION

    Canonicalize Induction Variables

    This transformation analyses and transforms the induction variables (and computations derived from them) into simpler forms suitable for subsequent analysis and transformation. For example:

        for (i = 7; i*i < 1000; ++i)

    becomes

        for (i = 0; i != 25; ++i)
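The two loops on this slide run the same number of times: i takes the values 7 through 31, since 31 * 31 = 961 and 32 * 32 = 1024. A small C sketch (hypothetical function names) that counts the iterations of each form:

```c
/* Original loop from the slide: the exit test depends on i * i. */
int count_original(void) {
    int iterations = 0;
    for (int i = 7; i * i < 1000; ++i)
        ++iterations;
    return iterations;
}

/* Canonical form: starts at 0, steps by 1, and compares against the
   trip count. */
int count_canonical(void) {
    int iterations = 0;
    for (int i = 0; i != 25; ++i)
        ++iterations;
    return iterations;
}
```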
  19. FOR O1 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / DEAD CODE ELIMINATION

    Dead Store Elimination
    A trivial dead store elimination that only considers basic-block local redundant stores.

    Strip Unused Function Prototypes
    This pass loops over all of the functions in the input module, looking for dead declarations and removing them.
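A minimal C sketch of a block-local dead store, with hypothetical function names. It assumes p is an ordinary (non-volatile) pointer, so the first store cannot be observed.

```c
/* Before: the first store to *p is overwritten before anything reads it. */
void set_before(int *p) {
    *p = 1;   /* dead store */
    *p = 2;
}

/* After: only the store that can be observed remains. */
void set_after(int *p) {
    *p = 2;
}
```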
  21. FOR O2 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES

    O2 IS BASED ON O1 AND ADDS
    • Function Integration/Inlining
    • Global Value Numbering
    • Global Dead Code Elimination
    • Merge Duplicate Global Constants
  22. FOR O2 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES

    FUNCTION INTEGRATION/INLINING
    Bottom-up inlining of functions into their callers.

    GLOBAL VALUE NUMBERING
    This pass performs global value numbering to eliminate fully and partially redundant instructions. It also performs redundant load elimination.
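The two techniques often work together, which the following C sketch tries to show. The names are hypothetical and the "after" function is a hand-written approximation of the combined result.

```c
/* A small internal function that is a natural inlining candidate. */
static int twice(int x) {
    return 2 * x;
}

/* Before: two calls and a repeated subexpression a + b. */
int gvn_before(int a, int b) {
    int first = twice(a + b);
    int second = twice(a + b);
    return first + second;
}

/* After inlining twice() and giving a + b a single value number. */
int gvn_after(int a, int b) {
    int sum = a + b;        /* computed once */
    int doubled = 2 * sum;  /* inlined body of twice(), computed once */
    return doubled + doubled;
}
```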
  23. FOR O2 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / GLOBAL DEAD CODE ELIMINATION

    This transform is designed to eliminate unreachable internal globals from the program. It uses an aggressive algorithm, searching out globals that are known to be alive. After it finds all of the globals which are needed, it deletes whatever is left over. This allows it to delete recursive chunks of the program which are unreachable.
  24. FOR O2 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / MERGE DUPLICATE GLOBAL CONSTANTS

    Merges duplicate global constants together into a single constant that is shared. This is useful because some passes (e.g., TraceValues) insert a lot of string constants into the program, regardless of whether or not an existing string is available.
  25. FOR O2 OPTIMISATION LEVEL / OPTIMISATION TECHNIQUES / PROMOTE ‘BY REFERENCE’ ARGUMENTS TO SCALARS

    This pass promotes “by reference” arguments to be “by value” arguments. In practice, this means looking for internal functions that have pointer arguments. If it can prove, through the use of alias analysis, that an argument is only loaded, then it can pass the value into the function instead of the address of the value.
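A before/after sketch in C, with hypothetical names. The internal function only loads through its pointer argument, so the hand-written "after" version passes the loaded value instead.

```c
/* Before: the internal function receives a pointer but only loads from it. */
static int add_one_by_ref(const int *p) {
    return *p + 1;
}

int use_before(int v) {
    return add_one_by_ref(&v);
}

/* After: the value itself is passed, so no address needs to be taken. */
static int add_one_by_value(int p) {
    return p + 1;
}

int use_after(int v) {
    return add_one_by_value(v);
}
```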