Code Optimization 101

Filipe Moura
December 16, 2013

Transcript

  1. DISCLAIMER

    The names and scenarios presented here are purely fictional. Any similarity to real people or scenarios is purely a coincidence. No animals were harmed during the making of this workshop.
  2. WHAT IS OPTIMIZATION?

    Making code execute more efficiently. Efficiency can be measured either by execution speed or by resource usage.
  3. RULES OF PROGRAM OPTIMIZATION

    • The First Rule: Don't do it.
    • The Second Rule (for experts only!): Don't do it yet.

    Michael A. Jackson, 1970s
  4. TIMING AND BENCHMARKING: UNIX COMMAND TIME

        > time myProgram
        real    0m0.0005s
        user    0m0.0004s
        sys     0m0.0000s

    The rows are:

    • real: elapsed time of the process
    • user: seconds of user time devoted to the process
    • sys: seconds of system time devoted to the process
  5. TIMING AND BENCHMARKING: ABOUT MEASURES

    • Measure to find bottlenecks
    • Measurements need to be precise
    • Measurements need to be repeatable
    • Measure improvement after each optimization
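The points about precise and repeatable measurements can be sketched in C. This is a minimal sketch, assuming CPU time from the standard `clock()` is fine-grained enough for the workload being measured. The names `work`, `time_once`, and `best_of` are hypothetical and do not come from the slides.

```c
#include <time.h>

/* Hypothetical workload to measure; stands in for the code under test. */
static long work(void) {
    long sum = 0;
    for (long i = 0; i < 1000000L; i++)
        sum += i % 7;
    return sum;
}

/* Time a single call, in seconds of CPU time. */
static double time_once(long (*fn)(void), long *result) {
    clock_t start = clock();
    *result = fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Repeat the measurement and keep the fastest run, which is the one
   least disturbed by other activity on the machine. */
static double best_of(int runs, long (*fn)(void), long *result) {
    double best = time_once(fn, result);
    for (int r = 1; r < runs; r++) {
        double t = time_once(fn, result);
        if (t < best)
            best = t;
    }
    return best;
}
```

Calling `best_of(5, work, &r)` before and after a change gives two comparable numbers. Returning the workload's result also lets the caller check that the optimized code still computes the same value.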
  6. CODE TUNING: GENERALITIES

    Data Allocation

    • Disk access is slow
    • Main memory access is faster than disk
    • CPU cache memory (if present) is faster than main memory
    • CPU registers are fastest

    Binary Formats

    • Double-precision arithmetic is slow
    • Single-precision floating-point arithmetic is faster than double precision
    • Long integer arithmetic is faster than floating-point
    • Short integer, fixed-point arithmetic is faster than long arithmetic
    • Bitwise arithmetic is fastest

    Arithmetic Operations

    • Exponentiation is slow
    • Division is faster than exponentiation
    • Multiplication is faster than division
    • Addition/subtraction is faster than multiplication
    • Bitwise operations are fastest
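The ranking of arithmetic operations is the reason multiplication, division, and remainder by a power of two are sometimes rewritten as shifts and masks. The following is a minimal sketch, assuming unsigned operands (for negative signed values a right shift and a division round differently). The function names are hypothetical. Optimizing compilers usually perform this rewrite on their own, so it is worth measuring before doing it by hand.

```c
/* Multiply by 8 using a left shift (8 == 2^3). */
static unsigned mul8(unsigned x) { return x << 3; }

/* Divide by 8 using a right shift; valid for unsigned values. */
static unsigned div8(unsigned x) { return x >> 3; }

/* Remainder modulo 8 using a bit mask (8 - 1 == 7). */
static unsigned mod8(unsigned x) { return x & 7u; }
```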
  7. TIMING AND BENCHMARKING: CONSTANT FOLDING

    Before:

        a = 30/10 * b * 2;

    After:

        a = 3 * b * 2;

    Before:

        a = 8; b = 100;
        for(i=0; i<100; i++)
            j = i + a * b;

    After:

        a = 8; b = 100;
        c = a * b;
        for(i=0; i<100; i++)
            j = i + c;
  8. TIMING AND BENCHMARKING: COMMON SUB-EXPRESSION ELIMINATION

    Before:

        a = b * c;
        x = 5 * c * b;
        y = c * b + 8;

    After:

        t = b * c;
        a = t;
        x = 5 * t;
        y = t + 8;
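  9. TIMING AND BENCHMARKING: INACCESSIBLE CODE | DEAD CODE

    Before:

        for(i=0; i<100; i++){
            j += i * 20;
            if(j<i)
                k = i * j;
        }

    After:

        for(i=0; i<100; i++)
            j += i * 20;

    Assuming j is non-negative on entry, j < i is never true after the addition, so the branch can never run.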
  10. TIMING AND BENCHMARKING: UNUSED RESULTS

    Before:

        function stupid(k){
            int j;
            for(i=0; i<100; i++){
                j = i * 20;
                k += i * 2;
            }
            return k;
        }

    After:

        function stupid(k){
            for(i=0; i<100; i++)
                k += i * 2;
            return k;
        }
  11. TIMING AND BENCHMARKING: SIMPLIFICATION OF CONSTANTS

    Before:

        for(i=0; i<100; i++)
            j += i * 20 * sqrt(25);

    After:

        for(i=0; i<100; i++)
            j += i * 100;
  12. TIMING AND BENCHMARKING: REMOVAL OF LOOP INVARIANT CODE

    Before:

        for(i=0; i<100; i++){
            a = b * c;
            d += i * 10;
        }

    After:

        a = b * c;
        for(i=0; i<100; i++)
            d += i * 10;
  13. TIMING AND BENCHMARKING: AVOIDING SUBROUTINE CALLS

    Before:

        for(i=0; i<100; i++)
            a += cube(i);

        function cube(k){
            int t;
            t = k * k * k;
            return t;
        }

    After:

        for(i=0; i<100; i++)
            a += i * i * i;
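In C, inlining by hand is not the only way to avoid the call. Marking a small helper `static inline` invites the compiler to substitute its body at the call site while the code keeps the named function. The following is a minimal sketch, assuming a C99 compiler. `sum_cubes` is a hypothetical name, and whether the call is actually inlined remains the compiler's decision.

```c
/* Small helper; static inline invites the compiler to substitute the
   body at each call site instead of emitting a call. */
static inline int cube(int k) {
    return k * k * k;
}

/* Sum of the cubes of 0..n-1, written with the helper for readability. */
static int sum_cubes(int n) {
    int a = 0;
    for (int i = 0; i < n; i++)
        a += cube(i);
    return a;
}
```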
  14. TIMING AND BENCHMARKING: LOOP FUSION

    Before:

        for(i=0; i<100; i++)
            a += i * 5;
        for(i=0; i<100; i++)
            b += i + 10;

    After:

        for(i=0; i<100; i++){
            a += i * 5;
            b += i + 10;
        }
  15. TIMING AND BENCHMARKING: LOOPS BRANCHING

    Before:

        for(i=0; i<100; i++){
            if(a>b)
                c += i * 10;
            else
                c += i * 20;
        }

    After:

        if(a>b)
            for(i=0; i<100; i++)
                c += i * 10;
        else
            for(i=0; i<100; i++)
                c += i * 20;
  16. TIMING AND BENCHMARKING: LOOP UNROLLING

    Before:

        for(i=0; i<10000; i++)
            a[i] = i * b(i);

    After:

        for(i=0; i<10000; i+=4){
            a[i]   = i * b(i);
            a[i+1] = (i+1) * b(i+1);
            a[i+2] = (i+2) * b(i+2);
            a[i+3] = (i+3) * b(i+3);
        }
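The unrolled loop on the slide relies on 10000 being a multiple of 4. When the trip count is not a multiple of the unroll factor, a cleanup loop has to handle the leftover elements. The following is a minimal sketch. The function `b` is a hypothetical stand-in for the slide's `b(i)`, and `fill_unrolled` is a name introduced here.

```c
/* Hypothetical stand-in for the b(i) used on the slide. */
static int b(int i) { return i + 1; }

/* Fill a[0..n-1] with i * b(i), four elements per iteration. */
static void fill_unrolled(int *a, int n) {
    int i = 0;
    /* Main unrolled body: runs while at least four elements remain. */
    for (; i + 3 < n; i += 4) {
        a[i]     = i       * b(i);
        a[i + 1] = (i + 1) * b(i + 1);
        a[i + 2] = (i + 2) * b(i + 2);
        a[i + 3] = (i + 3) * b(i + 3);
    }
    /* Cleanup loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        a[i] = i * b(i);
}
```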
  17. TIMING AND BENCHMARKING: ELIMINATE LOOPS WITH LOW TRIP COUNTS

    Before:

        for(i=0; i<4; i++)
            a[i] = i * b(i);

    After:

        a[0] = 0;
        a[1] = b(1);
        a[2] = 2 * b(2);
        a[3] = 3 * b(3);
  18. TIMING AND BENCHMARKING: REMOVAL OF SUB-LOOP INVARIANT CODE

    Before:

        for(i=0; i<100; i++)
            for(j=0; j<100; j++)
                a = a + b[i] * c[j];

    After:

        for(i=0; i<100; i++){
            t = b[i];
            for(j=0; j<100; j++)
                a = a + t * c[j];
        }
  19. TIMING AND BENCHMARKING: CHANGE LOOP ORDER

    Before:

        for(j=0; j<100; j++)
            for(i=0; i<10; i++)
                a = a + b[i][j];

    After:

        for(i=0; i<10; i++)
            for(j=0; j<100; j++)
                a = a + b[i][j];
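The reordering matters because C stores two-dimensional arrays in row-major order: `b[i][j]` and `b[i][j+1]` sit next to each other in memory, so the reordered loop walks memory sequentially. The following is a minimal sketch with hypothetical function names. Both functions compute the same sum, and only the traversal order differs.

```c
#define ROWS 10
#define COLS 100

/* Column-first traversal: consecutive accesses are COLS ints apart. */
static long sum_column_first(int b[ROWS][COLS]) {
    long a = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            a += b[i][j];
    return a;
}

/* Row-first traversal: consecutive accesses are adjacent in memory. */
static long sum_row_first(int b[ROWS][COLS]) {
    long a = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a += b[i][j];
    return a;
}
```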
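  20. TIMING AND BENCHMARKING: FASTER FOR LOOPS

    Before:

        for(i=0; i<100; i++)
            a += i * 10;

    After:

        for(i=100; i--; )
            a += i * 10;

    Counting down lets the loop test compare against zero. With i-- in the condition, i takes the values 99 down to 0, so both loops add the same terms.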
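  21. TIMING AND BENCHMARKING: PROPER MEMORY ALLOCATION

    Before:

        for(i=0; i<n; i++)
            a[i] = malloc( 3 * sizeof(int) );

    After:

        a = malloc( 3 * n * sizeof(int) );

    One call replaces n calls; a is now a single block of 3 * n ints rather than n separate blocks.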
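With the single allocation, the three ints that used to live in `a[i]` start at offset `3 * i` of the flat block. The following is a minimal sketch of that indexing. `alloc_rows` and `cell` are hypothetical helper names, and error handling is reduced to returning `NULL`.

```c
#include <stdlib.h>

/* Allocate n rows of 3 ints as one contiguous block.
   Returns NULL if the allocation fails. */
static int *alloc_rows(size_t n) {
    return malloc(3 * n * sizeof(int));
}

/* Address of element k (0..2) of row i inside the flat block. */
static int *cell(int *a, size_t i, size_t k) {
    return &a[3 * i + k];
}
```

A single `free(a)` then releases everything, where the per-row version needs one `free` per row.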
  22. TIMING AND BENCHMARKING: CONSTANT PROPAGATION

    • Measure to find bottlenecks
    • Measurements need to be precise
    • Measurements need to be repeatable
    • Measure improvement after each optimization
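Constant propagation, the technique named in the slide title, can be sketched as follows, assuming the usual compiler meaning of the term: a variable whose value is known at that point is replaced by the value itself. The example and the function names `before` and `after` are illustrative and are not taken from the slides.

```c
/* Before: a and b hold known constants but are read as variables. */
static int before(void) {
    int a = 8;
    int b = 100;
    int j = 0;
    for (int i = 0; i < 100; i++)
        j += i + a * b;
    return j;
}

/* After: the known values are substituted for a and b, which then
   lets constant folding reduce 8 * 100 to 800. */
static int after(void) {
    int j = 0;
    for (int i = 0; i < 100; i++)
        j += i + 800;
    return j;
}
```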
  23. ALGORITHM COMPLEXITY: BIG-O NOTATION

    Definition: A theoretical measure of the execution of an algorithm, usually the time or memory needed.
  25. ALGORITHM COMPLEXITY: BIG-O NOTATION

    Complexities:

    • O(1): constant
    • O(log(n)): logarithmic
    • O((log(n))^c): polylogarithmic
    • O(n): linear
    • O(n^2): quadratic
    • O(n^c): polynomial
    • O(c^n): exponential
  26. ALGORITHM COMPLEXITY: BIG-O NOTATION

    O(n)

        public int linear(int n) {
            int sum = 0;
            for (int j = 0; j < n; j++)
                sum += j;
            return sum;
        }
  27. ALGORITHM COMPLEXITY: BIG-O NOTATION

    O(n^2)

        public int quadratic(int n) {
            int sum = 0;
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    sum += j * k;
            return sum;
        }
  28. ALGORITHM COMPLEXITY: BIG-O NOTATION

    O(n^3)

        public int cubic(int n) {
            int sum = 0;
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    for (int l = 0; l < n; l++)
                        sum += j * k / (l + 1);
            return sum;
        }
  29. ALGORITHM COMPLEXITY: BIG-O NOTATION

    O(log n)

        // Assumes toSearch is sorted in ascending order.
        public static int binarySearch(int[] toSearch, int key) {
            int fromIndex = 0;
            int toIndex = toSearch.length - 1;
            // <= so that a range of one element is still examined
            while (fromIndex <= toIndex) {
                int midIndex = (toIndex - fromIndex) / 2 + fromIndex;
                int midValue = toSearch[midIndex];
                if (key > midValue)
                    fromIndex = midIndex + 1;
                else if (key < midValue)
                    toIndex = midIndex - 1;
                else
                    return midIndex;
            }
            return -1; // key not found
        }
  30. ALGORITHM COMPLEXITY: BIG-O NOTATION

    Example 1

        for(int i = 0; i < n; i++)
            for(int j = 0; j < n * n; j++)
                sum++;

    Example 2

        for(int i = 1; i < n; i = i * 2)
            sum++;

    Example 3

        for(int i = 0; i < n; i++)
            for(int j = 0; j < n * n; j++)
                for(int k = 0; k < j; k++)
                    sum++;
  33. ALGORITHM COMPLEXITY: BIG-O NOTATION

    Example 1

        for(int i = 0; i < n; i++)
            for(int j = 0; j < n * n; j++)
                sum++;

    Correct answer: O(n^3)

    Example 2

        for(int i = 1; i < n; i = i * 2)
            sum++;

    Correct answer: O(log(n))

    Example 3

        for(int i = 0; i < n; i++)
            for(int j = 0; j < n * n; j++)
                for(int k = 0; k < j; k++)
                    sum++;

    Correct answer: O(n^5)
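The three answers can be checked empirically by counting how often `sum++` runs for a given n. The following is a minimal sketch in C. The function names are hypothetical, and the values returned are exact iteration counts rather than asymptotic bounds.

```c
/* Example 1: n * n^2 iterations, i.e. n^3. */
static long example1(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n * n; j++)
            sum++;
    return sum;
}

/* Example 2: i doubles on each pass, so about log2(n) iterations. */
static long example2(long n) {
    long sum = 0;
    for (long i = 1; i < n; i = i * 2)
        sum++;
    return sum;
}

/* Example 3: the innermost loop runs j times, giving
   n * (0 + 1 + ... + (n^2 - 1)) = n * n^2 * (n^2 - 1) / 2 iterations,
   which grows as n^5. */
static long example3(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n * n; j++)
            for (long k = 0; k < j; k++)
                sum++;
    return sum;
}
```

Doubling n multiplies the count of Example 1 by 8 and adds one iteration to Example 2. For Example 3 the factor approaches 32 (2^5) as n grows.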
  34. PRO TUNING: WHAT I SHOULD LEARN NEXT

    • Data Flow Analysis
    • Control Flow Analysis
    • Three Address Code (TAC)
    • Assembly Language