TMPA-2017: Simple Type Based Alias Analysis for a VLIW Processor

TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow

Simple Type Based Alias Analysis for a VLIW Processor
Aleksey Markin, Alexandr Ermolitsky, Moscow Center of SPARC Technologies

For video follow the link: https://youtu.be/_szACqu1fX8
Would like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa

Exactpro

March 23, 2017

Transcript

  1. Elbrus

    Elbrus is a general-purpose VLIW (Very Long Instruction Word) microprocessor. Features:
    - 23 instructions per clock tick
    - In-order instruction execution
    - Array Access Unit (AAU): asynchronous array loading from memory to the Array Prefetch Buffer (APB)
    - Hardware support for loop pipelining
    - Disambiguation Access Memory (DAM): hardware support for pointer disambiguation
    All these features vitally need good compiler optimization.
  2. Pointer analysis

        void foo(int *a, float *b) {
            for (int i = 1; i < N; i++) {
                a[0] += a[i];
                b[0] *= b[i];
            }
        }

    The purpose of pointer analysis is to detect whether a and b may refer to the same memory area. It is difficult because:
    - In per-module build mode, the compiler lacks information about the rest of the program
    - In whole-program mode, pointer analysis needs a lot of resources
    - Pointer analysis algorithms are complicated
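
    Why the answer matters can be shown with a minimal sketch. The function names below are hypothetical and not taken from the talk. Both pointers have the same type here, so type information alone cannot separate them, and keeping *dst in a register is only safe when the pointers do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward version: *dst is re-read from memory and written back
 * on every iteration, so it stays correct even if dst points into src. */
void accumulate(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        *dst += src[i];
    }
}

/* "Optimized" version: *dst is kept in a local (think: a register) for
 * the whole loop. It is equivalent to accumulate() only when dst does
 * not alias any of src[0..n-1]. */
void accumulate_hoisted(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    int acc = *dst;
    for (size_t i = 0; i < n; i++) {
        acc += src[i];
    }
    *dst = acc;
}
```

    With x = {1, 2, 3} and dst = &x[1], the first function leaves 9 in x[1] while the second leaves 8. With a separate destination variable both give the same sum. A compiler may therefore perform this transformation only after proving that the two accesses cannot overlap.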
  3. Strict-aliasing

    The C language allows pointers to be disambiguated by type (C standard, section 6.5, paragraph 7):

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
    - a type compatible with the effective type of the object,
    - a qualified version of a type compatible with the effective type of the object,
    - a type that is the signed or unsigned type corresponding to the effective type of the object,
    - a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    - an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    - a character type.
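
    As an illustration of the rule, here is a minimal sketch of two accesses that stay within it. The helper names are hypothetical, and the code assumes a 4-byte float. Reading the float through a uint32_t pointer cast would break the rule, whereas copying its bytes or inspecting an object through a character type does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the object representation of a float as a 32-bit integer.
 * memcpy copies bytes, so the float object is never accessed through
 * an lvalue of an incompatible type. Assumes sizeof(float) == 4. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Reads byte number idx of an int through unsigned char, which the
 * rule permits explicitly because it is a character type. */
unsigned char int_byte(const int *p, size_t idx) {
    assert(p != NULL && idx < sizeof *p);
    const unsigned char *bytes = (const unsigned char *)p;
    return bytes[idx];
}
```

    On a platform with IEEE 754 single-precision floats, float_bits(1.0f) is 0x3F800000. The same character-type exception is why a char pointer has to be treated as potentially aliasing everything.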
  4. Algorithm

    The strict-aliasing implementation for lcc (Elbrus C Compiler) works with the architecture-independent IR (EIR). General description:
    1. Gather all interesting READ and WRITE operations
    2. Generate a compatibility vector for each type these operations work with
    3. Assign the results of the analysis to the corresponding operations
    Type-based alias analysis is implemented in all major compilers.
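
    The three steps can be sketched as follows. This is not the lcc implementation: the type ids, the MemOp structure and the compatibility rule (a type matches itself, a character type matches everything) are simplifying assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type ids; a real compiler would take them from its IR. */
enum { TY_INT32, TY_FLOAT, TY_CHAR, TY_COUNT };

typedef enum { OP_READ, OP_WRITE } OpKind;

typedef struct {
    OpKind   kind;
    int      type;   /* type id of the accessed lvalue */
    uint32_t alias;  /* step 3: compatibility vector attached to the op */
} MemOp;

/* Simplified compatibility rule (an assumption of this sketch): a type
 * is compatible with itself, and a character type with every type. */
static int compatible(int a, int b) {
    return a == b || a == TY_CHAR || b == TY_CHAR;
}

/* Step 2: one bit vector per type; bit t is set if type t is compatible. */
void build_vectors(uint32_t vec[TY_COUNT]) {
    for (int a = 0; a < TY_COUNT; a++) {
        vec[a] = 0;
        for (int b = 0; b < TY_COUNT; b++) {
            if (compatible(a, b)) vec[a] |= UINT32_C(1) << b;
        }
    }
}

/* Steps 1 and 3: walk the gathered operations and attach the vectors. */
void annotate(MemOp *ops, int n, const uint32_t vec[TY_COUNT]) {
    for (int i = 0; i < n; i++) {
        assert(ops[i].type >= 0 && ops[i].type < TY_COUNT);
        ops[i].alias = vec[ops[i].type];
    }
}

/* Two operations may touch the same memory only if the vector of one
 * has the bit of the other's type set. */
int may_alias(const MemOp *a, const MemOp *b) {
    return (int)((a->alias >> b->type) & 1u);
}
```

    In this sketch an int32 READ and a float WRITE come out as independent, while any access through a character type stays dependent on both. The stored result is one machine word per operation.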
  5. Implementation characteristics

    - Pointer analysis: answers whether two pointers can refer to the same memory area
    - Intraprocedural: does not require whole-program information
    - Flow-insensitive: does not use information about the program control flow
    - Context-insensitive: does not use information from the function call sites
    - No memory modeling
    - The result is represented as a vector
  6. Runtime results

    [Figure: Integer SPEC CPU2006 execution speedup (> 1 is better). Series: lcc module, lcc whole, gcc module, gcc lto. Benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk. One off-scale bar is labelled 17.49.]
  7. Runtime results

    [Figure: Floating point SPEC CPU2006 execution speedup (> 1 is better). Series: lcc module, lcc whole, gcc module, gcc lto. Benchmarks: 416.gamess, 433.milc, 434.zeusmp, 435.gromacs, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 450.soplex, 453.povray, 454.calculix, 459.GemsFDTD, 465.tonto, 482.sphinx3.]
  8. Runtime results

    GMean speedup gained with the help of strict-aliasing:

                        lcc -O3 -ffast   lcc -O3 -ffast -fwhole   gcc -O3   gcc -O3 -flto
    SPEC CPU2006 INT    28.6%            1.9%                     1%        0%
    SPEC CPU2006 FP     13.3%            4.3%                     1.5%      1.1%

    Testing environment:
    - lcc: Elbrus 4C (Elbrus v3 ISA)
    - gcc: Intel Xeon E5-2650 (x86-64 ISA)
  9. Implementation Aspects

    - Problem: strict-aliasing violations are common, so a separate analysis for detecting strict-aliasing errors was implemented
    - Problem: unions are hard to analyse at compile time, so they are ignored
  10. 462.libquantum

    This test ran 17.49 times faster after strict-aliasing analysis was enabled in per-module build mode! The three hottest functions share the same pattern:

        void foo(str_1 *str) {
            for (int i = 0; i < N; i++) {
                str->arr[i].field;        // LOAD of arr and LOAD of field
                ...
                str->arr[i].field = val;  // STORE to field
            }
        }

    The dependence between the STORE to field and the LOAD of arr prevents the invariant LOAD from being eliminated.
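
    At the source level, the transformation the compiler wants to make looks roughly like the sketch below. The struct layouts, the function names and the XOR update are assumptions made for illustration; only the str->arr[i].field access pattern comes from the slide.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int field; } str_2;   /* element type (layout assumed) */
typedef struct { str_2 *arr; } str_1;  /* owner of the array (layout assumed) */

/* As written in the source: str->arr is reloaded on every iteration
 * unless the compiler proves the store to field cannot change it. */
void toggle(str_1 *str, size_t n, int mask) {
    assert(str != NULL);
    for (size_t i = 0; i < n; i++) {
        str->arr[i].field ^= mask;
    }
}

/* The same loop after the invariant load is hoisted by hand. */
void toggle_hoisted(str_1 *str, size_t n, int mask) {
    assert(str != NULL);
    str_2 *arr = str->arr;  /* loaded once, before the loop */
    for (size_t i = 0; i < n; i++) {
        arr[i].field ^= mask;
    }
}
```

    Both functions produce the same result. Under the strict-aliasing rule the compiler may make this change on its own, because a store through an int lvalue cannot modify an object of type str_2 *.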
  11. 462.libquantum

    In the lcc architecture-independent representation (EIR) we have the following operations:

        loop:
        ...
        o1. READ str : str_1 *
        o2. RD_FIELD o1.arr : str_2 *
        o3. ADD_P o2, i : str_2 *
        o4. RD_FIELD o3.field : int32
        ...
        o4. WR_FIELD o3.field <- val : int32
  12. 462.libquantum

    The strict-aliasing analysis builds a type compatibility table for the three types:

                   str_1 *   str_2 *   int32
        str_1 *    1         0         0
        str_2 *    0         1         0
        int32      0         0         1

    In this example all three types are incompatible, so the operations working with them cannot refer to the same memory area.
  13. 462.libquantum

    The speedup was gained through Elbrus-specific optimizations. The architecture-dependent IR of the loop is the following:

        loop:
        ...
        o1. LOAD str->arr 0 -> r1           // Alias vector: 010
        o2. ADD_P r1 i -> r2
        o3. LOAD r2 offset(field) -> r3     // Alias vector: 001
        ...
        o4. STORE r2 offset(field) val      // Alias vector: 001

    The strict-aliasing results make it possible to disambiguate the o1. LOAD and o4. STORE operations and to move the invariant o1. LOAD out of the loop.
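
    One plausible way to read the alias vectors is sketched below. The Op structure and the query rule are assumptions for illustration, not the lcc data structures. Each vector is taken to be the compatibility-table row of the accessed type, with columns in the order str_1 *, str_2 *, int32.

```c
#include <assert.h>

/* Column order follows the compatibility table: str_1 *, str_2 *, int32. */
enum { T_STR1_PTR, T_STR2_PTR, T_INT32, T_COUNT };

typedef struct {
    const char *name;
    int type;           /* type of the accessed value */
    int is_store;
    const char *alias;  /* alias vector as printed on the slide */
} Op;

/* Assumed reading of the vectors: op a may touch the memory of op b
 * only if a's vector has '1' in the column of b's type. */
int may_conflict(const Op *a, const Op *b) {
    assert(a->type >= 0 && a->type < T_COUNT);
    assert(b->type >= 0 && b->type < T_COUNT);
    if (!a->is_store && !b->is_store) return 0;  /* two loads never conflict */
    return a->alias[b->type] == '1';
}

const Op load_arr    = { "o1. LOAD str->arr", T_STR2_PTR, 0, "010" };
const Op load_field  = { "o3. LOAD field",    T_INT32,    0, "001" };
const Op store_field = { "o4. STORE field",   T_INT32,    1, "001" };
```

    Under this reading, may_conflict(&load_arr, &store_field) is 0, so o1 can leave the loop, while may_conflict(&load_field, &store_field) is 1, so the LOAD and STORE of field keep their order.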
  14. 462.libquantum

    With only one LOAD left in the loop, further optimizations become possible:

        o1. LOAD str->arr 0 -> r1           // Alias vector: 010
        loop:
        ...
        o2. MOVA arr_buff
        ...
        o3. ADD_P r1 i -> r2
        o4. STORE r2 offset(field) val      // Alias vector: 001

    Before strict-aliasing:
    - weak pipelining
    - DAM applied
    - no APB
    After strict-aliasing:
    - improved pipelining
    - no DAM
    - APB
  15. Other tests

    Almost all other tests (except 453.povray) have code patterns similar to those in 462.libquantum, but more complicated. The tests 459.GemsFDTD and 437.leslie3d are Fortran tests, but lcc translates them to C, so we can also see their speedup. The hot functions of 453.povray contain no loops. Its 16% speedup comes only from peephole improvements!
  16. Compile Time

    In general, the impact of the analysis on compilation time is low. Compilation time speedup:

                 lcc -O3 -ffast   lcc -O3 -ffast -fwhole   gcc -O3   gcc -O3 -flto
        GMean    -3%              1%                       1%        2%

    The size of the stored analysis results is linear in the number of operations in the procedure.
  17. Summary

    Advantages of strict-aliasing:
    - Relatively easy implementation
    - Works in per-module build mode
    - In some cases works with object fields
    - High scalability
    - Great execution speedup on a VLIW processor
    Disadvantages of strict-aliasing:
    - Needs a complicated analysis for detecting strict-aliasing errors
    - Low precision
  18. Conclusion

    In this work:
    - A simple type-based alias analysis algorithm was described and implemented for the Elbrus compiler
    - Its impact on runtime and compile-time characteristics was analyzed
    Further work:
    - Extending the algorithm to disambiguate fields of structures
    - Detailed research of strict-aliasing errors in a GNU/Linux distribution
    - Comparison of the precision of different pointer analyses