
Source Language Representation of Function Summaries in Static Analysis

ICOOOLPS Workshop

July 18, 2016

Transcript

  1. Static Analysis

    - Analyzing the code without executing it
    - Widely used
      - Bug finding
      - Optimization
      - Code visualization
      - Verification
      - …
    - Can be a cheap supplement to testing
  2. LLVM/Clang

    - Open Source compiler (C, C++, Objective-C)
    - Supported by large companies (Apple, Google, Intel, …)
    - Modular design
    - Static Analysis tools
      - Static Analyzer
      - Clang Tidy
      - Clang Format
    - Dynamic Analysis tools
      - Sanitizers
  3. Clang Static Analyzer

    - Path Sensitive
      - Exponential in the number of branches
      - Lots of heuristics to keep the execution time reasonable
    - Symbolic Execution
      - Constraints on symbols represented as intervals
    - Context Sensitive
      - Function cloning (i.e., "inlining" at the call site)
    - Models memory
      - Hierarchy
      - Aliasing
    - Works on large scale industrial projects
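  4. Symbolic Execution (Abstract interpretation)

        void test(int b) {
          int a, c;
          switch (b) {
          case 1:
            a = b / 0;
            break;
          case 4:
            c = b - 4;
            a = b / c;
            break;
          };
        }

    Explored states (b is bound to the symbol $b on every path):

    - switch(b) splits the range of $b three ways:
      - case 1: $b=[1,1], followed by a = b/0;
      - case 4: $b=[4,4]
      - no matching case: $b=[MIN,0],[2,3],[5,MAX]
    - case 4, after c = b-4; the state is $b=[4,4]; c=$b-4, so c=0
    - case 4, at a = b/c; the state is b: $b, c: 0, $b=[4,4], a=$b/$c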
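
The range splitting on the slide above can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: each symbol is constrained by a single closed interval (the slide shows that the analyzer can also hold several ranges at once, as in the no-matching-case branch), and `Interval`, `intersect`, `minusConst`, and `case4DividesByZero` are hypothetical names, not Clang APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical closed interval [lo, hi] constraining one symbol such as $b.
// long long keeps the int-range arithmetic below free of overflow.
struct Interval {
  long long lo;
  long long hi;
  bool empty() const { return lo > hi; }
  bool contains(long long v) const { return lo <= v && v <= hi; }
};

// Taking a branch narrows the constraint: intersect with the branch condition.
Interval intersect(Interval a, Interval b) {
  return Interval{std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
}

// Interval of (sym - k), given the interval of sym.
Interval minusConst(Interval r, long long k) {
  return Interval{r.lo - k, r.hi - k};
}

// Replays the `case 4` path of test(): $b=[4,4], c=$b-4, then a=$b/c.
// Returns true when the divisor is known to be exactly zero on that path.
bool case4DividesByZero() {
  Interval b{INT_MIN, INT_MAX};                      // b: $b, unconstrained on entry
  Interval onCase4 = intersect(b, Interval{4, 4});   // $b=[4,4]
  if (onCase4.empty()) return false;                 // infeasible path
  Interval c = minusConst(onCase4, 4);               // c=$b-4, so c=[0,0]
  return c.lo == 0 && c.hi == 0;
}
```

Calling `case4DividesByZero()` from a test or `main` reports that the divisor on the `case 4` path is pinned to zero, which is the situation the slide's last state describes.
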
  5. Despite The Precise Modeling… Information Loss

    A.cpp:

        void f(int *x);
        void g(int *x) {
          if (*x > 0) {  // *x is positive
            f(x);        // *x is unknown
            f(x);
          }
        }

    B.cpp:

        void f(int *x) { *x = -(*x); }
  6. Information Loss

    - Missing information in industrial applications
      - Used libraries
    - The precision can be greatly improved when information loss is minimized
    - Clang Static Analyzer does not support cross translation unit (CTU) analysis
      - Greater degree of information loss than necessary
    - A fork has a basic CTU implementation
      - On some projects it can provide up to 3 times more results
  7. Case Studies

    - rAthena
      - 150k LOC C code
      - 140 MB AST dump
      - 4x analysis time
      - 3x reported bugs
    - LLVM/Clang
      - 2M+ LOC C++ code
      - 45.4 GB AST dumps
        - 36 GB w/o STL
        - 24 GB w/o template instantiations
        - 16 GB w/o STL and template instantiations
      - Less than 10% is executable code
    - Xerces
      - 150k LOC C++ code
      - 900 MB AST dump
  8. Lazily Merge AST

    - Duplicated headers lead to large AST dumps
      - Cannot be solved in general due to macros
      - Inherent problem of the C/C++ compilation model
      - C++20 module system will solve this problem
        - To utilize modules, semantic changes are required
        - Most projects did not even catch up to C++11
      - Well behaved headers (e.g., STL) can be deduplicated
    - The type contexts of two translation units are not disjoint
      - Type context merging on AST loading is necessary
    - A large portion of the AST dumps is type information
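
To illustrate why macros get in the way of deduplicating headers, here is a small sketch in which the same header text is expanded twice under different macro settings, the way two translation units might include one header. Everything in it is hypothetical (the `ids.h` contents, the `WIDE_IDS` macro, the `tu_a`/`tu_b` namespaces standing in for translation units); it is not taken from any of the projects in the case studies.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for translation unit A, compiled with WIDE_IDS set to 1.
#define WIDE_IDS 1
namespace tu_a {
// ---- begin text of hypothetical ids.h ----
#if WIDE_IDS
typedef long long Id;
#else
typedef int Id;
#endif
struct Record { Id id; };
// ---- end text of hypothetical ids.h ----
}  // namespace tu_a
#undef WIDE_IDS

// Stand-in for translation unit B, compiled with WIDE_IDS set to 0.
#define WIDE_IDS 0
namespace tu_b {
// ---- begin text of hypothetical ids.h (identical text) ----
#if WIDE_IDS
typedef long long Id;
#else
typedef int Id;
#endif
struct Record { Id id; };
// ---- end text of hypothetical ids.h ----
}  // namespace tu_b
#undef WIDE_IDS

// True when both expansions of the header produced the same type for Id.
bool sameIdType() { return std::is_same<tu_a::Id, tu_b::Id>::value; }
```

Here `sameIdType()` is false: the two expansions of identical header text yield different declarations, so a tool cannot assume that one stored copy of the header's AST is valid for both includers.
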
  9. Can We Omit Type Information?

    - Unfortunately, not in general!
    - Type information is necessary for parsing (resolve ambiguities in the grammar)
    - In case a parsed AST is available, but the necessary type information is not, it is not possible to reason about the semantics of that piece of AST

        S (x);
        T * y;
        func ((R) * z);
        T T;
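
The first two statements from the slide above can be compiled under both readings. The sketch below is an illustration with made-up definitions: in `as_types` the names `S` and `T` are types, so both statements are declarations; in `as_values` they name a function and a variable, so the same tokens are expressions. The `Num` type and its counting `operator*` are assumptions added only so that the expression form has an observable effect.

```cpp
#include <cassert>

namespace as_types {
struct S { int v; };
struct T { int v; };

int demo() {
  S (x);   // declaration: variable x of type S (the parentheses are redundant)
  T * y;   // declaration: y is a pointer to T
  x.v = 1;
  T t{2};
  y = &t;
  return x.v + y->v;
}
}  // namespace as_types

namespace as_values {
int calls = 0;
int products = 0;
struct Num { int v; };

// Counts its uses so the bare expression statement below is observable.
Num operator*(Num a, Num b) { ++products; return Num{a.v * b.v}; }
void S(Num) { ++calls; }
Num x{5}, T{6}, y{7};

void demo() {
  S (x);   // expression: calls the function S with argument x
  T * y;   // expression: multiplies T by y and discards the result
}
}  // namespace as_values
```

Without knowing what `S` and `T` refer to, a parser has no way to choose between the two readings, which is why the AST cannot be interpreted when its type context is missing.
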
  10. Can We Omit Type Information at the Cost of Some Restrictions?

    - Yes!
    - There is a lowest common denominator of type contexts that is available at every call site for a function!
      - Types of the formal parameters
      - Return type
      - The type of the parent class for methods
      - The minimal required type context for these types
    - Summaries are a tool for interprocedural analysis
      - Summarization of information about a function
    - Use summaries of the original functions that only contain those permitted types!
  11. C/C++ Representation of Summaries

    - Use the same Static Analysis engine unmodified
    - Use summaries that only contain permitted types
    - They can be stored efficiently
      - No type information needs to be stored
    - They can be loaded efficiently
      - No type context merging is needed
    - It is possible to synthesize them!
  12. Use of a Summary

    A.cpp:

        void f(int *x);
        void g(int *x) {
          if (*x > 0) {  // *x is positive
            f(x);        // *x is negative
            f(x);
          }
        }

    f’s summary:

        void f(int *x) { *x = -(*x); }
  13. Use of a Summary #2

    A.cpp:

        void f(int *x, D* d);
        void g(int *x, D* d) {
          if (*x > 0) {  // *x is positive
            f(x, d);     // *x has the same absolute value
            f(x, d);
          }
        }

    B.cpp:

        #include "D.h"
        void f(int *x, D* d) {
          if (d->negate())
            *x = -(*x);
        }

    f’s summary (D* is permitted, not D):

        // No include!
        void f(int *x, D* d) {
          if (unknown_value())
            *x = -(*x);
        }
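
A summary of this shape compiles with nothing more than a forward declaration of `D`, which the following sketch tries to make concrete. The slides do not define `unknown_value()`; for an analyzer it would presumably be left opaque so that its result is unconstrained. Here it is stubbed with a test knob (`g_unknown`) purely as an assumption so that the example can run, and the summary is named `f_summary` to keep it apart from the original `f`.

```cpp
#include <cassert>
#include <cstdlib>

struct D;  // forward declaration only: D* is a permitted type, D itself is not

// Assumption: runnable stand-in for the slide's unknown_value().
// A test knob decides the answer; an analyzer would treat it as unconstrained.
bool g_unknown = false;
bool unknown_value() { return g_unknown; }

// Summary of f: same parameter types, body restricted to permitted types.
void f_summary(int *x, D *d) {
  (void)d;  // the summary cannot look inside D
  if (unknown_value())
    *x = -(*x);
}

// The caller from A.cpp, written against the summary.
void g(int *x, D *d) {
  if (*x > 0) {
    f_summary(x, d);
    f_summary(x, d);
  }
}
```

Whichever way `unknown_value()` answers, the absolute value of `*x` is unchanged after `g`, which matches the slide's annotation; only the sign is left undetermined by the summary.
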
  14. Conclusion

    - New method to represent lightweight summaries in C/C++
      - These summaries can aid symbolic execution to improve precision
      - Efficient to store/load
      - Implementation is available in Clang
    - Proposed a scheme to synthesize summaries
      - It can be used to achieve cross translation unit analysis without modifying the static analysis engine