Program Analysis

Go is a simple language with a very small grammar, which makes building tools for the language quite enjoyable.

This talk covers different techniques that you can use to analyze a given program without having to read a single line of code, from tools that measure the quality and complexity of a program to ones that pinpoint bugs and performance issues.

We will discuss the limits of what static analysis can understand and how dynamic analysis techniques can help us bridge the gap between theory and usability.

Francesc Campoy Flores

August 21, 2015

Transcript

  1. Program Analysis

    Given a program, we extract properties such as correctness, robustness, performance, and others (well formatted, idiomatic, ...). This is done automatically; no human is involved.
  2. Dynamic Analysis

    We observe the behavior of the running program. It often requires instrumenting the program, and it doesn't prove a property: it looks for failures. See Dmitry Vyukov, "Go Dynamic Tools" (YouTube).
  3. Dynamic analysis: unit tests

    Verify the correctness of a function:

        func Sum(vs []int) int {
            s := 0
            for _, v := range vs {
                s += v
            }
            return s
        }

    The verification is done with more code; no instrumentation is needed:

        func TestSum(t *testing.T) {
            vs := []int{1, 2, 3}
            s := Sum(vs)
            if s != 6 {
                t.Errorf("sum(%v) should be 6; got %v", vs, s)
            }
        }
  4. Dynamic analysis: bounds checks

    Go programs are instrumented to detect accesses to invalid positions in a slice:

        package main

        import "fmt"

        func Sum(vs []int) int {
            s := 0
            for i := 0; i <= len(vs); i++ { // bug: should be i < len(vs)
                s += vs[i]
            }
            return s
        }

        func main() {
            v := []int{1, 2, 3, 4}
            fmt.Println(Sum(v[:3]))
        }

        $ go run bounds.go
        panic: runtime error: index out of range

    We can disable bounds checking with -gcflags=-B.
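    A small runnable sketch (not from the talk) that catches the panic so it can be inspected: it keeps the slide's off-by-one loop and uses recover to turn the runtime bounds-check failure into a message. The helper names buggySum, panicMessage and isBoundsError are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// buggySum is the slide's Sum, including its off-by-one loop condition
// (i <= len(vs)), which reads one element past the end of the slice.
func buggySum(vs []int) int {
	s := 0
	for i := 0; i <= len(vs); i++ {
		s += vs[i]
	}
	return s
}

// panicMessage runs f and returns the message of any panic it raises,
// or "" if f returns normally.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

// isBoundsError reports whether msg looks like a failed bounds check.
func isBoundsError(msg string) bool {
	return strings.Contains(msg, "index out of range")
}

func main() {
	v := []int{1, 2, 3, 4}
	// v[:3] has length 3 but capacity 4. The check is against the length,
	// so reading index 3 panics even though the backing array holds 4 ints.
	msg := panicMessage(func() { buggySum(v[:3]) })
	fmt.Println(msg, isBoundsError(msg))
}
```

    The interesting detail is that the check uses the slice length, not its capacity, so the extra element in the backing array does not make the read legal.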
  5. Dynamic analysis: race detector

    All memory accesses are instrumented to detect data races. Is this code correct?

        package main

        import (
            "fmt"
            "time"
        )

        func main() {
            n := 0
            go func() {
                for range time.Tick(time.Second) {
                    n++
                }
            }()
            for range time.Tick(time.Second / 5) {
                fmt.Println(n)
            }
        }

    Run it with the race detector enabled (the -race flag works with go test and go build as well):

        $ go run -race race.go
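    One possible fix, sketched here as an assumption rather than the talk's own answer: make every access to the shared counter go through sync/atomic (a sync.Mutex would work too). The function countConcurrently is a hypothetical name. Running this under go run -race should not produce a race report, because the counter is never read or written without synchronization.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently starts workers goroutines that each increment a shared
// counter times times. All reads and writes of the counter are atomic, and
// the WaitGroup makes main wait for every worker before reading the total.
func countConcurrently(workers, times int) int64 {
	var n int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < times; i++ {
				atomic.AddInt64(&n, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&n)
}

func main() {
	fmt.Println(countConcurrently(4, 1000)) // 4000
}
```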
  6. Concurrent prime sieve

    Using channels and goroutines, by Russ Cox. Generate generates all the numbers starting from 2. Filter removes all the multiples of a given value.
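    A sketch of the sieve along the lines of the classic Go example the slide refers to, written here from the description above, so details may differ from Russ Cox's version. Each prime found appends a new Filter goroutine to a chain of channels. Primes is a hypothetical wrapper that stops after n primes; the remaining goroutines stay blocked, which is harmless in a short program but would be a leak in a long-running one.

```go
package main

import "fmt"

// Generate sends 2, 3, 4, ... on ch forever.
func Generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// Filter copies values from in to out, dropping multiples of prime.
func Filter(in <-chan int, out chan<- int, prime int) {
	for {
		i := <-in
		if i%prime != 0 {
			out <- i
		}
	}
}

// Primes returns the first n primes. The first value to come out of the
// current end of the pipeline is always prime; a new Filter is then added
// to remove its multiples.
func Primes(n int) []int {
	ch := make(chan int)
	go Generate(ch)
	primes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		prime := <-ch
		primes = append(primes, prime)
		next := make(chan int)
		go Filter(ch, next, prime)
		ch = next
	}
	return primes
}

func main() {
	fmt.Println(Primes(10)) // [2 3 5 7 11 13 17 19 23 29]
}
```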
  7. Dynamic analysis: pprof

    Analyzing the code for the concurrent prime sieve:

        import (
            "log"
            "os"
            "runtime/pprof"
        )

        func main() {
            c, err := os.Create("primes.prof")
            if err != nil {
                log.Fatal(err)
            }
            if err := pprof.StartCPUProfile(c); err != nil {
                log.Fatal(err)
            }
            defer pprof.StopCPUProfile()
            run() // runs the concurrent prime sieve
        }

    Execute the program first, then generate a graph:

        $ go tool pprof --svg pprof primes.prof > primes.svg

    Similar analysis can be done on memory and locks.
  8. Dynamic analysis: execution tracer

    Captures with nanosecond precision: goroutine creation/start/end, goroutine blocking/unblocking, network blocking, system calls, and GC events.
  9. Dynamic analysis: execution tracer

    Run a test with -trace:

        $ go test -trace=trace.out

    And open the resulting trace with the trace tool:

        $ go tool trace trace.test trace.out
  10. Dynamic analysis conclusion

    Pros: often conceptually simple; finds only issues that are ACTUALLY occurring.
    Cons: often makes your program slower; finds ONLY issues that are actually occurring.
  11. Static Analysis

    The program is not executed; instead, the code is analyzed. There is no need to instrument any code, and, unlike dynamic analysis, it can prove specific properties.
  12. The usual suspects

    Some go* tools:

    gofmt: verify formatting conventions

        import ("math"; "fmt"; "io")

    golint: other conventions (naming, docs, etc.)

        var some_value int

    go vet: find possible errors

        fmt.Printf("%v + %v = %v", a, b)

    godoc: find exported identifiers and their docs
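    To give a feel for how such tools are built, here is a sketch of a single golint-style check using only go/parser and go/ast: it reports declared names that contain an underscore, as in var some_value int. It is much simpler than golint's actual naming rules, and underscoreNames is a made-up name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// underscoreNames returns, in source order, every name declared in src
// (var/const specs, funcs, types, := definitions) that contains an
// underscore. The blank identifier _ is ignored.
func underscoreNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(f, func(n ast.Node) bool {
		var names []*ast.Ident
		switch d := n.(type) {
		case *ast.ValueSpec: // var and const declarations
			names = d.Names
		case *ast.FuncDecl:
			names = []*ast.Ident{d.Name}
		case *ast.TypeSpec:
			names = []*ast.Ident{d.Name}
		case *ast.AssignStmt: // short variable declarations: x := ...
			if d.Tok == token.DEFINE {
				for _, e := range d.Lhs {
					if id, ok := e.(*ast.Ident); ok {
						names = append(names, id)
					}
				}
			}
		}
		for _, id := range names {
			if id.Name != "_" && strings.Contains(id.Name, "_") {
				found = append(found, id.Name)
			}
		}
		return true
	})
	return found, nil
}

func main() {
	names, err := underscoreNames("package p\n\nvar some_value int\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```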
  13. Errcheck

    Finds all the errors that have been implicitly ignored:

        func main() {
            os.Remove("somefile") // this is reported
        }

    Errors ignored explicitly are not reported:

        func main() {
            _ = os.Remove("somefile") // this is fine
        }

    github.com/kisielk/errcheck
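    A rough sketch of the idea behind errcheck (not its actual implementation): type-check the file with go/types, then report every call used as a bare statement whose only or last result is error. Assigning to _ produces an assignment rather than an expression statement, so explicitly ignored errors are not reported. To keep the example self-contained, the analyzed code declares its own remove function instead of importing os, since the type checker here is configured without an importer; uncheckedErrors is a hypothetical name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// errorType is the predeclared error interface.
var errorType = types.Universe.Lookup("error").Type()

// returnsError reports whether a call of type t yields error as its only
// or last result.
func returnsError(t types.Type) bool {
	if t == nil {
		return false
	}
	if tup, ok := t.(*types.Tuple); ok {
		if tup.Len() == 0 {
			return false // a call with no results
		}
		t = tup.At(tup.Len() - 1).Type()
	}
	return types.Identical(t, errorType)
}

// uncheckedErrors returns the position of every call statement in src
// whose error result is silently dropped.
func uncheckedErrors(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{Types: map[ast.Expr]types.TypeAndValue{}}
	if _, err := (&types.Config{}).Check("p", fset, []*ast.File{f}, info); err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(f, func(n ast.Node) bool {
		// "_ = remove(...)" is an AssignStmt, so it never matches here.
		if stmt, ok := n.(*ast.ExprStmt); ok {
			if call, ok := stmt.X.(*ast.CallExpr); ok && returnsError(info.TypeOf(call)) {
				found = append(found, fset.Position(call.Pos()).String())
			}
		}
		return true
	})
	return found, nil
}

const sample = `package p

func remove(name string) error { return nil }

func run() {
	remove("somefile")     // reported
	_ = remove("somefile") // explicitly ignored: fine
}
`

func main() {
	found, err := uncheckedErrors(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(found)
}
```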
  14. goimports

    Finds unused imported packages and removes them. Finds missing imported packages and looks them up in your GOPATH by matching the name of the package with the name of the identifier used.

    Limitation: what package defines template.Template? Both text/template and html/template do.
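    A sketch of the "unused imports" half, assuming that each import is referred to by the last element of its path unless it is renamed. That assumption is the same name-matching heuristic described above, and it breaks for packages whose name differs from their directory. unusedImports is a hypothetical helper that only parses the file without type-checking it, so a local variable shadowing a package name can fool it.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// unusedImports returns the import paths in src that are never referenced.
func unusedImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	// Every X in an X.Sel expression is a candidate package reference.
	referenced := map[string]bool{}
	ast.Inspect(f, func(n ast.Node) bool {
		if sel, ok := n.(*ast.SelectorExpr); ok {
			if id, ok := sel.X.(*ast.Ident); ok {
				referenced[id.Name] = true
			}
		}
		return true
	})
	var unused []string
	for _, imp := range f.Imports {
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		name := path.Base(p) // heuristic: package name == last path element
		if imp.Name != nil {
			name = imp.Name.Name
		}
		if name == "_" || name == "." {
			continue // side-effect and dot imports are not referenced by name
		}
		if !referenced[name] {
			unused = append(unused, p)
		}
	}
	return unused, nil
}

const sample = `package p

import (
	"fmt"
	"os"
	"strings"
)

func shout(s string) string { return fmt.Sprint(strings.ToUpper(s)) }
`

func main() {
	unused, err := unusedImports(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(unused) // [os]
}
```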
  15. goreturns

    Finds return statements where fewer values than expected are returned and the last type returned is an error, and adds the missing zero values. goreturns turns:

        func foo() (int, bool, string, error) {
            return fmt.Errorf("awful stuff")
        }

    into:

        func foo() (int, bool, string, error) {
            return 0, false, "", fmt.Errorf("awful stuff")
        }

    github.com/sqs/goreturns
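    The core of this rewrite is knowing the zero value of each result type. A sketch using go/types (not goreturns' own code): it looks up a function's signature and returns the zero values to insert before the final error. Because the input above does not type-check until it has been fixed, this example analyzes a placeholder body instead; zeroValue and missingZeros are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

var errorType = types.Universe.Lookup("error").Type()

// zeroValue returns Go source for the zero value of t.
func zeroValue(t types.Type, q types.Qualifier) string {
	switch u := t.Underlying().(type) {
	case *types.Basic:
		switch {
		case u.Info()&types.IsBoolean != 0:
			return "false"
		case u.Info()&types.IsString != 0:
			return `""`
		case u.Info()&types.IsNumeric != 0:
			return "0"
		}
	case *types.Struct, *types.Array:
		return types.TypeString(t, q) + "{}"
	}
	return "nil" // pointers, slices, maps, channels, funcs, interfaces
}

// missingZeros returns the zero values that would precede the error in a
// bare "return err" inside function fn of src.
func missingZeros(src, fn string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return "", err
	}
	pkg, err := (&types.Config{}).Check("p", fset, []*ast.File{f}, nil)
	if err != nil {
		return "", err
	}
	fnObj, ok := pkg.Scope().Lookup(fn).(*types.Func)
	if !ok {
		return "", fmt.Errorf("no function %s", fn)
	}
	res := fnObj.Type().(*types.Signature).Results()
	if res.Len() == 0 || !types.Identical(res.At(res.Len()-1).Type(), errorType) {
		return "", fmt.Errorf("%s does not return an error last", fn)
	}
	zeros := make([]string, 0, res.Len()-1)
	for i := 0; i < res.Len()-1; i++ {
		zeros = append(zeros, zeroValue(res.At(i).Type(), types.RelativeTo(pkg)))
	}
	return strings.Join(zeros, ", "), nil
}

const sample = `package p

type T struct{ N int }

func foo() (int, bool, string, *T, T, error) {
	panic("not implemented")
}
`

func main() {
	zeros, err := missingZeros(sample, "foo")
	if err != nil {
		panic(err)
	}
	fmt.Printf("return %s, err\n", zeros)
}
```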
  16. govars

    Finds local variables that have been declared but not used and removes them, comments them out, or adds a usage to them (e.g. _ = foo). This doesn't exist AFAIK, but if you want to write it, it shouldn't be hard.
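    The detection half is indeed short. Since the compiler already rejects unused locals, the type checker can be told to keep going past that error while recording definitions and uses; a variable that is defined but never referenced, and is not a parameter, result, receiver or package-level variable, is reported. A sketch under those assumptions, with unusedLocals as a hypothetical name. Unlike the compiler, it counts an assignment such as x = 2 as a use, so it misses some cases.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

// signatureNames collects identifiers naming receivers, parameters and
// results; Go allows those to go unused.
func signatureNames(f *ast.File) map[*ast.Ident]bool {
	names := map[*ast.Ident]bool{}
	ast.Inspect(f, func(n ast.Node) bool {
		var lists []*ast.FieldList
		switch n := n.(type) {
		case *ast.FuncDecl:
			lists = append(lists, n.Recv)
		case *ast.FuncType:
			lists = append(lists, n.Params, n.Results)
		}
		for _, fl := range lists {
			if fl == nil {
				continue
			}
			for _, field := range fl.List {
				for _, id := range field.Names {
					names[id] = true
				}
			}
		}
		return true
	})
	return names
}

// unusedLocals returns the sorted names of local variables in src that are
// declared but never referenced.
func unusedLocals(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs: map[*ast.Ident]types.Object{},
		Uses: map[*ast.Ident]types.Object{},
	}
	// An Error handler makes the checker continue past "declared and not
	// used" errors and still fill in info.
	conf := types.Config{Error: func(error) {}}
	pkg, _ := conf.Check("p", fset, []*ast.File{f}, info)

	used := map[types.Object]bool{}
	for _, obj := range info.Uses {
		used[obj] = true
	}
	sig := signatureNames(f)
	var unused []string
	for id, obj := range info.Defs {
		v, ok := obj.(*types.Var)
		if !ok || v.IsField() || sig[id] || id.Name == "_" || used[v] {
			continue
		}
		if v.Parent() == pkg.Scope() {
			continue // package-level variables may legitimately be unused
		}
		unused = append(unused, id.Name)
	}
	sort.Strings(unused)
	return unused, nil
}

const sample = `package p

var global int

func f(a int) (r int) {
	b := 1
	c := 2
	_ = c
	return a
}
`

func main() {
	names, err := unusedLocals(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [b]
}
```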
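  17. Packages for static analysis

    The standard library offers packages for each level of analysis, for example go/scanner for tokens, go/parser and go/ast for syntax trees, and go/types for type checking. Many tools can be built atop this handful of packages. But not all of them, or at least not easily.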
  18. Limitations

    What's the value returned by this function?

        func foo(n int) interface{} {
            switch {
            case n < 0:
                return nil
            case n%2 == 0:
                return "even"
            case n == 42:
                return 42.0
            default:
                return false
            }
        }

    We want to understand more than types: we want to follow the flow of a program.
  19. SSA IR

    Static Single Assignment Intermediate Representation: an intermediate representation for code in which every variable is assigned at most once. Data-flow analysis is easier in this form. Data-flow analysis determines the possible values of a variable at some point and, therefore, the possible ways some code can be executed.
  20. SSA

    A program is defined to be in SSA form if each variable is a target of exactly one assignment statement in the program text.

    So given a piece of code like this:

        a := 0
        b := 1
        b = a + b

    We can convert it to:

        a1 := 0
        b1 := 1
        b2 := a1 + b1
  21. The importance of naming

    Variables in Go have the same issue:

        x := 1
        y := x + 1
        x = 2
        z := x + 1

    What's the value of x? Are y and z equal?
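    A toy renaming pass that reproduces the numbering used for a, b and x above. It is an illustration only: it handles straight-line assignments represented as token lists (the stmt type is made up), while real SSA construction, as in golang.org/x/tools/go/ssa, also has to insert φ (phi) nodes where branches merge.

```go
package main

import (
	"fmt"
	"strings"
)

// stmt is a straight-line assignment Lhs = Rhs, where Rhs is a list of
// tokens (variable names, literals and operators).
type stmt struct {
	Lhs string
	Rhs []string
}

// toSSA renames variables so that each one is assigned exactly once,
// numbering versions a1, a2, ... Tokens that are not known variables are
// copied unchanged.
func toSSA(prog []stmt) []string {
	version := map[string]int{}
	out := make([]string, 0, len(prog))
	for _, s := range prog {
		// Rename uses first, so "b = a + b" reads the previous version of b.
		rhs := make([]string, len(s.Rhs))
		for i, tok := range s.Rhs {
			if v, ok := version[tok]; ok {
				tok = fmt.Sprintf("%s%d", tok, v)
			}
			rhs[i] = tok
		}
		// Then give the assigned variable a fresh version.
		version[s.Lhs]++
		out = append(out, fmt.Sprintf("%s%d := %s", s.Lhs, version[s.Lhs], strings.Join(rhs, " ")))
	}
	return out
}

func main() {
	prog := []stmt{
		{"x", []string{"1"}},
		{"y", []string{"x", "+", "1"}},
		{"x", []string{"2"}},
		{"z", []string{"x", "+", "1"}},
	}
	for _, line := range toSSA(prog) {
		fmt.Println(line)
	}
}
```

    The program prints y1 := x1 + 1 and z1 := x2 + 1: once every assignment has its own name, it is plain that y and z read different definitions of x.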
  22. Referential transparency

    SSA enforces one condition: one definition for each variable, so a variable's value never changes. This is similar to enforcing const in C++ or final in Java, but not const in Go, which only declares compile-time constants.
  23. Referential transparency

    The value of a variable is independent of its position. This is known in functional programming as referential transparency. Referentially transparent expressions are independent of the order of evaluation.
  24. golang.org/x/tools/go/ssa

    SSA IR is an intermediate representation. golang.org/x/tools/go/ssa provides the building blocks, and golang.org/x/tools/cmd/ssadump provides a tool to display the SSA form of Go programs.
  25. ssadump

    Given this factorial function:

        func fact(x int) int {
            if x == 0 {
                return 1
            }
            return x * fact(x-1)
        }

    We can generate its SSA dump by running:

        ssadump -build=F fact.go
  26. ssadump

        # Name: fact.fact
        # Package: fact
        # Location: fact.go:3:6
        func fact(x int) int:
        0:                                              entry P:0 S:2
                t0 = x == 0:int                          bool
                if t0 goto 1 else 2
        1:                                            if.then P:1 S:0
                return 1:int
        2:                                            if.done P:1 S:0
                t1 = x - 1:int                           int
                t2 = fact(t1)                            int
                t3 = x * t2                              int
                return t3
  27. SSA

    New SSA backend planned for Go 1.6. Some of the expected improvements: better common subexpression elimination, better dead code elimination, better register allocation, and better stack frame allocation.
  28. Oracle

    A source analysis tool, invoked by an editor, that answers questions about Go programs. Powered by the SSA IR and pointer analysis.
  29. Pointer analysis on godoc

        godoc -analysis=pointer

    (live demo)
  30. Narrower

    We all agree that this:

        func dumpTo(f *os.File) {
            fmt.Fprintln(f, "some content")
        }

    is better written as:

        func dumpTo(w io.Writer) {
            fmt.Fprintln(w, "some content")
        }

    Using the narrowest interface is almost* always better. Could we write a tool to warn us when a valid narrower type exists?
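    A small sketch of why the narrower version pays off: the same function now writes to a file, a buffer or a network connection, and a test needs no real file. Unlike the slide, this version returns the error from fmt.Fprintln, which errcheck would otherwise flag; dumpToString is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// dumpTo writes the content to any io.Writer.
func dumpTo(w io.Writer) error {
	_, err := fmt.Fprintln(w, "some content")
	return err
}

// dumpToString captures dumpTo's output in memory via bytes.Buffer,
// which is an io.Writer but not an *os.File.
func dumpToString() (string, error) {
	var buf bytes.Buffer
	if err := dumpTo(&buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// *os.File still satisfies io.Writer, so existing callers keep working.
	if err := dumpTo(os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```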
  31. Algorithm

    For every parameter in the function, find all the usages of the parameter. If something other than a method is used, we need the concrete type; otherwise, the parameter should be an interface with only the used methods. All of this can be achieved with go/types.
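    A sketch that applies this algorithm literally with go/types, under simplifying assumptions: it analyzes a single self-contained file, records the methods selected on each parameter, and marks any other use (field access, passing as an argument, returning, indexing) as needing the concrete type. Building the actual interface declaration from the method names is left out, and paramUsages is a hypothetical name. Applied literally, the algorithm would mark conn in the third example of the next slide as concrete, because it is passed to fmt.Fprintln; handling that case means following the argument to the callee's parameter type.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// usage records how one parameter is used inside its function body.
type usage struct {
	Methods  map[string]bool // methods selected on the parameter
	Concrete bool            // any other use: the concrete type is needed
}

// paramUsages returns the usage of every parameter of every function in
// src, keyed by "function.parameter".
func paramUsages(src string) (map[string]*usage, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs:       map[*ast.Ident]types.Object{},
		Uses:       map[*ast.Ident]types.Object{},
		Selections: map[*ast.SelectorExpr]*types.Selection{},
	}
	if _, err := (&types.Config{}).Check("p", fset, []*ast.File{f}, info); err != nil {
		return nil, err
	}
	result := map[string]*usage{}
	for _, decl := range f.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if !ok || fd.Body == nil {
			continue
		}
		params := map[types.Object]*usage{}
		for _, field := range fd.Type.Params.List {
			for _, id := range field.Names {
				if id.Name == "_" {
					continue
				}
				u := &usage{Methods: map[string]bool{}}
				params[info.Defs[id]] = u
				result[fd.Name.Name+"."+id.Name] = u
			}
		}
		ast.Inspect(fd.Body, func(n ast.Node) bool {
			switch n := n.(type) {
			case *ast.SelectorExpr: // p.X: is X a method or a field?
				id, ok := n.X.(*ast.Ident)
				if !ok || params[info.Uses[id]] == nil {
					return true
				}
				u := params[info.Uses[id]]
				if s := info.Selections[n]; s != nil && s.Kind() == types.MethodVal {
					u.Methods[n.Sel.Name] = true
				} else {
					u.Concrete = true // field access needs the struct type
				}
				return false // the parameter itself has been accounted for
			case *ast.Ident: // any other reference to the parameter
				if u := params[info.Uses[n]]; u != nil {
					u.Concrete = true
				}
			}
			return true
		})
	}
	return result, nil
}

const sample = `package p

type File struct{ name string }

func (File) Write(p []byte) (int, error) { return len(p), nil }
func (File) Close() error                { return nil }

func save(f *File, data []byte) error {
	if _, err := f.Write(data); err != nil {
		return err
	}
	return f.Close()
}

func name(f *File) string { return f.name }
`

func main() {
	usages, err := paramUsages(sample)
	if err != nil {
		panic(err)
	}
	for _, key := range []string{"save.f", "save.data", "name.f"} {
		fmt.Printf("%s: concrete=%v methods=%v\n", key, usages[key].Concrete, usages[key].Methods)
	}
}
```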
  32. Examples

    A field is accessed, so a struct is needed:

        type T struct {
            N int
        }

        func foo(v *T) {
            fmt.Println(v.N)
        }

    Indexing is used, so a slice or array is needed:

        func first(vs []int) int {
            return vs[0]
        }

    Only Write is used, so an io.Writer is great:

        func handle(conn net.Conn) {
            fmt.Fprintln(conn, "hello")
        }
  33. Testing our hypothesis

    Now we have a tool to enforce our best practice: run the tool on all the code you can find, find the violations of the best practice, and fix them?
  34. Is narrower always better?

    There are other constraints we need to take into account. Should w be an io.Writer here?

        func helloHandler(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "Hello, %v", r.FormValue("name"))
        }

    What if it is used somewhere as an http.HandlerFunc?

        func init() {
            http.HandleFunc("/", helloHandler)
        }

    We need to find all the places where helloHandler is called or referenced.
  35. Narrower is not always better

    Again: should w be an io.Writer here?

        type Handler struct{}

        func (h Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "Hello, %v", r.FormValue("name"))
        }

    What if a Handler is used as an http.Handler?

        func init() {
            var h Handler
            http.Handle("/", h)
        }

    We need to find all the places where a Handler is used, and all the interfaces that Handler satisfies.
  36. Is this doable?

    Finding all the places where a function is called, all the places where a given type is used, and all the interfaces that a type satisfies: the Go oracle and go/types provide all these features.
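    For the last item, go/types can answer the question within one package: types.Implements reports whether a type, or a pointer to it, has all the methods of an interface. A sketch under that restriction; finding every interface in the whole program, such as http.Handler in the standard library, needs the whole-program view that tools like the oracle build. interfacesSatisfied is a hypothetical helper.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// interfacesSatisfied returns, in sorted order, the interfaces declared in
// src that the named type T satisfies, either as T or as *T.
func interfacesSatisfied(src, typeName string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	pkg, err := (&types.Config{}).Check("p", fset, []*ast.File{f}, nil)
	if err != nil {
		return nil, err
	}
	obj, ok := pkg.Scope().Lookup(typeName).(*types.TypeName)
	if !ok {
		return nil, fmt.Errorf("no type named %s", typeName)
	}
	t := obj.Type()
	var found []string
	for _, name := range pkg.Scope().Names() { // Names is already sorted
		tn, ok := pkg.Scope().Lookup(name).(*types.TypeName)
		if !ok || tn == obj {
			continue
		}
		iface, ok := tn.Type().Underlying().(*types.Interface)
		if !ok {
			continue
		}
		// Methods with pointer receivers are only in the method set of *T.
		if types.Implements(t, iface) || types.Implements(types.NewPointer(t), iface) {
			found = append(found, name)
		}
	}
	return found, nil
}

const sample = `package p

type Writer interface{ Write(p []byte) (int, error) }
type Closer interface{ Close() error }
type WriteCloser interface {
	Writer
	Closer
}

type Buffer struct{ data []byte }

func (b *Buffer) Write(p []byte) (int, error) {
	b.data = append(b.data, p...)
	return len(p), nil
}
`

func main() {
	names, err := interfacesSatisfied(sample, "Buffer")
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [Writer]
}
```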
  37. Static analysis as a test for best practices

    Best practices are described in natural language. Translate them into programs that enforce them. Run them on your code base and see them fail, then either improve your code or improve your best practice with more context.