
Analyzing Go Programs: no reading required

Go is a simple language: both its grammar and its type system are easy to explain and express. This influences the tooling one can develop, and this talk concentrates on what information you can extract from a program without any human reading the code at any point.

Presented at Øredev 2015
http://oredev.org/2015/sessions/analyzing-go-programs-no-reading-required

Francesc Campoy Flores

November 05, 2015

Transcript

  1. Program Analysis

    Given a program, we extract properties:
    - correctness
    - robustness
    - performance
    - others (well formatted, idiomatic, ...)
    This is done automatically; no human is involved.
  2. Dynamic Analysis

    We observe the behavior of the running program:
    - often it requires instrumenting the program
    - it doesn't prove a property, it looks for failures
  3. Dynamic analysis: unit tests

    Verify the correctness of a function.

        func Sum(vs []int) int {
            s := 0
            for _, v := range vs {
                s += v
            }
            return s
        }
  4. Dynamic analysis: unit tests

    The verification is done with more code, no instrumentation needed.

        func TestSum(t *testing.T) {
            vs := []int{1, 2, 3}
            s := Sum(vs)
            if s != 6 {
                t.Errorf("sum(%v) should be 6; got %v", vs, s)
            }
        }

    You can run tests:

        $ go test
        PASS
        ok      golang.org/x/talks/2015/program-analysis/sum    0.019s
  5. Dynamic analysis: benchmarks

    Benchmarks can be used to ...

        func BenchmarkSum1(b *testing.B) {
            for i := 0; i < b.N; i++ {
                s = Sum([]int{1})
            }
        }

        func BenchmarkSum10(b *testing.B) {
            for i := 0; i < b.N; i++ {
                s = Sum([]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
            }
        }

    You can run benchmarks:

        $ go test -bench=.
        PASS
        BenchmarkSum1-4     500000000    3.53 ns/op
        BenchmarkSum10-4    100000000    11.3 ns/op
        ok      golang.org/x/talks/2015/program-analysis/sum    3.290s
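
Benchmarks normally run through go test, but the standard library's testing.Benchmark function can also drive one from an ordinary program. The sketch below assumes that setup. The package-level variable sink is a hypothetical name standing in for the slide's s, and the figures it prints will vary by machine.

```go
package main

import (
	"fmt"
	"testing"
)

// Sum adds all the values in vs.
func Sum(vs []int) int {
	s := 0
	for _, v := range vs {
		s += v
	}
	return s
}

// sink receives every result so the compiler cannot drop the calls to Sum.
var sink int

func main() {
	vs := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	// testing.Benchmark chooses b.N so the loop runs long enough to be timed.
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = Sum(vs)
		}
	})
	fmt.Println(r) // iterations and ns/op; values depend on the machine
}
```
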
  6. Dynamic analysis: bound checks

    Go programs are instrumented to detect accesses to invalid positions in a slice.

        func Sum(vs []int) int {
            s := 0
            for i := 0; i <= len(vs); i++ {
                s += vs[i]
            }
            return s
        }

        func main() {
            v := []int{1, 2, 3, 4}
            fmt.Println(Sum(v[:3]))
        }

        $ go run bounds.go
        panic: runtime error: index out of range

    We can disable bounds checking with gcflags=-B
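
The panic comes from the loop condition i <= len(vs), which reads one element past the end of the slice. The slice v[:3] still has capacity 4, but the check is made against its length. A minimal corrected sketch, changing only the loop condition:

```go
package main

import "fmt"

// Sum adds the elements of vs; the loop stops at index len(vs)-1.
func Sum(vs []int) int {
	s := 0
	for i := 0; i < len(vs); i++ {
		s += vs[i]
	}
	return s
}

func main() {
	v := []int{1, 2, 3, 4}
	fmt.Println(Sum(v[:3])) // 6
}
```
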
  7. Dynamic analysis: race detector

    All memory accesses are instrumented to detect data races.

    Is this code correct?

        go test -race race.go

        func main() {
            n := 0
            go func() {
                for range time.Tick(time.Second) {
                    n++
                }
            }()
            for range time.Tick(time.Second / 5) {
                fmt.Println(n)
            }
        }
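
The code on this slide has a data race: one goroutine writes n while main reads it, with no synchronization between them. One possible fix is sketched below with sync.Mutex. The counter type and its method names are illustrative, and the loop is bounded so the program terminates.

```go
package main

import (
	"fmt"
	"sync"
)

// counter protects n with a mutex so reads and writes never overlap.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc adds one to the counter.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// get returns the current value.
func (c *counter) get() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.get()) // 100
}
```
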
  8. Concurrent prime sieve

    Using channels and goroutines, by Russ Cox.
    - Generate generates all the numbers starting from 2
    - Filter filters out all the numbers that are multiples of a given value
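
A sketch of the sieve, closely following the well-known example from the Go distribution. The primes helper and the choice to stop after n primes are additions for illustration. The goroutines are simply abandoned when the program exits.

```go
package main

import "fmt"

// Generate sends 2, 3, 4, ... on ch.
func Generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// Filter copies values from in to out, dropping multiples of prime.
func Filter(in <-chan int, out chan<- int, prime int) {
	for {
		i := <-in
		if i%prime != 0 {
			out <- i
		}
	}
}

// primes returns the first n primes by chaining one Filter per prime found.
func primes(n int) []int {
	ch := make(chan int)
	go Generate(ch)
	var ps []int
	for i := 0; i < n; i++ {
		p := <-ch // the first value to survive every filter is prime
		ps = append(ps, p)
		next := make(chan int)
		go Filter(ch, next, p)
		ch = next
	}
	return ps
}

func main() {
	fmt.Println(primes(10)) // [2 3 5 7 11 13 17 19 23 29]
}
```
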
  9. Dynamic analysis: pprof

    Analyzing the code for the concurrent prime sieve

        func main() {
            c, err := os.Create("primes.prof")
            if err != nil {
                log.Fatal(err)
            }
            pprof.StartCPUProfile(c)
            defer pprof.StopCPUProfile()
            run()
        }

    Execute the program first, then generate a graph

        $ go tool pprof --svg pprof primes.prof > primes.svg

    Similar analysis can be done on memory and locks.
  10. Dynamic analysis: execution tracer

    Captures with nanosecond precision:
    - goroutine creation/start/end
    - goroutine blocking/unblocking
    - network blocking
    - system calls
    - GC events
  11. Dynamic analysis: execution tracer

    Run a test with -trace

        $ go test -trace=trace.out

    And start a tracer on it

        $ go tool trace trace.test trace.out
  12. Dynamic analysis conclusion

    Pros:
    - often conceptually simple
    - finds only issues that are ACTUALLY occurring
    Cons:
    - often makes your program slower
    - finds ONLY issues that are actually occurring
  13. Static Analysis

    The program is not executed; instead the code is analyzed.
    - no need to instrument any code
    - unlike dynamic analysis, it can prove specific properties
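
As a small illustration of analyzing code without running it, the sketch below uses the standard go/parser and go/ast packages to list the functions declared in a source string. The funcNames helper and the embedded source are illustrative only.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is the program under analysis; it is parsed, never executed.
const src = `package sum

func Sum(vs []int) int {
	s := 0
	for _, v := range vs {
		s += v
	}
	return s
}
`

// funcNames parses src and returns the names of its function declarations.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "sum.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	// Walk the syntax tree and record every function declaration.
	ast.Inspect(f, func(n ast.Node) bool {
		if fd, ok := n.(*ast.FuncDecl); ok {
			names = append(names, fd.Name.Name)
		}
		return true
	})
	return names, nil
}

func main() {
	names, err := funcNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [Sum]
}
```
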
  14. The usual suspects

    Some go*:
    - gofmt: verify formatting conventions
          import ("math"; "fmt"; "io")
    - golint: other conventions (naming, docs, etc)
          var some_value int
    - go vet: find possible errors
          fmt.Printf("%v + %v = %v", a, b)
    - godoc: find exported identifiers and their docs
  15. Errcheck

    Finds all the errors that have been implicitly ignored.

        func main() {
            os.Remove("somefile") // this is reported
        }

    Errors ignored explicitly are not reported.

        func main() {
            _ = os.Remove("somefile") // this is fine
        }

    github.com/kisielk/errcheck
  16. goimports

    - Finds unused imported packages and removes them
    - Finds missing imported packages and looks them up in your GOPATH
      by matching the name of the package and the name of the identifier used
    Limitation: what package defines template.Template?
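
Both text/template and html/template in the standard library declare template.Template, so the package name alone does not identify which one is meant. Writing the import by hand removes the ambiguity. A sketch using text/template, where the render helper is illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template" // html/template also declares template.Template
)

// render executes a tiny template with name as its data.
func render(name string) (string, error) {
	t, err := template.New("greet").Parse("Hello, {{.}}!")
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, name); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	s, err := render("gopher")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // Hello, gopher!
}
```
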
  17. goreturns

    Finds return statements where fewer values than expected are returned and the last type returned is an error, and adds the missing zero values.

    goreturns turns:

        func foo() (int, bool, string, error) {
            return fmt.Errorf("awful stuff")
        }

    into:

        func foo() (int, bool, string, error) {
            return 0, false, "", fmt.Errorf("awful stuff")
        }

    github.com/sqs/goreturns
  18. A bit harder

    What's the value returned by this function?

        func foo(n int) interface{} {
            switch {
            case n < 0:
                return nil
            case n%2 == 0:
                return "even"
            case n == 42:
                return 42.0
            default:
                return false
            }
        }

    We want to understand more than types: we want to follow the flow of a program.
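
Running the function on a few inputs shows what flow analysis would have to discover: the dynamic type of the result depends on n, and the n == 42 case can never be reached, because 42 is even and the earlier case matches first.

```go
package main

import "fmt"

// foo is the function from the slide, unchanged.
func foo(n int) interface{} {
	switch {
	case n < 0:
		return nil
	case n%2 == 0:
		return "even"
	case n == 42: // unreachable: 42 already matched n%2 == 0
		return 42.0
	default:
		return false
	}
}

func main() {
	for _, n := range []int{-1, 2, 42, 7} {
		v := foo(n)
		fmt.Printf("foo(%d) = %v (%T)\n", n, v, v)
	}
	// Output:
	// foo(-1) = <nil> (<nil>)
	// foo(2) = even (string)
	// foo(42) = even (string)
	// foo(7) = false (bool)
}
```
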
  19. SSA IR

    Static Single Assignment Intermediate Representation
    - Intermediate representation for code
    - Every variable is assigned at most once
    - Data flow analysis is easier based on this form
    Data flow analysis determines the possible values for a variable at some point and, therefore, the possible ways some code can be executed.
  20. SSA

    "A program is defined to be in SSA form if each variable is a target of exactly one assignment statement in the program text."

    So given a piece of code like this:

        a := 0
        b := 1
        b = a + b

    We can convert it to:

        a1 := 0
        b1 := 1
        b2 := a1 + b1
  21. The importance of naming

    Variables in Go have the same issue

        x := 1
        y := x + 1
        x = 2
        z := x + 1

    What's the value of x? Are y and z equal?
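
Renaming each assignment in this snippet, as SSA does, makes the answer visible. A small sketch, where x1 and x2 are the renamed versions of x and ssaForm is a hypothetical wrapper:

```go
package main

import "fmt"

// ssaForm mirrors the snippet with one name per assignment to x.
func ssaForm() (y, z int) {
	x1 := 1
	y = x1 + 1
	x2 := 2
	z = x2 + 1
	return y, z
}

func main() {
	y, z := ssaForm()
	fmt.Println(y, z, y == z) // 2 3 false
}
```

With distinct names it is clear that y depends on x1 and z on x2, so the two expressions that both read as x + 1 in the original are not equal.
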
  22. Referential transparency

    SSA enforces one condition: one definition for each variable, so a variable's value won't ever change.
    Similar to enforcing:
    - const in C++
    - final in Java
    - but not const in Go
  23. Referential transparency

    The value of a variable is independent of its position.
    Referentially transparent expressions are independent of the order of evaluation.
  24. golang.org/x/tools/go/ssa

    SSA IR is an intermediate representation
    - golang.org/x/tools/go/ssa provides the building blocks
    - golang.org/x/tools/cmd/ssadump provides a tool to display SSA forms of Go programs
  25. ssadump

    Given this factorial function:

        func fact(x int) int {
            if x == 0 {
                return 1
            }
            return x * fact(x-1)
        }

    We can generate its SSA dump by running

        ssadump -build=F fact.go
  26. ssadump

        # Name: fact.fact
        # Package: fact
        # Location: fact.go:3:6
        func fact(x int) int:
        0:                                    entry P:0 S:2
                t0 = x == 0:int                    bool
                if t0 goto 1 else 2
        1:                                  if.then P:1 S:0
                return 1:int
        2:                                  if.done P:1 S:0
                t1 = x - 1:int                      int
                t2 = fact(t1)                       int
                t3 = x * t2                         int
                return t3
  27. SSA

    New SSA backend planned for 1.6
    Some of the expected improvements:
    - better common subexpression elimination
    - better dead code elimination
    - better register allocation
    - better stack frame allocation
  28. Oracle

    - Source analysis tool invoked by an editor
    - answers questions about Go programs
    Powered by:
    - SSA IR
    - Pointer Analysis
  29. Pointer analysis on godoc

    godoc -analysis=pointer (live)
  30. Conclusion

    Program analysis:
    - dynamic
    - static
    Provides tools for:
    - verification
    - editing
    - exploration
    Use the tools that exist and build the ones that you want!