Program Analysis

No reading required

Francesc Campoy
Gopher Developer Advocate at Google

Agenda
- Program analysis
- Dynamic
- Static
- A vision

Program Analysis
Given a program, we extract properties about its:
- correctness
- robustness
- performance
- other qualities (well formatted, idiomatic, ...)
This is done automatically; no human is involved.

Program analysis families
There are two families: dynamic and static.

Dynamic Analysis

Dynamic Analysis
We observe the behavior of the running program.
- It often requires instrumenting the program.
- It doesn't prove a property; it looks for failures.
See Dmitry Vyukov's talk "Go Dynamic Tools" (YouTube).

Dynamic analysis: unit tests
Verify the correctness of a function.

    func Sum(vs []int) int {
        s := 0
        for _, v := range vs {
            s += v
        }
        return s
    }

The verification is done with more code; no instrumentation is needed.

    import "testing"

    func TestSum(t *testing.T) {
        vs := []int{1, 2, 3}
        s := Sum(vs)
        if s != 6 {
            t.Errorf("sum(%v) should be 6; got %v", vs, s)
        }
    }

Dynamic analysis: bounds checks
Go programs are instrumented to detect accesses to invalid positions in a slice.

    func Sum(vs []int) int {
        s := 0
        for i := 0; i <= len(vs); i++ { // bug: <= reads one element past the end
            s += vs[i]
        }
        return s
    }

    func main() {
        v := []int{1, 2, 3, 4}
        fmt.Println(Sum(v[:3]))
    }

    $ go run bounds.go
    panic: runtime error: index out of range

We can disable bounds checking with -gcflags=-B; the invalid access then goes undetected:

    $ go run -gcflags=-B bounds.go

Dynamic analysis: race detector
All memory accesses are instrumented to detect data races. Is this code correct?

    $ go run -race race.go

    func main() {
        n := 0
        go func() {
            for range time.Tick(time.Second) {
                n++
            }
        }()
        for range time.Tick(time.Second / 5) {
            fmt.Println(n)
        }
    }
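The answer is no: the goroutine increments n while main reads it without any synchronization, which is exactly the kind of access the race detector reports. Below is a minimal race-free sketch, assuming sync/atomic is an acceptable fix (a mutex or a channel would work as well). Observe is a hypothetical helper; unlike the slide's program, it stops its goroutine and takes a bounded number of samples so that it terminates.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Observe increments a shared counter every inc in one goroutine and samples
// it every read in the caller, returning the first `samples` readings.
// All accesses to n go through sync/atomic, so -race has nothing to report.
func Observe(samples int, inc, read time.Duration) []int64 {
	var n int64
	stop := make(chan struct{})
	defer close(stop) // tells the incrementing goroutine to exit

	go func() {
		incTicker := time.NewTicker(inc)
		defer incTicker.Stop()
		for {
			select {
			case <-incTicker.C:
				atomic.AddInt64(&n, 1)
			case <-stop:
				return
			}
		}
	}()

	readTicker := time.NewTicker(read)
	defer readTicker.Stop()
	seen := make([]int64, 0, samples)
	for len(seen) < samples {
		<-readTicker.C
		seen = append(seen, atomic.LoadInt64(&n))
	}
	return seen
}

func main() {
	for _, v := range Observe(10, time.Second, time.Second/5) {
		fmt.Println(v)
	}
}
```

Using time.NewTicker instead of time.Tick lets the sketch stop its tickers once it is done.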

Sieve of Eratosthenes
A method to obtain the list of prime numbers (from Wikipedia).
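For reference, a minimal sequential sketch of the sieve (Sieve is an illustrative name): each prime p crosses out its multiples, starting from p*p, since smaller multiples were already crossed out by smaller primes.

```go
package main

import "fmt"

// Sieve returns all the primes <= limit using the sieve of Eratosthenes.
func Sieve(limit int) []int {
	if limit < 2 {
		return nil
	}
	composite := make([]bool, limit+1) // composite[i] is true once i is crossed out
	var primes []int
	for p := 2; p <= limit; p++ {
		if composite[p] {
			continue
		}
		primes = append(primes, p)
		for m := p * p; m <= limit; m += p {
			composite[m] = true
		}
	}
	return primes
}

func main() {
	fmt.Println(Sieve(30)) // [2 3 5 7 11 13 17 19 23 29]
}
```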

Concurrent prime sieve
Using channels and goroutines, by Russ Cox:
- Generate generates all the numbers starting from 2.
- Filter filters out all the numbers that are multiples of a given value.
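A sketch of the concurrent sieve along the lines of the classic Go example: Generate and Filter follow the description above, while FirstPrimes is a hypothetical wrapper that collects the first n primes. As in the original demo, the goroutines are never stopped; they simply block once nobody reads from them.

```go
package main

import "fmt"

// Generate sends 2, 3, 4, ... on ch, forever.
func Generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// Filter copies values from in to out, dropping multiples of prime.
func Filter(in <-chan int, out chan<- int, prime int) {
	for {
		i := <-in
		if i%prime != 0 {
			out <- i
		}
	}
}

// FirstPrimes chains one Filter goroutine per prime found so far:
// the first value that survives all filters is the next prime.
func FirstPrimes(n int) []int {
	ch := make(chan int)
	go Generate(ch)
	primes := make([]int, 0, n)
	for len(primes) < n {
		prime := <-ch
		primes = append(primes, prime)
		next := make(chan int)
		go Filter(ch, next, prime)
		ch = next
	}
	return primes
}

func main() {
	fmt.Println(FirstPrimes(10)) // [2 3 5 7 11 13 17 19 23 29]
}
```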

Dynamic analysis: pprof
Analyzing the code for the concurrent prime sieve (pprof here is runtime/pprof):

    func main() {
        c, err := os.Create("primes.prof")
        if err != nil {
            log.Fatal(err)
        }
        if err := pprof.StartCPUProfile(c); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()
        run() // runs the concurrent prime sieve
    }

Execute the program first, then generate a graph (the first argument, pprof, is the compiled program binary):

    $ go tool pprof --svg pprof primes.prof > primes.svg

Similar analysis can be done on memory and locks.
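A sketch of the memory side, assuming in-memory buffers instead of files: ProfileRun (a hypothetical helper) records a CPU profile around some work with runtime/pprof, then writes a heap profile with pprof.WriteHeapProfile. Either buffer can be saved to a file and passed to go tool pprof. Lock contention has its own profiles ("mutex" and "block"), enabled with runtime.SetMutexProfileFraction and runtime.SetBlockProfileRate.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// ProfileRun runs work under the CPU profiler, then takes a heap profile.
// Both results are in pprof's format and can be read by go tool pprof.
func ProfileRun(work func()) (cpu, heap []byte, err error) {
	var cpuBuf, heapBuf bytes.Buffer
	if err = pprof.StartCPUProfile(&cpuBuf); err != nil {
		return nil, nil, err // e.g. CPU profiling is already enabled
	}
	work()
	pprof.StopCPUProfile() // returns once the CPU profile is written to cpuBuf

	runtime.GC() // the heap profile reflects the most recently completed GC
	if err = pprof.WriteHeapProfile(&heapBuf); err != nil {
		return nil, nil, err
	}
	return cpuBuf.Bytes(), heapBuf.Bytes(), nil
}

func main() {
	cpu, heap, err := ProfileRun(func() {
		var blocks [][]int
		for i := 0; i < 1000; i++ {
			blocks = append(blocks, make([]int, 1000))
		}
		fmt.Println("allocated", len(blocks), "blocks")
	})
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Println("CPU profile bytes:", len(cpu), "heap profile bytes:", len(heap))
}
```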

Dynamic analysis: execution tracer
Captures with nanosecond precision:
- goroutine creation/start/end
- goroutine blocking/unblocking
- network blocking
- system calls
- GC events

Dynamic analysis: execution tracer
Run a test with -trace:

    $ go test -trace=trace.out

Then open the trace viewer on it:

    $ go tool trace trace.test trace.out
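Tracing does not require a test: the runtime/trace package can record a trace from any program. A minimal sketch, where TraceRun is a hypothetical helper that traces a function into memory; the file written by main can then be opened with go tool trace (recent Go versions need only the trace file, not the binary).

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/trace"
)

// TraceRun records an execution trace of work into memory.
func TraceRun(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err // e.g. tracing is already enabled
	}
	work()
	trace.Stop() // returns once all trace data has been written to buf
	return buf.Bytes(), nil
}

func main() {
	data, err := TraceRun(func() {
		done := make(chan bool)
		go func() { done <- true }() // a goroutine creation and a blocking receive to trace
		<-done
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Inspect afterwards with: go tool trace trace.out
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```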

Figure: trace of all the program activity

Figure: trace for a goroutine

Dynamic analysis: others
Other dynamic analysis tools:
- debuggers
- code coverage
- what else?

Dynamic analysis conclusion
Pros:
- often conceptually simple
- finds only issues that are ACTUALLY occurring
Cons:
- often makes your program slower
- finds ONLY issues that are actually occurring

Static Analysis

Static Analysis
The program is not executed; instead, the code is analyzed.
- No need to instrument any code.
- Unlike dynamic analysis, it can prove specific properties.

The usual suspects
Some go* tools:

gofmt: verify formatting conventions

    import ("math"; "fmt"; "io")

golint: other conventions (naming, docs, etc.)

    var some_value int

go vet: find possible errors (here, three verbs but only two arguments)

    fmt.Printf("%v + %v = %v", a, b)

godoc: find exported identifiers and their docs
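The formatting check that gofmt performs is also available as a library. A sketch using go/format.Source, with IsGofmted as a hypothetical helper: a file is well formatted when formatting it changes nothing, which is essentially what gofmt -l checks for each file.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// messy is the slide's import line wrapped in a minimal file.
const messy = `package p
import ("math"; "fmt"; "io")
`

// IsGofmted reports whether src is already in gofmt's canonical form
// and also returns the formatted source.
func IsGofmted(src []byte) (bool, []byte, error) {
	out, err := format.Source(src) // fails only if src does not parse
	if err != nil {
		return false, nil, err
	}
	return bytes.Equal(src, out), out, nil
}

func main() {
	ok, fixed, err := IsGofmted([]byte(messy))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("already formatted:", ok)
	fmt.Print(string(fixed))
}
```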

Errcheck
Finds all the errors that have been implicitly ignored.

    func main() {
        os.Remove("somefile") // this is reported
    }

Errors ignored explicitly are not reported.

    func main() {
        _ = os.Remove("somefile") // this is fine
    }

github.com/kisielk/errcheck
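The core of such a check can be sketched with go/types: type-check the file, then look at every call used as a statement and see whether its result type is, or ends with, error. This is a simplified, hypothetical UncheckedErrors, not errcheck's implementation: it ignores defer and go statements, and it configures no importer, so the input must not import packages (the slide's os.Remove is replaced by a local remove function).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// errorType is the predeclared error interface.
var errorType = types.Universe.Lookup("error").Type()

// returnsError reports whether a call's result is error, or a tuple of
// results whose last element is error.
func returnsError(t types.Type) bool {
	if tup, ok := t.(*types.Tuple); ok { // zero or several results
		return tup.Len() > 0 && types.Identical(tup.At(tup.Len()-1).Type(), errorType)
	}
	return t != nil && types.Identical(t, errorType)
}

// UncheckedErrors returns the line numbers of call statements whose error
// result is silently dropped. Explicit `_ = f()` is an assignment, not reported.
func UncheckedErrors(src string) ([]int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{Types: make(map[ast.Expr]types.TypeAndValue)}
	var conf types.Config // no Importer: the input must be import-free
	if _, err := conf.Check("input", fset, []*ast.File{f}, info); err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(f, func(n ast.Node) bool {
		if stmt, ok := n.(*ast.ExprStmt); ok {
			if call, ok := stmt.X.(*ast.CallExpr); ok && returnsError(info.Types[call].Type) {
				lines = append(lines, fset.Position(call.Pos()).Line)
			}
		}
		return true
	})
	return lines, nil
}

const demo = `package p

func remove(name string) error { return nil }

func main() {
	remove("somefile")
	_ = remove("somefile")
}
`

func main() {
	fmt.Println(UncheckedErrors(demo)) // [6] <nil>
}
```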

Analysis of "almost-correct" code
Impossible to do with dynamic analysis: code that does not compile cannot be run.
Useful for "lazy" code authors (like me).

goimports
- Finds unused imported packages and removes them.
- Finds missing imported packages and looks for them in your GOPATH, by matching the name of the package with the name of the identifier used.
Limitation: what package defines template.Template? (Both text/template and html/template do.)
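The first half of goimports needs only the syntax tree. A hypothetical UnusedImports sketch with go/parser and go/ast: collect every identifier used as pkg in a pkg.Name selector, then report the imports whose name never shows up. It assumes the package name equals the last element of the import path (not always true, for example with versioned paths), and a local variable shadowing a package name would fool it; the real tool is more careful.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// UnusedImports returns the import paths whose package name is never used
// as the X in an X.Sel expression.
func UnusedImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	used := make(map[string]bool)
	ast.Inspect(f, func(n ast.Node) bool {
		if sel, ok := n.(*ast.SelectorExpr); ok {
			if id, ok := sel.X.(*ast.Ident); ok {
				used[id.Name] = true
			}
		}
		return true
	})
	var unused []string
	for _, imp := range f.Imports {
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		name := path.Base(p) // assumption: package name == last path element
		if imp.Name != nil {
			name = imp.Name.Name // explicitly renamed import
		}
		if name == "_" || name == "." {
			continue // side-effect and dot imports are never reported
		}
		if !used[name] {
			unused = append(unused, p)
		}
	}
	return unused, nil
}

const demo = `package main

import (
	"fmt"
	"io"
	"math"
)

func main() { fmt.Println(math.Pi) }
`

func main() {
	fmt.Println(UnusedImports(demo)) // [io] <nil>
}
```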

goreturns
Finds return statements that return fewer values than expected, where the last value returned is an error, and adds the missing zero values.
goreturns turns:

    func foo() (int, bool, string, error) {
        return fmt.Errorf("awful stuff")
    }

into:

    func foo() (int, bool, string, error) {
        return 0, false, "", fmt.Errorf("awful stuff")
    }

github.com/sqs/goreturns
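The information such a tool needs, the function's result types, is available from go/types even when the short return statement is a type error. A hypothetical FillReturn sketch: it reads the signature and builds zero values for every result except the final error. ZeroValue only covers basic and nil-able types; structs, arrays and type parameters are reported as unsupported.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// ZeroValue returns source text for the zero value of t, when it is simple.
func ZeroValue(t types.Type) (string, bool) {
	if _, isTypeParam := t.(*types.TypeParam); isTypeParam {
		return "", false
	}
	switch u := t.Underlying().(type) {
	case *types.Basic:
		switch {
		case u.Info()&types.IsBoolean != 0:
			return "false", true
		case u.Info()&types.IsNumeric != 0:
			return "0", true
		case u.Info()&types.IsString != 0:
			return `""`, true
		case u.Kind() == types.UnsafePointer:
			return "nil", true
		}
	case *types.Pointer, *types.Slice, *types.Map, *types.Chan, *types.Signature, *types.Interface:
		return "nil", true
	}
	return "", false // structs and arrays would need a composite literal
}

// FillReturn builds the completed return statement for funcName, whose last
// result must be an error; lastExpr becomes that error value.
func FillReturn(src, funcName, lastExpr string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return "", err
	}
	// The short return is a type error; ignore errors, the signature is still checked.
	conf := types.Config{Error: func(error) {}}
	pkg, _ := conf.Check("input", fset, []*ast.File{f}, nil)
	fn, ok := pkg.Scope().Lookup(funcName).(*types.Func)
	if !ok {
		return "", fmt.Errorf("function %s not found", funcName)
	}
	results := fn.Type().(*types.Signature).Results()
	n := results.Len()
	if n == 0 || !types.Identical(results.At(n-1).Type(), types.Universe.Lookup("error").Type()) {
		return "", fmt.Errorf("%s does not return an error last", funcName)
	}
	vals := make([]string, 0, n)
	for i := 0; i < n-1; i++ {
		z, ok := ZeroValue(results.At(i).Type())
		if !ok {
			return "", fmt.Errorf("no simple zero value for %s", results.At(i).Type())
		}
		vals = append(vals, z)
	}
	vals = append(vals, lastExpr)
	return "return " + strings.Join(vals, ", "), nil
}

const demo = `package p

func foo(err error) (int, bool, string, error) {
	return err
}
`

func main() {
	fmt.Println(FillReturn(demo, "foo", "err")) // return 0, false, "", err <nil>
}
```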

govars
Finds local variables that have been declared but not used, and removes them, comments them out, or adds a usage for them (e.g. _ = foo).
This doesn't exist AFAIK, but if you want to write it, it shouldn't be hard.
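A possible starting point, sketched with go/types: every local variable has an entry in types.Info.Defs, and every reference to it appears in types.Info.Uses, so a variable that is defined but never used shows up only in Defs. UnusedLocals is hypothetical and deliberately limited: it only looks at := and var declarations inside function bodies, and since go/types records the left side of a plain assignment such as x = 2 in Uses, it is more lenient than the compiler.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// UnusedLocals lists local variables declared with := or var inside function
// bodies that are never referenced. The input must not import packages.
func UnusedLocals(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs: make(map[*ast.Ident]types.Object),
		Uses: make(map[*ast.Ident]types.Object),
	}
	// The checker itself rejects unused variables; a no-op Error handler
	// makes it keep going so that Defs and Uses are still filled in.
	conf := types.Config{Error: func(error) {}}
	_, _ = conf.Check("input", fset, []*ast.File{f}, info)

	used := make(map[types.Object]bool)
	for _, obj := range info.Uses {
		used[obj] = true
	}
	var unused []string
	check := func(id *ast.Ident) {
		obj, isVar := info.Defs[id].(*types.Var) // unused constants are legal
		if isVar && id.Name != "_" && !used[obj] {
			unused = append(unused, id.Name)
		}
	}
	for _, decl := range f.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			switch n := n.(type) {
			case *ast.AssignStmt: // x := ...
				if n.Tok == token.DEFINE {
					for _, lhs := range n.Lhs {
						if id, ok := lhs.(*ast.Ident); ok {
							check(id)
						}
					}
				}
			case *ast.ValueSpec: // var x T (and const specs, filtered in check)
				for _, id := range n.Names {
					check(id)
				}
			}
			return true
		})
	}
	return unused, nil
}

const demo = `package p

func f(n int) int {
	a := 1
	b, c := 2, 3
	var d int
	const k = 4
	return n + b
}
`

func main() {
	fmt.Println(UnusedLocals(demo)) // [a c d] <nil>
}
```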

Let's recapitulate

Packages for static analysis
There are different levels of analysis.

Packages for static analysis
The standard library offers packages for each level. Many tools can be built atop this handful of packages, but not all of them, or at least not easily.

Limitations
What's the value returned by this function?

    func foo(n int) interface{} {
        switch {
        case n < 0:
            return nil
        case n%2 == 0:
            return "even"
        case n == 42:
            return 42.0
        default:
            return false
        }
    }

We want to understand more than types: we want to follow the flow of a program.

Static Single Assignment Intermediate Representation

SSA IR

SSA IR
Static Single Assignment Intermediate Representation
- An intermediate representation for code.
- Every variable is assigned exactly once in the program text.
- Data-flow analysis is easier on this form.
Data-flow analysis determines the possible values of a variable at some point, and therefore the possible ways some code can be executed.

SSA
"A program is defined to be in SSA form if each variable is a target of exactly one assignment statement in the program text."
So given a piece of code like this:

    a := 0
    b := 1
    b = a + b

We can convert it to:

    a1 := 0
    b1 := 1
    b2 := a1 + b1
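The renaming step for straight-line code can be sketched in a few lines. Assign is a hypothetical, minimal statement type and ToSSA a hypothetical function: each assignment gets a fresh numbered name, and every use refers to the latest one. Code with branches or loops would also need φ (phi) functions to merge values coming from different paths, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Assign is a hypothetical statement: Target = Args[0] Op Args[1] ...
// Args are variable names or literals.
type Assign struct {
	Target string
	Op     string
	Args   []string
}

// ToSSA renames straight-line assignments so that every variable is the
// target of exactly one assignment.
func ToSSA(prog []Assign) []string {
	version := make(map[string]int) // latest version number of each variable
	var out []string
	for _, a := range prog {
		// Rewrite uses first, so "b = a + b" reads the previous b.
		args := make([]string, len(a.Args))
		for i, arg := range a.Args {
			if v, ok := version[arg]; ok {
				args[i] = arg + strconv.Itoa(v)
			} else {
				args[i] = arg // a literal, or a name never assigned here
			}
		}
		version[a.Target]++ // then give the target a fresh name
		rhs := strings.Join(args, " "+a.Op+" ")
		out = append(out, fmt.Sprintf("%s%d := %s", a.Target, version[a.Target], rhs))
	}
	return out
}

func main() {
	prog := []Assign{
		{Target: "a", Args: []string{"0"}},
		{Target: "b", Args: []string{"1"}},
		{Target: "b", Op: "+", Args: []string{"a", "b"}},
	}
	for _, line := range ToSSA(prog) {
		fmt.Println(line) // a1 := 0, then b1 := 1, then b2 := a1 + b1
	}
}
```

Applied to x := 1; y := x + 1; x = 2; z := x + 1, the same function yields x1 := 1, y1 := x1 + 1, x2 := 2, z1 := x2 + 1, which makes it obvious that y and z are computed from different values.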

The importance of naming

The importance of naming
Variables in Go have the same issue:

    x := 1
    y := x + 1
    x = 2
    z := x + 1

What's the value of x? Are y and z equal?

Referential transparency
SSA enforces one condition: one definition for each variable, so a variable's value never changes.
Similar to enforcing:
- const in C++
- final in Java
but not const in Go, which only declares compile-time constants.

Referential transparency
The value of a variable is independent of its position.
Known in functional programming as referential transparency.
Referentially transparent expressions are independent of the order of evaluation.

Representing Go programs in SSA IR

golang.org/x/tools/go/ssa
SSA IR is an intermediate representation.
- golang.org/x/tools/go/ssa provides the building blocks.
- golang.org/x/tools/cmd/ssadump provides a tool to display the SSA form of Go programs.

ssadump
Given this factorial function:

    func fact(x int) int {
        if x == 0 {
            return 1
        }
        return x * fact(x-1)
    }

We can generate its SSA dump by running:

    $ ssadump -build=F fact.go

ssadump

    # Name: fact.fact
    # Package: fact
    # Location: fact.go:3:6
    func fact(x int) int:
    0:                                        entry P:0 S:2
        t0 = x == 0:int                              bool
        if t0 goto 1 else 2
    1:                                      if.then P:1 S:0
        return 1:int
    2:                                      if.done P:1 S:0
        t1 = x - 1:int                                int
        t2 = fact(t1)                                 int
        t3 = x * t2                                   int
        return t3

SSA
New SSA backend planned for Go 1.6 (it eventually shipped in Go 1.7, initially for amd64 only). Some of the expected improvements:
- better common subexpression elimination
- better dead code elimination
- better register allocation
- better stack frame allocation

Tools enabled by SSA

Oracle
A source analysis tool, invoked from an editor, that answers questions about Go programs.
Powered by:
- SSA IR
- pointer analysis

Demo time!

Pointer analysis on godoc

    $ godoc -analysis=pointer

Live demo.

My vision

Static analysis as a test for best practices

Narrower
We all agree that:

    func dumpTo(f *os.File) {
        fmt.Fprintln(f, "some content")
    }

is better written as:

    func dumpTo(w io.Writer) {
        fmt.Fprintln(w, "some content")
    }

Using the narrowest interface is almost* always better.
Could we write a tool to warn us when a valid narrower type exists?

Algorithm
For every parameter in the function:
- find all the usages of the parameter
- if anything other than a method is used, we need a concrete type
- otherwise, the parameter should be an interface with only the used methods
All of this can be achieved with go/types.
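A sketch of this algorithm with go/types, under simplifying assumptions: NarrowerParams and methodOnlyUses are hypothetical names, the input is a single import-free file, and a parameter qualifies only when every use is a method selection. Passing the parameter to another function (as in fmt.Fprintln(f, ...)) counts as a non-method use here, although a fuller tool could inspect the callee's parameter type.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
	"strings"
)

// NarrowerParams returns, keyed by "func.param", the interface a parameter
// could be declared as when it is only ever used through its methods.
func NarrowerParams(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs:       make(map[*ast.Ident]types.Object),
		Uses:       make(map[*ast.Ident]types.Object),
		Selections: make(map[*ast.SelectorExpr]*types.Selection),
	}
	var conf types.Config // no Importer, so the input must not import packages
	pkg, err := conf.Check("input", fset, []*ast.File{f}, info)
	if err != nil {
		return nil, err
	}
	result := make(map[string]string)
	for _, decl := range f.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		for _, field := range fn.Type.Params.List {
			for _, name := range field.Names {
				param := info.Defs[name]
				if param == nil {
					continue
				}
				if _, isIface := param.Type().Underlying().(*types.Interface); isIface {
					continue // already an interface
				}
				if ms := methodOnlyUses(fn.Body, param, info, types.RelativeTo(pkg)); len(ms) > 0 {
					result[fn.Name.Name+"."+name.Name] = "interface{ " + strings.Join(ms, "; ") + " }"
				}
			}
		}
	}
	return result, nil
}

// methodOnlyUses returns the sorted method signatures selected on param,
// or nil if param is used in any other way (or not at all).
func methodOnlyUses(body *ast.BlockStmt, param types.Object, info *types.Info, qual types.Qualifier) []string {
	uses, methodUses := 0, 0
	methods := make(map[string]bool)
	ast.Inspect(body, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.Ident:
			if info.Uses[n] == param {
				uses++
			}
		case *ast.SelectorExpr:
			id, ok := n.X.(*ast.Ident)
			sel := info.Selections[n]
			if ok && info.Uses[id] == param && sel != nil && sel.Kind() == types.MethodVal {
				methodUses++
				sig := types.TypeString(sel.Obj().Type(), qual) // e.g. "func(p []byte) (int, error)"
				methods[sel.Obj().Name()+strings.TrimPrefix(sig, "func")] = true
			}
		}
		return true
	})
	if uses == 0 || uses != methodUses {
		return nil
	}
	list := make([]string, 0, len(methods))
	for m := range methods {
		list = append(list, m)
	}
	sort.Strings(list)
	return list
}

const demo = `package p

type File struct{ name string }

func (f *File) Write(p []byte) (int, error) { return len(p), nil }

func dumpTo(f *File) { f.Write([]byte("some content")) }

func label(f *File) string { return "file " + f.name }
`

func main() {
	fmt.Println(NarrowerParams(demo))
}
```

Here dumpTo.f is reported with an interface containing only Write, while label.f is left alone because of the field access.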

Examples
A field is accessed, so a struct is needed.

    type T struct {
        N int
    }

    func foo(v *T) {
        fmt.Println(v.N)
    }

Indexing is used, so a slice or array is needed.

    func first(vs []int) int {
        return vs[0]
    }

Only Write is used, so an io.Writer is great.

    func handle(conn net.Conn) {
        fmt.Fprintln(conn, "hello")
    }

Testing our hypothesis
Now we have a tool to enforce our best practice:
- run the tool on all the code you can find
- find the violations of the best practice
- fix them?

Is narrower always better?
There are other constraints we need to take into account. Should w be an io.Writer here?

    func helloHandler(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello, %v", r.FormValue("name"))
    }

What if it is used somewhere as an http.HandlerFunc?

    func init() {
        http.HandleFunc("/", helloHandler)
    }

We need to find all the places where helloHandler is used, whether called or passed as a value.

Narrower is not always better
Again: should w be an io.Writer here?

    type Handler struct{}

    func (h Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello, %v", r.FormValue("name"))
    }

What if a Handler is used as an http.Handler?

    func init() {
        var h Handler
        http.Handle("/", h)
    }

We need to find:
- all the places where a Handler is used
- all the interfaces that Handler satisfies

Is this doable?
Finding:
- all the places where a function is called
- all the places where a given type is used
- all the interfaces that a type satisfies
The Go oracle and go/types provide all these features.
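For the last item, go/types offers types.Implements. A minimal sketch, where SatisfiedInterfaces is hypothetical and only interfaces declared in the same import-free file are considered (ResponseWriter, Request and Handler below are local stand-ins for the net/http types). It checks the method set of the value type T; passing types.NewPointer(T) instead would check the method set of *T.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// SatisfiedInterfaces lists the interfaces declared in src that the named
// type typeName implements, in sorted order.
func SatisfiedInterfaces(src, typeName string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var conf types.Config // no Importer: only package-local interfaces are visible
	pkg, err := conf.Check("input", fset, []*ast.File{f}, nil)
	if err != nil {
		return nil, err
	}
	obj, ok := pkg.Scope().Lookup(typeName).(*types.TypeName)
	if !ok {
		return nil, fmt.Errorf("type %s not found", typeName)
	}
	var names []string
	for _, name := range pkg.Scope().Names() { // Names is sorted
		tn, ok := pkg.Scope().Lookup(name).(*types.TypeName)
		if !ok || tn == obj {
			continue
		}
		if iface, ok := tn.Type().Underlying().(*types.Interface); ok && types.Implements(obj.Type(), iface) {
			names = append(names, name)
		}
	}
	return names, nil
}

const demo = `package p

type ResponseWriter interface{ Write(p []byte) (int, error) }

type Request struct{}

type Handler interface {
	ServeHTTP(w ResponseWriter, r *Request)
}

type Stringer interface{ String() string }

type MyHandler struct{}

func (h MyHandler) ServeHTTP(w ResponseWriter, r *Request) {}
`

func main() {
	fmt.Println(SatisfiedInterfaces(demo, "MyHandler")) // [Handler] <nil>
}
```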

Static analysis as a test for best practices
Best practices are described in natural language.
Translate them into programs that enforce them.
Run them on your code base and see them fail:
- improve your code
- improve your best practices with more context

Thank you
Francesc Campoy
Gopher Developer Advocate at Google
@francesc
