Slide 1

Slide 1 text

CSI: Gopher

Slide 2

Slide 2 text

Francesc Campoy @francesc - funemployed
Matt Silverlock @elithrar - Google

Slide 3

Slide 3 text

Two years ago, Francesc wrote a blog post

Slide 4

Slide 4 text

What are the most imported packages? Got some cool results

Slide 5

Slide 5 text

It wasn’t easy ...

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Last year I attended a GoSF meetup

Slide 10

Slide 10 text

Francesc attending Matt’s talk

Slide 11

Slide 11 text

● Wrote a LOT of regex
● Found some neat bugs!
● Francesc had been working with tools that made what I was doing more accessible.

Slide 12

Slide 12 text

How do you find bugs or API (mis)usage?
● Across microservice code-bases?
● In all consumers of your library?
● Accurately?

Slide 13

Slide 13 text

Previously: ugly regular expressions

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

And: really ugly regular expressions

Slide 16

Slide 16 text

Obviously, this is No Good™
How can we do better?

Slide 17

Slide 17 text

What's this talk actually about?

Slide 18

Slide 18 text

Your task
Find the most common string literal in your source code

Slide 19

Slide 19 text

What’s a string? It depends:
'single quotes'
"double quotes"
'''triple single quotes'''
"""triple double quotes"""
`back quotes`
«Please»
„make it“
qq§ STOP! §

Slide 20

Slide 20 text

https://xkcd.com/1171/

Slide 21

Slide 21 text

Oh, also prepare to escape your escaping characters ...

Slide 22

Slide 22 text

https://xkcd.com/1638/

Slide 23

Slide 23 text

Well, that’s a complicated* regular expression
*impossible without recursive regular expressions

Slide 24

Slide 24 text

Recursive regular expressions?

Slide 25

Slide 25 text

Well, we can parse the repositories as AST (Abstract Syntax Trees).

Slide 26

Slide 26 text

Universal Abstract Syntax Trees

    package main

    import "fmt"

    func main() {
        fmt.Println("Hello, gophers")
    }

Tree diagram labels: File; Declaration; Import “fmt”; Declaration; FunctionGroup main; Body; go:CallExpr; fmt; Println; “Hello, gophers”; Fun; Args
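
For Go-only analysis, the standard library's go/parser and go/ast packages give a comparable tree without any extra tooling. The following is a minimal sketch, not the UAST pipeline used in the talk: it assumes a single source file held in a string, and the names countStringLiterals and helloSrc are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countStringLiterals parses one Go source file and counts how often each
// string literal appears. Keys keep their original quoting, e.g. `"fmt"`.
// (Hypothetical helper; the talk queried UASTs instead.)
func countStringLiterals(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	// Walk every node; string literals are *ast.BasicLit with Kind STRING.
	// Import paths are string literals too, so "fmt" is counted.
	ast.Inspect(file, func(n ast.Node) bool {
		if lit, ok := n.(*ast.BasicLit); ok && lit.Kind == token.STRING {
			counts[lit.Value]++
		}
		return true
	})
	return counts, nil
}

// helloSrc is the program shown on the slide.
const helloSrc = `package main

import "fmt"

func main() { fmt.Println("Hello, gophers") }
`

func main() {
	counts, err := countStringLiterals(helloSrc)
	if err != nil {
		panic(err)
	}
	for lit, n := range counts {
		fmt.Println(n, lit)
	}
}
```

Because the parser has already resolved quoting and escaping, no regular expression is involved; the same walk works for raw (back-quoted) strings.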

Slide 27

Slide 27 text

Universal Abstract Syntax Trees

    print('''Hello, Pythonistas''')

Tree diagram labels: File; CallExpr; print; “Hello, Pythonistas”; Fun; Args

Slide 28

Slide 28 text

A single tree format for all languages: a single tool for all

Slide 29

Slide 29 text

A single label set: a single concept list

Slide 30

Slide 30 text

Universal Abstract Syntax Trees
uast:String
XPath: //uast:String
Tree diagram labels: File; Declaration; Import “fmt”; Declaration; FunctionGroup main; Body; go:CallExpr; fmt; Println; “Hello, gophers”; Fun; Args
Matched strings: “fmt”, “Hello, gophers”

Slide 31

Slide 31 text

Universal Abstract Syntax Trees
'''Hello, Pythonistas'''
uast:String
XPath: //uast:String
Tree diagram labels: File; CallExpr; print; Fun; Args
Matched string: “Hello, Pythonistas”

Slide 32

Slide 32 text

Let’s run it!

Slide 33

Slide 33 text

While this is running ...

Slide 34

Slide 34 text

So, how are ASTs relevant here?
- We downloaded ~19GB of Go repositories from GitHub*
- Parsed them as UAST with src-d/engine
- Queried the generated databases with SQL + UAST extensions
https://github.com/src-d/engine

Slide 35

Slide 35 text

OK, so: what are some of the things we can investigate?
- Finding ‘bad’ code
- Best practices and idioms
- Usage analysis of APIs

Slide 36

Slide 36 text

Investigation #1: Finding 'bad' code

Slide 37

Slide 37 text

"Bad crypto"
- One of my favorite topics
- How is a non-expert supposed to know that math/rand is bad vs. crypto/rand?
- What about hash functions vs. KDFs?

Slide 38

Slide 38 text

No content
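
The math/rand question from slide 37 can be approximated for a single file with the standard library alone. This is a rough sketch under stated assumptions, not the query used in the talk: importsMathRand and tokenSrc are hypothetical names, and an import of math/rand is only a lead for review, since the package is fine for non-security uses such as simulations or jitter.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importsMathRand reports whether a Go source file imports math/rand.
// (Hypothetical helper for illustration.)
func importsMathRand(src string) (bool, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	file, err := parser.ParseFile(fset, "example.go", src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	for _, imp := range file.Imports {
		// Import paths are stored quoted; unquote before comparing.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			continue
		}
		if path == "math/rand" {
			return true, nil
		}
	}
	return false, nil
}

// tokenSrc is a made-up file that derives a "token" from math/rand.
const tokenSrc = `package token

import "math/rand"

func newToken() int { return rand.Int() }
`

func main() {
	found, err := importsMathRand(tokenSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println("imports math/rand:", found)
}
```

A more useful check would combine this with context, for example flagging only files whose identifiers mention tokens, keys, or passwords; that refinement is left out here.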

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Investigation #2: Best practices & idioms

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Computer says ...

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

Number of init functions per file: mean 0.12
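
A per-file count like this can be reproduced for a single Go file with go/ast. The sketch below is an assumption about how one might measure it, not the query behind the slide; countInitFuncs and initSrc are hypothetical names. Go permits several init functions in one file, which is why the count can exceed one.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countInitFuncs returns the number of package-level init functions
// declared in one Go source file. (Hypothetical helper.)
func countInitFuncs(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		// Recv == nil excludes methods that happen to be named init.
		if ok && fn.Recv == nil && fn.Name.Name == "init" {
			n++
		}
	}
	return n, nil
}

// initSrc is a made-up file with two initializers and one method named init.
const initSrc = `package demo

type T struct{}

func init() {}

func init() {}

// A method named init is not a package initializer.
func (T) init() {}
`

func main() {
	n, err := countInitFuncs(initSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println("init functions:", n)
}
```

Averaging this count over every file in a corpus would give a figure comparable to the mean on the slide.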

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

Number of init functions per file (0 to 10)

Slide 50

Slide 50 text

Number of init functions per file (log scale)
352 init functions LOL

Slide 51

Slide 51 text

Smells like vendoring ...

Slide 52

Slide 52 text

github.com/vmware/govmomi/vim25/types/enum.go

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

Applied the scientific method to best practices
- Make an observation.
- Ask a question.
- Form a hypothesis, or testable explanation.
- Make a prediction based on the hypothesis.
- Test the prediction.
- Iterate: use the results to make new hypotheses or predictions.

Slide 55

Slide 55 text

Investigation #3: API usage & breaking them

Slide 56

Slide 56 text

Premise: You maintain a popular OSS library.
Problem: You're thinking about deprecating a method, but want to quantify the impact.
How do you accurately find these cases?

Slide 57

Slide 57 text

Actual problem:
● gorilla/context predates net/http's Request.Context() implementation.
● Using both causes a memory leak, due to "islanding" the pointer to the original *Request.
● Can we find these users & help?!
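
One way to look for affected code in a single file is a purely syntactic check: does the file import github.com/gorilla/context and also call a method named Context or WithContext? This is a heuristic sketch, not the query from the talk. It has no type information, so it can report false positives (any method with those names matches), and mixesContexts and handlerSrc are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// mixesContexts reports whether a file imports gorilla/context and also
// calls a method named Context or WithContext. Syntactic heuristic only:
// the receiver type is not checked. (Hypothetical helper.)
func mixesContexts(src string) (bool, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return false, err
	}
	importsGorilla := false
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err == nil && path == "github.com/gorilla/context" {
			importsGorilla = true
		}
	}
	if !importsGorilla {
		return false, nil
	}
	callsRequestContext := false
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		// Match calls of the form x.Context(...) or x.WithContext(...).
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
			if sel.Sel.Name == "Context" || sel.Sel.Name == "WithContext" {
				callsRequestContext = true
			}
		}
		return true
	})
	return callsRequestContext, nil
}

// handlerSrc is a made-up handler that uses both mechanisms.
const handlerSrc = `package web

import (
	"net/http"

	"github.com/gorilla/context"
)

func handler(w http.ResponseWriter, r *http.Request) {
	context.Set(r, "user", "gopher")
	_ = r.Context()
}
`

func main() {
	found, err := mixesContexts(handlerSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println("uses both:", found)
}
```

Resolving the receiver type (for example with go/types) would remove most false positives, at the cost of needing the file's dependencies to type-check.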

Slide 58

Slide 58 text

The Problem

Slide 59

Slide 59 text

Users!

Slide 60

Slide 60 text

Compared to...

Slide 61

Slide 61 text

What else?

Slide 62

Slide 62 text

● Order your godoc by most-used identifiers - rather than a bunch of ErrSomething at the top
● Identify dependency version usage across your org, or as a maintainer.
● Smart auto-completion based on previous usages of the API.

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

Tools we used and where to find them

Slide 66

Slide 66 text

Tools:
● Google BigQuery
● source{d} engine: github.com/src-d/engine
● A lot of RAM, CPUs, and time
Resources:
● Finding Bugs with BigQuery & GitHub: bit.ly/gosf-bq
● Analyzing Go code with BigQuery: bit.ly/go-bq

Slide 67

Slide 67 text

So … what about those strings?

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

Most common strings in Go
#1 with 1,260,374 occurrences: ''
#2 with 456,480 occurrences: 'fmt'
#3 with 337,064 occurrences: 'json:"-"'

Slide 70

Slide 70 text

Other languages

Most common strings in Python:
'automanaged' : 25,446
'//visibility:public' : 21,962
'go_library' : 16,485

Most common strings in Ruby:
'\n' : 658
'' : 551
'shell' : 363

Most common strings in Java:
'' : 613
'\n' : 392
'0' : 293

Most common strings in PHP:
'strict_param' : 1
'strict' : 1
'short_array_syntax' : 1

Slide 71

Slide 71 text

Thanks! @elithrar @francesc