CSI: Gopher

If we could intelligently parse all of the open-source Go code on GitHub, what could we learn? We’re going to show you some of the interesting things we’ve found in Go projects, from library usage, idioms & package layouts, to how Gophers can use this data to make decisions about their own APIs.

Francesc Campoy Flores

April 12, 2019

Transcript

  1. 6.
  2. 7.
  3. 8.
  4. 11.

    • Wrote a LOT of regex
    • Found some neat bugs!
    • Francesc had been working with tools that made what I was doing more accessible.
  5. 12.

    How do you find bugs or API (mis)usage?
    • Across microservice code-bases
    • In all consumers of your library?
    • Accurately?
  6. 14.
  7. 19.

    What’s a string? It depends:
    'single quotes'
    "double quotes"
    '''triple single quotes'''
    """triple double quotes"""
    `back quotes`
    «Please» „make it“ qq§ STOP! §
  8. 26.

    Universal Abstract Syntax Trees

    package main
    import "fmt"
    func main() { fmt.Println("Hello, gophers") }

    [Tree diagram nodes: File; Declaration; Import “fmt”; Declaration; FunctionGroup main; Body; go:CallExpr; fmt; Println; “Hello, gophers”; Fun; Args]
  9. 30.

    Universal Abstract Syntax Trees

    XPath: //uast:String

    [Tree diagram nodes: File; Declaration; Import “fmt”; Declaration; FunctionGroup main; Body; go:CallExpr; fmt; Println; “Hello, gophers”; Fun; Args. Highlighted as uast:String: “fmt”, “Hello, gophers”]
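The `//uast:String` query on this slide selects every string node in the tree. As a rough analogue using only Go's standard library (not the UAST tooling from the talk), the sketch below walks the `go/ast` tree of the same program and collects its string literals. The names `helloSrc` and `stringLiterals` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// helloSrc is the program shown on the slide.
const helloSrc = `package main

import "fmt"

func main() { fmt.Println("Hello, gophers") }
`

// stringLiterals parses src and returns the unquoted value of every
// string literal, in source order. It is a stand-in for the
// //uast:String query, restricted to Go source.
func stringLiterals(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "hello.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(file, func(n ast.Node) bool {
		lit, ok := n.(*ast.BasicLit)
		if !ok || lit.Kind != token.STRING {
			return true // not a string literal: keep descending
		}
		if s, err := strconv.Unquote(lit.Value); err == nil {
			out = append(out, s)
		}
		return true
	})
	return out, nil
}

func main() {
	lits, err := stringLiterals(helloSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println(lits) // prints [fmt Hello, gophers]
}
```

Import paths are string literals too, which is why “fmt” appears next to “Hello, gophers” in the result.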
  10. 34.

    So, how are ASTs relevant here?
    - We downloaded ~19GB of Go repositories from GitHub*
    - Parsed them as UAST with src-d/engine
    - Queried the generated databases with SQL + UAST extensions
    https://github.com/src-d/engine
  11. 35.

    OK, so: what are some of the things we can investigate?
    - Finding ‘bad’ code
    - Best practices and idioms
    - Usage analysis of APIs
  12. 37.

    "Bad crypto"
    - One of my favorite topics
    - How is a non-expert supposed to know that math/rand is bad vs. crypto/rand?
    - What about hash functions vs. KDFs?
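To make the math/rand vs. crypto/rand point concrete: math/rand is a deterministic pseudo-random generator intended for simulations and tests, while crypto/rand reads from the operating system's cryptographically secure source. The sketch below shows the crypto/rand form for a security-sensitive value such as a session token. The helper name `newToken` and the 16-byte size are assumptions made for this example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newToken returns n bytes from the OS CSPRNG, hex-encoded.
// (Hypothetical helper; the name and encoding are illustrative.)
func newToken(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	tok, err := newToken(16)
	if err != nil {
		panic(err)
	}
	fmt.Println(tok) // 32 hex characters, different on every run
}
```

Because the two packages expose similarly named functions, swapping the import path is often the only visible difference, which is part of why this mistake is easy to make and a good target for automated search.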
  13. 38.
  14. 39.
  15. 41.
  16. 42.
  17. 43.
  18. 44.
  19. 46.
  20. 48.
  21. 53.
  22. 54.

    Applied the scientific method to best practices
    Make an observation.
    Ask a question.
    Form a hypothesis, or testable explanation.
    Make a prediction based on the hypothesis.
    Test the prediction.
    Iterate: use the results to make new hypotheses or predictions.
  23. 56.

    Premise: You maintain a popular OSS library.
    Problem: You're thinking about deprecating a method, but want to quantify the impact.
    How do you accurately find these cases?
  24. 57.

    Actual problem:
    • gorilla/context predates net/http's Request.Context() implementation.
    • Using both causes a memory leak, due to "islanding" the pointer to the original *Request.
    • Can we find these users & help?!
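One way to look for affected users, sketched here with the standard library rather than the SQL + UAST queries used in the talk: flag any file that imports gorilla/context and also calls a method named `WithContext`, since `Request.WithContext` returns a copy of the request with a new pointer. This is a purely syntactic heuristic with no type checking, so it can produce false positives. The names `mixesContexts` and `sample` are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

const gorillaContext = "github.com/gorilla/context"

// sample is an illustrative handler that stores a value keyed by the
// request pointer and then replaces the request via WithContext.
const sample = `package web

import (
	stdctx "context"
	"net/http"

	"github.com/gorilla/context"
)

func handler(w http.ResponseWriter, r *http.Request) {
	context.Set(r, "user", "gopher")
	r = r.WithContext(stdctx.WithValue(r.Context(), "k", "v"))
	_ = r
}
`

// mixesContexts reports whether src imports gorilla/context and also
// calls some method named WithContext. Assumption: without type
// information we cannot confirm the receiver is an *http.Request.
func mixesContexts(src string) (bool, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return false, err
	}
	imported := false
	for _, imp := range file.Imports {
		if p, err := strconv.Unquote(imp.Path.Value); err == nil && p == gorillaContext {
			imported = true
		}
	}
	if !imported {
		return false, nil
	}
	found := false
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok && sel.Sel.Name == "WithContext" {
			found = true
		}
		return true
	})
	return found, nil
}

func main() {
	hit, err := mixesContexts(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(hit) // prints true
}
```

A type-aware version (for example, built on go/types) would be needed to confirm that the receiver really is an `*http.Request` before contacting a project's maintainers.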
  25. 59.
  26. 62.

    • Order your godoc by most-used identifiers - rather than a bunch of ErrSomething at the top
    • Identify dependency version usage across your org, or as a maintainer.
    • Smart auto-completion based on previous usages of the API.
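The first idea, ranking a package's identifiers by how often consumers use them, can be approximated for a single file as follows. This is a minimal sketch under stated assumptions: it matches selectors on the package name only (no type checking, no import aliases), and `usageCounts`, `ranked`, and the `consumer` source are all hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// consumer is an illustrative piece of client code using package os.
const consumer = `package app

import (
	"errors"
	"os"
)

func run() error {
	f, err := os.Open("a.txt")
	if errors.Is(err, os.ErrNotExist) {
		f, err = os.Open("b.txt")
	}
	if err != nil {
		return err
	}
	return f.Close()
}
`

// usageCounts counts selector expressions of the form pkgName.Ident.
// A local variable that shadows the package name would be miscounted.
func usageCounts(src, pkgName string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "consumer.go", src, 0)
	if err != nil {
		return nil, err
	}
	counts := map[string]int{}
	ast.Inspect(file, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		if x, ok := sel.X.(*ast.Ident); ok && x.Name == pkgName {
			counts[sel.Sel.Name]++
		}
		return true
	})
	return counts, nil
}

// ranked orders identifiers by descending count, ties alphabetically.
func ranked(counts map[string]int) []string {
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if counts[names[i]] != counts[names[j]] {
			return counts[names[i]] > counts[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	counts, err := usageCounts(consumer, "os")
	if err != nil {
		panic(err)
	}
	fmt.Println(ranked(counts)) // prints [Open ErrNotExist]
}
```

Summing these counts over many repositories would give the ordering the slide describes, with frequently called functions listed ahead of rarely referenced error values.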
  27. 63.
  28. 64.
  29. 66.

    Tools:
    • Google BigQuery
    • source{d} engine: github.com/src-d/engine
    • A lot of RAM, CPUs, and time

    Resources:
    • Finding Bugs with BigQuery & GitHub: bit.ly/gosf-bq
    • Analyzing Go code with BigQuery: bit.ly/go-bq
  30. 68.
  31. 69.

    Most common strings in Go
    #1 with 1,260,374 occurrences: ''
    #2 with 456,480 occurrences: 'fmt'
    #3 with 337,064 occurrences: 'json:"-"'
  32. 70.

    Other languages

    Most common strings in Python:
    'automanaged' : 25,446
    '//visibility:public' : 21,962
    'go_library' : 16,485

    Most common strings in Ruby:
    '\n' : 658
    '' : 551
    'shell' : 363

    Most common strings in Java:
    '' : 613
    '\n' : 392
    '0' : 293

    Most common strings in PHP:
    'strict_param' : 1
    'strict' : 1
    'short_array_syntax' : 1