Quil
October 20, 2020

How To Quickly Understand Millions of Lines of Code

Reading source code is not really doable once you hit tens of thousands of lines of code, and it's even more hopeless at millions. Yet the analysis tools that could summarise this information struggle with that scale just as much as humans do. So how do we build tools that can handle such ginormous codebases, anyway?

In this talk we'll take a practical (but superficial) look at some of the algorithms involved in the making of Glass, a static analysis tool developed at Klarna, and at the optimisations that let it answer analysis queries in real time for IDEs, and in reasonable time for build/CI tools.

Links:
- Glass' GitHub repository: https://github.com/klarna-incubator/glass
- µKanren: A Minimal Functional Core for Relational Programming: http://webyrd.net/scheme-2013/papers/HemannMuKanren2013.pdf
- How Developers Search for Code: A Case Study: https://research.google/pubs/pub43835/
- Adapton: Composable, Demand-Driven Incremental Computation: http://matthewhammer.org/adapton/adapton-pldi2014.pdf

Transcript

  1. How to Quickly Understand Millions of Lines of Code? (a case for static analysis) http://talks.robotlolita.me
  2. And I want to work on tools to help people understand their programs.
  3. This is the perfect case for software analysis tools! We already rely on Dialyzer, Cover, Xref, etc.
  4. And so I came up with an idea for a project that I called Glass. It means “ice cream” in Swedish.
  5. Let me have my cake and eat it, too! “I want code search that is fast enough to be used in an IDE, easy to use, and scalable for codebases of any size!”
  6. But what if you want very specific calls? “I’m looking for code like this” “But only if this holds”
  7. So we ran it on our biggest codebase:
     • Over 1 million lines of code
     • Over 6,000 Erlang modules
     • Over 250 Erlang applications
     • Maintained by several teams over many years
  8. Visiting every node is bad!!! Oh no, the code changed AGAIN! Indexing!!! Invalidation :(
  9. Sadowski et al.’s work also adds: queries are generally bounded to “known” locations, rather than spanning all code.
  10. New knowledge!
      • Static information in queries: infer indexes from it!
      • Queries are generally localised:
        • try to use location for ranking results!
        • stream results to improve perceived performance!
  11. One thing I’ve learned: having a clear UX vision and design constraints was very helpful. So was experience, of course.
  12. Glass’ design constraints were:
      • The time to think of a query matters: make it tangible;
      • The perceived query performance matters: show useful results quickly;
      • The user effort to get fast queries matters: make queries easy to optimise.
  13. Paper recommendations:
      • µKanren: A Minimal Functional Core for Relational Programming. Jason Hemann and Daniel P. Friedman (2013)
      • How Developers Search for Code: A Case Study. Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum (2015)
      • Adapton: Composable, Demand-Driven Incremental Computation. Matthew A. Hammer, Khoo Yit Phang, Michael Hicks, and Jeffrey S. Foster (2014)
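Slide 6 asks for code “like this”, but only if some condition holds. Below is a minimal Python sketch of that idea. It assumes a simplified tuple-based syntax tree, loosely modelled on Erlang's abstract format. The helpers `Var`, `match`, and `search` are hypothetical and are not Glass's actual query language or implementation. The pattern captures the shape of the code, and a guard function filters the bindings it produces:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Optional

Bindings = Dict[str, Any]


@dataclass(frozen=True)
class Var:
    """Hypothetical pattern variable: matches any subterm and binds it."""
    name: str


def match(pattern: Any, term: Any, env: Bindings) -> Optional[Bindings]:
    """Structurally match `pattern` against `term`, extending `env`."""
    if isinstance(pattern, Var):
        if pattern.name in env:
            # A repeated variable must bind to an equal subterm.
            return env if env[pattern.name] == term else None
        return {**env, pattern.name: term}
    if isinstance(pattern, (tuple, list)) and isinstance(term, (tuple, list)):
        if type(pattern) is not type(term) or len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    # Atoms (strings, numbers) must simply be equal.
    return env if pattern == term else None


def subterms(term: Any) -> Iterator[Any]:
    """Yield `term` and every nested subterm, depth-first."""
    yield term
    if isinstance(term, (tuple, list)):
        for child in term:
            yield from subterms(child)


def search(pattern: Any, guard: Callable[[Bindings], bool], tree: Any) -> Iterator[Bindings]:
    """'Code like this' is `pattern`; 'but only if this holds' is `guard`."""
    for node in subterms(tree):
        env = match(pattern, node, {})
        if env is not None and guard(env):
            yield env


# Toy module containing three remote calls.
tree = ("module", "billing", [
    ("call", ("remote", "lists", "map"), [("var", "F"), ("var", "Xs")]),
    ("call", ("remote", "lists", "map"), [("fun", "format", 1), ("var", "Ys")]),
    ("call", ("remote", "io", "format"), [("string", "~p")]),
])

# Calls to lists:map/2, but only when the function argument is a literal fun.
pattern = ("call", ("remote", "lists", "map"), [Var("f"), Var("xs")])
results = list(search(pattern, lambda env: env["f"][0] == "fun", tree))
# results == [{"f": ("fun", "format", 1), "xs": ("var", "Ys")}]
```

This sketch still visits every node of the tree on each query. That cost is what the later slides set out to avoid.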
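Slide 8 raises two related problems. Visiting every node on every query is too slow, and any index that avoids this has to be kept correct as the code keeps changing. One way to illustrate the trade-off is a small inverted index from callee to call sites that is invalidated one module at a time. This is a sketch under assumptions: the `CallIndex` class, the `(module, function, arity)` keys, and per-module invalidation are hypothetical choices made for illustration, not Glass's actual design.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Callee = Tuple[str, str, int]   # (module, function, arity) being called
Location = Tuple[str, int]      # (module, line) of a call site


class CallIndex:
    """Hypothetical callee -> call-site index, invalidated per module."""

    def __init__(self) -> None:
        self.by_callee: Dict[Callee, Set[Location]] = defaultdict(set)
        # Remember what each module contributed, so it can be removed cheaply.
        self.by_module: Dict[str, Set[Tuple[Callee, Location]]] = defaultdict(set)

    def index_module(self, module: str, calls: List[Tuple[int, Callee]]) -> None:
        """(Re)index one module from its (line, callee) pairs."""
        self.invalidate(module)
        for line, callee in calls:
            loc = (module, line)
            self.by_callee[callee].add(loc)
            self.by_module[module].add((callee, loc))

    def invalidate(self, module: str) -> None:
        """Drop only the entries that `module` contributed."""
        for callee, loc in self.by_module.pop(module, set()):
            sites = self.by_callee[callee]
            sites.discard(loc)
            if not sites:
                del self.by_callee[callee]

    def callers_of(self, callee: Callee) -> Set[Location]:
        """Answer a query with a lookup instead of visiting every node."""
        return set(self.by_callee.get(callee, set()))


index = CallIndex()
index.index_module("billing", [(10, ("lists", "map", 2)), (42, ("io", "format", 2))])
index.index_module("orders", [(7, ("lists", "map", 2))])
before = index.callers_of(("lists", "map", 2))   # {("billing", 10), ("orders", 7)}

# "Oh no, the code changed AGAIN!": only `billing` is re-indexed.
index.index_module("billing", [(12, ("io", "format", 2))])
after = index.callers_of(("lists", "map", 2))    # {("orders", 7)}
```

In this sketch, re-indexing a changed module touches only the entries that module contributed, so `orders` is never revisited. Finer-grained schemes, such as the demand-driven incremental computation described in the Adapton paper, push the same idea further.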
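Slide 10 draws two lessons. Static information in a query can select an index, and because queries tend to be localised, location can rank results, which can then be streamed. The Python sketch below combines both ideas under several assumptions. Queries are hypothetical `(module, function, arity)` triples in which `None` acts as a wildcard. The user's “location” is their current `(application, module)`. The three-tier ranking is an illustrative choice, not Glass's actual ranking.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Callee = Tuple[str, str, int]                                  # (module, function, arity)
Site = Tuple[str, str, int]                                    # (application, module, line)
Query = Tuple[Optional[str], Optional[str], Optional[int]]     # None = wildcard


def matches(query: Query, callee: Callee) -> bool:
    return all(q is None or q == c for q, c in zip(query, callee))


def candidates(query: Query, index: Dict[Callee, List[Site]]) -> Iterator[Site]:
    """Pick an access path from the static (non-wildcard) parts of the query."""
    if None not in query:
        # Fully static query: a single index lookup, no scan needed.
        yield from index.get(query, [])
        return
    # Partially static query: fall back to scanning the index keys.
    for callee, sites in index.items():
        if matches(query, callee):
            yield from sites


def locality(site: Site, here: Tuple[str, str]) -> int:
    """0 = same module as the user, 1 = same application, 2 = elsewhere."""
    app, module, _ = site
    if module == here[1]:
        return 0
    return 1 if app == here[0] else 2


def ranked_search(query: Query, index: Dict[Callee, List[Site]],
                  here: Tuple[str, str]) -> Iterator[Site]:
    """Stream results, nearest code first."""
    tiers: Dict[int, List[Site]] = {1: [], 2: []}
    for site in candidates(query, index):
        rank = locality(site, here)
        if rank == 0:
            yield site            # best possible rank: show it immediately
        else:
            tiers[rank].append(site)
    yield from tiers[1]
    yield from tiers[2]


index = {
    ("lists", "map", 2): [("payments", "refunds", 88),
                          ("billing", "invoice", 10),
                          ("billing", "ledger", 3)],
    ("lists", "filter", 2): [("billing", "invoice", 20)],
}
here = ("billing", "invoice")

exact = list(ranked_search(("lists", "map", 2), index, here))
# [("billing", "invoice", 10), ("billing", "ledger", 3), ("payments", "refunds", 88)]
partial = list(ranked_search(("lists", None, 2), index, here))
```

Results from the user's own module are yielded as soon as they are found. Farther tiers are held back until the scan finishes. This gives up a fully sorted stream so that the likely most relevant results appear first, which matches the slide's goal of improving perceived performance.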