gitbase: exploring git repos with SQL

Francesc Campoy from source{d} will talk about gitbase, a new open source project written entirely in Go that stands on the shoulders of giants. By integrating the codebases of go-git, the most successful git implementation in Go, and vitess, a replication layer for all the MySQL databases at YouTube, gitbase provides an easy way to extract information from hundreds of git repositories with a simple SQL query.

The talk will provide an in-depth description of the project, how source{d} implemented it, and what they learned along the way.

Francesc Campoy Flores

September 19, 2018

Transcript

  1. exploring git repos with SQL
    gitbase and source{d} engine

  2. Francesc Campoy
    VP of Developer Relations
    @francesc
    github.com/campoy

  3. just for
    func
    youtube.com/justforfunc

  4. just for
    func
    twitch.tv/justforfunclive

  5. Agenda
    ● What is gitbase?
    ● How was it built?
    ● Q&A

  6. What is gitbase?

  7. github.com/src-d/gitbase
    ● SQL interface to git repositories
    ● Open Source
    ● Written in Go

  8. Some custom functions
    LANGUAGE(path, content):
    Returns the language of a file given its path and contents.
    Powered by github.com/src-d/enry.

  9. Lines of code per language
    # total lines of code per language in the Go repo
    SELECT lang, SUM(lines) AS total_lines
    FROM (
      SELECT
        t.tree_entry_name AS name,
        LANGUAGE(t.tree_entry_name, b.blob_content) AS lang,
        ARRAY_LENGTH(SPLIT(b.blob_content, '\n')) AS lines
      FROM refs r
      NATURAL JOIN commits c
      NATURAL JOIN commit_trees ct
      NATURAL JOIN tree_entries t
      NATURAL JOIN blobs b
      WHERE r.ref_name = 'HEAD'
    ) AS lines
    WHERE lang IS NOT NULL
    GROUP BY lang
    ORDER BY total_lines DESC;
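
To make the shape of that query concrete, here is a minimal Go sketch of the same aggregation over an in-memory set of files. It is not how gitbase evaluates the query. `languageByExt` is a hypothetical stand-in for LANGUAGE that only looks at the file extension (enry also inspects the content), and the sample files are invented for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// languageByExt is a hypothetical stand-in for LANGUAGE(path, content).
// It only checks the extension; an empty string plays the role of NULL.
func languageByExt(path string) string {
	switch filepath.Ext(path) {
	case ".go":
		return "Go"
	case ".md":
		return "Markdown"
	case ".sh":
		return "Shell"
	}
	return ""
}

// langTotal is one row of the result: a language and its summed line count.
type langTotal struct {
	Lang  string
	Lines int
}

// linesPerLanguage mirrors the query: split each blob on '\n', count the
// pieces, sum per language, and order by the total, descending.
func linesPerLanguage(files map[string]string) []langTotal {
	totals := map[string]int{}
	for path, content := range files {
		lang := languageByExt(path)
		if lang == "" {
			continue // WHERE lang IS NOT NULL
		}
		// ARRAY_LENGTH(SPLIT(content, '\n'))
		totals[lang] += len(strings.Split(content, "\n"))
	}
	out := make([]langTotal, 0, len(totals))
	for lang, n := range totals {
		out = append(out, langTotal{Lang: lang, Lines: n})
	}
	// ORDER BY total_lines DESC, with the name as a tie-breaker so the
	// output is deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Lines != out[j].Lines {
			return out[i].Lines > out[j].Lines
		}
		return out[i].Lang < out[j].Lang
	})
	return out
}

func main() {
	files := map[string]string{
		"main.go":   "package main\n\nfunc main() {}\n",
		"util.go":   "package main\n",
		"README.md": "# demo\n",
		"LICENSE":   "MIT\n",
	}
	for _, t := range linesPerLanguage(files) {
		fmt.Printf("%-10s %d\n", t.Lang, t.Lines)
	}
}
```

Note that splitting on '\n' counts the empty piece after a trailing newline, so the figures are one higher per file than a `wc -l` style count.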

  11. Some custom functions
    UAST(content, language, [filter]):
    Returns the Universal Abstract Syntax Tree resulting from parsing the
    given content in the given language.
    Powered by github.com/bblfsh/bblfshd.

  12. Number of functions per Go file
    SELECT files.repository_id, files.file_path,
      ARRAY_LENGTH(UAST(
        files.blob_content,
        LANGUAGE(files.file_path, files.blob_content),
        '//*[@roleFunction and @roleDeclaration]')
      ) AS functions
    FROM files
    NATURAL JOIN refs
    WHERE
      LANGUAGE(files.file_path, files.blob_content) = 'Go'
      AND refs.ref_name = 'HEAD'
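
For a single Go file, a comparable count can be sketched with Go's standard go/parser package instead of a UAST. This is a swapped-in technique, not what gitbase or bblfshd does: it counts top-level function and method declarations only, which may differ from what the XPath filter above selects. The sample source is invented for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countFuncs parses Go source and counts top-level function and method
// declarations. Function literals assigned to variables are not counted.
func countFuncs(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, decl := range file.Decls {
		if _, ok := decl.(*ast.FuncDecl); ok {
			n++
		}
	}
	return n, nil
}

func main() {
	src := `package demo

func A() {}

type T struct{}

func (T) B() {}
`
	n, err := countFuncs(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("functions:", n)
}
```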

  14. source{d} Engine
    ● Too many moving pieces
    ● Too many steps to get started
    ● Solving it all with the power of Docker!

  15. https://medium.com/sourcedtech

  16. How was gitbase built?

  17. vitess
    github.com/vitessio/vitess (by YouTube)

  18. youtu.be/midJ6b1LkA0

  19. go-mysql-server
    github.com/src-d/go-mysql-server
    ● Ready-to-run MySQL server
    ● Extensible via the Database and Table interfaces
    ● Example: github.com/campoy/csvql

  20. Making it go faster

  21. Indexes
    ● SQL Indexes can speed up queries substantially
    ● Vitess doesn’t provide this
    ● Pilosa does!
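
Pilosa is a bitmap index. The core idea can be sketched in a few lines of Go: keep one bitmap of row IDs per column value, and answer a conjunction of equality filters with a bitwise AND. This is a toy illustration, not Pilosa's API or storage format. A single uint64 limits it to 64 rows, and the column values are invented.

```go
package main

import "fmt"

// bitmap is a fixed-size bitset over row IDs 0..63. Real bitmap indexes use
// compressed bitmaps; a uint64 keeps the sketch short.
type bitmap uint64

// index maps a column value to the set of rows holding that value.
type index map[string]bitmap

// add records that the given row holds the given value.
func (ix index) add(value string, row uint) {
	ix[value] |= 1 << row
}

// rows lists the row IDs set in b, in ascending order.
func rows(b bitmap) []uint {
	var out []uint
	for r := uint(0); r < 64; r++ {
		if b&(1<<r) != 0 {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	lang := index{}
	ref := index{}

	// Four rows of a made-up table with columns (lang, ref_name).
	lang.add("Go", 0)
	lang.add("Python", 1)
	lang.add("Go", 2)
	lang.add("Go", 3)
	ref.add("HEAD", 0)
	ref.add("HEAD", 1)
	ref.add("refs/heads/dev", 2)
	ref.add("HEAD", 3)

	// WHERE lang = 'Go' AND ref_name = 'HEAD' becomes a bitwise AND.
	match := lang["Go"] & ref["HEAD"]
	fmt.Println(rows(match))
}
```

The filter is answered without scanning the table: each predicate is one map lookup, and combining them is one machine instruction per 64 rows.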

  22. Caches
    ● Caching is the obvious option to make queries faster
    ● We didn’t want to reinvent the wheel
    ● We didn’t have to, thanks to Hashicorp
    ● Based on github.com/golang/groupcache
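
The slide does not name the cache package or its eviction policy. Assuming a least-recently-used policy, as in the lru package that ships with groupcache, the mechanism can be sketched with the standard library alone. This is an illustration of the technique, not the code gitbase uses, and it is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key, value string
}

// lru is a fixed-capacity cache that evicts the least recently used entry.
type lru struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newLRU(capacity int) *lru {
	return &lru{
		capacity: capacity,
		order:    list.New(),
		items:    map[string]*list.Element{},
	}
}

// Get returns the cached value and marks the key as recently used.
func (c *lru) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(entry).value, true
}

// Add inserts or updates a key, evicting the oldest entry when over capacity.
func (c *lru) Add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value = entry{key, value}
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(entry).key)
	}
}

func main() {
	cache := newLRU(2)
	cache.Add("q1", "result 1")
	cache.Add("q2", "result 2")
	cache.Get("q1")             // q1 is now the most recently used
	cache.Add("q3", "result 3") // over capacity: evicts q2

	_, ok := cache.Get("q2")
	fmt.Println("q2 cached:", ok)
}
```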

  23. Regular Expressions
    ● Go's regexp package guarantees matching in linear time, but it is not always faster than a backtracking engine
    https://swtch.com/~rsc/regexp/regexp1.html
    ● Alternative: github.com/moovweb/rubex (Oniguruma)

  24. We want you!

  25. And we’re hiring!

  26. Thanks!
    @francesc
