gitbase: exploring git repos with SQL

Francesc Campoy from source{d} will talk about gitbase, a new open source project written entirely in Go that stands on the shoulders of giants. By integrating the codebases of go-git (the most successful git implementation in Go) and Vitess (a replication layer for all the MySQL databases at YouTube), gitbase provides an easy way to extract information from hundreds of git repositories with a simple SQL query.

The talk will provide an in-depth description of the project as well as the way source{d} implemented it and what they learned on the way.

Francesc Campoy Flores

September 19, 2018

Transcript

  1. exploring git repos with SQL
    gitbase and source{d} engine

  2. Francesc Campoy
    VP of Developer Relations
    @francesc
    github.com/campoy

  3. justforfunc
    youtube.com/justforfunc

  4. justforfunc
    twitch.tv/justforfunclive

  5. Agenda
    ● What is gitbase?
    ● How was it built?
    ● Q&A

  6. What is gitbase?

  7. github.com/src-d/gitbase
    ● SQL interface to git repositories
    ● Open Source
    ● Written in Go

  8. Some custom functions
    LANGUAGE(path, content):
    Returns the language of a file given its path and contents.
    Powered by github.com/src-d/enry.

  9. Lines of code per language
    # total lines of code per language in the Go repo
    SELECT lang, SUM(lines) AS total_lines
    FROM (
        SELECT
            t.tree_entry_name AS name,
            LANGUAGE(t.tree_entry_name, b.blob_content) AS lang,
            ARRAY_LENGTH(SPLIT(b.blob_content, '\n')) AS lines
        FROM refs r
        NATURAL JOIN commits c
        NATURAL JOIN commit_trees ct
        NATURAL JOIN tree_entries t
        NATURAL JOIN blobs b
        WHERE r.ref_name = 'HEAD'
    ) AS lines
    WHERE lang IS NOT NULL
    GROUP BY lang
    ORDER BY total_lines DESC;
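
The aggregation in this query can be sketched in plain Go to show what each clause contributes. This is only an illustration, not gitbase code: the `file`, `langTotal`, `lineCount` and `totalLines` names are hypothetical, and it assumes SPLIT behaves like Go's `strings.Split`, so a file ending in a newline counts one extra, empty element.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// file is a hypothetical stand-in for one row of the inner query:
// the detected language and the blob content.
type file struct {
	lang    string // the empty string plays the role of NULL
	content string
}

// langTotal is one row of the final result.
type langTotal struct {
	lang  string
	lines int
}

// lineCount mirrors ARRAY_LENGTH(SPLIT(content, '\n')), assuming SPLIT
// behaves like strings.Split: a trailing newline adds one empty element.
func lineCount(content string) int {
	return len(strings.Split(content, "\n"))
}

// totalLines mirrors the outer query: drop rows without a language,
// sum the line counts per language, and sort by total, descending.
func totalLines(files []file) []langTotal {
	sums := map[string]int{}
	for _, f := range files {
		if f.lang == "" { // WHERE lang IS NOT NULL
			continue
		}
		sums[f.lang] += lineCount(f.content) // GROUP BY lang, SUM(lines)
	}
	totals := make([]langTotal, 0, len(sums))
	for lang, n := range sums {
		totals = append(totals, langTotal{lang, n})
	}
	// ORDER BY total_lines DESC; ties are broken by name so the
	// output is deterministic (the SQL query leaves ties unordered).
	sort.Slice(totals, func(i, j int) bool {
		if totals[i].lines != totals[j].lines {
			return totals[i].lines > totals[j].lines
		}
		return totals[i].lang < totals[j].lang
	})
	return totals
}

func main() {
	files := []file{
		{"Go", "package main\n\nfunc main() {}\n"},
		{"Go", "package main\n"},
		{"Markdown", "# readme\n"},
		{"", "\x00\x01"}, // no language detected, skipped
	}
	for _, t := range totalLines(files) {
		fmt.Printf("%s\t%d\n", t.lang, t.lines)
	}
}
```

Under the stated assumption about SPLIT, the totals are slightly higher than what `wc -l` would report for files that end with a newline.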

  11. Some custom functions
    UAST(content, language, [filter]):
    Returns the Universal Abstract Syntax Tree resulting from parsing the
    given content in the given language.
    Powered by github.com/bblfsh/bblfshd.

  12. Number of functions per Go file
    SELECT files.repository_id, files.file_path,
        ARRAY_LENGTH(UAST(
            files.blob_content,
            LANGUAGE(files.file_path, files.blob_content),
            '//*[@roleFunction and @roleDeclaration]'
        )) AS functions
    FROM files
    NATURAL JOIN refs
    WHERE
        LANGUAGE(files.file_path, files.blob_content) = 'Go'
        AND refs.ref_name = 'HEAD'

  14. source{d} Engine
    ● Too many moving pieces
    ● Too many steps to get started
    ● Solving it all with the power of Docker!

  16. https://medium.com/sourcedtech

  17. Why?

  19. Demo Time!

  20. How was gitbase built?

  21. vitess
    github.com/vitessio/vitess (by YouTube)

  22. youtu.be/midJ6b1LkA0

  23. go-mysql-server
    github.com/src-d/go-mysql-server
    ● Ready-to-run MySQL server
    ● Extensible via the Database and Table interfaces
    ● Example: github.com/campoy/csvql
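
The idea of extending a SQL engine through two interfaces can be sketched as follows. The `Row`, `Table` and `Database` types below are simplified, hypothetical stand-ins written for this illustration; they are not the actual go-mysql-server definitions, whose method sets differ. The point is only that the engine side (`selectAll` here) never depends on a concrete storage type.

```go
package main

import "fmt"

// Row is one record. Table and Database are simplified, hypothetical
// interfaces for illustration only, not the go-mysql-server API.
type Row []interface{}

type Table interface {
	Name() string
	Columns() []string
	Rows() []Row
}

type Database interface {
	Name() string
	Tables() map[string]Table
}

// memTable is an in-memory Table implementation.
type memTable struct {
	name    string
	columns []string
	rows    []Row
}

func (t *memTable) Name() string      { return t.name }
func (t *memTable) Columns() []string { return t.columns }
func (t *memTable) Rows() []Row       { return t.rows }

// memDB is an in-memory Database implementation.
type memDB struct {
	name   string
	tables map[string]Table
}

func (d *memDB) Name() string             { return d.name }
func (d *memDB) Tables() map[string]Table { return d.tables }

// selectAll plays the role of the engine: it only talks to the
// interfaces, so any backend (CSV files, git repos) could plug in.
func selectAll(db Database, table string) ([]Row, error) {
	t, ok := db.Tables()[table]
	if !ok {
		return nil, fmt.Errorf("table %q not found in database %q", table, db.Name())
	}
	return t.Rows(), nil
}

// newDemoDB builds a tiny database with a single refs table.
func newDemoDB() Database {
	refs := &memTable{
		name:    "refs",
		columns: []string{"ref_name"},
		rows:    []Row{{"HEAD"}, {"refs/heads/master"}},
	}
	return &memDB{name: "demo", tables: map[string]Table{"refs": refs}}
}

func main() {
	rows, err := selectAll(newDemoDB(), "refs")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rows {
		fmt.Println(r[0])
	}
}
```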

  24. Making it go faster

  25. Indexes
    ● SQL indexes can speed up queries substantially
    ● Vitess doesn’t provide them
    ● Pilosa does!
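
Pilosa is a bitmap index, and the core idea fits in a few lines of Go. The sketch below is an assumption-laden illustration, not Pilosa's API or gitbase's integration with it: the `bitmap`, `index`, `buildIndex` and `intersect` names are hypothetical. Each distinct column value maps to a bitmap of the row numbers that hold it, so an equality filter becomes a lookup and a conjunction of two filters becomes a bitwise AND.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitmap stores a set of row numbers, one bit per row.
type bitmap []uint64

// set returns the bitmap with bit i turned on, growing it if needed.
func (b bitmap) set(i int) bitmap {
	for len(b) <= i/64 {
		b = append(b, 0)
	}
	b[i/64] |= 1 << uint(i%64)
	return b
}

// rows lists the row numbers whose bit is set, in ascending order.
func (b bitmap) rows() []int {
	var ids []int
	for w, word := range b {
		for word != 0 {
			ids = append(ids, w*64+bits.TrailingZeros64(word))
			word &= word - 1 // clear the lowest set bit
		}
	}
	return ids
}

// intersect returns the rows present in both bitmaps (bitwise AND).
func intersect(a, b bitmap) bitmap {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make(bitmap, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] & b[i]
	}
	return out
}

// index maps each distinct column value to the rows that hold it.
type index map[string]bitmap

// buildIndex scans a column once; values[i] is the value in row i.
func buildIndex(values []string) index {
	idx := index{}
	for row, v := range values {
		idx[v] = idx[v].set(row)
	}
	return idx
}

func main() {
	// Two hypothetical columns of the same four-row table.
	langs := buildIndex([]string{"Go", "Python", "Go", "Go"})
	refs := buildIndex([]string{"HEAD", "HEAD", "refs/heads/dev", "HEAD"})

	// WHERE lang = 'Go' AND ref_name = 'HEAD', without scanning rows.
	fmt.Println(intersect(langs["Go"], refs["HEAD"]).rows()) // prints [0 3]
}
```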

  26. Caches
    ● Caching is the obvious option to make queries faster
    ● We didn’t want to reinvent the wheel
    ● We didn’t have to, thanks to HashiCorp
    ● Based on github.com/golang/groupcache
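
To make the caching idea concrete, here is a minimal least-recently-used cache built only on the Go standard library. The slides do not say which eviction policy or which library API gitbase uses, so LRU is an assumption, and the `lru`, `newLRU`, `Get` and `Add` names are hypothetical; a real project would reuse an existing, tested library, as the slide suggests.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key   string
	value interface{}
}

// lru is a fixed-capacity cache that evicts the least recently used
// key. It is not safe for concurrent use; add a mutex if needed.
type lru struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newLRU(capacity int) *lru {
	return &lru{
		capacity: capacity,
		order:    list.New(),
		items:    map[string]*list.Element{},
	}
}

// Get returns the cached value and marks the key as recently used.
func (c *lru) Get(key string) (interface{}, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Add stores a value, evicting the oldest entry when over capacity.
func (c *lru) Add(key string, value interface{}) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := newLRU(2)
	cache.Add("blob:a", "contents of a")
	cache.Add("blob:b", "contents of b")
	cache.Get("blob:a")                  // a is now the most recent
	cache.Add("blob:c", "contents of c") // evicts b

	_, ok := cache.Get("blob:b")
	fmt.Println("b cached:", ok) // prints b cached: false
}
```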

  27. Regular Expressions
    ● The regexp package in Go runs in linear time, but is not always faster
    https://swtch.com/~rsc/regexp/regexp1.html
    ● Alternative: github.com/moovweb/rubex (Oniguruma)
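
The linear-time guarantee is easiest to see with a pattern that is pathological for backtracking engines. The sketch below uses only the standard `regexp` package; the `isAllAs` and `pathological` helpers are hypothetical names for this illustration. Nested quantifiers such as `^(a+)+$` can take exponential time on a near-miss input in a backtracking engine, while Go's engine stays proportional to the input length. The trade-off noted on the slide is that on ordinary patterns it may still be slower than an optimized backtracking library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// allAs uses nested quantifiers, a classic worst case for
// backtracking regular expression engines.
var allAs = regexp.MustCompile(`^(a+)+$`)

// isAllAs reports whether s consists only of one or more 'a' characters.
func isAllAs(s string) bool {
	return allAs.MatchString(s)
}

// pathological builds n 'a' characters followed by '!', an input that
// almost matches and forces a backtracking engine to try many splits.
func pathological(n int) string {
	return strings.Repeat("a", n) + "!"
}

func main() {
	input := pathological(5000)

	start := time.Now()
	matched := isAllAs(input)
	elapsed := time.Since(start)

	// The match fails, and it fails quickly: matching time grows
	// linearly with the input rather than exponentially.
	fmt.Println("matched:", matched, "in", elapsed)
}
```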

  28. We want you!

  29. And we’re hiring!

  30. Thanks!
    @francesc