gitbase: exploring git repos with SQL

Francesc Campoy from source{d} will talk about gitbase, a new open source project written entirely in Go that stands on the shoulders of giants. It integrates the codebases of go-git, the most successful git implementation in Go, and Vitess, a replication layer for all the MySQL databases at YouTube. As a result, gitbase provides an easy way to extract information from hundreds of git repositories with a simple SQL query.

The talk will provide an in-depth description of the project, how source{d} implemented it, and what they learned along the way.

Francesc Campoy Flores

September 19, 2018

Transcript

  1. Some custom functions

    LANGUAGE(path, content): Returns the language of a file given its path and contents. Powered by github.com/src-d/enry.
  2. Lines of code per language

    # total lines of code per language in the Go repo
    SELECT lang, SUM(lines) AS total_lines
    FROM (
        SELECT t.tree_entry_name AS name,
               LANGUAGE(t.tree_entry_name, b.blob_content) AS lang,
               ARRAY_LENGTH(SPLIT(b.blob_content, '\n')) AS lines
        FROM refs r
        NATURAL JOIN commits c
        NATURAL JOIN commit_trees ct
        NATURAL JOIN tree_entries t
        NATURAL JOIN blobs b
        WHERE r.ref_name = 'HEAD'
    ) AS lines
    WHERE lang IS NOT NULL
    GROUP BY lang
    ORDER BY total_lines DESC;
  4. Some custom functions

    UAST(content, language, [filter]): Returns the Universal Abstract Syntax Tree resulting from parsing the given content in the given language. Powered by github.com/bblfsh/bblfshd.
  5. Number of functions per Go file

    SELECT files.repository_id, files.file_path,
           ARRAY_LENGTH(UAST(
               files.blob_content,
               LANGUAGE(files.file_path, files.blob_content),
               '//*[@roleFunction and @roleDeclaration]'
           )) AS functions
    FROM files
    NATURAL JOIN refs
    WHERE LANGUAGE(files.file_path, files.blob_content) = 'Go'
      AND refs.ref_name = 'HEAD'
  7. source{d} Engine

    • Too many moving pieces
    • Too many steps to get started
    • Solving it all with the power of Docker!
  8. go-mysql-server

    github.com/src-d/go-mysql-server
    • Ready to run MySQL server
    • Extensible via interfaces Database and Table
    • Example: github.com/campoy/csvql
  9. Indexes

    • SQL indexes can speed up queries substantially
    • Vitess doesn’t provide this
    • Pilosa does!
  10. Caches

    • Caching is the obvious option to make queries faster
    • We didn’t want to reinvent the wheel
    • We didn’t have to, thanks to HashiCorp
    • Based on github.com/golang/groupcache
  11. Regular Expressions

    • The regexp package in Go runs in linear time, but is not always faster: https://swtch.com/~rsc/regexp/regexp1.html
    • Alternative: github.com/moovweb/rubex (Oniguruma)
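
To make the "Lines of code per language" query easier to follow, here is a rough Go sketch of the same aggregation over in-memory data. Everything in it is hypothetical: the `file` type stands in for a joined row of tree_entries and blobs, and `language` is a toy extension lookup, not the enry API that backs the real LANGUAGE function. Like the query, it counts the pieces produced by splitting the content on "\n", so a file ending in a newline counts one extra piece.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// file is a hypothetical stand-in for one row of tree_entries joined with blobs.
type file struct {
	name    string
	content string
}

// language is a toy replacement for LANGUAGE(path, content). It only looks at
// the file extension; the real function is powered by enry.
func language(name string) string {
	switch filepath.Ext(name) {
	case ".go":
		return "Go"
	case ".s":
		return "Assembly"
	case ".md":
		return "Markdown"
	}
	return "" // plays the role of NULL in the query
}

// langTotal is one row of the result: a language and its summed line count.
type langTotal struct {
	lang  string
	lines int
}

// linesPerLanguage mirrors the query: split each content on "\n", sum the
// piece counts per language, and sort by total in descending order.
func linesPerLanguage(files []file) []langTotal {
	sums := map[string]int{}
	for _, f := range files {
		lang := language(f.name)
		if lang == "" {
			continue // WHERE lang IS NOT NULL
		}
		sums[lang] += len(strings.Split(f.content, "\n"))
	}
	totals := make([]langTotal, 0, len(sums))
	for lang, n := range sums {
		totals = append(totals, langTotal{lang, n})
	}
	sort.Slice(totals, func(i, j int) bool {
		if totals[i].lines != totals[j].lines {
			return totals[i].lines > totals[j].lines
		}
		return totals[i].lang < totals[j].lang // stable order for ties
	})
	return totals
}

func main() {
	files := []file{
		{"main.go", "package main\n\nfunc main() {}"},
		{"util.go", "package main"},
		{"README.md", "# demo\nhello"},
		{"LICENSE", "..."},
	}
	for _, t := range linesPerLanguage(files) {
		fmt.Printf("%s\t%d\n", t.lang, t.lines)
	}
}
```

The point of gitbase is that none of this plumbing has to be written by hand: the joins walk refs, commits, trees, and blobs, and the aggregation is plain SQL.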