gitbase: exploring git repos with SQL

Francesc Campoy from source{d} will talk about gitbase, a new open source project written entirely in Go that stands on the shoulders of giants. By integrating the codebases of go-git, the most successful git implementation in Go, and vitess, a replication layer for all the MySQL databases at YouTube, gitbase provides an easy way to extract information from hundreds of git repositories with a single SQL query.

The talk will provide an in-depth description of the project as well as the way source{d} implemented it and what they learned on the way.

Francesc Campoy Flores

September 19, 2018

Transcript

  1. exploring git repos with SQL gitbase and source{d} engine

  2. Francesc Campoy VP of Developer Relations francesc@sourced.tech @francesc github.com/campoy

  3. just for func youtube.com/justforfunc

  4. just for func twitch.tv/justforfunclive

  5. Agenda • What is gitbase? • How was it built? • Q&A

  6. What is gitbase?

  7. github.com/src-d/gitbase • SQL interface to git repositories • Open Source

    • Written in Go
  8. Some custom functions LANGUAGE(path, content): Returns the language of a file given its path and contents. Powered by github.com/src-d/enry.

  9. Lines of code per language

          # total lines of code per language in the Go repo
          SELECT lang, SUM(lines) AS total_lines
          FROM (
              SELECT t.tree_entry_name AS name,
                     LANGUAGE(t.tree_entry_name, b.blob_content) AS lang,
                     ARRAY_LENGTH(SPLIT(b.blob_content, '\n')) AS lines
              FROM refs r
              NATURAL JOIN commits c
              NATURAL JOIN commit_trees ct
              NATURAL JOIN tree_entries t
              NATURAL JOIN blobs b
              WHERE r.ref_name = 'HEAD'
          ) AS lines
          WHERE lang IS NOT NULL
          GROUP BY lang
          ORDER BY total_lines DESC;

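To make the shape of the query above easier to follow, here is a minimal Go sketch of the same aggregation over an in-memory set of files. It does not talk to gitbase: `languageByExt` is a hypothetical, extension-only stand-in for the `LANGUAGE` function (the real one is powered by enry and also looks at contents), and `linesPerLanguage` and `langTotal` are names invented for this illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// languageByExt is a hypothetical stand-in for gitbase's LANGUAGE function.
// It only looks at the file extension; an empty result plays the role of NULL.
func languageByExt(path string) string {
	switch filepath.Ext(path) {
	case ".go":
		return "Go"
	case ".py":
		return "Python"
	case ".md":
		return "Markdown"
	}
	return ""
}

// langTotal is one row of the result: a language and its summed line count.
type langTotal struct {
	Lang  string
	Lines int
}

// linesPerLanguage mirrors the query: count lines per file, skip files with
// no detected language, sum per language, and sort by total descending.
func linesPerLanguage(files map[string]string) []langTotal {
	totals := map[string]int{}
	for path, content := range files {
		lang := languageByExt(path)
		if lang == "" {
			continue // WHERE lang IS NOT NULL
		}
		// Same counting rule as ARRAY_LENGTH(SPLIT(blob_content, '\n')).
		totals[lang] += len(strings.Split(content, "\n"))
	}
	out := make([]langTotal, 0, len(totals))
	for lang, n := range totals {
		out = append(out, langTotal{Lang: lang, Lines: n})
	}
	// ORDER BY total_lines DESC, with the name as a tie-breaker.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Lines != out[j].Lines {
			return out[i].Lines > out[j].Lines
		}
		return out[i].Lang < out[j].Lang
	})
	return out
}

func main() {
	files := map[string]string{
		"main.go":   "package main\n\nfunc main() {}\n",
		"util.go":   "package main\n",
		"README.md": "# demo\n",
		"LICENSE":   "MIT\n",
	}
	for _, t := range linesPerLanguage(files) {
		fmt.Printf("%s\t%d\n", t.Lang, t.Lines)
	}
}
```

Note that splitting on `\n` counts a trailing empty element for files that end with a newline, so both the sketch and the query report one more "line" than a tool like `wc -l` would.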
  11. Some custom functions UAST(content, language, [filter]): Returns the Universal Abstract Syntax Tree resulting from parsing the given content in the given language. Powered by github.com/bblfsh/bblfshd.

  12. Number of functions per Go file

          SELECT files.repository_id,
                 files.file_path,
                 ARRAY_LENGTH(UAST(
                     files.blob_content,
                     LANGUAGE(files.file_path, files.blob_content),
                     '//*[@roleFunction and @roleDeclaration]'
                 )) AS functions
          FROM files
          NATURAL JOIN refs
          WHERE LANGUAGE(files.file_path, files.blob_content) = 'Go'
            AND refs.ref_name = 'HEAD'

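For a rough idea of what "number of functions per Go file" means for a single file, the sketch below counts function and method declarations with Go's standard `go/parser` and `go/ast` packages. This is an analogy only: it uses Go's own AST rather than the language-independent UAST produced by bblfsh, and whether the XPath filter in the query selects exactly the same nodes is an assumption. `countFuncDecls` is a name made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countFuncDecls parses Go source and counts top-level function and method
// declarations. Function literals assigned to variables are not counted.
func countFuncDecls(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, decl := range file.Decls {
		if _, ok := decl.(*ast.FuncDecl); ok {
			n++
		}
	}
	return n, nil
}

func main() {
	src := `package demo

type T struct{}

func (T) Method() {}

func Helper() int { return 1 }
`
	n, err := countFuncDecls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("function declarations:", n)
}
```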
  14. source{d} Engine • Too many moving pieces • Too many steps to get started • Solving it all with the power of Docker!

  16. https://medium.com/sourcedtech

  17. Why?

  19. Demo Time!

  20. How was gitbase built?

  21. vitess github.com/vitessio/vitess (by YouTube)

  22. youtu.be/midJ6b1LkA0

  23. go-mysql-server github.com/src-d/go-mysql-server • Ready to run MySQL server • Extensible via interfaces Database and Table • Example: github.com/campoy/csvql

  24. Making it go faster

  25. Indexes • SQL indexes can speed up queries substantially • Vitess doesn’t provide this • Pilosa does!

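Pilosa is a bitmap index, so a small sketch of that technique may help show why it speeds up filters such as `language = 'Go' AND ref_name = 'HEAD'`. This is a toy, single-process illustration written for this transcript, not Pilosa's API or gitbase's integration: `bitmap`, `index`, `buildIndex`, and `lookup` are all invented names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitmap stores a set of row IDs, one bit per row.
type bitmap []uint64

// set marks the given row as present.
func (b bitmap) set(row int) { b[row/64] |= uint64(1) << (uint(row) % 64) }

// and returns the intersection of two bitmaps of equal length.
func (b bitmap) and(o bitmap) bitmap {
	out := make(bitmap, len(b))
	for i := range b {
		out[i] = b[i] & o[i]
	}
	return out
}

// rows lists the row IDs whose bits are set, in ascending order.
func (b bitmap) rows() []int {
	var ids []int
	for i, w := range b {
		for w != 0 {
			ids = append(ids, i*64+bits.TrailingZeros64(w))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return ids
}

// index maps each distinct column value to the bitmap of rows holding it.
type index struct {
	words int
	vals  map[string]bitmap
}

// buildIndex scans a column once and records which rows hold each value.
func buildIndex(column []string) index {
	idx := index{words: (len(column) + 63) / 64, vals: map[string]bitmap{}}
	for row, v := range column {
		if idx.vals[v] == nil {
			idx.vals[v] = make(bitmap, idx.words)
		}
		idx.vals[v].set(row)
	}
	return idx
}

// lookup returns the rows holding v, or an empty bitmap if v never occurs.
func (idx index) lookup(v string) bitmap {
	if b, ok := idx.vals[v]; ok {
		return b
	}
	return make(bitmap, idx.words)
}

func main() {
	langs := buildIndex([]string{"Go", "Python", "Go", "Go"})
	refs := buildIndex([]string{"HEAD", "HEAD", "v1", "HEAD"})
	// Rows where language = 'Go' AND ref_name = 'HEAD', without a full scan.
	fmt.Println(langs.lookup("Go").and(refs.lookup("HEAD")).rows())
}
```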
  26. Caches • Caching is the obvious option to make queries faster • We didn’t want to reinvent the wheel • We didn’t have to, thanks to Hashicorp • Based on github.com/golang/groupcache

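The slide does not say which eviction policy the cache uses. Assuming a least-recently-used (LRU) policy, which is what the `lru` package in groupcache implements, the technique can be sketched with the standard library alone. This is an illustration of the idea, not the Hashicorp library or gitbase's actual cache; `LRU`, `NewLRU`, and `entry` are names chosen for this example.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key   string
	value string
}

// LRU is a fixed-size cache that evicts the least recently used entry.
// This sketch is not safe for concurrent use.
type LRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

// NewLRU returns an empty cache holding at most capacity entries.
func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the cached value and marks the key as recently used.
func (c *LRU) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Add inserts or updates a key, evicting the oldest entry when over capacity.
func (c *LRU) Add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := NewLRU(2)
	cache.Add("q1", "rows for q1")
	cache.Add("q2", "rows for q2")
	cache.Get("q1")                // q1 becomes the most recently used
	cache.Add("q3", "rows for q3") // over capacity: evicts q2
	_, ok := cache.Get("q2")
	fmt.Println("q2 still cached:", ok)
}
```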
  27. Regular Expressions • The regexp package in Go runs in linear time, but is not always faster https://swtch.com/~rsc/regexp/regexp1.html • Alternative: github.com/moovweb/rubex (Oniguruma)

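The linear-time guarantee is easiest to see on the pathological case from the linked article: the pattern made of n optional `a`s followed by n literal `a`s, matched against a string of n `a`s. Backtracking engines can take exponential time on it, while Go's `regexp` stays fast. The sketch below only exercises the standard library; it does not benchmark rubex, and `pathological` is a helper name invented here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// pathological builds the anchored pattern a?a?...a?aa...a (n optional a's
// followed by n literal a's) and an input string of n a's.
func pathological(n int) (*regexp.Regexp, string) {
	pattern := "^" + strings.Repeat("a?", n) + strings.Repeat("a", n) + "$"
	return regexp.MustCompile(pattern), strings.Repeat("a", n)
}

func main() {
	re, input := pathological(30)
	start := time.Now()
	ok := re.MatchString(input)
	fmt.Println("matched:", ok, "in", time.Since(start))
}
```

The trade-off the slide points at is that this guarantee comes from a different matching strategy, so on ordinary patterns a backtracking engine such as Oniguruma can still be faster in practice.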
  28. We want you!

  29. And we’re hiring!

  30. Thanks! @francesc