Slide 1

Slide 1 text

exploring git repos with SQL gitbase and source{d} engine

Slide 2

Slide 2 text

Francesc Campoy
VP of Developer Relations
@francesc
github.com/campoy

Slide 3

Slide 3 text

just for func youtube.com/justforfunc

Slide 4

Slide 4 text

just for func twitch.tv/justforfunclive

Slide 5

Slide 5 text

Agenda
● What is gitbase?
● How was it built?
● Q&A

Slide 6

Slide 6 text

What is gitbase?

Slide 7

Slide 7 text

github.com/src-d/gitbase
● SQL interface to git repositories
● Open Source
● Written in Go

Slide 8

Slide 8 text

Some custom functions

LANGUAGE(path, content): Returns the language of a file given its path and contents. Powered by github.com/src-d/enry.

Slide 9

Slide 9 text

Lines of code per language

# total lines of code per language in the Go repo
SELECT lang, SUM(lines) AS total_lines
FROM (
    SELECT
        t.tree_entry_name AS name,
        LANGUAGE(t.tree_entry_name, b.blob_content) AS lang,
        ARRAY_LENGTH(SPLIT(b.blob_content, '\n')) AS lines
    FROM refs r
    NATURAL JOIN commits c
    NATURAL JOIN commit_trees ct
    NATURAL JOIN tree_entries t
    NATURAL JOIN blobs b
    WHERE r.ref_name = 'HEAD'
) AS lines
WHERE lang IS NOT NULL
GROUP BY lang
ORDER BY total_lines DESC;
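To make the shape of this query concrete, here is a minimal Go sketch of the same aggregation over an in-memory list of files. Everything in it is an assumption made for illustration: the `file` type stands in for the joined `tree_entries` and `blobs` rows, and `languageOf` is a toy extension lookup, not what enry does (enry also inspects the content).

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// file is a hypothetical stand-in for one row of the joined
// tree_entries and blobs tables.
type file struct {
	name    string
	content string
}

// languageOf is a toy replacement for LANGUAGE(): it only looks at the
// file extension. An empty result plays the role of NULL.
func languageOf(name string) string {
	switch filepath.Ext(name) {
	case ".go":
		return "Go"
	case ".md":
		return "Markdown"
	case ".s":
		return "Assembly"
	}
	return ""
}

// langTotal is one row of the result: a language and its line total.
type langTotal struct {
	lang  string
	lines int
}

// linesPerLanguage mirrors the query: split each blob on '\n', count the
// pieces, sum per language, drop unknown languages, sort by total
// descending (ties broken by name to keep the output stable).
func linesPerLanguage(files []file) []langTotal {
	sums := map[string]int{}
	for _, f := range files {
		lang := languageOf(f.name)
		if lang == "" {
			continue // WHERE lang IS NOT NULL
		}
		sums[lang] += len(strings.Split(f.content, "\n"))
	}
	totals := make([]langTotal, 0, len(sums))
	for lang, n := range sums {
		totals = append(totals, langTotal{lang, n})
	}
	sort.Slice(totals, func(i, j int) bool {
		if totals[i].lines != totals[j].lines {
			return totals[i].lines > totals[j].lines
		}
		return totals[i].lang < totals[j].lang
	})
	return totals
}

func main() {
	files := []file{
		{"main.go", "package main\n\nfunc main() {}\n"},
		{"util.go", "package main\n"},
		{"README.md", "# demo\n"},
		{"LICENSE", "MIT\n"},
	}
	for _, t := range linesPerLanguage(files) {
		fmt.Printf("%-10s %d\n", t.lang, t.lines)
	}
}
```

In this sketch a trailing newline adds one empty piece to the split, so the count is one higher than the number of text lines in such a file.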

Slide 11

Slide 11 text

Some custom functions

UAST(content, language, [filter]): Returns the Universal Abstract Syntax Tree that results from parsing the given content in the given language. Powered by github.com/bblfsh/bblfshd.

Slide 12

Slide 12 text

Number of functions per Go file

SELECT
    files.repository_id,
    files.file_path,
    ARRAY_LENGTH(UAST(
        files.blob_content,
        LANGUAGE(files.file_path, files.blob_content),
        '//*[@roleFunction and @roleDeclaration]'
    )) AS functions
FROM files
NATURAL JOIN refs
WHERE LANGUAGE(files.file_path, files.blob_content) = 'Go'
    AND refs.ref_name = 'HEAD'
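For a single Go file, a similar count can be sketched with the standard library's own parser. This is a stand-in for illustration only: it uses go/ast rather than Babelfish, and it assumes that the nodes selected by the Function and Declaration roles correspond roughly to top-level function and method declarations, which may not hold exactly.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countFuncDecls parses Go source and counts top-level function and
// method declarations. Function literals are not counted.
func countFuncDecls(src string) (int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, decl := range f.Decls {
		if _, ok := decl.(*ast.FuncDecl); ok {
			n++
		}
	}
	return n, nil
}

func main() {
	src := `package demo

func A() {}

type T struct{}

func (T) B() {}

var c = func() {} // a function literal, not a declaration
`
	n, err := countFuncDecls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints 2
}
```

The UAST approach has the advantage of working across languages with one query; this sketch only understands Go.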

Slide 14

Slide 14 text

source{d} Engine
● Too many moving pieces
● Too many steps to get started
● Solving it all with the power of Docker!

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

https://medium.com/sourcedtech

Slide 17

Slide 17 text

Why?

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Demo Time!

Slide 20

Slide 20 text

How was gitbase built?

Slide 21

Slide 21 text

vitess github.com/vitessio/vitess (by YouTube)

Slide 22

Slide 22 text

youtu.be/midJ6b1LkA0

Slide 23

Slide 23 text

go-mysql-server
github.com/src-d/go-mysql-server
● Ready-to-run MySQL server
● Extensible via the Database and Table interfaces
● Example: github.com/campoy/csvql

Slide 24

Slide 24 text

Making it go faster

Slide 25

Slide 25 text

Indexes
● SQL indexes can speed up queries substantially
● Vitess doesn’t provide this
● Pilosa does!
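Why an index helps can be shown with a toy example: instead of testing every row against a filter, the engine looks up the matching row positions directly. The sketch below is an assumption-laden illustration, not how gitbase talks to Pilosa. Pilosa is a bitmap index; a plain slice of positions is used here only to keep the code short, and the `row` type is hypothetical.

```go
package main

import "fmt"

// row is a hypothetical table row with a path and a language column.
type row struct {
	path string
	lang string
}

// langIndex maps a column value to the positions of the rows holding it.
type langIndex map[string][]int

// buildIndex walks the table once and records where each value occurs.
func buildIndex(rows []row) langIndex {
	idx := langIndex{}
	for i, r := range rows {
		idx[r.lang] = append(idx[r.lang], i)
	}
	return idx
}

// scan answers "WHERE lang = want" by visiting every row.
func scan(rows []row, want string) []string {
	var paths []string
	for _, r := range rows {
		if r.lang == want {
			paths = append(paths, r.path)
		}
	}
	return paths
}

// lookup answers the same filter by visiting only the indexed positions.
func lookup(rows []row, idx langIndex, want string) []string {
	var paths []string
	for _, i := range idx[want] {
		paths = append(paths, rows[i].path)
	}
	return paths
}

func main() {
	rows := []row{
		{"main.go", "Go"},
		{"README.md", "Markdown"},
		{"util.go", "Go"},
	}
	idx := buildIndex(rows)
	fmt.Println(scan(rows, "Go"))
	fmt.Println(lookup(rows, idx, "Go"))
}
```

Both calls return the same paths; the difference is that `lookup` does work proportional to the number of matches rather than to the size of the table.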

Slide 26

Slide 26 text

Caches
● Caching is the obvious option to make queries faster
● We didn’t want to reinvent the wheel
● We didn’t have to, thanks to Hashicorp
● Based on github.com/golang/groupcache
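As an illustration of the kind of cache involved, here is a minimal least-recently-used cache built only on the standard library. The slide names neither the library nor the eviction policy, so LRU is an assumption, and the `lruCache` type and its methods are hypothetical names for this sketch rather than any library's API.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key   string
	value string
}

// lruCache is a hypothetical fixed-size cache that evicts the least
// recently used key. It is not safe for concurrent use.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    map[string]*list.Element{},
	}
}

// Get returns the cached value and marks the key as recently used.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(entry).value, true
}

// Put stores a value and evicts the oldest key when over capacity.
func (c *lruCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value = entry{key, value}
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(entry).key)
	}
}

func main() {
	cache := newLRU(2)
	cache.Put("a", "1")
	cache.Put("b", "2")
	cache.Get("a")      // "a" is now the most recently used key
	cache.Put("c", "3") // evicts "b"
	_, ok := cache.Get("b")
	fmt.Println(ok) // prints false
}
```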

Slide 27

Slide 27 text

Regular Expressions
● Go's regexp package runs in linear time, but that does not always make it the faster option https://swtch.com/~rsc/regexp/regexp1.html
● Alternative: github.com/moovweb/rubex (Oniguruma)
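The linear-time guarantee is easiest to appreciate on a pattern with nested quantifiers, which is a classic worst case for backtracking engines. The sketch below uses only the standard regexp package; the pattern and input size are arbitrary choices for illustration, and it makes no claim about how rubex performs on the same input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pattern has nested quantifiers. Backtracking engines can take
// exponential time on near-miss inputs for patterns like this one.
var pattern = regexp.MustCompile(`^(a+)+$`)

// matches reports whether the whole input is one or more 'a' characters.
func matches(input string) bool {
	return pattern.MatchString(input)
}

func main() {
	// A long run of 'a' followed by a 'b' is a near miss: it almost
	// matches, which is what triggers heavy backtracking elsewhere.
	input := strings.Repeat("a", 10000) + "b"
	fmt.Println(matches(input)) // prints false
}
```

The trade-off named on the slide is that this safety has a cost: on ordinary patterns without pathological behavior, a backtracking engine can still come out ahead.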

Slide 28

Slide 28 text

We want you!

Slide 29

Slide 29 text

And we’re hiring!

Slide 30

Slide 30 text

Thanks! @francesc