Originally presented at Collab Summit 2016, this talk covers using GHTorrent to gather and analyze public repository and community data from GitHub. We also discuss using Azure Data Lake and how you can set up this infrastructure yourself.