Talk given at OSCON 2016
GitHub Insights: Understanding Open Source. @jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
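A minimal sketch of reading that endpoint, assuming the public events feed returns a JSON array of event objects that each carry a `type` field. The names `fetch_events` and `count_by_type` are hypothetical, the sketch uses the `https` form of the URL, and it ignores authentication, rate limits, and pagination.

```python
import json
import urllib.request
from collections import Counter

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events (performs a network request)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            # GitHub's API expects a User-Agent header; this value is arbitrary.
            "User-Agent": "events-sketch",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_by_type(events):
    """Count events per type, e.g. PushEvent, WatchEvent, ForkEvent."""
    return Counter(event.get("type", "unknown") for event in events)
```

Calling `count_by_type(fetch_events())` gives a rough picture of what is flowing through the feed at that moment. A mirror such as the one described in this talk would need to poll continuously and deduplicate by event id, which this sketch does not attempt.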
Example event (condensed): https://api.github.com/users/Cephei, https://api.github.com/repos/PowerDMS/Owin.Scim, https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3, https://api.github.com/orgs/PowerDMS
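The URLs on this slide suggest that one event points at several other API resources: the user, the repository, a commit, and the organization. The sketch below collects those links from an event, assuming they sit under `actor`, `repo`, `org`, and `payload.commits`. The helper `linked_urls` is hypothetical, and `sample_event` is hand-written from the slide (including its condensed commit URL and an assumed `PushEvent` type), not real API output.

```python
def linked_urls(event):
    """Collect the API URLs an event refers to, in a stable order."""
    urls = []
    # Entities attached directly to the event.
    for key in ("actor", "repo", "org"):
        entity = event.get(key) or {}
        if "url" in entity:
            urls.append(entity["url"])
    # Commits listed in the payload (present on push-style events).
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    return urls


# Hand-written sample mirroring the slide; the event type is an assumption.
sample_event = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
    "payload": {
        "commits": [
            {
                "url": "https://api.github.com/repos/PowerDMS/Owin.Scim"
                "/commits/c751014f634d73e0b72f78a53c8cf137888b3"
            }
        ]
    },
}

for url in linked_urls(sample_event):
    print(url)
```

Following each collected URL is one way to expand a single event into the related entities it mentions.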
Entities
GHTorrent architecture [diagram; labels: GitHub API, Event Retrieval, Commits Queue, Project Events Queue, Events, Data Retrieval (x4), Projects, Commits, evt.commit, evt.watch, evt.fork, Mirroring Cluster]
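One way to read the diagram labels is that retrieved events are routed to queues by kind and then consumed by data retrieval workers. The sketch below illustrates that pattern in memory only. It is not GHTorrent's implementation: the mapping from GitHub event types to the routing keys `evt.commit`, `evt.watch`, and `evt.fork` is an assumption, and `route_events` and `drain` are hypothetical names.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route_events(events):
    """Place each event on the in-memory queue for its routing key."""
    queues = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event.get("type"))
        if key is not None:  # event types without a mapping are skipped
            queues[key].append(event)
    return queues


def drain(queue, handler):
    """Stand-in for a data retrieval worker: process a queue until empty."""
    results = []
    while queue:
        results.append(handler(queue.popleft()))
    return results
```

Separating routing from processing means each queue can be drained by its own worker, so slow retrievals for one kind of event need not hold up the others.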
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted (http://ghtorrent.org)
Using the data: Download
Using the data: Self-service (https://github.com/ghtorrent/ghtorrent-webhook)
Using the data: Azure Data Lake
Resources: http://ghtorrent.org, https://github.com/Microsoft/ghinsights, @gousiosg, @jeffmcaffer, @kelewis