Lean GHTorrent: GitHub data on demand

Presentation given at the MSR 2014 data track

Georgios Gousios

June 03, 2014

  4. @gousiosg http://ghtorrent.org/lean.html

    Lean GHTorrent: GitHub data on demand

    Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik and Andy Zaidman
    {g.gousios, a.e.zaidman}@tudelft.nl {b.n.vasilescu, a.serebrenik}@tue.nl
    Software Engineering Research Group http://swerl.tudelft.nl/ Delft University of Technology

    Want to do research with GHTorrent data? It is now as easy as:

    1. Filling in the form at ghtorrent.org/lean.html
    2. Getting the data!

    In the package, you will find:

    • A MySQL dump (to query like a boss)
    • MongoDB collection dumps (all GitHub API data)

    for all repos specified in step 1!

    [Figure: architecture diagram with steps numbered 1 to 9. Components: Web server, Web form, Request listener, Requests db, Job db, Requests queue, Dispatcher, Retrieval workers, GitHub API, GHTorrent db, Responses queue, Response listener, GHTorrent server. Annotation: "No need to care about this (but ask if you do!)"]
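
    The diagram names a requests queue, retrieval workers and a responses queue, but the slide does not spell out how they interact. The following is a minimal sketch of one plausible reading, in which a dispatcher queues one job per repository and workers post their results back. It is not the GHTorrent implementation: `fetch_repo`, `retrieval_worker`, `dispatch` and the `STOP` sentinel are hypothetical names, the ordering of steps is an assumption, and no GitHub API call is actually made.

```python
import queue
import threading

# Hypothetical stand-ins for two components named in the diagram.
requests_queue = queue.Queue()   # "Requests queue"
responses_queue = queue.Queue()  # "Responses queue"
STOP = object()                  # sentinel telling a worker to shut down


def fetch_repo(repo):
    """Placeholder for a worker's retrieval step (assumption: no network here)."""
    return {"repo": repo, "status": "retrieved"}


def retrieval_worker():
    """Take repository names off the requests queue and post results."""
    while True:
        repo = requests_queue.get()
        if repo is STOP:
            break
        responses_queue.put(fetch_repo(repo))


def dispatch(repos, n_workers=2):
    """Queue one job per repository, run the workers, collect the responses."""
    workers = [threading.Thread(target=retrieval_worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for repo in repos:
        requests_queue.put(repo)
    for _ in workers:
        requests_queue.put(STOP)  # one sentinel per worker
    for w in workers:
        w.join()
    # All workers have exited, so the responses queue is complete.
    results = []
    while not responses_queue.empty():
        results.append(responses_queue.get())
    return sorted(results, key=lambda r: r["repo"])
```

    Calling `dispatch(["owner/a", "owner/b"])` returns one result dictionary per requested repository. Decoupling the form from the retrieval step through queues is what would let the number of workers vary independently of incoming requests.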