The GHTorrent dataset and toolsuite

MSR2013 data paper presentation

Georgios Gousios

May 17, 2013

Transcript

  1. The GHTorrent Dataset and Tool Suite. Georgios Gousios, Software Engineering Research Group, TU Delft
  2. All data from GitHub
  3. Ready to be queried
  4. ghtorrent.org
  5. Repositories
  6. Commits
  7. Pull requests
  8. Issues
  9. Users and Organizations
  10. Mirror event stream

  11. Recursive dependency retrieval
      <<event>> PushEvent
      <<api>> /users/:user ensure_user
      <<api>> /repos/:user/:repo/ ensure_repo
      <<api>> /repos/:user/:repo/commits ensure_commits ensure_user
      <<api>> /:user/:repo/sha ensure_commit ensure_user
      <<api>> /users/:user/followers ensure_followers
      <<api>> /repos/:user/:repo/commits/:sha/comments ensure_commit_comments
      <<api>> /users/:user/orgs ensure_orgs
      <<api>> /orgs/:org/teams ensure_teams
  12. Build relational database to query

  13. NoSQL database as cache
      repositories, users, organizations, issues
      /users/:user, /user/repos, /repos/:user/:repo/issues, /orgs/:org
      {
        "type": "User",
        "public_gists": 0,
        "login": "gousiosg",
        "followers": 8,
        "name": "Georgios Gousios",
        "public_repos": 4,
        "created_at": ...,
        "id": 386172,
        "following": 4,
      }
      { . . .
  14. Periodic dumps of DBs online

  15. Query relational DB online

  16. Roll your own tools
      $ gem install sqlite3 ghtorrent
      $ ght-retrieve-repo mojombo jekyll
      $ (edit config.yaml)
  17. Research!

  18. Single developer identities
  19. Single developer identities
  20. Single developer identities
  21. Single developer identities
  22. Single developer identities
  23. Source tracking
  24. Source tracking
  25. Source tracking
  26. Source tracking
  27. Source tracking
  28. Source tracking
  29. Source tracking
  30. Source tracking
  31. Network analysis

  32. Distributed development. TUD-SERG-2013-10: An Exploratory Study of the Pull-based Software Development Model
  33. None
  34. None
  35. None
  36. None
  37. None
  38. ghtorrent.org. Octicons font: courtesy of GitHub
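
The "recursive dependency retrieval" idea on slide 11 can be illustrated with a small sketch. This is not GHTorrent's actual code (the tool suite itself is distributed as a Ruby gem); it is a minimal Python illustration under stated assumptions. The `FAKE_API` table, the simplified event shape, the `fetch` helper, and the in-memory `store` are all hypothetical stand-ins, and no network access takes place. Only the `ensure_*` naming is borrowed from the slide. The point it tries to show: each `ensure_*` step first resolves the entities it depends on, and skips the API call when the entity is already stored.

```python
# Hypothetical sketch of recursive dependency retrieval (slide 11).
# FAKE_API stands in for API responses; paths and payloads are invented
# for illustration only.
FAKE_API = {
    "/users/alice": {"login": "alice"},
    "/users/bob": {"login": "bob"},
    "/repos/alice/demo": {"name": "demo", "owner": "alice"},
    "/repos/alice/demo/commits/abc123": {"sha": "abc123", "author": "bob"},
}

store = {}      # stands in for the relational database
api_calls = []  # records which paths were fetched, in order


def fetch(path):
    """Pretend API call: log the path and return the canned payload."""
    api_calls.append(path)
    return FAKE_API[path]


def ensure_user(login):
    """Return the stored user, fetching it only if it is missing."""
    key = ("user", login)
    if key not in store:
        store[key] = fetch(f"/users/{login}")
    return store[key]


def ensure_repo(owner, name):
    """A repository depends on its owner, so ensure the owner first."""
    key = ("repo", owner, name)
    if key not in store:
        ensure_user(owner)
        store[key] = fetch(f"/repos/{owner}/{name}")
    return store[key]


def ensure_commit(owner, name, sha):
    """A commit depends on its repository and on its author."""
    key = ("commit", owner, name, sha)
    if key not in store:
        ensure_repo(owner, name)
        commit = fetch(f"/repos/{owner}/{name}/commits/{sha}")
        ensure_user(commit["author"])
        store[key] = commit
    return store[key]


def handle_push_event(event):
    """Resolve every entity that a (simplified) PushEvent refers to."""
    for sha in event["shas"]:
        ensure_commit(event["owner"], event["repo"], sha)


event = {"owner": "alice", "repo": "demo", "shas": ["abc123"]}
handle_push_event(event)
handle_push_event(event)  # second pass finds everything already stored
```

After both calls, `api_calls` holds four paths: the owner, the repository, the commit, and the commit author, each fetched once. The second event triggers no further fetches, which is the property that keeps a mirror of a busy event stream from repeating API requests.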