Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
The GHTorrent dataset and toolsuite
Search
Georgios Gousios
May 17, 2013
Technology
4
130k
The GHTorrent dataset and toolsuite
MSR2013 data paper presentation
Georgios Gousios
May 17, 2013
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
250
The troubles of modern dependency management and what to do about them
gousiosg
0
480
Mining Repositories with Apache Spark
gousiosg
0
590
My adventures with open everything
gousiosg
0
250
Structure and Evolution of Package Dependency Networks
gousiosg
0
700
Mining Github for fun and profit
gousiosg
9
63k
GitHub Insights: Understanding Open Source
gousiosg
0
330
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
880
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
250
Other Decks in Technology
See All in Technology
OCI 運用監視サービス 概要
oracle4engineer
PRO
0
4.8k
Introduction to Works of ML Engineer in LY Corporation
lycorp_recruit_jp
0
110
初心者向けAWS Securityの勉強会mini Security-JAWSを9ヶ月ぐらい実施してきての近況
cmusudakeisuke
0
120
Security-JAWS【第35回】勉強会クラウドにおけるマルウェアやコンテンツ改ざんへの対策
4su_para
0
180
【Pycon mini 東海 2024】Google Colaboratoryで試すVLM
kazuhitotakahashi
2
500
適材適所の技術選定 〜GraphQL・REST API・tRPC〜 / Optimal Technology Selection
kakehashi
1
170
安心してください、日本語使えますよ―Ubuntu日本語Remix提供休止に寄せて― 2024-11-17
nobutomurata
1
990
OCI Vault 概要
oracle4engineer
PRO
0
9.7k
テストコード品質を高めるためにMutation Testingライブラリ・Strykerを実戦導入してみた話
ysknsid25
7
2.6k
Python(PYNQ)がテーマのAMD主催のFPGAコンテストに参加してきた
iotengineer22
0
470
ドメイン名の終活について - JPAAWG 7th -
mikit
33
20k
OCI Network Firewall 概要
oracle4engineer
PRO
0
4.1k
Featured
See All Featured
A Philosophy of Restraint
colly
203
16k
How GitHub (no longer) Works
holman
310
140k
We Have a Design System, Now What?
morganepeng
50
7.2k
Facilitating Awesome Meetings
lara
50
6.1k
VelocityConf: Rendering Performance Case Studies
addyosmani
325
24k
Mobile First: as difficult as doing things right
swwweet
222
8.9k
A better future with KSS
kneath
238
17k
Typedesign – Prime Four
hannesfritz
40
2.4k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
The Invisible Side of Design
smashingmag
298
50k
Site-Speed That Sticks
csswizardry
0
24
Optimising Largest Contentful Paint
csswizardry
33
2.9k
Transcript
The GHTorrent Dataset and Tool Suite Georgios Gousios Software Engineering
Research Group TU Delft
All data from Github
Ready to be queried
ghtorrent.org
Repositories
Commits
Pull requests
Issues
Users and Organizations
Mirror event stream
<<event>> PushEvent <<api>> /users/:user ensure_user <<api>> /repos/:user/:repo/ ensure_repo <<api>> /repos/:user/:repo/commits
ensure_commits ensure_user <<api>> /:user/:repo/sha ensure_commit ensure_user <<api>> /users/:user/ followers ensure_followers <<api>> /repos/:user/:repo/ commits/:sha/comments ensure_commit_comments <<api>> /users/:user/orgs ensure_orgs <<api>> /orgs/:org/teams ensure_teams Recursive dependency retrieval
Build relational database to query
repositories users organizations issues /users/:user /user/repos /repos/:user/:repo/issues /orgs/:org { 88"type":8"User",
88"public_gists":80, 88"login":8"gousiosg", 88"followers":88, 88"name":8"Georgios8Gousios", 88"public_repos":84, 88"created_at":8..., 88"id":8386172, 88"following":84, } { . . . CoSQL database as cache
Periodic dumps of DBs online
Query relational DB online
$ gem install sqlite3 ghtorrent $ ght-retrieve-repo mojombo jekyll $
(edit config.yaml) Roll your own tools
Research !
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Network analysis
Distributed development Text Text TUD-SERG-2013-10 An Exploratory Study of the
Pull- based Software Development model
None
None
None
None
None
ghtorrent.org Octicons font: courtesy Github