The GHTorrent dataset and toolsuite
Georgios Gousios
May 17, 2013
MSR2013 data paper presentation
Transcript
The GHTorrent Dataset and Tool Suite
Georgios Gousios
Software Engineering Research Group, TU Delft
All data from GitHub
Ready to be queried
ghtorrent.org
Repositories
Commits
Pull requests
Issues
Users and Organizations
Mirror event stream
Recursive dependency retrieval
<<event>> PushEvent
ensure_user (<<api>> /users/:user)
ensure_repo (<<api>> /repos/:user/:repo/)
ensure_commits (<<api>> /repos/:user/:repo/commits)
ensure_user
ensure_commit (<<api>> /:user/:repo/sha)
ensure_user
ensure_followers (<<api>> /users/:user/followers)
ensure_commit_comments (<<api>> /repos/:user/:repo/commits/:sha/comments)
ensure_orgs (<<api>> /users/:user/orgs)
ensure_teams (<<api>> /orgs/:org/teams)
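The recursive retrieval idea on this slide can be sketched roughly as follows. This is a minimal illustration, not GHTorrent's actual code: the step names are taken from the slide, but the dependency edges in `DEPENDENCIES`, the `ensure` helper, and `handle_push_event` are assumptions made for the example. The point it shows is that each step first satisfies its prerequisites and that a step already satisfied (such as `ensure_user`) is not fetched again.

```python
from typing import Callable, Dict, List, Set

# Step names come from the slide. The edges between them are ASSUMED for
# illustration and may not match the real tool's dependency structure.
DEPENDENCIES: Dict[str, List[str]] = {
    "ensure_user": [],
    "ensure_repo": ["ensure_user"],
    "ensure_commit": ["ensure_repo", "ensure_user"],
    "ensure_commits": ["ensure_commit"],
    "ensure_commit_comments": ["ensure_commit"],
    "ensure_followers": ["ensure_user"],
    "ensure_orgs": ["ensure_user"],
    "ensure_teams": ["ensure_orgs"],
}


def ensure(step: str, done: Set[str], fetch: Callable[[str], None]) -> None:
    """Satisfy `step` after all of its dependencies, skipping finished steps."""
    if step in done:
        return
    for dependency in DEPENDENCIES[step]:
        ensure(dependency, done, fetch)
    fetch(step)  # stands in for the API call that retrieves the entity
    done.add(step)


def handle_push_event(fetch: Callable[[str], None]) -> None:
    """Hypothetical handler: resolve everything a PushEvent touches."""
    done: Set[str] = set()
    for step in ("ensure_commits", "ensure_commit_comments",
                 "ensure_followers", "ensure_teams"):
        ensure(step, done, fetch)


if __name__ == "__main__":
    # Print the order in which the steps would be retrieved.
    handle_push_event(print)
```

Passing a different `fetch` callable (for example one that records calls in a list) makes it easy to check that shared dependencies are retrieved only once.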
Build relational database to query
NoSQL database as cache
repositories, users, organizations, issues
/users/:user, /user/repos, /repos/:user/:repo/issues, /orgs/:org
{
  "type": "User",
  "public_gists": 0,
  "login": "gousiosg",
  "followers": 8,
  "name": "Georgios Gousios",
  "public_repos": 4,
  "created_at": ...,
  "id": 386172,
  "following": 4,
}
{ . . .
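One way to picture the cache on this slide: raw API responses are stored as JSON documents, so a second request for the same path never reaches the remote API. The sketch below is an assumption-laden illustration. `CachedApi`, `fake_fetch`, and the in-memory dict standing in for the document store are hypothetical names introduced here; only the sample field values come from the slide.

```python
import json
from typing import Callable, Dict


class CachedApi:
    """Hypothetical read-through cache keyed by API path."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                 # performs the real (remote) request
        self._store: Dict[str, dict] = {}   # stand-in for a document database
        self.remote_calls = 0

    def get(self, path: str) -> dict:
        """Return the cached document, fetching it only on a cache miss."""
        if path not in self._store:
            self.remote_calls += 1
            self._store[path] = json.loads(self._fetch(path))
        return self._store[path]


def fake_fetch(path: str) -> str:
    """Stand-in for an HTTP call; returns a few fields shown on the slide."""
    return json.dumps({"type": "User", "login": "gousiosg", "id": 386172})


if __name__ == "__main__":
    api = CachedApi(fake_fetch)
    api.get("/users/gousiosg")
    api.get("/users/gousiosg")  # served from the cache
    print(api.remote_calls)
```

Keeping the raw documents around also means the relational view can be rebuilt later without querying the API again.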
Periodic dumps of DBs online
Query relational DB online
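To give a feel for the kind of question a relational view makes easy, here is a small sketch using an in-memory SQLite database. The three-table schema (`users`, `projects`, `commits`) is a simplified assumption for illustration and should not be read as the dataset's actual schema. The repository name comes from a later slide; the authors and commit rows are placeholder data.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical schema and fill it with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT NOT NULL);
        CREATE TABLE projects (
            id INTEGER PRIMARY KEY,
            owner_id INTEGER REFERENCES users(id),
            name TEXT NOT NULL
        );
        CREATE TABLE commits (
            id INTEGER PRIMARY KEY,
            project_id INTEGER REFERENCES projects(id),
            author_id INTEGER REFERENCES users(id),
            sha TEXT NOT NULL
        );
    """)
    # Placeholder data: "alice" and "bob" are made-up authors.
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "mojombo"), (2, "alice"), (3, "bob")])
    conn.execute("INSERT INTO projects VALUES (1, 1, 'jekyll')")
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)",
                     [(1, 1, 2, "aaa"), (2, 1, 2, "bbb"), (3, 1, 3, "ccc")])
    return conn


def commits_per_author(conn: sqlite3.Connection,
                       owner: str, repo: str) -> List[Tuple[str, int]]:
    """Count commits per author login for one repository."""
    return conn.execute("""
        SELECT u.login, COUNT(*) AS n
        FROM commits c
        JOIN projects p ON p.id = c.project_id
        JOIN users o ON o.id = p.owner_id
        JOIN users u ON u.id = c.author_id
        WHERE o.login = ? AND p.name = ?
        GROUP BY u.login
        ORDER BY n DESC, u.login
    """, (owner, repo)).fetchall()


if __name__ == "__main__":
    print(commits_per_author(build_demo_db(), "mojombo", "jekyll"))
```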
Roll your own tools
$ gem install sqlite3 ghtorrent
$ ght-retrieve-repo mojombo jekyll
$ (edit config.yaml)
Research!
Single developer identities
Source tracking
Network analysis
Distributed development
TUD-SERG-2013-10: An Exploratory Study of the Pull-based Software Development Model
ghtorrent.org
Octicons font: courtesy GitHub