$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
400
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
310
The troubles of modern dependency management and what to do about them
gousiosg
0
570
Mining Repositories with Apache Spark
gousiosg
0
680
My adventures with open everything
gousiosg
0
320
Structure and Evolution of Package Dependency Networks
gousiosg
0
810
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
940
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
310
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
AIと二人三脚で育てた、個人開発アプリグロース術
zozotech
PRO
1
730
モダンデータスタック (MDS) の話とデータ分析が起こすビジネス変革
sutotakeshi
0
500
打 造 A I 驅 動 的 G i t H u b ⾃ 動 化 ⼯ 作 流 程
appleboy
0
340
Microsoft Agent 365 についてゆっくりじっくり理解する!
skmkzyk
0
330
年間40件以上の登壇を続けて見えた「本当の発信力」/ 20251213 Masaki Okuda
shift_evolve
PRO
1
130
WordPress は終わったのか ~今のWordPress の制作手法ってなにがあんねん?~ / Is WordPress Over? How We Build with WordPress Today
tbshiki
1
770
re:Invent 2025 ふりかえり 生成AI版
takaakikakei
1
210
非CUDAの悲哀 〜Claude Code と挑んだ image to 3D “Hunyuan3D”を EVO-X2(Ryzen AI Max+395)で動作させるチャレンジ〜
hawkymisc
2
190
AWSセキュリティアップデートとAWSを育てる話
cmusudakeisuke
0
280
mairuでつくるクレデンシャルレス開発環境 / Credential-less development environment using Mailru
mirakui
5
490
[CMU-DB-2025FALL] Apache Fluss - A Streaming Storage for Real-Time Lakehouse
jark
0
120
ChatGPTで論⽂は読めるのか
spatial_ai_network
9
28k
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.6k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.3k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
970
It's Worth the Effort
3n
187
29k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.5k
Docker and Python
trallard
47
3.7k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.3k
Statistics for Hackers
jakevdp
799
230k
Designing for humans not robots
tammielis
254
26k
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis