$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
400
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
310
The troubles of modern dependency management and what to do about them
gousiosg
0
570
Mining Repositories with Apache Spark
gousiosg
0
680
My adventures with open everything
gousiosg
0
320
Structure and Evolution of Package Dependency Networks
gousiosg
0
810
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
940
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
310
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
AI駆動開発の実践とその未来
eltociear
1
280
30分であなたをOmniのファンにしてみせます~分析画面のクリック操作をそのままコード化できるAI-ReadyなBIツール~
sagara
0
180
業務のトイルをバスターせよ 〜AI時代の生存戦略〜
staka121
PRO
2
220
「図面」から「法則」へ 〜メタ視点で読み解く現代のソフトウェアアーキテクチャ〜
scova0731
0
370
2025年 開発生産「可能」性向上報告 サイロ解消からチームが能動性を獲得するまで/ 20251216 Naoki Takahashi
shift_evolve
PRO
2
200
Lookerで実現するセキュアな外部データ提供
zozotech
PRO
0
170
AI駆動開発における設計思想 認知負荷を下げるフロントエンドアーキテクチャ/ 20251211 Teppei Hanai
shift_evolve
PRO
2
430
JEDAI認定プログラム JEDAI Order 2026 エントリーのご案内 / JEDAI Order 2026 Entry
databricksjapan
0
140
AWSを使う上で最低限知っておきたいセキュリティ研修を社内で実施した話 ~みんなでやるセキュリティ~
maimyyym
2
1.8k
Bedrock AgentCore Memoryの新機能 (Episode) を試してみた / try Bedrock AgentCore Memory Episodic functionarity
hoshi7_n
1
480
Haskell を武器にして挑む競技プログラミング ─ 操作的思考から意味モデル思考へ
naoya
7
1.6k
AI時代の新規LLMプロダクト開発: Findy Insightsを3ヶ月で立ち上げた舞台裏と振り返り
dakuon
0
230
Featured
See All Featured
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.8k
Public Speaking Without Barfing On Your Shoes - THAT 2023
reverentgeek
0
270
Building the Perfect Custom Keyboard
takai
1
660
Crafting Experiences
bethany
0
18
Ruling the World: When Life Gets Gamed
codingconduct
0
91
Build The Right Thing And Hit Your Dates
maggiecrowley
38
3k
Balancing Empowerment & Direction
lara
5
810
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
120
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
So, you think you're a good person
axbom
PRO
0
1.8k
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
400
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis