Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
340
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
260
The troubles of modern dependency management and what to do about them
gousiosg
0
490
Mining Repositories with Apache Spark
gousiosg
0
620
My adventures with open everything
gousiosg
0
260
Structure and Evolution of Package Dependency Networks
gousiosg
0
730
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
900
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
260
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
User Story Mapping + Inclusive Team
kawaguti
PRO
2
340
AI Agent時代なのでAWSのLLMs.txtが欲しい!
watany
3
370
AIエージェント元年@日本生成AIユーザ会
shukob
1
260
OPENLOGI Company Profile
hr01
0
60k
Amazon Q Developerの無料利用枠を使い倒してHello worldを表示させよう!
nrinetcom
PRO
2
120
ディスプレイ広告(Yahoo!広告・LINE広告)におけるバックエンド開発
lycorptech_jp
PRO
0
590
役員・マネージャー・著者・エンジニアそれぞれの立場から見たAWS認定資格
nrinetcom
PRO
4
6.8k
遷移の高速化 ヤフートップの試行錯誤
narirou
6
1.9k
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
19k
RaspberryPi CM4(CM5も)面白いぞ!
nonnoise
0
110
Amazon Aurora のバージョンアップ手法について
smt7174
2
190
大規模アジャイルフレームワークから学ぶエンジニアマネジメントの本質
staka121
PRO
3
1.6k
Featured
See All Featured
Java REST API Framework Comparison - PWX 2021
mraible
29
8.4k
Why You Should Never Use an ORM
jnunemaker
PRO
55
9.2k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
10
540
Docker and Python
trallard
44
3.3k
Documentation Writing (for coders)
carmenintech
68
4.6k
Large-scale JavaScript Application Architecture
addyosmani
511
110k
Raft: Consensus for Rubyists
vanstee
137
6.8k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
4
380
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
40
2k
Fontdeck: Realign not Redesign
paulrobertlloyd
83
5.4k
Making Projects Easy
brettharned
116
6k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis