$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
390
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
300
The troubles of modern dependency management and what to do about them
gousiosg
0
560
Mining Repositories with Apache Spark
gousiosg
0
680
My adventures with open everything
gousiosg
0
310
Structure and Evolution of Package Dependency Networks
gousiosg
0
800
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
940
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
310
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
AI 時代のデータ戦略
na0
8
3.1k
『星の世界の地図の話: Google Sky MapをAI Agentでよみがえらせる』 - Google Developers DevFest Tokyo 2025
taniiicom
0
460
2025 DORA Reportから読み解く!AIが映し出す、成果を出し続ける組織の共通点 #開発生産性_findy
takabow
2
950
MAP-7thplaceSolution
yukichi0403
2
230
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
3.3k
プラットフォームエンジニアリングとは何であり、なぜプラットフォームエンジニアリングなのか
doublemarket
1
520
[続・営業向け 誰でも話せるOCI セールストーク] AWSよりOCIの優位性が分からない編(2025年11月21日開催)
oracle4engineer
PRO
1
210
ブラウザ拡張のセキュリティの話 / Browser Extension Security
flatt_security
0
240
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
9.8k
その設計、 本当に価値を生んでますか?
shimomura
2
140
Eight Engineering Unit 紹介資料
sansan33
PRO
0
5.7k
なぜフロントエンド技術を追うのか?なぜカンファレンスに参加するのか?
sakito
8
1.9k
Featured
See All Featured
Leading Effective Engineering Teams in the AI Era
addyosmani
8
1.2k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.5k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.7k
Faster Mobile Websites
deanohume
310
31k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Done Done
chrislema
186
16k
For a Future-Friendly Web
brad_frost
180
10k
A better future with KSS
kneath
240
18k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.8k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.6k
How to Think Like a Performance Engineer
csswizardry
28
2.3k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis