GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft; Georgios Gousios, Delft University of Technology (TU Delft); Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
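The "how open is a project?" slides compare core-team and community activity and plot PR lifelines. A minimal sketch of two such metrics follows, assuming hypothetical simplified records (a commit with an author login, a pull request with open and close timestamps) and a known set of core-team logins. It is an illustration, not the pullreq-perf implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


# Hypothetical, simplified records; field names are illustrative only.
@dataclass
class Commit:
    author: str


@dataclass
class PullRequest:
    opened_at: datetime
    closed_at: Optional[datetime]  # None while the PR is still open


def community_share(commits, core_team):
    """Fraction of commits authored by people outside the core team."""
    if not commits:
        return 0.0
    community = sum(1 for c in commits if c.author not in core_team)
    return community / len(commits)


def median_pr_lifetime_hours(prs):
    """Median open-to-close time in hours, over closed PRs only."""
    durations = [
        (pr.closed_at - pr.opened_at).total_seconds() / 3600
        for pr in prs
        if pr.closed_at is not None
    ]
    return median(durations) if durations else None


commits = [Commit("alice"), Commit("bob"), Commit("carol"), Commit("alice")]
prs = [
    PullRequest(datetime(2016, 5, 1, 0, 0), datetime(2016, 5, 1, 12, 0)),
    PullRequest(datetime(2016, 5, 2, 0, 0), datetime(2016, 5, 4, 0, 0)),
    PullRequest(datetime(2016, 5, 3, 0, 0), None),  # still open, ignored
]
print(community_share(commits, core_team={"alice"}))  # 0.5
print(median_pr_lifetime_hours(prs))  # 30.0
```

Who counts as "core" is the key modelling choice; here it is simply a supplied set of logins.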
Are we using Git in a distributed way?
How many devs are there per country?
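Answering the per-country question is an aggregation over user records. The sketch below assumes each record carries an already geocoded `country_code` field (an assumption; GitHub profiles only expose a free-text location) and skips users without one.

```python
from collections import Counter


def devs_per_country(users):
    """Count users per country code, skipping users with no known country."""
    return Counter(u["country_code"] for u in users if u.get("country_code"))


# Hypothetical sample records.
users = [
    {"login": "a", "country_code": "nl"},
    {"login": "b", "country_code": "us"},
    {"login": "c", "country_code": "nl"},
    {"login": "d", "country_code": None},
]
print(devs_per_country(users).most_common())  # [('nl', 2), ('us', 1)]
```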
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (mid-2016)
Approach: http://ghtorrent.org
How does it work? https://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
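The condensed event shows that one event points to further API resources: the user, the repository, the commit and the organization. The sketch below collects those follow-up URLs from an event in the public GitHub Events API shape (`actor`, `repo`, `org`, and `payload.commits` for push events). The commit SHA is a placeholder, not the one on the slide, and which URLs a mirror actually fetches is an assumption here.

```python
import json

# Condensed push event; SHA_PLACEHOLDER stands in for a real commit SHA.
RAW_EVENT = """
{
  "type": "PushEvent",
  "actor": {"login": "Cephei", "url": "https://api.github.com/users/Cephei"},
  "repo": {"name": "PowerDMS/Owin.Scim",
           "url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
  "org": {"login": "PowerDMS", "url": "https://api.github.com/orgs/PowerDMS"},
  "payload": {"commits": [
    {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim/commits/SHA_PLACEHOLDER"}
  ]}
}
"""


def linked_entity_urls(event):
    """Collect the API URLs a mirror could follow up on for one event."""
    urls = []
    for key in ("actor", "repo", "org"):
        entity = event.get(key)
        if entity and "url" in entity:
            urls.append(entity["url"])
    # Push events also list the commits they carry.
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    return urls


for url in linked_entity_urls(json.loads(RAW_EVENT)):
    print(url)
```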
Entities
GHTorrent architecture (diagram): GitHub API, event retrieval, events / commits / projects queues with routing keys evt.commit, evt.watch and evt.fork, several data retrieval workers, mirroring cluster.
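The architecture diagram has event retrieval publishing into queues keyed by routing keys, with data retrieval workers consuming them. The sketch below imitates that fan-out with in-memory `queue.Queue` objects standing in for a message broker. The mapping from GitHub event types to the slide's routing keys is a hypothetical illustration, not GHTorrent's actual configuration.

```python
import queue
from collections import defaultdict

# Hypothetical mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route_events(events, queues):
    """Publish each event to the queue for its routing key; return how many were skipped."""
    skipped = 0
    for event in events:
        key = ROUTING_KEYS.get(event.get("type"))
        if key is None:
            skipped += 1  # no consumer registered for this event type
            continue
        queues[key].put(event)
    return skipped


def drain(q):
    """Consume everything currently in a queue, as a data retrieval worker would."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


queues = defaultdict(queue.Queue)
events = [
    {"type": "PushEvent", "id": "1"},
    {"type": "WatchEvent", "id": "2"},
    {"type": "IssuesEvent", "id": "3"},
    {"type": "PushEvent", "id": "4"},
]
print(route_events(events, queues))  # 1
print([e["id"] for e in drain(queues["evt.commit"])])  # ['1', '4']
```

Decoupling retrieval from processing this way lets each kind of worker scale independently; a real deployment would use a durable broker rather than in-process queues.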
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis