Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
370
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
290
The troubles of modern dependency management and what to do about them
gousiosg
0
530
Mining Repositories with Apache Spark
gousiosg
0
650
My adventures with open everything
gousiosg
0
290
Structure and Evolution of Package Dependency Networks
gousiosg
0
770
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
920
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
290
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
LLM 機能を支える Langfuse / ClickHouse のサーバレス化
yuu26
9
1.4k
o11yツールを乗り換えた話
tak0x00
2
750
風が吹けばWHOISが使えなくなる~なぜWHOIS・RDAPはサーバー証明書のメール認証に使えなくなったのか~
orangemorishita
15
5.6k
九州の人に知ってもらいたいGISスポット / gis spot in kyushu 2025
sakaik
0
120
dipにおけるSRE変革の軌跡
dip_tech
PRO
1
250
Claude CodeでKiroの仕様駆動開発を実現させるには...
gotalab555
3
980
2時間で300+テーブルをデータ基盤に連携するためのAI活用 / FukuokaDataEngineer
sansan_randd
0
140
LTに影響を受けてテンプレリポジトリを作った話
hol1kgmg
0
350
Claude Codeから我々が学ぶべきこと
oikon48
10
2.8k
Claude Codeが働くAI中心の業務システム構築の挑戦―AIエージェント中心の働き方を目指して
os1ma
9
2.4k
人に寄り添うAIエージェントとアーキテクチャ #BetAIDay
layerx
PRO
9
2.1k
Google Cloud で学ぶデータエンジニアリング入門 2025年版 #GoogleCloudNext / 20250805
kazaneya
PRO
19
4.5k
Featured
See All Featured
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.4k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
How to train your dragon (web standard)
notwaldorf
96
6.2k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Bash Introduction
62gerente
614
210k
Why Our Code Smells
bkeepers
PRO
337
57k
The Straight Up "How To Draw Better" Workshop
denniskardys
235
140k
Gamification - CAS2011
davidbonilla
81
5.4k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.8k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.8k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Code Reviewing Like a Champion
maltzj
524
40k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis