GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
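As a rough illustration of the slide above, the public events endpoint can be polled repeatedly while keeping track of which events were already seen. This is a minimal sketch, not GHTorrent's actual retrieval code: the function names `fetch_events` and `new_events`, the `User-Agent` value, and the deduplication-by-`id` strategy are assumptions made for the example.

```python
import json
import urllib.request
from typing import Callable, Iterable, List, Set

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url: str = EVENTS_URL) -> List[dict]:
    """Fetch one page of public events (unauthenticated requests are rate limited)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def new_events(fetch: Callable[[], Iterable[dict]], seen: Set[str]) -> List[dict]:
    """Return only events whose id has not been seen in an earlier poll."""
    fresh = []
    for event in fetch():
        if event["id"] not in seen:
            seen.add(event["id"])
            fresh.append(event)
    return fresh
```

A caller would invoke `new_events(fetch_events, seen)` in a loop with a pause between polls; passing the fetch function as a parameter keeps the deduplication logic usable without network access.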
Example event (condensed):
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
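The URLs in the example event point at the entities (user, repository, commit, organization) that can be retrieved next. The sketch below shows one way to collect them from an event. `SAMPLE_EVENT` is an illustrative reconstruction, not the slide's actual JSON: the `PushEvent` type and the field layout are assumptions based on the public GitHub events format, the commit SHA is copied as it appears on the slide, and `linked_urls` is a hypothetical helper.

```python
from typing import List

# Illustrative reconstruction; the event type and field layout are assumptions.
SAMPLE_EVENT = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "payload": {
        "commits": [
            {
                # SHA copied verbatim from the slide.
                "url": "https://api.github.com/repos/PowerDMS/Owin.Scim/commits/"
                       "c751014f634d73e0b72f78a53c8cf137888b3"
            }
        ]
    },
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
}


def linked_urls(event: dict) -> List[str]:
    """Collect the API URLs an event refers to: actor, repo, org, then commits."""
    urls = []
    for key in ("actor", "repo", "org"):
        url = event.get(key, {}).get("url")
        if url:
            urls.append(url)
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    return urls
```

Each returned URL is a candidate for a follow-up API request, which is how a single event can fan out into several stored entities.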
Entities
GHTorrent architecture
[Diagram: GitHub API, event retrieval, event queue, per-type queues (evt.commit, evt.watch, evt.fork) for commits and projects, data retrieval workers, mirroring cluster]
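The routing idea in the architecture slide can be sketched with in-memory queues: events are published under a routing key, and a separate retrieval worker drains each queue. This is only a stand-in under stated assumptions, not GHTorrent's implementation. Python's `queue.Queue` replaces whatever message broker the real system uses, the mapping from GitHub event types to the slide's `evt.commit`, `evt.watch`, and `evt.fork` keys is a guess, and `make_queues`, `publish`, and `drain` are hypothetical names.

```python
import queue
from typing import Callable, Dict

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def make_queues() -> Dict[str, queue.Queue]:
    """Create one in-memory queue per routing key."""
    return {key: queue.Queue() for key in ROUTING_KEYS.values()}


def publish(event: dict, queues: Dict[str, queue.Queue]) -> bool:
    """Route an event to its queue; return False if no routing key matches."""
    key = ROUTING_KEYS.get(event.get("type"))
    if key is None:
        return False
    queues[key].put(event)
    return True


def drain(q: queue.Queue, handler: Callable[[dict], None]) -> int:
    """Run a data retrieval handler on every queued event; return the count handled."""
    handled = 0
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            return handled
        handler(event)
        handled += 1
```

Separating queues by event type lets each kind of retrieval worker scale independently, which matches the several "Data Retrieval" boxes in the diagram.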
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources:
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg, @jeffmcaffer, @kelewis