Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
420
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
320
The troubles of modern dependency management and what to do about them
gousiosg
0
640
Mining Repositories with Apache Spark
gousiosg
0
700
My adventures with open everything
gousiosg
0
340
Structure and Evolution of Package Dependency Networks
gousiosg
0
840
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
960
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
330
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
ファインディの横断SREがTakumi byGMOと取り組む、セキュリティと開発スピードの両立
rvirus0817
1
850
データ民主化のための LLM 活用状況と課題紹介(IVRy の場合)
wxyzzz
2
510
KubeCon + CloudNativeCon NA ‘25 Recap, Extensibility: Gateway API / NRI
ladicle
0
170
MySQLのJSON機能の活用術
ikomachi226
0
130
開発メンバーが語るFindy Conferenceの裏側とこれから
sontixyou
2
470
サイボウズ 開発本部採用ピッチ / Cybozu Engineer Recruit
cybozuinsideout
PRO
10
73k
Azure Durable Functions で作った NL2SQL Agent の精度向上に取り組んだ話/jat08
thara0402
0
110
toCプロダクトにおけるAI機能開発のしくじりと学び / ai-product-failures-and-learnings
rince
6
5.1k
生成AI時代にこそ求められるSRE / SRE for Gen AI era
ymotongpoo
4
1.7k
日本語テキストと音楽の対照学習の技術とその応用
lycorptech_jp
PRO
1
400
今日から始めるAmazon Bedrock AgentCore
har1101
4
310
Frontier Agents (Kiro autonomous agent / AWS Security Agent / AWS DevOps Agent) の紹介
msysh
3
110
Featured
See All Featured
Claude Code のすすめ
schroneko
67
210k
Unsuck your backbone
ammeep
671
58k
My Coaching Mixtape
mlcsv
0
45
More Than Pixels: Becoming A User Experience Designer
marktimemedia
3
310
Principles of Awesome APIs and How to Build Them.
keavy
128
17k
Skip the Path - Find Your Career Trail
mkilby
0
51
Un-Boring Meetings
codingconduct
0
200
Impact Scores and Hybrid Strategies: The future of link building
tamaranovitovic
0
200
Claude Code どこまでも/ Claude Code Everywhere
nwiizo
61
52k
Exploring the relationship between traditional SERPs and Gen AI search
raygrieselhuber
PRO
2
3.6k
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
1.9k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis