Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Georgios Gousios
May 19, 2016
Technology
450
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
370
The troubles of modern dependency management and what to do about them
gousiosg
0
700
Mining Repositories with Apache Spark
gousiosg
0
730
My adventures with open everything
gousiosg
0
360
Structure and Evolution of Package Dependency Networks
gousiosg
0
920
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
990
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
350
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
JEP 522 Deep Dive - G1 GC同期コスト削減によるスループット向上を徹底検証&解説
tabatad
1
870
生成 AI × MCP で切り拓く次世代 SRE!自律型運用への挑戦と開発者体験の進化
_awache
0
150
Dynamic Workersについて
yusukebe
2
590
AI-DLCを活用した高品質・安全なAI駆動開発実践 / AI Driven Development with AI-DLC
yoshidashingo
0
140
AI フレンドリーなエラー監視を TypeScript で実現する
shinyaigeek
2
260
コードレビューを制するチームがソフトウェアデリバリーのフローを制す / Beyond Code Review: Distributing Its Responsibilities Across the SDLC
mtx2s
4
1.1k
速さだけじゃない! VoidZero ツールが移行先に選ばれる理由
mizdra
PRO
6
750
AI-DLCを活用した高品質・安全なAI駆動開発実践 / AI Driven Development
yoshidashingo
1
370
SIer20年! 培ったスキルがスタートアップで輝く時
shucho0103
0
420
トークン数だけでは測れない — Claude Code 組織展開の効果検証から学んだこと
makikub
0
130
タクシーアプリ『GO』の実践的データ活用
mot_techtalk
3
150
AIガバナンス実践 - 生成AIコネクタのデータ漏洩リスクと実務対策
knishioka
0
190
Featured
See All Featured
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
150
Leading Effective Engineering Teams in the AI Era
addyosmani
9
2k
Building AI with AI
inesmontani
PRO
1
1.1k
Scaling GitHub
holman
464
140k
The Illustrated Guide to Node.js - THAT Conference 2024
reverentgeek
1
370
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
62
54k
Exploring anti-patterns in Rails
aemeredith
3
390
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Between Models and Reality
mayunak
4
330
First, design no harm
axbom
PRO
2
1.2k
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
320
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis