Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
360
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
280
The troubles of modern dependency management and what to do about them
gousiosg
0
530
Mining Repositories with Apache Spark
gousiosg
0
650
My adventures with open everything
gousiosg
0
290
Structure and Evolution of Package Dependency Networks
gousiosg
0
760
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
920
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
280
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Lufthansa ®️ USA Contact Numbers: Complete 2025 Support Guide
lufthanahelpsupport
0
250
公開初日に Gemini CLI を試した話や FFmpeg と組み合わせてみた話など / Gemini CLI 初学者勉強会(#AI道場)
you
PRO
0
1k
推し書籍📚 / Books and a QA Engineer
ak1210
0
120
モニタリング統一への道のり - 分散モニタリングツール統合のためのオブザーバビリティプロジェクト
niftycorp
PRO
1
370
マーケットプレイス版Oracle WebCenter Content For OCI
oracle4engineer
PRO
3
990
Lakebaseを使ったAIエージェントを実装してみる
kameitomohiro
0
180
【LT会登壇資料】TROCCO新コネクタ「スマレジ」を活用した直営店データの分析
kazari0425
1
170
United Airlines Customer Service– Call 1-833-341-3142 Now!
airhelp
0
180
「クラウドコスト絶対削減」を支える技術—FinOpsを超えた徹底的なクラウドコスト削減の実践論
delta_tech
4
190
セキュアな社内Dify運用と外部連携の両立 ~AIによるAPIリスク評価~
zozotech
PRO
0
100
AIエージェントが書くのなら直接CloudFormationを書かせればいいじゃないですか何故AWS CDKを使う必要があるのさ
watany
18
7.1k
衛星運用をソフトウェアエンジニアに依頼したときにできあがるもの
sankichi92
1
230
Featured
See All Featured
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
5.9k
Docker and Python
trallard
45
3.5k
The Invisible Side of Design
smashingmag
301
51k
Thoughts on Productivity
jonyablonski
69
4.7k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
7
330
The Language of Interfaces
destraynor
158
25k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
2.9k
Unsuck your backbone
ammeep
671
58k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
281
13k
The Power of CSS Pseudo Elements
geoffreycrofte
77
5.9k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis