Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
340
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
260
The troubles of modern dependency management and what to do about them
gousiosg
0
490
Mining Repositories with Apache Spark
gousiosg
0
620
My adventures with open everything
gousiosg
0
260
Structure and Evolution of Package Dependency Networks
gousiosg
0
730
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
900
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
260
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Ruby on Railsで持続可能な開発を行うために取り組んでいること
am1157154
3
170
DevinでAI AWSエンジニア製造計画 序章 〜CDKを添えて〜/devin-load-to-aws-engineer
tomoki10
0
220
MIMEと文字コードの闇
hirachan
2
1.5k
Snowflake ML モデルを dbt データパイプラインに組み込む
estie
0
120
AWSアカウントのセキュリティ自動化、どこまで進める? 最適な設計と実践ポイント
yuobayashi
7
1.8k
役員・マネージャー・著者・エンジニアそれぞれの立場から見たAWS認定資格
nrinetcom
PRO
5
6.8k
開発者のための FinOps/FinOps for Engineers
oracle4engineer
PRO
2
260
AIエージェント開発のノウハウと課題
pharma_x_tech
9
5k
"TEAM"を導入したら最高のエンジニア"Team"を実現できた / Deploying "TEAM" and Building the Best Engineering "Team"
yuj1osm
1
240
30→150人のエンジニア組織拡大に伴うアジャイル文化を醸成する役割と取り組みの変化
nagata03
0
360
遷移の高速化 ヤフートップの試行錯誤
narirou
6
2k
事業を差別化する技術を生み出す技術
pyama86
2
540
Featured
See All Featured
Rebuilding a faster, lazier Slack
samanthasiow
80
8.9k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
Fontdeck: Realign not Redesign
paulrobertlloyd
83
5.4k
For a Future-Friendly Web
brad_frost
176
9.6k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Git: the NoSQL Database
bkeepers
PRO
428
65k
Art, The Web, and Tiny UX
lynnandtonic
298
20k
Designing for humans not robots
tammielis
250
25k
Visualization
eitanlees
146
15k
Music & Morning Musume
bryan
46
6.4k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
6
580
Adopting Sorbet at Scale
ufuk
75
9.2k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis