Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
330
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
250
The troubles of modern dependency management and what to do about them
gousiosg
0
480
Mining Repositories with Apache Spark
gousiosg
0
590
My adventures with open everything
gousiosg
0
250
Structure and Evolution of Package Dependency Networks
gousiosg
0
700
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
880
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
250
The #issue32 incident
gousiosg
2
15k
Other Decks in Technology
See All in Technology
ドメインの本質を掴む / Get the essence of the domain
sinsoku
2
150
Lambdaと地方とコミュニティ
miu_crescent
2
370
Lambda10周年!Lambdaは何をもたらしたか
smt7174
2
110
B2B SaaSから見た最近のC#/.NETの進化
sansantech
PRO
0
680
Shopifyアプリ開発における Shopifyの機能活用
sonatard
4
250
Terraform CI/CD パイプラインにおける AWS CodeCommit の代替手段
hiyanger
1
240
Platform Engineering for Software Developers and Architects
syntasso
1
510
【令和最新版】AWS Direct Connectと愉快なGWたちのおさらい
minorun365
PRO
5
750
ドメイン名の終活について - JPAAWG 7th -
mikit
33
20k
障害対応指揮の意思決定と情報共有における価値観 / Waroom Meetup #2
arthur1
5
470
IBC 2024 動画技術関連レポート / IBC 2024 Report
cyberagentdevelopers
PRO
0
110
10XにおけるData Contractの導入について: Data Contract事例共有会
10xinc
5
590
Featured
See All Featured
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
31
2.7k
Docker and Python
trallard
40
3.1k
Building Better People: How to give real-time feedback that sticks.
wjessup
364
19k
Teambox: Starting and Learning
jrom
133
8.8k
How STYLIGHT went responsive
nonsquared
95
5.2k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
131
33k
Building a Scalable Design System with Sketch
lauravandoore
459
33k
Done Done
chrislema
181
16k
Building Flexible Design Systems
yeseniaperezcruz
327
38k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
109
49k
jQuery: Nuts, Bolts and Bling
dougneiner
61
7.5k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
246
1.3M
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis