Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
420
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
320
The troubles of modern dependency management and what to do about them
gousiosg
0
640
Mining Repositories with Apache Spark
gousiosg
0
700
My adventures with open everything
gousiosg
0
340
Structure and Evolution of Package Dependency Networks
gousiosg
0
840
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
960
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
330
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
toCプロダクトにおけるAI機能開発のしくじりと学び / ai-product-failures-and-learnings
rince
6
5.1k
日本語テキストと音楽の対照学習の技術とその応用
lycorptech_jp
PRO
1
400
Databricks Free Edition講座 データサイエンス編
taka_aki
0
270
2026年、サーバーレスの現在地 -「制約と戦う技術」から「当たり前の実行基盤」へ- /serverless2026
slsops
2
140
Embedded SREの終わりを設計する 「なんとなく」から計画的な自立支援へ
sansantech
PRO
2
1.3k
レガシー共有バッチ基盤への挑戦 - SREドリブンなリアーキテクチャリングの取り組み
tatsukoni
0
160
ファインディの横断SREがTakumi byGMOと取り組む、セキュリティと開発スピードの両立
rvirus0817
1
860
What happened to RubyGems and what can we learn?
mikemcquaid
0
190
Frontier Agents (Kiro autonomous agent / AWS Security Agent / AWS DevOps Agent) の紹介
msysh
3
110
Amazon Bedrock AgentCore 認証・認可入門
hironobuiga
2
480
CDKで始めるTypeScript開発のススメ
tsukuboshi
1
220
あたらしい上流工程の形。 0日導入からはじめるAI駆動PM
kumaiu
4
670
Featured
See All Featured
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
62
Embracing the Ebb and Flow
colly
88
5k
Statistics for Hackers
jakevdp
799
230k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1.1k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
37
6.3k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
160
Building Flexible Design Systems
yeseniaperezcruz
330
40k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
The State of eCommerce SEO: How to Win in Today's Products SERPs - #SEOweek
aleyda
2
9.5k
Amusing Abliteration
ianozsvald
0
91
Why Your Marketing Sucks and What You Can Do About It - Sophie Logan
marketingsoph
0
69
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis