Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
370
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
290
The troubles of modern dependency management and what to do about them
gousiosg
0
540
Mining Repositories with Apache Spark
gousiosg
0
660
My adventures with open everything
gousiosg
0
300
Structure and Evolution of Package Dependency Networks
gousiosg
0
780
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
930
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
290
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
EncryptedSharedPreferences が deprecated になっちゃった!どうしよう! / Oh no! EncryptedSharedPreferences has been deprecated! What should I do?
yanzm
0
490
Platform開発が先行する Platform Engineeringの違和感
kintotechdev
4
590
サラリーマンの小遣いで作るtoCサービス - Cloudflare Workersでスケールする開発戦略
shinaps
2
470
「どこから読む?」コードとカルチャーに最速で馴染むための実践ガイド
zozotech
PRO
0
570
Claude Code でアプリ開発をオートパイロットにするためのTips集 Zennの場合 / Claude Code Tips in Zenn
wadayusuke
5
1.9k
OCI Oracle Database Services新機能アップデート(2025/06-2025/08)
oracle4engineer
PRO
0
180
「全員プロダクトマネージャー」を実現する、Cursorによる仕様検討の自動運転
applism118
22
12k
今日から始めるAWSセキュリティ対策 3ステップでわかる実践ガイド
yoshidatakeshi1994
0
120
Unlocking the Power of AI Agents with LINE Bot MCP Server
linedevth
0
120
Snowflake×dbtを用いたテレシーのデータ基盤のこれまでとこれから
sagara
0
130
Firestore → Spanner 移行 を成功させた段階的移行プロセス
athug
1
500
「何となくテストする」を卒業するためにプロダクトが動く仕組みを理解しよう
kawabeaver
0
440
Featured
See All Featured
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
31
2.2k
KATA
mclloyd
32
14k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.1k
Code Review Best Practice
trishagee
71
19k
Thoughts on Productivity
jonyablonski
70
4.8k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
131
19k
The Straight Up "How To Draw Better" Workshop
denniskardys
236
140k
GraphQLの誤解/rethinking-graphql
sonatard
72
11k
Optimizing for Happiness
mojombo
379
70k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.9k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis