Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
400
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
310
The troubles of modern dependency management and what to do about them
gousiosg
0
580
Mining Repositories with Apache Spark
gousiosg
0
680
My adventures with open everything
gousiosg
0
320
Structure and Evolution of Package Dependency Networks
gousiosg
0
810
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
950
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
320
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
M&Aで拡大し続けるGENDAのデータ活用を促すためのDatabricks権限管理 / AEON TECH HUB #22
genda
0
310
Autonomous Database - Dedicated 技術詳細 / adb-d_technical_detail_jp
oracle4engineer
PRO
5
12k
「アウトプット脳からユーザー価値脳へ」がそんなに簡単にできたら苦労しない #RSGT2026
aki_iinuma
6
1.8k
Agent Skillsがハーネスの垣根を超える日
gotalab555
7
5.1k
2025年 山梨の技術コミュニティを振り返る
yuukis
0
140
松尾研LLM講座2025 応用編Day3「軽量化」 講義資料
aratako
14
4.8k
モダンデータスタックの理想と現実の間で~1.3億人Vポイントデータ基盤の現在地とこれから~
taromatsui_cccmkhd
2
300
AI with TiDD
shiraji
1
330
善意の活動は、なぜ続かなくなるのか ーふりかえりが"構造を変える判断"になった半年間ー
matsukurou
0
110
_第4回__AIxIoTビジネス共創ラボ紹介資料_20251203.pdf
iotcomjpadmin
0
170
AWS re:Invent2025最新動向まとめ(NRIグループre:Cap 2025)
gamogamo
0
150
「リリースファースト」の実感を届けるには 〜停滞するチームに変化を起こすアプローチ〜 #RSGT2026
kintotechdev
0
360
Featured
See All Featured
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
Designing for Timeless Needs
cassininazir
0
110
Paper Plane
katiecoart
PRO
0
45k
A brief & incomplete history of UX Design for the World Wide Web: 1989–2019
jct
1
270
Tell your own story through comics
letsgokoyo
0
770
How to Think Like a Performance Engineer
csswizardry
28
2.4k
Information Architects: The Missing Link in Design Systems
soysaucechin
0
730
ラッコキーワード サービス紹介資料
rakko
0
1.9M
Measuring & Analyzing Core Web Vitals
bluesmoon
9
720
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
32
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
38
2.7k
BBQ
matthewcrist
89
9.9k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis