GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Technology
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? https://api.github.com/events
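The public events endpoint returns a page of recent events, so a client that polls it repeatedly will see some events more than once. A minimal Python sketch of filtering out repeats by event `id` follows. The function name `new_events` and the sample pages are hypothetical, and this is not taken from GHTorrent's own code; it only illustrates one way a poller might avoid reprocessing.

```python
def new_events(page, seen_ids):
    """Return events from one polled page whose id has not been seen yet."""
    fresh = []
    for event in page:
        if event["id"] not in seen_ids:
            seen_ids.add(event["id"])  # remember it for later polls
            fresh.append(event)
    return fresh


# Two hypothetical polls whose pages overlap on event "11".
seen = set()
first = new_events([{"id": "10"}, {"id": "11"}], seen)
second = new_events([{"id": "11"}, {"id": "12"}], seen)
```

In this sketch the second poll yields only the event that was not in the first page.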
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
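The condensed event can be read as a small set of links: one event points at a user, a repository, a commit, and an organization. Below is a minimal Python sketch of pulling those links out of an event. It assumes the event is shaped like a push event from the public events endpoint (`actor.url`, `repo.url`, `org.url`, `payload.commits[].url`). The helper `linked_urls` and the sample dict are illustrative only. The commit URL is kept exactly as it appears on the slide.

```python
from typing import Any, Dict, List


def linked_urls(event: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect the API URLs an event points at, grouped by entity kind."""
    links: Dict[str, List[str]] = {"user": [], "repo": [], "org": [], "commit": []}
    for kind, field in (("user", "actor"), ("repo", "repo"), ("org", "org")):
        url = (event.get(field) or {}).get("url")
        if url:
            links[kind].append(url)
    # Only push-style events carry a list of commits in the payload.
    for commit in (event.get("payload") or {}).get("commits", []):
        if commit.get("url"):
            links["commit"].append(commit["url"])
    return links


# Hypothetical sample assembled from the URLs shown on the slide.
sample_event = {
    "type": "PushEvent",  # assumed; the slide does not state the event type
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
    "payload": {
        "commits": [
            {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3"}
        ]
    },
}

links = linked_urls(sample_event)
```

Each URL found this way identifies a further entity that a mirroring tool could fetch and store alongside the event.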
Entities
GHTorrent architecture (diagram). Labels: GitHub API; Event Retrieval; queues for Events, Projects, and Commits; routing keys evt.commit, evt.watch, evt.fork; several Data Retrieval workers; Mirroring Cluster
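The diagram suggests a pipeline in which retrieved events are placed on a queue and routed by type to separate data retrieval workers. A rough in-memory Python sketch of that routing idea follows. The routing keys `evt.commit`, `evt.watch`, and `evt.fork` come from the slide. The mapping from GitHub event types to those keys, the use of `queue.Queue` as a stand-in for a real message broker, and the function names are all assumptions made for illustration.

```python
import queue
from collections import defaultdict

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route_events(events, queues):
    """Put each event on the queue for its routing key; return unrouted events."""
    skipped = []
    for event in events:
        key = ROUTING_KEYS.get(event.get("type"))
        if key is None:
            skipped.append(event)  # no worker registered for this type
        else:
            queues[key].put(event)
    return skipped


def drain(q):
    """Empty a queue and return its items in arrival order."""
    items = []
    while not q.empty():
        items.append(q.get_nowait())
    return items


# One in-memory queue per routing key, created on first use.
queues = defaultdict(queue.Queue)
events = [
    {"id": "1", "type": "PushEvent"},
    {"id": "2", "type": "ForkEvent"},
    {"id": "3", "type": "IssuesEvent"},
    {"id": "4", "type": "PushEvent"},
]
skipped = route_events(events, queues)
commit_ids = [e["id"] for e in drain(queues["evt.commit"])]
```

Keeping one queue per routing key lets each kind of retrieval work be scaled or restarted on its own, which is one plausible reading of the repeated "Data Retrieval" boxes in the diagram.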
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources: http://ghtorrent.org, https://github.com/Microsoft/ghinsights
Contacts: @gousiosg, @jeffmcaffer, @kelewis