Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
360
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
280
The troubles of modern dependency management and what to do about them
gousiosg
0
520
Mining Repositories with Apache Spark
gousiosg
0
640
My adventures with open everything
gousiosg
0
280
Structure and Evolution of Package Dependency Networks
gousiosg
0
760
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
910
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
280
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
登壇ネタの見つけ方 / How to find talk topics
pinkumohikan
3
260
A2Aのクライアントを自作する
rynsuke
1
150
~宇宙最速~2025年AWS Summit レポート
satodesu
1
1.2k
[TechNight #90-1] 本当に使える?ZDMの新機能を実践検証してみた
oracle4engineer
PRO
3
140
本部長の代わりに提案書レビュー! KDDI営業が毎日使うAIエージェント「A-BOSS」開発秘話
minorun365
PRO
14
2.3k
ひとり情シスなCTOがLLMと始めるオペレーション最適化 / CTO's LLM-Powered Ops
yamitzky
0
370
Definition of Done
kawaguti
PRO
6
460
データプラットフォーム技術におけるメダリオンアーキテクチャという考え方/DataPlatformWithMedallionArchitecture
smdmts
5
560
CIでのgolangci-lintの実行を約90%削減した話
kazukihayase
0
340
doda開発 生成AI元年宣言!自家製AIエージェントから始める生産性改革 / doda Development Declaration of the First Year of Generated AI! Productivity Reforms Starting with Home-grown AI Agents
techtekt
0
200
讓測試不再 BB! 從 BDD 到 CI/CD, 不靠人力也能 MVP
line_developers_tw
PRO
0
1.1k
IAMのマニアックな話 2025を執筆して、 見えてきたAWSアカウント管理の現在
nrinetcom
PRO
4
650
Featured
See All Featured
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.9k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
The Pragmatic Product Professional
lauravandoore
35
6.7k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
920
Rebuilding a faster, lazier Slack
samanthasiow
81
9k
The Cost Of JavaScript in 2023
addyosmani
51
8.4k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Large-scale JavaScript Application Architecture
addyosmani
512
110k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
48
2.8k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
50k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis