GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
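A core-versus-community split like the one named on these slides can be sketched as a simple count. This is only an illustration: the slides do not say how "core" is defined, so the sketch assumes a known set of core-team logins, and the function name `split_by_origin` and the sample logins are hypothetical.

```python
from collections import Counter
from typing import Iterable


def split_by_origin(authors: Iterable[str], core_team: set[str]) -> dict[str, int]:
    """Count contributions (commits or comments) by core vs community authors.

    ASSUMPTION: 'core' means the author's login is in `core_team`;
    the talk's own definition is not given in the transcript.
    """
    counts = Counter(
        "core" if author in core_team else "community" for author in authors
    )
    # Always report both buckets, even when one of them is empty.
    return {"core": counts["core"], "community": counts["community"]}


# Hypothetical data: one login per commit (or per comment).
commit_authors = ["alice", "bob", "alice", "carol", "dave"]
summary = split_by_origin(commit_authors, core_team={"alice", "bob"})
```

The same function works for comments by passing one login per comment instead of one per commit.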
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
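The example event links to a user, a repository, a commit, and an organization. A minimal sketch of pulling those links out of one parsed event follows. It assumes the event is a push event shaped like the public GitHub events feed (`actor`, `repo`, `org`, and `payload.commits`, each carrying a `url`); the helper name `referenced_urls` and every value in the sample event are hypothetical, not taken from the slide.

```python
from typing import Any


def referenced_urls(event: dict[str, Any]) -> list[str]:
    """Collect the API URLs one event points to: actor, repo, commits, org."""
    urls: list[str] = []
    for key in ("actor", "repo"):
        entity = event.get(key)
        if entity and "url" in entity:
            urls.append(entity["url"])
    # Push events list their commits under payload.commits.
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    # The org field is absent for repositories owned by a user.
    org = event.get("org")
    if org and "url" in org:
        urls.append(org["url"])
    return urls


# Hypothetical, heavily condensed event (not real data).
sample_event = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/example-user"},
    "repo": {"url": "https://api.github.com/repos/example-org/example-repo"},
    "payload": {
        "commits": [
            {"url": "https://api.github.com/repos/example-org/example-repo/commits/abc123"}
        ]
    },
    "org": {"url": "https://api.github.com/orgs/example-org"},
}

links = referenced_urls(sample_event)
```

Each returned URL is a further API resource that a mirroring tool could fetch to fill in the entity behind the event.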
Entities
GHTorrent architecture [diagram; labels: GitHub API, Event Retrieval, Commits Queue, Project Events Queue, Events, Data Retrieval (several instances), Projects, Commits, evt.commit, evt.watch, evt.fork, Mirroring Cluster]
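The labels on the architecture slide (`evt.commit`, `evt.watch`, `evt.fork`, queues, and several data retrieval workers) suggest that events are sorted by kind before being processed. The sketch below illustrates that general idea with in-memory queues only. It is not GHTorrent's implementation: the mapping from GitHub event types to the slide's labels is an assumption, and the names `ROUTING_KEYS` and `route_events` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Any, Iterable

# ASSUMPTION: which GitHub event type feeds which label from the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route_events(events: Iterable[dict[str, Any]]) -> dict[str, deque]:
    """Place each event on the in-memory queue named by its routing key."""
    queues: dict[str, deque] = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event.get("type", ""))
        if key is None:
            continue  # event kinds without a mapping are skipped in this sketch
        queues[key].append(event)
    return queues


# Hypothetical stream of condensed events.
stream = [
    {"type": "PushEvent", "id": "1"},
    {"type": "ForkEvent", "id": "2"},
    {"type": "PushEvent", "id": "3"},
    {"type": "IssuesEvent", "id": "4"},
]
queues = route_events(stream)
```

In a real deployment a message broker would take the place of the `deque` objects, so that separate workers can each consume one queue.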
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis