GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
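There are several ways to compute the core vs community splits above. The following is a minimal Python sketch. It assumes each commit or comment is reduced to its author's login and that the set of core-team logins comes from elsewhere (for example, people with commit access). The function name and the sample logins are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def core_vs_community(authors: Iterable[str], core_team: set[str]) -> dict[str, float]:
    """Share of commits (or comments) made by core-team members vs everyone else.

    `authors` holds one login per commit or comment; `core_team` is an
    externally supplied (assumed) set of core-member logins.
    """
    counts = Counter("core" if login in core_team else "community" for login in authors)
    total = counts["core"] + counts["community"]
    if total == 0:
        return {"core": 0.0, "community": 0.0}
    return {"core": counts["core"] / total, "community": counts["community"] / total}


# Hypothetical sample: five commits, two of the authors are core members
commit_authors = ["alice", "bob", "alice", "carol", "dave"]
print(core_vs_community(commit_authors, core_team={"alice", "bob"}))
# → {'core': 0.6, 'community': 0.4}
```

The same function works for comments: pass one login per comment instead of per commit.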
Are we using git in a distributed way?
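One possible proxy for this question is the share of pull requests opened from a fork rather than from a branch of the same repository. This is an assumption, not necessarily the talk's exact method. The sketch reads the `head.repo.full_name` and `base.repo.full_name` fields of GitHub REST API pull request objects. `head.repo` is null when the fork has since been deleted, so those pull requests are skipped.

```python
def fork_based_share(pull_requests: list[dict]) -> float:
    """Fraction of pull requests whose head branch lives in a fork.

    Uses GitHub REST API pull request fields; PRs whose head repository was
    deleted (`head.repo` is null) are skipped because their origin is unknown.
    """
    from_fork = 0
    counted = 0
    for pr in pull_requests:
        head_repo = pr["head"].get("repo")
        if head_repo is None:
            continue
        counted += 1
        if head_repo["full_name"] != pr["base"]["repo"]["full_name"]:
            from_fork += 1
    return from_fork / counted if counted else 0.0


# Hypothetical, heavily trimmed pull request objects
prs = [
    {"head": {"repo": {"full_name": "alice/proj"}}, "base": {"repo": {"full_name": "org/proj"}}},
    {"head": {"repo": {"full_name": "org/proj"}}, "base": {"repo": {"full_name": "org/proj"}}},
    {"head": {"repo": None}, "base": {"repo": {"full_name": "org/proj"}}},
]
print(fork_based_share(prs))  # → 0.5
```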
How many developers are there per country?
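The following is a minimal sketch of the per-country count. It assumes each user record carries a geocoded country code in a `country_code` field. GHTorrent's users table has such a column, but treat the record shape here as an assumption. Users with no resolvable location are grouped under "unknown".

```python
from collections import Counter


def developers_per_country(users: list[dict]) -> list[tuple[str, int]]:
    """Count users per country code, most common first.

    Assumes each record has an optional `country_code` field; missing or
    empty values are counted as "unknown". Ties keep first-seen order.
    """
    counts = Counter(user.get("country_code") or "unknown" for user in users)
    return counts.most_common()


# Hypothetical user records
users = [
    {"login": "a", "country_code": "nl"},
    {"login": "b", "country_code": "us"},
    {"login": "c", "country_code": "nl"},
    {"login": "d", "country_code": None},
]
print(developers_per_country(users))
# → [('nl', 2), ('us', 1), ('unknown', 1)]
```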
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed):
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
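An event only names the entities involved. The full user, repository, commit and organization records have to be fetched separately from the URLs it links to. The following is a minimal sketch of collecting those URLs. It assumes the field layout of a GitHub API `PushEvent`: `actor`, `repo`, an optional `org`, and `payload.commits`. The sample event mirrors the condensed example above, with a placeholder `<sha>` in place of the commit hash. `entity_urls` is a hypothetical helper name.

```python
def entity_urls(event: dict) -> list[str]:
    """Collect the API URLs of the entities an event refers to.

    Each URL would then be fetched and stored as a separate record.
    Assumes the GitHub events API layout: `actor`, `repo`, an optional `org`,
    and, for PushEvent, a `payload.commits` list.
    """
    urls = [event["actor"]["url"], event["repo"]["url"]]
    org = event.get("org")
    if org:
        urls.append(org["url"])
    for commit in event.get("payload", {}).get("commits", []):
        urls.append(commit["url"])
    return urls


# Condensed sample event; "<sha>" is a placeholder, not a real commit hash
sample_event = {
    "type": "PushEvent",
    "actor": {"login": "Cephei", "url": "https://api.github.com/users/Cephei"},
    "repo": {"name": "PowerDMS/Owin.Scim",
             "url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "org": {"login": "PowerDMS", "url": "https://api.github.com/orgs/PowerDMS"},
    "payload": {"commits": [
        {"sha": "<sha>",
         "url": "https://api.github.com/repos/PowerDMS/Owin.Scim/commits/<sha>"},
    ]},
}
for url in entity_urls(sample_event):
    print(url)
```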
Entities
GHTorrent architecture (diagram). Components shown: GitHub API, Event Retrieval, queues for events, commits and projects (routing keys include evt.commit, evt.watch, evt.fork), several Data Retrieval workers, and a Mirroring Cluster.
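The diagram suggests that event retrieval publishes events to a queue under routing keys such as evt.commit, evt.watch and evt.fork, and that independent data retrieval workers consume them. The following is a minimal in-process sketch of that fan-out. It uses Python's `queue.Queue` as a stand-in for a real message broker. The mapping from GitHub event types to routing keys and the `EventBus` class are hypothetical.

```python
import queue
from collections import defaultdict
from typing import Callable

# Hypothetical mapping from GitHub event types to the routing keys on the slide
ROUTING_KEYS = {"PushEvent": "evt.commit", "WatchEvent": "evt.watch", "ForkEvent": "evt.fork"}


class EventBus:
    """In-process stand-in for the queue between event retrieval and data retrieval."""

    def __init__(self) -> None:
        self._queue = queue.Queue()
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, key: str, handler: Callable[[dict], None]) -> None:
        """Register a data retrieval worker for one routing key."""
        self._handlers[key].append(handler)

    def publish(self, event: dict) -> None:
        """Enqueue an event under its routing key; unrouted event types are dropped."""
        key = ROUTING_KEYS.get(event["type"])
        if key is not None:
            self._queue.put((key, event))

    def drain(self) -> int:
        """Hand every queued event to its subscribers; return how many events were dequeued."""
        dequeued = 0
        while True:
            try:
                key, event = self._queue.get_nowait()
            except queue.Empty:
                return dequeued
            for handler in self._handlers[key]:
                handler(event)
            dequeued += 1


bus = EventBus()
seen: list[tuple[str, str]] = []
bus.subscribe("evt.commit", lambda e: seen.append(("commits", e["repo"]["name"])))
bus.subscribe("evt.fork", lambda e: seen.append(("forks", e["repo"]["name"])))
bus.publish({"type": "PushEvent", "repo": {"name": "PowerDMS/Owin.Scim"}})
bus.publish({"type": "IssuesEvent", "repo": {"name": "PowerDMS/Owin.Scim"}})  # no route: dropped
print(bus.drain(), seen)  # → 1 [('commits', 'PowerDMS/Owin.Scim')]
```

Keying workers by routing key rather than by event type lets several workers share one kind of work and lets new event kinds be added by extending the mapping alone.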
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
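The hosted and downloadable datasets include a relational copy of the data, which GHTorrent distributes as MySQL dumps. The sketch below uses `sqlite3` as a stand-in, with a simplified subset of tables (`users`, `projects`, `commits`) and only a few columns. Treat this schema as an assumption and check the actual GHTorrent schema before writing similar queries. The sample rows and the `top_projects` helper are hypothetical.

```python
import sqlite3

# Simplified, partly hypothetical subset of a GHTorrent-like relational schema
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, login TEXT, country_code TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT,
                       owner_id INTEGER REFERENCES users(id));
CREATE TABLE commits  (id INTEGER PRIMARY KEY, sha TEXT,
                       author_id INTEGER REFERENCES users(id),
                       project_id INTEGER REFERENCES projects(id));
"""

TOP_PROJECTS = """
SELECT p.name, COUNT(*) AS n_commits, COUNT(DISTINCT c.author_id) AS n_authors
FROM commits c JOIN projects p ON p.id = c.project_id
GROUP BY p.id, p.name
ORDER BY n_commits DESC
LIMIT ?
"""


def top_projects(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int, int]]:
    """Projects ranked by commit count, with the number of distinct commit authors."""
    return conn.execute(TOP_PROJECTS, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample rows
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "alice", "nl"), (2, "bob", "us")])
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)", [(1, "proj-a", 1), (2, "proj-b", 2)])
conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)",
                 [(1, "s1", 1, 1), (2, "s2", 2, 1), (3, "s3", 1, 1), (4, "s4", 2, 2)])
print(top_projects(conn))  # → [('proj-a', 3, 2), ('proj-b', 1, 1)]
```

Against a real MySQL copy, the query text stays the same. The `?` placeholder is sqlite3's style, though. MySQL client libraries for Python typically use `%s` instead.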
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis