GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
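A minimal sketch of polling the public events endpoint named on this slide, assuming a simple poll-and-deduplicate loop. The function names, the `User-Agent` string, and the deduplication by event `id` are illustrative assumptions, not GHTorrent's actual code.

```python
import json
import urllib.request

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events (performs a network request)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            # Hypothetical client name; the API expects some User-Agent.
            "User-Agent": "events-poll-sketch",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def new_events(fetch, seen_ids):
    """Return events not seen before and record their ids in seen_ids."""
    fresh = []
    for event in fetch():
        if event["id"] not in seen_ids:
            seen_ids.add(event["id"])
            fresh.append(event)
    return fresh
```

Calling `new_events(fetch_events, seen)` repeatedly with a shared `seen` set yields only events that earlier polls did not return. The API is rate-limited, so a real poller would need to pause between requests.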
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
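To show how one event links to several entities, here is a sketch that collects the URLs an event points at. The dictionary layout (`actor`, `repo`, `org`, `payload.commits`) follows the public events format. The `PushEvent` type is an assumption, since the slide only shows the URLs, and the commit URL is copied as it appears on the slide.

```python
def linked_entities(event):
    """Collect the API URLs an event points at, grouped by entity kind."""
    payload = event.get("payload", {})
    return {
        "user": event.get("actor", {}).get("url"),
        "repo": event.get("repo", {}).get("url"),
        "org": event.get("org", {}).get("url"),
        "commits": [commit["url"] for commit in payload.get("commits", [])],
    }


# Assumed reconstruction of the condensed event; only the URLs come from the slide.
example_event = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
    "payload": {
        "commits": [
            {
                "url": "https://api.github.com/repos/PowerDMS/Owin.Scim/"
                       "commits/c751014f634d73e0b72f78a53c8cf137888b3"
            }
        ]
    },
}

entities = linked_entities(example_event)
```

Each URL in `entities` is a further API resource that a mirroring tool could fetch to fill in the user, repository, organization, and commit records.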
Entities
GHTorrent architecture [diagram; labels: GitHub API, Event Retrieval, Events queue, Commits queue, Project Events queue, Data Retrieval (Projects, Commits), evt.commit, evt.watch, evt.fork, Mirroring Cluster]
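The routing keys on the architecture slide (`evt.commit`, `evt.watch`, `evt.fork`) suggest that events are dispatched by type to separate retrieval workers. The sketch below illustrates that idea with in-memory queues. The mapping from GitHub event types to routing keys, the function names, and the use of `deque` in place of a real message broker are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route_events(events):
    """Place each event on the queue for its routing key; skip unknown types."""
    queues = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event.get("type"))
        if key is not None:
            queues[key].append(event)
    return queues


def drain(queues, handlers):
    """Run the handler registered for each routing key over its queue."""
    results = []
    for key, queue in queues.items():
        handler = handlers.get(key)
        while handler is not None and queue:
            results.append(handler(queue.popleft()))
    return results
```

Keeping one queue per event type lets each kind of retrieval worker run and scale independently, which is one plausible reason for the layout shown in the diagram.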
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis