Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
340
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
260
The troubles of modern dependency management and what to do about them
gousiosg
0
480
Mining Repositories with Apache Spark
gousiosg
0
610
My adventures with open everything
gousiosg
0
250
Structure and Evolution of Package Dependency Networks
gousiosg
0
710
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
890
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
250
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Alignment and Autonomy in Cybozu - 300人の開発組織でアラインメントと自律性を両立させるアジャイルな組織運営 / RSGT2025
ama_ch
1
2.4k
AWSサービスアップデート 2024/12 Part3
nrinetcom
PRO
0
140
Building Scalable Backend Services with Firebase
wisdommatt
0
110
Kotlin Multiplatformのポテンシャル
recruitengineers
PRO
2
150
re:Invent 2024のふりかえり
beli68
0
110
Amazon Q Developerで.NET Frameworkプロジェクトをモダナイズしてみた
kenichirokimura
1
200
[IBM TechXchange Dojo]Watson Discoveryとwatsonx.aiでRAGを実現!座学①
siyuanzh09
0
110
Bring Your Own Container: When Containers Turn the Key to EDR Bypass/byoc-avtokyo2024
tkmru
0
860
カップ麺の待ち時間(3分)でわかるPartyRockアップデート
ryutakondo
0
140
Evolving Architecture
rainerhahnekamp
3
250
Oracle Exadata Database Service(Dedicated Infrastructure):サービス概要のご紹介
oracle4engineer
PRO
0
12k
JuliaTokaiとJuliaLangJaの紹介 for NGK2025S
antimon2
1
120
Featured
See All Featured
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
Learning to Love Humans: Emotional Interface Design
aarron
274
40k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
192
16k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
30
2.1k
Imperfection Machines: The Place of Print at Facebook
scottboms
267
13k
No one is an island. Learnings from fostering a developers community.
thoeni
19
3.1k
StorybookのUI Testing Handbookを読んだ
zakiyama
28
5.4k
Become a Pro
speakerdeck
PRO
26
5.1k
Fashionably flexible responsive web design (full day workshop)
malarkey
406
66k
Automating Front-end Workflow
addyosmani
1366
200k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis