GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Technology
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
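One way to read the core-versus-community slides is as a share of activity coming from authors inside and outside a project's core team. The sketch below is only an illustration: the talk does not state how the split is computed, so the `core_team` set, the `core_vs_community` helper, and the sample logins are all assumptions.

```python
from collections import Counter


def core_vs_community(authors, core_team):
    """Split contributions (commits or comments) by author group.

    authors: iterable of author logins, one entry per contribution.
    core_team: set of logins treated as core. This definition is an
    assumption; the talk does not say how "core" is determined.
    """
    counts = Counter(
        "core" if login in core_team else "community" for login in authors
    )
    total = counts["core"] + counts["community"]
    # Community share is 0.0 for an empty input to avoid dividing by zero.
    share = counts["community"] / total if total else 0.0
    return {
        "core": counts["core"],
        "community": counts["community"],
        "community_share": share,
    }


# Hypothetical sample data, for illustration only.
sample_authors = ["alice", "bob", "alice", "carol", "dave"]
result = core_vs_community(sample_authors, core_team={"alice", "bob"})
print(result)
```

The same function can be applied to comment authors instead of commit authors, which is why it takes a plain list of logins rather than commit objects.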
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
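As a rough sketch of how the public event feed and the condensed event above fit together, the code below reads one page of events and collects the API URLs an event links to. The helper names (`fetch_events`, `linked_urls`) are hypothetical, and `sample_event` is a hand-built reduction that reuses the URLs from the slide exactly as shown; its `PushEvent` type is an assumption, since the slide does not name the event type.

```python
import json
from urllib.request import Request, urlopen

EVENTS_URL = "https://api.github.com/events"  # public event feed


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events (unauthenticated, rate limited)."""
    req = Request(url, headers={"User-Agent": "events-sketch"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def linked_urls(event):
    """Collect the API URLs an event points to: actor, repo, org, commits."""
    urls = []
    for key in ("actor", "repo", "org"):
        url = event.get(key, {}).get("url")
        if url:
            urls.append(url)
    # Commit links are optional; not every event carries them.
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    return urls


# Hand-built reduction of the slide's example; the event type is assumed.
sample_event = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
    "payload": {"commits": [{
        "url": "https://api.github.com/repos/PowerDMS/Owin.Scim/commits/"
               "c751014f634d73e0b72f78a53c8cf137888b3",
    }]},
}

for url in linked_urls(sample_event):
    print(url)
```

Each collected URL is a pointer to a further entity (user, repository, organization, commit) that a mirroring tool would have to retrieve separately. `fetch_events` is defined but not called here, so the sketch runs without network access.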
Entities
GHTorrent architecture (diagram; labels only): GitHub API, Event Retrieval, Queue (Events, Commits, Projects), Data Retrieval, Mirroring Cluster; routing keys evt.commit, evt.watch, evt.fork
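The diagram labels suggest a pipeline in which retrieved events are placed on a queue under keys such as evt.commit, evt.watch, and evt.fork, and data retrieval workers pick them up. The following is a minimal in-process sketch of that idea, not GHTorrent's actual implementation: Python's `queue.Queue` stands in for whatever message broker is used, and the `ROUTING_KEYS` mapping, the handlers, and the sample events are all assumptions.

```python
import queue

# Assumed mapping from GitHub event types to the diagram's routing keys.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def enqueue_events(events, q):
    """Put each event with a known routing key on the queue."""
    for event in events:
        key = ROUTING_KEYS.get(event["type"])
        if key is not None:
            q.put((key, event))


def drain(q, handlers):
    """Hand every queued event to the handler registered for its key."""
    results = []
    while not q.empty():
        key, event = q.get()
        results.append(handlers[key](event))
    return results


# Hypothetical handlers; a real worker would call the GitHub API here.
handlers = {
    "evt.commit": lambda e: ("retrieve commits", e["repo"]),
    "evt.watch": lambda e: ("retrieve watchers", e["repo"]),
    "evt.fork": lambda e: ("retrieve forks", e["repo"]),
}

# Made-up events for illustration.
events = [
    {"type": "PushEvent", "repo": "a/b"},
    {"type": "ForkEvent", "repo": "a/b"},
    {"type": "IssuesEvent", "repo": "c/d"},  # no routing key, so skipped
]

q = queue.Queue()
enqueue_events(events, q)
results = drain(q, handlers)
print(results)
```

Keeping retrieval behind a queue decouples the rate at which events arrive from the rate at which follow-up API requests can be made, which is one plausible reason for the layout in the diagram.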
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis