GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
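As a minimal sketch of what reading that endpoint could look like, the snippet below requests the public events feed and tallies the events by type. The helper names, the `User-Agent` value, and the use of `https` rather than the `http` URL on the slide are assumptions made for illustration; this is not GHTorrent's actual retrieval code.

```python
import json
import urllib.request

# Public events feed named on the slide (https used here as an assumption).
EVENTS_URL = "https://api.github.com/events"


def build_request(url=EVENTS_URL):
    """Build a GET request; GitHub's API expects a User-Agent header."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "events-sketch",  # hypothetical client name
    }
    return urllib.request.Request(url, headers=headers)


def fetch_public_events(url=EVENTS_URL, timeout=10.0):
    """Download one page of public events (requires network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)


def count_by_type(events):
    """Count how many events of each type appear in a page of results."""
    counts = {}
    for event in events:
        counts[event["type"]] = counts.get(event["type"], 0) + 1
    return counts
```

Calling `count_by_type(fetch_public_events())` would give a rough picture of what is happening on GitHub at that moment; unauthenticated requests are rate limited, so a real collector would need credentials and backoff.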
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
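The URLs in the example event are what make follow-up retrieval possible: each one points at an entity (user, repository, commit, organization) that can be fetched in turn. The sketch below collects those links from an event. The URLs are the ones on the slide, reproduced as they appear; the event id, the `PushEvent` type, and the field layout are assumptions modelled on the public events API, and `linked_urls` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Illustrative event. URLs come from the slide; id, type and layout are assumed.
SAMPLE_EVENT: Dict[str, Any] = {
    "id": "0000000000",  # hypothetical id
    "type": "PushEvent",  # assumed event type
    "actor": {"login": "Cephei", "url": "https://api.github.com/users/Cephei"},
    "repo": {
        "name": "PowerDMS/Owin.Scim",
        "url": "https://api.github.com/repos/PowerDMS/Owin.Scim",
    },
    "org": {"login": "PowerDMS", "url": "https://api.github.com/orgs/PowerDMS"},
    "payload": {
        "commits": [
            {
                # Commit URL exactly as shown on the slide.
                "url": "https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3"
            }
        ]
    },
}


def linked_urls(event: Dict[str, Any]) -> List[str]:
    """Return the URLs of entities an event references: actor, repo, org, commits."""
    urls: List[str] = []
    for key in ("actor", "repo", "org"):
        entity = event.get(key)
        if entity and "url" in entity:
            urls.append(entity["url"])
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    return urls
```

Events without an organization or without commits simply yield fewer links, so the same helper works across event types.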
Entities
GHTorrent architecture (diagram): GitHub API, event retrieval, events queue (evt.commit, evt.watch, evt.fork), commits queue, projects queue, data retrieval workers, mirroring cluster
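The architecture slide shows events being split into per-type queues (`evt.commit`, `evt.watch`, `evt.fork`) that feed separate data retrieval workers. The sketch below imitates that routing step in a single process. The routing keys are taken from the slide; the mapping from GitHub event types to those keys, the function names, and the use of Python's `queue.Queue` as a stand-in for a real message broker are all assumptions.

```python
import queue
from collections import defaultdict

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route(events):
    """Put each event on the queue for its routing key; skip unmapped types."""
    queues = defaultdict(queue.Queue)
    for event in events:
        key = ROUTING_KEYS.get(event.get("type"))
        if key is not None:
            queues[key].put(event)
    return queues


def drain(q):
    """Remove and return every item currently waiting on a queue."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

Keeping one queue per event type lets each kind of retrieval work scale independently, which is one plausible reason for the layout on the slide.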
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis