GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Technology
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
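The event body itself did not survive in the transcript, only the URLs it contained. As a rough sketch of what consuming the events endpoint can look like, the Python below pulls the actor, repository, and organization URLs out of one event. The `type` value in `SAMPLE_EVENT` is an assumption (the slide does not show it), and `fetch_events`, `summarize`, and `SAMPLE_EVENT` are illustrative names, not part of GHTorrent.

```python
import json
import urllib.request

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events. Makes a network call; rate limits apply."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(event):
    """Return the entity URLs an event points to; missing entities become None."""
    return {
        "type": event.get("type"),
        "actor": event.get("actor", {}).get("url"),
        "repo": event.get("repo", {}).get("url"),
        "org": event.get("org", {}).get("url"),
    }


# Hand-built sample using the URLs from the slide.
# The event type is assumed; the slide does not state it.
SAMPLE_EVENT = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
}

if __name__ == "__main__":
    print(summarize(SAMPLE_EVENT))
```

Calling `fetch_events()` and mapping `summarize` over the result would give the same view for live events; each URL can then be followed to retrieve the full entity.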
Entities
GHTorrent architecture (diagram; labels: GitHub API, Event Retrieval, Events Queue, Commits Queue, Project Events, Projects, Commits, routing keys evt.commit / evt.watch / evt.fork, Data Retrieval workers, Mirroring Cluster)
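The diagram labels suggest events being published to queues under routing keys such as evt.commit, evt.watch, and evt.fork, with separate data retrieval workers consuming each queue. The sketch below illustrates that routing idea only, using an in-memory stand-in for a message broker. The `Router` class, the `routing_key_for` mapping from GitHub event types to routing keys, and the `evt.other` fallback are all assumptions made for illustration, not GHTorrent's actual code.

```python
from collections import defaultdict, deque


class Router:
    """In-memory stand-in for a message broker with one queue per routing key."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, routing_key, message):
        """Append a message to the queue for the given routing key."""
        self.queues[routing_key].append(message)

    def drain(self, routing_key, handler):
        """Apply handler to every queued message in FIFO order, emptying the queue."""
        results = []
        queue = self.queues[routing_key]
        while queue:
            results.append(handler(queue.popleft()))
        return results


def routing_key_for(event):
    """Map a GitHub event type to a routing key (assumed mapping)."""
    mapping = {
        "PushEvent": "evt.commit",
        "WatchEvent": "evt.watch",
        "ForkEvent": "evt.fork",
    }
    return mapping.get(event["type"], "evt.other")


if __name__ == "__main__":
    router = Router()
    for event in [
        {"type": "PushEvent", "id": 1},
        {"type": "ForkEvent", "id": 2},
        {"type": "PushEvent", "id": 3},
    ]:
        router.publish(routing_key_for(event), event)

    # A "data retrieval" worker for commits only sees commit-related events.
    print(router.drain("evt.commit", lambda e: e["id"]))
```

Separating retrieval by routing key means each worker type can be scaled on its own, which fits the "Mirroring Cluster" label in the diagram.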
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis