Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
430
0
Share
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
340
The troubles of modern dependency management and what to do about them
gousiosg
0
660
Mining Repositories with Apache Spark
gousiosg
0
710
My adventures with open everything
gousiosg
0
350
Structure and Evolution of Package Dependency Networks
gousiosg
0
890
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
970
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
340
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
AIにより大幅に強化された AWS Transform Customを触ってみる
0air
0
290
Datadog で実現するセキュリティ対策 ~オブザーバビリティとセキュリティを 一緒にやると何がいいのか~
a2ush
0
190
最大のアウトプット術は問題を作ること
ryoaccount
0
280
Bref でサービスを運用している話
sgash708
0
220
Tour of Agent Protocols: MCP, A2A, AG-UI, A2UI with ADK
meteatamel
0
200
互換性のある(らしい)DBへの移行など考えるにあたってたいへんざっくり
sejima
PRO
0
540
GitHub Advanced Security × Defender for Cloudで開発とSecOpsのサイロを超える: コードとクラウドをつなぐ、開発プラットフォームのセキュリティ
yuriemori
1
120
Babylon.js を使って試した色々な内容 / Various things I tried using Babylon.js / Babylon.js 勉強会 vol.5
you
PRO
0
210
AIエージェント時代に必要な オペレーションマネージャーのロールとは
kentarofujii
0
290
ブラックボックス化したMLシステムのVertex AI移行 / mlops_community_62
visional_engineering_and_design
1
270
I ran an automated simulation of fake news spread using OpenClaw.
zzzzico
1
770
第26回FA設備技術勉強会 - Claude/Claude_codeでデータ分析 -
happysamurai294
0
360
Featured
See All Featured
Building AI with AI
inesmontani
PRO
1
850
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
120
[RailsConf 2023] Rails as a piece of cake
palkan
59
6.4k
Everyday Curiosity
cassininazir
0
180
We Have a Design System, Now What?
morganepeng
55
8.1k
Thoughts on Productivity
jonyablonski
76
5.1k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
From π to Pie charts
rasagy
0
160
Digital Ethics as a Driver of Design Innovation
axbom
PRO
1
250
Amusing Abliteration
ianozsvald
1
150
Stewardship and Sustainability of Urban and Community Forests
pwiseman
0
170
Bootstrapping a Software Product
garrettdimon
PRO
307
120k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis