Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
PyData ATL Lightning Talk
Search
Will McGinnis
June 17, 2016
Programming
0
300
PyData ATL Lightning Talk
https://www.github.com/wdm0006/git-pandas
Will McGinnis
June 17, 2016
Tweet
Share
More Decks by Will McGinnis
See All by Will McGinnis
I made a model, now what?
wdm0006
0
260
Encoding categorical variables with categorical encoders
wdm0006
0
450
Other Decks in Programming
See All in Programming
外接に惑わされない自システムの処理時間SLIをOpenTelemetryで実現した話
kotaro7750
0
110
スマホから Youtube Shortsを見られないようにする
lemolatoon
27
34k
iOSでSVG画像を扱う
kishikawakatsumi
0
170
CSC509 Lecture 08
javiergs
PRO
0
260
Foundation Modelsを実装日本語学習アプリを作ってみた!
hypebeans
0
130
Pythonに漸進的に型をつける
nealle
1
130
GC25 Recap: The Code You Reviewed is Not the Code You Built / #newt_gophercon_tour
mazrean
0
120
理論と実務のギャップを超える
eycjur
0
190
オンデバイスAIとXcode
ryodeveloper
0
260
Domain-centric? Why Hexagonal, Onion, and Clean Architecture Are Answers to the Wrong Question
olivergierke
3
980
技術的負債の正体を知って向き合う
irof
0
270
Software Architecture
hschwentner
6
2.3k
Featured
See All Featured
Scaling GitHub
holman
463
140k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.7k
Building Applications with DynamoDB
mza
96
6.7k
Code Review Best Practice
trishagee
72
19k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.1k
How STYLIGHT went responsive
nonsquared
100
5.9k
Keith and Marios Guide to Fast Websites
keithpitt
411
23k
Product Roadmaps are Hard
iamctodd
PRO
55
11k
Become a Pro
speakerdeck
PRO
29
5.6k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
31
2.7k
Embracing the Ebb and Flow
colly
88
4.9k
Transcript
Analyzing Git/ Github data with git-pandas (or: vanity metrics at
scale)
Who am I? • Will McGinnis • Write code at
Predikto (we’re hiring) • www.predikto.com • twitter.com/willmcginnis • github.com/wdm0006
What is git-pandas? • Open source library: https://github.com/wdm0006/git-pandas • Represents
git data as pandas dataframes • Abstracts groups of repos into an object called a ProjectDirectory • Does some common analysis tasks for you
Org-wide Punchcards
Cumulative Blame
Estimate Code Quality • “file owner” • metric for refactors
• how long will an owner’s file go without being refactored?
GitNOC
None
None
Other things • Bus factors (for files, repos and orgs)
• Development time estimation • Rate of change metrics (risk) • Basic datasets (commit history, file changes, branches, tags, etc.) • File owners • File details • and more: https://github.com/wdm0006/git- pandas