Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to Build a GitHub
Search
Zach Holman
August 05, 2012
Programming
147
170k
How to Build a GitHub
Learn about the growth patterns and the architecture behind github.com.
Zach Holman
August 05, 2012
Tweet
Share
More Decks by Zach Holman
See All by Zach Holman
Firing People
holman
40
6.4k
Even More Emoji Abuse 🚧🚨
holman
17
10k
Move Fast and Break Nothing
holman
68
180k
The Talk on Talks
holman
66
32k
How GitHub (no longer) Works
holman
310
140k
More Git and GitHub Secrets
holman
183
110k
Keeping People
holman
64
62k
If Only I Knew This Shit in College
holman
98
100k
GitHub: Behind the Feature
holman
41
15k
Other Decks in Programming
See All in Programming
「今のプロジェクトいろいろ大変なんですよ、app/services とかもあって……」/After Kaigi on Rails 2024 LT Night
junk0612
5
2.1k
Compose 1.7のTextFieldはPOBox Plusで日本語変換できない
tomoya0x00
0
190
카카오페이는 어떻게 수천만 결제를 처리할까? 우아한 결제 분산락 노하우
kakao
PRO
0
110
Less waste, more joy, and a lot more green: How Quarkus makes Java better
hollycummins
0
100
Make Impossible States Impossibleを 意識してReactのPropsを設計しよう
ikumatadokoro
0
160
Enabling DevOps and Team Topologies Through Architecture: Architecting for Fast Flow
cer
PRO
0
310
色々なIaCツールを実際に触って比較してみる
iriikeita
0
330
型付き API リクエストを実現するいくつかの手法とその選択 / Typed API Request
euxn23
8
2.2k
Jakarta EE meets AI
ivargrimstad
0
580
CSC509 Lecture 09
javiergs
PRO
0
140
Ethereum_.pdf
nekomatu
0
460
役立つログに取り組もう
irof
28
9.6k
Featured
See All Featured
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5k
Visualization
eitanlees
145
15k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
16
2.1k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
250
21k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
44
2.2k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
109
49k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
6.9k
Build The Right Thing And Hit Your Dates
maggiecrowley
33
2.4k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
1.9k
The Art of Programming - Codeland 2020
erikaheidi
52
13k
Transcript
githu H O W t B U I L
D GITHUB a
githu
6.5MM REPOSITORIES LARGEST GIT HOST 1.9MM USERS SINCE 2008
6.5MM REPOSITORIES LARGEST GIT HOST 1.9MM USERS SINCE 2008 SVN
HOST
gh gh gh gh gh
gh
gh gh gh gh gh
gh SHOW YOU OUR CARDS going t
MAGIC BULLET there i n
FOUR STAGES OF GROWTH happiness the EVERYTHING automate
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
ho DID WE GIT HERE
1809: PERL INVENTED
1814: COMPUTERS INVENTED
1814-2004: ANARCHY AND CHAOS AND ZOMG EVERYONE’S DYING
2005: VERSION CONTROL INVENTED git
2007: githu GLOBAL PEACE AND HAPPINESS ACHIEVED
...or something like that
PRESTON-WERNER TOM GRIT O C TOBER 9, 2 0 07
git via ruby
GRIT git via ruby github’s interface to git object-oriented, read/write
open source
repo = Grit::Repo.new('/tmp/repository') grit repo.commits
grit shelling out to git is expensive grit reimplements portions
of git in ruby native packfile and git object support 2x-100x speedup on low-level operations
grit slowly reimplement grit for speed allows for incremental improvements
LED TO GITHUB grit O C TOBER 19, 2 0
07
TODAY ADDING 2TB A MONTH 22 FILESERVER PAIRS 23TB OF
REPO DATA
GITHUB GROWTH THE FOUR STAGES of
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB:
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB: 2008
2009 2010 2012
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB:
JAN 2008 DEC 2008 FOUR STAGES OF GROWTH GITHUB: 42,000
USERS
JAN 2008 DEC 2008 FOUR STAGES OF GROWTH GITHUB: 80,000
REPOSITORIES
LOCAL MULTI-VM SHARED GFS MOUNT
LOCAL MULTI-VM WEB FRONTENDS BACKGROUND WORKERS
LOCAL MULTI-VM SIMPLE ARCHITECTURE HORIZONTALLY SCALABLE-ish
LOCAL SHARED GFS MOUNT SHARED MOUNT ON EACH VM SIMILAR
PRODUCTION + DEVELOPMENT ACCESS ALLOWED LOCAL ACCESS VIA GRIT
SIMPLE APPROACH, COMMON GIT INTERFACE, QUICK TO BUILD AND SHIP
LOCAL
LOCAL NETWORKED FOUR STAGES OF GROWTH GITHUB: NET-SHARD GITRPC
2008 2009 2010 FOUR STAGES OF GROWTH GITHUB: 166,000 USERS
2008 2009 2010 FOUR STAGES OF GROWTH GITHUB: 484,000 REPOSITORIES
the problem: is slow GFS performance degraded as repos added
the problem: i/o-bound we’re read/write to disk needs to be
fast
THE PLAN NETWORKED HARDWARE MOVE DATACENTERS
NETWORKED HARDWARE bare metal servers 16 machines 6x RAM machine
roles solid datacenter got dat cloud
NETWORKED FRONTENDS FILESERVERS AUX DB LAUNCH: SERVER PAIRS
NETWORKED GRIT IS LOCAL NEEDS TO BE NETWORKED
NETWORKED smoke service is run on each fs; facilitates disk
access chimney routes the smoke, stores routing table in redis stub local grit calls, retain API usage, but send over network
NETWORKED server pairs offer failover via DRBD real servers, real
big RAM allocations
NETWORKED LATENCY networked routing adds 2-10ms per request optimize for
the roundtrip smoke contains smarter server-side logic
NETWORKED LATENCY smoke has custom git extension commands git-distinct-commits returns
commits only contained on a given branch calls to git-show-refs and git-rev-list run all calls server-side in one roundtrip
NETWORKED HORIZONTALLY-SCALABLE, LATENCY- CONSIDERATE, API-COMPATIBLE WITH GRIT
LOCAL FOUR STAGES OF GROWTH GITHUB: NET-SHARD GITRPC NETWORKED
2008 2009 2010 2011 FOUR STAGES OF GROWTH GITHUB: 510,000
USERS
2008 2009 2010 2011 FOUR STAGES OF GROWTH GITHUB: 1.3MM
REPOSITORIES
the problem: duplication data each fork is a full project
history
duplication data i create a repo you fork my
repo fs5:/data/repositories/6/nw/6b/de/92/1/1.git fs7:/data/repositories/4/na/3b/dr/72/2/2.git
duplication data 1,000 commits 1,001 commits 10MB 10MB 20MB
total disk }
duplication data 1,000 commits 1 commit 1KB 10MB 10MB
total disk }GOAL:
duplication data 75 MB repo 3.5k forks x ~250
GB x 2 fs pairs + offsite backups
NET-SHARD shard by repository network (“forks”)
NET-SHARD network.git 1.git 2.git 3.git 4.git CONTAINS DELTA }CONTAINS ALL
REFS ›
NET-SHARD network.git GIT ALTERNATES store git object data externally to
repository we fetch refs into your fork, transparently
NET-SHARD network.git PRIVACY potential leaking of refs cross-network net-shard enabled
on all-public and all-private repository networks only
NET-SHARD network.git DISK halves disk usage increase disk and kernel
cache hits
NET-SHARD network.git MIGRATION gradually transitioned repos to network.git effectively feature-flagged
by repo
NET-SHARD SAVE DISK, IMPROVE PERFORMANCE
LOCAL FOUR STAGES OF GROWTH GITHUB: GITRPC NETWORKED NET-SHARD
2008 2009 2010 2011 2012 FOUR STAGES OF GROWTH GITHUB:
1.2MM USERS
2008 2009 2010 2011 2012 AUGUST FOUR STAGES OF GROWTH
GITHUB: 1.9MM USERS
2008 2009 2010 2011 2012 FOUR STAGES OF GROWTH GITHUB:
3.4MM REPOSITORIES
2008 2009 2010 2011 2012 AUGUST FOUR STAGES OF GROWTH
GITHUB: 6.5MM REPOSITORIES
the problem: GRIT git via ruby
the problem: local, ruby-based grit ended up in a high-traffic
distributed system
the problem: inelegant code spread out everywhere
GITRPC network-oriented library for git access GitRPC
GITRPC open source fastest git implementation (C) github-sponsored project bindings
for all major languages used in our mac, windows clients
GITRPC rugged (RUBY) libgit2 (C) gitrpc (RUBY)
GITRPC like smoke, gitrpc aims to reduce latency by reducing
roundtrips LATENCY
GITRPC operations cached on library level CACHING yank out tons
of app-level cache logic
GITRPC the move to gitrpc started this summer and will
take months MIGRATION gradually replace smoke and grit; avoids a risky deploy
FAST AND STABLE NETWORKED GIT ACCESS GITRPC
LOCAL NETWORKED NET-SHARD GITRPC FOUR STAGES OF GROWTH GITHUB:
identify WHAT’S BROKEN
sma CHANGES, FAST DEVELOPMENT
realCODE BEATS IMAGINARY CODE
EVERYTHING automate automate automate automate automate AUTOMATE automate automate automate
automate automate automate
m . manage LOL DEVELOPERS SOFTWARE
DEVELOPMENT
m . manage DEADLINES MEETINGS PRIORITIES ESTIMATES
m . manage DEADLINES MEETINGS PRIORITIES ESTIMATES
EVERYONE i A MANAGER
AUTOMATE AWAY PAIN DEPLOYMENT RECOVERY DEVELOPMENT
DEVELOPMENT automate
DEVELOPMENT > ./do-work RUN THIS IN EACH PROJECT: ...AND YOU’RE
DONE! loljk
DEVELOPMENT YOU CAN AUTOMATE THE PAIN OF DEVELOPMENT
SETUP DEVELOPMENT the
SETUP DEVELOPMENT the ONE-LINER INSTALLS ALL GITHUB DEVELOPMENT DEPENDENCIES
30 min SETUP DEVELOPMENT the CLEAN MACHINE TO FULL
DEVELOPMENT ENVIRONMENT
SETUP DEVELOPMENT the NEW EMPLOYEES SHIP THEIR FIRST WEEK
SETUP DEVELOPMENT the PUPPET HANDLES ALL DEPENDENCIES
DEPLOYMENT automate
DEPLOYMENT REAL BROGRAMMERS DEPLOY WITH NO FEAR SO FUCK THAT
DEPLOYMENT DEPLOYS SHOULD BE CAUTIOUS, COMMONPLACE, AND AUTOMATED
DEPLOYMENT GITHUB DEPLOYS 20-40 TIMES A DAY
DEPLOYMENT PUSH BRANCH DEPLOY BRANCH EVERYWHERE · MACHINE CLASS ·
SPECIFIC SERVERS HUBOT RUNS TESTS IN ABOUT 200 SECONDS USUALLY OPEN A PULL REQUEST
DEPLOYMENT DEPLOY LOCKING CAN’T DEPLOY IF A BRANCH IS DEPLOYED
AUTODEPLOYS PUSHED TO MASTER WITH GREEN TESTS? DEPLOY.
DEPLOYMENT STAFF-ONLY FEATURE FLAGS LIMITS EXPOSURE · REAL-WORLD · AVOIDS
MERGES
RECOVERY automate
RECOVERY SOMETHING WILL ALWAYS BREAK
RECOVERY HUBOT IS A SYSADMIN
RECOVERY HUBOT LOAD HUBOT QUERIES HUBOT CONNS SERVER LOAD RUNNING
DB QUERIES ALL OPEN CONNECTIONS
RECOVERY HUBOT RESTORE <REPO> HUBOT PUSH-LOG <REPO> HUBOT GH-EACH <HOST>
<COMMAND> RESTORE A REPO FROM BACKUPS SEE RECENT PUSH LOGS TO A REPO RUN COMMAND ON SPECIFIC HOSTS
HIGH-LEVEL OVERVIEW IN MINUTES SPEND MORE TIME FIXING AND LESS
TIME INVESTIGATING RECOVERY
— happiness the — — — —
EMPLOYEES HAVE QUIT YEARS 5 EMPLOYEES 108 ZERO
1-2 MONTHS HIRE 1-3 MONTHS RAMP-UP 2 WEEKS LEAVE
LOSING AN EMPLOYEE CAN SET YOU BACK HALF A YEAR
remove ANY REASON TO LEAVE — — — — —
— — — — — — — — — — — —
TDD✓ PAIR PROGRAMMING ✓ BDD ✓ TEST-FIRST ✓ DESIGN-FIRST ✓
(just kidding) EMACS x NONE OF THESE ✓
WE CARE ABOUT THE WORK YOU DO, NOT ABOUT HOW
YOU DO IT
LOCATION HOURS DIRECTION
LOCATION HOURS DIRECTION GITHUB EMPLOYEES WORK REMOTELY
⅔
LOCATION HOURS DIRECTION FAMILY RELOCATION, TRAVEL FREEDOM
LOCATION HOURS DIRECTION CHOOSE YOUR SCHEDULE CHOOSE
YOUR VACATIONS FRESH, CREATIVE EMPLOYEES
LOCATION HOURS DIRECTION YOU HACK ON THINGS
THAT INTEREST YOU REDUCES BURNOUT
flexible LOCATION HOURS DIRECTION BE TOWARDS WORK/LIFE
githu
basica y, MOVE FAST = SMALL CHANGES
basica y, BE STABLE = DEPLOY CONSTANTLY
basica y, HAPPY COMPANY = HAPPY EMPLOYEES
thank
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
ZACHHOLMAN.COM/TALKS