How to Build a GitHub
Zach Holman
August 05, 2012
Programming
Learn about the growth patterns and the architecture behind github.com.
Transcript
github: HOW to BUILD a GITHUB
github: 6.5MM REPOSITORIES, 1.9MM USERS, SINCE 2008. LARGEST GIT HOST, LARGEST SVN HOST
going to SHOW YOU OUR CARDS
there is no MAGIC BULLET
the FOUR STAGES OF GROWTH, automate EVERYTHING, the happiness
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
how DID WE GIT HERE
1809: PERL INVENTED
1814: COMPUTERS INVENTED
1814-2004: ANARCHY AND CHAOS AND ZOMG EVERYONE’S DYING
2005: VERSION CONTROL INVENTED (git)
2007: github. GLOBAL PEACE AND HAPPINESS ACHIEVED
...or something like that
TOM PRESTON-WERNER: GRIT, OCTOBER 9, 2007
GRIT: git via ruby; github’s interface to git; object-oriented, read/write; open source
grit:
repo = Grit::Repo.new('/tmp/repository')
repo.commits
grit: shelling out to git is expensive; grit reimplements portions of git in ruby; native packfile and git object support; 2x-100x speedup on low-level operations
grit: slowly reimplement grit for speed; allows for incremental improvements
grit LED TO GITHUB: OCTOBER 19, 2007
TODAY: 23TB OF REPO DATA, ADDING 2TB A MONTH, 22 FILESERVER PAIRS
THE FOUR STAGES of GITHUB GROWTH
GITHUB: FOUR STAGES OF GROWTH: LOCAL (2008), NETWORKED (2009), NET-SHARD (2010), GITRPC (2012)
GITHUB: FOUR STAGES OF GROWTH: JAN 2008 to DEC 2008: 42,000 USERS, 80,000 REPOSITORIES
LOCAL: MULTI-VM, SHARED GFS MOUNT
LOCAL MULTI-VM: WEB FRONTENDS, BACKGROUND WORKERS; SIMPLE ARCHITECTURE, HORIZONTALLY SCALABLE-ish
LOCAL SHARED GFS MOUNT: SHARED MOUNT ON EACH VM; SIMILAR PRODUCTION + DEVELOPMENT ACCESS; ALLOWED LOCAL ACCESS VIA GRIT
LOCAL: SIMPLE APPROACH, COMMON GIT INTERFACE, QUICK TO BUILD AND SHIP
GITHUB: FOUR STAGES OF GROWTH: NETWORKED
2008 2009 2010: 166,000 USERS, 484,000 REPOSITORIES
the problem: GFS is slow; performance degraded as repos added
the problem: we’re i/o-bound; read/write to disk needs to be fast
THE PLAN: NETWORKED HARDWARE, MOVE DATACENTERS
NETWORKED HARDWARE: bare metal servers, 16 machines, 6x RAM, machine roles, solid datacenter, got dat cloud
NETWORKED LAUNCH: FRONTENDS, FILESERVERS, AUX, DB; SERVER PAIRS
NETWORKED: GRIT IS LOCAL, NEEDS TO BE NETWORKED
NETWORKED smoke: service is run on each fs; facilitates disk access
NETWORKED chimney: routes the smoke, stores routing table in redis
NETWORKED stub: local grit calls, retain API usage, but send over network
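The smoke, chimney, and stub slide describes keeping Grit's local-looking API while the actual disk access happens on a fileserver. A minimal Ruby sketch of that shape follows. Every class name, host name, and repository name here is hypothetical, and a plain Hash stands in for the Redis routing table so the example runs without any services; it is not GitHub's actual code.

```ruby
# Hypothetical sketch: a routing table plus a stub that keeps a local-looking
# API while forwarding each call to whichever fileserver holds the repository.

# Stands in for the per-fileserver service ("smoke" in the talk).
class FakeFileserver
  attr_reader :host

  def initialize(host, repos)
    @host = host
    @repos = repos # { "owner/name" => [commit ids] }
  end

  def call(repo, method_name)
    case method_name
    when :commits then @repos.fetch(repo)
    else raise ArgumentError, "unsupported call: #{method_name}"
    end
  end
end

# Stands in for the router ("chimney" in the talk). A Hash replaces the
# Redis-backed routing table (assumption made for the sake of a runnable demo).
class Router
  def initialize(servers, routes)
    @servers = servers.to_h { |server| [server.host, server] }
    @routes = routes # { "owner/name" => "fs1" }
  end

  def dispatch(repo, method_name)
    host = @routes.fetch(repo)
    @servers.fetch(host).call(repo, method_name)
  end
end

# The stub: callers still write repo.commits, but the work happens remotely.
class RemoteRepo
  def initialize(name, router)
    @name = name
    @router = router
  end

  def commits
    @router.dispatch(@name, :commits)
  end
end

fs1 = FakeFileserver.new("fs1", { "alice/project" => %w[a1 b2] })
fs2 = FakeFileserver.new("fs2", { "bob/tool" => %w[c3] })
router = Router.new([fs1, fs2], { "alice/project" => "fs1", "bob/tool" => "fs2" })

repo = RemoteRepo.new("alice/project", router)
puts repo.commits.inspect
```

The caller never learns which fileserver answered, which is the property that lets the calling code stay unchanged while the storage moves.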
NETWORKED: server pairs offer failover via DRBD; real servers, real big RAM allocations
NETWORKED LATENCY: networked routing adds 2-10ms per request; optimize for the roundtrip; smoke contains smarter server-side logic
NETWORKED LATENCY: smoke has custom git extension commands; git-distinct-commits returns commits only contained on a given branch; calls to git-show-refs and git-rev-list; run all calls server-side in one roundtrip
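The latency slides argue that when every request costs 2-10ms of routing, the thing to minimize is the number of roundtrips. A small Ruby sketch of the difference between chatty and batched access follows. The server class, command names, and return values are invented for illustration; real smoke commands would run git on the fileserver.

```ruby
# Hypothetical sketch: count network roundtrips for chatty vs. batched access.
class CountingServer
  attr_reader :roundtrips

  def initialize
    @roundtrips = 0
    # Canned answers stand in for real git commands run on the fileserver.
    @handlers = {
      refs: -> { %w[refs/heads/master refs/heads/topic] },
      revs: -> { %w[c1 c2 c3] }
    }
  end

  # One command per request: every call pays the network latency.
  def request(command)
    @roundtrips += 1
    @handlers.fetch(command).call
  end

  # Several commands per request: the server runs them all in one roundtrip.
  def batch(commands)
    @roundtrips += 1
    commands.to_h { |command| [command, @handlers.fetch(command).call] }
  end
end

chatty = CountingServer.new
chatty.request(:refs)
chatty.request(:revs)

batched = CountingServer.new
result = batched.batch(%i[refs revs])

puts "chatty roundtrips:  #{chatty.roundtrips}"
puts "batched roundtrips: #{batched.roundtrips}"
puts result[:revs].inspect
```

With the 2-10ms figure from the slide, each avoided roundtrip saves roughly that much per request, which adds up quickly on pages that need several git lookups.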
NETWORKED: HORIZONTALLY-SCALABLE, LATENCY-CONSIDERATE, API-COMPATIBLE WITH GRIT
GITHUB: FOUR STAGES OF GROWTH: NET-SHARD
2008 2009 2010 2011: 510,000 USERS, 1.3MM REPOSITORIES
the problem: data duplication; each fork is a full project history
data duplication: i create a repo, you fork my repo
fs5:/data/repositories/6/nw/6b/de/92/1/1.git
fs7:/data/repositories/4/na/3b/dr/72/2/2.git
data duplication: 1,000 commits (10MB) + 1,001 commits (10MB) = 20MB total disk
data duplication, GOAL: 1,000 commits (10MB) + 1 commit (1KB) = 10MB total disk
data duplication: 75 MB repo x 3.5k forks = ~250 GB, x 2 fs pairs, + offsite backups
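The duplication numbers on these slides can be checked with a few lines of Ruby. The figures come from the slides; using decimal units (1 GB = 1000 MB) and reading "x 2 fs pairs" as two stored copies per repository are assumptions.

```ruby
# Figures from the slides; decimal units (1 GB = 1000 MB) are an assumption.
repo_mb = 75
forks = 3_500

# Every fork stored as a full copy of the project history.
full_copies_gb = repo_mb * forks / 1000.0
puts "full copies: #{full_copies_gb} GB"

# Assumption: "x 2 fs pairs" means the data lives on both halves of a pair.
paired_gb = full_copies_gb * 2
puts "on a server pair: #{paired_gb} GB (before offsite backups)"

# The two-repo example: a 10MB original plus a fork with one extra ~1KB commit.
original_kb = 10 * 1000
fork_delta_kb = 1
duplicated_total_kb = original_kb * 2
shared_total_kb = original_kb + fork_delta_kb
puts "duplicated: #{duplicated_total_kb} KB, shared: #{shared_total_kb} KB"
```

The first result, 262.5 GB, matches the slide's "~250 GB" estimate, and the shared layout stores 10,001 KB where the duplicated one stores 20,000 KB.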
NET-SHARD: shard by repository network (“forks”)
NET-SHARD: network.git CONTAINS ALL REFS; 1.git, 2.git, 3.git, 4.git each CONTAINS DELTA
NET-SHARD network.git, GIT ALTERNATES: store git object data externally to repository; we fetch refs into your fork, transparently
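Git alternates work through a plain text file, objects/info/alternates, inside a repository's git directory; each line names another object directory that git will also search. A minimal Ruby sketch of wiring forks to a shared store follows. The directory layout and the helper name are hypothetical, and the example only writes the file into a temporary directory; it does not create real repositories or run git.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: point a fork's object lookup at a shared network store
# by writing git's objects/info/alternates file.
def link_fork_to_network(fork_git_dir, network_git_dir)
  info_dir = File.join(fork_git_dir, "objects", "info")
  FileUtils.mkdir_p(info_dir)
  alternates = File.join(info_dir, "alternates")
  # Git reads one object-directory path per line from this file.
  File.write(alternates, File.join(network_git_dir, "objects") + "\n")
  alternates
end

Dir.mktmpdir do |root|
  network = File.join(root, "network.git")
  FileUtils.mkdir_p(File.join(network, "objects"))

  %w[1.git 2.git].each do |name|
    path = link_fork_to_network(File.join(root, name), network)
    puts "#{name} -> #{File.read(path).strip}"
  end
end
```

With this in place a fork only needs to store objects that the shared store lacks, which is the "contains delta" idea from the earlier slide.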
NET-SHARD network.git, PRIVACY: potential leaking of refs cross-network; net-shard enabled on all-public and all-private repository networks only
NET-SHARD network.git, DISK: halves disk usage; increase disk and kernel cache hits
NET-SHARD network.git, MIGRATION: gradually transitioned repos to network.git; effectively feature-flagged by repo
NET-SHARD: SAVE DISK, IMPROVE PERFORMANCE
GITHUB: FOUR STAGES OF GROWTH: GITRPC
2008 2009 2010 2011 2012: 1.2MM USERS, 3.4MM REPOSITORIES
AUGUST 2012: 1.9MM USERS, 6.5MM REPOSITORIES
the problem: GRIT (git via ruby)
the problem: local, ruby-based grit ended up in a high-traffic distributed system
the problem: inelegant code spread out everywhere
GITRPC: network-oriented library for git access
GITRPC, built on libgit2: open source; fastest git implementation (C); github-sponsored project; bindings for all major languages; used in our mac, windows clients
GITRPC stack: gitrpc (RUBY), rugged (RUBY), libgit2 (C)
GITRPC LATENCY: like smoke, gitrpc aims to reduce latency by reducing roundtrips
GITRPC CACHING: operations cached on library level; yank out tons of app-level cache logic
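Caching at the library level means the caller asks for data and never sees whether the answer came from the backend or the cache. A minimal Ruby sketch of that shape follows. The class and method names are hypothetical, a lambda stands in for the networked backend, and this is not GitRPC's real interface. Keying on the object's SHA is a safe choice because git objects are immutable once written.

```ruby
# Hypothetical sketch: the cache lives inside the access library, so the
# application code needs no cache logic of its own.
class CachingGitClient
  attr_reader :backend_calls

  def initialize(backend)
    @backend = backend # anything responding to call(repo, sha)
    @cache = {}
    @backend_calls = 0
  end

  # Git objects are immutable, so (repo, sha) is a stable cache key.
  def read_object(repo, sha)
    key = [repo, sha]
    return @cache[key] if @cache.key?(key)

    @backend_calls += 1
    @cache[key] = @backend.call(repo, sha)
  end
end

# A lambda stands in for the real networked backend.
backend = ->(repo, sha) { "contents of #{sha} in #{repo}" }
client = CachingGitClient.new(backend)

first = client.read_object("alice/project", "abc123")
second = client.read_object("alice/project", "abc123")

puts first == second
puts client.backend_calls
```

Mutable data such as branch heads would need invalidation on push, which this sketch deliberately leaves out.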
GITRPC MIGRATION: the move to gitrpc started this summer and will take months; gradually replace smoke and grit; avoids a risky deploy
GITRPC: FAST AND STABLE NETWORKED GIT ACCESS
GITHUB: FOUR STAGES OF GROWTH: LOCAL, NETWORKED, NET-SHARD, GITRPC
identify WHAT’S BROKEN
small CHANGES, FAST DEVELOPMENT
real CODE BEATS IMAGINARY CODE
automate EVERYTHING
m . manage LOL DEVELOPERS SOFTWARE
DEVELOPMENT
m . manage DEADLINES MEETINGS PRIORITIES ESTIMATES
EVERYONE is A MANAGER
AUTOMATE AWAY PAIN: DEVELOPMENT, DEPLOYMENT, RECOVERY
automate DEVELOPMENT
DEVELOPMENT: RUN THIS IN EACH PROJECT: > ./do-work ...AND YOU’RE DONE! loljk
DEVELOPMENT: YOU CAN AUTOMATE THE PAIN OF DEVELOPMENT
the DEVELOPMENT SETUP: ONE-LINER INSTALLS ALL GITHUB DEVELOPMENT DEPENDENCIES
the DEVELOPMENT SETUP: 30 min, CLEAN MACHINE TO FULL DEVELOPMENT ENVIRONMENT
the DEVELOPMENT SETUP: NEW EMPLOYEES SHIP THEIR FIRST WEEK
the DEVELOPMENT SETUP: PUPPET HANDLES ALL DEPENDENCIES
automate DEPLOYMENT
DEPLOYMENT: REAL BROGRAMMERS DEPLOY WITH NO FEAR. SO FUCK THAT
DEPLOYMENT: DEPLOYS SHOULD BE CAUTIOUS, COMMONPLACE, AND AUTOMATED
DEPLOYMENT: GITHUB DEPLOYS 20-40 TIMES A DAY
DEPLOYMENT, PUSH BRANCH: HUBOT RUNS TESTS IN ABOUT 200 SECONDS; USUALLY OPEN A PULL REQUEST
DEPLOYMENT, DEPLOY BRANCH: EVERYWHERE · MACHINE CLASS · SPECIFIC SERVERS
DEPLOYMENT, DEPLOY LOCKING: CAN’T DEPLOY IF A BRANCH IS DEPLOYED
DEPLOYMENT, AUTODEPLOYS: PUSHED TO MASTER WITH GREEN TESTS? DEPLOY.
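The deploy locking and autodeploy rules can be sketched as a small state machine in Ruby. The class name, branch names, and the rule that redeploying master releases the lock are all assumptions made for illustration; the slides only state that nobody else can deploy while a branch is deployed and that green pushes to master deploy automatically.

```ruby
# Hypothetical sketch: a branch deploy takes a lock; redeploying master
# releases it (that release rule is an assumption, not from the slides).
class Deployer
  class Locked < StandardError; end

  attr_reader :live_branch

  def initialize
    @live_branch = "master"
    @holder = nil # who currently has a non-master branch deployed
  end

  def deploy(branch, by:)
    if @holder && @holder != by
      raise Locked, "#{@holder} has #{@live_branch} deployed"
    end

    @live_branch = branch
    @holder = branch == "master" ? nil : by
    branch
  end

  # Called after a push to master: deploy only on green tests and no lock.
  def autodeploy(tests_green:)
    return false unless tests_green && @holder.nil?

    deploy("master", by: "autodeploy")
    true
  end
end

deployer = Deployer.new
deployer.deploy("faster-search", by: "alice")

begin
  deployer.deploy("new-footer", by: "bob")
rescue Deployer::Locked => e
  puts "blocked: #{e.message}"
end

puts deployer.autodeploy(tests_green: true) # lock is still held by alice
deployer.deploy("master", by: "alice")      # releases the lock
puts deployer.autodeploy(tests_green: true)
```

The lock keeps a pushed-to-master autodeploy from silently replacing a branch that someone is still testing in production.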
DEPLOYMENT, STAFF-ONLY FEATURE FLAGS: LIMITS EXPOSURE · REAL-WORLD · AVOIDS MERGES
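A staff-only feature flag ships the code to production but shows it only to employees. A minimal Ruby sketch follows; the User struct, the FeatureFlags class, and the flag name are hypothetical and not taken from GitHub's codebase.

```ruby
# Hypothetical sketch of a staff-only feature flag check.
User = Struct.new(:login, :staff)

class FeatureFlags
  def initialize
    @staff_only = []
  end

  # Turn a feature on for staff members only.
  def enable_for_staff(feature)
    @staff_only << feature unless @staff_only.include?(feature)
  end

  # Unknown features are off for everyone.
  def enabled?(feature, user)
    @staff_only.include?(feature) && user.staff == true
  end
end

flags = FeatureFlags.new
flags.enable_for_staff(:new_dashboard)

staffer = User.new("alice", true)
visitor = User.new("bob", false)

puts flags.enabled?(:new_dashboard, staffer)
puts flags.enabled?(:new_dashboard, visitor)
```

Because the code already runs in production behind the flag, it gets real-world traffic from staff without a long-lived branch that has to be merged later.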
automate RECOVERY
RECOVERY: SOMETHING WILL ALWAYS BREAK
RECOVERY: HUBOT IS A SYSADMIN
RECOVERY: HUBOT LOAD (SERVER LOAD); HUBOT QUERIES (RUNNING DB QUERIES); HUBOT CONNS (ALL OPEN CONNECTIONS)
RECOVERY: HUBOT RESTORE <REPO> (RESTORE A REPO FROM BACKUPS); HUBOT PUSH-LOG <REPO> (SEE RECENT PUSH LOGS TO A REPO); HUBOT GH-EACH <HOST> <COMMAND> (RUN COMMAND ON SPECIFIC HOSTS)
RECOVERY: HIGH-LEVEL OVERVIEW IN MINUTES; SPEND MORE TIME FIXING AND LESS TIME INVESTIGATING
the happiness
5 YEARS, 108 EMPLOYEES, ZERO EMPLOYEES HAVE QUIT
1-2 MONTHS HIRE, 1-3 MONTHS RAMP-UP, 2 WEEKS LEAVE
LOSING AN EMPLOYEE CAN SET YOU BACK HALF A YEAR
remove ANY REASON TO LEAVE
TDD ✓ PAIR PROGRAMMING ✓ BDD ✓ TEST-FIRST ✓ DESIGN-FIRST ✓ EMACS x (just kidding) NONE OF THESE ✓
WE CARE ABOUT THE WORK YOU DO, NOT ABOUT HOW YOU DO IT
LOCATION, HOURS, DIRECTION
LOCATION: ⅔ OF GITHUB EMPLOYEES WORK REMOTELY
LOCATION: FAMILY RELOCATION, TRAVEL FREEDOM
HOURS: CHOOSE YOUR SCHEDULE, CHOOSE YOUR VACATIONS; FRESH, CREATIVE EMPLOYEES
DIRECTION: YOU HACK ON THINGS THAT INTEREST YOU; REDUCES BURNOUT
LOCATION, HOURS, DIRECTION: BE flexible TOWARDS WORK/LIFE
github
basically, MOVE FAST = SMALL CHANGES
basically, BE STABLE = DEPLOY CONSTANTLY
basically, HAPPY COMPANY = HAPPY EMPLOYEES
thanks
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
ZACHHOLMAN.COM/TALKS