How to Build a GitHub
Zach Holman
August 05, 2012
Programming
Learn about the growth patterns and the architecture behind github.com.
Transcript
HOW to BUILD a GITHUB
LARGEST GIT HOST · SVN HOST · 6.5MM REPOSITORIES · 1.9MM USERS · SINCE 2008
going to SHOW YOU OUR CARDS
there is no MAGIC BULLET
FOUR STAGES OF GROWTH · automate EVERYTHING · the happiness
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
how DID WE GIT HERE
1809: PERL INVENTED
1814: COMPUTERS INVENTED
1814-2004: ANARCHY AND CHAOS AND ZOMG EVERYONE’S DYING
2005: VERSION CONTROL INVENTED (git)
2007: GLOBAL PEACE AND HAPPINESS ACHIEVED (github)
...or something like that
GRIT · TOM PRESTON-WERNER · OCTOBER 9, 2007
GRIT: git via ruby · github’s interface to git · object-oriented, read/write · open source
grit:
    repo = Grit::Repo.new('/tmp/repository')
    repo.commits
grit: shelling out to git is expensive · grit reimplements portions of git in ruby · native packfile and git object support · 2x-100x speedup on low-level operations
grit: slowly reimplement grit for speed · allows for incremental improvements
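The slides say grit avoids shelling out by reading git objects natively. Grit's own code is not shown in the deck, so the following is only a minimal sketch of the idea, assuming git's documented loose-object format (zlib-compressed `<type> <size>\0<content>`, named by SHA-1). The helper names `write_loose_blob` and `read_loose_object` are hypothetical, and packfiles are not handled.

```ruby
require "digest/sha1"
require "fileutils"
require "tmpdir"
require "zlib"

# Write `content` as a loose blob: zlib-compressed "blob <size>\0<content>",
# stored at <objects_dir>/<first 2 hex chars>/<remaining 38>, named by SHA-1.
def write_loose_blob(objects_dir, content)
  payload = "blob #{content.bytesize}\0".b + content.b
  sha = Digest::SHA1.hexdigest(payload)
  dir = File.join(objects_dir, sha[0, 2])
  FileUtils.mkdir_p(dir)
  File.binwrite(File.join(dir, sha[2..-1]), Zlib::Deflate.deflate(payload))
  sha
end

# Read a loose object back in-process: no `git` subprocess is spawned.
def read_loose_object(objects_dir, sha)
  path = File.join(objects_dir, sha[0, 2], sha[2..-1])
  raw = Zlib::Inflate.inflate(File.binread(path))
  header, body = raw.split("\0".b, 2)
  type, size = header.split(" ")
  raise "corrupt object #{sha}" unless body.bytesize == Integer(size)
  [type, body]
end

Dir.mktmpdir do |objects_dir|
  sha = write_loose_blob(objects_dir, "hello world\n")
  type, body = read_loose_object(objects_dir, sha)
  puts "#{sha} #{type} #{body.bytesize} bytes"
end
```

Reading the file directly costs one `open` and one inflate, whereas shelling out pays for a process spawn on every call, which is the expense the slide refers to.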
grit LED TO GITHUB · OCTOBER 19, 2007
TODAY: 23TB OF REPO DATA · 22 FILESERVER PAIRS · ADDING 2TB A MONTH
THE FOUR STAGES of GITHUB GROWTH
GITHUB: FOUR STAGES OF GROWTH: LOCAL · NETWORKED · NET-SHARD · GITRPC
2008 · 2009 · 2010 · 2012
GITHUB: FOUR STAGES OF GROWTH: LOCAL
JAN 2008 · DEC 2008: 42,000 USERS · 80,000 REPOSITORIES
LOCAL: MULTI-VM · SHARED GFS MOUNT
LOCAL, MULTI-VM: WEB FRONTENDS · BACKGROUND WORKERS · SIMPLE ARCHITECTURE · HORIZONTALLY SCALABLE-ish
LOCAL, SHARED GFS MOUNT: SHARED MOUNT ON EACH VM · SIMILAR PRODUCTION + DEVELOPMENT ACCESS · ALLOWED LOCAL ACCESS VIA GRIT
LOCAL: SIMPLE APPROACH, COMMON GIT INTERFACE, QUICK TO BUILD AND SHIP
GITHUB: FOUR STAGES OF GROWTH: NETWORKED
2008 · 2009 · 2010: 166,000 USERS · 484,000 REPOSITORIES
the problem: GFS is slow · performance degraded as repos added
the problem: we’re i/o-bound · read/write to disk needs to be fast
THE PLAN: NETWORKED HARDWARE · MOVE DATACENTERS
NETWORKED HARDWARE: bare metal servers · 16 machines · 6x RAM · machine roles · solid datacenter · got dat cloud
NETWORKED LAUNCH: FRONTENDS · FILESERVERS · AUX · DB · SERVER PAIRS
NETWORKED: GRIT IS LOCAL, NEEDS TO BE NETWORKED
NETWORKED, smoke: service is run on each fs; facilitates disk access
NETWORKED, chimney: routes the smoke, stores routing table in redis
NETWORKED, stub: local grit calls retain API usage, but send over network
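The smoke/chimney slide describes three parts: a routing table, a per-fileserver service, and a stub that keeps the grit-style API while sending calls over the network. The deck does not show that code, so this is a sketch under assumptions: every class name here is hypothetical, a plain Hash stands in for the redis routing table, and the transport only records requests instead of opening a connection.

```ruby
# Hypothetical routing table: repository path -> fileserver name.
# The slides say chimney stores this in redis; a Hash keeps the sketch runnable.
class RoutingTable
  def initialize(routes)
    @routes = routes
  end

  def host_for(repo_path)
    @routes.fetch(repo_path)
  end
end

# Hypothetical transport that records requests instead of opening a socket.
class RecordingTransport
  attr_reader :sent

  def initialize
    @sent = []
  end

  def call(host, repo_path, operation, args)
    @sent << [host, repo_path, operation, args]
    [] # a real transport would return the fileserver's response
  end
end

# Stub with the same surface as a local repo object: callers still write
# repo.commits, but the work is routed to the fileserver that owns the repo.
class NetworkedRepo
  def initialize(path, routes, transport)
    @path = path
    @routes = routes
    @transport = transport
  end

  def commits(*args)
    remote(:commits, args)
  end

  private

  def remote(operation, args)
    @transport.call(@routes.host_for(@path), @path, operation, args)
  end
end

routes = RoutingTable.new("example/repo.git" => "fs5")
transport = RecordingTransport.new
NetworkedRepo.new("example/repo.git", routes, transport).commits("master")
p transport.sent
```

Keeping the method names identical to the local API is what lets application code stay unchanged while storage moves to separate machines.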
NETWORKED: server pairs offer failover via DRBD · real servers, real big RAM allocations
NETWORKED, LATENCY: networked routing adds 2-10ms per request · optimize for the roundtrip · smoke contains smarter server-side logic
NETWORKED, LATENCY: smoke has custom git extension commands · git-distinct-commits returns commits only contained on a given branch · calls to git-show-refs and git-rev-list · run all calls server-side in one roundtrip
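The latency slides argue that with 2-10ms added per request, the number of roundtrips matters more than the work per call. A small sketch of that trade-off follows. `FakeFileserver` and its two commands are hypothetical stand-ins for smoke, not its real interface; only the 2-10ms range comes from the slides.

```ruby
# Hypothetical fileserver that counts roundtrips; it stands in for smoke.
class FakeFileserver
  attr_reader :roundtrips

  def initialize(refs)
    @refs = refs # e.g. { "refs/heads/master" => "abc123" }
    @roundtrips = 0
  end

  # Chatty style: every command pays its own network roundtrip.
  def request(command, *args)
    @roundtrips += 1
    run(command, *args)
  end

  # Batched style: many commands travel together and run server-side.
  def batch(commands)
    @roundtrips += 1
    commands.map { |command, *args| run(command, *args) }
  end

  private

  def run(command, *args)
    case command
    when :list_refs then @refs.keys.sort
    when :resolve then @refs.fetch(args.first)
    else raise ArgumentError, "unknown command #{command}"
    end
  end
end

# Added network latency in ms as [best, worst], using the slide's 2-10ms range.
def added_latency_ms(roundtrips, per_request_ms = 2..10)
  [per_request_ms.min * roundtrips, per_request_ms.max * roundtrips]
end

server = FakeFileserver.new("refs/heads/master" => "abc123")
server.batch([[:list_refs], [:resolve, "refs/heads/master"]])
p added_latency_ms(server.roundtrips)
```

The same two commands issued through `request` would count two roundtrips, so the added latency doubles while the server does identical work.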
NETWORKED: HORIZONTALLY-SCALABLE, LATENCY-CONSIDERATE, API-COMPATIBLE WITH GRIT
GITHUB: FOUR STAGES OF GROWTH: NET-SHARD
2008 · 2009 · 2010 · 2011: 510,000 USERS · 1.3MM REPOSITORIES
the problem: data duplication · each fork is a full project history
data duplication: i create a repo · you fork my repo
    fs5:/data/repositories/6/nw/6b/de/92/1/1.git
    fs7:/data/repositories/4/na/3b/dr/72/2/2.git
data duplication: 1,000 commits (10MB) + 1,001 commits (10MB) = 20MB total disk
data duplication, GOAL: 1,000 commits (10MB) + 1 commit (1KB) = 10MB total disk
data duplication: 75 MB repo x 3.5k forks = ~250 GB · x 2 fs pairs + offsite backups
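The duplication figures above can be checked with a few lines of arithmetic. This is a sketch, not GitHub's accounting: the function names are hypothetical, sizes use decimal units (1 GB = 1000 MB), and the 1KB-per-fork delta is an assumption borrowed from the GOAL slide's example.

```ruby
# Disk used when every fork is a full copy, in GB (decimal: 1 GB = 1000 MB).
def full_copy_gb(repo_mb, forks)
  repo_mb * forks / 1000.0
end

# Disk used if forks share one object store and each adds only a small delta.
# Returns KB; delta_kb is an assumption, borrowed from the 1KB goal slide.
def shared_store_kb(repo_mb, forks, delta_kb = 1)
  repo_mb * 1000 + forks * delta_kb
end

per_copy = full_copy_gb(75, 3_500)
puts "full copies: #{per_copy} GB, or #{per_copy * 2} GB once doubled"
puts "shared store: #{shared_store_kb(75, 3_500) / 1000.0} MB"
```

With the slide's inputs, full copies come to roughly 262 GB before doubling, in line with the quoted ~250 GB, while a shared store stays within a few MB of the original 75 MB repository. Offsite backups are left out of both figures.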
NET-SHARD: shard by repository network (“forks”)
NET-SHARD: network.git CONTAINS ALL REFS · 1.git, 2.git, 3.git, 4.git: each CONTAINS DELTA
NET-SHARD, network.git, GIT ALTERNATES: store git object data externally to repository · we fetch refs into your fork, transparently
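Git alternates work through a plain file, `objects/info/alternates`, that lists extra object directories, one per line; relative entries are resolved against the repository's own `objects` directory. Here is a minimal sketch of linking a fork to a shared store. The helper name `link_to_network` and the directory layout are assumptions for illustration; the deck does not say where GitHub places `network.git` relative to each fork.

```ruby
require "fileutils"
require "pathname"
require "tmpdir"

# Point a fork's object store at a shared one by writing objects/info/alternates.
# Returns the relative path that was written.
def link_to_network(fork_git_dir, network_git_dir)
  fork_objects = Pathname.new(fork_git_dir).join("objects")
  network_objects = Pathname.new(network_git_dir).join("objects")
  info_dir = fork_objects.join("info")
  FileUtils.mkdir_p(info_dir)
  # git resolves relative alternates against the fork's objects directory.
  relative = network_objects.relative_path_from(fork_objects)
  File.write(info_dir.join("alternates"), "#{relative}\n")
  relative.to_s
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "network.git", "objects"))
  puts link_to_network(File.join(root, "1.git"), File.join(root, "network.git"))
end
```

Once the file exists, git looks up any object missing from the fork in the listed directory, so the fork only needs to store what is unique to it.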
NET-SHARD, network.git, PRIVACY: potential leaking of refs cross-network · net-shard enabled on all-public and all-private repository networks only
NET-SHARD, network.git, DISK: halves disk usage · increase disk and kernel cache hits
NET-SHARD, network.git, MIGRATION: gradually transitioned repos to network.git · effectively feature-flagged by repo
NET-SHARD: SAVE DISK, IMPROVE PERFORMANCE
GITHUB: FOUR STAGES OF GROWTH: GITRPC
2008 · 2009 · 2010 · 2011 · 2012: 1.2MM USERS · 3.4MM REPOSITORIES
AUGUST 2012: 1.9MM USERS · 6.5MM REPOSITORIES
the problem: GRIT (git via ruby)
the problem: local, ruby-based grit ended up in a high-traffic distributed system
the problem: inelegant code spread out everywhere
GITRPC: network-oriented library for git access
GITRPC, libgit2: open source · fastest git implementation (C) · github-sponsored project · bindings for all major languages · used in our mac, windows clients
GITRPC: gitrpc (RUBY) · rugged (RUBY) · libgit2 (C)
GITRPC, LATENCY: like smoke, gitrpc aims to reduce latency by reducing roundtrips
GITRPC, CACHING: operations cached on library level · yank out tons of app-level cache logic
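Caching at the library level means the git client remembers results itself, so callers need no cache code. GitRPC's real interface is not shown in the deck, so this is only a sketch: `CachingGitClient` and the `call(repo, operation, *args)` backend shape are hypothetical, and invalidation when a repository changes is deliberately left out.

```ruby
# Wraps any backend that responds to call(repo, operation, *args) and
# memoizes results, so application code needs no cache logic of its own.
class CachingGitClient
  def initialize(backend, cache = {})
    @backend = backend
    @cache = cache
  end

  def call(repo, operation, *args)
    key = [repo, operation, args]
    # Hash#fetch runs the block only on a miss; the result is stored for next time.
    @cache.fetch(key) do
      @cache[key] = @backend.call(repo, operation, *args)
    end
  end
end

backend = ->(repo, operation, *args) { "#{repo}:#{operation}:#{args.join(',')}" }
client = CachingGitClient.new(backend)
client.call("example.git", :rev_list, "master") # reaches the backend
client.call("example.git", :rev_list, "master") # served from the cache
```

A production cache would also need the key to reflect repository state, for example by including the current ref values, so that a push does not serve stale results.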
GITRPC, MIGRATION: the move to gitrpc started this summer and will take months · gradually replace smoke and grit; avoids a risky deploy
GITRPC: FAST AND STABLE NETWORKED GIT ACCESS
GITHUB: FOUR STAGES OF GROWTH: LOCAL · NETWORKED · NET-SHARD · GITRPC
identify WHAT’S BROKEN
small CHANGES, FAST DEVELOPMENT
real CODE BEATS IMAGINARY CODE
automate EVERYTHING
mr. manager: LOL DEVELOPERS · SOFTWARE DEVELOPMENT
mr. manager: DEADLINES · MEETINGS · PRIORITIES · ESTIMATES
EVERYONE is A MANAGER
AUTOMATE AWAY PAIN: DEVELOPMENT · DEPLOYMENT · RECOVERY
automate DEVELOPMENT
DEVELOPMENT: RUN THIS IN EACH PROJECT: > ./do-work ...AND YOU’RE DONE! loljk
DEVELOPMENT: YOU CAN AUTOMATE THE PAIN OF DEVELOPMENT
DEVELOPMENT, the SETUP: ONE-LINER INSTALLS ALL GITHUB DEVELOPMENT DEPENDENCIES
DEVELOPMENT, the SETUP: 30 min · CLEAN MACHINE TO FULL DEVELOPMENT ENVIRONMENT
DEVELOPMENT, the SETUP: NEW EMPLOYEES SHIP THEIR FIRST WEEK
DEVELOPMENT, the SETUP: PUPPET HANDLES ALL DEPENDENCIES
automate DEPLOYMENT
DEPLOYMENT: REAL BROGRAMMERS DEPLOY WITH NO FEAR SO FUCK THAT
DEPLOYMENT: DEPLOYS SHOULD BE CAUTIOUS, COMMONPLACE, AND AUTOMATED
DEPLOYMENT: GITHUB DEPLOYS 20-40 TIMES A DAY
DEPLOYMENT: PUSH BRANCH (USUALLY OPEN A PULL REQUEST) · HUBOT RUNS TESTS (IN ABOUT 200 SECONDS) · DEPLOY BRANCH (EVERYWHERE · MACHINE CLASS · SPECIFIC SERVERS)
DEPLOYMENT, DEPLOY LOCKING: CAN’T DEPLOY IF A BRANCH IS DEPLOYED
DEPLOYMENT, AUTODEPLOYS: PUSHED TO MASTER WITH GREEN TESTS? DEPLOY.
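The two rules on the deploy slide, locking and autodeploys, can be sketched as a small state machine. Nothing here is GitHub's tooling: `DeployLock` and `autodeploy?` are hypothetical names, and the choice to skip autodeploys while a branch holds the lock is an assumption the slides do not confirm.

```ruby
# While a non-master branch is deployed it holds the lock, and no other
# branch may be deployed over it until the lock is released.
class DeployLock
  class Locked < StandardError; end

  attr_reader :holder

  def initialize
    @holder = nil
  end

  def deploy(branch)
    if @holder && @holder != branch
      raise Locked, "#{@holder} is deployed; unlock before deploying #{branch}"
    end
    @holder = branch unless branch == "master"
    branch
  end

  def unlock
    @holder = nil
  end

  def locked?
    !@holder.nil?
  end
end

# Autodeploy rule from the slide: a push to master with green tests deploys.
# Skipping autodeploys while a branch holds the lock is an assumption.
def autodeploy?(branch, tests_green, lock)
  branch == "master" && tests_green && !lock.locked?
end

lock = DeployLock.new
lock.deploy("feature-x")
puts autodeploy?("master", true, lock)
```

The lock makes it safe to try a branch in production, because nobody else can replace it mid-experiment.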
DEPLOYMENT, STAFF-ONLY FEATURE FLAGS: LIMITS EXPOSURE · REAL-WORLD · AVOIDS MERGES
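A staff-only feature flag ships code to production but shows it only to employees. The sketch below assumes a simple in-memory design; `User`, `FeatureFlags`, and the audience names are hypothetical and not taken from GitHub's codebase.

```ruby
# Hypothetical user record; `staff` marks employees.
User = Struct.new(:login, :staff)

class FeatureFlags
  AUDIENCES = %i[staff everyone].freeze

  def initialize
    @audience = {} # feature name -> :staff or :everyone
  end

  def enable(feature, audience)
    raise ArgumentError, "unknown audience #{audience}" unless AUDIENCES.include?(audience)
    @audience[feature] = audience
  end

  # Unknown features are off for everybody.
  def enabled?(feature, user)
    case @audience[feature]
    when :everyone then true
    when :staff then user.staff ? true : false
    else false
    end
  end
end

flags = FeatureFlags.new
flags.enable(:new_search, :staff)
puts flags.enabled?(:new_search, User.new("staffer", true))
puts flags.enabled?(:new_search, User.new("visitor", false))
```

Because the flagged code already lives on the main branch, widening the audience is a one-line change with no long-running branch to merge, which matches the slide's "AVOIDS MERGES".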
automate RECOVERY
RECOVERY: SOMETHING WILL ALWAYS BREAK
RECOVERY: HUBOT IS A SYSADMIN
RECOVERY: HUBOT LOAD: SERVER LOAD · HUBOT QUERIES: RUNNING DB QUERIES · HUBOT CONNS: ALL OPEN CONNECTIONS
RECOVERY: HUBOT RESTORE <REPO>: RESTORE A REPO FROM BACKUPS · HUBOT PUSH-LOG <REPO>: SEE RECENT PUSH LOGS TO A REPO · HUBOT GH-EACH <HOST> <COMMAND>: RUN COMMAND ON SPECIFIC HOSTS
RECOVERY: HIGH-LEVEL OVERVIEW IN MINUTES · SPEND MORE TIME FIXING AND LESS TIME INVESTIGATING
the happiness
5 YEARS · 108 EMPLOYEES · ZERO EMPLOYEES HAVE QUIT
1-2 MONTHS HIRE · 1-3 MONTHS RAMP-UP · 2 WEEKS LEAVE
LOSING AN EMPLOYEE CAN SET YOU BACK HALF A YEAR
remove ANY REASON TO LEAVE
TDD ✓ · PAIR PROGRAMMING ✓ · BDD ✓ · TEST-FIRST ✓ · DESIGN-FIRST ✓ · EMACS x (just kidding) · NONE OF THESE ✓
WE CARE ABOUT THE WORK YOU DO, NOT ABOUT HOW YOU DO IT
LOCATION · HOURS · DIRECTION
LOCATION: ⅔ OF GITHUB EMPLOYEES WORK REMOTELY · FAMILY RELOCATION, TRAVEL FREEDOM
HOURS: CHOOSE YOUR SCHEDULE · CHOOSE YOUR VACATIONS · FRESH, CREATIVE EMPLOYEES
DIRECTION: YOU HACK ON THINGS THAT INTEREST YOU · REDUCES BURNOUT
LOCATION · HOURS · DIRECTION: BE flexible TOWARDS WORK/LIFE
github
basically, MOVE FAST = SMALL CHANGES
basically, BE STABLE = DEPLOY CONSTANTLY
basically, HAPPY COMPANY = HAPPY EMPLOYEES
thanks
NO FORKING HOLMAN @ LOST YO QUIT READING THIS SHIT
ZACHHOLMAN.COM/TALKS