Slide 1

Slide 1 text

HOW TO BUILD A GITHUB

Slide 2

Slide 2 text

github

Slide 3

Slide 3 text

6.5MM REPOSITORIES
LARGEST GIT HOST
1.9MM USERS
SINCE 2008

Slide 4

Slide 4 text

6.5MM REPOSITORIES
LARGEST GIT HOST
1.9MM USERS
SINCE 2008
SVN HOST

Slide 5

Slide 5 text

gh  gh  gh  gh  gh  gh 

Slide 6

Slide 6 text

going to SHOW YOU OUR CARDS

Slide 7

Slide 7 text

there is no MAGIC BULLET

Slide 8

Slide 8 text

FOUR STAGES OF GROWTH
automate EVERYTHING
the happiness

Slide 9

Slide 9 text

@HOLMAN
NO FORKING LOST YO QUIT READING THIS SHIT

Slide 10

Slide 10 text

how DID WE GIT HERE

Slide 11

Slide 11 text

1809: PERL INVENTED

Slide 12

Slide 12 text

1814: COMPUTERS INVENTED

Slide 13

Slide 13 text

1814-2004: ANARCHY AND CHAOS AND ZOMG EVERYONE’S DYING

Slide 14

Slide 14 text

2005: VERSION CONTROL INVENTED git

Slide 15

Slide 15 text

2007: github GLOBAL PEACE AND HAPPINESS ACHIEVED

Slide 16

Slide 16 text

...or something like that

Slide 17

Slide 17 text

GRIT
git via ruby
TOM PRESTON-WERNER
OCTOBER 9, 2007

Slide 18

Slide 18 text

GRIT
git via ruby
github’s interface to git
object-oriented, read/write
open source

Slide 19

Slide 19 text

grit
repo = Grit::Repo.new('/tmp/repository')
repo.commits

Slide 20

Slide 20 text

grit
shelling out to git is expensive
grit reimplements portions of git in ruby
native packfile and git object support
2x-100x speedup on low-level operations

Slide 21

Slide 21 text

grit
slowly reimplement grit for speed
allows for incremental improvements

Slide 22

Slide 22 text

grit LED TO GITHUB
OCTOBER 19, 2007

Slide 23

Slide 23 text

TODAY
ADDING 2TB A MONTH
22 FILESERVER PAIRS
23TB OF REPO DATA

Slide 24

Slide 24 text

THE FOUR STAGES of GITHUB GROWTH

Slide 25

Slide 25 text

GITHUB: FOUR STAGES OF GROWTH
LOCAL · NETWORKED · NET-SHARD · GITRPC

Slide 26

Slide 26 text

GITHUB: FOUR STAGES OF GROWTH
LOCAL · NETWORKED · NET-SHARD · GITRPC
2008 · 2009 · 2010 · 2012

Slide 27

Slide 27 text

GITHUB: FOUR STAGES OF GROWTH
LOCAL · NETWORKED · NET-SHARD · GITRPC

Slide 28

Slide 28 text

GITHUB: FOUR STAGES OF GROWTH
JAN 2008 · DEC 2008
42,000 USERS

Slide 29

Slide 29 text

GITHUB: FOUR STAGES OF GROWTH
JAN 2008 · DEC 2008
80,000 REPOSITORIES

Slide 30

Slide 30 text

LOCAL
MULTI-VM
SHARED GFS MOUNT

Slide 31

Slide 31 text

LOCAL: MULTI-VM
WEB FRONTENDS
BACKGROUND WORKERS

Slide 32

Slide 32 text

LOCAL: MULTI-VM
SIMPLE ARCHITECTURE
HORIZONTALLY SCALABLE-ish

Slide 33

Slide 33 text

LOCAL: SHARED GFS MOUNT
SHARED MOUNT ON EACH VM
SIMILAR PRODUCTION + DEVELOPMENT ACCESS
ALLOWED LOCAL ACCESS VIA GRIT

Slide 34

Slide 34 text

LOCAL: SIMPLE APPROACH, COMMON GIT INTERFACE, QUICK TO BUILD AND SHIP

Slide 35

Slide 35 text

GITHUB: FOUR STAGES OF GROWTH
LOCAL · NETWORKED · NET-SHARD · GITRPC

Slide 36

Slide 36 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010
166,000 USERS

Slide 37

Slide 37 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010
484,000 REPOSITORIES

Slide 38

Slide 38 text

the problem: GFS is slow
performance degraded as repos added

Slide 39

Slide 39 text

the problem: we’re i/o-bound
read/write to disk needs to be fast

Slide 40

Slide 40 text

NETWORKED: THE PLAN
HARDWARE
MOVE DATACENTERS

Slide 41

Slide 41 text

NETWORKED: HARDWARE
bare metal servers
16 machines
6x RAM
machine roles
solid datacenter
got dat cloud

Slide 42

Slide 42 text

NETWORKED
LAUNCH: SERVER PAIRS
FRONTENDS · FILESERVERS · AUX · DB

Slide 43

Slide 43 text

NETWORKED
GRIT IS LOCAL
NEEDS TO BE NETWORKED

Slide 44

Slide 44 text

NETWORKED
smoke: service is run on each fs; facilitates disk access
chimney: routes the smoke, stores routing table in redis
stub local grit calls, retain API usage, but send over network
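
The routing idea on this slide can be pictured with a minimal Ruby sketch. Everything here is an assumption made for illustration: a plain Hash stands in for the Redis routing table, a callable stands in for the network request to a fileserver's smoke service, and the names `RoutingTable`, `RepoStub`, and the example repository path are hypothetical, not GitHub's code.

```ruby
# Hypothetical sketch: a Hash stands in for the Redis routing table.
class RoutingTable
  def initialize(routes = {})
    @routes = routes # repository path => fileserver host
  end

  def host_for(repo_path)
    @routes.fetch(repo_path) { raise KeyError, "no route for #{repo_path}" }
  end
end

# Stub that keeps a Grit-like call shape (repo.commits) but forwards
# the call to whichever fileserver holds the repository.
class RepoStub
  def initialize(repo_path, routing_table, transport)
    @repo_path = repo_path
    @routing_table = routing_table
    @transport = transport # callable: (host, repo_path, method, args) -> result
  end

  def commits(*args)
    remote_call(:commits, args)
  end

  private

  def remote_call(method, args)
    host = @routing_table.host_for(@repo_path)
    @transport.call(host, @repo_path, method, args)
  end
end

# Fake transport for illustration; a real one would make a network request.
fake_transport = lambda do |host, repo_path, method, args|
  puts "#{method} on #{repo_path} routed to #{host}"
  ["commit-a", "commit-b"]
end

table = RoutingTable.new("/data/repositories/example.git" => "fs5")
repo = RepoStub.new("/data/repositories/example.git", table, fake_transport)
repo.commits
```

The caller still writes `repo.commits`, which is the "retain API usage" part; only the stub knows a network hop is involved.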

Slide 45

Slide 45 text

NETWORKED
server pairs offer failover via DRBD
real servers, real big RAM allocations

Slide 46

Slide 46 text

NETWORKED: LATENCY
networked routing adds 2-10ms per request
optimize for the roundtrip
smoke contains smarter server-side logic

Slide 47

Slide 47 text

NETWORKED: LATENCY
smoke has custom git extension commands
git-distinct-commits returns commits only contained on a given branch
calls to git-show-ref and git-rev-list
run all calls server-side in one roundtrip
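
A rough Ruby sketch of what "commits only contained on a given branch" means, with the work done behind a single request. This is an in-memory model under stated assumptions: the hash of branch name to reachable commits stands in for what git would report, and `distinct_commits` and `handle_request` are hypothetical names, not smoke's actual commands.

```ruby
require "set"

# Hypothetical in-memory model: branch name => commits reachable from it.
# A real implementation would ask git for this data.
def distinct_commits(reachable_by_branch, branch)
  own = reachable_by_branch.fetch(branch)
  others = Set.new
  reachable_by_branch.each do |name, commits|
    others.merge(commits) unless name == branch
  end
  # Keep only commits that no other branch can reach.
  own.reject { |sha| others.include?(sha) }
end

# One request in, one response out: the branch listing and the commit
# filtering both happen on the fileserver side of the call.
def handle_request(reachable_by_branch, branch)
  {
    refs: reachable_by_branch.keys,
    distinct: distinct_commits(reachable_by_branch, branch)
  }
end

graph = {
  "master" => %w[a b c],
  "topic" => %w[a b c d e]
}
p handle_request(graph, "topic")
```

Doing the filtering next to the data means the 2-10ms routing cost is paid once per request rather than once per git call.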

Slide 48

Slide 48 text

NETWORKED: HORIZONTALLY-SCALABLE, LATENCY-CONSIDERATE, API-COMPATIBLE WITH GRIT

Slide 49

Slide 49 text

GITHUB: FOUR STAGES OF GROWTH
LOCAL · NETWORKED · NET-SHARD · GITRPC

Slide 50

Slide 50 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010 · 2011
510,000 USERS

Slide 51

Slide 51 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010 · 2011
1.3MM REPOSITORIES

Slide 52

Slide 52 text

the problem: data duplication
each fork is a full project history

Slide 53

Slide 53 text

data duplication
i create a repo: fs5:/data/repositories/6/nw/6b/de/92/1/1.git
you fork my repo: fs7:/data/repositories/4/na/3b/dr/72/2/2.git

Slide 54

Slide 54 text

data duplication
1,000 commits: 10MB
1,001 commits: 10MB
20MB total disk

Slide 55

Slide 55 text

data duplication
GOAL:
1,000 commits: 10MB
1 commit: 1KB
10MB total disk

Slide 56

Slide 56 text

data duplication
75 MB repo x 3.5k forks = ~250 GB
x 2 fs pairs + offsite backups
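
The arithmetic behind these figures, as a small Ruby sketch. Assumptions are labeled in the comments: decimal units (1 GB = 1000 MB), reading "x 2" as two copies across a fileserver pair, and a 1 KB per-fork delta borrowed from the earlier 1,000-commit example purely for illustration.

```ruby
# Figures from the slide; decimal units (1 GB = 1000 MB) are an assumption.
REPO_MB = 75
FORKS = 3_500
COPIES_PER_PAIR = 2 # assumption: each repo is stored on both servers of a pair

# Disk used when every fork stores the full project history.
def duplicated_gb(repo_mb, forks)
  repo_mb * forks / 1000.0
end

# Disk used if forks stored only their own delta on top of one shared copy.
# delta_kb per fork is an illustrative assumption, not a measured figure.
def shared_mb(repo_mb, forks, delta_kb)
  repo_mb + forks * delta_kb / 1000.0
end

full = duplicated_gb(REPO_MB, FORKS)
puts "full copies: #{full} GB"
puts "across a pair: #{full * COPIES_PER_PAIR} GB, before offsite backups"
puts "shared objects: #{shared_mb(REPO_MB, FORKS, 1)} MB"
```

The exact product is 262.5 GB, which the slide rounds to roughly 250 GB.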

Slide 57

Slide 57 text

NET-SHARD: shard by repository network (“forks”)

Slide 58

Slide 58 text

NET-SHARD network.git 1.git 2.git 3.git 4.git CONTAINS DELTA }CONTAINS ALL REFS ›

Slide 59

Slide 59 text

NET-SHARD network.git GIT ALTERNATES store git object data externally to repository we fetch refs into your fork, transparently

Slide 60

Slide 60 text

NET-SHARD network.git PRIVACY potential leaking of refs cross-network net-shard enabled on all-public and all-private repository networks only

Slide 61

Slide 61 text

NET-SHARD network.git DISK halves disk usage increase disk and kernel cache hits

Slide 62

Slide 62 text

NET-SHARD network.git MIGRATION gradually transitioned repos to network.git effectively feature-flagged by repo

Slide 63

Slide 63 text

NET-SHARD: SAVE DISK, IMPROVE PERFORMANCE

Slide 64

Slide 64 text

GITHUB: FOUR STAGES OF GROWTH
LOCAL · NETWORKED · NET-SHARD · GITRPC

Slide 65

Slide 65 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010 · 2011 · 2012
1.2MM USERS

Slide 66

Slide 66 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010 · 2011 · 2012 · AUGUST
1.9MM USERS

Slide 67

Slide 67 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010 · 2011 · 2012
3.4MM REPOSITORIES

Slide 68

Slide 68 text

GITHUB: FOUR STAGES OF GROWTH
2008 · 2009 · 2010 · 2011 · 2012 · AUGUST
6.5MM REPOSITORIES

Slide 69

Slide 69 text

the problem: GRIT git via ruby

Slide 70

Slide 70 text

the problem: local, ruby-based grit ended up in a high-traffic distributed system

Slide 71

Slide 71 text

the problem: inelegant code spread out everywhere

Slide 72

Slide 72 text

GITRPC: network-oriented library for git access

Slide 73

Slide 73 text

GITRPC: libgit2
open source
fastest git implementation (C)
github-sponsored project
bindings for all major languages
used in our mac, windows clients

Slide 74

Slide 74 text

GITRPC
gitrpc (RUBY)
rugged (RUBY)
libgit2 (C)

Slide 75

Slide 75 text

GITRPC: LATENCY
like smoke, gitrpc aims to reduce latency by reducing roundtrips

Slide 76

Slide 76 text

GITRPC: CACHING
operations cached on library level
yank out tons of app-level cache logic
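
A minimal Ruby sketch of what caching at the library level could look like: the client memoizes results so application code never touches a cache directly. `CachingClient` and its callable backend are hypothetical, not GitRPC's API, and cache invalidation (for example after a push) is deliberately left out.

```ruby
# Hypothetical caching wrapper: cache results inside the git-access library
# so callers don't carry their own cache logic.
class CachingClient
  def initialize(backend, cache = {})
    @backend = backend # callable: (repo, operation, args) -> result
    @cache = cache     # a Hash here; a shared cache store in practice
  end

  def call(repo, operation, *args)
    key = [repo, operation, args]
    return @cache[key] if @cache.key?(key)
    @cache[key] = @backend.call(repo, operation, args)
  end
end

backend = lambda do |repo, operation, args|
  puts "backend hit: #{operation} #{args.inspect} on #{repo}"
  "result-for-#{operation}"
end

client = CachingClient.new(backend)
client.call("1.git", :read_tree, "HEAD") # goes to the backend
client.call("1.git", :read_tree, "HEAD") # served from the cache
```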

Slide 77

Slide 77 text

GITRPC: MIGRATION
the move to gitrpc started this summer and will take months
gradually replace smoke and grit; avoids a risky deploy

Slide 78

Slide 78 text

GITRPC: FAST AND STABLE NETWORKED GIT ACCESS

Slide 79

Slide 79 text

GITHUB: FOUR STAGES OF GROWTH
LOCAL · NETWORKED · NET-SHARD · GITRPC

Slide 80

Slide 80 text

identify WHAT’S BROKEN

Slide 81

Slide 81 text

small CHANGES, FAST DEVELOPMENT

Slide 82

Slide 82 text

real CODE BEATS IMAGINARY CODE

Slide 83

Slide 83 text

automate EVERYTHING

Slide 84

Slide 84 text

manage LOL DEVELOPERS SOFTWARE DEVELOPMENT

Slide 85

Slide 85 text

manage DEADLINES · MEETINGS · PRIORITIES · ESTIMATES

Slide 86

Slide 86 text

manage DEADLINES · MEETINGS · PRIORITIES · ESTIMATES

Slide 87

Slide 87 text

EVERYONE is A MANAGER

Slide 88

Slide 88 text

AUTOMATE AWAY PAIN
DEVELOPMENT · DEPLOYMENT · RECOVERY

Slide 89

Slide 89 text

automate DEVELOPMENT

Slide 90

Slide 90 text

DEVELOPMENT
RUN THIS IN EACH PROJECT:
> ./do-work
...AND YOU’RE DONE!
loljk

Slide 91

Slide 91 text

DEVELOPMENT YOU CAN AUTOMATE THE PAIN OF DEVELOPMENT

Slide 92

Slide 92 text

DEVELOPMENT: the SETUP

Slide 93

Slide 93 text

DEVELOPMENT: the SETUP
ONE-LINER INSTALLS ALL GITHUB DEVELOPMENT DEPENDENCIES

Slide 94

Slide 94 text

DEVELOPMENT: the SETUP
30 min
CLEAN MACHINE TO FULL DEVELOPMENT ENVIRONMENT

Slide 95

Slide 95 text

DEVELOPMENT: the SETUP
NEW EMPLOYEES SHIP THEIR FIRST WEEK

Slide 96

Slide 96 text

DEVELOPMENT: the SETUP
PUPPET HANDLES ALL DEPENDENCIES

Slide 97

Slide 97 text

automate DEPLOYMENT

Slide 98

Slide 98 text

DEPLOYMENT REAL BROGRAMMERS DEPLOY WITH NO FEAR SO FUCK THAT

Slide 99

Slide 99 text

DEPLOYMENT DEPLOYS SHOULD BE CAUTIOUS, COMMONPLACE, AND AUTOMATED

Slide 100

Slide 100 text

DEPLOYMENT GITHUB DEPLOYS 20-40 TIMES A DAY

Slide 101

Slide 101 text

DEPLOYMENT
PUSH BRANCH
HUBOT RUNS TESTS IN ABOUT 200 SECONDS
USUALLY OPEN A PULL REQUEST
DEPLOY BRANCH
EVERYWHERE · MACHINE CLASS · SPECIFIC SERVERS

Slide 102

Slide 102 text

DEPLOYMENT
DEPLOY LOCKING: CAN’T DEPLOY IF A BRANCH IS DEPLOYED
AUTODEPLOYS: PUSHED TO MASTER WITH GREEN TESTS? DEPLOY.
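
The two rules on this slide fit in a small Ruby model. This is a hypothetical sketch, not GitHub's deploy tooling: `DeployLock` is an invented name, and having autodeploys wait while a branch holds the lock is an assumption about how the two rules interact.

```ruby
# Hypothetical model of deploy locking and autodeploys.
class DeployLock
  attr_reader :deployed_branch

  def initialize
    @deployed_branch = nil # nil means no branch deploy is holding the lock
  end

  # A branch deploy takes the lock; it fails if another branch holds it.
  def deploy_branch(branch)
    return false if @deployed_branch && @deployed_branch != branch
    @deployed_branch = branch
    true
  end

  # Releasing the lock, for example once the branch has been merged.
  def release
    @deployed_branch = nil
  end

  # Autodeploy rule: a push to master with green tests deploys.
  # Assumption: it is held back while a branch deploy owns the lock.
  def autodeploy?(pushed_ref, tests_green)
    pushed_ref == "master" && tests_green && @deployed_branch.nil?
  end
end

lock = DeployLock.new
puts lock.deploy_branch("feature-a")   # takes the lock
puts lock.deploy_branch("feature-b")   # refused while feature-a is deployed
puts lock.autodeploy?("master", true)  # held back by the branch deploy
```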

Slide 103

Slide 103 text

DEPLOYMENT
STAFF-ONLY FEATURE FLAGS
LIMITS EXPOSURE · REAL-WORLD · AVOIDS MERGES
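
A staff-only flag can be as small as one guard method. A sketch under assumptions: the `feature_enabled?` helper, the `:staff` key, and the feature names are all hypothetical.

```ruby
# Features listed as staff-only are visible to staff users alone;
# everything else is on for everyone.
def feature_enabled?(user, feature, staff_only_features)
  return true unless staff_only_features.include?(feature)
  user.fetch(:staff, false)
end

staff_only = [:new_dashboard] # hypothetical feature name
puts feature_enabled?({ login: "staffer", staff: true }, :new_dashboard, staff_only)
puts feature_enabled?({ login: "visitor" }, :new_dashboard, staff_only)
```

The code ships to production on master, which is how the flag avoids long-lived branches and merges while keeping exposure limited.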

Slide 104

Slide 104 text

automate RECOVERY

Slide 105

Slide 105 text

RECOVERY SOMETHING WILL ALWAYS BREAK

Slide 106

Slide 106 text

RECOVERY HUBOT IS A SYSADMIN

Slide 107

Slide 107 text

RECOVERY
HUBOT LOAD: SERVER LOAD
HUBOT QUERIES: RUNNING DB QUERIES
HUBOT CONNS: ALL OPEN CONNECTIONS

Slide 108

Slide 108 text

RECOVERY
HUBOT RESTORE: RESTORE A REPO FROM BACKUPS
HUBOT PUSH-LOG: SEE RECENT PUSH LOGS TO A REPO
HUBOT GH-EACH: RUN COMMAND ON SPECIFIC HOSTS

Slide 109

Slide 109 text

RECOVERY
HIGH-LEVEL OVERVIEW IN MINUTES
SPEND MORE TIME FIXING AND LESS TIME INVESTIGATING

Slide 110

Slide 110 text

the happiness

Slide 111

Slide 111 text

5 YEARS
108 EMPLOYEES
ZERO EMPLOYEES HAVE QUIT

Slide 112

Slide 112 text

HIRE: 1-2 MONTHS
RAMP-UP: 1-3 MONTHS
LEAVE: 2 WEEKS

Slide 113

Slide 113 text

LOSING AN EMPLOYEE CAN SET YOU BACK HALF A YEAR

Slide 114

Slide 114 text

remove ANY REASON TO LEAVE

Slide 115

Slide 115 text

TDD ✓
PAIR PROGRAMMING ✓
BDD ✓
TEST-FIRST ✓
DESIGN-FIRST ✓
EMACS x (just kidding)
NONE OF THESE ✓

Slide 116

Slide 116 text

WE CARE ABOUT THE WORK YOU DO, NOT ABOUT HOW YOU DO IT

Slide 117

Slide 117 text

LOCATION · HOURS · DIRECTION

Slide 118

Slide 118 text

LOCATION · HOURS · DIRECTION
⅔ OF GITHUB EMPLOYEES WORK REMOTELY

Slide 119

Slide 119 text

LOCATION · HOURS · DIRECTION
FAMILY RELOCATION, TRAVEL FREEDOM

Slide 120

Slide 120 text

LOCATION · HOURS · DIRECTION
CHOOSE YOUR SCHEDULE
CHOOSE YOUR VACATIONS
FRESH, CREATIVE EMPLOYEES

Slide 121

Slide 121 text

LOCATION · HOURS · DIRECTION
YOU HACK ON THINGS THAT INTEREST YOU
REDUCES BURNOUT

Slide 122

Slide 122 text

LOCATION · HOURS · DIRECTION
BE flexible TOWARDS WORK/LIFE

Slide 123

Slide 123 text

github

Slide 124

Slide 124 text

basically, MOVE FAST = SMALL CHANGES

Slide 125

Slide 125 text

basically, BE STABLE = DEPLOY CONSTANTLY

Slide 126

Slide 126 text

basically, HAPPY COMPANY = HAPPY EMPLOYEES

Slide 127

Slide 127 text

thanks

Slide 128

Slide 128 text

@HOLMAN
NO FORKING LOST YO QUIT READING THIS SHIT
ZACHHOLMAN.COM/TALKS