Slide 1
Slide 1 text
Scaling Instagram
AirBnB Tech Talk 2012
Mike Krieger, Instagram
Slide 2
Slide 2 text
me
- Co-founder, Instagram
- Previously: UX & Front-end @ Meebo
- Stanford HCI BS/MS
- @mikeyk on everything
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
No content
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
communicating and sharing in the real world
Slide 7
Slide 7 text
30+ million users in less than 2 years
Slide 8
Slide 8 text
the story of how we scaled it
Slide 9
Slide 9 text
a brief tangent
Slide 10
Slide 10 text
the beginning
Slide 11
Slide 11 text
Text
Slide 12
Slide 12 text
2 product guys
Slide 13
Slide 13 text
no real back-end experience
Slide 14
Slide 14 text
analytics & python @ meebo
Slide 15
Slide 15 text
CouchDB
Slide 16
Slide 16 text
CrimeDesk SF
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
let’s get hacking
Slide 19
Slide 19 text
good components in place early on
Slide 20
Slide 20 text
...but were hosted on a single machine somewhere in LA
Slide 21
Slide 21 text
No content
Slide 22
Slide 22 text
less powerful than my MacBook Pro
Slide 23
Slide 23 text
okay, we launched. now what?
Slide 24
Slide 24 text
25k signups in the first day
Slide 25
Slide 25 text
everything is on fire!
Slide 26
Slide 26 text
best & worst day of our lives so far
Slide 27
Slide 27 text
load was through the roof
Slide 28
Slide 28 text
first culprit?
Slide 29
Slide 29 text
No content
Slide 30
Slide 30 text
favicon.ico
Slide 31
Slide 31 text
404-ing on Django, causing tons of errors
Slide 32
Slide 32 text
lesson #1: don’t forget your favicon
Slide 33
Slide 33 text
real lesson #1: most of your initial scaling problems won’t be glamorous
Slide 34
Slide 34 text
favicon
Slide 35
Slide 35 text
ulimit -n
Slide 36
Slide 36 text
memcached -t 4
Slide 37
Slide 37 text
prefork/postfork
Slide 38
Slide 38 text
friday rolls around
Slide 39
Slide 39 text
not slowing down
Slide 40
Slide 40 text
let’s move to EC2.
Slide 41
Slide 41 text
No content
Slide 42
Slide 42 text
No content
Slide 43
Slide 43 text
scaling = replacing all components of a car while driving it at 100mph
Slide 44
Slide 44 text
since...
Slide 45
Slide 45 text
“"canonical [architecture] of an early stage startup in this era." (HighScalability.com)
Slide 46
Slide 46 text
Nginx & Redis & Postgres & Django.
Slide 47
Slide 47 text
Nginx & HAProxy & Redis & Memcached & Postgres & Gearman & Django.
Slide 48
Slide 48 text
24h Ops
Slide 49
Slide 49 text
No content
Slide 50
Slide 50 text
No content
Slide 51
Slide 51 text
our philosophy
Slide 52
Slide 52 text
1 simplicity
Slide 53
Slide 53 text
2 optimize for minimal operational burden
Slide 54
Slide 54 text
3 instrument everything
Slide 55
Slide 55 text
walkthrough:
1 scaling the database
2 choosing technology
3 staying nimble
4 scaling for android
Slide 56
Slide 56 text
1 scaling the db
Slide 57
Slide 57 text
early days
Slide 58
Slide 58 text
django ORM, postgresql
Slide 59
Slide 59 text
why pg? postgis.
Slide 60
Slide 60 text
moved db to its own machine
Slide 61
Slide 61 text
but photos kept growing and growing...
Slide 62
Slide 62 text
...and only 68GB of RAM on biggest machine in EC2
Slide 63
Slide 63 text
so what now?
Slide 64
Slide 64 text
vertical partitioning
Slide 65
Slide 65 text
django db routers make it pretty easy
Slide 66
Slide 66 text
def db_for_read(self, model, **hints):
    if model._meta.app_label == 'photos':
        return 'photodb'
Slide 67
Slide 67 text
...once you untangle all your foreign key relationships
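A fuller router along the lines of the snippet above might look like this. This is a sketch, not Instagram's actual code; the `PhotoRouter` class name and the relation rule are illustrative, but the method names match Django's database-router interface:

```python
class PhotoRouter:
    """Route reads and writes for the 'photos' app to its own database.

    Matches the shape Django expects from a database router; returning
    None means "no opinion", so everything else falls through to the
    default database.
    """

    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
        return None

    def allow_relation(self, obj1, obj2, **hints):
        # Cross-database foreign keys are the untangling work mentioned
        # above: only allow relations within the same app.
        return obj1._meta.app_label == obj2._meta.app_label
```

In settings, a router like this is wired up via `DATABASE_ROUTERS = ['path.to.PhotoRouter']`.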
Slide 68
Slide 68 text
a few months later...
Slide 69
Slide 69 text
photosdb > 60GB
Slide 70
Slide 70 text
what now?
Slide 71
Slide 71 text
horizontal partitioning!
Slide 72
Slide 72 text
aka: sharding
Slide 73
Slide 73 text
“surely we’ll have hired someone experienced before we actually need to shard”
Slide 74
Slide 74 text
you don’t get to choose when scaling challenges come up
Slide 75
Slide 75 text
evaluated solutions
Slide 76
Slide 76 text
at the time, none were up to task of being our primary DB
Slide 77
Slide 77 text
did in Postgres itself
Slide 78
Slide 78 text
what’s painful about sharding?
Slide 79
Slide 79 text
1 data retrieval
Slide 80
Slide 80 text
hard to know what your primary access patterns will be w/out any usage
Slide 81
Slide 81 text
in most cases, user ID
Slide 82
Slide 82 text
2 what happens if one of your shards gets too big?
Slide 83
Slide 83 text
in range-based schemes (like MongoDB), you split
Slide 84
Slide 84 text
A-H: shard0
I-Z: shard1
Slide 85
Slide 85 text
A-D: shard0
E-H: shard2
I-P: shard1
Q-Z: shard3
Slide 86
Slide 86 text
downsides (especially on EC2): disk IO
Slide 87
Slide 87 text
instead, we pre-split
Slide 88
Slide 88 text
many many many (thousands) of logical shards
Slide 89
Slide 89 text
that map to fewer physical ones
Slide 90
Slide 90 text
// 8 logical shards on 2 machines

user_id % 8 = logical shard

logical shards -> physical shard map
{
  0: A, 1: A, 2: A, 3: A,
  4: B, 5: B, 6: B, 7: B
}
Slide 91
Slide 91 text
// 8 logical shards on 4 machines

user_id % 8 = logical shard

logical shards -> physical shard map
{
  0: A, 1: A, 2: C, 3: C,
  4: B, 5: B, 6: D, 7: D
}
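The remapping step can be sketched in Python (the deck doesn't show this code; names are illustrative). The key property: `user_id % 8` never changes, so doubling machines only edits the logical-to-physical map, and no user moves between logical shards:

```python
# Map a user to a physical machine via a fixed number of logical shards.

NUM_LOGICAL_SHARDS = 8

# Before: 8 logical shards on 2 machines
MAP_2_MACHINES = {0: 'A', 1: 'A', 2: 'A', 3: 'A',
                  4: 'B', 5: 'B', 6: 'B', 7: 'B'}

# After: the same 8 logical shards spread over 4 machines.
# Only shards 2, 3, 6, 7 physically move; the rest are untouched.
MAP_4_MACHINES = {0: 'A', 1: 'A', 2: 'C', 3: 'C',
                  4: 'B', 5: 'B', 6: 'D', 7: 'D'}

def physical_shard(user_id, shard_map):
    logical = user_id % NUM_LOGICAL_SHARDS
    return shard_map[logical]
```

A user on logical shard 0 stays on machine A through the migration; a user on logical shard 2 is copied from A to C and then served from C.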
Slide 92
Slide 92 text
little known but awesome PG feature: schemas
Slide 93
Slide 93 text
not “columns” schema
Slide 94
Slide 94 text
- database:
  - schema:
    - table:
      - columns
Slide 95
Slide 95 text
machineA:
  shard0
    photos_by_user
  shard1
    photos_by_user
  shard2
    photos_by_user
  shard3
    photos_by_user
Slide 96
Slide 96 text
machineA:
  shard0
    photos_by_user
  shard1
    photos_by_user
  shard2
    photos_by_user
  shard3
    photos_by_user

machineA’:
  shard0
    photos_by_user
  shard1
    photos_by_user
  shard2
    photos_by_user
  shard3
    photos_by_user
Slide 97
Slide 97 text
machineA:
  shard0
    photos_by_user
  shard1
    photos_by_user
  shard2
    photos_by_user
  shard3
    photos_by_user

machineC:
  shard0
    photos_by_user
  shard1
    photos_by_user
  shard2
    photos_by_user
  shard3
    photos_by_user
Slide 98
Slide 98 text
can do this as long as you have more logical shards than physical ones
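Putting the two ideas together, resolving a user to the right machine and schema-qualified table might look like the sketch below (the `shardN` schema and `photos_by_user` table names follow the layout above; the helper itself is illustrative, not Instagram's code):

```python
NUM_LOGICAL_SHARDS = 8
LOGICAL_TO_MACHINE = {0: 'A', 1: 'A', 2: 'C', 3: 'C',
                      4: 'B', 5: 'B', 6: 'D', 7: 'D'}

def locate_photos(user_id):
    logical = user_id % NUM_LOGICAL_SHARDS
    machine = LOGICAL_TO_MACHINE[logical]
    # Each logical shard lives in its own PG schema on that machine,
    # so queries target e.g. shard2.photos_by_user.
    table = f'shard{logical}.photos_by_user'
    return machine, table
```

Because the schema name is derived from the logical shard, moving a schema to a new machine only updates `LOGICAL_TO_MACHINE`; the table-naming scheme is untouched.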
Slide 99
Slide 99 text
lesson: take tech/tools you know and try first to adapt them into a simple solution
Slide 100
Slide 100 text
2 which tools where?
Slide 101
Slide 101 text
where to cache / otherwise denormalize data
Slide 102
Slide 102 text
we <3 redis
Slide 103
Slide 103 text
what happens when a user posts a photo?
Slide 104
Slide 104 text
1 user uploads photo with (optional) caption and location
Slide 105
Slide 105 text
2 synchronous write to the media database for that user
Slide 106
Slide 106 text
3 queues!
Slide 107
Slide 107 text
3a if geotagged, async worker POSTs to Solr
Slide 108
Slide 108 text
3b follower delivery
Slide 109
Slide 109 text
can’t have every user who loads their timeline look up everyone they follow and then all of those users’ photos
Slide 110
Slide 110 text
instead, everyone gets their own list in Redis
Slide 111
Slide 111 text
media ID is pushed onto a list for every person who’s following this user
Slide 112
Slide 112 text
Redis is awesome for this; rapid insert, rapid subsets
Slide 113
Slide 113 text
when time to render a feed, we take small # of IDs, go look up info in memcached
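The fanout-on-write flow above can be sketched as follows, with plain dicts standing in for Redis lists (LPUSH on post, LRANGE on read, LTRIM to keep lists bounded). All names here are illustrative, not Instagram's actual code:

```python
from collections import defaultdict

FEED_LIMIT = 500  # keep feed lists bounded, mirroring LTRIM after LPUSH

followers = defaultdict(set)   # user -> set of users following them
feeds = defaultdict(list)      # user -> list of media IDs, newest first

def follow(follower, followee):
    followers[followee].add(follower)

def post_photo(author, media_id):
    # Push the new media ID onto every follower's feed list, plus the
    # author's own (Redis: LPUSH feed:<user> <media_id>, then LTRIM).
    for user in followers[author] | {author}:
        feeds[user].insert(0, media_id)
        del feeds[user][FEED_LIMIT:]

def render_feed(user, count=10):
    # Take a small number of IDs (Redis: LRANGE feed:<user> 0 count-1);
    # full media info is then looked up by ID in memcached.
    return feeds[user][:count]
```

The write cost is paid once per follower at post time, so reads are a single bounded list fetch instead of a fan-in query.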
Slide 114
Slide 114 text
Redis is great for...
Slide 115
Slide 115 text
data structures that are relatively bounded
Slide 116
Slide 116 text
(don’t tie yourself to a solution where your in-memory DB is your main data store)
Slide 117
Slide 117 text
caching complex objects where you want to do more than GET
Slide 118
Slide 118 text
ex: counting, sub-ranges, testing membership
Slide 119
Slide 119 text
especially when Taylor Swift posts live from the CMAs
Slide 120
Slide 120 text
follow graph
Slide 121
Slide 121 text
v1: simple DB table (source_id, target_id, status)
Slide 122
Slide 122 text
who do I follow? who follows me? do I follow X? does X follow me?
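The four queries against the v1 table can be sketched with the stdlib's sqlite3 standing in for Postgres (the `(source_id, target_id, status)` schema is from the slide; the helper functions and sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE follows (
        source_id INTEGER,   -- the follower
        target_id INTEGER,   -- the user being followed
        status TEXT          -- e.g. 'active'
    )
""")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?, 'active')",
    [(1, 2), (1, 3), (2, 1)],
)

def i_follow(user_id):
    # "who do I follow?"
    rows = conn.execute(
        "SELECT target_id FROM follows WHERE source_id=? AND status='active'",
        (user_id,))
    return sorted(r[0] for r in rows)

def follows_me(user_id):
    # "who follows me?"
    rows = conn.execute(
        "SELECT source_id FROM follows WHERE target_id=? AND status='active'",
        (user_id,))
    return sorted(r[0] for r in rows)

def do_i_follow(user_id, other_id):
    # "do I follow X?" (swap the arguments for "does X follow me?")
    row = conn.execute(
        "SELECT 1 FROM follows WHERE source_id=? AND target_id=? "
        "AND status='active'",
        (user_id, other_id)).fetchone()
    return row is not None
```

All four questions are simple indexed lookups on one table, which is why the later redesign could serve them from PG with only light memcached caching.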
Slide 123
Slide 123 text
DB was busy, so we started storing parallel version in Redis
Slide 124
Slide 124 text
follow_all(300 item list)
Slide 125
Slide 125 text
inconsistency
Slide 126
Slide 126 text
extra logic
Slide 127
Slide 127 text
so much extra logic
Slide 128
Slide 128 text
exposing your support team to the idea of cache invalidation
Slide 129
Slide 129 text
No content
Slide 130
Slide 130 text
redesign took a page from twitter’s book
Slide 131
Slide 131 text
PG can handle tens of thousands of requests, very light memcached caching
Slide 132
Slide 132 text
two takeaways
Slide 133
Slide 133 text
1 have a versatile complement to your core data storage (like Redis)
Slide 134
Slide 134 text
2 try not to have two tools trying to do the same job
Slide 135
Slide 135 text
3 staying nimble
Slide 136
Slide 136 text
2010: 2 engineers
Slide 137
Slide 137 text
2011: 3 engineers
Slide 138
Slide 138 text
2012: 5 engineers
Slide 139
Slide 139 text
scarcity -> focus
Slide 140
Slide 140 text
engineer solutions that you’re not constantly returning to because they broke
Slide 141
Slide 141 text
1 extensive unit-tests and functional tests
Slide 142
Slide 142 text
2 keep it DRY
Slide 143
Slide 143 text
3 loose coupling using notifications / signals
Slide 144
Slide 144 text
4 do most of our work in Python, drop to C when necessary
Slide 145
Slide 145 text
5 frequent code reviews, pull requests to keep things in the ‘shared brain’
Slide 146
Slide 146 text
6 extensive monitoring
Slide 147
Slide 147 text
munin
Slide 148
Slide 148 text
statsd
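statsd's wire protocol is simple enough to sketch directly. The helper below formats counter and timer metrics the way statsd clients send them over UDP; the metric names are illustrative:

```python
import socket

def format_metric(name, value, metric_type):
    # statsd wire format: "<name>:<value>|<type>", e.g.
    # "photos.posted:1|c" for a counter, "feed.render_ms:45|ms" for a timer.
    return f'{name}:{value}|{metric_type}'.encode()

def send_metric(name, value, metric_type, host='127.0.0.1', port=8125):
    # Fire-and-forget UDP: instrumentation must never slow the request path,
    # and a dropped packet just means one lost sample.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_metric(name, value, metric_type), (host, port))
    finally:
        sock.close()
```

Because sends are UDP and unacknowledged, sprinkling these calls through the codebase ("instrument everything") costs almost nothing per request.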
Slide 149
Slide 149 text
No content
Slide 150
Slide 150 text
“how is the system right now?”
Slide 151
Slide 151 text
“how does this compare to historical trends?”
Slide 152
Slide 152 text
4 scaling for android
Slide 153
Slide 153 text
1 million new users in 12 hours
Slide 154
Slide 154 text
great tools that enable easy read scalability
Slide 155
Slide 155 text
redis: slaveof
Slide 156
Slide 156 text
our Redis framework assumes 0+ readslaves
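A framework that "assumes 0+ readslaves" implies read/write splitting with a graceful zero-slave fallback; a sketch of that shape (stand-in connection objects here; in practice these would be Redis clients, with slaves attached via SLAVEOF):

```python
import random

class ReplicatedStore:
    """Route writes to a master and spread reads across read slaves."""

    def __init__(self, master, slaves=()):
        self.master = master
        self.slaves = list(slaves)

    def for_write(self):
        # Writes always hit the master.
        return self.master

    def for_read(self):
        # Reads spread across slaves; with zero slaves, fall back to
        # the master, so the same code path works before and after
        # adding read capacity.
        if self.slaves:
            return random.choice(self.slaves)
        return self.master
```

Adding read capacity under load is then just pointing a fresh instance at the master with SLAVEOF and appending it to `slaves`, with no application code changes.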
Slide 157
Slide 157 text
tight iteration loops
Slide 158
Slide 158 text
statsd & pgfouine
Slide 159
Slide 159 text
know where you can shed load if needed
Slide 160
Slide 160 text
(e.g. shorter feeds)
Slide 161
Slide 161 text
if you’re tempted to reinvent the wheel...
Slide 162
Slide 162 text
don’t.
Slide 163
Slide 163 text
“our app servers sometimes kernel panic under load”
Slide 164
Slide 164 text
...
Slide 165
Slide 165 text
“what if we write a monitoring daemon...”
Slide 166
Slide 166 text
wait! this is exactly what HAProxy is great at
Slide 167
Slide 167 text
surround yourself with awesome advisors
Slide 168
Slide 168 text
culture of openness around engineering
Slide 169
Slide 169 text
give back; e.g. node2dm
Slide 170
Slide 170 text
focus on making what you have better
Slide 171
Slide 171 text
“fast, beautiful photo sharing”
Slide 172
Slide 172 text
“can we make all of our requests take 50% of the time?”
Slide 173
Slide 173 text
staying nimble = remind yourself of what’s important
Slide 174
Slide 174 text
your users around the world don’t care that you wrote your own DB
Slide 175
Slide 175 text
wrapping up
Slide 176
Slide 176 text
unprecedented times
Slide 177
Slide 177 text
2 backend engineers can scale a system to 30+ million users
Slide 178
Slide 178 text
key word = simplicity
Slide 179
Slide 179 text
cleanest solution with the fewest moving parts possible
Slide 180
Slide 180 text
don’t over-optimize or expect to know ahead of time how site will scale
Slide 181
Slide 181 text
don’t think “someone else will join & take care of this”
Slide 182
Slide 182 text
will happen sooner than you think; surround yourself with great advisors
Slide 183
Slide 183 text
when adding software to stack: only if you have to, optimizing for operational simplicity
Slide 184
Slide 184 text
few, if any, unsolvable scaling challenges for a social startup
Slide 185
Slide 185 text
have fun