Slide 1

Slide 1 text

Scaling Instagram
AirBnB Tech Talk 2012
Mike Krieger, Instagram

Slide 2

Slide 2 text

me
- Co-founder, Instagram
- Previously: UX & Front-end @ Meebo
- Stanford HCI BS/MS
- @mikeyk on everything

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

communicating and sharing in the real world

Slide 7

Slide 7 text

30+ million users in less than 2 years

Slide 8

Slide 8 text

the story of how we scaled it

Slide 9

Slide 9 text

a brief tangent

Slide 10

Slide 10 text

the beginning

Slide 11

Slide 11 text

Text

Slide 12

Slide 12 text

2 product guys

Slide 13

Slide 13 text

no real back-end experience

Slide 14

Slide 14 text

analytics & python @ meebo

Slide 15

Slide 15 text

CouchDB

Slide 16

Slide 16 text

CrimeDesk SF

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

let’s get hacking

Slide 19

Slide 19 text

good components in place early on

Slide 20

Slide 20 text

...but were hosted on a single machine somewhere in LA

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

less powerful than my MacBook Pro

Slide 23

Slide 23 text

okay, we launched. now what?

Slide 24

Slide 24 text

25k signups in the first day

Slide 25

Slide 25 text

everything is on fire!

Slide 26

Slide 26 text

best & worst day of our lives so far

Slide 27

Slide 27 text

load was through the roof

Slide 28

Slide 28 text

first culprit?

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

favicon.ico

Slide 31

Slide 31 text

404-ing on Django, causing tons of errors

Slide 32

Slide 32 text

lesson #1: don’t forget your favicon

Slide 33

Slide 33 text

real lesson #1: most of your initial scaling problems won’t be glamorous

Slide 34

Slide 34 text

favicon

Slide 35

Slide 35 text

ulimit -n

Slide 36

Slide 36 text

memcached -t 4

Slide 37

Slide 37 text

prefork/postfork

Slide 38

Slide 38 text

friday rolls around

Slide 39

Slide 39 text

not slowing down

Slide 40

Slide 40 text

let’s move to EC2.

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

scaling = replacing all components of a car while driving it at 100mph

Slide 44

Slide 44 text

since...

Slide 45

Slide 45 text

“"canonical [architecture] of an early stage startup in this era." (HighScalability.com)

Slide 46

Slide 46 text

Nginx & Redis & Postgres & Django.

Slide 47

Slide 47 text

Nginx & HAProxy & Redis & Memcached & Postgres & Gearman & Django.

Slide 48

Slide 48 text

24h Ops

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

our philosophy

Slide 52

Slide 52 text

1 simplicity

Slide 53

Slide 53 text

2 optimize for minimal operational burden

Slide 54

Slide 54 text

3 instrument everything

Slide 55

Slide 55 text

walkthrough:
1 scaling the database
2 choosing technology
3 staying nimble
4 scaling for android

Slide 56

Slide 56 text

1 scaling the db

Slide 57

Slide 57 text

early days

Slide 58

Slide 58 text

django ORM, postgresql

Slide 59

Slide 59 text

why pg? postgis.

Slide 60

Slide 60 text

moved db to its own machine

Slide 61

Slide 61 text

but photos kept growing and growing...

Slide 62

Slide 62 text

...and only 68GB of RAM on biggest machine in EC2

Slide 63

Slide 63 text

so what now?

Slide 64

Slide 64 text

vertical partitioning

Slide 65

Slide 65 text

django db routers make it pretty easy

Slide 66

Slide 66 text

class PhotoRouter(object):
    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
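
A minimal sketch of how such a router might be wired up, assuming the PhotoRouter class above lives in routers.py; the database names are placeholders, everything else is standard Django settings:

# settings.py (sketch)
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'maindb',
    },
    'photodb': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'photodb',
    },
}
DATABASE_ROUTERS = ['routers.PhotoRouter']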

Slide 67

Slide 67 text

...once you untangle all your foreign key relationships

Slide 68

Slide 68 text

a few months later...

Slide 69

Slide 69 text

photosdb > 60GB

Slide 70

Slide 70 text

what now?

Slide 71

Slide 71 text

horizontal partitioning!

Slide 72

Slide 72 text

aka: sharding

Slide 73

Slide 73 text

“surely we’ll have hired someone experienced before we actually need to shard”

Slide 74

Slide 74 text

you don’t get to choose when scaling challenges come up

Slide 75

Slide 75 text

evaluated solutions

Slide 76

Slide 76 text

at the time, none were up to the task of being our primary DB

Slide 77

Slide 77 text

so we did it in Postgres itself

Slide 78

Slide 78 text

what’s painful about sharding?

Slide 79

Slide 79 text

1 data retrieval

Slide 80

Slide 80 text

hard to know what your primary access patterns will be w/out any usage

Slide 81

Slide 81 text

in most cases, user ID

Slide 82

Slide 82 text

2 what happens if one of your shards gets too big?

Slide 83

Slide 83 text

in range-based schemes (like MongoDB), you split

Slide 84

Slide 84 text

A-H: shard0
I-Z: shard1

Slide 85

Slide 85 text

A-D: shard0
E-H: shard2
I-P: shard1
Q-Z: shard3

Slide 86

Slide 86 text

downsides (especially on EC2): disk IO

Slide 87

Slide 87 text

instead, we pre-split

Slide 88

Slide 88 text

many many many (thousands) of logical shards

Slide 89

Slide 89 text

that map to fewer physical ones

Slide 90

Slide 90 text

// 8 logical shards on 2 machines
user_id % 8 = logical shard
logical shards -> physical shard map
{ 0: A, 1: A, 2: A, 3: A,
  4: B, 5: B, 6: B, 7: B }

Slide 91

Slide 91 text

// 8 logical shards on 4 machines (up from 2)
user_id % 8 = logical shard
logical shards -> physical shard map
{ 0: A, 1: A, 2: C, 3: C,
  4: B, 5: B, 6: D, 7: D }
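
In code, the lookup these two slides describe is just a dict; a minimal Python sketch (the names are ours, not from the deck):

# 8 logical shards on 2 machines; moving to 4 machines only edits this map
LOGICAL_TO_PHYSICAL = {0: 'A', 1: 'A', 2: 'A', 3: 'A',
                       4: 'B', 5: 'B', 6: 'B', 7: 'B'}

def shard_for_user(user_id):
    logical = user_id % 8
    return logical, LOGICAL_TO_PHYSICAL[logical]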

Slide 92

Slide 92 text

little known but awesome PG feature: schemas

Slide 93

Slide 93 text

not “schema” in the column-definitions sense

Slide 94

Slide 94 text

- database:
  - schema:
    - table:
      - columns
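
A sketch of how the logical shards can be laid out as PG schemas, assuming psycopg2 and a toy two-column table:

import psycopg2

conn = psycopg2.connect(dbname='photodb')
cur = conn.cursor()
for shard in range(4):
    # each logical shard gets its own schema holding its own tables
    cur.execute("CREATE SCHEMA shard%d" % shard)
    cur.execute("CREATE TABLE shard%d.photos_by_user "
                "(user_id bigint NOT NULL, media_id bigint NOT NULL)" % shard)
conn.commit()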

Slide 95

Slide 95 text

machineA:
  shard0: photos_by_user
  shard1: photos_by_user
  shard2: photos_by_user
  shard3: photos_by_user

Slide 96

Slide 96 text

machineA:
  shard0: photos_by_user
  shard1: photos_by_user
  shard2: photos_by_user
  shard3: photos_by_user
machineA’:
  shard0: photos_by_user
  shard1: photos_by_user
  shard2: photos_by_user
  shard3: photos_by_user

Slide 97

Slide 97 text

machineA:
  shard0: photos_by_user
  shard1: photos_by_user
  shard2: photos_by_user
  shard3: photos_by_user
machineC:
  shard0: photos_by_user
  shard1: photos_by_user
  shard2: photos_by_user
  shard3: photos_by_user

Slide 98

Slide 98 text

can do this as long as you have more logical shards than physical ones

Slide 99

Slide 99 text

lesson: take tech/tools you know and try first to adapt them into a simple solution

Slide 100

Slide 100 text

2 which tools where?

Slide 101

Slide 101 text

where to cache / otherwise denormalize data

Slide 102

Slide 102 text

we <3 redis

Slide 103

Slide 103 text

what happens when a user posts a photo?

Slide 104

Slide 104 text

1 user uploads photo with (optional) caption and location

Slide 105

Slide 105 text

2 synchronous write to the media database for that user

Slide 106

Slide 106 text

3 queues!

Slide 107

Slide 107 text

3a if geotagged, async worker POSTs to Solr

Slide 108

Slide 108 text

3b follower delivery
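
Taken together, step 3 might look like this minimal sketch with the python-gearman client (Gearman appears in the stack slide earlier; the task names and geotagged flag are invented):

import gearman

client = gearman.GearmanClient(['localhost:4730'])

def on_photo_posted(media_id, geotagged):
    # fire-and-forget background jobs; async workers pick them up
    client.submit_job('deliver_to_followers', str(media_id), background=True)
    if geotagged:
        client.submit_job('post_to_solr', str(media_id), background=True)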

Slide 109

Slide 109 text

can’t have every user who loads their timeline look up everyone they follow and then fetch those users’ photos

Slide 110

Slide 110 text

instead, everyone gets their own list in Redis

Slide 111

Slide 111 text

media ID is pushed onto a list for every person who’s following this user

Slide 112

Slide 112 text

Redis is awesome for this; rapid insert, rapid subsets

Slide 113

Slide 113 text

when it’s time to render a feed, we take a small # of IDs and go look up the info in memcached
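
A rough sketch of that deliver-then-render path with redis-py; the key names and follower lookup are invented for illustration:

import redis

r = redis.Redis()

def get_follower_ids(user_id):
    return []  # stand-in: really answered by the follow graph store

def deliver(media_id, author_id):
    # push the new media ID onto each follower's own feed list
    for follower_id in get_follower_ids(author_id):
        key = 'feed:%d' % follower_id
        r.lpush(key, media_id)
        r.ltrim(key, 0, 799)  # keep lists bounded

def feed_page_ids(user_id, count=30):
    # rapid subset: just the first page of IDs; details come from memcached
    return r.lrange('feed:%d' % user_id, 0, count - 1)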

Slide 114

Slide 114 text

Redis is great for...

Slide 115

Slide 115 text

data structures that are relatively bounded

Slide 116

Slide 116 text

(don’t tie yourself to a solution where your in-memory DB is your main data store)

Slide 117

Slide 117 text

caching complex objects where you want to do more than GET

Slide 118

Slide 118 text

ex: counting, sub-ranges, testing membership

Slide 119

Slide 119 text

especially when Taylor Swift posts live from the CMAs

Slide 120

Slide 120 text

follow graph

Slide 121

Slide 121 text

v1: simple DB table (source_id, target_id, status)

Slide 122

Slide 122 text

who do I follow? who follows me? do I follow X? does X follow me?

Slide 123

Slide 123 text

the DB was busy, so we started storing a parallel version in Redis
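
The parallel Redis version might have looked something like this sketch (key names assumed); two sets per user make all four questions above O(1):

import redis

r = redis.Redis()

def follow(source_id, target_id):
    r.sadd('following:%d' % source_id, target_id)
    r.sadd('followers:%d' % target_id, source_id)

def do_i_follow(my_id, x_id):
    return r.sismember('following:%d' % my_id, x_id)

def does_x_follow_me(my_id, x_id):
    return r.sismember('followers:%d' % my_id, x_id)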

Slide 124

Slide 124 text

follow_all(300 item list)

Slide 125

Slide 125 text

inconsistency

Slide 126

Slide 126 text

extra logic

Slide 127

Slide 127 text

so much extra logic

Slide 128

Slide 128 text

exposing your support team to the idea of cache invalidation

Slide 129

Slide 129 text

No content

Slide 130

Slide 130 text

redesign took a page from twitter’s book

Slide 131

Slide 131 text

PG can handle tens of thousands of requests with very light memcached caching
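
A minimal sketch of that “very light” caching layer, assuming python-memcached; the key scheme and the PG helper are made up:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def query_follow_table(source_id, target_id):
    return False  # stand-in for a SELECT against the follows table in PG

def do_i_follow(source_id, target_id):
    key = 'follows:%d:%d' % (source_id, target_id)
    cached = mc.get(key)
    if cached is None:
        cached = query_follow_table(source_id, target_id)
        mc.set(key, cached, time=60)  # short TTL keeps staleness bounded
    return cached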

Slide 132

Slide 132 text

two takeaways

Slide 133

Slide 133 text

1 have a versatile complement to your core data storage (like Redis)

Slide 134

Slide 134 text

2 try not to have two tools trying to do the same job

Slide 135

Slide 135 text

3 staying nimble

Slide 136

Slide 136 text

2010: 2 engineers

Slide 137

Slide 137 text

2011: 3 engineers

Slide 138

Slide 138 text

2012: 5 engineers

Slide 139

Slide 139 text

scarcity -> focus

Slide 140

Slide 140 text

engineer solutions that you’re not constantly returning to because they broke

Slide 141

Slide 141 text

1 extensive unit-tests and functional tests

Slide 142

Slide 142 text

2 keep it DRY

Slide 143

Slide 143 text

3 loose coupling using notifications / signals
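
A minimal sketch of that pattern with Django signals (the signal and handler names are made up):

import django.dispatch

photo_posted = django.dispatch.Signal()

def update_search_index(sender, media_id=None, **kwargs):
    pass  # e.g. queue a Solr update without the poster knowing about Solr

photo_posted.connect(update_search_index)
photo_posted.send(sender=None, media_id=1234)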

Slide 144

Slide 144 text

4 do most of our work in Python, drop to C when necessary

Slide 145

Slide 145 text

5 frequent code reviews, pull requests to keep things in the ‘shared brain’

Slide 146

Slide 146 text

6 extensive monitoring

Slide 147

Slide 147 text

munin

Slide 148

Slide 148 text

statsd
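
For example, counters and timers with the Python statsd client (the metric names are made up):

import statsd

stats = statsd.StatsClient('localhost', 8125)

def render_feed():
    pass  # stand-in for real work

stats.incr('photos.uploaded')     # counter
with stats.timer('feed.render'):  # timing
    render_feed()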

Slide 149

Slide 149 text

No content

Slide 150

Slide 150 text

“how is the system right now?”

Slide 151

Slide 151 text

“how does this compare to historical trends?”

Slide 152

Slide 152 text

scaling for android

Slide 153

Slide 153 text

1 million new users in 12 hours

Slide 154

Slide 154 text

great tools that enable easy read scalability

Slide 155

Slide 155 text

redis: slaveof

Slide 156

Slide 156 text

our Redis framework assumes 0+ read slaves
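
A sketch of that assumption (hostnames are placeholders): writes go to the master, reads fan out across however many slaves exist, falling back to the master at zero:

import random
import redis

master = redis.Redis(host='redis-master')
read_slaves = [redis.Redis(host='redis-slave-1'),
               redis.Redis(host='redis-slave-2')]

def reader():
    return random.choice(read_slaves) if read_slaves else master

def writer():
    return master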

Slide 157

Slide 157 text

tight iteration loops

Slide 158

Slide 158 text

statsd & pgfouine

Slide 159

Slide 159 text

know where you can shed load if needed

Slide 160

Slide 160 text

(e.g. shorter feeds)

Slide 161

Slide 161 text

if you’re tempted to reinvent the wheel...

Slide 162

Slide 162 text

don’t.

Slide 163

Slide 163 text

“our app servers sometimes kernel panic under load”

Slide 164

Slide 164 text

...

Slide 165

Slide 165 text

“what if we write a monitoring daemon...”

Slide 166

Slide 166 text

wait! this is exactly what HAProxy is great at

Slide 167

Slide 167 text

surround yourself with awesome advisors

Slide 168

Slide 168 text

culture of openness around engineering

Slide 169

Slide 169 text

give back; e.g. node2dm

Slide 170

Slide 170 text

focus on making what you have better

Slide 171

Slide 171 text

“fast, beautiful photo sharing”

Slide 172

Slide 172 text

“can we make all of our requests take 50% of the time?”

Slide 173

Slide 173 text

staying nimble = remind yourself of what’s important

Slide 174

Slide 174 text

your users around the world don’t care that you wrote your own DB

Slide 175

Slide 175 text

wrapping up

Slide 176

Slide 176 text

unprecedented times

Slide 177

Slide 177 text

2 backend engineers can scale a system to 30+ million users

Slide 178

Slide 178 text

key word = simplicity

Slide 179

Slide 179 text

cleanest solution with the fewest moving parts possible

Slide 180

Slide 180 text

don’t over-optimize or expect to know ahead of time how site will scale

Slide 181

Slide 181 text

don’t think “someone else will join & take care of this”

Slide 182

Slide 182 text

it will happen sooner than you think; surround yourself with great advisors

Slide 183

Slide 183 text

when adding software to your stack: only if you have to, optimizing for operational simplicity

Slide 184

Slide 184 text

few, if any, unsolvable scaling challenges for a social startup

Slide 185

Slide 185 text

have fun