Slides from an AirBnB tech talk I gave; they don't make the most sense out of context, but will hopefully be helpful for folks who saw the talk
Scaling Instagram
AirBnB Tech Talk 2012
Mike Krieger, Instagram
me
- Co-founder, Instagram
- Previously: UX & front-end @ Meebo
- Stanford HCI BS/MS
- @mikeyk on everything
communicating and sharing in the real world
30+ million users in less than 2 years
the story of how we scaled it
a brief tangent
the beginning
2 product guys
no real back-end experience
analytics & python @ meebo
CouchDB
CrimeDesk SF
let’s get hacking
good components in place early on
...but were hosted on a single machine somewhere in LA
less powerful than my MacBook Pro
okay, we launched. now what?
25k signups in the first day
everything is on fire!
best & worst day of our lives so far
load was through the roof
first culprit?
favicon.ico
404-ing on Django, causing tons of errors
lesson #1: don't forget your favicon
real lesson #1: most of your initial scaling problems won't be glamorous
favicon
ulimit -n
memcached -t 4
prefork/postfork
friday rolls around
not slowing down
let’s move to EC2.
scaling = replacing all components of a car while driving it at 100mph
since...
“"canonical [architecture]of an early stage startupin this era."(HighScalability.com)
Nginx &Redis &Postgres &Django.
Nginx & HAProxy &Redis & Memcached &Postgres & Gearman &Django.
24h Ops
our philosophy
1 simplicity
2 optimize for minimal operational burden
3 instrument everything
walkthrough:
1 scaling the database
2 choosing technology
3 staying nimble
4 scaling for android
1 scaling the db
early days
django ORM, postgresql
why pg? postgis.
moved db to its own machine
but photos kept growing and growing...
...and only 68GB of RAM on biggest machine in EC2
so what now?
vertical partitioning
django db routers make it pretty easy
def db_for_read(self, model, **hints):
    if model._meta.app_label == 'photos':
        return 'photodb'
...once you untangle all your foreign key relationships
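a fuller router sketch, assuming a 'photos' app routed to a 'photodb' alias (db_for_write is shown by analogy, and the settings path is made up; not verbatim from the talk):

class PhotoRouter(object):
    """Send everything in the 'photos' app to its own database."""

    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
        return None  # let Django fall through to 'default'

    def db_for_write(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
        return None

# settings.py: DATABASE_ROUTERS = ['myapp.routers.PhotoRouter']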
a few months later...
photosdb > 60GB
what now?
horizontal partitioning!
aka: sharding
“surely we’ll have hired someone experienced before we actually need to shard”
you don’t get to choose when scaling challenges come up
evaluated solutions
at the time, none were up to the task of being our primary DB
did it in Postgres itself
what’s painful aboutsharding?
1 data retrieval
hard to know what your primary access patterns will be w/out any usage
in most cases, user ID
2 what happens if one of your shards gets too big?
in range-based schemes (like MongoDB), you split
A-H: shard0
I-Z: shard1
A-D: shard0
E-H: shard2
I-P: shard1
Q-Z: shard3
downsides (especially on EC2): disk IO
instead, we pre-split
many many many (thousands) of logical shards
that map to fewer physical ones
// 8 logical shards on 2 machines
user_id % 8 = logical shard
logical shards -> physical shard map
{0: A, 1: A, 2: A, 3: A,
 4: B, 5: B, 6: B, 7: B}
// 8 logical shards on 4 machines
user_id % 8 = logical shard
logical shards -> physical shard map
{0: A, 1: A, 2: C, 3: C,
 4: B, 5: B, 6: D, 7: D}
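that mapping as a Python sketch (machine letters stand in for real hostnames):

LOGICAL_SHARDS = 8

# logical shard -> physical machine, per the 4-machine map above
SHARD_TO_MACHINE = {0: 'A', 1: 'A', 2: 'C', 3: 'C',
                    4: 'B', 5: 'B', 6: 'D', 7: 'D'}

def shard_for_user(user_id):
    """Which logical shard, and which machine, holds this user's data."""
    logical = user_id % LOGICAL_SHARDS
    return logical, SHARD_TO_MACHINE[logical]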
little-known but awesome PG feature: schemas
not “columns” schema
- database:
  - schema:
    - table:
      - columns
machineA:
  shard0.photos_by_user
  shard1.photos_by_user
  shard2.photos_by_user
  shard3.photos_by_user
machineA:
  shard0.photos_by_user
  shard1.photos_by_user
  shard2.photos_by_user
  shard3.photos_by_user
machineA’:
  shard0.photos_by_user
  shard1.photos_by_user
  shard2.photos_by_user
  shard3.photos_by_user
machineA:
  shard0.photos_by_user
  shard1.photos_by_user
  shard2.photos_by_user
  shard3.photos_by_user
machineC:
  shard0.photos_by_user
  shard1.photos_by_user
  shard2.photos_by_user
  shard3.photos_by_user
can do this as long as you have more logical shards than physical ones
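sketch of how a query finds its schema-qualified table (table and column names are illustrative):

def photos_table(user_id):
    """Each logical shard is a PG schema, so routing = a name prefix."""
    logical = user_id % LOGICAL_SHARDS
    return 'shard%d.photos_by_user' % logical

# e.g. SELECT ... FROM shard3.photos_by_user WHERE user_id = ...
# moving a logical shard to a new machine = moving one schema,
# no re-bucketing of rows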
lesson: take tech/tools you know and try first to adapt them into a simple solution
2 which tools where?
where to cache / otherwise denormalize data
we <3 redis
what happens when a user posts a photo?
1 user uploads photo with (optional) caption and location
2 synchronous write to the media database for that user
3 queues!
3a if geotagged, async worker POSTs to Solr
3b follower delivery
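the queueing step, sketched with python-gearman (the task name and payload shape are made up; Gearman itself is in the stack slide earlier):

import json
import gearman

gm_client = gearman.GearmanClient(['localhost:4730'])

def enqueue_follower_delivery(media_id, author_id):
    """Fire-and-forget: a worker pool fans the photo out to followers."""
    payload = json.dumps({'media_id': media_id, 'author_id': author_id})
    gm_client.submit_job('follower_delivery', payload,
                         background=True, wait_until_complete=False)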
can’t have every user who loads their timeline look up all of their followers and then their photos
instead, everyone gets their own list in Redis
media ID is pushed onto a list for every person who’s following this user
Redis is awesome for this; rapid insert, rapid subsets
when it’s time to render a feed, we take a small # of IDs and go look up info in memcached
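both halves sketched with redis-py and python-memcached (hostnames and key names are illustrative, not the production schema):

import memcache
import redis

r = redis.Redis(host='feed-redis')
mc = memcache.Client(['memcached-host:11211'])

def deliver_to_followers(media_id, follower_ids):
    """Fan-out on write: push the media ID onto each follower's list."""
    for follower_id in follower_ids:
        r.lpush('feed:%d' % follower_id, media_id)

def render_feed(user_id, count=30):
    """Fan-in on read: small range from Redis, full objects from memcached."""
    media_ids = r.lrange('feed:%d' % user_id, 0, count - 1)
    return [mc.get('media:%s' % mid) for mid in media_ids]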
Redis is great for...
data structures that are relatively bounded
(don’t tie yourself to a solution where your in-memory DB is your main data store)
caching complex objects where you want to do more than GET
ex: counting, sub-ranges, testing membership
especially when Taylor Swift posts live from the CMAs
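those three examples with redis-py, one command each (key names are made up):

import redis
r = redis.Redis(host='redis-host')

def like(media_id):                 # counting
    return r.incr('media:%d:likes' % media_id)

def feed_head(user_id):             # sub-ranges
    return r.lrange('feed:%d' % user_id, 0, 9)

def follows(source_id, target_id):  # testing membership
    return r.sismember('following:%d' % source_id, target_id)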
follow graph
v1: simple DB table (source_id, target_id, status)
who do I follow?
who follows me?
do I follow X?
does X follow me?
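v1 in Django ORM terms (this Follow model is a reconstruction of that (source_id, target_id, status) table, not the production code):

from django.db import models

class Follow(models.Model):
    source_id = models.IntegerField(db_index=True)  # the follower
    target_id = models.IntegerField(db_index=True)  # the followed
    status = models.CharField(max_length=16)        # e.g. 'active'

def following(me):        # who do I follow?
    return Follow.objects.filter(source_id=me, status='active')

def followers(me):        # who follows me?
    return Follow.objects.filter(target_id=me, status='active')

def is_following(me, x):  # do I follow X? (does X follow me is symmetric)
    return Follow.objects.filter(source_id=me, target_id=x,
                                 status='active').exists()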
DB was busy, so we started storing a parallel version in Redis
follow_all(300 item list)
inconsistency
extra logic
so much extra logic
exposing your support team to the idea of cache invalidation
redesign took a page from twitter’s book
PG can handle tens of thousands of requests, with very light memcached caching
two takeaways
1 have a versatile complement to your core data storage (like Redis)
2 try not to have two tools trying to do the same job
3 staying nimble
2010: 2 engineers
2011: 3 engineers
2012: 5 engineers
scarcity -> focus
engineer solutions that you’re not constantly returning to because they broke
1 extensive unit tests and functional tests
2 keep it DRY
3 loose coupling using notifications / signals
4 do most of our work in Python, drop to C when necessary
5 frequent code reviews, pull requests to keep things in the ‘shared brain’
6 extensive monitoring
munin
statsd
“how is the system right now?”
“how does this compare to historical trends?”
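statsd is fire-and-forget UDP; a minimal sketch of its wire protocol (the host and metric name are assumptions):

import socket

STATSD_ADDR = ('statsd-host', 8125)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def incr(metric, n=1):
    """Increment a counter; the aggregator keeps the historical trend."""
    sock.sendto(('%s:%d|c' % (metric, n)).encode('ascii'), STATSD_ADDR)

incr('photos.uploaded')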
4 scaling for android
1 million new users in 12 hours
great tools that enable easy read scalability
redis: slaveof
our Redis framework assumes 0+ read slaves
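the 0+ read slaves assumption, sketched with redis-py (hostnames are placeholders):

import random
import redis

master = redis.Redis(host='redis-master')
slaves = [redis.Redis(host=h) for h in ['redis-slave-1', 'redis-slave-2']]

def redis_for_read():
    """Spread reads across slaves; fall back to the master if none exist."""
    return random.choice(slaves) if slaves else master

def redis_for_write():
    return master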
tight iteration loops
statsd & pgfouine
know where you can shed load if needed
(e.g. shorter feeds)
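shorter feeds as a knob, sketched (the constant and key name are made up):

import redis
r = redis.Redis(host='feed-redis')

FEED_LENGTH = 30  # drop this under heavy load to shed read work

def feed_ids(user_id):
    return r.lrange('feed:%d' % user_id, 0, FEED_LENGTH - 1)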
if you’re tempted to reinvent the wheel...
don’t.
“our app servers sometimes kernel panic under load”
...
“what if we write a monitoring daemon...”
wait! this is exactly what HAProxy is great at
surround yourself with awesome advisors
culture of openness around engineering
give back; e.g. node2dm
focus on making what you have better
“fast, beautiful photo sharing”
“can we make all of our requests take 50% of the time?”
staying nimble = remind yourself of what’s important
your users around the world don’t care that you wrote your own DB
wrapping up
unprecedented times
2 backend engineers can scale a system to 30+ million users
key word = simplicity
cleanest solution with the fewest moving parts possible
don’t over-optimize or expect to know ahead of time how the site will scale
don’t think “someone else will join & take care of this”
it will happen sooner than you think; surround yourself with great advisors
when adding software to the stack: only if you have to, optimizing for operational simplicity
few, if any, unsolvable scaling challenges for a social startup
have fun