A talk on scaling a typical web stack, based on lessons learned at 99designs, given to an RMIT Systems Architecture class.
Scaling a web stack
Lars Yencken / Data Scientist / 99designs
29 April, 2013
99designs • Growing infrastructure • Stability and robustness • Performance • Recap
99designs
a.k.a. why you should listen
[Chart: designs submitted, Jan-07 to Jan-13; y-axis from 0 to 900,000]
• $52,000,000 paid out to designers
• 30,000,000 visitors
• 1,100,000,000 pageviews
• 35,000,000,000 HTTP requests
Growing infrastructure
[Diagram: App]
[Diagram: App, DB]
[Diagram: App ×2, DB. Added: DNS round robin]
[Diagram: Cache, App ×2, DB. Added: reverse proxy layer]
[Diagram: Cache, App ×2, Queue, Worker, DB. Added: async task queue]
[Diagram: Cache, App ×2, Queue, Worker, DB. Labels: optimized for latency; optimized for throughput]
[Diagram: Cache, App ×2, Memcache, Queue, Worker, DB]
[Diagram: Cache ×2, App ×6, Memcache, Queue, Worker, DB, DB*. Goal: remove single points of failure]
[Diagram: Balancer, Cache ×2, App ×6, Memcache, Queue, Worker, DB, DB*. Goal: add flexibility to the cache layer]
Software as infrastructure
“make recipes, not servers”
• "Cloud" hosting on Amazon Web Services
• Instead of a few highly tuned servers, have many disposable servers
• Tradeoff that favours flexibility
Challenges
Stability and robustness
Costs of instability
• Lost customer business (direct & indirect)
• Support burden and costs
• Ops burden and costs
Redundant servers
[Diagram: Balancer, Cache ×2, App ×6]
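As a rough illustration of how a balancer in front of redundant servers can route around a failed one, here is a minimal sketch. It assumes a plain round-robin policy with manually set health flags; the class name, method names, and backend names are hypothetical and are not taken from the 99designs setup.

```python
import itertools


class RoundRobinBalancer:
    """Rotate requests across backends, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


# Hypothetical backend names, for illustration only.
balancer = RoundRobinBalancer(["app1", "app2", "app3"])
balancer.mark_down("app2")
picks = [balancer.pick() for _ in range(4)]
```

With one of three servers marked down, requests keep flowing to the remaining two. A real balancer would also probe backends itself rather than rely on manual flags.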
Asynchronous tasks
[Diagram: App ×6, DB ×2, Queue, Worker, 3rd party services]
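The asynchronous-task pattern can be sketched in a few lines. This is a minimal in-process illustration, assuming Python's standard `queue` and `threading` modules stand in for a real queue server and a separate worker process; `send_welcome_email` and `handle_request` are hypothetical names.

```python
import queue
import threading

task_queue = queue.Queue()
results = []


def send_welcome_email(address):
    # Hypothetical stand-in for a slow call to a third-party service.
    return f"sent to {address}"


def worker():
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut the worker down
            task_queue.task_done()
            break
        func, args = task
        try:
            results.append(func(*args))
        finally:
            task_queue.task_done()


def handle_request(address):
    # The app only enqueues the job, so the response is not delayed
    # by the third-party call.
    task_queue.put((send_welcome_email, (address,)))
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
status = handle_request("designer@example.com")
task_queue.put(None)
task_queue.join()
thread.join()
```

The request handler returns immediately, while the worker drains the queue at its own pace. If the third-party service is slow or down, only the worker is affected.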
Database replication
[Diagram: App ×6, DB, Hot spare, DB reader ×2]
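One way an application might use read replicas is to send writes to the primary and spread reads across the readers. The sketch below assumes that split and a naive rule for telling reads from writes; the class name and server names are hypothetical, and the hot spare is left out because it serves no traffic until a failover.

```python
import itertools


class ReplicatedDB:
    """Send writes to the primary and spread reads across readers."""

    def __init__(self, primary, readers):
        self.primary = primary
        self._readers = itertools.cycle(readers) if readers else None

    def connection_for(self, sql):
        # Naive classification: anything that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._readers is not None:
            return next(self._readers)
        return self.primary


# Hypothetical server names, for illustration only.
db = ReplicatedDB("db-primary", ["db-reader-1", "db-reader-2"])
targets = [
    db.connection_for("SELECT * FROM contests"),
    db.connection_for("INSERT INTO designs VALUES (1)"),
    db.connection_for("select id from users"),
]
```

Reads served by a replica can lag slightly behind the primary, so code that must read its own write immediately would need to query the primary instead.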
Still difficult...
• Testing failure tolerance between components: not trivial!
• Avoiding correlated failures
Correlated failures
Performance
Costs of slow sites
• Less traffic (experiments by Yahoo, Microsoft, Google; ranking)
• Worse user experience
• Higher hardware costs
Caching
[Diagram: Cache ×2, App ×6, DB ×2, Memcache]
What gets cached: whole pages & images; whole files on disk; db queries & page fragments
[Chart: response time from cache (s), cache miss vs. cache hit. 3 orders of magnitude!]
Serving globally
[Diagram: Balancer, Cache ×2, Content distribution network. Hostnames: mysite.com, media.mysite.com]
Bundling static media
99designs.com/static/css/core.css
99designs.com/static/css/contest.css
99designs.com/static/css/marketing.css
99designs.com/bundle/css/core,contest,marketing.css
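The bundle URL above names the files it combines, so a server could resolve it roughly as follows. This is a sketch under assumptions: the mapping from `/bundle/...` to `/static/...` is inferred from the example URLs, the function names are hypothetical, and the CSS contents are made up for illustration.

```python
def parse_bundle_path(path):
    """Split '/bundle/css/a,b.css' into the static paths it combines."""
    prefix, _, filename = path.rpartition("/")
    names, dot, ext = filename.rpartition(".")
    if not dot or not prefix.startswith("/bundle/"):
        raise ValueError(f"not a bundle path: {path}")
    static_dir = "/static/" + prefix[len("/bundle/"):]
    return [f"{static_dir}/{name}.{ext}" for name in names.split(",")]


def build_bundle(path, read_file):
    # Concatenate the member files in the order given in the URL.
    return "\n".join(read_file(p) for p in parse_bundle_path(path))


# Made-up file contents, for illustration only.
files = {
    "/static/css/core.css": "body{margin:0}",
    "/static/css/contest.css": ".contest{color:red}",
    "/static/css/marketing.css": ".hero{display:block}",
}
bundle = build_bundle("/bundle/css/core,contest,marketing.css", files.__getitem__)
```

The browser then makes one request instead of three. The cost is that a change to any member file invalidates the whole bundle.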
Difficulties
• Norms for browsers and internet connections constantly change
• Some strategies conflict with each other
• Measure, measure, measure!
Recap
Thanks!
@larsyencken