Scaling a web stack

A talk on scaling a typical web stack, based on lessons learned at 99designs, given to an RMIT Systems Architecture class.

Lars Yencken

April 29, 2013

Transcript

  1. Scaling a web stack • Lars Yencken / Data Scientist / 99designs • 29 April, 2013
  2. 99designs • Growing infrastructure • Stability and robustness • Performance • Recap

  3. 99designs (a.k.a. why you should listen)

  13. [Chart: designs submitted, Jan-07 to Jan-13; vertical axis from 0 to 900,000]
  16. $52,000,000 paid out to designers

  17. $52,000,000 paid out to designers • 30,000,000 visitors

  18. $52,000,000 paid out to designers • 30,000,000 visitors • 1,100,000,000 pageviews

  19. $52,000,000 paid out to designers • 30,000,000 visitors • 1,100,000,000 pageviews • 35,000,000,000 HTTP requests
  20. Growing infrastructure

  21. App

  22. App DB

  23. DNS round robin [Diagram: 2 App, DB]
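
    A minimal sketch of the idea behind DNS round robin, assuming a toy resolver that rotates its list of addresses on every lookup so that successive clients start with a different app server. The class name and the IP addresses are hypothetical; a real deployment would configure this in the DNS server rather than in application code.

```python
from collections import deque


class RoundRobinResolver:
    """Toy resolver that rotates its address list after each lookup."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        # Return the current ordering, then rotate so the next
        # client sees a different address first.
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


# Hypothetical addresses for the two app servers on the slide.
resolver = RoundRobinResolver(["10.0.0.1", "10.0.0.2"])
first_choices = [resolver.resolve()[0] for _ in range(4)]
print(first_choices)
```

    Clients usually connect to the first address they receive, so the rotation spreads new connections across both servers without a dedicated balancer.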

  24. Reverse proxy layer [Diagram: Cache, 2 App, DB]

  25. Async task queue [Diagram: Cache, 2 App, Queue, Worker, DB]

  26. Optimized for latency • optimized for throughput [Diagram: Cache, 2 App, Queue, Worker, DB]
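
    A minimal sketch of the async task queue pattern, assuming an in-process queue and a single worker thread stand in for the Queue and Worker boxes. The request handler only enqueues the slow job and returns straight away (presumably the latency-optimized side), while the worker drains the queue at its own pace (the throughput-optimized side). The function names and the "welcome email" task are hypothetical.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    """Process jobs until a None sentinel arrives."""
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        func, arg = job
        results.append(func(arg))
        tasks.task_done()


def send_welcome_email(user):
    # Hypothetical slow task, e.g. a call to a third-party service.
    return f"emailed {user}"


def handle_request(user):
    # The web request only enqueues the work, then returns immediately.
    tasks.put((send_welcome_email, user))
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = handle_request("alice")
tasks.put(None)  # tell the worker to stop
tasks.join()     # wait until every queued job is marked done
thread.join()
print(response, results)
```
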
  27. [Diagram: Cache, 2 App, Memcache, Queue, Worker, DB]

  28. Remove single points of failure [Diagram: 2 Cache, 6 App, Memcache, Queue, Worker, DB, DB*]
  29. Add flexibility to the cache layer [Diagram: Balancer, 2 Cache, 6 App, Memcache, Queue, Worker, DB, DB*]
  30. Software as infrastructure

  31. “make recipes, not servers”

  32. • "Cloud" hosting on Amazon Web Services
    • Instead of a few highly tuned servers, have many disposable servers
    • Tradeoff that favours flexibility
  33. Challenges

  34. Stability and robustness

  35. Costs of instability
    • Lost customer business (direct & indirect)
    • Support burden and costs
    • Ops burden and costs
  36. Redundant servers [Diagram: Balancer, 2 Cache, 6 App]
  37. Asynchronous tasks [Diagram: 6 App, 2 DB, Queue, Worker, 3rd party services]
  38. Database replication [Diagram: 6 App, DB, Hot spare, 2 DB reader]
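
    A minimal sketch of how an application might use a replicated database, assuming writes go to the primary, reads are spread across the reader replicas, and the hot spare is promoted if the primary fails. The slide only names the components, so this routing policy, the class, and the server names are all assumptions.

```python
import itertools


class ReplicatedDB:
    """Toy router: writes to the primary, reads to the replicas in turn."""

    def __init__(self, primary, readers, hot_spare):
        self.primary = primary
        self.hot_spare = hot_spare
        self._readers = itertools.cycle(readers)

    def connection_for(self, sql):
        # Naive classification: anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary

    def fail_over(self):
        # Promote the hot spare to primary; no spare remains afterwards.
        self.primary, self.hot_spare = self.hot_spare, None


# Hypothetical server names.
db = ReplicatedDB("db-primary", ["db-reader-1", "db-reader-2"], "db-hot-spare")
print(db.connection_for("SELECT * FROM contests"))
print(db.connection_for("INSERT INTO designs VALUES (1)"))
```

    Replicas can lag behind the primary, so a read that must see a write just made would still need to go to the primary.
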
  39. Still difficult...
    • Testing failure tolerance between components: not trivial!
    • Avoiding correlated failures
  40. Correlated failures

  41. Performance

  42. Costs of slow sites
    • Less traffic (experiments by Yahoo, Microsoft, Google; ranking)
    • Worse user experience
    • Higher hardware costs
  43. Caching [Diagram: 2 Cache, 6 App, Memcache, 2 DB]
  44. Caching: whole pages & images • whole files on disk • db queries & page fragments [Diagram: 2 Cache, 6 App, Memcache, 2 DB]
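
    A minimal sketch of caching db queries in the cache-aside style, assuming a small in-memory class stands in for Memcache. `FakeMemcache`, `slow_query`, the key format, and the 60-second TTL are hypothetical; a real setup would use a memcached client library instead.

```python
import time


class FakeMemcache:
    """In-memory stand-in for a memcached client, with per-key expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


db_calls = 0


def slow_query(contest_id):
    # Hypothetical database query; counts how often it is really run.
    global db_calls
    db_calls += 1
    return {"id": contest_id, "title": f"Contest {contest_id}"}


cache = FakeMemcache()


def get_contest(contest_id):
    key = f"contest:{contest_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                  # cache hit: no database work
    value = slow_query(contest_id)     # cache miss: query, then store
    cache.set(key, value, ttl=60)
    return value


first = get_contest(7)
second = get_contest(7)
print(db_calls)
```
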
  45. Response time from cache (s)

  46. [Chart: response time from cache (s), cache miss vs. cache hit]

  47. 3 orders of magnitude! [Chart: response time from cache (s), cache miss vs. cache hit]
  48. Serving globally • mysite.com • media.mysite.com [Diagram: Balancer, 2 Cache, Content distribution network]

  49. Bundling static media
    99designs.com/static/css/core.css
    99designs.com/static/css/contest.css
    99designs.com/static/css/marketing.css
    bundled as: 99designs.com/bundle/css/core,contest,marketing.css
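
    A minimal sketch of how a bundle URL like the one on this slide could be expanded into its member files and concatenated, so the browser makes one request instead of three. The mapping from /bundle/ to /static/ is inferred from the URLs shown; how 99designs actually implements bundling is not stated, and both function names are hypothetical.

```python
import posixpath


def bundle_members(bundle_path):
    """Expand '/bundle/css/a,b.css' into ['/static/css/a.css', '/static/css/b.css']."""
    directory, filename = posixpath.split(bundle_path)
    stem, ext = posixpath.splitext(filename)
    static_dir = directory.replace("/bundle/", "/static/", 1)
    return [f"{static_dir}/{name}{ext}" for name in stem.split(",")]


def build_bundle(bundle_path, read_file):
    # Concatenate the member files in the order given in the URL.
    return "\n".join(read_file(path) for path in bundle_members(bundle_path))


members = bundle_members("/bundle/css/core,contest,marketing.css")
print(members)
```

    Here `read_file` is any callable that returns a file's contents for a path, which keeps the sketch independent of where the static files live.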

  50. Difficulties
    • Norms for browsers and internet connections constantly change
    • Some strategies conflict with each other
    • Measure, measure, measure!
  51. Recap

  52. Thanks! @larsyencken