Scaling a web stack

A talk on scaling a typical web stack, based on lessons learned at 99designs, given to an RMIT Systems Architecture class.

Lars Yencken

April 29, 2013

Transcript

  1. Scaling a web stack
    Lars Yencken / Data Scientist / 99designs
    29 April, 2013


  2. 99designs
    Growing infrastructure
    Stability and robustness
    Performance
    Recap


  3. 99designs
    a.k.a. why you should listen



  13. [Chart: designs submitted, Jan-07 to Jan-13; vertical axis 0 to 900,000]



  16. $52,000,000 paid out to designers


  17. $52,000,000 paid out to designers
    30,000,000 visitors


  18. $52,000,000 paid out to designers
    30,000,000 visitors
    1,100,000,000 pageviews


  19. $52,000,000 paid out to designers
    30,000,000 visitors
    1,100,000,000 pageviews
    35,000,000,000 HTTP requests


  20. Growing infrastructure


  21. App


  22. App
    DB


  23. DB
    App
    App
    DNS round robin

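The DNS round robin labelled on this slide can be sketched in a few lines. This is only an illustration, not the 99designs setup: the `RoundRobinResolver` class, the hostname and the addresses are all hypothetical. Each lookup returns the same address list rotated by one place, so successive clients tend to land on different app servers.

```python
from collections import deque


class RoundRobinResolver:
    """Toy DNS resolver that rotates its answer list on every lookup."""

    def __init__(self, records):
        # records: hostname -> list of app server addresses (hypothetical)
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, hostname):
        addrs = self._records[hostname]
        answer = list(addrs)   # many clients simply use the first address
        addrs.rotate(-1)       # the next lookup starts with the next server
        return answer


resolver = RoundRobinResolver({"example.test": ["10.0.0.1", "10.0.0.2"]})
first = resolver.resolve("example.test")[0]
second = resolver.resolve("example.test")[0]
third = resolver.resolve("example.test")[0]
```

Rotation spreads load but knows nothing about server health, which is one reason the later slides add a balancer.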

  24. DB
    Cache
    App
    App
    reverse proxy layer

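A rough sketch of what a caching reverse proxy layer does, assuming a simple time-to-live policy. The `CachingProxy` class, the 60-second TTL and the stand-in app functions are hypothetical; a real deployment would use dedicated proxy software rather than application code like this.

```python
import time


class CachingProxy:
    """Toy reverse proxy: serve from cache when fresh, else ask an app server."""

    def __init__(self, backends, ttl_seconds=60.0, clock=time.monotonic):
        self._backends = backends      # list of callables: path -> body
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}               # path -> (expires_at, body)
        self._next = 0

    def get(self, path):
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no app server involved
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        body = backend(path)           # cache miss: forward the request
        self._cache[path] = (now + self._ttl, body)
        return body


calls = []


def make_app(name):
    def app(path):
        calls.append((name, path))
        return f"{name} rendered {path}"
    return app


proxy = CachingProxy([make_app("app1"), make_app("app2")])
page_a = proxy.get("/")
page_b = proxy.get("/")        # served from the cache
page_c = proxy.get("/about")   # a miss, forwarded to the second app server
```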

  25. DB
    Cache
    Queue
    Worker
    App
    App
    async task queue

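The async task queue on this slide can be illustrated with Python's standard `queue` and `threading` modules. This is an in-process sketch under simplifying assumptions: in the architecture shown, the queue and worker are separate components, and `handle_signup` and `send_welcome_email` are hypothetical names. The point is that the request handler only enqueues the slow work and returns immediately.

```python
import queue
import threading

tasks = queue.Queue()
sent = []


def worker():
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        func, args = task
        func(*args)
        tasks.task_done()


def send_welcome_email(address):
    # Stand-in for slow work that should not block a web request.
    sent.append(address)


def handle_signup(address):
    # The web request only enqueues the job, then responds straight away.
    tasks.put((send_welcome_email, (address,)))
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
status = handle_signup("designer@example.test")
tasks.put(None)   # ask the worker to stop
tasks.join()      # wait until every queued item has been processed
```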

  26. DB
    Cache
    Queue
    Worker
    App
    App
    optimized for latency
    optimized for throughput


  27. Cache
    Queue
    Memcache
    Worker
    App
    App
    DB


  28. Cache
    App
    App
    Cache
    App
    App
    App
    App
    Memcache
    Queue
    Worker
    remove single points of failure
    DB
    DB*


  29. Cache
    App
    App
    Cache
    App
    App
    App
    App
    Memcache
    Queue
    Worker
    Balancer
    DB
    add flexibility to the cache layer
    DB*


  30. Software as infrastructure


  31. “make recipes, not servers”

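One way to read "make recipes, not servers" is that a server is described by a declarative recipe that can be applied repeatedly with the same result. The sketch below is a toy model of that idea only: the talk does not name a tool here, and `apply_recipe`, the package names and the file path are all hypothetical.

```python
def apply_recipe(server, recipe):
    """Bring `server` (a dict of state) in line with `recipe`; return the changes made."""
    changes = []
    for package in recipe["packages"]:
        if package not in server["packages"]:
            server["packages"].add(package)
            changes.append(f"install {package}")
    for path, content in recipe["files"].items():
        if server["files"].get(path) != content:
            server["files"][path] = content
            changes.append(f"write {path}")
    return changes


# Hypothetical recipe for a web server role.
web_recipe = {
    "packages": ["web-server", "app-runtime"],
    "files": {"/etc/motd": "managed by recipe\n"},
}

fresh_server = {"packages": set(), "files": {}}
first_run = apply_recipe(fresh_server, web_recipe)
second_run = apply_recipe(fresh_server, web_recipe)   # already converged
```

Because the second run changes nothing, any number of disposable servers can be built from the same recipe.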

  32. • "Cloud" hosting on Amazon Web Services
    • Instead of a few highly tuned servers, run many disposable servers
    • A tradeoff that favours flexibility


  33. Challenges


  34. Stability and robustness


  35. Costs of instability
    • Lost customer business (direct & indirect)
    • Support burden and costs
    • Ops burden and costs


  36. Redundant servers
    Cache
    App
    App
    Cache
    App
    App
    App
    App
    Balancer

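Redundancy only helps if traffic stops going to a failed server. A minimal sketch of a balancer that skips servers marked as down, assuming health information arrives from some external check. The `Balancer` class and the server names are hypothetical.

```python
class Balancer:
    """Toy round-robin balancer that skips servers marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._next % len(self._servers)]
            self._next += 1
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers")


balancer = Balancer(["cache-a", "cache-b"])
balancer.mark_down("cache-a")
picks_while_down = [balancer.pick() for _ in range(3)]
balancer.mark_up("cache-a")
picks_after_recovery = {balancer.pick() for _ in range(4)}
```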

  37. Asynchronous tasks
    DB
    App
    App
    App
    App
    App
    App
    DB
    Queue
    Worker
    3rd party services

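Moving calls to third-party services into a worker means a slow or failing provider can be retried without holding up a web request. The sketch below shows one common retry policy, exponential backoff. It is an assumption rather than something the slide specifies, and `run_with_retries`, `ThirdPartyError` and `charge_card` are hypothetical names.

```python
import time


class ThirdPartyError(Exception):
    """Raised by the (hypothetical) client when the external service fails."""


def run_with_retries(task, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run task(); on ThirdPartyError wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return task()
        except ThirdPartyError:
            if attempt == attempts - 1:
                raise   # out of attempts; a real queue might park the job
            sleep(base_delay * 2 ** attempt)


# Simulated service that fails twice and then succeeds.
outcomes = iter([ThirdPartyError("timeout"), ThirdPartyError("timeout"), "charged"])


def charge_card():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays = []
result = run_with_retries(charge_card, sleep=delays.append)   # record delays instead of sleeping
```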

  38. Database replication
    App
    App
    App
    App
    App
    App
    DB
    Hot spare
    DB reader
    DB reader

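With read replicas in place, the application has to decide where each query goes. A minimal sketch, assuming writes go to the primary and plain `SELECT` statements are spread across the readers. The `ReplicatedDB` class and connection names are hypothetical, and the hot spare is not modelled.

```python
import itertools


class ReplicatedDB:
    """Toy router: reads rotate across replicas, everything else hits the primary."""

    def __init__(self, primary, readers):
        self.primary = primary
        self._readers = itertools.cycle(readers)

    def connection_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


db = ReplicatedDB("db-primary", ["db-reader-1", "db-reader-2"])
routes = [
    db.connection_for("SELECT * FROM contests"),
    db.connection_for("INSERT INTO designs VALUES (1)"),
    db.connection_for("select 1"),
]
```

Replicas can lag behind the primary, so a read that must see a write made moments earlier may still need to go to the primary.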

  39. Still difficult...
    • Testing failure tolerance between components: not trivial!
    • Avoiding correlated failures


  40. Correlated failures


  41. Performance


  42. Costs of slow sites
    • Less traffic (experiments by Yahoo, Microsoft and Google; search ranking)
    • Worse user experience
    • Higher hardware costs


  43. Caching
    DB
    Cache
    App
    App
    Cache
    App
    App
    App
    App
    DB
    Memcache


  44. Caching
    DB
    Cache
    App
    App
    Cache
    App
    App
    App
    App
    DB
    Memcache
    whole pages & images
    whole files on disk
    db queries & page fragments

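Caching database queries in Memcache usually follows a look-aside pattern: check the cache, fall back to the database on a miss, then store the result. The sketch below uses an in-memory stand-in instead of a real Memcache client, and `FakeMemcache`, `query_db` and `get_contest` are hypothetical names.

```python
class FakeMemcache:
    """In-memory stand-in exposing only get/set, for illustration."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


db_calls = []


def query_db(contest_id):
    # Stand-in for an expensive database query.
    db_calls.append(contest_id)
    return {"id": contest_id, "title": f"Contest {contest_id}"}


def get_contest(cache, contest_id):
    key = f"contest:{contest_id}"
    value = cache.get(key)
    if value is None:                 # miss: do the slow work once
        value = query_db(contest_id)
        cache.set(key, value)
    return value


cache = FakeMemcache()
first_read = get_contest(cache, 7)
second_read = get_contest(cache, 7)   # answered from the cache
```

The sketch leaves out expiry and invalidation, which a real cache needs so that stale results are not served after the underlying data changes.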

  45. Response time from cache (s)


  46. Response time from cache (s)
    cache miss
    cache hit


  47. 3 orders of magnitude!
    Response time from cache (s)
    cache miss
    cache hit

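A gap of three orders of magnitude means the average response time is dominated by misses until the hit ratio is very high. The arithmetic below is a sketch with assumed numbers: only the thousandfold gap comes from the slide, while the 1 ms hit and 1 s miss figures are illustrative.

```python
def mean_response_time(hit_ratio, hit_seconds, miss_seconds):
    """Expected response time for a given cache hit ratio."""
    return hit_ratio * hit_seconds + (1.0 - hit_ratio) * miss_seconds


HIT_SECONDS = 0.001    # assumed: 1 ms when served from cache
MISS_SECONDS = 1.0     # assumed: 1 s when the app must render the page

for ratio in (0.0, 0.5, 0.9, 0.99):
    mean = mean_response_time(ratio, HIT_SECONDS, MISS_SECONDS)
    print(f"{ratio:.0%} hits: mean {mean:.4f} s")
```

Under these assumptions, a 90% hit ratio still gives a mean of about 0.1 s, roughly a hundred times slower than a pure cache hit.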

  48. Serving globally
    Cache
    Cache
    Balancer
    Content distribution network
    mysite.com
    media.mysite.com

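Serving media from a separate hostname, as the slide's `media.mysite.com` label suggests, usually means rewriting asset URLs when pages are rendered. A small sketch, assuming static files live under a `/static/` path prefix. That prefix and the `asset_url` helper are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

MEDIA_HOST = "media.mysite.com"    # CDN-backed hostname from the slide
STATIC_PREFIXES = ("/static/",)    # assumed location of static media


def asset_url(url):
    """Point static asset URLs at the media host; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.path.startswith(STATIC_PREFIXES):
        parts = parts._replace(netloc=MEDIA_HOST)
    return urlunsplit(parts)


css_url = asset_url("https://mysite.com/static/css/core.css")
page_url = asset_url("https://mysite.com/contests/42")
```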

  49. Bundling static media
    99designs.com/static/css/core.css
    99designs.com/static/css/contest.css
    99designs.com/static/css/marketing.css
    99designs.com/bundle/css/core,contest,marketing.css

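The bundle URL on this slide replaces three stylesheet requests with one. A sketch of how such a bundle might be built, assuming the server simply concatenates the named files in order. The helper names and the CSS contents are hypothetical, and only the URL shape comes from the slide.

```python
def bundle_url(names, kind="css"):
    """Build a bundle path such as /bundle/css/core,contest,marketing.css."""
    return f"/bundle/{kind}/{','.join(names)}.{kind}"


def build_bundle(sources, names):
    """Concatenate the named stylesheets, in order, into one response body."""
    return "\n".join(sources[name] for name in names)


# Hypothetical stylesheet contents keyed by bundle name.
sources = {
    "core": "body { margin: 0 }",
    "contest": ".contest { color: #333 }",
    "marketing": ".hero { font-size: 2em }",
}
names = ["core", "contest", "marketing"]
url = bundle_url(names)
body = build_bundle(sources, names)
```

Order matters when concatenating CSS, since later rules override earlier ones, so the bundle keeps the order given in the URL.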

  50. Difficulties
    • Norms for browsers and internet connections constantly change
    • Some strategies conflict with each other
    • Measure, measure, measure!

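"Measure, measure, measure!" can start as simply as timing an operation repeatedly and looking at the spread, not just the average. A minimal sketch using only the standard library. The `time_calls` and `summarize` helpers and the workload being timed are hypothetical.

```python
import statistics
import time


def time_calls(func, runs=50):
    """Call func `runs` times and return the elapsed seconds for each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report the median, 95th percentile and worst case of the samples."""
    cuts = statistics.quantiles(samples, n=100)   # 99 percentile cut points
    return {
        "median": statistics.median(samples),
        "p95": cuts[94],
        "max": max(samples),
    }


summary = summarize(time_calls(lambda: sum(range(10_000))))
```

Percentiles matter because a handful of slow requests can hide behind a healthy-looking mean.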

  51. Recap


  52. Thanks!
    @larsyencken
