Do you scale?

When you hit your first scaling problem with a web application, it's likely to be one of the few common issues examined and solved in this presentation.

Andrew Betts

April 20, 2012

Transcript

  1. Scaling web apps. Design patterns for mega traffic on a budget
  2. Loading, please wait Estimated time remaining: ages.

  3. Crikey. We hadn't quite counted on welcoming quite so many of you to OnOneMap in one go, and to be quite honest we've completely run out of oomph. Please do come back tomorrow, and we promise to make you a lovely cup of tea to make up for not being quite on top form today.
  4. Two problems
     • Performance: You need to make your app more efficient
     • Scaling: You need to increase capacity
  5. performance != scaling

  6. but you need both

  7. Vertical scaling (diagram: weedy server, powerful server, Deep Thought). Easy, expensive, limited.
  8. Horizontal scaling Cheap, limitless. HARD.

  9. It's obvious, really…

  10. Three topics
      • 1 Sessions: Recognise your reader
      • 2 Caching: Love being lazy
      • 3 Writing: Learn to feel the pain
  11. Fear sessions, and you will scale well. The master Jedi plans ahead.
  12. None
  13. Scenario
      • Lots of non-personalised content (newspaper, blog, web store)
      • Some minor session-based data (eg. 'Welcome Andrew')
  14. Not cool. <?php session_start(); ?>

  15. Expires: Thu, 19 Nov 1981 08:52:00 GMT (Sascha Schumann's birthday)
      Cache-Control: post-check=0, must-revalidate, no-store, no-cache, pre-check=0
      Last-Modified: not present
      ETag: not present
      Set-Cookie: path=/; phpsessid=b7977f7c69eb898bf42526652dda4c6c
      BAD BAD BAD BAD
  16. Client → Server 1, session b7977f7c69eb898bf42526652dda4c6c
      name: Andrew
      email: andrew.betts@assanka.net
      logindate: 2009-02-28 20:42:12
      userid: 453245
  17. Client → Server 2, session b7977f7c69eb898bf42526652dda4c6c
      unknown session

  18. Defeat, summarised.
      • Can't cache it
      • Need sessions everywhere
      • Sessions are lost if you switch server
      • Session-enabled requests are processed sequentially, due to file locking
      • Nightmare.
  19. 1 Use JavaScript to inject session state

  20. Solution
      • Generate only generic content
      • Leave gaps (login status, shopping basket etc)
      • Load session data from somewhere else
      • Merge in browser using magic (or JavaScript)
  21. <head>
      <script type='text/javascript' src='/js/session.js'></script>
      <script type='text/javascript' src='http://sessions.example.com/sessiondata'></script>

      loadSession({
        un:'Andrew',
        em:'andrew.betts@assanka.net',
        cartid:824457,
        cart:[
          {id:12, desc:'Coal', qty:1, units:'lumps'},
          {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
          {id:82, desc:'Socks', qty:8, units:'pairs'}
        ]
      });
  22. Result.
      • Most scripts don't need to track sessions
      • You can cache stuff (even use a CDN)
      • Cache it for ages. Reduce load on your kit.
      • Sessions become a separate issue - build a scalable session store on a separate vhost / machine / cluster
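A minimal sketch of what the `loadSession` function from slide 21 might look like, assuming the generic page leaves placeholder elements with the ids `greeting` and `basket` (both ids, the wording of the fragments, and the helper `renderSessionFragments` are assumptions for illustration, not part of the talk):

```javascript
// Hypothetical helper: turns the session payload into the text for each
// gap left in the generic, cacheable page.
function renderSessionFragments(session) {
  const items = session.cart || [];
  const totalQty = items.reduce((sum, item) => sum + item.qty, 0);
  return {
    greeting: session.un ? 'Welcome ' + session.un : 'Log in',
    basket: totalQty + (totalQty === 1 ? ' item' : ' items') + ' in your basket',
  };
}

// Called by the script served from the session host, as on slide 21.
function loadSession(session) {
  const fragments = renderSessionFragments(session);
  if (typeof document === 'undefined') return fragments; // not in a browser

  const apply = () => {
    for (const [id, text] of Object.entries(fragments)) {
      const el = document.getElementById(id); // assumed placeholder ids
      if (el) el.textContent = text;
    }
  };
  // The call arrives from a script in <head>, so the placeholders may not
  // exist yet; wait for the DOM if it is still loading.
  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', apply);
  } else {
    apply();
  }
  return fragments;
}
```

The page itself stays identical for every visitor, so it can be cached; only the small session script is per-user.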
  23. Hmm.

  24. OK, so….
      • Your pages are mostly dynamic content (webmail, identity manager etc)
      • Almost entire page content is session-specific
  25. 2 Use cookies for client-side session storage

  26. Solution
      • Don't use server sessions at all
      • Store all session state data in a cookie
      • Sign it with a hash (sha1)
      • Timestamp allows you to expire it
      • You can get a lot in there
      • Remember it's not encrypted on the wire, and adds to your bandwidth
      27478932510|triblondon|1231936510|2,4,6,52,183|a152c24d9874ba15235f
      userid | username | sessionstart | groupmemberships | signature
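The cookie layout above could be produced and checked roughly as follows. This is a sketch under stated assumptions, not the speaker's implementation: it runs on Node.js, it swaps the slide's plain sha1 hash for an HMAC-SHA-256 keyed with a server-side secret, and the names `SECRET`, `encodeSession` and `decodeSession` are hypothetical. It also assumes usernames never contain the `|` separator.

```javascript
const crypto = require('node:crypto');

// Assumption: a secret known only to the servers. Load it from config in practice.
const SECRET = 'change-me';

function sign(payload) {
  return crypto.createHmac('sha256', SECRET).update(payload).digest('hex');
}

// Builds userid|username|sessionstart|groupmemberships|signature.
function encodeSession(userId, username, sessionStart, groups) {
  const payload = [userId, username, sessionStart, groups.join(',')].join('|');
  return payload + '|' + sign(payload);
}

// Returns the session object, or null if the cookie is malformed,
// tampered with, or older than maxAgeSeconds. Times are Unix seconds.
function decodeSession(cookie, now, maxAgeSeconds) {
  const parts = cookie.split('|');
  if (parts.length !== 5) return null;
  const signature = Buffer.from(parts.pop(), 'utf8');
  const expected = Buffer.from(sign(parts.join('|')), 'utf8');
  // timingSafeEqual throws on unequal lengths, so check length first.
  if (signature.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(signature, expected)) return null;

  const [userId, username, sessionStart, groups] = parts;
  if (now - Number(sessionStart) > maxAgeSeconds) return null;
  return {
    userId,
    username,
    sessionStart: Number(sessionStart),
    groups: groups ? groups.split(',').map(Number) : [],
  };
}
```

Any server holding the secret can validate the cookie without a shared session store, which is the point of the pattern. The data is signed, not encrypted, so nothing confidential belongs in it.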
  27. Other scalable session solutions
      • memcached (php.net/memcache)
        – Performs well, scales nicely.
        – All the cool kids are doing it
      • Sticky sessions (Varnish / Squid)
        – Or redirect-and-stick, ie: www.example.com -> (302) -> www4.example.com
        – But doesn't work for some apps (Wordpress)
      • Database sessions
        – Bit pointless. Definitely not cool.
  28. Caching. Not using intelligence when stupidity will do just fine.

  29. Scenario
      • Your CSS/JS/images don't change often, so users should cache them
      • But when they do change, you want everyone to flush their cache, else the site will stop working.
  30. Not cool.
      <?php
      header('Expires: Sat, 26 Jul 1997 05:00:00 GMT');
      header('Last-Modified: ' . gmdate('D, d M Y H:i:s') . ' GMT');
      header('Cache-Control: no-store, no-cache, must-revalidate');
      header('Cache-Control: post-check=0, pre-check=0', false);
      header('Pragma: no-cache');
      ?>
  31. 3 Add query strings to enable far-future caching

  32. Solution
      • /lib/img/my_website_header.png
        Expires: Sun, 17 Jan 2038 19:26:00 GMT
      But these are not the same object:
      • /lib/img/my_website_header.png?v=2
      • /lib/img/my_website_header.png?v=3
      • /lib/img/my_website_header.png?v=4
  33. Result.
      • Changing the filename or adding a query string will cause all browsers to re-request the file.
      • All the benefits of long term caching
      • No update latency
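One way to apply the `?v=` trick consistently is to build every asset URL through a small helper. The sketch below assumes a hand-maintained version map that is bumped at deploy time; `ASSET_VERSIONS` and `assetUrl` are hypothetical names, and the version number is illustrative.

```javascript
// Hypothetical map of asset path -> current version, bumped on deploy.
const ASSET_VERSIONS = {
  '/lib/img/my_website_header.png': 3,
};

// Appends the current version as a query string so that a changed asset
// gets a new URL, while unchanged assets keep their far-future cache entry.
function assetUrl(path, versions = ASSET_VERSIONS) {
  const version = versions[path];
  if (version === undefined) return path; // unversioned asset: leave as-is
  const separator = path.includes('?') ? '&' : '?';
  return path + separator + 'v=' + encodeURIComponent(version);
}
```

Templates then emit `assetUrl('/lib/img/my_website_header.png')` instead of the bare path, and bumping the number in one place invalidates every cached copy.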
  34. None
  35. 4 Use a CDN for huge capacity, low cost, and no hassle
  36. Solution
      • Choose a reverse proxy CDN
      • Put some thought into these headers
        – Expires:
        – Cache-control:
        – Last-Modified:
        – Content-Length:
        – Etag:
      • then offload your traffic …
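As an illustration of setting those five headers deliberately, here is a sketch for Node.js. The function name `cacheHeaders`, the choice of `public, max-age` for Cache-Control, and deriving the ETag from a sha1 of the body are all assumptions for the example; the talk does not prescribe specific values.

```javascript
const crypto = require('node:crypto');

// Builds the cache-related response headers for a string body.
// lastModified and now are Date objects; maxAgeSeconds is the cache lifetime.
function cacheHeaders(body, lastModified, maxAgeSeconds, now = new Date()) {
  const bytes = Buffer.from(body, 'utf8');
  // Assumption: a content hash is an acceptable validator for this resource.
  const etag = '"' + crypto.createHash('sha1').update(bytes).digest('hex') + '"';
  return {
    'Expires': new Date(now.getTime() + maxAgeSeconds * 1000).toUTCString(),
    'Cache-Control': 'public, max-age=' + maxAgeSeconds,
    'Last-Modified': lastModified.toUTCString(),
    'Content-Length': String(bytes.length),
    'ETag': etag,
  };
}
```

With a validator (Last-Modified or ETag) and a Content-Length present, a reverse proxy CDN can both cache the response and revalidate it cheaply once it goes stale.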
  37. CDN providers
      • Velocix - 500GB/mo, free http://www.velocix.com
      • Edgecast - 1TB/mo, $500 http://www.edgecast.com
      • Limelight - 1TB/mo, $1000 http://www.limelight.com
      • Amazon Cloudfront - 1TB/mo, $400 (ish!) http://aws.amazon.com (cheap for big files)
      Note: I have no affiliation with any of these providers
  38. Are you cachable? • http://www.ircache.net/cgi-bin/cacheability.py

  39. It's not hard. Or is it?
      • yahoo.co.uk - server clock is wrong
      • microsoft.com - sends malformed headers
      • timesonline.com - no cache control
      • digg.com - Has PHP's 19 Nov 1981 expiry date
      • msn.co.uk - Two redirects, no caching
      • gumtree.com - tries to cache for 10 mins, but has no validator or content length
  40. Overburden your site with writes, and you're going nowhere fast.

  41. Scenario
      • Your app runs on a single server / shared host
      • You connect to a database using some kind of DB abstraction class / framework
  42. Doesn't scale.

  43. Scales. (a bit)

  44. 5 Splitting database connections for easier scaling later

  45. Solution
      • Always plan for your write queries to go somewhere different to your reads.
        – Even if they won't in the immediate future
      • And assume that writes take a non-negligible amount of time to become readable.
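In code, planning for the split can be as small as routing reads and writes through two separate handles from day one. The sketch below is illustrative only: `SplitDatabase` is a hypothetical wrapper, and the connection objects are assumed to expose a `query(sql, params)` method, which is not any particular driver's API.

```javascript
// Hypothetical wrapper that keeps read and write traffic on separate handles.
class SplitDatabase {
  // reader and writer are connection objects assumed to expose query(sql, params).
  // On a single server both arguments can be the same connection.
  constructor(reader, writer) {
    this.reader = reader;
    this.writer = writer;
  }

  // SELECTs go to the read handle (later: a replica or a pool of replicas).
  read(sql, params = []) {
    return this.reader.query(sql, params);
  }

  // INSERT/UPDATE/DELETE go to the write handle (later: the master).
  write(sql, params = []) {
    return this.writer.query(sql, params);
  }
}
```

Application code that already calls `read` and `write` separately needs no changes when a replica is added later. It should still not assume that a row just written is immediately visible to `read`, since replication takes time.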
  46. Scenario
      • 'Most viewed/emailed' widgets
      • Thinking about doing this?
      Obligatory BBC News Online Screenshot.
      UPDATE content SET viewcount=viewcount+1 WHERE contentid=5309342;
      NOT COOL
  47. • You're writing on every page load!
      • Low read:write ratio
      • High page generation overhead
      • Can't cache.
      • Disaster.
  48. 8 Using hosted analytics to avoid logging

  49. Solution
      • You want to optimise for reads.
      • You don't really need all this data. Just the aggregated results.
      • So let someone else do it!
  50. Solution
      • Hosted analytics:
        – Google Analytics (free), SiteIntelligence, Webtrends
      • But what about AJAX / downloads / outbound links / JavaScript actions?
      <a href="http://www.example.com" onClick="javascript: pageTracker._trackPageview('/outgoing/example.com');">
  51. Scenario
      • Script reads from cache, or regenerates and stores in cache if cache is stale
      • At the moment the cache expires, lots of threads try to write to it at the same time.
      • Evil writes kill your web server.
  52. 6 Prep content in advance to avoid cache slamming

  53. Solution
      • Use a separate process to write to the cache, periodically or event driven, but not triggered by web requests.
      • Scripts handling HTTP requests never write
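The division of labour could look something like this. It is a sketch only: a JavaScript `Map` stands in for the real shared cache (memcached, files on disk, etc.), and `refreshCache` and `handleRequest` are hypothetical names.

```javascript
// Run by a separate worker (cron job, queue consumer), never by a web request.
// generate is the expensive function that builds the content.
function refreshCache(store, key, generate) {
  store.set(key, { body: generate(), generatedAt: Date.now() });
}

// Request path: read-only. It serves whatever is in the cache, however old,
// and never regenerates, so an expiry cannot trigger a stampede of writers.
function handleRequest(store, key) {
  const entry = store.get(key);
  return entry ? entry.body : null; // null: caller serves a fallback page
}
```

The worker would call something like `refreshCache(store, 'most-viewed', buildMostViewed)` every few minutes or when content changes, where `buildMostViewed` is whatever expensive query the page needs. However many requests arrive at once, the expensive work runs exactly once per refresh.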
  54. Quick recap
      • Sessions: Try JavaScript injection, cookie-stored session data, sticky sessions, memcached.
      • Caching: Use a CDN and far-future caching
      • Writes: Split reads and writes, reduce writes, use hosted analytics, prep content on a schedule
  55. It scales.

  56. Thanks
      • andrew.betts@assanka.net
      • We're hiring, blah blah blah.
      • http://www.assanka.net/jobs
      • http://www.flickr.com/photos/57158820@N00/920872985
      • http://www.flickr.com/photos/exotictransport/163976659/
      • http://www.flickr.com/photos/erazmilic/178574918/