
Do you scale?

When you hit your first scaling problem with a web application, it’s likely to be one of a few common issues examined and solved in this presentation.

Andrew Betts

April 20, 2012


Transcript

  1. Crikey. We hadn't quite counted on welcoming quite so many of you to OnOneMap in one go, and to be quite honest we've completely run out of oomph. Please do come back tomorrow, and we promise to make you a lovely cup of tea to make up for not being quite on top form today.
  2. Two problems
    • Performance: You need to make your app more efficient
    • Scaling: You need to increase capacity
  3. Scenario
    • Lots of non-personalised content (newspaper, blog, web store)
    • Some minor session-based data (eg. 'Welcome Andrew')
  4. Expires: Thu, 19 Nov 1981 08:52:00 GMT (Sascha Schumann's Birthday)
    Cache-Control: post-check=0, must-revalidate, no-store, no-cache, pre-check=0
    Last-Modified: not present
    ETag: not present
    Set-Cookie: path=/; phpsessid=b7977f7c69eb898bf42526652dda4c6c
    BAD BAD BAD BAD
  5. Defeat, summarised.
    • Can't cache it
    • Need sessions everywhere
    • Sessions are lost if you switch server
    • Session-enabled requests are processed sequentially, due to file locking
    • Nightmare.
  6. Solution
    • Generate only generic content
    • Leave gaps (login status, shopping basket etc)
    • Load session data from somewhere else
    • Merge in browser using magic (or JavaScript)
  7. <head>
    <script type='text/javascript' src='/js/session.js'></script>
    <script type='text/javascript' src='http://sessions.example.com/sessiondata'></script>
    loadSession({
      un:'Andrew', em:'[email protected]', cartid:824457,
      cart:[
        {id:12, desc:'Coal', qty:1, units:'lumps'},
        {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
        {id:82, desc:'Socks', qty:8, units:'pairs'}
      ]
    });
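The slide does not show what /js/session.js contains, so the following is only a minimal sketch of how the merge step might work. The field names (un, cart, qty) come from the slide; the helper renderSession and the element ids 'greeting' and 'basket' are assumptions made up for illustration.

```javascript
// Hypothetical contents of /js/session.js (not shown in the deck).

// Pure helper: turn the session payload into the text that fills the gaps
// left in the cached, generic page.
function renderSession(session) {
  const items = session.cart || [];
  const totalQty = items.reduce((sum, item) => sum + item.qty, 0);
  return {
    greeting: session.un ? 'Welcome ' + session.un : 'Log in',
    basket: items.length + ' item(s), ' + totalQty + ' unit(s) in your basket',
  };
}

// Called by the script served from the session host.
function loadSession(session) {
  const text = renderSession(session);
  // Outside a browser (for example under Node) just return the text.
  if (typeof document === 'undefined') return text;
  // The scripts sit in <head>, so wait until the body has been parsed.
  document.addEventListener('DOMContentLoaded', function () {
    // Assumed: the page contains empty elements with ids 'greeting' and 'basket'.
    for (const id of Object.keys(text)) {
      const el = document.getElementById(id);
      if (el) el.textContent = text[id];
    }
  });
  return text;
}
```

Keeping the formatting in a pure function means the page itself stays identical for every visitor, which is what makes it cacheable.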
  8. Result.
    • Most scripts don't need to track sessions
    • You can cache stuff (even use a CDN)
    • Cache it for ages. Reduce load on your kit.
    • Sessions become a separate issue - build a scalable session store on a separate vhost / machine / cluster
  9. OK, so…
    • Your pages are mostly dynamic content (webmail, identity manager etc)
    • Almost entire page content is session-specific
  10. Solution
    • Don't use server sessions at all
    • Store all session state data in a cookie
    • Sign it with a hash (sha1)
    • Timestamp allows you to expire it
    • You can get a lot in there
    • Remember it's not encrypted on the wire, and adds to your bandwidth
    27478932510|triblondon|1231936510|2,4,6,52,183|a152c24d9874ba15235f
    userid | username | sessionstart | groupmemberships | signature
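As a rough illustration of the signed cookie layout above, here is a sketch for Node.js. It swaps the slide's plain sha1 hash for HMAC-SHA1 with a server-side secret, which is an assumption on my part rather than something the deck specifies. The names SECRET, MAX_AGE_SECONDS, makeCookie and readCookie are hypothetical, and the sketch assumes usernames never contain the '|' separator.

```javascript
const crypto = require('crypto');

// Assumed: a secret that lives only on the servers.
const SECRET = 'replace-with-a-long-random-secret';
// Assumed session lifetime of one hour.
const MAX_AGE_SECONDS = 3600;

// HMAC-SHA1 of the payload, hex encoded (40 characters).
function sign(payload) {
  return crypto.createHmac('sha1', SECRET).update(payload).digest('hex');
}

// Build "userid|username|sessionstart|groupmemberships|signature".
function makeCookie(userId, username, groups, nowSeconds) {
  const payload = [userId, username, nowSeconds, groups.join(',')].join('|');
  return payload + '|' + sign(payload);
}

// Return the session fields, or null if the cookie is tampered with or expired.
function readCookie(cookie, nowSeconds) {
  const cut = cookie.lastIndexOf('|');
  if (cut < 0) return null;
  const payload = cookie.slice(0, cut);
  const given = Buffer.from(cookie.slice(cut + 1));
  const expected = Buffer.from(sign(payload));
  // timingSafeEqual throws on unequal lengths, so check the length first.
  if (given.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(given, expected)) return null;
  const [userId, username, start, groups] = payload.split('|');
  if (nowSeconds - Number(start) > MAX_AGE_SECONDS) return null;
  return {
    userId,
    username,
    start: Number(start),
    groups: groups ? groups.split(',').map(Number) : [],
  };
}
```

Any web server holding the secret can validate the cookie on its own, so no shared session store is needed.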
  11. Other scalable session solutions
    • memcached (php.net/memcache)
      – Performs well, scales nicely.
      – All the cool kids are doing it
    • Sticky sessions (Varnish / Squid)
      – Or redirect-and-stick, ie: www.example.com -> (302) -> www4.example.com
        But doesn't work for some apps (Wordpress)
    • Database sessions
      – Bit pointless. Definitely not cool.
  12. Scenario
    • Your CSS/JS/images don't change often, so users should cache them
    • But when they do change, you want everyone to flush their cache, else the site will stop working.
  13. Not cool.
    <?php
    header('Expires: Sat, 26 Jul 1997 05:00:00 GMT');
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s') . ' GMT');
    header('Cache-Control: no-store, no-cache, must-revalidate');
    header('Cache-Control: post-check=0, pre-check=0', false);
    header('Pragma: no-cache');
    ?>
  14. Solution
    • /lib/img/my_website_header.png
      Expires: Sun, 17 Jan 2038 19:26:00 GMT
    But these are not the same object:
    • /lib/img/my_website_header.png?v=2
    • /lib/img/my_website_header.png?v=3
    • /lib/img/my_website_header.png?v=4
  15. Result.
    • Changing the filename or adding a query string will cause all browsers to re-request the file.
    • All the benefits of long term caching
    • No update latency
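One way to generate such versioned URLs is sketched below for Node.js. The deck uses a simple counter (?v=2, ?v=3); this sketch instead derives the version from a hash of the file contents, which is a substitution on my part, so the URL changes exactly when the file does. The function names and the max-age value are assumptions; the Expires date is the one from the slide.

```javascript
const crypto = require('crypto');

// Short token derived from the file contents (first 8 hex chars of sha1).
function versionToken(contents) {
  return crypto.createHash('sha1').update(contents).digest('hex').slice(0, 8);
}

// Append the token as a query string, as on the slide.
function versionedUrl(path, contents) {
  return path + '?v=' + versionToken(contents);
}

// Headers to send with a versioned asset so browsers and proxies keep it.
function farFutureHeaders() {
  return {
    'Expires': 'Sun, 17 Jan 2038 19:26:00 GMT', // date taken from the slide
    'Cache-Control': 'public, max-age=31536000', // one year, an assumed value
  };
}

// Example: two revisions of the same image get two different URLs.
const urlA = versionedUrl('/lib/img/my_website_header.png', 'old image bytes');
const urlB = versionedUrl('/lib/img/my_website_header.png', 'new image bytes');
console.log(urlA !== urlB); // true
```

In practice the contents would come from reading the file at build or deploy time, so the hashing cost is not paid per request.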
  16. Solution
    • Choose a reverse proxy CDN
    • Put some thought into these headers
      – Expires:
      – Cache-Control:
      – Last-Modified:
      – Content-Length:
      – ETag:
    • then offload your traffic …
  17. CDN providers
    • Velocix - 500GB/mo, free http://www.velocix.com
    • Edgecast - 1TB/mo, $500 http://www.edgecast.com
    • Limelight - 1TB/mo, $1000 http://www.limelight.com
    • Amazon Cloudfront - 1TB/mo, $400 (ish!) http://aws.amazon.com (cheap for big files)
    Note: I have no affiliation with any of these providers
  18. It's not hard. Or is it?
    • yahoo.co.uk - server clock is wrong
    • microsoft.com - sends malformed headers
    • timesonline.com - no cache control
    • digg.com - Has PHP's 19 Nov 1981 expiry date
    • msn.co.uk - Two redirects, no caching
    • gumtree.com - tries to cache for 10 mins, but has no validator or content length
  19. Scenario
    • Your app runs on a single server / shared host
    • You connect to a database using some kind of DB abstraction class / framework
  20. Solution
    • Always plan for your write queries to go somewhere different to your reads.
      – Even if they won't in the immediate future
    • And assume that writes take a non-negligible amount of time to become readable.
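A DB abstraction class that plans for this split could look roughly like the sketch below. The Database class and the fakeConn stand-in are hypothetical; real connection objects are only assumed to expose a query(sql, params) method.

```javascript
// Hypothetical wrapper: callers must say whether a query reads or writes.
class Database {
  constructor(writeConn, readConn) {
    this.writeConn = writeConn;
    // On a single server both roles share one connection.
    this.readConn = readConn || writeConn;
  }

  // SELECTs go to the read connection (a replica, once you have one).
  read(sql, params) {
    return this.readConn.query(sql, params);
  }

  // INSERT / UPDATE / DELETE go to the write connection.
  write(sql, params) {
    return this.writeConn.query(sql, params);
  }
}

// Stand-in connection that just records the SQL it is given.
function fakeConn(name) {
  const log = [];
  return {
    name,
    log,
    query(sql, params = []) {
      log.push(sql);
      return Promise.resolve([]);
    },
  };
}

// Today: one server. Later: new Database(primary, replica), with no changes
// to the calling code.
const db = new Database(fakeConn('only-server'));
db.write('UPDATE content SET title = ? WHERE contentid = ?', ['Hi', 1]);
db.read('SELECT title FROM content WHERE contentid = ?', [1]);
```

Because replication takes time, code written against this interface should not assume that a read issued straight after a write will see the new value.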
  21. Scenario
    • 'Most viewed/emailed' widgets
    • Thinking about doing this?
    Obligatory BBC News Online Screenshot.
    UPDATE content SET viewcount=viewcount+1 WHERE contentid=5309342;
    NOT COOL
  22. • You're writing on every page load!
    • Low read:write ratio
    • High page generation overhead
    • Can't cache.
    • Disaster.
  23. Solution
    • You want to optimise for reads.
    • You don't really need all this data. Just the aggregated results.
    • So let someone else do it!
  24. Solution
    • Hosted analytics:
      – Google Analytics (free), SiteIntelligence, Webtrends
    • But what about AJAX / downloads / outbound links / JavaScript actions?
    <a href="http://www.example.com" onClick="javascript: pageTracker._trackPageview('/outgoing/example.com');">
  25. Scenario
    • Script reads from cache, or regenerates and stores in cache if cache is stale
    • At the moment the cache expires, lots of threads try to write to it at the same time.
    • Evil writes kill your web server.
  26. Solution
    • Use a separate process to write to the cache, periodically or event driven, but not triggered by web requests.
    • Scripts handling HTTP requests never write
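The division of labour described here can be sketched as follows. This is an illustration only: a Map stands in for the shared cache (in a real deployment the writer and the web processes would share something like memcached or files), and the names refreshCache, handleRequest and the 'frontpage' key are hypothetical.

```javascript
// In-memory stand-in for the shared cache.
const cache = new Map();

// Runs in the separate writer process (cron job, daemon or event consumer).
// buildFrontPage is a hypothetical, expensive page generator.
function refreshCache(buildFrontPage) {
  cache.set('frontpage', { body: buildFrontPage(), builtAt: Date.now() });
}

// Runs for every HTTP request: read only, never regenerate.
function handleRequest() {
  const entry = cache.get('frontpage');
  if (!entry) {
    // Nothing built yet: fail fast instead of triggering a rebuild.
    return { status: 503, body: 'Warming up, please retry shortly' };
  }
  return { status: 200, body: entry.body };
}
```

However many requests arrive when the content goes stale, the expensive generator runs once per refresh rather than once per request, so there is no stampede of simultaneous writes.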
  27. Quick recap
    • Sessions: Try JavaScript injection, cookie-stored session data, sticky sessions, memcached.
    • Caching: Use a CDN and far-future caching
    • Writes: Split reads and writes, reduce writes, use hosted analytics, prep content on a schedule
  28. Thanks
    • [email protected]
    • We're hiring, blah blah blah.
    • http://www.assanka.net/jobs
    • http://www.flickr.com/photos/57158820@N00/920872985
    • http://www.flickr.com/photos/exotictransport/163976659/
    • http://www.flickr.com/photos/erazmilic/178574918/