Slide 1

Slide 1 text

Scaling web apps
Design patterns for mega traffic on a budget

Slide 2

Slide 2 text

Loading, please wait…
Estimated time remaining: ages.

Slide 3

Slide 3 text

Crikey. We hadn't quite counted on welcoming quite so many of you to OnOneMap in one go, and to be quite honest we've completely run out of oomph. Please do come back tomorrow, and we promise to make you a lovely cup of tea to make up for not being quite on top form today.

Slide 4

Slide 4 text

Two problems
• Performance: you need to make your app more efficient
• Scaling: you need to increase capacity

Slide 5

Slide 5 text

performance != scaling

Slide 6

Slide 6 text

but you need both

Slide 7

Slide 7 text

Vertical scaling
Weedy server -> Powerful server -> Deep Thought
Easy, expensive, limited

Slide 8

Slide 8 text

Horizontal scaling
Cheap, limitless. HARD.

Slide 9

Slide 9 text

It's obvious, really…

Slide 10

Slide 10 text

1 Sessions: recognise your reader
2 Caching: love being lazy
3 Writing: learn to feel the pain

Slide 11

Slide 11 text

Fear sessions, and you will scale well. The master Jedi plans ahead.

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Scenario
• Lots of non-personalised content (newspaper, blog, web store)
• Some minor session-based data (e.g. 'Welcome Andrew')

Slide 14

Slide 14 text

Not cool.

Slide 15

Slide 15 text

Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: post-check=0, must-revalidate, no-store, no-cache, pre-check=0
Last-Modified: not present
ETag: not present
Set-Cookie: PHPSESSID=b7977f7c69eb898bf42526652dda4c6c; path=/
BAD BAD BAD BAD
(That Expires date is Sascha Schumann's birthday.)

Slide 16

Slide 16 text

Client -> Server 1: session ID b7977f7c69eb898bf42526652dda4c6c
Server 1 finds the session:
name: Andrew
email: [email protected]
logindate: 2009-02-28 20:42:12
userid: 453245

Slide 17

Slide 17 text

Client -> Server 2: session ID b7977f7c69eb898bf42526652dda4c6c
Server 2: unknown session

Slide 18

Slide 18 text

Defeat, summarised.
• Can't cache it
• Need sessions everywhere
• Sessions are lost if you switch server
• Session-enabled requests from the same user are processed one at a time, because PHP's default file-based session handler locks the session file
• Nightmare.

Slide 19

Slide 19 text

1 Use JavaScript to inject session state

Slide 20

Slide 20 text

Solution
• Generate only generic content
• Leave gaps (login status, shopping basket etc.)
• Load session data from somewhere else
• Merge in the browser using magic (or JavaScript)

Slide 21

Slide 21 text

loadSession({un:'Andrew', em:'[email protected]', cartid:824457, cart:[
  {id:12, desc:'Coal', qty:1, units:'lumps'},
  {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
  {id:82, desc:'Socks', qty:8, units:'pairs'}
]});
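A minimal server-side sketch of where a loadSession(...) call like the one above could come from, assuming a Node.js backend (the deck's own examples are PHP). The names renderSessionScript and sessionResponseHeaders are hypothetical, and the browser-side loadSession() that fills the gaps isn't shown. The point is that only this tiny response is per-user and uncacheable; the page around it can be cached anywhere.

```javascript
// Sketch: body of the per-user script the cached page pulls in with
// something like <script src="/session.js"></script> (URL hypothetical).
function renderSessionScript(session) {
  // JSON.stringify handles quoting; escaping "<" stops a value such as
  // "</script>" from closing a script tag if the output is ever inlined.
  const json = JSON.stringify(session).replace(/</g, "\\u003c");
  return "loadSession(" + json + ");";
}

// Unlike the generic page, this response must never be shared by
// proxies or a CDN, so it is marked private and not stored.
function sessionResponseHeaders(body) {
  return {
    "Content-Type": "application/javascript; charset=utf-8",
    "Cache-Control": "private, no-store",
    "Content-Length": String(Buffer.byteLength(body)),
  };
}

// Example session record shaped like the payload above.
const body = renderSessionScript({
  un: "Andrew",
  cartid: 824457,
  cart: [{ id: 12, desc: "Coal", qty: 1, units: "lumps" }],
});
console.log(body);
// → loadSession({"un":"Andrew","cartid":824457,"cart":[{"id":12,"desc":"Coal","qty":1,"units":"lumps"}]});
```

The generic page and this script can then have completely different caching policies: far-future on the page, no caching at all on the session script.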

Slide 22

Slide 22 text

Result.
• Most scripts don't need to track sessions
• You can cache pages (even on a CDN)
• Cache them for ages and reduce the load on your kit
• Sessions become a separate issue: build a scalable session store on a separate vhost / machine / cluster

Slide 23

Slide 23 text

Hmm.

Slide 24

Slide 24 text

OK, so…
• Your pages are mostly dynamic content (webmail, identity manager etc.)
• Almost the entire page content is session-specific

Slide 25

Slide 25 text

2 Use cookies for client-side session storage

Slide 26

Slide 26 text

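Solution
• Don't use server sessions at all
• Store all session state data in a cookie
• Sign it with a hash (sha1) so it can't be tampered with
• Include a timestamp so you can expire it
• You can get a lot in there (RFC 6265 asks browsers to allow at least 4096 bytes per cookie)
• Remember it's signed, not encrypted: anyone who sees it on the wire (including the user) can read it, and it's sent with every request, adding to your bandwidth
27478932510|triblondon|1231936510|2,4,6,52,183|a152c24d9874ba15235f
userid | username | sessionstart | groupmemberships | signature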
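A sketch of signing and checking a cookie in the format above, assuming Node.js's built-in crypto module. It uses HMAC-SHA1 rather than a bare sha1 of secret plus data (a plain hash of secret-then-data is open to length-extension attacks), so its signature is 40 hex characters, longer than the one shown on the slide. SECRET, MAX_AGE_SECONDS and the function names are hypothetical, and it assumes usernames never contain "|".

```javascript
const crypto = require("crypto");

// Hypothetical server-side secret; never sent to the browser.
const SECRET = "change-me";
// Assumed session lifetime, checked against the sessionstart timestamp.
const MAX_AGE_SECONDS = 3600;

function sign(payload) {
  return crypto.createHmac("sha1", SECRET).update(payload).digest("hex");
}

// userid|username|sessionstart|groupmemberships|signature
function makeSessionCookie(userId, username, groups, now = Math.floor(Date.now() / 1000)) {
  const payload = [userId, username, now, groups.join(",")].join("|");
  return payload + "|" + sign(payload);
}

// Returns the session fields, or null if the value is malformed,
// has been tampered with, or is older than MAX_AGE_SECONDS.
function readSessionCookie(value, now = Math.floor(Date.now() / 1000)) {
  const parts = String(value).split("|");
  if (parts.length !== 5) return null;
  const [userId, username, start, groups, signature] = parts;
  const expected = Buffer.from(sign(parts.slice(0, 4).join("|")), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws if lengths differ, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  const started = Number(start);
  if (!Number.isFinite(started) || now - started > MAX_AGE_SECONDS) return null;
  return {
    userId: Number(userId),
    username,
    sessionStart: started,
    groups: groups ? groups.split(",").map(Number) : [],
  };
}
```

For example, makeSessionCookie(27478932510, "triblondon", [2, 4, 6, 52, 183], 1231936510) yields a value beginning 27478932510|triblondon|1231936510|2,4,6,52,183|, and editing any field makes readSessionCookie return null. No server needs to remember anything, so any machine in the pool can serve any request.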

Slide 27

Slide 27 text

Other scalable session solutions
• memcached (php.net/memcache)
  – Performs well, scales nicely.
  – All the cool kids are doing it
• Sticky sessions (Varnish / Squid)
  – Or redirect-and-stick, i.e. www.example.com -> (302) -> www4.example.com
  – But doesn't work for some apps (WordPress)
• Database sessions
  – A bit pointless: you've just moved the bottleneck to your database. Definitely not cool.
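One way to get stickiness without a shared session store is to map each session ID to one backend deterministically. This is a hypothetical sketch of the redirect-and-stick idea, assuming Node.js; it is not how Varnish or Squid implement sticky sessions, and the host list and function names are made up.

```javascript
const crypto = require("crypto");

// Hypothetical pool of numbered hosts, as in www4.example.com above.
const HOSTS = ["www1.example.com", "www2.example.com", "www3.example.com", "www4.example.com"];

// The same session ID always maps to the same host, so that host's
// local session data stays valid for every request in the session.
function stickyHost(sessionId, hosts = HOSTS) {
  const digest = crypto.createHash("md5").update(sessionId).digest();
  return hosts[digest.readUInt32BE(0) % hosts.length];
}

// Redirect-and-stick: the shared hostname only ever issues a 302.
// After that, relative links keep the browser on its numbered host.
// Visitors without a session yet can go to any host.
function redirectAndStick(sessionId, path) {
  const host = sessionId
    ? stickyHost(sessionId)
    : HOSTS[Math.floor(Math.random() * HOSTS.length)];
  return { status: 302, headers: { Location: "http://" + host + path } };
}
```

Plain modulo hashing remaps most sessions when you add or remove a host, which logs those users out; consistent hashing reduces that at the cost of more code.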

Slide 28

Slide 28 text

Caching. Not using intelligence when stupidity will do just fine.

Slide 29

Slide 29 text

Scenario
• Your CSS/JS/images don't change often, so users should cache them
• But when they do change, you want everyone to flush their cache; otherwise users with stale copies will see a broken site.

Slide 30

Slide 30 text

Not cool.

Slide 31

Slide 31 text

3 Add query strings to enable far-future caching

Slide 32

Slide 32 text

Solution
• Serve /lib/img/my_website_header.png with Expires: Sun, 17 Jan 2038 19:26:00 GMT
But as far as browsers and caches are concerned, these are not the same object:
• /lib/img/my_website_header.png?v=2
• /lib/img/my_website_header.png?v=3
• /lib/img/my_website_header.png?v=4

Slide 33

Slide 33 text

Result.
• Changing the filename or the query string makes every browser re-request the file
• All the benefits of long-term caching
• No update latency
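A sketch of the far-future pattern, assuming Node.js. As a variation on the slide's hand-numbered ?v=2, ?v=3 (not the deck's method), it derives the version from an MD5 of the file's contents, so the URL changes exactly when the file does. It uses a one-year lifetime rather than 2038, since RFC 2616 said servers should not send Expires dates more than a year ahead. versionedUrl and farFutureHeaders are hypothetical names.

```javascript
const crypto = require("crypto");

// One year: the "never expires" ceiling suggested by RFC 2616.
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Short content hash as the version; any change to the bytes gives a new URL,
// which browsers and caches treat as a brand new object.
function versionedUrl(path, contents) {
  const version = crypto.createHash("md5").update(contents).digest("hex").slice(0, 8);
  return path + "?v=" + version;
}

// Headers for the versioned asset itself.
function farFutureHeaders(now = new Date()) {
  return {
    "Cache-Control": "public, max-age=" + ONE_YEAR_SECONDS,
    Expires: new Date(now.getTime() + ONE_YEAR_SECONDS * 1000).toUTCString(),
  };
}

// Example: the page template would emit this URL in its <img> tag.
console.log(versionedUrl("/lib/img/my_website_header.png", "header image bytes v3"));
```

Only the HTML that references the asset needs a short cache lifetime; the asset itself can sit in every cache for a year.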

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

4 Use a CDN for huge capacity, low cost, and no hassle

Slide 36

Slide 36 text

Solution
• Choose a reverse proxy CDN
• Put some thought into these headers:
  – Expires:
  – Cache-Control:
  – Last-Modified:
  – Content-Length:
  – ETag:
• then offload your traffic…
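A sketch of setting all five headers for a response you want a reverse-proxy CDN to hold, plus the 304 reply that makes the validator worth having. It assumes Node.js; cdnHeaders and respond are hypothetical names, and the If-None-Match check is exact-match only (a real server should also handle lists, weak W/ tags and If-Modified-Since).

```javascript
const crypto = require("crypto");

// Headers that let a reverse-proxy CDN store a response and revalidate it.
// lastModified (a Date) and maxAgeSeconds are per-resource choices.
function cdnHeaders(body, lastModified, maxAgeSeconds, now = new Date()) {
  const buf = Buffer.from(body);
  return {
    "Cache-Control": "public, max-age=" + maxAgeSeconds,
    Expires: new Date(now.getTime() + maxAgeSeconds * 1000).toUTCString(),
    "Last-Modified": lastModified.toUTCString(),
    "Content-Length": String(buf.length),
    // Strong validator derived from the exact bytes of the body.
    ETag: '"' + crypto.createHash("sha1").update(buf).digest("hex") + '"',
  };
}

// Answers a revalidation with 304 (no body) when the cached copy is current.
// Exact-match only: lists, weak W/ tags and If-Modified-Since are not handled.
function respond(body, lastModified, maxAgeSeconds, requestHeaders = {}) {
  const headers = cdnHeaders(body, lastModified, maxAgeSeconds);
  if (requestHeaders["if-none-match"] === headers.ETag) {
    return { status: 304, headers, body: "" };
  }
  return { status: 200, headers, body };
}
```

With a validator in place, an expired copy costs the CDN a cheap 304 round trip instead of a full re-download from your origin.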

Slide 37

Slide 37 text

CDN providers
• Velocix - 500GB/mo, free http://www.velocix.com
• Edgecast - 1TB/mo, $500 http://www.edgecast.com
• Limelight - 1TB/mo, $1000 http://www.limelight.com
• Amazon CloudFront - 1TB/mo, $400 (ish!) http://aws.amazon.com (cheap for big files)
Note: I have no affiliation with any of these providers

Slide 38

Slide 38 text

Are you cacheable?
• http://www.ircache.net/cgi-bin/cacheability.py

Slide 39

Slide 39 text

It's not hard. Or is it?
• yahoo.co.uk - server clock is wrong
• microsoft.com - sends malformed headers
• timesonline.com - no cache control
• digg.com - has PHP's 19 Nov 1981 expiry date
• msn.co.uk - two redirects, no caching
• gumtree.com - tries to cache for 10 mins, but has no validator or content length
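A rough sketch of checking response headers for some of the problems listed above: clock skew, missing cache control, no-store, a past Expires, and missing validators or length. It doesn't detect malformed headers or redirect chains. It assumes Node.js and headers passed in with lower-cased names; the function name and the five-minute clock tolerance are arbitrary choices. In HTTP/1.1 a max-age overrides Expires, and chunked responses legitimately lack Content-Length, so treat the output as hints.

```javascript
// Rough cacheability check. `headers` uses lower-cased names, as Node's
// http module provides them; returns a list of human-readable problems.
function auditCacheHeaders(headers, now = new Date()) {
  const problems = [];
  const cc = headers["cache-control"] || "";
  if (!cc && !headers["expires"]) problems.push("no cache control");
  if (/\bno-store\b/.test(cc)) problems.push("no-store forbids caching");
  const expires = Date.parse(headers["expires"] || "");
  if (!Number.isNaN(expires) && expires < now.getTime()) {
    problems.push("expires in the past");
  }
  if (!headers["etag"] && !headers["last-modified"]) problems.push("no validator");
  if (!headers["content-length"]) problems.push("no content length");
  // Arbitrary tolerance: flag a Date header more than 5 minutes off.
  const serverDate = Date.parse(headers["date"] || "");
  if (!Number.isNaN(serverDate) && Math.abs(serverDate - now.getTime()) > 5 * 60 * 1000) {
    problems.push("server clock looks wrong");
  }
  return problems;
}
```

Feeding it PHP's default session headers flags the 1981 Expires date, no-store, and the missing validator and length in one go.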

Slide 40

Slide 40 text

Overburden your site with writes, and you're going nowhere fast.

Slide 41

Slide 41 text

Scenario
• Your app runs on a single server / shared host
• You connect to a database using some kind of DB abstraction class / framework

Slide 42

Slide 42 text

Doesn't scale.

Slide 43

Slide 43 text

Scales. (a bit)

Slide 44

Slide 44 text

5 Split database connections for easier scaling later

Slide 45

Slide 45 text

Solution
• Always plan for your write queries to go somewhere different from your reads
  – Even if they won't in the immediate future
• And assume that writes take a non-negligible amount of time to become readable (e.g. replication lag between the primary and its read replicas)
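A minimal sketch of the split, assuming Node.js and a hypothetical connection object with a query(sql, params) method standing in for whatever your DB abstraction class provides. SplitDb is a made-up name. Callers pick read() or write() explicitly, so when a replica arrives, only the constructor arguments change.

```javascript
// Routes queries to separate read and write connections. Today both can be
// the same object; later, pass a replica as readConn and nothing else changes.
// Connections are hypothetical: anything with a query(sql, params) method.
class SplitDb {
  constructor(writeConn, readConn = writeConn) {
    this.writeConn = writeConn;
    this.readConn = readConn;
  }

  // SELECTs. May return slightly stale data once reads go to a replica.
  read(sql, params = []) {
    return this.readConn.query(sql, params);
  }

  // INSERT / UPDATE / DELETE. Always goes to the primary.
  write(sql, params = []) {
    return this.writeConn.query(sql, params);
  }

  // For the rare read that must see a write you just made.
  readFromPrimary(sql, params = []) {
    return this.writeConn.query(sql, params);
  }
}
```

Choosing explicitly is simpler and more reliable than guessing from the SQL text, and readFromPrimary() covers the read-your-own-write case while replicas lag.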

Slide 46

Slide 46 text

Scenario
• 'Most viewed/emailed' widgets (obligatory BBC News Online screenshot)
• Thinking about doing this?
UPDATE content SET viewcount=viewcount+1 WHERE contentid=5309342;
NOT COOL

Slide 47

Slide 47 text

• You're writing on every page load!
• Low read:write ratio
• High page generation overhead
• Can't cache.
• Disaster.

Slide 48

Slide 48 text

8 Use hosted analytics to avoid logging

Slide 49

Slide 49 text

Solution
• You want to optimise for reads.
• You don't really need all this data, just the aggregated results.
• So let someone else do it!
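The slide's answer is to hand the counting to someone else. If you must keep counts in your own database, a different technique (not from the deck) that at least stops you writing on every page load is to aggregate in memory and flush periodically. A sketch, assuming Node.js: ViewCounter and runWrite are hypothetical, and the ? placeholders assume a driver that uses them.

```javascript
// Count views in memory and flush them as one batch of UPDATEs instead of
// one UPDATE per page load. `runWrite` is a hypothetical function that runs
// a write query (sql, params) against the primary database.
class ViewCounter {
  constructor(runWrite) {
    this.runWrite = runWrite;
    this.pending = new Map(); // contentid -> views since last flush
  }

  hit(contentId) {
    this.pending.set(contentId, (this.pending.get(contentId) || 0) + 1);
  }

  // Call from a timer (say, once a minute). Returns how many rows were updated.
  flush() {
    const batch = this.pending;
    this.pending = new Map();
    for (const [contentId, views] of batch) {
      this.runWrite("UPDATE content SET viewcount=viewcount+? WHERE contentid=?", [views, contentId]);
    }
    return batch.size;
  }
}
```

A thousand views of one story become a single write. The trade-offs: each web server process keeps its own tally, and counts still in memory are lost if a process dies before flushing.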

Slide 50

Slide 50 text

Solution
• Hosted analytics:
  – Google Analytics (free), SiteIntelligence, Webtrends
• But what about AJAX / downloads / outbound links / JavaScript actions?

Slide 51

Slide 51 text

Scenario
• Script reads from the cache, or regenerates the content and stores it in the cache if the cache is stale
• At the moment the cache expires, lots of threads try to write to it at the same time
• Evil writes kill your web server.

Slide 52

Slide 52 text

6 Prep content in advance to avoid cache slamming

Slide 53

Slide 53 text

Solution
• Use a separate process to write to the cache, periodically or event-driven, but never triggered by web requests
• Scripts handling HTTP requests never write to the cache
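A sketch of prepping content off the request path, assuming Node.js and a Map standing in for memcached. For brevity the refresh loop runs on a timer in the same process; the slide's advice is a separate process (a cron job or worker), which could call the same warmOnce. The function names and generate (a possibly slow page builder) are hypothetical.

```javascript
// Regenerate one cache entry. If generation fails, the previous value stays
// in the cache, so visitors keep getting slightly old content.
function warmOnce(cache, key, generate) {
  try {
    cache.set(key, generate());
  } catch (err) {
    console.error("cache warm failed for " + key + ":", err);
  }
}

// Fill the cache before traffic arrives, then refresh on a timer.
// Returns the timer so the caller can stop it with clearInterval.
function startWarmer(cache, key, generate, intervalMs) {
  warmOnce(cache, key, generate);
  return setInterval(() => warmOnce(cache, key, generate), intervalMs);
}

// Request handler: reads only. An expiring entry can't trigger a stampede of
// regenerations because nothing on this path ever writes to the cache.
function handleRequest(cache, key) {
  const html = cache.get(key);
  return html !== undefined ? html : "<p>Temporarily unavailable, please try again shortly.</p>";
}
```

With memcached, give entries a TTL comfortably longer than the refresh interval (or none), so they never expire between refreshes.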

Slide 54

Slide 54 text

Quick recap
• Sessions: try JavaScript injection, cookie-stored session data, sticky sessions, memcached
• Caching: use a CDN and far-future caching
• Writes: split reads and writes, reduce writes, use hosted analytics, prep content on a schedule

Slide 55

Slide 55 text

It scales.

Slide 56

Slide 56 text

Thanks
• [email protected]
• We're hiring, blah blah blah.
• http://www.assanka.net/jobs
Photo credits:
• http://www.flickr.com/photos/57158820@N00/920872985
• http://www.flickr.com/photos/exotictransport/163976659/
• http://www.flickr.com/photos/erazmilic/178574918/