Slide 1

Slide 1 text

How to launch a large-scale website confidently and successfully
Ryan Townsend – 25th October 2017
@ryantownsend

Slide 2

Slide 2 text

“Just use Heroku auto-scaling and forget about it”
Kris Quigley – Lead Architect @ SHIFT (sarcasm)

Slide 3

Slide 3 text

Timeline
• Development
• Pre-launch
• Launch
• Post-launch

Slide 4

Slide 4 text

Ryan Townsend, CTO @ryantownsend

Slide 5

Slide 5 text

Relaunched on SHIFT: May 2017

Slide 6

Slide 6 text

Development http://www.spacex.com/media-gallery/detail/149431/9391

Slide 7

Slide 7 text

Keep things simple
• New features or new technology… not both
• Mature (or ‘boring’) technology and architecture
• Limit project scope

Slide 8

Slide 8 text

Load testing
• It’s A LOT harder than people let on
• Assume user behaviour will change
• Use real metrics and logged user behaviour
• Use a wide variety of metrics
 – not just traffic
• Post-test validate the metrics
 – at source, not just in your load testing tool
• Push the limits
 – understand when you need to proactively start load reduction
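As a rough illustration of "use real metrics and logged user behaviour" and "assume user behaviour will change", the sketch below derives a traffic mix from logged requests and then skews it before sampling a load-test plan. The `METHOD PATH` log format, the function names and the sample paths are assumptions made for this example, not something taken from the talk.

```python
import random
from collections import Counter


def build_traffic_mix(log_lines):
    """Return {path: share of requests} from lines shaped like 'GET /path'."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            counts[parts[1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {path: n / total for path, n in counts.items()}


def scale_mix(mix, path, factor):
    """Model a behaviour change: multiply one path's weight, then renormalise."""
    shifted = dict(mix)
    shifted[path] = shifted.get(path, 0.0) * factor
    total = sum(shifted.values())
    return {p: w / total for p, w in shifted.items()}


def sample_requests(mix, n, seed=0):
    """Draw n paths at random, weighted by their share of the mix."""
    rng = random.Random(seed)  # seeded so a test plan can be reproduced
    paths = list(mix)
    weights = [mix[p] for p in paths]
    return rng.choices(paths, weights=weights, k=n)


# Hypothetical logged requests.
logs = ["GET /", "GET /", "GET /products/1", "POST /cart"]
mix = build_traffic_mix(logs)

# Assume launch-day shoppers add to cart three times as often as logged.
launch_mix = scale_mix(mix, "/cart", 3)
plan = sample_requests(launch_mix, 1000)
```

The sampled `plan` could then be fed to whichever load testing tool is in use. The results still need validating at source, as the slide notes.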

Slide 9

Slide 9 text

Caching
• Care for the performance of cache misses
 – Faster performance = lower load (typically)
• Content Delivery Networks
 – Assets: far-future expiry
 – Content: start with low TTLs, raise for desired effect
• Consider redirects and 404s
• Fragment caching
• Low-level method / query caching
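One way to picture the CDN rules above is as a small policy function that picks a `Cache-Control` header per response. This is a minimal sketch under stated assumptions: the `/assets/` prefix, the 60-second starting TTL and the function name are illustrative. Far-future expiry is only safe if asset filenames change whenever their content does (for example, fingerprinted filenames).

```python
ONE_YEAR = 365 * 24 * 60 * 60  # seconds; a common "far-future" max-age

# Statuses that are worth caching briefly so repeated hits stay off the origin.
BRIEFLY_CACHEABLE = (301, 308, 404)


def cache_control(path, status, content_ttl=60):
    """Pick a Cache-Control value for a response (illustrative policy only)."""
    if status in BRIEFLY_CACHEABLE:
        # Redirects and 404s: cache, but with the same low TTL as content.
        return f"public, max-age={content_ttl}"
    if status != 200:
        # Anything else unexpected (e.g. 500s) should not be cached here.
        return "no-store"
    if path.startswith("/assets/"):
        # Assumed: asset URLs are fingerprinted, so they never change in place.
        return f"public, max-age={ONE_YEAR}, immutable"
    # Content: start with a low TTL and raise it once the effect is measured.
    return f"public, max-age={content_ttl}"


headers = {
    "asset": cache_control("/assets/app-3f9a1c.css", 200),
    "page": cache_control("/products/1", 200),
    "missing": cache_control("/old-page", 404),
    "error": cache_control("/products/1", 500),
}
```

Raising `content_ttl` gradually follows the "start with low TTLs, raise for desired effect" advice: a low value limits the damage if something is cached by mistake.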

Slide 10

Slide 10 text

Caching
Higher hit ratios = less traffic hitting our servers
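The relationship between hit ratio and origin traffic is simple arithmetic: only cache misses reach the servers. The sketch below uses made-up traffic figures purely to illustrate the shape of the curve.

```python
def origin_requests_per_second(total_rps, hit_ratio):
    """Requests per second that miss the cache and reach the origin servers."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


# Illustrative numbers only: 1,000 requests per second arriving at the CDN.
at_90 = origin_requests_per_second(1000, 0.90)
at_95 = origin_requests_per_second(1000, 0.95)
```

With these example figures, moving from a 90% to a 95% hit ratio takes origin traffic from roughly 100 to roughly 50 requests per second. A five-point gain in hit ratio halves the load on the servers.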

Slide 11

Slide 11 text

Pre-launch https://www.flickr.com/photos/spacex/31450835954/

Slide 12

Slide 12 text

Communication
• Build a positive, trusting relationship with the client / stakeholder
• Understand their metrics
• Authority
 – who can agree to change the plan, e.g. by disabling a feature to reduce load?
• Perspective
 – what’s critical? what’s a problem? what can be ignored?

Slide 13

Slide 13 text

Visibility
• System monitoring
 – infrastructure & client-side
• Client / stakeholder dashboards & reporting
 – see what they see
• Customer engagement
 – social media, customer support
• Instant access to logs
 – filterable, searchable

The image above shows how New Relic tracked a 3rd-party script harming site performance while the server side was fine.

Slide 14

Slide 14 text

Roleplay
• What could go wrong?
• Who would you escalate to?
• How would you solve it?
• What systems do you need access to?
• What people do you need access to?

Slide 15

Slide 15 text

Launch https://unsplash.com/photos/yJv97tE7GDM

Slide 16

Slide 16 text

Keep calm and carry on
• Expect issues
• Keep a level head
• Be professional
• You’re an expert – you’ve got this

Slide 17

Slide 17 text

Release mechanism
• Big bang
• Holding page
• Easy ways out
• Feature toggles
• Canary release
• Start minimal and increase features/percentage
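A percentage-based feature toggle is one way to "start minimal and increase". The sketch below is a hypothetical implementation, not the mechanism used in the talk: it hashes a feature name and user ID into a stable bucket from 0 to 99. Each user then gets a consistent experience, and raising the percentage only ever adds users.

```python
import hashlib


def rollout_bucket(feature, user_id):
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    key = f"{feature}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(feature, user_id, percentage):
    """True if this user falls inside the current rollout percentage."""
    return rollout_bucket(feature, user_id) < percentage


# Hypothetical feature name and user IDs, for illustration only.
users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if is_enabled("new-checkout", u, 10)}
at_50 = {u for u in users if is_enabled("new-checkout", u, 50)}
```

Setting the percentage back to 0 turns the feature off for everyone, which gives the "easy way out" that the list above asks for. Including the feature name in the hash key means different features roll out to different subsets of users.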

Slide 18

Slide 18 text

Scaling
• Scale up
• Keep it there
• Auto-scaling

Slide 19

Slide 19 text

Post-launch https://unsplash.com/photos/-p-KCm6xB9I

Slide 20

Slide 20 text

Build confidence
• Gather real metrics & usage patterns
• Revisit your load tests and re-assess
• Re-run load tests for future releases
• Ship some safe releases
• Ship small releases, often

Slide 21

Slide 21 text

Thank you @ryantownsend