Don't Panic! How to launch a large-scale website confidently and successfully

Launch days can be stressful, particularly when the project’s key stakeholders are keeping an extra-keen eye on how things are going. Back in May 2017, my team felt this pressure all too well as we relaunched the website of the billion-pound UK retailer Matalan on our own ecommerce platform, SHIFT. As exciting as it was, there were tense moments as we opened the floodgates and an unprecedented amount of traffic hit the site in an instant.

In this talk, I’ll step through some of the technical and non-technical lessons that we learned in the process, and leave you better prepared and more confident when doing the same.

Talk given at DevOps Tallinn – 17th May 2018

Ryan Townsend

May 17, 2018

Transcript

  1. Don’t Panic! How to launch a large-scale website confidently and successfully. Photo by SpaceX on Unsplash. DevOps Tallinn 2018
  2. Who am I? @ryantownsend Ryan Townsend, CTO

  3. Relaunched May 2017

  4. “Just use auto-scaling and forget about it” – Kris Quigley, Lead Developer @ SHIFT (sarcasm)
  5. Timeline Development Pre-launch Launch Post-launch

  6. • Functional Testing • Deployment Pipelines • Configuration & Implementation

  7. Development http://www.spacex.com/media-gallery/detail/149431/9391

  8. Keep Things Simple

  9. Limit Project Scope

  10. New Problem or New Technology

  11. “Almost all the cases where I've heard of a system that was built as a microservice system from scratch, it has ended up in serious trouble.” – Martin Fowler, ThoughtWorks CTO
  12. Clear Decoupling

  13. Admin Panel API Website

  14. Use Boring Mature Technology

  15. Load Testing

  16. Don’t wait until the end

  17. It’s A LOT harder than people let on

  18. • Use real metrics and logged user behaviour • Use a wide variety of metrics, not just traffic • Post-test validate the metrics at source
  19. Assume user behaviour will change

  20. Stress Test

  21. Web Performance Testing

  22. Remember: it’s not just for you!

  23. Caching

  24. Client CDN Application Database

  25. Write-through caches

  26. Start small… low TTLs

  27. Front-end – static assets & redirects

  28. Higher hit ratios = less traffic hitting our servers

  29. Feature Toggles

  30. Ideal Fallback Off On

  31. On Ideal Fallback Off

  32. • Built into your application • Content Delivery Network • A/B testing tool
  33. Circuit Breakers

  34. Ideal Fallback Open Error Closed

  35. Ideal Fallback Open Error Closed

  36. Ideal Fallback Open Error Closed

  37. Pre-launch Preparations https://www.flickr.com/photos/spacex/31450835954/

  38. Communication

  39. • Build a trusting relationship with stakeholders • Understand their metrics • Get their perspective • Determine authority
  40. Visibility

  41. • System monitoring – infrastructure & client-side • Client / stakeholder dashboards & reporting – see what they see • Customer engagement – social media, customer support • Instant access to logs – filterable, searchable
  42. Above: New Relic showing a 3rd-party script harming site performance while the server side was fine.
  43. Roleplay

  44. • What could go wrong? • Who would you escalate to? • How would you solve it? • What people do you need access to? • What systems do you need access to?
  45. Traffic Reduction

  46. (no text on slide)
  47. • Avoid scheduling big campaigns • Paid advertising is easy to turn off • Reduce offering
  48. Launch Day https://unsplash.com/photos/yJv97tE7GDM

  49. Scale-up

  50. “Big Bang” vs Canary Release

  51. Feature Toggles: Off

  52. Keep Calm and Carry On

  53. • Expect issues • Keep a level head • Remain professional • You’re an expert – you’ve got this
  54. Post-launch https://unsplash.com/photos/-p-KCm6xB9I

  55. Continue Building Confidence

  56. • Gather actual real metrics & usage patterns • Revisit your load tests and re-assess • Re-run load tests for future releases • Ship some safe releases • Ship small releases, often
  57. Since Launch https://unsplash.com/photos/MEW1f-yu2KI

  58. Optimising Caching

  59. Strong Migrations

  60. Started working towards macro-services (rather than micro-services)

  61. Event Sourcing

  62. Static Site Generation

  63. Communication is Paramount

  64. Thank you @ryantownsend
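
The feature-toggle and circuit-breaker slides (29 to 36) only name the states: a toggle that is On or Off, a circuit that is Closed or Open, and an Ideal or Fallback path behind each. As a rough illustration of how those pieces could fit together, here is a minimal Python sketch. It is not SHIFT's implementation: the `CircuitBreaker` class, the `serve` helper, the `"recommendations"` toggle name and the threshold values are all hypothetical and chosen only for the example.

```python
import time
from typing import Callable, Dict, Optional


class CircuitBreaker:
    """Closed: calls take the ideal path. Open: calls go straight to the fallback.

    Hypothetical sketch; thresholds and naming are illustrative assumptions.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # consecutive errors before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, let a trial call through.
        return self.clock() - self.opened_at < self.reset_after

    def call(self, ideal: Callable[[], str], fallback: Callable[[], str]) -> str:
        if self.is_open:
            return fallback()  # skip the failing dependency entirely
        try:
            result = ideal()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success: close the circuit and forget earlier failures.
        self.failures = 0
        self.opened_at = None
        return result


def serve(toggles: Dict[str, bool], breaker: CircuitBreaker,
          ideal: Callable[[], str], fallback: Callable[[], str]) -> str:
    """Toggle Off -> fallback; toggle On -> ideal path guarded by the breaker."""
    if not toggles.get("recommendations", False):
        return fallback()
    return breaker.call(ideal, fallback)
```

With a design like this, launching with the toggle Off (slide 51) means the fallback is served from the start, and switching it On later still leaves the breaker in place to protect the site if the dependency misbehaves.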