Slide 1

Slide 1 text

Don’t Panic! How to launch a large-scale website confidently and successfully
DevOps Tallinn 2018
Photo by SpaceX on Unsplash

Slide 2

Slide 2 text

Who am I? Ryan Townsend, CTO (@ryantownsend)

Slide 3

Slide 3 text

Relaunched May 2017

Slide 4

Slide 4 text

“Just use auto-scaling and forget about it” Kris Quigley – Lead Developer @ SHIFT (sarcasm)

Slide 5

Slide 5 text

Timeline: Development, Pre-launch, Launch, Post-launch

Slide 6

Slide 6 text

• Functional Testing
• Deployment Pipelines
• Configuration & Implementation

Slide 7

Slide 7 text

Development http://www.spacex.com/media-gallery/detail/149431/9391

Slide 8

Slide 8 text

Keep Things Simple

Slide 9

Slide 9 text

Limit Project Scope

Slide 10

Slide 10 text

New Problem or New Technology

Slide 11

Slide 11 text

“Almost all the cases where I've heard of a system that was built as a microservice system from scratch, it has ended up in serious trouble.” – Martin Fowler, Chief Scientist, ThoughtWorks

Slide 12

Slide 12 text

Clear Decoupling

Slide 13

Slide 13 text

Admin Panel, API, Website

Slide 14

Slide 14 text

Use Boring Mature Technology

Slide 15

Slide 15 text

Load Testing

Slide 16

Slide 16 text

Don’t wait until the end

Slide 17

Slide 17 text

It’s A LOT harder than people let on

Slide 18

Slide 18 text

• Use real metrics and logged user behaviour
• Use a wide variety of metrics, not just traffic
• Post-test validate the metrics at source

Slide 19

Slide 19 text

Assume user behaviour will change

Slide 20

Slide 20 text

Stress Test

Slide 21

Slide 21 text

Web Performance Testing

Slide 22

Slide 22 text

Remember: it’s not just for you!

Slide 23

Slide 23 text

Caching

Slide 24

Slide 24 text

Client, CDN, Application, Database

Slide 25

Slide 25 text

Write-through caches
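
A minimal sketch of the write-through idea, assuming a plain dict stands in for the database. The class and method names are hypothetical and not taken from the talk. Every write goes to the backing store and the cache in the same operation, so reads never see a stale entry for a key that was written through this object.

```python
class WriteThroughCache:
    """Keeps the cache in step with the backing store on every write."""

    def __init__(self, store):
        self.store = store  # hypothetical backing store; a dict stands in for a database
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value  # persist to the source of truth first
        self.cache[key] = value  # then refresh the cached copy

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: no trip to the store
        value = self.store[key]     # cache miss: load and remember
        self.cache[key] = value
        return value


database = {}
products = WriteThroughCache(database)
products.write("sku-1", {"price": 10})
products.write("sku-1", {"price": 12})
```

After the second write, both `database` and the cache hold the updated price, so no separate invalidation step is needed for that key.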

Slide 26

Slide 26 text

Start small… low TTLs
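
One way to picture a low TTL is a cache whose entries expire after a few seconds, which bounds how long stale data can be served. The sketch below is illustrative only: `TTLCache`, its parameters, and the injectable clock are assumptions made for the example, not something shown in the slides.

```python
import time


class TTLCache:
    """Caches values for a fixed number of seconds, then reloads them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be exercised without sleeping
        self.entries = {}    # key -> (expires_at, value)

    def get(self, key, load):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = load()       # missing or expired: go back to the origin
        self.entries[key] = (now + self.ttl, value)
        return value


# Start small: cache for five seconds, and raise the TTL once the behaviour is trusted.
homepage_cache = TTLCache(ttl_seconds=5)
html = homepage_cache.get("homepage", lambda: "<html>rendered page</html>")
```

With a short TTL, a bad entry corrects itself quickly, at the cost of more requests reaching the origin.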

Slide 27

Slide 27 text

Front-end – static assets & redirects

Slide 28

Slide 28 text

Higher hit ratios = less traffic hitting our servers
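
The arithmetic behind this can be sketched as follows. The request volume and hit ratios are made-up illustrative figures, and `origin_requests` is a hypothetical helper name.

```python
def origin_requests(total_requests, hit_ratio):
    """Requests that miss the cache and therefore reach the servers."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_requests * (1.0 - hit_ratio)


for ratio in (0.50, 0.90, 0.99):
    reaching = origin_requests(1000, ratio)
    print(f"hit ratio {ratio:.0%}: {reaching:.0f} of 1000 requests reach the servers")
```

Under these assumed figures, moving from a 90% to a 99% hit ratio cuts the traffic reaching the servers roughly tenfold.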

Slide 29

Slide 29 text

Feature Toggles

Slide 30

Slide 30 text

Ideal Fallback Off On

Slide 31

Slide 31 text

On Ideal Fallback Off

Slide 32

Slide 32 text

• Built into your application
• Content Delivery Network
• A/B testing tool
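
For the "built into your application" option, a toggle can be as small as a lookup that picks between the ideal path and a fallback. This is a sketch under assumptions: the `FeatureToggles` class, the `with_toggle` helper, and the `recommendations` flag are hypothetical names chosen for illustration.

```python
class FeatureToggles:
    """In-memory flag store; unknown flags are treated as off."""

    def __init__(self, flags=None):
        self.flags = dict(flags or {})

    def is_on(self, name):
        return self.flags.get(name, False)

    def set(self, name, enabled):
        self.flags[name] = bool(enabled)


def with_toggle(toggles, name, ideal, fallback):
    """Run the ideal path when the toggle is on, otherwise the fallback."""
    return ideal() if toggles.is_on(name) else fallback()


toggles = FeatureToggles({"recommendations": True})


def personalised():
    return ["item-42", "item-7"]  # stands in for an expensive personalised query


def static_list():
    return ["bestseller-1"]       # cheap, cacheable fallback


shown = with_toggle(toggles, "recommendations", personalised, static_list)

# Under load, switch the expensive feature off without a deploy.
toggles.set("recommendations", False)
shown_under_load = with_toggle(toggles, "recommendations", personalised, static_list)
```

In a real system the flags would live somewhere that can be changed at runtime, such as a shared configuration store; the in-memory dict here only keeps the example self-contained.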

Slide 33

Slide 33 text

Circuit Breakers

Slide 34

Slide 34 text

Ideal Fallback Open Error Closed

Slide 35

Slide 35 text

Ideal Fallback Open Error Closed

Slide 36

Slide 36 text

Ideal Fallback Open Error Closed
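
The three states on these slides can be sketched as a small class. This is a minimal illustration, not the implementation used in the talk: the thresholds, the names, and the injectable clock are all assumptions. While the circuit is closed, calls take the ideal path; repeated errors open it, and while open every call goes straight to the fallback until a cool-down has passed.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, ideal, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()       # open: do not touch the failing dependency
            self.opened_at = None       # cool-down elapsed: allow one trial call
        try:
            result = ideal()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # too many errors: open the circuit
            return fallback()
        self.failures = 0               # success closes the circuit again
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
stock = breaker.call(lambda: {"in_stock": True}, lambda: {"in_stock": None})
```

Unlike a feature toggle, nobody has to flip this switch by hand: the breaker reacts to errors on its own and retries the ideal path once the cool-down has elapsed.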

Slide 37

Slide 37 text

Pre-launch Preparations https://www.flickr.com/photos/spacex/31450835954/

Slide 38

Slide 38 text

Communication

Slide 39

Slide 39 text

• Build a trusting relationship with stakeholders
• Understand their metrics
• Get their perspective
• Determine authority

Slide 40

Slide 40 text

Visibility

Slide 41

Slide 41 text

• System monitoring – infrastructure & client-side
• Client / stakeholder dashboards & reporting – see what they see
• Customer engagement – social media, customer support
• Instant access to logs – filterable, searchable

Slide 42

Slide 42 text

The screenshot on this slide shows New Relic tracking a third-party script that was harming site performance while the server side was fine.

Slide 43

Slide 43 text

Roleplay

Slide 44

Slide 44 text

• What could go wrong?
• Who would you escalate to?
• How would you solve it?
• What people do you need access to?
• What systems do you need access to?

Slide 45

Slide 45 text

Traffic Reduction

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

• Avoid scheduling big campaigns
• Paid advertising is easy to turn off
• Reduce offering

Slide 48

Slide 48 text

Launch Day https://unsplash.com/photos/yJv97tE7GDM

Slide 49

Slide 49 text

Scale-up

Slide 50

Slide 50 text

“Big Bang” vs Canary Release
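
A canary release sends a small, stable share of users to the new site before everyone else, whereas a "Big Bang" switches all traffic at once. One common way to pick that share is to hash a user identifier into a bucket. The sketch below assumes string user IDs and a percentage dial; the function name and the choice of SHA-256 are illustrative, not taken from the talk.

```python
import hashlib


def in_canary(user_id, percent):
    """Deterministically place a user in the canary group for a given percentage."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < percent


def choose_backend(user_id, percent):
    return "new-site" if in_canary(user_id, percent) else "old-site"


# Dial the rollout up gradually; a "Big Bang" is the same call with percent=100.
backend = choose_backend("user-123", percent=5)
```

Because the bucket depends only on the user ID, a user who is in the canary at 5% stays in it as the percentage rises, so nobody flips back and forth between the old and new site.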

Slide 51

Slide 51 text

Feature Toggles: Off

Slide 52

Slide 52 text

Keep Calm and Carry On

Slide 53

Slide 53 text

• Expect issues
• Keep a level head
• Remain professional
• You’re an expert – you’ve got this

Slide 54

Slide 54 text

Post-launch https://unsplash.com/photos/-p-KCm6xB9I

Slide 55

Slide 55 text

Continue Building Confidence

Slide 56

Slide 56 text

• Gather real metrics & usage patterns
• Revisit your load tests and re-assess
• Re-run load tests for future releases
• Ship some safe releases
• Ship small releases, often

Slide 57

Slide 57 text

Since Launch https://unsplash.com/photos/MEW1f-yu2KI

Slide 58

Slide 58 text

Optimising Caching

Slide 59

Slide 59 text

Strong Migrations

Slide 60

Slide 60 text

Started working towards macro-services (rather than microservices)

Slide 61

Slide 61 text

Event Sourcing

Slide 62

Slide 62 text

Static Site Generation

Slide 63

Slide 63 text

Communication is Paramount

Slide 64

Slide 64 text

Thank you @ryantownsend