Slide 1

Slide 1 text

Quality for 'cloud natives': what changes when your systems are complex and distributed?
Sarah Wells
Technical Director for Operations & Reliability, The Financial Times
@sarahjwells

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

“Experiment” for most organizations really means “try” - Linda Rising, Experiments: the Good, the Bad and the Beautiful

Slide 5

Slide 5 text

How quickly can you spin up an MVP?

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

We’re able to do this because we adopted a cloud-native architecture

Slide 8

Slide 8 text

“microservices (n,pl): an efficient device for transforming business problems into distributed transaction problems” - @drsnooks

Slide 9

Slide 9 text

Distributed systems fail in new and interesting ways

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

We need to change how we approach quality

Slide 17

Slide 17 text

We need to test in production

Slide 18

Slide 18 text

Cloud native: an introduction
Testing in production
Optimising for fixing things fast

Slide 19

Slide 19 text

Cloud native: an introduction

Slide 20

Slide 20 text

What IS cloud native?

Slide 21

Slide 21 text

It’s definitely about “the cloud”

Slide 22

Slide 22 text

Cloud native means building things to benefit from the cloud, not just to run on it

Slide 23

Slide 23 text

Infrastructure as a service

Slide 24

Slide 24 text

Infrastructure as a service
Automation

Slide 25

Slide 25 text

Infrastructure as a service
Continuous Delivery
Automation

Slide 26

Slide 26 text

Infrastructure as a service
Microservices
Continuous Delivery
Automation

Slide 27

Slide 27 text

Infrastructure as a service
Microservices
Containers & Orchestration
Continuous Delivery
Automation

Slide 28

Slide 28 text

Infrastructure as a service
Microservices
Containers & Orchestration
Software as a Service
Continuous Delivery
Automation

Slide 29

Slide 29 text

Download at: https://info.container-solutions.com/introduction-to-cloud-native

Slide 30

Slide 30 text

Sounds complicated?

Slide 31

Slide 31 text

Why adopt it?

Slide 32

Slide 32 text

“Cloud native technologies enable software developers to build great products faster” - the CNCF

Slide 33

Slide 33 text

Making small releases, quickly and frequently

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

You can’t experiment when you do 12 releases a year

Slide 36

Slide 36 text

Small changes are much easier to understand

Slide 37

Slide 37 text

The more often you release, the lower your failure rate for those releases

Slide 38

Slide 38 text

~15% failure rate vs < 1% failure rate

Slide 39

Slide 39 text

You don’t have to choose between speed and stability

Slide 40

Slide 40 text

Why does the focus for testing change?

Slide 41

Slide 41 text

The kind of testing you do when you release once a month doesn’t work when you release 10 times a day

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

“Not wrong long” - Sally Goble https://www.theguardian.com/info/developer-blog/2016/dec/04/perfect-software-the-enemy-of-rapid-deployment

Slide 46

Slide 46 text

“We’re not a nuclear power station or a hospital”

Slide 47

Slide 47 text

Cloud native: an introduction
Testing in production

Slide 48

Slide 48 text

Pre-release testing

Slide 49

Slide 49 text

We should still be writing automated tests for the service

Slide 50

Slide 50 text

Cindy Sridharan: https://medium.com/@copyconstruct/testing-microservices-the-sane-way-9bb31d158c16

Slide 51

Slide 51 text

Don’t try to regression test the whole system

Slide 52

Slide 52 text

Running acceptance tests locally pushes developers towards a ‘full stack on your laptop’

Slide 53

Slide 53 text

You end up with a distributed monolith

Slide 54

Slide 54 text

Test fixtures can be brittle

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

A 30-minute code change took 2 weeks because we had to get the acceptance tests working

Slide 57

Slide 57 text

Almost all the time, the code was fine; the tests were broken

Slide 58

Slide 58 text

Learn from the pain!

Slide 59

Slide 59 text

Shifting right?

Slide 60

Slide 60 text

Introduce synthetic monitoring

Slide 61

Slide 61 text

This replaced our acceptance tests

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No data fixtures required

Slide 67

Slide 67 text

It also tells us when things are broken, even if no user is currently doing anything
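
A minimal sketch of a synthetic check, assuming hypothetical `publish` and `fetch` callables that stand in for the real publishing pipeline and read API (neither is named in the talk). It pushes a known article through the live system with a `SYN_`-prefixed transaction ID and polls until the content is readable or a deadline passes, so it needs no data fixtures.

```python
import time
import uuid
from typing import Callable, Optional


def run_synthetic_check(
    publish: Callable[[str, dict], None],
    fetch: Callable[[str], Optional[dict]],
    article: dict,
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.1,
) -> dict:
    """Publish a known article and confirm it becomes readable.

    `publish` and `fetch` are hypothetical stand-ins for real production calls.
    """
    # The prefix marks the request as synthetic so it can be picked out of logs.
    transaction_id = "SYN_" + uuid.uuid4().hex[:8]
    started = time.monotonic()
    publish(transaction_id, article)

    deadline = started + timeout_s
    while True:
        if fetch(article["id"]) == article:
            ok = True
            break
        if time.monotonic() >= deadline:
            ok = False
            break
        time.sleep(poll_interval_s)

    return {
        "transaction_id": transaction_id,
        "ok": ok,
        "elapsed_s": time.monotonic() - started,
    }


if __name__ == "__main__":
    # An in-memory dict plays the part of production for this demo.
    store = {}
    result = run_synthetic_check(
        publish=lambda tid, a: store.update({a["id"]: a}),
        fetch=store.get,
        article={"id": "synthetic-article", "title": "Synthetic check"},
    )
    print(result["ok"], result["transaction_id"])
```

Run on a schedule, a check like this reports a failure whether or not any real user is active at the time.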

Slide 68

Slide 68 text

Make sure you know if things are working in production

Slide 69

Slide 69 text

Our editorial team is inventive

Slide 70

Slide 70 text

What does it mean for a publish to be ‘successful’?

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

Define a contract

Slide 76

Slide 76 text

Contract testing for key interfaces

Slide 77

Slide 77 text

Simple documentation is a start
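
One way to start on a contract for a key interface is a small, readable list of the fields a consumer relies on, checked against real responses. The sketch below assumes a made-up article contract; the field names are illustrative and are not taken from any FT API.

```python
from typing import Any, Dict, List

# Hypothetical contract: the fields a consumer depends on, with expected types.
ARTICLE_CONTRACT: Dict[str, type] = {
    "id": str,
    "title": str,
    "publishedDate": str,
    "body": str,
}


def contract_violations(response: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return human-readable violations; an empty list means the contract holds."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Extra fields are allowed, so a provider can add data without breaking consumers.
    return problems
```

Because the contract is plain data, the same dictionary can double as the simple documentation for the interface.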

Slide 78

Slide 78 text

No content

Slide 79

Slide 79 text

created by Matt Hinchliffe (https://github.com/i-like-robots)

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

Separate releasing code from releasing functionality

Slide 82

Slide 82 text

Feature flags
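
A minimal sketch of a percentage-based feature flag, assuming flags are held in memory (a real setup would more likely read them from configuration or a flag service). The class and flag names are hypothetical. The code for a feature can be deployed with the flag at 0 and turned up later without another release.

```python
import hashlib


class FeatureFlags:
    """In-memory flags, for illustration only."""

    def __init__(self, rollout_percent: dict):
        # Maps flag name -> percentage of users (0-100) who see the feature.
        self._rollout_percent = dict(rollout_percent)

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout_percent.get(flag, 0)  # Unknown flags default to off.
        # Hash flag and user together so each user gets a stable answer per flag.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent

    def set_rollout(self, flag: str, percent: int) -> None:
        """Turn a feature up or down without deploying code."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout_percent[flag] = percent
```

Hashing rather than picking users at random keeps the experience consistent: a user who sees the feature on one request keeps seeing it on the next.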

Slide 83

Slide 83 text

No content

Slide 84

Slide 84 text

Canary releases
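
A rough sketch of the decision behind a canary release: send a small share of traffic to the new version, then compare its error rate with the existing version before going further. The function name and both thresholds are assumptions chosen for illustration, not values from the talk.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    min_requests: int = 100,
    max_extra_error_rate: float = 0.01,
) -> str:
    """Return "wait", "promote" or "rollback" for a canary release."""
    if canary_requests < min_requests or baseline_requests == 0:
        return "wait"  # Not enough traffic yet to judge either version.
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    # Tolerate a small difference; anything worse suggests the new version is at fault.
    if canary_rate > baseline_rate + max_extra_error_rate:
        return "rollback"
    return "promote"
```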

Slide 85

Slide 85 text

No content

Slide 86

Slide 86 text

Cindy Sridharan: https://medium.com/@copyconstruct/testing-in-production-the-safe-way-18ca102d0ef1

Slide 87

Slide 87 text

Cloud native: an introduction
Testing in production
Optimising for fixing things fast

Slide 88

Slide 88 text

Mitigate first

Slide 89

Slide 89 text

Make sure you can work out what’s going on

Slide 90

Slide 90 text

Log aggregation

Slide 91

Slide 91 text

No content

Slide 92

Slide 92 text

transaction_id="SYN_ABC"
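
Once logs from every service are aggregated, a shared transaction ID lets you follow one request across all of them. The sketch below assumes a `key=value` log line format and invented service names; the real log format is not shown in the slides.

```python
import re
from typing import Dict, Iterable, List

# Matches key=value or key="quoted value" pairs in a log line.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_fields(line: str) -> Dict[str, str]:
    """Extract key/value pairs from one log line."""
    fields = {}
    for key, quoted, bare in _PAIR.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def trace(lines: Iterable[str], transaction_id: str) -> List[str]:
    """Return the log lines belonging to one transaction, in input order."""
    return [
        line for line in lines
        if parse_fields(line).get("transaction_id") == transaction_id
    ]


if __name__ == "__main__":
    # Hypothetical aggregated logs from two services.
    logs = [
        'service=publisher transaction_id="SYN_ABC" msg="received publish"',
        'service=publisher transaction_id="tid_123" msg="received publish"',
        'service=notifier transaction_id="SYN_ABC" msg="notification sent"',
    ]
    for line in trace(logs, "SYN_ABC"):
        print(line)
```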

Slide 93

Slide 93 text

Practice

Slide 94

Slide 94 text

“If it hurts, do it more frequently, and bring the pain forward.”

Slide 95

Slide 95 text

Failovers, database restores

Slide 96

Slide 96 text

Chaos engineering https://principlesofchaos.org/

Slide 97

Slide 97 text

Understand your steady state
Look at what you can change - minimise the blast radius
Work out what you expect to see happen
Run the experiment and see if you were right
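
The steps above can be sketched as a small harness. This is an illustration under assumptions, not a chaos engineering tool: `steady_state`, `inject_failure` and `restore` are hypothetical callables you would supply, and the hypothesis is fixed as "the system stays in its steady state while the failure is injected".

```python
from typing import Callable


def run_chaos_experiment(
    steady_state: Callable[[], bool],
    inject_failure: Callable[[], None],
    restore: Callable[[], None],
) -> str:
    """Run one experiment and report whether the hypothesis held."""
    # 1. Understand the steady state: never experiment on an unhealthy system.
    if not steady_state():
        return "aborted: not in steady state before the experiment"
    # 2. Change one small thing; the caller keeps the blast radius small.
    inject_failure()
    try:
        # 3 and 4. Compare what actually happened with what we expected.
        held = steady_state()
    finally:
        restore()  # Always undo the change, even if the check raised.
    return "hypothesis held" if held else "hypothesis disproved: weakness found"


if __name__ == "__main__":
    # Simulated service with two replicas; healthy while at least one is up.
    replicas = {"a": True, "b": True}
    print(run_chaos_experiment(
        steady_state=lambda: any(replicas.values()),
        inject_failure=lambda: replicas.update(a=False),
        restore=lambda: replicas.update(a=True),
    ))
```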

Slide 98

Slide 98 text

Use the skills you already have

Slide 99

Slide 99 text

Good QAs understand the features of the system

Slide 100

Slide 100 text

Chaos engineering uses the same skills as exploratory testing - “hmm, I wonder what will happen if I do this?”

Slide 101

Slide 101 text

Work on operational stuff too

Slide 102

Slide 102 text

Cloud native: an introduction
Testing in production
Optimising for fixing things fast

Slide 103

Slide 103 text

What worked before doesn’t work so well for cloud native

Slide 104

Slide 104 text

Focus on delivering maximum value to your users while minimising the times when things are broken or unavailable

Slide 105

Slide 105 text

Understand where the QA mindset has the most impact

Slide 106

Slide 106 text

Use synthetic monitoring
Use clever monitoring
Make sure logs are aggregated
With tracing of events
Practice things
Chaos engineering IS exploratory testing!

Slide 107

Slide 107 text

Thank you!