Slide 1

Slide 1 text

Big Red Button
How Stripe automates incident management
Amy Nguyen (She/Her)
@amyngyn
Chatbots SG

Slide 2

Slide 2 text

What's Stripe?
● Stripe builds economic infrastructure for the Internet
● Security and reliability are the most important values we can provide to our users
● We started an engineering hub in Singapore last year!

Who are you?
● I'm a software engineer living here in Singapore developing payment APIs
● I was on the Reliability Tooling team in San Francisco
● Find me online at amy.dev and on Twitter @amyngyn

Slide 3

Slide 3 text

What do you do when something's broken?

Slide 4

Slide 4 text

What do you do when something's broken?
● Page someone
● Find the largest conference room and take it over
● Scream
● Fix the problem

Slide 5

Slide 5 text

What about...
● Announce where incident firefighting is happening
● Update your company's status page
● Update stakeholders and leadership
● Create a ticket to record the incident
● Document the incident timeline
● Write a public-facing retrospective
● Track remediation items
● Send messages to users
● Lock deploys

Slide 6

Slide 6 text


Slide 7

Slide 7 text

Introducing Big Red Button

Slide 8

Slide 8 text


Slide 9

Slide 9 text


Slide 10

Slide 10 text


Slide 11

Slide 11 text


Slide 12

Slide 12 text


Slide 13

Slide 13 text


Slide 14

Slide 14 text


Slide 15

Slide 15 text


Slide 16

Slide 16 text


Slide 17

Slide 17 text


Slide 18

Slide 18 text


Slide 19

Slide 19 text


Slide 20

Slide 20 text


Slide 21

Slide 21 text


Slide 22

Slide 22 text


Slide 23

Slide 23 text


Slide 24

Slide 24 text


Slide 25

Slide 25 text


Slide 26

Slide 26 text


Slide 27

Slide 27 text


Slide 28

Slide 28 text


Slide 29

Slide 29 text


Slide 30

Slide 30 text


Slide 31

Slide 31 text


Slide 32

Slide 32 text

web UI

Slide 33

Slide 33 text

web UI

Slide 34

Slide 34 text

web UI (diagram callout: 1)

Slide 35

Slide 35 text

web UI (diagram callouts: 1, 3, 2)

Slide 36

Slide 36 text

incident event web UI

Slide 37

Slide 37 text

incident event web UI event event event

Slide 38

Slide 38 text


Slide 39

Slide 39 text


Slide 40

Slide 40 text

abstract service

Slide 41

Slide 41 text

slack service, email service, abstract service, […] service

Slide 42

Slide 42 text

slack service, email service, abstract service, […] service
incident close event handler, recovery steps, incident [...] event handler
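The boxes on this slide suggest a common pattern: integrations sit behind one abstract service interface, and event handlers call those services when an incident event fires. The sketch below is one possible reading of the diagram, not Stripe's actual code. Every class, method, and field name (`Service`, `notify`, `IncidentCloseHandler`, `Incident`) is hypothetical, and the services only record messages rather than calling any real Slack or email API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical incident record; the fields are illustrative only."""
    name: str
    status: str = "open"


class Service(ABC):
    """The 'abstract service' box: one shared interface per integration."""

    @abstractmethod
    def notify(self, incident, message):
        """Deliver a message about the incident through this integration."""


class RecordingService(Service):
    """Stand-in that stores messages in memory instead of calling an API."""

    label = "service"

    def __init__(self):
        self.sent = []

    def notify(self, incident, message):
        self.sent.append(f"[{self.label}] {incident.name}: {message}")


class SlackService(RecordingService):
    label = "slack"


class EmailService(RecordingService):
    label = "email"


class IncidentCloseHandler:
    """The 'incident close event handler' box: runs when an incident closes."""

    def __init__(self, services):
        self.services = list(services)

    def handle(self, incident):
        # Mark the incident closed, then tell every registered service.
        incident.status = "closed"
        for service in self.services:
            service.notify(incident, "incident closed")


def close_demo():
    """Close one example incident and return what each service recorded."""
    slack, email = SlackService(), EmailService()
    handler = IncidentCloseHandler([slack, email])
    incident = Incident(name="example-incident")
    handler.handle(incident)
    return incident, slack.sent, email.sent
```

Under this reading, adding another integration means writing one more `Service` subclass, and handlers stay unchanged because they depend only on the shared interface.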

Slide 43

Slide 43 text


Slide 44

Slide 44 text

Thanks!