Slide 1

Slide 1 text

Think About! 2019 Ryn Daniels @rynchantress they/them CI/CD: More than Just Code

Slide 2

Slide 2 text

RESILIENCE

Slide 3

Slide 3 text

RESILIENCE IS NOT ROBUSTNESS

Slide 4

Slide 4 text

Computer systems can be robust against a finite set of known problems. Only humans can be resilient against unknown problems.

Slide 5

Slide 5 text

RESILIENCE
- not of technical systems
- of processes
- of human systems

Slide 6

Slide 6 text

RESILIENCE OF OUTCOMES

Slide 7

Slide 7 text

Devops is a cultural and technical movement that enables resilience of outcomes.

Slide 8

Slide 8 text

2 STORIES ABOUT RESILIENCE OF OUTCOMES

Slide 9

Slide 9 text

PUTTING THE DEV IN DEVOPS
Or: Improving resilience of outcomes for (ops-related) code

Slide 10

Slide 10 text

PROJECT INDIGO
(Infrastructure provisioning tooling at Etsy*)

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Yes, we have CI/CD!
Continuous I...nsecurity about whether or not these changes were going to break something horribly.
Continuous Delivery... of things that were broken in production that I promise worked on my machine, really.

Slide 13

Slide 13 text

HERE BE DRAGONS

Slide 14

Slide 14 text

Project Indigo: Labeling the Dragons
- Most commonly used workflow was completely untested
- No versioning of any components
- Lack of separate testing environment
- Manual, error-prone deploy process
- No tests (or even testable code)

Slide 15

Slide 15 text

Project Indigo: Labeling the Dragons
- Most commonly used workflow was completely untested
- No versioning of any components
- Lack of separate testing environment
- Manual, error-prone deploy process
- No tests (or even testable code)

Slide 16

Slide 16 text

Untested Workflow: Unattended Mode
- Attended mode: An engineer in the office runs Indigo by hand
- Unattended mode: The data center team boots up a new machine and it automatically netboots to Indigo
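One way to keep the unattended path from going untested is to route both modes through the same provisioning function, so a single check can exercise each of them. The following is a minimal Python sketch under that assumption; `Mode`, `provision`, `check_all_modes`, and the `interactive` flag are hypothetical names chosen for illustration and are not Indigo's actual code.

```python
from enum import Enum


class Mode(Enum):
    ATTENDED = "attended"      # an engineer runs the tool by hand
    UNATTENDED = "unattended"  # a newly booted machine netboots into the tool


def provision(hostname: str, mode: Mode) -> dict:
    """Return a provisioning plan; both modes share this one code path."""
    if not hostname:
        raise ValueError("hostname is required")
    return {
        "hostname": hostname,
        "mode": mode.value,
        # Assumed difference: only attended runs can prompt a person.
        "interactive": mode is Mode.ATTENDED,
    }


def check_all_modes(hostname: str) -> list:
    """Build a plan for every mode so neither workflow goes unexercised."""
    return [provision(hostname, mode) for mode in Mode]
```

Because `check_all_modes` iterates over the enum, any mode added later is picked up by the same check without further changes.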

Slide 17

Slide 17 text

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Project Indigo: Labeling the Dragons
- Most commonly used workflow was completely untested
- No versioning of any components
- Lack of separate testing environment
- Manual, error-prone deploy process
- No tests (or even testable code)

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Slide 24

Slide 24 text

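Project Indigo: Labeling the Dragons
- Most commonly used workflow was completely untested
- No versioning of any components
- Lack of separate testing environment
- Manual, error-prone deploy process
- No tests (or even testable code)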

Slide 25

Slide 25 text

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Project Indigo: Labeling the Dragons
- Most commonly used workflow was completely untested
- No versioning of any components
- Lack of separate testing environment
- Manual, error-prone deploy process
- No tests (or even testable code)

Slide 29

Slide 29 text

AUTOMATE!

Slide 30

Slide 30 text

Project Indigo: Labeling the Dragons
- Most commonly used workflow was completely untested
- No versioning of any components
- Lack of separate testing environment
- Manual, error-prone deploy process
- No tests (or even testable code)

Slide 31

Slide 31 text

Slide 32

Slide 32 text

Slide 33

Slide 33 text

Slide 34

Slide 34 text

Project Indigo: Slaying the Dragons
- End-to-end testing of both workflows
- Versioning of payload and API
- Ability to create a testing environment for various components
- Automated deploy process with Deployinator
- Testable code with >0 unit tests
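Versioning the payload and API lets a mismatch surface as an explicit, testable error rather than as a surprise in production. Here is a minimal Python sketch, assuming a hypothetical `major.minor` version scheme in which a shared major version means compatible. The function names and the compatibility rule are assumptions for illustration and are not taken from Indigo.

```python
import unittest


def parse_version(version: str) -> tuple:
    """Split a 'major.minor' string into a tuple of ints.

    Raises ValueError for anything that is not two dot-separated integers.
    """
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(payload_version: str, api_version: str) -> bool:
    """Assumed rule: payload and API are compatible when majors match."""
    return parse_version(payload_version)[0] == parse_version(api_version)[0]


class CompatibilityTest(unittest.TestCase):
    """A first '>0 unit tests' suite for the version check."""

    def test_same_major_is_compatible(self):
        self.assertTrue(is_compatible("2.1", "2.4"))

    def test_different_major_is_rejected(self):
        self.assertFalse(is_compatible("1.9", "2.0"))

    def test_malformed_version_raises(self):
        with self.assertRaises(ValueError):
            is_compatible("two", "2.0")
```

Saved as a module, the suite can be run with `python -m unittest`, which makes it easy to wire into an automated build step.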

Slide 35

Slide 35 text

CI/CD: Resilience of Outcomes for Code Changes
- Reducing the set of unknowns by running tests against known problems
- Decreasing surprises with smaller changes
- Surfacing problems more quickly and automatically for quicker response

Slide 36

Slide 36 text

CI/CD: Just one building block of resilience

Slide 37

Slide 37 text

THE APACHE SNAFU
Or: Improving resilience of outcomes for human incident response

Slide 38

Slide 38 text

Slide 39

Slide 39 text

WHOOPS.

Slide 40

Slide 40 text

Lessons Learned, Part I
- Sometimes automation goes awry
- Sometimes broken automation saves you
- Sometimes an incident breaks all your incident monitoring/response tools
- Always give people weird sweaters when they break things

Slide 41

Slide 41 text

CULTURE

Slide 42

Slide 42 text

CULTURE
Collec

Slide 43

Slide 43 text

Lessons Learned, Part II
- Understand when processes don't work
- Encourage public communication
- Incentivize people to help each other out
- Budget in slack
- Maintain psychological safety

Slide 44

Slide 44 text

Processes and documentation
- Which processes are explicit versus implicit?
- Do you know when people are working around processes that frustrate them?
- How do processes get changed?

Slide 45

Slide 45 text

Communication and social scripts
- What channels do people use for communication?
- Does communication default to public or private?
- What are the standards for how people interact with each other?

Slide 46

Slide 46 text

Motivation and incentive structures
- What behaviors are explicitly encouraged in your organization?
- What is part of your career framework?
- What behaviors are implicitly rewarded?
- Are incentives consistent across the org?

Slide 47

Slide 47 text

Prioritization and budgets
- Who is available to help respond to incidents?
- Are incident response and remediation items considered during planning and scheduling?
- How much slack is in your schedules?
- What happens when schedules slip?

Slide 48

Slide 48 text

Psychological safety
- Do people feel comfortable asking for help?
- Do people admit when they don't know something?
- How are mistakes treated within your organization?

Slide 49

Slide 49 text

Cultural Building Blocks
- Processes and documentation
- Social scripts
- Incentive structures
- Priorities and budgets
- Psychological safety

Slide 50

Slide 50 text

Multiple ways to build resilience of outcomes

Slide 51

Slide 51 text

Technical and cultural building blocks

Slide 52

Slide 52 text

You aren't just shipping code. You're also shipping everything that enables you to ship code.

Slide 53

Slide 53 text

Devops: building blocks of resilience

Slide 54

Slide 54 text

THANK YOU!