Slide 1

Slide 1 text

A learning culture in practice
Nishan Subedi

Slide 2

Slide 2 text

Agenda
• Introduction
• What is a learning culture
• Modeling socio-technical systems
• Human factors
• Practices @ Etsy
• Resources

Slide 3

Slide 3 text

Introduction

Slide 4

Slide 4 text

Who am I?
Sr. Machine Learning Engineer on the Search Ranking Team:
• Been at the company > 3.5 years
• Been a PostMortem Facilitator for > 2 years
• Teach PostMortem Facilitation Course

Slide 5

Slide 5 text

Etsy is a global marketplace where people around the world connect, both online and offline, to make, sell and buy unique goods.

Slide 6

Slide 6 text

By The Numbers
• 1.8M active sellers (as of 2017)
• 30.6M active buyers (as of 2017)
• $2.84B annual GMS (in 2016)
• 45M items for sale (as of 2017)
Photo by Kirsty-Lyn Jameson

Slide 7

Slide 7 text

We are always deploying

Slide 8

Slide 8 text

LEARNING CULTURE

Slide 9

Slide 9 text

“Failure is success if we learn from it.” - Malcolm Forbes

Slide 10

Slide 10 text

LEARNING CULTURE

Slide 11

Slide 11 text

Culture is a shared set of beliefs, behaviors, and routines.

Slide 12

Slide 12 text

Culture is to a group what personality or character is to an individual. It’s constantly evolving.

Slide 13

Slide 13 text

Why focus on culture?
• A strong culture can overcome almost any set of poor technical decisions.
• A weak culture can’t be saved by using the best technology.
• Culture is reinforced by, and reinforces, your tooling and process.

Slide 14

Slide 14 text

Failure for etsy.com
From: etsystatus.com

Slide 15

Slide 15 text

Why let a good outage go to waste?

Slide 16

Slide 16 text

Event Investigation, a.k.a. PostMortems
https://github.com/etsy/morgue

Slide 17

Slide 17 text

Event Investigation, a.k.a. PostMortems
https://github.com/etsy/morgue

Slide 18

Slide 18 text

Event Investigation Survey 2017
https://github.com/etsy/morgue

Slide 19

Slide 19 text

Get a full and honest picture of what happened, and the steps needed to help prevent it from happening again. - Anonymous Response

Slide 20

Slide 20 text

Including a diverse group of people with different perspectives on the issue. - Anonymous Response

Slide 21

Slide 21 text

Creating an environment where we can learn from incidents in a blameless manner. - Anonymous Response

Slide 22

Slide 22 text

Conditions for maximizing learning from PostMortems
• Blameless
• Open meetings
• Everyone is invited: default to @tech-all
• Accountability
• Remediation
• Better understanding of our socio-technical systems

Slide 23

Slide 23 text

All models are wrong but some are useful. - George E. P. Box

Slide 24

Slide 24 text

MODELING OUR SYSTEMS AS COMPLEX SYSTEMS
• ROBUST
• UNPREDICTABLE
• NO CLEAR CAUSALITY
• DRIFT TO DEGRADATION
• HUMANS AS A SOURCE OF ADAPTABILITY

Slide 25

Slide 25 text

IMPLICATIONS OF COMPLEXITY

Slide 26

Slide 26 text

HUMAN FACTORS

Slide 27

Slide 27 text

Safety is the potential for the system to adapt and perform acceptably under widely varying conditions. Human variability provides this adaptive capacity.

Slide 28

Slide 28 text

ETSY’S DEPLOY DASHBOARD

Slide 29

Slide 29 text

WE SIMPLIFY, UNAWARE OF OUR BIASES

Slide 30

Slide 30 text

BIASES: HINDSIGHT BIAS

Slide 31

Slide 31 text

BIASES: OUTCOME BIAS

Slide 32

Slide 32 text

BIASES: CONFIRMATION BIAS

Slide 33

Slide 33 text

Goals

Slide 34

Slide 34 text

Ongoing sensemaking in PostMortems
Sensemaking is not about truth and getting it right. Instead, it is about continued redrafting of an emerging narrative so that it becomes more comprehensive, incorporates more of the observed data, and is more resilient in the face of criticism.

Slide 35

Slide 35 text

WHAT KIND OF ACCOUNTABILITY DO YOU WANT?
WHO DO I BLAME?

Slide 36

Slide 36 text

WHAT KIND OF ACCOUNTABILITY DO YOU WANT?
WHO IS ACCOUNTABLE FOR IMPLEMENTING CHANGES TO MAKE THINGS BETTER?

Slide 37

Slide 37 text

BLAME-FREE ≠ ACCOUNTABILITY-FREE

Slide 38

Slide 38 text

Have you tried Pre-Mortems?
• Architecture Review
• Operability Review

Slide 39

Slide 39 text

Resources
• Morgue: http://github.com/etsy/morgue
• PostMortem Facilitation Guide: https://extfiles.etsy.com/DebriefingFacilitationGuide.pdf
• Etsy’s Engineering Blog: http://codeascraft.com
• Talks @ Etsy: http://etsy.com/codeascraft/talks
• We’re hiring! http://etsy.com/careers

Slide 40

Slide 40 text

References
• Blameless PostMortems and a Just Culture: https://codeascraft.com/2012/05/22/blameless-postmortems/
• Cook, Richard I. “How complex systems fail.” Cognitive Technologies Laboratory, University of Chicago. Chicago IL (1998).
• Weick, Karl E. Sensemaking in organizations. Vol. 3. Sage, 1995.
• Tversky, Amos and Kahneman, Daniel. “Judgment under Uncertainty: Heuristics and Biases.” Science, September 1974, 185(4157), pp. 1124–31.
• Shorrock, Steven. “Life After Human Error.” Velocity 2014. https://www.youtube.com/watch?v=STU3Or6ZU60
• Rasmussen, Jens. “Risk management in a dynamic society: a modelling problem.” Safety Science 27.2 (1997): 183-213.
• Reason, J., Hollnagel, E., Paries, J. “Revisiting the Swiss Cheese Model of Accidents.” Eurocontrol, Oct 2006.
• Dekker, Sidney. The Field Guide to Understanding Human Error. Ashgate Publishing Company, Brookfield, VT, USA, 2006.

Slide 41

Slide 41 text

Thank you! Find me @ Challenge Your Peers 7