Reckoning with the harm we do: In search of Restorative Just Culture in software and web operations

Slides from my QCon 22 talk presenting my research on harm and trauma in tech workers.

Jessica DeVita

March 23, 2023

Transcript

  1. Reckoning With the Harm We Do: In Search of Restorative Just Culture in Software and Web Operations

    © by Jessica DeVita, 2022, all rights reserved
  2. Having a Just Culture means that you’re making effort to balance safety and accountability.

    Having a “blameless” Post-Mortem process means that engineers whose actions have contributed to an accident can give a detailed account of:
    • what actions they took at what time,
    • what effects they observed,
    • expectations they had,
    • assumptions they had made,
    • and their understanding of timeline of events as they occurred.
    ...and that they can give this detailed account without fear of punishment or retribution.
    2012 2022
  3. “Just Culture” is a culture in which front-line operators and others are not punished for actions, omissions or decisions taken by them which are commensurate with their experience and training, but where gross negligence, wilful violations and destructive acts are not tolerated.

    https://skybrary.aero/bookshelf/just-culture-manifesto-2
    © by Jessica DeVita, 2022, all rights reserved @UberGeekGirl
  4. 3rd Victims - Incident Analysts/Investigator

    “...experience psychosocial harm as a result of indirect exposure to an incident, such as leading incident investigations.” They reported “…an almost complete lack of emotional support… the harm they experience goes unacknowledged. But this harm is clearly real. Respondents experienced anxiety, lost sleep, emotional exhaustion, and a sense of being blamed by everyone for events they weren’t involved in. This led some to consider leaving the profession.”
  5. What is your title and industry?

    Human Factors & Safety Specialist Engineering Management Software Engineer (including Sr, Staff, Lead, Director) Clinical Lab Quality Specialist SRE CTO Domain Architect Principal Product Owner Director of Cloud Operations Director, Security and Trust Team Lead, Incident Management Innovation Engineer Divisional Intervention Lead
  6. [Chart: two response scales, each running from “Not at all likely” to “Very likely”]
  7. Have you felt harmed or traumatized from your involvement in incidents or outages?
  8. “Trauma results from an event or series of events, that is experienced by an individual as physically or emotionally harmful or threatening and that has lasting adverse effects on the individual’s functioning and physical, social, emotional, or spiritual well-being”

    SAMHSA, 2012, p. 2. Trauma-Informed Care in Behavioral Health Services
  9. Very few participants said they had not felt harmed or traumatized.

    “Outages are considered an opportunity to learn as well as share what you learned. If an outage is called people will volunteer to be the IC or chime in *Hands!* to let people know they're available to help. Even with upper management there's very little negativity or blame” - P16
  10. A majority of participants described experiencing harm or trauma.

    The impact was not limited to them: they described how their relationships with family were affected as well.
  11. Punitive Culture

    “CTO told me to take the blame for an incident or he'd fire me” - P12
    “I remember getting yelled at by the CEO” - P24
    “The 5 Whys process we have to deal with incidents inevitably leads to the determination of a "root cause" that more often than not ends with "human error"” - P10
    “The postmortems were called "recrimination meetings"” - P23
  12. Punitive Culture

    “In the aftermath of that incident, our COO asked me "Why did you release the software?". I told him we’d done all these tests and we thought we were in good shape but we missed it. And then he said again, "Why did you release the software?" I said “I made a mistake, an error in judgment”. This is an hour long meeting by the way, but the 3rd time, he asked again, "Why did you do this?" and I said, “I don't know what to tell you. I screwed up. I don't know what more you want from me”. And it wasn't like I got fired or anything like that, but I definitely felt blamed” - P21
  13. Trauma - Mental and Physical

    “I have been traumatized for being involved in incidents for the past 5 years, to the point that every time I see an alert today I get an anxiety attack.” - P20
  14. Trauma - Mental and Physical

    “I remember my lower back hurting because of the amount of adrenaline I'd been running on for the past 36 hours” - P12
  15. Trauma - Mental and Physical

    “what really was a short incident, but just the confluence of when I was woken up and then the rush of adrenaline for the incident. By the way, when I tried to get back to sleep, my adrenaline was still going, so it took me a little while to get back to sleep. It wiped me out for the whole day” - R24, Focus Group 2
  16. Trauma - Blaming oneself

    Yes, but it was completely self-inflicted. I felt terrible that I didn't have enough answers to help solve the issue, and we were reliant on other people on our team who were inaccessible, and I felt terrible that I was on call and in theory had the most knowledge of the app that had the error but I still felt totally lost.
  17. Trauma - Blaming oneself

    “I had long lasting (probably minor) trauma from an outage I was instrumental in solving. Even though everything went as well as it probably could have… it still lingered for a long time. It probably only went away after some of the larger, harder mitigations were done almost 2 years later.” - P31
  18. Trauma - Feeling unheard

    My input wasn't heard during an outage. When I raised this during the postmortem, the on-call manager asked if I'd made myself clear enough. I feel a lot warier and more guarded after both the outage and the postmortem. - P33
  19. Trauma - Mental and Physical: Sleep

    “It was very stressful and gave me a lot of anxiety which led to a loss of sleep and, I would say, a more profound sense of disengagement from the workplace” - P11
  20. Trauma - Mental and Physical: Sleep

    Q: What’s your relationship with sleep like?
    “Sleep damage makes it very difficult for me to be on-call; when a page wakes me up, I generally do not sleep afterwards.” - R15, Focus Group 1
    Q: Is sleep discussed by management?
    “Formal discussion of sleep doesn’t happen, because it’s a very dangerous discussion.” - R15, Focus Group 1
  21. “It’s like I’m asleep with one eye open. Don’t dolphins do that? It’s like half my brain is at work, ready and poised to respond to the pager, while the other half of my brain is trying to relax” - P24, Focus Group 2

    https://commons.wikimedia.org/wiki/File:Eilat_Dolphin_Reef_(3).jpg
  22. Trauma - Mental and Physical: Sleep

    “Sleep is very holy to me. I’m not on-call right now. I have a hard time separating work and non-work stuff and so being on-call right now, I couldn't cope with it in a healthy way.” - P5
    “Not being able to go to sleep at a reasonable hour takes its toll on your mental abilities” - P6
  23. Impact on Relationships with Family

    “Exhausted, spent, drained. Guilty for working and [being] away from my children” - P19
    “Scheduling of shifts is never done with any consideration for family events” - R23
  24. “They were unable to count on me being able to participate through the duration of some event, whether that be a meal or anything else… that interruption became a highly negative, emotionally charged topic” - P23
  25. Coping… (or not)

    “I mostly get angry and try to change our industry as a way to cope. Using it as fuel so that I feel better knowing at least I have helped others not deal with it alone too” - P6
    “It made me realize how wrong we were doing incident management learnings, and I used that as motivation to completely change the way our company thought and dealt with incidents. The resentment I felt from those experiences turned into fuel.” - P11
    “Generally I'm not coping well. Having a good support network and therapy helps but this industry can be absolute s*%#” - P12
  26. 2nd Victims - Engineers involved in the incident

    “These individuals may suffer significant emotional harm regardless of whether their actions actually contributed to the incident – or whether it was preventable at all. The impact on 2nd victims can be severe, and may take the form of signs and symptoms associated with acute stress syndrome or post-traumatic stress disorder.”
  27. 3rd Victims - Incident Analysts/Investigator

    “...experience psychosocial harm as a result of indirect exposure to an incident, such as leading incident investigations.” They reported “…an almost complete lack of emotional support… the harm they experience goes unacknowledged. But this harm is clearly real. Respondents experienced anxiety, lost sleep, emotional exhaustion, and a sense of being blamed by everyone for events they weren’t involved in. This led some to consider leaving the profession.”
  28. Organization as a Victim

    “Organizations certainly can suffer reputational, economic, and even cultural harm after adverse events, and effective crisis management is important. But we argue that “corporate victimhood” is qualitatively different from psychosocial harm experienced by individual human beings (the hallmark of second victim)... [Organizations] do not experience acute stress syndrome, though their employees might. [Organizations] do not burn out and leave the profession, although their employees might.” (Holden & Card, 2019)
  29. Educate Management

    “Just give us a break”
    “Education for senior management around learning from incidents, accident models, helpful and unhelpful behavior including language.” - P5
    “Better training for managers on how to manage their staff, several managers simply shouldn't be managing people.” - P26
    “We would need really serious culture change from the top, but there is no appetite for that.” - P30
  30. “It has to be six people minimum to do a somewhat humane rotation. And I think managers get very upset when I say that.” - R15
  31. Training - sustainable rotations

    “More training including a buddy system for engineers new to on call support” - P19
    “Regular "fire drills" or empowerment training so it doesn't feel like a total shock and scare and unfamiliar when you're a frontend engineer and suddenly have to parse through APM logs like you're a DevOps person at 1:00AM” - P29
  32. Focus on Learning Instead of Blaming

    “Allowing for human nature to thrive instead of sanctioning individuals due to the complexity of systems.” - P24
    “Adopting investigation approaches that are capable of uncovering systemic challenges” - P5
  33. Help people who may be blaming themselves

    “How do you offset guilt? That's extremely challenging, because it requires so much personal attention, time, and care. When placed in the hands of a company, that might be impractical to the point of impossible, as a company can't feel.” - P28
  34. Take care of people - Talk about what happened

    “Personal outreach and giving people an opportunity to talk through things afterwards is very helpful, but also quite rare.” - P23
    “Provide a safe space to talk about what happened” - P13
    “Our incident review template has a section on human responder impact with questions to prompt people to think about wellbeing, impact on sleep, family, and personal life change events.” - P5
  35. What is “Blameless”? What does it mean to you?
  36. Blameless is a behavior

    “Blameless means not pointing fingers.” - P21
    “Blameless means we don't accept "so and so messed up" as the root cause of an incident” - P22
    No names - P18
    No shame - P23
  37. Blameless is just a word

    “Blameless is just something people say nowadays, like "we do agile" or psychological safety. It would be weird not to do it. A lot of folks don’t understand what it takes. I've seen some very blameful conclusions come out of blameless postmortems. You can still see the blame in the language and the actions” - P5
    “"Blameless" has been Agile-ified to mean whatever the person in charge wants it to mean” - P12
  38. Blameless is just a word

    “Blameless is a squishy marketing term used by a part of the safety community to try to make blame attributed to frontline workers go down. Hard to tell how well it succeeded.” - P6
  39. Blameless - Acceptance and Recognition

    “Accept that all involved were doing the best they knew how, and the incidents occur because of systems factors and systemic pressures, not individual mistakes” - P30
    “Recognize we are human, recognize how blame occurs, what it can tell us about how our brain recognizes patterns, and transform that into something more useful” - P24
    “Blameless recognizes that software is hard and mistakes happen generally because of the system, not the individual.” - P14
  40. “Blameless means the emptiness of Blame, not the dissolution of it. It means accepting that blame will happen as a natural result of a fleeting human emotional reaction, and that we should see it as a doorway for inquiry” - P24

    A doorway into the garden at Berrington Hall by Rod Allday
    https://commons.wikimedia.org/wiki/File:A_doorway_into_the_garden_at_Berrington_Hall_-_geograph.org.uk_-_3928141.jpg
  41. What is “Accountability”? What does it mean to you?
  42. Accountability as a Capability

    “The capability to accept your own role in an event and your ability to go through restorative steps with other people involved.” - P6
    “People have control over their work but also the responsibility for it - without both of those concepts, things fall apart.” - P12
    “You can be doing the best you can and still fall short of your goals, but that's where we help each other out and don't beat each other up when we miss.” - P29
  43. Capability: Account Giving

    “Forthrightly being able to recount (account) decisions and actions that were taken” - P23
    “Responsibility, ownership, and that a person has the chance to explain what happened when they take accountability for a thing or incident.” - P27
  44. Accountability - It takes a team

    “We all pitch in to get service restored as quickly as possible and to figure out how to improve the situation in the future.” - P21
    “There is a group of individuals that are stewards of a given system/service, and are committed to its improvement and long term sustainability.” - P10
  45. Accountability as Prevention

    “Accepting that existing tools or procedures failed to prevent the incident and spending time fixing those issues as a team/department before moving onto more exciting work.” - P22
    “Taking action so that the system cannot allow this mistake again.” - P14
  46. Accountability - From Self

    “Unpopular opinion - I think accountability can only come from yourself. You can "hold someone accountable" but that's typically punitive in nature. Accountability being a self imposed action means "I'm going to take steps to educate myself and hopefully others around me to the best of my abilities such that future events in this space are mitigated based on what I've learned."” - P28
  47. Is there conflict in “blameless” and “accountability”?
  48. Is there conflict in “blameless” and “accountability”?

    Yes: “Some events aren’t blameless. When someone intentionally violates a policy or is malicious then there should be accountability. When policies and culture don’t support staff in being successful, I can see where blameless may be OK” - P4
    No: “Despite the fact that blame serves a social function, "blameless" and "accountable" are not necessarily at odds especially if organizations want to learn from incidents, and explore why locally rational decisions, that may have been successful until there was an incident, had surprising effects.” - P10
  49. Is there conflict in “blameless” and “accountability”?

    “Accountability is the thing revealed to us when blame is but a passing phase instead of a concrete resting point. We discover that accountability is a plurality! It takes a team to be accountable in complex systems. So how can we treat blame as anything but a film that we allow ourselves to recognize, politely remove, and move on?” - P24
  50. Even in organizations that practice “blameless” or claim to have a “Just Culture”, that doesn’t stop people from blaming themselves.
  51. We need to support 2nd and 3rd victims

    • Make it safe to report hazards/unsafe environments confidentially
    • Peer support
    • Human factors experts
    • Staff psychologist
  52. “…accountability can also be forward-looking (Sharpe, 2003). Restorative justice achieves accountability by listening to multiple accounts and looking ahead at what must be done to repair the trust and relationships that were harmed.” - Sidney Dekker

    https://commons.wikimedia.org/wiki/File:Wikipedian_looking_to_forward_coming_lemmas.jpg
  53. “Perhaps operators involved in mishaps could be held “accountable” by inviting them to tell their story (their “account”), systematizing and distributing the lessons in it, and using this to sponsor vicarious learning for all. Perhaps such notions of accountability would be better able to move us in the direction of an as yet elusive blame-free culture.” - Sidney Dekker
  54. Tell people what you mean - and try to understand what people mean when they say words like “accountability”

    “Our words matter. Our words have consequences. Our words help conjure up worlds for other people… These are worlds where our words attain representational powers that go way beyond the innocuous operationalism we might have intended for them. These are worlds in which real people—professional practitioners—are put in harm’s way by what we come up with. We cannot just walk away from that.”
    Sidney Dekker, 2015, The danger of losing situation awareness
  55. Dedicated to my friend Dr. Richard Cook 1953-2022

    “There is no such thing as “Just Culture” There’s just culture, and where complex system failure has occurred, culture plays out predictably. It’s more about the power dynamics. Reserving the decision about what is acceptable and calling that ‘Just’ is a species of nonsense… “Just Culture” is almost entirely a fig leaf for the usual management blame assignment. Justice is in the eye of the beholder”
  56. References

    https://skybrary.aero/enhancing-safety/just-culture/about-just-culture/just-culture-manifesto
    https://safetydifferently.com/restorative-just-culture-checklist/
    https://codeascraft.com/2012/05/22/blameless-postmortems/
    https://humanisticsystems.com/2014/09/30/safety-ii-and-just-culture-where-now/
    McCall, J. R., & Pruchnicki, S. (2017). Just culture: A case study of accountability relationship boundaries influence on safety in high-consequence industries. Safety Science, 94, 143–151.
    Dekker, S. W. A. (2003). When human error becomes a crime. Human Factors and Aerospace Safety, 3(1), 83–92.
    Cook, R. I. (2019). Learning from Incidents.
    Woods, D. (2005). Conflicts between Learning and Accountability in Patient Safety. DePaul Law Review, 54, 485–502.
    Dekker, S. W. A., & Breakey, H. (2016). “Just culture:” Improving safety by achieving substantive, procedural and restorative justice. Safety Science, 85, 187–193.
    Dekker, S. W. A. (2015). The danger of losing situation awareness. Cognition, Technology & Work, 17(2), 159–161.
    Holden, J., & Card, A. J. (2019). Patient safety professionals as the third victims of adverse events. Journal of Patient Safety and Risk Management, 24(4), 166–175.
    Sharpe, V. A. (2003). Promoting patient safety: an ethical basis for policy deliberation. The Hastings Center Report, 33(5), S3.
  57. “I lost 2 people out of the team mentioned above due to heavy on call burden.”
  58. “[Just Culture] should evolve… JC should focus on a mindset of trust, mutual understanding and openness, as well as language that is non-blaming. This should apply not only ‘vertically’ (e.g. between managers and workers); it should apply between all of us. Assuming goodwill should not be only a response to adverse events, but a baseline assumption, especially when things don’t go our way. Whatever our view of the human – as hazard or resource – just culture reminds us that we are human, and that we need to be mindful of our reactions to failure.” - Steven Shorrock