
AA261: DevOps lessons in collaborative maintenance

On January 31, 2000, Alaska Airlines Flight 261 plunged into the Pacific Ocean in an extreme "nose down" position, killing all 88 passengers and crew on board. The NTSB concluded that the jackscrew in AA261's horizontal stabiliser trim system had been inadequately maintained, causing the pilots to lose pitch control of the plane.

The 30 years of give and take between the aircraft manufacturer's engineers, airline maintenance staff, and federal regulators that preceded AA261's simple mechanical failure bear striking parallels to the problems we face daily in IT operations and software development.

In this talk, Lindsay looks at the complex interplay between the parties involved in the AA261 crash through a DevOps lens, investigating the collaborative approach to maintaining and operating the MD-83 aircraft, and relating those complexities back to the IT systems we build and maintain.

Lindsay Holmwood

March 11, 2013

Transcript

  1. This is a maintenance accident. Alaska Airlines' maintenance and inspection of its horizontal stabilizer activation system was poorly conceived and woefully executed. The failure was compounded by poor oversight... had any of the managers, mechanics, inspectors, supervisors or FAA overseers whose job it was to protect this mechanism done their job conscientiously, this accident cannot happen. -- John J. Goglia, NTSB Board Member
  4. 1965: every 300-350 hours (launch of DC-9)
    1985: every 700 hours (industry deregulation)
    1987: every 1000 hours (industry standardisation)
    1991: every 1200 hours (industry standardisation)
    1994: every 1600 hours (industry standardisation)
    1996: every 8 months, about 2550 hours (Alaska Airlines policy change)
  11. “people make what they consider to be the best decision based on available knowledge at the time”
  12. This is a maintenance accident. Alaska Airlines' maintenance and inspection of its horizontal stabilizer activation system was poorly conceived and woefully executed. The failure was compounded by poor oversight... had any of the managers, mechanics, inspectors, supervisors or FAA overseers whose job it was to protect this mechanism done their job conscientiously, this accident cannot happen. -- John J. Goglia, NTSB Board Member
  13. “God, our ops team are arseholes. I just want to deploy this change and go home!”

    [diagram: workload, economy, safety]
  16. “It’s 3am and the pager has gone off again. Why can’t these devs just write code that works?”

    [diagram: workload, economy, safety]
  19. Further reading:

    Sidney Dekker [books]: The Field Guide to Understanding Human Error; Drift into Failure; Just Culture
    Dan Manges [blog]: "How incidents affect infrastructure priorities"