Andrew Godwin - What can programmers learn from pilots?

What can Python-based software teams learn from aviation? Why should software always fail hard? What's wrong with too many error logs? And why are ops people already like pilots? Learn all this, and about planes, too.

https://us.pycon.org/2015/schedule/presentation/375/

PyCon 2015

April 18, 2015

Transcript

  1. Andrew Godwin @andrewgodwin
    WHAT CAN
    Programmers
    LEARN FROM
    Pilots?

  2. Hi, I'm
    Andrew Godwin
    Author of Django 1.7 & South migrations
    Senior Software Engineer at
    Really likes cheese
    FAA & EASA PPL, working on IR

  3. flickr.com/photos/russss/16735398019/

  4. 1. Learning about aviation
    2. Applying lessons to coding

  5. Commercial flying is very safe
    General aviation is still not bad
    (fatal accidents per million hours)
    AIRLINES 0.2
    GA 11.2
    CARS/TRUCKS 0.53
    MOTORCYCLES 15.6
    Source: 2005 Nall report, 2004 NHTSA stats, 1991-2000 FAA stats, 40mph avg. road speed

  6. GA ACCIDENT CAUSES
    Pilot 76%
    Mechanical 16%
    Other 9%
    Source: 2005 Nall report

  7. COMMON CAUSES
    Controlled flight into terrain (CFIT)
    Disorientation in clouds (VFR in IMC)
    Bad decision making (get-there-itis)

  8. WHY DO I KNOW THIS?
    Detailed investigation of every accident

  9. HOW DOES IT HELP US?
    Let's look at common problems

  10. Soft Failure
    Explicit disengage signals
    Covering inaccurate instruments
    Replacing parts at first sign of issues

  11. Soft Failure
    Crash hard on any serious error
    Redundancy, not single system reliability
    Freedom to get rid of servers whenever
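
One way to read "crash hard on any serious error" in Python is to let a failed startup check raise, rather than logging the problem and carrying on. A minimal sketch under that assumption; `load_config`, `ConfigError`, and the `database_url` key are hypothetical names for illustration and do not come from the talk:

```python
class ConfigError(Exception):
    """Raised when required configuration is missing (hypothetical name)."""


def load_config(settings):
    """Return the settings unchanged, or raise if a required key is missing.

    Raising here stops the process at startup instead of letting it limp
    along with a half-working configuration.
    """
    required = ["database_url"]  # assumed required key, for illustration only
    missing = [key for key in required if not settings.get(key)]
    if missing:
        raise ConfigError("missing required settings: " + ", ".join(missing))
    return settings
```

If nothing catches the exception, the process exits with a traceback, which pairs with the redundancy point above: another instance takes over while this one is fixed or replaced.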

  12. Noisy Warnings
    Limited number of warning sounds
    Clear, unambiguous text & speech
    No constant low-level warnings

  13. Noisy Warnings
    Don't email/notify on every tiny error
    Choose 5 top errors, solve them first
    If you ignore it for a week, delete the warning
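
Picking the top five errors could be as simple as counting messages. A rough sketch, assuming the errors are already available as a list of strings; the function name `top_errors` and the sample log are made up for illustration:

```python
from collections import Counter


def top_errors(messages, limit=5):
    """Return the `limit` most frequent error messages with their counts."""
    return Counter(messages).most_common(limit)


# Made-up sample data, not real log output.
log = ["timeout", "timeout", "disk full", "timeout", "disk full", "bad login"]
print(top_errors(log, limit=2))  # [('timeout', 3), ('disk full', 2)]
```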

  14. Poor Testing
    Every part tested to destruction
    Well known statistical limits
    Knowing when, not if, things fail

  15. Image: © Boeing 2010

  16. Poor Testing
    Test latency, memory issues, dodgy
    network and other unusual things
    Interactions are as important as
    individual units
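
A dodgy network can be simulated in a test without touching a real one. The sketch below uses the standard library's `unittest.mock` to make a connection time out before answering; `fetch_status` and its retry behaviour are hypothetical, chosen only to give the test something to exercise:

```python
import socket
from unittest import mock


def fetch_status(connect, retries=3):
    """Call `connect()` up to `retries` times, retrying on timeouts."""
    for _ in range(retries):
        try:
            return connect()
        except socket.timeout:
            continue
    return "unavailable"


# Simulate a network that times out twice and then answers.
flaky = mock.Mock(side_effect=[socket.timeout(), socket.timeout(), "ok"])
assert fetch_status(flaky) == "ok"
assert flaky.call_count == 3

# Simulate a network that never answers.
dead = mock.Mock(side_effect=socket.timeout())
assert fetch_status(dead) == "unavailable"
```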

  17. Automation Reliance
    Tested without autopilot/instruments
    Plane usually advises, rarely controls
    Easy to see what's happening and why

  18. flickr.com/photos/wkharmon/4631001766

  19. Automation Reliance
    Don't rely on magical automatic failover
    Regularly practice manual recovery steps
    Know what your systems are doing

  20. People Reliance
    Checklists for everything
    Warnings built around common assumptions
    Reduce workload at critical times

  21. People Reliance
    Checklists for releases/testing/onboarding
    Automate common tasks
    Reduce workload at critical times
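
A release checklist can start life as a short script before any real automation exists. A minimal sketch, assuming each step is a description paired with a check function; `run_checklist` and the step names are hypothetical, and the checks here are stand-ins that always pass:

```python
def run_checklist(steps):
    """Run each (description, check) pair in order; stop at the first failure.

    Returns the descriptions that passed. Raises RuntimeError naming the
    first step whose check returns a falsy value.
    """
    done = []
    for description, check in steps:
        if not check():
            raise RuntimeError("checklist stopped at: " + description)
        done.append(description)
    return done


# Hypothetical release checklist; replace each lambda with a real check.
release_steps = [
    ("tests pass", lambda: True),
    ("changelog updated", lambda: True),
    ("migrations reviewed", lambda: True),
]
print(run_checklist(release_steps))
```

Stopping at the first failed step mirrors how a checklist is read aloud: nothing later is attempted until the current item is confirmed.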

  22. Bad Priorities
    Aviate, Navigate, Communicate
    Minimum Equipment Lists
    Mayday priority

  23. Minimum Equipment Quiz
    Passenger video screens
    Lavatory ashtrays
    Air conditioning
    Fuel receptacle caps
    Seatbelt signs
    Weather radar

  24. Minimum Equipment Quiz
    Passenger video screens
    Lavatory ashtrays
    Air conditioning
    Fuel receptacle caps
    Seatbelt signs
    Weather radar

  25. Margaret Hamilton

  26. Bad Priorities
    What are your critical features?
    What can you do without?
    Know what you want to fix first and test most
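
The software equivalent of a minimum equipment list might be an explicit set of features that must work before a release goes out. A sketch under that assumption; `CRITICAL_FEATURES`, `can_release`, and the feature names are invented for illustration:

```python
# Hypothetical feature names; the talk does not list any.
CRITICAL_FEATURES = {"login", "checkout"}


def can_release(broken_features, critical=CRITICAL_FEATURES):
    """Return True if none of the broken features are on the critical list."""
    return not (set(broken_features) & set(critical))


print(can_release({"avatar upload"}))          # True
print(can_release({"checkout", "dark mode"}))  # False
```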

  27. Unclear Responsibility
    Single person always in command
    Others are always listened to
    Clear, concise communication

  28. Unclear Responsibility
    Single person makes key decisions
    Others are always listened to
    Clear specifications and expectations

  29. Blame Culture
    There is never a single cause of an accident
    Individual problems identified and addressed
    Blaming someone solves nothing

  30. Blame Culture
    There is never a single cause of a problem
    Work back and find all of the bad factors
    Blaming people makes things worse

  31. Deadlines
    Always carry extra fuel
    Always have an alternate
    Land safely rather than at the destination

  32. Deadlines
    Don't schedule everyone at maximum
    Always expect unknown problems
    Ship good code rather than to a deadline

  33. Checklists
    First step before automation

  34. Filter unimportant errors
    Keep ignoring it? It's not important.

  35. Pick your key features
    Don't worry about breaking minor stuff

  36. Reward good decisions
    It's often not the people staying late

  37. Ops are like pilots
    Boredom punctuated by moments of terror

  38. Thanks.
    Andrew Godwin
    @andrewgodwin
    eventbrite.com/jobs
