Andrew Godwin - What can programmers learn from pilots?

What can Python-based software teams learn from aviation? Why should software always fail hard? What's wrong with too many error logs? And why are ops people already like pilots? Learn all this, and about planes, too.

https://us.pycon.org/2015/schedule/presentation/375/

PyCon 2015

April 18, 2015

Transcript

  1. What can Programmers learn from Pilots?
    Andrew Godwin @andrewgodwin

  2. Hi, I'm Andrew Godwin
    Author of Django 1.7 & South migrations
    Senior Software Engineer at
    Really likes cheese
    FAA & EASA PPL, working on IR

  3. flickr.com/photos/russss/16735398019/

  4. 1. Learning about aviation
    2. Applying lessons to coding

  5. Commercial flying is very safe
    General aviation is still not bad
    Fatal accidents per million hours:
      AIRLINES      0.2
      GA            11.2
      CARS/TRUCKS   0.53
      MOTORCYCLES   15.6
    Source: 2005 Nall report, 2004 NHTSA stats, 1991-2000 FAA stats, 40mph avg. road speed

  6. GA ACCIDENT CAUSES
    Pilot        76%
    Mechanical   16%
    Other        9%
    Source: 2005 Nall report

  7. COMMON CAUSES
    Controlled flight into terrain (CFIT)
    Disorientation in clouds (VFR in IMC)
    Bad decision making (get-there-itis)

  8. WHY DO I KNOW THIS?
    Detailed investigation of every accident

  9. HOW DOES IT HELP US?
    Let's look at common problems

  10. Soft Failure
    Explicit disengage signals
    Covering inaccurate instruments
    Replacing parts at first sign of issues

  11. Soft Failure
    Crash hard on any serious error
    Redundancy, not single system reliability
    Freedom to get rid of servers whenever
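
One way to read "crash hard on any serious error" in Python is to refuse to start with incomplete configuration, so that a redundant instance takes over instead of a half-working one limping along. This is only a minimal sketch: `ConfigError`, `load_required_settings` and the setting names are hypothetical and do not come from the talk.

```python
class ConfigError(RuntimeError):
    """Raised when a required setting is missing or empty."""


def load_required_settings(env, required=("DATABASE_URL", "SECRET_KEY")):
    """Return the required settings, or raise instead of guessing defaults.

    `env` is any mapping, for example `os.environ`. The setting names are
    placeholders for illustration.
    """
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail hard: stopping here is safer than running half-configured.
        raise ConfigError("missing required settings: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Calling `load_required_settings(os.environ)` at startup and letting the exception propagate makes the process exit immediately, which is the behaviour a supervisor or load balancer can act on.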

  12. Noisy Warnings
    Limited number of warning sounds
    Clear, unambiguous text & speech
    No constant low-level warnings

  13. Noisy Warnings
    Don't email/notify on every tiny error
    Choose 5 top errors, solve them first
    If you ignore it for a week, delete the warning
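
As a rough illustration of "choose 5 top errors", the sketch below counts error occurrences and returns the most frequent ones. It assumes you can already extract one error name per logged occurrence; `top_errors` and the sample names are made up for the example.

```python
from collections import Counter


def top_errors(error_names, limit=5):
    """Return the `limit` most frequent error names with their counts."""
    return Counter(error_names).most_common(limit)


# Hypothetical sample: one error name per logged occurrence.
seen = ["Timeout", "KeyError", "Timeout", "Http404", "Timeout", "KeyError"]
for name, count in top_errors(seen):
    print(f"{count:>4}  {name}")
```

Anything that never makes this list, and that nobody acts on, is a candidate for deleting the warning rather than tuning it.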

  14. Poor Testing
    Every part tested to destruction
    Well known statistical limits
    Knowing when, not if, things fail

  15. Image: © Boeing 2010

  16. Poor Testing
    Test latency, memory issues, dodgy network and other unusual things
    Interactions are as important as individual units
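
Testing a "dodgy network" does not need a real bad network. A possible sketch, using the standard library's `unittest.mock` to simulate a timeout and a dropped connection; `fetch_status` and the `client.get_status()` method are hypothetical stand-ins for your own code.

```python
from unittest import mock


def fetch_status(client):
    """Return the remote status, or "degraded" if the network misbehaves.

    `client` is a hypothetical object with a `get_status()` method.
    """
    try:
        return client.get_status()
    except (TimeoutError, ConnectionError):
        # Slow or broken network: report it instead of hanging or guessing.
        return "degraded"


def test_survives_slow_network():
    client = mock.Mock()
    client.get_status.side_effect = TimeoutError("simulated slow network")
    assert fetch_status(client) == "degraded"


def test_survives_dropped_connection():
    client = mock.Mock()
    client.get_status.side_effect = ConnectionResetError("simulated reset")
    assert fetch_status(client) == "degraded"
```

The same pattern extends to other unusual conditions: set `side_effect` to whatever failure you want to rehearse and assert on how the caller behaves.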

  17. Automation Reliance
    Tested without autopilot/instruments
    Plane usually advises, rarely controls
    Easy to see what's happening and why

  18. flickr.com/photos/wkharmon/4631001766

  19. Automation Reliance
    Don't rely on magical automatic failover
    Regularly practice manual recovery steps
    Know what your systems are doing

  20. People Reliance
    Checklists for everything
    Warnings built around common assumptions
    Reduce workload at critical times

  21. (No text on slide)

  22. People Reliance
    Checklists for releases/testing/onboarding
    Automate common tasks
    Reduce workload at critical times
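
A release checklist can start as a few lines of code long before anything is automated. The following is a sketch under assumptions: the checklist items, `run_checklist` and `ask_on_terminal` are invented for illustration and are not from the talk.

```python
RELEASE_CHECKLIST = [  # hypothetical steps, replace with your own
    "Tests pass on the release branch",
    "Database migrations reviewed",
    "Rollback procedure confirmed",
]


def run_checklist(items, confirm):
    """Ask `confirm` about each item in order; return the unconfirmed ones."""
    unconfirmed = []
    for number, item in enumerate(items, start=1):
        if not confirm(f"{number}. {item}"):
            unconfirmed.append(item)
    return unconfirmed


def ask_on_terminal(prompt):
    """Interactive confirmation: only an explicit "y" counts as done."""
    return input(prompt + " [y/N] ").strip().lower() == "y"
```

Running `run_checklist(RELEASE_CHECKLIST, ask_on_terminal)` walks a person through the steps. Passing a different `confirm` callable later lets individual steps be checked by code, one at a time.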

  23. Bad Priorities
    Aviate, Navigate, Communicate
    Minimum Equipment Lists
    Mayday priority

  24. Minimum Equipment Quiz
    Passenger video screens
    Lavatory ashtrays
    Air conditioning
    Fuel receptacle caps
    Seatbelt signs
    Weather radar

  25. Minimum Equipment Quiz
    Passenger video screens
    Lavatory ashtrays
    Air conditioning
    Fuel receptacle caps
    Seatbelt signs
    Weather radar

  26. Margaret Hamilton

  27. Bad Priorities
    What are your critical features?
    What can you do without?
    Know what you want to fix first and test most
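
A software analogue of a minimum equipment list might be an explicit set of features that must work before a release goes out, with everything else allowed to be broken. This is a loose sketch: the feature names and both functions are hypothetical.

```python
# Hypothetical feature names; the point is that the list is written down.
CRITICAL_FEATURES = frozenset({"login", "checkout"})


def release_blockers(failing_features, critical=CRITICAL_FEATURES):
    """Return the failing features that must be fixed before shipping."""
    return sorted(set(failing_features) & critical)


def can_ship(failing_features, critical=CRITICAL_FEATURES):
    """True if nothing on the critical list is failing."""
    return not release_blockers(failing_features, critical)
```

With this in place, a failing `"avatars"` feature does not block a release, while a failing `"login"` does, and the same list says where testing effort should go first.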

  28. Unclear Responsibility
    Single person always in command
    Others are always listened to
    Clear, concise communication

  29. Unclear Responsibility
    Single person makes key decisions
    Others are always listened to
    Clear specifications and expectations

  30. Blame Culture
    There is never a single cause of an accident
    Individual problems identified and addressed
    Blaming someone solves nothing

  31. Blame Culture
    There is never a single cause of a problem
    Work back and find all of the bad factors
    Blaming people makes things worse

  32. Deadlines
    Always carry extra fuel
    Always have an alternate
    Land safely rather than at the destination

  33. Deadlines
    Don't schedule everyone at maximum
    Always expect unknown problems
    Ship good code rather than to a deadline

  34. Takeaways

  35. Checklists
    First step before automation

  36. Filter unimportant errors
    Keep ignoring it? It's not important.

  37. Pick your key features
    Don't worry about breaking minor stuff

  38. Reward good decisions
    It's often not the people staying late

  39. Ops are like pilots
    Boredom punctuated by moments of terror

  40. Thanks.
    Andrew Godwin
    @andrewgodwin
    eventbrite.com/jobs
