A talk I gave at PyCon US 2015
What Can Programmers Learn From Pilots?
Andrew Godwin, @andrewgodwin
Hi, I'm Andrew Godwin: Author of Django 1.7 & South migrations; Senior Software Engineer at [company logo]; Really likes cheese; FAA & EASA PPL, working on IR
flickr.com/photos/russss/16735398019/
1. Learning about aviation; 2. Applying lessons to coding
Commercial flying is very safe; general aviation is still not bad
Fatal accidents per million hours: Airlines 0.2; GA 11.2; Cars/trucks 0.53; Motorcycles 15.6
Source: 2005 Nall report, 2004 NHTSA stats, 1991-2000 FAA stats, 40 mph avg. road speed
GA Accident Causes: Pilot 76%; Mechanical 16%; Other 9% (Source: 2005 Nall report)
Common Causes: Controlled flight into terrain (CFIT); Disorientation in clouds (VFR in IMC); Bad decision making (get-there-itis)
Why Do I Know This? Detailed investigation of every accident
How Does It Help Us? Let's look at common problems
Soft Failure: Explicit disengage signals; Covering inaccurate instruments; Replacing parts at first sign of issues
Soft Failure: Crash hard on any serious error; Redundancy, not single system reliability; Freedom to get rid of servers whenever
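One way to read "crash hard on any serious error" in code is to refuse to start when something essential is missing, instead of guessing a default. A minimal sketch; `ConfigError`, `load_database_url` and the `DATABASE_URL` setting name are assumptions for illustration, not something from the talk:

```python
class ConfigError(RuntimeError):
    """Raised when the process cannot start safely."""


def load_database_url(env):
    # Hypothetical setting name; substitute whatever the service really needs.
    url = env.get("DATABASE_URL")
    if not url:
        # Crash hard rather than limp along with a guessed default.
        raise ConfigError("DATABASE_URL is not set; refusing to start")
    return url
```

Called as `load_database_url(os.environ)` at startup, this stops a misconfigured process before it serves traffic, so a supervisor or load balancer can route around it.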
Noisy Warnings: Limited number of warning sounds; Clear, unambiguous text & speech; No constant low-level warnings
Noisy Warnings: Don't email/notify on every tiny error; Choose 5 top errors, solve them first; If you ignore it for a week, delete the warning
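The "top 5" and "ignored for a week" rules are easy to turn into a report. A rough sketch, assuming errors arrive as plain signature strings and that you record when each warning was last acted on; both function names are made up for this example:

```python
from collections import Counter
from datetime import datetime, timedelta


def top_errors(error_events, limit=5):
    """Return the `limit` most frequent error signatures with their counts."""
    return Counter(error_events).most_common(limit)


def warnings_to_delete(last_acted_on, now, max_age=timedelta(days=7)):
    """Return names of warnings nobody has acted on for at least `max_age`."""
    return sorted(
        name for name, when in last_acted_on.items() if now - when >= max_age
    )


events = ["TimeoutError", "KeyError", "TimeoutError", "ValueError",
          "TimeoutError", "KeyError"]
for signature, count in top_errors(events, limit=2):
    print(f"{count:>3}  {signature}")
```

The one-week threshold is just the figure from the slide expressed as a default argument; pick whatever window matches how your team actually reviews alerts.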
Poor Testing: Every part tested to destruction; Well known statistical limits; Knowing when, not if, things fail
Image: © Boeing 2010
Poor Testing: Test latency, memory issues, dodgy network and other unusual things; Interactions are as important as individual units
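Testing a "dodgy network" does not need real infrastructure; a test double that drops connections and adds latency is often enough to exercise the interaction. A small sketch, where `FlakyNetwork` and `fetch_with_retry` are hypothetical stand-ins for your own client and the code under test:

```python
import time


class FlakyNetwork:
    """Test double that fails the first `failures` calls, then succeeds slowly."""

    def __init__(self, failures, delay=0.0):
        self.failures = failures
        self.delay = delay
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated dropped connection")
        time.sleep(self.delay)  # simulated latency
        return "payload"


def fetch_with_retry(network, attempts=3):
    """Hypothetical code under test: retry on connection errors, then give up."""
    last_error = None
    for _ in range(attempts):
        try:
            return network.fetch()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

A test can then assert both outcomes: `fetch_with_retry(FlakyNetwork(failures=2))` returns the payload on the third call, while `FlakyNetwork(failures=5)` makes it give up with `ConnectionError`.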
Automation Reliance: Tested without autopilot/instruments; Plane usually advises, rarely controls; Easy to see what's happening and why
flickr.com/photos/wkharmon/4631001766
Automation Reliance: Don't rely on magical automatic failover; Regularly practice manual recovery steps; Know what your systems are doing
People Reliance: Checklists for everything; Warnings built around common assumptions; Reduce workload at critical times
People Reliance: Checklists for releases/testing/onboarding; Automate common tasks; Reduce workload at critical times
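A checklist can start life as a list of strings and a function that walks it, which also leaves a natural seam for automating steps later. A minimal sketch; `run_checklist` and the release steps are invented for illustration:

```python
def run_checklist(name, steps, confirm):
    """Walk through `steps` in order and stop at the first one not confirmed.

    `confirm` is any callable taking the step text and returning True or False:
    a prompt for a human today, an automated check later.
    """
    completed = []
    for step in steps:
        if not confirm(step):
            return {"checklist": name, "ok": False,
                    "completed": completed, "failed": step}
        completed.append(step)
    return {"checklist": name, "ok": True,
            "completed": completed, "failed": None}


# Hypothetical release checklist.
RELEASE_STEPS = [
    "Tests pass on the release branch",
    "Database migrations reviewed",
    "Rollback plan written down",
]
```

For a human-driven run, `confirm` could be `lambda step: input(f"{step}? [y/N] ").lower() == "y"`; replacing that callable step by step is one way to move from checklist to automation.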
Bad Priorities: Aviate, Navigate, Communicate; Minimum Equipment Lists; Mayday priority
Minimum Equipment Quiz: Passenger video screens; Lavatory ashtrays; Air conditioning; Fuel receptacle caps; Seatbelt signs; Weather radar
Margaret Hamilton
Bad Priorities: What are your critical features? What can you do without? Know what you want to fix first and test most
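A software "minimum equipment list" can be as simple as a written-down set of features you are allowed to ship or run without. A rough sketch under that assumption; the feature names and `dispatch_decision` are hypothetical, and treating everything not listed as a blocker is a design choice made for this example:

```python
# Hypothetical list of features the service may run without for a while.
DEFERRABLE = {"recommendations", "avatars"}


def dispatch_decision(failing):
    """Return ("no-go", blockers) if anything outside DEFERRABLE is failing,
    otherwise ("go", deferred)."""
    failing = set(failing)
    blockers = sorted(failing - DEFERRABLE)
    if blockers:
        return "no-go", blockers
    return "go", sorted(failing)


print(dispatch_decision(["avatars"]))
print(dispatch_decision(["avatars", "login"]))
```

Deciding the list in advance is the point: during an incident nobody has to argue about whether a broken avatar service should block a release.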
Unclear Responsibility: Single person always in command; Others are always listened to; Clear, concise communication
Unclear Responsibility: Single person makes key decisions; Others are always listened to; Clear specifications and expectations
Blame Culture: There is never a single cause of an accident; Individual problems identified and addressed; Blaming someone solves nothing
Blame Culture: There is never a single cause of a problem; Work back and find all of the bad factors; Blaming people makes things worse
Deadlines: Always carry extra fuel; Always have an alternate; Land safely rather than at the destination
Deadlines: Don't schedule everyone at maximum; Always expect unknown problems; Ship good code rather than to a deadline
Takeaways
Checklists: First step before automation
Filter unimportant errors: Keep ignoring it? It's not important.
Pick your key features: Don't worry about breaking minor stuff
Reward good decisions: It's often not the people staying late
Ops are like pilots: Boredom punctuated by moments of terror
Thanks. Andrew Godwin, @andrewgodwin, eventbrite.com/jobs