20+ years of software experience
• Specialities:
  ◦ System dynamics
  ◦ Applied formal methods
  ◦ Architecture and system rescue
• Engineering manager at Rebellion Defense
of safety
  ◦ Safety as ergonomics and systems thinking
• The Role of Models and Specification in Safety
• Building Models
  ◦ Common problems
  ◦ Approaches
  ◦ Verification hot spots
• Drafting a Model
  ◦ Ground to Takeoff transition
  ◦ Develop a model
  ◦ Feedback and discussion
Interactive!
  ◦ Small group work
  ◦ Be prepared to move a little bit
• Please respect everyone’s threat models
  ◦ If asked to mask up by neighbors, please do so. I have masks available
• Join the channel #workshop-safety
• Make sure you have a scrap paper pack and a pen ;)
it unsafe if it’s a contributing factor?
  ◦ 1.4 million accidents involving flip flops (Sheilas' Wheels, 2013)
  ◦ An estimated 1,200 people are killed in muggings over their sneakers (GQ, 2015)
• Is it unsafe if the harm is intentional? (CDC, 2013)
  ◦ 99.4% of car deaths are accidental
  ◦ Only 4% of gun deaths are accidental
  ◦ 65% of gun fatalities are suicides
traditional view of Safety Critical:
  ◦ Likelihood of hazard (SLO)
  ◦ Unit testing
  ◦ Configuration control
  ◦ Formal change management
  ◦ Software of unknown pedigree
• Specific to industries:
  ◦ Aerospace: DO-178C
  ◦ Rail: EN 50126, EN 50129
  ◦ Automotive: ISO 26262
  ◦ Nuclear: IEC 61513
• IEC 61508 ← closest thing to a general standard
engineering focuses on formally verifying that the technology will adhere to its requirements in all situations
• When verification is impossible, safety engineering estimates the likelihood of failure and assembles an acceptable risk budget, similar to an SLO
  ◦ Hardware can always break
  ◦ “Soft” real time constraints
• Traditional safety engineering relies on the requirements of safe operation being clearly defined.
• As systems get more intertwined and complex, we end up with more software that we do not think of as “safety critical” but that can nevertheless cause problems.
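The risk budget idea can be sketched as a small calculation. This is only an illustration under stated assumptions: the component failure probabilities and the budget values below are hypothetical, and the components are assumed to fail independently, which real systems often violate.

```python
from math import prod


def system_failure_probability(component_failure_probs):
    """Probability that at least one component fails.

    Assumes failures are independent (an assumption made for this sketch).
    """
    return 1.0 - prod(1.0 - p for p in component_failure_probs)


def within_budget(component_failure_probs, budget):
    """True if the combined failure probability fits inside the risk budget."""
    return system_failure_probability(component_failure_probs) <= budget


# Hypothetical per-hour failure probabilities for three components.
components = [1e-6, 5e-7, 2e-6]
p_fail = system_failure_probability(components)
```

As with an SLO, the budget is a negotiated number: `within_budget(components, 1e-5)` accepts this design, while a stricter budget of `1e-6` would reject it.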
  ◦ Does the operator understand the risks of using the technology in a particular way?
  ◦ Does the technology hide something about the context?
    ▪ Distraction
    ▪ Misleading/Confusing
• Mitigation
  ◦ Can the operator stop unsafe events in progress?
  ◦ Is operation predictable/deterministic?
assurance of isolation between applications running in the system
• Model system in AADL (Architecture Analysis & Design Language)
• Check model
• Separate functions into verifiable components
• Write components in a domain-specific language that eliminates bad habits in C
  ◦ Frontend where people enter code
  ◦ Button to run/execute code
  ◦ Results displayed back to the user
• Examples
  ◦ https://go.dev/play/
  ◦ https://www.ideone.com/
  ◦ https://www.codiva.io/
a system looks
  ◦ What hardware is there?
  ◦ What instances are there?
  ◦ What protocols/APIs do they interact over?
• This doesn’t tell us anything about how the system behaves, which is what we need to verify.
areas of behavior are important
  ◦ How consistent is your scope across the model?
  ◦ How do we know the generalizations are correct?
• Only shows you problems you already know about
• Thinking in axioms:
  ◦ We’re making an assumption that this state is always true
  ◦ We document and monitor it.
  ◦ Identity
  ◦ Policy
  ◦ Shared resources (memory)
• Will processes return in time?
  ◦ Time deadlines
  ◦ Concurrency issues
• How do we transition between states?
  ◦ What is the correct behavior?
  ◦ What is the impossible behavior?
• Normal
• Hand
• Manual
• Flightplan
• Followme
• Lookat
• Point of Interest (POI)
• Return to Home (RTH)

• Normal
• Hand
• Critical RTH
• Critical Landing
• Emergency Landing
• Emergency Ground
  ◦ What are the components (software, sensors, hardware)?
  ◦ What are the steps that create the transition?
  ◦ What controls or fallbacks are involved?
• Design some tests
  ◦ What should be impossible?
  ◦ What should eventually be true?
  ◦ What should always be true?
  ◦ How do we know?
health
• Check for obstructions
Impossible:
• Takeoff if hardware in failure
• Takeoff if operator too close
Fallback:
• If sensor fails, do not allow takeoff
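The checks, impossible cases, and fallback above can be sketched as a single guard on the Ground to Takeoff transition. This is a minimal sketch, not the drone's actual logic: the names `Readings`, `takeoff_allowed`, and `request_takeoff`, and the 2.0 m distance threshold, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FlightState(Enum):
    GROUND = auto()
    TAKEOFF = auto()


@dataclass(frozen=True)
class Readings:
    hardware_ok: bool
    obstruction: bool
    # None models a failed distance sensor (no reading available).
    operator_distance_m: Optional[float]


# Hypothetical threshold, chosen only for this sketch.
MIN_OPERATOR_DISTANCE_M = 2.0


def takeoff_allowed(r: Readings) -> bool:
    """Guard for the Ground -> Takeoff transition."""
    # Fallback: if the sensor fails, do not allow takeoff.
    if r.operator_distance_m is None:
        return False
    return (
        r.hardware_ok
        and not r.obstruction
        and r.operator_distance_m >= MIN_OPERATOR_DISTANCE_M
    )


def request_takeoff(state: FlightState, r: Readings) -> FlightState:
    """Return the next state; stay on the ground unless every check passes."""
    if state is FlightState.GROUND and takeoff_allowed(r):
        return FlightState.TAKEOFF
    return state
```

Writing the guard so that it fails closed, returning `False` whenever a reading is missing, is what turns "takeoff with a failed sensor" into an impossible transition instead of an unlikely one.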
Moore machines: f(state) → state’
  ◦ Mealy machines: f(state, input) → state’
• A state is a product of a previous state or a previous state AND an input
• What are the inputs to the component?
  ◦ Distance sensor: query from flight supervisor
  ◦ Does Pending + Query = a distinct state not otherwise recorded?
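The two shapes of transition function can be sketched for the distance sensor. The sketch follows the slide's simplified framing of next-state functions; in the textbook definitions, the Moore/Mealy distinction is about whether the output depends on the state alone or on the state and the input. The `QUEUED` state and the `REPLY` input are assumptions added for illustration, as one possible answer to the Pending + Query question.

```python
from enum import Enum


class SensorState(Enum):
    IDLE = "idle"
    PENDING = "pending"
    QUEUED = "queued"  # hypothetical: a query arrived while one was pending


class SensorInput(Enum):
    QUERY = "query"  # query from the flight supervisor
    REPLY = "reply"  # hypothetical: the measurement completed


def step_state_only(state):
    """f(state) -> state': the next state depends only on the current state."""
    if state is SensorState.PENDING:
        return SensorState.IDLE
    return state


# f(state, input) -> state': the same state can lead to different next states.
TRANSITIONS = {
    (SensorState.IDLE, SensorInput.QUERY): SensorState.PENDING,
    (SensorState.PENDING, SensorInput.REPLY): SensorState.IDLE,
    (SensorState.PENDING, SensorInput.QUERY): SensorState.QUEUED,
    (SensorState.QUEUED, SensorInput.REPLY): SensorState.PENDING,
}


def step_with_input(state, received):
    """Look up the transition; pairs not listed leave the state unchanged."""
    return TRANSITIONS.get((state, received), state)
```

Listing the `(state, input)` pairs explicitly makes the modeling question concrete: if `(PENDING, QUERY)` maps back to `PENDING`, the input adds no new state, and if it needs its own entry such as `QUEUED`, the model was missing a state.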
with a Pending state to produce a state we don’t otherwise know about.
• But this depends on the system and the behavior we wish to model
• Database
  ◦ Inputs: Read query, write query
  ◦ Pending write + Read query = blocked
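The database case can be written out as a short sketch in which a read arriving during a pending write produces a state that neither input creates alone. The `WRITE_DONE` input and the `IDLE` state are assumptions added so the model can leave the blocked state; they are not part of the slide.

```python
from enum import Enum
from functools import reduce


class Db(Enum):
    IDLE = "idle"
    PENDING_WRITE = "pending write"
    BLOCKED = "blocked"


class Op(Enum):
    READ = "read query"
    WRITE = "write query"
    WRITE_DONE = "write done"  # hypothetical input, not on the slide


def step(state, op):
    """Next database state for one input."""
    if state is Db.IDLE and op is Op.WRITE:
        return Db.PENDING_WRITE
    # Pending write + Read query = blocked
    if state is Db.PENDING_WRITE and op is Op.READ:
        return Db.BLOCKED
    if state in (Db.PENDING_WRITE, Db.BLOCKED) and op is Op.WRITE_DONE:
        return Db.IDLE
    return state


def run(ops, start=Db.IDLE):
    """Fold a sequence of inputs over the transition function."""
    return reduce(step, ops, start)
```

A read on its own leaves the database idle, while the same read after a write yields `BLOCKED`: `run([Op.READ])` and `run([Op.WRITE, Op.READ])` end in different states.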
  ◦ Numbers
• Again, the key issue is distinct state change, so to model these cases we tend to sort those inputs into categories
  ◦ Valid/invalid
  ◦ Less than threshold/More than threshold
• Typically we don’t use magic numbers in specs, but it’s okay to do that when you’re starting out.
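Sorting a numeric input into categories might look like the sketch below. The category names and the 2.0 m threshold are assumptions for illustration, and the threshold is exactly the kind of magic number a more mature spec would replace with a named parameter.

```python
import math
from enum import Enum
from typing import Optional


class DistanceCategory(Enum):
    INVALID = "invalid"
    BELOW_THRESHOLD = "less than threshold"
    AT_OR_ABOVE_THRESHOLD = "at or above threshold"


# Magic number, acceptable in a first draft of the model (hypothetical value).
THRESHOLD_M = 2.0


def categorize(reading: Optional[float]) -> DistanceCategory:
    """Collapse the infinite space of readings into three model inputs."""
    if reading is None or not math.isfinite(reading) or reading < 0:
        return DistanceCategory.INVALID
    if reading < THRESHOLD_M:
        return DistanceCategory.BELOW_THRESHOLD
    return DistanceCategory.AT_OR_ABOVE_THRESHOLD
```

The model then only has to reason about three inputs instead of every possible number, because readings within the same category cannot cause different state changes.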
  ◦ What are the components (software, sensors, hardware)?
  ◦ What are the inputs of each component?
  ◦ What are the steps that create the transition?
  ◦ What controls or fallbacks are involved?
• Design some tests
  ◦ What should be impossible?
  ◦ What should eventually be true?
  ◦ What should always be true?
  ◦ How do we know?
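One way to answer "how do we know?" for a small model is to enumerate every reachable state and check the properties against all of them. The sketch below does this for a hypothetical three-phase takeoff model; the phases, the hardware-failure rule, and the helper names are assumptions for illustration. Note that `possible` only checks reachability, which is weaker than a true "eventually" property; checking that every path eventually reaches a state needs a proper model checker.

```python
from collections import deque


def successors(state):
    """Next states of a toy model; state is (phase, hardware_ok)."""
    phase, hardware_ok = state
    result = []
    if phase == "ground":
        result.append(("checking", hardware_ok))
    elif phase == "checking":
        # Checks pass only with healthy hardware; otherwise fall back to ground.
        result.append(("takeoff", True) if hardware_ok else ("ground", False))
    # Hardware may fail at any point before takeoff (scope of this model).
    if phase != "takeoff" and hardware_ok:
        result.append((phase, False))
    return result


def reachable(initial):
    """Breadth-first enumeration of every state reachable from initial."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        current = queue.popleft()
        for nxt in successors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def always(states, predicate):
    """Safety check: the predicate holds in every reachable state."""
    return all(predicate(s) for s in states)


def possible(states, predicate):
    """Reachability check: some reachable state satisfies the predicate."""
    return any(predicate(s) for s in states)


STATES = reachable(("ground", True))
```

With this model, `always(STATES, lambda s: not (s[0] == "takeoff" and not s[1]))` confirms that takeoff with failed hardware is impossible, and `possible(STATES, lambda s: s[0] == "takeoff")` confirms that the model can still take off at all.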