Building Safety Critical Systems

Marianne Bellotti
September 22, 2022
A two-hour workshop on safety in the software engineering space, given as part of Strange Loop 2022


Transcript

  1. About Me • Author of “Kill It With Fire” •

    20+ years of software experience • Specialities: ◦ System dynamics ◦ Applied formal methods ◦ Architecture and system rescue • Engineering manager at Rebellion Defense
  2. The Workshop • Overview of Safety ◦ Traditional engineering idea

    of safety ◦ Safety as ergonomics and systems thinking • The Role of Models and Specification in Safety • Building Models ◦ Common problems ◦ Approaches ◦ Verification hot spots • Drafting a Model ◦ Ground to Takeoff transition ◦ Develop a model ◦ Feedback and discussion
  3. The Workshop • A workshop, not a two-hour lecture! ◦

    Interactive! ◦ Small group work ◦ Be prepared to move a little bit • Please respect everyone’s threat models ◦ If asked to mask up by neighbors, please do so. I have masks available • Join the channel #workshop-safety • Make sure you have a scrap paper pack and a pen ;)
  4. What Do We Mean When We Say “Safe”? • Is

    it unsafe if it’s a contributing factor? ◦ 1.4 million accidents involving flip flops (Sheilas' Wheels, 2013) ◦ An estimated 1,200 people die in fatal muggings over their sneakers (GQ, 2015) • Is it unsafe if the harm is intentional? (CDC, 2013) ◦ 99.4% of car deaths are accidental ◦ Only 4% of gun deaths are accidental. ◦ 65% of gun fatalities are suicides.
  5. What Do We Mean When We Say “Safe”? • The

    traditional view of Safety Critical: ◦ Likelihood of hazard (SLO) ◦ Unit testing ◦ Configuration control ◦ Formal change management ◦ Software of unknown pedigree • Specific to industries: ◦ Aerospace: DO-178C ◦ Rail: EN 50126, EN 50129 ◦ Automotive: ISO 26262 ◦ Nuclear: IEC 61513 • IEC 61508 ← closest thing to a general standard
  6. What Do We Mean When We Say “Safe”? • The

    traditional view of Safety Critical: Credit: CMU SEI, 2013
  7. What Do We Mean When We Say “Safe”? • Safety

    engineering focuses on formally verifying that the technology will adhere to its requirements in all situations • When it’s impossible to verify, safety engineering estimates the likelihood of failure and assembles an acceptable risk budget, similar to an SLO ◦ Hardware can always break ◦ “Soft” real time constraints • Traditional safety engineering relies on the requirements of safe operation being clearly defined. • As systems get more intertwined and complex, we end up with more software that we do not think of as “safety critical” but that can nevertheless cause problems.
  8. What Do We Mean When We Say “Safe”? • Risk

    ◦ Does the operator understand the risks of using the technology in a particular way? ◦ Does the technology hide something about the context ▪ Distraction ▪ Misleading/Confusing • Mitigation ◦ Can the operator stop unsafe events in progress? ◦ Is operation predictable/deterministic?
  9. What Do We Mean When We Say “Safe”? Resilient Available

    Accessible Controlled Verified Reliable Explainable Fault-Tolerant
  10. What Do We Mean When We Say “Safe”? Resilient Available

    Accessible Controlled Verified Reliable Explainable Fault-Tolerant Group A Group B
  11. High-Assurance Cyber Military Systems (HACMS) • seL4 Kernel ◦ highest

    assurance of isolation between applications running in the system • Model the system in AADL (Architecture Analysis & Design Language) • Check the model • Separate functions into verifiable components • Write components in a domain-specific language that eliminates bad habits in C
  12. Reasoning About Systems • What are the parts of the

    system? • How do they interact? • What is the expected behavior? • How do the parts create the behavior?
  13. Model This System • Online REPL ◦ Web based application

    ◦ Frontend where people enter code ◦ Button to run/execute code ◦ Results displayed back to the user • Examples ◦ https://go.dev/play/ ◦ https://www.ideone.com/ ◦ https://www.codiva.io/
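The REPL exercise above asks for a model of behavior rather than architecture. As a minimal sketch (not from the workshop; the state names and transitions are my assumptions), the REPL's behavior can be written as a small state machine in Python:

```python
# Behavioral sketch of the online REPL exercise: model what the system DOES,
# not what it looks like. States and allowed transitions are illustrative.
STATES = {"EDITING", "RUNNING", "SHOWING_RESULT", "ERROR"}

# Allowed transitions: current state -> set of reachable next states
TRANSITIONS = {
    "EDITING": {"RUNNING"},                   # user presses the Run button
    "RUNNING": {"SHOWING_RESULT", "ERROR"},   # execution finishes or fails
    "SHOWING_RESULT": {"EDITING"},            # user goes back to editing code
    "ERROR": {"EDITING"},
}

def step(state, next_state):
    """Advance the model one transition, rejecting impossible behavior."""
    assert state in STATES and next_state in STATES
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"impossible transition: {state} -> {next_state}")
    return next_state

s = "EDITING"
s = step(s, "RUNNING")
s = step(s, "SHOWING_RESULT")
```

A model like this already supports the kind of verification discussed later: any transition not in the table is "impossible behavior" and raises immediately.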
  14. Problems with Models • Most engineers start by modeling how

    a system looks ◦ What hardware is there? ◦ What instances are there? ◦ What protocols/APIs do they interact over? • This doesn’t tell us anything about how the system behaves, which is what we need to verify.
  15. Problems with Models • Creating blind spots by self-selecting what

    areas of behavior are important ◦ How consistent is your scope across the model? ◦ How do we know the generalizations are correct? • Only shows you problems you already know about • Thinking in axioms: ◦ We’re making an assumption that this state is always true ◦ We document and monitor it.
  16. That being said…. Even models with biases and flaws can

    be useful. The process of writing them down often triggers ah-ha! moments
  17. Cheat Sheet to Verification • What can connect to what?

    ◦ Identity ◦ Policy ◦ Shared resources (memory) • Will processes return in time? ◦ Time deadlines ◦ Concurrency issues • How do we transition between states? ◦ What is the correct behavior? ◦ What is the impossible behavior?
  18. Ground Takeoff Hovering Flying Landing Critical • Idle • Calibration

    • Normal • Hand • Manual • Flightplan • Followme • Lookat • Point of Interest (POI) • Return to Home (RTH) • Normal • Hand • Critical RTH • Critical Landing • Emergency Landing • Emergency Ground
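The top-level flight states above can be captured as a transition whitelist. This is a hedged sketch: the slide lists the states and sub-modes, but which transitions are legal is my assumption for illustration.

```python
# Sketch of the drone's top-level flight states as a transition table.
# The states come from the slide; the allowed transitions are assumptions.
FLIGHT_TRANSITIONS = {
    "Ground":   {"Takeoff"},
    "Takeoff":  {"Hovering", "Critical"},
    "Hovering": {"Flying", "Landing", "Critical"},
    "Flying":   {"Hovering", "Landing", "Critical"},
    "Landing":  {"Ground", "Critical"},
    "Critical": {"Landing", "Ground"},   # e.g. Critical RTH, Emergency Landing
}

def can_transition(src, dst):
    """True only if the model allows moving directly from src to dst."""
    return dst in FLIGHT_TRANSITIONS.get(src, set())
```

Writing the table down forces the question the workshop keeps asking: is `Ground -> Flying` genuinely impossible, or just unrecorded?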
  19. Unsafe States • Sensor failure • Latency • Losing line

    of sight • Connectivity • Weather conditions • Projectiles/obstacles • Assess the risk • Mitigate the risk
  20. Ground Takeoff Hovering Flying Landing Critical Accelerometer Gyroscope Magnetic compass

    Barometer GPS Sensor Distance Sensor (ultrasonic, laser or LIDAR)
  21. Draft a Model • Pick a transition between two states

    ◦ What are the components (software, sensors, hardware)? ◦ What are the steps that create the transition? ◦ What controls or fallbacks are involved? • Design some tests ◦ What should be impossible? ◦ What should eventually be true? ◦ What should always be true? ◦ How do we know?
  22. Ground Takeoff Barometer Distance Sensor (ultrasonic, laser or LIDAR) •

    Get ground elevation • Check hardware system health • Check for obstructions
  23. Ground Takeoff • Get ground elevation • Check hardware system

    health • Check for obstructions Impossible: • Takeoff if hardware in failure • Takeoff if operator too close Fallback: • If sensor fails, do not allow takeoff
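The Ground-to-Takeoff checks above can be encoded as preconditions. A minimal sketch, assuming illustrative field names (`hardware_ok`, `sensors_ok`, `obstruction`, `operator_distance_m`) and an assumed distance threshold that is not on the slide:

```python
# Sketch of the Ground -> Takeoff preconditions from the slide.
# Parameter names and the distance threshold are illustrative assumptions.
MIN_OPERATOR_DISTANCE_M = 5.0  # assumed safety radius, not from the slide

def takeoff_allowed(hardware_ok, sensors_ok, obstruction, operator_distance_m):
    """Encode the slide's 'impossible' states and fallback as preconditions."""
    if not sensors_ok:       # fallback: if a sensor fails, do not allow takeoff
        return False
    if not hardware_ok:      # impossible: takeoff if hardware in failure
        return False
    if obstruction:          # check for obstructions
        return False
    if operator_distance_m < MIN_OPERATOR_DISTANCE_M:
        return False         # impossible: takeoff if operator too close
    return True
```

Each `return False` branch corresponds to one "impossible" condition or fallback from the slide, which makes the tests ("what should be impossible?") direct to write.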
  24. Determining States • Going back to finite state machines ◦

    Moore machines: f(state) → state’ ◦ Mealy machines: f(state, input) → state’ • A state is a product of a previous state or a previous state AND an input • What are the inputs to the component? ◦ Distance sensor: query from flight supervisor ◦ Does Pending + Query = a distinct state not otherwise recorded?
  25. Determining States • There are no inputs that can combine

    with a Pending state to produce a state we don’t otherwise know about. • But this depends on the system and the behavior we wish to model • Database ◦ Inputs: Read query, write query ◦ Pending write + Read query = blocked
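The database example above, where a pending write plus a read query produces a distinct "blocked" state, can be sketched in the Mealy style f(state, input) → state'. The transition table is illustrative, built only from the states the slide mentions:

```python
# Mealy-style sketch f(state, input) -> state' for the database example.
# State and event names are assumptions; only "pending write + read = blocked"
# comes from the slide.
DB_TRANSITIONS = {
    ("idle", "write"): "pending_write",
    ("idle", "read"): "idle",
    ("pending_write", "read"): "blocked",   # the new state the input reveals
    ("pending_write", "commit"): "idle",
    ("blocked", "commit"): "idle",
}

def db_step(state, event):
    """Apply one input; unknown (state, input) pairs leave the state unchanged."""
    return DB_TRANSITIONS.get((state, event), state)
```

Enumerating (state, input) pairs this way is exactly how the workshop suggests discovering states you did not otherwise record.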
  26. Generalizing Inputs • Some inputs have infinite states ◦ Strings!

    ◦ Numbers • Again, the key issue is distinct state change, so to model these cases we tend to sort those inputs into categories ◦ Valid/invalid ◦ Less than threshold/More than threshold • Typically we don’t use magic numbers in specs but it’s okay to do that when you’re starting out.
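Collapsing an infinite input domain into the finite categories the model distinguishes can be sketched as a simple classifier. The threshold here is a magic number chosen for illustration, which, as the slide notes, is acceptable when starting out:

```python
# Sketch of generalizing an infinite input (a number) into the finite set
# of categories the model cares about. The threshold is an illustrative
# magic number, not a value from the workshop.
ALTITUDE_THRESHOLD_M = 120.0  # assumed ceiling for the example

def categorize_altitude(raw):
    """Map a raw reading to the model's finite categories."""
    if not isinstance(raw, (int, float)):
        return "invalid"             # e.g. a string where a number was expected
    if raw < 0:
        return "invalid"             # physically impossible reading
    if raw <= ALTITUDE_THRESHOLD_M:
        return "below_threshold"
    return "above_threshold"
```

The model then only needs to reason about three input states instead of every possible number, which is what makes checking it tractable.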
  27. Draft a Model • Pick a transition between two states

    ◦ What are the components (software, sensors, hardware)? ◦ What are the inputs of each component? ◦ What are the steps that create the transition? ◦ What controls or fallbacks are involved? • Design some tests ◦ What should be impossible? ◦ What should eventually be true? ◦ What should always be true? ◦ How do we know?
  28. Thank You! • Check #workshop-safety for slides and resources •

    Give me feedback: ◦ https://forms.gle/1PTdSNB4xmVfcCco9 • Watch this space → bellotti.tech