Building Safety Critical Systems

Marianne Bellotti
September 22, 2022
A two-hour workshop on safety in the software engineering space, given as part of Strange Loop 2022


Transcript

  1. About Me • Author of “Kill It With Fire” • 20+ years of software experience • Specialities: ◦ System dynamics ◦ Applied formal methods ◦ Architecture and system rescue • Engineering manager at Rebellion Defense
  2. The Workshop • Overview of Safety ◦ Traditional engineering idea of safety ◦ Safety as ergonomics and systems thinking • The Role of Models and Specification in Safety • Building Models ◦ Common problems ◦ Approaches ◦ Verification hot spots • Drafting a Model ◦ Ground to Takeoff transition ◦ Develop a model ◦ Feedback and discussion
  3. The Workshop • A workshop, not a two-hour lecture! ◦ Interactive! ◦ Small group work ◦ Be prepared to move a little bit • Please respect everyone’s threat models ◦ If asked to mask up by neighbors, please do so. I have masks available • Join the channel #workshop-safety • Make sure you have a scrap paper pack and a pen ;)
  4. What Do We Mean When We Say “Safe”? • Is it unsafe if it’s a contributing factor? ◦ 1.4 million accidents involving flip-flops (Sheilas’ Wheels, 2013) ◦ An estimated 1,200 people die in fatal muggings over their sneakers (GQ, 2015) • Is it unsafe if the harm is intentional? (CDC, 2013) ◦ 99.4% of car deaths are accidental ◦ Only 4% of gun deaths are accidental ◦ 65% of gun fatalities are suicides
  5. What Do We Mean When We Say “Safe”? • The traditional view of Safety Critical: ◦ Likelihood of hazard (SLO) ◦ Unit testing ◦ Configuration control ◦ Formal change management ◦ Software of unknown pedigree • Specific to industries: ◦ Aerospace: DO-178C ◦ Rail: EN 50126, EN 50129 ◦ Automotive: ISO 26262 ◦ Nuclear: IEC 61513 • IEC 61508 ← closest thing to a general standard
  6. What Do We Mean When We Say “Safe”? • The traditional view of Safety Critical (Credit: CMU SEI, 2013)
  7. What Do We Mean When We Say “Safe”? • Safety engineering focuses on formally verifying that the technology will adhere to its requirements in all situations • When it’s impossible to verify, safety engineering estimates the likelihood of failure and assembles an acceptable risk budget, similar to an SLO ◦ Hardware can always break ◦ “Soft” real-time constraints • Traditional safety engineering relies on the requirements of safe operation being clearly defined • As systems get more intertwined and complex, we end up with more software that we do not think of as “safety critical” but that can nevertheless cause problems
  8. What Do We Mean When We Say “Safe”? • Risk ◦ Does the operator understand the risks of using the technology in a particular way? ◦ Does the technology hide something about the context? ▪ Distraction ▪ Misleading/Confusing • Mitigation ◦ Can the operator stop unsafe events in progress? ◦ Is operation predictable/deterministic?
  9. What Do We Mean When We Say “Safe”? Resilient, Available, Accessible, Controlled, Verified, Reliable, Explainable, Fault-Tolerant
  10. What Do We Mean When We Say “Safe”? The same qualities (Resilient, Available, Accessible, Controlled, Verified, Reliable, Explainable, Fault-Tolerant) sorted into Group A and Group B
  11. High-Assurance Cyber Military Systems (HACMS) • seL4 kernel ◦ highest assurance of isolation between applications running in the system • Model system in AADL (Architecture Analysis & Design Language) • Check model • Separate functions into verifiable components • Write components in domain-specific languages that eliminate bad habits in C
  12. Reasoning About Systems • What are the parts of the system? • How do they interact? • What is the expected behavior? • How do the parts create the behavior?
  13. Model This System • Online REPL ◦ Web-based application ◦ Frontend where people enter code ◦ Button to run/execute code ◦ Results displayed back to the user • Examples ◦ https://go.dev/play/ ◦ https://www.ideone.com/ ◦ https://www.codiva.io/
  14. Problems with Models • Most engineers start by modeling how a system looks ◦ What hardware is there? ◦ What instances are there? ◦ What protocols/APIs do they interact over? • This doesn’t tell us anything about how the system behaves, which is what we need to verify.
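To make the contrast concrete: instead of listing the REPL's parts, we can model the behavior of its run button as a small state machine. The states and event names below are an illustrative assumption, not from the talk; the point is that a behavioral model is something we can actually check.

```python
# Behavioral sketch of the online REPL exercise: the lifecycle of a code
# submission, modeled as states plus events. State and event names are
# illustrative assumptions, not from the talk.
from enum import Enum, auto

class ReplState(Enum):
    IDLE = auto()            # editor open, nothing running
    RUNNING = auto()         # code submitted, awaiting result
    SHOWING_RESULT = auto()  # output displayed to the user
    FAILED = auto()          # execution errored or timed out

def step(state: ReplState, event: str) -> ReplState:
    """Pure transition function: unknown (state, event) pairs are no-ops."""
    transitions = {
        (ReplState.IDLE, "run_clicked"): ReplState.RUNNING,
        (ReplState.RUNNING, "completed"): ReplState.SHOWING_RESULT,
        (ReplState.RUNNING, "timeout"): ReplState.FAILED,
        (ReplState.SHOWING_RESULT, "run_clicked"): ReplState.RUNNING,
        (ReplState.FAILED, "run_clicked"): ReplState.RUNNING,
    }
    return transitions.get((state, event), state)
```

A structural model (frontend, executor, database) would not let us ask questions like "can a result be shown while code is still running?"; this one does.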
  15. Problems with Models • Creating blind spots by self-selecting what areas of behavior are important ◦ How consistent is your scope across the model? ◦ How do we know the generalizations are correct? • Only shows you problems you already know about • Thinking in axioms: ◦ We’re making an assumption that this state is always true ◦ We document and monitor it.
  16. That being said… Even models with biases and flaws can be useful. The process of writing them down often triggers ah-ha! moments
  17. Cheat Sheet to Verification • What can connect to what? ◦ Identity ◦ Policy ◦ Shared resources (memory) • Will processes return in time? ◦ Time deadlines ◦ Concurrency issues • How do we transition between states? ◦ What is the correct behavior? ◦ What is the impossible behavior?
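The last cheat-sheet item, correct versus impossible behavior, can be phrased as properties checked over a trace of states. This is a minimal sketch in the spirit of temporal-logic checks; the trace shape and predicates are assumptions for illustration.

```python
# Property checks over a recorded trace of system states. "always" and
# "never" are safety-style properties; "eventually" is liveness-style.
def always(trace, predicate):
    """Correct behavior: predicate holds in every state of the trace."""
    return all(predicate(s) for s in trace)

def never(trace, predicate):
    """Impossible behavior: predicate holds in no state of the trace."""
    return not any(predicate(s) for s in trace)

def eventually(trace, predicate):
    """Liveness-style property: predicate holds in at least one state."""
    return any(predicate(s) for s in trace)

# Illustrative drone trace: altitude must never be negative, and the
# mission should eventually return to the ground.
trace = [
    {"mode": "takeoff", "alt": 0.0},
    {"mode": "flying", "alt": 30.0},
    {"mode": "landing", "alt": 5.0},
    {"mode": "ground", "alt": 0.0},
]
```

Real model checkers explore every reachable trace rather than one recorded run, but the property vocabulary is the same.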
  18. [State diagram] Top-level states: Ground, Takeoff, Hovering, Flying, Landing, Critical. Sub-modes: Idle, Calibration, Normal, Hand, Manual, Flightplan, Followme, Lookat, Point of Interest (POI), Return to Home (RTH), Critical RTH, Critical Landing, Emergency Landing, Emergency Ground
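The top-level states in this diagram can be captured as an explicit allowed-transition table; anything not in the table is an impossible behavior. The particular set of allowed transitions below is an assumed reading of the diagram, not taken from the talk.

```python
# Sketch of the drone's top-level flight states with an explicit
# allowed-transition table. The table contents are an assumption for
# illustration; a real model would derive them from the flight stack.
from enum import Enum, auto

class FlightState(Enum):
    GROUND = auto()
    TAKEOFF = auto()
    HOVERING = auto()
    FLYING = auto()
    LANDING = auto()
    CRITICAL = auto()

# Any transition not listed here is treated as impossible and rejected.
ALLOWED = {
    FlightState.GROUND:   {FlightState.TAKEOFF},
    FlightState.TAKEOFF:  {FlightState.HOVERING, FlightState.CRITICAL},
    FlightState.HOVERING: {FlightState.FLYING, FlightState.LANDING, FlightState.CRITICAL},
    FlightState.FLYING:   {FlightState.HOVERING, FlightState.LANDING, FlightState.CRITICAL},
    FlightState.LANDING:  {FlightState.GROUND, FlightState.CRITICAL},
    FlightState.CRITICAL: {FlightState.LANDING, FlightState.GROUND},
}

def transition(current: FlightState, target: FlightState) -> FlightState:
    """Return the new state, or raise if the model forbids the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"impossible transition: {current.name} -> {target.name}")
    return target
```

Making the table explicit is what lets a checker enumerate impossible behaviors, such as jumping from Ground straight to Flying.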
  19. Unsafe States • Sensor failure • Latency • Losing line of sight • Connectivity • Weather conditions • Projectiles/obstacles → For each: Assess the Risk, Mitigate the Risk
  20. [Diagram] States (Ground, Takeoff, Hovering, Flying, Landing, Critical) and sensors: Accelerometer, Gyroscope, Magnetic compass, Barometer, GPS sensor, Distance sensor (ultrasonic, laser, or LIDAR)
  21. Draft a Model • Pick a transition between two states ◦ What are the components (software, sensors, hardware)? ◦ What are the steps that create the transition? ◦ What controls or fallbacks are involved? • Design some tests ◦ What should be impossible? ◦ What should eventually be true? ◦ What should always be true? ◦ How do we know?
  22. Ground → Takeoff (components: Barometer, Distance Sensor (ultrasonic, laser, or LIDAR)) • Get ground elevation • Check hardware system health • Check for obstructions
  23. Ground → Takeoff • Get ground elevation • Check hardware system health • Check for obstructions • Impossible: ◦ Takeoff if hardware in failure ◦ Takeoff if operator too close • Fallback: ◦ If sensor fails, do not allow takeoff
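The impossible states and fallback above can be written as explicit preconditions on the takeoff transition. This is a minimal sketch: the reading names and the operator-distance threshold are assumptions for illustration, not values from the talk.

```python
# Precondition check for the Ground -> Takeoff transition. Field names
# and MIN_OPERATOR_DISTANCE_M are illustrative assumptions.
from dataclasses import dataclass

MIN_OPERATOR_DISTANCE_M = 5.0  # assumed safety radius, not from the talk

@dataclass
class SensorReadings:
    hardware_healthy: bool      # aggregate hardware system health check
    sensors_responding: bool    # did every queried sensor answer?
    operator_distance_m: float  # from the distance sensor
    obstruction_detected: bool  # anything in the takeoff path?

def may_take_off(r: SensorReadings) -> bool:
    """Impossible states become explicit preconditions; a silent or
    failed sensor falls back to 'do not allow takeoff'."""
    if not r.sensors_responding:  # fallback: sensor failure blocks takeoff
        return False
    if not r.hardware_healthy:    # impossible: takeoff with hardware in failure
        return False
    if r.operator_distance_m < MIN_OPERATOR_DISTANCE_M:  # impossible: operator too close
        return False
    if r.obstruction_detected:    # obstruction check from the slide
        return False
    return True
```

Note the ordering: the sensor-failure fallback is checked first, so no other precondition is ever trusted on stale or missing data.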
  24. Determining States • Going back to finite state machines ◦ Moore machines: f(state) → state’ ◦ Mealy machines: f(state, input) → state’ • A state is a product of a previous state, or a previous state AND an input • What are the inputs to the component? ◦ Distance sensor: query from flight supervisor ◦ Does Pending + Query = a distinct state not otherwise recorded?
  25. Determining States • There are no inputs that can combine with a Pending state to produce a state we don’t otherwise know about • But this depends on the system and the behavior we wish to model • Database ◦ Inputs: Read query, write query ◦ Pending write + Read query = blocked
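The database example is a Mealy-style machine: f(state, input) → state'. This sketch adds an assumed Commit input to close the loop; the state and input names are illustrative, and the one combination the slide calls out, Pending write + Read query, is the distinct state Blocked.

```python
# Mealy-style transition function for the database example. The COMMIT
# input and the exact state names are assumptions for illustration.
from enum import Enum, auto

class DbState(Enum):
    IDLE = auto()
    PENDING_WRITE = auto()
    BLOCKED = auto()

class DbInput(Enum):
    READ = auto()
    WRITE = auto()
    COMMIT = auto()

def db_step(state: DbState, inp: DbInput) -> DbState:
    """f(state, input) -> state'. Unlisted combinations are no-ops."""
    if state == DbState.IDLE and inp == DbInput.WRITE:
        return DbState.PENDING_WRITE
    if state == DbState.PENDING_WRITE and inp == DbInput.READ:
        return DbState.BLOCKED  # the distinct state the slide calls out
    if state in (DbState.PENDING_WRITE, DbState.BLOCKED) and inp == DbInput.COMMIT:
        return DbState.IDLE     # assumed: commit releases the pending write
    return state
```

Enumerating every (state, input) pair like this is exactly how you discover whether a combination produces a state you had not otherwise recorded.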
  26. Generalizing Inputs • Some inputs have infinite states ◦ Strings! ◦ Numbers • Again, the key issue is distinct state change, so to model these cases we tend to sort those inputs into categories ◦ Valid/invalid ◦ Less than threshold/More than threshold • Typically we don’t use magic numbers in specs but it’s okay to do that when you’re starting out
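Sorting an infinite input space into finite categories might look like this. The altitude threshold is exactly the kind of "magic number" the slide permits when starting out; both the number and the category names are assumptions for illustration.

```python
# Collapse an infinite input space (arbitrary strings) into the finite
# categories a model can enumerate. The threshold is a placeholder
# "magic number", as the slide allows for early drafts.
ALTITUDE_THRESHOLD_M = 120.0  # illustrative ceiling, not from the talk

def categorize_altitude(raw: str) -> str:
    """Map any string input to one of three abstract input states."""
    try:
        value = float(raw)
    except ValueError:
        return "invalid"          # unparseable strings
    if value < 0:
        return "invalid"          # physically meaningless altitudes
    if value <= ALTITUDE_THRESHOLD_M:
        return "below_threshold"
    return "above_threshold"
```

With three categories instead of infinitely many numbers, the transition table over (state, input) pairs stays finite and checkable.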
  27. Draft a Model • Pick a transition between two states ◦ What are the components (software, sensors, hardware)? ◦ What are the inputs of each component? ◦ What are the steps that create the transition? ◦ What controls or fallbacks are involved? • Design some tests ◦ What should be impossible? ◦ What should eventually be true? ◦ What should always be true? ◦ How do we know?
  28. Thank You! • Check #workshop-safety for slides and resources • Give me feedback: ◦ https://forms.gle/1PTdSNB4xmVfcCco9 • Watch this space → bellotti.tech