Slide 1

fortiss, Munich, 2016-03-21. Stefan Wagner: Applying system-theoretic safety analysis to software-intensive systems

Slide 2

You can copy, share and change, film and photograph, blog, live-blog and tweet this presentation, provided that you attribute it to its author and respect the rights and licences of its parts. Based on slides by @SMEasterbrook and @ethanwhite

Slide 3

Software systems need a new safety analysis approach!

Slide 4

Assumption 1: Safety is increased by increasing system or component reliability. If components or systems do not fail, then accidents will not occur. from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 5

A plane not taking off because of a software check is totally safe but not reliable.
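A minimal sketch of such a fail-closed interlock (hypothetical names, not from the slides): the check keeps the aircraft in a safe state by refusing takeoff, so every spurious refusal costs reliability (availability), not safety.

```python
# Hypothetical illustration: a pre-takeoff interlock that fails closed.
# Refusing takeoff on any doubt keeps the system safe, but every false
# alarm is a reliability (availability) loss, not a safety loss.

def takeoff_permitted(sensor_ok: bool, config_valid: bool, self_test_passed: bool) -> bool:
    """Grant takeoff only if every check passes; otherwise stay on the ground."""
    return sensor_ok and config_valid and self_test_passed

# A spurious failed self-test grounds the plane: unreliable, yet perfectly safe.
print(takeoff_permitted(sensor_ok=True, config_valid=True, self_test_passed=False))  # False
```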

Slide 6

New Assumption 1: High reliability is neither necessary nor sufficient for safety.
(replaces Assumption 1: Safety is increased by increasing system or component reliability. If components or systems do not fail, then accidents will not occur.)
from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 7

Assumption 2: Accidents are caused by chains of directly related events. We can understand accidents and assess risk by looking at the chain of events leading to the loss. from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 8

The Swiss Cheese Model

Slide 9

Subjective Selection: Why do we always hear that "human error" of the operators, drivers or pilots caused an accident?

Slide 10

New Assumption 2: Accidents are complex processes involving the entire socio-technical system. Traditional event-chain models cannot describe this process adequately.
(replaces Assumption 2: Accidents are caused by chains of directly related events. We can understand accidents and assess risk by looking at the chain of events leading to the loss.)
from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 11

Assumption 3: Most accidents are caused by operator error. Rewarding safe behavior and punishing unsafe behavior will eliminate or reduce accidents significantly. from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 12

Hindsight Bias

Slide 13

The influence of system design

Slide 14

New Assumption 3: Operator behavior is a product of the environment in which it occurs. To reduce operator “error” we must change the environment in which the operator works.
(replaces Assumption 3: Most accidents are caused by operator error. Rewarding safe behavior and punishing unsafe behavior will eliminate or reduce accidents significantly.)
from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 15

Assumption 4: Probabilistic risk analysis based on event chains is the best way to assess and communicate safety and risk information. from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 16

The Titanic Effect

Slide 17

Design faults

Slide 18

New Assumption 4: Risk and safety may be best understood and communicated in ways other than probabilistic risk analysis.
(replaces Assumption 4: Probabilistic risk analysis based on event chains is the best way to assess and communicate safety and risk information.)
from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 19

Assumption 5: Highly reliable software is safe. from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 20

Software is reliable but unsafe when
• The software correctly implements the requirements, but the specified behavior is unsafe from a system perspective.
• The software requirements do not specify some particular behavior required for system safety (that is, they are incomplete).
• The software has unintended (and unsafe) behavior beyond what is specified in the requirements.
from: Leveson. Engineering a Safer World. MIT Press, 2011
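A hedged, hypothetical sketch of the first two cases: code that faithfully implements its requirement, and will therefore look perfectly reliable in testing, yet is unsafe because the requirement itself ignores a system-level condition.

```python
# Hypothetical example (not from the slides): a door-controller requirement says
# "open the doors when the train has stopped". The code below implements that
# requirement correctly and never "fails" when tested against it.

def open_doors(speed_kmh: float) -> bool:
    """Requirement as specified: doors may open once speed is zero."""
    return speed_kmh == 0.0

# Unsafe from a system perspective: the requirement is incomplete because it
# never mentions where the train is. Stopping between stations or in a tunnel
# still opens the doors -- reliable software, hazardous system behavior.
print(open_doors(0.0))  # True, even if the train stopped in a tunnel
```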

Slide 21

New Assumption 5: Highly reliable software is not necessarily safe. Increasing software reliability or reducing implementation errors will have little impact on safety.
(replaces Assumption 5: Highly reliable software is safe.)
from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 22

System Theory

Slide 23

STAMP: hazardous system states result from inadequate enforcement of safety constraints on process behavior by the hierarchical safety control structure. from: Leveson. Engineering a Safer World. MIT Press, 2011

Slide 24

[Figure: general hierarchical safety control structure spanning system development and system operations, from: Leveson. Engineering a Safer World. MIT Press, 2011. On both sides, Congress and legislatures, government regulatory agencies, industry and user associations, unions, insurance companies and courts sit above company, project and operations management. Control flows downward as legislation, regulations, certification, standards, policies, resources, work procedures, safety constraints, design rationale and test requirements; feedback flows upward as accident and incident reports, audits, inspections, work logs, problem and change reports, whistleblower information, maintenance and operations reports, hazard analyses, risk assessments and performance reports. At the bottom sits the operating process with human controller(s), an automated controller, actuator(s), sensor(s) and the physical process.]

Slide 25

[Figure detail: the operating process from the previous slide: human controller(s) and an automated controller act on the physical process via actuator(s) and observe it via sensor(s), with operating procedures and operating assumptions as inputs, and problem reports, incidents, change requests, revised operating procedures, audit reports, software revisions and hardware replacements as outputs.]

Slide 26

Safety is a control problem. [Figure: generic control loop: a controller with control algorithms and set points drives a controlled process through actuators (controlled variables) and observes it through sensors (measured variables), with process inputs, process outputs and disturbances. Running example elements: off switch, thrust, Hall effect sensor, volume of noise.]

Slide 27

Safety is a control problem. [Figure: the same control loop, now with the controller role filled not only by an automatic controller but also by humans, physical design and social controls over the process. Running example elements: off switch, thrust, Hall effect sensor, volume of noise.]
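A minimal sketch of the generic control loop behind these two slides (names and numbers are illustrative, not from the slides): a controller holds a process model, reads measured variables through a sensor, and issues control actions through an actuator; safety means the loop keeps the controlled process out of hazardous states despite disturbances.

```python
# Illustrative sketch of the generic control loop (hypothetical names/values):
# controller -> actuator -> controlled process -> sensor -> controller.

class Controller:
    def __init__(self, set_point: float):
        self.set_point = set_point          # desired value of the controlled variable
        self.process_model = 0.0            # controller's belief about the process state

    def decide(self, measured: float) -> float:
        """Control algorithm: simple proportional action toward the set point."""
        self.process_model = measured       # update the internal process model
        return 0.5 * (self.set_point - self.process_model)

def simulate(steps: int = 10) -> float:
    controller = Controller(set_point=100.0)
    process_state = 20.0
    for _ in range(steps):
        measured = process_state            # sensor feedback (assumed accurate here)
        action = controller.decide(measured)
        process_state += action - 1.0       # actuator effect plus a small disturbance
    return process_state

print(round(simulate(), 1))                 # process state approaches the set point
```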

Slide 28

System-Theoretic Process Analysis (STPA)

Slide 29

[Figure: the STPA control loop (controller with control algorithm and process model, actuator, controlled process, sensor) annotated with four groups of causal factors for unsafe control:
1. Unsafe inputs from higher levels: control input or external information wrong or missing.
2. Unsafe control algorithms: flaws in creation, process changes, incorrect modification or adaptation; inappropriate, ineffective, or missing control action.
3. Wrong model of the process: process model inconsistent, incomplete, or incorrect.
4. Wrong process execution: component failures and changes over time in actuator and controlled process; delayed or inadequate operation; conflicting control actions; process input missing or wrong; process output contributes to system hazard; unidentified or out-of-range disturbance; incorrect or missing feedback, feedback delays, measurement inaccuracies, inadequate sensor operation.]
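To make the four groups concrete, here is a hypothetical sketch (names invented for illustration) that lists them as data and plays out the classic "stale process model" case: the controller believes the door is closed, so a control action that is correct with respect to its model is unsafe for the real process.

```python
# Hypothetical illustration of the STPA causal-factor groups and of a stale
# process model (group 3) leading to an unsafe control action.

CAUSAL_FACTORS = {
    1: "unsafe inputs from higher levels / control input wrong or missing",
    2: "unsafe control algorithm (flawed creation, change, or adaptation)",
    3: "process model inconsistent, incomplete, or incorrect",
    4: "wrong process execution: actuator, sensor, feedback or timing problems",
}

def command_power(model_door_closed: bool) -> str:
    """The control algorithm decides from its *model*, not from the real world."""
    return "power on" if model_door_closed else "power off"

real_door_closed = False     # the door has just been opened
model_door_closed = True     # feedback was delayed, so the model is stale (factor 3)

action = command_power(model_door_closed)
unsafe = (action == "power on") and not real_door_closed
print(action, "-> unsafe:", unsafe, "| cause:", CAUSAL_FACTORS[3])
```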

Slide 30

Example: unsafe control actions for a door/power interlock.
Control Action: Power off
  Not Given: Power not turned off when door opened
  Given Incorrectly: Power turned off when door closed
  Wrong Timing or Order: Door opened, controller waits too long to turn off power
  Stopped Too Soon: Not applicable
Control Action: Power on
  Not Given: Power not turned on when door closed or opened
  Given Incorrectly: Power turned on while door opened
  Wrong Timing or Order: Power turned on too early; door not fully closed
  Stopped Too Soon: Not applicable
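One way to read the table is as a safety constraint the controller must enforce: power may only be on while the door is fully closed, and power must be removed as soon as the door opens. A hypothetical, simplified check (not from the slides) for the "power on" command could look like this.

```python
# Hypothetical sketch: checking a "power on" command against the
# unsafe-control-action table for the door/power interlock example.

def power_on_is_unsafe(door_fully_closed: bool) -> bool:
    """True if issuing 'power on' now would match a UCA from the table."""
    # UCAs covered: power turned on while the door is open, or too early
    # (door not yet fully closed). Omission UCAs (power not turned OFF when
    # the door opens) need a separate monitor on the door event, not shown here.
    return not door_fully_closed

print(power_on_is_unsafe(door_fully_closed=False))  # True: would be unsafe
print(power_on_is_unsafe(door_fully_closed=True))   # False: allowed
```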

Slide 31

STPA for software-intensive systems

Slide 32

Connection to Verification. Abdulkhaleq, Wagner, Leveson. A Comprehensive Safety Engineering Approach for Software-Intensive Systems Based on STPA. Procedia Engineering 128:2–11, 2015
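The idea, roughly, is that STPA results feed verification: a safety constraint such as "power must never be on while the door is open" can be turned into a formal property or an executable check. As an illustration only (state and function names are assumptions, not taken from the paper), the constraint can be expressed as a property and checked exhaustively against a tiny state-machine model of the controller, a stand-in for model checking.

```python
# Hypothetical sketch of the STPA-to-verification idea: an STPA safety
# constraint expressed as an executable property and checked against every
# reachable state of a simple model of the controller.

from itertools import product

def controller_next(door_closed: bool, power_requested: bool) -> bool:
    """Candidate controller logic under verification: is power actually on?"""
    return power_requested and door_closed

def safety_constraint(door_closed: bool, power_on: bool) -> bool:
    """STPA-derived constraint: power on implies door closed."""
    return (not power_on) or door_closed

# Exhaustively check the (tiny) state space -- a stand-in for model checking.
violations = [
    (door, req)
    for door, req in product([False, True], repeat=2)
    if not safety_constraint(door, controller_next(door, req))
]
print("constraint holds" if not violations else f"violations: {violations}")
```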

Slide 33

STPA in Agile Development

Slide 34

Software systems need a new safety analysis approach!

Slide 35

Prof. Dr. Stefan Wagner e-mail [email protected] phone +49 (0) 711 685-88455 WWW www.iste.uni-stuttgart.de/se Twitter prof_wagnerst ORCID 0000-0002-5256-8429 Institute of Software Technology

Slide 36

Pictures used in this slide deck:
Safety by GotCredit (https://flic.kr/p/qHCmfo, Got Credit)
Unsafe Area by Jerome Vial under CC BY-SA 2.0 (https://flic.kr/p/71Kpk7)
Airplane by StockSnap (https://pixabay.com/de/flugzeug-reisen-transport-airasia-926744/)
Swiss Cheese Model by Davidmack, own work, CC BY-SA 3.0 (https://commons.wikimedia.org/w/index.php?curid=31679759)
The Titanic in the port of Southampton, public domain (https://commons.wikimedia.org/w/index.php?curid=19027661)
Pisa by Aaron Kreis (https://flic.kr/p/wzEw5K)
Looking back by Susanne Nilsson (https://flic.kr/p/niBFZo)
Concorde Cockpit by Dr. Richard Murray (https://commons.wikimedia.org/wiki/File:Concorde_Cockpit_-_geograph.org.uk_-_1357498.jpg)