ICSE 2015: Evaluating Legal Implementation Readiness Decision-making

In 2015, ICSE introduced a journal-first track featuring papers published in TOSEM or TSE during the previous year that had not yet been presented at a conference. Our paper was selected for this track, and these are the slides from that presentation. The paper's abstract follows:

Software systems are increasingly regulated. Software engineers therefore must determine which requirements have met or exceeded their legal obligations and which requirements have not. Requirements that have met or exceeded their legal obligations are legally implementation ready, whereas requirements that have not met or exceeded their legal obligations need further refinement. In this paper, we examine how software engineers make these determinations using a multi-case study with three cases. Each case involves assessment of requirements for an electronic health record system that must comply with the U.S. Health Insurance Portability and Accountability Act (HIPAA) and is measured against the evaluations of HIPAA compliance subject matter experts. Our first case examines how individual graduate-level software engineering students assess whether the requirements met or exceeded their HIPAA obligations. Our second case replicates the findings from our first case using a different set of participants. Our third case examines how graduate-level software engineering students assess requirements using the Wideband Delphi approach to deriving consensus in groups. Our findings suggest that the average graduate-level software engineering student is ill-prepared to write legally compliant software with any confidence and that domain experts are an absolute necessity.

akmassey

May 22, 2015

Transcript

  1. Evaluating Legal Implementation Readiness Decision-making
     Aaron K. Massey, Paul N. Otto and Annie I. Antón
     May 22, 2015
     TSE: http://dx.doi.org/10.1109/TSE.2014.2383374
     [email protected]
     http://www.cc.gatech.edu/~akmassey
     @akmassey
  2. Overview
     § Background and Motivation
     § Case #1: initial study
     § Case #2: replication study
     § Case #3: Wideband Delphi study
     § Summary and Findings
     § Questions
  3. Why examine regulatory compliance?
     § State of the Art in software engineering is not enough for regulatory compliance.
     § Software systems are failing to meet their legal obligations.
       – ChoicePoint data breach total cost: $28M
       – Target data breach total cost: $162M and counting
       – Anthem data breach total cost: ????
  4. Domain: Healthcare
     § Health Insurance Portability and Accountability Act (HIPAA)
       – Passed in 1996, amended in 2009
       – Serious penalties for non-criminal violations
     § HIPAA Settlement Actions:
       – Concentra Health Services – $1.7M (April 2014)
       – New York and Presbyterian Hospital – $3.3M (May 2014)
       – Columbia University Hospital – $1.5M (May 2014)
     § Laws and regulations continue to evolve
       – Traditional Law: 21st Century Cures bill
       – Case Law: Emily Byrne vs. Avery Center for Obstetrics and Gynecology
  5. Research Question
     How are legal implementation readiness decisions made?
  6. Legal Implementation Readiness
     Requirements that meet or exceed their legal obligations are Legally Implementation Ready (LIR).
  7. Example LIR Requirement
     Consider Requirement A: iTrust shall generate a unique user ID and default password upon account creation by a system administrator.
     Traces to § 164.312(a)(1) and § 164.312(a)(2)(i)
     Relevant HIPAA Sections:
     (a)(1) Standard: Access control. Implement technical policies and procedures for electronic information systems that maintain electronic protected health information to allow access only to those persons or software programs that have been granted access rights as specified in § 164.308(a)(4).
     (2) Implementation specifications: (i) Unique user identification (Required). Assign a unique name and/or number for identifying and tracking user identity.
  8. Example Non-LIR Requirement
     Consider Requirement B: iTrust shall allow an authenticated user to change their user ID and password.
     Traces to § 164.312(a)(1) and § 164.312(a)(2)(i)
     Relevant HIPAA Sections:
     (a)(1) Standard: Access control. Implement technical policies and procedures for electronic information systems that maintain electronic protected health information to allow access only to those persons or software programs that have been granted access rights as specified in § 164.308(a)(4).
     (2) Implementation specifications: (i) Unique user identification (Required). Assign a unique name and/or number for identifying and tracking user identity.
  9. Cases and Participant Population
     § All participants had…
       – No prior experience with legal compliance
       – Coursework included:
         • 150 minutes of lectures on requirements engineering
         • 75 minutes of lectures on regulatory and policy compliance
     § Case #1: 32 graduate-level software engineering students
     § Case #2: 34 graduate-level software engineering students
     § Case #3: 14 graduate-level software engineering students
  10. Case Study Materials
      § 31 iTrust requirements traced to the Health Insurance Portability and Accountability Act (HIPAA) § 164.312
      § Traceability Matrix
      § Text of HIPAA § 164.312
        – Familiarity [BA08, MA10a, MA10b, MOH10, MOA09, MA09a, MA09b]
        – Focuses on Technical Measures of protection
        – Complete, self-contained section of the legal text
      § Comparison with subject matter expert consensus
        – Three software engineers with HIPAA experience, one of whom is also a lawyer
        – Consensus achieved using Wideband Delphi
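The traceability matrix is, in essence, a mapping from each requirement to the HIPAA subsections it is claimed to satisfy. As a hypothetical sketch only (the requirement IDs and trace links below are illustrative, not the study's 31-entry matrix), such a mapping could be represented like this:

```python
# Hypothetical sketch of a requirements-to-regulation traceability matrix.
# Requirement IDs and trace links are illustrative; the study's actual
# 31-requirement matrix is not reproduced here.
from typing import Dict, List

traceability: Dict[str, List[str]] = {
    "REQ-A": ["164.312(a)(1)", "164.312(a)(2)(i)"],  # unique user ID and default password
    "REQ-B": ["164.312(a)(1)", "164.312(a)(2)(i)"],  # change user ID and password
    # ... one entry per iTrust requirement under evaluation
}

def requirements_tracing_to(subsection: str) -> List[str]:
    """Return the requirement IDs that trace to a given HIPAA subsection."""
    return [req for req, sections in traceability.items() if subsection in sections]

print(requirements_tracing_to("164.312(a)(2)(i)"))  # -> ['REQ-A', 'REQ-B']
```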
  11. Case #1: Research Questions
      § Is there consensus among:
        – [Q1] subject matter experts about which requirements are LIR?
        – [Q2] graduate students about which requirements are LIR?
      § [Q3] Can graduate students accurately assess which requirements are LIR?
  12. Results: Consensus among Subject Matter Experts
      § Q1: Is there consensus among subject matter experts on which requirements are LIR?
      § Result: Moderate agreement among the experts about the requirements prior to the discussion session.
        – Fleiss' Kappa = 0.517 (p < 0.0001)
        – Universal agreement on 19 of the 31 requirements
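For readers who want to reproduce this style of agreement analysis, the sketch below shows how Fleiss' Kappa can be computed with statsmodels. The ratings matrix is randomly generated placeholder data standing in for the experts' LIR/not-LIR judgments; it is not the study's data.

```python
# Sketch of a Fleiss' Kappa computation over rater judgments, assuming a
# (requirements x raters) matrix of 0/1 labels (1 = LIR, 0 = not LIR).
# The ratings below are random placeholders, not the study's data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(31, 3))  # 31 requirements, 3 expert raters

# aggregate_raters converts per-rater labels into per-category counts
# (requirements x categories), the input format fleiss_kappa expects.
counts, _ = aggregate_raters(ratings)
print(f"Fleiss' Kappa = {fleiss_kappa(counts, method='fleiss'):.3f}")
```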
  13. Case #1 Results: Consensus among Participants
      § Q2: Is there consensus among participants on which requirements are LIR?
      § Result: Slight agreement about the requirements.
        – Fleiss' Kappa = 0.0792 (p < 0.0001)
        – Only somewhat better than the "agreement" found in perfectly random responses.
  14. Case #1 Results: Assessment of LIR
      § Q3: Can graduate students accurately assess which requirements are LIR?
      § Measurement: Best possible voting cutoff (20/32)
      § Result: Students cannot accurately assess the LIR status of a requirement and are more likely to miss requirements that are not LIR.
        – Average Cohen's Kappa = 0.110
        – Voting Cohen's Kappa: 0.357 (fair), Agreement 67.74%
          • Best Individual: Kappa 0.362, Agreement 67.74%
          • Third Quartile: Kappa 0.219, Agreement 61.29%
        – Sensitivity = 0.714, Specificity = 0.647
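The accuracy numbers above compare participant answers with the expert consensus. Below is a minimal sketch of that comparison, assuming 0/1 vectors over the 31 requirements with LIR encoded as 1 and the expert consensus treated as ground truth; all data are placeholders, not the study's responses.

```python
# Sketch of the accuracy metrics reported above: a voting aggregation of
# participant answers, then Cohen's Kappa, percent agreement, sensitivity,
# and specificity against the expert consensus. All data are placeholders.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

rng = np.random.default_rng(1)
expert_consensus = rng.integers(0, 2, size=31)        # 1 = LIR, 0 = not LIR
student_answers = rng.integers(0, 2, size=(32, 31))   # 32 participants x 31 requirements

# Voting aggregation: a requirement counts as LIR if at least `cutoff`
# participants marked it LIR (the slide reports the best possible cutoff, 20/32).
cutoff = 20
vote_labels = (student_answers.sum(axis=0) >= cutoff).astype(int)

kappa = cohen_kappa_score(expert_consensus, vote_labels)
agreement = (expert_consensus == vote_labels).mean()
tn, fp, fn, tp = confusion_matrix(expert_consensus, vote_labels, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)  # share of consensus-LIR requirements the vote also marked LIR
specificity = tn / (tn + fp)  # share of consensus-not-LIR requirements the vote also marked not LIR
print(f"kappa={kappa:.3f} agreement={agreement:.2%} "
      f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```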
  15. Case #2: Replication Study Design
      § Study Goal: Repeat our previous results with a different graduate student population.
      § Focusing on two Research Questions:
        – [Q2] Is there consensus among graduate students about which requirements are LIR?
        – [Q3] Can graduate students accurately assess which requirements are LIR?
  16. Case #2 Results: Consensus among Participants
      § Q2: Is there consensus among participants on which requirements are LIR?
      § Result: Slight agreement about the requirements.
        – Fleiss' Kappa = 0.114 (p < 0.0001)
        – Marginally better than the agreement between participants in the LIR Assessment Case Study
      These results confirm our previous findings.
  17. Case #2 Results: Assessment of LIR
      § Q3: Can graduate students accurately assess which requirements are LIR?
      § Measurement: Best possible voting cutoff (19/34)
      § Result: Students cannot accurately assess the LIR status of a requirement and are more likely to miss requirements that are not LIR.
        – Average Cohen's Kappa = 0.103
        – Voting Cohen's Kappa 0.413 (fair), Agreement 71.0%
          • Best Individual: Kappa 0.545, Agreement 77.4%
          • Third Quartile: Kappa 0.218, Agreement 61.29%
        – Sensitivity = 0.556, Specificity = 0.508
      These results confirm our previous findings.
  18. Case #3: Research Questions
      § Study Goal: Examine how LIR decisions are made in groups working together
      § [Q4] Can graduate students working together using a Wideband Delphi method accurately assess which requirements are LIR?
        – Q4 mirrors Q3 from the previous study. The difference is that this case uses Wideband Delphi for consensus, whereas that study used the "best" voting cutoff.
      § [Q5] What is the extent of the discussion on requirements during the application of the Wideband Delphi method?
  19. Case #3: Wideband Delphi Study Design
      § 14 graduate student participants
        – All participants made an initial determination for each of the 31 requirements.
        – For each requirement, participants either:
          • Achieved unanimous consensus that the requirement was LIR or was not LIR, or
          • Were unable to achieve unanimous consensus, which everyone agreed meant the requirement should be considered not LIR
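As a minimal sketch of the final classification rule described above (applied after the discussion rounds), assuming each requirement ends with a boolean vote per participant; the function name, requirement IDs, and example votes are illustrative, not study data:

```python
# Sketch of the Wideband Delphi outcome rule on this slide: a requirement is
# classified LIR only if all participants agree it is LIR; unanimous not-LIR
# and failure to reach consensus both result in a not-LIR classification.
from typing import Dict, List

def delphi_lir_decision(votes: List[bool]) -> bool:
    """Return True (LIR) only on unanimous agreement that the requirement is LIR."""
    return bool(votes) and all(votes)

example_votes: Dict[str, List[bool]] = {
    "REQ-A": [True] * 14,                # unanimous LIR -> LIR
    "REQ-B": [False] * 14,               # unanimous not LIR -> not LIR
    "REQ-C": [True] * 10 + [False] * 4,  # no consensus -> treated as not LIR
}
for req_id, votes in example_votes.items():
    print(req_id, "LIR" if delphi_lir_decision(votes) else "not LIR")
```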
  20. Results: Wideband Delphi Assessment of LIR
      § Q4: Can graduate students working together using a Wideband Delphi method accurately assess which requirements are LIR?
      § Result: Students cannot accurately assess the LIR status of a requirement and are more likely to miss requirements that are not LIR.
        – Cohen's Kappa 0.111, Agreement 54.8%
        – Sensitivity = 0.625, Specificity = 0.522
      § The participants were much more conservative when working together to achieve consensus than they were individually.
  21. Results: Consensus among Wideband Delphi Participants
      § Q5: What is the extent of the discussion on requirements during the application of the Wideband Delphi method?
      § Result: Fair agreement among the participants about the requirements prior to the discussion session.
        – Fleiss' Kappa = 0.252 (p < 0.0001)
        – Recall: Experts from the LIR Assessment Case Study started at Fleiss' Kappa = 0.517 (p < 0.0001)
      § Unable to achieve consensus on 7 of the 31 requirements after discussion
  22. Results Summary

      Case                  Percent Agreement   Cohen's Kappa   Sensitivity   Specificity
      Case #1 (Average)     55.95%              0.110           0.576         0.548
      Case #1 (Best Vote)   67.74%              0.357           0.714         0.647
      Case #2 (Average)     55.69%              0.103           0.556         0.509
      Case #2 (Best Vote)   70.94%              0.413           0.667         0.800
      Case #3 (Average)     51.25%              0.044           0.521         0.492
      Case #3 (Delphi)      54.84%              0.111           0.625         0.522
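For reference, the table's columns follow the standard definitions below. The TP/TN/FP/FN labels are mine, taking the expert consensus as ground truth and LIR as the positive class; p_o is observed agreement and p_e is chance agreement.

```latex
\[
\text{Percent Agreement} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e},
\]
\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}.
\]
```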
  23. Findings
      § First Case: graduate-level software engineers are ill-prepared to make LIR determinations.
        – Maxwell found professional engineers are similarly ill-prepared to identify cross references in legal texts.
      § Second Case (Replication Study): confirms our findings from the First Case.
      § Third Case (Wideband Delphi Study): the Wideband Delphi consensus technique slightly improves LIR assessment accuracy.
        – But our findings suggest that using the Wideband Delphi method to achieve consensus results in overly cautious assessments.
      Subject Matter Experts are critical!
  24. Thank you! Questions?
      Aaron K. Massey, Paul N. Otto and Annie I. Antón
      May 22, 2015
      TSE: http://dx.doi.org/10.1109/TSE.2014.2383374
      [email protected]
      http://www.cc.gatech.edu/~akmassey
      @akmassey