Critical Care Health Informatics Collaborative overview

May 2017, overview and work in progress on CCHIC project

Steve Harris

May 26, 2017

Transcript

  1. Health Informatics Collaboratives: Better use of health data for research

    • 5 clinical themes sharing data across 5 BRCs
        • Acute coronary syndromes (Imperial)
        • Viral hepatitis (GST/Kings)
        • Ovarian cancer (Cambridge)
        • Renal transplantation (Oxford)
        • Critical care (UCL)
    • To develop
        • an IT capability supporting research and patient care
        • a capability that is sustainable (beyond 2015)
        • a capability that is scalable (on a national basis)
        • a capability that is expandable
        • the capability in the most efficient way possible
  2. Data sharing approach (1): Hospitals manage the data

    • In-house data linkage ‘engine’ and expert to validate
    • Trust database to repeatedly query HES until patient death
    • Pseudo-anonymisation prior to export to UCL (as records would need updating)
    • Advantages:
        • Identifiable patient data does not leave the Trust
        • Trust is able to monitor linkage quality
    • Disadvantages:
        • Cost of linkage engine (£30,000 p.a.) and IT validation and support
        • Cost of querying HES (~£3,000 p.a.)
        • Difficult to update if a new database linkage is added
        • Not easily scalable to smaller hospitals with smaller IT capability
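The slide notes that records are pseudo-anonymised before export yet still need updating, which implies that the same patient has to map to the same pseudonym on every export. The deck does not say how this is done. A minimal sketch, assuming a keyed hash (HMAC-SHA256) with a secret that stays inside the Trust; the function name, the normalisation rule and the key handling are all hypothetical:

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym with HMAC-SHA256.

    Assumption: the key never leaves the Trust, so the recipient cannot
    recompute or reverse the mapping. This is an illustration, not the
    CCHIC procedure.
    """
    # Normalise so that formatting differences do not create new pseudonyms.
    normalised = identifier.replace(" ", "").upper()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Because the mapping is deterministic for a given key, a later export of the same patient carries the same pseudonym and can update the earlier record, while a different key produces unrelated pseudonyms.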
  3. Data sharing approach (2): Central data management

    • Identifiable data is securely exported from Hospital to UCL safe haven
    • Repeated linkage is performed within UCL safe haven
    • Data de-identified on death
    • Advantages
        • Reduces burden on local IT infrastructure
        • Data linkage can be done by a handful of experts
        • Reduces the number of times sensitive data is handled; simplified audit trail
        • Facilitates the addition of other datasets
        • Enables less ‘fortunate’ hospitals to partake
    • Disadvantages
        • Identifiable data is outside Trust boundaries
        • More complex data sharing agreements
  4. Regulatory work

    • UCL Identifiable data handling solution (IDHS)
    • BSI ISO 27001 security standard
    • Information Governance toolkit for compliance with
        • The Data Protection Act 1998
        • The common law duty of confidentiality
        • The Confidentiality NHS Code of Practice
        • The NHS Care Record Guarantee for England
        • The Social Care Record Guarantee for England
        • The international information security standard: ISO/IEC 27002:2013 and ISO/IEC 27001:2013
        • The Information Security NHS Code of Practice
        • The Records Management NHS Code of Practice
        • The Freedom of Information Act 2000
        • The Human Rights Act article 8
    • R&D approval
    • Individual trust Data sharing agreements
    • Research ethics approval
    • Section 251 (NHS Act 2006) approval
        • "was established to enable the common law duty of confidentiality to be overridden to enable disclosure of confidential patient information for medical purposes, where it was not possible to use anonymised information and where seeking consent was not practical, having regard to the cost and technology available"
  5. CCHIC, MIMIC-III, ICNARC

    [Figure: comparison of the CCHIC, MIMIC-III and ICNARC datasets. Labels: multi-centre vs single US centre; explicitly linkable vs potentially linkable; complete physiology and complete treatment vs 1st 24 hour physiology and 1st 24 hour treatment; explicit research purpose vs audit first.]
  6. Big data definition

    • Volume: The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can actually be considered big data or not.
    • Variety: The type and nature of the data. This helps people who analyze it to effectively use the resulting insight.
    • Velocity: In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
    • Veracity: The quality of captured data can vary greatly, affecting accurate analysis.
    • Variability: Inconsistency of the data set can hamper processes to handle and manage it.
    • Viable: A usable and practicable resource.
  7. Shared code library

    • Language agnostic configuration
    • R code library if wished
    • Built-in 'heavy lifting' functionality
    • Audit trail for data quality
  8. Anonymised development data

    • CCHIC SOP for data anonymisation
    • Information Commissioner's Office (ICO) code of practice with respect to the Data Protection Act (DPA)
    • Dual use
        • Research specific data release
        • Development data
  9. Generic anonymisation steps

    • Specification
        • Data requested [Field list and date range]
        • Specify k-anonymity
    • Generic anonymisation steps
        • Identifiable source data
        • Delete direct identifiers
        • Convert dates from absolute to relative measures
        • Remove high risk individuals and patient opt outs
        • Micro-aggregation of all key continuous and date-time variables
        • Initial anonymisation configuration
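Three of the generic steps named on this slide (deleting direct identifiers, converting absolute dates to relative measures, and micro-aggregation) can be illustrated with a short standard-library sketch. The field names, the choice of admission time as the date anchor, and the fixed-size grouping rule for micro-aggregation are assumptions made for illustration; the deck does not specify how CCHIC implements these steps.

```python
from datetime import datetime
from statistics import mean

# Hypothetical field names; the real list would come from the anonymisation SOP.
DIRECT_IDENTIFIERS = {"nhs_number", "name", "postcode"}


def delete_direct_identifiers(record):
    """Return a copy of the record without direct identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def to_relative_hours(record, anchor_field="admitted", date_fields=("discharged",)):
    """Replace absolute datetimes with hours elapsed since the anchor field."""
    out = dict(record)
    anchor = out.pop(anchor_field)
    for field in date_fields:
        out[field + "_hours"] = (out.pop(field) - anchor).total_seconds() / 3600
    return out


def micro_aggregate(values, k):
    """Replace each value with the mean of a group of at least k sorted neighbours."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=values.__getitem__)
    result = list(values)
    start, n = 0, len(order)
    while start < n:
        end = start + k
        if n - end < k:
            end = n  # fold a short tail into the last group
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
        start = end
    return result


record = {
    "nhs_number": "0000000000", "name": "Example", "postcode": "XX1 1XX",
    "admitted": datetime(2017, 1, 1, 8), "discharged": datetime(2017, 1, 3, 8),
    "age": 70,
}
clean = to_relative_hours(delete_direct_identifiers(record))
ages = micro_aggregate([30, 31, 45, 46, 80], k=2)
```

Here `clean` keeps only `age` and a `discharged_hours` offset of 48 hours, and `ages` groups the five values into one pair and one triple so that no published value belongs to fewer than two people.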
  10. Specific anonymisation steps

    • Measure k-anonymity and l-diversity
    • K-anonymity and l-diversity acceptable?
        • No: micro-aggregation or local suppression; adjust anonymisation configuration
        • Yes: measure information loss; additional safeguards (e.g. remove living subjects); record data release; anonymised data
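The measurement step on this slide can be made concrete. k-anonymity is the size of the smallest group of records sharing the same quasi-identifier values, and l-diversity is the smallest number of distinct sensitive values found within any such group. The sketch below is a minimal illustration under assumed column names. Its `drop_small_classes` helper removes whole records, which is a blunter stand-in for the local suppression named on the slide, not the same technique.

```python
from collections import defaultdict


def _classes(rows, quasi_identifiers):
    """Group rows into equivalence classes by their quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class (0 for an empty dataset)."""
    groups = _classes(rows, quasi_identifiers)
    return min((len(g) for g in groups.values()), default=0)


def l_diversity(rows, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values in any equivalence class."""
    groups = _classes(rows, quasi_identifiers)
    return min((len({r[sensitive] for r in g}) for g in groups.values()), default=0)


def drop_small_classes(rows, quasi_identifiers, k):
    """Remove every row whose equivalence class has fewer than k members."""
    groups = _classes(rows, quasi_identifiers)
    return [r for g in groups.values() if len(g) >= k for r in g]


# Hypothetical example data.
rows = [
    {"age_band": "60-69", "sex": "F", "outcome": "died"},
    {"age_band": "60-69", "sex": "F", "outcome": "survived"},
    {"age_band": "70-79", "sex": "M", "outcome": "survived"},
    {"age_band": "70-79", "sex": "M", "outcome": "survived"},
    {"age_band": "80-89", "sex": "F", "outcome": "died"},
]
qi = ["age_band", "sex"]
released = drop_small_classes(rows, qi, k=2)
```

In this example the raw data has k = 1 because of the single 80-89 record. After dropping small classes, `released` reaches k = 2, but l-diversity is still 1 because every 70-79 male record has the same outcome, which is why the slide measures both quantities before accepting a release.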
  11. CCHIC Researcher Ready

    [Diagram of the data flow between three zones; labels as extracted]
    • Zones: BRC hospitals; UCL Data safe haven; approved clinical or scientific research team
    • Components: electronic health records; standard XML schema; secure storage; data validation; data quality reporting; research database; statistical analysis engine; final scientific report; researcher; example anonymised or synthetic data; cloned analysis engine; analysis script; example scientific report returned to research team