Unlocking Healthcare data: the power of Open Formats in Python Data Science

Talk: https://www.youtube.com/watch?v=-ZVJ_eZ0aWg

Are you a data scientist or developer working in healthcare? Are you tired of dealing with proprietary data formats for biological and vital sign information? It's time to unlock the power of open data and make your research more impactful.

In this talk, we'll explore how you can leverage Python analytics to manipulate and analyze complex datasets of patient information, including blood work, ECG, EEG, echocardiography, radiography, and more.

We'll also dive into the world of open data formats, and show you how using these formats can make it easier to anonymize, convert, and collaborate on research.

Don't miss this opportunity to learn how Python analytics and open data formats can help you unlock the insights hidden in your data and improve patient outcomes.

Stefano Cotta Ramusino

July 20, 2023

Transcript

  1. UNLOCKING HEALTHCARE DATA

    The Power of Open Formats in Python Data Science
    2023.07.20 - Stefano Cotta Ramusino
  2. ISSUES WITH HEALTH DATA

    • Lack of standardization: difficult to compare or combine data from different sources
    • Privacy and security concerns: health data is sensitive and confidential
    • Data quality issues: incomplete, inconsistent or inaccurate

    2023.07.20 - Unlocking Healthcare Data - Stefano Cotta Ramusino - EuroPython 2023
  3. ISSUES WITH HEALTH DATA

    • All these issues can affect the accuracy and usefulness of statistical analyses
    • Efforts in progress to address these challenges: development of data standards and protocols, improved privacy and security measures, and increased investment in data infrastructure and analysis tools
  4. THE UNIVERSE OF HEALTH DATA FORMATS

    • Vast and complex, with a wide variety of data types and structures used to represent health information
    • One of the challenges in this space is the explosion of new data formats, mostly proprietary
    • It’s important to establish standards and best practices for health data formats: guidelines for the creation of new formats?
  5. THE UNIVERSE OF HEALTH DATA FORMATS

    • The majority of medical device manufacturers create their own data format
    • Normally they also provide a way to convert to an open format, but they don’t disclose the spec of their formats
    • If there is a bug in their converter, it may never be discovered
  6. THE IMPORTANCE OF USING OPEN FORMATS

    • Adhere to open data format standards
    • Avoid the use of proprietary formats
    • Avoid the use of proprietary extensions of an open format
    • Do not limit collaboration or hinder progress in healthcare research
  7. THE IMPORTANCE OF USING OPEN FORMATS

    • EDF (European Data Format) / BDF (BioSemi Data Format) for medical time series
    • ISHNE (International Society for Holter and Noninvasive Electrocardiology) for Holter recordings
    • FASTA / FASTQ / SAM for biological sequences
    • DICOM (Digital Imaging and Communications in Medicine) for medical images
  8. PYTHON ANALYTICS

    • Patient information • Blood analysis • ECG • EEG • Echography • Radiography
    • Manipulate • Analyze • Complex datasets • Compare
  9. PYTHON ANALYTICS

    • NumPy • SciPy • Pandas • Matplotlib • Biopython • MNE-Python • pydicom • EDFlib-Python • ISHNEHolterLib
  10. BIOPYTHON

    • Computational biology and bioinformatics
    • Handle biological sequences and sequence annotations
    • Protein structure, population genetics
    • Machine learning
    • Read/write FASTA, FASTQ, SAM and other common sequence formats
  11. BIOPYTHON

    from Bio import SeqIO

    # Split a GenBank file into one FASTA file per record
    genomes = SeqIO.parse("whatever.gb", "genbank")
    for genome in genomes:
        SeqIO.write(genome, genome.id + ".fasta", "fasta")
  12. MNE-PYTHON

    • MEG (magnetoencephalography), EEG (electroencephalography), sEEG (stereoelectroencephalography), ECoG (electrocorticography), NIRS (near-infrared spectroscopy)
    • Analysis, visualization, exploration
    • Swiss Army knife for a lot of data formats
    • Permissive reader
  13. MNE-PYTHON

    import mne

    # Permissive reader: the non-compliant file is still loaded and plotted
    edf = mne.io.read_raw_edf("not_valid.edf", preload=True)
    edf.plot()
  14. NOT BEING PERMISSIVE IN LIBRARIES

    • Strict reading of the open data format
    • Manufacturers have to comply with the open format
    • Warning if the file is not compliant
  15. NOT BEING PERMISSIVE IN LIBRARIES

    from EDFlib.edfreader import EDFreader

    # Opening a non-compliant file raises an exception instead of guessing
    edf = EDFreader("not_valid.edf")
    EDFlib.edfreader.EDFexception: File is not valid EDF(+) or BDF(+).
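
To make "strict reading" concrete, here is a minimal sketch of the kind of check a strict reader might run on the fixed 256-byte EDF header. It is not how EDFlib is implemented: the function and exception names are hypothetical, only a handful of rules from the published EDF header layout are tested (printable ASCII, version field, header size versus signal count), and BDF files are out of scope.

```python
"""Strict check of the fixed 256-byte EDF header (illustrative sketch only)."""


class NotValidEDF(ValueError):
    """Hypothetical exception raised when the header breaks the EDF layout."""


def check_edf_header(header: bytes) -> int:
    """Return the number of signals declared in the header, or raise NotValidEDF."""
    if len(header) < 256:
        raise NotValidEDF("header shorter than 256 bytes")
    fixed = header[:256]

    # EDF header fields may contain printable US-ASCII characters only.
    if any(b < 32 or b > 126 for b in fixed):
        raise NotValidEDF("non-printable byte in header")

    # Bytes 0-7: version, the character '0' padded with spaces.
    if fixed[:8] != b"0       ":
        raise NotValidEDF("bad version field")

    try:
        # Bytes 184-191: total header size; bytes 252-255: number of signals.
        header_bytes = int(fixed[184:192].decode("ascii"))
        n_signals = int(fixed[252:256].decode("ascii"))
    except ValueError as exc:
        raise NotValidEDF("non-numeric size field") from exc

    # Every signal adds another 256 bytes of per-signal header.
    if n_signals < 1 or header_bytes != 256 * (1 + n_signals):
        raise NotValidEDF("header size does not match signal count")
    return n_signals
```

A reader built this way refuses the file at the first violation, which is the behaviour the slide argues for: the manufacturer's exporter gets fixed instead of every downstream tool learning to tolerate the bug.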

  16. PYDICOM

    • Medical image datasets
    • Storage and transfer
    • Not only data format, but also protocol implementation
  17. PYDICOM

    from matplotlib import pyplot
    import pydicom
    import pydicom.data

    # Load a DICOM file and display its pixel data in grayscale
    dcmfile = pydicom.data.get_testdata_file("my_leg.dcm")
    dcm = pydicom.dcmread(dcmfile)
    pyplot.imshow(dcm.pixel_array, cmap=pyplot.cm.gray)
    pyplot.show()
  18. LET’S MAKE AN OPEN FORMAT

    • Generate an example
    • Definition (Schema)
    • Create a validator
    • Describe the format
    • Use case
  19. LET’S CREATE AN OPEN FORMAT

    "measurements": [
      {
        "datetime": "2023-07-19T21:12:39+01:00",
        "sys": 126,
        "map": 101,
        "dia": 86,
        "pp": 40,
        "pr": 71,
        "mode": "automatic"
      }
    ]
  20. LET’S CREATE AN OPEN FORMAT

    "patient": {
      "name": "John Doe",
      "id": "1234"
    }

    "intervals": {
      "wakeup": { "start": "08:00", "interval": 20 },
      "sleep": { "start": "23:00", "interval": 45 }
    }
  21. LET’S CREATE AN OPEN FORMAT

    "measurements": [
      {
        "datetime": "2023-07-19T20:52:39+01:00",
        "error": {
          "code": "ERR2",
          "message": "Reached maximum inflation time"
        }
      }
    ]
  22. LET’S CREATE AN OPEN FORMAT

    {
      "version": "1.0.0",
      "device": {
        "name": "My Cool ABPM",
        "mode": "usb",
        "type": "ordinary",
        "version": {
          "firmware": "2.3"
        }
      },
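
The example fragments on these slides can be read as parts of one JSON document. The sketch below assumes they are sibling keys of a single top-level object (the slides never show the whole file) and that "sys", "dia" and "pp" stand for systolic pressure, diastolic pressure and pulse pressure. The helper names are hypothetical. It loads such a document with the standard library, separates successful readings from entries that carry an "error" object, and checks one internal consistency rule.

```python
import json
from datetime import datetime

# Assumption: the slide fragments are sibling keys of one top-level object.
EXAM_JSON = """
{
  "version": "1.0.0",
  "patient": {"name": "John Doe", "id": "1234"},
  "measurements": [
    {"datetime": "2023-07-19T20:52:39+01:00",
     "error": {"code": "ERR2", "message": "Reached maximum inflation time"}},
    {"datetime": "2023-07-19T21:12:39+01:00",
     "sys": 126, "map": 101, "dia": 86, "pp": 40, "pr": 71,
     "mode": "automatic"}
  ]
}
"""


def split_measurements(exam):
    """Separate successful readings from entries that carry an "error" object."""
    readings, errors = [], []
    for entry in exam["measurements"]:
        # fromisoformat raises ValueError on a malformed timestamp.
        entry_time = datetime.fromisoformat(entry["datetime"])
        if "error" in entry:
            errors.append((entry_time, entry["error"]["code"]))
        else:
            readings.append((entry_time, entry))
    return readings, errors


def pulse_pressure_consistent(reading):
    """Assumed rule: "pp" (pulse pressure) equals "sys" minus "dia"."""
    return reading["pp"] == reading["sys"] - reading["dia"]


exam = json.loads(EXAM_JSON)
readings, errors = split_measurements(exam)
```

Because the format is plain JSON with ISO 8601 timestamps, nothing beyond the standard library is needed to read it, which is part of the appeal of an open, text-based format.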
  23. LET’S CREATE AN OPEN FORMAT

    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "type": "object",
      "title": "exam",
      "description": "ABPM exam",
      "properties": {
        "version": {
          "type": "string",
          "description": "Schema version",
          "default": "1.0.0",
          "pattern": "^(\\d+\\.)?(\\d+\\.)?(\\d+)$"
        },
        …
  24. LET’S CREATE AN OPEN FORMAT

    import jsl

    # Describe the schema in Python; a raw string keeps the regex backslashes intact
    class Version(jsl.Document):
        protocol = jsl.StringField(
            description='Protocol version',
            pattern=r'^(\d+\.)?(\d+\.)?(\d+)?([A-Za-z0-9\.]+)?$',
            required=True)
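
The "create a validator" step can be illustrated with the one schema rule the slides show in full: the "version" property and its pattern. This is a standard-library sketch with hypothetical names, not a JSON Schema implementation. In practice a complete validator library such as `jsonschema` would probably be run against the whole schema, most of which is elided on the slide.

```python
import re

# Only the part of the schema visible on the slide; the rest is elided there.
SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "version": {
            "type": "string",
            "pattern": r"^(\d+\.)?(\d+\.)?(\d+)$",
        },
    },
}


def validate_version(exam, schema=SCHEMA_FRAGMENT):
    """Return a list of problems found in exam["version"] (empty if none)."""
    rule = schema["properties"]["version"]
    problems = []
    if "version" not in exam:
        # The visible fragment does not say whether the field is required.
        return problems
    value = exam["version"]
    if not isinstance(value, str):
        problems.append("version must be a string")
    # JSON Schema patterns are unanchored searches, hence re.search.
    elif re.search(rule["pattern"], value) is None:
        problems.append(f"version {value!r} does not match the schema pattern")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once, which tends to be friendlier for people writing exporters.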
  25. LET’S CREATE AN OPEN FORMAT

    • Create a library to support the format
    • When reading the format, check the adherence to the schema
    • Create a converter from another format to this open format
    • Spread the open format
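
As an illustration of the converter step, the sketch below turns rows of a CSV export into the JSON layout used in the examples. The CSV column names (`timestamp`, `systolic`, `diastolic`, `pulse`) and the function name are invented for this example, "pp" is assumed to be systolic minus diastolic, and fields the CSV does not carry ("map", "mode") are simply left out. Whether the real schema allows that is not shown in the slides.

```python
import csv
import io
import json


def csv_to_exam(csv_text):
    """Convert CSV rows (hypothetical columns) into the exam JSON layout."""
    measurements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        systolic = int(row["systolic"])
        diastolic = int(row["diastolic"])
        measurements.append({
            "datetime": row["timestamp"],
            "sys": systolic,
            "dia": diastolic,
            # Assumption: pulse pressure is derived, not stored in the CSV.
            "pp": systolic - diastolic,
            "pr": int(row["pulse"]),
        })
    return {"version": "1.0.0", "measurements": measurements}


sample = (
    "timestamp,systolic,diastolic,pulse\n"
    "2023-07-19T21:12:39+01:00,126,86,71\n"
)
exam = csv_to_exam(sample)
print(json.dumps(exam, indent=2))
```

A converter like this is also a natural place to run the schema check before writing the file, so that only compliant documents ever leave the tool.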