
Unlocking Healthcare data: the power of Open Formats in Python Data Science

Talk: https://www.youtube.com/watch?v=-ZVJ_eZ0aWg

Are you a data scientist or developer working in healthcare? Are you tired of dealing with proprietary data formats for biological and vital sign information? It's time to unlock the power of open data and make your research more impactful.

In this talk, we'll explore how you can leverage Python analytics to manipulate and analyze complex datasets of patient information, including blood work, ECG, EEG, echocardiography, radiography, and more.

We'll also dive into the world of open data formats, and show you how using these formats can make it easier to anonymize, convert, and collaborate on research.

Don't miss this opportunity to learn how Python analytics and open data formats can help you unlock the insights hidden in your data and improve patient outcomes.

Stefano Cotta Ramusino

July 20, 2023

Transcript

  1. UNLOCKING HEALTHCARE DATA
    The Power of Open Formats in Python Data Science
    2023.07.20 - Stefano Cotta Ramusino


  2. ISSUES WITH HEALTH DATA
    • Lack of standardization: difficult to compare or combine data from different sources
    • Privacy and security concerns: health data is sensitive and confidential
    • Data quality issues: incomplete, inconsistent or inaccurate
    2023.07.20 - Unlocking Healthcare Data - Stefano Cotta Ramusino - EuroPython 2023

  3. ISSUES WITH HEALTH DATA
    • All these issues can affect the accuracy and usefulness of statistical analyses
    • Efforts are in progress to address these challenges: development of data standards and protocols, improved privacy and security measures, and increased investment in data infrastructure and analysis tools

  4. THE UNIVERSE OF HEALTH DATA FORMATS
    • Vast and complex, with a wide variety of data types and structures used to represent health information
    • One of the challenges in this space is the explosion of new data formats, mostly proprietary
    • It’s important to establish standards and best practices for health data formats: guidelines for the creation of new formats?

  5. THE UNIVERSE OF HEALTH DATA FORMATS
    • The majority of medical device manufacturers create their own data format
    • They normally also provide a way to convert to an open format, but they don’t disclose the specs of their formats
    • If there is a bug in their converter, it may never be discovered

  6. THE IMPORTANCE OF USING OPEN FORMATS
    • Adhere to open data format standards
    • Avoid the use of proprietary formats
    • Avoid the use of proprietary extensions of an open format
    • Do not limit collaboration or hinder progress in healthcare research

  7. THE IMPORTANCE OF USING OPEN FORMATS
    • EDF (European Data Format) / BDF (BioSemi Data Format) for medical time series
    • ISHNE (International Society for Holter and Noninvasive Electrocardiology) for Holter recordings
    • FASTA / FASTQ / SAM for biological sequences
    • DICOM (Digital Imaging and Communications in Medicine) for medical images

  8. PYTHON ANALYTICS
    • Patient information
    • Blood analysis
    • ECG
    • EEG
    • Echography
    • Radiography

    • Manipulate
    • Analyze
    • Complex datasets
    • Compare

  9. PYTHON ANALYTICS
    • NumPy
    • SciPy
    • Pandas
    • Matplotlib
    • Biopython
    • MNE-Python
    • pydicom
    • EDFlib-Python
    • ISHNEHolterLib

  10. BIOPYTHON
    • Computational biology and bioinformatics
    • Handle biological sequences and sequence annotations
    • Protein structure, population genetics
    • Machine learning
    • Read/write FASTA, FASTQ, SAM and other common sequence formats

  11. BIOPYTHON
    from Bio import SeqIO

    # Split a multi-record GenBank file into one FASTA file per record
    genomes = SeqIO.parse("whatever.gb", "genbank")
    for genome in genomes:
        SeqIO.write(genome, genome.id + ".fasta", "fasta")

  12. MNE-PYTHON
    • MEG (magnetoencephalography)
      EEG (electroencephalography)
      sEEG (stereoelectroencephalography)
      ECoG (electrocorticography)
      NIRS (near-infrared spectroscopy)
    • Analysis, visualization, exploration
    • Swiss Army knife for a lot of data formats
    • Permissive reader

  13. MNE-PYTHON
    import mne

    # Permissive reader: the talk contrasts this call with the strict EDFlib reader on the same file
    edf = mne.io.read_raw_edf("not_valid.edf", preload=True)
    edf.plot()

  14. NOT BEING PERMISSIVE IN LIBRARIES
    • Strict reading of the open data format
    • Manufacturers have to comply with the open format
    • Warning if the file is not compliant

  15. NOT BEING PERMISSIVE IN LIBRARIES
    from EDFlib.edfreader import EDFreader

    # The strict reader refuses a file that does not comply with the format
    edf = EDFreader("not_valid.edf")

    EDFlib.edfreader.EDFexception: File is not valid EDF(+) or BDF(+).

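The strict-reading idea from slides 14 and 15 can be sketched with the standard library alone. This is a minimal illustration, not EDFlib's implementation: it checks only a few rules of the fixed 256-byte EDF header (printable ASCII only, version field "0", header size consistent with the number of signals). The names parse_edf_header, NotCompliantError and make_header are hypothetical, and BDF files are out of scope.

```python
# Field widths of the fixed part of an EDF header, in file order (256 bytes in total).
EDF_FIELDS = [
    ("version", 8), ("patient", 80), ("recording", 80),
    ("startdate", 8), ("starttime", 8), ("header_bytes", 8),
    ("reserved", 44), ("num_records", 8), ("record_duration", 8),
    ("num_signals", 4),
]


class NotCompliantError(ValueError):
    """Raised when the header breaks one of the checked rules (hypothetical name)."""


def parse_edf_header(raw: bytes) -> dict:
    """Split the fixed header into fields, rejecting anything non-compliant."""
    if len(raw) < 256:
        raise NotCompliantError("header shorter than 256 bytes")
    fixed = raw[:256]
    # EDF headers may only contain printable US-ASCII (byte values 32..126).
    if any(b < 32 or b > 126 for b in fixed):
        raise NotCompliantError("non-printable character in header")
    fields, offset = {}, 0
    for name, width in EDF_FIELDS:
        fields[name] = fixed[offset:offset + width].decode("ascii")
        offset += width
    if fields["version"] != "0       ":
        raise NotCompliantError("version field must be '0' padded with spaces")
    try:
        num_signals = int(fields["num_signals"])
        header_bytes = int(fields["header_bytes"])
    except ValueError as exc:
        raise NotCompliantError("numeric field is not an integer") from exc
    # Each signal adds 256 bytes of per-signal header after the fixed part.
    if num_signals < 1 or header_bytes != 256 * (num_signals + 1):
        raise NotCompliantError("header size does not match number of signals")
    return fields


def make_header(num_signals: int = 1, version: str = "0") -> bytes:
    """Build a synthetic fixed header, for demonstration only."""
    values = {
        "version": version, "patient": "X X X X",
        "recording": "Startdate X X X X",
        "startdate": "20.07.23", "starttime": "10.00.00",
        "header_bytes": str(256 * (num_signals + 1)), "reserved": "",
        "num_records": "1", "record_duration": "1",
        "num_signals": str(num_signals),
    }
    return b"".join(values[n].ljust(w).encode("ascii") for n, w in EDF_FIELDS)


try:
    parse_edf_header(make_header(version="1"))
except NotCompliantError as error:
    print("rejected:", error)
```

A real reader would go on to validate the per-signal headers and the data records. The point is only that a non-compliant file fails loudly instead of being silently accepted.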

  16. PYDICOM
    • Medical image datasets
    • Storage and transfer
    • Not only data format, but also protocol implementation

  17. PYDICOM
    from matplotlib import pyplot
    import pydicom
    import pydicom.data

    # "my_leg.dcm" is the placeholder file name used on the slide
    dcmfile = pydicom.data.get_testdata_file("my_leg.dcm")
    dcm = pydicom.dcmread(dcmfile)
    pyplot.imshow(dcm.pixel_array, cmap=pyplot.cm.gray)
    pyplot.show()

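Before handing a file to a reader such as pydicom, a cheap sanity check is possible with the standard library: a DICOM file starts with a 128-byte preamble followed by the four bytes "DICM". The sketch below only tests that prefix and is no substitute for real parsing; looks_like_dicom is a hypothetical helper name and the two byte strings are synthetic.

```python
import io


def looks_like_dicom(stream) -> bool:
    """Return True if the stream has a 128-byte preamble followed by b"DICM"."""
    head = stream.read(132)
    return len(head) == 132 and head[128:132] == b"DICM"


# Synthetic inputs: only the first one carries the expected prefix.
good = io.BytesIO(b"\x00" * 128 + b"DICM" + b"rest of the data set")
bad = io.BytesIO(b"not a DICOM file")
print(looks_like_dicom(good), looks_like_dicom(bad))
```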

  18. LET’S MAKE AN OPEN FORMAT
    • Generate an example
    • Definition (Schema)
    • Create a validator
    • Describe the format
    • Use case

  19. LET’S CREATE AN OPEN FORMAT
    "measurements": [
      {
        "datetime": "2023-07-19T21:12:39+01:00",
        "sys": 126,
        "map": 101,
        "dia": 86,
        "pp": 40,
        "pr": 71,
        "mode": "automatic"
      }
    ]

  20. LET’S CREATE AN OPEN FORMAT
    "patient": {
      "name": "John Doe",
      "id": "1234"
    }

    "intervals": {
      "wakeup": {
        "start": "08:00",
        "interval": 20
      },
      "sleep": {
        "start": "23:00",
        "interval": 45
      }
    }

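The talk abstract mentions that open formats make anonymization easier. With the patient block above being plain JSON, one possible approach is sketched below: drop the name and replace the id with a salted hash. This is pseudonymization rather than full anonymization, and the function name pseudonymize, the salt handling and the 16-character truncation are all assumptions made for illustration.

```python
import copy
import hashlib


def pseudonymize(exam: dict, salt: str) -> dict:
    """Return a copy of the exam without the patient name and with a hashed id."""
    result = copy.deepcopy(exam)  # never modify the caller's document
    patient = result.get("patient", {})
    if "id" in patient:
        digest = hashlib.sha256((salt + patient["id"]).encode("utf-8")).hexdigest()
        patient["id"] = digest[:16]  # shortened for readability (assumption)
    patient.pop("name", None)
    return result


exam = {"patient": {"name": "John Doe", "id": "1234"}}
print(pseudonymize(exam, salt="keep-this-secret"))
```

The same salt always yields the same pseudonym, so recordings of one patient can still be linked across files without exposing the original identifier.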

  21. LET’S CREATE AN OPEN FORMAT
    "measurements": [
      {
        "datetime": "2023-07-19T20:52:39+01:00",
        "error": {
          "code": "ERR2",
          "message": "Reached maximum inflation time"
        }
      }
    ]

  22. LET’S CREATE AN OPEN FORMAT
    {
      "version": "1.0.0",
      "device": {
        "name": "My Cool ABPM",
        "mode": "usb",
        "type": "ordinary",
        "version": {
          "firmware": "2.3"
        }
      },

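The fragments shown on slides 19 to 22 can be put together into one document and round-tripped with the standard json module. The sketch below assumes that all fragments sit at the top level of a single object (the slides do not show the full layout) and that "pp" is the pulse pressure, i.e. "sys" minus "dia", which is consistent with the example values but is an interpretation, not something the slides state.

```python
import json

# Assembled from the slide fragments; the top-level layout is an assumption.
exam = {
    "version": "1.0.0",
    "device": {
        "name": "My Cool ABPM",
        "mode": "usb",
        "type": "ordinary",
        "version": {"firmware": "2.3"},
    },
    "patient": {"name": "John Doe", "id": "1234"},
    "intervals": {
        "wakeup": {"start": "08:00", "interval": 20},
        "sleep": {"start": "23:00", "interval": 45},
    },
    "measurements": [
        {
            "datetime": "2023-07-19T20:52:39+01:00",
            "error": {"code": "ERR2", "message": "Reached maximum inflation time"},
        },
        {
            "datetime": "2023-07-19T21:12:39+01:00",
            "sys": 126, "map": 101, "dia": 86, "pp": 40, "pr": 71,
            "mode": "automatic",
        },
    ],
}


def pulse_pressure_consistent(measurement: dict) -> bool:
    """Check pp == sys - dia; failed measurements carry no readings to check."""
    if "error" in measurement:
        return True
    return measurement["pp"] == measurement["sys"] - measurement["dia"]


text = json.dumps(exam, indent=2)   # serialize to the on-disk representation
restored = json.loads(text)         # and read it back
print(restored == exam)
print(all(pulse_pressure_consistent(m) for m in restored["measurements"]))
```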

  23. LET’S CREATE AN OPEN FORMAT
    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "type": "object",
      "title": "exam",
      "description": "ABPM exam",
      "properties": {
        "version": {
          "type": "string",
          "description": "Schema version",
          "default": "1.0.0",
          "pattern": "^(\\d+\\.)?(\\d+\\.)?(\\d+)$"
        },

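To see what the "pattern" constraint of the version property accepts, the regular expression can be tried directly with Python's re module. This is a hand-rolled stand-in for a single schema rule, not a JSON Schema validator; is_valid_version is a hypothetical name.

```python
import re

# The pattern from the schema, with the JSON string escapes removed.
VERSION_PATTERN = r"^(\d+\.)?(\d+\.)?(\d+)$"


def is_valid_version(version: str) -> bool:
    """Return True if the string has one to three dot-separated numeric parts."""
    # fullmatch is used so that a trailing newline is not silently accepted,
    # which Python's "$" would otherwise allow.
    return re.fullmatch(VERSION_PATTERN, version) is not None


for candidate in ("1.0.0", "2.3", "7", "1.0.0.0", "v1", "1."):
    print(repr(candidate), is_valid_version(candidate))
```

The pattern accepts "1.0.0", "2.3" and "7", and rejects a fourth component, a letter prefix or a trailing dot.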

  24. LET’S CREATE AN OPEN FORMAT
    import jsl

    class Version(jsl.Document):
        protocol = jsl.StringField(
            description='Protocol version',
            # Raw string so the regex backslashes are not treated as string escapes
            pattern=r'^(\d+\.)?(\d+\.)?(\d+)?([A-Za-z0-9\.]+)?$',
            required=True)

  25. LET’S CREATE AN OPEN FORMAT
    • Create a library to support the format
    • When reading the format, check adherence to the schema
    • Create a converter from another format to this open format
    • Spread the open format

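The converter and adherence-check steps listed above might look like the following sketch. It assumes a hypothetical source CSV with the columns timestamp, systolic, diastolic and pulse, maps them to the keys used in the example document, and derives "pp" as systolic minus diastolic (an interpretation of that key). The check is a small hand-written stand-in for a schema validation, and REQUIRED_KEYS is an assumption rather than part of the published format.

```python
import csv
import io
import json

# Keys every successful measurement is expected to carry (assumption).
REQUIRED_KEYS = ("datetime", "sys", "dia", "pp", "pr")


def convert(csv_text: str, version: str = "1.0.0") -> dict:
    """Convert rows of the hypothetical CSV export into the open format."""
    measurements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        systolic, diastolic = int(row["systolic"]), int(row["diastolic"])
        measurements.append({
            "datetime": row["timestamp"],
            "sys": systolic,
            "dia": diastolic,
            "pp": systolic - diastolic,
            "pr": int(row["pulse"]),
        })
    return {"version": version, "measurements": measurements}


def check(document: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for index, measurement in enumerate(document.get("measurements", [])):
        if "error" in measurement:
            continue  # failed measurements carry an error object instead of readings
        missing = [key for key in REQUIRED_KEYS if key not in measurement]
        if missing:
            problems.append(f"measurement {index}: missing {missing}")
    return problems


SAMPLE = (
    "timestamp,systolic,diastolic,pulse\n"
    "2023-07-19T21:12:39+01:00,126,86,71\n"
)
document = convert(SAMPLE)
print(json.dumps(document, indent=2))
print(check(document))
```

Running the check on the converter's own output catches mapping mistakes before the converted files are shared.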

  26. CONTACTS
    [email protected]
    torino.python.it @databeerstorino
