Slide 1

Slide 1 text

UNLOCKING HEALTHCARE DATA
The Power of Open Formats in Python Data Science
2023.07.20 - Stefano Cotta Ramusino

Slide 2

Slide 2 text

ISSUES WITH HEALTH DATA
• Lack of standardization: difficult to compare or combine data from different sources
• Privacy and security concerns: health data is sensitive and confidential
• Data quality issues: incomplete, inconsistent or inaccurate
2023.07.20 - Unlocking Healthcare Data - Stefano Cotta Ramusino - EuroPython 2023

Slide 3

Slide 3 text

ISSUES WITH HEALTH DATA
• All these issues can affect the accuracy and usefulness of statistical analyses
• Efforts are in progress to address these challenges: development of data standards and protocols, improved privacy and security measures, and increased investment in data infrastructure and analysis tools

Slide 4

Slide 4 text

THE UNIVERSE OF HEALTH DATA FORMATS
• Vast and complex, with a wide variety of data types and structures used to represent health information
• One of the challenges in this space is the explosion of new data formats, mostly proprietary
• It’s important to establish standards and best practices for health data formats: guidelines for the creation of new formats?

Slide 5

Slide 5 text

THE UNIVERSE OF HEALTH DATA FORMATS
• Most medical device manufacturers create their own data format
• They normally also provide a way to convert to an open format, but they don’t disclose the specification of their own formats
• If there is a bug in their converter, it may never be discovered

Slide 6

Slide 6 text

THE IMPORTANCE OF USING OPEN FORMATS
• Adhere to open data format standards
• Avoid proprietary formats
• Avoid proprietary extensions of an open format
• Do not limit collaboration or hinder progress in healthcare research

Slide 7

Slide 7 text

THE IMPORTANCE OF USING OPEN FORMATS
• EDF (European Data Format) / BDF (BioSemi Data Format) for medical time series
• ISHNE (International Society for Holter and Noninvasive Electrocardiology) for Holter recordings
• FASTA / FASTQ / SAM for biological sequences
• DICOM (Digital Imaging and Communications in Medicine) for medical images

Slide 8

Slide 8 text

PYTHON ANALYTICS
• Patient information
• Blood analysis
• ECG
• EEG
• Echography
• Radiography

• Manipulate
• Analyze
• Complex datasets
• Compare

Slide 9

Slide 9 text

PYTHON ANALYTICS
• NumPy
• SciPy
• Pandas
• Matplotlib
• Biopython
• MNE-Python
• pydicom
• EDFlib-Python
• ISHNEHolterLib

Slide 10

Slide 10 text

BIOPYTHON
• Computational biology and bioinformatics
• Handle biological sequences and sequence annotations
• Protein structure, population genetics
• Machine learning
• Read/write FASTA, FASTQ, SAM and other common sequence formats

Slide 11

Slide 11 text

BIOPYTHON
    from Bio import SeqIO

    # Read every record from a GenBank file and write each one to its own FASTA file.
    genomes = SeqIO.parse("whatever.gb", "genbank")
    for genome in genomes:
        SeqIO.write(genome, genome.id + ".fasta", "fasta")
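To make the target of that conversion concrete, the following is a minimal sketch of how a FASTA text stream could be read with the standard library alone. The function name `read_fasta` and the two sample records are assumptions made up for this illustration; for real data `Bio.SeqIO` covers the many edge cases this sketch ignores.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA text stream."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            # The identifier is the first word after ">"; the rest is a description.
            fields = line[1:].split(maxsplit=1)
            identifier = fields[0] if fields else ""
            chunks = []
        elif identifier is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if identifier is not None:
        yield identifier, "".join(chunks)


# Hypothetical two-record example, not taken from a real dataset.
sample = io.StringIO(">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(sample))
print(records)
```

Note that the sketch is deliberately strict: sequence data that appears before any header raises an error instead of being silently accepted.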

Slide 12

Slide 12 text

MNE-PYTHON
• MEG (magnetoencephalography)
  EEG (electroencephalography)
  sEEG (stereoelectroencephalography)
  ECoG (electrocorticography)
  NIRS (near-infrared spectroscopy)
• Analysis, visualization, exploration
• Swiss Army knife for many data formats
• Permissive reader

Slide 13

Slide 13 text

MNE-PYTHON
    import mne

    # Read the whole recording into memory and plot it.
    edf = mne.io.read_raw_edf("not_valid.edf", preload=True)
    edf.plot()

Slide 14

Slide 14 text

NOT BEING PERMISSIVE IN LIBRARIES
• Strict reading of the open data format
• Manufacturers have to comply with the open format
• Warning if the file is not compliant

Slide 15

Slide 15 text

NOT BEING PERMISSIVE IN LIBRARIES
    from EDFlib.edfreader import EDFreader

    edf = EDFreader("not_valid.edf")
    # Raises:
    # EDFlib.edfreader.EDFexception: File is not valid EDF(+) or BDF(+).
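What "strict reading" can mean in practice is sketched below with the standard library only. It checks a few properties of the fixed 256-byte EDF header as laid out in the published EDF specification (version field, printable ASCII, and a declared header size that matches the signal count). This is not how EDFlib implements its checks; the names `check_edf_header` and `make_header` and the sample field values are assumptions for this illustration, and a real validator would check far more.

```python
def check_edf_header(header: bytes) -> None:
    """Raise ValueError unless the fixed EDF header passes a few basic checks."""
    if len(header) < 256:
        raise ValueError("header is shorter than 256 bytes")
    fixed = header[:256]
    # Bytes 0-7: version, the character "0" padded with spaces.
    if fixed[:8] != b"0       ":
        raise ValueError("version field is not '0' padded with spaces")
    # The header may only contain printable US-ASCII (byte values 32..126).
    if any(byte < 32 or byte > 126 for byte in fixed):
        raise ValueError("header contains non-printable or non-ASCII bytes")
    try:
        header_bytes = int(fixed[184:192].decode("ascii"))  # bytes 184-191
        signals = int(fixed[252:256].decode("ascii"))       # bytes 252-255
    except ValueError:
        raise ValueError("a numeric header field is not an integer") from None
    if signals < 1:
        raise ValueError("file declares no signals")
    # The full header is 256 bytes plus 256 bytes per signal.
    if header_bytes != 256 * (signals + 1):
        raise ValueError("declared header size does not match the signal count")


def make_header(signals: int) -> bytes:
    """Build a fixed header with hypothetical field values, for the demo only."""
    fields = [
        (b"0", 8), (b"X X X X", 80), (b"Startdate X X X X", 80),
        (b"20.07.23", 8), (b"10.00.00", 8),
        (str(256 * (signals + 1)).encode("ascii"), 8), (b"", 44),
        (b"1", 8), (b"1", 8), (str(signals).encode("ascii"), 4),
    ]
    return b"".join(value.ljust(width) for value, width in fields)


valid = make_header(2)
check_edf_header(valid)  # passes silently

broken = b"1" + valid[1:]  # wrong version field
try:
    check_edf_header(broken)
except ValueError as error:
    print("rejected:", error)
```

Refusing the file, rather than guessing what the writer meant, is what puts pressure on producers to fix their exporters.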


Slide 16

Slide 16 text

PYDICOM
• Medical image datasets
• Storage and transfer
• Not only a data format, but also a protocol implementation

Slide 17

Slide 17 text

PYDICOM
    from matplotlib import pyplot
    import pydicom
    import pydicom.data

    # Look up the file among pydicom's test data, read it, and show the image in grayscale.
    dcmfile = pydicom.data.get_testdata_file("my_leg.dcm")
    dcm = pydicom.dcmread(dcmfile)
    pyplot.imshow(dcm.pixel_array, cmap=pyplot.cm.gray)
    pyplot.show()

Slide 18

Slide 18 text

LET’S MAKE AN OPEN FORMAT
• Generate an example
• Definition (Schema)
• Create a validator
• Describe the format
• Use case

Slide 19

Slide 19 text

LET’S CREATE AN OPEN FORMAT
    "measurements": [
        {
            "datetime": "2023-07-19T21:12:39+01:00",
            "sys": 126,
            "map": 101,
            "dia": 86,
            "pp": 40,
            "pr": 71,
            "mode": "automatic"
        }
    ]

Slide 20

Slide 20 text

LET’S CREATE AN OPEN FORMAT
    "patient": {
        "name": "John Doe",
        "id": "1234"
    },
    "intervals": {
        "wakeup": {
            "start": "08:00",
            "interval": 20
        },
        "sleep": {
            "start": "23:00",
            "interval": 45
        }
    }

Slide 21

Slide 21 text

LET’S CREATE AN OPEN FORMAT
    "measurements": [
        {
            "datetime": "2023-07-19T20:52:39+01:00",
            "error": {
                "code": "ERR2",
                "message": "Reached maximum inflation time"
            }
        }
    ]
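A consumer of this format has to handle both measurement shapes shown so far: a successful reading and a failed one carrying an "error" object. The sketch below uses only the standard library and the two sample entries from the slides. Reading "pp" as pulse pressure (sys minus dia) is an assumption that happens to fit the sample values; the slides do not define the field.

```python
import json
from datetime import datetime

document = json.loads("""
{
  "measurements": [
    {"datetime": "2023-07-19T20:52:39+01:00",
     "error": {"code": "ERR2", "message": "Reached maximum inflation time"}},
    {"datetime": "2023-07-19T21:12:39+01:00",
     "sys": 126, "map": 101, "dia": 86, "pp": 40, "pr": 71, "mode": "automatic"}
  ]
}
""")

valid, failed = [], []
for entry in document["measurements"]:
    # The timestamps are ISO 8601 with a UTC offset, so they parse directly.
    when = datetime.fromisoformat(entry["datetime"])
    if "error" in entry:
        failed.append((when, entry["error"]["code"]))
    else:
        # Assumption: "pp" is the pulse pressure, i.e. sys - dia.
        if entry["pp"] != entry["sys"] - entry["dia"]:
            raise ValueError(f"inconsistent pulse pressure at {when}")
        valid.append((when, entry["sys"], entry["dia"]))

print(len(valid), "valid,", len(failed), "failed")
```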

Slide 22

Slide 22 text

LET’S CREATE AN OPEN FORMAT
    {
        "version": "1.0.0",
        "device": {
            "name": "My Cool ABPM",
            "mode": "usb",
            "type": "ordinary",
            "version": {
                "firmware": "2.3"
            }
        },

Slide 23

Slide 23 text

LET’S CREATE AN OPEN FORMAT
    {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "title": "exam",
        "description": "ABPM exam",
        "properties": {
            "version": {
                "type": "string",
                "description": "Schema version",
                "default": "1.0.0",
                "pattern": "^(\\d+\\.)?(\\d+\\.)?(\\d+)$"
            },
            …
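The "pattern" keyword in the schema fragment is an ordinary regular expression; the doubled backslashes are only JSON string escaping. As a small sketch, the same expression can be exercised with Python's `re` module to see which version strings it accepts. The helper name `is_valid_version` and the candidate strings are assumptions for this illustration, and a complete validator would apply the whole schema with a JSON Schema library rather than test one field by hand.

```python
import re

# The regular expression from the schema, with the JSON escaping removed.
VERSION_PATTERN = re.compile(r"^(\d+\.)?(\d+\.)?(\d+)$")


def is_valid_version(value):
    """Return True if value is a string of one to three dot-separated numbers."""
    return isinstance(value, str) and VERSION_PATTERN.match(value) is not None


candidates = ["1.0.0", "1.0", "2", "1.0.0.0", "v1.0", "1."]
accepted = [c for c in candidates if is_valid_version(c)]
print(accepted)
```

The pattern accepts "1.0.0", "1.0" and "2", and rejects the other three candidates.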

Slide 24

Slide 24 text

LET’S CREATE AN OPEN FORMAT
    import jsl

    class Version(jsl.Document):
        # Raw string, so the regex backslashes are not treated as string escapes.
        protocol = jsl.StringField(
            description='Protocol version',
            pattern=r'^(\d+\.)?(\d+\.)?(\d+)?([A-Za-z0-9\.]+)?$',
            required=True)

Slide 25

Slide 25 text

LET’S CREATE AN OPEN FORMAT
• Create a library to support the format
• When reading the format, check the adherence to the schema
• Create a converter from another format to this open format
• Spread the open format
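The converter step could look roughly like the sketch below. The CSV layout (column names and the second row of values) is entirely hypothetical, standing in for some vendor export; only the output field names come from the slides. Mapping the pulse column to "pr" assumes that "pr" means pulse rate, which the slides do not state, and the sketch leaves out the device, patient and intervals sections.

```python
import csv
import io
import json

# Hypothetical vendor export; the column names are invented for this sketch.
VENDOR_CSV = """timestamp,systolic,diastolic,pulse
2023-07-19T21:12:39+01:00,126,86,71
2023-07-19T21:32:39+01:00,121,80,68
"""


def convert(csv_text):
    """Convert the hypothetical CSV export into the JSON layout from the slides."""
    measurements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        measurements.append({
            "datetime": row["timestamp"],
            "sys": int(row["systolic"]),
            "dia": int(row["diastolic"]),
            "pr": int(row["pulse"]),  # assumption: "pr" is the pulse rate
        })
    return {"version": "1.0.0", "measurements": measurements}


exam = convert(VENDOR_CSV)
print(json.dumps(exam, indent=2))
```

Because `int()` raises on anything that is not a number, a malformed row stops the conversion instead of producing a file that only looks compliant.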

Slide 26

Slide 26 text

CONTACTS
torino.python.it
@databeerstorino