Machine learning with ventilator data to improve reporting on critically ill newborn infants
Our efforts to date, as an open collaboration between the NHS and ModelInsight, to use machine learning to label the start of each breath in baby ventilator time-series data. Presented at the PyDataLondon 2017 conference.
neonatal ventilation and data analysis with Python
• Ian Ozsvald: long-time Pythonista, PyDataLondon co-founder, ML consultant, author
• Giles Weaver: bioinformatician turned Python data scientist, Review Committee member
• This collaboration builds on our PyData London January talk
Prematurity: lung, muscles and brain are too immature to support adequate gas exchange
• Full-term babies may require intensive care (e.g. infection, after an operation, birth depression etc.)
• We have >1500 “ventilator days” yearly
• Ventilation is also an important part of paediatric and adult intensive care
ACTIVE process with the patient using negative pressures
• Mechanical ventilation during general anaesthesia is a PASSIVE process with the ventilator using positive pressures
• During neonatal intensive care, patients do not receive full sedation or muscle relaxation: ventilation is the combination and superimposition of these two pumps
the neonatal intensive care unit
• Downloaded ~160 days of ventilator data from 59 ventilated neonates
• Most recordings are >24 hours long, usually 2-4 days
• Time-series data sampled at 100 Hz (one sample every 10 ms)
• Data are retrieved as CSV files
• Approximately 650 MB of data per 24 hours of ventilation (1 ventilator day)
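A minimal sketch of loading one of these CSV exports with the standard library. It assumes (hypothetically) a header row of purely numeric columns such as `pressure` and `flow`; the real export format may differ. Timestamps are rebuilt from the sample index at 10 ms spacing, which is one possible workaround when recorded timestamps are messy. At 100 Hz a ventilator day is 100 × 86,400 = 8,640,000 rows, so 650 MB works out to roughly 75 bytes per row.

```python
import csv
import io
from datetime import datetime, timedelta

# 100 Hz sampling, as stated on the slide: one sample every 10 ms
SAMPLE_PERIOD = timedelta(milliseconds=10)

# Rows per ventilator day: 100 samples/s * 86,400 s/day
SAMPLES_PER_DAY = 100 * 86_400


def load_ventilator_csv(fileobj, start_time):
    """Read a ventilator CSV export and attach evenly spaced timestamps.

    Hypothetical format: a header row followed by numeric columns only
    (e.g. 'pressure', 'flow'). The real export may differ.
    """
    reader = csv.DictReader(fileobj)
    rows = []
    for i, row in enumerate(reader):
        record = {key: float(value) for key, value in row.items()}
        # Rebuild the timestamp from the sample index instead of
        # relying on a recorded timestamp column.
        record["time"] = start_time + i * SAMPLE_PERIOD
        rows.append(record)
    return rows


if __name__ == "__main__":
    example = io.StringIO("pressure,flow\n5.0,0.1\n5.2,0.3\n")
    data = load_ventilator_csv(example, datetime(2017, 1, 1, 12, 0, 0))
    print(data[1]["time"])  # 2017-01-01 12:00:00.010000
```

For multi-day recordings a columnar tool such as pandas would be the more natural fit; the standard-library version just keeps the sketch dependency-free.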
Mechanical ventilation is always a complex physical process because of the interaction between the ventilator and the patient.
Figure: 3 hours of ventilation (~1,000,000 data points; 3 × 3,600 s × 100 Hz ≈ 1.08 million samples)
QUANTITATIVE indicators of ventilator-patient interactions. However, this requires looking at individual breaths in isolation, which in turn requires the ventilator data to be split into individual breaths, and that is not feasible to do manually on a longer trace.
night): we have a working prototype
• Summarisation of breathing statistics is only possible once the breaths are segmented
• Calculate “auto-PEEP” (intrinsic positive end-expiratory pressure), a condition harmful to the baby
• (future) Begin to classify breaths as patient-initiated or ventilator-initiated, and pursue other ideas once segmentations are in place
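Once breath starts are known, summary statistics follow directly. A sketch, assuming each breath runs from one start-of-breath sample index to the next at 100 Hz; the function name and returned fields are illustrative, not taken from the project.

```python
from statistics import mean, median

SAMPLE_RATE_HZ = 100  # one sample every 10 ms, per the slides


def breath_statistics(start_indices, sample_rate_hz=SAMPLE_RATE_HZ):
    """Summarise breaths from sorted start-of-breath sample indices.

    Each breath spans one start to the next, so n starts give n - 1
    complete breaths.
    """
    if len(start_indices) < 2:
        raise ValueError("need at least two breath starts")
    # Duration of each breath in seconds
    durations = [
        (end - start) / sample_rate_hz
        for start, end in zip(start_indices, start_indices[1:])
    ]
    mean_duration = mean(durations)
    return {
        "n_breaths": len(durations),
        "mean_duration_s": mean_duration,
        "median_duration_s": median(durations),
        "rate_per_min": 60.0 / mean_duration,
    }


if __name__ == "__main__":
    print(breath_statistics([0, 100, 200, 300]))
```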
the breaths: what do we want to see?
• The ventilator delivers backup inflations if the baby does not breathe for some time
• Breaths triggered by the baby are regularly spaced
• Different levels of ventilator contribution
data issues (timestamps #sigh)
• Exploratory Data Analysis
• Hand-building a Gold Standard for ML
• Simple many-moving-averages “classifier”
• Use of Random Forests and building up features for improved ML
• Review with Dr. Belteki
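The “many-moving-averages” idea can be illustrated with a two-window version: flag a candidate breath start wherever a fast trailing average of the pressure trace crosses above a slow one. This is a hypothetical sketch; the window lengths, the `margin` parameter and the use of only two averages are assumptions, not the classifier used in the project.

```python
def trailing_mean(signal, window):
    """Trailing moving average; the first window - 1 points use a shorter window."""
    out = []
    running = 0.0
    for i, value in enumerate(signal):
        running += value
        if i >= window:
            running -= signal[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def moving_average_starts(signal, fast=5, slow=50, margin=0.0):
    """Return indices where the fast average crosses above the slow one.

    Hypothetical parameters: 'fast' and 'slow' are window lengths in
    samples, 'margin' is how far above the slow average the fast one must go.
    """
    fast_ma = trailing_mean(signal, fast)
    slow_ma = trailing_mean(signal, slow)
    starts = []
    for i in range(1, len(signal)):
        was_below = fast_ma[i - 1] - slow_ma[i - 1] <= margin
        is_above = fast_ma[i] - slow_ma[i] > margin
        if was_below and is_above:
            starts.append(i)
    return starts


if __name__ == "__main__":
    # Toy square-wave "pressure" trace: 20 samples low, 20 samples high, x3
    trace = ([0.0] * 20 + [10.0] * 20) * 3
    print(moving_average_starts(trace, fast=2, slow=10))
```

With a real trace, several window pairs and margins would need tuning per patient, which is where the move to Random Forests came in.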
25% test split per patient on 5 minutes of data at 100 Hz (30,000 samples); approximately 1 positive sample per 100 samples
• Built up a set of features (rates of change and short-history indicators) that solved the problem reliably for most patients
• Developed a GUI diagnostic tool
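A sketch of the two ingredients named above, under stated assumptions: per-sample features built from the current value, its rate of change and differences against a few earlier samples (the lag choices are hypothetical), plus a per-patient 75/25 split. The slides don't say whether the 25% was held out contiguously or sampled at random; this sketch holds out the final quarter of each patient's samples, which avoids interleaving near-identical neighbouring samples between train and test. Five minutes at 100 Hz is 30,000 samples, so 22,500 for training and 7,500 for testing.

```python
def make_features(signal, lags=(5, 10, 25)):
    """Per-sample features: value, rate of change, and short-history deltas.

    'lags' (hypothetical values, all positive) are in samples. Samples
    without max(lags) points of history are dropped, so feature row j
    corresponds to signal index j + max(lags).
    """
    history = max(lags)
    rows = []
    for i in range(history, len(signal)):
        row = [signal[i], signal[i] - signal[i - 1]]  # value, change per sample
        row += [signal[i] - signal[i - lag] for lag in lags]  # change over each lag
        rows.append(row)
    return rows


def split_per_patient(series_by_patient, test_fraction=0.25):
    """Hold out the final test_fraction of each patient's samples for testing."""
    train, test = {}, {}
    for patient, samples in series_by_patient.items():
        cut = int(len(samples) * (1 - test_fraction))
        train[patient] = samples[:cut]
        test[patient] = samples[cut:]
    return train, test


if __name__ == "__main__":
    signal = [float(x) for x in range(12)]
    print(make_features(signal, lags=(2, 5))[0])  # [5.0, 1.0, 2.0, 5.0]
    train, test = split_per_patient({"patient_a": list(range(8))})
    print(len(train["patient_a"]), len(test["patient_a"]))  # 6 2
```

The feature rows could then feed a classifier such as scikit-learn's `RandomForestClassifier`, with labels shifted by `max(lags)` to stay aligned; at roughly 1 positive per 100 samples, class imbalance is worth keeping in mind when scoring.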
odd shapes; note the double predict_proba indications
• Conscious patient: strong breathing effort, with some contribution from the ventilator
• We see delayed triggering of the ventilator back-up breath in the blue example
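“Double” indications arise when the positive-class probabilities from `predict_proba` stay high, or peak twice, around a single breath. One hypothetical post-processing step, not something described in the slides: keep the peak of each above-threshold region and merge peaks closer together than a minimum gap. The threshold of 0.5 and the 20-sample gap (200 ms at 100 Hz) are illustrative values only.

```python
def probabilities_to_starts(probs, threshold=0.5, min_gap=20):
    """Turn per-sample start-of-breath probabilities into start indices.

    Each run of samples at or above 'threshold' yields its peak index.
    A peak within 'min_gap' samples of the previous start is treated as a
    double detection: only the higher-probability index of the two is kept.
    """
    starts = []
    i = 0
    n = len(probs)
    while i < n:
        if probs[i] < threshold:
            i += 1
            continue
        # Find the end of this above-threshold run
        j = i
        while j < n and probs[j] >= threshold:
            j += 1
        peak = max(range(i, j), key=lambda k: probs[k])
        if starts and peak - starts[-1] < min_gap:
            # Double indication: keep whichever peak is more confident
            if probs[peak] > probs[starts[-1]]:
                starts[-1] = peak
        else:
            starts.append(peak)
        i = j
    return starts


if __name__ == "__main__":
    probs = [0.0] * 10 + [0.6, 0.9, 0.6] + [0.0] * 5 + [0.7] + [0.0] * 30 + [0.8]
    print(probabilities_to_starts(probs))  # [11, 49]
```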
annotating time-series sections
• Matplotlib used for ML diagnosis (last 2 slides)
• Used Notebook widgets (ipywidgets)
• Had to ask for a new feature (the Notebook team was responsive, thanks!)
• If you wrote these tools from scratch, you’d probably want to set aside several days
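Hand annotations have to become per-sample targets before training. A minimal sketch, assuming the annotation tool exports breath-start times in seconds from the start of the trace (a hypothetical format) and marking a single positive sample per breath, which is consistent with the roughly 1-in-100 positive rate at 100 Hz.

```python
def annotations_to_labels(start_times_s, n_samples, sample_rate_hz=100):
    """Convert annotated breath-start times (seconds) into 0/1 labels per sample.

    Hypothetical input format: a list of start times in seconds from the
    beginning of the trace. Times falling outside the trace are ignored.
    """
    labels = [0] * n_samples
    for t in start_times_s:
        idx = round(t * sample_rate_hz)  # nearest sample index
        if 0 <= idx < n_samples:
            labels[idx] = 1
    return labels


if __name__ == "__main__":
    labels = annotations_to_labels([0.5, 1.5], n_samples=200)
    print(sum(labels), labels.index(1))  # 2 50
```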
and we can explain why it works
• Trained models generalise over time; we haven’t tested how well they generalise across patients
• Next we’ll validate whether we’re “good enough” to start work on auto-PEEP calculations
different ways?
• Do you have experience with recurrent neural networks (or a similar recurrent approach) that might work?
• We’re very open to feedback and ideas!
for providing the ventilator data download tool
• All the doctors and nurses of Cambridge Neonatal Intensive Care Unit
• ModelInsight and Endava’s financial support