A Brief Introduction to Hyperparameter Optimization (*with a focus on medical data)
This talk walks through a case study of building a sepsis prediction model, and discusses 3 techniques for sampling hyperparameters:
1) grid search
2) random search
3) sequential model-based optimization
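As a rough illustration of the first technique, grid search evaluates every combination of a fixed set of candidate values. The sketch below only enumerates the grid; the hyperparameter names and values are hypothetical and are not taken from the talk.

```python
from itertools import product

# Hypothetical search space for a tree-based model (illustrative values only).
grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [4, 8, 16],
    "min_samples_leaf": [1, 5],
}

def grid_points(space):
    """Yield every combination of values in the search space as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

candidates = list(grid_points(grid))
print(len(candidates))  # 3 * 3 * 2 = 18 configurations
print(candidates[0])
```

The number of configurations is the product of the list lengths, so each added hyperparameter multiplies the cost of the search.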
the body's response to infection causes injury to its own tissues and organs” [1]
• 750,000 patients are diagnosed with severe sepsis in the United States each year, with a 30% mortality rate [2]
• Costs $20.3 billion each year ($55.6 million per day) in U.S. hospitals [3]
• For every hour that passes before treatment begins, a patient’s risk of death from sepsis increases by 8% [4]
of admission/discharge
• Patient demographics: Age, gender, religion, marital status
• Prescriptions: Which drugs were they prescribed and when?
• Unit transfers: Did they move from the medical ward to ICU?
• Vital signs: Heart rate, blood pressure, respiratory rate, SpO2
• Lab results: Blood tests, urine tests
• Diagnoses: ICD-10 codes
• Chest X-ray images: DICOM format
50,000 hospital admissions and 40,000 patients
lung opacities in X-ray image
• lung_abnormality = (0,1)
• infection_size = [x,y,width,height]
Pneumonia Pulmonary assess
This is a separate model in itself!
NIH CXR dataset contains more than 100,000 annotated X-ray images
Clean up inconsistencies in medical terms
• Aspirin vs. ASA (acetylsalicylic acid)
• NS (normal saline) vs. 0.9% sodium chloride
Unified Medical Language System
a patient?
• ICD-10* codes [4], [5]:
- Bacteremia - R78.81
- Sepsis, unspecified organism - A41.9
- Acute hepatic failure without coma - K72.00
• Severity scores based on lab results and vitals:
- SOFA: Sequential Organ Failure Assessment [6]
- SIRS: Systemic Inflammatory Response Syndrome [7]
- LODS: Logistic Organ Dysfunction System [8]
* International Statistical Classification of Diseases and Related Health Problems (ICD), 10th revision, developed by the World Health Organization (WHO)
* ICD codes are listed for billing patients at the end of their stay
prediction score that is based on the degree of dysfunction of six organ systems
Jones et al. 2010. Crit Care Med.
Inputs: vitals, blood test results, urine test results
Sepsis = acute change in total SOFA score ≥ 2 points upon infection (regardless of baseline) [9]
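The labelling rule above can be sketched as a small function. This is a minimal illustration, not the talk's implementation: the function name, its arguments, and the idea of passing a precomputed baseline and current total SOFA score are all assumptions.

```python
def sepsis_label(baseline_sofa, current_sofa, suspected_infection, threshold=2):
    """Return True when an infected patient's total SOFA score has risen by
    at least `threshold` points, whatever the baseline value was.

    All parameter names are hypothetical; the scores are assumed to be
    precomputed total SOFA scores.
    """
    change = current_sofa - baseline_sofa
    return bool(suspected_infection) and change >= threshold

print(sepsis_label(1, 3, True))   # change of 2 with infection
print(sepsis_label(1, 2, True))   # change of 1 only
print(sepsis_label(0, 4, False))  # large change but no infection
```

Only the change in score matters here, so a patient with a chronically elevated baseline is not labelled septic unless the score rises further.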
1002 1
1003 0
1004 1
A binary classification problem
Output: A probability score between 0 and 1 representing a patient’s likelihood of sepsis
A forest of decision trees
[Figure: a patient is passed through a forest of decision trees. Tree votes: Sepsis, Sepsis, No sepsis. Final prediction: SEPSIS, prob=0.667]
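The forest's final prediction in the figure can be reproduced with a few lines. This sketch assumes three trees casting hard votes and a 0.5 decision threshold; real libraries may instead average per-tree class probabilities, and the function name is hypothetical.

```python
def forest_probability(tree_votes):
    """Fraction of trees voting for the positive class (1 = sepsis, 0 = no sepsis)."""
    return sum(tree_votes) / len(tree_votes)

votes = [1, 1, 0]  # Sepsis, Sepsis, No sepsis
prob = forest_probability(votes)
label = "SEPSIS" if prob >= 0.5 else "NO SEPSIS"  # assumed 0.5 threshold
print(f"Final prediction: {label} prob={prob:.3f}")
# prints: Final prediction: SEPSIS prob=0.667
```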
Random Search
really matter…” “…different hyper-parameters are important on different data sets”
• Based on the assumption that not all hyperparameters are equally important
• Works by sampling hyperparameter values from a distribution
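A minimal sketch of random search, using only the standard library. The hyperparameter names, their ranges, and the toy objective are assumptions made for illustration; in practice the objective would be a cross-validated score of the model.

```python
import random

def sample_config(rng):
    """Draw one configuration; each hyperparameter has its own distribution."""
    return {
        "n_estimators": rng.randint(50, 500),                 # uniform integer
        "max_depth": rng.randint(2, 20),                      # uniform integer
        "max_features": rng.uniform(0.1, 1.0),                # uniform float
        "min_impurity_decrease": 10 ** rng.uniform(-6, -2),   # log-uniform
    }

def toy_objective(config):
    """Stand-in for a validation score (higher is better).

    It depends only on max_depth and max_features, mimicking the case where
    some hyperparameters do not really matter.
    """
    return -abs(config["max_depth"] - 8) - abs(config["max_features"] - 0.5)

def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(toy_objective, n_trials=50)
print(best_config, best_score)
```

Because every trial draws a fresh value for each hyperparameter, 50 trials test up to 50 distinct values of the important ones, whereas a grid of the same size would repeat a handful of values.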
feature selection
• Sets weight of irrelevant features to 0
L2 norm (Ridge Regression)
• Handles multicollinearity
• Reduces weight of less important features
ElasticNet
• Combination of L1 and L2
• Define “mixture ratio” λ
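To make the mixture concrete, the sketch below computes a combined L1/L2 penalty for a weight vector. The parameterization (an overall strength `alpha` and a mixture ratio `mix` between 0 and 1) is one common convention and an assumption here; libraries differ in scaling factors and naming.

```python
def elastic_net_penalty(weights, alpha, mix):
    """Weighted sum of the L1 and squared L2 norms of the weights.

    mix = 1.0 gives a pure L1 (lasso-style) penalty,
    mix = 0.0 gives a pure L2 (ridge-style) penalty.
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return alpha * (mix * l1 + (1.0 - mix) * l2)

weights = [1.0, -2.0]
print(elastic_net_penalty(weights, alpha=1.0, mix=1.0))  # L1 only: 3.0
print(elastic_net_penalty(weights, alpha=1.0, mix=0.0))  # L2 only: 5.0
print(elastic_net_penalty(weights, alpha=1.0, mix=0.5))  # even mix: 4.0
```

Both `alpha` and `mix` are hyperparameters, which makes this kind of model a natural candidate for the search techniques discussed in the talk.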
- Bootstrapping
- e.g. Synthetic Minority Over-sampling Technique (SMOTE)
• Use information retrieval metrics (recall, precision, F1, confusion matrix) rather than accuracy
• Example: 90% of patients did not have sepsis
• Predicting that no patient has sepsis gives 90% accuracy
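The 90% example can be checked directly. The sketch below builds a hypothetical cohort of 100 patients, 10 of them septic, and scores a classifier that always predicts "no sepsis"; the cohort size and helper names are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = sepsis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def safe_div(numerator, denominator):
    """Return 0.0 when the ratio is undefined (zero denominator)."""
    return numerator / denominator if denominator else 0.0

y_true = [1] * 10 + [0] * 90   # 10% of patients have sepsis
y_pred = [0] * 100             # always predict "no sepsis"

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = safe_div(tp, tp + fn)
precision = safe_div(tp, tp + fp)
print(accuracy, recall, precision)  # 0.9 0.0 0.0
```

Accuracy looks strong while recall is zero: every septic patient is missed, which is why recall, precision, and F1 are more informative on imbalanced data.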
between male and female study subjects could all complicate the design of the study”
• Defining the “ground truth”
• Selecting the appropriate evaluation metric
• False positives vs. false negatives
• Is SOFA a reliable indicator of sepsis?
References
decades of mortality trends among patients with severe sepsis: a comparative meta-analysis. Crit Care Med 2014;42:625.
3) Cost H et al. In Healthcare Cost and Utilization Project (HCUP) Statistical Briefs: MD: Agency for Healthcare Research and Quality, USA, 2006.
4) Angus DC et al. Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and associated costs of care. Crit Care Med. 2001;1303-10.
5) Martin GS et al. The Epidemiology of Sepsis in the United States from 1979 through 2000. N Engl J Med 2003;348:1546-1554.
6) Vincent JL et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med. 1996;22:707–710.
7) Bone RC, Balk RA, Cerra FB, et al. Definitions for Sepsis and Organ Failure and Guidelines for the Use of Innovative Therapies in Sepsis. Chest 1992;101:1644-55.
8) Le Gall JR et al. The Logistic Organ Dysfunction system. A new way to assess organ dysfunction in the intensive care unit. ICU Scoring Group. JAMA. 1996;276(10):802–10.
9) Seymour CW, Rea TD, Kahn JM, Walkey AJ, Yealy DM, Angus DC. Severe sepsis in pre-hospital emergency care: analysis of incidence, care, and outcome. Am J Respir Crit Care Med. 2012;186(12):1264–1271.