
Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning

Emory NLP

July 08, 2021

Transcript

  1. Task
     30-day hospital readmission prediction after kidney transplant, using clinical notes. The task is framed as document classification without domain-specific knowledge.
  2. Dataset
     Emory Kidney Transplant Dataset (EKTD)
     • 2,060 patients with a 3:7 positive-to-negative label ratio
     • 8 types of unstructured clinical notes
     • Zero to many notes available per patient for each type
  3. Challenges
     • Small sample size: mostly fewer than 2,000 patients/labels per note type
     • Long documents: 10k+ tokens
     • Noisy documents: tabular text and task-irrelevant sentences
     These challenges call for effective representations and noise-awareness.
  4. Noisy Text
     Tabular data, e.g., lab results and prescriptions. A representative excerpt, kept verbatim:
     Lab Fishbone (BMP, CBC, CMP, Diff) and critical labs - Last 24 hours (Not an official lab report. Please see flowsheet (or printed official lab reports) for official lab results.) 07/20/2013 03:25 ~ 07/20/2013 03:25 146H(Na) 110(Cl) 16(BUN) ~ 10.6L(Hgb) -----|-----|-----<108(Glu) 5.3(WBC)>-------<178(Plt) 3.6(K) 29(CO2) 5.83H(Cr) ~ 34.6L(Hct) 07/20/2013 03:25 Ca 9.7 07/20/2013 03:25 ~~~~~ALP ~~~~~ALT ~~~~~AST ~~~~~~Bili ~~~~~~Prot ~~~~~~ALB -----|-----|-----|-----|-----|-----? ~~~~~54 ~~~~12 ~~~~13L ~~~~0.7 ~~~~6.4 ~~~~3.6 (c) = Corrected C = Critical H = High L = Low NA = Not applicable A = Abnormal (ftn) = footnote
  5. Statistics

     Note Type                   # Patients   Avg. # Tokens   Description
     Consultations (CO)          1354         4395.3          Report for every outpatient consultation before transplantation
     Discharge Summary (DS)      514          1296.7          Summary at discharge from every hospital admission that occurred before transplant
     Echocardiography (EC)       1110         1073.6          Results of echocardiography
     History and Physical (HP)   1422         3025.1          Summary of the patient's medical history and clinical examination
     Operative (OP)              1472         4224.8          Report of surgical procedures
     Progress (PG)               1415         13723.4         Daily note during hospitalization summarizing the patient's medical status
     Selection Conference (SC)   2033         1189.2          Report from the selection committee's evaluation of each transplant candidate
     Social Worker (SW)          1118         1407.6          Report from encounters with social workers
  6. Approach
     Encoders:
     1. Bag-of-Words (BoW)
     2. Averaged word embedding
     3. Deep-learning encoders
        a. Transformer-based: ClinicalBERT (Huang et al., 2019)
        b. RNN-based: Bi-LSTM
     All encoders share the same training objective: minimize the negative log-likelihood of the gold labels (rendered below). BoW and averaged embeddings serve as baselines; the deep-learning encoders are prone to overfitting.
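A minimal rendering of that shared objective, with notation assumed here rather than taken from the slides (N patients, notes x_i, gold label y_i):

```latex
% Negative log-likelihood over N patients; p_theta is the probability the
% encoder-plus-classifier assigns to the gold readmission label
\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)
```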
  7. Encoders
     3. ClinicalBERT (Huang et al., 2019)
     • For each patient, split the notes into independent segments
     • Each segment inherits the patient's label
     • Ensemble the predictions from the segments
     Caveat: propagating the patient label to every segment introduces more noise. A sketch of this setup follows.
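A hypothetical sketch of the segment-level setup, in Python; the segment length and the mean-ensemble rule are illustrative assumptions, not details from the slides:

```python
from typing import List

def split_into_segments(tokens: List[str], max_len: int = 512) -> List[List[str]]:
    """Split a patient's concatenated notes into fixed-length segments
    (512 assumed here to match BERT's input limit)."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

def patient_probability(segment_probs: List[float]) -> float:
    """Ensemble segment-level readmission probabilities into one
    patient-level score (simple mean; the actual rule may differ)."""
    return sum(segment_probs) / len(segment_probs)
```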
  8. Encoders
     4. Bi-LSTM
     • For each patient, split the notes into short segments
     • Represent each segment by its averaged word embedding
     • Run a Bi-LSTM over the segment representations of each patient
     • Regularize with weight drop (Merity et al., 2018)
     A sketch of this architecture follows.
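A minimal sketch of the segment-level Bi-LSTM, assuming PyTorch; the dimensions and the classification head are illustrative, and the weight-drop regularizer (Merity et al., 2018) is omitted for brevity:

```python
import torch
import torch.nn as nn

class SegmentBiLSTM(nn.Module):
    def __init__(self, emb_dim: int = 300, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.clf = nn.Linear(2 * hidden, 2)  # binary readmission label

    def forward(self, segment_embs: torch.Tensor) -> torch.Tensor:
        # segment_embs: (batch, n_segments, emb_dim), each row the averaged
        # word embedding of one segment, in time order
        _, (h, _) = self.lstm(segment_embs)
        final = torch.cat([h[0], h[1]], dim=-1)  # forward + backward final states
        return self.clf(final)                   # logits over {readmit, not}
```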
  9. Reinforcement Learning
     Objective: automatic noise pruning
     Assumption: reducing the feature space alleviates overfitting
     Method: model pruning as a sequential decision problem at the segment level, which aligns with the fact that clinical documents arrive in time order
  10. Reinforcement Learning
      Components: the best-performing encoder + policy gradient
      Episode: the sequence of a patient's segments
      State: the previously selected segments + the current segment
      Action: {keep, prune}
      Reward: log-likelihood of the gold label computed on the final selected segments
      A rollout sketch follows.
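An illustrative REINFORCE-style rollout in Python; the linear policy, feature sizes, and the log_lik_fn stub are hypothetical, and only the episode structure (time-ordered segments, terminal log-likelihood reward) follows the slides:

```python
import torch
import torch.nn as nn

EMB_DIM = 300
policy = nn.Sequential(nn.Linear(2 * EMB_DIM, 1), nn.Sigmoid())

def rollout(segment_embs: torch.Tensor, log_lik_fn) -> torch.Tensor:
    """segment_embs: (n_segments, EMB_DIM); log_lik_fn maps the kept
    segments to the gold label's log-likelihood under the frozen encoder."""
    kept, log_probs = [], []
    context = torch.zeros(EMB_DIM)                   # summary of kept segments
    for seg in segment_embs:                         # notes arrive in time order
        state = torch.cat([context, seg])            # state = history + current
        p_keep = policy(state).squeeze()
        action = torch.bernoulli(p_keep.detach())    # sample keep (1) / prune (0)
        log_probs.append(torch.log(p_keep if action == 1 else 1 - p_keep))
        if action == 1:
            kept.append(seg)
            context = torch.stack(kept).mean(dim=0)
    reward = log_lik_fn(kept)                        # terminal reward
    return -(reward * torch.stack(log_probs).sum())  # policy-gradient loss
```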
  11. Reinforcement Learning
      Encourage pruning: provide an additional reward proportional to the pruning ratio
      Add entropy regularization (Mnih et al., 2016), as rendered below.
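The slide's formula does not survive the transcript; a standard form of the entropy bonus from Mnih et al. (2016), added to the objective with an assumed weight, is:

```latex
% Entropy of the keep/prune policy at state s_t; adding it to the objective
% with a small weight discourages prematurely deterministic policies
H\bigl(\pi_\theta(\cdot \mid s_t)\bigr)
  = -\sum_{a \in \{\mathrm{keep},\,\mathrm{prune}\}}
      \pi_\theta(a \mid s_t)\,\log \pi_\theta(a \mid s_t)
```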
  12. Experiments
      Evaluation metric: Area Under the Curve (AUC). The data are randomly split into 5 folds and the average is reported (a sketch of the protocol follows the table).

      Encoder        CO    DS    EC    HP    OP    PG    SC    SW
      BoW            58.6  62.1  52.0  58.9  51.8  61.2  59.3  51.6
      + cutoff at 2  58.6  62.3  52.8  59.0  51.9  61.3  59.3  51.9
      + stemming     58.9  61.8  53.4  59.4  51.9  61.5  59.3  51.6
      Avg. emb.      56.3  53.7  52.4  54.0  53.4  54.7  54.2  46.6
      ClinicalBERT   51.9  53.3  -     52.7  -     -     52.3  -
      LSTM           53.7  55.8  -     54.2  -     -     54.5  -
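A sketch of that evaluation protocol, assuming scikit-learn; fit_predict is a hypothetical callable standing in for any of the encoders above:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

def five_fold_auc(X, y, fit_predict) -> float:
    """Randomly split into 5 folds and report the average AUC.
    fit_predict(X_train, y_train, X_test) -> predicted probabilities."""
    aucs = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(X):
        probs = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        aucs.append(roc_auc_score(y[test_idx], probs))
    return float(np.mean(aucs))
```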
  13. Experiments
      Verifying the noise observation: feature-space reduction on BoW (vocabulary sizes; reduction relative to vanilla in parentheses). A sketch of the two reduction steps follows.

      Type  Vanilla  + Cutoff       + Stemming
      CO    28213    15022 (46.8%)  12243 (56.6%)
      DS    11029    6117 (44.5%)   5228 (52.6%)
      HP    20245    11276 (44.3%)  9329 (53.9%)
      SC    19050    9873 (48.2%)   8200 (57.0%)
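A sketch of those two vocabulary-reduction steps, assuming scikit-learn and NLTK; the Porter stemmer, whitespace tokenization, and reading "cutoff at 2" as a document-frequency threshold are all assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_tokens(text: str):
    # Whitespace tokenization + Porter stemming (both assumed)
    return [stemmer.stem(tok) for tok in text.split()]

vanilla = CountVectorizer()         # full vocabulary
cutoff = CountVectorizer(min_df=2)  # one reading of "cutoff at 2": drop terms in < 2 documents
stemmed = CountVectorizer(min_df=2, tokenizer=stem_tokens)  # cutoff + stemming
```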
  14. Experiments
      Performance of reinforcement learning (AUC) against the best encoder from above:

                     CO    DS    HP    SC
      Best encoder   58.9  62.3  59.4  59.3
      RL             59.8  62.4  60.6  60.2
      Pruning ratio  26%   5%    19%   23%
  15. Experiments
      RL tuning essentials:
      1. Reward discount rate: keeps the scale of the policy gradient stable
      2. Entropy regularization: avoids local optima
      The discounted return is rendered below for reference.
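For reference, the discounted return in standard notation (assumed here, not taken from the slides); a discount rate γ < 1 bounds how strongly late rewards scale each early step's gradient:

```latex
% Return at step t with discount rate gamma and per-step reward r_k
G_t = \sum_{k=t}^{T} \gamma^{\,k-t}\, r_k
```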
  16. Summary
      • The old bag-of-words model is still a strong encoder for this dataset of long texts and small sample size.
      • Deep learning overfits strongly on this dataset.
      • RL further improves performance while performing automatic noise pruning.
      • RL identifies two types of noise: typical noisy tokens and task-specific noisy text.
  17. References
      Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. 2019. ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission.
      Stephen Merity, Nitish Shirish Keskar, and Richard Socher. 2018. Regularizing and Optimizing LSTM Language Models.
      Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning.