
View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data

Emory NLP

July 08, 2021

Transcript

  1. Adverse Drug Effect Classification Task
     • ADRs are the unintended effects of drugs used for prevention, diagnosis, or treatment
     • The ADR classification task is detecting ADR reports in user-generated data [Yates and Goharian, 2013]
       • Positive: "this prozac is hitting me"
       • Negative: "prozac is the best drug"
     • It has been shown that social media users report the negative impact of drugs 11 months earlier than regular patients [Duh et al., 2016]
  2. Challenges in the ADR Classification Task
     • Highly imbalanced class distributions: on average, less than 9% of documents are positive
     • Highly sparse language: there are virtually unlimited inventive ways of using drug names in language
     • User-generated data is noisy: it is informal, lacks capitalization and punctuation, and typos are common
     • On average, the F1 measure of state-of-the-art models is about 0.65
  3. Related Studies
     • The ADR classification task has a long history
       • Leaman et al. published the first work in 2010: they crawled drug discussion forums and used a sliding window to detect positive reports
     • Current state-of-the-art models rely on pretrained transformers
       • In the SMM4H 2019 shared task, Chen et al. used pretrained BERT and were ranked 1st (F1 = 0.645)
       • In the SMM4H 2020 shared task, Wang et al. used pretrained RoBERTa and were ranked 1st (F1 = 0.640)
       • In the SMM4H 2021 shared task, Ramesh et al. used pretrained RoBERTa and were ranked 1st (F1 = 0.610)
  4. View Distillation with Unlabeled Data
     Our model consists of four steps (a minimal end-to-end sketch follows this list):
     1) Extracting two views from documents
     2) Training one classifier on the data in each view and generating pseudo-labels
     3) Using the pseudo-labels in each view to initialize a classifier in the other view
     4) Further training the classifier in each view using labeled data
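To make the control flow of the four steps concrete, here is a minimal, self-contained stand-in for the whole procedure. It substitutes TF-IDF features and linear SGD classifiers for the BERT-based classifiers, and the toy documents, drug list, and keyword_window helper are illustrative assumptions rather than the authors' setup.

```python
# Minimal stand-in for the four-step view-distillation loop.
# TF-IDF + SGD classifiers replace the BERT-based classifiers; documents,
# drug names, and hyperparameters are illustrative only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

labeled_docs = ["this prozac is hitting me", "prozac is the best drug"]
labels = np.array([1, 0])                    # 1 = ADR report, 0 = no ADR
unlabeled_docs = ["seroquel made me so dizzy", "just refilled my seroquel"]

def keyword_window(doc, size=2):
    """Hypothetical keyword view: a small window around a drug mention."""
    tokens = doc.split()
    hits = [i for i, t in enumerate(tokens) if t in {"prozac", "seroquel"}]
    i = hits[0] if hits else 0
    return " ".join(tokens[max(0, i - size):i + size + 1])

# 1) Extract the two views from every document.
views = {
    "document": (labeled_docs, unlabeled_docs),
    "keyword": ([keyword_window(d) for d in labeled_docs],
                [keyword_window(d) for d in unlabeled_docs]),
}

# 2) Train one classifier per view and pseudo-label the unlabeled documents.
vectorizers, pseudo = {}, {}
for name, (lab, unlab) in views.items():
    vec = TfidfVectorizer().fit(lab + unlab)
    clf = SGDClassifier().fit(vec.transform(lab), labels)
    vectorizers[name] = vec
    pseudo[name] = clf.predict(vec.transform(unlab))

# 3) Pretrain a fresh classifier in each view on the *other* view's
#    pseudo-labels, then 4) further train it on the labeled documents.
classifiers = {}
for name, (lab, unlab) in views.items():
    other = "keyword" if name == "document" else "document"
    vec = vectorizers[name]
    clf = SGDClassifier()
    clf.partial_fit(vec.transform(unlab), pseudo[other], classes=[0, 1])  # step 3
    clf.partial_fit(vec.transform(lab), labels)                           # step 4
    classifiers[name] = clf
```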
  5. Extracting Two Views and Generating Pseudo-Labels
     1) We use the document and keyword representations as two views (a code sketch of the view extraction follows this slide)
        • The views are not conditionally independent, but represent different aspects [Balcan et al., 2004]
     2) Each classifier labels the representation in one view, but the labels can be assigned to documents
        • The labels in one view can be transferred to the other view
        • Using the two classifiers, we label a large set of unlabeled documents
     [Figure: a tweet such as "[CLS] this prozac is hitting me soooo hard [SEP]" is encoded by BERT; its document representation feeds the document classifier and its keyword (drug) representation feeds the keyword classifier, and the resulting labels are assigned to the document]
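A minimal sketch of extracting the two views from one BERT pass, assuming the Hugging Face transformers library: the [CLS] vector serves as the document view and the drug-mention token vector as the keyword view. The generic bert-base-uncased checkpoint and the exact construction of the keyword view are assumptions; the authors use their own domain-pretrained BERT.

```python
# Sketch of extracting the document view and the keyword view from BERT.
# Checkpoint and keyword-view construction are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

tweet = "this prozac is hitting me soooo hard"
inputs = tokenizer(tweet, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state        # (1, seq_len, 768)

# Locate the first subword of the drug mention ("prozac" may be split by WordPiece).
drug_ids = tokenizer("prozac", add_special_tokens=False)["input_ids"]
ids = inputs["input_ids"][0].tolist()
drug_position = ids.index(drug_ids[0])

doc_view = hidden[:, 0, :]                # [CLS] vector = document view
kw_view = hidden[:, drug_position, :]     # drug-token vector = keyword view
```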
  6. Cross-View Pretraining with Pseudo-Labels
     3) We use the pseudo-labels in each view to pretrain a classifier in the other view (see the sketch after this slide)
     [Figure: the drug classifier (C_g) and the document classifier (C_d) are trained on the labeled documents and then label the unlabeled documents; each set of pseudo-labels is distilled into the classifier of the other view, yielding the resulting classifiers Ĉ_g and Ĉ_d]
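A sketch of the cross-view pretraining step, assuming simple linear heads over fixed [CLS]-style features. The head architecture, the random stand-in features, and the hyperparameters are assumptions made only to show how pseudo-labels from the keyword (drug) view pretrain the document-view classifier; the symmetric direction works the same way.

```python
# Cross-view pretraining: the document-view head is trained on pseudo-labels
# produced by the keyword-view classifier. Features and labels below are
# random stand-ins for the unlabeled-corpus representations.
import torch
import torch.nn as nn

hidden = 768                                    # BERT hidden size
doc_head = nn.Linear(hidden, 2)                 # document-view classifier head
optimizer = torch.optim.Adam(doc_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

doc_view_unlabeled = torch.randn(1000, hidden)     # document-view vectors (stand-in)
kw_pseudo_labels = torch.randint(0, 2, (1000,))    # pseudo-labels from the keyword view

for epoch in range(3):
    optimizer.zero_grad()
    loss = loss_fn(doc_head(doc_view_unlabeled), kw_pseudo_labels)
    loss.backward()
    optimizer.step()
```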
  7. Finetuning with Labeled Data
     4) Further training the classifier in each view using the initial labeled documents
     • To label unseen documents, the two classifiers are aggregated (a sketch of the finetuning and aggregation follows this slide)
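A sketch of the final step, continuing from the heads above: each pretrained head is further trained on the labeled documents, and the predictions of the two views are combined. Averaging the softmax probabilities is an assumption; the slide only states that the two classifiers are aggregated.

```python
# Finetuning on labeled data, then aggregating the two view classifiers.
# Probability averaging is an assumed aggregation strategy.
import torch
import torch.nn.functional as F

def finetune(head, features, labels, epochs=3, lr=1e-4):
    """Further train a pretrained classifier head on the labeled documents."""
    optimizer = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = F.cross_entropy(head(features), labels)
        loss.backward()
        optimizer.step()
    return head

def predict(doc_head, kw_head, doc_features, kw_features):
    """Label unseen documents by averaging the two views' class probabilities."""
    with torch.no_grad():
        probs = (F.softmax(doc_head(doc_features), dim=-1)
                 + F.softmax(kw_head(kw_features), dim=-1)) / 2
    return probs.argmax(dim=-1)        # 1 = ADR report, 0 = no ADR
```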
  8. Experiments
     • We used the SMM4H ADR dataset to evaluate our model; it is the largest benchmark on this topic
       • It consists of 30,174 tweets, of which 25,616 are in the training set (with a 9.2% positive rate)
       • Evaluation is done via a CodaLab webpage
     • We used BERT pretrained on 800K unlabeled drug-related tweets as the feature extractor, and used the same tweet set to generate pseudo-labels
     • We compare with two sets of baseline models:
       1. Models that we implemented with our own pretrained BERT model
       2. Models available on the CodaLab webpage
  9. Results and Analysis
     [Tables: main result; comparison with single-view algorithms; comparison with other variations of pretraining/finetuning]