I presented a talk on a research paper published by a research group at Arizona State University for Bio-Medical NLP, an elective I chose during my master's.
Karen O’Connor, Pranoti Pimpalkhute, Azadeh Nikfarjam, MS, Rachel Ginn, Karen L Smith and Graciela Gonzalez Presenter: Mahak Gupta Date: 26th Jan 2016 16.02.16 1
Pharmacovigilance
▪ Drug development life cycle
▪ Twitter as a data source for post-marketing surveillance
▪ Case study: how Twitter data analytics compares with data analytics on public forums for public health research
▪ Exciting session for the next 45-50 minutes :)

Introduction – Some Important Terms
therapeutic action.
- No drug produces only one effect. All unwanted effects are called side effects.
- Maximize the therapeutic effect and minimize side/adverse effects.
▪ Adverse Drug Reaction (WHO, 1972): “A response to a drug which is noxious and unintended, and which occurs at doses normally used in man for the prophylaxis, diagnosis, or therapy of disease, or for the modification of physiological function.”
e.g. Narcotic analgesics -> Constipation; Antihistamines -> Sedation

Introduction – Some Important Terms
and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem.”
▪ Why is it needed?
- Collect drug usage data from the larger public (post-market)
- Identify potential adverse reactions in the larger population
- Assess the product's safety, efficacy, or optimal use (post-market)
- Greater responsiveness and ability to fix a problem

Media: a very popular platform for sharing personal health-related information (health forums like patientslikeme.com, Twitter, Facebook, Daily Strength)
▪ Spontaneous Reporting Systems:
- US FDA’s MedWatch program
- UK MHRA’s Yellow Card Scheme
- EU EudraVigilance
▪ Cohort Event Monitoring

and opinions on various topics.
▪ 140-character limit, so a single message has very little informational value
- Aggregating a high volume of messages can generate important knowledge.
▪ Has broad implications for public health research.
▪ Advantages: geographical data view, past-history data view (post-market, of course), identifying KOLs (key opinion leaders)
▪ Publicly available and accessible through an API, but access to large volumes of data is restricted.

Some Examples from Twitter (Labelled)
pulmonary arterial hypertension [Indication]. http://dlvr.it/CG9Nny #PH
▪ Anaphylactic reaction [ADR] to Tylenol [drug], NSAIDs with 20 other things while I can't give her something for her fever [ADR] and make her heart stop [ADR] racing

has been increasing over the past years.
▪ Morbidity (disease) and mortality (death) rates associated with ADRs are considerable.
▪ An effective post-market drug study scheme is needed to:
- Extract syndrome information (geography-based risk factors)
- Extract information about symptoms and medications

of a drug with 1-edit distance using the CMU pronunciation dictionary.
Example: "prozac": "proxac", "prozacc", "prozaq", etc.
A list of 74 drugs is used to create the corpus: http://diego.asu.edu/downloads/publications/ADRMine/drug_names.txt
Initially, 187,000 tweets are found that mention one of the target drugs. After noise removal, 10,822 tweets remain.
Is an ADR present?
More sophisticated tweet annotation

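The misspelling generation described on this slide can be sketched roughly as follows. This is a minimal Python sketch, assuming plain character-level edits (deletion, substitution, insertion) over lowercase ASCII letters. The phonetic filtering against the CMU pronunciation dictionary mentioned on the slide is not reproduced here, and the function name `one_edit_variants` is hypothetical.

```python
import string


def one_edit_variants(name, alphabet=string.ascii_lowercase):
    """Return all strings at edit distance 1 from `name` (hypothetical helper).

    Assumption: an "edit" is one deletion, substitution, or insertion of a
    lowercase ASCII letter. No phonetic filtering is applied.
    """
    name = name.lower()
    # Every way of cutting the name into a left part and a right part.
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutions = {
        left + ch + right[1:] for left, right in splits if right for ch in alphabet
    }
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    variants = deletes | substitutions | inserts
    variants.discard(name)  # substituting a letter with itself gives the original
    return variants


variants = one_edit_variants("prozac")
print("proxac" in variants, "prozacc" in variants, "prozaq" in variants)
```

Each generated variant could then be used as an additional search keyword for the drug, which is how the slide's examples ("proxac", "prozacc", "prozaq") would be reached.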
1. malignancies in NHL patients receiving Rituximab
   ADR: Malignancies (C0006826)
2. Morphine makes me cry most of the time, blocks most of my pain
   ADR: crying (C00103)
   Indication: Emotional Change (C0001726)
3. ah yes, I’m starting to think my paroxetine turns panic attacks into fat.
   Indication: Panic attack (C0086769)
   ADR: Weight gain (C0043094)
Red color text: Drug name

Semantic type
  * ADR
  * Indication
  * Drug interaction (ROA)
  * Beneficial effect
- Drug name
- UMLS/Concept IDs
▪ Data is collected from SIDER, a subset of CHV (Consumer Health Vocabulary), and COSTART: http://diego.asu.edu/downloads/publications/ADRMine/ADR_lexicon.tsv

for indexing and retrieval of lexicon in tweets.
▪ Every lexicon entry is lemmatized and the stop words are removed before indexing.
▪ To identify the ADR concepts in a post, a Lucene search is made on the preprocessed tokens of the tweet
- Pattern matching for concept identification.

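The index-then-search idea on this slide can be illustrated with a small stand-in. This is only a sketch in plain Python, not Lucene: it assumes a tiny illustrative stop word list, skips lemmatization entirely, and uses hypothetical helper names (`preprocess`, `build_index`, `candidate_concepts`). The concept IDs are the ones shown in the annotation examples earlier in the talk.

```python
from collections import defaultdict

# Illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "and", "my", "is"}


def preprocess(text):
    """Lowercase, strip surrounding punctuation, drop stop words.

    Lemmatization is omitted in this sketch.
    """
    tokens = [t.strip(".,!?:;#\"'()").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_index(lexicon):
    """Map each token to the set of concept IDs whose phrase contains it."""
    index = defaultdict(set)
    for concept_id, phrase in lexicon.items():
        for token in preprocess(phrase):
            index[token].add(concept_id)
    return index


def candidate_concepts(tweet, index, lexicon):
    """Return concept IDs whose every phrase token occurs in the tweet."""
    tokens = set(preprocess(tweet))
    candidates = set()
    for token in tokens:
        candidates |= index.get(token, set())
    return {c for c in candidates if set(preprocess(lexicon[c])) <= tokens}


lexicon = {"C0043094": "weight gain", "C0086769": "panic attack"}
index = build_index(lexicon)
print(candidate_concepts("Paroxetine and the weight gain is real!", index, lexicon))
```

Exact token matching like this misses misspellings and paraphrases, which is why an approximate matching step is still needed.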
word removal, stemming.
▪ Compare lexicon with comment entries
- Sliding window of 5 tokens
- Word-order insensitive (bag of words)
- Misspellings handled with Jaro-Winkler string similarity
▪ An ADR is potentially present if the similarity score is greater than a configurable threshold.

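The matching steps on this slide can be sketched as below. This is a minimal Python sketch under stated assumptions: the window score is taken as the mean of each lexicon token's best Jaro-Winkler similarity within the window (the slide does not say how the score is aggregated), the default threshold of 0.9 is arbitrary, and `find_adr_mentions` is a hypothetical name. The Jaro-Winkler implementation follows the standard definition with a prefix scale of 0.1 and a maximum prefix length of 4.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    match_dist = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - match_dist), min(len2, i + match_dist + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def find_adr_mentions(tweet_tokens, lexicon, window=5, threshold=0.9):
    """Return lexicon concepts whose tokens all fuzzily match within one window.

    Word order inside the window is ignored (bag of words). The scoring rule
    (mean of best per-token similarities) is an assumption of this sketch.
    """
    hits = set()
    if not tweet_tokens:
        return hits
    for start in range(max(1, len(tweet_tokens) - window + 1)):
        span = tweet_tokens[start:start + window]
        for concept, entry_tokens in lexicon.items():
            if not entry_tokens:
                continue
            best = [max(jaro_winkler(t, w) for w in span) for t in entry_tokens]
            if sum(best) / len(best) > threshold:
                hits.add(concept)
    return hits


lexicon = {"weight gain": ["weight", "gain"], "insomnia": ["insomnia"]}
tweet = "paroxetine made me gain wieght fast".split()
print(find_adr_mentions(tweet, lexicon))
```

In this usage example the misspelled, reordered phrase "gain wieght" still matches the lexicon entry "weight gain", while "insomnia" does not match any token closely enough.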
Examples
will choke on my own vomit during sleep. I blame #Olanzapine #timetochange #bipolar”
Concept Extraction:
- Over-eaten = increased appetite: ADR
- Bipolar = Bipolar disorder: ADR
- Olanzapine: Drug
▪ “Rules of Prozac: 1: You can never sleep, ever again. NEVER EVER 2: No you may NOT switch your brain off. Ever. 3: Exhaustion is your friend.”
Concept Extraction:
- “Never sleep” = insomnia
- “not switch your brain off” = racing thoughts
- “exhaustion” = exhaustion: adverse effect
- Prozac: Drug

false positive (FP) and 50 false negative (FN) results (machine analysis vs. manual annotation)
▪ Manual analysis of individual groups
▪ FP group: 3 categories
- Extraction of indications as ADRs
  “Citrazine doesn’t take away my sneezing, But still makes me sleep”
- Extraction of terms from the tweets that were in the lexicon but were not being used to discuss an ADR
  “@TScpCancer You are sick of meds #crazyyou”
- Extraction of ADR mentions that were not experienced directly by the user
  “I’m too numb to feel, blow out the candle, blindness #np #cymbalta”

ADR was expressed using colloquial or descriptive terminology
  “It feels like I’m having a bath in my room” -> excessive sweat
- ADR mention was expressed in a form similar to, but not directly matching, the lexicon entry
  “This makes my stomach feel like completely filled with air”

FP and 29 FN
▪ Three main categories:
- Novel adverse reaction phrases: “liver problem”, “burn like a lobster”, “TURNED ME INTO THE SPAWN OF SATAN!!!”
- Poor approximate string matching
- Ambiguous terms: “brain fog” could refer to mental dullness or somnolence; “numb” may refer to loss of sensation or emotional indifference

(character restriction, condensed user sentiments, lots of abbreviations).
  “#Tracleer #sleep #NeverWokeUp”
▪ Very few tweets contain both a drug name and an ADR.
  “Tracleer is ****ing good. It surprises me every time I take it :)”
▪ Lots of spam/noise
  “Sildenafil viagra gets you best. Order online here”
▪ Restricted access (query limit)

▪ Drug development lifecycle. What is pharmacovigilance?
▪ A lot of people do tweet about their experiences of adverse effects from medications, and yes, we can mine these tweets as a knowledge source.
▪ The lexicon-based approach to ADR extraction from tweets is only a moderate success, mainly because of the Twitter data format, i.e. the 140-character limit.
▪ There is further research on this topic that uses machine learning algorithms, as well as a case study comparing the machine learning approach with the lexicon-based approach.

Corpus and Classification Benchmark
▪ Pharmacovigilance on Twitter? Mining Tweets for Adverse Drug Reactions
▪ Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks
▪ WHO Website
▪ FDA Website