Clustering and Comparing Information Extracted from Personal Health Messages

Yunliang Jiang, PhD
Jan. 29, 2015

• Online communities in the healthcare domain: how does healthcare information spread on the web?
• Challenges
  – Noisy
  – Unstructured
  – Lack of connection with clinical practice
• Goal: effectively analyze and reorganize the rich source of personal health messages by applying advanced data mining, IR, and NLP

Clustering and comparing information extracted from personal health messages
• Data sources: Yahoo! Groups (P1, P2), PubMed Health (P2), MedHelp (P3), Yahoo! Answers, Twitter, voice messages, ……
• Information units: whole text (P1), drug (P2), disease (P3), treatment (P3), symptom, diagnoses, …
• Methods: classification (P1), topic model (P2), clustering (P2), sentiment analysis (P2, P3), statistical testing (P3), opinion mining (P3), hypothesis prediction (P3), time-series analysis, social network analysis
• Projects:
  – P1: Personal message classification
  – P2: Patient drug outcomes integration
  – P3: CER hypothesis prediction
  – ……
• Future directions

P1: Multi-class classification in online personal health messages (MedEx 2011, workshop of CIKM 2011)
• Task
  – Input: message collection
  – Output: three categories: News (N), User comments (C), and Spam (S)
• Approach
  – Multi-class SVM classification
  – Features: term-appearance, lexical, semantic
• Result: [results figure not recovered]
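As a rough illustration of the P1 approach, the sketch below builds binary term-appearance features and trains one linear scorer per category, one-vs-rest, by subgradient steps on the hinge loss (the loss an SVM minimizes). It is a simplified stand-in and not the classifier from the MedEx paper: there is no regularization term, lexical and semantic features are left out, and the messages in `TRAIN` and all function names are made up for the example.

```python
LABELS = ["N", "C", "S"]  # News, User comments, Spam

# Made-up training messages, for illustration only.
TRAIN = [
    ("study reports new drug trial results", "N"),
    ("agency reports approval of new treatment", "N"),
    ("i took this drug and felt better", "C"),
    ("i had side effects after treatment", "C"),
    ("buy cheap pills online now", "S"),
    ("buy discount pills with free shipping", "S"),
]


def tokenize(text):
    return text.lower().split()


VOCAB = sorted({tok for text, _ in TRAIN for tok in tokenize(text)})


def featurize(text):
    """Binary term-appearance vector followed by a constant bias feature."""
    present = set(tokenize(text))
    return [1.0 if term in present else 0.0 for term in VOCAB] + [1.0]


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_one_vs_rest(examples, max_epochs=200):
    """Train one hinge-loss linear scorer per label (no regularization)."""
    data = [(featurize(text), label) for text, label in examples]
    weights = {}
    for label in LABELS:
        w = [0.0] * (len(VOCAB) + 1)
        for _ in range(max_epochs):
            updated = False
            for x, target in data:
                sign = 1.0 if target == label else -1.0
                if sign * score(w, x) < 1.0:
                    # Hinge loss is active: take a unit subgradient step.
                    w = [wi + sign * xi for wi, xi in zip(w, x)]
                    updated = True
            if not updated:  # every example has margin >= 1
                break
        weights[label] = w
    return weights


def predict(text, weights):
    """Pick the label whose scorer gives the highest score."""
    x = featurize(text)
    return max(LABELS, key=lambda label: score(weights[label], x))


WEIGHTS = train_one_vs_rest(TRAIN)
print(predict("i felt better after the new treatment", WEIGHTS))
```

On real message collections a regularized SVM from an established library would be the more sensible choice; the loop above only shows the mechanics on a toy scale.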

P2: Designing and evaluating a clustering system for integrating patient drug outcomes (AMIA 2012)
[System diagram: Yahoo! messages organized by drug → classification (news and spam eliminated; user comments are the target) → refinement and extraction into comment units (D) → outcome integration and separation → integrated results. Other labeled components: expert comments (E), interactive system interface, audiences (request, search, learn and annotation), judges (make gold standard, insert).]

P2 (cont.)
• High performance in clustering patient outcomes using the semi-PLSA model
• Effectively separates similar and opposite opinions
• Finds “unknown outcomes” (e.g., Clonazepam may help to relieve burning mouth syndrome)
• A comprehensive evaluation framework with medical students

P3: Comparative Effectiveness Research (CER) hypothesis prediction
• Opinions on the effectiveness of different treatments are expressed directly by patients (patient opinion)
• Personal messages: easy to obtain, scalable
• We can test CER hypotheses such as “Chemo is more effective than Hormonal therapy to treat breast cancer”
  – verify existing CER conclusions
  – predict unknown CER hypotheses

P3 (cont.)
• Source: MedHelp
  – Breast cancer: 70K messages, 18K patients
    • Chemo, radiation, hormonal therapy
  – Depression: 186K messages, 38K patients
    • Meditation therapy (yoga) vs. drug treatments (SSRI, SNRI, TCA)
• Approach:
  – Preference and sentiment analysis
  – Train the comparison model and the sentiment model
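One way to turn per-patient sentiment labels into a test of a CER hypothesis of the form "A is more effective than B" is a one-sided two-proportion z-test. This is an assumption about how a statistical testing step could look, not the method from the talk: the opinion records and function names below are invented, and the sentiment labels are taken as given rather than produced by a trained model.

```python
import math


def count_opinions(records, treatment):
    """Return (positive, total) opinion counts for one treatment."""
    labels = [positive for name, positive in records if name == treatment]
    return sum(labels), len(labels)


def two_proportion_z_test(pos_a, n_a, pos_b, n_b):
    """One-sided test of H1: positive rate of A > positive rate of B."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical per-patient opinions: (treatment, opinion is positive).
records = ([("chemo", True)] * 60 + [("chemo", False)] * 40
           + [("hormonal", True)] * 40 + [("hormonal", False)] * 60)

z, p_value = two_proportion_z_test(*count_opinions(records, "chemo"),
                                   *count_opinions(records, "hormonal"))
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```

A small p-value here would only say that positive opinions are more frequent for one treatment in the messages; it does not by itself establish clinical effectiveness.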

Medical Validation
• Compared with real clinical trials
• “Chemo is more effective for the younger group (age < 50) than for the older group (age > 50)” [Lancet]
• “8 weeks of mindfulness meditation training was just as good as prolonged antidepressant treatment (SSRI, SNRI) over 18 months” [Archives of General Psychiatry – American Medical Association]
• All consistent with our discoveries

Machine learning is everywhere!
• ‘Who to follow’ recommendation
• User interest (TV/sports) detection and prediction
• Request-pro matching
• Relevant project recommendation

