Hakka Labs
February 13, 2015

Clustering and Comparing Information Extracted from Personal Health Messages

Transcript

  1. Online communities in the healthcare domain
    • Challenges: noisy, unstructured, lacking a connection with clinical practice
    • How does healthcare information spread on the web?
    • Goal: effectively analyze and re-organize the rich source of personal health messages by applying advanced data mining, information retrieval (IR) and natural language processing (NLP)
  2. Clustering and comparing information extracted from personal health messages
    • Sources: Yahoo! Groups (P1, P2), PubMed Health (P2), MedHelp (P3), Yahoo! Answers, Twitter, voice messages …
    • Extracted information: whole text (P1), drug (P2), disease (P3), treatment (P3), symptoms, diagnoses …
    • Methods: classification (P1), topic model (P2), clustering (P2), sentiment analysis (P2, P3), statistical testing (P3), opinion mining (P3), hypothesis prediction (P3), time-series analysis, social network analysis
    • Projects
      – P1: personal message classification
      – P2: patient drug outcomes integration
      – P3: CER hypothesis prediction
    • Future directions
  3. P1: Multi-class classification in online personal health messages (MedEx 2011, workshop of CIKM 2011)
    • Task
      – Input: message collection
      – Output: three categories: news (N), user comments (C) and spam (S)
    • Approach
      – Multi-class SVM classification
      – Features: term appearance, lexical, semantic
    • Result:
  4. P2: Designing and evaluating a clustering system for integrating patient drug outcomes (AMIA 2012)
    [Diagram: Yahoo! messages organized by drug → classification (news and spam eliminated, user comments targeted) → comment units → refinement and extraction → outcome integration and separation → integrated results. An interactive system interface supports learning and annotation, requests and search by audiences, a gold standard made by judges, and insertion of expert comments.]
  5. P2 (cont.)
    • High performance in clustering patient outcomes using a semi-PLSA model
    • Effectively separates similar and opposite opinions
    • Finds “unknown outcomes” (e.g., clonazepam may help relieve burning mouth syndrome)
    • A comprehensive evaluation framework with medical students
  6. P3: Comparative effectiveness research (CER) hypothesis prediction
    • Opinions on the effectiveness of different treatments are expressed directly by patients (patient opinion)
    • Personal messages: easy to obtain, scalable
    • We can test CER hypotheses such as “Chemo is more effective than hormonal therapy to treat breast cancer”
      – confirm existing CER conclusions
      – predict unknown CER hypotheses
  7. P3 (cont.)
    • Source: MedHelp
      – Breast cancer: 70K messages, 18K patients; chemo, radiation, hormonal therapy
      – Depression: 186K messages, 38K patients; meditation therapy (yoga) vs. drug treatments (SSRI, SNRI, TCA)
    • Approach
      – Preference and sentiment analysis
      – Train the comparison model and the sentiment model
  8. Medical validation
    • Compared with real clinical trials:
      – “Chemo is more effective in the younger group (age < 50) than in the older group (age > 50)” [Lancet]
      – “8 weeks of mindfulness meditation training was just as good as prolonged antidepressant treatment (SSRI, SNRI) over 18 months” [Archives of General Psychiatry, American Medical Association]
    • All consistent with our discoveries
  9. Machine learning is everywhere!
    • ‘Who to follow’ recommendation
    • User interest (TV/sports) detection and prediction
    • Request–pro matching
    • Relevant project recommendation
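Slide 3 (P1) describes a multi-class SVM that sorts messages into news (N), user comments (C) and spam (S) using term-appearance, lexical and semantic features. The sketch below illustrates only the term-appearance features and the multi-class setup. It assumes a one-vs-rest decomposition and linear SVMs trained with Pegasos-style stochastic subgradient descent on the hinge loss, with no intercept. That training procedure is a stand-in chosen here, because the deck does not say how its SVM was trained. The function names and toy messages are hypothetical.

```python
import re
from collections import defaultdict

LABELS = ("N", "C", "S")  # news, user comment, spam


def term_features(text):
    """Binary term-appearance features: every distinct lowercase token maps to 1.0."""
    return {tok: 1.0 for tok in re.findall(r"[a-z0-9']+", text.lower())}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_one_vs_rest_svm(docs, labels, lam=0.01, epochs=20):
    """Hypothetical helper: one linear SVM per class (one-vs-rest), trained with
    Pegasos-style subgradient steps on the L2-regularised hinge loss (no intercept)."""
    feats = [term_features(d) for d in docs]
    models = {}
    for cls in LABELS:
        w = defaultdict(float)
        t = 0
        for _ in range(epochs):
            for x, label in zip(feats, labels):
                t += 1
                eta = 1.0 / (lam * t)          # Pegasos step size
                y = 1.0 if label == cls else -1.0
                margin = y * dot(w, x)         # margin under the current weights
                for f in list(w):              # shrink step from the L2 regulariser
                    w[f] *= 1.0 - eta * lam
                if margin < 1.0:               # hinge loss active: move towards y * x
                    for f, v in x.items():
                        w[f] += eta * y * v
        models[cls] = dict(w)
    return models


def predict(models, text):
    """Return the class whose one-vs-rest score is highest."""
    x = term_features(text)
    return max(LABELS, key=lambda cls: dot(models[cls], x))


# Hypothetical toy messages, not data from the deck.
TOY_DOCS = [
    "FDA approves new diabetes drug",
    "Trial results published in journal",
    "I felt dizzy after my first dose",
    "My doctor told me to try it",
    "Buy cheap pills online now",
    "Cheap pills, free shipping today",
]
TOY_LABELS = ["N", "N", "C", "C", "S", "S"]

if __name__ == "__main__":
    models = train_one_vs_rest_svm(TOY_DOCS, TOY_LABELS)
    print(predict(models, "cheap pills"))
```

One-vs-rest is simply the easiest way to reduce the multi-class case to binary SVMs. The lexical and semantic features listed on the slide would be added as extra keys in the feature dict.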
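Slide 5 (P2) credits a semi-PLSA model with clustering patient outcomes. The deck gives no model details, so the sketch below relies on one assumption about what "semi-supervised" can mean for PLSA. Each topic gets a seed-word prior, for instance a single known outcome word. Ordinary PLSA EM then runs with those priors added as `mu`-weighted pseudo-counts in the topic-word M-step. The function name, its parameters (`mu`, `iters`, `floor`), the seed words and the toy comment units are all hypothetical.

```python
from collections import Counter


def seeded_plsa(docs, seeds, mu=10.0, iters=30, floor=1e-9):
    """Hypothetical helper: cluster short, non-empty texts with PLSA whose topic-word
    M-step adds mu * seed-prior pseudo-counts.

    docs  -- list of strings (e.g. comment units)
    seeds -- one dict per topic mapping seed words to prior weights
    Returns (cluster index per doc, list of topic-word distributions).
    """
    bags = [Counter(d.lower().split()) for d in docs]
    vocab = sorted({w for bag in bags for w in bag})
    k_topics = len(seeds)

    def normalise(counts):
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    # Initialise each topic's word distribution from its seed prior.
    p_w_z = [normalise({w: 1.0 + mu * s.get(w, 0.0) for w in vocab}) for s in seeds]
    p_z_d = [[1.0 / k_topics] * k_topics for _ in bags]

    for _ in range(iters):
        # Topic-word counts start from the seed pseudo-counts plus a tiny floor
        # (the floor keeps every P(w | z) strictly positive).
        counts = [{w: floor + mu * s.get(w, 0.0) for w in vocab} for s in seeds]
        new_p_z_d = []
        for d, bag in enumerate(bags):
            mass = [0.0] * k_topics
            for w, n in bag.items():
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z)
                joint = [p_z_d[d][k] * p_w_z[k][w] for k in range(k_topics)]
                norm = sum(joint)
                for k in range(k_topics):
                    share = n * joint[k] / norm
                    mass[k] += share
                    counts[k][w] += share
            # M-step for P(z | d): expected topic counts divided by document length
            length = sum(bag.values())
            new_p_z_d.append([m / length for m in mass])
        # M-step for P(w | z): expected counts plus seed pseudo-counts, normalised
        p_w_z = [normalise(c) for c in counts]
        p_z_d = new_p_z_d

    clusters = [max(range(k_topics), key=lambda k: row[k]) for row in p_z_d]
    return clusters, p_w_z


if __name__ == "__main__":
    units = ["dizzy headache", "dizzy nausea", "sleep calm", "sleep rested"]
    clusters, topics = seeded_plsa(units, [{"dizzy": 1.0}, {"sleep": 1.0}])
    print(clusters)
```

On these toy units the clusters come out as `[0, 0, 1, 1]`. Non-seed words that gather weight in a seeded topic, such as "headache" here, are the kind of signal that could surface an outcome nobody listed in advance.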
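Slides 2 and 7 list statistical testing alongside preference and sentiment analysis for P3, but the deck does not name a test. As an illustration only, assume the sentiment model has already labelled each patient's opinion on a treatment as positive or negative. A one-sided two-proportion z-test could then check a CER hypothesis of the form "A is more effective than B". The function names, the `alpha = 0.05` threshold and the counts below are hypothetical, not results from the deck.

```python
import math


def two_proportion_z_test(pos_a, n_a, pos_b, n_b):
    """One-sided test of H1: P(positive | A) > P(positive | B).

    pos_a, n_a -- positive-sentiment patients and total patients for treatment A
    pos_b, n_b -- the same for treatment B
    Returns (z statistic, one-sided p-value).
    """
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("all opinions are identical; the test is undefined")
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal
    return z, p_value


def cer_hypothesis_supported(pos_a, n_a, pos_b, n_b, alpha=0.05):
    """Hypothetical decision rule: accept "A is more effective than B" if p < alpha."""
    _, p = two_proportion_z_test(pos_a, n_a, pos_b, n_b)
    return p < alpha


if __name__ == "__main__":
    print(two_proportion_z_test(60, 100, 40, 100))
```

For example, with hypothetical counts of 60 positive out of 100 patients for A and 40 out of 100 for B, the test gives z ≈ 2.83 and a one-sided p ≈ 0.0023. A real analysis would also need to handle patients who mention both treatments and correct for testing many hypotheses.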