
PhD Final Seminar: "Semantic Search as Inference"

Bevan Koopman
December 13, 2013

Slides from the final seminar of PhD thesis entitled "Semantic Search as Inference: Applications in Health Informatics"


Transcript

  1. Semantic Search as Inference: Applications in Health Informatics. Bevan Koopman,
    School of Information Systems, QUT; Australian e-Health Research Centre, CSIRO.
  2-8. Unifying Hypothesis: a unified theoretical model of semantic search as inference,
    the integration of structured domain knowledge (ontologies) with statistical, information
    retrieval methods, provides the necessary mechanism for inference for effective semantic
    search of medical data. Semantics: models that elicit the meaning behind the words found
    in documents and queries. Inference: leveraging semantics and domain knowledge to infer
    relevant information.
  9. Vocabulary Mismatch • Concepts expressed in different ways, yet with similar meaning
    • hypertension = high blood pressure • Exacerbated in the medical domain: medications and
    abbreviations • Associational and deductive inference required to overcome it
  10. Granularity Mismatch • Queries use general terms while relevant documents contain
    specific terms • Antipsychotics vs. Diazepam • Deductive inference required
  11. Conceptual Implication • Terms within the document may logically imply the query terms
    • Dialysis Machine → Kidney Disease • Prevalent in the medical domain: Treatment → Disease,
    Organism → Disease • Deductive inference required
  12. Inferences of Similarity • Strong semantic associations, e.g. comorbidities • Innate
    dependence between medical concepts • Not modelled in ontologies • Association inference
    required
  13. The Semantic Gap • Vocabulary mismatch • Granularity mismatch • Conceptual implication
    • Inferences of similarity • Context-specific semantic gap issues
  14. Symbolic Representations and Ontologies vs. Information Retrieval: often seen as
    competing paradigms, but they should be seen as complementary; they "attack cognitive
    problems on different levels" [Gardenfors, 1997]
  15. Symbolic Representations and Ontologies • Formal semantics: well-defined syntactic
    structures with definite semantic interpretations • Inference is typically based on
    first-order logic and is therefore deductive • Often realised as ontologies
  16. SNOMED CT • Large medical ontology (~300,000 concepts, ~1 million relationships)
  17. Ontologies for Semantic Search • Advantages: deductive inference and reasoning;
    explicit background knowledge; standardisation and interoperability • Limitations:
    reliance on deductive reasoning; no natural measure of semantic similarity; dealing with
    uncertainty and inconsistency; context insensitive; coverage; dealing with natural language
  18. IR for Semantic Search • Advantages: inference with uncertainty; context specific;
    generally applicable; support for natural language • Limitations: no support for deductive
    reasoning; no explicit background knowledge; dependence on terms
  19. Retrieval Models • Probabilistic language models [Ponte & Croft, 1998; Hiemstra, 1998;
    Zhai, 2007]: P(Q|D) = \prod_{q \in Q} P(q|D), with Dirichlet smoothing
    P(q|D) = \frac{tf_{q,D} + \mu \frac{cf_q}{|C|}}{\mu + |D|} • Graph-based retrieval models
    [Turtle & Croft, 1991; Blanco & Lioma, 2012]: capture dependence between terms and
    documents; can utilise graph-based algorithms, e.g. PageRank; graphs also underlie most
    ontologies
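A minimal Python sketch (not the thesis code) of the Dirichlet-smoothed query-likelihood model above; the data layout (a Counter of collection term frequencies plus the total collection length) and the default mu are illustrative assumptions.

    import math
    from collections import Counter

    def lm_dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2500):
        # log P(Q|D) = sum_q log( (tf_{q,D} + mu * cf_q / |C|) / (mu + |D|) )
        doc_tf = Counter(doc_terms)
        score = 0.0
        for q in query_terms:
            p_collection = collection_tf.get(q, 0) / collection_len   # cf_q / |C|
            p_q_given_d = (doc_tf[q] + mu * p_collection) / (mu + len(doc_terms))
            if p_q_given_d > 0:                                       # skip terms unseen anywhere
                score += math.log(p_q_given_d)
        return score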
  20-22. Roadmap • Bag-of-concepts Model: vocabulary mismatch • Graph-based Concept Weighting
    Model: vocabulary mismatch, inference of similarity • Graph INference Model (GIN):
    vocabulary mismatch, granularity mismatch, conceptual implication, inferences of similarity
  23. BoC Overview • Represent query & documents as SNOMED CT concepts • The concept-based
    representation differs from terms both semantically and statistically • Tackles the issue
    of semantics mainly by addressing vocabulary mismatch
  24-30. Convert Terms to Concepts • Term encapsulation: "metastatic breast cancer" maps to
    the single concept 60278488 rather than the separate terms "metastatic", "breast", "cancer"
    • Conflating term variants: "human immunodeficiency virus", "T-lymphotropic virus", "HIV",
    "AIDS" all map to concept 86406008 • Concept expansion: "esophageal reflux" expands to
    235595009 Gastroesophageal reflux, 196600005 Acid reflux or oesophagitis, 47268002 Reflux,
    249496004 Esophageal reflux finding
  31. BoC Implementation • Convert documents and queries to concepts using MetaMap
    [Aronson and Lang, 2010] • Concept-based representation • Two concept-based retrieval
    models: a Concept Language Model,
    P(Q_c|D_c) \propto \sum_{q_c \in Q_c} \log\left(\frac{cf_{q_c,D_c} + \mu \frac{cf_{q_c}}{|C_c|}}{\mu + |D_c|}\right),
    and Lemur's TF-IDF model with BM25 weighting,
    RSV(D_c, Q_c) = \sum_{q_c \in Q_c} \frac{cf_{q_c,D_c}\,(k_1+1)}{cf_{q_c,D_c} + k_1\left(1 - b + b\,\frac{|D_c|}{|D_c^{avg}|}\right)} \log\frac{N}{n_{q_c}}
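A minimal sketch of BM25 scoring over documents represented as bags of SNOMED CT concept IDs, matching the weighting above; the function name, input layout and defaults (k1=1.2, b=0.75) are illustrative assumptions.

    import math
    from collections import Counter

    def bm25_concept_score(query_concepts, doc_concepts, doc_freq, num_docs,
                           avg_doc_len, k1=1.2, b=0.75):
        # RSV(Dc, Qc) = sum over query concepts of
        #   cf*(k1+1) / (cf + k1*(1 - b + b*|Dc|/|Dc_avg|)) * log(N / n_qc)
        cf = Counter(doc_concepts)            # concept frequencies in this document
        doc_len = len(doc_concepts)
        score = 0.0
        for qc in query_concepts:
            n_qc = doc_freq.get(qc, 0)        # number of documents containing qc
            if n_qc == 0 or cf[qc] == 0:
                continue
            norm = cf[qc] + k1 * (1 - b + b * doc_len / avg_doc_len)
            score += (cf[qc] * (k1 + 1)) / norm * math.log(num_docs / n_qc)
        return score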
  32. Empirical Evaluation • Long history of empirical evaluation in IR • TREC paradigm:
    queries, document corpus, relevance assessments • TREC Medical Records Track
  33-35. Evaluation Measures
    • precision = \frac{|D_{rel} \cap D_{ret}|}{|D_{ret}|}, e.g. 6/10 = 0.6000
    • recall = \frac{|D_{rel} \cap D_{ret}|}{|D_{rel}|}, e.g. 6/12 = 0.5000
    • Bpref counts only judged documents:
      bpref = \frac{1}{|R|} \sum_{r \in R} \left(1 - \frac{|\{n : n \in \bar{R} \wedge n < r\}|}{|R|}\right)
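A small sketch of the three measures as defined above; bpref follows the slide's formulation (penalty normalised by |R|) and assumes the judged non-relevant documents are supplied as a set.

    def precision_recall(retrieved, relevant):
        # precision = |Drel ∩ Dret| / |Dret|, recall = |Drel ∩ Dret| / |Drel|
        rel_ret = len(set(retrieved) & set(relevant))
        return rel_ret / len(retrieved), rel_ret / len(relevant)

    def bpref(ranking, relevant, judged_nonrelevant):
        # Only judged documents contribute: each relevant document r is credited
        # 1 - (number of judged non-relevant docs ranked above r) / |R|
        R = len(relevant)
        nonrel_above, total = 0, 0.0
        for doc in ranking:
            if doc in relevant:
                total += 1.0 - min(nonrel_above, R) / R
            elif doc in judged_nonrelevant:
                nonrel_above += 1
        return total / R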
  36. Results
    Model            Bpref           Prec@10
    Terms            0.3934          0.4753
    Bag-of-concepts  0.4513 (+15%)   0.5395 (+14%)
  37. Results (per-query bpref plots): terms vs. UMLS bag-of-concepts and terms vs. SNOMED
    bag-of-concepts; queries Q110, Q128, Q119, Q161, Q117 and Q133 are highlighted.
  38. Roadmap (revisited): Bag-of-concepts Model (vocabulary mismatch) covered; next, the
    Graph-based Concept Weighting Model (vocabulary mismatch, inference of similarity).
  39. Graph Weighting • Graphs capture the innate dependence between medical concepts
    • Apply PageRank for concept weighting • Example graph: concepts "dental", "pain",
    "lower", "procedures", "right" as nodes with PageRank scores S(v_right), S(v_pain),
    S(v_lower), S(v_procedures)
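To illustrate the weighting step, a plain power-iteration PageRank over a concept graph given as an adjacency list; this is a generic sketch rather than the thesis implementation, and the damping factor is the usual 0.85 default. Applied to the dental-pain example above, each concept node receives a score S(v) that is then used as its background importance.

    def pagerank(adjacency, damping=0.85, iterations=50):
        # adjacency: {concept: [neighbouring concept, ...]}
        nodes = list(adjacency)
        n = len(nodes)
        score = {v: 1.0 / n for v in nodes}
        for _ in range(iterations):
            new = {v: (1.0 - damping) / n for v in nodes}
            for v, out in adjacency.items():
                targets = out if out else nodes          # dangling nodes spread uniformly
                share = damping * score[v] / len(targets)
                for u in targets:
                    new[u] += share
            score = new
        return score                                     # background importance S(v)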
  40-44. Concept Importance Weighting • Adjust the original concept weight by the
    "background" importance of the concept in the medical domain:
    w(c, d_c) = idf(c) \times S(v_c) \times \log(|V_s(c)|) • The factors draw on the document
    (d_c), the corpus (idf(c)) and the domain (the graph-based score S(v_c) and the
    neighbourhood size |V_s(c)|)
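A one-line rendering of the weighting formula, assuming the idf, PageRank score and graph-neighbourhood size are precomputed; how the result is combined with the document-level term weight is not spelt out on the slide, so that step is left to the caller.

    import math

    def concept_importance(idf_c, S_vc, neighbourhood_size):
        # w(c, d_c) = idf(c) * S(v_c) * log(|V_s(c)|)
        return idf_c * S_vc * math.log(neighbourhood_size)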
  45. Retrieval Results
    Run                    Bpref           Prec@10
    terms-tfidf            0.4722          0.4882
    concepts-tfidf         0.4993          0.5176
    terms-graph            0.4393          0.4882
    concepts-graph         0.5050 (+15%)   0.5441 (+11%)
    concepts-graph-snomed  0.5245 (+19%)   0.5559 (+14%)
  46-47. Roadmap (revisited): Bag-of-concepts Model and Graph-based Concept Weighting Model
    covered; next, the Graph INference Model (GIN): vocabulary mismatch, granularity mismatch,
    conceptual implication, inferences of similarity.
  48-53. Retrieval as Inference • Models retrieval as non-classical implication d \to q and
    estimates its probability P(d \to q) [van Rijsbergen, 1986; Nie, 1989] • Logical
    Uncertainty Principle: if d \to q cannot be established directly, determine how much d
    must be transformed into a document d' for which d' \to q holds • The transformation is
    decomposed into a chain of intermediate steps:
    P(d \to q) \propto \Phi(d, d') \propto \bigotimes_i \Phi(d_{i-1}, d_i)
  54. Graph Inference Model Theory
    • P(d \to q) = \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \to u_q)
      \propto \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \mid d)\,\delta(u_d, u_q),
      where \bigoplus and \bigotimes are the aggregation operators over query and document IUs
    • Diffusion factor: \delta(u, u') = 1 if u = u'; \delta_0(u, u') if u R u'; otherwise
      \max_{u_i \in U : u R u_i} \delta(u, u_i) \otimes \delta(u_i, u')
    • Illustrated on a corpus graph with initial probabilities P(u_1|d_1), P(u_2|d_1),
      P(u_3|d_2), P(u_4|d_3), P(u_q|d_1), P(u_q|d_2) and diffusion factors \delta(u_1, u_q),
      \delta(u_2, u_1), \delta(u_3, u_q), \delta(u_4, u_q)
  55. Building Blocks • Information Units (IUs): basic term or concept, u \in U
    • Information Relationships: relationships between IUs, R \subseteq U \times U
    • Information Graph: G = \langle U, R \rangle • Queries & documents as sequences of IUs:
      q = \langle u_0, \ldots, u_m \rangle, d = \langle u_0, \ldots, u_n \rangle
  56-59. Corpus Graph • Attach documents to each information unit, e.g. u_2: {d_1, d_2},
    u_1: {d_1}, u_0: {d_1} • Assign initial probabilities, e.g. P(u_2|d_1), P(u_2|d_2),
    P(u_1|d_1), P(u_0|d_1) • Edges between units carry a diffusion factor
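A minimal in-memory sketch of the building blocks and corpus graph described above (information units, relationships, attached documents and initial probabilities P(u|d)); the class and field names are illustrative, not taken from the thesis.

    from dataclasses import dataclass, field

    @dataclass
    class InformationUnit:
        uid: str                                         # a term or SNOMED CT concept id
        docs: dict = field(default_factory=dict)         # doc_id -> initial probability P(u|d)
        neighbours: list = field(default_factory=list)   # (neighbour uid, relationship type)

    class CorpusGraph:
        # G = <U, R>, with documents attached to the information units
        def __init__(self):
            self.units = {}

        def unit(self, uid):
            return self.units.setdefault(uid, InformationUnit(uid))

        def attach_document(self, uid, doc_id, probability):
            self.unit(uid).docs[doc_id] = probability    # assign initial probability P(u|d)

        def add_relationship(self, uid_a, uid_b, rel_type):
            self.unit(uid_a).neighbours.append((uid_b, rel_type))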
  60. Diffusion Factor • A measure of the strength of association between information units
    • \delta(u, u') = 1 if u = u'; \delta_0(u, u') if u R u'; otherwise
      \max_{u_i \in U : u R u_i} \delta(u, u_i) \otimes \delta(u_i, u')
    • \delta_0(u, u') = \alpha\, sim(u, u') + (1 - \alpha)\, rel(u, u'), with 0 \le \alpha \le 1
    • Captures both semantic similarity and relationship type
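A recursive sketch of the diffusion factor, assuming the CorpusGraph structure above, a sim(u, u') similarity function and per-relationship-type weights rel_weight; the otimes operator is instantiated as a product, and a depth bound is added so the sketch terminates on cyclic graphs.

    def diffusion_factor(u, u_prime, graph, sim, rel_weight, alpha=0.5, max_depth=3):
        # delta(u, u') = 1 if u = u'; delta_0(u, u') if uRu';
        #                otherwise max over neighbours u_i of delta_0(u, u_i) * delta(u_i, u')
        if u == u_prime:
            return 1.0
        if max_depth == 0:
            return 0.0
        best = 0.0
        for neighbour, rel_type in graph.unit(u).neighbours:
            step = alpha * sim(u, neighbour) + (1 - alpha) * rel_weight[rel_type]   # delta_0
            if neighbour == u_prime:
                best = max(best, step)
            else:
                best = max(best, step * diffusion_factor(neighbour, u_prime, graph, sim,
                                                         rel_weight, alpha, max_depth - 1))
        return best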
  61-65. Retrieval Function • P(d \to q): probability of the implication between document and
    query • P(u_d \to u_q): probability of the implication between a single document IU and a
    single query IU • The strength of the implication is proportional to the diffusion factor,
    P(u_d \to u_q) \propto \delta(u_d, u_q), and including the initial probabilities gives
    P(u_d \to u_q) \propto P(u_d \mid d)\,\delta(u_d, u_q) • General retrieval function,
    evaluating each combination of query IU and document IU:
    P(d \to q) = \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \to u_q)
    \propto \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \mid d)\,\delta(u_d, u_q),
    where the aggregation operators can be instantiated as products or sums, e.g.
    RSV(d, q) = \prod_{u_q \in q} \prod_{u_d \in d} P(u_d \mid d)\,\delta(u_d, u_q)
  66-69. Worked Example • Query and documents: q = \langle u_q \rangle,
    d_1 = \langle u_1, u_2, u_q \rangle, d_2 = \langle u_3, u_q \rangle,
    d_3 = \langle u_4 \rangle • Inverted index: u_1: d_1; u_2: d_1; u_3: d_2; u_4: d_3;
    u_q: d_1, d_2, q • Corpus graph: nodes u_1 {d_1}, u_2 {d_1}, u_3 {d_2}, u_4 {d_3},
    u_q {d_1, d_2}, connected by relationships of types r_a, r_c, r_d • Annotated with
    initial probabilities P(u_1|d_1), P(u_2|d_1), P(u_3|d_2), P(u_4|d_3), P(u_q|d_1),
    P(u_q|d_2) and diffusion factors \delta(u_1, u_q), \delta(u_2, u_1), \delta(u_3, u_q),
    \delta(u_4, u_q)
  70-72. Worked Example (scoring) • Document d_1 receives the contributions P(u_q|d_1),
    P(u_1|d_1)\,\delta(u_1, u_q) and P(u_2|d_1)\,\delta(u_2, u_1)\,\delta(u_1, u_q)
    • Document d_2 receives P(u_q|d_2) and P(u_3|d_2)\,\delta(u_3, u_q) • Document d_3
    receives P(u_4|d_3)\,\delta(u_4, u_q) • Contributions are accumulated into the document
    scores (a numeric sketch with made-up values follows below)
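To make the worked example concrete, a small numeric sketch with made-up initial probabilities and diffusion factors (the slides give no numeric values); the aggregation is instantiated as a sum of the per-path products listed above.

    # Hypothetical values, purely for illustration.
    P = {("uq", "d1"): 0.4, ("u1", "d1"): 0.3, ("u2", "d1"): 0.3,
         ("uq", "d2"): 0.6, ("u3", "d2"): 0.4, ("u4", "d3"): 1.0}
    delta = {("u1", "uq"): 0.5, ("u2", "u1"): 0.4, ("u3", "uq"): 0.6, ("u4", "uq"): 0.2}

    score_d1 = (P[("uq", "d1")]
                + P[("u1", "d1")] * delta[("u1", "uq")]
                + P[("u2", "d1")] * delta[("u2", "u1")] * delta[("u1", "uq")])
    score_d2 = P[("uq", "d2")] + P[("u3", "d2")] * delta[("u3", "uq")]
    score_d3 = P[("u4", "d3")] * delta[("u4", "uq")]

    print(score_d1, score_d2, score_d3)   # 0.61, 0.84, 0.2 with these made-up numbers

Note that d_3 still receives a score even though it contains no query IU: its relevance is inferred entirely through the diffusion factor \delta(u_4, u_q).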
  73. Graph Inference Model Implementation • Algorithm 1: pseudo code for efficient
    depth-first-search graph traversal (a runnable rendering follows below)
    Input: Idx, Q, G, k (index, query, graph, maximum depth)
    Output: scores {d_0, ..., d_n} (document scores)
    for u_q in Q: DFS(u_q, 0)                            (start traversal from each query node)
    function DFS(u, depth):
      if depth <= k:
        for d_i in Idx.docs(u):                          (documents containing u)
          scores[d_i] = scores[d_i] + P(u | d_i) * df(u, u_q)   (score contribution)
        for u' in children(u):
          DFS(u', depth + 1)                             (recursively traverse)
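A runnable Python rendering of Algorithm 1, reusing the CorpusGraph sketch from earlier; the index layout {unit: {doc_id: P(u|d)}}, the diffusion callable and the visited set (added to avoid revisiting nodes over cyclic relationships) are assumptions beyond the pseudo code.

    from collections import defaultdict

    def gin_retrieve(index, query_units, graph, diffusion, max_depth):
        scores = defaultdict(float)

        def dfs(u, uq, depth, visited):
            if depth > max_depth or u in visited:
                return
            visited.add(u)
            for doc_id, p_u_given_d in index.get(u, {}).items():   # documents containing u
                scores[doc_id] += p_u_given_d * diffusion(u, uq)   # score contribution
            for child, _rel in graph.unit(u).neighbours:
                dfs(child, uq, depth + 1, visited)                  # recursively traverse

        for uq in query_units:                                      # start from each query node
            dfs(uq, uq, 0, set())
        return dict(scores)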
  74. Implementation • Two parts: indexing & retrieval • Underlying ontology: SNOMED CT
    • Information Unit = SNOMED CT concept • Relationship = SNOMED CT concept relationship
  75. Graph Indexing • Combine an inverted file index (concept → document) with the ontology
    (SNOMED CT) to produce a serialised corpus graph
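A sketch of the indexing step, assuming the CorpusGraph class above, an inverted file index of the form {concept: {doc_id: frequency}} and a simple maximum-likelihood estimate tf/|d| for the initial probabilities (the slide does not specify the estimator); serialising the resulting graph to disk is omitted.

    def build_corpus_graph(inverted_index, ontology_relationships, doc_lengths):
        graph = CorpusGraph()
        for concept, postings in inverted_index.items():
            for doc_id, tf in postings.items():
                graph.attach_document(concept, doc_id, tf / doc_lengths[doc_id])   # P(u|d)
        for concept_a, rel_type, concept_b in ontology_relationships:              # SNOMED CT triples
            graph.add_relationship(concept_a, concept_b, rel_type)
        return graph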
  76-80. Graph Retrieval • Depth-first search starting from the query nodes of the corpus
    graph • The traversal expands outwards level by level (level 0 = the query concepts, then
    level 1, level 2, level 3, ...), scoring the documents attached to each visited node
  81. Diffusion Factor (implementation) • Corpus-driven semantic similarity measure: DocCosine
    between document vectors • SNOMED CT relationship types, e.g. Is a, Causative agent or
    Finding site, with manually assigned weights
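A minimal sketch of a DocCosine-style measure: each concept is represented by a vector over the documents it occurs in, and the cosine of the two vectors supplies the corpus-driven sim(u, u') component of the diffusion factor; the sparse dict representation and raw weights are assumptions.

    import math

    def doc_cosine(vec_a, vec_b):
        # vec_a, vec_b: {doc_id: weight} document vectors for two concepts
        dot = sum(w * vec_b.get(d, 0.0) for d, w in vec_a.items())
        norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
        norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0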
  82. Empirical Evaluation • Per-query bpref plots comparing lvl0 vs. lvl1 and lvl0 vs. lvl2
  83. Retrieval Results
    Depth (k)  Bpref    Prec@10
    terms      0.3917   0.4975
    lvl0       0.4290   0.5123
    lvl1       0.4229   0.4481†
    lvl2       0.4138   0.4259†
  84. Hard Queries • Per-query bpref plots comparing lvl1 and lvl2 against the TREC median
  85-86. Consistent Improvements • Per-query bpref vs. traversal depth for Queries 108 and 171
    • Query 171: "Patients with thyrotoxicosis treated with beta blockers"; the traversal
    reaches related concepts such as Thyrotoxicosis, Thyrotoxicosis with or without goiter,
    Thyroid structure and Hyperthyroidism via Is a, Finding site and Treated with relationships
    • Valuable related concepts from the ontology • Inference always improves results and the
    diffusion factor controls noise
  87-88. Inference Not Required • Per-query bpref vs. traversal depth for Queries 104 and 161
    • Query 104: "Patients diagnosed with localized prostate cancer and treated with robotic
    surgery"; the traversal reaches concepts such as Neoplasm of prostate, Robot device and
    Biomedical device • Easy, unambiguous queries, often with a small number of relevant
    documents • Inference degrades performance
  89-90. Queries Requiring Reranking • Per-query bpref vs. traversal depth for Queries 113,
    135 and 119 • Query 135: "Cancer patients with liver metastasis treated in the hospital
    who underwent a procedure"; related concepts include Secondary malignant neoplasm of liver,
    Malignant neoplasm of liver, Neoplasm metastatic and Liver structure • Verbose queries with
    multiple dependent query aspects • The key query aspects contained many related concepts
    • Small amounts of inference required (depth 1-2)
  91-92. Inference of New Relevant Documents • Per-query bpref vs. traversal depth for Queries
    147 and 154 • Query 147: "Patients with left lower quadrant abdominal pain"; related
    concepts include Left lower quadrant pain, Lower abdominal pain, Lower abdomen structure,
    Abdominal pain and Left sided abdominal pain • Domain knowledge essential in interpreting
    the query • Relevant documents retrieved that do not contain any query terms • Queries with
    multiple semantic gap issues • Inference always improves performance
  93-94. Queries Unaffected by Inference • Per-query bpref vs. traversal depth for Queries 137
    and 139 • Query 139: "Patients who presented to the emergency room with an actual or
    suspected miscarriage"; related concepts include Termination of pregnancy, Abortion and
    Disorder of pregnancy • Very hard queries; the semantic gap cannot be bridged • No domain
    knowledge available for the terms/concepts in the query • Inference has no effect
  95. Bias in Evaluation • The GIN retrieves a large number of unjudged documents
    Model                   Unjudged documents in top 20 results   P@20
    Terms                   210 (2.5 docs / query)                 0.4244
    Bag-of-concepts (lvl0)  257 (3.0 docs / query)                 0.4389
    Graph model (lvl1)      468 (5.5 docs / query)                 0.4086
    Graph model (lvl2)      616 (7.2 docs / query)                 0.3630
  96-97. Additional Relevance Assessments • Recruited 4 UQ medical graduates • Judged approx.
    1000 documents, pooled from Bag-of-concepts (lvl0), GIN (lvl1) and GIN (lvl2)
    • Distribution of judgements across irrelevant, somewhat relevant and highly relevant;
      29% of the judged documents were relevant
  98-101. GIN Re-evaluation
    Qrel set    System  Bpref         P@10           P@20
    TREC        lvl0    0.4309        0.5123         0.4389
    TREC        lvl1    0.4294        0.4481         0.4086
    TREC        lvl2    0.4208        0.4247         0.3630
    TREC + UQ   lvl0    0.4252 (-1%)  0.5415 (+6%)   0.4732 (+8%)
    TREC + UQ   lvl1    0.4264 (0%)   0.5037 (+12%)  0.4604 (+12%)
    TREC + UQ   lvl2    0.4113 (-2%)  0.4878 (+15%)  0.4220 (+16%)
  102. GIN Re-evaluation • Per-query Precision@20 for lvl1 under the original TREC qrels vs.
    the TREC + UQ qrels
  103. Bridging the Semantic Gap
    Semantic Gap             Bag-of-concepts   Graph Weighting   Graph Inference
    Vocabulary Mismatch      yes               yes               yes
    Granularity Mismatch     no                no                yes
    Conceptual Implication   no                no                yes
    Inference of Similarity  no                yes               yes
  104-107. GIN: Unified Model • A unified theoretical model of semantic search as inference:
    the integration of structured domain knowledge (ontologies) with statistical, information
    retrieval methods provides the necessary mechanism for inference • Domain knowledge enters
    through the corpus graph and the concept-based representations • IR methods enter through
    probabilistic relevance estimation and the diffusion factor • Inference is realised by the
    GIN traversal
  108. Understanding Inference • Inference can be risky for hard queries • Definition vs.
    retrieval inference: the "what" vs. the "how" [Frixione and Lieto, 2012] • Devise a domain
    knowledge resource specifically suited to retrieval inference?
  109. Resource for Retrieval Inference • Vocabulary: how things are described, not defined
    • Associations: relationships and their strength • Granularity: quantified parent-child
    relationships • Uncertainty: pragmatics (known, suspected)
  110. Resource for Retrieval Inference • The four requirements (vocabulary, associations,
    granularity, uncertainty) assessed against SNOMED CT; associations and granularity in
    particular are not met
  111. Successful Semantic Search Model • Good source of domain knowledge • Effective mapping
    of free text to concepts • Adaptive inference mechanism • Effective evaluation method
  112. Future Work • Adaptive depth: query prediction • Navigation and visualisation using the
    GIN • Query dependence • Query reduction • Web search using the GIN
  113. Contributions: Models • 1. Development & evaluation of a concept-based representation
    for medical IR [Ch. 4] • 2. Graph-based Concept Weighting model [Ch. 5] • 3. Unified model
    of semantic search as inference: the Graph Inference Model [Ch. 6]
  114. Contributions: Findings • 4. Empirical evaluation of all three models [Ch. 6, 8]:
    understanding when and how to apply inference; the quality of the underlying representation
    • 5. Identification of semantic gap problems [Ch. 2] • 6. Evaluating semantic search
  115. Conclusion • A significant step forward in the integration of structured domain
    knowledge and data-driven IR methods • Allows IR systems to exploit valuable information
    trapped in domain knowledge resources • The GIN is generally defined and applicable to
    other applications that want to utilise structured knowledge resources for more effective
    semantic search