
PhD Final Seminar: "Semantic Search as Inference"

Bevan Koopman
December 13, 2013

Slides from the final seminar of PhD thesis entitled "Semantic Search as Inference: Applications in Health Informatics"


Transcript

  1. Semantic Search as Inference: Applications in Health Informatics. Bevan Koopman,
    School of Information Systems, QUT; Australian e-Health Research Centre, CSIRO.
  2-8. Unifying Hypothesis: a unified theoretical model of semantic search as inference,
    the integration of structured domain knowledge (ontologies) with statistical, information
    retrieval methods, provides the necessary mechanism for inference for effective semantic
    search of medical data. Semantics: models that elicit the meaning behind the words found
    in documents and queries. Inference: leveraging semantics and domain knowledge to infer
    relevant information.
  9. Vocabulary Mismatch • Concepts expressed in different ways, yet with similar meaning
    • hypertension = high blood pressure • Exacerbated in the medical domain: medications and
    abbreviations • Associational and deductive inference required to overcome it
  10. Granularity Mismatch • Queries use general terms while relevant documents contain
    specific terms • Antipsychotics vs. Diazepam • Deductive inference required
  11. Conceptual Implication • Terms within the document may logically imply the query terms
    • Dialysis Machine → Kidney Disease • Prevalent in the medical domain: Treatment → Disease,
    Organism → Disease • Deductive inference required
  12. Inferences of Similarity • Strong semantic associations, e.g. comorbidities • Innate
    dependence between medical concepts • Not modelled in ontologies • Association inference
    required
  13. The Semantic Gap • Vocabulary mismatch • Granularity mismatch • Conceptual implication
    • Inferences of similarity • Context-specific semantic gap issues
  14. Symbolic Representations and Ontologies vs. Information Retrieval: often seen as
    competing paradigms, but they should be seen as complementary; they "attack cognitive
    problems on different levels" [Gardenfors, 1997]
  15. Symbolic Representations and Ontologies • Formal semantics: well-defined syntactic
    structures with definite semantic interpretations • Inference is typically based on
    first-order logic and is therefore deductive • Often realised as ontologies
  16. SNOMED CT • Large medical ontology (~300,000 concepts, ~1 million relationships)
  17. Ontologies for Semantic Search • Advantages: deductive inference and reasoning;
    explicit background knowledge; standardisation and interoperability • Limitations:
    reliance on deductive reasoning; no natural measure of semantic similarity; dealing with
    uncertainty and inconsistency; context insensitive; coverage; dealing with natural language
  18. IR for Semantic Search • Advantages: inference with uncertainty; context specific;
    generally applicable; support for natural language • Limitations: no support for deductive
    reasoning; no explicit background knowledge; dependence on terms
  19. Retrieval Models • Probabilistic language models [Ponte & Croft, 1998; Hiemstra, 1998;
    Zhai, 2007]: P(Q|D) = \prod_{q \in Q} P(q|D), with Dirichlet smoothing
    P(q|D) = \frac{tf_{q,D} + \mu \frac{cf_q}{|C|}}{\mu + |D|} • Graph-based retrieval models
    [Turtle & Croft, 1991; Blanco & Lioma, 2012]: capture dependence between terms and
    documents; can utilise graph-based algorithms, e.g. PageRank; graphs also underlie most
    ontologies
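A minimal Python sketch (not the thesis code) of the Dirichlet-smoothed query-likelihood model above; the data layout (a Counter of collection term frequencies plus the total collection length) and the default mu are illustrative assumptions.

    import math
    from collections import Counter

    def lm_dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2500):
        # log P(Q|D) = sum_q log( (tf_{q,D} + mu * cf_q / |C|) / (mu + |D|) )
        doc_tf = Counter(doc_terms)
        score = 0.0
        for q in query_terms:
            p_collection = collection_tf.get(q, 0) / collection_len   # cf_q / |C|
            p_q_given_d = (doc_tf[q] + mu * p_collection) / (mu + len(doc_terms))
            if p_q_given_d > 0:                                       # skip terms unseen anywhere
                score += math.log(p_q_given_d)
        return score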
  20-22. Roadmap • Bag-of-concepts Model: vocabulary mismatch • Graph-based Concept Weighting
    Model: vocabulary mismatch, inference of similarity • Graph INference Model (GIN):
    vocabulary mismatch, granularity mismatch, conceptual implication, inferences of similarity
  23. BoC Overview • Represent query & documents as SNOMED CT concepts • The concept-based
    representation differs from terms both semantically and statistically • Tackles the issue
    of semantics mainly by addressing vocabulary mismatch
  24-30. Convert Terms to Concepts • Term encapsulation: "metastatic breast cancer" maps to
    the single concept 60278488 rather than the separate terms "metastatic", "breast", "cancer"
    • Conflating term variants: "human immunodeficiency virus", "T-lymphotropic virus", "HIV",
    "AIDS" all map to concept 86406008 • Concept expansion: "esophageal reflux" expands to
    235595009 Gastroesophageal reflux, 196600005 Acid reflux or oesophagitis, 47268002 Reflux,
    249496004 Esophageal reflux finding
  31. BoC Implementation • Convert documents and queries to concepts using MetaMap
    [Aronson and Lang, 2010] • Concept-based representation • Two concept-based retrieval
    models: a Concept Language Model,
    P(Q_c|D_c) \propto \sum_{q_c \in Q_c} \log\left(\frac{cf_{q_c,D_c} + \mu \frac{cf_{q_c}}{|C_c|}}{\mu + |D_c|}\right),
    and Lemur's TF-IDF model with BM25 weighting,
    RSV(D_c, Q_c) = \sum_{q_c \in Q_c} \frac{cf_{q_c,D_c}\,(k_1+1)}{cf_{q_c,D_c} + k_1\left(1 - b + b\,\frac{|D_c|}{|D_c^{avg}|}\right)} \log\frac{N}{n_{q_c}}
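A minimal sketch of BM25 scoring over documents represented as bags of SNOMED CT concept IDs, matching the weighting above; the function name, input layout and defaults (k1=1.2, b=0.75) are illustrative assumptions.

    import math
    from collections import Counter

    def bm25_concept_score(query_concepts, doc_concepts, doc_freq, num_docs,
                           avg_doc_len, k1=1.2, b=0.75):
        # RSV(Dc, Qc) = sum over query concepts of
        #   cf*(k1+1) / (cf + k1*(1 - b + b*|Dc|/|Dc_avg|)) * log(N / n_qc)
        cf = Counter(doc_concepts)            # concept frequencies in this document
        doc_len = len(doc_concepts)
        score = 0.0
        for qc in query_concepts:
            n_qc = doc_freq.get(qc, 0)        # number of documents containing qc
            if n_qc == 0 or cf[qc] == 0:
                continue
            norm = cf[qc] + k1 * (1 - b + b * doc_len / avg_doc_len)
            score += (cf[qc] * (k1 + 1)) / norm * math.log(num_docs / n_qc)
        return score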
  32. Empirical Evaluation • Long history of empirical evaluation in IR • TREC paradigm:
    queries, document corpus, relevance assessments • TREC Medical Records Track
  33-35. Evaluation Measures
    • precision = \frac{|D_{rel} \cap D_{ret}|}{|D_{ret}|}, e.g. 6/10 = 0.6000
    • recall = \frac{|D_{rel} \cap D_{ret}|}{|D_{rel}|}, e.g. 6/12 = 0.5000
    • Bpref counts only judged documents:
      bpref = \frac{1}{|R|} \sum_{r \in R} \left(1 - \frac{|\{n : n \in \bar{R} \wedge n < r\}|}{|R|}\right)
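A small sketch of the three measures as defined above; bpref follows the slide's formulation (penalty normalised by |R|) and assumes the judged non-relevant documents are supplied as a set.

    def precision_recall(retrieved, relevant):
        # precision = |Drel ∩ Dret| / |Dret|, recall = |Drel ∩ Dret| / |Drel|
        rel_ret = len(set(retrieved) & set(relevant))
        return rel_ret / len(retrieved), rel_ret / len(relevant)

    def bpref(ranking, relevant, judged_nonrelevant):
        # Only judged documents contribute: each relevant document r is credited
        # 1 - (number of judged non-relevant docs ranked above r) / |R|
        R = len(relevant)
        nonrel_above, total = 0, 0.0
        for doc in ranking:
            if doc in relevant:
                total += 1.0 - min(nonrel_above, R) / R
            elif doc in judged_nonrelevant:
                nonrel_above += 1
        return total / R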
  36. Results
    Model            Bpref           Prec@10
    Terms            0.3934          0.4753
    Bag-of-concepts  0.4513 (+15%)   0.5395 (+14%)
  37. Results (per-query bpref plots): terms vs. UMLS bag-of-concepts and terms vs. SNOMED
    bag-of-concepts; queries Q110, Q128, Q119, Q161, Q117 and Q133 are highlighted.
  38. Roadmap (revisited): Bag-of-concepts Model (vocabulary mismatch) covered; next, the
    Graph-based Concept Weighting Model (vocabulary mismatch, inference of similarity).
  39. Graph Weighting • Graphs capture the innate dependence between medical concepts
    • Apply PageRank for concept weighting • Example graph: concepts "dental", "pain",
    "lower", "procedures", "right" as nodes with PageRank scores S(v_right), S(v_pain),
    S(v_lower), S(v_procedures)
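To illustrate the weighting step, a plain power-iteration PageRank over a concept graph given as an adjacency list; this is a generic sketch rather than the thesis implementation, and the damping factor is the usual 0.85 default. Applied to the dental-pain example above, each concept node receives a score S(v) that is then used as its background importance.

    def pagerank(adjacency, damping=0.85, iterations=50):
        # adjacency: {concept: [neighbouring concept, ...]}
        nodes = list(adjacency)
        n = len(nodes)
        score = {v: 1.0 / n for v in nodes}
        for _ in range(iterations):
            new = {v: (1.0 - damping) / n for v in nodes}
            for v, out in adjacency.items():
                targets = out if out else nodes          # dangling nodes spread uniformly
                share = damping * score[v] / len(targets)
                for u in targets:
                    new[u] += share
            score = new
        return score                                     # background importance S(v)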
  40-44. Concept Importance Weighting • Adjust the original concept weight by the
    "background" importance of the concept in the medical domain:
    w(c, d_c) = idf(c) \times S(v_c) \times \log(|V_s(c)|) • The factors draw on the document
    (d_c), the corpus (idf(c)) and the domain (the graph-based score S(v_c) and the
    neighbourhood size |V_s(c)|)
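A one-line rendering of the weighting formula, assuming the idf, PageRank score and graph-neighbourhood size are precomputed; how the result is combined with the document-level term weight is not spelt out on the slide, so that step is left to the caller.

    import math

    def concept_importance(idf_c, S_vc, neighbourhood_size):
        # w(c, d_c) = idf(c) * S(v_c) * log(|V_s(c)|)
        return idf_c * S_vc * math.log(neighbourhood_size)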
  45. Retrieval Results
    Run                    Bpref           Prec@10
    terms-tfidf            0.4722          0.4882
    concepts-tfidf         0.4993          0.5176
    terms-graph            0.4393          0.4882
    concepts-graph         0.5050 (+15%)   0.5441 (+11%)
    concepts-graph-snomed  0.5245 (+19%)   0.5559 (+14%)
  46-47. Roadmap (revisited): Bag-of-concepts Model and Graph-based Concept Weighting Model
    covered; next, the Graph INference Model (GIN): vocabulary mismatch, granularity mismatch,
    conceptual implication, inferences of similarity.
  48-53. Retrieval as Inference • Models retrieval as non-classical implication d \to q and
    estimates its probability P(d \to q) [van Rijsbergen, 1986; Nie, 1989] • Logical
    Uncertainty Principle: if d \to q cannot be established directly, determine how much d
    must be transformed into a document d' for which d' \to q holds • The transformation is
    decomposed into a chain of intermediate steps:
    P(d \to q) \propto \Phi(d, d') \propto \bigotimes_i \Phi(d_{i-1}, d_i)
  54. Graph Inference Model Theory
    • P(d \to q) = \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \to u_q)
      \propto \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \mid d)\,\delta(u_d, u_q),
      where \bigoplus and \bigotimes are the aggregation operators over query and document IUs
    • Diffusion factor: \delta(u, u') = 1 if u = u'; \delta_0(u, u') if u R u'; otherwise
      \max_{u_i \in U : u R u_i} \delta(u, u_i) \otimes \delta(u_i, u')
    • Illustrated on a corpus graph with initial probabilities P(u_1|d_1), P(u_2|d_1),
      P(u_3|d_2), P(u_4|d_3), P(u_q|d_1), P(u_q|d_2) and diffusion factors \delta(u_1, u_q),
      \delta(u_2, u_1), \delta(u_3, u_q), \delta(u_4, u_q)
  55. Building Blocks • Information Units (IUs): basic term or concept, u \in U
    • Information Relationships: relationships between IUs, R \subseteq U \times U
    • Information Graph: G = \langle U, R \rangle • Queries & documents as sequences of IUs:
      q = \langle u_0, \ldots, u_m \rangle, d = \langle u_0, \ldots, u_n \rangle
  56-59. Corpus Graph • Attach documents to each information unit, e.g. u_2: {d_1, d_2},
    u_1: {d_1}, u_0: {d_1} • Assign initial probabilities, e.g. P(u_2|d_1), P(u_2|d_2),
    P(u_1|d_1), P(u_0|d_1) • Edges between units carry a diffusion factor
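A minimal in-memory sketch of the building blocks and corpus graph described above (information units, relationships, attached documents and initial probabilities P(u|d)); the class and field names are illustrative, not taken from the thesis.

    from dataclasses import dataclass, field

    @dataclass
    class InformationUnit:
        uid: str                                         # a term or SNOMED CT concept id
        docs: dict = field(default_factory=dict)         # doc_id -> initial probability P(u|d)
        neighbours: list = field(default_factory=list)   # (neighbour uid, relationship type)

    class CorpusGraph:
        # G = <U, R>, with documents attached to the information units
        def __init__(self):
            self.units = {}

        def unit(self, uid):
            return self.units.setdefault(uid, InformationUnit(uid))

        def attach_document(self, uid, doc_id, probability):
            self.unit(uid).docs[doc_id] = probability    # assign initial probability P(u|d)

        def add_relationship(self, uid_a, uid_b, rel_type):
            self.unit(uid_a).neighbours.append((uid_b, rel_type))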
  60. Diffusion Factor • A measure of the strength of association between information units
    • \delta(u, u') = 1 if u = u'; \delta_0(u, u') if u R u'; otherwise
      \max_{u_i \in U : u R u_i} \delta(u, u_i) \otimes \delta(u_i, u')
    • \delta_0(u, u') = \alpha\, sim(u, u') + (1 - \alpha)\, rel(u, u'), with 0 \le \alpha \le 1
    • Captures both semantic similarity and relationship type
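A recursive sketch of the diffusion factor, assuming the CorpusGraph structure above, a sim(u, u') similarity function and per-relationship-type weights rel_weight; the otimes operator is instantiated as a product, and a depth bound is added so the sketch terminates on cyclic graphs.

    def diffusion_factor(u, u_prime, graph, sim, rel_weight, alpha=0.5, max_depth=3):
        # delta(u, u') = 1 if u = u'; delta_0(u, u') if uRu';
        #                otherwise max over neighbours u_i of delta_0(u, u_i) * delta(u_i, u')
        if u == u_prime:
            return 1.0
        if max_depth == 0:
            return 0.0
        best = 0.0
        for neighbour, rel_type in graph.unit(u).neighbours:
            step = alpha * sim(u, neighbour) + (1 - alpha) * rel_weight[rel_type]   # delta_0
            if neighbour == u_prime:
                best = max(best, step)
            else:
                best = max(best, step * diffusion_factor(neighbour, u_prime, graph, sim,
                                                         rel_weight, alpha, max_depth - 1))
        return best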
  61-65. Retrieval Function • P(d \to q): probability of the implication between document and
    query • P(u_d \to u_q): probability of the implication between a single document IU and a
    single query IU • The strength of the implication is proportional to the diffusion factor,
    P(u_d \to u_q) \propto \delta(u_d, u_q), and including the initial probabilities gives
    P(u_d \to u_q) \propto P(u_d \mid d)\,\delta(u_d, u_q) • General retrieval function,
    evaluating each combination of query IU and document IU:
    P(d \to q) = \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \to u_q)
    \propto \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \mid d)\,\delta(u_d, u_q),
    where the aggregation operators can be instantiated as products or sums, e.g.
    RSV(d, q) = \prod_{u_q \in q} \prod_{u_d \in d} P(u_d \mid d)\,\delta(u_d, u_q)
  66-69. Worked Example • Query and documents: q = \langle u_q \rangle,
    d_1 = \langle u_1, u_2, u_q \rangle, d_2 = \langle u_3, u_q \rangle,
    d_3 = \langle u_4 \rangle • Inverted index: u_1: d_1; u_2: d_1; u_3: d_2; u_4: d_3;
    u_q: d_1, d_2, q • Corpus graph: nodes u_1 {d_1}, u_2 {d_1}, u_3 {d_2}, u_4 {d_3},
    u_q {d_1, d_2}, connected by relationships of types r_a, r_c, r_d • Annotated with
    initial probabilities P(u_1|d_1), P(u_2|d_1), P(u_3|d_2), P(u_4|d_3), P(u_q|d_1),
    P(u_q|d_2) and diffusion factors \delta(u_1, u_q), \delta(u_2, u_1), \delta(u_3, u_q),
    \delta(u_4, u_q)
  70-72. Worked Example (scoring) • Document d_1 receives the contributions P(u_q|d_1),
    P(u_1|d_1)\,\delta(u_1, u_q) and P(u_2|d_1)\,\delta(u_2, u_1)\,\delta(u_1, u_q)
    • Document d_2 receives P(u_q|d_2) and P(u_3|d_2)\,\delta(u_3, u_q) • Document d_3
    receives P(u_4|d_3)\,\delta(u_4, u_q) • Contributions are accumulated into the document
    scores (a numeric sketch with made-up values follows below)
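To make the worked example concrete, a small numeric sketch with made-up initial probabilities and diffusion factors (the slides give no numeric values); the aggregation is instantiated as a sum of the per-path products listed above.

    # Hypothetical values, purely for illustration.
    P = {("uq", "d1"): 0.4, ("u1", "d1"): 0.3, ("u2", "d1"): 0.3,
         ("uq", "d2"): 0.6, ("u3", "d2"): 0.4, ("u4", "d3"): 1.0}
    delta = {("u1", "uq"): 0.5, ("u2", "u1"): 0.4, ("u3", "uq"): 0.6, ("u4", "uq"): 0.2}

    score_d1 = (P[("uq", "d1")]
                + P[("u1", "d1")] * delta[("u1", "uq")]
                + P[("u2", "d1")] * delta[("u2", "u1")] * delta[("u1", "uq")])
    score_d2 = P[("uq", "d2")] + P[("u3", "d2")] * delta[("u3", "uq")]
    score_d3 = P[("u4", "d3")] * delta[("u4", "uq")]

    print(score_d1, score_d2, score_d3)   # 0.61, 0.84, 0.2 with these made-up numbers

Note that d_3 still receives a score even though it contains no query IU: its relevance is inferred entirely through the diffusion factor \delta(u_4, u_q).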
  73. Graph Inference Model Implementation • Algorithm 1: pseudo code for efficient
    depth-first-search graph traversal (a runnable rendering follows below)
    Input: Idx, Q, G, k (index, query, graph, maximum depth)
    Output: scores {d_0, ..., d_n} (document scores)
    for u_q in Q: DFS(u_q, 0)                            (start traversal from each query node)
    function DFS(u, depth):
      if depth <= k:
        for d_i in Idx.docs(u):                          (documents containing u)
          scores[d_i] = scores[d_i] + P(u | d_i) * df(u, u_q)   (score contribution)
        for u' in children(u):
          DFS(u', depth + 1)                             (recursively traverse)
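A runnable Python rendering of Algorithm 1, reusing the CorpusGraph sketch from earlier; the index layout {unit: {doc_id: P(u|d)}}, the diffusion callable and the visited set (added to avoid revisiting nodes over cyclic relationships) are assumptions beyond the pseudo code.

    from collections import defaultdict

    def gin_retrieve(index, query_units, graph, diffusion, max_depth):
        scores = defaultdict(float)

        def dfs(u, uq, depth, visited):
            if depth > max_depth or u in visited:
                return
            visited.add(u)
            for doc_id, p_u_given_d in index.get(u, {}).items():   # documents containing u
                scores[doc_id] += p_u_given_d * diffusion(u, uq)   # score contribution
            for child, _rel in graph.unit(u).neighbours:
                dfs(child, uq, depth + 1, visited)                  # recursively traverse

        for uq in query_units:                                      # start from each query node
            dfs(uq, uq, 0, set())
        return dict(scores)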
  74. Implementation • Two parts: indexing & retrieval • Underlying ontology: SNOMED CT
    • Information Unit = SNOMED CT concept • Relationship = SNOMED CT concept relationship
  75. Graph Indexing • Combine an inverted file index (concept → document) with the ontology
    (SNOMED CT) to produce a serialised corpus graph
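A sketch of the indexing step, assuming the CorpusGraph class above, an inverted file index of the form {concept: {doc_id: frequency}} and a simple maximum-likelihood estimate tf/|d| for the initial probabilities (the slide does not specify the estimator); serialising the resulting graph to disk is omitted.

    def build_corpus_graph(inverted_index, ontology_relationships, doc_lengths):
        graph = CorpusGraph()
        for concept, postings in inverted_index.items():
            for doc_id, tf in postings.items():
                graph.attach_document(concept, doc_id, tf / doc_lengths[doc_id])   # P(u|d)
        for concept_a, rel_type, concept_b in ontology_relationships:              # SNOMED CT triples
            graph.add_relationship(concept_a, concept_b, rel_type)
        return graph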
  76-80. Graph Retrieval • Depth-first search starting from the query nodes of the corpus
    graph • The traversal expands outwards level by level (level 0 = the query concepts, then
    level 1, level 2, level 3, ...), scoring the documents attached to each visited node
  81. Diffusion Factor (implementation) • Corpus-driven semantic similarity measure: DocCosine
    between document vectors • SNOMED CT relationship types, e.g. Is a, Causative agent or
    Finding site, with manually assigned weights
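A minimal sketch of a DocCosine-style measure: each concept is represented by a vector over the documents it occurs in, and the cosine of the two vectors supplies the corpus-driven sim(u, u') component of the diffusion factor; the sparse dict representation and raw weights are assumptions.

    import math

    def doc_cosine(vec_a, vec_b):
        # vec_a, vec_b: {doc_id: weight} document vectors for two concepts
        dot = sum(w * vec_b.get(d, 0.0) for d, w in vec_a.items())
        norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
        norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0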
  82. Empirical Evaluation • Per-query bpref plots comparing lvl0 vs. lvl1 and lvl0 vs. lvl2
  83. Retrieval Results
    Depth (k)  Bpref    Prec@10
    terms      0.3917   0.4975
    lvl0       0.4290   0.5123
    lvl1       0.4229   0.4481†
    lvl2       0.4138   0.4259†
  84. Hard Queries • Per-query bpref plots comparing lvl1 and lvl2 against the TREC median
  85-86. Consistent Improvements • Per-query bpref vs. traversal depth for Queries 108 and 171
    • Query 171: "Patients with thyrotoxicosis treated with beta blockers"; the traversal
    reaches related concepts such as Thyrotoxicosis, Thyrotoxicosis with or without goiter,
    Thyroid structure and Hyperthyroidism via Is a, Finding site and Treated with relationships
    • Valuable related concepts from the ontology • Inference always improves results and the
    diffusion factor controls noise
  87-88. Inference Not Required • Per-query bpref vs. traversal depth for Queries 104 and 161
    • Query 104: "Patients diagnosed with localized prostate cancer and treated with robotic
    surgery"; the traversal reaches concepts such as Neoplasm of prostate, Robot device and
    Biomedical device • Easy, unambiguous queries, often with a small number of relevant
    documents • Inference degrades performance
  89-90. Queries Requiring Reranking • Per-query bpref vs. traversal depth for Queries 113,
    135 and 119 • Query 135: "Cancer patients with liver metastasis treated in the hospital
    who underwent a procedure"; related concepts include Secondary malignant neoplasm of liver,
    Malignant neoplasm of liver, Neoplasm metastatic and Liver structure • Verbose queries with
    multiple dependent query aspects • The key query aspects contained many related concepts
    • Small amounts of inference required (depth 1-2)
  91-92. Inference of New Relevant Documents • Per-query bpref vs. traversal depth for Queries
    147 and 154 • Query 147: "Patients with left lower quadrant abdominal pain"; related
    concepts include Left lower quadrant pain, Lower abdominal pain, Lower abdomen structure,
    Abdominal pain and Left sided abdominal pain • Domain knowledge essential in interpreting
    the query • Relevant documents retrieved that do not contain any query terms • Queries with
    multiple semantic gap issues • Inference always improves performance
  93-94. Queries Unaffected by Inference • Per-query bpref vs. traversal depth for Queries 137
    and 139 • Query 139: "Patients who presented to the emergency room with an actual or
    suspected miscarriage"; related concepts include Termination of pregnancy, Abortion and
    Disorder of pregnancy • Very hard queries; the semantic gap cannot be bridged • No domain
    knowledge available for the terms/concepts in the query • Inference has no effect
  95. Bias in Evaluation • The GIN retrieves a large number of unjudged documents
    Model                   Unjudged documents in top 20 results   P@20
    Terms                   210 (2.5 docs / query)                 0.4244
    Bag-of-concepts (lvl0)  257 (3.0 docs / query)                 0.4389
    Graph model (lvl1)      468 (5.5 docs / query)                 0.4086
    Graph model (lvl2)      616 (7.2 docs / query)                 0.3630
  96-97. Additional Relevance Assessments • Recruited 4 UQ medical graduates • Judged approx.
    1000 documents, pooled from Bag-of-concepts (lvl0), GIN (lvl1) and GIN (lvl2)
    • Distribution of judgements across irrelevant, somewhat relevant and highly relevant;
      29% of the judged documents were relevant
  98-101. GIN Re-evaluation
    Qrel set    System  Bpref         P@10           P@20
    TREC        lvl0    0.4309        0.5123         0.4389
    TREC        lvl1    0.4294        0.4481         0.4086
    TREC        lvl2    0.4208        0.4247         0.3630
    TREC + UQ   lvl0    0.4252 (-1%)  0.5415 (+6%)   0.4732 (+8%)
    TREC + UQ   lvl1    0.4264 (0%)   0.5037 (+12%)  0.4604 (+12%)
    TREC + UQ   lvl2    0.4113 (-2%)  0.4878 (+15%)  0.4220 (+16%)
  102. GIN Re-evaluation • Per-query Precision@20 for lvl1 under the original TREC qrels vs.
    the TREC + UQ qrels
  103. Bridging the Semantic Gap
    Semantic Gap             Bag-of-concepts   Graph Weighting   Graph Inference
    Vocabulary Mismatch      yes               yes               yes
    Granularity Mismatch     no                no                yes
    Conceptual Implication   no                no                yes
    Inference of Similarity  no                yes               yes
  104-107. GIN: Unified Model • A unified theoretical model of semantic search as inference:
    the integration of structured domain knowledge (ontologies) with statistical, information
    retrieval methods provides the necessary mechanism for inference • Domain knowledge enters
    through the corpus graph and the concept-based representations • IR methods enter through
    probabilistic relevance estimation and the diffusion factor • Inference is realised by the
    GIN traversal
  108. Understanding Inference • Inference can be risky for hard queries • Definition vs.
    retrieval inference: the "what" vs. the "how" [Frixione and Lieto, 2012] • Devise a domain
    knowledge resource specifically suited to retrieval inference?
  109. Resource for Retrieval Inference • Vocabulary: how things are described, not defined
    • Associations: relationships and their strength • Granularity: quantified parent-child
    relationships • Uncertainty: pragmatics (known, suspected)
  110. Resource for Retrieval Inference • The four requirements (vocabulary, associations,
    granularity, uncertainty) assessed against SNOMED CT; associations and granularity in
    particular are not met
  111. Successful Semantic Search Model • Good source of domain knowledge • Effective mapping
    of free text to concepts • Adaptive inference mechanism • Effective evaluation method
  112. Future Work • Adaptive depth: query prediction • Navigation and visualisation using the
    GIN • Query dependence • Query reduction • Web search using the GIN
  113. Contributions: Models • 1. Development & evaluation of a concept-based representation
    for medical IR [Ch. 4] • 2. Graph-based Concept Weighting model [Ch. 5] • 3. Unified model
    of semantic search as inference: the Graph Inference Model [Ch. 6]
  114. Contributions: Findings • 4. Empirical evaluation of all three models [Ch. 6, 8]:
    understanding when and how to apply inference; the quality of the underlying representation
    • 5. Identification of semantic gap problems [Ch. 2] • 6. Evaluating semantic search
  115. Conclusion • A significant step forward in the integration of structured domain
    knowledge and data-driven IR methods • Allows IR systems to exploit valuable information
    trapped in domain knowledge resources • The GIN is generally defined and applicable to
    other applications that want to utilise structured knowledge resources for more effective
    semantic search