Slide 1

Slide 1 text

Semantic Search as Inference: Applications in Health Informatics
Bevan Koopman
School of Information Systems, QUT
Australian e-Health Research Centre, CSIRO
Wednesday, 11 December 13

Slide 2

Slide 2 text

Unifying Hypothesis
A unified theoretical model of semantic search as inference:
• Integration of structured domain knowledge (ontologies) with
• Statistical information retrieval methods
provides the necessary mechanism for inference for effective semantic search of medical data.

Slide 3

Slide 3 text

Unifying Hypothesis
A unified theoretical model of semantic search as inference:
• Integration of structured domain knowledge (ontologies) with
• Statistical information retrieval methods
provides the necessary mechanism for inference for effective semantic search of medical data.
Semantics: models that elicit the meaning behind the words found in documents and queries.

Slide 7

Slide 7 text

Unifying Hypothesis
A unified theoretical model of semantic search as inference:
• Integration of structured domain knowledge (ontologies) with
• Statistical information retrieval methods
provides the necessary mechanism for inference for effective semantic search of medical data.
Inference: leveraging inference and semantics to infer relevant information.

Slide 9

Slide 9 text

Bridging the Semantic Gap

Slide 10

Slide 10 text

Vocabulary Mismatch
• Concepts expressed in different ways yet having similar meaning: hypertension = high blood pressure
• Exacerbated in the medical domain: medications and abbreviations
• Associational and deductive inference required to overcome it

Slide 11

Slide 11 text

Granularity Mismatch
• Queries use general terms, while relevant documents contain specific terms: antipsychotics vs. Diazepam
• Deductive inference required

Slide 12

Slide 12 text

Conceptual Implication
• Terms within the document may logically imply the query terms: dialysis machine → kidney disease
• Prevalent in the medical domain: treatment → disease; organism → disease
• Deductive inference required

Slide 13

Slide 13 text

Inferences of Similarity
• Strong semantic associations, e.g. comorbidities
• Innate dependence between medical concepts
• Not modelled in ontologies
• Association inference required

Slide 14

Slide 14 text

The Semantic Gap
• Vocabulary mismatch
• Granularity mismatch
• Conceptual implication
• Inferences of similarity
These semantic gap issues are context-specific.

Slide 15

Slide 15 text

Semantic Search and Medical IR

Slide 16

Slide 16 text

Symbolic Representations and Ontologies vs. Information Retrieval

Slide 17

Slide 17 text

Symbolic Representations and Ontologies vs. Information Retrieval
Often seen as competing paradigms, they should actually be seen as complementary: they “attack cognitive problems on different levels” [Gärdenfors, 1997].

Slide 20

Slide 20 text

Symbolic Representations and Ontologies + Information Retrieval → Semantic Search → Semantic Search as Inference

Slide 21

Slide 21 text

Symbolic Representations and Ontologies
• Formal semantics: well-defined syntactic structures with definite semantic interpretations
• Inference is typically based on first-order logic and is therefore deductive
• Often realised as ontologies

Slide 22

Slide 22 text

SNOMED CT
• Large medical ontology (~300,000 concepts, ~1 million relationships)

Slide 23

Slide 23 text

Ontologies for Semantic Search

Advantages:
• Deductive inference and reasoning
• Explicit background knowledge
• Standardisation and interoperability

Limitations:
• Reliance on deductive reasoning
• No natural measure of semantic similarity
• Dealing with uncertainty and inconsistency
• Context insensitive
• Coverage
• Dealing with natural language

Slide 24

Slide 24 text

Information Retrieval

Slide 25

Slide 25 text

IR for Semantic Search

Advantages:
• Inference with uncertainty
• Context specific
• Generally applicable
• Support for natural language

Limitations:
• No support for deductive reasoning
• No explicit background knowledge
• Dependence on terms

Slide 26

Slide 26 text

Retrieval Models
• Probabilistic language models [Ponte & Croft, 1998; Hiemstra, 1998; Zhai, 2007]:

P(Q|D) = \prod_{q \in Q} P(q|D), \qquad P(q|D) = \frac{tf_{q,D} + \mu \frac{cf_q}{|C|}}{\mu + |D|}

• Graph-based retrieval models [Turtle & Croft, 1991; Blanco & Lioma, 2012]:
• Capture dependence between terms / documents
• Can utilise graph-based algorithms, e.g. PageRank
• Graphs underlie most ontologies
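As a concrete sketch of the language model above, the following minimal Python implementation scores documents by Dirichlet-smoothed query likelihood. The toy documents, query, and μ value are illustrative assumptions, not data from this work:

```python
import math

def lm_dirichlet_score(query, doc, collection, mu=2000.0):
    """log P(Q|D) with Dirichlet smoothing:
    P(q|D) = (tf_{q,D} + mu * cf_q / |C|) / (mu + |D|)."""
    coll_len = sum(len(d) for d in collection)            # |C|
    score = 0.0
    for q in query:
        tf = doc.count(q)                                 # tf_{q,D}
        cf = sum(d.count(q) for d in collection)          # cf_q
        p = (tf + mu * cf / coll_len) / (mu + len(doc))
        score += math.log(p)
    return score

# Illustrative toy collection (hypothetical documents).
docs = [["hypertension", "treated", "with", "diuretics"],
        ["patient", "with", "high", "blood", "pressure"]]
query = ["hypertension", "treated"]
ranked = sorted(docs, key=lambda d: lm_dirichlet_score(query, d, docs),
                reverse=True)
```

Smoothing with the collection frequency cf_q keeps P(q|D) non-zero for query terms absent from a document, which is what lets the product over query terms remain usable.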

Slide 32

Slide 32 text

Semantics and Inference
• Bag-of-concepts Model: vocabulary mismatch
• Graph-based Concept Weighting Model: vocabulary mismatch, inference of similarity
• Graph INference Model (GIN): vocabulary mismatch, granularity mismatch, conceptual implication, inferences of similarity

Slide 33

Slide 33 text

Bag-of-Concepts Model

Slide 34

Slide 34 text

BoC Overview
• Represent queries and documents as SNOMED CT concepts
• Concept-based representations differ from terms both semantically and statistically
• Tackles the issue of semantics mainly by addressing vocabulary mismatch

Slide 44

Slide 44 text

Convert Terms to Concepts
• Term encapsulation: “metastatic breast cancer” → “metastatic”, “breast”, “cancer” → 60278488
• Conflating term-variants: “human immunodeficiency virus”, “T-lymphotropic virus”, “HIV”, “AIDS” → 86406008
• Concept expansion: “esophageal reflux” → 235595009 Gastroesophageal reflux, 196600005 Acid reflux or oesophagitis, 47268002 Reflux, 249496004 Esophageal reflux finding

Slide 45

Slide 45 text

BoC Implementation
• Convert documents and queries to concepts using MetaMap [Aronson and Lang, 2010]
• Concept-based representation
• Two concept-based retrieval models:

Concept language model:
P(Q_c \mid D_c) \propto \sum_{q_c \in Q_c} \log\left(\frac{cf_{q_c,D_c} + \mu \frac{cf_{q_c}}{|C_c|}}{\mu + |D_c|}\right)

Lemur's TF-IDF model (BM25 weighting):
RSV(D_c, Q_c) = \sum_{q_c \in Q_c} \frac{cf_{q_c,D_c}\,(k_1 + 1)}{cf_{q_c,D_c} + k_1\left(1 - b + b \frac{|D_c|}{|D_c^{avg}|}\right)} \log \frac{N}{n_{q_c}}
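The second model can be sketched as BM25 scoring over concept identifiers. The SNOMED CT codes in the toy corpus are used only as opaque identifiers, and the parameter values k1 = 1.2, b = 0.75 are common defaults, not necessarily the settings used in this work:

```python
import math

def bm25_score(query_concepts, doc, corpus, k1=1.2, b=0.75):
    """BM25 over concept identifiers instead of terms."""
    N = len(corpus)
    avg_len = sum(len(d) for d in corpus) / N
    score = 0.0
    for c in query_concepts:
        n_c = sum(1 for d in corpus if c in d)      # document frequency
        if n_c == 0:
            continue
        tf = doc.count(c)                           # concept frequency in doc
        idf = math.log(N / n_c)
        norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
        score += (tf * (k1 + 1) / norm) * idf
    return score

# Hypothetical documents as lists of SNOMED CT concept IDs.
corpus = [["86406008", "22298006"], ["86406008"], ["44054006"]]
```

The only change from term-based BM25 is the vocabulary: counting concept occurrences rather than term occurrences.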

Slide 46

Slide 46 text

Empirical Evaluation
• Long history of empirical evaluation in IR
• TREC paradigm: queries, document corpus, relevance assessments
• TREC Medical Records Track

Slide 49

Slide 49 text

Evaluation Measures

precision = \frac{|D_{rel} \cap D_{ret}|}{|D_{ret}|} = \frac{6}{10} = 0.6000

recall = \frac{|D_{rel} \cap D_{ret}|}{|D_{rel}|} = \frac{6}{12} = 0.5000

• Bpref counts only judged documents:

bpref = \frac{1}{|R|} \sum_{r \in R} \left(1 - \frac{|\{n \in \bar{R} : n < r\}|}{|R|}\right)
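The three measures can be sketched directly from their definitions. The bpref variant below follows the simplified form on the slide, penalising each relevant document by the judged non-relevant documents ranked above it; the toy document IDs are illustrative:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(set(retrieved) & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    return len(set(retrieved) & relevant) / len(relevant)

def bpref(ranking, relevant, nonrelevant):
    """For each relevant doc, penalise by the fraction of judged
    non-relevant docs ranked above it; unjudged docs are ignored."""
    R = len(relevant)
    total, nonrel_above = 0.0, 0
    for d in ranking:
        if d in nonrelevant:
            nonrel_above += 1
        elif d in relevant:
            total += 1 - min(nonrel_above, R) / R
    return total / R

# The slide's example: 6 of 10 retrieved are relevant, 12 relevant overall.
relevant = {f"rel{i}" for i in range(12)}
retrieved = [f"rel{i}" for i in range(6)] + [f"non{i}" for i in range(4)]
p, r = precision(retrieved, relevant), recall(retrieved, relevant)
```

Because bpref skips unjudged documents entirely, it is less biased against systems (like the GIN later) that surface documents outside the judging pool.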

Slide 50

Slide 50 text

Results

Model            | Bpref          | Prec@10
Terms            | 0.3934         | 0.4753
Bag-of-concepts  | 0.4513 (+15%)  | 0.5395 (+14%)

Slide 51

Slide 51 text

Results
(Figure: per-query bpref of Terms vs. bag-of-concepts under UMLS and SNOMED CT representations; outlier queries Q110, Q128, Q119, Q161, Q117, Q133 highlighted.)

Slide 53

Slide 53 text

Semantics and Inference (so far):
• Bag-of-concepts Model: vocabulary mismatch
• Graph-based Concept Weighting Model: vocabulary mismatch, inference of similarity

Slide 54

Slide 54 text

Graph Weighting
• Graphs capture the innate dependence between medical concepts
• Apply PageRank for concept weighting
(Figure: concept graph for “dental”, “pain”, “lower”, “procedures”, “right” with PageRank scores S(v) on each node.)
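A minimal sketch of PageRank over a concept graph; the toy graph below echoes the slide's example, with made-up edges (a central concept receiving links from the others), not actual SNOMED CT structure:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Iterative PageRank over a directed graph {node: [out-neighbours]}."""
    nodes = list(graph)
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in nodes}
        for u, outs in graph.items():
            if not outs:                        # dangling node: spread evenly
                for v in nodes:
                    new[v] += damping * score[u] / n
            else:
                for v in outs:
                    new[v] += damping * score[u] / len(outs)
        score = new
    return score

# Toy concept graph: "pain" is pointed to by the other concepts,
# so it receives the highest background importance.
g = {"dental": ["pain"], "lower": ["pain"], "procedures": ["pain"], "pain": []}
s = pagerank(g)
```

Distributing a dangling node's mass evenly keeps the scores a probability distribution, so they can be compared across concepts.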

Slide 59

Slide 59 text

Concept Importance Weighting
• Adjust the original concept weight by the “background” importance of the concept in the medical domain:

w(c, d_c) = idf(c) \cdot S(v_c) \cdot \log(|V_s(c)|)

The factors draw on three levels: the document (d_c), the corpus (idf(c)), and the domain (S(v_c) and the neighbourhood V_s(c)).
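A minimal sketch of the weighting formula above; the idf, PageRank score, and neighbourhood-size inputs are made-up values for illustration:

```python
import math

def importance_weight(idf, s_v, n_similar):
    """w(c, d_c) = idf(c) * S(v_c) * log(|V_s(c)|):
    corpus-level rarity (idf) * domain-level PageRank score S(v_c)
    * log of the size of the concept's similarity neighbourhood V_s(c)."""
    return idf * s_v * math.log(n_similar)

# Hypothetical concepts: at equal PageRank score, a rare, well-connected
# concept outweighs a common, sparsely connected one.
w_rare = importance_weight(idf=3.2, s_v=0.04, n_similar=50)
w_common = importance_weight(idf=0.4, s_v=0.04, n_similar=3)
```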

Slide 60

Slide 60 text

Retrieval Results

Run                    | Bpref          | Prec@10
terms-tfidf            | 0.4722         | 0.4882
concepts-tfidf         | 0.4993         | 0.5176
terms-graph            | 0.4393         | 0.4882
concepts-graph         | 0.5050 (+15%)  | 0.5441 (+11%)
concepts-graph-snomed  | 0.5245 (+19%)  | 0.5559 (+14%)

Slide 63

Slide 63 text

Graph Inference Model

Slide 68

Slide 68 text

Retrieval as Inference
• Models retrieval as non-classical implication, d \to q, estimated as P(d \to q) [van Rijsbergen, 1986; Nie, 1989]
• Logical Uncertainty Principle: consider the closest document d' for which d' \to q holds, so that

P(d \to q) \propto \Delta(d, d') \propto \bigotimes_i \Delta(d_{i-1}, d_i)

(Figure: chain of intermediate documents d, d_1, d_2, \ldots, d'.)

Slide 70

Slide 70 text

Graph Inference Model Theory

P(d \to q) = \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \to u_q) \propto \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \mid d)\, \delta(u_d, u_q)

\delta(u, u') = \begin{cases} 1, & \text{if } u = u' \\ \delta_0(u, u'), & \text{if } u\,R\,u' \\ \arg\max_{u_i \in U : u R u_i} \delta(u, u_i) \otimes \delta(u_i, u'), & \text{otherwise} \end{cases}

(Figure: corpus graph annotated with initial probabilities P(u_i \mid d_j) and diffusion factors \delta(u_i, u_j).)

Slide 71

Slide 71 text

Building Blocks
• Information Units (IUs): basic term or concept, u \in U
• Information Relationships: relationships between IUs, R \subseteq U \times U
• Information Graph: G = \langle U, R \rangle
• Queries and documents: q = \langle u_0, \ldots, u_m \rangle, d = \langle u_0, \ldots, u_n \rangle

Slide 73

Slide 73 text

Corpus Graph
Attach documents to their information units: u_2 : \{d_1, d_2\}, u_1 : \{d_1\}, u_0 : \{d_1\}

Slide 74

Slide 74 text

Corpus Graph
Assign initial probabilities: P(u_2 \mid d_1), P(u_2 \mid d_2), P(u_1 \mid d_1), P(u_0 \mid d_1)

Slide 75

Slide 75 text

Corpus Graph
Diffusion factor: the edges between information units carry the diffusion factor, alongside the initial probabilities P(u_2 \mid d_1), P(u_2 \mid d_2), P(u_1 \mid d_1), P(u_0 \mid d_1)

Slide 76

Slide 76 text

Diffusion Factor
• A measure of the strength of association between Information Units:

\delta(u, u') = \begin{cases} 1, & \text{if } u = u' \\ \delta_0(u, u'), & \text{if } u\,R\,u' \\ \arg\max_{u_i \in U : u R u_i} \delta(u, u_i) \otimes \delta(u_i, u'), & \text{otherwise} \end{cases}

\delta_0(u, u') = \alpha\, sim(u, u') + (1 - \alpha)\, rel(u, u'), \qquad 0 \le \alpha \le 1

• Captures both semantic similarity and relationship type
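A sketch of the recursive diffusion factor, assuming constant toy sim and rel functions (α = 0.5) and a small acyclic concept graph; in practice the values would come from corpus statistics and relationship-type weights:

```python
def diffusion(u, v, graph, sim, rel, alpha=0.5, depth=0, max_depth=5):
    """delta(u, v): 1 if u == v; delta_0(u, v) if u R v; otherwise chain
    through the strongest neighbour: max_i delta(u, u_i) * delta(u_i, v).
    max_depth bounds the recursion (the toy graph here is acyclic)."""
    def delta0(a, b):
        return alpha * sim(a, b) + (1 - alpha) * rel(a, b)
    if u == v:
        return 1.0
    if v in graph.get(u, ()):
        return delta0(u, v)
    if depth >= max_depth:
        return 0.0
    return max((delta0(u, ui) *
                diffusion(ui, v, graph, sim, rel, alpha, depth + 1, max_depth)
                for ui in graph.get(u, ())), default=0.0)

# Hypothetical concept graph and constant sim/rel functions.
g = {"dialysis": ["kidney disease"], "kidney disease": ["renal failure"]}
sim = lambda a, b: 0.8
rel = lambda a, b: 0.6
```

Chaining multiplies edge strengths, so association decays with distance: two hops ("dialysis" to "renal failure") score lower than one.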

Slide 77

Slide 77 text

Retrieval Function
• Probability of the implication between document and query: P(d \to q)
• Probability of the implication between a single document IU and a single query IU: P(u_d \to u_q)

Slide 78

Slide 78 text

Retrieval Function...
• Strength of implication is proportional to the diffusion factor: P(u_d \to u_q) \propto \delta(u_d, u_q)
• Including the initial probabilities gives: P(u_d \to u_q) \propto P(u_d \mid d)\, \delta(u_d, u_q)

Slide 80

Slide 80 text

General Retrieval Function
• Evaluating each combination of query IU and document IU:

P(d \to q) = \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \to u_q) \propto \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \mid d)\, \delta(u_d, u_q)

RSV(d, q) = \prod_{u_q \in q} \prod_{u_d \in d} P(u_d \mid d)\, \delta(u_d, u_q)

Slide 81

Slide 81 text

General Retrieval Function
• Evaluating each combination of query IU and document IU, where the aggregation operators \bigoplus and \bigotimes can each be instantiated as a product (\prod) or a sum (\sum):

P(d \to q) \propto \bigoplus_{u_q \in q} \bigotimes_{u_d \in d} P(u_d \mid d)\, \delta(u_d, u_q)

Slide 83

Slide 83 text

Worked Example

q = \langle u_q \rangle, \quad d_1 = \langle u_1, u_2, u_q \rangle, \quad d_2 = \langle u_3, u_q \rangle, \quad d_3 = \langle u_4 \rangle

Postings: u_1 : d_1; u_2 : d_1; u_3 : d_2; u_4 : d_3; u_q : d_1, d_2, q

Slide 85

Slide 85 text

Corpus Graph (worked example)

q = \langle u_q \rangle, \quad d_1 = \langle u_1, u_2, u_q \rangle, \quad d_2 = \langle u_3, u_q \rangle, \quad d_3 = \langle u_4 \rangle

(Figure: graph with nodes u_1 \{d_1\}, u_2 \{d_1\}, u_3 \{d_2\}, u_4 \{d_3\}, u_q \{d_1, d_2\}; relationship types r_a, r_c, r_d; initial probabilities P(u_i \mid d_j); diffusion factors \delta(u_2, u_1), \delta(u_1, u_q), \delta(u_3, u_q), \delta(u_4, u_q).)

Slide 86

Slide 86 text

Document d_1
Contributions to d_1's score:
• P(u_q \mid d_1)
• P(u_1 \mid d_1) \cdot \delta(u_1, u_q)
• P(u_2 \mid d_1) \cdot \delta(u_2, u_1) \cdot \delta(u_1, u_q)

Slide 87

Slide 87 text

Document d_2
Contributions to d_2's score:
• P(u_q \mid d_2)
• P(u_3 \mid d_2) \cdot \delta(u_3, u_q)

Slide 88

Slide 88 text

Document d_3
Contribution to d_3's score:
• P(u_4 \mid d_3) \cdot \delta(u_4, u_q)

Slide 89

Slide 89 text

Graph Inference Model Implementation

Algorithm 1: Pseudo code for efficient depth-first-search graph traversal
Input: Idx, Q, G, k            (index, query, graph, maximum depth)
Output: scores {d_0, ..., d_n} (document scores)

 1: for u_q ∈ Q do
 2:     DFS(u_q, 0)                       ▷ start traversal from query node
 3:
 4: function DFS(u, depth)
 5:     if depth ≤ k then
 6:         for d_i ∈ Idx.docs(u) do      ▷ documents containing u
 7:             scores[d_i] ← scores[d_i] + P(u | d_i) · δ(u, u_q)  ▷ score d_i
 8:         for u' ∈ children(u) do
 9:             DFS(u', depth + 1)        ▷ recursively traverse
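Algorithm 1 can be sketched as runnable Python. The index (node → {doc: P(u|d)}), graph, and diffusion function below mirror the worked example's structure with made-up probabilities, and the stand-in diffusion returns 1 for the query node itself and 0.5 for related nodes:

```python
from collections import defaultdict

def gin_retrieve(query_units, index, graph, diffusion, k=2):
    """Depth-limited DFS from each query node: every document attached
    to a visited node u is credited P(u | d) * delta(u, u_q)."""
    scores = defaultdict(float)

    def dfs(u, u_q, depth):
        if depth > k:
            return
        for d, p in index.get(u, {}).items():   # documents containing u
            scores[d] += p * diffusion(u, u_q)
        for child in graph.get(u, ()):          # assumes an acyclic toy graph
            dfs(child, u_q, depth + 1)

    for u_q in query_units:
        dfs(u_q, u_q, 0)
    return dict(scores)

index = {"uq": {"d1": 0.3, "d2": 0.5}, "u1": {"d1": 0.3},
         "u2": {"d1": 0.4}, "u3": {"d2": 0.5}, "u4": {"d3": 1.0}}
graph = {"uq": ["u1", "u3", "u4"], "u1": ["u2"]}
scores = gin_retrieve(["uq"], index, graph,
                      lambda u, uq: 1.0 if u == uq else 0.5, k=2)
```

Here d_3 still receives a score even though it shares no unit with the query: it is reached purely through the graph, which is the point of the model.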

Slide 90

Slide 90 text

Implementation
• Two parts: indexing and retrieval
• Underlying ontology: SNOMED CT
• Information Unit = SNOMED CT concept
• Relationship = SNOMED CT concept relationship

Slide 92

Slide 92 text

Graph Indexing
• Inverted file index: concept → document
• Ontology (SNOMED CT) stored as a serialised graph

Slide 93

Slide 93 text

Graph Retrieval
Depth-first search from the query node over the corpus graph (initial probabilities P(u_i \mid d_j) on nodes, diffusion factors \delta on edges).

Slide 97

Slide 97 text

Graph Retrieval
The traversal expands level by level from the query node: level 0, level 1, level 2, level 3.

Slide 98

Slide 98 text

Diffusion Factor
• Corpus-driven semantic similarity measure: DocCosine between document vectors
• SNOMED CT relationship types, e.g. Is a, Causative agent or Finding site
• Manually assigned weights per relationship type
(Figure: two concepts, Concept A (u_1) and Concept B (u_2), linked by relationship r_a.)
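A sketch of how δ0 might combine the two signals above: corpus-driven cosine similarity plus a manually weighted relationship type. The REL_WEIGHTS values are assumptions for illustration, not the weights used in this work:

```python
import math

def cosine(vec_a, vec_b):
    """Cosine similarity between sparse term-count vectors (dicts)."""
    dot = sum(v * vec_b.get(t, 0) for t, v in vec_a.items())
    na = math.sqrt(sum(v * v for v in vec_a.values()))
    nb = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Assumed per-relationship-type weights (illustrative values).
REL_WEIGHTS = {"Is a": 0.9, "Finding site": 0.5, "Causative agent": 0.6}

def delta0(u, v, vectors, rel_type, alpha=0.5):
    """delta_0(u, v) = alpha * sim(u, v) + (1 - alpha) * rel(u, v)."""
    return (alpha * cosine(vectors[u], vectors[v])
            + (1 - alpha) * REL_WEIGHTS[rel_type])
```

α trades off the corpus-driven signal against the ontology-driven one; α = 0.5 weighs them equally.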

Slide 99

Slide 99 text

Empirical Evaluation
(Figure: per-query bpref comparing lvl0 vs. lvl1 and lvl0 vs. lvl2.)

Slide 100

Slide 100 text

Retrieval Results

Depth (k)  | Bpref   | Prec@10
terms      | 0.3917  | 0.4975
lvl0       | 0.4290  | 0.5123
lvl1       | 0.4229  | 0.4481†
lvl2       | 0.4138  | 0.4259†

Slide 101

Slide 101 text

Hard Queries
(Figure: per-query bpref of lvl1 and lvl2 against the TREC median.)

Slide 102

Slide 102 text

Per-query Depth

Slide 104

Slide 104 text

Consistent Improvements
Query 171: “Patients with thyrotoxicosis treated with beta blockers”
(Figure: bpref vs. depth for queries 108 and 171; SNOMED CT neighbourhood of Thyrotoxicosis, including Thyrotoxicosis with or without goiter, Thyroid structure and Hyperthyroidism, linked by Is a and Finding site relationships.)
• Valuable related concepts from the ontology
• Inference always improves results, and the diffusion factor controls noise

Slide 106

Slide 106 text

Inference Not Required
Query 104: “Patients diagnosed with localized prostate cancer and treated with robotic surgery”
(Figure: bpref vs. depth for queries 104 and 161; concept neighbourhood around Robot, device, Biomedical device and Surgical, linked by Is a relationships.)
• Easy, unambiguous queries, often with a small number of relevant documents
• Inference degrades performance

Slide 108

Slide 108 text

Queries Requiring Reranking
Query 135: “Cancer patients with liver metastasis treated in the hospital who underwent a procedure”
(Figure: bpref vs. depth for queries 113, 135 and 119; SNOMED CT neighbourhood of Secondary malignant neoplasm of liver, including Malignant neoplasm of liver, Neoplasm, metastatic and Liver structure.)
• Verbose queries with multiple dependent query aspects
• The key query aspects contain many related concepts
• Small amounts of inference required (depth 1-2)

Slide 110

Slide 110 text

Inference of New Relevant Documents
Query 147: “Patients with left lower quadrant abdominal pain”
(Figure: bpref vs. depth for queries 147 and 154; SNOMED CT neighbourhood of Left lower quadrant pain, including Lower abdominal pain, Lower abdomen structure, Abdominal pain and Left sided abdominal pain.)
• Domain knowledge essential in interpreting the query
• Relevant documents retrieved that do not contain any query terms
• Queries with multiple semantic gap issues
• Inference always improves performance

Slide 112

Slide 112 text

Queries Unaffected by Inference
Query 139: “Patients who presented to the emergency room with an actual or suspected miscarriage”
(Figure: bpref vs. depth for queries 137 and 139; SNOMED CT neighbourhood of Termination of pregnancy, Abortion and Disorder of pregnancy.)
• Very hard queries; the semantic gap cannot be bridged
• No domain knowledge available for the terms/concepts in the query
• Inference has no effect

Slide 113

Slide 113 text

Bias in Evaluation
• The GIN retrieves a large number of unjudged documents

Model                   | Unjudged documents in top 20 results | P@20
Terms                   | 210 (2.5 docs/query)                 | 0.4244
Bag-of-concepts (lvl0)  | 257 (3.0 docs/query)                 | 0.4389
Graph model (lvl1)      | 468 (5.5 docs/query)                 | 0.4086
Graph model (lvl2)      | 616 (7.2 docs/query)                 | 0.3630

Slide 115

Slide 115 text

Additional Relevance Assessments
• Recruited 4 UQ medical graduates
• Judged approx. 1000 documents
• Pooled from: bag-of-concepts (lvl0), GIN (lvl1), GIN (lvl2)
(Figure: histogram of judgements (irrelevant, somewhat relevant, highly relevant); 29% of the judged documents were relevant.)

Slide 116

Slide 116 text

GIN Re-evaluation

Qrel set   | System | Bpref        | P@10          | P@20
TREC       | lvl0   | 0.4309       | 0.5123        | 0.4389
TREC       | lvl1   | 0.4294       | 0.4481        | 0.4086
TREC       | lvl2   | 0.4208       | 0.4247        | 0.3630
TREC + UQ  | lvl0   | 0.4252 (-1%) | 0.5415 (+6%)  | 0.4732 (+8%)
TREC + UQ  | lvl1   | 0.4264 (0%)  | 0.5037 (+12%) | 0.4604 (+12%)
TREC + UQ  | lvl2   | 0.4113 (-2%) | 0.4878 (+15%) | 0.4220 (+16%)

Slide 120

Slide 120 text

GIN Re-evaluation
(Figure: per-query Precision@20 for lvl1 under the TREC qrels vs. the TREC + UQ qrels, across all queries.)

Slide 121

Slide 121 text

Discussion

Slide 122

Slide 122 text

Bridging the Semantic Gap

Semantic Gap             | Bag-of-concepts | Graph Weighting | Graph Inference
Vocabulary Mismatch      | ✓               | ✓               | ✓
Granularity Mismatch     |                 |                 | ✓
Conceptual Implication   |                 |                 | ✓
Inference of Similarity  |                 | ✓               | ✓

Slide 126

Slide 126 text

GIN: Unified Model
A unified theoretical model of semantic search as inference:
• Integration of structured domain knowledge (ontologies): corpus graph, concept-based representations
• Statistical information retrieval methods: probabilistic relevance estimation, diffusion factor
Provides the necessary mechanism for inference: GIN traversal

Slide 127

Slide 127 text

Understanding Inference
• Inference can be risky: hard queries
• Definition vs. retrieval inference [Frixione and Lieto, 2012]: the “what” vs. the “how”
• Devise a domain knowledge resource specifically suited to retrieval inference?

Slide 128

Slide 128 text

Resource for Retrieval Inference
• Vocabulary: how things are described, not defined
• Associations: relationships and their strength
• Granularity: quantified parent-child relationships
• Uncertainty: pragmatics (known, suspected)

Slide 129

Slide 129 text

Resource for Retrieval Inference

Requirement   | SNOMED CT
Vocabulary    |
Associations  | partial
Granularity   | partial
Uncertainty   |

Slide 130

Slide 130 text

Successful Semantic Search Model
• Good source of domain knowledge
• Effective mapping of free text
• Adaptive inference mechanism
• Effective evaluation method

Slide 131

Slide 131 text

Future Work
• Adaptive depth: query prediction
• Navigation and visualisation using the GIN
• Query dependence
• Query reduction
• Web search using the GIN

Slide 132

Slide 132 text

Contributions: Models
1. Development and evaluation of concept-based representation for medical IR [Ch. 4]
2. Graph-based Concept Weighting model [Ch. 5]
3. Unified model of semantic search as inference: the Graph Inference Model [Ch. 6]

Slide 133

Slide 133 text

Contributions: Findings
4. Empirical evaluation of all three models [Ch. 6, 8]
   • Understanding when and how to apply inference
   • Quality of the underlying representation
5. Identification of semantic gap problems [Ch. 2]
6. Evaluating semantic search

Slide 134

Slide 134 text

Conclusion
• A significant step forward in the integration of structured domain knowledge and data-driven IR methods
• Allows IR systems to exploit valuable information trapped in domain knowledge resources
• The GIN is generally defined and applicable to other applications wanting to utilise structured knowledge resources for more effective semantic search

Slide 135

Slide 135 text

Acknowledgements
Peter Bruza, Laurianne Sitbon, Michael Lawley, Guido Zuccon