Trec Fedweb14 Talk - Speaker Deck

Slide 1

Slide 1 text

Query Expansion for Result Merging
Shriphani Palakodety and Jamie Callan
Language Technologies Institute
Carnegie Mellon University

Slide 2

Slide 2 text

Result Merging Approach
Merge results returned by different resources for a query
•  Use the Indri search engine to index all returned snippets
   –  Krovetz stemming, discard stopwords (Lemur stopword list)
•  Use several different approaches to form a query
   –  See the next slide
•  Rank snippets using Indri's language modeling algorithm
   –  Default parameters
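The ranking step above can be illustrated with a small query-likelihood scorer. This is only a sketch under stated assumptions, not Indri itself: it uses Dirichlet-smoothed unigram scoring with mu = 2500 (assumed here to correspond to Indri's default smoothing), skips Krovetz stemming entirely, and substitutes a tiny hypothetical stopword set for the Lemur list. The names `tokenize` and `rank_snippets` are made up for illustration.

```python
import math
from collections import Counter

# Tiny stand-in stopword set; the talk used the Lemur stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no stemming here)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def rank_snippets(query, snippets, mu=2500.0):
    """Return snippet indices ordered by Dirichlet-smoothed query likelihood."""
    docs = [Counter(tokenize(s)) for s in snippets]
    collection = Counter()
    for doc in docs:
        collection.update(doc)
    total = sum(collection.values())
    q_terms = tokenize(query)

    scored = []
    for idx, doc in enumerate(docs):
        doc_len = sum(doc.values())
        score = 0.0
        for term in q_terms:
            p_coll = collection[term] / total if total else 0.0
            if p_coll == 0.0:
                continue  # term never seen in any snippet: skip it
            score += math.log((doc[term] + mu * p_coll) / (doc_len + mu))
        scored.append((score, idx))

    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]
```

Because all snippets from all resources are scored against one shared collection model, the resulting order is a single merged ranking rather than one ranking per resource.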

Slide 3

Slide 3 text

Research Questions
Do query transformations improve result merging?
Transformations investigated
•  identity: The query itself
•  sdm: The sequential dependency model (Metzler & Croft)
•  expansion with word vectors: Add additional terms to the query

Slide 4

Slide 4 text

Sequential Dependency Models
Original query: the italian job
Indri SDM query:
#weight( w1 #combine( the italian job )
         w2 #combine( #1(the italian) #1(italian job) )
         w3 #combine( #uw8(the italian) #uw8(italian job) ) )
•  #combine: probabilistic AND
•  #1: Bigram (exact ordered phrase)
•  #uw8: Unordered window of size 8
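The SDM query shape shown on this slide can be generated mechanically from the query terms. The sketch below is illustrative only: the function name `sdm_query` is hypothetical, and the weights 0.85 / 0.10 / 0.05 are values commonly used in the literature, assumed here because the slide leaves w1, w2 and w3 unspecified.

```python
def sdm_query(query, w_term=0.85, w_ordered=0.10, w_unordered=0.05):
    """Build an Indri-style sequential dependency query string.

    The three weights are assumptions; the slide only names them w1, w2, w3.
    """
    terms = query.split()
    if len(terms) < 2:
        # No adjacent pairs exist, so fall back to a plain unigram query.
        return "#combine( " + " ".join(terms) + " )"

    # Adjacent term pairs, e.g. "the italian", "italian job".
    pairs = [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    unigrams = " ".join(terms)
    ordered = " ".join(f"#1({p})" for p in pairs)
    unordered = " ".join(f"#uw8({p})" for p in pairs)

    return (
        f"#weight( {w_term} #combine( {unigrams} ) "
        f"{w_ordered} #combine( {ordered} ) "
        f"{w_unordered} #combine( {unordered} ) )"
    )
```

Calling `sdm_query("the italian job")` yields a string with the same three components as the slide: the unigrams, the `#1` bigrams, and the `#uw8` unordered windows.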

Slide 5

Slide 5 text

Word2Vec
Each word is represented by an n-dimensional continuous vector
•  n = 300 for our system
•  Vectors preserve semantic and syntactic similarities
   –  Vectors for cat and dog are similar
•  Used vectors trained on a Google News corpus provided by Mikolov et al.
Add terms to the query based on how close their vectors are to the original query terms

Slide 6

Slide 6 text

Expansion Strategies
Add terms to the query based on how close their vectors are to the original query terms
•  Obtain the average word vector v for query q
   –  Average the word vectors vi for each query term qi ∈ q
•  Expansion 1: Select k expansion terms
   –  The k terms that are closest to v
•  Expansion 2: Select k expansion terms for each query term
   –  The k terms that are closest to qi
   –  k = 3 (set by a parameter sweep over FW 13 data)
•  Distances measured using Euclidean distance with a threshold of 0.7
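The two expansion strategies can be sketched with NumPy over a plain dictionary of word vectors. Several things here are assumptions rather than details from the talk: the helper names (`nearest_terms`, `expansion_1`, `expansion_2`) are hypothetical, the 0.7 threshold is interpreted as a maximum allowed Euclidean distance, query terms themselves are excluded from the candidates, and any normalization of the vectors is left to the caller.

```python
import numpy as np


def nearest_terms(target, vectors, exclude, k=3, max_dist=0.7):
    """Up to k vocabulary terms within max_dist (Euclidean) of target."""
    candidates = []
    for term, vec in vectors.items():
        if term in exclude:
            continue
        dist = float(np.linalg.norm(vec - target))
        if dist <= max_dist:  # assumed reading of the 0.7 threshold
            candidates.append((dist, term))
    candidates.sort()
    return [term for _, term in candidates[:k]]


def expansion_1(query_terms, vectors, k=3, max_dist=0.7):
    """Expansion 1: k terms closest to the average query vector v."""
    known = [t for t in query_terms if t in vectors]
    if not known:
        return []
    v = np.mean([vectors[t] for t in known], axis=0)
    return nearest_terms(v, vectors, set(query_terms), k, max_dist)


def expansion_2(query_terms, vectors, k=3, max_dist=0.7):
    """Expansion 2: up to k terms closest to each individual query term."""
    added = []
    for term in query_terms:
        if term not in vectors:
            continue  # out-of-vocabulary query term: nothing to expand
        for cand in nearest_terms(vectors[term], vectors,
                                  set(query_terms), k, max_dist):
            if cand not in added:
                added.append(cand)
    return added
```

With real 300-dimensional vectors the dictionary would be loaded from the pretrained model; a toy dictionary of 2-D arrays is enough to see that Expansion 1 picks terms near the query centroid while Expansion 2 picks neighbours of each term separately.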

Slide 7

Slide 7 text

Results
Method        nDCG@20  nDCG@100  nDCG@20 dups  nDCG@20 loc  nDCG@100 loc  nDCG-IA@20
identity      0.277    0.316     0.312         0.379        0.623         0.098
sdm           0.276    0.315     0.315         0.379        0.623         0.096
expansion-1   0.285    0.318     0.322         0.389        0.628         0.101
expansion-2   0.286    0.319     0.320         0.395        0.632         0.102
Track Best    0.323    0.321     0.361         0.446        0.629         0.107

Slide 8

Slide 8 text

Conclusions
All the approaches produce mostly similar performance
•  SDM was slightly better than a plain query
   –  Bigrams and windows don't help much in snippets
•  Expansion was slightly better than not expanding
   –  The two expansion methods performed about equally