Trec Fedweb14 Talk

Query Expansion For Result Merging
Shriphani Palakodety and Jamie Callan
Language Technologies Institute, Carnegie Mellon University

Result Merging Approach
Merge results returned by different resources for a query
•  Use the Indri search engine to index all returned snippets
   –  Krovetz stemming, discard stopwords (Lemur stopword list)
•  Use several different approaches to form a query
   –  Next slide
•  Rank snippets using Indri's language modeling algorithm
   –  Default parameters
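The ranking step above can be approximated without Indri. The following is a minimal sketch, not the authors' pipeline: it pools snippets, drops a few stopwords, and ranks by Dirichlet-smoothed query likelihood. The function names, the tiny stopword set, and the choice of mu = 2500 (commonly cited as Indri's default smoothing) are assumptions for illustration, and Krovetz stemming is omitted.

```python
import math
from collections import Counter

# Tiny illustrative stopword set (assumption); the slides use the Lemur list.
STOPWORDS = {"the", "a", "of", "in", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no stemming here)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def rank_snippets(query, snippets, mu=2500.0):
    """Rank pooled snippets by Dirichlet-smoothed query log-likelihood."""
    docs = [tokenize(s) for s in snippets]
    collection = Counter(t for d in docs for t in d)
    total = sum(collection.values())
    if total == 0:
        return list(snippets)  # nothing to score against
    query_terms = tokenize(query)
    scored = []
    for snippet, doc in zip(snippets, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            # Background probability, floored so unseen terms avoid log(0).
            p_bg = collection[term] / total if collection[term] else 0.5 / total
            score += math.log((tf[term] + mu * p_bg) / (len(doc) + mu))
        scored.append((score, snippet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored]

snippets = [
    "cooking pasta recipes",
    "italian job movie review",
    "job listings in rome",
]
ranked = rank_snippets("the italian job", snippets)
```

With very short snippets and a large mu, scores differ only slightly, which is one reason the smoothing parameter may matter more here than in full-document retrieval.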

Research Questions
Do query transformations improve result merging?
Transformations investigated
•  identity: The query itself
•  sdm: The sequential dependency model (Metzler & Croft)
•  expansion with word vectors: Add additional terms to the query

Sequential Dependency Models
Original query: the italian job
Indri SDM query:
#weight( w1 #combine( the italian job )
         w2 #combine( #1(the italian) #1(italian job) )
         w3 #combine( #uw8(the italian) #uw8(italian job) ) )
•  #combine: probabilistic AND
•  #1: Bigram (terms adjacent and in order)
•  #uw8: Unordered window of size 8
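The query template above is mechanical enough to generate from any keyword query. Here is a small sketch of one way to do it; the function name `sdm_query` is hypothetical, and the default weights 0.85 / 0.10 / 0.05 are an assumption (the slides leave w1, w2, w3 symbolic).

```python
def sdm_query(query, w1=0.85, w2=0.10, w3=0.05, window=8):
    """Build an Indri sequential-dependence query string from a keyword query.

    The weights are illustrative defaults, not values taken from the slides.
    """
    terms = query.split()
    unigrams = " ".join(terms)
    if len(terms) < 2:
        # No adjacent pairs exist, so fall back to a plain unigram query.
        return f"#combine( {unigrams} )"
    pairs = list(zip(terms, terms[1:]))
    ordered = " ".join(f"#1({a} {b})" for a, b in pairs)
    unordered = " ".join(f"#uw{window}({a} {b})" for a, b in pairs)
    return (
        f"#weight( {w1} #combine( {unigrams} ) "
        f"{w2} #combine( {ordered} ) "
        f"{w3} #combine( {unordered} ) )"
    )

print(sdm_query("the italian job"))
```

Only adjacent term pairs are used, which matches the example query on the slide.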

Word2Vec
Each word represented by an n-dimensional continuous vector
•  n = 300 for our system
•  Vectors preserve semantic and syntactic similarities
   –  vectors for cat and dog are similar
•  Used vectors trained on a Google News corpus provided by Mikolov et al.
Add terms to query based on how close their vectors are to original query terms

Expansion Strategies
Add terms to query based on how close their vectors are to original query terms
•  Obtain the average word vector v for query q
   –  Average the word vectors vi for each query term qi ∈ q
•  Expansion 1: Select k expansion terms
   –  The k terms that are closest to v
•  Expansion 2: Select k expansion terms for each query term
   –  The k terms that are closest to qi
   –  k = 3 (set by a parameter sweep over FW13 data)
•  Distances measured using Euclidean distance with a threshold of 0.7
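The two strategies above can be sketched on toy data. This is an illustration under stated assumptions, not the authors' code: the 3-dimensional vectors in `VECTORS` are invented stand-ins for the 300-dimensional embeddings, the function names are hypothetical, and the 0.7 threshold is interpreted as a maximum allowed Euclidean distance, which the slides do not spell out.

```python
import math

# Invented toy vectors (assumption); real ones would be 300-dimensional.
VECTORS = {
    "italian": [1.0, 0.0, 0.0],
    "job":     [0.0, 1.0, 0.0],
    "italy":   [0.9, 0.1, 0.0],
    "rome":    [0.8, 0.0, 0.3],
    "work":    [0.1, 0.9, 0.0],
    "career":  [0.0, 0.8, 0.3],
    "banana":  [0.0, 0.0, 5.0],
}

def nearest_terms(target, exclude, k=3, max_dist=0.7):
    """Up to k vocabulary terms within max_dist of target, nearest first."""
    candidates = []
    for word, vec in VECTORS.items():
        if word in exclude:
            continue
        dist = math.dist(vec, target)  # Euclidean distance
        if dist <= max_dist:
            candidates.append((dist, word))
    return [word for _, word in sorted(candidates)[:k]]

def expansion_1(query_terms, k=3):
    """Expansion 1: the k terms closest to the average query vector v."""
    known = [VECTORS[t] for t in query_terms if t in VECTORS]
    if not known:
        return []
    v = [sum(column) / len(known) for column in zip(*known)]
    return nearest_terms(v, set(query_terms), k)

def expansion_2(query_terms, k=3):
    """Expansion 2: up to k terms closest to each individual query term."""
    added = []
    for term in query_terms:
        if term not in VECTORS:
            continue
        for word in nearest_terms(VECTORS[term], set(query_terms), k):
            if word not in added:
                added.append(word)
    return added

query = ["italian", "job"]
print(expansion_1(query, k=2))
print(expansion_2(query))
```

Expansion 1 draws terms near the query's centroid, while Expansion 2 can add up to k terms per query term, so it tends to produce a longer expanded query.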

Results
Method        nDCG@20  nDCG@100  nDCG@20 dups  nDCG@20 loc  nDCG@100 loc  nDCG-IA@20
identity      0.277    0.316     0.312         0.379        0.623         0.098
sdm           0.276    0.315     0.315         0.379        0.623         0.096
expansion-1   0.285    0.318     0.322         0.389        0.628         0.101
expansion-2   0.286    0.319     0.320         0.395        0.632         0.102
Track Best    0.323    0.321     0.361         0.446        0.629         0.107

Conclusions
All the approaches produce mostly similar performance
•  SDM was slightly better than a plain query
   –  Bigrams and windows don't help much in snippets
•  Expansion was slightly better than not expanding
   –  The two expansion methods performed about equally
