
Trec Fedweb14 Talk


Shriphani Palakodety

November 30, 2014

Transcript

  1. Query Expansion for Result Merging
     Shriphani Palakodety and Jamie Callan
     Language Technologies Institute, Carnegie Mellon University
  2. Result Merging Approach
     Merge results returned by different resources for a query
     • Use the Indri search engine to index all returned snippets
       – Krovetz stemming, discard stopwords (Lemur stopword list)
     • Use several different approaches to form a query
       – Next slide
     • Rank snippets using Indri's language modeling algorithm
       – Default parameters
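The ranking step on this slide can be illustrated with a small stand-in for Indri. This is a minimal sketch, not the authors' pipeline: it scores snippets by query likelihood with Dirichlet smoothing, assuming `mu=2500` as the smoothing value (an assumption about Indri's defaults, not stated in the talk). The stopword set is a tiny hypothetical placeholder for the Lemur list, Krovetz stemming is omitted, and `tokenize` and `rank_snippets` are illustrative names.

```python
import math
from collections import Counter

# Tiny illustrative stopword set; the talk used the Lemur stopword list.
STOPWORDS = {"the", "a", "of", "in", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no stemming in this sketch)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def rank_snippets(query, snippets, mu=2500.0):
    """Return snippet indices ordered by Dirichlet-smoothed query log-likelihood."""
    docs = [Counter(tokenize(s)) for s in snippets]
    collection = Counter()
    for doc in docs:
        collection.update(doc)
    total = sum(collection.values())
    query_terms = tokenize(query)

    scored = []
    for idx, doc in enumerate(docs):
        doc_len = sum(doc.values())
        score = 0.0
        for term in query_terms:
            # A small floor keeps terms unseen in the collection from giving log(0).
            p_coll = max(collection[term], 0.5) / total
            score += math.log((doc[term] + mu * p_coll) / (doc_len + mu))
        scored.append((score, idx))

    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]
```

Calling `rank_snippets("the italian job", snippets)` on the merged snippet pool returns one ranked list covering all resources, which is the merged result.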
  3. Research Questions
     Do query transformations improve result merging?
     Transformations investigated
     • identity: The query itself
     • sdm: The sequential dependency model (Metzler & Croft)
     • expansion with word vectors: Add additional terms to the query
  4. Sequential Dependency Models
     Original query: the italian job
     Indri SDM query:
       #weight(
         w1 #combine( the italian job )
         w2 #combine( #1(the italian) #1(italian job) )
         w3 #combine( #uw8(the italian) #uw8(italian job) )
       )
     • #combine: probabilistic AND
     • #1: Bigram
     • #uw8: Unordered window of size 8
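The SDM query on this slide follows a regular pattern, so it can be generated from any query string. The sketch below builds that pattern; `build_sdm_query` is a hypothetical helper, and the weights are left as arguments because the slide only gives the placeholders w1, w2 and w3.

```python
def build_sdm_query(query, w1, w2, w3, window=8):
    """Build an Indri query string in the SDM shape shown on the slide.

    w1, w2, w3 weight the unigram, ordered-bigram and unordered-window
    parts; the talk does not state the values that were used.
    """
    terms = query.split()
    if len(terms) < 2:
        # No adjacent pairs exist, so fall back to a plain #combine query.
        return "#combine( {} )".format(" ".join(terms))

    pairs = [" ".join(pair) for pair in zip(terms, terms[1:])]
    unigrams = " ".join(terms)
    ordered = " ".join("#1({})".format(p) for p in pairs)
    unordered = " ".join("#uw{}({})".format(window, p) for p in pairs)

    return (
        "#weight( "
        "{} #combine( {} ) "
        "{} #combine( {} ) "
        "{} #combine( {} ) )"
    ).format(w1, unigrams, w2, ordered, w3, unordered)
```

With illustrative weights, `build_sdm_query("the italian job", 0.8, 0.1, 0.1)` reproduces the structure above, with numbers substituted for w1, w2 and w3.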
  5. Word2Vec
     Each word represented by an n-dimensional continuous vector
     • n = 300 for our system
     • Vectors preserve semantic and syntactic similarities
       – vectors for cat and dog are similar
     • Used vectors trained on a Google News corpus provided by Mikolov et al.
     Add terms to query based on how close their vectors are to original query terms
  6. Expansion Strategies
     Add terms to query based on how close their vectors are to original query terms
     • Obtain the average word vector v for query q
       – Average the word vectors vi for each query term qi ∈ q
     • Expansion 1: Select k expansion terms
       – The k terms that are closest to v
     • Expansion 2: Select k expansion terms for each query term
       – The k terms that are closest to qi
       – k = 3 (set by a parameter sweep over FW 13 data)
     • Distances measured using Euclidean distance with a threshold of 0.7
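The two expansion strategies can be sketched on toy data. This is one illustrative reading of the slide, not the authors' code: it assumes the 0.7 threshold is a maximum allowed Euclidean distance and that query terms themselves are never returned as expansions. The 2-dimensional `TOY_VECTORS` are made up and stand in for the 300-dimensional word2vec vectors, and all function names are hypothetical.

```python
import math

# Made-up 2-d vectors standing in for 300-d word2vec embeddings.
TOY_VECTORS = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "kitten": [1.0, 0.2],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.9],
    "banana": [-1.0, -1.0],
}


def nearest_terms(target, vectors, exclude, k, max_dist):
    """Up to k terms within max_dist (Euclidean) of target, nearest first."""
    candidates = []
    for term, vec in vectors.items():
        if term in exclude:
            continue
        dist = math.dist(target, vec)
        if dist <= max_dist:  # assumed meaning of the slide's threshold
            candidates.append((dist, term))
    candidates.sort()
    return [term for _, term in candidates[:k]]


def expansion_1(query_terms, vectors, k=3, max_dist=0.7):
    """Expansion 1: k terms closest to the average query vector v."""
    known = [vectors[t] for t in query_terms if t in vectors]
    if not known:
        return []
    dim = len(known[0])
    avg = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    return nearest_terms(avg, vectors, set(query_terms), k, max_dist)


def expansion_2(query_terms, vectors, k=3, max_dist=0.7):
    """Expansion 2: k terms closest to each query term, duplicates dropped."""
    added = []
    for term in query_terms:
        if term not in vectors:
            continue
        near = nearest_terms(vectors[term], vectors, set(query_terms), k, max_dist)
        for candidate in near:
            if candidate not in added:
                added.append(candidate)
    return added
```

For the toy query `["cat", "car"]`, Expansion 1 searches around the midpoint of the two query vectors, while Expansion 2 searches around each term separately, so the two strategies can return different terms or different orderings.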
  7. Results

     Method        nDCG@20  nDCG@100  nDCG@20 dups  nDCG@20 loc  nDCG@100 loc  nDCG-IA@20
     identity      0.277    0.316     0.312         0.379        0.623         0.098
     sdm           0.276    0.315     0.315         0.379        0.623         0.096
     expansion-1   0.285    0.318     0.322         0.389        0.628         0.101
     expansion-2   0.286    0.319     0.320         0.395        0.632         0.102
     Track Best    0.323    0.321     0.361         0.446        0.629         0.107
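To make the size of these differences concrete, the sketch below computes the relative change in nDCG@20 against the unmodified query. The scores are copied from the nDCG@20 column above; the function and variable names are illustrative.

```python
# nDCG@20 values copied from the results table.
NDCG_AT_20 = {
    "identity": 0.277,
    "sdm": 0.276,
    "expansion-1": 0.285,
    "expansion-2": 0.286,
    "Track Best": 0.323,
}


def relative_change(score, baseline):
    """Percentage change of score relative to baseline."""
    return 100.0 * (score - baseline) / baseline


baseline = NDCG_AT_20["identity"]
changes = {
    method: round(relative_change(score, baseline), 1)
    for method, score in NDCG_AT_20.items()
    if method != "identity"
}
```

On this measure the two expansion runs sit about 3% above the unmodified query, SDM is within half a percent of it, and the best run in the track is roughly 17% higher.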
  8. Conclusions
     All the approaches produce mostly similar performance
     • SDM was slightly better than a plain query
       – Bigrams and windows don't help much in snippets
     • Expansion was slightly better than not expanding
       – The two expansion methods performed about equally