Research Paper Introduction in IR Reading 2022 Fall

Takuya Asano
November 12, 2022

IR Reading 2022 Fall: https://sigir.jp/post/2022-11-12-irreading_2022fall/

Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, Yeyun Gong, Qi Chen, Fan Yang, Hao Sun, Yingxia Shao, and Xing Xie. 2022. Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22). Association for Computing Machinery, New York, NY, USA, 1513–1523. https://doi.org/10.1145/3477495.3531799

  1. Takuya Asano (takuya-a) Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings. IR Reading 2022 Fall
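  2. Vector Search and ANN [Background]
    • Vector search over embeddings is becoming widespread
    • Used in search engines, recommender systems, and so on
    • Documents are selected by the similarity between the query embedding and the document embeddings
    • In large-scale vector search, approximate nearest neighbor (ANN) search is a key component
    • In real-world settings, linear search is not practical
    • ANN provides a trade-off among speed, memory usage, and accuracy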
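For reference, the linear search that ANN replaces can be sketched as follows. This is a minimal illustration, assuming inner product as the similarity; the function names and toy vectors are hypothetical and not taken from the slides.

```python
def inner_product(u, v):
    """Similarity between two embeddings (assumed here to be the inner product)."""
    return sum(a * b for a, b in zip(u, v))


def linear_search(query, docs, k):
    """Exact top-k search: scores every document, so cost grows as O(N * d)."""
    scored = [(inner_product(query, d), i) for i, d in enumerate(docs)]
    # Sort by descending score, breaking ties by document ID.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


# Toy document embeddings (hypothetical values).
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
top2 = linear_search([1.0, 0.2], docs, k=2)
print(top2)
```

Every query touches every document embedding, which is why this approach stops being practical as the collection grows.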
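  3. Vector Quantization (VQ) [Background]
    • Data structures and algorithms for ANN
    • Represent a set of vectors by K centroids
    • Centroids are computed from the vector set in advance
    • To encode a vector, find its nearest centroid and record only that centroid's ID
    • The original vector is encoded by an ID alone, so the representation is compact
    • Increasing K improves approximation accuracy, but makes search slower and memory usage larger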
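The encoding described above can be sketched in a few lines. This is a minimal illustration, assuming plain k-means (Lloyd's algorithm) for computing the centroids and squared Euclidean distance for "nearest"; the slides do not state which clustering method is used, and all names and data are hypothetical.

```python
import random


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(vec, centroids):
    """ID of the closest centroid; this ID is the entire code for vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def train_centroids(vectors, k, iters=10, seed=0):
    """Plain k-means (an assumption) to compute K centroids in advance."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids


# Toy data: two well-separated groups (hypothetical values).
vectors = [[0.0, 1.0], [0.2, 1.0], [5.0, 1.0], [5.2, 1.0]]
centroids = train_centroids(vectors, k=2)
codes = [nearest(v, centroids) for v in vectors]   # one ID per vector
reconstructed = [centroids[c] for c in codes]      # lossy approximation
print(codes, reconstructed)
```

Only `codes` and the K centroids need to be stored; the reconstruction is approximate, which is the accuracy cost mentioned above.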
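  4. Product Quantization (PQ) [Background]
    • Split the vector's dimensions into M parts and apply vector quantization to each part
    • The set of representative vectors for each part is called a codebook
    • Split the input vector into M sub-vectors, find the nearest representative vector in each codebook, and record its ID
    • The input vector is encoded with only M IDs
    • Good memory efficiency and good approximation accuracy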
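A small sketch of PQ encoding and decoding may make the steps above concrete. The codebooks below are hand-written toy values (an assumption for illustration; in practice each codebook would be learned, for example by k-means on its sub-space), and the function names are hypothetical.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def split(vec, m):
    """Split a d-dimensional vector into m equal sub-vectors (requires d % m == 0)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def pq_encode(vec, codebooks):
    """Return one codeword ID per sub-vector, i.e. M IDs in total."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda j: sq_dist(sub, cb[j]))
            for sub, cb in zip(subs, codebooks)]


def pq_decode(codes, codebooks):
    """Concatenate the selected codewords to approximate the original vector."""
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out


# Toy codebooks for M = 2 sub-spaces with 2 codewords each (hypothetical values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # codebook for dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],   # codebook for dimensions 2-3
]
codes = pq_encode([0.9, 1.1, 0.8, 0.1], codebooks)
approx = pq_decode(codes, codebooks)
print(codes, approx)
```

With M sub-vectors and K codewords per codebook, each vector is stored in M * log2(K) bits, regardless of the original dimensionality.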
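  5. Inverted File (IVF) [Background]
    • Uses an inverted index as an auxiliary data structure
    • Nearby vectors are grouped into the same inverted list
    • Fast, because few elements have to be scanned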
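The following sketch illustrates the IVF idea under stated assumptions: the centroids are taken as given (for example from k-means), squared Euclidean distance is used, and `nprobe` (the number of inverted lists scanned per query) is a hypothetical parameter name borrowed from common usage, not from the slides.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_ivf(vectors, centroids):
    """Assign each vector ID to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for doc_id, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
        lists[c].append(doc_id)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    candidates = [d for c in order[:nprobe] for d in lists[c]]
    candidates.sort(key=lambda d: sq_dist(query, vectors[d]))
    return candidates[:k]


# Toy data (hypothetical values); centroids are assumed to be precomputed.
vectors = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
centroids = [[0.1, 0.05], [5.05, 4.95]]
lists = build_ivf(vectors, centroids)
hits = ivf_search([4.9, 5.2], vectors, centroids, lists, nprobe=1, k=1)
print(lists, hits)
```

With `nprobe=1` only half of the toy vectors are scanned; raising `nprobe` scans more lists and trades speed for accuracy.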

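  6. Distill-VQ [Summary]
    • Performs ANN with vector quantization that combines IVF and PQ
    • Optimizes the following with contrastive learning:
      • IVF centroids
      • PQ codebooks
      • The encoder for query embeddings
    • Trains with well-trained dense embeddings as the teacher and the components above as the students
    • SOTA on multiple datasets and multiple tasks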
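  7. Distill-VQ: Workflow [Method]
    • Preparation
      • Compute embeddings for all documents (kept fixed in Distill-VQ)
      • Initialize IVF and PQ from the document embeddings (compute the centroids)
      • Prepare a well-trained query encoder for computing teacher scores
      • Compute query embeddings with this query encoder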
  8. Distill-VQ: Workflow [Method] Step 1: Compute the query embedding
  9. Distill-VQ: Workflow [Method] Step 2: Sample from the document embeddings prepared in advance
  10. Distill-VQ: Workflow [Method] Step 3: Compute the student score using IVF
  11. Distill-VQ: Workflow [Method] Step 4: Compute the student score using PQ
  12. Distill-VQ: Workflow [Method] Step 5: Compute the teacher score
  13. Distill-VQ: Workflow [Method] Step 6: Compute the similarity between the student scores and the teacher score, and update the model

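  14. Distill-VQ: Detailed Algorithm [Method]
    • Training algorithm (L4 to L7 denote lines of the algorithm listing)
      • L4: Sample candidate documents from the document collection D
      • L5: Compute the teacher's scores
      • L6: Compute the student's scores using IVF and PQ
      • L7: Train IVF, PQ, and the query encoder
      • f: similarity function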
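The scoring steps (L5 and L6) and the comparison by the similarity function f can be sketched as below. Several things here are assumptions made only for illustration and are not confirmed by the slides: f is taken to be KL divergence between softmax-normalized score lists, the IVF student score uses the centroid assigned to each document, the PQ student score uses each document's PQ reconstruction, and the two terms are simply summed. The parameter update in L7 needs an automatic differentiation framework and is left out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_loss(teacher_q, student_q, doc_embs, ivf_embs, pq_embs):
    """Compare teacher scores with IVF-based and PQ-based student scores.

    doc_embs: dense embeddings of the sampled candidate documents (teacher side)
    ivf_embs: assumed IVF centroid assigned to each candidate document
    pq_embs:  assumed PQ reconstruction of each candidate document
    """
    teacher = softmax([dot(teacher_q, d) for d in doc_embs])       # L5
    student_ivf = softmax([dot(student_q, c) for c in ivf_embs])   # L6 (IVF)
    student_pq = softmax([dot(student_q, r) for r in pq_embs])     # L6 (PQ)
    # f is assumed to be KL divergence; the loss would be minimized in L7.
    return kl_divergence(teacher, student_ivf) + kl_divergence(teacher, student_pq)


# Toy example (hypothetical values).
q = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0]]
exact = distill_loss(q, q, docs, docs, docs)                       # lossless student
lossy = distill_loss(q, q, docs, [[0.5, 0.5], [0.5, 0.5]], docs)   # coarse IVF student
print(exact, lossy)
```

When the quantized representations reproduce the teacher's ranking signal exactly, the loss is zero; a coarse quantizer that maps both documents to the same centroid yields a positive loss, which is the signal training would reduce. No relevance labels appear anywhere in this computation.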
  15. Experiment Settings [Experiments]
    • Datasets
      • MS MARCO Passage retrieval: queries from Bing Search
      • Natural Questions (NQ): queries from Google Search
    • Baselines
      • Existing vector quantization methods (IVFPQ, IVFOPQ, ScaNN)
      • Recent joint learning methods (Poeem, JPQ, RepCONC)
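  16. Experiment Settings [Experiments]
    • Two well-trained encoders were tried as teacher models for Distill-VQ
      • AR2-G
      • CoCondenser
      • These models are the most accurate on MS MARCO and NQ
    • Document embeddings
      • The same embeddings are used for all methods to make the comparison fair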
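  17. Overall Performance [Experiments]
    • Compares the impact on retrieval quality against existing methods
    • Consistently and significantly higher performance, and SOTA
    • Two types of encoders, including CoCondenser
    • Two types of datasets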
  18. Explorations of Knowledge Distillation [Experiments]
    • In Distill-VQ, the similarity function and ...

  19. Efficiency and Retrieval Quality [Experiments]
    • Compares the trade-off between speed and recall against the original IVFOPQ in FAISS
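The recall side of this trade-off is typically measured with recall@k. The helper below is a minimal sketch with hypothetical names and toy IDs; it does not reproduce the paper's evaluation code.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant document IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    top = set(retrieved[:k])
    return len(top & set(relevant)) / len(relevant)


# Toy ranking and relevance judgments (hypothetical values).
r_at_2 = recall_at_k([3, 7, 1, 9], relevant=[7, 9], k=2)
r_at_4 = recall_at_k([3, 7, 1, 9], relevant=[7, 9], k=4)
print(r_at_2, r_at_4)
```

Plotting this value against query latency while varying an index parameter, such as the number of inverted lists scanned, gives a speed versus recall curve for each method.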

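  20. Personal Impressions
    • Very interesting as a method that can improve ANN performance even without labeled data
    • It can be applied without changing the data structures or algorithms, so there is no negative impact on query processing speed
    • Promising, since significant performance gains were confirmed on MS MARCO Passage and other datasets
    • When considering real applications, how to retrain IVF, PQ, and the query encoder is a concern
    • The entire ANN index would presumably have to be rebuilt, so the time needed for training is also a concern
    • (This is not a problem for a static index with no updates, but such applications are rare)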