Research Paper Introduction in IR Reading 2022 Fall

Takuya Asano
November 12, 2022

IR Reading 2022 Fall: https://sigir.jp/post/2022-11-12-irreading_2022fall/

Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, Yeyun Gong, Qi Chen, Fan Yang, Hao Sun, Yingxia Shao, and Xing Xie. 2022. Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22). Association for Computing Machinery, New York, NY, USA, 1513–1523. https://doi.org/10.1145/3477495.3531799


Transcript

  1. Takuya Asano (takuya-a). Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings. IR Reading 2022 Fall
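  2. Vector Search and ANN (Background)
     • Vector search over embeddings is becoming widespread (search engines, recommender systems, and so on).
     • Documents are selected by the similarity between the query embedding and the document embeddings.
     • In large-scale vector search, approximate nearest neighbor search (ANN) is a key component.
     • In real-world settings, linear search is not practical.
     • ANN offers a trade-off among speed, memory usage, and accuracy.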
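For reference, the exact search that ANN approximates can be sketched as a linear scan. This is a minimal sketch in plain Python; the inner product as the similarity measure and the names `linear_search` and `docs` are assumptions for illustration and do not come from the deck.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_search(query, docs, k):
    """Exact top-k by inner product. Every document is scored,
    so the cost grows linearly with the collection size."""
    ranked = sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)
    return ranked[:k]


# Toy 2-dimensional embeddings (illustrative values only).
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
top2 = linear_search([1.0, 0.2], docs, k=2)
print(top2)
```

Because every query touches every document, this approach stops being practical at scale, which is the motivation for the quantization methods on the following slides.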
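  3. Vector Quantization (VQ) (Background)
     • A data structure and algorithm for ANN.
     • A set of vectors is represented by K centroids.
     • The centroids are computed from the vector set in advance.
     • To encode a vector, find its nearest centroid and record only that centroid's ID.
     • The original vector is encoded as a single ID, so the representation is compact.
     • Increasing K improves the approximation accuracy, but makes search slower and increases memory usage.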
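The encoding step described above can be sketched as follows. This is a minimal sketch, assuming plain k-means (Lloyd's algorithm) for the centroid computation and a simple evenly spaced initialization; both choices, and all function names, are illustrative assumptions rather than details taken from the deck.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """ID of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def train_centroids(vectors, k, iters=10):
    """Plain k-means. Initial centroids are evenly spaced picks from
    the data (a simplification chosen for determinism)."""
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


vectors = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
centroids = train_centroids(vectors, k=2)
codes = [nearest(centroids, v) for v in vectors]  # one integer ID per vector
print(codes)
```

Each vector is stored as one integer, and the centroid with that ID serves as its approximation at search time.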
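  4. Product Quantization (PQ) (Background)
     • Split the vector's dimensions into M parts and apply Vector Quantization to each part.
     • The set of representative vectors for each part is called a codebook.
     • Split the input vector into M sub-vectors; for each, find the nearest representative vector in the corresponding codebook and record its ID.
     • The input vector is encoded with only M IDs.
     • Both memory efficiency and approximation accuracy are good.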
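The encode and decode steps of PQ can be sketched like this. It is a minimal sketch assuming the codebooks have already been trained; the hand-written codebook values and the names `pq_encode` and `pq_decode` are hypothetical, chosen only to make the example small.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Split v into m equal-length sub-vectors (len(v) must be divisible by m)."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]


def pq_encode(v, codebooks):
    """Return one codeword ID per sub-vector."""
    subs = split(v, len(codebooks))
    return [min(range(len(cb)), key=lambda j: sq_dist(cb[j], sub))
            for cb, sub in zip(codebooks, subs)]


def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating the chosen codewords."""
    out = []
    for cb, code in zip(codebooks, codes):
        out.extend(cb[code])
    return out


# M = 2 codebooks with 2 codewords each (illustrative values only).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # codebook for dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],  # codebook for dimensions 2-3
]
codes = pq_encode([0.9, 1.1, 0.8, 0.1], codebooks)
approx = pq_decode(codes, codebooks)
print(codes, approx)
```

With M codebooks of K codewords each, PQ can represent K to the power M distinct reconstructions while storing only M small integers per vector.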
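  5. Inverted File (IVF) (Background)
     • Uses an inverted index as an auxiliary data structure.
     • Nearby vectors are grouped into the same inverted list.
     • Search is fast because few elements are scanned.
     • First, a coarse quantization selects the inverted list(s) to visit.
     • Then the inverted list is scanned to find the nearest-neighbor vectors.
     H. Jégou, M. Douze and C. Schmid, "Product Quantization for Nearest Neighbor Search," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117-128, Jan. 2011, doi: 10.1109/TPAMI.2010.57.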
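The two-stage lookup described above can be sketched as follows. This is a minimal sketch with hand-picked coarse centroids; the `nprobe` parameter name follows common usage in ANN libraries, and the function names are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign every vector ID to the inverted list of its nearest coarse centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for doc_id, v in enumerate(vectors):
        cid = min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))
        lists[cid].append(doc_id)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: sq_dist(centroids[i], query))[:nprobe]
    candidates = [doc_id for cid in probe for doc_id in lists[cid]]
    return sorted(candidates, key=lambda d: sq_dist(vectors[d], query))[:k]


# Illustrative data: three well-separated groups and their coarse centroids.
vectors = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]]
centroids = [[0.1, 0.05], [5.1, 4.95], [9.0, 0.0]]
lists = build_ivf(vectors, centroids)
result = ivf_search([5.1, 5.1], vectors, centroids, lists, nprobe=1, k=2)
print(lists, result)
```

Here only two of the five vectors are compared with the query. Raising `nprobe` scans more lists, trading speed for recall.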
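  6. Distill-VQ (Summary)
     • Performs ANN with vector quantization that combines IVF and PQ.
     • Uses contrastive learning to optimize the following:
       • the IVF centroids
       • the PQ codebooks
       • the query embedding encoder
     • Well-trained dense embeddings act as the teacher, and the components above are trained as students.
     • SOTA on multiple datasets and multiple tasks.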
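  7. Distill-VQ: Workflow (Method)
     • Preparation:
       • Compute the embeddings of all documents (these are kept fixed in Distill-VQ).
       • Initialize IVF and PQ from the document embeddings (compute the centroids).
       • Prepare a well-trained query encoder for computing the teacher scores.
       • Use this query encoder to compute the query embeddings.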
  8. Distill-VQ: Workflow (Method). Step 1: Compute the query embedding. (Xiao et al., 2022)
  9. Distill-VQ: Workflow (Method). Step 2: Sample from the document embeddings prepared in advance. (Xiao et al., 2022)
  10. Distill-VQ: Workflow (Method). Step 3: Compute the student score using IVF. (Xiao et al., 2022)
  11. Distill-VQ: Workflow (Method). Step 4: Compute the student score using PQ. (Xiao et al., 2022)
  12. Distill-VQ: Workflow (Method). Step 5: Compute the teacher score. (Xiao et al., 2022)
  13. Distill-VQ: Workflow (Method). Step 6: Compute the similarity between the student scores and the teacher score, and update the model. (Xiao et al., 2022)
  14. Distill-VQ: Detailed Algorithm (Method)
     • Training algorithm (line numbers refer to the algorithm listing in Xiao et al., 2022):
       • Line 4: Sample candidate documents from the document collection D.
       • Line 5: Compute the teacher scores.
       • Line 6: Compute the student scores using IVF and PQ.
       • Line 7: Train IVF, PQ, and the query encoder.
     • f: similarity function
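The teacher/student scoring in the steps above can be sketched as a loss computation. This is a minimal sketch under several assumptions that are not confirmed by the transcript: scores are inner products, the similarity function f is the KL divergence between softmax-normalized score lists (KL-Div is one of the options named later in the deck), and the student is represented simply by reconstructed document vectors. The gradient update of the IVF centroids, PQ codebooks, and query encoder is omitted.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_loss(query, dense_docs, quantized_docs):
    """Teacher: scores from the dense document embeddings.
    Student: scores from the quantized (reconstructed) embeddings."""
    teacher = softmax([dot(query, d) for d in dense_docs])
    student = softmax([dot(query, d) for d in quantized_docs])
    return kl_div(teacher, student)


# Illustrative values only: three sampled candidate documents.
query = [1.0, 0.5]
dense_docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
quantized_docs = [[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]]
loss_same = distill_loss(query, dense_docs, dense_docs)
loss_diff = distill_loss(query, dense_docs, quantized_docs)
print(loss_same, loss_diff)
```

The loss is zero when the quantized vectors reproduce the teacher's score distribution and grows as quantization distorts it, which is the signal a training loop would minimize.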
  15. Experiment Settings (Experiments)
     • Datasets:
       • MS MARCO Passage retrieval (queries from Bing Search)
       • Natural Questions (NQ) (queries from Google Search)
     • Baselines:
       • Existing vector quantization methods (IVFPQ, IVFOPQ, ScaNN)
       • Recent joint-learning methods (Poeem, JPQ, RepCONC)
     (Xiao et al., 2022)
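  16. Experiment Settings (Experiments)
     • Two well-trained encoders were tried as the teacher model for Distill-VQ:
       • AR2-G
       • CoCondenser
     • These models are the most accurate on MS MARCO and NQ.
     • Document embeddings: the same embeddings are used for all methods so that the comparison is fair.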
  17. Overall Performance (Experiments)
     • Compares the impact on retrieval quality against existing methods.
     • Consistently and significantly higher performance, and SOTA, across:
       • two encoders: AR2-G, CoCondenser
       • two datasets: MS MARCO, NQ
     (Xiao et al., 2022)
  18. Explorations of Knowledge Distillation (Experiments)
     • In Distill-VQ, both the similarity function and the document sampling method can be chosen freely, so the experiments vary them.
     • Ranking-aware similarity functions (KL-Div, ListNet, RankNet) perform better.
     • Combining in-batch sampling with Top-K (IB + Top-K) outperforms using labeled data (GT) (!).
     (Xiao et al., 2022)
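Two of the ranking-aware similarity functions named above can be sketched as follows. These are the textbook forms of the ListNet top-one loss and a RankNet-style pairwise logistic loss, with the teacher's scores standing in for the ranking target; how the paper instantiates them exactly is not stated in this transcript, so treat the details as assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(teacher, student):
    """ListNet (top-one): cross entropy between the two softmax distributions."""
    p, q = softmax(teacher), softmax(student)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def ranknet_loss(teacher, student):
    """RankNet-style: logistic loss on every pair the teacher orders as i above j."""
    loss = 0.0
    for i in range(len(teacher)):
        for j in range(len(teacher)):
            if teacher[i] > teacher[j]:
                loss += math.log1p(math.exp(-(student[i] - student[j])))
    return loss


# Illustrative scores for three candidate documents.
teacher = [3.0, 1.0, 0.0]
student_good = [2.5, 1.0, 0.2]  # same ordering as the teacher
student_bad = [0.2, 1.0, 2.5]   # reversed ordering
print(listnet_loss(teacher, student_good), listnet_loss(teacher, student_bad))
print(ranknet_loss(teacher, student_good), ranknet_loss(teacher, student_bad))
```

Both losses penalize a student whose ordering of candidates disagrees with the teacher's, rather than requiring the raw score values to match.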
  19. Efficiency and Retrieval Quality (Experiments)
     • Compares the speed/recall trade-off against the original IVFOPQ in FAISS.
     • Distill-VQ outperformed IVFOPQ in all settings.
     (Xiao et al., 2022)
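For readers unfamiliar with the quality axis of this trade-off, recall at a cutoff k can be sketched as below. This is a minimal sketch using the common definition (fraction of relevant documents found in the top k); the exact metric variants and cutoffs used in the paper are not given in this transcript, and the function name is an assumption.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant document IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Illustrative IDs: two of the three relevant documents are in the top 3.
score = recall_at_k([3, 1, 7, 9], {1, 2, 3}, k=3)
print(score)
```

A speed/recall curve is then obtained by measuring this value together with query latency while varying a search parameter such as the number of inverted lists probed.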
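  20. Personal Impressions
     • Very interesting as a method that can improve ANN performance even without labeled data.
     • It can be applied without changing the data structure or the search algorithm, so it has no negative effect on query processing speed.
     • Promising, since significant improvements were confirmed on MS MARCO Passage and other benchmarks.
     • For real applications, I wonder how retraining of IVF, PQ, and the query encoder should be handled.
     • I expect the entire ANN index would have to be rebuilt, so the time needed for training is also a concern.
     • (This is not a problem for a static index with no updates, but such applications are rare.)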