Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Open-Retrieval Conversational Question Answering

Open-Retrieval Conversational Question Answering

Scatter Lab Inc.

July 24, 2020
Tweet

More Decks by Scatter Lab Inc.

Other Decks in Research

Transcript

  1. ѐਃ ѐਃ • SIGIR 20 • Chen Qu, Liu Yang,

    Cen Chen, Minghui Qiu, W. Bruce Croft, Mohit Iyyer • University of Massachusetts Amherst, Ant Financial, Alibaba Group • Conversational searchਸ ਤ೧ ConvQAܳ open retrieval settingਵ۽ ഛ੢ೞח Ѫ੉ ઱ਃ োҳ ਃ૑
  2. ѐਃ ѐਃ • Conversational search਷ information retrieval੄ Ҿӓ੸ੋ ݾ಴઺ী ೞա

    • ୭Ӕ੄ োҳٜ਷ conversational searchܳ response rankingҗ conversational question answering۽ ೧Ѿ • ૊ ױࣽ൤ ׹߸ਸ ઱য૓ candidate setীࢲ ҊܰѢա ઱য૓ passageীࢲ spanਸ ࢶఖ • ੉ח conversational search৔৉ীࢲ retrieval੄ ӝୡ੸ੋ ৉ഝਸ ޖदೞח ߑध • ࠄ ֤ޙ਷ open-retrieval conversational question answering(ORConvQA) settingਸ ઁউೞৈ ޙઁܳ ೧Ѿ
  3. ѐਃ ѐਃ • ORConvQAী ؀ೠ োҳܳ ਤ೧ OR-QuAC ؘ੉ఠ ࣇਸ

    ٜ݅঻ਵݴ ORConvQAܳ ਤೠ end-to-end दझమਸ ҳ୷ೞݴ ౟ےझನݠ ӝ߈੄ retriever, reranker ৬ reader ١ਸ ನೣ • OR-QuACܳ ؀࢚ਵ۽ ೠ ֤ޙ੄ प೷਷ learnable retriever੄ ઺ਃࢿਸ ૐݺ • ژೠ ݽٚ दझమ ҳࢿ ਃࣗ(retriever, reranker ৬ reader)ীࢲ history modelingਸ ࢎਊೞݶ दझమ੉ ௼ѱ ѐࢶ ؼ ࣻ ੓ ਺ਸ ࠁ੐
  4. ORConvQA? Dataset • conversational search systemsਸ ҳ୷ೞӝ ਤೠ ୶о ױ҅۽ࢲ

    ׹߸ਸ Ҋܰ ӝ ੉੹ী retrieve evidenceܳ large collection۽ ࠗఠ Ѩ࢝ 1. ੿ࠁܳ ҳೞח ؀ചܳ ઁҕ(information seeker৬ information provider)৬ ೞח QuAC dataset 2. QuAC ૕ޙਸ context-independentೞѱ ׮द ੘ࢿೠ CANARD dataset 3. Wikipedia passage
  5. CANARD? Dataset • QuAC੄ dialogsח self-containedೞ૑ ঋ׮ח ড੼੉ ੓חؘ ੉ח

    ࠛ৮੹ೠ ୡӝ ૕ޙਵ۽ ੋ೧ ߊࢤ • ৘ܳ ٜয seekerীѱ a Chinese polymathic scientistੋ Zhang Hengী ؀೧ ߓ਋ۄҊ ೮חؘ ୐ ૕ޙ੉ "җ೟җ ӝࣿҗ যڃ ҙ ҅о ੓঻णפө?” ৓਺ • ੉۞ೠ ࠛౠ੿ೞҊ ݽഐೠ ୡӝ ૕ޙ਷ ؀ചܳ ೧ࢳೞӝ য۵ѱ ೞӝ ٸޙী ҕѐ Ѩ࢝ ജ҃ীࢲ ޙઁܳ ঠӝ • CANARD ؘ੉ఠ ࣁ౟ীࢲ ઁҕೞח context-independent rewritesਵ۽ ؀୓ೞৈ ੉ ޙઁܳ ೧Ѿ, Ӓۢ "Zhang Heng੉ җ೟ ӝ ࣿҗ যڃ ҙ҅о ੓঻णפө?"۽ ૕ޙ੉ ؀୓
  6. CANARD? Dataset • ୐ ߣ૩ ૕ޙী ؀೧ࢲ݅ Ү୓ܳ ࣻ೯ೞݶ ؀ച

    ղীࢲ history dependenciesਸ Ӓ؀۽ ਬ૑ೞݶࢲ ؀ചо self-contained • QuAC੄ test set੉ ҕѐغয ੓૑ ঋӝ ٸޙী QuAC੄ dev setਸ ੉ਊೞৈ CANARD੄ test setਸ ݅ٞ • ژೠ QuAC੄ train set੄ 10%ܳ dev۽ ഝਊ. • CANARDী হח QuAC ૕ޙ਷ ತӝ೮ਵݴ ੉ܳ ੉ਊೠ ౵ࢤ ؘ੉ఠ ૘೤ੋ OR-QuAC੄ ؘ੉ఠ ా҅ח ׮਺җ э׮.
  7. Retriever pretraining Training • retrieval scores for the batch •

    to maximize the probability of the gold passage for each question • Pretraining loss Pretraning ੉റী passage encoderח offlineਵ۽ ك׮. Faissܳ ࢎਊ೧ࢲ Ѿҗܳ ࡳই১.
  8. Inference Training • Retrieval Ѿҗ Top-K ޙࢲܳ ݽف ੋಌ۠झ ೞৈ

    п ޙࢲ߹ spanਸ ৘ஏ • Retriever loss + Reranker loss + Reader lossо ઁੌ ੘਷ ޙࢲ੄ spanਸ ୭ઙ ੿׹ਵ۽ ৘ஏ
  9. Competing Method RESULTS • DrQA : TF-IDF + RNN based

    reader • BERTserini : BM25 + BERT reader
 • ORConvQA without history : our method + window size 0 • ORConvQA : our method • Evaluation Metric : word level F1, human equivalence score (HEQ), Mean Reciprocal Rank(MRR), Recall
  10. хࢎ೤פ׮✌ ୶о ૕ޙ ژח ҾӘೠ ੼੉ ੓׮ݶ ঱ઁٚ ইې োۅ୊۽

    োۅ ઱ࣁਃ! ࢲ࢚਋ (ܻࢲ஖ ࢎ੉঱౭झ౟, ೝಯ) [email protected] Linked in. @pingpong