Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering

Scatter Lab Inc.

August 02, 2019

Transcript

  1. ScatterLab (스캐터랩), everyday-conversation AI: ScatterLab ML Technical Seminar, Session 2 (QA)

    Multi-Step Retriever-Reader Interaction for Scalable Open-Domain Question Answering, ICLR 2019. Presented by 구상준, Machine Learning Engineer.
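  2. #1 Introduction: Problem Description

    • Open-domain question answering: a system that answers questions from data in any available domain.
    • Problem 1: How can we find the relevant documents in a large document collection? (Document Retrieval)
    • Problem 2: How can we find the answer span within the relevant documents? (Machine Reading)
    • Diagram: Document Retrieval takes a query and the documents and returns the relevant document. Machine Reading takes the query and the relevant document and returns an answer span and a score.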
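  3. #1 Introduction: Problem Description

    • Open-domain question answering: a system that answers questions from data in any available domain.
    • Problem 1: How can we find the relevant documents in a large document collection? (Document Retrieval)
      • Problem 1-1: How can we find the correct answer-bearing text in the document collection?
        • When a SQuAD-based QA model such as Dr.QA is applied to open QA, the Retriever does not work well and performance drops sharply (69.5% -> 28.4%, Hun et al.).
      • Problem 1-2: How can we find documents in the collection as quickly as possible?
    • Problem 2: How can we find the answer span within the relevant documents? (Machine Reading)
      • A model in which the Retriever and the Reader are coupled is very hard to train.
    • This paper's conclusion: train the Retriever and the Reader separately!
      • And apply a fast search algorithm directly to the Retriever.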
  4. #2 Method: Architecture

    Diagram: the Retriever takes Query #t (input sentence embedding) and the documents (paragraph embeddings) and returns the relevant document (paragraph embedding). The Reader outputs the answer span and the answer score. The Reasoner (recurrent GRU unit, ReLU feed-forward layer) outputs the Query #t+1 embedding.
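  5. #2 Method: Architecture (Retriever)

    Diagram: Query #t (input sentence embedding) and the documents (paragraph embeddings) go into the Retriever, which returns the relevant document (paragraph embedding).

    • Retriever
      • Input: query sentence embedding Q; paragraph embeddings P, P', P'', …
      • Output: similarity score (P, Q) = <P, Q>
      • Training: label 1 for a related (P, Q) pair, 0 otherwise (Pos-Neg)
    • Questions:
      • How are the sentence embeddings built?
        • Attentive pooling over an LSTM (similar to Dr.QA)
        • The query sentence embedding is built from the text only at the first turn.
      • How is the problem of having too many candidate paragraphs handled?
        • A k-NN algorithm that only compares against nearby (similar) paragraphs.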
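The Retriever's scoring rule above, the inner product <P, Q> followed by keeping the highest-scoring paragraphs, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names and toy vectors are hypothetical, and the exhaustive scan stands in for the fast nearest-neighbor search the slide mentions.

```python
import heapq

def inner_product(p, q):
    """Similarity score <P, Q> between a paragraph and a query embedding."""
    return sum(pi * qi for pi, qi in zip(p, q))

def top_k_paragraphs(query, paragraphs, k):
    """Return the indices of the k paragraphs with the highest inner product.

    Assumption: an exhaustive scan is used here for clarity; a large
    corpus would use a (possibly approximate) nearest-neighbor index.
    """
    scored = ((inner_product(p, query), idx) for idx, p in enumerate(paragraphs))
    return [idx for _, idx in heapq.nlargest(k, scored)]

# Toy embeddings (hypothetical values, for illustration only).
query = [1.0, 0.0, 1.0]
paragraphs = [
    [0.9, 0.1, 0.8],  # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.5, 0.5, 0.5],
]
top = top_k_paragraphs(query, paragraphs, k=2)
print(top)
```

Because the score is a plain inner product, the paragraph embeddings can be computed once and indexed ahead of time, which is what makes a fast search step possible.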
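  6. #2 Method: Architecture (attentive pooling)

    Diagram: word vectors feed a Bi-LSTM that produces hidden states $p_1, p_2, \dots, p_m$; a softmax layer over the dot products $w \cdot p_j$ gives the weights $b_j$.

    $$b_j = \frac{\exp(w \cdot p_j)}{\sum_{j'} \exp(w \cdot p_{j'})}, \qquad s_j = b_j \cdot p_j, \qquad P = W \sum_{j'} s_{j'}$$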
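The pooling formulas on this slide can be traced with a small pure-Python sketch. It assumes the hidden states are already available as plain lists (the Bi-LSTM itself is not modeled), omits any bias terms, and uses hypothetical names and toy values.

```python
import math

def attentive_pool(hidden_states, w, W):
    """Pool per-token states p_1..p_m into a single vector P.

    b_j = softmax_j(w . p_j), s_j = b_j * p_j, P = W * sum_j s_j
    """
    logits = [sum(wi * pi for wi, pi in zip(w, p)) for p in hidden_states]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # the b_j values

    dim = len(hidden_states[0])
    # Weighted sum of the hidden states: sum_j b_j * p_j
    pooled = [sum(b * p[d] for b, p in zip(weights, hidden_states)) for d in range(dim)]
    # Final linear projection by W
    projected = [sum(row[d] * pooled[d] for d in range(dim)) for row in W]
    return projected, weights

# Toy example: two token states, an identity projection (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0]]
P, weights = attentive_pool(states, w, W)
print(P, weights)
```

With the identity projection, P equals the attention-weighted average of the states, so the first token, which aligns with w, dominates the pooled vector.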
  7. #2 Method: Architecture (Reader)

    Diagram: Query #t (input sentence embedding) and the relevant document (paragraph embedding) go into the Reader, which outputs the answer span and the answer score.

    • Reader (BiDAF or Dr. QA architecture)
      • Input: query sentence embedding L; paragraph word embeddings p..
      • Output: span score and span, plus the Reader hidden state
      • Training: span objective
    • Question:
      • How is the hidden state obtained?
        • Soft attention with the query
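The slide says the Reader returns a span and a span score but does not spell out the decoding. A common choice in span-based readers, assumed here rather than taken from the talk, is to pick the pair (start, end) that maximizes the sum of a start score and an end score under a maximum span length. The function name, the `max_len` parameter, and the scores are hypothetical.

```python
def best_span(start_scores, end_scores, max_len=15):
    """Return (start, end, score) maximizing start_scores[i] + end_scores[j]
    subject to i <= j < i + max_len."""
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy per-token scores for a four-token paragraph (hypothetical values).
start = [0.1, 2.0, 0.3, 0.0]
end = [0.0, 0.5, 1.5, 3.0]
span_short = best_span(start, end, max_len=2)
span_long = best_span(start, end, max_len=4)
print(span_short, span_long)
```

The length cap matters: with `max_len=2` the best span covers tokens 1 to 2, while allowing longer spans lets the high end score at token 3 win.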
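  8. #2 Method: Architecture (Multi-step Reasoner)

    Diagram: the Reasoner (recurrent GRU unit, ReLU feed-forward layer) outputs the Query #t+1 embedding.

    • Multi-step Reasoner
      • Input: the previous-step query $q_t$ and the Reader state $S$
      • Output: the newly generated query
      • Training: gold intermediate queries cannot be constructed, so reinforcement learning is used (the signal is how well the Retriever ends up surfacing the answer).
        • State = (query, answer, all documents, top-k paragraphs)
        • Observation = (query, Reader state S)
        • Action = which paragraphs from the full set are selected
        • Reward = Reader F1

    $$q'_{t+1} = \mathrm{GRU}(q_t, S), \qquad q_{t+1} = \mathrm{FFN}(q'_{t+1})$$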
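The two update equations can be illustrated with a small pure-Python sketch. Several details are assumptions, since the slide gives only the GRU and FFN names: the standard GRU gate equations are used with the query as the hidden state and the Reader state as the input, biases are omitted, the feed-forward part is a single ReLU hidden layer, and the weights are random. The class name `QueryReasoner` is hypothetical.

```python
import math
import random

def matvec(M, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class QueryReasoner:
    """Sketch of q'_{t+1} = GRU(q_t, S) followed by q_{t+1} = FFN(q'_{t+1})."""

    def __init__(self, dim, state_dim, seed=0):
        rng = random.Random(seed)
        # Input-to-hidden (W) and hidden-to-hidden (U) weights for the
        # update gate z, the reset gate r, and the candidate state h.
        self.W = {g: rand_matrix(dim, state_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(dim, dim, rng) for g in "zrh"}
        # Two feed-forward layers (assumed shape: dim -> dim -> dim).
        self.F1 = rand_matrix(dim, dim, rng)
        self.F2 = rand_matrix(dim, dim, rng)

    def gru(self, q, s):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], s), matvec(self.U["z"], q))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], s), matvec(self.U["r"], q))]
        rq = [ri * qi for ri, qi in zip(r, q)]
        h = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], s), matvec(self.U["h"], rq))]
        # Interpolate between the old query and the candidate state.
        return [(1 - zi) * qi + zi * hi for zi, qi, hi in zip(z, q, h)]

    def ffn(self, q_prime):
        hidden = [max(0.0, x) for x in matvec(self.F1, q_prime)]  # ReLU
        return matvec(self.F2, hidden)

    def step(self, q, s):
        return self.ffn(self.gru(q, s))

reasoner = QueryReasoner(dim=4, state_dim=6)
q_t = [0.1] * 4            # previous query embedding (toy values)
reader_state = [0.2] * 6   # Reader state S (toy values)
q_next = reasoner.step(q_t, reader_state)
print(q_next)
```

The new query keeps the dimensionality of the old one, so it can be fed straight back into the same inner-product search over paragraph embeddings.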
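  12. #2 Method: Architecture (next iteration)

    Diagram: the same architecture, with the Retriever now taking Query #t+1 (input sentence embedding) together with the documents (paragraph embeddings) and returning the relevant document (paragraph embedding). The Reader outputs the answer span and the answer score, and the Reasoner (recurrent GRU unit, ReLU feed-forward layer) again outputs a new query embedding.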
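  13. #2 Method: Overall

    • Retriever
      • Purpose: extract the candidate paragraphs for a query, and produce the paragraph embeddings.
      • Algorithm:
        • Query: at the first turn the query embedding is built directly from the query; from then on it is generated iteratively.
        • Paragraph: the candidate set is narrowed down with nearest-neighbor search.
    • Reader (BiDAF or Dr. QA)
      • Purpose: find the part of the paragraph that contains the answer.
      • Algorithm: a span algorithm such as BiDAF or Dr. QA.
    • Multi-step Reasoner (Cho 2017, Buck 2018)
      • Purpose: produce the query embedding to use at the next step.
      • Algorithm: GRU + FFN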
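The interaction between the three components summarized above can be sketched as a short loop. This is an illustrative outline under stated assumptions: the three components are passed in as plain callables, the toy `retrieve`, `read`, and `reformulate` functions and the two-paragraph corpus are hypothetical stand-ins, and returning the highest-scoring answer across steps is an assumed aggregation rule.

```python
def multi_step_qa(query_vec, retrieve, read, reformulate, num_steps=3):
    """Run retrieve -> read -> reformulate for num_steps; keep the best answer.

    retrieve(query_vec) -> list of paragraphs
    read(query_vec, paragraphs) -> (answer, score, reader_state)
    reformulate(query_vec, reader_state) -> new query vector
    """
    best_answer, best_score = None, float("-inf")
    for _ in range(num_steps):
        paragraphs = retrieve(query_vec)
        answer, score, reader_state = read(query_vec, paragraphs)
        if score > best_score:
            best_answer, best_score = answer, score
        # The reasoner turns the reader's state into the next query.
        query_vec = reformulate(query_vec, reader_state)
    return best_answer, best_score

# Toy stand-ins (hypothetical): a one-dimensional "query" that indexes a corpus.
corpus = {
    0: "hyperhidrosis is also called diaphoresis",
    1: "hyperhidrosis means excessive sweating",
}

def retrieve(q):
    return [corpus[q[0]]]

def read(q, paragraphs):
    text = paragraphs[0]
    if "sweating" in text:
        return "sweating", 0.9, [1]
    return "hyperhidrosis", 0.4, [1]

def reformulate(q, state):
    return [min(q[0] + state[0], 1)]

answer, score = multi_step_qa([0], retrieve, read, reformulate, num_steps=2)
print(answer, score)
```

Each extra step repeats one retrieval and one read, which is why inference cost grows with the number of steps.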
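  14. #3 Experiment: Experiment Description

    • Quasar-T (2017)
      • Contents: questions + evidence (ClueWeb09 HTML)
      • Construction:
        • Built from roughly 54,000 trivia questions collected by Reddit user 007craft.
        • Each question comes with 100 HTML documents extracted from ClueWeb09.
      • Characteristics:
        • Answers are places in 26.4% of cases and people in 21.5%.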
  15. #3 Experiment: Experiment Description

    • Search QA (2017)
      • Contents: questions + evidence (Google snippets)
      • Construction:
        • Built from J! Archive, which collects the questions from the quiz show Jeopardy!.
        • Evidence comes from the top 40 Google search results.
        • About 140,000 question-answer pairs and about 6,900,000 snippets.
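  16. #3 Experiment: Experiment Description

    • Trivia QA (2017)
      • Contents: questions + evidence (Bing + Wikipedia)
      • Construction:
        • Question-answer pairs were collected and cleaned from 14 quiz sites.
        • Wikipedia pages and the top 10 Bing search results (excluding the quiz sites above) are attached as evidence.
        • About 95,000 question-answer pairs, 650,000 web search results, and 78,000 Wikipedia documents.
      • Characteristics:
        • People (32%), places (23%), and organization names (5%) make up a large share of the answers.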
  17. #3 Experiment: Experiment Description

    • Trivia QA
      • Examples (the answer appears in the evidence); the slide's table has the columns Query, Answer, and Evidence.
        • Query: Miami Beach in Florida borders which ocean? Answer: Atlantic
        • Query: Who was Poopdeck Pappy's most famous son? Answer: Popeye
        • Query: In golf, a six under par score has never been, and is unlikely to ever be recorded, as it requires a hole in one on a par-seven hole. What is the 'mythical' term for such a hypothetical performance? (hint: think about the names of under par scores) Answer: Phoenix
  19. #4 Result: Iteration Example

    • Query: "Diaphoresis" is a medical term for what condition?
      • Step 1: "A Greek term for hyperhidrosis is diaphoresis" / "Also explaining why it is so difficult to diagnose"
      • Step 2: "Hyperhidrosis is a physical condition caused by excessive sweating"
      • Answer: Sweating
    • Query: What is the name of the ship on which Dracula arrived in England in 1897?
      • Step 1: "The untold story of Dracula's voyage on the merchant ship "Demeter" from Transylvania…"
      • Step 2: "Dracula then sets sail on the ship "Demeter" to England, leaving Harker captive by …"
      • Answer: Demeter
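  20. #5 Conclusion: Cons and Pros

    • Pros
      • Because the Retriever and the Reader are implemented separately, each can be improved independently.
      • The approach can be applied directly to existing readers such as BiDAF and Dr. QA.
    • Cons
      • Query reformulation cannot be trained directly, and relying on reinforcement learning for it is risky.
      • At inference time the amount of computation grows with the number of steps.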