Paper introduction: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Taichi Aida
October 14, 2019

Transcript

  1. Paper introduction: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
     Wei Zhao†, Maxime Peyrard†, Fei Liu‡, Yang Gao†, Christian M. Meyer†, Steffen Eger† (EMNLP 2019)
     Presenter: Taichi Aida, Natural Language Processing Laboratory, Nagaoka University of Technology
  2. Related work
     • Various evaluation metrics (1)
       • Summarization: ROUGE (Lin 2004)
       • Machine translation: BLEU (Papineni 2002), RUSE (Shimanaka 2018)
       • Image captioning: BLEU, CIDEr (Vedantam 2015), SPICE (Anderson 2016)
     Slide note: BLEU is not well suited.
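  3. Related work
     • Various evaluation metrics (2)
       • Semantic similarity: BERTScore (Zhang 2019)
       • Translation: supervised and unsupervised metrics on BERT embeddings (Mathur 2019)
       • Summarization and essay scoring: ELMo + Sentence Mover's Similarity (Clark 2019)
     Slide notes: methods that use contextualized representations (embeddings that take context into account) have been increasing. These metrics appear again as baselines in the experiments.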
  4. Method
     • MoverScore variations
       • Granularity: n-gram (n = 1, 2, size-of-sentence)
       • Embedding: word2vec, BERT, ELMo
       • Fine-tuning: MultiNLI and QANLI (NLI), QQP (paraphrasing)
       • Aggregation: power means, routing mechanism
     Model labels on the slide: BERT, ELMo, BERT
  5. Method
     • MoverScore variations
       • Granularity: n-gram (n = 1, 2, size-of-sentence)
       • Embedding: word2vec, BERT, ELMo
       • Fine-tuning: MultiNLI, QANLI, QQP
       • Aggregation: power means, routing mechanism
     Model labels on the slide: BERT, ELMo
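  6. Method
     • Aggregation (how layer outputs are combined)
       • Contextualized embeddings: BERT, ELMo
       • Each word receives a different vector from each layer
       • Power means: take the p-th power mean of the layer vectors for p = 1, ±∞, then concatenate the results
       • Routing mechanism: see (Zhang 2018) for details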
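The power-means aggregation on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `power_mean` and `aggregate_layers` and the toy layer vectors are hypothetical, and the mean is applied element-wise across layers, with p = 1 giving the arithmetic mean, p = +∞ the maximum, and p = -∞ the minimum.

```python
import math


def power_mean(values, p):
    """Power mean of a list of numbers for exponent p."""
    if p == math.inf:
        return max(values)   # limit of the power mean as p -> +inf
    if p == -math.inf:
        return min(values)   # limit of the power mean as p -> -inf
    if p == 1:
        return sum(values) / len(values)  # arithmetic mean
    # General case; only meaningful for positive values.
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def aggregate_layers(layer_vectors, ps=(1, math.inf, -math.inf)):
    """Concatenate the element-wise power means over layers, one block per p."""
    columns = list(zip(*layer_vectors))  # columns[d] = dimension d across layers
    out = []
    for p in ps:
        out.extend(power_mean(list(col), p) for col in columns)
    return out


# Hypothetical toy input: one word, 3 layers, 2 dimensions per layer.
layers = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
aggregated = aggregate_layers(layers)
print(aggregated)
```

With three values of p, the aggregated vector is three times as long as a single layer vector.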
  7. Method
     • Semantic distance between the output sentence and the reference sentence
       • Word Mover's Distance (WMD)
       • Sentence Mover's Distance (SMD)
     • Each of the combinations above is tested with both WMD and SMD
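The earth mover idea behind WMD can be illustrated with a deliberately restricted sketch. It assumes both sentences have the same number of tokens and that every token carries the same weight, in which case the transport problem reduces to finding the cheapest one-to-one assignment. That assignment is found here by brute force, which is only feasible for very short inputs. A practical WMD uses non-uniform weights and a linear-programming solver. The function names and toy vectors are hypothetical.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def uniform_emd(xs, ys):
    """Earth mover's distance between two equal-sized sets of vectors
    with uniform weights (simplifying assumption of this sketch)."""
    if not xs or len(xs) != len(ys):
        raise ValueError("this sketch needs non-empty inputs of equal length")
    n = len(xs)
    cost = [[euclidean(x, y) for y in ys] for x in xs]
    # With uniform weights an optimal transport plan is a permutation,
    # so try every one-to-one assignment and keep the cheapest.
    best = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n  # each token carries mass 1/n


# Hypothetical 2-dimensional "word embeddings".
hypothesis = [[0.0, 0.0], [1.0, 0.0]]
reference = [[1.0, 0.0], [0.0, 1.0]]
distance = uniform_emd(hypothesis, reference)
print(distance)
```

Identical sentences give a distance of zero, and the distance grows as the embeddings of the two sentences move apart.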
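  8. Experiment
     • Tasks
       • Machine translation
       • Summarization
       • Dialogue (task-oriented)
       • Image captioning
     Data: pairs of (reference sentence, output sentences from multiple systems). The system outputs have human evaluation scores.
     What an evaluation metric such as MoverScore does:
       ・Score the system outputs
       ・Measure the correlation with the human evaluation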
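The last step, measuring how well metric scores track human evaluation, can be sketched as below. The slide does not name the correlation coefficient, so Pearson correlation is used here as an assumption, and the score lists are made-up placeholders rather than results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical scores for four system outputs.
metric_scores = [0.2, 0.4, 0.6, 0.8]  # scores assigned by an automatic metric
human_scores = [1.0, 2.0, 3.0, 4.0]   # human judgments of the same outputs
r = pearson(metric_scores, human_scores)
print(r)
```

A metric is considered better the closer its correlation with the human scores is to 1.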
  9. Experiment
     • Summarization
       • Dataset: TAC-2008, TAC-2009
       • Responsiveness: content plus grammatical quality
       • Pyramid: how much of the important content in the reference sentences is covered
       • Baselines: ROUGE-1, ROUGE-2, S3 best (Peyrard 2017; a supervised metric), BERTScore (Zhang 2019)
  10. Experiment
     • Dialogue (task-oriented)
       • Dataset: BAGEL, SFHOTEL
       • Informativeness (Inf): amount of information provided
       • Naturalness (Nat): closeness to a human response
       • Quality (Qual): fluency and grammar
       • Baselines: BLEU, METEOR, BERTScore (Zhang 2019)
  11. Experiment
     • Image captioning
       • Dataset: MSCOCO
       • Human evaluation scores M1 to M5 are available
       • This work uses M1 and M2, which concern overall quality
       • Baselines: CIDEr, SPICE, METEOR, LEIC (Cui 2018; a supervised metric), BERTScore (Zhang 2019)
  12. Result
     • MoverScore falls short of the LEIC baseline but still shows high correlation
     Legend:
       M: MultiNLI is used for BERT fine-tuning
       P: Power means is used to aggregate the ELMo / BERT layers
  13. Discussion
     • Comparison with BERTScore, which appeared as a baseline in the experiments
     Figure notes:
       ・Hard one-to-one alignment
       ・Soft many-to-one alignment
       ・WMD yields an appropriate distance
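The hard one-to-one alignment in this comparison can be illustrated with greedy matching, where each reference token is matched only to its single most similar output token. The sketch below is a simplification in the style of BERTScore's recall: it omits any token weighting and does not reproduce the BERTScore library. The function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_recall(reference, hypothesis):
    """Average, over reference tokens, of the best cosine similarity
    to any hypothesis token (hard alignment: one match per token)."""
    return sum(
        max(cosine(r, h) for h in hypothesis) for r in reference
    ) / len(reference)


# Hypothetical 2-dimensional "word embeddings".
reference = [[1.0, 0.0], [0.0, 1.0]]
hypothesis = [[1.0, 0.0], [1.0, 1.0]]
score = greedy_recall(reference, hypothesis)
print(score)
```

A soft alignment, as in an earth mover formulation, would instead let one token spread its mass over several tokens on the other side.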
  14. Conclusion
     • Proposed an unsupervised evaluation metric for text generation tasks
     • Outperforms or comes close to the baselines on four generation tasks
     • Source code is public: https://github.com/AIPHES/emnlp19-moverscore