Paper introduction: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Taichi Aida

October 14, 2019

Transcript

  1. Paper introduction: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

    Wei Zhao†, Maxime Peyrard†, Fei Liu‡, Yang Gao†, Christian M. Meyer†, Steffen Eger† (EMNLP 2019). Presenter: Taichi Aida, Natural Language Processing Laboratory, Nagaoka University of Technology
  2. Abstract • Investigates robust evaluation metrics for text generation tasks • The combination of contextualized word embeddings and Word Mover's Distance performed best

    • Source code is released: https://github.com/AIPHES/emnlp19-moverscore
  3. Related work • Various evaluation methods (1) • Summarization: ROUGE (Lin 2004) • Machine translation: BLEU (Papineni 2002),

    RUSE (Shimanaka 2018) • Image Captioning: BLEU, CIDEr (Vedantam 2015), SPICE (Anderson 2016). Slide note: BLEU is not well suited.
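  4. Related work • Various evaluation methods (2) • Semantic similarity: BERTScore (Zhang 2019) • Translation: supervised and unsupervised

    BERT embeddings (Mathur 2019) • Summarization, essay scoring: ELMo + Sentence Mover's Similarity (Clark 2019). Slide notes: methods that use contextualized representations are becoming more common; "appears as a baseline in the experiments" (annotation, presumably on BERTScore).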
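  5. Method • Investigates a metric (MoverScore) that can evaluate a variety of generation tasks • Measures the similarity (?) between the generated sentence and the reference sentence • Contextualized embeddings: BERT, ELMo • Semantic distance between the output sentence and the reference sentence: Word

    Mover's Distance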
  6. Method • MoverScore Variations • Granularity: n-gram (n=1, 2, size-of-sentence) •

    Embedding: word2vec, BERT, ELMo • Fine-tuning: MultiNLI, QANLI, QQP • Aggregation: power means, routing mechanism. Slide annotations: NLI, Paraphrasing, BERT, ELMo, BERT.
  7. Method • MoverScore Variations • Granularity: n-gram (n=1, 2, size-of-sentence) •

    Embedding: word2vec, BERT, ELMo • Fine-tuning: MultiNLI, QANLI, QQP • Aggregation: power means, routing mechanism. Slide annotations: BERT, ELMo.
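  8. Method • Aggregation (how layer outputs are combined) • Contextualized embeddings: BERT, ELMo • Each word receives a different vector from each layer •

    Power Means: take the power mean with exponent p (p = 1, ±∞) and concatenate the results • Routing Mechanism: see (Zhang 2018) for details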
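The power-means aggregation on this slide can be illustrated with a minimal sketch. It assumes one token's per-layer vectors are given as a list of equal-length lists; the names `power_mean` and `aggregate` are hypothetical and are not taken from the MoverScore repository. For p = 1 the power mean is the arithmetic mean, for p = +∞ it is the element-wise maximum, and for p = −∞ the element-wise minimum.

```python
import math

def power_mean(layers, p):
    """Element-wise power mean over the layer axis.

    layers: list of per-layer vectors for one token (all the same length).
    p: exponent; 1 gives the arithmetic mean, +inf the max, -inf the min.
    """
    columns = list(zip(*layers))  # one tuple of layer values per dimension
    if p == math.inf:
        return [max(col) for col in columns]
    if p == -math.inf:
        return [min(col) for col in columns]
    # General case; only well defined for non-integer p when values are positive.
    return [(sum(v ** p for v in col) / len(col)) ** (1.0 / p) for col in columns]

def aggregate(layers, exponents=(1, math.inf, -math.inf)):
    """Concatenate the power means for each exponent (assumed order: 1, +inf, -inf)."""
    out = []
    for p in exponents:
        out.extend(power_mean(layers, p))
    return out

# Toy input: two layers, two dimensions (made-up numbers, for illustration only).
layers = [[1.0, 4.0], [3.0, 2.0]]
vec = aggregate(layers)
```

With three exponents, the aggregated vector is three times as long as a single layer's vector.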
  9. Method • Semantic distance between the output sentence and the reference sentence • Word Mover's Distance (WMD) • Sentence

    Mover's Distance (SMD) • The combinations above are each tested with both WMD and SMD
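As a rough illustration of the Word Mover's Distance idea, the sketch below computes the Earth Mover's Distance for a restricted special case. It assumes both sentences have the same number of word vectors and that every word carries the same weight 1/n, in which case an optimal transport plan is a one-to-one assignment that can be found by brute force. The uniform weighting, the Euclidean cost, and the function names are assumptions of this sketch; the slides do not specify them, and the general case requires a linear-program solver.

```python
import math
from itertools import permutations

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def wmd_uniform_equal_length(xs, ys):
    """Exact Earth Mover's Distance for two equally sized, uniformly weighted sets.

    Brute-force over all assignments, so it is only practical for very short inputs.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("both inputs must be non-empty and of equal length")
    cost = [[euclidean(x, y) for y in ys] for x in xs]
    best = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    return best / n  # each word moves a mass of 1/n

# Toy 2-d "embeddings" (made-up values, for illustration only).
hyp = [(0.0, 0.0), (1.0, 0.0)]
ref = [(1.0, 0.0), (0.0, 1.0)]
distance = wmd_uniform_equal_length(hyp, ref)
```

Identical sets of vectors give a distance of 0, and a smaller distance means the output is semantically closer to the reference.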
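  10. Experiment • Tasks • Machine translation • Summarization • Dialogue (task-oriented) •

    Image Captioning. Data: pairs of (reference sentence, output sentences from multiple systems); the system outputs have human evaluation scores. What the evaluation metric (MoverScore) does: score the system outputs, then measure the correlation with the human evaluation.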
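The evaluation protocol above (score the system outputs, then correlate with human judgments) can be sketched as follows. This is a minimal sketch that assumes Pearson correlation and uses made-up toy scores; the slides do not state which correlation coefficient is used for each task, and `pearson` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical metric scores and human ratings for four system outputs.
metric_scores = [0.2, 0.4, 0.6, 0.8]
human_scores = [1.0, 2.0, 3.0, 4.0]
r = pearson(metric_scores, human_scores)
```

A value of r close to 1 would mean the metric ranks outputs much as the human raters do.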
  11. Experiment • Machine translation • Dataset: WMT2017 • Each output sentence from the participating systems has human evaluations from at least 15 annotators • Baselines: SentBLEU, METEOR++,

    RUSE, BERTScore (Zhang 2019)
  12. Result • WMD+BERT+MNLI+PMeans outperforms the baselines

  13. Result • Is information lost in the sentence representation?

  14. Experiment • Summarization • Dataset: TAC-2008, TAC-2009 • Responsiveness: content plus grammatical quality • Pyramid: how much of the important content in the reference is covered

    • Baselines: ROUGE-1, ROUGE-2, S3 best (Peyrard 2017), BERTScore (Zhang 2019). Slide note: supervised evaluation metric.
  15. Result • WMD+BERT+MNLI+PMeans outperforms the baselines

  16. Experiment • Dialogue (task-oriented) • Dataset: BAGEL, SFHOTEL • Informativeness (Inf): amount of information provided •

    Naturalness (Nat): closeness to a human response • Quality (Qual): fluency and grammar • Baselines: BLEU, METEOR, BERTScore (Zhang 2019)
  17. Result • Correlations are low overall, but the proposed method is among the higher ones

  18. Experiment • Image Captioning • Dataset: MSCOCO • Human evaluations M1 to M5

    are available • This work uses M1 and M2, which concern overall quality • Baselines: CIDEr, SPICE, METEOR, LEIC (Cui 2018), BERTScore (Zhang 2019). Slide note: supervised evaluation metric.
  19. Result • Falls short of the LEIC baseline, but still shows a high correlation. Legend: M = MultiNLI is used for BERT fine-tuning;

    P = Power Means is used to aggregate the ELMo / BERT layers (Aggregation)
  20. Discussion • Comparison with BERTScore, which appeared as a baseline in the experiments

  21. Discussion • Comparison with BERTScore, which appeared as a baseline in the experiments. Slide annotations: hard one-to-one

    alignment; soft many-to-one alignment; WMD obtains an appropriate distance
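  22. Discussion • Machine translation outputs are split into two groups, those with high human ratings (good) and those with low human ratings (bad), and the score distributions are examined • Compared metrics • Baseline: SentBLEU •

    Proposal: MoverScore (WMD+BERT)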
  23. Discussion • SentBLEU scores cluster in the middle range even when the human rating is good • MoverScore cleanly separates the two extremes

  24. Conclusion • Proposes an unsupervised evaluation metric for generation tasks • Exceeds or comes close to the baselines on four generation tasks •

    Source code is released: https://github.com/AIPHES/emnlp19-moverscore