Taichi Aida
October 14, 2019

Literature review: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Transcript

  1. Literature review: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
    Wei Zhao†, Maxime Peyrard†, Fei Liu‡, Yang Gao†, Christian M. Meyer†, Steffen Eger†
    EMNLP 2019
    Presenter: Taichi Aida, Natural Language Processing Laboratory, Nagaoka University of Technology
  2. Related work
    • Various evaluation methods (1)
    • Summarization: ROUGE (Lin 2004)
    • Machine translation: BLEU (Papineni 2002), RUSE (Shimanaka 2018)
    • Image captioning: BLEU, CIDEr (Vedantam 2015), SPICE (Anderson 2016)
    Note: BLEU is not well suited.
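  3. Related work
    • Various evaluation methods (2)
    • Semantic similarity: BERTScore (Zhang 2019)
    • Translation: supervised and unsupervised metrics on BERT embeddings (Mathur 2019)
    • Summarization, essay scoring: ELMo + Sentence Mover's Similarity (Clark 2019)
    Note: methods that use context-aware embeddings (contextualized representations) are becoming more common.
    Note: appears again as a baseline in the experiments.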
  4. Method
    • MoverScore variations
    • Granularity: n-gram (n = 1, 2, size of sentence)
    • Embedding: word2vec, BERT, ELMo
    • Fine-tuning: MultiNLI, QANLI, QQP
    • Aggregation: power means, routing mechanism
    Slide labels: NLI, Paraphrasing, BERT, ELMo, BERT
  5. Method
    • MoverScore variations
    • Granularity: n-gram (n = 1, 2, size of sentence)
    • Embedding: word2vec, BERT, ELMo
    • Fine-tuning: MultiNLI, QANLI, QQP
    • Aggregation: power means, routing mechanism
    Slide labels: BERT, ELMo
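The granularity options on the slide can be illustrated with a minimal Python sketch. The helper name `ngrams` and the toy sentence are assumptions made for illustration; the slides do not show how MoverScore builds or embeds its n-grams, so this only shows which token spans each setting would cover.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["the", "cat", "sat"]           # toy sentence, not from the paper
unigrams = ngrams(tokens, 1)             # n = 1
bigrams = ngrams(tokens, 2)              # n = 2
whole = ngrams(tokens, len(tokens))      # n = size of sentence: one span
```

With n equal to the sentence length, the only n-gram is the sentence itself, which is the setting paired with sentence-level distances.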
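  6. Method
    • Aggregation (how the representations are combined)
    • Contextualized embeddings: BERT, ELMo
    • Each word receives a different vector from each layer
    • Power means: take the power mean with exponent p (p = 1, ±∞) and concatenate the results
    • Routing mechanism: see (Zhang 2018) for details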
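A minimal sketch of power-mean aggregation, assuming the element-wise power mean of the per-layer vectors: p = 1 is the arithmetic mean, p = +∞ the maximum, and p = −∞ the minimum. The function name, the output order, and the toy vectors are assumptions for illustration, not the authors' code.

```python
def power_mean_aggregate(layers):
    """Combine per-layer vectors of one word into a single vector.

    `layers` is a list of L vectors of equal dimension d. The result
    concatenates the element-wise power means for p = 1 (mean),
    p = +inf (max) and p = -inf (min), giving 3 * d values.
    """
    if not layers or any(len(v) != len(layers[0]) for v in layers):
        raise ValueError("need at least one layer, all of equal dimension")
    columns = list(zip(*layers))                  # one tuple per dimension
    mean = [sum(col) / len(col) for col in columns]
    maximum = [max(col) for col in columns]
    minimum = [min(col) for col in columns]
    return mean + maximum + minimum


# Two toy "layers" with dimension 2 (values are made up).
vector = power_mean_aggregate([[1.0, 4.0], [3.0, 0.0]])
```

The concatenation triples the dimension, so any distance computed afterwards operates on 3d-dimensional vectors.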
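  7. Method
    • Semantic distance between the output sentence and the reference sentence
    • Word Mover's Distance (WMD)
    • Sentence Mover's Distance (SMD)
    • The combinations from the previous slides are each tested with both WMD and SMD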
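To make the idea of a mover's distance concrete, here is a minimal sketch restricted to a special case: both sentences have the same number of tokens and every token carries equal weight. Under those assumptions the transport problem reduces to finding the cheapest one-to-one assignment, which the sketch solves by brute force, so it is only practical for very short inputs. The function names and the 2-D toy embeddings are hypothetical; a general WMD needs an optimal-transport solver and allows mass to be split across tokens.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wmd_uniform_equal_length(xs, ys):
    """Mover's distance for two equal-size sets of uniformly weighted vectors.

    With n tokens on each side and mass 1/n per token, an optimal
    transport plan is a permutation, so we try every permutation and
    keep the cheapest total cost, divided by n.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("this sketch requires non-empty inputs of equal length")
    n = len(xs)
    best = min(
        sum(euclidean(xs[i], ys[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n


# Toy embeddings (made up): the hypothesis is the reference shifted up by 1.
reference = [(0.0, 0.0), (1.0, 0.0)]
hypothesis = [(0.0, 1.0), (1.0, 1.0)]
distance = wmd_uniform_equal_length(reference, hypothesis)
```

Swapping the order of the tokens on one side does not change the result, since the distance depends only on the two sets of vectors.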
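  8. Experiment
    • Tasks
    • Machine translation
    • Summarization
    • Dialogue (task-oriented)
    • Image captioning
    Data: pairs of (reference sentence, output sentences from multiple systems); the system outputs come with human judgments.
    What the evaluation metric (MoverScore) does: score the system outputs, then check the correlation with the human judgments.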
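The last step, checking how well metric scores track human judgments, can be sketched as follows. Pearson correlation is used here as an assumption for illustration; the slides do not say which correlation coefficient each task uses, and all scores below are made-up numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four system outputs.
metric_scores = [0.61, 0.72, 0.55, 0.80]   # automatic metric
human_scores = [3.0, 4.0, 2.5, 4.5]        # human judgments
print(pearson(metric_scores, human_scores))
```

A value close to 1 means the metric ranks and spaces the outputs much like the human annotators do.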
  9. Experiment
    • Summarization
    • Dataset: TAC-2008, TAC-2009
    • Responsiveness: content plus grammatical quality
    • Pyramid: how much of the important content in the reference summaries is covered
    • Baselines: ROUGE-1, ROUGE-2, S3 best (Peyrard 2017; a supervised metric), BERTScore (Zhang 2019)
  10. Experiment
    • Dialogue (task-oriented)
    • Dataset: BAGEL, SFHOTEL
    • Informativeness (Inf): amount of information provided
    • Naturalness (Nat): closeness to a human response
    • Quality (Qual): fluency and grammar
    • Baselines: BLEU, METEOR, BERTScore (Zhang 2019)
  11. Experiment
    • Image captioning
    • Dataset: MSCOCO
    • Human evaluations M1 to M5 are available
    • This study uses M1 and M2, which concern overall quality
    • Baselines: CIDEr, SPICE, METEOR, LEIC (Cui 2018; a supervised metric), BERTScore (Zhang 2019)
  12. Result
    • Falls short of the LEIC baseline but still shows high correlation
    Legend: M = MultiNLI used for BERT fine-tuning; P = power means used to aggregate ELMo / BERT layers
  13. Discussion
    • Comparison with BERTScore, which appeared as a baseline in the experiments
    Figure labels: strong one-to-one alignment; weak many-to-one alignment; WMD yields an appropriate distance
  14. Conclusion
    • Proposes an unsupervised evaluation metric for generation tasks
    • Beats or comes close to the baselines on four generation tasks
    • Source code released: https://github.com/AIPHES/emnlp19-moverscore