Paper introduction: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Taichi Aida
October 14, 2019
Transcript
Paper introduction
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger (EMNLP 2019)
Presenter: Taichi Aida, Natural Language Processing Laboratory, Nagaoka University of Technology
Abstract
• Investigates robust evaluation metrics for text generation tasks
• The combination of contextualized word embeddings and Word Mover's Distance performed best
• Source code is public: https://github.com/AIPHES/emnlp19-moverscore
Related work
• Various evaluation methods (1)
  • Summarization: ROUGE (Lin 2004)
  • Machine translation: BLEU (Papineni 2002), RUSE (Shimanaka 2018)
  • Image Captioning: BLEU, CIDEr (Vedantam 2015), SPICE (Anderson 2016)
• Slide note: BLEU [remainder illegible]
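Related work
• Various evaluation methods (2)
  • Semantic similarity: "BERTScore" (Zhang 2019), which appears as a baseline in the experiments
  • Translation: supervised and unsupervised BERT embeddings (Mathur 2019)
  • Summarization and essay scoring: ELMo + Sentence Mover's Similarity (Clark 2019)
• Methods that use contextualized representations are becoming more common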
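Method
• Investigates a metric (MoverScore) that can evaluate a variety of generation tasks
• Measures the similarity (?) between the generated text and the reference text
  • Contextualized embeddings: BERT, ELMo
  • Semantic distance between output and reference: Word Mover's Distance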
Method
• MoverScore Variations
  • Granularity: n-gram (n = 1, 2, size-of-sentence)
  • Embedding: word2vec, BERT, ELMo
  • Fine-tuning (BERT): MultiNLI and QANLI (NLI), QQP (paraphrasing)
  • Aggregation (ELMo, BERT): power means, routing mechanism
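The granularity options above can be illustrated with a small sketch. This is only an illustration of what "n = 1, 2, size-of-sentence" means at the token level; the function names `ngrams` and `granularities` are hypothetical and are not taken from the MoverScore code base, and whitespace-style token lists are assumed as input.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def granularities(tokens):
    """Hypothetical helper: the three granularities listed on the slide.

    "sentence" treats the whole sentence as one n-gram (n = len(tokens)).
    """
    return {
        "unigram": ngrams(tokens, 1),
        "bigram": ngrams(tokens, 2),
        "sentence": ngrams(tokens, len(tokens)) if tokens else [],
    }


if __name__ == "__main__":
    for name, grams in granularities(["the", "cat", "sat"]).items():
        print(name, grams)
```

Each n-gram would then need its own vector (for example by pooling the word vectors it covers); how that pooling is done is not stated on the slide, so it is left out here.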
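Method
• Aggregation (how representations are combined)
  • Contextualized embeddings: BERT, ELMo
  • For each word, every layer outputs a different vector
  • Power Means: take the power means (p = 1, ±∞) and concatenate them
  • Routing Mechanism: see (Zhang 2018) for details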
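As a rough sketch of the power-means aggregation described above: for one word, the per-layer vectors are reduced element-wise with p = 1 (arithmetic mean), p = +∞ (maximum) and p = -∞ (minimum), and the three results are concatenated. The function names and the toy 2-dimensional "layers" are assumptions made for illustration, not the authors' implementation.

```python
INF = float("inf")


def power_mean(vectors, p):
    """Element-wise power mean over a list of equal-length vectors.

    Only the three values named on the slide are supported:
    p = 1 (arithmetic mean), p = +inf (max), p = -inf (min).
    """
    if not vectors:
        raise ValueError("need at least one vector")
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if p == 1:
        return [sum(col) / len(col) for col in columns]
    if p == INF:
        return [max(col) for col in columns]
    if p == -INF:
        return [min(col) for col in columns]
    raise ValueError("this sketch only supports p in {1, +inf, -inf}")


def aggregate_layers(layer_vectors, ps=(1, INF, -INF)):
    """Concatenate the power means of one word's per-layer vectors."""
    aggregated = []
    for p in ps:
        aggregated.extend(power_mean(layer_vectors, p))
    return aggregated


if __name__ == "__main__":
    # Three hypothetical layers, each giving a 2-dimensional vector for one word.
    layers = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
    print(aggregate_layers(layers))  # 6 values: mean, max, min per dimension
```

The output dimension is the embedding dimension times the number of p values, which is the cost of keeping more than one summary statistic per dimension.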
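Method
• Semantic distance between the output text and the reference text
  • Word Mover's Distance (WMD)
  • Sentence Mover's Distance (SMD)
• The combinations from the previous slides are each tested with WMD and with SMD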
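To make the idea of Word Mover's Distance concrete, here is a minimal sketch under strong simplifying assumptions: both texts have the same number of tokens, every token carries the same weight, and the ground cost is Euclidean distance. In that special case the optimal transport plan is a one-to-one assignment, so a brute-force search over permutations gives the exact value for very short inputs. The function name `wmd_uniform` is hypothetical; a realistic implementation would use token weights and an optimal-transport solver instead.

```python
from itertools import permutations
from math import dist


def wmd_uniform(hyp_vecs, ref_vecs):
    """WMD for equal-length inputs with uniform token weights.

    Assumption: with n tokens of mass 1/n on each side, the optimal
    transport cost equals the cheapest one-to-one assignment divided by n.
    Brute force is factorial in n, so this is for toy inputs only.
    """
    n = len(hyp_vecs)
    if n == 0 or n != len(ref_vecs):
        raise ValueError("this sketch needs two non-empty inputs of equal length")
    # Pairwise Euclidean distances between hypothesis and reference vectors.
    cost = [[dist(h, r) for r in ref_vecs] for h in hyp_vecs]
    best = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    return best / n


if __name__ == "__main__":
    hyp = [[0.0, 0.0], [1.0, 0.0]]   # toy 2-d "embeddings"
    ref = [[1.0, 0.0], [0.0, 1.0]]
    print(wmd_uniform(hyp, ref))
    print(wmd_uniform(hyp, hyp))     # identical inputs cost nothing to move
```

A smaller distance means the output is closer to the reference; a score to correlate with human judgments can be derived from it, for example by negating it.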
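Experiment
• Tasks
  • Machine translation
  • Summarization
  • Dialogue (task-oriented)
  • Image Captioning
• Data: pairs of (reference text, output texts from multiple systems), with human ratings attached to the system outputs
• What the evaluation metrics, including MoverScore, are used for
  • Score the system outputs
  • Check the correlation with the human ratings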
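The evaluation protocol above (score each system output, then correlate with human ratings) can be sketched as follows. Pearson correlation is used here as an assumption, since the slide does not say which correlation coefficient is reported; the numbers are made up for illustration and are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical metric scores and human ratings for five system outputs.
    metric_scores = [0.41, 0.55, 0.32, 0.78, 0.60]
    human_ratings = [2.0, 3.5, 1.5, 4.5, 3.0]
    print(round(pearson(metric_scores, human_ratings), 3))
```

A metric is considered better when this correlation is higher, which is how the baselines and MoverScore are compared in the result slides.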
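Experiment
• Machine translation
  • Dataset: WMT2017
  • Outputs of the participating systems have human ratings from 15 annotators (the qualifier before "15" is illegible in the source)
  • Baselines: SentBLEU, METEOR++, RUSE, BERTScore (Zhang 2019)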
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Result
• Is information lost when using sentence representations?
Experiment
• Summarization
  • Dataset: TAC-2008, TAC-2009
  • Responsiveness: content plus grammatical quality
  • Pyramid: how much of the important content in the reference texts is covered
  • Baselines: ROUGE-1, ROUGE-2, S3 best (Peyrard 2017; a supervised metric), BERTScore (Zhang 2019)
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Experiment
• Dialogue (task-oriented)
  • Dataset: BAGEL, SFHOTEL
  • Informativeness (Inf): amount of information provided
  • Naturalness (Nat): closeness to a human response
  • Quality (Qual): fluency and grammar
  • Baselines: BLEU, METEOR, BERTScore (Zhang 2019)
Result
• Correlations are low overall, but the proposed method is among the higher ones
Experiment
• Image Captioning
  • Dataset: MSCOCO
  • Human ratings M1 to M5 are available
  • This study uses M1 and M2, which concern overall quality
  • Baselines: CIDEr, SPICE, METEOR, LEIC (Cui 2018; a supervised metric), BERTScore (Zhang 2019)
Result
• Falls short of the LEIC baseline but still shows high correlation
• M: MultiNLI is used for BERT fine-tuning
• P: Power Means is used for aggregating ELMo / BERT representations
Discussion
• Comparison with BERTScore, which appeared as a baseline in the experiments
  • Strong (hard) one-to-one alignment
  • Weak (soft) many-to-one alignment
  • WMD yields appropriate distances
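Discussion
• For machine translation, the outputs are split into two groups, those with high human ratings (good) and those with low human ratings (bad), and the score distributions are examined
• Compared metrics
  • Baseline: SentBLEU
  • Proposal: MoverScore (WMD+BERT)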
Discussion
• SentBLEU: many outputs fall in the middle of the score range even when their human ratings are good
• MoverScore: separates the two extremes cleanly
Conclusion
• Proposes an unsupervised evaluation metric for generation tasks
• On four generation tasks, the results exceed or come close to the baselines
• Source code is public: https://github.com/AIPHES/emnlp19-moverscore