Paper review: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Taichi Aida
October 14, 2019
Transcript
Paper review: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao†, Maxime Peyrard†, Fei Liu‡, Yang Gao†, Christian M. Meyer†, Steffen Eger† (EMNLP 2019)
Presenter: Taichi Aida, Natural Language Processing Laboratory, Nagaoka University of Technology
Abstract
• Investigates robust evaluation metrics for text generation tasks
• The combination of contextualized word embeddings and Word Mover's Distance worked best
• Source code is released: https://github.com/AIPHES/emnlp19-moverscore
Related work
• Various evaluation methods (1)
• Summarization: ROUGE (Lin 2004)
• Machine translation: BLEU (Papineni 2002), RUSE (Shimanaka 2018)
• Image captioning: BLEU, CIDEr (Vedantam 2015), SPICE (Anderson 2016)
Note: BLEU only [annotation partly illegible]
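Related work
• Various evaluation methods (2)
• Semantic similarity: "BERTScore" (Zhang 2019)
• Translation: supervised and unsupervised, BERT embeddings (Mathur 2019)
• Summarization and essay scoring: ELMo + Sentence Mover's Similarity (Clark 2019)
Note: methods that use contextualized representations are on the rise. These appear as baselines in the experiments.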
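Method
• Investigates a metric (MoverScore) that can evaluate a variety of generation tasks
• Measures the similarity (?) between the generated text and the reference text
• Contextualized embeddings: BERT, ELMo
• Semantic distance between the output text and the reference text: Word Mover's Distance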
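To make the distance in the last bullet concrete, here is a minimal sketch of Word Mover's Distance in a deliberately restricted setting. It assumes both sentences have the same number of words and uniform word weights, in which case an optimal transport plan is a one-to-one matching and can be found by brute force. The function name and the 2-d vectors are hypothetical; this is not the MoverScore implementation, which works on contextualized embeddings and solves the general transport problem.

```python
from itertools import permutations
from math import dist


def toy_word_movers_distance(hyp_vecs, ref_vecs):
    """Exact WMD for equal-length sentences with uniform word weights.

    Under these assumptions an optimal transport plan is a permutation,
    so all matchings are enumerated (feasible only for very short inputs).
    """
    if not hyp_vecs or len(hyp_vecs) != len(ref_vecs):
        raise ValueError("toy version needs non-empty sentences of equal length")
    n = len(hyp_vecs)
    best_total = min(
        sum(dist(hyp_vecs[i], ref_vecs[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n  # each word carries mass 1/n


# Hypothetical 2-d "embeddings" for a two-word hypothesis and reference.
hyp = [(0.0, 0.0), (1.0, 0.0)]
ref = [(1.0, 0.0), (0.0, 1.0)]
distance = toy_word_movers_distance(hyp, ref)
```

A smaller value means the two sentences are closer in embedding space; identical sentences give a distance of zero.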
Method
• MoverScore Variations
• Granularity: n-gram (n = 1, 2, size-of-sentence)
• Embedding: word2vec, BERT, ELMo
• Fine-tuning (BERT): MultiNLI, QANLI (NLI), QQP (paraphrasing)
• Aggregation (ELMo, BERT): power means, routing mechanism
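The granularity options can be illustrated with a short sketch of n-gram extraction. The helper name `ngrams` and the example sentence are assumptions made for illustration; the slide does not show how the authors build or embed n-grams.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat sat on the mat".split()

unigrams = ngrams(tokens, 1)                  # n = 1
bigrams = ngrams(tokens, 2)                   # n = 2
whole_sentence = ngrams(tokens, len(tokens))  # n = size of sentence
```

With n equal to the sentence length there is exactly one unit per sentence, which is the setting where a word-level mover distance turns into a sentence-level one.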
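Method
• Aggregation (how layer outputs are combined)
• Contextualized embeddings: BERT, ELMo
• For each word, every layer outputs a different vector
• Power Means: take the p-means (p = 1, ±∞) and concatenate them
• Routing Mechanism: see (Zhang 2018) for details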
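As a rough sketch of the power-means aggregation described above: for one word, the vectors from the different layers are combined element-wise with p = 1 (arithmetic mean), p = +∞ (maximum) and p = -∞ (minimum), and the three results are concatenated. The function names and the tiny two-layer example are hypothetical, and real BERT or ELMo layers have far more dimensions.

```python
import math


def power_mean(values, p):
    """Power mean of a list of numbers for p = +inf, -inf, or finite nonzero p."""
    if p == math.inf:
        return max(values)
    if p == -math.inf:
        return min(values)
    if p == 0:
        raise ValueError("p = 0 (geometric mean) is not handled in this sketch")
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def aggregate_layers(layer_vectors, ps=(1, math.inf, -math.inf)):
    """Concatenate element-wise power means taken across layers."""
    aggregated = []
    for p in ps:
        # zip(*layer_vectors) groups the values of one dimension across layers
        aggregated.extend(power_mean(dim, p) for dim in zip(*layer_vectors))
    return aggregated


# Hypothetical outputs of two layers for a single word (2 dimensions each).
layers = [[1.0, 4.0], [3.0, 2.0]]
word_vector = aggregate_layers(layers)
```

The aggregated vector is three times as long as a single layer vector, one block per value of p.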
Method
• Semantic distance between the output text and the reference text
• Word Mover's Distance (WMD)
• Sentence Mover's Distance (SMD)
• The combinations above are tested with both WMD and SMD
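Experiment
• Tasks
• Machine translation
• Summarization
• Dialogue (task-oriented)
• Image captioning
Data: pairs of (reference text, output texts from multiple systems); the system outputs carry human judgments.
What is done with each evaluation metric, including MoverScore:
• Score the system outputs
• Measure the correlation with the human judgments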
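The evaluation protocol above amounts to correlating metric scores with human scores. A minimal sketch follows, assuming Pearson's r as the coefficient (the slide does not say which coefficient is used) and using made-up scores for four system outputs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: metric scores and human judgments for four outputs.
metric_scores = [0.2, 0.4, 0.6, 0.8]
human_scores = [1.0, 2.0, 3.0, 4.0]
r = pearson(metric_scores, human_scores)
```

A metric whose scores rise and fall with the human judgments gets a coefficient close to 1; a metric unrelated to them gets a value near 0.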
Experiment
• Machine translation
• Dataset: WMT2017
• Outputs of the participating systems carry human judgments from at least 15 annotators
• Baselines: SentBLEU, METEOR++, RUSE, BERTScore (Zhang 2019)
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Result
• Is information lost in the sentence representation?
Experiment
• Summarization
• Dataset: TAC-2008, TAC-2009
• Responsiveness: content plus grammatical quality
• Pyramid: how much of the important content in the reference texts is covered
• Baselines: ROUGE-1, ROUGE-2, S3 best (Peyrard 2017; a supervised metric), BERTScore (Zhang 2019)
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Experiment
• Dialogue (task-oriented)
• Dataset: BAGEL, SFHOTEL
• Informativeness (Inf): amount of information provided
• Naturalness (Nat): closeness to a human response
• Quality (Qual): fluency and grammar
• Baselines: BLEU, METEOR, BERTScore (Zhang 2019)
Result
• Correlations are low overall, but the proposed method is among the higher ones
Experiment
• Image captioning
• Dataset: MSCOCO
• Human ratings M1 to M5 are available
• This study uses M1 and M2, which concern overall quality
• Baselines: CIDEr, SPICE, METEOR, LEIC (Cui 2018; a supervised metric), BERTScore (Zhang 2019)
Result
• Falls short of the LEIC baseline, but still shows high correlation
M: MultiNLI is used to fine-tune BERT
P: Power Means is used to aggregate ELMo / BERT layers
Discussion
• Comparison with BERTScore, which appeared as a baseline in the experiments
• BERTScore: hard one-to-one alignment
• MoverScore: soft many-to-one alignment
• WMD yields appropriate distances
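Discussion
• For machine translation, the outputs are split into two groups, those rated high by humans (good) and those rated low (bad), and the score distributions are examined
• Metrics compared
• Baseline: SentBLEU
• Proposal: MoverScore (WMD+BERT)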
Discussion
• SentBLEU: even outputs rated good by humans mostly fall in the middle of the score range
• MoverScore: cleanly separates the two extremes
Conclusion
• Proposes an unsupervised evaluation metric for generation tasks
• Outperforms or comes close to the baselines on four generation tasks
• Source code is released: https://github.com/AIPHES/emnlp19-moverscore