Taichi Aida
October 14, 2019
Paper introduction: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Transcript
Paper introduction: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger (EMNLP 2019)
Presenter: Taichi Aida, Natural Language Processing Laboratory, Nagaoka University of Technology
Abstract
• Investigates robust evaluation metrics for text generation tasks
• The combination of contextualized word embeddings and Word Mover's Distance performed best
• Source code is publicly available: https://github.com/AIPHES/emnlp19-moverscore
Related work
• Various evaluation methods (1)
  • Summarization: ROUGE (Lin 2004)
  • Machine translation: BLEU (Papineni 2002), RUSE (Shimanaka 2018)
  • Image Captioning: BLEU, CIDEr (Vedantam 2015), SPICE (Anderson 2016)
[Callout on BLEU, text illegible]
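Related work
• Various evaluation methods (2)
  • Semantic similarity: "BERTScore" (Zhang 2019), which appears as a baseline in the experiments
  • Translation: supervised and unsupervised BERT embeddings (Mathur 2019)
  • Summarization, essay scoring: ELMo + Sentence Mover's Similarity (Clark 2019)
• Methods that use contextualized representations are becoming more common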
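Method
• Investigates a metric (MoverScore) that can evaluate a variety of generation tasks
• Measures the similarity (?) between the generated text and the reference text
  • Contextualized embeddings: BERT, ELMo
  • Semantic distance between the output text and the reference text: Word Mover's Distance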
Method
• MoverScore Variations
  • Granularity: n-gram (n=1, 2, size-of-sentence)
  • Embedding: word2vec, BERT, ELMo
  • Fine-tuning (BERT): MultiNLI, QANLI (NLI), QQP (paraphrasing)
  • Aggregation (ELMo, BERT): power means, routing mechanism
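Method
• Aggregation (how vectors are combined)
  • Contextualized embeddings: BERT, ELMo
  • Each word receives a different vector from each layer
  • Power Means: take the power mean with p = 1, ±∞, then concatenate
  • Routing Mechanism: see (Zhang 2018) for details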
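The power-means aggregation on this slide can be illustrated with a minimal sketch. It assumes each word has one small vector per layer; the toy numbers and the function names `power_mean` and `aggregate_layers` are hypothetical and are not taken from the MoverScore code. For p = 1 the power mean is the arithmetic mean, for p = +∞ it is the element-wise maximum, and for p = -∞ the element-wise minimum.

```python
import math


def power_mean(values, p):
    """Generalized (power) mean of a list of numbers for exponent p."""
    if p == math.inf:
        return max(values)   # limit p -> +inf
    if p == -math.inf:
        return min(values)   # limit p -> -inf
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def aggregate_layers(layer_vectors, ps=(1, math.inf, -math.inf)):
    """Element-wise power means across layers, concatenated over all p."""
    aggregated = []
    for p in ps:
        # zip(*layer_vectors) groups the same dimension across layers
        for dims in zip(*layer_vectors):
            aggregated.append(power_mean(dims, p))
    return aggregated


# Toy input (hypothetical): one word, 3 layers, 2-dimensional vectors
layers = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]]
print(aggregate_layers(layers))  # [2.0, 4.0, 3.0, 6.0, 1.0, 2.0]
```

The output vector is three times as long as a single layer vector, because the results for the three values of p are concatenated rather than averaged again.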
Method
• Semantic distance between the output text and the reference text
  • Word Mover's Distance (WMD)
  • Sentence Mover's Distance (SMD)
• Each of the combinations above is evaluated with both WMD and SMD
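To make the idea of a mover's distance concrete, here is a simplified sketch rather than the authors' implementation. It assumes both texts have the same number of word vectors and that every word carries the same weight. In that special case an optimal transport plan is a one-to-one matching, so a brute-force search over permutations gives the exact cost. The general case (different lengths, non-uniform weights such as IDF) can split a word's mass across several targets and needs a linear-programming or optimal-transport solver instead. The name `toy_wmd` and the 2-D vectors are hypothetical.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def toy_wmd(xs, ys):
    """Transport cost between two equal-size sets of word vectors.

    Assumes uniform word weights, so the optimum is a one-to-one
    matching and can be found by trying every permutation.
    Only practical for very short inputs.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-length inputs")
    n = len(xs)
    best_total = min(
        sum(euclidean(xs[i], ys[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return best_total / n  # each word carries weight 1/n


# Toy embeddings (hypothetical): same words in a different order
hypothesis = [[0.0, 0.0], [1.0, 0.0]]
reference = [[1.0, 0.0], [0.0, 0.0]]
print(toy_wmd(hypothesis, reference))  # 0.0
```

Word order does not change the result, because the distance only asks how far the words of one text must move to become the words of the other.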
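Experiment
• Tasks
  • Machine translation
  • Summarization
  • Dialogue (task-oriented)
  • Image Captioning
• Data: pairs of (reference text, output texts from multiple systems); the system outputs have human evaluation scores attached
• What the evaluation metric (MoverScore) is used for
  • Score the system outputs
  • Look at the correlation with the human evaluation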
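The evaluation protocol above (score each system output, then correlate with human ratings) can be sketched as follows. The slide does not say which correlation coefficient is used for which task, so Pearson correlation is shown only as one common choice. The scores below are made-up illustrative values, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical values: one metric score and one human rating per system output
metric_scores = [0.21, 0.55, 0.90, 0.40]
human_ratings = [1.0, 3.0, 4.5, 2.5]

print(pearson(metric_scores, human_ratings))
```

A metric is judged to be better when this correlation is closer to 1, that is, when it ranks outputs in roughly the same way human raters do.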
Experiment
• Machine translation
  • Dataset: WMT2017
  • Outputs of the participating systems have at least 15 human assessments each
  • Baselines: SentBLEU, METEOR++, RUSE, BERTScore (Zhang 2019)
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Result
• Is information lost in the sentence representation?
Experiment
• Summarization
  • Dataset: TAC-2008, TAC-2009
  • Responsiveness: content plus grammatical quality
  • Pyramid: how much of the important content in the reference texts is covered
  • Baselines: ROUGE-1, ROUGE-2, S3_best (Peyrard 2017; a supervised metric), BERTScore (Zhang 2019)
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Experiment
• Dialogue (task-oriented)
  • Dataset: BAGEL, SFHOTEL
  • Informativeness (Inf): amount of information provided
  • Naturalness (Nat): closeness to a human response
  • Quality (Qual): fluency and grammar
  • Baselines: BLEU, METEOR, BERTScore (Zhang 2019)
Result
• Correlations are low overall, but the proposed method is among the higher ones
Experiment
• Image Captioning
  • Dataset: MSCOCO
  • Human evaluations M1 to M5 are available
  • Here, M1 and M2, which concern overall quality, are used
  • Baselines: CIDEr, SPICE, METEOR, LEIC (Cui 2018; a supervised metric), BERTScore (Zhang 2019)
Result
• Falls short of the LEIC baseline, but still shows high correlation
• Notation: M = MultiNLI is used for BERT fine-tuning; P = Power Means is used to aggregate ELMo / BERT vectors
Discussion
• Comparison with BERTScore, which appeared as a baseline in the experiments
  • BERTScore: hard (strong) one-to-one alignment
  • MoverScore: soft (weak) many-to-one alignment
  • WMD captures an appropriate distance
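Discussion
• For machine translation, split the outputs into two groups, those with high human ratings (good) and those with low human ratings (bad), and examine how each metric scores them
• Metrics compared
  • Baseline: SentBLEU
  • Proposal: MoverScore (WMD+BERT)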
Discussion
• SentBLEU: even outputs with good human ratings mostly fall in the middle of the score range
• MoverScore: cleanly separates the two extremes
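Conclusion
• Proposes an unsupervised evaluation metric for text generation tasks
• On four generation tasks, the results exceed or come close to the baselines
• Source code is publicly available: https://github.com/AIPHES/emnlp19-moverscore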