Literature review: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Taichi Aida
October 14, 2019
Transcript
Literature review
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao†, Maxime Peyrard†, Fei Liu‡, Yang Gao†, Christian M. Meyer†, Steffen Eger†
EMNLP 2019
Presenter: Taichi Aida, Natural Language Processing Laboratory, Nagaoka University of Technology
Abstract
• Investigates robust evaluation metrics for text generation tasks
• The combination of contextualized word embeddings and Word Mover's Distance worked best
• Source code is released: https://github.com/AIPHES/emnlp19-moverscore
Related work
• Various evaluation methods (1)
• Summarization: ROUGE (Lin 2004)
• Machine translation: BLEU (Papineni 2002), RUSE (Shimanaka 2018)
• Image captioning: BLEU, CIDEr (Vedantam 2015), SPICE (Anderson 2016)
(Slide annotation: BLEU [...])
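Related work
• Various evaluation methods (2)
• Semantic similarity: BERTScore (Zhang 2019)
• Translation: supervised and unsupervised metrics on BERT embeddings (Mathur 2019)
• Summarization, essay scoring: ELMo + Sentence Mover's Similarity (Clark 2019)
Methods that use contextualized representations (embeddings that take context into account) are becoming more common. They also appear as baselines in the experiments.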
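Method
• Investigates a metric (MoverScore) that can evaluate a variety of generation tasks
• Measures the similarity (?) between the generated text and the reference text
• Contextualized embeddings: BERT, ELMo
• Semantic distance between the output and the reference: Word Mover's Distance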
Method
• MoverScore variations
• Granularity: n-gram (n = 1, 2, size-of-sentence)
• Embedding: word2vec, BERT, ELMo
• Fine-tuning (BERT): MultiNLI and QANLI (NLI), QQP (paraphrasing)
• Aggregation (BERT, ELMo): power means, routing mechanism
Method
• Aggregation
• Contextualized embeddings: BERT, ELMo
• Each layer outputs a different vector for each word
• Power means: take the means for p = 1, ±∞ and concatenate them
• Routing mechanism: see (Zhang 2018) for details
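The power-means aggregation can be sketched as follows. This is a minimal illustration, assuming each word has one vector per layer and that the means are taken element-wise across layers; the function names and the toy 2-dimensional vectors are hypothetical and are not taken from the released code.

```python
import math

def power_mean(values, p):
    """Power mean of a list of numbers; p = +inf gives max, p = -inf gives min."""
    if p == math.inf:
        return max(values)
    if p == -math.inf:
        return min(values)
    # Finite p: only p = 1 is used below; other p need non-negative inputs.
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def aggregate_layers(layer_vectors, ps=(1, math.inf, -math.inf)):
    """Combine per-layer vectors of one word into a single vector.

    layer_vectors: list of equal-length vectors, one per layer.
    The element-wise power mean is computed for each p and the
    results are concatenated (so the output is len(ps) times longer).
    """
    per_dimension = list(zip(*layer_vectors))
    aggregated = []
    for p in ps:
        aggregated.extend(power_mean(dim, p) for dim in per_dimension)
    return aggregated

# Toy example: two layers, two dimensions (made-up numbers).
layers = [[1.0, 4.0], [3.0, 2.0]]
word_vector = aggregate_layers(layers)
```

With p = 1 the power mean is the ordinary average, while p = +∞ and p = −∞ reduce to the element-wise maximum and minimum, so the concatenation keeps three views of the layers.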
Method
• Semantic distance between the output and the reference
• Word Mover's Distance (WMD)
• Sentence Mover's Distance (SMD)
• The combinations above are each tested with WMD and with SMD
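To make Word Mover's Distance concrete, here is a minimal sketch restricted to a special case: both texts have the same number of word vectors and every word carries the same weight. Under that assumption an optimal transport plan can always be found among one-to-one assignments, so the sketch simply tries every permutation. The general case (different lengths, non-uniform weights) needs a linear-program or optimal-transport solver, and this is not how the released MoverScore code computes the distance. The function names and the toy 2-D vectors are hypothetical.

```python
import itertools
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def wmd_uniform(xs, ys):
    """Exact WMD for equal-length sequences with uniform word weights.

    Each word holds mass 1/n, so the transport cost of an assignment
    is the summed distance divided by n. Brute force: only for tiny n.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    cost = [[euclidean(x, y) for y in ys] for x in xs]
    best = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n

# Toy "embeddings" (made-up numbers standing in for real word vectors).
output_words = [(0.0, 0.0), (1.0, 0.0)]
reference_words = [(0.0, 1.0), (1.0, 1.0)]
distance = wmd_uniform(output_words, reference_words)
```

A smaller distance means the output is closer to the reference; a score can be derived from it in several ways, which this sketch leaves open.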
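Experiment
• Tasks
• Machine translation
• Summarization
• Dialogue (task-oriented)
• Image captioning
The data consist of pairs of (reference text, output texts from multiple systems), and the system outputs carry human judgments.
What the evaluation metrics, including MoverScore, do:
• Score the system outputs
• Check the correlation with the human judgments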
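The evaluation protocol (score each system output, then check how well the scores track human judgments) can be sketched as below. Pearson correlation is used here as one common choice; the slide does not say which coefficient is used for each task. The function name and all numbers are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical metric scores and human judgments for four system outputs.
metric_scores = [0.62, 0.35, 0.80, 0.48]
human_scores = [0.70, 0.20, 0.90, 0.50]
correlation = pearson(metric_scores, human_scores)
```

A metric whose scores correlate more strongly with the human judgments is considered the better metric for that task.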
Experiment
• Machine translation
• Dataset: WMT2017
• Outputs of the participating systems have at least 15 human judgments each
• Baselines: SentBLEU, METEOR++, RUSE, BERTScore (Zhang 2019)
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Result
• Is information lost in the sentence representation?
Experiment
• Summarization
• Dataset: TAC-2008, TAC-2009
• Responsiveness: content plus grammatical quality
• Pyramid: how much of the important content in the reference is covered
• Baselines: ROUGE-1, ROUGE-2, S3_best (Peyrard 2017; a supervised metric), BERTScore (Zhang 2019)
Result
• WMD+BERT+MNLI+PMeans outperforms the baselines
Experiment
• Dialogue (task-oriented)
• Dataset: BAGEL, SFHOTEL
• Informativeness (Inf): amount of information provided
• Naturalness (Nat): closeness to a human response
• Quality (Qual): fluency and grammar
• Baselines: BLEU, METEOR, BERTScore (Zhang 2019)
Result
• Correlations are low overall, but the proposed method is among the higher ones
Experiment
• Image captioning
• Dataset: MSCOCO
• Human ratings M1 to M5 are available
• This work uses M1 and M2, which concern overall quality
• Baselines: CIDEr, SPICE, METEOR, LEIC (Cui 2018; a supervised metric), BERTScore (Zhang 2019)
Result
• Falls short of the LEIC baseline, but still shows high correlation
M: MultiNLI is used for BERT fine-tuning
P: Power means are used to aggregate ELMo / BERT layers
Discussion
• Comparison with BERTScore, which appeared as a baseline in the experiments
• BERTScore: hard, one-to-one alignment
• MoverScore: soft, many-to-one alignment
• WMD yields an appropriate distance
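For contrast with the transport-based distance, a hard one-to-one style of alignment can be sketched as greedy matching: every token is aligned only to its single most similar token on the other side. This is a simplified illustration in the spirit of BERTScore (no IDF weighting, no rescaling), not its actual implementation; the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_match_f1(candidate, reference):
    """F1 from greedy (hard) matching of token vectors.

    Recall: each reference token takes its best match in the candidate.
    Precision: each candidate token takes its best match in the reference.
    """
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy vectors (made-up numbers): the candidate covers only one reference token.
candidate = [(1.0, 0.0)]
reference = [(1.0, 0.0), (0.0, 1.0)]
score = greedy_match_f1(candidate, reference)
```

In this toy case the uncovered reference token contributes nothing to recall, whereas a transport-based distance would move its mass to the nearest available token at some cost.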
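Discussion
• Machine translation outputs are split into two groups, those with high human ratings (good) and those with low ratings (bad), and the score distributions are examined
• Compared metrics
• Baseline: SentBLEU
• Proposal: MoverScore (WMD+BERT)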
Discussion
• SentBLEU: many scores fall in the middle range even when the human rating is good
• MoverScore: cleanly separates the two extremes
Conclusion
• Proposes an unsupervised evaluation metric for generation tasks
• On four generation tasks, the results exceed or come close to the baselines
• Source code is released: https://github.com/AIPHES/emnlp19-moverscore