Hiroyuki Deguchi
August 20, 2024
20240820: Minimum Bayes Risk Decoding for High-Quality Text Generation Beyond High-Probability Text
Transcript
https://en.wikipedia.org/wiki/Transfer-based_machine_translation
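◼ Output: $\boldsymbol{y}^\star \in \mathcal{Y}$, where $\mathcal{Y} := \mathcal{V}_Y^*$
◼ Input: $\boldsymbol{x} \in \mathcal{X}$, where $\mathcal{X} := \mathcal{V}_X^*$
◼ $\mathcal{V}_X^*, \mathcal{V}_Y^*$: Kleene closures of the input and output vocabularies
◼ Example outputs: $\boldsymbol{y}_1, \boldsymbol{y}_2, \boldsymbol{y}_3, \boldsymbol{y}_4$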
https://repositorio.ul.pt/bitstream/10451/10945/2/ulfl155512_tm_2.pdf
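◼ $p(\boldsymbol{y} \mid \boldsymbol{x}; \theta)$: probability of output $\boldsymbol{y}$ given input $\boldsymbol{x}$
◼ $\boldsymbol{x} \in \mathcal{V}_X^*$, $\boldsymbol{y} \in \mathcal{V}_Y^*$; $\theta$: model parameters
⚫ $p(\text{This book is interesting} \mid \boldsymbol{x}; \theta) = 0.8434$
⚫ $p(\text{This book is delicious} \mid \boldsymbol{x}; \theta) = 0.0013$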
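◼ $p(\boldsymbol{y} \mid \boldsymbol{x}; \theta)$ factorizes token by token:
⚫ $p(\boldsymbol{y} \mid \boldsymbol{x}; \theta) = p(y_1 \mid \boldsymbol{x}; \theta)\, p(y_2 \mid y_1, \boldsymbol{x}; \theta)\, p(y_3 \mid y_2, y_1, \boldsymbol{x}; \theta) \cdots$
◼ $\boldsymbol{x} \in \mathcal{V}_X^*$, $\boldsymbol{y} \in \mathcal{V}_Y^*$; $\theta$: model parameters
⚫ $p(\text{interesting} \mid \text{This book is}, \boldsymbol{x}; \theta) = 0.2875$
⚫ $p(\text{delicious} \mid \text{This book is}, \boldsymbol{x}; \theta) = 0.0003$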
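The token-by-token factorization of p(y|x; θ) can be illustrated with a minimal sketch. The per-token probabilities below are hypothetical values chosen for illustration, not numbers from the slides, and `sequence_log_prob` is an illustrative helper name. Summing logarithms rather than multiplying raw probabilities is a common way to avoid underflow on long sequences.

```python
import math

def sequence_log_prob(token_probs):
    """Return log p(y | x; theta) as the sum of log p(y_t | y_<t, x; theta)."""
    return sum(math.log(p) for p in token_probs)

# Hypothetical conditionals p(y_t | y_<t, x; theta) for a four-token output.
token_probs = [0.9, 0.8, 0.7, 0.5]

log_p = sequence_log_prob(token_probs)
seq_prob = math.exp(log_p)  # equals the product of the four conditionals
```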
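◼ MAP decoding: $\boldsymbol{y}^{\mathrm{MAP}_\theta} \in \mathcal{Y}$
⚫ $\boldsymbol{y}^{\mathrm{MAP}_\theta} = \operatorname*{argmax}_{\boldsymbol{y} \in \mathcal{Y}} p(\boldsymbol{y} \mid \boldsymbol{x}; \theta)$, where $p(\boldsymbol{y} \mid \boldsymbol{x}; \theta) = \prod_{t=1}^{|\boldsymbol{y}|} p(y_t \mid \boldsymbol{y}_{<t}, \boldsymbol{x}; \theta)$
◼ $\mathcal{X} := \mathcal{V}_X^*$, $\mathcal{Y} := \mathcal{V}_Y^*$; $\theta$: model parameters
◼ Example outputs: $\boldsymbol{y}_1, \boldsymbol{y}_2, \boldsymbol{y}_3, \boldsymbol{y}_4$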
◼ $p(\text{""} \mid \boldsymbol{x}; \theta)$: probability the model assigns to the empty output (Ott+, ICML2018; Stahlberg & Byrne, EMNLP2019)
Ott+, ICML2018, “Analyzing Uncertainty in Neural Machine Translation”.
Stahlberg & Byrne, EMNLP2019, “On NMT Search Errors and Model Errors: Cat Got Your Tongue?”
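◼ $\mathrm{Risk}(\boldsymbol{y}) = \mathbb{E}_{\boldsymbol{y}' \sim \Pr(\cdot \mid \boldsymbol{x})}\left[\mathcal{L}(\boldsymbol{y}, \boldsymbol{y}')\right]$
◼ Decision rule: $\operatorname*{argmin}_{\boldsymbol{y} \in \mathcal{Y}} \mathrm{Risk}(\boldsymbol{y})$
◼ $\mathcal{L}: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$: loss function
◼ $\Pr(\cdot \mid \boldsymbol{x})$: true output distribution
Goel & Byrne, CS&L Vol. 14, 2000, “Minimum Bayes-risk automatic speech recognition”.
Kumar & Byrne, NAACL2004, “Minimum Bayes-Risk Decoding for Statistical Machine Translation”.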
◼ Expected utility (von Neumann & Morgenstern, 1944)
⚫ ▶ $1500 × 0.75 + $3000 × 0.25 = $1875
⚫ ▶ $1500 × 0.25 + $3000 × 0.75 = $2625
von Neumann & Morgenstern, 1944, “Theory of Games and Economic Behavior”.
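The two dollar amounts on this slide are expected values: each payoff is weighted by its probability and the results are summed. A minimal sketch reproducing the arithmetic follows; the names `option_a` and `option_b` are labels invented here, since the slide's own labels did not survive extraction.

```python
def expected_value(outcomes):
    """Sum of payoff * probability over (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)

# (payoff in dollars, probability) pairs taken from the slide.
option_a = [(1500, 0.75), (3000, 0.25)]
option_b = [(1500, 0.25), (3000, 0.75)]

ev_a = expected_value(option_a)  # 1875.0
ev_b = expected_value(option_b)  # 2625.0
```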
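◼ Utility function $u: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$
⚫ $\boldsymbol{y} \succeq \boldsymbol{y}' \iff u(\boldsymbol{y}, \boldsymbol{r}) \ge u(\boldsymbol{y}', \boldsymbol{r})$
◼ $\succeq$: preference relation between $\boldsymbol{y}$ and $\boldsymbol{y}'$
◼ $\boldsymbol{r} \in \mathcal{Y}$: reference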
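◼ $\boldsymbol{y}^{\mathrm{MBR}_{\mathrm{true}}} = \operatorname*{argmax}_{\boldsymbol{y} \in \mathcal{Y}} \mathbb{E}_{\boldsymbol{r} \sim \Pr(\cdot \mid \boldsymbol{x})}\left[u(\boldsymbol{y}, \boldsymbol{r})\right]$
◼ Risk form: $\operatorname*{argmin}_{\boldsymbol{y} \in \mathcal{Y}} \mathrm{Risk}(\boldsymbol{y}) = \operatorname*{argmin}_{\boldsymbol{y} \in \mathcal{Y}} \mathbb{E}_{\boldsymbol{y}' \sim \Pr(\cdot \mid \boldsymbol{x})}\left[\mathcal{L}(\boldsymbol{y}, \boldsymbol{y}')\right]$
◼ $u: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$: utility function
◼ $\Pr(\cdot \mid \boldsymbol{x})$: true output distribution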
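To see how the expected-utility argmax can differ from the highest-probability output, here is a minimal sketch over a toy distribution. Everything in it is an assumption made for illustration: the four strings and their probabilities are invented, the candidate set is restricted to the support of the toy distribution rather than all of 𝒴, and `jaccard` is a stand-in utility, not a metric used in the talk.

```python
def jaccard(hyp, ref):
    """Toy utility u(y, r): Jaccard overlap of the two token sets."""
    h, r = set(hyp.split()), set(ref.split())
    if not h and not r:
        return 1.0  # two empty strings count as a perfect match
    return len(h & r) / len(h | r)

# Hypothetical "true" distribution Pr(. | x); every value is made up.
true_dist = {
    "": 0.30,
    "the cat sat": 0.25,
    "the cat sits": 0.25,
    "a cat sat": 0.20,
}

def expected_utility(y, dist, utility):
    """E_{r ~ dist}[u(y, r)] computed exactly over a finite support."""
    return sum(prob * utility(y, r) for r, prob in dist.items())

# Highest-probability output versus highest expected utility.
y_map = max(true_dist, key=true_dist.get)
y_mbr = max(true_dist, key=lambda y: expected_utility(y, true_dist, jaccard))
```

In this toy case the mode is the empty string, while the expected-utility choice is "the cat sat", because the three non-empty outputs overlap with one another and jointly hold most of the probability mass.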
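◼ $\boldsymbol{y}^{\mathrm{MBR}_{\mathrm{true}}} = \operatorname*{argmax}_{\boldsymbol{y} \in \mathcal{Y}} \mathbb{E}_{\boldsymbol{r} \sim \Pr(\cdot \mid \boldsymbol{x})}\left[u(\boldsymbol{y}, \boldsymbol{r})\right]$
⚫ ▶ Hypothesis set: $\mathcal{H} \subseteq \mathcal{Y}$
⚫ ▶ True output distribution: $\Pr(\cdot \mid \boldsymbol{x})$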
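Monte Carlo estimation (Eikema & Aziz, COLING2020)
◼ $\mathcal{R} := \{\boldsymbol{r}_i \in \mathcal{Y} \mid \boldsymbol{r}_i \sim p(\boldsymbol{r} \mid \boldsymbol{x}; \theta)\}_{i=1}^{|\mathcal{R}|}$
◼ $p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}) := \dfrac{m_{\mathcal{R}}(\boldsymbol{r})}{|\mathcal{R}|}$
⚫ $\mu_{\mathrm{MC}}(\boldsymbol{h}; \mathcal{R}) := \sum_{\boldsymbol{r} \in \mathrm{Supp}(\mathcal{R})} p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R})\, u(\boldsymbol{h}, \boldsymbol{r})$
⚫ $\boldsymbol{y}^{\mathrm{MBR}_\theta}_{\mathrm{MC}} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} \mu_{\mathrm{MC}}(\boldsymbol{h}; \mathcal{R})$
◼ $\mathcal{H} \subseteq \mathcal{Y}$: hypothesis set
◼ $\mathcal{R}$: multiset of sampled pseudo-references
◼ $\mathrm{Supp}(\mathcal{R}) \subseteq \mathcal{Y}$: support (distinct elements) of $\mathcal{R}$
◼ $m_{\mathcal{R}}: \mathcal{Y} \to \mathbb{Z}_+$: multiplicity of an element in $\mathcal{R}$
Eikema & Aziz, COLING2020, “Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation”.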
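The Monte Carlo estimate above can be sketched in a few lines of Python. The sketch assumes the samples have already been drawn from the model; the sample strings, the function name `mc_mbr`, and the `jaccard` utility are all illustrative stand-ins, not part of the cited work or of any library. `collections.Counter` plays the role of the multiplicity function m_ℛ.

```python
from collections import Counter

def jaccard(hyp, ref):
    """Toy utility u(h, r): Jaccard overlap of the two token sets."""
    h, r = set(hyp.split()), set(ref.split())
    if not h and not r:
        return 1.0
    return len(h & r) / len(h | r)

def mc_mbr(hypotheses, samples, utility):
    """Return the hypothesis maximizing mu_MC, plus all mu_MC scores."""
    counts = Counter(samples)  # multiplicity m_R(r) of each distinct sample
    n = len(samples)           # |R|

    def mu(h):
        # Sum over Supp(R) of p_MC(r | x; R) * u(h, r), with p_MC = m_R(r) / |R|.
        return sum((c / n) * utility(h, r) for r, c in counts.items())

    scores = {h: mu(h) for h in hypotheses}
    return max(scores, key=scores.get), scores

# Hypothetical samples standing in for draws r_i ~ p(r | x; theta).
samples = ["the cat sat", "the cat sat", "the cat sits", "a cat sat", ""]
hypotheses = list(dict.fromkeys(samples))  # H = Supp(R), order preserved

best, scores = mc_mbr(hypotheses, samples, jaccard)
```

Each hypothesis is compared with every distinct sample, so the number of utility calls grows with the product of the two set sizes.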
◼ $\mathcal{H} = \mathcal{R}$
◼ $\epsilon = 0.02$
Deguchi+, arXiv:2408.04167, “mbrs: A Library for Minimum Bayes Risk Decoding”.
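◼ Computational cost: $\mathcal{O}(|\mathcal{H}||\mathcal{R}|)$
⚫ $\mathcal{O}(N^2)$ when $|\mathcal{R}| = N$, where $N := |\mathcal{H}|$
◼ $\mathcal{H} \subseteq \mathcal{Y}$: hypothesis set
◼ $\mathcal{R}$: pseudo-references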
◼ Reference aggregation (DeNero+, ACL2009; Vamvas&Sennrich, ACL2024)
◼ Centroid-based (Deguchi+, Findings of ACL2024)
◼ Confidence-based pruning (Cheng&Vlachos, EMNLP2023)
◼ Low-rank matrix completion (Trabelsi+, 2024)
DeNero+, ACL2009, “Fast Consensus Decoding over Translation Forests”.
Vamvas&Sennrich, ACL2024, “Linear-time Minimum Bayes Risk Decoding with Reference Aggregation”.
Deguchi+, Findings of ACL2024, “Centroid-Based Efficient Minimum Bayes Risk Decoding”.
Cheng&Vlachos, EMNLP2023, “Faster Minimum Bayes Risk Decoding with Confidence-based Pruning”.
Trabelsi+, 2024, “Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms”.
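Reference aggregation (DeNero+, ACL2009; Vamvas&Sennrich, ACL2024)
◼ $\phi(\boldsymbol{y})$: feature representation of $\boldsymbol{y}$
◼ $\bar{\phi}(\mathcal{R}) = \sum_{\boldsymbol{r} \in \mathrm{Supp}(\mathcal{R})} p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R})\, \phi(\boldsymbol{r})$
◼ $\boldsymbol{y}^{\mathrm{RAMBR}_\theta}_{\mathrm{MC}} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} s\left(\phi(\boldsymbol{h}), \bar{\phi}(\mathcal{R})\right)$
⚫ Cost: $\mathcal{O}(|\mathcal{H}||\mathcal{R}|)$ becomes $\mathcal{O}(|\mathcal{H}| + |\mathcal{R}|)$
◼ $\mathcal{H} \subseteq \mathcal{Y}$: hypothesis set; $\mathcal{R}$: pseudo-references
◼ $\phi$: feature function; $s$: scoring function
DeNero+, ACL2009, “Fast Consensus Decoding over Translation Forests”.
Vamvas&Sennrich, ACL2024, “Linear-time Minimum Bayes Risk Decoding with Reference Aggregation”.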
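A minimal sketch of the aggregation idea, under stated assumptions: φ is taken to be a bag-of-words count vector and s is cosine similarity. Both are toy stand-ins chosen so the example runs with the standard library only; they are not the features or scoring functions of the cited papers, and the names `phi`, `aggregate`, `cosine`, and `ra_mbr` are invented here.

```python
import math
from collections import Counter

def phi(text):
    """Toy feature function: bag-of-words counts."""
    return Counter(text.split())

def aggregate(references):
    """Weighted mean of phi(r) with weights p_MC(r) = m_R(r) / |R|."""
    n = len(references)
    agg = {}
    for ref, mult in Counter(references).items():
        for tok, cnt in phi(ref).items():
            agg[tok] = agg.get(tok, 0.0) + (mult / n) * cnt
    return agg

def cosine(a, b):
    """Toy scoring function s: cosine similarity of two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ra_mbr(hypotheses, references):
    ref_repr = aggregate(references)  # one pass over the references
    # One scoring call per hypothesis against the single aggregate.
    return max(hypotheses, key=lambda h: cosine(phi(h), ref_repr))

references = ["the cat sat", "the cat sat", "the cat sits", "a cat sat"]
hypotheses = ["the cat sat", "the cat sits", "a cat sat"]
best = ra_mbr(hypotheses, references)
```

Because the references are folded into one vector before any hypothesis is scored, the pairwise loop over hypotheses and references disappears, which is where the reduction from a product to a sum of set sizes comes from.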
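Centroid-based (Deguchi+, Findings of ACL2024)
◼ $D$: feature dimension
⚫ $\phi: \mathcal{Y} \to \mathbb{R}^D$
◼ $k$: number of centroids
◼ Cost: $\mathcal{O}(|\mathcal{H}|k + |\mathcal{R}|k)$
Deguchi+, Findings of ACL2024, “Centroid-Based Efficient Minimum Bayes Risk Decoding”.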
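The following is a rough sketch of the centroid idea, not the cited method itself: references embedded in ℝ^D are clustered into k centroids, and each hypothesis is scored against the k centroids instead of all references. The 2-dimensional vectors are made-up stand-ins for sentence embeddings, the plain k-means with first-k initialization is a simplification, and the negative Euclidean distance averaged over centroids is an assumed toy utility.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def kmeans(points, k, iters=10):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

def centroid_mbr(hyp_vecs, ref_vecs, k):
    """Return the index of the best hypothesis and the k centroids."""
    centroids = kmeans(ref_vecs, k)  # about |R| * k distance computations per pass

    def score(h):
        # Toy utility: negative distance, averaged over the k centroids.
        return -sum(math.sqrt(sq_dist(h, c)) for c in centroids) / k

    best = max(range(len(hyp_vecs)), key=lambda i: score(hyp_vecs[i]))  # |H| * k
    return best, centroids

# Hypothetical embeddings phi(r) and phi(h) in R^2.
ref_vecs = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (4.0, 5.0)]
hyp_vecs = [(0.0, 0.0), (2.0, 2.5), (6.0, 6.0)]
best, centroids = centroid_mbr(hyp_vecs, ref_vecs, k=2)
```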
Confidence-based pruning (Cheng&Vlachos, EMNLP2023)
Cheng&Vlachos, EMNLP2023, “Faster Minimum Bayes Risk Decoding with Confidence-based Pruning”.
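Low-rank matrix completion (Trabelsi+, 2024)
◼ Utility matrix $M$ of size $|\mathcal{H}| \times |\mathcal{R}|$
◼ Low-rank factors: $H \in \mathbb{R}^{r \times |\mathcal{H}|}$, $R \in \mathbb{R}^{r \times |\mathcal{R}|}$
⚫ $M \approx H^\top R$
Trabelsi+, 2024, “Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms”.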
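To make the approximation M ≈ HᵀR concrete, here is a minimal sketch that fills in a partially observed utility matrix. It assumes rank r = 1 for brevity and uses alternating least squares, which is one standard completion algorithm; no claim is made that this matches the exact procedure of the cited paper. The matrix values and the function name `als_rank1` are invented for illustration.

```python
def als_rank1(observed, n_rows, n_cols, iters=100):
    """Fit M[i][j] ~ h[i] * r[j] using only the observed entries.

    observed: dict mapping (i, j) to the computed utility value.
    """
    h = [1.0] * n_rows
    r = [1.0] * n_cols
    for _ in range(iters):
        # Fix r, solve the one-dimensional least squares for each h[i].
        for i in range(n_rows):
            num = sum(v * r[j] for (a, j), v in observed.items() if a == i)
            den = sum(r[j] ** 2 for (a, j) in observed if a == i)
            if den > 0:
                h[i] = num / den
        # Fix h, solve for each r[j].
        for j in range(n_cols):
            num = sum(v * h[i] for (i, b), v in observed.items() if b == j)
            den = sum(h[i] ** 2 for (i, b) in observed if b == j)
            if den > 0:
                r[j] = num / den
    return h, r

# Hypothetical 3 x 3 utility matrix that is exactly rank 1.
full = [[1.0, 0.5, 2.0],
        [2.0, 1.0, 4.0],
        [3.0, 1.5, 6.0]]
# Pretend only six of the nine utilities were actually computed.
missing = {(0, 2), (1, 1), (2, 0)}
observed = {(i, j): full[i][j]
            for i in range(3) for j in range(3) if (i, j) not in missing}

h, r = als_rank1(observed, n_rows=3, n_cols=3)
completed = [[h[i] * r[j] for j in range(3)] for i in range(3)]
# Pick the hypothesis (row) with the highest mean completed utility.
best = max(range(3), key=lambda i: sum(completed[i]) / 3)
```

Only the observed entries require a utility call; the remaining cells are predicted from the factors.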
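Model-based estimation (Jinnai+, ICML2024)
◼ $p_{\mathrm{MB}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}, \theta) := \dfrac{p(\boldsymbol{r} \mid \boldsymbol{x}; \theta)}{\sum_{\boldsymbol{r}' \in \mathcal{R}} p(\boldsymbol{r}' \mid \boldsymbol{x}; \theta)}$
◼ $\mu_{\mathrm{MB}}(\boldsymbol{h}; \mathcal{R}, \theta) := \sum_{\boldsymbol{r} \in \mathcal{R}} p_{\mathrm{MB}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}, \theta)\, u(\boldsymbol{h}, \boldsymbol{r})$
◼ $\boldsymbol{y}^{\mathrm{MBR}_\theta}_{\mathrm{MB}} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} \mu_{\mathrm{MB}}(\boldsymbol{h}; \mathcal{R}, \theta)$
◼ $\mathcal{H} \subseteq \mathcal{Y}$: hypothesis set
◼ $\mathcal{R}$: set of pseudo-references
Jinnai+, ICML2024, “Model-Based Minimum Bayes Risk Decoding for Text Generation”.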
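A minimal sketch of the model-based weighting, assuming each distinct pseudo-reference comes with its model log-probability. The log-probabilities, the function names, and the `jaccard` utility are hypothetical stand-ins. Normalizing in log space (subtracting the maximum before exponentiating) is a common numerical precaution and is an implementation choice made here.

```python
import math

def jaccard(hyp, ref):
    """Toy utility u(h, r): Jaccard overlap of the two token sets."""
    h, r = set(hyp.split()), set(ref.split())
    if not h and not r:
        return 1.0
    return len(h & r) / len(h | r)

def model_based_weights(log_probs):
    """p_MB(r) = p(r | x; theta) / sum over r' in R of p(r' | x; theta)."""
    m = max(log_probs.values())
    exps = {r: math.exp(lp - m) for r, lp in log_probs.items()}
    z = sum(exps.values())
    return {r: e / z for r, e in exps.items()}

def mb_mbr(hypotheses, log_probs, utility):
    weights = model_based_weights(log_probs)

    def mu(h):
        return sum(w * utility(h, r) for r, w in weights.items())

    return max(hypotheses, key=mu), weights

# Hypothetical log p(r | x; theta) for three distinct pseudo-references.
log_probs = {
    "the cat sat": math.log(0.3),
    "the cat sits": math.log(0.1),
    "a cat sat": math.log(0.1),
}
best, weights = mb_mbr(list(log_probs), log_probs, jaccard)
```

Unlike the Monte Carlo estimate, the weights here come from the model's probabilities rather than from how often each output was sampled.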
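◼ Model-based: $p_{\mathrm{MB}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}, \theta) := \dfrac{p(\boldsymbol{r} \mid \boldsymbol{x}; \theta)}{\sum_{\boldsymbol{r}' \in \mathcal{R}} p(\boldsymbol{r}' \mid \boldsymbol{x}; \theta)}$
⚫ $\mu_{\mathrm{MB}}(\boldsymbol{h}; \mathcal{R}, \theta) := \sum_{\boldsymbol{r} \in \mathcal{R}} p_{\mathrm{MB}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}, \theta)\, u(\boldsymbol{h}, \boldsymbol{r})$
⚫ $\boldsymbol{y}^{\mathrm{MBR}_\theta}_{\mathrm{MB}} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} \mu_{\mathrm{MB}}(\boldsymbol{h}; \mathcal{R}, \theta)$
◼ Monte Carlo: $p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}) := \dfrac{m_{\mathcal{R}}(\boldsymbol{r})}{|\mathcal{R}|}$
⚫ $\mu_{\mathrm{MC}}(\boldsymbol{h}; \mathcal{R}) := \sum_{\boldsymbol{r} \in \mathrm{Supp}(\mathcal{R})} p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R})\, u(\boldsymbol{h}, \boldsymbol{r})$
⚫ $\boldsymbol{y}^{\mathrm{MBR}_\theta}_{\mathrm{MC}} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} \mu_{\mathrm{MC}}(\boldsymbol{h}; \mathcal{R})$
◼ $\mathcal{H} \subseteq \mathcal{Y}$: hypothesis set; $\mathcal{R}$: pseudo-references
Jinnai+, ICML2024, “Model-Based Minimum Bayes Risk Decoding for Text Generation”.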
Deguchi+, arXiv:2408.04167, “mbrs: A Library for Minimum Bayes Risk Decoding”.