20240820: Minimum Bayes Risk Decoding for High-Quality Text Generation Beyond High-Probability Text

Hiroyuki Deguchi

August 20, 2024
Transcript

  1. Background: transfer-based machine translation. https://en.wikipedia.org/wiki/Transfer-based_machine_translation
  2–3. Task setup: given an input sequence $\boldsymbol{x} \in \mathcal{X}$, generate an output sequence $\boldsymbol{y}^\star \in \mathcal{Y}$, where $\mathcal{X} := \mathcal{V}_X^*$ and $\mathcal{Y} := \mathcal{V}_Y^*$ are the sets of all sequences over the source and target vocabularies, and the decoder must choose among candidate outputs $\boldsymbol{y}_1, \boldsymbol{y}_2, \boldsymbol{y}_3, \boldsymbol{y}_4, \ldots$
  4. Sequence-level model: $p(\boldsymbol{y} \mid \boldsymbol{x}; \theta)$ gives the probability of an output $\boldsymbol{y} \in \mathcal{V}_Y^*$ given an input $\boldsymbol{x} \in \mathcal{V}_X^*$, with model parameters $\theta$. Example: $p(\text{``This book is interesting''} \mid \boldsymbol{x}; \theta) = 0.8434$, $p(\text{``This book is delicious''} \mid \boldsymbol{x}; \theta) = 0.0013$.
  5. Autoregressive factorization: $p(\boldsymbol{y} \mid \boldsymbol{x}; \theta) = p(y_1 \mid \boldsymbol{x}; \theta)\, p(y_2 \mid y_1, \boldsymbol{x}; \theta)\, p(y_3 \mid y_2, y_1, \boldsymbol{x}; \theta) \cdots$, with $\boldsymbol{x} \in \mathcal{V}_X^*$, $\boldsymbol{y} \in \mathcal{V}_Y^*$, and parameters $\theta$. Token-level example: $p(\text{interesting} \mid \text{``This book is''}, \boldsymbol{x}; \theta) = 0.2875$, $p(\text{delicious} \mid \text{``This book is''}, \boldsymbol{x}; \theta) = 0.0003$.
  6–8. MAP (maximum a posteriori) decoding searches for the highest-probability output: $\boldsymbol{y}^{\theta}_{\mathrm{MAP}} = \operatorname*{argmax}_{\boldsymbol{y} \in \mathcal{Y}} p(\boldsymbol{y} \mid \boldsymbol{x}; \theta) = \operatorname*{argmax}_{\boldsymbol{y} \in \mathcal{Y}} \prod_{t=1}^{|\boldsymbol{y}|} p(y_t \mid \boldsymbol{y}_{<t}, \boldsymbol{x}; \theta)$, where the maximization in principle ranges over all of $\mathcal{Y} := \mathcal{V}_Y^*$ and $\theta$ are the model parameters.
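Since the exact argmax over $\mathcal{Y}$ is intractable, MAP decoding is in practice approximated by greedy or beam search. Below is a minimal greedy-search sketch in Python; the model interface `next_token_logprobs(prefix, x)` and the `EOS` marker are illustrative assumptions, not part of the slides.

```python
import math

EOS = "</s>"  # assumed end-of-sequence token

def greedy_map_decode(x, next_token_logprobs, max_len=50):
    """Greedily approximate argmax_y prod_t p(y_t | y_<t, x; theta).

    `next_token_logprobs(prefix, x)` is an assumed model interface returning
    a dict {token: log p(token | prefix, x; theta)}.
    """
    prefix, total_logprob = [], 0.0
    for _ in range(max_len):
        logprobs = next_token_logprobs(prefix, x)
        token, lp = max(logprobs.items(), key=lambda kv: kv[1])
        total_logprob += lp
        if token == EOS:
            break
        prefix.append(token)
    return prefix, math.exp(total_logprob)
```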
  9. Problem: the highest-probability output is often degenerate; in particular, the empty string can receive a large probability $p(\text{``''} \mid \boldsymbol{x}; \theta)$ (Ott+, ICML2018; Stahlberg & Byrne, EMNLP2019). Ott+, ICML2018, “Analyzing Uncertainty in Neural Machine Translation”. Stahlberg & Byrne, EMNLP2019, “On NMT Search Errors and Model Errors: Cat Got Your Tongue?”.
  10–11. Minimum Bayes risk (MBR) decoding: define the risk $\mathrm{Risk}(\boldsymbol{y}) = \mathbb{E}_{\boldsymbol{y}' \sim \Pr(\cdot \mid \boldsymbol{x})}[\mathcal{L}(\boldsymbol{y}, \boldsymbol{y}')]$ and select $\operatorname*{argmin}_{\boldsymbol{y} \in \mathcal{Y}} \mathrm{Risk}(\boldsymbol{y})$, where $\mathcal{L}: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is a loss function between outputs and $\Pr(\cdot \mid \boldsymbol{x})$ is the true distribution over outputs. Goel & Byrne, CS&L Vol. 14, 2000, “Minimum Bayes-risk automatic speech recognition”. Kumar & Byrne, NAACL2004, “Minimum Bayes-Risk Decoding for Statistical Machine Translation”.
  12. Expected utility theory (von Neumann & Morgenstern, 1944, “Theory of Games and Economic Behavior”): a rational agent prefers the option with the higher expected value, e.g. $\$1500 \times 0.75 + \$3000 \times 0.25 = \$1875$ versus $\$1500 \times 0.25 + \$3000 \times 0.75 = \$2625$.
  13–14. Utility function: $u: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ scores an output against a reference $\boldsymbol{r} \in \mathcal{Y}$; an output $\boldsymbol{y}$ is preferred to $\boldsymbol{y}'$ ($\boldsymbol{y} \succeq \boldsymbol{y}'$) iff $u(\boldsymbol{y}, \boldsymbol{r}) \ge u(\boldsymbol{y}', \boldsymbol{r})$.
  15. MBR decoding under the true distribution: $\boldsymbol{y}^{\mathrm{true}}_{\mathrm{MBR}} = \operatorname*{argmax}_{\boldsymbol{y} \in \mathcal{Y}} \mathbb{E}_{\boldsymbol{r} \sim \Pr(\cdot \mid \boldsymbol{x})}[u(\boldsymbol{y}, \boldsymbol{r})]$, equivalent to minimizing the risk, $\operatorname*{argmin}_{\boldsymbol{y} \in \mathcal{Y}} \mathrm{Risk}(\boldsymbol{y}) = \operatorname*{argmin}_{\boldsymbol{y} \in \mathcal{Y}} \mathbb{E}_{\boldsymbol{y}' \sim \Pr(\cdot \mid \boldsymbol{x})}[\mathcal{L}(\boldsymbol{y}, \boldsymbol{y}')]$, where $u: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is the utility and $\Pr(\cdot \mid \boldsymbol{x})$ the true distribution.
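The risk-minimization and utility-maximization forms coincide when the loss is taken as negative utility; a one-line check (the choice $\mathcal{L} = -u$ is an illustrative assumption):

```latex
\mathcal{L}(\boldsymbol{y},\boldsymbol{y}') := -u(\boldsymbol{y},\boldsymbol{y}')
\;\Longrightarrow\;
\operatorname*{argmin}_{\boldsymbol{y}\in\mathcal{Y}}
 \mathbb{E}_{\boldsymbol{y}'\sim\Pr(\cdot\mid\boldsymbol{x})}\bigl[\mathcal{L}(\boldsymbol{y},\boldsymbol{y}')\bigr]
=\operatorname*{argmax}_{\boldsymbol{y}\in\mathcal{Y}}
 \mathbb{E}_{\boldsymbol{r}\sim\Pr(\cdot\mid\boldsymbol{x})}\bigl[u(\boldsymbol{y},\boldsymbol{r})\bigr].
```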
  16–18. Two obstacles in computing $\boldsymbol{y}^{\mathrm{true}}_{\mathrm{MBR}} = \operatorname*{argmax}_{\boldsymbol{y} \in \mathcal{Y}} \mathbb{E}_{\boldsymbol{r} \sim \Pr(\cdot \mid \boldsymbol{x})}[u(\boldsymbol{y}, \boldsymbol{r})]$: the maximization over all of $\mathcal{Y}$ is intractable, so the search is restricted to a hypothesis set $\mathcal{H} \subseteq \mathcal{Y}$; and the true distribution $\Pr(\cdot \mid \boldsymbol{x})$ is unknown, so the expectation must be estimated.
  19–21. Monte Carlo MBR (Eikema & Aziz, COLING2020): draw pseudo-references $\hat{\mathcal{R}} := \{\boldsymbol{r}_i \in \mathcal{Y} \mid \boldsymbol{r}_i \sim p(\boldsymbol{r} \mid \boldsymbol{x}; \theta)\}_{i=1}^{|\hat{\mathcal{R}}|}$ from the model and estimate the expected utility empirically: $p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \hat{\mathcal{R}}) := \frac{m_{\hat{\mathcal{R}}}(\boldsymbol{r})}{|\hat{\mathcal{R}}|}$, $\mu_{\mathrm{MC}}(\boldsymbol{h}; \hat{\mathcal{R}}) := \sum_{\boldsymbol{r} \in \mathrm{Supp}(\hat{\mathcal{R}})} p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \hat{\mathcal{R}})\, u(\boldsymbol{h}, \boldsymbol{r})$, $\boldsymbol{y}^{\mathrm{MC}}_{\mathrm{MBR}_\theta} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} \mu_{\mathrm{MC}}(\boldsymbol{h}; \hat{\mathcal{R}})$, where $\mathcal{H} \subseteq \mathcal{Y}$ is the hypothesis set, $\mathrm{Supp}(\hat{\mathcal{R}}) \subseteq \mathcal{Y}$ is the set of distinct samples in $\hat{\mathcal{R}}$, and $m_{\hat{\mathcal{R}}}: \mathcal{Y} \to \mathbb{Z}_+$ counts how many times a sample occurs in $\hat{\mathcal{R}}$. Eikema & Aziz, COLING2020, “Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation”.
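A minimal Python sketch of the Monte Carlo estimate. The token-overlap utility is only a stand-in for the metrics used in practice (chrF, BLEU, COMET, etc.), and all names are illustrative.

```python
from collections import Counter

def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility u(h, r): unigram F1 overlap (stand-in for chrF/BLEU/COMET)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    prec, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

def mbr_decode_mc(hypotheses, sampled_refs, utility=unigram_f1):
    """Monte Carlo MBR: return argmax_h (1/|R|) * sum_r u(h, r)."""
    def expected_utility(h):
        return sum(utility(h, r) for r in sampled_refs) / len(sampled_refs)
    return max(hypotheses, key=expected_utility)

# Toy usage with the hypotheses doubling as pseudo-references (H = R̂).
candidates = ["this book is interesting", "this book is fun", "delicious book"]
print(mbr_decode_mc(candidates, candidates))
```

Summing over the raw sample list (duplicates included) is equivalent to the Supp(R̂)-with-counts formulation above.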
  22. Common experimental setting: the hypothesis set is the sample set itself, $\mathcal{H} = \hat{\mathcal{R}}$, with samples drawn by epsilon sampling ($\epsilon = 0.02$).
  23. These decoders are implemented in the mbrs toolkit: Deguchi+, arXiv:2408.04167, “mbrs: A Library for Minimum Bayes Risk Decoding”.
  24. Cost: plain MBR requires $\mathcal{O}(|\mathcal{H}||\hat{\mathcal{R}}|)$ utility evaluations, i.e. $\mathcal{O}(N^2)$ with $N := |\mathcal{H}|$ when $\mathcal{H} = \hat{\mathcal{R}}$, where $\mathcal{H} \subseteq \mathcal{Y}$ is the hypothesis set and $\hat{\mathcal{R}}$ the set of pseudo-references.
  25. Approaches to reducing this cost: reference aggregation (DeNero+, ACL2009; Vamvas & Sennrich, ACL2024), centroid-based reference reduction (Deguchi+, Findings of ACL2024), confidence-based pruning (Cheng & Vlachos, EMNLP2023), and low-rank matrix completion (Trabelsi+, 2024). DeNero+, ACL2009, “Fast Consensus Decoding over Translation Forests”. Vamvas & Sennrich, ACL2024, “Linear-time Minimum Bayes Risk Decoding with Reference Aggregation”. Deguchi+, Findings of ACL2024, “Centroid-Based Efficient Minimum Bayes Risk Decoding”. Cheng & Vlachos, EMNLP2023, “Faster Minimum Bayes Risk Decoding with Confidence-based Pruning”. Trabelsi+, 2024, “Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms”.
  26. Reference aggregation (DeNero+, ACL2009; Vamvas & Sennrich, ACL2024): map each output to a feature vector $\phi(\boldsymbol{y})$, aggregate the pseudo-references as $\bar{\phi}(\hat{\mathcal{R}}) = \sum_{\boldsymbol{r} \in \mathrm{Supp}(\hat{\mathcal{R}})} p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \hat{\mathcal{R}})\, \phi(\boldsymbol{r})$, and score hypotheses against the aggregate: $\boldsymbol{y}^{\mathrm{MC}}_{\mathrm{RAMBR}_\theta} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} s(\phi(\boldsymbol{h}), \bar{\phi}(\hat{\mathcal{R}}))$, reducing the cost from $\mathcal{O}(|\mathcal{H}||\hat{\mathcal{R}}|)$ to $\mathcal{O}(|\mathcal{H}| + |\hat{\mathcal{R}}|)$, where $\phi$ is the feature map and $s$ the score computed in feature space. DeNero+, ACL2009, “Fast Consensus Decoding over Translation Forests”. Vamvas & Sennrich, ACL2024, “Linear-time Minimum Bayes Risk Decoding with Reference Aggregation”.
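A minimal NumPy sketch of reference aggregation. The toy `embed` function and cosine similarity are illustrative stand-ins for the feature map $\phi$ and score $s$ used in the cited papers.

```python
import numpy as np

def embed(sentence: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding (stand-in for a real feature map phi)."""
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.standard_normal(dim)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rambr_decode(hypotheses, sampled_refs):
    """Reference aggregation: score each hypothesis once against the mean
    reference embedding instead of against every reference."""
    phi_bar = np.stack([embed(r) for r in sampled_refs]).mean(axis=0)
    return max(hypotheses, key=lambda h: cosine(embed(h), phi_bar))
```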
  27. Centroid-based MBR (Deguchi+, Findings of ACL2024): embed outputs into a $D$-dimensional feature space, $\phi: \mathcal{Y} \to \mathbb{R}^D$, cluster the pseudo-references into $k$ centroids, and evaluate utilities against the centroids only, reducing the cost to $\mathcal{O}(|\mathcal{H}|k + |\hat{\mathcal{R}}|k)$. Deguchi+, Findings of ACL2024, “Centroid-Based Efficient Minimum Bayes Risk Decoding”.
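A sketch of the centroid idea on top of the toy embeddings above (reusing `embed` and `cosine` from the reference-aggregation sketch); scikit-learn's k-means and the cluster-size weighting are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def centroid_mbr_decode(hypotheses, sampled_refs, k=8):
    """Centroid-based MBR sketch: cluster reference embeddings into k centroids
    and score hypotheses against the centroids, weighted by cluster size."""
    phi_refs = np.stack([embed(r) for r in sampled_refs])  # embed() from the sketch above
    k = min(k, len(sampled_refs))
    km = KMeans(n_clusters=k, n_init=10).fit(phi_refs)
    weights = np.bincount(km.labels_, minlength=k) / len(sampled_refs)

    def score(h):
        e = embed(h)
        return sum(w * cosine(e, c) for w, c in zip(weights, km.cluster_centers_))

    return max(hypotheses, key=score)
```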
  28. Low-rank matrix completion (Trabelsi+, 2024): treat the $|\mathcal{H}| \times |\hat{\mathcal{R}}|$ utility matrix as approximately low-rank, compute only a subset of its entries, and recover the rest from a factorization $H \in \mathbb{R}^{r \times |\mathcal{H}|}$, $R \in \mathbb{R}^{r \times |\hat{\mathcal{R}}|}$ with $M \approx H^\top R$. Trabelsi+, 2024, “Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms”.
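A compact alternating-least-squares sketch of the matrix-completion idea; observing a random subset of entries and the rank/iteration settings are illustrative assumptions, not the specific algorithm of the cited paper.

```python
import numpy as np

def complete_low_rank(M_obs, mask, rank=4, iters=20, reg=1e-2):
    """Fill in unobserved entries of the |H| x |R| utility matrix via ALS.

    M_obs: utility matrix with arbitrary values where mask is False.
    mask:  boolean array, True where u(h, r) was actually computed.
    """
    n, m = M_obs.shape
    rng = np.random.default_rng(0)
    H = rng.standard_normal((n, rank))
    R = rng.standard_normal((m, rank))
    I = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(n):                      # update hypothesis factors
            idx = mask[i]
            if idx.any():
                A, b = R[idx], M_obs[i, idx]
                H[i] = np.linalg.solve(A.T @ A + I, A.T @ b)
        for j in range(m):                      # update reference factors
            idx = mask[:, j]
            if idx.any():
                A, b = H[idx], M_obs[idx, j]
                R[j] = np.linalg.solve(A.T @ A + I, A.T @ b)
    return H @ R.T                              # approximate full utility matrix
```

The MBR winner is then the hypothesis whose row of the completed matrix has the highest mean, e.g. `hypotheses[int(np.argmax(completed.mean(axis=1)))]`.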
  29–30. Model-based MBR (Jinnai+, ICML2024): instead of the Monte Carlo weights $p_{\mathrm{MC}}(\boldsymbol{r} \mid \boldsymbol{x}; \hat{\mathcal{R}}) := \frac{m_{\hat{\mathcal{R}}}(\boldsymbol{r})}{|\hat{\mathcal{R}}|}$, weight each reference by its renormalized model probability: $p_{\mathrm{MB}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}, \theta) := \frac{p(\boldsymbol{r} \mid \boldsymbol{x}; \theta)}{\sum_{\boldsymbol{r}' \in \mathcal{R}} p(\boldsymbol{r}' \mid \boldsymbol{x}; \theta)}$, $\mu_{\mathrm{MB}}(\boldsymbol{h}; \mathcal{R}, \theta) := \sum_{\boldsymbol{r} \in \mathcal{R}} p_{\mathrm{MB}}(\boldsymbol{r} \mid \boldsymbol{x}; \mathcal{R}, \theta)\, u(\boldsymbol{h}, \boldsymbol{r})$, $\boldsymbol{y}^{\mathrm{MB}}_{\mathrm{MBR}_\theta} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{H}} \mu_{\mathrm{MB}}(\boldsymbol{h}; \mathcal{R}, \theta)$, where $\mathcal{H} \subseteq \mathcal{Y}$ is the hypothesis set and $\mathcal{R}$ the reference set. Jinnai+, ICML2024, “Model-Based Minimum Bayes Risk Decoding for Text Generation”.
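A sketch of the model-based weighting: the only change from the Monte Carlo sketch above is that references are weighted by their renormalized model probabilities (assumed to be available as per-reference log-probabilities) rather than by sample counts; `unigram_f1` is reused from that sketch and all names are illustrative.

```python
import numpy as np

def mbr_decode_mb(hypotheses, references, ref_logprobs, utility=unigram_f1):
    """Model-based MBR: weight u(h, r) by p(r|x;theta) / sum_r' p(r'|x;theta)."""
    logp = np.asarray(ref_logprobs, dtype=float)
    w = np.exp(logp - logp.max())
    w /= w.sum()                                # p_MB(r | x; R, theta)
    def expected_utility(h):
        return sum(wi * utility(h, r) for wi, r in zip(w, references))
    return max(hypotheses, key=expected_utility)
```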
  31. Deguchi+, arXiv:2408.04167, “mbrs: A Library for Minimum Bayes Risk Decoding”.