Paper Reading: Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation

Hiroyuki Deguchi

February 15, 2023

Transcript

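  1. $\boldsymbol{y}^{\mathrm{MAP}} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{Y}} \log p(\boldsymbol{h} \mid \boldsymbol{x}, \theta)$
    $\boldsymbol{y}^{\mathrm{MBR}} = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{Y}} \mathbb{E}[u(\boldsymbol{y}^{*}, \boldsymbol{h}) \mid \boldsymbol{x}, \theta] = \operatorname*{argmax}_{\boldsymbol{h} \in \mathcal{Y}} \mu_u(\boldsymbol{h}; \boldsymbol{x}, \theta)$
    Utility $u$, with $\boldsymbol{h} \in \mathcal{Y}$ and $\boldsymbol{y}^{*} \in \mathcal{Y}$.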
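  2. (Eikema & Aziz, COLING 2020)
    $N$ samples: $\bar{\mathcal{H}}(\boldsymbol{x}) = \{\boldsymbol{y}^{(1)}, \ldots, \boldsymbol{y}^{(N)}\}$
    Estimate of $\mu_u(\boldsymbol{h}; \boldsymbol{x}, \theta)$: $\hat{\mu}_u(\boldsymbol{h}; \boldsymbol{x}, N) := \frac{1}{N} \sum_{n=1}^{N} u(\boldsymbol{y}^{(n)}, \boldsymbol{h})$
    $\boldsymbol{y}^{\mathrm{NbyN}} := \operatorname*{argmax}_{\boldsymbol{h} \in \bar{\mathcal{H}}(\boldsymbol{x})} \hat{\mu}_u(\boldsymbol{h}; \boldsymbol{x}, N)$
    $N^2$ utility evaluations: $\mathcal{O}(N^2 \times U)$, where $U$ is the upper-bound cost of assessing the utility function once.
    "Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation", Eikema & Aziz, COLING 2020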
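  3. Estimating $\hat{\mu}_u$ with $S < N$: $\mathcal{O}(N^2 \times U) \rightarrow \mathcal{O}(N \times S \times U)$
    Top $T$ under $\hat{\mu}_{u_{\mathrm{proxy}}}$: $\bar{\mathcal{H}}_T(\boldsymbol{x}) := \operatorname{top-}T_{\boldsymbol{h} \in \bar{\mathcal{H}}(\boldsymbol{x})} \hat{\mu}_{u_{\mathrm{proxy}}}(\boldsymbol{h}; \boldsymbol{x}, S)$
    $\boldsymbol{y}^{\mathrm{C2F}} := \operatorname*{argmax}_{\boldsymbol{h} \in \bar{\mathcal{H}}_T(\boldsymbol{x})} \hat{\mu}_{u_{\mathrm{target}}}(\boldsymbol{h}; \boldsymbol{x}, L)$
    $\mathcal{O}(N \times S \times U_{\mathrm{proxy}} + T \times L \times U_{\mathrm{target}})$
    $S = 5$, $S = 50$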
  4. (Stanojević & Sima’an, WMT 2014)
    "BEER: BEtter Evaluation as Ranking", Stanojević & Sima’an, WMT 2014
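  5. $\boldsymbol{y}^{\mathrm{NbyS}} := \operatorname*{argmax}_{\boldsymbol{h} \in \{\boldsymbol{y}^{(k)}\}_{k=1}^{N}} \hat{\mu}_u(\boldsymbol{h}; \boldsymbol{x}, S)$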
  6. $N = 405$, $S = 13$
    $\operatorname{top-}T = 50$, $L = 100$
    $N = 405$
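
The estimators on slides 2, 3, and 5 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the paper's implementation: all function names (`unigram_f1`, `mu_hat`, `mbr_n_by_s`, `mbr_c2f`, `utility_calls`) are hypothetical, the unigram-F1 utility is a toy stand-in for whatever metric plays the role of $u$, and the pseudo-references are simply the first $S$ (or $L$) entries of the sample list, which assumes the samples are independent draws from the model. Setting $S = N$ recovers the N-by-N rule.

```python
from collections import Counter
from typing import Callable, Sequence

# u(reference, hypothesis) -> score; higher is better.
Utility = Callable[[str, str], float]


def unigram_f1(ref: str, hyp: str) -> float:
    """Toy utility (assumption): unigram F1 overlap between two strings."""
    r, h = Counter(ref.split()), Counter(hyp.split())
    overlap = sum((r & h).values())
    if overlap == 0:
        return 0.0
    prec = overlap / sum(h.values())
    rec = overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)


def mu_hat(hyp: str, refs: Sequence[str], u: Utility) -> float:
    """Monte Carlo estimate of expected utility: mean of u(y, hyp) over refs."""
    return sum(u(y, hyp) for y in refs) / len(refs)


def mbr_n_by_s(samples: Sequence[str], s: int, u: Utility) -> str:
    """Score all N samples against the first S samples (S = N gives N-by-N)."""
    refs = samples[:s]
    return max(samples, key=lambda h: mu_hat(h, refs, u))


def mbr_c2f(samples: Sequence[str], num_proxy_refs: int, top_t: int,
            num_target_refs: int, u_proxy: Utility, u_target: Utility) -> str:
    """Coarse-to-fine: prune to top-T with the proxy, then rerank with the target.

    num_proxy_refs, top_t, num_target_refs correspond to S, T, L on the slides.
    """
    proxy_refs = samples[:num_proxy_refs]
    ranked = sorted(samples, key=lambda h: mu_hat(h, proxy_refs, u_proxy),
                    reverse=True)
    shortlist = ranked[:top_t]
    target_refs = samples[:num_target_refs]
    return max(shortlist, key=lambda h: mu_hat(h, target_refs, u_target))


def utility_calls(n: int, s: int, t: int = 0, l_refs: int = 0) -> int:
    """Count utility evaluations: N*S for the first pass plus T*L for reranking.

    This ignores the difference in per-call cost between proxy and target.
    """
    return n * s + t * l_refs


if __name__ == "__main__":
    samples = ["the cat sat", "the cat sat", "a dog ran",
               "the cat sat down", "the cat"]
    print(mbr_n_by_s(samples, len(samples), unigram_f1))  # N-by-N
    print(mbr_n_by_s(samples, 2, unigram_f1))             # N-by-S with S = 2
    print(mbr_c2f(samples, 2, 2, 5, unigram_f1, unigram_f1))
    # Plugging in the values listed on slide 6:
    print(utility_calls(405, 405))          # N-by-N: 164025
    print(utility_calls(405, 13))           # N-by-S: 5265
    print(utility_calls(405, 13, 50, 100))  # C2F: 10265
```

With the values listed on slide 6, the count drops from 164,025 utility evaluations for N-by-N to 5,265 for N-by-S, and coarse-to-fine adds 5,000 target evaluations on top of the proxy pass. These counts treat every call as equally expensive; the cost expressions on slide 3 weight them by $U_{\mathrm{proxy}}$ and $U_{\mathrm{target}}$.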