[NLPコロキウム 2023/9/13] Unbalanced Optimal Transport for Unbalanced Word Alignment

Yuki Arase
September 13, 2023

Monolingual word alignment, which identifies semantically corresponding words in a sentence pair, has been actively studied as a crucial technique for modelling semantic relationships between sentences, such as for paraphrase identification and textual entailment recognition. Notably, alignment information has recently been recognized as a valuable cue for interpreting model predictions, with applications to quality estimation and hallucination detection. Despite years of dedicated research, challenges persist in many-to-many and null alignment, which together constitute an *unbalanced* word alignment problem.
In this talk, we show that optimal transport (OT) based methods are natural and sufficiently powerful approaches to unbalanced word alignment without tailor-made techniques. We provide a comprehensive analysis that unveils the characteristics of different OT problems on unbalanced word alignment across diverse null alignment ratios.

Transcript

  1. About Me
    • Career
      • 2010-2014: Associate Researcher, Microsoft Research Asia, China
      • 2014-: Associate Professor, Osaka University, Japan
    • Research Interest
      • Paraphrase recognition and generation
      • NLP for language education and healthcare
    • Community Service
      • PC Chair @ IJCNLP-AACL2023
      • MAL @ Asian Federation of NLP
  2. Monolingual Word Alignment
    • Identifies semantically corresponding words in a sentence pair
    • Is crucial for modelling semantic interactions between sentences:
      • Paraphrase & entailment recognition
      • Summarization & sentence fusion
      • Interpretability, provenance of LLM outputs
    Example sentence pair:
      "The agency described in a statement that the information was a pack of lies"
      "It said in a bulletin that reports about the incident are cheap lies and news rumors"
  3. Unbalanced Word Alignment
    • Null alignment is prevalent in semantically divergent sentences, which makes the alignment problem challenging
      • Null alignment ratio can be ~64% in entailment sentence pairs
    • Identification of null alignment is useful to declare semantic gaps and reason about semantic (dis)similarity
    Example sentence pair:
      "The agency described in a statement that the information was a pack of lies"
      "It said in a bulletin that reports about the incident are cheap lies and news rumors"
  4. Unbalanced Word Alignment
    • Null alignment is prevalent in semantically divergent sentences, which makes the alignment problem challenging
      • Null alignment ratio can be ~64% in entailment sentence pairs
    • Identification of null alignment is useful to declare semantic gaps and reason about semantic (dis)similarity
    Example sentence pair:
      "Two days later, a 28-year-old man died in a shark attack in Avon, North Carolina."
      "A shark attacked a human being."
  5. Related Work
    • Bilingual word alignment has been commonly studied for MT, e.g., (Garg et al. 2019, Zenkel et al. 2020)
      • Assumes the availability of a large-scale parallel corpus
    • Monolingual word alignment commonly uses supervised learning, e.g., (Yao et al. 2013, Lan et al. 2021)
      • Models word alignment using a CRF, regarding source words as observations and target words as hidden states
    • Null alignment has received less attention
      • Critical for handling semantically divergent sentence pairs
  6. Unbalanced Word Alignment as OT
    [Figure: distance matrix (scale 0.0 to 1.0) between source words w_1^s, ..., w_8^s and target words w_1^t, ..., w_7^t]
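The slide shows only the figure, so as a reading aid here is the standard balanced OT problem that such a distance matrix feeds into. The symbols are notation assumed for this note, not taken from the slide: C is the distance (cost) matrix, a and b are the mass vectors on source and target words with equal total mass, and P is the alignment (transport) matrix. In the figure, n = 8 and m = 7.

```latex
\[
P^{*} = \operatorname*{arg\,min}_{P \in \mathbb{R}_{\ge 0}^{n \times m}}
\sum_{i=1}^{n} \sum_{j=1}^{m} C_{i,j} \, P_{i,j}
\quad \text{s.t.} \quad
P \mathbf{1}_m = a, \qquad P^{\top} \mathbf{1}_n = b
\]
```

Balanced OT forces every word to send or receive all of its mass. Partial OT transports only a fraction of the total mass, and unbalanced OT replaces the hard marginal constraints with penalty terms, which is what leaves room for null alignment.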
  7. Link to Statistical Word Alignment
    [Figure: distance matrix (scale 0.0 to 1.0) between source words w_1^s, ..., w_8^s and target words w_1^t, ..., w_7^t, annotated with "Distortion (IBM Model2)" and "Fertility (IBM Model3)"]
    Brown et al. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. CL.
  8. Optimal Transport Alignment: OTAlign
    • Leverage balanced, partial, and unbalanced OT for unbalanced word alignment
    • Obtain contextualized word embeddings using a pretrained LM, namely BERT
      • Cost: cosine and Euclidean distances of embeddings
      • Fertility: L2-norms / uniform
    • Sparsify alignments
      • Regularization on OT makes the alignment matrix dense
      • Prune alignments whose probability is smaller than a threshold
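The steps on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the OTAlign implementation: it covers only the entropy-regularized balanced case with cosine cost and uniform mass, the toy vectors stand in for BERT embeddings, and the function names, the `eps` and `threshold` values, and the choice to rescale the plan by its largest entry before thresholding are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized balanced OT solved with Sinkhorn iterations."""
    n, m = len(a), len(b)
    # Gibbs kernel: a small eps makes the plan sharper, a large eps denser.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def align(src_vecs, tgt_vecs, threshold=0.5, eps=0.1):
    """Return the transport plan and the links that survive pruning."""
    n, m = len(src_vecs), len(tgt_vecs)
    cost = [[cosine_distance(s, t) for t in tgt_vecs] for s in src_vecs]
    a = [1.0 / n] * n  # uniform mass on source words
    b = [1.0 / m] * m  # uniform mass on target words
    plan = sinkhorn(cost, a, b, eps=eps)
    # Assumption: rescale by the largest entry so the threshold lies in [0, 1].
    peak = max(max(row) for row in plan)
    links = [(i, j) for i in range(n) for j in range(m)
             if plan[i][j] / peak >= threshold]
    return plan, links


# Toy usage: two source and two target "embeddings".
plan, links = align([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(links)
```

Words that end up with no surviving link after pruning are the null alignments, which is why the threshold matters even for the balanced formulation.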
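  9. Experiment Settings
    • Datasets with human alignment
      • MSR-RTE, Edinburgh++
      • MultiMWA: MTRef, Wiki, Newsela, and ArXiv
    • Evaluation metrics
      \[
      \text{precision} = \frac{|\hat{\mathbb{Y}}_a \cap \mathbb{Y}_a| + |\hat{\mathbb{Y}}_\emptyset \cap \mathbb{Y}_\emptyset|}{|\hat{\mathbb{Y}}_a| + |\hat{\mathbb{Y}}_\emptyset|},
      \qquad
      \text{recall} = \frac{|\hat{\mathbb{Y}}_a \cap \mathbb{Y}_a| + |\hat{\mathbb{Y}}_\emptyset \cap \mathbb{Y}_\emptyset|}{|\mathbb{Y}_a| + |\mathbb{Y}_\emptyset|}
      \]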
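The evaluation metrics above can be computed as in the following sketch. It assumes, since the slide does not spell this out, that the aligned sets are sets of (source index, target index) links and that the null sets contain every word with no link at all. The function name and the F1 line (the usual harmonic mean, matching the "Alignment F1" axis on later slides) are additions for illustration.

```python
def alignment_scores(pred_links, gold_links, n_src, n_tgt):
    """Precision, recall, and F1 that count null alignments as predictions."""

    def null_set(links):
        # Assumption: a word is null-aligned when it takes part in no link.
        aligned_src = {i for i, _ in links}
        aligned_tgt = {j for _, j in links}
        nulls = {("s", i) for i in range(n_src) if i not in aligned_src}
        nulls |= {("t", j) for j in range(n_tgt) if j not in aligned_tgt}
        return nulls

    pred, gold = set(pred_links), set(gold_links)
    pred_null, gold_null = null_set(pred), null_set(gold)
    correct = len(pred & gold) + len(pred_null & gold_null)
    precision = correct / (len(pred) + len(pred_null))
    recall = correct / (len(gold) + len(gold_null))
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy usage: 3 source words, 2 target words; source word 2 is null in the gold.
gold = [(0, 0), (1, 1)]
pred = [(0, 0), (2, 1)]
print(alignment_scores(pred, gold, n_src=3, n_tgt=2))
```

Because null predictions enter both numerator and denominators, a system that aligns every word is penalized on sparse corpora where many words should stay unaligned.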
  10. Unsupervised Alignment: Per Corpus
    [Observation 1] The best OT problem depends on null alignment ratios

    Corpus (sparse ↔ dense)       MSR-RTE     Newsela     EDB++       MTRef       ArXiv       Wiki
    Alignment links               S     S+P   S     S+P   S     S+P   S     S+P   S     S+P   S
    Null rate (%)                 63.8  59.0  33.3  23.5  27.4  19.0  18.7  11.2  12.8  12.2  8.3
    fast-align                    42.3  41.6  58.4  56.5  59.6  60.8  58.1  58.0  80.5  80.5  87.2
    SimAlign                      85.4  81.5  76.7  77.3  74.7  78.9  74.8  75.8  91.7  91.9  94.8
    Type  Reg.  cost    mass
    BOT   --    cosine  uniform   20.6  22.5  41.4  46.9  49.0  55.0  50.4  55.5  65.6  66.2  66.5
    BOT   Sk    cosine  uniform   88.8  83.0  83.7  79.4  84.4  82.8  77.3  77.2  90.4  90.9  93.9
    POT   --    cosine  uniform   89.0  84.0  77.1  76.2  78.4  78.7  75.6  76.2  84.3  89.9  94.5
    POT   Sk    cosine  uniform   92.2  86.4  84.6  79.8  83.8  82.3  77.0  76.6  91.5  90.3  93.9
    UOT   Sk    cosine  uniform   90.2  84.5  83.1  79.1  84.7  82.5  77.2  77.1  90.0  89.6  93.8
  11. Unsupervised Alignment: Per Null Rate
    [Figure: Alignment F1 (%) against Null ratio (%). Methods: fast-align; SimAlign; BOT: cos, uniform; Regularised BOT: cos, uniform; POT: cos, uniform; Regularised POT: cos, uniform; UOT: cos, uniform]
  12. Unsupervised Alignment: Per Null Rate
    [Figure: Alignment F1 (%) against Null ratio (%). Methods: fast-align; SimAlign; BOT: cos, uniform; Regularised BOT: cos, uniform; POT: cos, uniform; Regularised POT: cos, uniform; UOT: cos, uniform]
    [Observation 2] Thresholding on the alignment matrix makes it unbalanced.
  13. Supervised Alignment
    • The entropy-regularized OT is differentiable and thus can be directly integrated into neural models.
    • Fine-tune the entire model by minimizing the binary cross-entropy loss:
      \[
      \mathcal{L}(P_{i,j}, Y_{i,j}) = -Y_{i,j} \log P_{i,j} - (1 - Y_{i,j}) \log(1 - P_{i,j})
      \]
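As a small worked example of the loss above, the sketch below evaluates it on a predicted alignment matrix P and a 0/1 gold matrix Y. It is plain Python for illustration only: real fine-tuning would compute the same quantity inside an autodiff framework so that gradients flow back through the OT layer. The function name, the clipping constant `tiny`, and the choice to average over all cells are assumptions, since the slide gives only the per-cell loss.

```python
import math


def bce_loss(P, Y, tiny=1e-12):
    """Mean binary cross-entropy between alignment matrix P and gold labels Y."""
    total, count = 0.0, 0
    for p_row, y_row in zip(P, Y):
        for p, y in zip(p_row, y_row):
            # Clip to avoid log(0) when a probability is exactly 0 or 1.
            p = min(max(p, tiny), 1.0 - tiny)
            total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


# Toy usage: one source word, two target words, gold link to the first target.
print(bce_loss([[0.5, 0.5]], [[1, 0]]))  # equals ln 2 for a maximally unsure P
```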
  14. Supervised Alignment: Per Corpus
    [Observation 3] OT-based alignment is competitive against the SoTA methods on datasets with higher null alignment ratios.

    Corpus (sparse ↔ dense)   MSR-RTE     Newsela     EDB++       MTRef       ArXiv       Wiki
    Alignment links           S     S+P   S     S+P   S     S+P   S     S+P   S     S+P   S
    Null rate (%)             63.8  59.0  33.3  23.5  27.4  19.0  18.7  11.2  12.8  12.2  8.3
    (Lan et al. 2021)         95.1  89.2  86.7  85.3  88.3  87.8  83.4  86.1  95.2  95.0  96.6
    (Nagata et al. 2020)      95.0  89.2  79.4  82.4  86.9  87.2  82.9  88.0  89.1  89.5  96.5
    Type  cost    mass
    BOT   cosine  norm        94.6  88.4  86.5  84.4  85.7  85.4  82.9  87.3  91.7  93.0  96.5
    POT   cosine  norm        94.6  88.4  84.0  81.4  85.5  83.7  82.0  85.2  93.0  92.2  95.5
    UOT   cosine  norm        94.8  89.0  86.8  84.7  86.7  86.6  82.9  87.4  92.5  92.8  96.7
  15. Supervised Alignment: Per Null Rate
    [Figure: Alignment F1 (%) against Null ratio (%). Methods: (Lan et al., 2021); (Nagata et al., 2020); Regularised BOT: cos, norm; Regularised POT: cos, norm; UOT: cos, norm]
  16. Unsupervised Bilingual Word Alignment
    • Applied OTAlign to bilingual word alignment
    • Multilingual pre-trained model: LaBSE

    Corpus                                de-en  sv-en  fr-en  ro-en  ja-en  zh-en
    Awesome-align (Dou and Neubig 2021)   82.5   90.2   94.3   72.1   54.5   82.1
    AccAlign (Wang et al. 2022)           84.0   92.6   95.5   79.2   56.7   83.8
    Type  cost    mass
    BOT   cosine  norm                    82.1   90.5   92.8   76.6   51.8   84.0
    UOT   cosine  norm                    85.3   93.6   96.3   79.9   59.5   84.8

    * Hyper-parameters were tuned on the dev set (cs-en)
  17. Summary
    This is the first study that connects the paradigms of unbalanced word alignment and the OT problems.
    We empirically
      1. showed that OTAlign is a natural and powerful tool for unbalanced word alignment without tailor-made techniques
      2. provided a comprehensive picture that unveils the characteristics of the OT problems on unbalanced word alignment
    OTAlign: https://github.com/yukiar/OTAlign
  18. All Resources Are Available!
    • Yuki Arase, Han Bao, and Sho Yokoi. 2023. Unbalanced Optimal Transport for Unbalanced Word Alignment. In Proc. of ACL 2023.
      • OTAlign: https://github.com/yukiar/OTAlign
    • Yuki Arase and Jun’ichi Tsujii. 2020. Compositional Phrase Alignment and Beyond. In Proc. of EMNLP 2020.
    • Sora Kadotani and Yuki Arase. 2023. Monolingual Phrase Alignment as Parse Forest Mapping. In Proc. of *SEM 2023.
      • Phrase Aligner: https://github.com/yukiar/phrase_alignment_cted