[NLP Colloquium 2023/9/13] Unbalanced Optimal Transport for Unbalanced Word Alignment

Unbalanced Optimal Transport for Unbalanced Word Alignment
Yuki Arase, Associate Professor, Osaka University, Japan

About Me
• Career
  • 2010-2014: Associate Researcher, Microsoft Research Asia, China
  • 2014-: Associate Professor, Osaka University, Japan
• Research Interest
  • Paraphrase recognition and generation
  • NLP for language education and healthcare
• Community Service
  • PC Chair @ IJCNLP-AACL2023
  • MAL @ Asian Federation of NLP

Monolingual Word Alignment
• Identifies semantically corresponding words in a sentence pair
• Is crucial for modelling semantic interactions between sentences:
  • Paraphrase & entailment recognition
  • Summarization & sentence fusion
  • Interpretability, provenance of LLM outputs
Example sentence pair:
  The agency described in a statement that the information was a pack of lies
  It said in a bulletin that reports about the incident are cheap lies and news rumors

Unbalanced Word Alignment
• Null alignment is prevalent in semantically divergent sentences, which makes the alignment problem challenging
  • Null alignment ratio can be ~64% in entailment sentence pairs
• Identification of null alignment is useful to declare semantic gaps and reason about semantic (dis)similarity
Example sentence pair:
  The agency described in a statement that the information was a pack of lies
  It said in a bulletin that reports about the incident are cheap lies and news rumors

Unbalanced Word Alignment (second example)
Example sentence pair:
  Two days later, a 28-year-old man died in a shark attack in Avon, North Carolina.
  A shark attacked a human being.

Related Work
• Bilingual word alignment has been commonly studied for MT, e.g., (Garg et al. 2019, Zenkel et al. 2020)
  • Assumes the availability of a large-scale parallel corpus
• Monolingual word alignment commonly uses supervised learning, e.g., (Yao et al. 2013, Lan et al. 2021)
  • Models word alignment with a CRF, regarding source words as observations and target words as hidden states
• Null alignment has received less attention
  • It is critical for handling semantically divergent sentence pairs

Optimal Transport (OT) Problems
[Figure: illustration of OT with a cost matrix (scale 0.0 to 1.0)]

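Partial and Unbalanced OT
[Figure: illustration of partial and unbalanced OT with a cost matrix (scale 0.0 to 1.0); untransported mass corresponds to null alignment]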
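To make the idea of leaving mass untransported concrete, here is a minimal sketch of one common way to solve unbalanced OT: entropy-regularised OT with KL-relaxed marginals, solved by a generalised Sinkhorn iteration. The slides do not state which solver or penalty OTAlign uses, so the function name `unbalanced_sinkhorn`, the parameters `eps` and `tau`, and the toy cost matrix are all assumptions made for illustration.

```python
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.1, tau=1.0, n_iters=500):
    """Entropy-regularised unbalanced OT with KL-relaxed marginals (illustrative).

    eps: entropic regularisation strength (assumed value).
    tau: marginal penalty weight (assumed value); a smaller tau allows
         more mass to stay untransported.
    """
    K = np.exp(-cost / eps)        # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = tau / (tau + eps)      # tends to 1 (balanced Sinkhorn) as tau grows
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]

# Toy cost matrix: the third source word is far from every target word.
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0],
                 [1.0, 1.0]])
a = np.full(3, 1.0 / 3)   # uniform source mass
b = np.full(2, 1.0 / 2)   # uniform target mass

plan = unbalanced_sinkhorn(cost, a, b)
row_mass = plan.sum(axis=1)   # mass actually transported from each source word
print(row_mass)
```

In this toy case the third source word transports less mass than the first two, which is the behaviour that lets such a word be read as a null alignment candidate.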

Unbalanced Word Alignment as OT
[Figure: source words $w^s_1, \dots, w^s_8$ and target words $w^t_1, \dots, w^t_7$ linked through a distance matrix (scale 0.0 to 1.0)]

Link to Statistical Word Alignment
[Figure: source words $w^s_1, \dots, w^s_8$ and target words $w^t_1, \dots, w^t_7$ linked through a distance matrix (scale 0.0 to 1.0), annotated with Distortion (IBM Model 2) and Fertility (IBM Model 3)]
Brown et al. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. CL.

Optimal Transport Alignment: OTAlign
• Leverage balanced, partial, and unbalanced OT for unbalanced word alignment
• Obtain contextualized word embeddings using a pretrained LM, namely BERT
• Cost: Cosine and Euclidean distances of embeddings
• Fertility: L2-norms / Uniform
• Sparsify alignments
  • Regularization on OT makes the alignment matrix dense
  • Prune alignments whose probability is smaller than a threshold
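The steps listed above (embedding-based cost, fertility as mass, regularised OT, thresholding) can be sketched roughly as follows. This is not the OTAlign implementation: it uses tiny hand-written vectors in place of BERT embeddings, a plain balanced Sinkhorn solver, and a max-scaling step before thresholding. The helper names, `eps`, and `threshold` are all assumptions for illustration.

```python
import numpy as np

def cosine_cost(src, tgt):
    """Pairwise cosine distance (1 - cosine similarity) between embedding rows."""
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return 1.0 - src_n @ tgt_n.T

def norm_mass(emb):
    """Fertility from L2 norms, normalised so the mass sums to 1."""
    norms = np.linalg.norm(emb, axis=1)
    return norms / norms.sum()

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularised balanced OT; eps is an assumed value."""
    K = np.exp(-cost / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def align(src, tgt, threshold=0.5):
    """Return the dense transport plan and the links that survive pruning."""
    plan = sinkhorn(cosine_cost(src, tgt), norm_mass(src), norm_mass(tgt))
    scaled = plan / plan.max()          # assumed normalisation before thresholding
    rows, cols = np.nonzero(scaled > threshold)
    links = {(int(i), int(j)) for i, j in zip(rows, cols)}
    return plan, links

# Hand-written stand-ins for contextualised embeddings (hypothetical values).
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt = np.array([[1.0, 0.1], [0.1, 1.0]])

plan, links = align(src, tgt)
print(sorted(links))
```

Because the balanced solver forces every source word to ship all of its mass, the dense plan assigns some probability to every pair; the threshold is what removes weak links such as the one between the first source word and the second target word.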

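Experiment Settings
• Datasets with human alignment
  • MSR-RTE, Edinburgh++
  • MultiMWA: MTRef, Wiki, Newsela, and ArXiv
• Evaluation metrics
  $\mathrm{precision} = \dfrac{|\hat{\mathbb{Y}}_a \cap \mathbb{Y}_a| + |\hat{\mathbb{Y}}_\emptyset \cap \mathbb{Y}_\emptyset|}{|\hat{\mathbb{Y}}_a| + |\hat{\mathbb{Y}}_\emptyset|}$
  $\mathrm{recall} = \dfrac{|\hat{\mathbb{Y}}_a \cap \mathbb{Y}_a| + |\hat{\mathbb{Y}}_\emptyset \cap \mathbb{Y}_\emptyset|}{|\mathbb{Y}_a| + |\mathbb{Y}_\emptyset|}$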
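As a worked example of these metrics, the sketch below reads $\mathbb{Y}_a$ as the set of alignment links, $\mathbb{Y}_\emptyset$ as the set of null alignments, and the hatted sets as predictions. That reading, the rule that a word is null-aligned when it has no link, and the function name `alignment_scores` are assumptions; the slide does not spell them out.

```python
def alignment_scores(pred_links, gold_links, n_src, n_tgt):
    """Precision, recall, and F1 counting both alignment links and null alignments."""

    def null_set(links):
        # Assumption: a word is null-aligned when it takes part in no link.
        aligned_src = {i for i, _ in links}
        aligned_tgt = {j for _, j in links}
        src_null = {("s", i) for i in range(n_src) if i not in aligned_src}
        tgt_null = {("t", j) for j in range(n_tgt) if j not in aligned_tgt}
        return src_null | tgt_null

    pred_null = null_set(pred_links)
    gold_null = null_set(gold_links)
    correct = len(pred_links & gold_links) + len(pred_null & gold_null)
    precision = correct / (len(pred_links) + len(pred_null))
    recall = correct / (len(gold_links) + len(gold_null))
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1

# Three source words, two target words (toy data).
gold = {(0, 0), (1, 1)}   # source word 2 is null-aligned in the gold standard
pred = {(0, 0), (2, 1)}   # the prediction leaves source word 1 null-aligned instead

precision, recall, f1 = alignment_scores(pred, gold, n_src=3, n_tgt=2)
print(precision, recall, f1)
```

Here only the link (0, 0) is correct, and the predicted null alignment does not match the gold one, so one of three predicted items and one of three gold items are correct.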

Unsupervised Alignment: Per Corpus
[Observation 1] The best OT problem depends on null alignment ratios

Corpus (sparse ↔ dense)       MSR-RTE       Newsela       EDB++         MTRef         Arxiv         Wiki
Alignment links               S      S+P    S      S+P    S      S+P    S      S+P    S      S+P    S
Null rate (%)                 63.8   59.0   33.3   23.5   27.4   19.0   18.7   11.2   12.8   12.2   8.3
fast-align                    42.3   41.6   58.4   56.5   59.6   60.8   58.1   58.0   80.5   80.5   87.2
SimAlign                      85.4   81.5   76.7   77.3   74.7   78.9   74.8   75.8   91.7   91.9   94.8
Type  Reg.  cost    mass
BOT   --    cosine  uniform   20.6   22.5   41.4   46.9   49.0   55.0   50.4   55.5   65.6   66.2   66.5
BOT   Sk    cosine  uniform   88.8   83.0   83.7   79.4   84.4   82.8   77.3   77.2   90.4   90.9   93.9
POT   --    cosine  uniform   89.0   84.0   77.1   76.2   78.4   78.7   75.6   76.2   84.3   89.9   94.5
POT   Sk    cosine  uniform   92.2   86.4   84.6   79.8   83.8   82.3   77.0   76.6   91.5   90.3   93.9
UOT   Sk    cosine  uniform   90.2   84.5   83.1   79.1   84.7   82.5   77.2   77.1   90.0   89.6   93.8

Unsupervised Alignment: Per Null Rate
[Figure: alignment F1 (%) against null ratio (%) for fast-align; SimAlign; BOT: cos, uniform; Regularised BOT: cos, uniform; POT: cos, uniform; Regularised POT: cos, uniform; UOT: cos, uniform]

Unsupervised Alignment: Per Null Rate
[Figure: alignment F1 (%) against null ratio (%) for fast-align; SimAlign; BOT: cos, uniform; Regularised BOT: cos, uniform; POT: cos, uniform; Regularised POT: cos, uniform; UOT: cos, uniform]
[Observation 2] Thresholding on the alignment matrix makes it unbalanced.

Supervised Alignment
• The entropy-regularized OT is differentiable and thus can be directly integrated into neural models.
• Fine-tune the entire model by minimizing the binary cross-entropy loss:
  $\mathcal{L}(P_{i,j}, Y_{i,j}) = -Y_{i,j} \log P_{i,j} - (1 - Y_{i,j}) \log(1 - P_{i,j})$
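The loss above can be evaluated numerically as in the following sketch. It only computes the value with NumPy; actual fine-tuning would need an automatic-differentiation framework. Averaging over all word pairs, the clipping constant `tiny`, and the toy matrices are assumptions, since the slide gives only the per-pair loss.

```python
import numpy as np

def bce_loss(P, Y, tiny=1e-12):
    """Binary cross-entropy between alignment probabilities P and gold labels Y.

    Entries are averaged over all (i, j) pairs (an assumed reduction).
    """
    P = np.clip(P, tiny, 1.0 - tiny)   # avoid log(0)
    per_pair = -Y * np.log(P) - (1.0 - Y) * np.log(1.0 - P)
    return float(per_pair.mean())

# Toy 2x2 alignment: gold links are (0, 0) and (1, 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])

loss = bce_loss(P, Y)
print(loss)
```

The loss shrinks as the probabilities of gold links approach 1 and those of non-links approach 0, which is what pushes the fine-tuned alignment matrix towards the human annotation.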

Supervised Alignment: Per Corpus
[Observation 3] OT-based alignment is competitive against the SoTA methods on datasets with higher null alignment ratios.

Corpus (sparse ↔ dense)       MSR-RTE       Newsela       EDB++         MTRef         Arxiv         Wiki
Alignment links               S      S+P    S      S+P    S      S+P    S      S+P    S      S+P    S
Null rate (%)                 63.8   59.0   33.3   23.5   27.4   19.0   18.7   11.2   12.8   12.2   8.3
(Lan et al. 2021)             95.1   89.2   86.7   85.3   88.3   87.8   83.4   86.1   95.2   95.0   96.6
(Nagata et al. 2020)          95.0   89.2   79.4   82.4   86.9   87.2   82.9   88.0   89.1   89.5   96.5
Type  cost    mass
BOT   cosine  norm            94.6   88.4   86.5   84.4   85.7   85.4   82.9   87.3   91.7   93.0   96.5
POT   cosine  norm            94.6   88.4   84.0   81.4   85.5   83.7   82.0   85.2   93.0   92.2   95.5
UOT   cosine  norm            94.8   89.0   86.8   84.7   86.7   86.6   82.9   87.4   92.5   92.8   96.7

Supervised Alignment: Per Null Rate
[Figure: alignment F1 (%), axis shown from 70% to 100%, against null ratio (%) for (Lan et al., 2021); (Nagata et al., 2020); Regularised BOT: cos, norm; Regularised POT: cos, norm; UOT: cos, norm]

OTAlign Example
[Figure: alignments produced by the state-of-the-art method (Lan et al. 2021) and by OTAlign (Unbalanced OT)]

Unsupervised Bilingual Word Alignment
• Applied OTAlign to bilingual word alignment
• Multilingual pre-trained model: LaBSE

Corpus                                 de-en  sv-en  fr-en  ro-en  ja-en  zh-en
awesome-align (Dou and Neubig 2021)    82.5   90.2   94.3   72.1   54.5   82.1
AccAlign (Wang et al. 2022)            84.0   92.6   95.5   79.2   56.7   83.8
Type  cost    mass
BOT   cosine  norm                     82.1   90.5   92.8   76.6   51.8   84.0
UOT   cosine  norm                     85.3   93.6   96.3   79.9   59.5   84.8
* Hyper-parameters were tuned on the dev set (cs-en)

Summary
This is the first study that connects the paradigms of unbalanced word alignment and the OT problems. Empirically, we:
1. showed that OTAlign is a natural and powerful tool for unbalanced word alignment without tailor-made techniques
2. provided a comprehensive picture that unveils the characteristics of the OT problems on unbalanced word alignment
OTAlign: https://github.com/yukiar/OTAlign

All Resources Are Available!
• Yuki Arase, Han Bao, and Sho Yokoi. 2023. Unbalanced Optimal Transport for Unbalanced Word Alignment. In Proc. of ACL 2023.
  • OTAlign: https://github.com/yukiar/OTAlign
• Yuki Arase and Jun’ichi Tsujii. 2020. Compositional Phrase Alignment and Beyond. In Proc. of EMNLP 2020.
• Sora Kadotani and Yuki Arase. 2023. Monolingual Phrase Alignment as Parse Forest Mapping. In Proc. of *SEM 2023.
  • Phrase Aligner: https://github.com/yukiar/phrase_alignment_cted
