[NLP Colloquium 2023/9/13] Unbalanced Optimal Transport for Unbalanced Word Alignment

Yuki Arase
September 13, 2023

Monolingual word alignment, which identifies semantically corresponding words in a sentence pair, has been actively studied as a crucial technique for modelling semantic relationships between sentences, such as for paraphrase identification and textual entailment recognition. More recently, alignment information has also been recognized as a valuable cue for interpreting model predictions, with applications to quality estimation and hallucination detection. Despite years of dedicated research, challenges persist in many-to-many and null alignment, which together constitute an *unbalanced* word alignment problem.
In this talk, we show that optimal transport (OT) based methods are a natural and sufficiently powerful approach to unbalanced word alignment without tailor-made techniques. We provide a comprehensive analysis that reveals how different OT problems behave on unbalanced word alignment across diverse null alignment ratios.

Transcript

  1. Unbalanced Optimal Transport for
    Unbalanced Word Alignment
    Yuki Arase
    Associate Professor, Osaka University, Japan
    NLP

  2. About Me
    • Career
    • 2010-2014 Associate Researcher, Microsoft Research Asia, China
    • 2014- Associate Professor, Osaka University, Japan
    • Research Interest
    • Paraphrase recognition and generation
    • NLP for language education and healthcare
    • Community Service
    • PC Chair @ IJCNLP-AACL2023
    • MAL @ Asian Federation of NLP

  3. Monolingual Word Alignment
    • Identifies semantically corresponding words in a sentence pair
    • Is crucial for modelling semantic interactions between sentences:
    • Paraphrase & entailment recognition
    • Summarization & sentence fusion
    • Interpretability, provenance of LLM outputs
    The agency described in a statement that the information was a pack of lies
    It said in a bulletin that reports about the incident are cheap lies and news rumors

  4. Unbalanced Word Alignment
    • Null alignment is prevalent in semantically divergent
    sentences, which makes the alignment problem challenging
    • Null alignment ratio can be ~64% in entailment sentence pairs
    • Identification of null alignment is useful to declare semantic
    gaps and reason about semantic (dis)similarity
    The agency described in a statement that the information was a pack of lies
    It said in a bulletin that reports about the incident are cheap lies and news rumors

  5. Unbalanced Word Alignment
    • Null alignment is prevalent in semantically divergent
    sentences, which makes the alignment problem challenging
    • Null alignment ratio can be ~64% in entailment sentence pairs
    • Identification of null alignment is useful to declare semantic
    gaps and reason about semantic (dis)similarity
    Two days later , a 28-year-old man died in a shark attack in Avon , North Carolina .
    A shark attacked a human being .

  6. Related Work
    • Bilingual word alignment has been commonly studied for MT,
    e.g., (Garg et al. 2019, Zenkel et al. 2020)
    • Assume the availability of a large-scale parallel corpus
    • Monolingual word alignment commonly uses supervised
    learning, e.g., (Yao et al. 2013, Lan et al. 2021)
    • Model word alignment with a CRF, treating source words as
    observations and target words as hidden states
    • Null alignment has received less attention.
    • Critical to handle semantically divergent sentence pairs

  7. Optimal Transport (OT) Problems
    [Figure: illustration of the OT problem with a cost matrix (color scale 0.0 to 1.0)]
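The slide shows only a figure, so a standard statement of the balanced OT problem may help as a reference point. This is a sketch in notation that does not appear on the slide: $a \in \mathbb{R}_{\ge 0}^{n}$ and $b \in \mathbb{R}_{\ge 0}^{m}$ are assumed to be mass vectors with equal total mass, $C$ is the cost matrix, and $P$ is the transport plan.

```latex
\min_{P \in \mathbb{R}_{\ge 0}^{n \times m}} \; \langle C, P \rangle
  = \sum_{i=1}^{n} \sum_{j=1}^{m} C_{i,j} \, P_{i,j}
\quad \text{s.t.} \quad
P \mathbf{1}_m = a, \qquad P^{\top} \mathbf{1}_n = b
```

Because both marginal constraints are equalities, every unit of mass must be transported, which is why this variant is called balanced.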

  8. Partial and Unbalanced OT
    [Figure: illustration of partial and unbalanced OT with a cost matrix (color scale 0.0 to 1.0); label: Null alignment]
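For orientation, common textbook formulations of the two relaxations are sketched below. The notation is assumed rather than taken from the slide: $a$, $b$, $C$, and $P$ are the mass vectors, cost matrix, and transport plan, $s$ is the total mass to transport, $\tau_a, \tau_b > 0$ are penalty weights, and $D$ is a divergence such as the Kullback-Leibler divergence. The exact variants used in the talk may differ.

```latex
% Partial OT: transport only a fraction s of the mass
\min_{P \ge 0} \; \langle C, P \rangle
\quad \text{s.t.} \quad
P \mathbf{1}_m \le a, \quad
P^{\top} \mathbf{1}_n \le b, \quad
\mathbf{1}_n^{\top} P \mathbf{1}_m = s,
\qquad 0 \le s \le \min(\lVert a \rVert_1, \lVert b \rVert_1)

% Unbalanced OT: replace the hard marginal constraints with penalties
\min_{P \ge 0} \; \langle C, P \rangle
  + \tau_a \, D(P \mathbf{1}_m \,\Vert\, a)
  + \tau_b \, D(P^{\top} \mathbf{1}_n \,\Vert\, b)
```

In both cases some mass may stay untransported, which is what lets a word remain without a counterpart.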

  9. Unbalanced Word Alignment as OT
    [Figure: source words $w^s_1, \dots, w^s_8$ and target words $w^t_1, \dots, w^t_7$ with their distance matrix (color scale 0.0 to 1.0)]

  10. Link to Statistical Word Alignment
    [Figure: source words $w^s_1, \dots, w^s_8$ and target words $w^t_1, \dots, w^t_7$ with their distance matrix (color scale 0.0 to 1.0); annotations: Distortion (IBM Model 2), Fertility (IBM Model 3)]
    Brown et al. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. CL.

  11. Optimal Transport Alignment: OTAlign
    • Leverage balanced, partial, and unbalanced OT for unbalanced
    word alignment
    • Obtain contextualized word embeddings using a pretrained LM,
    namely BERT
    • Cost: Cosine and Euclidean distances of embeddings
    • Fertility: L2-norms / Uniform
    • Sparsify alignments
    • Regularization on OT makes the alignment matrix dense
    • Prune alignments whose probability is smaller than a threshold
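The steps listed on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the OTAlign implementation: it covers only the regularised balanced OT case, the embeddings are plain arrays standing in for BERT outputs, and the function names, the `eps` and `threshold` defaults, and the rescaling of the plan by its maximum before pruning are all hypothetical choices.

```python
import numpy as np


def cosine_cost(src, tgt):
    """Cosine distance between every source and target embedding."""
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return 1.0 - src_n @ tgt_n.T


def word_mass(emb, kind="uniform"):
    """Mass (fertility) per word: uniform, or proportional to the L2 norm."""
    if kind == "uniform":
        mass = np.ones(len(emb))
    else:
        mass = np.linalg.norm(emb, axis=1)
    return mass / mass.sum()


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularised balanced OT solved with Sinkhorn iterations."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def align(src, tgt, mass="uniform", eps=0.1, threshold=0.5):
    """Return the transport plan and the (source, target) links kept after pruning."""
    plan = sinkhorn(cosine_cost(src, tgt), word_mass(src, mass), word_mass(tgt, mass), eps)
    # Assumption: rescale by the largest entry so the threshold lies in [0, 1].
    scores = plan / plan.max()
    links = [(int(i), int(j)) for i, j in zip(*np.nonzero(scores >= threshold))]
    return plan, links
```

Calling `align(src_emb, tgt_emb)` with two arrays of shape `(num_words, dim)` returns the dense plan and the pruned links; any word that ends up without a link is treated as null-aligned in this sketch.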

  12. Experiment Settings
    • Datasets with human alignment
    • MSR-RTE, Edinburgh++
    • MultiMWA: MTRef, Wiki, Newsela, and ArXiv
    • Evaluation metrics
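    $\mathrm{precision} = \dfrac{|\hat{\mathbb{Y}}_a \cap \mathbb{Y}_a| + |\hat{\mathbb{Y}}_\emptyset \cap \mathbb{Y}_\emptyset|}{|\hat{\mathbb{Y}}_a| + |\hat{\mathbb{Y}}_\emptyset|}$,
    $\mathrm{recall} = \dfrac{|\hat{\mathbb{Y}}_a \cap \mathbb{Y}_a| + |\hat{\mathbb{Y}}_\emptyset \cap \mathbb{Y}_\emptyset|}{|\mathbb{Y}_a| + |\mathbb{Y}_\emptyset|}$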
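A small sketch of how these metrics could be computed follows. It rests on my reading of the notation, which the slide does not spell out: the sets with subscript $a$ are taken to be aligned (source, target) index pairs, the sets with subscript $\emptyset$ are taken to be the words left without any link, and hatted sets are predictions. The helper names and the F1 combination are illustrative.

```python
def null_words(pairs, n_src, n_tgt):
    """Words on either side that take part in no alignment pair."""
    aligned_src = {i for i, _ in pairs}
    aligned_tgt = {j for _, j in pairs}
    nulls = {("src", i) for i in range(n_src) if i not in aligned_src}
    nulls |= {("tgt", j) for j in range(n_tgt) if j not in aligned_tgt}
    return nulls


def alignment_prf(pred_pairs, gold_pairs, n_src, n_tgt):
    """Precision, recall, and F1 that credit both alignment links and null alignments."""
    pred_pairs, gold_pairs = set(pred_pairs), set(gold_pairs)
    pred_null = null_words(pred_pairs, n_src, n_tgt)
    gold_null = null_words(gold_pairs, n_src, n_tgt)

    correct = len(pred_pairs & gold_pairs) + len(pred_null & gold_null)
    n_pred = len(pred_pairs) + len(pred_null)
    n_gold = len(gold_pairs) + len(gold_null)

    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With three source words, two target words, gold links `{(0, 0), (1, 1)}`, and predicted links `{(0, 0), (2, 1)}`, one link is correct and neither null word matches, so precision and recall are both 1/3.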

  13. Unsupervised Alignment: Per Corpus
    [Observation 1]
    The best OT problem depends on null alignment ratios
    Corpus (sparse ↔ dense)      MSR-RTE      Newsela      EDB++        MTRef        Arxiv        Wiki
    Alignment links              S     S+P    S     S+P    S     S+P    S     S+P    S     S+P    S
    Null rate (%)                63.8  59.0   33.3  23.5   27.4  19.0   18.7  11.2   12.8  12.2   8.3
    fast-align                   42.3  41.6   58.4  56.5   59.6  60.8   58.1  58.0   80.5  80.5   87.2
    SimAlign                     85.4  81.5   76.7  77.3   74.7  78.9   74.8  75.8   91.7  91.9   94.8
    Type  Reg.  Cost    Mass
    BOT   --    cosine  uniform  20.6  22.5   41.4  46.9   49.0  55.0   50.4  55.5   65.6  66.2   66.5
    BOT   Sk    cosine  uniform  88.8  83.0   83.7  79.4   84.4  82.8   77.3  77.2   90.4  90.9   93.9
    POT   --    cosine  uniform  89.0  84.0   77.1  76.2   78.4  78.7   75.6  76.2   84.3  89.9   94.5
    POT   Sk    cosine  uniform  92.2  86.4   84.6  79.8   83.8  82.3   77.0  76.6   91.5  90.3   93.9
    UOT   Sk    cosine  uniform  90.2  84.5   83.1  79.1   84.7  82.5   77.2  77.1   90.0  89.6   93.8

  14. Unsupervised Alignment: Per Null Rate
    [Figure: alignment F1 (%) versus null ratio (%) for fast-align, SimAlign, BOT (cos, uniform), regularised BOT (cos, uniform), POT (cos, uniform), regularised POT (cos, uniform), and UOT (cos, uniform)]

  15. Unsupervised Alignment: Per Null Rate
    [Figure: alignment F1 (%) versus null ratio (%) for fast-align, SimAlign, BOT (cos, uniform), regularised BOT (cos, uniform), POT (cos, uniform), regularised POT (cos, uniform), and UOT (cos, uniform)]
    [Observation 2]
    Thresholding on the alignment matrix makes it unbalanced.

  16. Supervised Alignment
    • The entropy-regularized OT is differentiable and thus can be
    directly integrated into neural models.
    • Fine-tune the entire model by minimizing the binary cross-entropy loss:
    $\mathcal{L}(P_{i,j}, Y_{i,j}) = -Y_{i,j} \log P_{i,j} - (1 - Y_{i,j}) \log(1 - P_{i,j})$
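As a concrete reading of this loss, the sketch below evaluates it over a whole alignment matrix with NumPy. Several things here are assumptions: the slide defines only the per-entry term, so averaging over all entries is my choice, the clipping constant `tiny` is added purely for numerical safety, and the function name is hypothetical. Actual fine-tuning would compute the same expression in an automatic-differentiation framework so that gradients flow back through the OT layer.

```python
import numpy as np


def alignment_bce(P, Y, tiny=1e-12):
    """Mean binary cross-entropy between alignment probabilities P and gold labels Y."""
    P = np.clip(np.asarray(P, dtype=float), tiny, 1.0 - tiny)  # avoid log(0)
    Y = np.asarray(Y, dtype=float)
    per_entry = -Y * np.log(P) - (1.0 - Y) * np.log(1.0 - P)
    return per_entry.mean()
```

For example, predicting 0.5 for every entry yields a loss of log 2 regardless of the gold labels, while a prediction that matches the gold matrix exactly yields a loss close to zero.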

  17. Supervised Alignment: Per Corpus
    [Observation 3] OT-based alignment is competitive against the
    SoTA methods on datasets with higher null alignment ratios.
    Corpus (sparse ↔ dense)      MSR-RTE      Newsela      EDB++        MTRef        Arxiv        Wiki
    Alignment links              S     S+P    S     S+P    S     S+P    S     S+P    S     S+P    S
    Null rate (%)                63.8  59.0   33.3  23.5   27.4  19.0   18.7  11.2   12.8  12.2   8.3
    (Lan et al. 2021)            95.1  89.2   86.7  85.3   88.3  87.8   83.4  86.1   95.2  95.0   96.6
    (Nagata et al. 2020)         95.0  89.2   79.4  82.4   86.9  87.2   82.9  88.0   89.1  89.5   96.5
    Type  Cost    Mass
    BOT   cosine  norm           94.6  88.4   86.5  84.4   85.7  85.4   82.9  87.3   91.7  93.0   96.5
    POT   cosine  norm           94.6  88.4   84.0  81.4   85.5  83.7   82.0  85.2   93.0  92.2   95.5
    UOT   cosine  norm           94.8  89.0   86.8  84.7   86.7  86.6   82.9  87.4   92.5  92.8   96.7

  18. Supervised Alignment: Per Null Rate
    [Figure: alignment F1 (%) versus null ratio (%) for (Lan et al., 2021), (Nagata et al., 2020), regularised BOT (cos, norm), regularised POT (cos, norm), and UOT (cos, norm)]

  19. OTAlign Example
    [Figure: example alignments, state-of-the-art (Lan et al. 2021) versus OTAlign (Unbalanced OT)]

  20. Unsupervised Bilingual Word Alignment
    • Applied OTAlign to bilingual word alignment
    • Multilingual pre-trained model: LaBSE
    Corpus                               de-en  sv-en  fr-en  ro-en  ja-en  zh-en
    Awesome-align (Dou and Neubig 2021)  82.5   90.2   94.3   72.1   54.5   82.1
    AccAlign (Wang et al. 2022)          84.0   92.6   95.5   79.2   56.7   83.8
    Type  Cost    Mass
    BOT   cosine  norm                   82.1   90.5   92.8   76.6   51.8   84.0
    UOT   cosine  norm                   85.3   93.6   96.3   79.9   59.5   84.8
    * Hyper-parameters were tuned on the dev set (cs-en)

  21. Summary
    This is the first study that connects the paradigms of
    unbalanced word alignment and the OT problems.
    We empirically showed that
    1. OTAlign is a natural and powerful tool for unbalanced word
    alignment without tailor-made techniques
    2. the OT problems have distinct characteristics on unbalanced
    word alignment, of which we give a comprehensive picture
    OTAlign:
    https://github.com/yukiar/OTAlign

  22. All Resources Are Available!
    • Yuki Arase, Han Bao, and Sho Yokoi. 2023. Unbalanced Optimal
    Transport for Unbalanced Word Alignment. In Proc. of ACL 2023.
    • OTAlign: https://github.com/yukiar/OTAlign
    • Yuki Arase and Jun’ichi Tsujii. 2020. Compositional Phrase
    Alignment and Beyond. In Proc. of EMNLP 2020.
    • Sora Kadotani and Yuki Arase. 2023. Monolingual Phrase
    Alignment as Parse Forest Mapping. In Proc. of *SEM 2023.
    • Phrase Aligner:
    https://github.com/yukiar/phrase_alignment_cted
