
[Poster] Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity

Shotaro Ishihara and Hono Shirai. 2022. Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1208–1214, Seattle, United States. Association for Computational Linguistics.
https://aclanthology.org/2022.semeval-1.171/

Shotaro Ishihara

July 15, 2022

Transcript

  Findings:
    A0: Bi-Encoder worked well.
    A1: Three pretrained models were used for the final submission.
    A2: CLS pooling outperformed the other three pooling methods
    (see the pooling sketch after this list).
    A3: Translating the articles into English did not improve performance.
    A4: A larger number of data splits and a longer max length led to
    higher performance.
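
    As a rough illustration of the pooling methods compared in A2, here
    is a minimal sketch assuming the Hugging Face transformers library;
    the bert-base-multilingual-cased checkpoint and the mean/max
    alternatives are placeholders, not the exact four methods evaluated
    in the paper.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Placeholder checkpoint; the paper compares several pretrained models.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModel.from_pretrained("bert-base-multilingual-cased")

    def encode(text, pooling="cls"):
        # Tokenize one article, truncating to a fixed max length.
        inputs = tokenizer(text, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
        if pooling == "cls":
            # A2: the embedding of the leading [CLS] token worked best.
            return hidden[:, 0]
        if pooling == "mean":
            # Average over non-padding tokens.
            mask = inputs["attention_mask"].unsqueeze(-1)
            return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        if pooling == "max":
            # Element-wise max over tokens.
            return hidden.max(dim=1).values
        raise ValueError(f"unknown pooling: {pooling}")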
    Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise
    Multilingual News Article Similarity
    Overview: This paper presents our exploration of a BERT-based
    Bi-Encoder approach for predicting the similarity of two
    multilingual news articles. There are several findings regarding
    pretrained models, pooling methods, translation, data separation,
    and the number of tokens. The weighted average ensemble of four
    models (id: 1, 2, 7, and 8) achieved a competitive result and
    ranked in the top 12 (a scoring sketch follows below).
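
    To make the Bi-Encoder scoring and the weighted average ensemble
    concrete, here is a minimal sketch reusing the hypothetical encode()
    helper from the pooling sketch above; the example weights and scores
    are placeholders, not the submitted configuration.

    import torch.nn.functional as F

    def biencoder_similarity(article_a, article_b):
        # Bi-Encoder: each article is encoded independently (no
        # cross-attention), then the two embeddings are compared.
        emb_a = encode(article_a, pooling="cls")
        emb_b = encode(article_b, pooling="cls")
        return F.cosine_similarity(emb_a, emb_b).item()

    def weighted_average(scores, weights):
        # Weighted average ensemble of per-model similarity scores.
        return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

    # e.g. combining four models' predictions into one submission score:
    # final = weighted_average([0.71, 0.65, 0.69, 0.74], [2, 1, 1, 2])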
    RQ0: Cross-Encoder vs Bi-Encoder?
    RQ1: Which pretrained model works well?
    RQ2: Which pooling method is most suitable?
    RQ3: Is it useful to translate the articles into English?
    RQ4: Is there an effect of data splitting and max length?
    (sketched below)
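
    For RQ4, the exact splitting scheme is not spelled out on the
    poster; as a hedged illustration only, a k-fold split where each
    fold trains one Bi-Encoder and the fold models feed the ensemble
    above (KFold from scikit-learn; the data and n_splits here are
    placeholders).

    from sklearn.model_selection import KFold

    # Hypothetical article pairs; the real data has gold similarity scores.
    pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]

    # A4: a larger number of splits tended to help; each fold yields
    # one fine-tuned model, ensembled at prediction time.
    kfold = KFold(n_splits=4, shuffle=True, random_state=0)
    for fold, (train_idx, valid_idx) in enumerate(kfold.split(pairs)):
        train = [pairs[i] for i in train_idx]
        valid = [pairs[i] for i in valid_idx]
        # ... fine-tune one Bi-Encoder on train, validate on valid ...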
    Shotaro Ishihara and Hono Shirai (Nikkei) [email protected]
