
[Poster] Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity


Shotaro Ishihara and Hono Shirai. 2022. Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1208–1214, Seattle, United States. Association for Computational Linguistics.
https://aclanthology.org/2022.semeval-1.171/

Shotaro Ishihara

July 15, 2022


Transcript

  1. Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity
    Shotaro Ishihara and Hono Shirai (Nikkei) shotaro.ishihara@nex.nikkei.com

    Overview: This paper presents our exploration of a BERT-based Bi-Encoder approach for predicting the similarity of two multilingual news articles. We report several findings concerning pretrained models, pooling methods, translation, data splitting, and the number of tokens. The weighted average ensemble of four models (ids 1, 2, 7, and 8) achieved a competitive result and ranked in the top 12.

    Research questions and findings:
    RQ0: Cross-Encoder vs. Bi-Encoder? A0: The Bi-Encoder worked well.
    RQ1: Which pretrained model works well? A1: We used three of the pretrained models for the final submission.
    RQ2: Which pooling method is appropriate? A2: CLS pooling outperformed the other three methods.
    RQ3: Is translating into English useful? A3: Translation into English did not improve performance.
    RQ4: Do data splitting and max length have an effect? A4: A larger number of splits and a longer max length led to higher performance.
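The Bi-Encoder setup behind RQ0 and the pooling comparison in RQ2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the per-token hidden states have already been produced by some BERT-style encoder (replaced here by toy vectors so the sketch runs without downloading a model), it assumes cosine similarity as the comparison between the two article vectors, and all function names are hypothetical. Mean pooling stands in for one alternative to CLS pooling, since the poster does not name the other three methods. A Cross-Encoder, by contrast, would feed both articles jointly into a single encoder, so no separate vector per article would exist.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
PoolFn = Callable[[Sequence[Vector], Sequence[int]], Vector]


def cls_pooling(token_embeddings: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """CLS pooling: use the hidden state of the first ([CLS]) token."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Mean pooling (illustrative alternative): average non-padding tokens (mask == 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    return [x / max(count, 1) for x in total]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def bi_encoder_score(
    emb_a: Sequence[Vector], mask_a: Sequence[int],
    emb_b: Sequence[Vector], mask_b: Sequence[int],
    pool: PoolFn = cls_pooling,
) -> float:
    """Bi-Encoder scoring: pool each article independently, then compare (assumed cosine)."""
    return cosine_similarity(pool(emb_a, mask_a), pool(emb_b, mask_b))


if __name__ == "__main__":
    # Toy per-token hidden states standing in for encoder outputs (dimension 2).
    emb_a = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    mask_a = [1, 1, 0]  # the last token is padding and is ignored by mean pooling
    emb_b = [[0.0, 1.0], [1.0, 0.0]]
    mask_b = [1, 1]
    print("CLS: ", bi_encoder_score(emb_a, mask_a, emb_b, mask_b))
    print("mean:", bi_encoder_score(emb_a, mask_a, emb_b, mask_b, pool=mean_pooling))
```

Because each article is encoded independently, its pooled vector can be computed once and reused across every pair it appears in, which is one practical reason to prefer a Bi-Encoder when many pairs must be scored.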
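The final submission combined four models by a weighted average. A minimal sketch of such an ensemble follows, assuming each model outputs one similarity score per article pair; the function name, the scores, and the weights in the demo are hypothetical placeholders, not the values used in the paper.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-model score lists into one list by a normalized weighted average."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_pairs = len(predictions[0])
    if any(len(p) != n_pairs for p in predictions):
        raise ValueError("every model must score the same article pairs")
    # Divide by the weight sum so the weights need not be pre-normalized.
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_pairs)
    ]


if __name__ == "__main__":
    # Hypothetical scores from four models for three article pairs.
    preds = [
        [2.1, 3.4, 1.2],
        [2.3, 3.1, 1.0],
        [1.9, 3.6, 1.5],
        [2.0, 3.3, 1.1],
    ]
    weights = [0.3, 0.3, 0.2, 0.2]  # placeholder weights, not the paper's
    print(weighted_average_ensemble(preds, weights))
```

Normalizing by the weight sum means equal weights reduce to a plain mean, so the weighted version only departs from simple averaging when some models are trusted more than others.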