July 15, 2022

[Poster] Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity

Shotaro Ishihara and Hono Shirai. 2022. Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1208–1214, Seattle, United States. Association for Computational Linguistics.
https://aclanthology.org/2022.semeval-1.171/

Shotaro Ishihara

Transcript

Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity
Shotaro Ishihara and Hono Shirai (Nikkei)

Overview: This paper presents our exploration of a BERT-based Bi-Encoder approach for predicting the similarity of two multilingual news articles. We report several findings on pretrained models, pooling methods, translation, data splitting, and the number of tokens. The weighted average ensemble of four models (id: 1, 2, 7, and 8) achieved a competitive result and ranked in the top 12.

Research questions and findings:

RQ0: Cross-Encoder vs. Bi-Encoder?
A0: Bi-Encoder worked well.

RQ1: Which pretrained model works well?
A1: We used three of them for the final submission.

RQ2: What kind of pooling method is appropriate?
A2: CLS outperformed the other three methods.

RQ3: Is translating into English useful?
A3: Translation did not improve performance.

RQ4: Do data splitting and max length have an effect?
A4: A larger number of splits and a longer max length led to higher performance.
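To make the terms in the transcript concrete, here is a minimal sketch of Bi-Encoder scoring with CLS pooling and of a weighted average ensemble. It is an illustration under stated assumptions, not the authors' implementation: the toy token vectors stand in for BERT outputs, cosine similarity is an assumed way to compare the two pooled vectors (the poster does not say how the score is computed), mean pooling is shown only as one possible alternative to CLS, and all function names and ensemble weights are hypothetical.

```python
import math


def cls_pool(token_vectors):
    """CLS pooling: use the vector at the first ([CLS]) position."""
    return list(token_vectors[0])


def mean_pool(token_vectors):
    """Mean pooling: average the token vectors dimension by dimension."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bi_encoder_score(tokens_a, tokens_b, pool=cls_pool):
    """Bi-Encoder: pool each article independently, then compare.

    Assumption: the comparison is cosine similarity. A Cross-Encoder
    would instead feed both articles through the model together.
    """
    return cosine_similarity(pool(tokens_a), pool(tokens_b))


def weighted_average(predictions, weights):
    """Weighted average ensemble over per-model prediction lists."""
    total = sum(weights)
    n_items = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_items)
    ]


# Toy token vectors standing in for encoder outputs (hypothetical data).
article_a = [[1.0, 0.0], [0.0, 1.0]]
article_b = [[1.0, 0.0], [1.0, 0.0]]

cls_score = bi_encoder_score(article_a, article_b, pool=cls_pool)
mean_score = bi_encoder_score(article_a, article_b, pool=mean_pool)

# Two hypothetical models scoring two article pairs, with weights 3 and 1.
ensemble = weighted_average([[1.0, 3.0], [3.0, 1.0]], [3.0, 1.0])
```

Because each article is encoded on its own, a Bi-Encoder can reuse the pooled vector of an article across many pairs, which is the usual practical argument for it over a Cross-Encoder.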