posting location of social media
⚫ Applying EL (entity linking) to mentions
⚫ Utilizing entity information
⚫ EL for geographical mentions is limited
⚫ Geographical mentions
⚫ Mentions with geographical attributes
⚫ e.g. Japan, Tokyo, Kyushu University
⚫ Propose an entity disambiguation method specialized for geographical mentions in Japanese text
in text → real-world locations)
⚫ Disambiguation using geographical distance
⚫ [Yamada+ 2022]
⚫ Deep learning-based entity disambiguation model
⚫ Model designed for general entity disambiguation tasks
[Leidner 2004] Jochen L. Leidner. Toponym resolution in text: "Which Sheffield is it?". Proceedings of the 27th Annual International ACM SIGIR Conference (SIGIR 2004). 2004.
[Yamada+ 2022] Ikuya Yamada, Koki Washio, Hiroyuki Shindo, Yuji Matsumoto. Global Entity Disambiguation with BERT. Association for Computational Linguistics. 2022.
⚫ Geographical mention → Acquire embedding representation
⚫ Entity Prediction Head
⚫ Predict entity from candidates based on embedding
⚫ Classification problem over candidates
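The prediction step above can be illustrated with a minimal sketch. It assumes the head scores each candidate by a dot product between the mention embedding and a candidate entity embedding, then applies a softmax over the candidates only. The function name `predict_entity` and the toy 3-dimensional vectors are hypothetical; the actual head in [Yamada+ 2022] may include extra layers and bias terms.

```python
import math

def predict_entity(mention_embedding, candidates):
    """Classify over candidates. `candidates` maps entity name -> embedding."""
    names = list(candidates)
    # One logit per candidate: dot product with the mention embedding.
    logits = [
        sum(m * e for m, e in zip(mention_embedding, candidates[name]))
        for name in names
    ]
    # Numerically stable softmax restricted to the candidate set.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {name: v / total for name, v in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs

# Toy vectors, made up purely for illustration.
mention = [1.0, 0.0, 0.5]
candidates = {
    "Noboribetsu City": [0.9, 0.1, 0.4],
    "Noboribetsu Station": [0.1, 0.9, 0.0],
}
predicted, probs = predict_entity(mention, candidates)
print(predicted)
```

Restricting the softmax to the candidate list is what makes this a classification problem over candidates rather than over the whole entity vocabulary.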
⚫ Uses Japanese pre-trained LUKE*
⚫ Model pre-trained on Wikipedia
・Anchor Link → Mention to Entity
・Others → Word
* https://huggingface.co/studio-ousia/luke-japanese-base
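As a rough illustration of the "Anchor Link → Mention to Entity, Others → Word" split, the sketch below parses MediaWiki-style `[[Title|surface]]` links into a word sequence plus entity annotations. The function name, the whitespace tokenization, and the example sentence are assumptions for illustration only; the real pre-training pipeline uses a proper tokenizer and is not described on the slide.

```python
import re

# Matches [[Title]] and [[Title|surface text]].
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")

def split_words_and_entities(wikitext):
    """Return (words, entities); each entity is (word_start, word_end, title)."""
    words, entities = [], []
    pos = 0
    for match in LINK.finditer(wikitext):
        # Plain text before the link contributes ordinary words.
        words.extend(wikitext[pos:match.start()].split())
        title = match.group(1)
        surface = match.group(2) or title
        surface_words = surface.split()
        # The anchor text is a mention that points at the linked entity.
        entities.append((len(words), len(words) + len(surface_words), title))
        words.extend(surface_words)
        pos = match.end()
    words.extend(wikitext[pos:].split())
    return words, entities

words, entities = split_words_and_entities(
    "[[Kyushu University]] is located in [[Fukuoka|Fukuoka City]] ."
)
print(words)
print(entities)
```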
based on newspaper articles
⚫ Mention → Mention category, corresponding Wikipedia article
⚫ Target: 5,525 geographical mentions where the correct answer is included in the candidates
⚫ Average number of candidates per mention: 32.9
[Yamada+ 2022] Accuracy (%): 89.8 (4,961 / 5,525)
[Jargalsaikhan+] Jargalsaikhan et al. Building a Corpus for Japanese Wikification with Fine-Grained Entity Classes. Proceedings of the ACL 2016 Student Research
Hiyoshi-cho (Tokorozawa)】
Example: … Center in Hiyoshi-cho, Kyoto Prefecture
⚫ Confusion with entities of different categories
【Correct: Noboribetsu City, Output: Noboribetsu Station】
Example: The restaurant guide … is the first in Noboribetsu
⚫ The address-like relationship between mentions is not captured?
⚫ Relationships between mentions
⚫ Propose Hierarchical Insertion (HI)
⚫ Edit input text
⚫ Explicitly reflect address hierarchy relationships
by a lower-category mention
⚫ Example: “Ibaraki-ken no Tsukuba-shi” (Tsukuba City in Ibaraki Prefecture)
⚫ Ibaraki-ken → Province, Tsukuba-shi → City
⚫ Operation: Insert the upper-level mention before the lower-level one
⚫ Result: “Ibaraki-ken no Ibaraki-ken Tsukuba-shi”
⚫ This edited text is used as the model input
⚫ The generated entity candidates remain unchanged
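The insertion operation above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes romanized text with spaces, mentions given as character spans with a category label, a two-level hierarchy (Province above City), and a hypothetical `max_gap` limit on how far apart the two mentions may be. The names `Mention`, `find_mention`, `CATEGORY_RANK`, and `max_gap` are all invented for this example.

```python
from dataclasses import dataclass

# Assumed ranks: a smaller number sits higher in the address hierarchy.
CATEGORY_RANK = {"Province": 0, "City": 1}

@dataclass
class Mention:
    start: int      # character offset where the mention begins
    end: int        # character offset just past the mention
    category: str

def find_mention(text, surface, category):
    """Hypothetical helper: locate the first occurrence of `surface`."""
    start = text.index(surface)
    return Mention(start, start + len(surface), category)

def hierarchical_insertion(text, mentions, max_gap=4, sep=" "):
    """Insert each upper-level mention right before the lower-level one after it."""
    mentions = sorted(mentions, key=lambda m: m.start)
    pieces, cursor = [], 0
    for upper, lower in zip(mentions, mentions[1:]):
        rank_upper = CATEGORY_RANK.get(upper.category)
        rank_lower = CATEGORY_RANK.get(lower.category)
        # Skip pairs outside the target categories or not in upper -> lower order.
        if rank_upper is None or rank_lower is None or rank_upper >= rank_lower:
            continue
        # Skip pairs that are too far apart (assumed heuristic).
        if lower.start - upper.end > max_gap:
            continue
        pieces.append(text[cursor:lower.start])
        pieces.append(text[upper.start:upper.end] + sep)
        cursor = lower.start
    pieces.append(text[cursor:])
    return "".join(pieces)

text = "Ibaraki-ken no Tsukuba-shi"
mentions = [
    find_mention(text, "Ibaraki-ken", "Province"),
    find_mention(text, "Tsukuba-shi", "City"),
]
edited = hierarchical_insertion(text, mentions)
print(edited)  # Ibaraki-ken no Ibaraki-ken Tsukuba-shi
```

Only the input text changes; the mention spans used for candidate generation would be taken from the original text, so the candidate lists stay the same.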
model
⚫ The only difference is whether the input text is edited with HI
⚫ Training and inference both use the edited data
⚫ Category information for HI
⚫ Category Estimation Model (details in Appendix)
⚫ Infers the category from the embedding of the geographical mention
occurred: 345
⚫ Significant difference at the 1% significance level

| | Without HI | With HI |
|---|---|---|
| Accuracy % (correct mentions / total mentions) | 89.8 (4,961 / 5,525) | 94.0 (5,195 / 5,525) |
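The reported figures can be checked with a short calculation. The slide does not say which test produced the 1% significance result, so the sketch below uses an unpaired two-proportion z-test on the reported counts as a rough stand-in; that choice is an assumption, not the authors' procedure. Both systems were evaluated on the same mentions, so a paired test such as McNemar's would be more natural, but it needs per-mention outcomes that are not given here.

```python
import math

def accuracy_percent(correct, total):
    """Accuracy as a percentage."""
    return 100.0 * correct / total

def two_proportion_z_test(correct_a, correct_b, total):
    """Unpaired z-test for two proportions measured on equally sized samples."""
    p_a = correct_a / total
    p_b = correct_b / total
    pooled = (correct_a + correct_b) / (2 * total)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / total))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

TOTAL = 5525
without_hi, with_hi = 4961, 5195

acc_without = round(accuracy_percent(without_hi, TOTAL), 1)
acc_with = round(accuracy_percent(with_hi, TOTAL), 1)
z, p_value = two_proportion_z_test(without_hi, with_hi, TOTAL)

print(acc_without)      # 89.8
print(acc_with)         # 94.0
print(p_value < 0.01)   # True
```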
Baseline: Miyata Town (Aichi)】
⚫ Original: “Fukuoka-ken Miyata-cho no…”
⚫ (Miyata Town in Fukuoka Prefecture)
⚫ Categories: Fukuoka-ken → Province, Miyata-cho → City
⚫ After Insertion
⚫ “Fukuoka-ken Fukuoka-ken Miyata-cho no…”
⚫ Appears redundant?
⚫ The inserted “Fukuoka-ken” is treated as a word
⚫ → Reinforces geographical context
Sea, Baseline: Mediterranean diet】
・Original: “Chichuu-kai no Shichiria-tou…” (Grew up on Sicily Island in the Mediterranean Sea…)
・Categories: Chichuu-kai → Sea, Shichiria-tou → Island
⚫ Improvement without insertion
⚫ Input text remains the same
⚫ The model learned from other cases where insertion occurred
Island (New Zealand), Output: Yuzhny Island*】
・“Nyuujiirando no Minami-jima ni aru…” (…Located in New Zealand’s South Island)
・Categories: Nyuujiirando → Country, Minami-jima → Island
⚫ Address-like hierarchy exists (Country → Island)
⚫ We restricted the target categories for this experiment
⚫ Future Work → Expanding applicable categories
* The southern island of the Novaya Zemlya archipelago, Russia
⚫ Reduces confusion with same-named places
⚫ Reduces confusion with entities of different categories
⚫ Future Work
⚫ Expansion of Hierarchical Insertion rules
⚫ Mentions with insertion were about 6% of the total