Takashi INUI
December 22, 2025
nlpir2025 Entity Linking for Geographical Mentions Using Address Hierarchy

Transcript

  1. Entity Linking for Geographical Mentions Using Address Hierarchy
    University of Tsukuba
    Name: Takeru Mitsumori, Takashi Inui
    December 14, 2025, Session 7, KS011
    NLPIR 2025
  2. Research Background and Purpose
    ⚫ Document geolocation
    ⚫ Estimate the posting location of social media posts
    ⚫ Applying entity linking (EL) to mentions
    ⚫ Utilizing entity information
    ⚫ EL for geographical mentions is limited
    ⚫ Geographical mentions
    ⚫ Mentions with geographical attributes
    ⚫ e.g. Japan, Tokyo, Kyushu University
    ⚫ Propose an entity disambiguation method specialized for geographical mentions in Japanese text
  3. Related Work
    ⚫ [Leidner 2004]
    ⚫ Toponym resolution (place names in text → real-world locations)
    ⚫ Disambiguation using geographical distance
    ⚫ [Yamada+ 2022]
    ⚫ Deep learning-based entity disambiguation model
    ⚫ Model designed for general entity disambiguation tasks
    [Leidner 2004] Jochen L. Leidner. "Toponym resolution in text: 'Which Sheffield is it?'" Proceedings of the 27th Annual International ACM SIGIR Conference (SIGIR 2004). 2004.
    [Yamada+ 2022] Ikuya Yamada, Koki Washio, Hiroyuki Shindo, Yuji Matsumoto. "Global Entity Disambiguation with BERT." Association for Computational Linguistics. 2022.
  4. Outline
    ⚫ Investigation of [Yamada+ 2022]
    ⚫ Model Input and Output
    ⚫ Model Details
    ⚫ Model Accuracy, Error Analysis on Disambiguation
    ⚫ Proposed Method
    ⚫ Evaluation Experiment
    ⚫ Experimental Setup
    ⚫ Experimental Results
    ⚫ Analysis of Results
  5. Disambiguation Model
    ⚫ LUKE (Language Understanding with Knowledge-based Embeddings)
    ⚫ Geographical mention → acquire embedding representation
    ⚫ Entity Prediction Head
    ⚫ Predict the entity from the candidates based on the embedding
    ⚫ Classification problem over the candidates
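
The prediction step above (choose one entity from a candidate list using the mention embedding) can be sketched as follows. This is a minimal illustration, not the [Yamada+ 2022] implementation: the dot-product scoring, the function name `predict_entity`, and the toy 3-dimensional vectors are all assumptions made for the example.

```python
import math

def predict_entity(mention_vec, candidates):
    """Classify a mention over its candidates (hypothetical helper).

    mention_vec: embedding of the mention (list of floats).
    candidates:  dict mapping entity name -> entity embedding.
    Scoring by dot product is an assumption; the slides only say that
    prediction is a classification over the candidates.
    """
    scores = {
        name: sum(m * c for m, c in zip(mention_vec, vec))
        for name, vec in candidates.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exp_scores.values())
    probs = {name: v / total for name, v in exp_scores.items()}  # softmax over candidates only
    best = max(probs, key=probs.get)
    return best, probs

# Toy vectors, made up for illustration only.
mention = [1.0, 0.0, 1.0]
candidates = {
    "Hiyoshi-cho (Kyoto)": [0.9, 0.1, 0.8],
    "Hiyoshi-cho (Tokorozawa)": [0.1, 0.9, 0.2],
}
best, probs = predict_entity(mention, candidates)
print(best)
```

Because the softmax runs only over the candidates generated for the mention, the output is always one of those candidates, which is why the evaluation is restricted to mentions whose correct answer is in the candidate list.
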
  6. Language Model LUKE
    ⚫ Handles both words and entities (mentions)
    ⚫ Input text $X$ (token sequence $\{t_1, t_2, \dots\}$)
    ⚫ $e \in \{1, 0\}$: whether the token is a word or a mention
    $\mathrm{Embedding} = \mathrm{LUKE}(X, t_i, e)$
  7. Language Model LUKE (2)
    ⚫ Distinguishes between words and mentions of entities
    ⚫ Uses the Japanese pre-trained LUKE*
    ⚫ Model pre-trained on Wikipedia
    ・Anchor link → mention of an entity
    ・Others → word
    * https://huggingface.co/studio-ousia/luke-japanese-base
  8. Evaluation of [Yamada+ 2022]
    ⚫ Dataset: Japanese Wikification Corpus*
    ⚫ Corpus based on newspaper articles
    ⚫ Mention → mention category, corresponding Wikipedia article
    ⚫ Target: 5,525 geographical mentions where the correct answer is included in the candidates
    ⚫ Average number of candidates per mention: 32.9
    [Yamada+ 2022] Accuracy (%): 89.8 (4,961 / 5,525)
    * [Jargalsaikhan+] Jargalsaikhan et al. Building a Corpus for Japanese Wikification with Fine-Grained Entity Classes, Proceedings of the ACL 2016 Student Research
  9. Error Analysis
    ⚫ Confusion with same-named places
    【Correct: Hiyoshi-cho (Kyoto), Output: Hiyoshi-cho (Tokorozawa)】
    Example: … Center in Hiyoshi-cho, Kyoto Prefecture
    ⚫ Confusion with entities of different categories
    【Correct: Noboribetsu City, Output: Noboribetsu Station】
    Example: The restaurant guide … is the first in Noboribetsu
  10. Outline
    ⚫ Investigation of [Yamada+ 2022]
    ⚫ Model Input and Output
    ⚫ Model Details
    ⚫ Model Accuracy, Error Analysis on Disambiguation
    ⚫ Proposed Method
    ⚫ Evaluation Experiment
    ⚫ Experimental Setup
    ⚫ Experimental Results
    ⚫ Analysis of Results
  11. Proposed Method: Idea
    ⚫ The Hiyoshi-cho error from the error analysis
    ⚫ Is the address-like relationship between mentions not being captured?
    ⚫ Relationships between mentions
    ⚫ Propose Hierarchical Insertion
    ⚫ Edit the input text
    ⚫ Explicitly reflect address hierarchy relationships
  12. Hierarchical Insertion (HI)
    ⚫ Each geographical mention has a category
    ⚫ Country, City, Station, etc.
    ⚫ Categories with an address-like hierarchy
    ⚫ Country > Province > County > City …
    ⚫ Hierarchy relationships outside these 4 categories are ignored
  13. Execution of Hierarchical Insertion
    ⚫ Condition: A mention is followed by a lower-category mention
    ⚫ Example: “Ibaraki-ken no Tsukuba-shi” (Tsukuba City in Ibaraki Prefecture)
    ⚫ Ibaraki-ken → Province, Tsukuba-shi → City
    ⚫ Operation: Insert the upper-level mention before the lower-level one
    ⚫ Result: “Ibaraki-ken no Ibaraki-ken Tsukuba-shi”
    ⚫ This edited text is used as the model input
    ⚫ The generated entity candidates remain unchanged
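
The condition and operation above can be sketched in a few lines. This is a hedged illustration, not the authors' code: the function name `hierarchical_insertion`, the `(start, end, category)` span format, the `sep` parameter, and the reading of "followed by" as "the next mention in the text" are assumptions. The ranks follow the four categories named in the slides.

```python
# Address-like hierarchy from the slides; a smaller rank is a higher level.
RANK = {"Country": 0, "Province": 1, "County": 2, "City": 3}

def hierarchical_insertion(text, mentions, sep=" "):
    """Copy an upper-level mention in front of the lower-level mention that follows it.

    mentions: list of (start, end, category) character spans, sorted by start.
    sep: separator placed after the inserted copy (assumption: " " for the
         romanized examples; unsegmented Japanese text would likely use "").
    """
    pieces = []
    cursor = 0
    prev = None  # span of the previous mention, if any
    for start, end, category in mentions:
        pieces.append(text[cursor:start])  # text between mentions, unchanged
        if prev is not None:
            p_start, p_end, p_cat = prev
            # Insert only when both categories are in the hierarchy and the
            # previous mention is strictly higher than the current one.
            if p_cat in RANK and category in RANK and RANK[p_cat] < RANK[category]:
                pieces.append(text[p_start:p_end] + sep)
        pieces.append(text[start:end])
        cursor = end
        prev = (start, end, category)
    pieces.append(text[cursor:])
    return "".join(pieces)

def span(text, surface, category):
    """Locate a mention by its surface string (helper for the examples)."""
    start = text.index(surface)
    return (start, start + len(surface), category)

text = "Ibaraki-ken no Tsukuba-shi"
mentions = [span(text, "Ibaraki-ken", "Province"), span(text, "Tsukuba-shi", "City")]
edited = hierarchical_insertion(text, mentions)
print(edited)  # Ibaraki-ken no Ibaraki-ken Tsukuba-shi
```

Mentions whose categories fall outside the four-level hierarchy (for example Sea or Island) leave the text untouched, which matches the restriction stated in the slides.
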
  14. Why Hierarchical Insertion Works
    ⚫ Target geographical mention → treated as a “Mention”
    ⚫ Entity identity is [Unknown]
    ⚫ Inserted place name → treated as a “Word”
    ⚫ Enriched context
  15. Evaluation Experiment
    ⚫ With HI vs. without HI
    ⚫ Both use the [Yamada+ 2022] model
    ⚫ The only difference is whether the input text is edited with HI
    ⚫ Training and inference both use the edited data
    ⚫ Category information for HI
    ⚫ Category Estimation Model (details in Appendix)
    ⚫ Infers the category from the embedding of the geographical mention
  16. Evaluation Results
    ⚫ Improvement in accuracy observed
    ⚫ Mentions where insertion occurred: 345
    ⚫ Significant difference at the 1% significance level
    Accuracy (%) (correct mentions / total mentions)
    Without HI: 89.8 (4,961 / 5,525)
    With HI:    94.0 (5,195 / 5,525)
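
The reported figures can be checked with a short calculation. The counts below are taken from the slide; the variable names are illustrative only.

```python
total = 5525               # evaluated geographical mentions
correct_without_hi = 4961  # baseline, without HI
correct_with_hi = 5195     # with HI
inserted = 345             # mentions where an insertion occurred

acc_without = 100 * correct_without_hi / total
acc_with = 100 * correct_with_hi / total
gain_points = acc_with - acc_without
extra_correct = correct_with_hi - correct_without_hi
inserted_share = 100 * inserted / total

print(f"Without HI: {acc_without:.1f}%")                    # 89.8%
print(f"With HI:    {acc_with:.1f}%")                       # 94.0%
print(f"Gain:       {gain_points:.1f} points ({extra_correct} more correct mentions)")
print(f"Mentions with insertion: {inserted_share:.1f}% of all mentions")  # 6.2%
```

The net gain of 234 correct mentions is smaller than the 345 mentions that received an insertion, so these totals alone do not show where the gain comes from.
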
  17. Evaluation Results (2)
    ⚫ Improvement also seen in mentions other than place names
    ⚫ Improved mentions: 416
    ⚫ Editing the input texts influenced the model itself
  18. Improvement Example (1)
    ⚫ Hierarchical information provided directly
    【Correct: Miyata Town (Fukuoka), Baseline: Miyata Town (Aichi)】
    ⚫ Original: “Fukuoka-ken Miyata-cho no…” (Miyata Town in Fukuoka Prefecture)
    ⚫ After insertion: “Fukuoka-ken Fukuoka-ken Miyata-cho no…”
    ⚫ Appears redundant?
    ⚫ The inserted “Fukuoka-ken” is treated as a word
    ⚫ → Reinforces geographical context
    (Category labels in figure: Province, City)
  19. Improvement Example (2)
    ⚫ Improvement on “Different Category” errors
    【Correct: Mediterranean Sea, Baseline: Mediterranean diet】
    ・Original: “Chichuu-kai no Shichiria-tou…” (Grew up on Sicily Island in the Mediterranean Sea…)
    ⚫ Improvement without insertion
    ⚫ Input text remains the same
    ⚫ The model learned from other cases where insertion occurred
    (Category labels in figure: Sea, Island)
  20. Error Case (Limitation)
    ⚫ Both models failed to disambiguate
    【Correct: South Island (New Zealand), Output: Yuzhny Island*】
    ・“Nyuujiirando no Minami-jima ni aru…” (…Located in New Zealand’s South Island)
    ⚫ An address-like hierarchy exists (Country → Island)
    ⚫ We restricted the target categories for this experiment
    ⚫ Future work → expanding the applicable categories
    (Category labels in figure: Island, Country)
    * The southern island of the Novaya Zemlya archipelago, Russia
  21. Summary
    ⚫ Entity disambiguation for geographical mentions
    ⚫ Hierarchical Insertion significantly improves accuracy
    ⚫ Reduces confusion with same-named places
    ⚫ Reduces confusion with entities of different categories
    ⚫ Future work
    ⚫ Expansion of Hierarchical Insertion rules
    ⚫ Only about 6% of all mentions received an insertion