Slide 1

Slide 1 text

Computational Approaches for Diachronic Semantic Change Detection
Zhidong Ling
Tokyo Metropolitan University, M2
2024.8

Slide 2

Slide 2 text

Self Introduction
凌 志栋: Zhidong Ling (Zh), Ryo Shito (Jp)
2nd-year Master's student, Tokyo Metropolitan University, Natural Language Processing Group (will disappear in 2025)
Planning to start a PhD at Hitotsubashi University in 2025 with the same supervisor, Prof. Mamoru Komachi
[Photo: my cat, named Dog]

Slide 3

Slide 3 text

Table of Contents
1. Diachronic Semantic Change Detection (DSCD)
2. Basic Computational Methods for DSCD
   a. for Detection
   b. for Analysis
3. Evaluation of DSCD (what I have worked on)
4. Current Topics of Semantic Change

Slide 4

Slide 4 text

Diachronic Semantic Change Detection (DSCD)
● Detect words whose meaning has changed, using diachronic texts
● Objective: Manual Check → Automatic Detection / Analysis
Source: Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change [Hamilton+16]

Slide 6

Slide 6 text

Basic Methods of DSCD
Methods for Detection via word embeddings
● Static + Alignment + Distance
● Contextualized + Clustering + Distance/Distribution
● Methods outside this paradigm
Methods for Analysis
● Topic models with high explainability

Slide 7

Slide 7 text

Word Embeddings Reflect Semantic Change
Distributional Hypothesis: words with similar distributions have similar meanings
○ Word meaning is determined by context words
○ In embedding space, words with similar/related meanings lie closer to each other
[Figure: embedding space with the words Car, Auto, Wagon, Dog, Hound]
➔ Police wagons and fire wagons were not moving.
➔ We have a family in our church whose teenage son was in an auto accident.
➔ They arrived by car.
➔ A wily fox will outrun a pack of hounds, but never a bullet.
➔ The British are renowned as a nation of dog lovers.

Slide 8

Slide 8 text

How Do Word Embeddings Reflect Semantic Change
● For a target word, learn separate embeddings from corpora of different periods
● Different meanings = longer distance in embedding space
● The distance (kind of) reflects the semantic change
[Figure: embedding space with Horse, Coach_1850s, Drive, Basketball, Football, Coach_2010s]
➔ [1851] Louis and his brother generally patronized the top of the coach, but as they drew near Bristol
➔ [1851] The coachman said, " Yes, yes; " and Rollo got into the coach.
➔ [2010] I am here with legendary icon and basketball coach

Slide 9

Slide 9 text

Static+Alignment+Distance
Static embedding, e.g. Word2Vec (one word, one embedding)
Corpus @1850s (Coach @1850s: usage1, usage2, …) → SGNS, CBOW, … →
  V_Coach_1850s = [0.1, 0.3, 0.6…]
  V_Drive_1850s = [0.8, 0.1, -0.1…]
  V_Basketball_1850s = [-2.3, 0.5, -0.4…]
Corpus @2010s (Coach @2010s: usage1, usage2, …) → SGNS, CBOW, … →
  V_Coach_2010s = [0.9, 0.4, -0.7…]
  V_Drive_2010s = [0.2, 0.3, -0.1…]
  V_Basketball_2010s = [-1.3, 0.4, -0.4…]

Slide 10

Slide 10 text

Static+Alignment+Distance
Embedding Space @1850s: Horse_1850s, Coach_1850s, Drive_1850s, Basketball_1850s, Football_1850s
Embedding Space @2010s: Horse_2010s, Coach_2010s, Drive_2010s, Basketball_2010s, Football_2010s
The two sets of vectors cannot be compared directly because they are in different embedding spaces

Slide 11

Slide 11 text

Static+Alignment+Distance
Alignment, e.g. Orthogonal Procrustes [Hamilton+16]
Rotate Embedding Space @1850s with R(θ) into the same space as Embedding Space @2010s
[Figure: Horse_1850s, Coach_1850s, Drive_1850s, Basketball_1850s, Football_1850s rotated into the space of Horse_2010s, Coach_2010s, Drive_2010s, Basketball_2010s, Football_2010s]
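The alignment step can be sketched in two dimensions, where the best rotation R(θ) has a closed form. This is only a rotation-only, 2-D special case of orthogonal Procrustes; real embeddings have hundreds of dimensions and the problem is usually solved with an SVD. The vectors, the choice of anchor words, and the helper names `fit_rotation` and `rotate` are all assumptions made for illustration.

```python
import math

# Toy 2-D embeddings (hypothetical values, not from any real corpus).
# For the anchor words, the 2010s space is the 1850s space turned by 90 degrees.
emb_1850s = {"horse": (1.0, 0.0), "drive": (0.8, 0.6), "coach": (0.9, 0.1)}
emb_2010s = {"horse": (0.0, 1.0), "drive": (-0.6, 0.8), "coach": (1.0, 0.0)}
anchors = ["horse", "drive"]  # assumed here to be semantically stable


def fit_rotation(source, target):
    """Angle (radians) of the 2-D rotation that maps source onto target
    with the smallest summed squared error."""
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(point, theta):
    """Apply the rotation matrix R(theta) to a 2-D point."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


theta = fit_rotation([emb_1850s[w] for w in anchors],
                     [emb_2010s[w] for w in anchors])
# Coach_1850s expressed in the 2010s space; it can now be compared to Coach_2010s.
coach_1850s_aligned = rotate(emb_1850s["coach"], theta)
print(round(math.degrees(theta), 1), coach_1850s_aligned)
```

After the rotation the anchor words coincide across periods, while Coach_1850s and Coach_2010s remain far apart, which is the signal the distance step picks up.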

Slide 12

Slide 12 text

Static+Alignment+Distance
Distance, e.g. Cosine Similarity / Cosine Distance (= 1 - CosSim)
Rotate with R(θ) into the same space, then measure the distance there
[Figure: Embedding Space @1850s (Horse_1850s, Coach_1850s, Drive_1850s, Basketball_1850s, Football_1850s) rotated into Embedding Space @2010s (Horse_2010s, Coach_2010s, Drive_2010s, Basketball_2010s, Football_2010s)]
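A minimal sketch of the distance step, assuming the two vectors of each word already live in the same space. The three-dimensional vectors are toy values chosen for illustration, and `cosine_distance` is a hypothetical helper name.

```python
import math


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# (vector @1850s, vector @2010s) per word, assumed already aligned; toy values.
aligned = {
    "coach": ([0.1, 0.3, 0.6], [0.9, 0.4, -0.7]),
    "horse": ([0.5, 0.2, 0.1], [0.5, 0.2, 0.1]),
}

change = {word: cosine_distance(old, new) for word, (old, new) in aligned.items()}
# Rank words from most changed to least changed.
ranking = sorted(change, key=change.get, reverse=True)
print(ranking)
```

A larger distance is read as a larger degree of change, so sorting by distance gives the ranking that the evaluation later compares with human judgments.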

Slide 13

Slide 13 text

Contextualized+Clustering+Distance/Distribution
Contextualized embedding, e.g. BERT (one token, one embedding)
Coach @1850s (usage1, usage2, …) and Coach @2010s (usage1, usage2, …) → BERT (with or without fine-tuning) → one embedding per usage
Clustering: (almost) a group-by over senses, e.g. ~Coach Drive~ vs. ~Coach Basketball~

Slide 14

Slide 14 text

Contextualized+Clustering+Distance/Distribution
Distribution, e.g. the degree of semantic change (SC) = JSD(D_1850s, D_2010s), the Jensen-Shannon divergence between the two sense distributions
[Figure: Embedding Distribution @1850s and Embedding Distribution @2010s over the sense clusters ~Coach Drive~ and ~Coach Basketball~]
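The JSD score can be sketched as follows, assuming every usage has already been assigned a cluster label. The label counts are invented for illustration, and the helper names are hypothetical. Base-2 logarithms are used so that the score falls between 0 and 1.

```python
import math
from collections import Counter


def sense_distribution(labels, senses):
    """Relative frequency of each sense cluster among the usages of one period."""
    counts = Counter(labels)
    return [counts[s] / len(labels) for s in senses]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


senses = ["drive", "basketball"]
# Hypothetical cluster labels for usages of "coach" in each period.
labels_1850s = ["drive", "drive", "drive", "drive"]
labels_2010s = ["drive", "basketball", "basketball", "basketball"]

d_1850s = sense_distribution(labels_1850s, senses)
d_2010s = sense_distribution(labels_2010s, senses)
sc_degree = jsd(d_1850s, d_2010s)
print(d_1850s, d_2010s, round(sc_degree, 3))
```

Identical distributions give 0 and distributions with no shared sense give 1, so the score can be read directly as a degree of change.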

Slide 15

Slide 15 text

Paradigms for Detection
Static emb. - Alignment - Distance: SGNS, CBOW, …
Contextualized emb. - Clustering - Distance/Distribution: BERT, XLM-R, … (@ time1: sense1, sense2, …; @ time2: sense1, sense2, …)

Slide 16

Slide 16 text

Methods outside the paradigm
● Swap and Predict [Aida+2023]
● Detecting changes by the norm and mean of vectors [Nagata+2023]

Slide 17

Slide 17 text

Method for Analysis
Infinite-SCAN [Inoue+2022]
● A Bayesian model
● Jointly estimates the number of senses of a word and the trend of their changes
● Outputs the distribution of senses, annotated with snippets (words in the context)
→ Explainable results for analysis
[Figures: sense distribution of Coach; sense distribution of Record]

Slide 18

Slide 18 text

Evaluation of DSCD
Early stage: a pre-selected list of words that are known to have changed
● Metric: how many target words appear in the top K of the ranking
● Cons:
○ No semantically stable words
○ No material for analysis (no suitable usages of the target word to support the result)
Now: a word list manually annotated with degrees of semantic change
● Metric: Spearman's correlation between predictions and human judgments
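Spearman's correlation is the Pearson correlation of the ranks, so it rewards a method for ordering words correctly and ignores the exact score values. The sketch below implements it with the standard library only. The gold and predicted scores are invented for illustration, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical degrees of change for four target words.
gold = [0.9, 0.1, 0.5, 0.3]        # human-annotated
predicted = [0.8, 0.2, 0.3, 0.4]   # model output
rho = spearman(gold, predicted)
print(round(rho, 2))
```

In this toy case the model swaps the order of the last two words, which lowers the correlation from 1.0 to 0.8.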

Slide 19

Slide 19 text

How to create the degree of Semantic Change
Diachronic Usage Relatedness (DURel) [Schlechtweg+2018]
● A framework for the annotation of lexical semantic change
● Annotators rate the semantic relatedness of the target word across 2 usages (a usage pair) on a 4-point scale of relatedness; the average of all scores gives the degree of change
An example of annotating a usage pair:
● [Corpus 1] Louis and his brother generally patronized the top of the Coach, but as they drew near Bristol
● [Corpus 2] I am here with legendary icon and basketball Coach
→ Score: 1
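The averaging step can be sketched as below. The judgments are invented for illustration and `mean_relatedness` is a hypothetical helper. Because the scale measures relatedness (1 = unrelated, 4 = identical), a lower average across cross-corpus usage pairs is read as more change.

```python
from statistics import mean

# Hypothetical annotator scores for usage pairs that span Corpus 1 and Corpus 2.
judgments = {
    "coach": [1, 2, 1, 2],
    "horse": [4, 4, 3, 4],
}


def mean_relatedness(scores):
    """Average relatedness of a word's cross-corpus usage pairs."""
    return mean(scores)


relatedness = {word: mean_relatedness(s) for word, s in judgments.items()}
# Lowest relatedness first, i.e. most changed first.
most_changed_first = sorted(relatedness, key=relatedness.get)
print(relatedness, most_changed_first)
```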

Slide 20

Slide 20 text

How to create the degree of Semantic Change
Diachronic Usage Relatedness (DURel) [Schlechtweg+2018]
Chinese dataset based on DURel [Chen+2022]
● C1: 1953~1978, C2: 1979~2003; boundary: Reform and Opening-up (改革开放)
[Figure: words on a scale from stable to more changed: 机制 (machine-made -> mechanism), 软 (soft sofa? -> soft landing), 照片 (photos), 雪 (snow)]

Slide 21

Slide 21 text

How to create the degree of Semantic Change
Diachronic Word Usage Graphs (DWUG) [Schlechtweg+2021]
● An extended version of DURel with a multi-round incremental annotation process
● Each word has a graph: node = usage, edge = relatedness
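The graph structure can be sketched with a plain edge list. The usages, the scores, and the 2.5 threshold are assumptions made for illustration. Grouping usages by connected components over strong edges is a deliberate simplification and not the clustering procedure used for the published datasets.

```python
from collections import defaultdict

# (usage, usage, relatedness score on the 4-point scale); hypothetical values.
# u1-u3 are usages from the earlier corpus, u4-u5 from the later one.
edges = [
    ("u1", "u2", 4), ("u2", "u3", 3),
    ("u3", "u4", 1), ("u4", "u5", 4), ("u1", "u5", 1),
]


def sense_clusters(edges, threshold=2.5):
    """Group usages that are connected by edges scoring at least threshold."""
    neighbours = defaultdict(set)
    nodes = set()
    for a, b, score in edges:
        nodes.update((a, b))
        if score >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first search over strong edges only
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = sense_clusters(edges)
print(clusters)
```

Each cluster stands for one sense, and counting how many usages from each corpus fall into each cluster gives the per-period sense distributions.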

Slide 22

Slide 22 text

How to create the degree of Semantic Change
Diachronic Word Usage Graphs (DWUG) [Schlechtweg+2021]
● Datasets published for 4 languages (English, German, Swedish, Latin), along with SemEval-2020 Task 1
● Expensive and EXTREMELY time-consuming (according to the ZH dataset author)
● Access (more languages available now)

Slide 23

Slide 23 text

Current Topics of Semantic Change (detection & analysis)
[~2023]
● Fine-tuning for DSCD
○ MLM with Time Label Masking [Rosin+2022a]
○ Time-Aware Self-Attention Mechanism [Rosin+2022b]
○ Prompt-based Time Adaptation [Tang+2023]
○ XLM-R fine-tuned on WiC (Word in Context) [Cassotti+2023] ← SOTA in 2023
[2024: ACL, EACL]
● Exploring the type/pattern of semantic change [Cassotti+2024]
● Detecting semantic change by replacing words [Periti+2024]
● Semantic distance metric learning approach [Aida+2024]
● Definition Generation + DSCD [Fedorova+2024]
● Annotation Tool [Schlechtweg+2024]

Slide 24

Slide 24 text

Current Topics of Semantic Change (detection & analysis)
[~2023]
● Fine-tuning for DSCD
○ MLM with Time Label Masking [Rosin+2022a]
○ Time-Aware Self-Attention Mechanism [Rosin+2022b]
○ Prompt-based Time Adaptation [Tang+2023]
○ XLM-R fine-tuned on WiC (Word in Context) [Cassotti+2023]
[2024: ACL, EACL]
● Exploring the type/pattern of semantic change [Cassotti+2024]
● Detecting semantic change by replacing words [Periti+2024]
● Semantic distance metric learning approach [Aida+2024]
● Definition Generation + DSCD [Fedorova+2024]
● Annotation Tool [Schlechtweg+2024]
Highlighted themes:
● SOTA race on SemEval datasets
● Explainability in evaluation
● Semantic change w/ LLMs

Slide 25

Slide 25 text

Round Table Topics in LChange (ACL 2024)
LChange: a workshop on language change
● Difficult tasks rather than easy tasks
● Focus on real-world applications

Slide 26

Slide 26 text

The future (I think) of Semantic Change Detection
● Issues with the methods: explainability
○ They only output the degree of change, with no direct account of how the word senses changed
● Issues with the evaluation
○ Maybe the ground truth we (I) want is not the degree but the word sense distribution?
○ With the degree of SC we can already tell whether these methods detect the change. The next task might be pattern prediction or word sense description (with generative models) ⇒ Not (enough) data for these tasks yet
● More cross-field topics for application
○ Semantic Change + Social Science (digging up new concepts from (web) corpora)
○ Semantic Change + Lexicography (adding new meanings to dictionaries)
○ Semantic Change + Healthcare/Biomedical NLP, maybe?

Slide 27

Slide 27 text

References
1. A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains
2. Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change
3. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages
4. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
5. Analysing Lexical Semantic Change with Contextualised Word Representations
6. Lexicon of Changes: Towards the Evaluation of Diachronic Semantic Shift in Chinese
7. Swap and Predict - Predicting the Semantic Changes in Words across Corpora by Context Swapping
8. Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment
9. Infinite SCAN: An Infinite Model of Diachronic Semantic Change
10. Time Masking for Temporal Language Models
11. Temporal Attention for Language Models
12. Learning Dynamic Contextualised Word Embeddings via Template-based Temporal Adaptation
13. XL-LEXEME: WiC Pretrained Model for Cross-Lingual LEXical sEMantic changE
14. Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types
15. Analyzing Semantic Change through Lexical Replacements
16. A Semantic Distance Metric Learning approach for Lexical Semantic Change Detection
17. Definition generation for lexical semantic change detection
18. The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change