Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (2023). Quantifying Diachronic Language Change via Word Embeddings: Analysis of Social Events using 11 Years News Articles in Japanese and English. 9th International Conference on Computational Social Science.
Quantifying Diachronic Language Change via
Word Embeddings: Analysis of Social Events using
11 Years News Articles in Japanese and English
● We quantitatively analyzed semantic
shifts caused by social events across
multiple corpora and years (using
news articles published in Japanese
and English between 2011 and 2021).
● Studies on the analysis of social
events have often focused on a
single event, and it is important to
explore a more comprehensive method.
● RQ1: Is the semantic shift caused by
● RQ2: Are the trends of change in
Japanese and English similar?
● A1&2: Yes (at least in our approach)
Shotaro Ishihara (Nikkei Inc.), Hiromu Takahashi, Hono Shirai
● How COVID-19 is changing our language: Detecting
semantic shift in Twitter word embeddings.
● Semantic Shift Stability: Efficient Way to Detect
Performance Degradation of Word Embeddings and
Pre-trained Language Models. (AACL 2022)
We are grateful to Kunihiro Miyazaki for
the useful research discussions.
● A1: The semantic shift stability for
2019-2020 was the lowest for both
Nikkei (ja) and NOW (en); that is, the
degree of change was the greatest.
● A2: The correlation coefficient between
Nikkei and NOW was calculated to be
0.66, indicating a similar trend.
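The comparison in A2 can be sketched as below. This is a minimal illustration that assumes the Pearson correlation coefficient (the poster does not name the variant) and assumes the inputs are the per-year-pair semantic shift stability values of each corpus; the function name `pearson` is hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equally long series.

    Assumption: x and y hold one semantic shift stability value per
    consecutive year pair (e.g. 2011-2012, ..., 2020-2021) for two corpora.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must cover the same year pairs")
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(x, y)[0, 1])
```

A value close to 1 would indicate that the two corpora tend to change in the same year pairs.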
1. Corpora are divided by year, and
word2vec models are trained.
2. We take two trained word2vec
models as input and derive rotation
matrices (R) to align their
3. Stability can be calculated by the
similarities in two directions. 
4. We refer to the average value of
stab across words as the semantic shift
stability, and adopt it as a
representative value.
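Steps 2 to 4 can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes both models have been reduced to embedding matrices over a shared vocabulary (one row per word, same row order), obtains each rotation matrix by orthogonal Procrustes, and takes stab as the mean of the cosine similarities in the two mapping directions. All function names are hypothetical.

```python
import numpy as np

def rotation(src, dst):
    """Orthogonal Procrustes: orthogonal R minimizing ||src @ R - dst||_F."""
    u, _, vt = np.linalg.svd(src.T @ dst)
    return u @ vt

def cosine_rows(a, b):
    """Row-wise cosine similarity between two matrices of equal shape."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den

def semantic_shift_stability(emb_a, emb_b):
    """Return (per-word stab, mean stab) for two aligned embedding matrices.

    Assumption: row i of emb_a and row i of emb_b belong to the same word.
    """
    r_ab = rotation(emb_a, emb_b)  # maps space A onto space B
    r_ba = rotation(emb_b, emb_a)  # maps space B onto space A
    sim_ab = cosine_rows(emb_a @ r_ab, emb_b)
    sim_ba = cosine_rows(emb_b @ r_ba, emb_a)
    stab = (sim_ab + sim_ba) / 2.0  # similarities in two directions
    return stab, float(stab.mean())
```

With an exact orthogonal solution, as in this sketch, the two directions yield the same similarity; they can differ when the mapping is fitted only approximately, for example by gradient descent.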
Inferring Reason of Semantic Shifts
Our approach has the advantage of
identifying the words that exhibited a
significant semantic shift (words with
the lowest stab) in 2019-2020.
● Nikkei: infection, spread, corona,
vaccine, virus, mask, infected, North
Korea, vaccination, and epidemic.
● NOW: king, Scott, de, virus, masks,
wear, mask, pi, q, and wearing.
=> Words related to COVID-19 appeared at
the top of the lists. Note that the analysis
for 2015-2016 implied the impact of the
U.S. presidential election.
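Listing the words with the lowest stab, as done above, can be sketched as follows. The sketch assumes a vocabulary list and a per-word stab array in the same order; the function name `lowest_stability_words` and the default `k=10` are illustrative choices.

```python
import numpy as np

def lowest_stability_words(vocab, stab, k=10):
    """Return the k (word, stab) pairs with the lowest stab, ascending.

    Assumption: vocab[i] is the word whose stability is stab[i].
    """
    stab = np.asarray(stab, dtype=float)
    if len(vocab) != stab.shape[0]:
        raise ValueError("vocab and stab must have the same length")
    order = np.argsort(stab)[:k]  # indices of the smallest stab values
    return [(vocab[i], float(stab[i])) for i in order]
```

Inspecting this list for a given year pair is what allows a likely cause of the shift to be inferred from the words themselves.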