Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models
Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (AACL-IJCNLP 2022, Long)
Motivation: The performance of word embeddings and pre-trained language models degrades with the lapse of time. One of the solutions is re-training, but it requires a huge computational cost. Can we estimate the performance before re-training?

Key idea: We use an efficiently computable metric, Semantic Shift Stability (SSS), based on the methodology of semantic shift analysis.

Contributions:
- We created models (RoBERTa and word2vec) that vary by time series and revealed their performance degradation via experiments on Nikkei (Japanese) and NOW (English).
- Our experiments reported that a large time-series performance degradation occurs in the years when SSS is smaller.

Future work: More diverse datasets and models, and discussion in a more persuasive manner.

Resources: GitHub: https://github.com/Nikkei/semantic-shift-stability

Fig. 1: Procedure to calculate SSS.
Fig. 2: Word2vec performance improvement vs. SSS on Nikkei (upper) and NOW.
Fig. 3: Nikkei RoBERTa performance degradation (not improvement) vs. SSS.
(Correlation coefficients: -0.4855 and -0.8861.)
Finding: Performance of the models gets worse on the 2016 and 2020 corpora.
Calculating SSS (Fig. 1): word embeddings from two periods are aligned using anchor words, with a mapping that rotates in two directions.
• INPUT: Two word2vec models, which are more efficient to calculate than pre-training.
• OUTPUT: The average value of the degree of semantic shift over all words.
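The align-then-average step can be sketched as follows. This is a minimal sketch, not the paper's exact implementation: it assumes the rotation is an orthogonal Procrustes mapping fitted on the anchor words, uses post-alignment cosine similarity as each word's stability score, and averages the two rotation directions; `rotation`, `semantic_shift_stability`, and `anchor_idx` are illustrative names.

```python
import numpy as np

def rotation(src, tgt):
    """Orthogonal matrix R minimising ||src @ R - tgt||_F (Procrustes via SVD)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def semantic_shift_stability(emb_a, emb_b, anchor_idx):
    """Average post-alignment cosine similarity over all shared words.

    emb_a, emb_b : (vocab, dim) embedding matrices from two periods,
                   rows aligned to the same vocabulary.
    anchor_idx   : row indices of the anchor words used to fit the rotation.
    The rotation is fitted in both directions and the two scores are
    averaged, mirroring the "rotate in two directions" step in Fig. 1.
    """
    def score(src, tgt):
        r = rotation(src[anchor_idx], tgt[anchor_idx])
        mapped = src @ r
        cos = np.sum(mapped * tgt, axis=1) / (
            np.linalg.norm(mapped, axis=1) * np.linalg.norm(tgt, axis=1))
        return cos.mean()
    return 0.5 * (score(emb_a, emb_b) + score(emb_b, emb_a))
```

Two identical (or merely rotated) embedding spaces give an SSS near 1; the more the word meanings drift between the two periods, the lower the score.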
We proposed Semantic Shift Stability and used it for detecting time-series performance degradation of word embeddings and pre-trained language models.
• We created models that vary by time series and revealed the performance degradation via experiments on English and Japanese (including 12 RoBERTa models on Japanese financial news).
• Our experiments reported that a large time-series performance degradation occurs in the years when Semantic Shift Stability is smaller.
Pseudo-perplexity: a metric for time-series performance degradation of pre-trained language models.
• Computed on the basis of the idea of iteratively replacing each token in a sequence with a mask and summing the corresponding conditional log probabilities.
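The mask-and-sum idea can be sketched as below. This is a toy sketch, assuming a callback `masked_logprob(context, position, token)` that stands in for a masked language model (e.g., RoBERTa) and returns the conditional log probability of the true token at the masked position; it is not the paper's implementation.

```python
import math

def pseudo_log_likelihood(tokens, masked_logprob):
    """Sum of conditional log-probabilities, masking one token at a time."""
    total = 0.0
    for i, tok in enumerate(tokens):
        # Replace position i with a mask and score the original token there.
        context = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        total += masked_logprob(context, i, tok)
    return total

def pseudo_perplexity(tokens, masked_logprob):
    """Exponentiated negative average pseudo-log-likelihood."""
    return math.exp(-pseudo_log_likelihood(tokens, masked_logprob) / len(tokens))
```

As a sanity check, a model that assigns uniform probability 1/V to every token yields a pseudo-perplexity of exactly V.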
Downstream task: how well a model trained on a previous corpus performs against a newer corpus (the 2021 corpus).
• CORPORA: Nikkei (Japanese) and NOW (English)
• INPUT: Average of word embeddings (trained on year #)
• CLASSIFIER: LightGBM
• OUTPUT: Article genre
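The feature step above (averaging word embeddings per article) can be sketched as follows; `doc_features` and the tiny `emb` lookup are illustrative names, standing in for a word2vec model trained on one year's corpus. The resulting matrix is what a gradient-boosting classifier such as LightGBM would take as its input features.

```python
import numpy as np

def doc_features(docs, emb, dim):
    """Average word-embedding features for genre classification.

    docs : list of token lists (one list per article)
    emb  : dict mapping token -> np.ndarray of shape (dim,)
    Out-of-vocabulary tokens are skipped; an empty document maps to zeros.
    """
    feats = []
    for doc in docs:
        vecs = [emb[t] for t in doc if t in emb]
        feats.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(feats)  # shape: (n_docs, dim)
```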
• We proposed Semantic Shift Stability, an efficiently computable metric.
• We revealed time-series performance degradation.
• Experiments reported that degradation occurs in the years when Semantic Shift Stability is smaller.
• For more detail:
◦ Paper: to appear
◦ GitHub: https://github.com/Nikkei/semantic-shift-stability