Background & Research Question:
The performance of NLP models degrades over time. One solution is
re-training, but it incurs a huge computational cost. Can we estimate
the degradation before re-training?
Key idea:
We use Semantic Shift Stability (SSS), an efficiently computable metric
based on the methodology of semantic shift analysis.
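As an illustration, semantic shift analysis is commonly done by aligning word embeddings trained on corpora from two periods and comparing per-word similarity. The sketch below (a minimal NumPy version, not the paper's exact implementation; the function name and synthetic data are assumptions) aligns two embedding matrices over a shared vocabulary with orthogonal Procrustes and averages the cosine similarities, so a higher score means more stable meanings:

```python
import numpy as np

def semantic_shift_stability(emb_old, emb_new):
    """Sketch of an SSS-style score over a shared vocabulary:
    align emb_new to emb_old with orthogonal Procrustes, then
    average the per-word cosine similarity."""
    # Procrustes solution: R = U @ Vt for SVD of emb_new.T @ emb_old.
    u, _, vt = np.linalg.svd(emb_new.T @ emb_old)
    aligned = emb_new @ (u @ vt)
    # Mean cosine similarity between aligned and reference vectors.
    num = (aligned * emb_old).sum(axis=1)
    den = np.linalg.norm(aligned, axis=1) * np.linalg.norm(emb_old, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 50))
# A pure rotation is only a coordinate change, so stability is ~1.
q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
print(round(semantic_shift_stability(base, base @ q), 3))
# Added noise (simulating semantic drift) lowers the score.
noisy = base @ q + rng.normal(scale=1.0, size=(100, 50))
print(semantic_shift_stability(base, noisy) < 0.99)
```

In practice the two matrices would come from word2vec models trained on corpora from consecutive years, restricted to their shared vocabulary.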
Contributions:
- We trained time-series models (RoBERTa and word2vec) and revealed
their performance degradation through experiments on Nikkei
(Japanese) and NOW (English).
- Our experiments showed that large time-series performance
degradation occurs in the years when SSS is smaller.
Future work:
Evaluating more diverse datasets and models, and a more persuasive discussion.
Resources:
GitHub: https://github.com/Nikkei/semantic-shift-stability
Semantic Shift Stability: Efficient Way to Detect
Performance Degradation of Word Embeddings
and Pre-trained Language Models
Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (AACL-IJCNLP 2022, Long)
Fig. 1: Procedure to calculate SSS.
Fig. 2: Word2vec performance improvement vs. SSS on Nikkei (upper) and NOW (lower).
Fig. 3: Nikkei RoBERTa performance degradation (not improvement) vs SSS.
Correlation coefficients: -0.4855 and -0.8861.
Finding: Model performance degrades on the 2016 and 2020 corpora.