Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models
Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (AACL-IJCNLP 2022, Long)
Background & Research Question: The performance of NLP models degrades as time passes. One solution is re-training, but it requires a huge computational cost. Can we estimate the degradation before re-training?
Key idea: We use an efficiently computable metric, Semantic Shift Stability, based on the methodology of semantic shift analysis.
Contributions:
● We created models (RoBERTa and word2vec) that vary over time and revealed their performance degradation via experiments on Nikkei (Japanese) and NOW (English).
● Our experiments showed that large time-series performance degradation occurs in the years when Semantic Shift Stability is smaller.
Future work: More diverse datasets and models, and formulating the discussion in a more persuasive manner.
Resources: GitHub: https://github.com/Nikkei/semantic-shift-stability
Fig. 1: Procedure to calculate Semantic Shift Stability.
Fig. 2: word2vec performance improvement vs. Semantic Shift Stability for Nikkei (upper) and NOW.
Fig. 3: Nikkei RoBERTa performance degradation (not improvement) vs. Semantic Shift Stability.
(coef: -0.4855, coef: -0.8861)
Shotaro Ishihara*, Hiromu Takahashi*, and Hono Shirai
Nikkei Inc. (*equal contribution), [email protected]
AACL-IJCNLP 2022
Semantic shift between two corpora
We use an efficiently computable metric, Semantic Shift Stability, based on the methodology of semantic shift analysis.
[Diagram: semantic shift of words between Corpus 2019 and Corpus 2020.]
Semantic Shift Stability
[Diagram: word embeddings from Corpus 2019 and Corpus 2020 are aligned via anchor words; the mapping is a rotation applied in two directions.]
● INPUT: Two word2vec models, which are far cheaper to compute than pre-training a language model.
● OUTPUT: The average degree of semantic shift over all words.
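As a rough illustration of this procedure, the sketch below aligns two gensim word2vec models with an orthogonal Procrustes rotation in each direction and averages the per-word cosine similarity. This is a minimal sketch under our own assumptions: the anchor-word choice (the shared vocabulary) and all names are illustrative, not the authors' implementation (see the GitHub repository for that).

```python
# Minimal sketch of Semantic Shift Stability. Assumes two gensim word2vec
# models trained on consecutive yearly corpora; anchor words are taken to
# be the shared vocabulary, which may differ from the authors' setup.
import numpy as np

def rotate(src, tgt):
    # Orthogonal Procrustes: rotation R minimizing ||src @ R - tgt||_F.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return src @ (u @ vt)

def cosine_rows(x, y):
    # Row-wise cosine similarity between two matrices of word vectors.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return (x * y).sum(axis=1)

def semantic_shift_stability(model_old, model_new):
    # Anchor words: here, simply the vocabulary shared by both models.
    anchors = sorted(set(model_old.wv.index_to_key) & set(model_new.wv.index_to_key))
    a = np.stack([model_old.wv[w] for w in anchors])
    b = np.stack([model_new.wv[w] for w in anchors])
    # Rotate in two directions and average the per-word similarity:
    # higher values mean more stable (less shifted) semantics.
    sims = 0.5 * (cosine_rows(rotate(a, b), b) + cosine_rows(rotate(b, a), a))
    return float(sims.mean())
```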
Contributions
● We defined Semantic Shift Stability and proposed using it to detect time-series performance degradation of word embeddings and pre-trained language models.
● We created models that vary over time and revealed their performance degradation via experiments on English and Japanese corpora (including 12 RoBERTa models trained on Japanese financial news).
● Our experiments showed that large time-series performance degradation occurs in the years when Semantic Shift Stability is smaller.
Experiments: 12 RoBERTa models with Japanese news
Pseudo-perplexity (PPPL)
● A metric for time-series performance degradation.
● Computed by iteratively replacing each token in a sequence with a mask and summing the corresponding conditional log probabilities.
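The description above matches the common formulation of masked-language-model scoring (Salazar et al., 2020), sketched below; the exact normalization used in the paper is an assumption here.

```latex
% Pseudo-log-likelihood of a sequence W = (w_1, \dots, w_{|W|}),
% masking each token in turn and scoring it with the MLM:
\mathrm{PLL}(W) = \sum_{t=1}^{|W|} \log P_{\mathrm{MLM}}\left(w_t \mid W_{\setminus t}\right)

% Pseudo-perplexity over a corpus C containing N tokens in total:
\mathrm{PPPL}(C) = \exp\left(-\frac{1}{N} \sum_{W \in C} \mathrm{PLL}(W)\right)
```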
PPPL correlates with Semantic Shift Stability
Values are the percentage of degradation from the previous year.
Finding: Model performance degrades sharply with the 2016 and 2020 corpora.
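A minimal sketch of how this comparison can be set up, assuming per-year PPPL values and one Semantic Shift Stability value per year-to-year transition have already been computed (all names here are hypothetical placeholders, not the paper's numbers):

```python
# Year-over-year PPPL degradation (%) and its correlation with Semantic
# Shift Stability. `pppl_by_year` holds one PPPL value per yearly model;
# `sss` holds one stability value per year-to-year transition.
import numpy as np

def degradation_vs_sss(pppl_by_year, sss):
    pppl = np.asarray(pppl_by_year, dtype=float)
    # Positive values mean the model got worse (PPPL increased).
    degradation = (pppl[1:] - pppl[:-1]) / pppl[:-1] * 100.0
    # A negative coefficient means larger degradation in years with
    # smaller Semantic Shift Stability.
    corr = np.corrcoef(degradation, np.asarray(sss, dtype=float))[0, 1]
    return degradation, corr
```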
Experiments: word2vec
Classification task: how well word2vec trained on an older corpus performs on a newer corpus (the 2021 corpus).
● CORPORA: Nikkei (Japanese) and NOW (English)
● INPUT: Average of word embeddings (trained on each year's corpus)
● CLASSIFIER: LightGBM
● OUTPUT: Article genre
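The sketch below illustrates this pipeline under stated assumptions: a gensim word2vec model trained on one year's corpus, averaged token embeddings as features, and a LightGBM classifier predicting genre labels. `model_year`, `articles`, and `genres` are hypothetical placeholders, not the authors' code.

```python
# Minimal sketch of the genre-classification setup. `model_year` is a
# gensim word2vec model trained on one year's corpus; `articles` is a
# list of tokenized articles from the newer corpus, `genres` their labels.
import numpy as np
import lightgbm as lgb

def doc_vector(tokens, wv):
    # Average the embeddings of in-vocabulary tokens; zeros if none match.
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

X = np.stack([doc_vector(tokens, model_year.wv) for tokens in articles])
clf = lgb.LGBMClassifier()
clf.fit(X, genres)
# Classification accuracy on held-out 2021 articles then indicates how
# much the older year's embeddings have degraded.
```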
Future work
● Further experiments with more diverse corpora and models.
● Exploring ways to formulate the discussion in a more persuasive manner.
Conclusion
● Semantic Shift Stability is an efficiently computable metric.
● We revealed time-series performance degradation of word embeddings and pre-trained language models.
● Experiments showed that degradation occurs in the years when Semantic Shift Stability is smaller.
● For more details:
○ Paper: to appear
○ GitHub: https://github.com/Nikkei/semantic-shift-stability