Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models
Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (AACL-IJCNLP 2022, Long)
Motivation: The performance of word embeddings and pre-trained language models degrades with the lapse of time. One of the solutions is re-training, but it requires a huge computational cost. Can we estimate the performance before re-training?

Key idea: We use an efficiently computable metric, Semantic Shift Stability (SSS), based on the methodology of semantic shift analysis.

Contributions:
- We created models (RoBERTa and word2vec) that vary by time series and revealed their performance degradation via experiments on Nikkei (Japanese) and NOW (English).
- Our experiments reported that a large time-series performance degradation occurs in the years when SSS is smaller.

Future work: More diverse datasets and models, and discussion in a more persuasive manner.

Resources: GitHub: https://github.com/Nikkei/semantic-shift-stability

Fig. 1: Procedure to calculate SSS.
Fig. 2: Word2vec performance improvement vs. SSS on Nikkei (upper) and NOW.
Fig. 3: Nikkei RoBERTa performance degradation (not improvement) vs. SSS.
(Correlation coefficients: -0.4855 and -0.8861.)
Finding: Performance of the models gets worse on the 2016 and 2020 corpora.
Calculating SSS (Fig. 1): word embeddings from two periods are aligned using anchor words, with a mapping that rotates in two directions.
• INPUT: Two word2vec models, which are more efficient to calculate than pre-training.
• OUTPUT: The average value of the degree of semantic shift over all words.
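The align-then-average step can be sketched as follows. This is a minimal sketch, not the paper's exact implementation: it assumes the rotation is an orthogonal Procrustes mapping fitted on the anchor words, uses post-alignment cosine similarity as each word's stability score, and averages the two rotation directions; `rotation`, `semantic_shift_stability`, and `anchor_idx` are illustrative names.

```python
import numpy as np

def rotation(src, tgt):
    """Orthogonal matrix R minimising ||src @ R - tgt||_F (Procrustes via SVD)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def semantic_shift_stability(emb_a, emb_b, anchor_idx):
    """Average post-alignment cosine similarity over all shared words.

    emb_a, emb_b : (vocab, dim) embedding matrices from two periods,
                   rows aligned to the same vocabulary.
    anchor_idx   : row indices of the anchor words used to fit the rotation.
    The rotation is fitted in both directions and the two scores are
    averaged, mirroring the "rotate in two directions" step in Fig. 1.
    """
    def score(src, tgt):
        r = rotation(src[anchor_idx], tgt[anchor_idx])
        mapped = src @ r
        cos = np.sum(mapped * tgt, axis=1) / (
            np.linalg.norm(mapped, axis=1) * np.linalg.norm(tgt, axis=1))
        return cos.mean()
    return 0.5 * (score(emb_a, emb_b) + score(emb_b, emb_a))
```

Two identical (or merely rotated) embedding spaces give an SSS near 1; the more the word meanings drift between the two periods, the lower the score.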
We proposed Semantic Shift Stability and used it for detecting time-series performance degradation of word embeddings and pre-trained language models.
• We created models that vary by time series and revealed the performance degradation via experiments on English and Japanese (including 12 RoBERTa models on Japanese financial news).
• Our experiments reported that a large time-series performance degradation occurs in the years when Semantic Shift Stability is smaller.
Pseudo-perplexity: a metric for time-series performance degradation of pre-trained language models.
• Computed on the basis of the idea of iteratively replacing each token in a sequence with a mask and summing the corresponding conditional log probabilities.
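The mask-and-sum idea can be sketched as below. This is a toy sketch, assuming a callback `masked_logprob(context, position, token)` that stands in for a masked language model (e.g., RoBERTa) and returns the conditional log probability of the true token at the masked position; it is not the paper's implementation.

```python
import math

def pseudo_log_likelihood(tokens, masked_logprob):
    """Sum of conditional log-probabilities, masking one token at a time."""
    total = 0.0
    for i, tok in enumerate(tokens):
        # Replace position i with a mask and score the original token there.
        context = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        total += masked_logprob(context, i, tok)
    return total

def pseudo_perplexity(tokens, masked_logprob):
    """Exponentiated negative average pseudo-log-likelihood."""
    return math.exp(-pseudo_log_likelihood(tokens, masked_logprob) / len(tokens))
```

As a sanity check, a model that assigns uniform probability 1/V to every token yields a pseudo-perplexity of exactly V.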
Downstream task: how well a model trained on a previous corpus performs against a newer corpus (the 2021 corpus).
• CORPORA: Nikkei (Japanese) and NOW (English)
• INPUT: Average of word embeddings (trained on year #)
• CLASSIFIER: LightGBM
• OUTPUT: Article genre
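The feature step above (averaging word embeddings per article) can be sketched as follows; `doc_features` and the tiny `emb` lookup are illustrative names, standing in for a word2vec model trained on one year's corpus. The resulting matrix is what a gradient-boosting classifier such as LightGBM would take as its input features.

```python
import numpy as np

def doc_features(docs, emb, dim):
    """Average word-embedding features for genre classification.

    docs : list of token lists (one list per article)
    emb  : dict mapping token -> np.ndarray of shape (dim,)
    Out-of-vocabulary tokens are skipped; an empty document maps to zeros.
    """
    feats = []
    for doc in docs:
        vecs = [emb[t] for t in doc if t in emb]
        feats.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(feats)  # shape: (n_docs, dim)
```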
• We proposed Semantic Shift Stability, an efficiently computable metric.
• We revealed time-series performance degradation.
• Experiments reported that degradation occurs in the years when Semantic Shift Stability is smaller.
• For more detail:
◦ Paper: to appear
◦ GitHub: https://github.com/Nikkei/semantic-shift-stability