
Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models

Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (AACL-IJCNLP 2022, Long)

code: https://github.com/Nikkei/semantic-shift-stability

Shotaro Ishihara

November 13, 2022

Transcript

  1. Background & Research Question: The performance of models degrades over

    time. One solution is re-training, but it requires a huge computational cost. Can we estimate the performance before re-training?
    Key idea: We use an efficiently computable metric, Semantic Shift Stability (SSS), based on the methodology of semantic shift analysis.
    Contributions:
    - We created models (RoBERTa and word2vec) that vary by time series and revealed the performance degradation via experiments on Nikkei (Japanese) and NOW (English).
    - Our experiments reported that a large time-series performance degradation occurs in the years when SSS is smaller.
    Future work: More diverse datasets and models, and discussion in a more persuasive manner.
    Resources: GitHub: https://github.com/Nikkei/semantic-shift-stability
    Fig. 1: Procedure to calculate SSS. Fig. 2: word2vec performance improvement vs. SSS on Nikkei (upper, coef: -0.4855) and NOW (coef: -0.8861). Fig. 3: Nikkei RoBERTa performance degradation (not improvement) vs. SSS.
    Finding: Performance of models gets worse with the 2016 and 2020 corpora.
  2. Semantic Shift Stability: Efficient Way to Detect Performance Degradation

    of Word Embeddings and Pre-trained Language Models. Shotaro Ishihara*, Hiromu Takahashi*, and Hono Shirai, Nikkei Inc. (*equal contribution) [email protected] AACL-IJCNLP 2022
  3. Performance degrades over time 3 Year 2020

    <= The time when we pre-train language models.
  4. When should we re-train language models? 4 Year 2020 How

    much improvement can we gain from re-training?
  5. Corpus 2020 Semantic shift between two corpora 5 Year 2020

    Corpus 2019 We use an efficiently computable metric Semantic Shift Stability based on the methodology of semantic shift analysis.
  6. Semantic Shift Stability 6 Semantic Shift Stability Corpus 2019 Corpus

    2020 Word embeddings Anchor words Mapping: Rotate in two directions • INPUT: Two word2vec models, which are far cheaper to train than pre-training a language model. • OUTPUT: The average degree of semantic shift over all words.
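The mapping step above can be sketched with orthogonal Procrustes: rotate one embedding space onto the other, in both directions as on the slide, and average the per-word cosine similarity. This is a minimal sketch under those assumptions, not the authors' implementation; anchor-word selection and other details are in the linked repository.

```python
import numpy as np

def align(source, target):
    """Orthogonal Procrustes: find rotation W minimizing ||source @ W - target||_F
    and return the rotated source matrix."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)

def cos_sim_rows(a, b):
    """Row-wise cosine similarity between two (n_words, dim) matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def semantic_shift_stability(emb_old, emb_new):
    """Average per-word similarity after rotating in both directions.
    Rows of emb_old and emb_new are assumed to correspond to the same words."""
    sim_fwd = cos_sim_rows(align(emb_old, emb_new), emb_new)
    sim_bwd = cos_sim_rows(align(emb_new, emb_old), emb_old)
    return float(np.mean((sim_fwd + sim_bwd) / 2.0))
```

If the two spaces differ only by a rotation (no real semantic shift), the alignment recovers it exactly and the score is 1; genuine drift lowers the score.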
  7. Contributions 7 • We defined Semantic Shift Stability and proposed

    using it to detect time-series performance degradation of word embeddings and pre-trained language models. • We created models that vary by time series and revealed the performance degradation via experiments on English and Japanese (including 12 RoBERTa models on Japanese financial news). • Our experiments reported that a large time-series performance degradation occurs in the years when Semantic Shift Stability is smaller.
  8. Experiments: Semantic Shift Stability 8 <= US Presidential election <=

    COVID-19 <= Earthquake in Japan
  9. Experiments: 12 RoBERTa with Japanese news 9 Pseudo-perplexity (PPPL) •

    Metric for time-series performance degradation. • Computed by iteratively replacing each token in a sequence with a mask and summing the corresponding conditional log probabilities.
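Given the per-token masked conditional log probabilities described above, the PPPL formula itself is short. A minimal sketch, assuming the masked LM (e.g. RoBERTa) that produces the log probabilities is available separately and not shown here:

```python
import math

def pseudo_perplexity(token_logprobs):
    """PPPL = exp(-(1/N) * sum of masked conditional log-probs).

    token_logprobs: one value per token, log p(token_i | sequence
    with token_i masked), as scored by a masked language model.
    Lower PPPL means the model fits the text better.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)
```

For example, if the model assigns probability 0.25 to every masked token, PPPL is 4.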
  10. PPPL correlates with Semantic Shift Stability 10 Values are the

    percentage of degradation from the previous year. Finding: Performance of models gets worse with the 2016 and 2020 corpora.
  11. Experiments: word2vec 11 Classification task: How well word2vec trained on

    a previous corpus performs against a newer corpus (the 2021 corpus). • CORPORA: Nikkei (Japanese) and NOW (English) • INPUT: Average of word embeddings (trained on year #) • CLASSIFIER: LightGBM • OUTPUT: Article genre
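The INPUT step (average of word embeddings per article) can be sketched as below; `word_vectors` is a hypothetical token-to-vector lookup standing in for a trained word2vec model, and the downstream LightGBM genre classifier is omitted:

```python
import numpy as np

def article_features(tokens, word_vectors, dim):
    """Average the word2vec vectors of in-vocabulary tokens to get
    one fixed-size feature vector per article."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:  # no known token: fall back to a zero vector
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```

Each article becomes a single `dim`-sized vector, so a standard tabular classifier such as LightGBM can be trained directly on the stacked features.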
  12. Correlation coefficients are -0.4855 and -0.8861 12 Red shows improvement

    from the previous year. (LEFT: Nikkei, RIGHT: NOW)
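The reported values are Pearson correlation coefficients between Semantic Shift Stability and the year-over-year improvement. A minimal sketch with hypothetical numbers (illustration only, not the paper's data):

```python
import numpy as np

# Hypothetical yearly values for illustration only (not the paper's data):
sss = np.array([0.95, 0.90, 0.85, 0.80])       # Semantic Shift Stability
improvement = np.array([-0.1, 0.2, 0.5, 0.9])  # gain from re-training

# Pearson correlation coefficient; negative r means more improvement
# is available in years when stability is lower.
r = np.corrcoef(sss, improvement)[0, 1]
```

A strongly negative `r`, as in the paper's -0.4855 (Nikkei) and -0.8861 (NOW), indicates that low stability flags the years where re-training pays off most.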
  13. Future work 13 • Further experiments with more diverse corpora

    and models. • Additional research to formulate the discussion in a more persuasive manner.
  14. Conclusion 14 • Semantic Shift Stability is an efficiently computable

    metric. • We revealed time-series performance degradation. • Experiments reported that degradation occurs in the years when Semantic Shift Stability is smaller. • For more details: ◦ Paper: to appear ◦ GitHub: https://github.com/Nikkei/semantic-shift-stability