
Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models

Shotaro Ishihara
November 13, 2022

Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (AACL-IJCNLP 2022, Long)

code: https://github.com/Nikkei/semantic-shift-stability

Transcript

  1. Background & Research Question:
    The performance of models degrades over time. One solution is
    re-training, but it incurs a huge computational cost. Can we estimate
    the performance before re-training?
    Key idea:
    We use an efficiently computable metric, Semantic Shift Stability (SSS),
    based on the methodology of semantic shift analysis.
    Contributions:
    - We created models (RoBERTa and word2vec) that vary across time
    and revealed their performance degradation via experiments on Nikkei
    (Japanese) and NOW (English).
    - Our experiments showed that large time-series performance
    degradation occurs in the years when SSS is smaller.
    Future work:
    More diverse datasets and models, and discussion in a more persuasive manner.
    Resources:
    GitHub: https://github.com/Nikkei/semantic-shift-stability
    Fig. 1: Procedure to calculate SSS.
    Fig. 2: Word2vec performance improvement vs SSS of Nikkei (upper,
    coef: -0.4855) and NOW (lower, coef: -0.8861).
    Fig. 3: Nikkei RoBERTa performance degradation (not improvement) vs SSS.
    Finding: Model performance gets worse with the 2016 and 2020 corpora.


  2. Shotaro Ishihara*, Hiromu Takahashi*, and Hono Shirai
    Nikkei Inc. (*equal contribution)
    [email protected]
    AACL-IJCNLP 2022
    Semantic Shift Stability: Efficient
    Way to Detect Performance
    Degradation of Word Embeddings
    and Pre-trained Language Models


  3. Performance degrades over time
    (Figure: model performance by year; 2020 <= the time when we pre-train
    language models.)


  4. When should we re-train language models?
    (Figure: model performance by year, around 2020.)
    How much improvement can we gain from re-training?


  5. Semantic shift between two corpora
    (Figure: Corpus 2019 and Corpus 2020 on the year axis.)
    We use an efficiently computable metric, Semantic Shift Stability,
    based on the methodology of semantic shift analysis.


  6. Semantic Shift Stability
    (Figure: word embeddings trained on Corpus 2019 and Corpus 2020,
    aligned via anchor words; mapping: rotate in two directions.)
    ● INPUT: Two word2vec models, which are more efficient to train
    than pre-training a language model.
    ● OUTPUT: The average degree of semantic shift over all words.
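    The procedure above can be sketched with NumPy: take two embedding
    matrices over a shared vocabulary, align one space onto the other with an
    orthogonal (Procrustes) rotation, and average the per-word cosine
    similarity. This is a minimal one-direction sketch under stated
    assumptions (the slide maps in both directions; the function name and
    the use of plain matrices instead of trained word2vec models are
    illustrative, not the paper's implementation):

    ```python
    import numpy as np

    def semantic_shift_stability(emb_a, emb_b):
        """Rough Semantic Shift Stability between two embedding matrices.

        emb_a, emb_b: (n_words, dim) arrays over the same vocabulary,
        trained on two corpora (e.g. 2019 vs 2020). Returns the mean
        cosine similarity after orthogonal alignment: higher means
        less semantic shift, i.e. more stability.
        """
        # Normalise rows so the comparison is purely directional.
        a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
        b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
        # Orthogonal Procrustes: rotation R minimising ||a R - b||_F.
        u, _, vt = np.linalg.svd(a.T @ b)
        r = u @ vt
        a_mapped = a @ r
        # Cosine similarity per word, averaged over the vocabulary.
        sims = np.sum(a_mapped * b, axis=1)
        return float(np.mean(sims))
    ```

    As a sanity check, two copies of the same space, even after an arbitrary
    rotation, should score close to 1.0, since the Procrustes step recovers
    the rotation.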


  7. Contributions
    ● We defined Semantic Shift Stability and proposed using it to
    detect time-series performance degradation of word embeddings and
    pre-trained language models.
    ● We created models that vary across time and revealed their
    performance degradation via experiments on English and Japanese
    corpora (including 12 RoBERTa models on Japanese financial news).
    ● Our experiments showed that large time-series performance
    degradation occurs in the years when Semantic Shift Stability is
    smaller.


  8. Experiments: Semantic Shift Stability
    (Figure: Semantic Shift Stability by year, with annotated events:
    the US Presidential election, COVID-19, and the earthquake in Japan.)


  9. Experiments: 12 RoBERTa with Japanese news
    Pseudo-perplexity (PPPL)
    ● A metric for time-series performance degradation.
    ● Computed by iteratively replacing each token in a sequence with
    a mask and summing the corresponding conditional log probabilities.
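    The computation described above can be sketched as a small function that
    takes the masked-LM scoring step as a callable; the `masked_logprob`
    interface is an illustrative assumption (with a real RoBERTa model one
    would mask position i, run the model, and read off the log-probability
    of the original token):

    ```python
    import math

    def pseudo_perplexity(tokens, masked_logprob):
        """Pseudo-perplexity (PPPL) of a token sequence under a masked LM.

        masked_logprob(tokens, i) must return the conditional log-probability
        the model assigns to tokens[i] when position i is replaced by a mask.
        """
        # Sum the conditional log-probabilities, masking one position at a time.
        total = sum(masked_logprob(tokens, i) for i in range(len(tokens)))
        # Exponentiate the negative average, as with ordinary perplexity.
        return math.exp(-total / len(tokens))
    ```

    For instance, under a model that is uniform over a 10-token vocabulary,
    every sequence scores a PPPL of exactly 10.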


  10. PPPL correlates with Semantic Shift Stability
    Values are the percentage of degradation from the previous year.
    Finding: Model performance gets worse with the 2016 and 2020 corpora.


  11. Experiments: word2vec
    Classification task: how well word2vec trained on an earlier corpus
    performs on a newer corpus (the 2021 corpus).
    ● CORPORA: Nikkei (Japanese) and NOW (English)
    ● INPUT: Average of word embeddings (trained on year #)
    ● CLASSIFIER: LightGBM
    ● OUTPUT: Article genre
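    The feature construction above can be sketched as follows: each article
    is represented by the mean of its word vectors, which then feeds the
    classifier (LightGBM in the slides). The toy embedding table and all
    names below are illustrative assumptions, not the paper's data:

    ```python
    import numpy as np

    def doc_vector(tokens, emb, dim):
        """Average the word2vec vectors of a document's tokens (OOV skipped)."""
        vecs = [emb[t] for t in tokens if t in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    # Toy embedding table standing in for a word2vec model trained on one year.
    emb = {"stocks": np.array([1.0, 0.0]),
           "bonds":  np.array([0.8, 0.2]),
           "soccer": np.array([0.0, 1.0])}

    # Unknown tokens are skipped, so this averages only the two known words.
    x = doc_vector(["stocks", "bonds", "unknown"], emb, dim=2)  # -> [0.9, 0.1]
    ```

    The resulting fixed-length vectors can be passed to any classifier to
    predict the article genre.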


  12. Correlation coefficients are -0.4855 and -0.8861
    Red shows improvement from the previous year. (LEFT: Nikkei, RIGHT: NOW)
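    The reported numbers are Pearson correlation coefficients between the
    yearly Semantic Shift Stability values and the yearly performance
    change, which can be computed as below. The arrays here are made-up
    placeholders for illustration, not the paper's data:

    ```python
    import numpy as np

    # Hypothetical yearly series, NOT the paper's values:
    # SSS per year, and the performance change of the model for that year.
    sss = np.array([0.92, 0.88, 0.95, 0.81])
    perf_change = np.array([-1.2, -2.5, -0.4, -3.9])

    # Pearson correlation coefficient between the two series.
    coef = np.corrcoef(sss, perf_change)[0, 1]
    ```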


  13. Future work
    ● Further experiments with more diverse corpora and models.
    ● Exploring ways to formulate the discussion in a more persuasive
    manner.


  14. Conclusion
    ● Semantic Shift Stability is an efficiently computable metric.
    ● We revealed time-series performance degradation.
    ● Experiments showed that degradation occurs in the years when
    Semantic Shift Stability is smaller.
    ● For more details:
    ○ Paper: to appear
    ○ GitHub: https://github.com/Nikkei/semantic-shift-stability
