
Training Data Extraction From Pre-trained Language Models: A Survey

Shotaro Ishihara (2023). Training Data Extraction From Pre-trained Language Models: A Survey. In Proceedings of the Third Workshop on Trustworthy Natural Language Processing.
https://arxiv.org/abs/2305.16157
https://trustnlpworkshop.github.io/

Shotaro Ishihara

June 05, 2023

Transcript

Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara (Nikkei Inc.), arXiv preprint: https://arxiv.org/abs/2305.16157

[1] Nicholas Carlini et al. Extracting training data from large language models. In USENIX Security 2021.
[2] Nicholas Carlini et al. Quantifying memorization across neural language models. In ICLR 2023.

Overview
● This study is the first to provide a comprehensive survey of training data extraction from Pre-trained Language Models (PLMs).
● Our review covers more than 100 key papers in the fields of NLP and security:
  ○ Preliminary knowledge
  ○ Taxonomy of memorization, attacks, defenses, and empirical findings
  ○ Future research directions
Attacks, defenses, and findings
● The attack consists of candidate generation followed by membership inference (see the first sketch after this list).
● The pioneering work identified that personal information can be extracted from a pre-trained GPT-2 model [1].
● Experiments show that memorization is related to model size, prompt length, and duplication in the training data [2].
● Defenses include pre-processing, training, and post-processing (see the second sketch after this list).
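
A minimal sketch of the two-stage attack, assuming a GPT-2 model loaded via the Hugging Face transformers library. The prompt, sampling settings, and the perplexity-to-zlib scoring heuristic are illustrative simplifications in the spirit of [1], not the survey's exact procedure.

```python
# Sketch of extraction in two stages: (1) candidate generation by sampling
# from the PLM, (2) membership inference by ranking candidates with a score.
# Assumption: GPT-2 via Hugging Face transformers; settings are illustrative.
import zlib

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def generate_candidates(prompt: str, n: int = 10, max_new_tokens: int = 64) -> list[str]:
    """Stage 1: sample continuations of a prompt from the pre-trained model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_k=40,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]


def membership_score(text: str) -> float:
    """Stage 2: low perplexity relative to zlib-compressed length is treated
    as evidence that the candidate was memorized from the training data."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss
    perplexity = torch.exp(loss).item()
    zlib_entropy = len(zlib.compress(text.encode("utf-8")))
    return perplexity / zlib_entropy  # smaller = more suspicious


# Rank sampled candidates and inspect the most suspicious ones.
candidates = generate_candidates("My address is")  # illustrative prompt
for text in sorted(candidates, key=membership_score)[:3]:
    print(round(membership_score(text), 3), text)
```

Dividing model perplexity by the zlib-compressed length is one way to flag sequences the model finds unusually easy relative to how compressible they are, which points toward memorization rather than generic fluency.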
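And a minimal sketch of one possible post-processing defense: filtering generations that reproduce long training n-grams verbatim. The corpus, n-gram length, and whitespace tokenization are illustrative assumptions, not a prescribed defense from the survey.

```python
# Sketch of a post-processing defense: block outputs containing verbatim
# training n-grams. Corpus, n-gram size, and tokenization are illustrative.

def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(training_corpus: list[str], n: int = 8) -> set[tuple[str, ...]]:
    """Pre-compute the set of training n-grams once, before deployment."""
    index: set[tuple[str, ...]] = set()
    for document in training_corpus:
        index |= ngrams(document.split(), n)
    return index


def leaks_training_data(generated_text: str, index: set[tuple[str, ...]], n: int = 8) -> bool:
    """True if the generation contains any training n-gram verbatim."""
    return bool(ngrams(generated_text.split(), n) & index)


corpus = ["Alice Example lives at 42 Sample Street and her number is 555-0100 ."]
index = build_index(corpus, n=4)
print(leaks_training_data("her number is 555-0100", index, n=4))   # True: suppress
print(leaks_training_data("call me tomorrow please", index, n=4))  # False: allow
```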
Definition of memorization
With the advent of approximate memorization (near-verbatim reproduction judged by similarity rather than exact string match), the concern has become similar to the well-known model inversion attack.
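
A minimal sketch of the exact vs. approximate distinction, assuming a simple character-level similarity from the standard library and an arbitrary 0.9 threshold; these choices are illustrative, not the definitions used in the surveyed papers.

```python
# Sketch contrasting exact (verbatim) and approximate (near-verbatim)
# memorization checks. The similarity metric and threshold are assumptions.
from difflib import SequenceMatcher


def is_exact_memorization(training_text: str, generated_text: str) -> bool:
    """Verbatim reproduction: the generation contains the training string as-is."""
    return training_text in generated_text


def is_approximate_memorization(
    training_text: str, generated_text: str, threshold: float = 0.9
) -> bool:
    """Near-verbatim reproduction: similarity above a chosen threshold."""
    similarity = SequenceMatcher(None, training_text, generated_text).ratio()
    return similarity >= threshold


sample = "Contact Alice Example at 555-0100 for details."
paraphrase = "Contact Alice Example at 555-0100 for more details."
print(is_exact_memorization(sample, paraphrase))        # False
print(is_approximate_memorization(sample, paraphrase))  # True
```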
Future research directions
● Is memorization always evil?
  ○ Memorization vs. association
  ○ Memorization vs. performance
● Toward broader research fields
  ○ Model inversion attacks
  ○ Plagiarism detection
  ○ Image similarity
● Evaluation schema
  ○ Benchmark dataset
  ○ Evaluation metrics
● Model variation
  ○ Masked language models
