Shotaro Ishihara (2023). Training Data Extraction From Pre-trained Language Models: A Survey. In Proceedings of the Third Workshop on Trustworthy Natural Language Processing.
Nicholas Carlini et al. (2021). Extracting Training Data from Large Language Models. In USENIX Security 2021.
Nicholas Carlini et al. (2023). Quantifying Memorization Across Neural Language Models. In ICLR 2023.
Training Data Extraction From Pre-trained Language Models: A Survey
● This study is the first to provide a comprehensive survey of training data extraction from Pre-trained Language Models (PLMs).
● Our review covers more than 100 key papers in NLP and related fields:
○ Preliminary knowledge
○ Taxonomy of memorization, attacks, defenses, and findings
○ Future research directions
Shotaro Ishihara (Nikkei Inc., [email protected]) / arXiv preprint: https://arxiv.org/abs/2305.16157
Attacks, defenses, and findings
● The attack consists of two steps: candidate generation and membership inference (see the sketch after this list).
● The pioneering work identified that personal information can be extracted from pre-trained GPT-2 models (Carlini et al., 2021).
● Experiments show that memorization is related to model size, prompt length, and duplication in the training data (Carlini et al., 2023).
● Defenses include approaches at the pre-processing, training, and post-processing stages.
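Below is a minimal sketch of the two-step attack, assuming the publicly available Hugging Face "gpt2" checkpoint as the target model; the prompt, the sampling settings, and the use of plain perplexity as the membership-inference score are illustrative assumptions rather than the exact configuration of any surveyed paper.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def generate_candidates(prompt: str, n: int = 10, max_new_tokens: int = 64):
    # Step 1 (candidate generation): sample many continuations of a prompt.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_k=40,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def perplexity(text: str) -> float:
    # Step 2 (membership inference): lower perplexity is weak evidence that
    # the candidate was memorized from the training data.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

candidates = generate_candidates("My phone number is")  # hypothetical prompt
for text in sorted(candidates, key=perplexity)[:3]:
    print(f"{perplexity(text):8.2f}  {text!r}")

In practice the surveyed attacks refine both steps, e.g., by sampling diverse prompts from web text and by calibrating the membership score (comparing perplexity against a reference model or zlib entropy) instead of using raw perplexity.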
Definition of memorization
With the advent of approximate memorization, the concern has become similar to the well-known issue of model inversion attacks.
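As a concrete illustration of the definitions, the following is a minimal sketch of a memorization check by prefix prompting, assuming the Hugging Face "gpt2" checkpoint and a made-up example string: an exactly reproduced continuation corresponds to verbatim memorization, while the token-overlap score is a stand-in for the similarity thresholds used in approximate-memorization definitions.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def memorization_check(example: str, prefix_tokens: int = 10):
    # Prompt the model with a prefix of a (supposed) training example and
    # compare its greedy continuation with the true continuation.
    ids = tokenizer(example, return_tensors="pt").input_ids
    prefix, target = ids[:, :prefix_tokens], ids[0, prefix_tokens:]
    with torch.no_grad():
        output = model.generate(
            prefix,
            do_sample=False,                     # greedy decoding
            max_new_tokens=target.shape[0],
            pad_token_id=tokenizer.eos_token_id,
        )
    continuation = output[0, prefix_tokens:]
    n = min(continuation.shape[0], target.shape[0])
    overlap = (continuation[:n] == target[:n]).float().mean().item()
    verbatim = n == target.shape[0] and overlap == 1.0
    return verbatim, overlap

# Made-up string standing in for a training example containing sensitive data.
verbatim, overlap = memorization_check(
    "My secret API key is sk-0000-1111-2222 and it must never be shared with anyone."
)
print(f"verbatim: {verbatim}, token overlap: {overlap:.2f}")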
Future research directions
● Is memorization always evil?
○ memorization vs association
○ memorization vs performance
● Toward broader research fields
○ model inversion attacks
○ plagiarism detection
○ image similarity
● Evaluation schema
○ benchmark dataset
○ evaluation metrics
● Model variation
○ masked language models