
Membership Inference Attacks on Sequence-to-Sequence Models

A Case in Privacy-preserving Neural Machine Translation

Paper: https://arxiv.org/abs/1904.05506

@ CLSP Seminar
Center for Language and Speech Processing, Johns Hopkins University
https://www.clsp.jhu.edu/events/aswin-subramanian/?instance_id=2832

Sorami Hisamoto

May 03, 2019

Transcript

  1. Membership Inference Attacks on Sequence-to-Sequence Models A Case in Privacy-preserving

    Neural Machine Translation May 3, 2019 Sorami Hisamoto A work with Kevin Duh & Matt Post arxiv.org/abs/1904.05506
  2. Summary ‣ Privacy in Machine Learning
 ‣ Membership Inference Problem

    ‣ “Was this in the model’s training data?” ‣ Attacker creates models to mimic the target blackbox model
 ‣ Empirical Results ‣ Multi-class Classification: attack successful
 Exploits output distribution difference ‣ Sequence Generation: attack not successful (so far) 
 More complex output space !2
  3. Self Introduction: Sorami Hisamoto ‣ Visiting Researcher Oct 2018 -

    Jun 2019 ‣ Before: NAIST, Japan ‣ Studied under Kevin Duh & Yuji Matsumoto ‣ Word Representations and Dependency Parsing (2012-2014) ‣ Now: WAP Tokushima AI & NLP Lab. ‣ NLP applications to enterprise services ‣ Morphological analysis: Sudachi → !3
  4. ‣ Determining “words” in a Japanese sentence is difficult! ‣

    Dictionary: 3 million vocabulary entries, constantly updated ‣ Code on GitHub: Java, Python, Elasticsearch plugin ‣ A paper at LREC2018 Sudachi: A Japanese Tokenizer for Business !4
  5. Privacy is more important than ever! ‣ More important in

    today’s society ‣ More data to collect ‣ Usefulness of data
 ‣ Data → for Machine Learning (ML) … ‣ Increasing interest in the research communities !7 irasutoya.com
  6. Privacy & Natural Language Processing ‣ Some work, but not much yet:

    ‣ “Towards Robust and Privacy-preserving Text Representations”
 Yitong Li, Timothy Baldwin, Trevor Cohn. ACL2018 (short) ‣ “Privacy-preserving Neural Representations of Text”
 Maximin Coavoux, Shashi Narayan, Shay Cohen. EMNLP2018 ‣ “Adversarial Removal of Demographic Attributes from Text Data”
 Yanai Elazar, Yoav Goldberg. EMNLP2018 !12
  7. Different kinds of problems in ML Privacy ‣ Model Inversion ‣

    Uses the model’s output on a hidden input to infer something about that input ‣ Differential Privacy ‣ Will the model behave differently if a particular data point is removed from / added to the training data? ‣ Membership Inference → !13
  8. “Was this in the training data?” ‣ Given a blackbox

    machine learning model, 
 can you guess if a data sample was in the training data?
 *Blackbox: no info about model details; only access to an API to send inputs & receive results ‣ [Shokri+ 2017] “Membership Inference Attacks against Machine Learning Models” (IEEE Symposium on Security and Privacy) ‣ Important in real-world situations ‣ e.g., “ML as a Service” like Google, Amazon, or MS … ‣ e.g., Private information: medical records, location, purchase history, … ‣ “Trust, but verify” (Доверяй, но проверяй) !15
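To make the blackbox assumption concrete, here is a minimal sketch of the attacker's view (hypothetical names, not code from the paper): a prediction API only, with no access to weights, gradients, or training data.

```python
# Sketch of the blackbox threat model (hypothetical names, not code from the paper).
# The attacker may only call predict(); weights, gradients, and training data stay hidden.
class BlackboxAPI:
    def __init__(self, trained_model):
        self._model = trained_model  # invisible to the attacker

    def predict(self, x):
        """The only operation exposed to the attacker."""
        return self._model.predict(x)

# Membership inference question: given a sample (x, y) and predict() access only,
# decide whether (x, y) was part of the hidden training set.
```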
  9. Following the tradition in security literature … !16 Alice Bob

    Defender Attacker e.g., Service Provider e.g., Service User
  10. Membership Inference Problem !17 Service Provider Training API Machine Learning

    as a Service Training Data Blackbox Training Model
  11. Membership Inference Problem !17 Service Provider Training API Machine Learning

    as a Service User Training Data Blackbox Training Model Private Data
  12. Membership Inference Problem !17 Service Provider Training API Machine Learning

    as a Service User Training Data Blackbox Training Model Private Data Prediction API
  13. Membership Inference Problem !17 Service Provider Training API Machine Learning

    as a Service User Training Data Blackbox Training Model Private Data Result Prediction API
  14. Membership Inference Problem !17 Service Provider ? Training API Machine

    Learning as a Service User Training Data ? ? ? Blackbox Training Model Private Data Result Prediction API Is user’s private data in model training set?
  15. How can Bob “attack” Alice model? !18 Service Provider Training

    API ML as a Service Training Data Target Model ‣ Shadow models to mimic the target model
  16. How can Bob “attack” Alice model? !18 Service Provider Training

    API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3
  17. How can Bob “attack” Alice model? !18 Service Provider Training

    API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3 Assumption: Attacker has access to the same training API (or knows the target model details)
  18. How can Bob “attack” Alice model? !18 Service Provider Training

    API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3 Shadow Model 1 Shadow Model 2 Shadow Model 3 Assumption: Attacker has access to the same training API (or knows the target model details)
  19. How can Bob “attack” Alice model? !18 Service Provider Training

    API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3 Shadow Model 1 Shadow Model 2 Shadow Model 3 How to prepare these data? → explained later
  20. Shadow models to train an “in or out” classifier !19

    ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Model
  21. Shadow models to train an “in or out” classifier !19

    ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model IN
  22. Shadow models to train an “in or out” classifier !19

    ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model IN Some Other Data OUT
  23. Shadow models to train an “in or out” classifier !19

    ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model Prediction API Result IN Some Other Data Prediction API Result OUT
  24. Shadow models to train an “in or out” classifier !19

    ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model Prediction API Result IN Binary Classifier for Membership Inference Some Other Data Prediction API Result OUT
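A minimal sketch of this step, assuming Bob has already trained his shadow models and knows which samples were in or out of each one. All names, the feature function, and the logistic-regression choice are illustrative, not the exact setup of [Shokri+ 2017].

```python
# Sketch: turn shadow-model predictions into labelled training data for the
# membership ("in or out") classifier. Names and classifier choice are illustrative.
from sklearn.linear_model import LogisticRegression

def build_attack_dataset(shadow_models, shadow_splits, featurize):
    """shadow_splits[i] = (in_samples, out_samples) for shadow model i."""
    X, y = [], []
    for model, (in_samples, out_samples) in zip(shadow_models, shadow_splits):
        for sample in in_samples:              # seen by this shadow model in training
            X.append(featurize(model.predict(sample), sample))
            y.append(1)                        # label: IN
        for sample in out_samples:             # held out from this shadow model
            X.append(featurize(model.predict(sample), sample))
            y.append(0)                        # label: OUT
    return X, y

# Usage (given shadow_models, shadow_splits and a featurize() function):
#   X, y = build_attack_dataset(shadow_models, shadow_splits, featurize)
#   attack_clf = LogisticRegression().fit(X, y)   # then apply it to the target model's outputs
```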
  25. Synthesizing data for shadow models ‣1. Model-based synthesis ‣ Using

    the target model itself ‣ “high confidence result → probably similar to the target training data” ‣2. Statistics-based synthesis ‣ Information about the population from which the target data was drawn ‣ → e.g., prior knowledge of the marginal distributions of features ‣3. Noisy real data ‣ Attacker has data similar to target training data ‣ → can consider it as a “noisy” version !20
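As a rough illustration of the second option: if the attacker knows (or estimates) the marginal distribution of each feature, candidate shadow records can be sampled feature by feature. This only sketches the idea; it is not the synthesis procedure of [Shokri+ 2017].

```python
# Sketch of statistics-based synthesis: sample shadow records feature by feature
# from marginal distributions assumed to be known to the attacker.
import numpy as np

def synthesize_records(marginals, n_records, seed=0):
    """marginals: list of (values, probabilities) pairs, one per feature."""
    rng = np.random.default_rng(seed)
    records = []
    for _ in range(n_records):
        record = [rng.choice(values, p=probs) for values, probs in marginals]
        records.append(record)
    return records

# Example: two binary features with different marginal distributions.
marginals = [([0, 1], [0.7, 0.3]), ([0, 1], [0.4, 0.6])]
shadow_data = synthesize_records(marginals, n_records=1000)
```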
  26. Experimental Setup: Target Models ‣Google Prediction API ‣ No configuration

    parameters ‣Amazon ML ‣ Few meta-parameters
 (max. number of training data passes, regularization amount) ‣ Local Neural Networks !21
  27. Experimental Setup: Data !22 ‣ Multi-class classification problems (dataset: description, target model training set size, number of shadow models)

    ‣ CIFAR-10 / CIFAR-100: image recognition, 2.5k~15k / 5k~30k, 100 shadow models ‣ Purchases: shopping history, 10,000, 20 shadow models ‣ Locations: Foursquare check-ins, 1,200, 60 shadow models ‣ Hospital Stays: inpatient stays in facilities, 10,000, 10 shadow models ‣ UCI Adult: census income, 10,000, 20 shadow models ‣ MNIST: handwritten digits, 10,000, 50 shadow models
  28. Experimental Setup: Accuracy ‣ Membership Inference: binary classification ‣ Is

    this “IN” or “OUT” of training set? ‣ Accuracy ‣ Same size for both sides → baseline (random): 0.5 ‣ Precision:
 fraction of the records inferred as members of the training dataset that are indeed members ‣ Recall:
 fraction of the training records that the attacker can correctly infer as members !23
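With equal-sized IN and OUT probe sets, these numbers are just standard binary-classification metrics. A small sketch using scikit-learn, with toy labels for illustration:

```python
# Toy example of the evaluation metrics for membership inference.
from sklearn.metrics import accuracy_score, precision_score, recall_score

# y_true: 1 if the probe really was in the training set, 0 otherwise.
# y_pred: the attacker's IN/OUT guesses.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))   # overall correctness; 0.5 = random baseline on balanced data
print(precision_score(y_true, y_pred))  # of the records inferred as members, how many really are
print(recall_score(y_true, y_pred))     # of the true members, how many the attacker found
```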
  29. Results: Attacks successful! !24 *Figures from [Shokri+ 2017] Training Set

    Size & Attack Precision Precision well above 0.5 (Recall was almost 1.0 for both datasets) Baseline (random guess) = 0.5 CIFAR-10 CIFAR-100
  30. Attack Precision of Datasets and Overfitting !25 *Figures from [Shokri+

    2017] Target Model Accuracy (Google models) & Attack Precision Mostly > 0.5
  31. Attack Precision of Datasets and Overfitting !25 *Figures from [Shokri+

    2017] Target Model Accuracy (Google models) & Attack Precision Large gap = Overfitting
  32. Why do the attacks work? ‣ Factors affecting leakage ‣

    Overfitting ‣ Diversity of Training Data ‣ Model Type ‣ Attack exploits the output distribution of class labels returned by the target model !26 *Figure from [Shokri+ 2017] Model Train/Test Accuracy Gap & Attack Precision
  33. Distributions differ for samples in / out of training data

    !27 Prediction Accuracy *Figures from [Shokri+ 2017] Prediction Uncertainty
  34. Mitigation of the Attacks ‣ Restrict the prediction output to

    top k classes ‣ Coarsen precision of the prediction output ‣ Increase entropy of the prediction output ‣Regularization !28
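The first three mitigations all post-process the prediction vector before it leaves the API. A minimal sketch of what that could look like (illustrative parameter values; not the exact defenses evaluated in [Shokri+ 2017]):

```python
import numpy as np

def harden_output(probs, top_k=3, decimals=2, temperature=2.0):
    """Post-process a prediction vector before returning it through the API.

    temperature > 1 flattens the distribution (raises entropy), top_k hides the
    tail classes, and rounding coarsens the reported precision. Illustrative only.
    """
    probs = np.asarray(probs, dtype=float)
    # Increase entropy: re-soften the distribution with a temperature.
    logits = np.log(probs + 1e-12) / temperature
    probs = np.exp(logits) / np.exp(logits).sum()
    # Restrict the output to the top-k classes.
    top = np.argsort(probs)[::-1][:top_k]
    masked = np.zeros_like(probs)
    masked[top] = probs[top]
    masked /= masked.sum()
    # Coarsen the precision of the reported probabilities.
    return np.round(masked, decimals)

print(harden_output([0.90, 0.05, 0.03, 0.02]))  # e.g. -> [0.71 0.17 0.13 0.  ]
```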
  35. How about on more complex problems …? ‣ Membership Inference

    Attacks were successful ‣ It was on “flat classification” models ‣ Binary or multi-class classification ‣ How about more complex problems? ‣ e.g., structured prediction / generation ‣ Sequence-to-Sequence models → !29
  36. Will attacks work on seq2seq models? ‣ Previous case: “flat

    classification” ‣ Output space: fixed set of labels ‣ → Sequence generation ‣ Output space: sequence of words, length undetermined a priori ‣ e.g., Machine Translation, Speech Synthesis, Video Captioning, Text Summarization !31
  37. Machine Translation (MT) as an example !32 “Given black-box access

    to an MT model, is it possible to determine whether a particular sentence pair was in the training set for that model?”
  38. Possible scenarios ‣ Bitext data provider ‣ Providing data under

    license restrictions
 → check license compliance in services ‣ MT conference organizer ‣ Annual bakeoff
 → check if participants are following the rules ‣ “MT as a Service” provider ‣ Providing customized engines with user data
 → may want to provide guarantees that 
 a) user data is not used for other users’ engines
 b) if it is used for others, privacy will not be leaked !33
  39. Carol: Neutral judge, for evaluation purpose !34 Alice Bob Defender

    Attacker e.g., Service Provider e.g., Service User Carol Judge
  40. Carol: Neutral judge, for evaluation purpose !34 Alice Bob Defender

    Attacker e.g., Service Provider e.g., Service User Carol Judge Does not exist in real scenarios
  41. Problem Overview !35 Carol Data Alice Bob Evaluation set 1.

    Carol splits data into a) Alice set b) Bob set c) Evaluation set Alice set Bob set
  42. Problem Overview !35 Carol Data Alice Bob Evaluation set Alice

    set Alice model Bob set 2. Alice trains her MT model
  43. Problem Overview !35 Carol Data Alice Bob Evaluation set Alice

    set Alice model Bob set 3. Bob uses his data and Alice model translation API in whatever way he wants to attack Alice model
  44. Problem Overview !35 Carol Data Alice Bob Evaluation set Alice

    set Alice model Bob set 4. Carol receives Bob’s attack results and evaluates them
  45. Splitting Data !36 OUT Probes ‣ Probes: sentence samples for

    evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes
  47. Splitting Data !36 OUT Probes ‣ Probes: sentence samples for

    evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes Alice model training data
  48. Splitting Data !36 OUT Probes ‣ Probes: sentence samples for

    evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes Alice model training data Bob data: no Corpus 2, no Alice IN Probes (Corpus 2 is like
 MT provider in-house crawled data)
  49. Splitting Data !36 OUT Probes ‣ Probes: sentence samples for

    evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes Alice model training data Out-of-domain (not in Alice model) Corpus Bob data: no Corpus 2, no Alice IN Probes (Corpus 2 is like
 MT provider in-house crawled data)
  50. Experimental Setup: Characters !37 Defender Attacker Judge Matt Sorami Kevin

    They don’t know each other’s data or MT model details (architecture, training strategy, etc.)
  51. Experimental Setup: Data and Splits ‣ Data from WMT2018 ‣

    Probes: 5,000 sentence pairs per corpus !38 OUT IN Common Crawl Out-of-domain OUT IN Europarl OUT IN News OUT IN Rapid OUT IN Para Crawl - EMEA - Subtitles - Koran - TED
  52. Experimental Setup: Data and Splits ‣ Data from WMT2018 ‣

    Probes: 5,000 sentence pairs per corpus !38 OUT IN Common Crawl Out-of-domain OUT IN Europarl OUT IN News OUT IN Rapid OUT IN Para Crawl - EMEA - Subtitles - Koran - TED Alice data
  53. Experimental Setup: Data and Splits ‣ Data from WMT2018 ‣

    Probes: 5,000 sentence pairs per corpus !38 OUT IN Common Crawl Out-of-domain OUT IN Europarl OUT IN News OUT IN Rapid OUT IN Para Crawl - EMEA - Subtitles - Koran - TED Alice data Bob data
  54. Experimental Setup: Evaluation Protocol 1. Carol splits data, gives it to

    Alice and Bob 2. Alice trains her MT model 3. Bob uses his data in whatever way to create a classifier 4. Carol gives Bob translations of the probes by the Alice model 5. Bob infers their membership, gives results to Carol 6. Carol evaluates the attack accuracy
 *Accuracy: percentage of probes where the classification result is correct !39 We will release the data (split sets, translations by Matt’s model) so people can try their attack methods
  55. Alice MT architecture (by Matt) ‣ BLEU: 42.6 ‣ 6-layer

    Transformer ‣ Joint BPE subword model (32k) ‣ Dual conditional cross-entropy filtering for ParaCrawl ‣ … !40
  56. Attack: Shadow model & data splits ‣ This time, Bob

    splits his data to create 10 shadow models ‣ Blue: Training Data for shadow model (smaller box = IN Probes) ‣ Green: OUT Probes !41 Splits for Shadow Models train, valid, test for the classifier to infer membership Probes for 
 shadow models e.g., 1+ & 1- IN / OUT Probes are flipped to make balanced data
  57. Bob MT architecture (by Sorami) ‣ BLEU: 38.06±0.2
 (Alice: 42.6)

    ‣ 4-layer Transformer
 (Alice: 6-layer Transformer) ‣ BPE subword model (30k) for each language
 (Alice: joint 32k) ‣ Other parameter / training strategy difference ‣ … !42
  58. Alice & Bob MT model difference ‣ This time, they

    happened not to be so different ‣ What if the difference is very large? ‣ Model architecture difference ‣ Available data size ‣ Available computational resources ‣ → Even if the attack accuracy is good within Bob’s data,
 it might perform very badly on Alice’s data
 (and in a real scenario Bob will not know that) !43
  59. Difference to the previous work [Shokri+ 2017] ‣ Model Training

    ‣ Bob does not have access to the training API used for Alice model ‣ Attacker Data ‣ Bob has a real subset of Alice data !44
  60. Attack Classifier for Membership Inference ‣ Binary Classification ‣ “IN”

    or “OUT” of the model training data? ‣ Features ‣ Modified 1-4 gram precisions ‣ Sentence-level BLEU scores ‣ Later: MT Model score - extra information for the attacker !45
  61. Attack Classifier for Membership Inference ‣ Binary Classification ‣ “IN”

    or “OUT” of the model training data? ‣ Features ‣ Modified 1-4 gram precisions ‣ Sentence-level BLEU scores ‣ Later: MT Model score - extra information for the attacker !45 Intuition: If output is a “good” translation (i.e. similar to the reference translation), the model might have seen it in training time and memorized it
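A sketch of those features in code, using sacrebleu's sentence-level BLEU (which also exposes the modified 1-4 gram precisions). The library and classifier choices here are assumptions for illustration, not necessarily the exact implementation behind the reported numbers.

```python
# Sketch of the attack features: sentence-level BLEU plus modified 1-4 gram
# precisions between the returned translation and the reference.
import sacrebleu
from sklearn.tree import DecisionTreeClassifier

def probe_features(hypothesis: str, reference: str):
    bleu = sacrebleu.sentence_bleu(hypothesis, [reference])
    return [bleu.score] + list(bleu.precisions)  # [sentence-BLEU, p1, p2, p3, p4]

# Usage (given shadow-model translations, references, and IN/OUT labels):
#   X = [probe_features(hyp, ref) for hyp, ref in zip(hypotheses, references)]
#   clf = DecisionTreeClassifier().fit(X, labels)   # 1 = IN, 0 = OUT
#   guesses = clf.predict([probe_features(h, r) for h, r in alice_probes])
```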
  62. Results: the attacks were not successful … ‣ Around 50%:

    same as chance ‣ BLEU and n-gram precision: not enough information to distinguish ‣ Using the MT model score did not help either !46 Attack Accuracy of Different Probes: Alice 50.4, Bob:train 51.4, Bob:valid 51.2, Bob:test 51.1 * Accuracy for a Decision Tree classifier.
 Bob tried several other types of classifiers but the result trends were the same. Accuracy is low even on the classifier’s in-sample (training) data → overfitting is not the problem
  63. Results: Out-of-domain (OOD) Corpora ‣ Whether the domain was in

    MT model training data or not ‣ Assumption: Model will not translate OOD sentences well ‣ → Much better results with OOD data !47 Attack Accuracy, In-domain Corpora: ParaCrawl 50.3, CommonCrawl 51.1, Europarl 49.7, News 50.7, Rapid 50.0 Attack Accuracy, Out-of-domain Corpora: EMEA 67.2, Koran 94.1, Subtitles 80.2, TED 67.1
  64. Results: Out-of-vocab (OOV) samples ‣ Subset of probes containing OOV

    ‣ OOV in source (7.4%), in reference (3.2%), in both (1.9%) ‣ Assumption: Model will not translate sentences with OOV well ‣ → As in the OOD cases, much better results than on the entire probe set !48 Attack Accuracy of OOV subsets: All 50.4, OOV in src 73.9, OOV in ref 74.1, OOV in both 68.0
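A sketch of how such OOV subsets could be selected, using a simple whitespace-token vocabulary check; the actual vocabulary handling in the experiments (e.g. with BPE subwords) may differ.

```python
# Sketch: carve OOV subsets out of the probe set with a simple whitespace-token
# vocabulary check (illustrative only).
def oov_subsets(probes, src_vocab, ref_vocab):
    """probes: list of (source_sentence, reference_sentence) pairs."""
    in_src, in_ref, in_both = [], [], []
    for src, ref in probes:
        src_oov = any(tok not in src_vocab for tok in src.split())
        ref_oov = any(tok not in ref_vocab for tok in ref.split())
        if src_oov:
            in_src.append((src, ref))
        if ref_oov:
            in_ref.append((src, ref))
        if src_oov and ref_oov:
            in_both.append((src, ref))
    return in_src, in_ref, in_both
```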
  65. Why was it not successful for seq2seq? ‣ Why successful

    with “flat classification”, but not with seq2seq? ‣ One possible reason: model output space difference ‣ “Fixed set of labels” or “arbitrary-length sequence”:
 The latter is far more complex ‣ In “flat classification” cases it was successful because
 the attacker exploits differences in the model output distribution ‣ seq2seq: how can we quantify the uncertainty of the model or the quality of the output? ‣ OOD and OOV: more promising results ‣ Harder for the target model to produce high-quality translations
 → more distinguishable !49
  66. Further attacks & protections: Arms Race ‣ Multiple API attack

    ‣ Modify (e.g., drop / add a word) and
 translate the same sentence multiple times, observe the differences ‣ “Watermark” sentences ‣ Add characteristic samples to make it more distinguishable ‣ If Bob has a better chance → Protection by Alice ‣ Subsample data for training ‣ Regularization ‣ … !50 Work in Progress
  67. Summary ‣ Privacy in Machine Learning
 ‣ Membership Inference Problem

    ‣ “Was this in the model’s training data?” ‣ Attacker creates models to mimic the target blackbox model
 ‣ Empirical Results ‣ Multi-class Classification: attack successful
 Exploits output distribution difference ‣ Sequence Generation: attack not successful (so far) 
 More complex output space !51 More at arxiv.org/abs/1904.05506