Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data in Your Machine Translation System?
Sorami Hisamoto*, Matt Post**, Kevin Duh**
*Works Applications (work done while at JHU)  **Johns Hopkins University
TACL paper, presented at ACL 2020
An important issue
‣ Membership Inference Problem: given a blackbox machine learning model, guess whether a given data point was in its training data [Shokri+ 2017, "Membership Inference Attacks against Machine Learning Models"]
‣ Setting: a Service Provider offers Machine Learning as a Service — it trains a blackbox model on its Training Data via a Training API and exposes a Prediction API; a User / Attacker sends private data to the Prediction API and receives a result
‣ Question: is the user's private data in the model's training set?
Shadow model attack [Shokri+ 2017]: assume the attacker has access to the training API (or knows the model details)
‣ Synthesize data similar to the target training data, and train "shadow models"
‣ The Service Provider trains the Target Model on its Training Data via the Training API (ML as a Service); the Attacker builds Shadow Set 1, Shadow Set 2, … and trains Shadow Model 1, Shadow Model 2, … through the same API
‣ Each shadow model mimics the target, and the attacker knows its training data
‣ The shadow training data is sent through the shadow model's Prediction API and its results are labeled IN; some other data is sent through the same API and its results are labeled OUT; these labeled results train a Binary Classifier for Membership Inference
[Shokri+ 2017] showed you can build an attack classifier with high accuracy
‣ On multi-class classification problems
‣ Even against real "Machine Learning as a Service" models
‣ Why successful? The attack mainly exploits the difference in the model's output distribution between seen and unseen examples
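To make the mechanics concrete, here is a minimal sketch (not the paper's code) of how such an attack classifier can be built from output probability vectors. It assumes scikit-learn and a shadow model exposing a predict_proba-style interface; shadow_model, shadow_in, shadow_out, target_model, and private_data are hypothetical placeholders.

```python
# Minimal sketch of the Shokri-style attack (illustration only, not the paper's code).
# Assumes scikit-learn and a hypothetical shadow model exposing predict_proba().
import numpy as np
from sklearn.linear_model import LogisticRegression

def attack_training_data(shadow_model, in_examples, out_examples):
    """Label the shadow model's output distributions: 1 = in its training set, 0 = not."""
    X, y = [], []
    for x in in_examples:
        X.append(shadow_model.predict_proba([x])[0])  # full class-probability vector
        y.append(1)
    for x in out_examples:
        X.append(shadow_model.predict_proba([x])[0])
        y.append(0)
    return np.array(X), np.array(y)

# Train the membership (attack) classifier on shadow-model outputs,
# then apply it to the target model's output distributions:
#   X_shadow, y_shadow = attack_training_data(shadow_model, shadow_in, shadow_out)
#   attack_clf = LogisticRegression(max_iter=1000).fit(X_shadow, y_shadow)
#   membership_guess = attack_clf.predict(target_model.predict_proba(private_data))
```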
Our question: given only blackbox access to an MT model, is it possible to determine whether a particular sentence pair was in the training set?
‣ Blackbox MT: Translation API only — the Attacker sends "Hello", receives "Bonjour", and asks whether that sentence pair was in the training data
Why is this useful? The attacker is not necessarily the "bad guy"
‣ Check license violations in published models
‣ Annual bakeoffs (e.g., WMT): organizers confirm participants are not training on the test sets
‣ A "Machine Learning as a Service" provider with customized models per user can attack its own model, to provide a privacy guarantee that user data is not used elsewhere
Data setup for both Alice and Bob
‣ Alice data: she uses this to train her MT model
‣ Bob data: a subset of Alice's data; he can use it in whatever way he desires for attacks
‣ IN probes / OUT probes: samples for evaluation, drawn from inside and outside Alice's training data
‣ * Actual experiment details are more complicated: please refer to the paper
Bob's attack: Translate → Infer Membership
‣ Bob trains shadow MT models on his data, translates the IN and OUT probes with them, and trains an attack classifier on the results
‣ If Bob gets attack accuracy above 50%, a privacy leak is suggested
‣ Caveat: Alice's and Bob's models differ, so Bob's attack accuracy on his own model is likely an optimistic upper bound on the real attack
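A compact sketch of this pipeline, under the assumption that shadow_translate and alice_translate are hypothetical callables wrapping Bob's shadow MT model and Alice's translation API, and that extract_features turns a (hypothesis, reference) pair into a feature vector (as on the next slide).

```python
# Sketch of Bob's attack pipeline (illustration only, not the paper's code).
# shadow_translate / alice_translate: hypothetical callables, src -> hypothesis string.
# extract_features: turns (hypothesis, reference) into a feature vector.
from sklearn.linear_model import LogisticRegression

def train_attack_classifier(shadow_translate, extract_features, in_probes, out_probes):
    """in_probes are (src, ref) pairs inside the shadow MT training data, out_probes are outside it."""
    X, y = [], []
    for label, probes in ((1, in_probes), (0, out_probes)):
        for src, ref in probes:
            hyp = shadow_translate(src)          # translate with Bob's shadow MT model
            X.append(extract_features(hyp, ref))
            y.append(label)
    return LogisticRegression(max_iter=1000).fit(X, y)

# To attack Alice's blackbox API:
#   clf = train_attack_classifier(shadow_translate, extract_features, in_probes, out_probes)
#   guess = clf.predict([extract_features(alice_translate(src), ref)])  # 1 = "IN", 0 = "OUT"
```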
Per-sentence attack: is a translated sentence pair "IN" or "OUT" of the model training data?
‣ Features: modified 1-4 gram precisions, sentence-level BLEU score, and optionally the MT model score (extra information for the attacker)
‣ Intuition: if the output is a "good" translation (i.e., similar to the reference translation), the model might have seen the pair at training time and memorized it
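These features could be computed, for example, with sacrebleu — a sketch assuming sacrebleu 2.x; the optional model score would have to come from the MT system itself.

```python
# Sketch of the per-sentence attack features (illustration; assumes sacrebleu >= 2.0).
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)  # effective order avoids zero scores on short sentences

def sentence_features(hypothesis, reference, model_score=None):
    """Modified 1-4 gram precisions + sentence-level BLEU (+ optional MT model score)."""
    score = bleu.sentence_score(hypothesis, [reference])
    feats = list(score.precisions) + [score.score]   # p1..p4 and sentence BLEU
    if model_score is not None:                      # extra information, if the attacker has it
        feats.append(model_score)
    return feats

# Example: a near-perfect match yields high BLEU, hinting the pair may be "IN"
# sentence_features("le chat est sur le tapis", "le chat est sur le tapis")
```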
Result: per-sentence attack accuracy is barely above chance
‣ BLEU and n-gram precisions: not enough information to distinguish IN from OUT
‣ Using the MT model score did not help either
‣ Attack accuracy on different probes: Alice 50.4, Bob:train 51.5, Bob:valid 51.1, Bob:test 51.2
‣ Accuracy is low even on the classifier's in-sample data (Bob:train) → overfitting is not the problem
‣ * Even with external resources (an MT Quality Estimation model or BERT), the results were the same
Exception: sentences with out-of-vocabulary (OOV) words, which the model does not translate well
‣ Much better attack results than on the entire probe set: accuracy 68.0 on the OOV subset vs. 50.4 on all probes
‣ The same trend holds for out-of-domain probes
space ‣ "Fixed set of labels” or “sequence”: Latter far more complex ‣ Flat classification: Attacks exploit difference in the model output distribution ‣ seq2seq: How to quantify model uncertainty / output quality? 15
A looser attack: judge membership of a group of 500 sentences together
‣ Features: percentage of sentences in each sentence-BLEU bin, plus corpus BLEU
‣ Attack possible: accuracy above 50% for both Alice and Bob probes
‣ First strong general results for the attacker
‣ Attack accuracy: Alice 61.1, Bob:train 70.4, Bob:valid 65.6, Bob:test 64.4
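A sketch of such group-level features, again assuming sacrebleu; the bin edges below are an illustrative choice, not necessarily the ones used in the paper.

```python
# Sketch of group-level attack features (illustration; bin edges are an illustrative choice).
import numpy as np
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)

def group_features(hypotheses, references, bins=(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    """For a group of ~500 sentences: fraction of sentences per sentence-BLEU bin + corpus BLEU."""
    sent_bleus = [bleu.sentence_score(h, [r]).score for h, r in zip(hypotheses, references)]
    hist, _ = np.histogram(sent_bleus, bins=bins)
    bin_fractions = hist / max(len(sent_bleus), 1)        # fraction of the group in each bin
    corpus = bleu.corpus_score(hypotheses, [references]).score
    return list(bin_fractions) + [corpus]
```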
Conclusions: unlike the multi-class classification case, attacks on MT are generally not successful (so far)
‣ However, accuracy is above chance in some situations: out-of-vocabulary and out-of-domain data, and a looser definition of attack (groups of sentences)
‣ More complex attacks may be effective: manipulate one sentence and use the API multiple times, "watermark sentences" to influence the target model, …
‣ Data available — you can try your own attacks: github.com/sorami/TACL-Membership