Slide 1

Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data in Your Machine Translation System?
Sorami Hisamoto*, Matt Post**, Kevin Duh**
*Works Applications (work done while at JHU), **Johns Hopkins University
TACL paper, presented at ACL 2020

Slides 2–8

Membership Inference [Shokri+ 2017]
‣ Data privacy is an increasingly important issue
‣ Membership inference problem: given a black-box machine learning model, guess whether a data point was in its training data

[Diagram, built up across the slides: a service provider trains a black-box model on its training data through a training API ("Machine Learning as a Service"); a user/attacker sends private data to the prediction API and gets results back, asking: is the user's private data in the model's training set?]

[Shokri+ 2017] “Membership Inference Attacks against Machine Learning Models”

Slide 9

Attack with “Shadow Models”
‣ Assume the attacker has access to the training API (or knows the model details)
‣ Synthesize data similar to the target training data, and train “shadow models”

[Diagram: the service provider (“ML as a Service”) trains the target model on its training data via the training API; the attacker uses the same API to train Shadow Model 1, Shadow Model 2, … on Shadow Set 1, Shadow Set 2, …]

Slide 10

Train “IN or OUT” Classifier for Attack
‣ The shadow model mimics the target, and the attacker knows its training data

[Diagram: shadow training data → prediction API → result, labeled IN; some other data → prediction API → result, labeled OUT; both feed a binary classifier for membership inference]
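Below is a minimal sketch (ours, not the paper's or its repository's code) of how such a binary membership classifier could be trained from a shadow model's outputs, assuming scikit-learn is available. The toy_features function is a hypothetical placeholder standing in for the BLEU-based features described on a later slide.

```python
from sklearn.linear_model import LogisticRegression

def toy_features(hypothesis, reference):
    # Hypothetical placeholder feature (hypothesis/reference length ratio);
    # the paper's features (n-gram precisions, sentence BLEU, optional model
    # score) are sketched under the "Attack Classifier" slide further below.
    hyp, ref = hypothesis.split(), reference.split()
    return [len(hyp) / max(len(ref), 1)]

def build_attack_training_set(in_pairs, out_pairs):
    """in_pairs / out_pairs: (hypothesis, reference) pairs translated by a shadow
    model, for sentences that were IN (respectively OUT of) that shadow model's
    training data. The attacker chose the shadow training data, so labels are free."""
    X, y = [], []
    for hyp, ref in in_pairs:
        X.append(toy_features(hyp, ref))
        y.append(1)  # 1 = IN
    for hyp, ref in out_pairs:
        X.append(toy_features(hyp, ref))
        y.append(0)  # 0 = OUT
    return X, y

# Toy usage with two hand-written pairs:
X, y = build_attack_training_set(
    in_pairs=[("the cat sat on the mat", "the cat sat on the mat")],
    out_pairs=[("a dog runs", "the dog is running fast today")],
)
attack_clf = LogisticRegression().fit(X, y)
```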

Slide 11

Attacks Can Be Successful
‣ [Shokri+ 2017] showed that you can build an attack classifier with high accuracy
‣ Multi-class classification problems
‣ Even with real “Machine Learning as a Service” models
‣ Why successful?
‣ The attack mainly exploits differences in the model output distribution

Slide 12

Will It Work on More Complex Problems?
‣ Flat classification: output space is a fixed set of labels
‣ Sequence generation: output space is a sequence
‣ e.g., machine translation, speech synthesis, video captioning, text summarization

Slide 13

Machine Translation (MT) as an Example
‣ Given black-box access to an MT model, is it possible to determine whether a particular sentence pair was in the training set?

[Diagram: an attacker with access to the translation API only queries the black-box MT system, e.g. “Hello” → “Bonjour”]

Slide 14

Possible Scenarios
‣ The attacker is not necessarily the “bad guy”
‣ Bitext data provider: licenses data to users, then attacks their published models to check for license violations
‣ MT conference organizer: runs an annual bakeoff (e.g., WMT) and attacks participants' models to confirm they are not using the test sets
‣ “MT as a Service” provider: builds customized models for users and attacks its own models to provide a privacy guarantee that user data is not used elsewhere

Slide 15

Experiment: Characters
‣ Alice: the defender (e.g., the service provider)
‣ Bob: the attacker (e.g., a service user)

Slides 16–18

Experiment: Data and Splits
‣ Formulate a fair and reproducible setup for both Alice and Bob
‣ Alice data: she uses this to train her model
‣ Bob data: a subset of the Alice data; he can use it in whatever way he desires for his attacks
‣ IN probes and OUT probes: samples for evaluation, in and out of Alice's training data
* Actual experiment details are more complicated: please refer to the paper.
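As a rough illustration only (the names, sizes, and exact protocol here are placeholders, not the paper's actual recipe), a split along these lines could be produced as follows:

```python
import random

def make_splits(all_pairs, n_bob=5000, n_probe=2000, seed=0):
    """Illustrative split of a parallel corpus into the roles described above.
    - alice_train: Alice trains her MT model on this
    - bob_data: a subset of Alice's training data handed to Bob for his attacks
    - in_probes: evaluation samples that ARE in Alice's training data
    - out_probes: evaluation samples that are NOT in Alice's training data"""
    rng = random.Random(seed)
    pairs = list(all_pairs)
    rng.shuffle(pairs)
    out_probes = pairs[:n_probe]                     # never shown to Alice's model
    alice_train = pairs[n_probe:]
    in_probes = alice_train[:n_probe]                # drawn from Alice's training data
    bob_data = alice_train[n_probe:n_probe + n_bob]  # disjoint from the probes here
    return alice_train, bob_data, in_probes, out_probes
```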

Slides 19–23

Evaluation Procedure
‣ Alice trains the target MT model on the Alice data; Bob trains shadow MT models on the Bob data and uses them to build an attack classifier
‣ The IN and OUT probes are translated by the target model, and the attack classifier infers the membership of each probe
‣ If Bob can get attack accuracy above 50%, a privacy leak is suggested
‣ Alice / Bob model difference: Bob's attack accuracy on his own model is likely the optimistic upper bound on the real attack
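A small sketch (ours) of this evaluation step, assuming an sklearn-style classifier and a feature function such as the ones sketched elsewhere on these slides; with balanced IN/OUT probes, 50% is the chance baseline.

```python
def attack_accuracy(attack_clf, featurize, in_pairs, out_pairs):
    """Fraction of probes whose membership Bob guesses correctly.
    Accuracy clearly above 0.5 suggests a privacy leak; accuracy near 0.5
    is no better than chance."""
    correct = 0
    for pairs, label in ((in_pairs, 1), (out_pairs, 0)):
        for hyp, ref in pairs:
            prediction = attack_clf.predict([featurize(hyp, ref)])[0]
            correct += int(prediction == label)
    return correct / (len(in_pairs) + len(out_pairs))
```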

Slides 24–25

Attack Classifier for Membership Inference
‣ Binary classification: “IN” or “OUT” of the model's training data?
‣ Features: modified 1-4 gram precisions; sentence-level BLEU score; optional: the MT model score (extra information for the attacker)
‣ Intuition: if the output is a “good” translation (i.e., similar to the reference translation), the model might have seen it at training time and memorized it
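These features can be computed directly from a (hypothesis, reference) pair. Below is a self-contained sketch using only the standard library; whitespace tokenization and the epsilon smoothing of sentence BLEU are our simplifying assumptions, not necessarily the paper's exact recipe.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(hyp_tokens, ref_tokens, n):
    # Clipped n-gram precision, as in BLEU: each hypothesis n-gram is credited
    # at most as many times as it appears in the reference.
    hyp = ngram_counts(hyp_tokens, n)
    ref = ngram_counts(ref_tokens, n)
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / max(sum(hyp.values()), 1)

def probe_features(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    precisions = [modified_precision(hyp, ref, n) for n in range(1, 5)]
    # Smoothed sentence-level BLEU (geometric mean of clipped precisions
    # times a brevity penalty); the 1e-9 floor is our smoothing choice.
    smoothed = [max(p, 1e-9) for p in precisions]
    brevity_penalty = math.exp(min(0.0, 1 - len(ref) / max(len(hyp), 1)))
    sentence_bleu = brevity_penalty * math.exp(sum(math.log(p) for p in smoothed) / 4)
    # Optionally append the MT model score here when the prediction API exposes it.
    return precisions + [sentence_bleu]

print(probe_features("the cat sat on the mat", "the cat sat on the mat"))
```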

Slide 26

Results: Attacks Not Successful

Attack accuracy of different probe sets (%): Alice 50.4, Bob:train 51.5, Bob:valid 51.1, Bob:test 51.2

‣ Around 50%: the same as chance
‣ BLEU and n-gram precisions: not enough information to distinguish membership
‣ Using the MT model score did not help either
‣ Accuracy is low even on the classifier's in-sample data → overfitting is not the problem
* Even with external resources (an MT quality estimation model or BERT), the results were the same.

Slide 27

Results: Out-of-Vocabulary (OOV) Samples
‣ Assumption: the model will not translate sentences containing OOV words well
‣ Much better attack accuracy on the OOV subset than on the entire probe set: all probes 50.4, OOV subset 68.0
‣ Same trend with out-of-domain probes
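A sketch of how such an OOV probe subset could be selected; the vocabulary source and the word-level tokenization are illustrative assumptions on our part, not the paper's exact procedure.

```python
def oov_probe_subset(probes, known_corpus):
    """Keep probes whose source side contains at least one word never seen in
    known_corpus. The vocabulary here is approximated from whatever parallel
    data is available to define OOV; this is an illustrative filter only."""
    vocab = {token for src, _ in known_corpus for token in src.split()}
    return [(src, ref) for src, ref in probes
            if any(token not in vocab for token in src.split())]
```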

Slide 28

Why Not Successful with seq2seq?
‣ Difference in the model output space: a fixed set of labels vs. a sequence; the latter is far more complex
‣ Flat classification: attacks exploit differences in the model output distribution
‣ seq2seq: how do we quantify model uncertainty / output quality?

Slide 29

Alternative Evaluation: Grouping Probes
‣ Instead of per-sentence decisions, use groups of 500 sentences
‣ Features: percentage of sentences in each sentence-BLEU bin, and corpus BLEU
‣ Attack possible: accuracy above 50% for both Alice and Bob probes
‣ First strong general result for the attacker
‣ Attack accuracy: Alice 61.1, Bob:train 70.4, Bob:valid 65.6, Bob:test 64.4
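A sketch (ours) of the group-level representation: per-sentence BLEU scores for a group of probes are turned into a bin histogram. The bin edges are illustrative, and the group's mean sentence BLEU is used here as a simple stand-in for the corpus-BLEU feature named on the slide.

```python
def group_features(sentence_bleus, bin_edges=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """One feature vector per group of probes (e.g., 500 sentences): the fraction
    of sentences falling into each sentence-BLEU bin, plus the group's mean
    sentence BLEU (a stand-in for true corpus BLEU)."""
    counts = [0] * (len(bin_edges) + 1)
    for bleu in sentence_bleus:
        counts[sum(bleu >= edge for edge in bin_edges)] += 1
    total = len(sentence_bleus)
    return [c / total for c in counts] + [sum(sentence_bleus) / total]

# Example: a group of mostly poor translations looks very different from a group
# of near-perfect (possibly memorized) translations.
print(group_features([0.15, 0.32, 0.95, 0.41, 0.05]))
```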

Slide 30

Summary
‣ Membership Inference Attacks on Seq-to-Seq Models
‣ Unlike multi-class classification cases,
 attacks generally not successful (so far) ‣ However, accuracy above chance for some situations ‣ Out-of-vocabulary and Out-of-domain data ‣ Looser definition of attack: Group of sentences ‣ More complex attacks may be effective ‣ Manipulate one sentence and use API multiple times ‣ “Watermark sentences” to influence the target model ‣ … 17 Data available: You can try your attacks github.com/sorami/TACL-Membership