Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data in Your Machine Translation System?
Sorami Hisamoto*, Matt Post**, Kevin Duh**
*Works Applications (work done while at JHU)  **Johns Hopkins University
TACL paper, presented at ACL 2020
An important issue
‣ Membership Inference Problem: given a blackbox machine learning model, guess whether a given data point was in its training data [Shokri+ 2017, "Membership Inference Attacks against Machine Learning Models"]
‣ Setting: a Service Provider offers Machine Learning as a Service — it trains a blackbox model on its Training Data via a Training API and exposes a Prediction API; a User / Attacker sends private data to the Prediction API and receives a result
‣ Question: is the user's private data in the model's training set?
Shadow model attack [Shokri+ 2017]: assume the attacker has access to the training API (or knows the model details)
‣ Synthesize data similar to the target training data, and train "shadow models"
‣ The Service Provider trains the Target Model on its Training Data via the Training API (ML as a Service); the Attacker builds Shadow Set 1, Shadow Set 2, … and trains Shadow Model 1, Shadow Model 2, … through the same API
‣ Each shadow model mimics the target, and the attacker knows its training data
‣ The shadow training data is sent through the shadow model's Prediction API and its results are labeled IN; some other data is sent through the same API and its results are labeled OUT; these labeled results train a Binary Classifier for Membership Inference
[Shokri+ 2017] showed you can build an attack classifier with high accuracy
‣ On multi-class classification problems
‣ Even against real "Machine Learning as a Service" models
‣ Why successful? The attack mainly exploits the difference in the model's output distribution between seen and unseen examples
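To make the mechanics concrete, here is a minimal sketch (not the paper's code) of how such an attack classifier can be built from output probability vectors. It assumes scikit-learn and a shadow model exposing a predict_proba-style interface; shadow_model, shadow_in, shadow_out, target_model, and private_data are hypothetical placeholders.

```python
# Minimal sketch of the Shokri-style attack (illustration only, not the paper's code).
# Assumes scikit-learn and a hypothetical shadow model exposing predict_proba().
import numpy as np
from sklearn.linear_model import LogisticRegression

def attack_training_data(shadow_model, in_examples, out_examples):
    """Label the shadow model's output distributions: 1 = in its training set, 0 = not."""
    X, y = [], []
    for x in in_examples:
        X.append(shadow_model.predict_proba([x])[0])  # full class-probability vector
        y.append(1)
    for x in out_examples:
        X.append(shadow_model.predict_proba([x])[0])
        y.append(0)
    return np.array(X), np.array(y)

# Train the membership (attack) classifier on shadow-model outputs,
# then apply it to the target model's output distributions:
#   X_shadow, y_shadow = attack_training_data(shadow_model, shadow_in, shadow_out)
#   attack_clf = LogisticRegression(max_iter=1000).fit(X_shadow, y_shadow)
#   membership_guess = attack_clf.predict(target_model.predict_proba(private_data))
```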
Our question: given only blackbox access to an MT model, is it possible to determine whether a particular sentence pair was in the training set?
‣ Blackbox MT: Translation API only — the Attacker sends "Hello", receives "Bonjour", and asks whether that sentence pair was in the training data
Why is this useful? The attacker is not necessarily the "bad guy"
‣ Check license violations in published models
‣ Annual bakeoffs (e.g., WMT): organizers confirm participants are not training on the test sets
‣ A "Machine Learning as a Service" provider with customized models per user can attack its own model, to provide a privacy guarantee that user data is not used elsewhere
Data setup for both Alice and Bob
‣ Alice data: she uses this to train her MT model
‣ Bob data: a subset of Alice's data; he can use it in whatever way he desires for attacks
‣ IN probes / OUT probes: samples for evaluation, drawn from inside and outside Alice's training data
‣ * Actual experiment details are more complicated: please refer to the paper
Bob's attack: Translate → Infer Membership
‣ Bob trains shadow MT models on his data, translates the IN and OUT probes with them, and trains an attack classifier on the results
‣ If Bob gets attack accuracy above 50%, a privacy leak is suggested
‣ Caveat: Alice's and Bob's models differ, so Bob's attack accuracy on his own model is likely an optimistic upper bound on the real attack
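A compact sketch of this pipeline, under the assumption that shadow_translate and alice_translate are hypothetical callables wrapping Bob's shadow MT model and Alice's translation API, and that extract_features turns a (hypothesis, reference) pair into a feature vector (as on the next slide).

```python
# Sketch of Bob's attack pipeline (illustration only, not the paper's code).
# shadow_translate / alice_translate: hypothetical callables, src -> hypothesis string.
# extract_features: turns (hypothesis, reference) into a feature vector.
from sklearn.linear_model import LogisticRegression

def train_attack_classifier(shadow_translate, extract_features, in_probes, out_probes):
    """in_probes are (src, ref) pairs inside the shadow MT training data, out_probes are outside it."""
    X, y = [], []
    for label, probes in ((1, in_probes), (0, out_probes)):
        for src, ref in probes:
            hyp = shadow_translate(src)          # translate with Bob's shadow MT model
            X.append(extract_features(hyp, ref))
            y.append(label)
    return LogisticRegression(max_iter=1000).fit(X, y)

# To attack Alice's blackbox API:
#   clf = train_attack_classifier(shadow_translate, extract_features, in_probes, out_probes)
#   guess = clf.predict([extract_features(alice_translate(src), ref)])  # 1 = "IN", 0 = "OUT"
```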
Per-sentence attack: is a translated sentence pair "IN" or "OUT" of the model training data?
‣ Features: modified 1-4 gram precisions, sentence-level BLEU score, and optionally the MT model score (extra information for the attacker)
‣ Intuition: if the output is a "good" translation (i.e., similar to the reference translation), the model might have seen the pair at training time and memorized it
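These features could be computed, for example, with sacrebleu — a sketch assuming sacrebleu 2.x; the optional model score would have to come from the MT system itself.

```python
# Sketch of the per-sentence attack features (illustration; assumes sacrebleu >= 2.0).
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)  # effective order avoids zero scores on short sentences

def sentence_features(hypothesis, reference, model_score=None):
    """Modified 1-4 gram precisions + sentence-level BLEU (+ optional MT model score)."""
    score = bleu.sentence_score(hypothesis, [reference])
    feats = list(score.precisions) + [score.score]   # p1..p4 and sentence BLEU
    if model_score is not None:                      # extra information, if the attacker has it
        feats.append(model_score)
    return feats

# Example: a near-perfect match yields high BLEU, hinting the pair may be "IN"
# sentence_features("le chat est sur le tapis", "le chat est sur le tapis")
```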
Result: per-sentence attack accuracy is barely above chance
‣ BLEU and n-gram precisions: not enough information to distinguish IN from OUT
‣ Using the MT model score did not help either
‣ Attack accuracy on different probes: Alice 50.4, Bob:train 51.5, Bob:valid 51.1, Bob:test 51.2
‣ Accuracy is low even on the classifier's in-sample data (Bob:train) → overfitting is not the problem
‣ * Even with external resources (an MT Quality Estimation model or BERT), the results were the same
Exception: sentences with out-of-vocabulary (OOV) words, which the model does not translate well
‣ Much better attack results than on the entire probe set: accuracy 68.0 on the OOV subset vs. 50.4 on all probes
‣ The same trend holds for out-of-domain probes
space ‣ "Fixed set of labels” or “sequence”: Latter far more complex ‣ Flat classification: Attacks exploit difference in the model output distribution ‣ seq2seq: How to quantify model uncertainty / output quality? 15
A looser attack: judge membership of a group of 500 sentences together
‣ Features: percentage of sentences in each sentence-BLEU bin, plus corpus BLEU
‣ Attack possible: accuracy above 50% for both Alice and Bob probes
‣ First strong general results for the attacker
‣ Attack accuracy: Alice 61.1, Bob:train 70.4, Bob:valid 65.6, Bob:test 64.4
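A sketch of such group-level features, again assuming sacrebleu; the bin edges below are an illustrative choice, not necessarily the ones used in the paper.

```python
# Sketch of group-level attack features (illustration; bin edges are an illustrative choice).
import numpy as np
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)

def group_features(hypotheses, references, bins=(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    """For a group of ~500 sentences: fraction of sentences per sentence-BLEU bin + corpus BLEU."""
    sent_bleus = [bleu.sentence_score(h, [r]).score for h, r in zip(hypotheses, references)]
    hist, _ = np.histogram(sent_bleus, bins=bins)
    bin_fractions = hist / max(len(sent_bleus), 1)        # fraction of the group in each bin
    corpus = bleu.corpus_score(hypotheses, [references]).score
    return list(bin_fractions) + [corpus]
```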
Conclusions: unlike the multi-class classification case, attacks on MT are generally not successful (so far)
‣ However, accuracy is above chance in some situations: out-of-vocabulary and out-of-domain data, and a looser definition of attack (groups of sentences)
‣ More complex attacks may be effective: manipulate one sentence and use the API multiple times, "watermark sentences" to influence the target model, …
‣ Data available — you can try your own attacks: github.com/sorami/TACL-Membership