[ACL2020] Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?


Sorami Hisamoto

June 17, 2020

Transcript

  1. Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In
     Your Machine Translation System? Sorami Hisamoto*, Matt Post**, Kevin Duh** (*Works Applications, work done while at JHU; **Johns Hopkins University) TACL paper, presented @ ACL 2020
  2.–8. Membership Inference [Shokri+ 2017] ‣ Data privacy is an increasingly
     important issue ‣ Membership Inference Problem: given a blackbox machine learning model, guess whether a given data point was in its training data [Shokri+ 2017] “Membership Inference Attacks against Machine Learning Models” (Slides 2–8 build up a diagram: a Service Provider trains a blackbox Model from its Training Data via the Training API of a “Machine Learning as a Service” platform; a User/Attacker sends Private Data to the Prediction API and receives a Result. Question: is the user’s private data in the model’s training set?)
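The problem statement above can be sketched as a toy game. All names and data below are illustrative stand-ins, not from the paper: the attacker can only query a prediction API and must guess membership.

```python
# Toy membership-inference setting (illustrative names, not the paper's).
# The attacker sees only the prediction API, never the training set.

TRAINING_DATA = {("hello", "bonjour"), ("cat", "chat")}

def prediction_api(source):
    """Blackbox stand-in: returns the model's output for a source input."""
    table = dict(TRAINING_DATA)
    return table.get(source, "<unk>")

def naive_membership_guess(pair):
    """Attacker heuristic: if the model reproduces the reference exactly,
    guess IN; otherwise guess OUT."""
    source, reference = pair
    return prediction_api(source) == reference
```

A real attacker, of course, has to work with noisy model outputs rather than exact matches, which is what the rest of the talk is about.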
  9. Attack with “Shadow Models” ‣ Assume the attacker has access
     to the training API (or knows the model details) ‣ Synthesize data similar to the target training data, and train “shadow models” (diagram: the Service Provider trains the Target Model on its Training Data via the Training API of an ML-as-a-Service platform; the Attacker builds Shadow Set 1, Shadow Set 2, … and trains Shadow Model 1, Shadow Model 2, …)
  10. Train “IN or OUT” Classifier for Attack 4 ML as

    a Service Shadow Training Data Shadow Model Prediction API Result IN Binary Classifier for Membership Inference Some Other Data Prediction API Result OUT ‣ Shadow model mimics the target, and attacker knows its training data
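A minimal sketch of the shadow-model recipe on these two slides, with toy stand-ins for every component (the real attack trains seq2seq models and a learned classifier): because the attacker controls the shadow training data, it knows the ground-truth IN/OUT labels and can fit a decision rule on them.

```python
# Toy shadow-model attack in the spirit of [Shokri+ 2017].
# Every model, feature, and "classifier" here is an illustrative stand-in.

def train_toy_model(pairs):
    """Stand-in for the training API: a model that memorizes its data."""
    table = dict(pairs)
    return lambda src: table.get(src, "<unk>")

def feature(model, src, ref):
    """One toy feature: does the model's output match the reference?"""
    return 1.0 if model(src) == ref else 0.0

# Attacker synthesizes shadow data and trains a shadow model on part of it.
shadow_in = [("a", "A"), ("b", "B")]     # used to train the shadow model
shadow_out = [("c", "C"), ("d", "D")]    # held out of shadow training
shadow_model = train_toy_model(shadow_in)

# Labeled training data for the IN/OUT attack classifier: label 1 = IN.
attack_data = [(feature(shadow_model, s, r), 1) for s, r in shadow_in]
attack_data += [(feature(shadow_model, s, r), 0) for s, r in shadow_out]

# "Classifier": a threshold halfway between the mean IN and OUT features.
ins = [f for f, y in attack_data if y == 1]
outs = [f for f, y in attack_data if y == 0]
threshold = (sum(ins) / len(ins) + sum(outs) / len(outs)) / 2

def attack(model, src, ref):
    """Apply the learned rule to any blackbox model's output."""
    return "IN" if feature(model, src, ref) > threshold else "OUT"
```

The key design point survives the simplification: the classifier is trained entirely on models the attacker built, then transferred to the target.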
  11. Attacks Can Be Successful ‣ [Shokri+ 2017] showed that
     you can build an attack classifier with high accuracy ‣ On multi-class classification problems ‣ Even against real “Machine Learning as a Service” models ‣ Why successful? ‣ The attack mainly exploits the difference in the model’s output distribution between seen and unseen examples
  12. Will It Work On More Complex Problems? ‣ Flat Classification
     ‣ Output space: fixed set of labels ‣ Sequence Generation ‣ Output space: sequences ‣ e.g., Machine Translation, Speech Synthesis, Video Captioning, Text Summarization
  13. Machine Translation (MT) as An Example ‣ Given black-box access
     to an MT model, is it possible to determine whether a particular sentence pair was in the training set? (diagram: the Attacker queries a Blackbox MT system, Translation API only: “Hello” → “Bonjour” ?)
  14. Possible Scenarios ‣ The attacker is not necessarily the “bad
     guy” ‣ Bitext Data Provider: checks for license violations by attacking published models trained on its licensed data ‣ MT Conference Organizer: annual bakeoff (e.g., WMT); confirms participants are not using test sets ‣ “MT as a Service” Provider: builds customized models for users and attacks its own model, to provide a privacy guarantee that user data is not used elsewhere
  15. Experiment: Characters ‣ Alice: Defender (e.g., service provider)
     ‣ Bob: Attacker (e.g., service user)
  16.–18. Experiment: Data and Splits ‣ Formulate a fair and reproducible
     setup for both Alice and Bob ‣ Alice data: she uses this to train her model ‣ Bob data: a subset of Alice’s data; he can use it in whatever way he desires for attacks ‣ IN probes / OUT probes: samples for evaluation, in and out of Alice’s training data * Actual experiment details are more complicated: please refer to the paper.
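The splits above might be sketched as follows; the split proportions and the shuffling scheme are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative Alice/Bob data splits (proportions are made up, not the
# paper's): Bob's data and the IN probes come from inside Alice's training
# set, the OUT probes from outside it.
import random

def make_splits(corpus, seed=0):
    rng = random.Random(seed)
    data = list(corpus)
    rng.shuffle(data)
    n = len(data)
    alice = data[: n // 2]                # Alice trains her model on this
    out_probes = data[n // 2:]            # never seen by Alice's model
    bob = alice[: len(alice) // 2]        # subset of Alice data given to Bob
    in_probes = alice[len(alice) // 2:]   # seen by Alice's model
    return {"alice": alice, "bob": bob,
            "in_probes": in_probes, "out_probes": out_probes}

splits = make_splits([f"sent-{i}" for i in range(100)])
```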
  19.–23. Evaluation Procedure ‣ Alice trains the target MT model on her
     data; Bob trains shadow MT models on his data and uses them to build an attack classifier ‣ Translate the IN and OUT probes, then infer membership with the classifier ‣ If Bob achieves attack accuracy above 50%, a privacy leak is suggested ‣ Because Alice’s and Bob’s models differ, Bob’s attack accuracy on his own model is likely the optimistic upper bound on the real attack
  24.–25. Attack Classifier for Membership Inference ‣ Binary classification: “IN”
     or “OUT” of the model training data? ‣ Features ‣ Modified 1–4-gram precisions ‣ Sentence-level BLEU score ‣ Optional: MT model score, extra information for the attacker ‣ Intuition: if the output is a “good” translation (i.e., similar to the reference translation), the model may have seen the pair at training time and memorized it
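The features on this slide can be sketched from scratch as below; the smoothing constant and whitespace tokenization are my assumptions, and the paper's exact BLEU implementation may differ.

```python
# Sketch of the attack features: modified 1-4-gram precisions plus a smoothed
# sentence-level BLEU. Tokenization and smoothing are illustrative choices.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(hyp, ref, n):
    """Clipped n-gram precision, as in BLEU."""
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    if not hyp_counts:
        return 0.0
    clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

def sentence_bleu(hyp, ref, max_n=4, eps=0.1):
    """Smoothed sentence BLEU: replace zero precisions with eps, then apply
    the geometric mean and the brevity penalty."""
    precisions = [modified_precision(hyp, ref, n) or eps
                  for n in range(1, max_n + 1)]
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_avg)

def attack_features(hyp, ref):
    """Feature vector for the IN/OUT classifier: p1..p4 + sentence BLEU."""
    hyp, ref = hyp.split(), ref.split()
    return ([modified_precision(hyp, ref, n) for n in range(1, 5)]
            + [sentence_bleu(hyp, ref)])
```

Any off-the-shelf binary classifier can then be trained on these vectors using the shadow models' known IN/OUT labels.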
  26. Results: Attacks Not Successful ‣ Around 50%: same as
     chance ‣ BLEU and n-gram precisions: not enough information to distinguish ‣ Using the MT model score did not help either ‣ Attack accuracy of different probes: Alice 50.4, Bob:train 51.5, Bob:valid 51.1, Bob:test 51.2 ‣ Accuracy is low even on the classifier’s in-sample data (Bob:train) → overfitting is not the problem * Even with external resources (an MT Quality Estimation model or BERT), the results were the same.
  27. Results: Out-of-vocab (OOV) samples ‣ Assumption: the model will not translate
     sentences with OOV words well ‣ Much better results than on the entire probe set: attack accuracy 50.4 on all probes vs. 68.0 on the OOV subset ‣ Same trend with out-of-domain probes
  28. Why Not Successful with seq2seq? ‣ Difference in the model output
     space ‣ “Fixed set of labels” vs. “sequence”: the latter is far more complex ‣ Flat classification: attacks exploit differences in the model output distribution ‣ seq2seq: how to quantify model uncertainty / output quality?
  29. Alternative Evaluation: Grouping Probes ‣ Instead of per-sentence decisions, use
     500 sentences together ‣ Features: percentage of sentences per sentence-BLEU bin, corpus BLEU ‣ Attack possible: accuracy above 50% for both Alice and Bob probes ‣ First strong general results for the attacker ‣ Attack accuracy: Alice 61.1, Bob:train 70.4, Bob:valid 65.6, Bob:test 64.4
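A sketch of the grouped-probe featurization described above, assuming illustrative bin edges and a configurable group size (the talk uses 500 sentences per group):

```python
# Grouped-probe features: the fraction of a group's sentences falling into
# each sentence-BLEU bin. Bin edges here are illustrative assumptions.

def bleu_bin_features(sentence_bleus, edges=(0.2, 0.4, 0.6, 0.8)):
    """Fraction of the group's sentences in each BLEU bin (len(edges)+1 bins)."""
    counts = [0] * (len(edges) + 1)
    for b in sentence_bleus:
        i = sum(b >= e for e in edges)  # index of the bin containing b
        counts[i] += 1
    total = len(sentence_bleus)
    return [c / total for c in counts]

def group_probes(bleus, group_size=500):
    """Split per-sentence BLEU scores into groups and featurize each group."""
    return [bleu_bin_features(bleus[i:i + group_size])
            for i in range(0, len(bleus), group_size)]
```

A group-level classifier then decides IN vs. OUT once per group, a looser but more reliable signal than per-sentence decisions.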
  30. Summary ‣ Membership Inference Attacks on Seq-to-Seq Models ‣ Unlike
     in multi-class classification, attacks are generally not successful (so far) ‣ However, accuracy is above chance in some situations ‣ Out-of-vocabulary and out-of-domain data ‣ Looser definition of attack: groups of sentences ‣ More complex attacks may be effective ‣ Manipulate one sentence and use the API multiple times ‣ “Watermark sentences” to influence the target model ‣ … Data available, so you can try your own attacks: github.com/sorami/TACL-Membership