[ACL2020] Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?

Sorami Hisamoto

June 17, 2020

Transcript

  1. Membership Inference Attacks on
    Sequence-to-Sequence Models
    Is My Data In Your Machine Translation System?
    Sorami Hisamoto*, Matt Post**, Kevin Duh**

    *Works Applications (Work done while at JHU)
    **Johns Hopkins University
    TACL paper, presented @ ACL 2020


  2–8. Membership Inference [Shokri+ 2017]
    ‣ Data privacy is an increasingly important issue

    ‣ Membership Inference Problem: 

    given black-box access to a machine learning model, guess whether a particular data point was in its training data
    [Diagram: a service provider trains a blackbox model on its training data via a "Machine Learning as a Service" training API; a user / attacker sends private data to the prediction API and receives results. Question: is the user's private data in the model's training set?]
    [Shokri+ 2017] “Membership Inference Attacks against Machine Learning Models”

  9. Attack with “Shadow Models”
    ‣ Assume the attacker has access to the training API (or knows the model details)

    ‣ Synthesize data similar to the target training data, and train “shadow models” (a data-splitting sketch follows below)
    [Diagram: the service provider trains the target model on its training data via the "ML as a Service" training API; the attacker builds Shadow Set 1, Shadow Set 2, … and trains Shadow Model 1, Shadow Model 2, … the same way.]

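    A minimal sketch of the shadow-data setup, assuming Bob's bitext is simply a list of (source, target) sentence pairs; make_shadow_splits and train_nmt are illustrative placeholders, not the paper's actual pipeline:

    import random

    def make_shadow_splits(bob_bitext, n_shadow=2, seed=0):
        """Split Bob's bitext into per-shadow-model IN / OUT halves.

        bob_bitext: list of (source, target) sentence pairs.
        Each shadow model is trained only on its IN half, so its OUT half
        later provides labeled "not in the training data" examples.
        """
        rng = random.Random(seed)
        splits = []
        for _ in range(n_shadow):
            data = list(bob_bitext)
            rng.shuffle(data)
            half = len(data) // 2
            splits.append((data[:half], data[half:]))
        return splits

    # shadow_models = [train_nmt(in_half) for in_half, _ in make_shadow_splits(bob_bitext)]
    # train_nmt stands in for whatever NMT toolkit the attacker uses.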

  10. Train “IN or OUT” Classifier for Attack
    ‣ The shadow model mimics the target, and the attacker knows its training data, so every shadow prediction can be labeled (see the sketch below)
    [Diagram: shadow training data sent through the shadow model's prediction API yields results labeled IN; other data the shadow model was not trained on yields results labeled OUT; these labeled results train a binary classifier for membership inference.]

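    The attack classifier can be any binary classifier. A minimal sketch with scikit-learn, assuming a shadow_model object with a translate() method and an extract_features() helper (itself sketched under slides 24–25 below); this is an illustration of the idea, not the paper's exact setup:

    from sklearn.linear_model import LogisticRegression

    def build_attack_classifier(shadow_model, in_set, out_set, extract_features):
        """Train an IN/OUT membership classifier from one shadow model.

        in_set / out_set: (source, target) pairs that were / were not in the
        shadow model's training data. Each source is translated, the output is
        featurized against its reference target, and labeled 1 (IN) or 0 (OUT).
        """
        X, y = [], []
        for label, probes in ((1, in_set), (0, out_set)):
            for src, ref in probes:
                hyp = shadow_model.translate(src)  # black-box-style query
                X.append(extract_features(hyp, ref))
                y.append(label)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y)
        return clf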

  11. Attacks Can Be Successful
    ‣ [Shokri+ 2017] showed that 

    you can build an attack classifier with high accuracy

    ‣ On multi-class classification problems

    ‣ Even against real “Machine Learning as a Service” models

    ‣ Why successful?

    ‣ The attack mainly exploits the difference in the model's output distribution between training and unseen examples


  12. Will It Work On More Complex Problems?
    ‣ Flat Classification

    ‣ Output space: fixed set of labels

    ‣ Sequence Generation

    ‣ Output space: sequences

    ‣ e.g., Machine Translation, Speech Synthesis, Video Captioning, Text Summarization


  13. Machine Translation (MT) as an Example
    ‣ Given black-box access to an MT model, 

    is it possible to determine whether 

    a particular sentence pair was in the training set?
    [Diagram: the attacker has access to the blackbox MT system's translation API only, and asks whether the pair “Hello” → “Bonjour” was in its training data.]


  14. Possible Scenarios
    ‣ The attacker may not necessarily be the “bad guy”

    ‣ Bitext Data Provider: attacks published models to check for license violations by its licensees

    ‣ MT Conference Organizer: runs an annual bakeoff (e.g., WMT) and attacks participants' systems to confirm they are not using the test sets

    ‣ “MT as a Service” Provider: builds customized models for users, and attacks its own models to provide a privacy guarantee that user data is not used elsewhere


  15. Experiment: Characters
    ‣ Alice: the defender (e.g., a service provider)

    ‣ Bob: the attacker (e.g., a service user)


  16–18. Experiment: Data and Splits
    ‣ Formulate a fair and reproducible setup for both Alice and Bob

    ‣ Alice data: she uses this to train her model

    ‣ Bob data: a subset of the Alice data; he can use it in whatever way he desires for his attacks

    ‣ IN probes / OUT probes: samples for evaluation, drawn from inside and outside Alice's training data

    * The actual experiment details are more complicated: please refer to the paper.

  19–23. Evaluation Procedure
    [Diagram: Alice trains the target MT model on her data; Bob trains shadow MT models on his data and builds an attack classifier; the IN and OUT probes are translated by the target model, and the attack classifier infers membership from the translations.]
    ‣ If Bob can get attack accuracy above 50%, a privacy leak is suggested (an accuracy sketch follows below)

    ‣ Because Alice's and Bob's models differ, 

    Bob's attack accuracy on his own shadow models is likely 

    an optimistic upper bound on the real attack
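    A sketch of how the 50% baseline is checked, assuming alice_translate wraps the target model's translation API and extract_features is the helper from the later sketch (slides 24–25); this illustrates the evaluation idea, not the paper's exact code:

    def attack_accuracy(attack_clf, in_probes, out_probes, alice_translate, extract_features):
        """Attack accuracy on balanced IN / OUT probes; 0.5 means chance level."""
        correct, total = 0, 0
        for label, probes in ((1, in_probes), (0, out_probes)):
            for src, ref in probes:
                hyp = alice_translate(src)  # Bob only sees the translation API
                pred = attack_clf.predict([extract_features(hyp, ref)])[0]
                correct += int(pred == label)
                total += 1
        return correct / total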

  24–25. Attack Classifier for Membership Inference
    ‣ Binary Classification

    ‣ “IN” or “OUT” of the model training data?

    ‣ Features

    ‣ Modified 1-4 gram precisions

    ‣ Sentence-level BLEU scores

    ‣ Optional: MT model score - extra information for the attacker

    Intuition:
    If the output is a “good” translation
    (i.e. similar to the reference translation),
    the model might have seen it
    at training time and memorized it
    (a feature-extraction sketch follows below)
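    A minimal sketch of these per-sentence features using NLTK; the toolkit choice and the smoothing method are assumptions of this sketch, not necessarily what the paper uses:

    from nltk.translate.bleu_score import (SmoothingFunction, modified_precision,
                                           sentence_bleu)

    def extract_features(hypothesis, reference):
        """Per-sentence attack features: modified 1-4 gram precisions + sentence BLEU.

        hypothesis: the MT system's output string; reference: the candidate
        target sentence whose membership is being probed.
        """
        hyp = hypothesis.split()
        refs = [reference.split()]
        feats = [float(modified_precision(refs, hyp, n)) for n in range(1, 5)]
        feats.append(sentence_bleu(refs, hyp,
                                   smoothing_function=SmoothingFunction().method1))
        return feats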

  26. Results: Attacks Not Successful
    ‣ Around 50%: the same as chance

    ‣ BLEU and n-gram precisions: not enough information to distinguish IN from OUT

    ‣ Using the MT model score did not help either

    Attack accuracy of different probes: Alice 50.4, Bob:train 51.5, Bob:valid 51.1, Bob:test 51.2

    ‣ Accuracy is low even on the classifier's in-sample data (Bob:train) → overfitting is not the problem

    * Even with external resources (an MT Quality Estimation model or BERT), the results were the same.


  27. Results: Out-of-Vocabulary (OOV) Samples
    ‣ Assumption: the model will not translate sentences containing OOV words well

    ‣ Much better results than on the entire probe set (a subset-selection sketch follows below)

    ‣ Same trend with out-of-domain probes

    Attack accuracy on OOV subsets: all probes 50.4, OOV subset 68.0

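    One plausible way to pick such a subset, assuming the attacker approximates the training vocabulary from text he has access to; the paper's exact OOV construction may differ:

    def oov_probes(probes, attacker_corpus):
        """Keep only probes whose source side contains a token unseen in attacker_corpus.

        probes: list of (source, target) pairs; attacker_corpus: source sentences
        the attacker uses to approximate the target model's training vocabulary.
        """
        vocab = {tok for sent in attacker_corpus for tok in sent.split()}
        return [(src, tgt) for src, tgt in probes
                if any(tok not in vocab for tok in src.split())]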

  28. Why Not Successful with seq2seq?
    ‣ Difference in the model output space

    ‣ “Fixed set of labels” vs. “sequence”: the latter is far more complex

    ‣ Flat classification: 

    attacks exploit differences in the model's output distribution

    ‣ seq2seq: how to quantify model uncertainty / output quality?


  29. Alternative Evaluation: Grouping Probes
    ‣ Instead of deciding per sentence, use 500 sentences together

    ‣ Features: percentage of sentences per sentence-BLEU bin, plus corpus BLEU (a sketch follows below)

    ‣ Attack possible: above 50% for both Alice and Bob probes

    ‣ First strong general result for the attacker

    Attack accuracy: Alice 61.1, Bob:train 70.4, Bob:valid 65.6, Bob:test 64.4

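    A sketch of the group-level features, assuming ten equal-width sentence-BLEU bins; the exact binning used in the paper may differ:

    import numpy as np
    from nltk.translate.bleu_score import (SmoothingFunction, corpus_bleu,
                                           sentence_bleu)

    def group_features(hypotheses, references, n_bins=10):
        """Features for one group of probes (e.g. 500 sentences): the fraction of
        sentences falling into each sentence-BLEU bin, plus the group's corpus BLEU."""
        smooth = SmoothingFunction().method1
        hyps = [h.split() for h in hypotheses]
        refs = [[r.split()] for r in references]
        sent_scores = [sentence_bleu(r, h, smoothing_function=smooth)
                       for h, r in zip(hyps, refs)]
        hist, _ = np.histogram(sent_scores, bins=n_bins, range=(0.0, 1.0))
        bin_fractions = hist / max(len(sent_scores), 1)
        return list(bin_fractions) + [corpus_bleu(refs, hyps)]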

  30. Summary
    ‣ Membership inference attacks on sequence-to-sequence models

    ‣ Unlike in the multi-class classification case, 

    the attacks are generally not successful (so far)

    ‣ However, accuracy is above chance in some situations

    ‣ Out-of-vocabulary and out-of-domain data

    ‣ A looser definition of the attack: groups of sentences

    ‣ More complex attacks may be effective

    ‣ Manipulate one sentence and query the API multiple times

    ‣ “Watermark sentences” to influence the target model

    ‣ …

    Data available: you can try your own attacks
    github.com/sorami/TACL-Membership
