[ACL2020] Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?

Sorami Hisamoto

June 17, 2020

Transcript

  1. Membership Inference Attacks on
    Sequence-to-Sequence Models
    Is My Data In Your Machine Translation System?
    Sorami Hisamoto*, Matt Post**, Kevin Duh**

    *Works Applications (Work done while at JHU)
    **Johns Hopkins University
    TACL paper, presented @ ACL 2020

  2. Membership Inference [Shokri+ 2017]
    ‣ Data Privacy is an increasingly important issue

    ‣ Membership Inference Problem: 

    Given black-box access to a machine learning model, guess whether a particular example was in its training data
    [Shokri+ 2017] “Membership Inference Attacks against Machine Learning Models”
    [Diagram] Service Provider / Machine Learning as a Service: Training Data → Training API → Blackbox Model; User / Attacker: Private Data → Prediction API → Result; “Is user’s private data in the model training set?”
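    As a rough illustration (not from the paper), the attacker's task reduces to a single yes/no decision made from black-box query access alone. A minimal Python sketch under that framing, where predict stands in for the provider's prediction API and the overlap-based rule and threshold are placeholder assumptions:

        # Minimal sketch of the membership inference decision (illustrative only).
        from typing import Callable

        def infer_membership(predict: Callable[[str], str],
                             source: str, reference: str,
                             threshold: float = 0.5) -> bool:
            """Guess whether (source, reference) was in the model's training data."""
            hypothesis = predict(source)                  # one black-box API call
            # Crude decision rule: if the output overlaps heavily with the known
            # reference, guess that the pair was memorized during training.
            hyp, ref = set(hypothesis.split()), set(reference.split())
            overlap = len(hyp & ref) / max(len(ref), 1)
            return overlap > threshold                    # True = "IN", False = "OUT"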


  3. Attack with “Shadow Models”
    ‣ Assume the attacker has access to the training API (or knows the model details)

    ‣ Synthesize data similar to the target training data, and train “shadow models”
    [Diagram] Service Provider (ML as a Service): Training Data → Training API → Target Model; Attacker: Shadow Set 1 → Shadow Model 1, Shadow Set 2 → Shadow Model 2, …
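    A minimal sketch of this step, assuming hypothetical helpers train_via_api (wrapping the provider's training API) and synthesize_like_target (shadow data generation); this is not the paper's code:

        # Sketch: build shadow models whose training data the attacker fully knows.
        def build_shadow_models(train_via_api, synthesize_like_target, num_shadows=5):
            shadows = []
            for i in range(num_shadows):
                shadow_set = synthesize_like_target(seed=i)   # data resembling the target's
                shadow_model = train_via_api(shadow_set)      # same pipeline as the target model
                shadows.append((shadow_model, shadow_set))    # attacker knows exactly what went IN
            return shadows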


  4. Train “IN or OUT” Classifier for Attack
    ‣ Shadow model mimics the target, and attacker knows its training data

    [Diagram] Shadow Training Data → Shadow Model (ML as a Service) → Prediction API → Result, labeled IN; Some Other Data → Prediction API → Result, labeled OUT; the labeled results train a Binary Classifier for Membership Inference
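    Concretely, the attacker can label each shadow-model output IN or OUT (the shadow training set is known) and fit a binary classifier on those outputs. A sketch under that assumption; shadow_predict is a hypothetical wrapper around the prediction API, the single overlap feature is a simplification, and scikit-learn is used only for illustration:

        from sklearn.linear_model import LogisticRegression

        def overlap_feature(hypothesis: str, reference: str) -> float:
            hyp, ref = set(hypothesis.split()), set(reference.split())
            return len(hyp & ref) / max(len(ref), 1)

        def fit_attack_classifier(shadow_predict, in_pairs, out_pairs):
            """in_pairs were IN the shadow training data; out_pairs were not."""
            X, y = [], []
            for pairs, label in [(in_pairs, 1), (out_pairs, 0)]:   # 1 = IN, 0 = OUT
                for source, reference in pairs:
                    hypothesis = shadow_predict(source)            # query the shadow model
                    X.append([overlap_feature(hypothesis, reference)])
                    y.append(label)
            return LogisticRegression().fit(X, y)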


  5. Attacks Can Be Successful
    ‣ [Shokri+ 2017] showed that you can build an attack classifier with high accuracy

    ‣ On multi-class classification problems

    ‣ Even with real “Machine Learning as a Service” models

    ‣ Why successful?

    ‣ The attack mainly exploits differences in the model’s output distribution between training and unseen examples
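    For intuition (a simplified illustration, not this paper's code): in flat classification the prediction API returns a class-probability vector, and attack features are typically summaries of that vector, which tends to be more peaked on training members:

        import numpy as np

        def confidence_features(prob_vector):
            """Attack features from one predicted probability vector (illustrative)."""
            probs = np.asarray(prob_vector, dtype=float)
            top1 = float(probs.max())                                 # peak confidence
            entropy = float(-np.sum(probs * np.log(probs + 1e-12)))   # flatness of the distribution
            return [top1, entropy]   # members tend to show high top1 and low entropy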


  6. Will It Work On More Complex Problems?
    ‣ Flat Classification

    ‣ Output space: Fixed set of labels

    ‣ Sequence Generation

    ‣ Output space: Sequence

    ‣ e.g., Machine Translation, Speech Synthesis, Video Captioning, Text Summarization


  7. Machine Translation (MT) as An Example
    ‣ Given black-box access to an MT model, is it possible to determine whether a particular sentence pair was in the training set?
    [Diagram] Attacker, with access to the Translation API only, asks the Blackbox MT system: was (“Hello” → “Bonjour”) in its training data?


  8. Possible Scenarios
    ‣ Attacker may not necessarily be the “bad guy”

    ‣ Bitext Data Provider: check for license violations in models published by its licensees (Attack)

    ‣ MT Conference Organizer: annual bakeoff (e.g., WMT); confirm participants are not using the test sets (Attack)

    ‣ “MT as a Service” Provider: customized models for users; attack its own model to provide a privacy guarantee that user data is not used elsewhere (Provide & Attack)


  9. Experiment: Characters
    ‣ Alice: the Defender (e.g., Service Provider)

    ‣ Bob: the Attacker (e.g., Service User)


  10. Experiment: Data and Splits
    ‣ Formulate a fair and reproducible setup for both Alice and Bob

    ‣ Alice data: she uses this to train her model

    ‣ Bob data: a subset of Alice data; he can use it in whatever way he desires for attacks

    ‣ IN probes / OUT probes: samples for evaluation, IN and OUT of the training set

    * Actual experiment details more complicated: Please refer to the paper.
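    For concreteness, one way such splits could be constructed (a simplification assuming a single parallel corpus; the proportions and probe sizes below are arbitrary and the paper's actual splits are more involved):

        import random

        def make_splits(corpus, seed=0):
            """Split (source, reference) pairs into Alice/Bob data and IN/OUT probes."""
            rng = random.Random(seed)
            pairs = list(corpus)
            rng.shuffle(pairs)
            n = len(pairs)
            alice_train = pairs[: n // 2]          # Alice trains her MT model on this
            held_out    = pairs[n // 2 :]          # never seen by Alice's model
            bob_data    = alice_train[: n // 10]   # subset of Alice data handed to Bob
            in_probes   = alice_train[-1000:]      # evaluation probes that are IN
            out_probes  = held_out[:1000]          # evaluation probes that are OUT
            return alice_train, bob_data, in_probes, out_probes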


  11. Evaluation Procedure
    [Diagram] Alice data → Target MT model; Bob data → Shadow MT models → Attack classifier; IN probes / OUT probes → Translate → Infer Membership

    ‣ If Bob can get attack accuracy above 50%, a privacy leak is suggested

    ‣ Because of the Alice / Bob model difference, Bob’s attack accuracy on his own model is likely the optimistic upper bound on the real attack
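    A bare-bones sketch of that evaluation loop (illustrative assumptions: translate wraps the target MT model, and attack returns 1 for IN and 0 for OUT):

        def attack_accuracy(translate, attack, in_probes, out_probes):
            """Fraction of IN/OUT probes whose membership the attack guesses correctly."""
            correct = total = 0
            for probes, label in [(in_probes, 1), (out_probes, 0)]:  # 1 = IN, 0 = OUT
                for source, reference in probes:
                    hypothesis = translate(source)                   # query the target model
                    correct += int(attack(hypothesis, reference) == label)
                    total += 1
            return correct / total   # well above 0.5 (chance) would suggest a privacy leak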


  12. Attack Classifier for Membership Inference
    ‣ Binary Classification

    ‣ “IN” or “OUT” of the model training data?

    ‣ Features

    ‣ Modified 1-4 gram precisions

    ‣ Sentence-level BLEU scores

    ‣ Optional: MT Model score - extra information for the attacker
    Intuition: If the output is a “good” translation (i.e., similar to the reference translation), the model might have seen it at training time and memorized it
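    A rough sketch of this kind of per-sentence feature extraction, written from scratch here rather than with any particular toolkit (the paper's exact features and implementation may differ); it computes modified 1-4 gram precisions and a simple single-reference sentence-level BLEU:

        import math
        from collections import Counter

        def ngram_counts(tokens, n):
            return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

        def modified_precision(hyp, ref, n):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            clipped = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
            return clipped / max(sum(hyp_ngrams.values()), 1)

        def attack_features(hypothesis: str, reference: str):
            """Modified 1-4 gram precisions plus a simple sentence-level BLEU."""
            hyp, ref = hypothesis.split(), reference.split()
            precisions = [modified_precision(hyp, ref, n) for n in range(1, 5)]
            brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
            bleu = brevity * math.exp(sum(math.log(p or 1e-9) for p in precisions) / 4)
            return precisions + [bleu]   # 5 features per probe sentence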


  13. Results: Attacks Not Successful
    ‣ Around 50%: same as random chance

    ‣ BLEU and N-gram precision: not enough information to distinguish

    ‣ Using MT model score did not help either
    Attack Accuracy (%) of Different Probes
    Alice   Bob:train   Bob:valid   Bob:test
    50.4    51.5        51.1        51.2
    Accuracy is low even for the classifier’s in-sample data → overfitting is not the problem
    * Even with external resources (MT Quality Estimation model or BERT), the results were the same.


  14. Results: Out-of-vocab (OOV) samples
    ‣ Assumption: the model will not translate sentences with OOV words well

    ‣ Much better results than on the entire probe set

    ‣ Same trend with out-of-domain probes
    Attack Accuracy (%) of OOV subsets
    All     OOV
    50.4    68.0


  15. Why Not Successful with seq2seq?
    ‣ Difference in model output space

    ‣ “Fixed set of labels” vs. “sequence”: the latter is far more complex

    ‣ Flat classification: attacks exploit differences in the model output distribution

    ‣ seq2seq: How to quantify model uncertainty / output quality?


  16. Alternative Evaluation: Grouping Probes
    ‣ Instead of “Per Sentence”, use 500 sentences together

    ‣ Features: Sentence BLEU bin percentage, Corpus BLEU

    ‣ Attack possible: Above 50% for Alice and Bob probes

    ‣ First strong general results for the attacker
    Attack Accuracy (%)
    Alice   Bob:train   Bob:valid   Bob:test
    61.1    70.4        65.6        64.4
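    One plausible way to compute such group-level features (a sketch: the number of bins and the use of mean sentence BLEU as a crude stand-in for corpus BLEU are assumptions, not the paper's exact recipe):

        def group_features(sentence_bleus, num_bins=10):
            """Features for one group of probes, from per-sentence BLEU scores in [0, 1]."""
            counts = [0] * num_bins
            for b in sentence_bleus:                        # histogram over sentence-BLEU bins
                counts[min(int(b * num_bins), num_bins - 1)] += 1
            total = max(len(sentence_bleus), 1)
            bin_percentages = [c / total for c in counts]   # share of sentences in each bin
            corpus_level = sum(sentence_bleus) / total      # crude stand-in for corpus BLEU
            return bin_percentages + [corpus_level]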


  17. Summary
    ‣ Membership Inference Attacks on Seq-to-Seq Models

    ‣ Unlike multi-class classification cases, attacks are generally not successful (so far)

    ‣ However, accuracy is above chance in some situations

    ‣ Out-of-vocabulary and Out-of-domain data

    ‣ Looser definition of attack: Group of sentences

    ‣ More complex attacks may be effective

    ‣ Manipulate one sentence and use API multiple times

    ‣ “Watermark sentences” to influence the target model

    ‣ …
    Data available: You can try your attacks
    github.com/sorami/TACL-Membership
