Membership Inference Attacks on Sequence-to-Sequence Models
A Case in Privacy-preserving Neural Machine Translation

Paper:
https://arxiv.org/abs/1904.05506

@ CLSP Seminar
Center for Language and Speech Processing, Johns Hopkins University
https://www.clsp.jhu.edu/events/aswin-subramanian/?instance_id=2832

Sorami Hisamoto

May 03, 2019
Transcript

  1. Membership Inference Attacks on
    Sequence-to-Sequence Models
    A Case in Privacy-preserving Neural Machine Translation
    May 3, 2019

    Sorami Hisamoto

    A work with Kevin Duh & Matt Post
    arxiv.org/abs/1904.05506

  2. Summary
    ‣ Privacy in Machine Learning

    ‣ Membership Inference Problem

    ‣ “Was this in the model’s training data?”

    ‣ Attacker creates models to mimic the target blackbox model

    ‣ Empirical Results

    ‣ Multi-class Classification: attack successful

    Exploits output distribution difference

    ‣ Sequence Generation: attack not successful (so far) 

    More complex output space

  3. Self Introduction: Sorami Hisamoto
    ‣ Visiting Researcher Oct 2018 - Jun 2019

    ‣ Before: NAIST, Japan

    ‣ Studied under Kevin Duh & Yuji Matsumoto

    ‣ Word Representations and Dependency Parsing (2012-2014)

    ‣ Now: WAP Tokushima AI & NLP Lab.

    ‣ NLP applications to enterprise services

    ‣ Morphological analysis: Sudachi →

  4. ‣ Determining “words” in a Japanese sentence is difficult!

    ‣ Dictionary: 3 million vocabulary entries, constantly updated

    ‣ Code on GitHub: Java, Python, Elasticsearch plugin

    ‣ A paper in LREC2018
    Sudachi: A Japanese Tokenizer for Business

  5. Privacy & Machine Learning

  6. nytimes.com/interactive/2019/opinion/internet-privacy-project.html

  7. Privacy is more important than ever!
    ‣ Increasingly important in today’s society

    ‣ More data to collect

    ‣ Usefulness of data

    ‣ Data → for Machine Learning (ML) …

    ‣ Increasing interest in the research communities

  8. NeurIPS2018

  9. ICML2019

  10. IJCAI2019

  11. ISSP2019

  12. Privacy & Natural Language Processing
    ‣ Some work, but not much yet:

    ‣ “Towards Robust and Privacy-preserving Text
    Representations”

    Yitong Li, Timothy Baldwin, Trevor Cohn. ACL2018 (short)

    ‣ “Privacy-preserving Neural Representations of Text”

    Maximin Coavoux, Shashi Narayan, Shay Cohen. EMNLP2018

    ‣ “Adversarial Removal of Demographic Attributes from
    Text Data”

    Yanai Elazar, Yoav Goldberg. EMNLP2018

  13. Different kinds of problems in ML privacy
    ‣Model Inversion
    ‣ Use the model’s output on a hidden input to infer something
    about that input

    ‣Differential Privacy
    ‣ Will the model behave differently if a particular data point is
    removed from / added to the training data?

    ‣Membership Inference →

  14. Membership Inference Attacks
    [Shokri+ 2017]

  15. “Was this in the training data?”
    ‣ Given a blackbox machine learning model, 

    can you guess if a data sample was in the training data?

    *Blackbox: no information about the model’s details; only access to an API to send
    inputs and receive results

    ‣ [Shokri+ 2017] “Membership Inference Attacks against
    Machine Learning Models” (IEEE Symposium on Security and Privacy)

    ‣ Important in real world situations

    ‣ e.g., “ML as a Service” like Google, Amazon, or MS …

    ‣ e.g., Private information: Medical records, location, purchase history, …

    ‣ “Trust, but verify” (Доверяй, но проверяй)

  16. Following the tradition in security literature …
    ‣ Alice: Defender (e.g., service provider)
    ‣ Bob: Attacker (e.g., service user)

  17–23. Membership Inference Problem
    [Diagram, built up across these slides]
    ‣ Service Provider (“Machine Learning as a Service”):
    Training Data → blackbox Training API → Model
    ‣ User: Private Data → Prediction API → Result
    ‣ Is the user’s private data in the model’s training set?

  24–28. How can Bob “attack” Alice’s model?
    ‣ Shadow models to mimic the target model
    [Diagram, built up across these slides]
    ‣ Service Provider (ML as a Service): Training Data → Training API → Target Model
    ‣ Attacker: Shadow Sets 1–3 → same Training API → Shadow Models 1–3
    ‣ Assumption: the attacker has access to the same training API
    (or knows the target model details)
    ‣ How to prepare these data? → explained later

  29–33. Shadow models to train an “in or out” classifier
    ‣ Bob knows what was “in” or “out” of his shadow model training set
    [Diagram, built up across these slides]
    ‣ Shadow Training Data → Prediction API → Result, labeled IN
    ‣ Some Other Data → Prediction API → Result, labeled OUT
    ‣ The labeled results train a Binary Classifier for Membership Inference
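
    To make the pipeline concrete, below is a minimal sketch of a shadow-model attack for the flat-classification setting, with scikit-learn models standing in for the blackbox MLaaS training and prediction APIs. The helper names, the MLP / random-forest choices, and the feature layout are illustrative assumptions, not the exact setup of [Shokri+ 2017].

```python
# Minimal sketch of the shadow-model attack idea (flat classification).
# scikit-learn models stand in for the blackbox MLaaS APIs; all names are illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

def attack_features(model, X, y):
    """Features for the membership classifier: the output distribution returned
    by the prediction API plus the confidence on the true class.
    Assumes integer class labels 0..C-1 that every shadow model has seen."""
    probs = model.predict_proba(X)
    true_class_prob = probs[np.arange(len(y)), y][:, None]
    return np.hstack([probs, true_class_prob])

def build_attack_training_data(shadow_sets):
    """shadow_sets: list of ((X_in, y_in), (X_out, y_out)) pairs; the first part
    trains that shadow model, the second part is held out."""
    feats, labels = [], []
    for (X_in, y_in), (X_out, y_out) in shadow_sets:
        shadow = MLPClassifier(max_iter=500).fit(X_in, y_in)  # mimics the target model
        feats.append(attack_features(shadow, X_in, y_in))     # outputs on data that was IN
        labels.append(np.ones(len(y_in)))
        feats.append(attack_features(shadow, X_out, y_out))   # outputs on data that was OUT
        labels.append(np.zeros(len(y_out)))
    return np.vstack(feats), np.concatenate(labels)

def membership_attack(shadow_sets, target_model, X_probe, y_probe):
    """Train the binary 'in or out' classifier on shadow outputs, then apply it
    to the target model's outputs on the probe samples."""
    X_att, y_att = build_attack_training_data(shadow_sets)
    attack_clf = RandomForestClassifier(n_estimators=100).fit(X_att, y_att)
    return attack_clf.predict(attack_features(target_model, X_probe, y_probe))
```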

  34. Synthesizing data for shadow models
    ‣1. Model-based synthesis
    ‣ Using the target model itself

    ‣ “high confidence result → probably similar to the target training data” (see the sketch below)

    ‣2. Statistics-based synthesis
    ‣ Information about the population from which the target data was drawn

    ‣ → e.g., prior knowledge of the marginal distributions of features

    ‣3. Noisy real data
    ‣ Attacker has data similar to target training data

    ‣ → can consider it as a “noisy” version
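
    A hedged sketch of the model-based synthesis above (item 1): hill-climb random perturbations of a candidate record for as long as the target model’s confidence for the wanted class improves. The binary-feature assumption and `query_target` (a wrapper around the blackbox prediction API) are illustrative, not the exact algorithm of [Shokri+ 2017].

```python
# Hedged sketch of model-based synthesis: keep a candidate record only while the
# target model's confidence for the wanted class keeps increasing.
import numpy as np

def synthesize(query_target, wanted_class, n_features, k_flip=16,
               conf_threshold=0.8, max_iters=1000, rng=np.random.default_rng(0)):
    x = rng.integers(0, 2, n_features)        # random binary starting record
    best_conf = 0.0
    for _ in range(max_iters):
        y = x.copy()
        flip = rng.choice(n_features, size=k_flip, replace=False)
        y[flip] ^= 1                           # perturb a few features
        conf = query_target(y)[wanted_class]   # blackbox prediction API call
        if conf > best_conf:                   # accept only improving moves
            x, best_conf = y, conf
        if best_conf > conf_threshold:
            return x                           # likely resembles target training data
    return None                                # synthesis failed
```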

  35. Experimental Setup: Target Models
    ‣Google Prediction API
    ‣ No configuration parameters

    ‣Amazon ML
    ‣ Few meta-parameters

    (max. number of training data passes, regularization amount)

    ‣Local Neural Networks

  36. Experimental Setup: Data
    ‣Multi-class classification problems

    Dataset        | Description                   | Target Model Training Set Size | Number of Shadow Models
    CIFAR-{10,100} | Image recognition             | 10: 2.5k–15k; 100: 5k–30k      | 100
    Purchases      | Shopping history              | 10,000                         | 20
    Locations      | Foursquare check-ins          | 1,200                          | 60
    Hospital Stays | Inpatient stays in facilities | 10,000                         | 10
    UCI Adult      | Census income                 | 10,000                         | 20
    MNIST          | Handwritten digits            | 10,000                         | 50

  37. Experimental Setup: Accuracy
    ‣ Membership Inference: binary classification

    ‣ Is this “IN” or “OUT” of training set?

    ‣ Accuracy

    ‣ Same size for both sides → baseline (random): 0.5

    ‣ Precision:

    fraction of the records inferred as members of the training dataset
    that are indeed members

    ‣ Recall:

    fraction of the training records that the attacker can correctly infer
    as members
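
    For concreteness, a minimal sketch of computing these metrics from an attacker’s guesses; the boolean-list data layout is an illustrative assumption.

```python
# Minimal sketch of the evaluation metrics above:
# predicted_in[i] is the attacker's guess, actually_in[i] the ground truth.
def attack_metrics(predicted_in, actually_in):
    tp = sum(p and a for p, a in zip(predicted_in, actually_in))
    fp = sum(p and not a for p, a in zip(predicted_in, actually_in))
    fn = sum(a and not p for p, a in zip(predicted_in, actually_in))
    correct = sum(p == a for p, a in zip(predicted_in, actually_in))
    accuracy = correct / len(actually_in)            # baseline 0.5 for balanced probes
    precision = tp / (tp + fp) if tp + fp else 0.0   # inferred members that really are members
    recall = tp / (tp + fn) if tp + fn else 0.0      # members correctly inferred as members
    return accuracy, precision, recall
```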

  38. Results: Attacks successful!
    ‣ [Figures from [Shokri+ 2017]: training set size vs. attack precision, CIFAR-10 and CIFAR-100]
    ‣ Precision well above the 0.5 baseline (random guess)

    ‣ (Recall was almost 1.0 for both datasets)

  39–40. Attack Precision of Datasets and Overfitting
    ‣ [Figures from [Shokri+ 2017]: target model accuracy (Google models) vs. attack precision]
    ‣ Attack precision is mostly > 0.5

    ‣ A large train/test accuracy gap indicates overfitting

  41. Why do the attacks work?
    ‣ Factors affecting leakage

    ‣ Overfitting

    ‣ Diversity of Training Data

    ‣ Model Type

    ‣ The attack exploits the output distribution over class labels
    returned by the target model

    ‣ [Figure from [Shokri+ 2017]: model train/test accuracy gap vs. attack precision]

  42. Distributions differ for samples in / out of the training data
    ‣ [Figures from [Shokri+ 2017]: prediction accuracy and prediction uncertainty
    distributions for training vs. non-training samples]

  43. Mitigation of the Attacks
    ‣ Restrict the prediction output to top k classes

    ‣ Coarsen precision of the prediction output

    ‣ Increase entropy of the prediction output

    ‣Regularization
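
    A minimal sketch of how the first three mitigations above could be applied to a prediction vector before the API returns it; the parameter values are illustrative, and regularization (a training-time measure) is not shown.

```python
# Hedged sketch of output-side mitigations applied to one probability vector.
import numpy as np

def mitigate(probs, top_k=3, decimals=2, temperature=2.0):
    p = np.asarray(probs, dtype=float)
    # Increase entropy: soften the distribution with a temperature > 1.
    logits = np.log(p + 1e-12) / temperature
    p = np.exp(logits) / np.exp(logits).sum()
    # Restrict to the top-k classes; everything else is reported as 0.
    keep = np.argsort(p)[-top_k:]
    masked = np.zeros_like(p)
    masked[keep] = p[keep]
    masked /= masked.sum()
    # Coarsen precision: round the reported scores.
    return np.round(masked, decimals)

print(mitigate([0.85, 0.10, 0.03, 0.02]))
```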

  44. How about on more complex problems …?
    ‣ Membership Inference Attacks were successful

    ‣ It was on “flat classification” models

    ‣ Binary or multi-class classification

    ‣ How about more complex problems?

    ‣ e.g., structured prediction / generation

    ‣ Sequence-to-Sequence models →

  45. Sequence-to-Sequence Models
    [Hisamoto+ 2019]

  46. Will attacks work on seq2seq models?
    ‣ Previous case: “flat classification”

    ‣ Output space: fixed set of labels
    ‣ → Sequence generation

    ‣ Output space: sequence of words, length undetermined a priori
    ‣ e.g., Machine Translation, Speech Synthesis, Video Captioning, Text
    Summarization

  47. Machine Translation (MT) as an example
    “Given black-box access to an MT model,
    is it possible to determine whether
    a particular sentence pair was in the training set
    for that model?”

  48. Possible scenarios
    ‣ Bitext data provider
    ‣ Providing data under license restrictions

    → check whether services comply with the license

    ‣ MT conference organizer
    ‣ Annual bakeoff

    → check if participants are following the rules

    ‣ “MT as a Service” provider
    ‣ Providing customized engines with user data

    → may want to provide guarantees that 

    a) user data is not used for other users’ engines

    b) if it is used for others, privacy will not be leaked

  49–50. Carol: Neutral judge, for evaluation purposes
    ‣ Alice: Defender (e.g., service provider)
    ‣ Bob: Attacker (e.g., service user)
    ‣ Carol: Judge (does not exist in real scenarios)

  51–55. Problem Overview
    [Diagram, built up across these slides]
    1. Carol splits the data into a) Alice set, b) Bob set, c) Evaluation set

    2. Alice trains her MT model

    3. Bob uses his data and the Alice model translation API
    in whatever way he wants to attack the Alice model

    4. Carol receives Bob’s attack results and evaluates them

  56–60. Splitting Data
    ‣ Probes: sentence samples for evaluation

    ‣ {IN, OUT} Probes: {in, not in} the target model training data
    [Diagram, built up across these slides]
    ‣ The corpora (Corpus 1–3) are split into IN Probes and OUT Probes;
    a separate out-of-domain corpus (not in the Alice model) provides OOD Probes
    ‣ Alice model training data: the corpora including their IN Probes
    ‣ Bob data: no Corpus 2 and no Alice IN Probes
    (Corpus 2 is like an MT provider’s in-house crawled data)
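
    A minimal sketch of such a split for a single corpus, assuming plain Python lists of sentence pairs; the 5,000-probe size matches the setup described later, and everything else is illustrative.

```python
# Hedged sketch of Carol's split for one parallel corpus: hold out IN and OUT
# probes, give the IN probes plus the rest to Alice for training, and give Bob
# a portion that excludes Alice's IN probes.
import random

def split_corpus(sentence_pairs, n_probes=5000, seed=0):
    rng = random.Random(seed)
    pairs = sentence_pairs[:]
    rng.shuffle(pairs)
    out_probes = pairs[:n_probes]              # never shown to Alice's model
    in_probes = pairs[n_probes:2 * n_probes]   # included in Alice's training data
    rest = pairs[2 * n_probes:]
    alice_train = in_probes + rest
    bob_data = rest                            # no Alice IN probes for Bob
    return alice_train, bob_data, in_probes, out_probes
```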

  61–63. Experimental Setup: Characters
    ‣ Defender (Alice): Matt
    ‣ Attacker (Bob): Sorami
    ‣ Judge (Carol): Kevin
    ‣ They do not know each other’s data or
    MT model details (architecture, training strategy, etc.)

  64–66. Experimental Setup: Data and Splits
    ‣ Data from WMT2018

    ‣ Probes: 5,000 sentence pairs per corpus
    [Diagram, built up across these slides]
    ‣ In-domain corpora, each with IN and OUT probes:
    CommonCrawl, Europarl, News, Rapid, ParaCrawl
    ‣ Out-of-domain corpora: EMEA, Subtitles, Koran, TED
    ‣ Alice data and Bob data are drawn from these splits

  67. Experimental Setup: Evaluation Protocol
    1. Carol splits the data and gives it to Alice and Bob

    2. Alice trains her MT model

    3. Bob uses his data in whatever way he wants to create a classifier

    4. Carol gives Bob the translations of the probes by the Alice model

    5. Bob infers their membership and gives the results to Carol

    6. Carol evaluates the attack accuracy

    *Accuracy: percentage of probes where the classification result is correct
    We will release the data (split sets, translations by Matt’s model)
    so people can try their own attack methods

  68. Alice MT architecture (by Matt)
    ‣ BLEU: 42.6

    ‣ 6-layer Transformer

    ‣ Joint BPE subword model (32k)

    ‣ Dual conditional cross-entropy filtering for ParaCrawl

    ‣ …

  69. Attack: Shadow model & data splits
    ‣ This time, Bob splits his data to create 10 shadow models

    ‣ Blue: training data for a shadow model (smaller box = IN Probes)

    ‣ Green: OUT Probes
    [Figure: splits for the shadow models, with train / valid / test portions
    for the classifier that infers membership]
    ‣ Probes for shadow models come in pairs (e.g., 1+ & 1-):
    IN / OUT Probes are flipped to make balanced data
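
    A rough sketch of how paired, balanced shadow splits along these lines could be built from Bob’s data; this only illustrates the idea on the slide and is not the paper’s exact recipe.

```python
# Hedged sketch: 2 * n_pairs shadow configurations (e.g., 1+ and 1-); the same
# two probe sets swap their IN / OUT roles between the '+' and '-' model of a
# pair, so the membership classifier sees balanced IN / OUT examples.
def make_shadow_splits(bob_pairs, n_pairs=5, n_probes=5000):
    fold = len(bob_pairs) // n_pairs
    splits = []
    for i in range(n_pairs):
        chunk = bob_pairs[i * fold:(i + 1) * fold]
        probes_a, probes_b = chunk[:n_probes], chunk[n_probes:2 * n_probes]
        shared = chunk[2 * n_probes:]
        splits.append({"name": f"{i + 1}+", "train": shared + probes_a,
                       "in": probes_a, "out": probes_b})
        splits.append({"name": f"{i + 1}-", "train": shared + probes_b,
                       "in": probes_b, "out": probes_a})
    return splits
```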

  70. Bob MT architecture (by Sorami)
    ‣ BLEU: 38.06±0.2

    (Alice: 42.6)

    ‣ 4-layer Transformer

    (Alice: 6-layer Transformer)

    ‣ BPE subword model (30k) for each language

    (Alice: joint 32k)

    ‣ Other parameter / training strategy differences

    ‣ …

  71. Alice & Bob MT model difference
    ‣ This time, they happened not to be very different

    ‣What if the difference is very large?
    ‣ Model architecture difference

    ‣ Available data size

    ‣ Available computational resources

    ‣ → Even if the attack accuracy is good within Bob’s data,

    it might perform very badly on Alice’s data

    (and in a real scenario Bob would not know that)

  72. Differences from the previous work [Shokri+ 2017]
    ‣ Model Training

    ‣ Bob does not have access to the training API used for Alice model

    ‣ Attacker Data

    ‣ Bob has a real subset of Alice data

  73–74. Attack Classifier for Membership Inference
    ‣ Binary Classification

    ‣ “IN” or “OUT” of the model training data?

    ‣ Features

    ‣ Modified 1-4 gram precisions

    ‣ Sentence-level BLEU scores

    ‣ Later: MT model score - extra information for the attacker

    ‣ Intuition: if the output is a “good” translation
    (i.e., similar to the reference translation),
    the model might have seen it at training time and memorized it
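
    A hedged sketch of such an attack classifier, using sacrebleu for sentence-level BLEU and the modified n-gram precisions, and scikit-learn’s DecisionTreeClassifier (the classifier type reported in the results); the library choices and the data layout are assumptions for illustration.

```python
# Hedged sketch of Bob's attack classifier: BLEU-based features of Alice's
# translation against Bob's reference, fed to a decision tree predicting IN/OUT.
import numpy as np
import sacrebleu
from sklearn.tree import DecisionTreeClassifier

def features(hypothesis, reference):
    bleu = sacrebleu.sentence_bleu(hypothesis, [reference])
    # 4 modified n-gram precisions + sentence-level BLEU score
    return list(bleu.precisions) + [bleu.score]

def train_attack_classifier(shadow_examples):
    """shadow_examples: iterable of (hypothesis, reference, in_or_out) triples
    from Bob's own shadow MT models, where membership is known."""
    X = np.array([features(h, r) for h, r, _ in shadow_examples])
    y = np.array([label for _, _, label in shadow_examples])
    return DecisionTreeClassifier(max_depth=5).fit(X, y)

def infer_membership(clf, probe_translations):
    """probe_translations: (Alice's translation, reference) pairs for the probes."""
    X = np.array([features(h, r) for h, r in probe_translations])
    return clf.predict(X)   # 1 = IN, 0 = OUT (Bob's guess)
```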

  75. Results: the attacks were not successful …
    ‣ Around 50%: the same as chance

    ‣ BLEU and n-gram precision: not enough information to distinguish

    ‣ Using the MT model score did not help either

    Attack Accuracy of Different Probes
    Alice: 50.4 | Bob:train: 51.4 | Bob:valid: 51.2 | Bob:test: 51.1

    * Accuracy for a Decision Tree classifier.
    Bob tried several other types of classifiers, but the result trends were the same.
    Accuracy is low even for the classifier’s in-sample (training) data
    → overfitting is not the problem

  76. Results: Out-of-domain (OOD) Corpora
    ‣ Whether the domain was in the MT model training data or not

    ‣ Assumption: the model will not translate OOD sentences well

    ‣ → Much better results with OOD data

    Attack Accuracy: In-domain Corpora
    ParaCrawl: 50.3 | CommonCrawl: 51.1 | Europarl: 49.7 | News: 50.7 | Rapid: 50.0

    Attack Accuracy: Out-of-domain Corpora
    EMEA: 67.2 | Koran: 94.1 | Subtitles: 80.2 | TED: 67.1

  77. Results: Out-of-vocabulary (OOV) samples
    ‣ Subsets of probes containing OOV words

    ‣ OOV in source (7.4%), in reference (3.2%), in both (1.9%)

    ‣ Assumption: the model will not translate sentences with OOV words well

    ‣ → Like the OOD cases, much better results than on the entire probe set

    Attack Accuracy of OOV subsets
    All: 50.4 | OOV in src: 73.9 | OOV in ref: 74.1 | OOV in both: 68.0
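
    A minimal sketch of how the OOV probe subsets could be selected, assuming whitespace tokenization and using the attacker’s copy of the training data as the vocabulary reference; all details are illustrative.

```python
# Hedged sketch: mark a probe as "OOV in source / reference" if it contains a
# token never seen on that side of the training data used as reference.
def build_vocab(sentences):
    return {tok for sent in sentences for tok in sent.split()}

def oov_subsets(probes, src_train_sents, ref_train_sents):
    """probes: list of (source, reference) sentence pairs."""
    src_vocab = build_vocab(src_train_sents)
    ref_vocab = build_vocab(ref_train_sents)
    subsets = {"oov_in_src": [], "oov_in_ref": [], "oov_in_both": []}
    for src, ref in probes:
        src_oov = any(tok not in src_vocab for tok in src.split())
        ref_oov = any(tok not in ref_vocab for tok in ref.split())
        if src_oov:
            subsets["oov_in_src"].append((src, ref))
        if ref_oov:
            subsets["oov_in_ref"].append((src, ref))
        if src_oov and ref_oov:
            subsets["oov_in_both"].append((src, ref))
    return subsets
```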

  78. Why was the attack not successful for seq2seq?
    ‣ Why successful with “flat classification”, but not with seq2seq?

    ‣ One possible reason: the difference in model output spaces

    ‣ “Fixed set of labels” vs. “arbitrary-length sequence”:

    the latter is far more complex

    ‣ In the “flat classification” case the attack succeeded because

    the attacker can exploit differences in the model’s output distribution

    ‣ seq2seq: how can we quantify the uncertainty of the model or the quality of the output?

    ‣ OOD and OOV: more promising results

    ‣ Harder for the target model to produce a high-quality translation

    → more distinguishable

  79. Further attacks & protections: Arms Race (work in progress)
    ‣ Multiple API attack

    ‣ Modify the sentence (e.g., drop / add a word),

    translate the same sentence multiple times, and observe the differences

    ‣ “Watermark” sentences

    ‣ Add characteristic samples to make membership more distinguishable

    ‣ If Bob has a better chance → protection by Alice

    ‣ Subsample data for training

    ‣ Regularization

    ‣ …

  80. Summary
    ‣ Privacy in Machine Learning

    ‣ Membership Inference Problem

    ‣ “Was this in the model’s training data?”

    ‣ Attacker creates models to mimic the target blackbox model

    ‣ Empirical Results

    ‣ Multi-class Classification: attack successful

    Exploits output distribution difference

    ‣ Sequence Generation: attack not successful (so far) 

    More complex output space
    More at arxiv.org/abs/1904.05506