Membership Inference Attacks on Sequence-to-Sequence Models
A Case in Privacy-preserving Neural Machine Translation

Paper: https://arxiv.org/abs/1904.05506

@ CLSP Seminar
Center for Language and Speech Processing, Johns Hopkins University
https://www.clsp.jhu.edu/events/aswin-subramanian/?instance_id=2832

Sorami Hisamoto

May 03, 2019

Transcript

  1. Membership Inference Attacks on Sequence-to-Sequence Models: A Case in Privacy-preserving Neural Machine Translation. May 3, 2019. Sorami Hisamoto. Joint work with Kevin Duh & Matt Post. arxiv.org/abs/1904.05506
  2. Summary ‣ Privacy in Machine Learning ‣ Membership Inference Problem: "Was this in the model's training data?" ‣ The attacker creates models to mimic the target blackbox model ‣ Empirical Results ‣ Multi-class classification: attack successful (exploits differences in the output distribution) ‣ Sequence generation: attack not successful (so far), due to the more complex output space
  3. Self Introduction: Sorami Hisamoto ‣ Visiting Researcher, Oct 2018 - Jun 2019 ‣ Before: NAIST, Japan ‣ Studied under Kevin Duh & Yuji Matsumoto ‣ Word representations and dependency parsing (2012-2014) ‣ Now: WAP Tokushima AI & NLP Lab. ‣ NLP applications for enterprise services ‣ Morphological analysis: Sudachi →
  4. Sudachi: A Japanese Tokenizer for Business ‣ Determining "words" in a Japanese sentence is difficult! ‣ Dictionary: 3 million vocabulary entries, constantly updated ‣ Code on GitHub: Java, Python, Elasticsearch plugin ‣ A paper at LREC 2018
  5. Privacy is more important than ever! ‣ Increasingly important in today's society ‣ More data to collect ‣ Data is useful ‣ Data → used for Machine Learning (ML) … ‣ Increasing interest in the research community (image: irasutoya.com)
  6. Privacy & Natural Language Processing ‣ Some work exists, though not much yet: ‣ "Towards Robust and Privacy-preserving Text Representations", Yitong Li, Timothy Baldwin, Trevor Cohn. ACL 2018 (short) ‣ "Privacy-preserving Neural Representations of Text", Maximin Coavoux, Shashi Narayan, Shay Cohen. EMNLP 2018 ‣ "Adversarial Removal of Demographic Attributes from Text Data", Yanai Elazar, Yoav Goldberg. EMNLP 2018
  7. Different kinds of problems in ML privacy ‣ Model Inversion ‣ Uses the model's output on a hidden input to infer something about that input ‣ Differential Privacy ‣ Will the model behave differently if a particular data point is removed from / added to the training data? ‣ Membership Inference →
  8. "Was this in the training data?" ‣ Given a blackbox machine learning model, can you guess whether a data sample was in its training data? (*Blackbox: no information about model details; only access to an API to send input & receive results) ‣ [Shokri+ 2017] "Membership Inference Attacks against Machine Learning Models" (IEEE Symposium on Security and Privacy) ‣ Important in real-world situations ‣ e.g., "ML as a Service" offerings from Google, Amazon, or Microsoft … ‣ e.g., private information: medical records, location, purchase history, … ‣ "Trust, but verify" (Доверяй, но проверяй)
  9. Following the tradition in the security literature … ‣ Alice: Defender (e.g., service provider) ‣ Bob: Attacker (e.g., service user)
  10. Membership Inference Problem (diagram): a service provider ("Machine Learning as a Service") trains a blackbox model from its training data through a training API
  11. Membership Inference Problem (diagram, continued): a user holds private data
  12. Membership Inference Problem (diagram, continued): the user can send data to the model through a prediction API
  13. Membership Inference Problem (diagram, continued): the prediction API returns a result to the user
  14. Membership Inference Problem (diagram, continued): is the user's private data in the model's training set?
  15. How can Bob "attack" Alice's model? ‣ Shadow models to mimic the target model (diagram: the service provider trains the target model on its training data via the ML-as-a-Service training API)
  16. How can Bob "attack" Alice's model? ‣ The attacker prepares several shadow datasets (Shadow Set 1, 2, 3)
  17. How can Bob "attack" Alice's model? ‣ Assumption: the attacker has access to the same training API (or knows the target model's details)
  18. How can Bob "attack" Alice's model? ‣ Each shadow set is used to train a shadow model (Shadow Model 1, 2, 3)
  19. How can Bob "attack" Alice's model? ‣ How to prepare these shadow datasets? → explained later
  20. Shadow models to train an "in or out" classifier ‣ Bob knows what was "in" or "out" of his shadow models' training sets (diagram: a shadow model hosted on ML as a Service)
  21. Shadow models to train an "in or out" classifier ‣ The shadow training data is labeled IN
  22. Shadow models to train an "in or out" classifier ‣ Some other data, not used for training, is labeled OUT
  23. Shadow models to train an "in or out" classifier ‣ Both IN and OUT samples are sent through the prediction API to obtain results
  24. Shadow models to train an "in or out" classifier ‣ The prediction results, together with their known IN/OUT labels, are used to train a binary classifier for membership inference (sketched below)
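A minimal sketch of the shadow-model pipeline on slides 15-24, assuming hypothetical stand-ins `train_model` and `query_prediction_api` for the ML-as-a-Service training and prediction APIs; this is an illustration of the idea, not the attack code from [Shokri+ 2017].

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_attack_training_set(shadow_splits, train_model, query_prediction_api):
    """Each shadow split is (in_x, in_y, out_x, out_y); the IN data trains the shadow model."""
    features, membership = [], []
    for in_x, in_y, out_x, out_y in shadow_splits:
        shadow_model = train_model(in_x, in_y)               # Bob trains one shadow model
        for x, is_member in ((in_x, 1), (out_x, 0)):
            probs = query_prediction_api(shadow_model, x)    # class-probability vectors
            features.append(probs)
            membership.append(np.full(len(probs), is_member))
    return np.vstack(features), np.concatenate(membership)

def train_attack_classifier(features, membership):
    # Binary "in or out" classifier trained on shadow-model outputs.
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(features, membership)
    return clf
```

([Shokri+ 2017] actually train one attack model per output class; a single classifier is used here only to keep the sketch short.)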
  25. Synthesizing data for shadow models ‣ 1. Model-based synthesis ‣ Uses the target model itself ‣ "A high-confidence result is probably similar to the target training data" ‣ 2. Statistics-based synthesis ‣ Uses information about the population from which the target data was drawn ‣ e.g., prior knowledge of the marginal distributions of features ‣ 3. Noisy real data ‣ The attacker has data similar to the target training data ‣ which can be treated as a "noisy" version of it
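A simplified sketch of option 1 (model-based synthesis), in the spirit of the hill-climbing procedure of [Shokri+ 2017]; `query_target_api`, `random_record`, and `perturb` are hypothetical helpers, and the real algorithm has more machinery (proposal schedules, rejection limits).

```python
def synthesize_record(target_class, query_target_api, random_record, perturb,
                      confidence_threshold=0.9, max_iters=1000):
    """Hill-climb on the target model's confidence for `target_class`: a record the
    blackbox classifies with high confidence is assumed to resemble its training data."""
    x = random_record()
    best_conf = query_target_api(x)[target_class]
    for _ in range(max_iters):
        candidate = perturb(x)                                 # randomly change a few features
        conf = query_target_api(candidate)[target_class]
        if conf >= best_conf:                                  # keep non-worsening changes
            x, best_conf = candidate, conf
        if best_conf >= confidence_threshold:
            return x                                           # accept as a "member-like" record
    return None                                                # synthesis failed for this class
```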
  26. Experimental Setup: Target Models ‣ Google Prediction API ‣ No configuration parameters ‣ Amazon ML ‣ A few meta-parameters (max. number of training data passes, regularization amount) ‣ Local Neural Networks
  27. Experimental Setup: Data ‣ Multi-class classification problems
      Dataset | Description | Target model training set size | Number of shadow models
      CIFAR-10 / CIFAR-100 | Image recognition | 2.5k~15k / 5k~30k | 100
      Purchases | Shopping history | 10,000 | 20
      Locations | Foursquare check-ins | 1,200 | 60
      Hospital Stays | Inpatient stays in facilities | 10,000 | 10
      UCI Adult | Census income | 10,000 | 20
      MNIST | Handwritten digits | 10,000 | 50
  28. Experimental Setup: Accuracy ‣ Membership inference is binary classification: is this sample "IN" or "OUT" of the training set? ‣ Accuracy: both sides have the same size → baseline (random) is 0.5 ‣ Precision: the fraction of records inferred as members of the training dataset that are indeed members ‣ Recall: the fraction of training records that the attacker correctly infers as members
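Written out as formulas, with $M$ the set of true members and $\hat{M}$ the set of records the attacker labels as members:

```latex
\mathrm{Precision} = \frac{|\hat{M} \cap M|}{|\hat{M}|}, \qquad
\mathrm{Recall} = \frac{|\hat{M} \cap M|}{|M|}, \qquad
\mathrm{Accuracy} = \frac{\text{correct IN/OUT decisions}}{\text{total samples}}
\; (= 0.5 \text{ for random guessing on balanced data})
```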
  29. Results: attacks successful! ‣ (Figures from [Shokri+ 2017]: training set size vs. attack precision for CIFAR-10 and CIFAR-100) ‣ Precision is well above the 0.5 baseline (random guess); recall was almost 1.0 for both datasets
  30. Attack precision across datasets and overfitting ‣ (Figure from [Shokri+ 2017]: target model accuracy (Google models) vs. attack precision) ‣ Attack precision is mostly > 0.5
  31. Attack precision across datasets and overfitting ‣ (Same figure) ‣ A large train/test accuracy gap indicates overfitting
  32. Why do the attacks work? ‣ Factors affecting leakage: ‣ Overfitting ‣ Diversity of the training data ‣ Model type ‣ The attack exploits the output distribution over class labels returned by the target model ‣ (Figure from [Shokri+ 2017]: model train/test accuracy gap vs. attack precision)
  33. Distributions differ for samples in / out of the training data ‣ (Figures from [Shokri+ 2017]: prediction accuracy and prediction uncertainty)
  34. Mitigation of the attacks (sketched below) ‣ Restrict the prediction output to the top k classes ‣ Coarsen the precision of the prediction output ‣ Increase the entropy of the prediction output ‣ Regularization
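A minimal sketch of the first three mitigations applied to a class-probability vector before it is returned to the caller; the function name and default values are illustrative only.

```python
import numpy as np

def mitigate(probs, top_k=3, temperature=2.0, decimals=2):
    """Post-process a class-probability vector: top-k restriction, entropy increase,
    and coarsened precision (the reported values may no longer sum exactly to 1)."""
    probs = np.asarray(probs, dtype=float)
    # 1. Restrict the output to the top-k classes and renormalize.
    kept = np.argsort(probs)[-top_k:]
    restricted = np.zeros_like(probs)
    restricted[kept] = probs[kept]
    restricted /= restricted.sum()
    # 2. Increase entropy by softening with a temperature > 1.
    softened = restricted ** (1.0 / temperature)
    softened /= softened.sum()
    # 3. Coarsen precision by rounding the reported probabilities.
    return np.round(softened, decimals)
```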
  35. How about more complex problems …? ‣ The membership inference attacks were successful ‣ But that was on "flat classification" models ‣ Binary or multi-class classification ‣ What about more complex problems? ‣ e.g., structured prediction / generation ‣ Sequence-to-sequence models →
  36. Will attacks work on seq2seq models? ‣ Previous case: "flat classification" ‣ Output space: a fixed set of labels ‣ → Sequence generation ‣ Output space: a sequence of words, with length undetermined a priori ‣ e.g., machine translation, speech synthesis, video captioning, text summarization
  37. Machine Translation (MT) as an example ‣ "Given black-box access to an MT model, is it possible to determine whether a particular sentence pair was in the training set for that model?"
  38. Possible scenarios ‣ Bitext data provider ‣ Provides data under license restrictions → check that services comply with the license ‣ MT conference organizer ‣ Annual bakeoff → check that participants are following the rules ‣ "MT as a Service" provider ‣ Provides customized engines trained on user data → may want to guarantee that a) user data is not used for other users' engines, and b) if it is used for others, privacy will not be leaked
  39. Carol: a neutral judge, for evaluation purposes ‣ Alice: Defender (e.g., service provider) ‣ Bob: Attacker (e.g., service user) ‣ Carol: Judge
  40. Carol: a neutral judge, for evaluation purposes ‣ Carol does not exist in real scenarios
  41. Problem Overview ‣ 1. Carol splits the data into a) an Alice set, b) a Bob set, and c) an evaluation set
  42. Problem Overview ‣ 2. Alice trains her MT model on the Alice set
  43. Problem Overview ‣ 3. Bob uses his data and the translation API of Alice's model in whatever way he wants to attack Alice's model
  44. Problem Overview ‣ 4. Carol receives Bob's attack results and evaluates them
  45. Splitting Data ‣ Probes: sentence samples for evaluation ‣ IN / OUT probes: in / not in the target model's training data ‣ (diagram: Corpus 1, Corpus 2, Corpus 3; OUT probes, IN probes, and OOD probes)
  46. Splitting Data ‣ (build slide; same text as slide 45)
  47. Splitting Data ‣ (the diagram adds the label: Alice model training data)
  48. Splitting Data ‣ (the diagram adds the label: Bob data, which contains no Corpus 2 and none of Alice's IN probes; Corpus 2 plays the role of an MT provider's in-house crawled data)
  49. Splitting Data ‣ (the diagram adds the label: out-of-domain corpus, not in Alice's model) ‣ A sketch of this kind of split follows below
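A rough sketch of the split for a single in-domain corpus, assuming each corpus is a list of (source, target) sentence pairs and 5,000-pair probe sets as used later in the talk; `split_corpus` is an illustrative name, not Carol's actual script.

```python
import random

def split_corpus(pairs, n_probes=5000, seed=0):
    """Split one parallel corpus into IN probes, OUT probes, and the remainder.
    IN probes stay inside Alice's training data; OUT probes never enter it."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    in_probes = shuffled[:n_probes]
    out_probes = shuffled[n_probes:2 * n_probes]
    rest = shuffled[2 * n_probes:]
    alice_training = rest + in_probes      # OUT probes are excluded from Alice's training set
    return alice_training, in_probes, out_probes
```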
  50. Experimental Setup: Characters ‣ Defender (Alice): Matt ‣ Attacker (Bob): Sorami ‣ Judge (Carol): Kevin ‣ They do not know each other's data or MT model details (architecture, training strategy, etc.)
  51. Experimental Setup: Data and Splits ‣ Data from WMT 2018 ‣ Probes: 5,000 sentence pairs per corpus ‣ In-domain corpora, each split into IN and OUT probes: Common Crawl, Europarl, News, Rapid, ParaCrawl ‣ Out-of-domain corpora: EMEA, Subtitles, Koran, TED
  52. Experimental Setup: Data and Splits ‣ (the diagram adds the label: Alice data)
  53. Experimental Setup: Data and Splits ‣ (the diagram adds the label: Bob data)
  54. Experimental Setup: Evaluation Protocol ‣ 1. Carol splits the data and gives it to Alice and Bob ‣ 2. Alice trains her MT model ‣ 3. Bob uses his data in whatever way he likes to create a classifier ‣ 4. Carol gives Bob the translations of the probes produced by Alice's model ‣ 5. Bob infers their membership and sends his results to Carol ‣ 6. Carol evaluates the attack accuracy (*Accuracy: the percentage of probes whose classification result is correct; see the sketch below) ‣ We will release the data (split sets, translations by Matt's model) so people can try their own attack methods
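A tiny sketch of step 6 from Carol's side, assuming Bob returns one "IN"/"OUT" guess per probe; the function name is illustrative.

```python
def evaluate_attack(bob_guesses, true_membership):
    """Attack accuracy: percentage of probes whose IN/OUT guess is correct.
    Both arguments are parallel lists of "IN"/"OUT" labels."""
    assert len(bob_guesses) == len(true_membership)
    correct = sum(g == t for g, t in zip(bob_guesses, true_membership))
    return 100.0 * correct / len(true_membership)

# With balanced probes, random guessing lands around 50%.
print(evaluate_attack(["IN", "OUT", "IN", "OUT"], ["IN", "IN", "OUT", "OUT"]))  # 50.0
```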
  55. Alice's MT architecture (by Matt) ‣ BLEU: 42.6 ‣ 6-layer Transformer ‣ Joint BPE subword model (32k) ‣ Dual conditional cross-entropy filtering for ParaCrawl ‣ …
  56. Attack: shadow models & data splits ‣ This time, Bob splits his data to create 10 shadow models ‣ Blue: training data for a shadow model (the smaller box = IN probes) ‣ Green: OUT probes ‣ (diagram: splits for the shadow models; train, valid, and test sets for the classifier that infers membership; probes for the shadow models, e.g., 1+ & 1-) ‣ IN / OUT probes are flipped to make the data balanced ‣ A rough sketch of such a split follows below
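A rough sketch of one way Bob's shadow splits could be built, under the simplifying assumptions that Bob's data is a single list of sentence pairs and that each shadow model gets its own IN/OUT probe sets; the names and the exact fold layout are illustrative, not the construction used in the paper.

```python
import random

def make_shadow_splits(bob_pairs, n_shadow=10, n_probes=5000, seed=0):
    """Carve Bob's data into `n_shadow` folds; each fold yields one shadow model's
    training data plus IN/OUT probes whose membership Bob knows by construction."""
    rng = random.Random(seed)
    data = list(bob_pairs)
    rng.shuffle(data)
    fold_size = len(data) // n_shadow
    splits = []
    for i in range(n_shadow):
        fold = data[i * fold_size:(i + 1) * fold_size]
        out_probes = fold[:n_probes]           # held out of this shadow model (OUT)
        shadow_train = fold[n_probes:]         # trains the shadow model
        in_probes = shadow_train[:n_probes]    # known members of the shadow model (IN)
        splits.append({"train": shadow_train, "in": in_probes, "out": out_probes})
    return splits
```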
  57. Bob's MT architecture (by Sorami) ‣ BLEU: 38.06±0.2 (Alice: 42.6) ‣ 4-layer Transformer (Alice: 6-layer Transformer) ‣ BPE subword model (30k) for each language (Alice: joint 32k) ‣ Other parameter / training strategy differences ‣ …
  58. Differences between Alice's and Bob's MT models ‣ This time, they happened not to be so different ‣ What if the difference were very large? ‣ Model architecture differences ‣ Available data size ‣ Available computational resources ‣ → Even if the attack accuracy is good within Bob's data, it might perform very badly on Alice's data (and in a real scenario Bob would not know that)
  59. Differences from the previous work [Shokri+ 2017] ‣ Model training: Bob does not have access to the training API used for Alice's model ‣ Attacker data: Bob has a real subset of Alice's data
  60. Attack classifier for membership inference ‣ Binary classification: "IN" or "OUT" of the model's training data? ‣ Features ‣ Modified 1-4 gram precisions ‣ Sentence-level BLEU scores ‣ Later: the MT model score, as extra information for the attacker
  61. Attack classifier for membership inference ‣ Intuition: if the output is a "good" translation (i.e., similar to the reference translation), the model might have seen the pair at training time and memorized it ‣ A sketch of this classifier follows below
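A minimal sketch of an attack classifier over the features named on slides 60-61, computing sentence-level BLEU and modified 1-4 gram precisions with sacrebleu and training a decision tree; it illustrates the feature set, not the exact implementation used in the paper.

```python
import numpy as np
import sacrebleu
from sklearn.tree import DecisionTreeClassifier

def probe_features(hypothesis, reference):
    """Features for one probe: sentence BLEU plus the modified 1-4 gram precisions
    of Alice's translation against the reference."""
    bleu = sacrebleu.sentence_bleu(hypothesis, [reference])
    return [bleu.score] + list(bleu.precisions)

def train_membership_classifier(labeled_probes):
    """`labeled_probes` is a list of (translation, reference, label) triples with
    label 1 = IN, 0 = OUT, built from Bob's shadow models where membership is known."""
    X = np.array([probe_features(h, r) for h, r, _ in labeled_probes])
    y = np.array([label for _, _, label in labeled_probes])
    clf = DecisionTreeClassifier(max_depth=5)
    clf.fit(X, y)
    return clf
```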
  62. Results: the attacks were not successful … ‣ Accuracy around 50%: the same as chance ‣ BLEU and n-gram precisions: not enough information to distinguish membership ‣ Using the MT model score did not help either ‣ Attack accuracy on different probe sets: Alice 50.4, Bob:train 51.4, Bob:valid 51.2, Bob:test 51.1 (*accuracy for a decision tree classifier; Bob tried several other classifier types, but the trends were the same) ‣ Accuracy is low even on the classifier's own in-sample (training) data → overfitting is not the problem
  63. Results: out-of-domain (OOD) corpora ‣ Whether the domain was in the MT model's training data or not ‣ Assumption: the model will not translate OOD sentences well ‣ → Much better results with OOD data ‣ Attack accuracy, in-domain corpora: ParaCrawl 50.3, CommonCrawl 51.1, Europarl 49.7, News 50.7, Rapid 50.0 ‣ Attack accuracy, out-of-domain corpora: EMEA 67.2, Koran 94.1, Subtitles 80.2, TED 67.1
  64. Results: out-of-vocabulary (OOV) samples ‣ Subsets of probes containing OOV words ‣ OOV in the source (7.4%), in the reference (3.2%), in both (1.9%) ‣ Assumption: the model will not translate sentences with OOV words well ‣ → As in the OOD case, much better results than on the entire probe set ‣ Attack accuracy of OOV subsets: All 50.4, OOV in src 73.9, OOV in ref 74.1, OOV in both 68.0
  65. Why was it not successful for seq2seq? ‣ Why did the attack succeed on "flat classification" but not on seq2seq? ‣ One possible reason: the difference in model output space ‣ "A fixed set of labels" vs. "an arbitrary-length sequence": the latter is far more complex ‣ In the "flat classification" case the attack succeeded because the attacker exploits differences in the model's output distribution ‣ seq2seq: how can we quantify the uncertainty of the model or the quality of the output? ‣ OOD and OOV: more promising results ‣ It is harder for the target model to produce high-quality translations there → more distinguishable
  66. Further attacks & protections: an arms race (work in progress) ‣ Multiple-API attack: modify the sentence (e.g., drop / add a word), translate it multiple times, and observe the differences (see the sketch below) ‣ "Watermark" sentences: add characteristic samples to make the data more distinguishable ‣ If Bob gains the upper hand → protection by Alice ‣ Subsample the data for training ‣ Regularization ‣ …
  67. Summary ‣ Privacy in Machine Learning ‣ Membership Inference Problem: "Was this in the model's training data?" ‣ The attacker creates models to mimic the target blackbox model ‣ Empirical Results ‣ Multi-class classification: attack successful (exploits differences in the output distribution) ‣ Sequence generation: attack not successful (so far), due to the more complex output space ‣ More at arxiv.org/abs/1904.05506