Slide 1

Slide 1 text

Membership Inference Attacks on Sequence-to-Sequence Models: A Case in Privacy-preserving Neural Machine Translation
May 3, 2019
Sorami Hisamoto
Joint work with Kevin Duh & Matt Post
arxiv.org/abs/1904.05506

Slide 2

Slide 2 text

Summary
‣ Privacy in Machine Learning
‣ Membership Inference Problem
‣ “Was this in the model’s training data?”
‣ Attacker creates models to mimic the target blackbox model
‣ Empirical Results
‣ Multi-class Classification: attack successful (exploits output distribution differences)
‣ Sequence Generation: attack not successful (so far; more complex output space)
!2

Slide 3

Slide 3 text

Self Introduction: Sorami Hisamoto ‣ Visiting Researcher Oct 2018 - Jun 2019 ‣ Before: NAIST, Japan ‣ Studied under Kevin Duh & Yuji Matsumoto ‣ Word Representations and Dependency Parsing (2012-2014) ‣ Now: WAP Tokushima AI & NLP Lab. ‣ NLP applications to enterprise services ‣ Morphological analysis: Sudachi → !3

Slide 4

Slide 4 text

‣ Determining “words” in a Japanese sentence is difficult! ‣ Dictionary: 3 million vocabulary entries, constantly updated ‣ Code on GitHub: Java, Python, Elasticsearch plugin ‣ A paper at LREC 2018 Sudachi: A Japanese Tokenizer for Business !4

Slide 5

Slide 5 text

Privacy & Machine Learning

Slide 6

Slide 6 text

!6 nytimes.com/interactive/2019/opinion/internet-privacy-project.html

Slide 7

Slide 7 text

Privacy is more important than ever! ‣ Increasingly important in today’s society ‣ More data to collect ‣ Data is more useful than ever ‣ Data → used for Machine Learning (ML) … ‣ Increasing interest in the research communities !7 irasutoya.com

Slide 8

Slide 8 text

NeurIPS2018

Slide 9

Slide 9 text

ICML2019

Slide 10

Slide 10 text

IJCAI2019

Slide 11

Slide 11 text

ISSP2019

Slide 12

Slide 12 text

Privacy & Natural Language Processing ‣ Some work so far, but not much yet: ‣ “Towards Robust and Privacy-preserving Text Representations”
 Yitong Li, Timothy Baldwin, Trevor Cohn. ACL2018 (short) ‣ “Privacy-preserving Neural Representations of Text”
 Maximin Coavoux, Shashi Narayan, Shay Cohen. EMNLP2018 ‣ “Adversarial Removal of Demographic Attributes from Text Data”
 Yanai Elazar, Yoav Goldberg. EMNLP2018 !12

Slide 13

Slide 13 text

Different kinds of problems in ML privacy ‣Model Inversion ‣ Uses the model’s output on a hidden input to infer something about that input ‣Differential Privacy ‣ Will the model behave differently if a particular data point is removed from / added to the training data? ‣Membership Inference → !13

Slide 14

Slide 14 text

Membership Inference Attacks [Shokri+ 2017]

Slide 15

Slide 15 text

“Was this in the training data?” ‣ Given a blackbox machine learning model, 
 can you guess if a data sample was in the training data?
 *Blackbox: no information about the model details; only access to an API to send input and receive results ‣ [Shokri+ 2017] “Membership Inference Attacks against Machine Learning Models” (IEEE Symposium on Security and Privacy) ‣ Important in real-world situations ‣ e.g., “ML as a Service” like Google, Amazon, or MS … ‣ e.g., Private information: medical records, location, purchase history, … ‣ “Trust, but verify” (Доверяй, но проверяй) !15

Slide 16

Slide 16 text

Following the tradition in the security literature … !16 Alice = Defender (e.g., service provider), Bob = Attacker (e.g., service user)

Slide 17

Slide 17 text

Membership Inference Problem !17 Service Provider Training Data

Slide 18

Slide 18 text

Membership Inference Problem !17 Service Provider Training API Machine Learning as a Service Training Data Blackbox Training

Slide 19

Slide 19 text

Membership Inference Problem !17 Service Provider Training API Machine Learning as a Service Training Data Blackbox Training Model

Slide 20

Slide 20 text

Membership Inference Problem !17 Service Provider Training API Machine Learning as a Service User Training Data Blackbox Training Model Private Data

Slide 21

Slide 21 text

Membership Inference Problem !17 Service Provider Training API Machine Learning as a Service User Training Data Blackbox Training Model Private Data Prediction API

Slide 22

Slide 22 text

Membership Inference Problem !17 Service Provider Training API Machine Learning as a Service User Training Data Blackbox Training Model Private Data Result Prediction API

Slide 23

Slide 23 text

Membership Inference Problem !17 [Diagram: the service provider trains a blackbox model on its training data through an ML-as-a-Service training API; the user sends private data to the prediction API and receives a result. Question: is the user’s private data in the model’s training set?]

Slide 24

Slide 24 text

How can Bob “attack” the Alice model? !18 Service Provider Training API ML as a Service Training Data Target Model ‣ Shadow models to mimic the target model

Slide 25

Slide 25 text

How can Bob “attack” the Alice model? !18 Service Provider Training API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3

Slide 26

Slide 26 text

How can Bob “attack” the Alice model? !18 Service Provider Training API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3 Assumption: The attacker has access to the same training API (or knows the target model details)

Slide 27

Slide 27 text

How can Bob “attack” the Alice model? !18 Service Provider Training API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3 Shadow Model 1 Shadow Model 2 Shadow Model 3 Assumption: The attacker has access to the same training API (or knows the target model details)

Slide 28

Slide 28 text

How can Bob “attack” the Alice model? !18 Service Provider Training API ML as a Service Training Data Target Model Attacker ‣ Shadow models to mimic the target model Shadow Set 1 Shadow Set 2 Shadow Set 3 Shadow Model 1 Shadow Model 2 Shadow Model 3 How to prepare these data sets? → explained later
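To make the shadow-model idea concrete, here is a minimal sketch, assuming the attacker can call a training procedure roughly equivalent to the target’s training API; `train_model` and the other names are hypothetical, not from the original paper’s code.

```python
# Minimal sketch: train k shadow models on data the attacker controls,
# assuming access to (roughly) the same training API as the target model.
# All names here (train_model, make_shadow_sets) are hypothetical.
import random

def make_shadow_sets(attacker_data, k, set_size, seed=0):
    """Sample k shadow training sets from data the attacker has collected or synthesized."""
    rng = random.Random(seed)
    return [rng.sample(attacker_data, set_size) for _ in range(k)]

def train_shadow_models(shadow_sets, train_model):
    """train_model(data) mimics the blackbox training API used for the target model."""
    return [train_model(data) for data in shadow_sets]
```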

Slide 29

Slide 29 text

Shadow models to train an “in or out” classifier !19 ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Model

Slide 30

Slide 30 text

Shadow models to train an “in or out” classifier !19 ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model IN

Slide 31

Slide 31 text

Shadow models to train an “in or out” classifier !19 ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model IN Some Other Data OUT

Slide 32

Slide 32 text

Shadow models to train an “in or out” classifier !19 ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model Prediction API Result IN Some Other Data Prediction API Result OUT

Slide 33

Slide 33 text

Shadow models to train an “in or out” classifier !19 ‣ Bob knows what was “in or out” of his shadow model training set ML as a Service Shadow Training Data Shadow Model Prediction API Result IN Binary Classifier for Membership Inference Some Other Data Prediction API Result OUT
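A minimal sketch of how the shadow models turn into supervision for the membership classifier, assuming flat-classification models with a `predict_proba`-style API; the scikit-learn classifier is my choice for illustration ([Shokri+ 2017] actually trains one attack model per output class).

```python
# Sketch: label each shadow model's outputs as IN (1) or OUT (0) and train a
# binary attack classifier on the resulting prediction vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_attack_dataset(shadow_models, in_sets, out_sets):
    """in_sets[i] / out_sets[i]: records inside / outside shadow model i's training set."""
    X, y = [], []
    for model, ins, outs in zip(shadow_models, in_sets, out_sets):
        for rec in ins:
            X.append(model.predict_proba([rec])[0])  # output distribution for a member
            y.append(1)                              # 1 = IN
        for rec in outs:
            X.append(model.predict_proba([rec])[0])  # output distribution for a non-member
            y.append(0)                              # 0 = OUT
    return np.array(X), np.array(y)

# X, y = build_attack_dataset(shadow_models, in_sets, out_sets)
# attack_clf = LogisticRegression(max_iter=1000).fit(X, y)
# attack_clf.predict(target_model_outputs)  # membership guesses for new probes
```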

Slide 34

Slide 34 text

Synthesizing data for shadow models ‣1. Model-based synthesis ‣ Using the target model itself ‣ “high confidence result → probably similar to the target training data” ‣2. Statistics-based synthesis ‣ Information about the population from which the target data was drawn ‣ → e.g., prior knowledge of the marginal distributions of features ‣3. Noisy real data ‣ Attacker has data similar to target training data ‣ → can consider it as a “noisy” version !20
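As an illustration of option 1 (model-based synthesis), here is a rough sketch of the hill-climbing idea for binary feature vectors; the acceptance rule is simplified relative to the original algorithm, and `query_target` stands in for the target model’s prediction API.

```python
# Rough sketch of model-based synthesis: hill-climb on the target model's
# confidence until a record looks like the training distribution.
# query_target(x) is assumed to return a probability vector over classes.
import random

def randomize_features(x, k):
    """Flip k randomly chosen features (assumes binary feature vectors)."""
    x = list(x)
    for i in random.sample(range(len(x)), k):
        x[i] = 1 - x[i]
    return x

def synthesize(query_target, n_features, target_class,
               conf_threshold=0.9, max_iters=500, k=3):
    best = [random.randint(0, 1) for _ in range(n_features)]
    best_conf = query_target(best)[target_class]
    for _ in range(max_iters):
        cand = randomize_features(best, k)
        conf = query_target(cand)[target_class]
        if conf > best_conf:                 # keep improvements only
            best, best_conf = cand, conf
        if best_conf >= conf_threshold:      # high confidence -> plausible member
            return best
    return None                              # give up after max_iters
```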

Slide 35

Slide 35 text

Experimental Setup: Target Models ‣Google Prediction API ‣ No configuration parameters ‣Amazon ML ‣ Few meta-parameters (max. number of training data passes, regularization amount) ‣Local Neural Networks !21

Slide 36

Slide 36 text

Experimental Setup: Data !22 ‣Multi-class classification problems

Dataset | Description | Target Model Training Set Size | Number of Shadow Models
CIFAR-{10,100} | Image recognition | 10: 2.5k~15k; 100: 5k~30k | 100
Purchases | Shopping history | 10,000 | 20
Locations | Foursquare check-ins | 1,200 | 60
Hospital Stays | Inpatient stays in facilities | 10,000 | 10
UCI Adult | Census income | 10,000 | 20
MNIST | Handwritten digits | 10,000 | 50

Slide 37

Slide 37 text

Experimental Setup: Accuracy ‣ Membership Inference: binary classification ‣ Is this “IN” or “OUT” of the training set? ‣ Accuracy ‣ Same number of samples on both sides → baseline (random guess): 0.5 ‣ Precision: fraction of the records inferred as members of the training dataset that are indeed members ‣ Recall: fraction of the training records that the attacker correctly infers as members !23
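To make the three numbers concrete, a tiny sketch with toy labels (1 = IN, 0 = OUT); scikit-learn here is just a convenience, not part of the original setup.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]   # ground truth: 1 = IN training set, 0 = OUT
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]   # attacker's guesses

print(accuracy_score(y_true, y_pred))   # 0.75 (random baseline: 0.5 with balanced sets)
print(precision_score(y_true, y_pred))  # 0.75 (3 of 4 records predicted IN are truly IN)
print(recall_score(y_true, y_pred))     # 0.75 (3 of 4 true members were found)
```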

Slide 38

Slide 38 text

Results: Attacks successful! !24 [Figures from [Shokri+ 2017]: training set size vs. attack precision for CIFAR-10 and CIFAR-100] Precision is well above the 0.5 baseline (random guess); recall was almost 1.0 for both datasets

Slide 39

Slide 39 text

Attack Precision of Datasets and Overfitting !25 *Figures from [Shokri+ 2017] Target Model Accuracy (Google models) & Attack Precision Mostly > 0.5

Slide 40

Slide 40 text

Attack Precision of Datasets and Overfitting !25 *Figures from [Shokri+ 2017] Target Model Accuracy (Google models) & Attack Precision Large gap = Overfitting

Slide 41

Slide 41 text

Why do the attacks work? ‣ Factors affecting leakage ‣ Overfitting ‣ Diversity of Training Data ‣ Model Type ‣ The attack exploits the distribution over class labels returned by the target model !26 *Figure from [Shokri+ 2017]: model train/test accuracy gap & attack precision

Slide 42

Slide 42 text

Distributions differ for samples in / out of the training data !27 [Figures from [Shokri+ 2017]: prediction accuracy and prediction uncertainty distributions]
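One way to read “prediction uncertainty” is the normalized entropy of the output distribution; the sketch below is my illustration of that quantity, not code from the paper.

```python
import math

def prediction_uncertainty(probs):
    """Normalized entropy of a prediction vector: 0 = fully confident,
    1 = uniform over all classes. Members tend to get sharper (lower) values."""
    n = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)

print(prediction_uncertainty([0.97, 0.01, 0.01, 0.01]))  # low: looks like a training member
print(prediction_uncertainty([0.4, 0.3, 0.2, 0.1]))      # high: looks unseen
```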

Slide 43

Slide 43 text

Mitigation of the Attacks ‣ Restrict the prediction output to top k classes ‣ Coarsen precision of the prediction output ‣ Increase entropy of the prediction output ‣Regularization !28
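A hedged sketch of what the first three output-side mitigations could look like for a flat classifier (temperature smoothing to raise entropy, top-k restriction, rounding to coarsen precision); the parameter choices are arbitrary illustrations.

```python
# Sketch: perturb the prediction vector before returning it through the API.
import numpy as np

def defend_output(probs, top_k=3, decimals=2, temperature=2.0):
    p = np.asarray(probs, dtype=float)
    logits = np.log(p + 1e-12) / temperature       # 1) raise entropy (temperature > 1)
    p = np.exp(logits) / np.exp(logits).sum()
    top = np.argsort(p)[::-1][:top_k]              # 2) restrict output to top-k classes
    return {int(c): round(float(p[c]), decimals)   # 3) coarsen the reported probabilities
            for c in top}

print(defend_output([0.85, 0.05, 0.04, 0.03, 0.02, 0.01]))
# -> something like {0: 0.52, 1: 0.13, 2: 0.11}
```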

Slide 44

Slide 44 text

What about more complex problems …? ‣ Membership Inference Attacks were successful ‣ But that was on “flat classification” models ‣ Binary or multi-class classification ‣ How about more complex problems? ‣ e.g., structured prediction / generation ‣ Sequence-to-Sequence models → !29

Slide 45

Slide 45 text

Sequence-to-Sequence Models [Hisamoto+ 2019]

Slide 46

Slide 46 text

Will attacks work on seq2seq models? ‣ Previous case: “flat classification” ‣ Output space: fixed set of labels ‣ → Sequence generation ‣ Output space: sequence of words, length undetermined a priori ‣ e.g., Machine Translation, Speech Synthesis, Video Captioning, Text Summarization !31

Slide 47

Slide 47 text

Machine Translation (MT) as an example !32 “Given black-box access to an MT model, is it possible to determine whether a particular sentence pair was in the training set for that model?”

Slide 48

Slide 48 text

Possible scenarios ‣ Bitext data provider ‣ Provides data under license restrictions → wants to check license compliance in downstream services ‣ MT conference organizer ‣ Annual bakeoff → wants to check whether participants are following the rules ‣ “MT as a Service” provider ‣ Provides customized engines built with user data → may want to guarantee that a) user data is not used for other users’ engines, and b) if it is used for others, no privacy is leaked !33

Slide 49

Slide 49 text

Carol: Neutral judge, for evaluation purposes !34 Alice Bob Defender Attacker e.g., Service Provider e.g., Service User Carol Judge

Slide 50

Slide 50 text

Carol: Neutral judge, for evaluation purposes !34 Alice Bob Defender Attacker e.g., Service Provider e.g., Service User Carol Judge (does not exist in real scenarios)

Slide 51

Slide 51 text

Problem Overview !35 Carol Data Alice Bob

Slide 52

Slide 52 text

Problem Overview !35 Carol Data Alice Bob Evaluation set 1. Carol splits data into a) Alice set b) Bob set c) Evaluation set Alice set Bob set

Slide 53

Slide 53 text

Problem Overview !35 Carol Data Alice Bob Evaluation set Alice set Alice model Bob set 2. Alice trains her MT model

Slide 54

Slide 54 text

Problem Overview !35 Carol Data Alice Bob Evaluation set Alice set Alice model Bob set 3. Bob uses his data and the Alice model’s translation API in whatever way he wants to attack the Alice model

Slide 55

Slide 55 text

Problem Overview !35 Carol Data Alice Bob Evaluation set Alice set Alice model Bob set 4. Carol receives Bob’s attack results and evaluates them

Slide 56

Slide 56 text

Splitting Data !36 OUT Probes ‣ Probes: sentence samples for evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes

Slide 57

Slide 57 text

Splitting Data !36 OUT Probes ‣ Probes: sentence samples for evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes

Slide 58

Slide 58 text

Splitting Data !36 OUT Probes ‣ Probes: sentence samples for evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes Alice model training data

Slide 59

Slide 59 text

Splitting Data !36 OUT Probes ‣ Probes: sentence samples for evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes Alice model training data Bob data: no Corpus 2, no Alice IN Probes (Corpus 2 is like an MT provider’s in-house crawled data)

Slide 60

Slide 60 text

Splitting Data !36 OUT Probes ‣ Probes: sentence samples for evaluation ‣ {IN, OUT} Probes: {in, not in} the target model training data Corpus 1 Corpus 2 Corpus 3 OUT Probes IN Probes IN Probes OOD Probes Alice model training data Out-of-domain corpus (not in the Alice model) Bob data: no Corpus 2, no Alice IN Probes (Corpus 2 is like an MT provider’s in-house crawled data)

Slide 61

Slide 61 text

Experimental Setup: Characters !37 Alice Bob Defender Attacker Carol Judge

Slide 62

Slide 62 text

Experimental Setup: Characters !37 Defender (Alice): Matt, Attacker (Bob): Sorami, Judge (Carol): Kevin

Slide 63

Slide 63 text

Experimental Setup: Characters !37 Defender (Alice): Matt, Attacker (Bob): Sorami, Judge (Carol): Kevin. They don’t know each other’s data or MT model details (architecture, training strategy, etc.)

Slide 64

Slide 64 text

Experimental Setup: Data and Splits ‣ Data from WMT2018 ‣ Probes: 5,000 sentence pairs per corpus !38 OUT IN Common Crawl Out-of-domain OUT IN Europarl OUT IN News OUT IN Rapid OUT IN Para Crawl - EMEA - Subtitles - Koran - TED

Slide 65

Slide 65 text

Experimental Setup: Data and Splits ‣ Data from WMT2018 ‣ Probes: 5,000 sentence pairs per corpus !38 OUT IN Common Crawl Out-of-domain OUT IN Europarl OUT IN News OUT IN Rapid OUT IN Para Crawl - EMEA - Subtitles - Koran - TED Alice data

Slide 66

Slide 66 text

Experimental Setup: Data and Splits ‣ Data from WMT2018 ‣ Probes: 5,000 sentence pairs per corpus !38 OUT IN Common Crawl Out-of-domain OUT IN Europarl OUT IN News OUT IN Rapid OUT IN Para Crawl - EMEA - Subtitles - Koran - TED Alice data Bob data
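A rough sketch of how one corpus could be carved into Alice data, Bob data, and IN/OUT probes in the spirit of the splits above; the 5,000-pair probe size follows the setup, while the helper names and the even split of the remainder are my assumptions.

```python
# Sketch: carve IN/OUT probes and Alice/Bob portions out of one parallel corpus.
import random

def split_corpus(pairs, n_probe=5000, seed=13):
    """pairs: list of (source, target) sentence pairs from one corpus."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    in_probes  = pairs[:n_probe]              # will be included in Alice's training data
    out_probes = pairs[n_probe:2 * n_probe]   # held out of Alice's training data
    rest       = pairs[2 * n_probe:]
    alice_train = rest[: len(rest) // 2] + in_probes   # IN probes end up inside Alice's set
    bob_data    = rest[len(rest) // 2 :]               # Bob never sees Alice's IN probes
    return alice_train, bob_data, in_probes, out_probes
```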

Slide 67

Slide 67 text

Experimental Setup: Evaluation Protocol 1. Carol splits the data and gives it to Alice and Bob 2. Alice trains her MT model 3. Bob uses his data in whatever way he wants to create a classifier 4. Carol gives Bob the Alice model’s translations of the probes 5. Bob infers their membership and gives the results to Carol 6. Carol evaluates the attack accuracy *Accuracy: percentage of probes for which the classification is correct !39 We will release the data (split sets, translations by Matt’s model) so people can try their own attack methods

Slide 68

Slide 68 text

Alice MT architecture (by Matt) ‣ BLEU: 42.6 ‣ 6-layer Transformer ‣ Joint BPE subword model (32k) ‣ Dual conditional cross-entropy filtering for ParaCrawl ‣ … !40

Slide 69

Slide 69 text

Attack: Shadow model & data splits ‣ This time, Bob splits his data to create 10 shadow models ‣ Blue: training data for a shadow model (smaller box = IN Probes) ‣ Green: OUT Probes !41 [Diagram: splits for the shadow models; train / valid / test splits for the membership classifier; e.g., for shadow models 1+ & 1-, the IN / OUT Probes are flipped to make balanced data]
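A sketch of the probe-flipping trick described above: each probe set appears once as IN and once as OUT across a pair of shadow models, so the classifier’s training data stays balanced. The fold sizes and helper names are hypothetical.

```python
# Sketch: build paired shadow splits ("1+" / "1-") from Bob's data.
import random

def make_paired_shadow_splits(bob_pairs, n_shadow_pairs=5, n_probe=1000, seed=7):
    rng = random.Random(seed)
    bob_pairs = list(bob_pairs)
    rng.shuffle(bob_pairs)
    splits = []
    for i in range(n_shadow_pairs):
        chunk = bob_pairs[i::n_shadow_pairs]          # disjoint slice per shadow pair
        probes_a = chunk[:n_probe]
        probes_b = chunk[n_probe:2 * n_probe]
        base     = chunk[2 * n_probe:]
        # Shadow model "i+": probes_a are IN, probes_b are OUT.
        splits.append({"train": base + probes_a, "in": probes_a, "out": probes_b})
        # Shadow model "i-": flipped, so every probe is balanced across IN/OUT.
        splits.append({"train": base + probes_b, "in": probes_b, "out": probes_a})
    return splits
```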

Slide 70

Slide 70 text

Bob MT architecture (by Sorami) ‣ BLEU: 38.06±0.2
 (Alice: 42.6) ‣ 4-layer Transformer
 (Alice: 6-layer Transformer) ‣ BPE subword model (30k) for each language
 (Alice: joint 32k) ‣ Other parameter / training strategy difference ‣ … !42

Slide 71

Slide 71 text

Alice & Bob MT model difference ‣ This time, they happened to be not so different ‣ What if the difference is very large? ‣ Model architecture difference ‣ Available data size ‣ Available computational resources ‣ → Even if the attack accuracy is good within Bob’s data, it might perform very badly on Alice data (and in a real scenario Bob would not know that) !43

Slide 72

Slide 72 text

Differences from the previous work [Shokri+ 2017] ‣ Model Training ‣ Bob does not have access to the training API used for the Alice model ‣ Attacker Data ‣ Bob has a real subset of Alice data !44

Slide 73

Slide 73 text

Attack Classifier for Membership Inference ‣ Binary Classification ‣ “IN” or “OUT” of the model training data? ‣ Features ‣ Modified 1-4 gram precisions ‣ Sentence-level BLEU scores ‣ Later: MT Model score - extra information for the attacker !45

Slide 74

Slide 74 text

Attack Classifier for Membership Inference ‣ Binary Classification ‣ “IN” or “OUT” of the model training data? ‣ Features ‣ Modified 1-4 gram precisions ‣ Sentence-level BLEU scores ‣ Later: MT model score - extra information for the attacker !45 Intuition: If the output is a “good” translation (i.e., similar to the reference translation), the model might have seen it at training time and memorized it
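A from-scratch sketch of the n-gram precision features described above (sentence-level BLEU, e.g. via sacrebleu, would be added the same way); this is an illustration, not the paper’s feature-extraction code.

```python
# Sketch: modified n-gram precisions (n = 1..4) between the model's translation
# and the reference, the core features of the membership classifier above.
from collections import Counter

def modified_ngram_precision(hyp_tokens, ref_tokens, n):
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    if not hyp_ngrams:
        return 0.0
    clipped = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return clipped / sum(hyp_ngrams.values())

def features(hypothesis, reference):
    h, r = hypothesis.split(), reference.split()
    return [modified_ngram_precision(h, r, n) for n in range(1, 5)]

print(features("the cat sat on the mat", "the cat is on the mat"))
# -> roughly [0.83, 0.60, 0.25, 0.0]
```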

Slide 75

Slide 75 text

Results: the attacks were not successful … ‣ Around 50%: same as chance ‣ BLEU and n-gram precision: not enough information to distinguish ‣ Using the MT model score did not help either !46

Attack Accuracy of Different Probes
Alice | Bob:train | Bob:valid | Bob:test
50.4 | 51.4 | 51.2 | 51.1

* Accuracy for a Decision Tree classifier. Bob tried several other types of classifiers, but the result trends were the same.
Accuracy is low even on the classifier’s in-sample (training) data → overfitting is not the problem

Slide 76

Slide 76 text

Results: Out-of-domain (OOD) Corpora ‣ Whether the domain was in the MT model training data or not ‣ Assumption: the model will not translate OOD sentences well ‣ → Much better results with OOD data !47

Attack Accuracy: In-domain Corpora
ParaCrawl | CommonCrawl | Europarl | News | Rapid
50.3 | 51.1 | 49.7 | 50.7 | 50.0

Attack Accuracy: Out-of-domain Corpora
EMEA | Koran | Subtitles | TED
67.2 | 94.1 | 80.2 | 67.1

Slide 77

Slide 77 text

Results: Out-of-vocabulary (OOV) samples ‣ Subset of probes containing OOV tokens ‣ OOV in source (7.4%), in reference (3.2%), in both (1.9%) ‣ Assumption: the model will not translate sentences with OOV tokens well ‣ → As in the OOD case, much better results than on the entire probe set !48

Attack Accuracy of OOV subsets
All | OOV in src | OOV in ref | OOV in both
50.4 | 73.9 | 74.1 | 68.0

Slide 78

Slide 78 text

Why was it not successful for seq2seq? ‣ Why successful with “flat classification”, but not with seq2seq? ‣ One possible reason: difference in the model output space ‣ “Fixed set of labels” vs. “arbitrary-length sequence”: the latter is far more complex ‣ In the “flat classification” case the attack was successful because the attacker exploits differences in the model output distribution ‣ seq2seq: how can we quantify the uncertainty of the model or the quality of the output? ‣ OOD and OOV: more promising results ‣ Harder for the target model to produce high-quality translations → more distinguishable !49
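On the question of quantifying seq2seq uncertainty: if the API exposes per-token log-probabilities (the “MT model score” feature mentioned earlier), one simple scalar is the length-normalized score, sketched below under that assumption.

```python
# Sketch: turn a seq2seq output into a scalar "uncertainty-like" feature,
# assuming per-token log-probabilities of the returned translation are available.
import math

def length_normalized_score(token_logprobs):
    """Average per-token log-probability; closer to 0 = model is more confident."""
    return sum(token_logprobs) / max(len(token_logprobs), 1)

def perplexity(token_logprobs):
    return math.exp(-length_normalized_score(token_logprobs))

print(length_normalized_score([-0.1, -0.2, -0.05]))  # confident output
print(length_normalized_score([-2.3, -1.7, -3.0]))   # uncertain output
```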

Slide 79

Slide 79 text

Further attacks & protections: Arms Race ‣ Multiple API attack ‣ Modify (e.g., drop / add a word) and translate the same sentence multiple times, observe the differences ‣ “Watermark” sentences ‣ Add characteristic samples to make membership more distinguishable ‣ If Bob has a better chance → protection by Alice ‣ Subsample data for training ‣ Regularization ‣ … !50 Work in Progress
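Since the multiple-API-call attack is work in progress, here is only a rough sketch of the intuition: translate the original and lightly perturbed versions of a sentence and measure how stable the outputs are. `translate` is a hypothetical blackbox MT API, and word dropping is just one naive perturbation.

```python
# Sketch: query the target MT API with the original sentence and perturbed
# versions, then measure how much the outputs move.
import random

def perturb(sentence, rng):
    tokens = sentence.split()
    if len(tokens) > 3:
        tokens.pop(rng.randrange(len(tokens)))   # drop one random word
    return " ".join(tokens)

def output_stability(translate, sentence, n_queries=5, seed=0):
    rng = random.Random(seed)
    base = set(translate(sentence).split())
    overlaps = []
    for _ in range(n_queries):
        out = set(translate(perturb(sentence, rng)).split())
        union = base | out
        overlaps.append(len(base & out) / len(union) if union else 1.0)
    return sum(overlaps) / len(overlaps)         # Jaccard-style stability score
```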

Slide 80

Slide 80 text

Summary
‣ Privacy in Machine Learning
‣ Membership Inference Problem
‣ “Was this in the model’s training data?”
‣ Attacker creates models to mimic the target blackbox model
‣ Empirical Results
‣ Multi-class Classification: attack successful (exploits output distribution differences)
‣ Sequence Generation: attack not successful (so far; more complex output space)
!51 More at arxiv.org/abs/1904.05506