Evaluating Differentially Private Machine Learning in Practice

Bargav Jayaraman and David Evans
Department of Computer Science, University of Virginia

Our Objective
To evaluate the privacy leakage of private mechanisms. Leakage is quantified in terms of inference attacks.
[Diagram: Data → Machine Learning → M]

Result Highlights
[Figure: Neural Network on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axes: Accuracy Loss and Privacy Leakage (0.00 to 1.00). Legend: Theoretical Guarantee, RDP, RDP Leakage, Naïve Composition (NC), NC Leakage.]

Rest of the Talk
1. Background on Applying Differential Privacy to Machine Learning
2. Experimental Evaluation of Differentially Private Machine Learning Implementations

Applying DP to Machine Learning
[Diagram: Data → Machine Learning → M. Training pipeline: Define Objective Function; Iterate for T epochs: Calculate Gradients, Update Model. DP mechanisms marked on the pipeline: Objective Perturbation, Gradient Perturbation, Output Perturbation.]

ERM Algorithms using ϵ ≤ 1
[Timeline, 2006 to 2018. DP Introduced: [D06] [DMNS06]. Output Perturbation and Objective Perturbation works: [CM09] [CMS11] [PRR10] [ZZXYW12] [JT13] [JT14] [WFWJN15] [HCB16] [WLKCJN17]. Budgets shown: ϵ = 0.2, 0.8, 0.5, 0.1, 1, 0.2, 0.05, 0.2, 0.2.]

Applying DP to Deep Learning
[Diagram: Data → Machine Learning → M. Training pipeline: Define Objective Function; Iterate for T epochs: Calculate Gradients, Update Model. DP mechanisms marked on the pipeline: Objective Perturbation, Gradient Perturbation, Output Perturbation.]
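
The gradient perturbation step in the pipeline above can be sketched as follows. This is a minimal illustration in the style of noisy SGD, not the implementation evaluated in the talk: the function names, the clipping bound `max_norm`, and the `noise_multiplier` parameter are all assumptions made for the example.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for j, g in enumerate(clip(grad, max_norm)):
            total[j] += g
    # Assumption: noise standard deviation is tied to the clipping bound.
    sigma = noise_multiplier * max_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
print(clip(grads[0], 1.0))        # the norm-5 gradient is scaled down to norm 1
print(noisy_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can move the update, which is what lets the added noise be calibrated to a fixed scale.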

Deep Learning requiring high ϵ value
[Timeline, 2006 to 2018. Deep learning works added: [SS15] [ZZWCWZ18], with budgets ϵ = 100 and ϵ = 369,200. Earlier entries repeated: [D06] [DMNS06] [CM09] [CMS11] [PRR10] [ZZXYW12] [JT13] [JT14] [WFWJN15] [HCB16] [WLKCJN17], with budgets ϵ = 0.2, 0.8, 0.5, 0.1, 1, 0.2, 0.05, 0.2, 0.2.]

Improving Composition
If each iteration is ϵ-DP, then by composition the model is Tϵ-DP.
With improved composition, the model is (O(√T ϵ), δ)-DP:
Concentrated DP [Dwork et al. (2016)]
Zero Concentrated DP [Bun & Steinke (2016)]
Moments Accountant [Abadi et al. (2016)]
Rényi DP [Mironov (2017)]
[Diagram: Data → Machine Learning → M. Training pipeline: Define Objective Function; Iterate for T epochs: Calculate Gradients, Update Model. Gradient Perturbation is marked on the pipeline.]
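
To make the gap between the linear bound and the square-root bound concrete, the sketch below compares naive composition with the standard advanced composition theorem. Advanced composition is used here only as a simple stand-in with the same O(√T ϵ) shape; it is not one of the accountants listed on the slide, and the per-step budget, step count, and `delta_prime` value are illustrative assumptions.

```python
import math

def naive_composition(eps_step, steps):
    """Total budget when per-step budgets simply add up."""
    return steps * eps_step

def advanced_composition(eps_step, steps, delta_prime):
    """Total epsilon under the advanced composition theorem.

    Composing `steps` eps_step-DP mechanisms is
    (eps_total, delta_prime)-DP with
    eps_total = eps_step * sqrt(2 T ln(1/delta_prime)) + T eps_step (e^eps_step - 1).
    """
    return (eps_step * math.sqrt(2 * steps * math.log(1 / delta_prime))
            + steps * eps_step * (math.exp(eps_step) - 1))

# Hypothetical setting: 10,000 iterations at eps = 0.01 each.
eps_step, steps, delta_prime = 0.01, 10_000, 1e-5
print("naive:   ", naive_composition(eps_step, steps))
print("advanced:", advanced_composition(eps_step, steps, delta_prime))
```

The advantage only appears when the per-step ϵ is small; for large per-step budgets the second term dominates and the naive bound can be tighter.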

Lower ϵ value with recent DP notions
[Timeline, 2006 to 2018. Works added: [ACGMMTZ16] [PAEGT16] [GKN17] [BDFKR18] [HCS18] [YLPGT19], with budgets shown: ϵ = 8, 8, 3, 4, 21.5, 8, 3. Earlier entries repeated: [SS15] [ZZWCWZ18] with ϵ = 100 and ϵ = 369,200; [D06] [DMNS06] [CM09] [CMS11] [PRR10] [ZZXYW12] [JT13] [JT14] [WFWJN15] [HCB16] [WLKCJN17] with ϵ = 0.2, 0.8, 0.5, 0.1, 1, 0.2, 0.05, 0.2, 0.2.]
[Figure: Accuracy Loss and Privacy Leakage (0.00 to 1.00) vs Privacy Budget ϵ (0.01 to 1000). Legend: RDP Acc Loss, NC Acc Loss, NC Leakage, RDP Leakage, Theoretical Guarantee.]

Experiments
Task: 100 class classification on CIFAR-100; 100 class classification on Purchase-100
Model: Logistic Regression; Neural Network
Evaluation Metric: Accuracy Loss; Privacy Leakage
Code Available: https://github.com/bargavj/EvaluatingDPML

Training and Testing
[Diagram: Data Set split into Training Set and Test Set; Training Set → Machine Learning → M]
Accuracy Loss = 1 − (Accuracy of Private Model / Accuracy of Non-Private Model)
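
The accuracy loss metric can be written as a one-line function. The sketch below is illustrative only; the function name and the example accuracies are hypothetical and are not results from the talk.

```python
def accuracy_loss(private_acc, non_private_acc):
    """Accuracy loss = 1 - (private accuracy / non-private accuracy)."""
    if non_private_acc <= 0:
        raise ValueError("non-private accuracy must be positive")
    return 1.0 - private_acc / non_private_acc

# Hypothetical accuracies: a private model at 18% against a non-private
# baseline at 20% loses one tenth of the baseline's accuracy.
print(round(accuracy_loss(0.18, 0.20), 2))
```

A value of 0 means the private model matches the non-private baseline, and a value of 1 means it has lost all of the baseline's accuracy.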

Logistic Regression on CIFAR-100
[Figure: Accuracy Loss (0.00 to 1.00) vs Privacy Budget ϵ (0.01 to 1000) for NC, zCDP, and RDP]
RDP has 0.10 accuracy loss at ϵ = 10 and NC at ϵ = 500.

Membership Inference Attacks
[Diagram: Data Set → M; the attacker predicts membership]
Privacy Leakage = TPR − FPR
Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov (S&P 2017): [Diagram: shadow models M1, M2, ..., Mk and attack model A]
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, Somesh Jha (CSF 2018): Expected Training Loss = (1/n) ∑_{i=1}^{n} ℓ(d_i, θ)
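
A loss-threshold attack in the spirit of the expected-training-loss approach, together with the leakage metric TPR − FPR, might look like the sketch below. The decision rule (predict "member" when a record's loss is at most the average training loss), the function names, and the toy loss values are assumptions for illustration; they are not taken from the released code.

```python
def loss_threshold_attack(losses, threshold):
    """Predict membership: True when the per-record loss is at most the threshold."""
    return [loss <= threshold for loss in losses]

def leakage_and_ppv(member_preds, non_member_preds):
    """Return (TPR - FPR, PPV) for predictions on known members and non-members."""
    tp = sum(member_preds)        # members correctly flagged
    fp = sum(non_member_preds)    # non-members wrongly flagged
    tpr = tp / len(member_preds)
    fpr = fp / len(non_member_preds)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr - fpr, ppv

# Toy per-record losses (hypothetical): training records tend to have lower loss.
train_losses = [0.1, 0.2, 0.3, 1.4]
test_losses = [0.4, 0.9, 1.2, 2.0]

threshold = sum(train_losses) / len(train_losses)  # expected training loss
leakage, ppv = leakage_and_ppv(
    loss_threshold_attack(train_losses, threshold),
    loss_threshold_attack(test_losses, threshold),
)
print(leakage, ppv)
```

Leakage near 0 means the attack does no better than guessing, while PPV reports what fraction of the records flagged as members really are members.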

Logistic Regression on CIFAR-100
[Figure: Privacy Leakage (0.00 to 0.12) vs Privacy Budget ϵ (0.01 to 1000) for NC, zCDP, and RDP, with the Theoretical Guarantee; one point is annotated 0.55 PPV]
Non-private model has 0.12 leakage with 0.56 PPV.
RDP has 0.06 leakage at ϵ = 10 and NC at ϵ = 500.

Neural Networks
NN has 103,936 trainable parameters, so it has more capacity to learn on training data.
[Diagram: Input Layer (50 Neurons) → Hidden Layer 1 (256 Neurons) → Hidden Layer 2 (256 Neurons) → Output Layer (100 Neurons)]
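
The parameter count can be checked from the layer sizes. The sketch below assumes a fully connected network with 50 inputs, two hidden layers of 256 neurons, and 100 outputs; under that reading, the figure of 103,936 matches the weight matrices alone, without bias terms. The helper name is hypothetical.

```python
def count_parameters(layer_sizes, include_bias=False):
    """Count the parameters of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix between consecutive layers
        if include_bias:
            total += fan_out       # one bias per neuron in the receiving layer
    return total

layers = [50, 256, 256, 100]  # input, hidden 1, hidden 2, output
print(count_parameters(layers))                     # 103936
print(count_parameters(layers, include_bias=True))  # 104548
```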

Neural Network on CIFAR-100
[Figure: Accuracy Loss (0.00 to 1.00) vs Privacy Budget ϵ (0.01 to 1000) for NC, zCDP, and RDP]
RDP has 0.53 accuracy loss at ϵ = 10 and NC at ϵ = 500.

Neural Network on CIFAR-100
[Figure: Privacy Leakage (0.00 to 0.50) vs Privacy Budget ϵ (0.01 to 1000) for NC, zCDP, and RDP, with the Theoretical Guarantee; points are annotated 0.71 PPV, 0.74 PPV, and 0.71 PPV]
Non-private model has 0.72 leakage with 0.94 PPV.
RDP has 0.07 leakage at ϵ = 10 and NC at ϵ = 500.

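[Diagram: overlap between the records identified as members in two runs]
Run 1: TP (6150), FP (2126), 0.74 PPV
Run 2: TP (6156), FP (2157), 0.74 PPV
Identified in both runs: TP (4370), FP (1080), 0.80 PPV
New results, see updated paper in arXiv.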
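
The PPV figures follow from the TP and FP counts as PPV = TP / (TP + FP). The sketch below recomputes them; reading 4370 and 1080 as the TP and FP counts for records flagged in both runs is an assumption, chosen because it reproduces the 0.80 PPV on the slide.

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: fraction of flagged records that are true members."""
    return true_positives / (true_positives + false_positives)

runs = {
    "Run 1": (6150, 2126),
    "Run 2": (6156, 2157),
    "Both runs (assumed reading)": (4370, 1080),
}
for name, (tp, fp) in runs.items():
    print(f"{name}: PPV = {ppv(tp, fp):.2f}")
```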

[Figure: Fraction of Data Set (0.00 to 1.00), split into True Members and Non Members, vs Number of times identified as member (out of 5 runs), with a reference for Random, Independent Predictions]
>= 0: 0.500 PPV
>= 1: 0.656 PPV
>= 2: 0.749 PPV
>= 3: 0.797 PPV
>= 4: 0.817 PPV
= 5: 0.822 PPV

Conclusion
[Figure: Logistic Regression on CIFAR-100. Accuracy Loss and Privacy Leakage (0.00 to 1.00) vs Privacy Budget ϵ (0.01 to 1000). Legend: Theoretical Guarantee, RDP Acc Loss, RDP Leakage, NC Acc Loss, NC Leakage; one point is annotated 0.55 PPV.]
Non-private model has 0.12 leakage with 0.56 PPV.
There is privacy leakage, but not considerable, even for the non-private model.

Conclusion
[Figure: Neural Network on CIFAR-100. Accuracy Loss and Privacy Leakage (0.00 to 1.00) vs Privacy Budget ϵ (0.01 to 1000). Legend: Theoretical Guarantee, RDP Acc Loss, RDP Leakage, NC Acc Loss, NC Leakage; one point is annotated 0.74 PPV.]
Non-private model has 0.72 leakage with 0.94 PPV.
Privacy doesn't come for free.
Bridging the gap between theoretical bound on leakage and the leakage of practical attacks.

Conclusion
[Figure: Accuracy Loss and Privacy Leakage (0.00 to 1.00) vs Privacy Budget ϵ (0.01 to 1000). Legend: Theoretical Guarantee, RDP Acc Loss, RDP Leakage, NC Acc Loss, NC Leakage.]
Privacy doesn't come for free.
Bridging the gap between theoretical bound on leakage and the leakage of practical attacks.
Questions? Thank You!
Bargav Jayaraman
https://github.com/bargavj/EvaluatingDPML
