Evaluating Differentially Private Machine Learning in Practice

David Evans
August 16, 2019

Bargav Jayaraman's talk at USENIX Security Symposium 2019.
Paper: https://arxiv.org/abs/1902.08874

Transcript

  1. Evaluating Differentially Private Machine Learning in Practice
    Bargav Jayaraman and David Evans, Department of Computer Science, University of Virginia
  2. Our Objective
    To evaluate the privacy leakage of private mechanisms. Leakage is quantified in terms of inference attacks.
    [Diagram: Data → Machine Learning → M]
  3. Result Highlights
    [Figure: Neural Network on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axes: Accuracy Loss and Privacy Leakage (0.00 to 1.00). Series: RDP, RDP Leakage, Naïve Composition (NC), NC Leakage, Theoretical Guarantee]
  4. Rest of the Talk
    (1) Background on Applying Differential Privacy to Machine Learning
    (2) Experimental Evaluation of Differentially Private Machine Learning Implementations
  5. Applying DP to Machine Learning
    [Diagram: Data → Machine Learning → M. Training steps: Define Objective Function; Iterate for T epochs: Calculate Gradients, Update Model. Places to add noise: Objective Perturbation, Gradient Perturbation, Output Perturbation]
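
The gradient perturbation option named on this slide can be sketched as a single noisy update. This is a minimal sketch, not the code from the talk: the per-example clipping and Gaussian noise follow the common DP-SGD recipe, and the function name and the parameters `clip_norm` and `noise_multiplier` are assumptions introduced for illustration.

```python
import numpy as np

def noisy_gradient_step(theta, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One gradient-perturbation update (hypothetical helper, DP-SGD style)."""
    # Clip each example's gradient to L2 norm <= clip_norm to bound its influence.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise, scaled to the clipping norm, to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    # Ordinary gradient descent step on the perturbed gradient.
    return theta - lr * noisy_mean

rng = np.random.default_rng(0)
theta = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # toy per-example gradients
theta = noisy_gradient_step(theta, grads, lr=0.1, clip_norm=1.0,
                            noise_multiplier=1.1, rng=rng)
```

Repeating this step for T epochs is what makes the total privacy budget depend on how the per-step budgets are composed.
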
  6. ERM Algorithms using ϵ ≤ 1
    [Timeline, 2006 to 2018: DP Introduced [D06] [DMNS06]; ERM works [CM09] [CMS11] [PRR10] [ZZXYW12] [JT13] [JT14] [WFWJN15] [HCB16] [WLKCJN17], grouped under Objective Perturbation and Output Perturbation, annotated with ϵ values between 0.05 and 1]
  7. Applying DP to Deep Learning
    [Diagram: Data → Machine Learning → M. Training steps: Define Objective Function; Iterate for T epochs: Calculate Gradients, Update Model. Places to add noise: Objective Perturbation, Gradient Perturbation, Output Perturbation]
  8. Deep Learning requiring high ϵ value
    [Timeline, 2006 to 2018: the entries [D06] [DMNS06] [CM09] [CMS11] [PRR10] [ZZXYW12] [JT13] [JT14] [WFWJN15] [HCB16] [WLKCJN17] with ϵ values between 0.05 and 1, plus deep learning entries [SS15] [ZZWCWZ18] carrying the annotations ϵ = 100 and ϵ = 369,200]
  9. Improving Composition
    If each iteration is ϵ-DP, then by composition the model is Tϵ-DP. With improved composition the model is (O(√T·ϵ), δ)-DP.
    Relaxed notions: Concentrated DP [Dwork et al. (2016)], Zero Concentrated DP [Bun & Steinke (2016)], Moments Accountant [Abadi et al. (2016)], Rényi DP [Mironov (2017)]
    [Diagram: Data → Machine Learning → M. Define Objective Function; Iterate for T epochs: Calculate Gradients, Update Model; Gradient Perturbation]
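
To make the gap between linear and square-root growth concrete, the sketch below compares naïve composition with the advanced composition theorem of Dwork, Rothblum and Vadhan for pure ϵ-DP steps. Which theorem the slide has in mind is an assumption; the function names and the `delta_slack` parameter are introduced here for illustration only.

```python
import math

def naive_composition(eps_step, steps):
    """Total budget when per-step budgets simply add up: T * eps."""
    return eps_step * steps

def advanced_composition(eps_step, steps, delta_slack):
    """Advanced composition for pure eps-DP steps.

    The composed mechanism is (eps_total, delta_slack)-DP with
    eps_total = eps * sqrt(2 T ln(1/delta_slack)) + T * eps * (e^eps - 1).
    """
    return (eps_step * math.sqrt(2.0 * steps * math.log(1.0 / delta_slack))
            + steps * eps_step * (math.exp(eps_step) - 1.0))

# Hypothetical setting: 100 iterations at eps = 0.1 each.
print(naive_composition(0.1, 100))           # linear in T
print(advanced_composition(0.1, 100, 1e-5))  # roughly sqrt(T) growth for small eps
```

The relaxed notions listed on the slide (zCDP, RDP, moments accountant) tighten this further for Gaussian noise; they are not implemented here.
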
  10. Lower ϵ value with recent DP notions
    [Timeline, 2006 to 2018: the entries from the earlier timelines, including [SS15] [ZZWCWZ18] with ϵ = 100 and ϵ = 369,200, plus [ACGMMTZ16] [PAEGT16] [GKN17] [BDFKR18] [HCS18] [YLPGT19], annotated with ϵ values of 3, 4, 8 and 21.5]
    [Inset figure: x-axis Privacy Budget ϵ (0.01 to 1000); y-axes Accuracy Loss and Privacy Leakage (0.00 to 1.00); series RDP Acc Loss, NC Acc Loss, RDP Leakage, NC Leakage, Theoretical Guarantee]
  11. Experiments
    Task: 100 class classification on CIFAR-100; 100 class classification on Purchase-100
    Model: Logistic Regression; Neural Network
    Evaluation Metric: Accuracy Loss; Privacy Leakage
    Code Available: https://github.com/bargavj/EvaluatingDPML
  12. Training and Testing
    [Diagram: Data Set split into Training Set and Test Set; Machine Learning on the Training Set produces M]
    Accuracy Loss = 1 − (Accuracy of Private Model / Accuracy of Non-Private Model)
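
The accuracy loss metric on this slide is one minus the ratio of the private model's accuracy to the non-private model's accuracy. A minimal sketch follows; the function name and the example accuracies are hypothetical and not taken from the talk.

```python
def accuracy_loss(private_acc, non_private_acc):
    """Accuracy loss = 1 - (private accuracy / non-private accuracy)."""
    if non_private_acc <= 0:
        raise ValueError("non-private accuracy must be positive")
    return 1.0 - private_acc / non_private_acc

# Hypothetical accuracies: the private model keeps 90% of the baseline accuracy.
loss = accuracy_loss(private_acc=0.18, non_private_acc=0.20)
print(round(loss, 2))
```

A value of 0 means the private model matches the non-private baseline, and a value of 1 means it has lost all of the baseline's accuracy.
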
  13. [Figure: Logistic Regression on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axis: Accuracy Loss (0.00 to 1.00). Series: NC, zCDP, RDP]
    RDP has 0.10 accuracy loss at ϵ = 10 and NC at ϵ = 500
  14. Membership Inference Attacks
    Privacy Leakage = (TPR − FPR)
    Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov (S&P 2017): shadow models M1, M2, ..., Mk built from the Data Set train an attack model A that predicts membership against M
    Samuel Yeom, Irene Giacomelli, Matt Fredrikson, Somesh Jha (CSF 2018): attack based on the Expected Training Loss, (1/n) ∑_{i=1}^{n} ℓ(d_i, θ)
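
The leakage metric and the loss-based attack on this slide can be sketched together. This is a simplified reading, not the authors' implementation: it assumes the attacker predicts "member" whenever a record's loss is below the expected training loss, and the function names and toy losses are hypothetical.

```python
import numpy as np

def loss_threshold_attack(per_example_losses, expected_training_loss):
    """Predict membership for records whose loss is below the expected training loss."""
    return np.asarray(per_example_losses) < expected_training_loss

def privacy_leakage(predicted_member, is_member):
    """Leakage = TPR - FPR of the membership predictions."""
    predicted_member = np.asarray(predicted_member, dtype=bool)
    is_member = np.asarray(is_member, dtype=bool)
    tpr = predicted_member[is_member].mean()   # members correctly flagged
    fpr = predicted_member[~is_member].mean()  # non-members wrongly flagged
    return tpr - fpr

# Toy data: the first three records are training members, the last three are not.
losses = np.array([0.1, 0.2, 0.9, 0.8, 0.05, 1.2])
is_member = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
threshold = losses[is_member].mean()  # (1/n) * sum of training losses
leak = privacy_leakage(loss_threshold_attack(losses, threshold), is_member)
```

A leakage of 0 corresponds to an attacker who does no better than flagging members and non-members at the same rate.
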
  15. [Figure: Logistic Regression on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axis: Privacy Leakage (0.00 to 0.12). Series: NC, zCDP, RDP, Theoretical Guarantee; annotation: 0.55 PPV]
    Non-private model has 0.12 leakage with 0.56 PPV
    RDP has 0.06 leakage at ϵ = 10 and NC at ϵ = 500
  16. Neural Networks
    NN has 103,936 trainable parameters so it has more capacity to learn on training data
    [Diagram: Input Layer (50 Neurons), Hidden Layer 1 (256 Neurons), Hidden Layer 2 (256 Neurons), Output Layer (100 Neurons)]
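
The parameter count on this slide can be checked by hand. The sketch assumes the layers are ordered 50, 256, 256, 100 and that only weights are counted; the slide does not say that bias terms are excluded, but that reading reproduces its figure exactly.

```python
def count_weights(layer_sizes):
    """Number of weights in a fully connected network, excluding bias terms."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def count_weights_and_biases(layer_sizes):
    """Same count with one bias per non-input neuron added."""
    return count_weights(layer_sizes) + sum(layer_sizes[1:])

layers = [50, 256, 256, 100]  # input, hidden 1, hidden 2, output
print(count_weights(layers))             # 50*256 + 256*256 + 256*100 = 103936
print(count_weights_and_biases(layers))  # 103936 + 612 = 104548
```
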
  17. [Figure: Neural Network on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axis: Accuracy Loss (0.00 to 1.00). Series: NC, zCDP, RDP]
    RDP has 0.53 accuracy loss at ϵ = 10 and NC at ϵ = 500
  18. [Figure: Neural Network on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axis: Privacy Leakage (0.00 to 0.50). Series: NC, zCDP, RDP, Theoretical Guarantee; annotations: 0.71 PPV, 0.74 PPV, 0.71 PPV]
    Non-private model has 0.72 leakage with 0.94 PPV
    RDP has 0.07 leakage at ϵ = 10 and NC at ϵ = 500
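  19. [Diagram: overlap of records identified as members in Run 1 and Run 2]
    Run 1: TP (6150), FP (2126), 0.74 PPV
    Run 2: TP (6156), FP (2157), 0.74 PPV
    Overlap of both runs: 4370 and 1080, 0.80 PPV
    New results, see updated paper in arXiv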
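
The PPV figures on this slide follow from the true-positive and false-positive counts, where PPV = TP / (TP + FP). In the sketch below, reading 4370 and 1080 as the true and false positives flagged in both runs is an assumption; it is consistent with the 0.80 PPV shown, but the transcript does not label those two numbers.

```python
def ppv(tp, fp):
    """Positive predictive value: fraction of predicted members that are true members."""
    return tp / (tp + fp)

run1 = ppv(6150, 2126)
run2 = ppv(6156, 2157)
both = ppv(4370, 1080)  # assumed: records flagged as members in both runs

print(round(run1, 2), round(run2, 2), round(both, 2))
```

Under that reading, records flagged in both runs are more likely to be true members than records flagged in a single run.
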
  20. [Figure: x-axis: Number of times identified as member (out of 5 runs): >= 0, >= 1, >= 2, >= 3, >= 4, = 5; y-axis: Fraction of Data Set (0.00 to 1.00). Series: True Members, Non Members; reference: Random, Independent Predictions. PPV annotations: 0.822, 0.817, 0.797, 0.749, 0.656, 0.500]
  21. Conclusion
    [Figure: Logistic Regression on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axes: Accuracy Loss and Privacy Leakage (0.00 to 1.00). Series: RDP Acc Loss, RDP Leakage, NC Acc Loss, NC Leakage, Theoretical Guarantee; annotation: 0.55 PPV]
    Non-private model has 0.12 leakage with 0.56 PPV
    There is privacy leakage, but not considerable, even for non-private model
  22. Conclusion
    [Figure: Neural Network on CIFAR-100. x-axis: Privacy Budget ϵ (0.01 to 1000); y-axes: Accuracy Loss and Privacy Leakage (0.00 to 1.00). Series: RDP Acc Loss, RDP Leakage, NC Acc Loss, NC Leakage, Theoretical Guarantee; annotation: 0.74 PPV]
    Non-private model has 0.72 leakage with 0.94 PPV
    Privacy doesn’t come for free
    Bridging the gap between theoretical bound on leakage and the leakage of practical attacks
  23. Conclusion
    [Figure: x-axis: Privacy Budget ϵ (0.01 to 1000); y-axes: Accuracy Loss and Privacy Leakage (0.00 to 1.00). Series: RDP Acc Loss, RDP Leakage, NC Acc Loss, NC Leakage, Theoretical Guarantee]
    Privacy doesn’t come for free
    Bridging the gap between theoretical bound on leakage and the leakage of practical attacks
    Questions? Thank You!
    Bargav Jayaraman, bj4nq@virginia.edu
    https://github.com/bargavj/EvaluatingDPML