
Differential Privacy in Machine Learning

Tsubasa Takahashi
LINE Machine Learning Research Team Senior Research Scientist / Manager
https://linedevday.linecorp.com/2020/ja/sessions/0739
https://linedevday.linecorp.com/2020/en/sessions/0739


LINE DevDay 2020

November 25, 2020

Transcript

  1. None
  2. Agenda › Introduction of Differential Privacy › Differential Privacy in

    Machine Learning › Research on Differentially Private Deep Generative Models
  3. Introduction of Differential Privacy

  4. Research on Differential Privacy › Differential Privacy (DP) will be a key technology for privacy at LINE scale › Data Labs has just started R&D on DP
  5. What is Differential Privacy [1]? At WWDC 2016, Craig Federighi (Apple) said [2]: “Differential privacy is a research topic in the area of statistics and data analytics that uses hashing, subsampling and noise injection to enable crowdsourced learning while keeping the data of individual users completely private.” › [1] C. Dwork. Differential privacy. ICALP, 2006. › [2] https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/
  6. Disclosure Avoidance in US Census 2020 [3] 2020 Census results

    will be protected using “differential privacy,” the new gold standard in data privacy protection. › [3] https://www.census.gov/about/policies/privacy/statistical_safeguards/disclosure-avoidance-2020-census.html
  7. Privacy for Stats / ML. Privacy by Randomization. Privacy at

    Scale.
  8. Privacy by Randomization › Definition (ε-differential privacy). A randomized algorithm M is ε-dp if Pr[M(D) = O] ≤ e^ε · Pr[M(D′) = O] for every output O and every pair of datasets D, D′ that differ in only one tuple (e.g., including Alice vs. excluding Alice). ε: privacy parameter. › (roughly speaking) Outcomes are “approximately the same”, whether or not any individual is included. › Illustration: the statistic is randomized (ε = 1) by adding sampled noise, so the output distributions with Alice included and with Alice excluded nearly coincide.
  9. Privacy by Randomization › Definition (ε-differential privacy). A randomized algorithm M is ε-dp if Pr[M(D) = O] ≤ e^ε · Pr[M(D′) = O]. ε: privacy parameter. › (roughly speaking) Outcomes are “approximately the same”, whether or not any individual is included. › Example (reporting a private bit; probabilities of the two answers): Random Guess (ε = 0): 0.5 / 0.5; Randomized with ε = 1: 0.73 / 0.27, since e^1/(1 + e^1) ≈ 0.73; Non-Private (ε = ∞): 1.0 / 0.0.
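The 0.73 / 0.27 figures on the slide come from binary randomized response. A minimal sketch (the function names are ours, not from the talk):

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit w.p. e^eps / (1 + e^eps), otherwise flip it."""
    p_true = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_true else 1 - true_bit

def output_probability(true_bit: int, reported: int, epsilon: float) -> float:
    """Pr[mechanism reports `reported` | input is `true_bit`]."""
    p_true = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_true if reported == true_bit else 1.0 - p_true

eps = 1.0
p = output_probability(1, 1, eps)
print(round(p, 2))  # 0.73, matching the slide

# eps-DP check: for every output o, Pr[M(1)=o] / Pr[M(0)=o] <= e^eps
for o in (0, 1):
    ratio = output_probability(1, o, eps) / output_probability(0, o, eps)
    assert ratio <= math.exp(eps) + 1e-12
```

The ratio bound is tight here: for the truthful output the likelihood ratio is exactly e^ε, which is why randomized response is the canonical ε-dp mechanism for a single bit.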
  10. Privacy at Scale › Frequency estimations for synthetic datasets including 100 kinds of items, with N = 10,000 and N = 10,000,000 users.
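A hedged sketch of the kind of experiment this slide shows: frequency estimation over 100 items under k-ary randomized response, debiased after aggregation. The mechanism choice, seed, and population sizes here are illustrative assumptions, not the talk's actual setup; the point is that the estimation error shrinks as N grows.

```python
import numpy as np

def krr_frequencies(data, k, epsilon, rng):
    """Estimate item frequencies under k-ary randomized response:
    each user keeps their item w.p. gamma, otherwise reports a
    uniformly random item; aggregated counts are then debiased."""
    e = np.exp(epsilon)
    gamma = (e - 1.0) / (e + k - 1.0)
    p = e / (e + k - 1.0)    # Pr[report = true item]
    q = 1.0 / (e + k - 1.0)  # Pr[report = any other fixed item]
    keep = rng.random(len(data)) < gamma
    uniform = rng.integers(0, k, size=len(data))
    reports = np.where(keep, data, uniform)
    counts = np.bincount(reports, minlength=k)
    # Unbiased estimator: E[counts/N] = q + f * (p - q)
    return (counts / len(data) - q) / (p - q)

rng = np.random.default_rng(0)
k, eps = 100, 1.0
errors = {}
for n in (10_000, 1_000_000):
    data = rng.integers(0, k, size=n)  # uniform synthetic items
    est = krr_frequencies(data, k, eps, rng)
    errors[n] = np.abs(est - 1.0 / k).max()
print(errors)  # max estimation error drops as N grows
```

The per-item error decays roughly as 1/√N, which is what "privacy at scale" buys: the same per-user noise becomes negligible once millions of reports are aggregated.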
  11. Towards Trustworthy Data Platform › Deep census of our users, which we have never reached, while preserving privacy: deep sensitive questionnaires and telemetry, with randomization mechanisms of differential privacy, to improve satisfaction with our services › Sharing data, stats, and AIs while preserving privacy: knowledge circulation across our services, sharing our insights with trusted partners, and making AIs robust against real (adversarial) environments
  12. Differential Privacy in Machine Learning

  13. How to make ML differentially private › Learn Randomly & Respond as Usual: learn a model by injecting noise (DP-SGD [4] is the well-known framework); the model parameters satisfy differential privacy → parameters are sharable without privacy concerns; use-case: sharing ML models › Learn as Usual & Respond Randomly: learn a model on a raw dataset and introduce a random responding mechanism; use-case: MLaaS (put the model in a secure location and access it via API) › [4] M. Abadi et al. Deep learning with differential privacy. ACM CCS, 2016.
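The "Learn as Usual & Respond Randomly" pattern can be sketched as follows, assuming the API only answers aggregate queries with known sensitivity; the class and method names are hypothetical, not from the talk:

```python
import numpy as np

class PrivateModelAPI:
    """Schematic 'learn as usual & respond randomly' pattern: the raw
    model outputs stay in a secure location; every API answer is
    randomized.  Here the API serves an aggregate statistic with known
    sensitivity via the Laplace mechanism (scale = sensitivity/eps)."""

    def __init__(self, scores, epsilon, rng):
        self.scores = scores    # per-user model outputs, kept private
        self.epsilon = epsilon
        self.rng = rng

    def count_above(self, threshold):
        # Counting query: changing one user moves the count by at most
        # 1, so sensitivity = 1 and Laplace scale = 1 / epsilon.
        true_count = float(np.sum(self.scores > threshold))
        return true_count + self.rng.laplace(0.0, 1.0 / self.epsilon)

rng = np.random.default_rng(0)
api = PrivateModelAPI(scores=rng.random(1000), epsilon=1.0, rng=rng)
answer = api.count_above(0.9)  # noisy count, safe to expose via the API
```

Randomizing arbitrary per-example predictions is subtler than this aggregate case; the sketch only illustrates the architectural split the slide describes (model behind an API, noise at response time).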
  14. DP-SGD [4] Differentially Private Stochastic Gradient Descent › Pipeline: Random Sampling (each of the N training samples is picked w.p. q = L/N) → Compute Loss ℒ → Compute Gradient ∇θℒ → Add Noise (noise scaler σ) → Update Params θ › DP-SGD makes each gradient differentially private, and hence the model parameters are also dp. › The privacy consumption at an iteration is derived from q and σ. › [4] M. Abadi et al. Deep learning with differential privacy. ACM CCS, 2016.
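The pipeline above can be sketched in NumPy. This is an illustrative single step, not the reference implementation of [4]; the toy gradients and parameter values are ours:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier,
                lr, params, rng):
    """One DP-SGD update [4]: clip each per-example gradient to L2 norm
    <= clip_norm (C), sum, add Gaussian noise N(0, (sigma*C)^2 I),
    average over the batch, then take a gradient step."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm,
        size=per_example_grads.shape[1])
    return params - lr * noisy_sum / len(per_example_grads)

rng = np.random.default_rng(0)
grads = 5.0 * rng.normal(size=(8, 3))  # toy per-example gradients
params = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0,
                     lr=0.1, params=np.zeros(3), rng=rng)
```

Clipping is what bounds the influence of any single example, and the Gaussian noise calibrated to that bound is what makes the released gradient (and hence the trained parameters) differentially private.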
  15. How much noise should we inject? › The noise scale follows the “sensitivity” of the gradient › Sensitivity: the maximal change of a function’s output when changing one sample in the batch (or DB) › Ex. Counting: 1, Histogram: 2 › The gradient’s sensitivity is intractable, thus CLIPPING! › DP-SGD clips the ℓ2 norm of each gradient to a constant C › Sample noise from a Gaussian with variance σ²C² to craft a randomized gradient
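A small illustration of sensitivity and the Gaussian mechanism; the counting query and all names are our own example, not from the talk:

```python
import numpy as np

def count_positive(db):
    """A counting query.  Its sensitivity is 1: changing a single
    record moves the count by at most 1."""
    return float(np.sum(np.asarray(db) > 0))

def gaussian_mechanism(value, sensitivity, sigma, rng):
    """Release value + N(0, (sigma * sensitivity)^2).  In DP-SGD the
    clipping constant C plays the role of the sensitivity, giving
    noise variance sigma^2 * C^2."""
    return value + rng.normal(0.0, sigma * sensitivity)

db = [1.0, -2.0, 3.0, 0.5]
neighbor = [1.0, -2.0, 3.0, -0.5]  # one record changed
gap = abs(count_positive(db) - count_positive(neighbor))

rng = np.random.default_rng(0)
noisy_count = gaussian_mechanism(count_positive(db), sensitivity=1.0,
                                 sigma=2.0, rng=rng)
```

The same recipe applies to the clipped gradient: once every per-example gradient has norm at most C, swapping one sample changes the gradient sum by at most a bounded amount, so Gaussian noise scaled to C suffices.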
  16. Learning a model with DP-SGD (Differentially Private Stochastic Gradient Descent) › Random sampling of L training samples at each step › Randomized gradients: g̅₁ = g₁ + z₁, g̅₂ = g₂ + z₂, …, g̅ₜ = gₜ + zₜ, where each noise zᵢ is sampled from the Gaussian with mean 0 and variance σ²C² › Privacy accounting: the per-iteration costs ε₁, ε₂, …, εₜ accumulate, and the learning process is stopped when the privacy budget is exhausted › This illustration is the simplest accounting way.
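The "simplest accounting way" on the slide is basic sequential composition, which can be sketched as below. The budget values are illustrative assumptions; real DP-SGD accounting uses the much tighter moments accountant of [4]:

```python
def basic_composition_steps(per_step_epsilon, total_epsilon):
    """Simplest accounting: per-iteration epsilons add up, and training
    stops when the next step would exceed the total privacy budget."""
    steps, spent = 0, 0.0
    while spent + per_step_epsilon <= total_epsilon:
        spent += per_step_epsilon
        steps += 1
    return steps, spent

# With a total budget of eps = 1.0 and eps = 0.125 consumed per
# iteration, training must stop after 8 iterations.
steps, spent = basic_composition_steps(0.125, 1.0)
print(steps, spent)  # 8 1.0
```

Because basic composition grows linearly in the number of iterations, tighter accountants are what make training for many epochs feasible at a fixed ε.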
  17. Joint work w/ S. Takagi† (intern 2019), Y. Cao† and

    M. Yoshikawa† †: Kyoto University This work has been accepted at ICDE 2021. Differentially Private Deep Generative Model
  18. Data Sharing via Privacy-preserving Generative Model Sensitive Data Division A

    Division B Synthesized Data › We have developed differentially private deep generative models for sharing sensitive data while preserving privacy of individuals Differential Privacy Differential Privacy Generative model
  19. Data Sharing via Privacy-preserving Generative Model › We focus on a VAE (Variational Autoencoder) based approach › To exploit the high representation capability of neural networks › GANs are hard to converge and suffer from mode-collapse issues › Diagram: sensitive data (Division A) is encoded (ENC) and decoded (DEC) under differential privacy; random seeds drive DEC to synthesize data for Division B
  20. Preview: Synthesized Data › MNIST samples from VAE+DP-SGD, DP-GM [5], and Ours › All models are built under differential privacy constraints (ε = 1). › [5] G. Acs et al. Differentially private mixture of generative neural networks. TKDE, 2018.
  21. Our Contributions › Comparison under a differentially private constraint: › Noise robustness for high-dim. data — Bayesian Network: ✗, GANs: ✗, VAEs: ✓, Ours: ✓ › Preserving data distribution — Bayesian Network: ✓, GANs: ✗, VAEs: ✗, Ours: ✓
  22. Difficulty of learning a dp-generative model › Our understandings: › Injected noise makes it difficult to learn multiple tasks simultaneously (i.e., both encoding and decoding end-to-end) › The model is required to converge within a small number of epochs due to the privacy budget
  23. Learning Behaviors of VAE + DP-SGD › Injected noise makes it difficult to learn encoding and decoding in an end-to-end way › Illustration: mappings between the original data domain and the latent space, for VAE vs. VAE+DP-SGD
  24. Difficulty of learning a dp-generative model › Injected noise makes it difficult to learn multiple tasks simultaneously (i.e., both encoding and decoding end-to-end) › The model is required to converge within a small number of epochs due to privacy consumption › OUR SOLUTION: Simplified Two-Phase Algorithm
  25. P3GM: Privacy Preserving Phased Generative Model › Two phases over the original data domain (X): Phase 1 compresses X with DP-PCA [6]; the coordinates in the latent space are fixed after Phase 1. › To fit easily, we assume the prior is the distribution of the (compressed) training data; we use the mixture of Gaussians estimated from the training data by the DP-EM [7] algorithm. › [6] W. Jiang et al. Wishart mechanism for differentially private principal components analysis. AAAI, 2016. › [7] M. Park et al. DP-EM: Differentially private expectation maximization. AISTATS, 2017.
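For intuition only, a non-private NumPy skeleton of the two-phase idea: plain PCA stands in for DP-PCA [6], a single Gaussian stands in for the DP-EM [7] mixture, and the differentially private training of the generative mapping in Phase 2 is omitted. All shapes and data here are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))   # toy stand-in for the training data

# Phase 1: fix the latent space from the data itself.
# (P3GM uses DP-PCA [6]; plain SVD-based PCA stands in, non-private.)
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
d_latent = 5
Z = X_centered @ Vt[:d_latent].T  # compressed training data

# P3GM estimates a mixture of Gaussians over Z with DP-EM [7];
# a single Gaussian stands in here, again non-private.
prior_mean = Z.mean(axis=0)
prior_cov = np.cov(Z, rowvar=False)

# Phase 2 (omitted): with the latent coordinates fixed after Phase 1,
# the remaining generative mapping is trained under DP-SGD.
# Sampling from the fitted prior seeds generation:
samples = rng.multivariate_normal(prior_mean, prior_cov, size=10)
```

Fixing the latent coordinates first is the key design choice: it removes the need to learn encoding and decoding simultaneously under noise, which slides 22-24 identify as the main difficulty.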
  26. Evaluation › How effectively can the generated samples be used in data mining tasks? › How efficiently can a differentially private model be constructed? › How robust is the model against the noise injected to satisfy differential privacy?
  27. Classification on Synthesized Data › Our solution demonstrates high utility in several classification tasks. All models are built under differential privacy constraints (ε = 1). › Table VI: Performance comparison on four real datasets; each score is the average AUROC or AUPRC over four classifiers. P3GM outperforms the other two differentially private models on three datasets.

     AUROC — Dataset: PrivBayes / Ryan’s / DP-GM / P3GM / original
     Kaggle Credit: 0.5520 / 0.5326 / 0.8805 / 0.9232 / 0.9663
     UCI ESR: 0.5377 / 0.5757 / 0.4911 / 0.8243 / 0.8698
     Adult: 0.8530 / 0.5048 / 0.7806 / 0.8321 / 0.9119
     UCI ISOLET: 0.5100 / 0.5326 / 0.4695 / 0.6855 / 0.9891

     AUPRC — Dataset: PrivBayes / Ryan’s / DP-GM / P3GM / original
     Kaggle Credit: 0.2084 / 0.2503 / 0.3301 / 0.5208 / 0.8927
     UCI ESR: 0.5419 / 0.4265 / 0.3311 / 0.7559 / 0.8098
     Adult: 0.6374 / 0.2584 / 0.4502 / 0.5917 / 0.7844
     UCI ISOLET: 0.2084 / 0.2099 / 0.1816 / 0.3287 / 0.9623

     › Table VII: Classification accuracies on image datasets — Dataset: VAE / DP-GM / PrivBayes / Ryan’s / P3GM
     MNIST: 0.8571 / 0.4973 / 0.0970 / 0.2385 / 0.7946
     Fashion: 0.7854 / 0.5200 / 0.0996 / 0.2408 / 0.7311
  28. Learning Efficiency › Fig. 7: P3GM demonstrates higher learning efficiency than DP-VAE — (a) reconstruction loss (MNIST), (b) reconstruction loss (Kaggle Credit), (c) classification accuracy (MNIST).
  29. Robustness against Noise › AUROC varying ε (MNIST) › AUPRC varying ε (MNIST) › Fig. 5: Reducing dimension improves accuracy (MNIST); too small a dimensionality lacks the expressiveness for embedding, and from the result a latent dimensionality in [10, 100] balances accuracy and dimension reduction on the MNIST dataset.
  30. Conclusion & Take-Home Messages › Differential Privacy is “Privacy by Randomization” and “Privacy at Scale” › Differential privacy has been utilized for gathering stats and sharing ML outcomes › DP-SGD is a standard framework to make machine learning differentially private › Introduced our research on a differentially private deep generative model (P3GM) › Made it robust against noise injection with a simple two-phased learning algorithm › It outperforms existing methods in terms of the utility of the synthesized data › Our paper and code are public on arXiv and GitHub, respectively. › [Paper] https://arxiv.org/abs/2006.12101 › [Code] https://github.com/tsubasat/P3GM
  31. References › [1] C. Dwork. Differential privacy. ICALP, 2006. › [2] https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ › [3] https://www.census.gov/about/policies/privacy/statistical_safeguards/disclosure-avoidance-2020-census.html › [4] M. Abadi et al. Deep learning with differential privacy. ACM CCS, 2016. › [5] G. Acs et al. Differentially private mixture of generative neural networks. TKDE, 2018. › [6] W. Jiang et al. Wishart mechanism for differentially private principal components analysis. AAAI, 2016. › [7] M. Park et al. DP-EM: Differentially private expectation maximization. AISTATS, 2017.
  32. Thank you