
Differential Privacy in Machine Learning

Tsubasa Takahashi
LINE Machine Learning Research Team Senior Research Scientist / Manager
https://linedevday.linecorp.com/2020/ja/sessions/0739
https://linedevday.linecorp.com/2020/en/sessions/0739


LINE DevDay 2020

November 25, 2020

Transcript

  1. None
  2. Agenda › Introduction of Differential Privacy › Differential Privacy in

    Machine Learning › Research on Differentially Private Deep Generative Models
  3. Introduction of Differential Privacy

  4. Research on Differential Privacy › Differential Privacy (DP) will be a key technology for privacy at LINE scale › Data Labs has just started R&D on DP
  5. What is Differential Privacy [1]? At WWDC 2016, Craig Federighi (Apple) said [2]: “Differential privacy is a research topic in the area of statistics and data analytics that uses hashing, subsampling and noise injection to enable crowdsourced learning while keeping the data of individual users completely private.” › [1] C. Dwork. Differential privacy. ICALP, 2006. › [2] https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/
  6. Disclosure Avoidance in US Census 2020 [3] 2020 Census results

    will be protected using “differential privacy,” the new gold standard in data privacy protection. › [3] https://www.census.gov/about/policies/privacy/statistical_safeguards/disclosure-avoidance-2020-census.html
  7. Privacy for Stats / ML. Privacy by Randomization. Privacy at

    Scale.
  8. Privacy by Randomization › Definition (ε-differential privacy). A randomized algorithm M is ε-dp if Pr[M(D) = O] ≤ e^ε · Pr[M(D′) = O] for every output O and every pair of datasets D, D′ that differ in only one tuple (e.g., including Alice vs. excluding Alice). ε: privacy parameter. › (roughly speaking) Outcomes are “approximately the same”, whether or not any individual is included. › Illustration: the statistic is randomized (ε = 1) by adding sampled noise, so the output distributions with Alice included and with Alice excluded nearly coincide.
  9. Privacy by Randomization › Definition (ε-differential privacy). A randomized algorithm M is ε-dp if Pr[M(D) = O] ≤ e^ε · Pr[M(D′) = O]. ε: privacy parameter. › (roughly speaking) Outcomes are “approximately the same”, whether or not any individual is included. › Example (reporting a private bit; probabilities of the two answers): Random Guess (ε = 0): 0.5 / 0.5; Randomized with ε = 1: 0.73 / 0.27, since e^1/(1 + e^1) ≈ 0.73; Non-Private (ε = ∞): 1.0 / 0.0.
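The 0.73 / 0.27 figures on the slide come from binary randomized response. A minimal sketch (the function names are ours, not from the talk):

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit w.p. e^eps / (1 + e^eps), otherwise flip it."""
    p_true = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_true else 1 - true_bit

def output_probability(true_bit: int, reported: int, epsilon: float) -> float:
    """Pr[mechanism reports `reported` | input is `true_bit`]."""
    p_true = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_true if reported == true_bit else 1.0 - p_true

eps = 1.0
p = output_probability(1, 1, eps)
print(round(p, 2))  # 0.73, matching the slide

# eps-DP check: for every output o, Pr[M(1)=o] / Pr[M(0)=o] <= e^eps
for o in (0, 1):
    ratio = output_probability(1, o, eps) / output_probability(0, o, eps)
    assert ratio <= math.exp(eps) + 1e-12
```

The ratio bound is tight here: for the truthful output the likelihood ratio is exactly e^ε, which is why randomized response is the canonical ε-dp mechanism for a single bit.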
  10. Privacy at Scale › Frequency estimations for synthetic datasets including 100 kinds of items, with N = 10,000 and N = 10,000,000 users.
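A hedged sketch of the kind of experiment this slide shows: frequency estimation over 100 items under k-ary randomized response, debiased after aggregation. The mechanism choice, seed, and population sizes here are illustrative assumptions, not the talk's actual setup; the point is that the estimation error shrinks as N grows.

```python
import numpy as np

def krr_frequencies(data, k, epsilon, rng):
    """Estimate item frequencies under k-ary randomized response:
    each user keeps their item w.p. gamma, otherwise reports a
    uniformly random item; aggregated counts are then debiased."""
    e = np.exp(epsilon)
    gamma = (e - 1.0) / (e + k - 1.0)
    p = e / (e + k - 1.0)    # Pr[report = true item]
    q = 1.0 / (e + k - 1.0)  # Pr[report = any other fixed item]
    keep = rng.random(len(data)) < gamma
    uniform = rng.integers(0, k, size=len(data))
    reports = np.where(keep, data, uniform)
    counts = np.bincount(reports, minlength=k)
    # Unbiased estimator: E[counts/N] = q + f * (p - q)
    return (counts / len(data) - q) / (p - q)

rng = np.random.default_rng(0)
k, eps = 100, 1.0
errors = {}
for n in (10_000, 1_000_000):
    data = rng.integers(0, k, size=n)  # uniform synthetic items
    est = krr_frequencies(data, k, eps, rng)
    errors[n] = np.abs(est - 1.0 / k).max()
print(errors)  # max estimation error drops as N grows
```

The per-item error decays roughly as 1/√N, which is what "privacy at scale" buys: the same per-user noise becomes negligible once millions of reports are aggregated.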
  11. Towards Trustworthy Data Platform › Deep census of our users, which we have never reached, while preserving privacy: deep sensitive questionnaires and telemetry, with randomization mechanisms of differential privacy, to improve satisfaction with our services › Sharing data, stats, and AIs while preserving privacy: knowledge circulation across our services, sharing our insights with trusted partners, and making AIs robust against real (adversarial) environments
  12. Differential Privacy in Machine Learning

  13. How to make ML differentially private › Learn Randomly & Respond as Usual: learn a model by injecting noise (DP-SGD [4] is the well-known framework); the model parameters satisfy differential privacy → parameters are sharable without privacy concerns; use-case: sharing ML models › Learn as Usual & Respond Randomly: learn a model on a raw dataset and introduce a random responding mechanism; use-case: MLaaS (put the model in a secure location and access it via API) › [4] M. Abadi et al. Deep learning with differential privacy. ACM CCS, 2016.
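The "Learn as Usual & Respond Randomly" pattern can be sketched as follows, assuming the API only answers aggregate queries with known sensitivity; the class and method names are hypothetical, not from the talk:

```python
import numpy as np

class PrivateModelAPI:
    """Schematic 'learn as usual & respond randomly' pattern: the raw
    model outputs stay in a secure location; every API answer is
    randomized.  Here the API serves an aggregate statistic with known
    sensitivity via the Laplace mechanism (scale = sensitivity/eps)."""

    def __init__(self, scores, epsilon, rng):
        self.scores = scores    # per-user model outputs, kept private
        self.epsilon = epsilon
        self.rng = rng

    def count_above(self, threshold):
        # Counting query: changing one user moves the count by at most
        # 1, so sensitivity = 1 and Laplace scale = 1 / epsilon.
        true_count = float(np.sum(self.scores > threshold))
        return true_count + self.rng.laplace(0.0, 1.0 / self.epsilon)

rng = np.random.default_rng(0)
api = PrivateModelAPI(scores=rng.random(1000), epsilon=1.0, rng=rng)
answer = api.count_above(0.9)  # noisy count, safe to expose via the API
```

Randomizing arbitrary per-example predictions is subtler than this aggregate case; the sketch only illustrates the architectural split the slide describes (model behind an API, noise at response time).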
  14. DP-SGD [4] Differentially Private Stochastic Gradient Descent › Pipeline: Random Sampling (each of the N training samples is picked w.p. q = L/N) → Compute Loss ℒ → Compute Gradient ∇θℒ → Add Noise (noise scaler σ) → Update Params θ › DP-SGD makes each gradient differentially private, and hence the model parameters are also dp. › The privacy consumption at an iteration is derived from q and σ. › [4] M. Abadi et al. Deep learning with differential privacy. ACM CCS, 2016.
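The pipeline above can be sketched in NumPy. This is an illustrative single step, not the reference implementation of [4]; the toy gradients and parameter values are ours:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier,
                lr, params, rng):
    """One DP-SGD update [4]: clip each per-example gradient to L2 norm
    <= clip_norm (C), sum, add Gaussian noise N(0, (sigma*C)^2 I),
    average over the batch, then take a gradient step."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm,
        size=per_example_grads.shape[1])
    return params - lr * noisy_sum / len(per_example_grads)

rng = np.random.default_rng(0)
grads = 5.0 * rng.normal(size=(8, 3))  # toy per-example gradients
params = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0,
                     lr=0.1, params=np.zeros(3), rng=rng)
```

Clipping is what bounds the influence of any single example, and the Gaussian noise calibrated to that bound is what makes the released gradient (and hence the trained parameters) differentially private.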
  15. How much noise should we inject? › The noise scale follows the “sensitivity” of the gradient › Sensitivity: the maximal change of a function’s output when changing one sample in the batch (or DB) › Ex. Counting: 1, Histogram: 2 › The gradient’s sensitivity is intractable, thus CLIPPING! › DP-SGD clips the ℓ2 norm of each gradient to a constant C › Sample noise from a Gaussian with variance σ²C² to craft a randomized gradient
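A small illustration of sensitivity and the Gaussian mechanism; the counting query and all names are our own example, not from the talk:

```python
import numpy as np

def count_positive(db):
    """A counting query.  Its sensitivity is 1: changing a single
    record moves the count by at most 1."""
    return float(np.sum(np.asarray(db) > 0))

def gaussian_mechanism(value, sensitivity, sigma, rng):
    """Release value + N(0, (sigma * sensitivity)^2).  In DP-SGD the
    clipping constant C plays the role of the sensitivity, giving
    noise variance sigma^2 * C^2."""
    return value + rng.normal(0.0, sigma * sensitivity)

db = [1.0, -2.0, 3.0, 0.5]
neighbor = [1.0, -2.0, 3.0, -0.5]  # one record changed
gap = abs(count_positive(db) - count_positive(neighbor))

rng = np.random.default_rng(0)
noisy_count = gaussian_mechanism(count_positive(db), sensitivity=1.0,
                                 sigma=2.0, rng=rng)
```

The same recipe applies to the clipped gradient: once every per-example gradient has norm at most C, swapping one sample changes the gradient sum by at most a bounded amount, so Gaussian noise scaled to C suffices.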
  16. Learning a model with DP-SGD (Differentially Private Stochastic Gradient Descent) › Random sampling of L training samples at each step › Randomized gradients: g̅₁ = g₁ + z₁, g̅₂ = g₂ + z₂, …, g̅ₜ = gₜ + zₜ, where each noise zᵢ is sampled from the Gaussian with mean 0 and variance σ²C² › Privacy accounting: the per-iteration costs ε₁, ε₂, …, εₜ accumulate, and the learning process is stopped when the privacy budget is exhausted › This illustration is the simplest accounting way.
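The "simplest accounting way" on the slide is basic sequential composition, which can be sketched as below. The budget values are illustrative assumptions; real DP-SGD accounting uses the much tighter moments accountant of [4]:

```python
def basic_composition_steps(per_step_epsilon, total_epsilon):
    """Simplest accounting: per-iteration epsilons add up, and training
    stops when the next step would exceed the total privacy budget."""
    steps, spent = 0, 0.0
    while spent + per_step_epsilon <= total_epsilon:
        spent += per_step_epsilon
        steps += 1
    return steps, spent

# With a total budget of eps = 1.0 and eps = 0.125 consumed per
# iteration, training must stop after 8 iterations.
steps, spent = basic_composition_steps(0.125, 1.0)
print(steps, spent)  # 8 1.0
```

Because basic composition grows linearly in the number of iterations, tighter accountants are what make training for many epochs feasible at a fixed ε.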
  17. Joint work w/ S. Takagi† (intern 2019), Y. Cao† and

    M. Yoshikawa† †: Kyoto University This work has been accepted at ICDE 2021. Differentially Private Deep Generative Model
  18. Data Sharing via Privacy-preserving Generative Model Sensitive Data Division A

    Division B Synthesized Data › We have developed differentially private deep generative models for sharing sensitive data while preserving privacy of individuals Differential Privacy Differential Privacy Generative model
  19. Data Sharing via Privacy-preserving Generative Model › We focus on a VAE (Variational Autoencoder) based approach › To exploit the high representation capability of neural networks › GANs are hard to converge and suffer from mode-collapse issues › Diagram: sensitive data (Division A) is encoded (ENC) and decoded (DEC) under differential privacy; random seeds drive DEC to synthesize data for Division B
  20. Preview: Synthesized Data › MNIST samples from VAE+DP-SGD, DP-GM [5], and Ours › All models are built under differential privacy constraints (ε = 1). › [5] G. Acs et al. Differentially private mixture of generative neural networks. TKDE, 2018.
  21. Our Contributions › Comparison under a differentially private constraint: › Noise robustness for high-dim. data — Bayesian Network: ✗, GANs: ✗, VAEs: ✓, Ours: ✓ › Preserving data distribution — Bayesian Network: ✓, GANs: ✗, VAEs: ✗, Ours: ✓
  22. Difficulty of learning a dp-generative model › Our understandings: › Injected noise makes it difficult to learn multiple tasks simultaneously (i.e., both encoding and decoding end-to-end) › The model is required to converge within a small number of epochs due to the privacy budget
  23. Learning Behaviors of VAE + DP-SGD › Injected noise makes it difficult to learn encoding and decoding in an end-to-end way › Illustration: mappings between the original data domain and the latent space, for VAE vs. VAE+DP-SGD
  24. Difficulty of learning a dp-generative model › Injected noise makes it difficult to learn multiple tasks simultaneously (i.e., both encoding and decoding end-to-end) › The model is required to converge within a small number of epochs due to privacy consumption › OUR SOLUTION: Simplified Two-Phase Algorithm
  25. P3GM: Privacy Preserving Phased Generative Model › Two phases over the original data domain (X): Phase 1 compresses X with DP-PCA [6]; the coordinates in the latent space are fixed after Phase 1. › To fit easily, we assume the prior is the distribution of the (compressed) training data; we use the mixture of Gaussians estimated from the training data by the DP-EM [7] algorithm. › [6] W. Jiang et al. Wishart mechanism for differentially private principal components analysis. AAAI, 2016. › [7] M. Park et al. DP-EM: Differentially private expectation maximization. AISTATS, 2017.
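For intuition only, a non-private NumPy skeleton of the two-phase idea: plain PCA stands in for DP-PCA [6], a single Gaussian stands in for the DP-EM [7] mixture, and the differentially private training of the generative mapping in Phase 2 is omitted. All shapes and data here are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))   # toy stand-in for the training data

# Phase 1: fix the latent space from the data itself.
# (P3GM uses DP-PCA [6]; plain SVD-based PCA stands in, non-private.)
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
d_latent = 5
Z = X_centered @ Vt[:d_latent].T  # compressed training data

# P3GM estimates a mixture of Gaussians over Z with DP-EM [7];
# a single Gaussian stands in here, again non-private.
prior_mean = Z.mean(axis=0)
prior_cov = np.cov(Z, rowvar=False)

# Phase 2 (omitted): with the latent coordinates fixed after Phase 1,
# the remaining generative mapping is trained under DP-SGD.
# Sampling from the fitted prior seeds generation:
samples = rng.multivariate_normal(prior_mean, prior_cov, size=10)
```

Fixing the latent coordinates first is the key design choice: it removes the need to learn encoding and decoding simultaneously under noise, which slides 22-24 identify as the main difficulty.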
  26. Evaluation › How effectively can the generated samples be used in data mining tasks? › How efficiently can a differentially private model be constructed? › How robust is the model against the noise injected to satisfy differential privacy?
  27. Classification on Synthesized Data › Our solution demonstrates high utility in several classification tasks. All models are built under differential privacy constraints (ε = 1). › Table VI: Performance comparison on four real datasets; each score is the average AUROC or AUPRC over four classifiers. P3GM outperforms the other two differentially private models on three datasets.

     AUROC — Dataset: PrivBayes / Ryan’s / DP-GM / P3GM / original
     Kaggle Credit: 0.5520 / 0.5326 / 0.8805 / 0.9232 / 0.9663
     UCI ESR: 0.5377 / 0.5757 / 0.4911 / 0.8243 / 0.8698
     Adult: 0.8530 / 0.5048 / 0.7806 / 0.8321 / 0.9119
     UCI ISOLET: 0.5100 / 0.5326 / 0.4695 / 0.6855 / 0.9891

     AUPRC — Dataset: PrivBayes / Ryan’s / DP-GM / P3GM / original
     Kaggle Credit: 0.2084 / 0.2503 / 0.3301 / 0.5208 / 0.8927
     UCI ESR: 0.5419 / 0.4265 / 0.3311 / 0.7559 / 0.8098
     Adult: 0.6374 / 0.2584 / 0.4502 / 0.5917 / 0.7844
     UCI ISOLET: 0.2084 / 0.2099 / 0.1816 / 0.3287 / 0.9623

     › Table VII: Classification accuracies on image datasets — Dataset: VAE / DP-GM / PrivBayes / Ryan’s / P3GM
     MNIST: 0.8571 / 0.4973 / 0.0970 / 0.2385 / 0.7946
     Fashion: 0.7854 / 0.5200 / 0.0996 / 0.2408 / 0.7311
  28. Learning Efficiency › Fig. 7: P3GM demonstrates higher learning efficiency than DP-VAE — (a) reconstruction loss (MNIST), (b) reconstruction loss (Kaggle Credit), (c) classification accuracy (MNIST).
  29. Robustness against Noise › AUROC varying ε (MNIST) › AUPRC varying ε (MNIST) › Fig. 5: Reducing dimension improves accuracy (MNIST); too small a dimensionality lacks the expressiveness for embedding, and from the result a latent dimensionality in [10, 100] balances accuracy and dimension reduction on the MNIST dataset.
  30. Conclusion & Take-Home Messages › Differential Privacy is “Privacy by Randomization” and “Privacy at Scale” › Differential privacy has been utilized for gathering stats and sharing ML outcomes › DP-SGD is a standard framework to make machine learning differentially private › Introduced our research on a differentially private deep generative model (P3GM) › Made it robust against noise injection with a simple two-phased learning algorithm › It outperforms existing methods in terms of the utility of the synthesized data › Our paper and code are public on arXiv and GitHub, respectively. › [Paper] https://arxiv.org/abs/2006.12101 › [Code] https://github.com/tsubasat/P3GM
  31. References › [1] C. Dwork. Differential privacy. ICALP, 2006. › [2] https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ › [3] https://www.census.gov/about/policies/privacy/statistical_safeguards/disclosure-avoidance-2020-census.html › [4] M. Abadi et al. Deep learning with differential privacy. ACM CCS, 2016. › [5] G. Acs et al. Differentially private mixture of generative neural networks. TKDE, 2018. › [6] W. Jiang et al. Wishart mechanism for differentially private principal components analysis. AAAI, 2016. › [7] M. Park et al. DP-EM: Differentially private expectation maximization. AISTATS, 2017.
  32. Thank you