
A Case Study of Privacy-Preserving Data Synthesis Research at LINE


Seng Pei Liew (LINE Corporation)
Technical presentation material for DEIM2022.
https://cms.dbsj.org/deim2022/program/?oral#/J33


LINE Developers

March 01, 2022


Transcript

  1. PEARL: Data synthesis via private embeddings and adversarial reconstruction learning

    ICLR 2022 (to appear). Seng Pei Liew, with Tsubasa Takahashi and Michihiko Ueno
  2. Summary in one slide

    • Sharing data among organizations or departments may cause privacy issues. How can this be mitigated?
    • Privacy-preserving data synthesis (PPDS): we train a generative model with differential privacy, which gives rigorous privacy guarantees, and use the model to generate synthetic data for private data sharing.
    • We propose PEARL, a framework for training generative models at a practical level of privacy. It overcomes issues encountered in previous works, which mainly rely on DP-SGD (explained in later slides).
    [Figure: PEARL overview diagram. Sensitive Data; Privately Embedded 1…k and Aux (DP flow, one-shot); Generator; Synthesized 1…k; Critic; Adversarial Reconstruction Learner (training flow); steps (1)–(4)]
  3. Content

    • Differential privacy
    • Differentially private data synthesis (with generative models)
    • Training generative models with differential privacy (and general shortcomings)
    • Proposal: PEARL
    • Realization of PEARL (embedding, generative model, critic)
    • Results
  4. Differential Privacy

    • “An algorithm is differentially private if changing a single record does not alter its output distribution by much.” [DN03, DMNS06]
    [Figure: sensitive data, algorithm, statistics]
  6. Differential Privacy

    • Two datasets are neighbors if they differ in the data of a single record.
    • An algorithm $M$ is $\epsilon$-differentially private if, for all neighboring datasets $D, D'$ and all outputs $x$:
      $\Pr[M(D) = x] \le e^{\epsilon} \Pr[M(D') = x]$
    • The parameter $\epsilon$ controls the degree of privacy and is often called the privacy budget.
    • Note: at small $\epsilon$, since $e^{\epsilon} \approx 1 + \epsilon$, we can instead write
      $\Pr[M(D) = x] \le (1 + \epsilon) \Pr[M(D') = x]$
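The inequality above can be checked numerically on a very small mechanism. The sketch below uses randomized response on a single bit, a standard textbook mechanism that is not part of the slides and is chosen here only for illustration. The function name `randomized_response_probs` and the value of `epsilon` are assumptions.

```python
import math

def randomized_response_probs(true_bit: int, epsilon: float) -> dict:
    """Output distribution of randomized response on one bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}

epsilon = 0.1  # illustrative privacy budget (assumption)

# Two neighboring "datasets": a single record whose bit differs.
probs_d = randomized_response_probs(1, epsilon)
probs_d_prime = randomized_response_probs(0, epsilon)

# Worst-case ratio Pr[M(D) = x] / Pr[M(D') = x] over all outputs x.
worst_ratio = max(probs_d[x] / probs_d_prime[x] for x in (0, 1))

print("worst-case ratio:", worst_ratio)
print("e^epsilon:       ", math.exp(epsilon))
print("1 + epsilon:     ", 1.0 + epsilon)
```

The worst-case ratio meets the bound $e^{\epsilon}$ exactly, and for small $\epsilon$ that bound is close to $1 + \epsilon$.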
  7. Differentially private data synthesis

    [Figure: Sensitive data → Algorithm → “fake data” that is private and preserves the characteristics of the real data]
  8. Differentially private data synthesis

    [Figure: Sensitive data → Algorithm → “fake data” that is private and preserves the characteristics of the real data]
    The synthetic data allows arbitrary usage (for example, by data scientists) without privacy violation:
    • Training ML models
    • Exploratory data analysis
  9. Differentially private data synthesis with generative models

    [Figure: Sensitive data → Algorithm → Generative model]
    Learning a private generative model allows the generation of “fake” data.
  10. Training deep generative models with differential privacy

    • The most popular method is differentially private stochastic gradient descent (DP-SGD) [ACG+16].
    • DP-SGD ensures that each gradient update is private, which in turn guarantees that the network parameters are private.
    • Privacy consumption is accumulated with the moments accountant.
    [Figure: DP-SGD loop. Sample a batch of sensitive data → calculate loss $L_\theta$ → compute gradient $\nabla L_\theta$ → clip gradient → add Gaussian noise → update parameters $\theta$ of the generative model]
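The DP-SGD loop described above can be sketched as a single update step in NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: the function name `dp_sgd_step` and all parameter values are hypothetical, per-example gradients are assumed to be given, and privacy accounting (the moments accountant) is omitted.

```python
import numpy as np

def dp_sgd_step(theta, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: clip each per-example gradient, add Gaussian noise, step."""
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise calibrated to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

    # Plain gradient descent on the privatized gradient.
    return theta - lr * noisy_mean

rng = np.random.default_rng(0)
theta = np.zeros(3)
# Toy per-example gradients with L2 norms 5.0 and 0.1 (illustrative values).
grads = np.array([[3.0, 4.0, 0.0], [0.1, 0.0, 0.0]])

# With the noise switched off, only the effect of clipping is visible.
theta_no_noise = dp_sgd_step(theta, grads, clip_norm=1.0,
                             noise_multiplier=0.0, lr=1.0, rng=rng)
# With noise switched on, every coordinate of theta is perturbed.
theta_noisy = dp_sgd_step(theta, grads, clip_norm=1.0,
                          noise_multiplier=1.0, lr=1.0, rng=rng)
print(theta_no_noise, theta_noisy)
```

Each call touches the sensitive batch again, so every training step spends part of the privacy budget, and the noise vector has one entry per model parameter.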
  11. Some examples of differentially private generative models

    • Add noise only to the gradient of the discriminator, because the generator has no access to data.
    • Privatize both the encoder and the decoder.
    • Refs: [TTC+20], [COF20]
  12. General shortcomings of DP-SGD

    1. Training steps are limited. Each access to the data weakens the privacy guarantee.
    2. Network size is limited. Large neural networks lead to too much noise being added to the gradient updates.
    3. Extensive tuning of hyperparameters (clipping size) is required.
    [Figure: DP-SGD loop annotated with (1) multiple accesses to data, (2) noise proportional to network size, (3) tuning of clipping size (to bound sensitivity)]
  13. Proposal: PEARL (Private Embeddings and Adversarial Reconstruction Learning; arXiv:2106.04590)

    1. Project sensitive data to low-dimensional embeddings and add Gaussian noise to make the embeddings differentially private.
    2. Obtain auxiliary information useful for training in a differentially private manner.
    3. Train a generator by minimizing the embedding distance.
    4. Train with an adversarial objective to improve performance.
    [Figure: PEARL overview diagram, steps (1)–(4)]
  14. Proposal: PEARL (Private Embeddings and Adversarial Reconstruction Learning; arXiv:2106.04590)

    1. Project sensitive data to low-dimensional embeddings and add Gaussian noise to make the embeddings differentially private.
    2. Obtain auxiliary information useful for training in a differentially private manner.
    3. Train a generator by minimizing the embedding distance.
    4. Train with an adversarial objective to improve performance.
    [Figure: PEARL overview diagram, steps (1)–(4), annotated: one-shot data access; no noise added to the gradients; no clipping required]
  16. General shortcomings of DP-SGD

    1. Training steps are limited. Each access to the data weakens the privacy guarantee.
    2. Network size is limited. Large neural networks lead to too much noise being added to the gradient updates.
    3. Extensive tuning of hyperparameters (clipping size) is required.
    [Figure: DP-SGD loop annotated with (1) multiple accesses to data, (2) noise proportional to network size, (3) tuning of clipping size (to bound sensitivity), shown next to the PEARL overview diagram annotated: one-shot data access; no noise added to the gradients; no clipping required]
  17. Realization of PEARL: Embedding

    [Figure: PEARL overview diagram]
  18. Realization of PEARL: Characteristic function

    • Let $x$ be a random variable with probability distribution $\mathbb{P}$. The corresponding characteristic function (CF) is
      $\Phi_{\mathbb{P}}(t) = \mathbb{E}_{x \sim \mathbb{P}}[e^{i t \cdot x}] = \int_{\mathbb{R}^d} e^{i t \cdot x} \, d\mathbb{P} \simeq \sum_{x} e^{i t \cdot x}$ (empirical CF)
    • From the signal processing point of view, this operation is equivalent to a Fourier transform; $t$ is the frequency.
    • Also define the characteristic function distance between two distributions:
      $C^2(\mathbb{P}, \mathbb{Q}) = \int |\Phi_{\mathbb{P}}(t) - \Phi_{\mathbb{Q}}(t)|^2 \, \omega(t) \, dt$
    • It can be shown that, with an appropriately defined density $\omega(t)$, $C(\mathbb{P}, \mathbb{Q}) = 0 \iff \mathbb{P} = \mathbb{Q}$.
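The empirical CF and the CF distance can be estimated in a few lines of NumPy. In this sketch the function names, sample sizes, and the choice of a standard normal $\omega(t)$ are illustrative assumptions. The empirical CF is normalized by the sample count (a mean rather than a bare sum), and the integral over $\omega(t)$ is approximated by averaging over frequencies sampled from $\omega$.

```python
import numpy as np

def empirical_cf(samples, freqs):
    """Empirical characteristic function at each frequency.

    samples: (n, d) data points; freqs: (k, d) frequencies t_1..t_k.
    Returns a complex vector of length k with entries mean_x exp(i t . x).
    """
    return np.exp(1j * samples @ freqs.T).mean(axis=0)

def cf_distance_sq(x, y, freqs):
    """Monte Carlo estimate of C^2(P, Q), with freqs drawn from omega."""
    diff = empirical_cf(x, freqs) - empirical_cf(y, freqs)
    return float(np.mean(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
freqs = rng.normal(size=(64, 2))                  # t_i ~ omega = N(0, I), an assumption
x = rng.normal(0.0, 1.0, size=(2000, 2))          # samples from P
y_same = rng.normal(0.0, 1.0, size=(2000, 2))     # samples from Q = P
y_shift = rng.normal(2.0, 1.0, size=(2000, 2))    # samples from a shifted Q

d_same = cf_distance_sq(x, y_same, freqs)
d_shift = cf_distance_sq(x, y_shift, freqs)
print("same distribution:   ", d_same)
print("shifted distribution:", d_shift)
```

The distance between two samples of the same distribution is close to zero (up to sampling error), while the shifted distribution gives a clearly larger value.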
  19. Realization of PEARL: Generative model

    [Figure: PEARL overview diagram]
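  20. Realization of PEARL: Generative model

    • Let $x$ be the sensitive data. We sample a finite number of frequencies $t$ from $\omega(t)$ and add noise to the empirical CF $\hat{\Phi}_{\mathbb{P}}(t)$ to make it DP. The sanitized CF is denoted $\tilde{\Phi}_{\mathbb{P}}(t)$.
    • Define a generator $G_\theta$ that takes a latent vector (noise) $z$ as input and outputs “fake” $x$.
    • We can then train the generator with the following objective, which minimizes the CF distance:
      $\inf_{\theta \in \Theta} \sum_{i=1}^{k} |\tilde{\Phi}_{\mathbb{P}}(t_i) - \hat{\Phi}_{\mathbb{Q}}(t_i)|^2$
      where $k$ is the number of sampled frequencies and $\hat{\Phi}_{\mathbb{Q}}(t) = \sum_{y} e^{i t \cdot y}$ with $y = G_\theta(z)$.
    • Noise is added to $\hat{\Phi}_{\mathbb{P}}(t)$ for the sampled frequencies $t$, giving $\tilde{\Phi}_{\mathbb{P}}(t)$. No noise is required for $\hat{\Phi}_{\mathbb{Q}}(t)$ because this term has no access to data.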
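The one-shot sanitization and the generator objective can be sketched as follows. Everything specific here is an assumption for illustration: the classical Gaussian-mechanism calibration (valid for $\epsilon < 1$, with an extra parameter $\delta$ that the slides do not mention), the sensitivity bound $2\sqrt{k}/n$ for a mean-normalized empirical CF under replace-one neighbors, the toy shift generator $G_\theta(z) = z + \theta$, and the grid search that stands in for gradient-based training.

```python
import numpy as np

def empirical_cf(samples, freqs):
    """Mean-normalized empirical CF; samples (n, d), freqs (k, d) -> (k,) complex."""
    return np.exp(1j * samples @ freqs.T).mean(axis=0)

def sanitize_cf(cf, n_records, epsilon, delta, rng):
    """Add Gaussian noise to the empirical CF (illustrative calibration).

    Replacing one record changes each of the k complex entries by at most
    2 / n, so the L2 sensitivity is at most 2 * sqrt(k) / n.
    """
    k = cf.shape[0]
    sensitivity = 2.0 * np.sqrt(k) / n_records
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noise = rng.normal(0.0, sigma, size=k) + 1j * rng.normal(0.0, sigma, size=k)
    return cf + noise

def cf_loss(sanitized_cf, fake_samples, freqs):
    """Sum over sampled frequencies of |sanitized CF - generator CF|^2."""
    diff = sanitized_cf - empirical_cf(fake_samples, freqs)
    return float(np.sum(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
sensitive = rng.normal(1.5, 1.0, size=(5000, 1))   # toy stand-in for sensitive data
freqs = rng.normal(size=(32, 1))                   # k = 32 frequencies from omega

# The sensitive data is accessed exactly once, here.
target_cf = sanitize_cf(empirical_cf(sensitive, freqs), len(sensitive),
                        epsilon=0.5, delta=1e-5, rng=rng)

# Toy generator G_theta(z) = z + theta; training never touches `sensitive` again.
z = rng.normal(size=(5000, 1))
thetas = np.linspace(-3.0, 3.0, 61)
losses = [cf_loss(target_cf, z + th, freqs) for th in thetas]
best_theta = float(thetas[int(np.argmin(losses))])
print("best theta:", best_theta)
```

Because only `target_cf` enters the loss, the generator can be trained for any number of steps without spending additional privacy budget.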
  21. Realization of PEARL: Critic

    [Figure: PEARL overview diagram]
  22. Realization of PEARL: Critic

    • We have not discussed $\omega(t)$ much so far.
    • The idea is to treat $\omega(t)$ as an adversarial critic that provides more discriminative features for training $G_\theta$, as in GANs.
    • $\omega(t)$ cannot be optimized directly. Known methods, e.g., the reparametrization trick, require access to data, violating privacy.
    • We propose to re-weight the CFs to choose the “best” weight for training $G_\theta$ while preserving privacy, by performing minimax optimization:
      $\inf_{\theta \in \Theta} \sup_{\omega \in \Omega} C_\omega(\mathbb{P}, \mathbb{Q}_\theta)$
      where $C^2(\mathbb{P}, \mathbb{Q}) = \int |\Phi_{\mathbb{P}}(t) - \Phi_{\mathbb{Q}}(t)|^2 \, \omega(t) \, dt$
  23. Realization of PEARL: Critic

    • More concretely, the following minimax optimization is proposed:
      $\inf_{\theta \in \Theta} \sup_{\omega \in \Omega} \sum_{i=1}^{k} \frac{\omega(t_i)}{\omega_0(t_i)} |\tilde{\Phi}_{\mathbb{P}}(t_i) - \hat{\Phi}_{\mathbb{Q}}(t_i)|^2$
    • Additionally, we are able to show that the above optimization has the following theoretical properties:
      1. Continuity and differentiability (allows the generator to be trained via gradient descent)
      2. Weak convergence (good for training GAN-like models [ACB’17])
      3. Consistency in the infinite-sampling limit (ensures the maximization procedure is asymptotically consistent)
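The re-weighting idea can be sketched as an importance-weighted sum over frequencies that were sampled once from the base density $\omega_0$. The sketch below assumes, purely for illustration, that both $\omega$ and $\omega_0$ are isotropic zero-mean Gaussians and that the inner maximization is a search over a few candidate scales; the slides do not specify the parametrization, and all names and values are hypothetical.

```python
import numpy as np

def gaussian_log_density(t, scale):
    """Log density of an isotropic zero-mean Gaussian N(0, scale^2 I) at rows of t."""
    d = t.shape[1]
    return (-0.5 * np.sum(t ** 2, axis=1) / scale ** 2
            - d * np.log(scale) - 0.5 * d * np.log(2.0 * np.pi))

def reweighted_objective(sq_diff, freqs, critic_scale, base_scale):
    """sum_i omega(t_i) / omega_0(t_i) * |CF difference at t_i|^2."""
    log_w = (gaussian_log_density(freqs, critic_scale)
             - gaussian_log_density(freqs, base_scale))
    return float(np.sum(np.exp(log_w) * sq_diff))

def critic_step(sq_diff, freqs, candidate_scales, base_scale):
    """Inner 'sup': pick the candidate omega that maximizes the objective."""
    values = [reweighted_objective(sq_diff, freqs, s, base_scale)
              for s in candidate_scales]
    best = int(np.argmax(values))
    return candidate_scales[best], values[best]

rng = np.random.default_rng(0)
base_scale = 1.0
freqs = rng.normal(0.0, base_scale, size=(16, 2))   # t_i ~ omega_0, sampled once
# Placeholder squared CF differences; in practice these come from the
# sanitized CF and the generator's CF, with no further access to the data.
sq_diff = np.abs(freqs[:, 0])

candidates = [0.5, 1.0, 2.0]
best_scale, best_value = critic_step(sq_diff, freqs, candidates, base_scale)
print("critic picks scale", best_scale, "with objective", best_value)
```

Since the weights depend only on the frequencies and the squared differences depend only on the sanitized CF, the critic update needs no extra access to the sensitive data.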
  24. Generated image data

    • PEARL’s quality is low in the non-private limit ($\epsilon = \infty$), but the quality does not change much as $\epsilon$ decreases (except at extreme values).
  25. Generated image data

    • Evaluating with metrics commonly used for GANs
  26. Results on tabular data

    • We also generate synthetic Adult data. The frequency histogram is shown on the left (compared with another SOTA method); PEARL captures the pattern of the distribution well.
    • We use the synthetic data to train ML models for classifying real data. The results on the right also show that PEARL outperforms the SOTA method.
  27. Wrap-up

    • PEARL: a new approach to training deep generative models.
    • Training practical models at reasonable privacy levels while avoiding the difficulties of DP-SGD.
    [Figure: PEARL overview diagram]
  28. Appendix

  29. Auxiliary information

    • Obtain auxiliary information privately to train a better generative model.
    • Tabular data: use DP-mean to train a Gaussian mixture model (GMM) that better models continuous attributes.
    • Class imbalance: obtain the number of samples in each class to perform re-weighting and train a more balanced model.
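The class-imbalance item can be illustrated with a noisy histogram of class counts. The slides do not say which mechanism is used, so the Laplace mechanism below is an assumption, as are the function names and the inverse-count weighting rule. Under replace-one neighbors a single record can change two counts by one each, which gives an L1 sensitivity of 2.

```python
import numpy as np

def dp_class_counts(labels, num_classes, epsilon, rng):
    """Per-class counts with Laplace noise (illustrative choice of mechanism)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    # L1 sensitivity is 2 under replace-one neighbors, so the scale is 2 / epsilon.
    noisy = counts + rng.laplace(0.0, 2.0 / epsilon, size=num_classes)
    # Clamping is post-processing and does not affect the privacy guarantee.
    return np.maximum(noisy, 1.0)

def class_weights(noisy_counts):
    """Inverse-frequency weights, normalized to sum to one."""
    inverse = 1.0 / noisy_counts
    return inverse / inverse.sum()

rng = np.random.default_rng(0)
labels = np.array([0] * 900 + [1] * 100)   # toy imbalanced labels
noisy_counts = dp_class_counts(labels, num_classes=2, epsilon=1.0, rng=rng)
weights = class_weights(noisy_counts)
print("noisy counts:", noisy_counts)
print("class weights:", weights)
```

The minority class receives the larger weight, and the privacy budget spent on the counts has to be added to the budget used for the embeddings.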
  30. Choosing $\omega_0$

    • $\omega_0(t)$ is chosen by the median heuristic (median of the pairwise distances between data points).
    • We estimate the mean privately instead, because it is more tractable.
    • The privacy budget for this calculation is accounted for appropriately.
  31. Implementation details

  32. Detailed quantitative Adult results

  33. Frequency histogram for continuous attribute of Adult data