Privacy granted by maths

Giulia
October 21, 2019

Since March 2019, the TensorFlow family has had a new member: TensorFlow Privacy. What is it about? Which mathematical theories guarantee the privacy of deep learning models, and how are they implemented?

Presented at DevFest Nantes 2019.
Format: 20 minutes Q&A included

Transcript

  1. Privacy granted by maths @Giuliabianchl October 22, 2019

  2. Giulia Bianchi, data scientist, @Giuliabianchl

  3. Table of contents: Privacy leaks, Differential Privacy, TensorFlow Privacy

  4. Privacy leaks. https://www.cs.cmu.edu/~mfredrik/papers/fjr2015ccs.pdf

  5. Privacy leaks. https://xkcd.com/2169/ and https://arxiv.org/abs/1802.08232

  6. Privacy leaks
     "Unintended memorization occurs when trained neural networks may reveal the presence of out-of-distribution training data -- i.e., training data that is irrelevant to the learning task [...]."
     https://arxiv.org/abs/1802.08232
     Example text on the slide: "Let's meet by the docks at midnight on June 28, come alone." and "Long live the revolution. Our next meeting will be at the docks..."
  7. Differential privacy
     Differential privacy addresses the paradox of learning nothing about an individual while learning useful information about a population.
     "Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation."
     https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  8. Differential privacy
     • Two databases X and Y, where Y = X + 1 entry
     • The analyst queries X and Y through an algorithm M and obtains M(X) and M(Y)
     • If M(X) is sufficiently close to M(Y), then M is differentially private: probability of M(X) ≈ probability of M(Y)
  9. Differential privacy
     "Differential privacy will provide privacy by process; in particular it will introduce randomness."
     Definition: a randomized algorithm $M$ with domain $\mathbb{N}^{|X|}$ is $(\varepsilon, \delta)$-differentially private if for all $S \subseteq \mathrm{Range}(M)$ and for all $x, y \in \mathbb{N}^{|X|}$ such that $\|x - y\|_1 \le 1$:
     $$\Pr[M(x) \in S] \le \exp(\varepsilon)\,\Pr[M(y) \in S] + \delta$$
     If $\delta = 0$, we say that $M$ is $\varepsilon$-differentially private.
     https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  10. Differential privacy (the definition from slide 9 is repeated on this slide)
      • adjacent databases $x, y \in \mathbb{N}^{|X|}$
        ◦ $\mathbb{N}^{|X|}$: input set
        ◦ $x$ and $y$ differ in at most one entry, i.e. $\|x - y\|_1 \le 1$
      https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  11. Differential privacy (the definition from slide 9 is repeated on this slide)
      • randomized algorithm $M: \mathbb{N}^{|X|} \to \mathrm{Range}(M)$
        ◦ $\mathbb{N}^{|X|}$: input set
        ◦ $\mathrm{Range}(M)$: output set
        ◦ $S \subseteq \mathrm{Range}(M)$: $S$ is a set of possible outputs
      https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  12. Differential privacy (the definition from slide 9 is repeated on this slide)
      • $\delta = 0$
        ◦ $\Pr[M(x) \in S] \le \exp(\varepsilon)\,\Pr[M(y) \in S]$
      • $\varepsilon \approx 0 \Rightarrow \exp(\varepsilon) \approx 1$
        ◦ $\Pr[M(x) \in S] / \Pr[M(y) \in S] \le \exp(\varepsilon) \approx 1$
        ◦ swapping $x$ and $y$ bounds the same ratio from below by $\exp(-\varepsilon) \approx 1$, so the two probabilities are nearly equal
      https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  13. Differential privacy (the definition from slide 9 is repeated on this slide)
      • The probability of an output of a randomized $(\varepsilon, \delta)$-differentially private algorithm is pretty much the same on two adjacent databases, provided that
        ◦ $\varepsilon$ is small enough
        ◦ $\delta < 1/|X|$
      https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
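To make the definition concrete, here is a minimal sketch that checks it on randomized response, a textbook mechanism that is not part of the talk. The function names are hypothetical. Each "database" is a single yes/no answer, so any two different answers are adjacent, and the worst-case ratio of output probabilities gives $\exp(\varepsilon)$ with $\delta = 0$.

```python
import math
from itertools import product


def randomized_response_probs(true_answer: bool) -> dict:
    """Output distribution of coin-flip randomized response.

    Flip a fair coin: on heads report the truth, on tails report the
    outcome of a second fair coin. The truth is reported with probability 3/4.
    """
    p_truth = 0.75
    return {true_answer: p_truth, (not true_answer): 1.0 - p_truth}


def worst_case_ratio() -> float:
    """Largest Pr[M(x) = s] / Pr[M(y) = s] over adjacent inputs x != y and outputs s."""
    ratios = []
    for x, y in product([True, False], repeat=2):
        if x == y:
            continue
        px, py = randomized_response_probs(x), randomized_response_probs(y)
        ratios.extend(px[s] / py[s] for s in (True, False))
    return max(ratios)


# With delta = 0 the definition requires ratio <= exp(epsilon),
# so the smallest valid epsilon is the log of the worst-case ratio.
epsilon = math.log(worst_case_ratio())
print(f"worst-case ratio = {worst_case_ratio():.2f}, epsilon = {epsilon:.4f}")
```

Checking single outputs is enough here because the output set is finite and the bound for any subset $S$ follows by summing over its elements.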
  14. Differential privacy
      • Easy way to approximate a deterministic function with a differentially private algorithm
      • The privacy guarantee achieved is quantifiable
      • Differential privacy has properties that make it useful in machine learning (composability, group privacy, robustness to auxiliary information)
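Two of the properties listed on this slide, composability and group privacy, can be illustrated with simple arithmetic. The sketch below uses the basic composition bound (parameters add up) and the group privacy bound for pure $\varepsilon$-differential privacy (epsilon scales with group size). The function names are hypothetical, and these are the simplest known bounds, not the tighter accounting a library would normally apply.

```python
from typing import Iterable, Tuple


def basic_composition(guarantees: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Basic composition: running several (epsilon_i, delta_i)-DP mechanisms
    on the same data is (sum of epsilon_i, sum of delta_i)-DP."""
    total_epsilon = 0.0
    total_delta = 0.0
    for epsilon, delta in guarantees:
        total_epsilon += epsilon
        total_delta += delta
    return total_epsilon, total_delta


def group_privacy_epsilon(epsilon: float, group_size: int) -> float:
    """An epsilon-DP mechanism (delta = 0) is (group_size * epsilon)-DP
    for databases that differ in group_size entries."""
    return group_size * epsilon


# Three queries, each (0.5, 1e-6)-DP, answered on the same database.
eps, delta = basic_composition([(0.5, 1e-6)] * 3)
print(eps, delta)

# Protection offered by a 0.5-DP mechanism to a group of 4 people.
print(group_privacy_epsilon(0.5, group_size=4))
```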
  15. Differential privacy
      • Easy way to approximate a deterministic function with a differentially private algorithm
  16. Differential privacy
      • Easy way to approximate a deterministic function with a differentially private algorithm: by adding NOISE
      • Gaussian mechanism: $G_\sigma f(x) \triangleq f(x) + \mathcal{N}(0, \sigma^2)$
      https://en.wikipedia.org/wiki/Additive_noise_mechanisms
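As an illustration of the Gaussian mechanism above, the following sketch adds calibrated noise to a counting query. The noise scale uses the classic calibration $\sigma = \sqrt{2 \ln(1.25/\delta)}\,\Delta / \varepsilon$, which is stated for $0 < \varepsilon < 1$ and where $\Delta$ is the L2 sensitivity of $f$. The function names, the toy data, and the choice of query are assumptions made for this example.

```python
import math

import numpy as np


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Noise scale from the classic Gaussian-mechanism calibration (0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration is only stated for 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def gaussian_mechanism(value, sigma: float, rng: np.random.Generator):
    """Return value + N(0, sigma^2), with independent noise per coordinate."""
    return value + rng.normal(loc=0.0, scale=sigma, size=np.shape(value))


rng = np.random.default_rng(seed=0)
ages = np.array([23, 35, 41, 52, 29, 64, 47])  # toy database

# Counting query: adding or removing one person changes the count by at most 1.
true_count = int((ages > 40).sum())
sigma = gaussian_sigma(sensitivity=1.0, epsilon=0.5, delta=1e-5)
noisy_count = float(gaussian_mechanism(true_count, sigma, rng))
print(f"true count = {true_count}, sigma = {sigma:.2f}, noisy count = {noisy_count:.2f}")
```

On a database this small the noise swamps the true count; the trade-off becomes more favourable as the database grows, because the sensitivity of the count stays at 1.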
  17. Differentially private Stochastic Gradient Descent
      Deep Learning with Differential Privacy https://arxiv.org/abs/1607.00133
      • Differential privacy is introduced in deep learning algorithms by modifying stochastic gradient descent. At each iteration:
        1. clip the gradient of each example
        2. add noise
      • Gradient clipping is common practice even when privacy is not a concern, typically to keep training stable. Here it is required to prove the differential privacy guarantee.
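The two steps above (clip, then add noise) can be sketched in NumPy as a single update. This is a simplified illustration under stated assumptions, not the TensorFlow Privacy implementation: per-example gradients are assumed to be given as a `[batch, dim]` array, the function name `dp_sgd_step` is hypothetical, and the parameter names mirror the optimizer arguments shown later in the talk.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, l2_norm_clip, noise_multiplier,
                learning_rate, rng):
    """One differentially private SGD update from per-example gradients [batch, dim]."""
    batch_size = per_example_grads.shape[0]

    # 1. Clip: rescale each example's gradient so its L2 norm is at most l2_norm_clip.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # 2. Add noise: Gaussian noise with standard deviation
    #    noise_multiplier * l2_norm_clip is added to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=params.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / batch_size

    # Ordinary gradient descent step on the privatised gradient.
    return params - learning_rate * noisy_mean_grad


rng = np.random.default_rng(seed=0)
params = np.zeros(3)
grads = rng.normal(size=(8, 3)) * 10.0  # made-up per-example gradients
new_params = dp_sgd_step(params, grads, l2_norm_clip=1.0, noise_multiplier=1.1,
                         learning_rate=0.1, rng=rng)
print(new_params)
```

Clipping bounds how much any single example can move the parameters, which is what lets the added noise be calibrated to a known sensitivity.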
  18. Differentially private SGD
      Deep Learning with Differential Privacy https://arxiv.org/abs/1607.00133

  19. TensorFlow Privacy https://github.com/tensorflow/privacy
      • Latest release: 0.1.0 (2 Oct 2019)
      • First release: 0.0.1 (23 Aug 2019)
  20. TensorFlow Privacy: without DP and with DP
      https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py

          # model, the optimizer classes and the hyperparameters are defined elsewhere in the tutorial.
          if train_with_differential_privacy:
              optimizer = DPGradientDescentGaussianOptimizer(
                  l2_norm_clip=l2_norm_clip,
                  noise_multiplier=noise_multiplier,
                  num_microbatches=microbatches,
                  learning_rate=learning_rate)
              # Compute vector of per-example loss rather than its mean over a minibatch.
              loss = tf.keras.losses.CategoricalCrossentropy(
                  from_logits=True, reduction=tf.losses.Reduction.NONE)
          else:
              optimizer = GradientDescentOptimizer(learning_rate=learning_rate)
              loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

          model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
  21. TensorFlow Privacy (same code as slide 20, with annotations)
      https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py
      • With DP: privacy.optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer
      • Without DP: tf.optimizers.SGD
      • DP optimizers listed on the slide: DPAdamGaussianOptimizer, DPAdagradGaussianOptimizer, DPGradientDescentGaussianOptimizer
  22. Privacy Analysis
      • The differential privacy guarantee is expressed by epsilon and delta
        ◦ epsilon: upper bound on how much the probability of a particular model output can vary by adding or removing a single training point
        ◦ delta: bounds the probability of our privacy guarantee not holding
      • TensorFlow Privacy provides methods to compute them
  23. Takeaways
      • Privacy leaks
      • Differential privacy: for two databases X and Y and an algorithm M, $\Pr[M(X)] \approx \Pr[M(Y)]$
      • TensorFlow Privacy: DPAdamGaussianOptimizer, DPAdagradGaussianOptimizer, DPGradientDescentGaussianOptimizer
  24. Thank you @Giuliabianchl