
Privacy granted by maths

Giulia
October 21, 2019

Since March 2019, the TensorFlow family has a new member: TensorFlow Privacy. What is it about? Which mathematical theories guarantee the privacy of deep learning models, and how are they implemented?

Presented at DevFest Nantes 2019.
Format: 20 minutes, Q&A included.


Transcript

  1. Privacy leaks. "Unintended memorization occurs when trained neural networks may reveal
     the presence of out-of-distribution training data -- i.e., training data that is
     irrelevant to the learning task [...]" (https://arxiv.org/abs/1802.08232). For example,
     a model trained on the secret "Let's meet by the docks at midnight on June 28, come
     alone. Long live the revolution." may later autocomplete "Our next meeting will be at
     the docks..."
  2. Differential privacy. "Differential privacy addresses the paradox of learning nothing
     about an individual while learning useful information about a population." Roughly, an
     algorithm is differentially private if an observer seeing its output cannot tell whether
     a particular individual's information was used in the computation.
     (https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf)
  3. Differential privacy. Take two adjacent databases X and Y, where Y = X + 1 entry. An
     analyst queries both through a randomized algorithm M and obtains M(X) and M(Y). If the
     distribution of M(X) is sufficiently close to that of M(Y), i.e. Pr[M(X)] ≅ Pr[M(Y)],
     then M is differentially private.
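     As an illustration (not from the deck), here is a minimal sketch of that setup: a
     counting query over two databases that differ by one record, released through a
     randomized mechanism. The names (noisy_count, sigma) are made up for the example.

         import numpy as np

         rng = np.random.default_rng(0)

         # Two adjacent databases: Y has exactly one extra entry.
         X = [41, 52, 33, 60]
         Y = X + [27]

         # A hypothetical randomized mechanism M: a counting query plus Gaussian noise.
         def noisy_count(db, sigma=2.0):
             return len(db) + rng.normal(0.0, sigma)

         # The analyst only sees noisy outputs; with enough noise, the output
         # distributions for X and Y overlap heavily, hiding the extra entry.
         print(noisy_count(X))  # close to 4, plus noise
         print(noisy_count(Y))  # close to 5, plus noise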
  4. "Differential privacy will provide privacy by process; in particular it

    will introduce randomness. " Differential privacy A randomized algorithm M with domain N|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N|X| such that ∥x − y∥ 1 ≤ 1: Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ If δ = 0, we say that M is ε-differentially private. https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  5. Differential privacy: adjacent databases.
     • x, y ∈ N^|X|, where N^|X| is the input set
     • x and y differ by one entry: ∥x − y∥₁ ≤ 1
     (Definition repeated from slide 4; https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf)
  6. Differential privacy: the randomized algorithm.
     • M: N^|X| → Range(M), where N^|X| is the input set and Range(M) the output set
     • any S ⊆ Range(M) is a set of possible outputs
     (Definition repeated from slide 4; https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf)
  7. Differential privacy: reading the bound.
     • δ = 0 ⇒ Pr[M(x) ∈ S] ≤ exp(ε)·Pr[M(y) ∈ S]
     • ε ≅ 0 ⇒ exp(ε) ≅ 1 ⇒ Pr[M(x) ∈ S] / Pr[M(y) ∈ S] ≤ exp(ε) ≅ 1; since x and y are
       interchangeable, the two probabilities are nearly equal
     (Definition repeated from slide 4; https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf)
  8. " Differential privacy A randomized algorithm M with domain N|X|

    is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N|X| such that ∥x − y∥ 1 ≤ 1: Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ If δ = 0, we say that M is ε-differentially private. • The probability of an output of a randomized (ε, δ)-differentially private algorithm on two adjacent databases is pretty much the same ◦ ε small enough ◦ δ < 1/|X| https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
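     To make the bound concrete (a worked example, not from the deck), here is how the
     worst-case probability ratio exp(ε) grows with ε for a pure ε-DP mechanism (δ = 0):

         import math

         # How much can the probability of any output change when one record
         # is added to or removed from the database?
         for eps in [0.01, 0.1, 0.5, 1.0]:
             print(f"eps = {eps}: ratio bounded by exp(eps) = {math.exp(eps):.3f}")

         # eps = 0.01 -> 1.010: the output distribution barely moves.
         # eps = 1.0  -> 2.718: one record can change any output probability
         #                      by at most a factor of e.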
  9. Differential privacy: why it is useful in practice.
     • There is an easy way to approximate a deterministic function with a differentially
       private algorithm.
     • The privacy guarantee achieved is quantifiable.
     • Differential privacy has properties that make it useful in machine learning:
       composability, group privacy, and robustness to auxiliary information.
  10. Differential privacy: the easy way to approximate a deterministic function with a
      differentially private algorithm is by adding NOISE.
      • Gaussian mechanism: G_σ f(x) ≝ f(x) + N(0, σ²), sketched below
      (https://en.wikipedia.org/wiki/Additive_noise_mechanisms)
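      A minimal sketch of the Gaussian mechanism (illustrative only: sum_query and the
      chosen sigma are made up, and a real deployment must calibrate σ to the sensitivity
      of f and the target (ε, δ)):

          import numpy as np

          rng = np.random.default_rng(42)

          def gaussian_mechanism(f, x, sigma):
              """Release f(x) + N(0, sigma^2): the additive Gaussian noise mechanism."""
              return f(x) + rng.normal(0.0, sigma)

          # A deterministic query we want to privatize, e.g. a sum over the database.
          def sum_query(db):
              return float(sum(db))

          db = [3, 1, 4, 1, 5]
          print(gaussian_mechanism(sum_query, db, sigma=1.0))  # noisy sum, varies per run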
  11. Differentially private Stochastic Gradient Descent ("Deep Learning with Differential
      Privacy", https://arxiv.org/abs/1607.00133). Differential privacy is introduced into
      deep learning by modifying stochastic gradient descent. At each iteration:
      1. clip the per-example gradients
      2. add noise
      Clipping gradients is common practice (e.g. to stabilize training) even when privacy
      is not a concern; for DP-SGD it also bounds each example's contribution and is
      required to prove the differential privacy guarantee. A sketch of one such step
      follows.
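      A minimal numpy sketch of one DP-SGD step (illustrative, not the library's
      implementation; it assumes per-example gradients are already computed, and borrows
      the l2_norm_clip / noise_multiplier naming from TensorFlow Privacy):

          import numpy as np

          rng = np.random.default_rng(0)

          def dp_sgd_step(params, per_example_grads, l2_norm_clip, noise_multiplier, lr):
              """One DP-SGD step: clip each example's gradient, average,
              add Gaussian noise scaled to the clip norm, then descend."""
              clipped = []
              for g in per_example_grads:
                  norm = np.linalg.norm(g)
                  # 1. Clip: scale the gradient so its L2 norm is at most l2_norm_clip.
                  clipped.append(g * min(1.0, l2_norm_clip / (norm + 1e-12)))
              grad = np.mean(clipped, axis=0)
              # 2. Add noise with std proportional to the sensitivity (the clip norm).
              noise = rng.normal(0.0, noise_multiplier * l2_norm_clip / len(clipped),
                                 size=grad.shape)
              return params - lr * (grad + noise)

          # Toy usage: 4 per-example gradients for a 3-parameter model.
          params = np.zeros(3)
          grads = [rng.normal(size=3) for _ in range(4)]
          params = dp_sgd_step(params, grads, l2_norm_clip=1.0,
                               noise_multiplier=1.1, lr=0.1)
          print(params)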
  12. TensorFlow Privacy: training with DP vs. without DP
      (https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py)

      if train_with_differential_privacy:
          optimizer = DPGradientDescentGaussianOptimizer(
              l2_norm_clip=l2_norm_clip,
              noise_multiplier=noise_multiplier,
              num_microbatches=microbatches,
              learning_rate=learning_rate)
          # Compute vector of per-example loss rather than its mean over a minibatch.
          loss = tf.keras.losses.CategoricalCrossentropy(
              from_logits=True, reduction=tf.losses.Reduction.NONE)
      else:
          optimizer = GradientDescentOptimizer(learning_rate=learning_rate)
          loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

      model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
  13. TensorFlow Privacy: the same code, highlighting what changes. The DP optimizer
      privacy.optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer replaces
      tf.optimizers.SGD, and the loss is computed per example instead of averaged over the
      minibatch. Drop-in DP optimizers provided: DPAdamGaussianOptimizer,
      DPAdagradGaussianOptimizer, DPGradientDescentGaussianOptimizer.
      (Code as on slide 12; https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py)
  14. Privacy analysis. The differential privacy guarantee is expressed by epsilon and delta:
      • epsilon: upper bound on how much the probability of a particular model output can
        vary by adding or removing a single training point
      • delta: bounds the probability of the privacy guarantee not holding
      TensorFlow Privacy provides methods to compute them; see the sketch below.
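      For instance, the Keras tutorial referenced above computes epsilon with the Rényi DP
      accountant shipped in the library; a condensed sketch (hyperparameter values here are
      hypothetical, and the module path reflects the 2019-era releases):

          from tensorflow_privacy.privacy.analysis.rdp_accountant import (
              compute_rdp, get_privacy_spent)

          # Hypothetical training setup: 60000 examples, batch size 250,
          # noise multiplier 1.1, 15 epochs, delta below 1/60000.
          n, batch_size, noise_multiplier, epochs, delta = 60000, 250, 1.1, 15, 1e-5
          q = batch_size / n                # per-step sampling ratio
          steps = epochs * n // batch_size  # total number of SGD steps
          orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))

          # Accumulate Rényi DP over all steps, then convert to an (ε, δ) statement.
          rdp = compute_rdp(q=q, noise_multiplier=noise_multiplier,
                            steps=steps, orders=orders)
          eps, _, opt_order = get_privacy_spent(orders, rdp, target_delta=delta)
          print(f"epsilon = {eps:.2f} at delta = {delta}")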
  15. Takeaways: privacy leaks are real; differential privacy gives a quantifiable
      guarantee, Pr[M(X)] ≅ Pr[M(Y)] for a mechanism M on adjacent databases X and Y; and
      TensorFlow Privacy implements it through drop-in DP optimizers
      (DPAdamGaussianOptimizer, DPAdagradGaussianOptimizer,
      DPGradientDescentGaussianOptimizer).