
Privacy granted by maths

Giulia
October 21, 2019


Since March 2019, the TensorFlow family has a new member: TensorFlow Privacy. What is it about? What are the mathematical theories that guarantee the privacy of deep learning models, and how are they implemented?

Presented at DevFest Nantes 2019.
Format: 20 minutes, Q&A included


Transcript

  1. Privacy granted by maths
    @Giuliabianchl
    October 22, 2019


  2. Giulia Bianchi
    data scientist
    @Giuliabianchl


  3. Table of contents
    Privacy leaks
    Differential Privacy
    TensorFlow Privacy


  4. Privacy leaks
    https://www.cs.cmu.edu/~mfredrik/papers/fjr2015ccs.pdf


  5. Privacy leaks
    https://xkcd.com/2169/
    https://arxiv.org/abs/1802.08232


  6. Privacy leaks
    https://arxiv.org/abs/1802.08232
    "Unintended memorization occurs when trained neural networks may reveal the presence of out-of-distribution training data -- i.e., training data that is irrelevant to the learning task [...]."
    (Slide illustration: a secret message in the training data, "Let's meet by the docks at midnight on june 28, come alone. Long live the revolution.", and a model completion that leaks it: "Our next meeting will be at the docks...")


  7. Differential privacy
    "Differential privacy addresses the paradox of learning nothing about an individual while learning useful information about a population."
    "Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation."
    https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf


  8. Differential privacy
    ● 2 databases X, Y, where Y = X + 1 entry
    ● The analyst queries X and Y through an algorithm M and obtains M(X) and M(Y)
    ● If M(X) is sufficiently close to M(Y), i.e. Probability of M(X) ≅ Probability of M(Y), then M is differentially private
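    A minimal simulation of this picture (illustrative only; NumPy is assumed, and a noisy counting query stands in for the algorithm M):

    import numpy as np

    # Two adjacent databases: Y = X + 1 entry.
    X = [1, 0, 1, 1, 0]
    Y = X + [1]

    def M(db, sigma=2.0):
        # Randomized counting query: exact count plus Gaussian noise.
        return sum(db) + np.random.normal(0.0, sigma)

    # The two output distributions overlap heavily, so an observer who
    # sees a single output can hardly tell whether the extra entry was used.
    print(np.mean([M(X) for _ in range(10_000)]))   # ≈ 3
    print(np.mean([M(Y) for _ in range(10_000)]))   # ≈ 4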


  9. "Differential privacy will provide privacy by process; in particular it will introduce
    randomness.
    "
    Differential privacy
    A randomized algorithm M with domain N|X| is (ε, δ)-differentially private if
    for all S ⊆ Range(M) and for all x, y ∈ N|X| such that ∥x − y∥
    1
    ≤ 1:
    Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
    If δ = 0, we say that M is ε-differentially private.
    https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf


  10. Differential privacy
    ● adjacent databases x, y ∈ N^|X|
      ○ N^|X| is the input set
      ○ x and y differ by 1 entry
    A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
    Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
    If δ = 0, we say that M is ε-differentially private.
    https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf


  11. Differential privacy
    ● randomized algorithm M: N^|X| → Range(M)
      ○ N^|X| is the input set
      ○ Range(M) is the output set
      ○ S ⊆ Range(M) → S is a set of possible outputs
    A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
    Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
    If δ = 0, we say that M is ε-differentially private.
    https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf


  12. Differential privacy
    ● if δ = 0:
      ○ Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S]
    ● if ε ≅ 0, then exp(ε) ≅ 1:
      ○ Pr[M(x) ∈ S] / Pr[M(y) ∈ S] ≤ exp(ε) ≅ 1, so the two probabilities are nearly equal
    A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
    Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
    If δ = 0, we say that M is ε-differentially private.
    https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
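    A quick numeric check of this bound (illustrative arithmetic, not from the deck):

    import math

    # For small ε, exp(ε) ≈ 1: the probability of any output can change
    # by at most this multiplicative factor between adjacent databases.
    for eps in (0.01, 0.1, 1.0):
        print(eps, math.exp(eps))   # 1.0101, 1.1052, 2.7183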


  13. "
    Differential privacy
    A randomized algorithm M with domain N|X| is (ε, δ)-differentially private if
    for all S ⊆ Range(M) and for all x, y ∈ N|X| such that ∥x − y∥
    1
    ≤ 1:
    Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
    If δ = 0, we say that M is ε-differentially private.
    ● The probability of an output of a randomized (ε, δ)-differentially private
    algorithm on two adjacent databases is pretty much the same
    ○ ε small enough
    ○ δ < 1/|X|
    https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf


  14. Differential privacy
    ● Easy way to approximate a deterministic function with a differentially private algorithm
    ● The privacy guarantee achieved is quantifiable
    ● Differential privacy has properties that make it useful in machine learning (composability, group privacy, robustness to auxiliary information)


  15. Differential privacy
    ● Easy way to approximate a deterministic function with a differentially private algorithm


  16. Differential privacy
    ● Easy way to approximate a deterministic function with a differentially private algorithm: by adding NOISE
    ● Gaussian mechanism: Gσf(x) ≝ f(x) + N(0, σ²)
    https://en.wikipedia.org/wiki/Additive_noise_mechanisms
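    A minimal sketch of the Gaussian mechanism in Python (NumPy assumed; the query f, the database, and σ are illustrative choices, not from the deck):

    import numpy as np

    def gaussian_mechanism(f, x, sigma):
        # Approximate the deterministic function f by adding noise
        # drawn from N(0, sigma^2) to its exact answer.
        return f(x) + np.random.normal(0.0, sigma)

    database = [1, 0, 1, 1, 0, 1]     # e.g. one bit per individual
    exact = sum(database)             # 4
    noisy = gaussian_mechanism(sum, database, sigma=1.0)
    print(exact, noisy)               # the noisy answer masks any single entry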


  17. Differentially private Stochastic Gradient Descent
    Deep Learning with Differential Privacy
    https://arxiv.org/abs/1607.00133
    ● Differential privacy is introduced in deep learning algorithms by modifying stochastic gradient descent. At each iteration:
      1. clip the gradient
      2. add noise
    ● Gradient clipping is common practice in deep learning (e.g., to stabilize training) even when privacy is not a concern; here it is required to prove the differential privacy guarantee.
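    A minimal NumPy sketch of one such iteration (the parameter names l2_norm_clip and noise_multiplier follow the tutorial; everything else is illustrative):

    import numpy as np

    def dp_sgd_aggregate(per_example_grads, l2_norm_clip, noise_multiplier):
        # 1. Clip each example's gradient to L2 norm <= l2_norm_clip.
        clipped = [g / max(1.0, np.linalg.norm(g) / l2_norm_clip)
                   for g in per_example_grads]
        # 2. Add Gaussian noise calibrated to the clipping norm, then average.
        noise = np.random.normal(0.0, noise_multiplier * l2_norm_clip,
                                 size=clipped[0].shape)
        return (np.sum(clipped, axis=0) + noise) / len(per_example_grads)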


  18. Differentially private SGD
    Deep Learning with Differential Privacy
    https://arxiv.org/abs/1607.00133


  19. TensorFlow Privacy
    https://github.com/tensorflow/privacy
    ● First release 0.0.1 (Aug 23, 2019)
    ● Latest release 0.1.0 (Oct 2, 2019)


  20. TensorFlow Privacy
    https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py

    # With DP
    if train_with_differential_privacy:
        optimizer = DPGradientDescentGaussianOptimizer(
            l2_norm_clip=l2_norm_clip,
            noise_multiplier=noise_multiplier,
            num_microbatches=microbatches,
            learning_rate=learning_rate)
        # Compute vector of per-example loss rather than its mean over a minibatch.
        loss = tf.keras.losses.CategoricalCrossentropy(
            from_logits=True, reduction=tf.losses.Reduction.NONE)
    # Without DP
    else:
        optimizer = GradientDescentOptimizer(learning_rate=learning_rate)
        loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])


  21. TensorFlow Privacy
    https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py

    if train_with_differential_privacy:
        optimizer = DPGradientDescentGaussianOptimizer(
            l2_norm_clip=l2_norm_clip,
            noise_multiplier=noise_multiplier,
            num_microbatches=microbatches,
            learning_rate=learning_rate)
        # Compute vector of per-example loss rather than its mean over a minibatch.
        loss = tf.keras.losses.CategoricalCrossentropy(
            from_logits=True, reduction=tf.losses.Reduction.NONE)
    else:
        optimizer = GradientDescentOptimizer(learning_rate=learning_rate)
        loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

    ● With DP, privacy.optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer replaces tf.optimizers.SGD
    ● Available DP optimizers:
      ○ DPAdamGaussianOptimizer
      ○ DPAdagradGaussianOptimizer
      ○ DPGradientDescentGaussianOptimizer
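    One practical detail when swapping in a DP optimizer (a usage sketch with assumed variable names, based on the tutorial): the batch size must be divisible by num_microbatches, because the vector of per-example losses is split into microbatches before clipping.

    # Assumed values for illustration; batch_size % num_microbatches must be 0.
    model.fit(train_data, train_labels,
              epochs=15,
              batch_size=250,
              validation_data=(test_data, test_labels))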


  22. Privacy Analysis
    ● The differential privacy guarantee is expressed by epsilon and delta:
      ○ epsilon: upper bound on how much the probability of a particular model output can vary by adding or removing a single training point
      ○ delta: bounds the probability of the privacy guarantee not holding
    ● TensorFlow Privacy provides methods to compute them
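    A sketch of that computation with the library's RDP accountant (module path and function names follow the 2019 repository layout; hyperparameter values are assumed for illustration):

    from privacy.analysis.rdp_accountant import compute_rdp, get_privacy_spent

    n = 60000                      # training-set size (e.g. MNIST)
    batch_size = 250
    noise_multiplier = 1.1
    epochs = 15
    delta = 1e-5                   # chosen so that delta < 1/n

    q = batch_size / n             # sampling ratio per step
    steps = epochs * n // batch_size
    orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))

    # Rényi differential privacy of DP-SGD, converted to an (ε, δ) guarantee.
    rdp = compute_rdp(q=q, noise_multiplier=noise_multiplier,
                      steps=steps, orders=orders)
    eps, _, _ = get_privacy_spent(orders, rdp, target_delta=delta)
    print('({:.2f}, {})-differential privacy'.format(eps, delta))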


  23. Takeaways
    ● Privacy leaks: trained models can unintentionally memorize and reveal training data
    ● Differential privacy: for adjacent databases X and Y, Pr[M(X)] ≅ Pr[M(Y)]
    ● TensorFlow Privacy: DPAdamGaussianOptimizer, DPAdagradGaussianOptimizer, DPGradientDescentGaussianOptimizer


  24. Thank you

    @Giuliabianchl
