Giulia
October 21, 2019

# Privacy granted by maths

Since March 2019 the TensorFlow family has a new member: TensorFlow Privacy. What is it about? Which mathematical theories guarantee the privacy of deep learning models, and how are they implemented?

Presented at DevFest Nantes 2019.
Format: 20 minutes, Q&A included


## Transcript

1. Privacy granted by maths
@Giuliabianchl
October 22, 2019

2. Giulia Bianchi
Data scientist
@Giuliabianchl

3. Privacy leaks
Differential Privacy
TensorFlow Privacy

4. https://www.cs.cmu.edu/~mfredrik/papers/fjr2015ccs.pdf
Privacy leaks

5. https://xkcd.com/2169/
https://arxiv.org/abs/1802.08232
Privacy leaks

6. https://arxiv.org/abs/1802.08232
Privacy leaks
"Unintended memorization occurs when trained neural networks may reveal the presence of out-of-distribution training data -- i.e., training data that is irrelevant to the learning task"
Example secrets a network can memorize:
"Let's meet by the docks at midnight on june 28, come alone."
"Long live the revolution. Our next meeting will be at the docks..."

7. Differential privacy
"…learning nothing about an individual while learning useful information about a population"
"Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation"
https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

8. Differential privacy
Two databases X and Y, with Y = X + 1 entry
The analyst queries X and Y through algorithm M and obtains M(X) and M(Y)
If M(X) is sufficiently close to M(Y), i.e. Probability of M(X) ≅ Probability of M(Y), then M is differentially private

9. "Differential privacy will provide privacy by process; in particular it will introduce
randomness.
"
Differential privacy
A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
If δ = 0, we say that M is ε-differentially private.
https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

10. ● adjacent databases x, y ∈ N^|X|
○ N^|X| input set
○ x and y differ by one entry
Differential privacy
A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
If δ = 0, we say that M is ε-differentially private.
https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

11. ● randomized algorithm M: N^|X| → Range(M)
○ N^|X| input set
○ Range(M) output set
○ S ⊆ Range(M) → S is a set of possible outputs
Differential privacy
A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
If δ = 0, we say that M is ε-differentially private.
https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

12. ● δ = 0
○ Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S]
● ε ≅ 0 ⇒ exp(ε) ≅ 1
○ Pr[M(x) ∈ S] / Pr[M(y) ∈ S] ≤ exp(ε) ≅ 1, and by symmetry the same holds swapping x and y
Differential privacy
A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
If δ = 0, we say that M is ε-differentially private.
https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

13. Differential privacy
A randomized algorithm M with domain N^|X| is (ε, δ)-differentially private if for all S ⊆ Range(M) and for all x, y ∈ N^|X| such that ∥x − y∥₁ ≤ 1:
Pr[M(x) ∈ S] ≤ exp(ε) Pr[M(y) ∈ S] + δ
If δ = 0, we say that M is ε-differentially private.
● The probability of an output of a randomized (ε, δ)-differentially private
algorithm on two adjacent databases is pretty much the same
○ ε small enough
○ δ < 1/|X|
https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
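To make the definition concrete, here is a minimal sketch (not from the talk) of randomized response, a classic ε-differentially private mechanism: each respondent reports the truth with probability p and the flipped answer otherwise, which yields ε = ln(p / (1 − p)).

```python
import math
import random

def randomized_response(true_answer: bool, p: float, rng: random.Random) -> bool:
    """Report the true answer with probability p, the flipped answer otherwise."""
    return true_answer if rng.random() < p else not true_answer

# With p = 0.75, for any reported output s the ratio
#   Pr[report = s | truth = yes] / Pr[report = s | truth = no]
# is at most p / (1 - p) = 3, so the mechanism is ln(3)-differentially private:
# an observer of one report cannot tell reliably what the true answer was.
p = 0.75
eps = math.log(p / (1 - p))
worst_case_ratio = p / (1 - p)
assert worst_case_ratio <= math.exp(eps) + 1e-12
```

The names and parameter values here are illustrative, not from the talk; the point is only that the probability ratio between the two "adjacent" worlds is bounded by exp(ε).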

14. Differential privacy
● Easy way to approximate a deterministic function with a differentially private algorithm
● The privacy guarantee achieved is quantifiable
● Differential privacy has properties that make it useful in machine learning (composability, group privacy, robustness to auxiliary information)

15. Differential privacy
Easy way to approximate a deterministic function with a differentially private algorithm

16. Differential privacy
Easy way to approximate a deterministic function with a differentially private algorithm
Gaussian mechanism: G_σ f(x) ≝ f(x) + N(0, σ²)
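A minimal numpy sketch of the Gaussian mechanism (my own illustration; `count` is a hypothetical query with L1 sensitivity 1):

```python
import numpy as np

def gaussian_mechanism(f, x, sigma, rng):
    """Approximate the deterministic query f with f(x) + N(0, sigma^2) noise."""
    return f(x) + rng.normal(loc=0.0, scale=sigma)

def count(db):
    """Counting query: adding or removing one entry changes the result by at most 1."""
    return float(np.sum(db))

rng = np.random.default_rng(0)
db = np.array([1, 0, 1, 1, 0])
noisy_count = gaussian_mechanism(count, db, sigma=1.0, rng=rng)
```

The noisy answer stays useful for the population (it is close to the true count of 3) while hiding whether any one individual is in the database.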

17. Differentially private stochastic gradient descent
Deep Learning with Differential Privacy
https://arxiv.org/abs/1607.00133
Differential privacy is introduced in deep learning algorithms by modifying stochastic gradient descent: at each iteration, per-example gradients are clipped and Gaussian noise is added before the update.
Clipping gradients is common practice to avoid overfitting even when privacy is not a concern; here it is required to prove the differential privacy guarantee.
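That per-iteration recipe can be sketched in plain numpy (a simplification of the algorithm in the paper, with hypothetical gradient values; TensorFlow Privacy implements the real optimizer):

```python
import numpy as np

def clip_gradient(g, l2_norm_clip):
    """Scale g down so its L2 norm is at most l2_norm_clip."""
    return g / max(1.0, np.linalg.norm(g) / l2_norm_clip)

def dp_sgd_step(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """One DP-SGD aggregation: clip each per-example gradient, sum them,
    add Gaussian noise scaled to the clip norm, then average."""
    total = np.sum([clip_gradient(g, l2_norm_clip) for g in per_example_grads], axis=0)
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=total.shape)
    return (total + noise) / len(per_example_grads)

rng = np.random.default_rng(42)
grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]  # hypothetical per-example gradients
noisy_avg = dp_sgd_step(grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds each example's influence on the update (its sensitivity), which is exactly what makes the added Gaussian noise sufficient for the privacy proof.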

Differentially private SGD
Deep Learning with Differential Privacy
https://arxiv.org/abs/1607.00133

19. TensorFlow Privacy
https://github.com/tensorflow/privacy
First release 0.0.1: 23 Aug 2019
Latest release 0.1.0: 2 Oct 2019

20. Without DP / With DP
TensorFlow Privacy
https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py

```python
if train_with_differential_privacy == True:
    optimizer = DPGradientDescentGaussianOptimizer(
        l2_norm_clip=l2_norm_clip,
        noise_multiplier=noise_multiplier,
        num_microbatches=microbatches,
        learning_rate=learning_rate)
    # Compute vector of per-example loss rather than its mean over a minibatch.
    loss = tf.keras.losses.CategoricalCrossentropy(
        from_logits=True, reduction=tf.losses.Reduction.NONE)
else:
    optimizer = tf.optimizers.SGD(learning_rate=learning_rate)
    loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
```

21. TensorFlow Privacy
https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial_keras.py
The only changes: the optimizer comes from privacy.optimizers.dp_optimizer instead of tf.optimizers.SGD, and the loss is computed per example.

```python
if train_with_differential_privacy == True:
    # privacy.optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer
    optimizer = DPGradientDescentGaussianOptimizer(
        l2_norm_clip=l2_norm_clip,
        noise_multiplier=noise_multiplier,
        num_microbatches=microbatches,
        learning_rate=learning_rate)
    # Compute vector of per-example loss rather than its mean over a minibatch.
    loss = tf.keras.losses.CategoricalCrossentropy(
        from_logits=True, reduction=tf.losses.Reduction.NONE)
else:
    # tf.optimizers.SGD
    optimizer = tf.optimizers.SGD(learning_rate=learning_rate)
    loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
```

22. Privacy analysis
● Differential privacy guarantee is expressed by epsilon and delta
● epsilon: upper bound on how much the probability of a particular model output can vary by adding or removing a single training point
● delta: bounds the probability of the privacy guarantee not holding
● TensorFlow Privacy provides methods to compute them
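TensorFlow Privacy computes epsilon with a moments/RDP accountant over the whole training run. As a much cruder, self-contained illustration of how noise, epsilon and delta trade off, here is the classic analytic bound for a single Gaussian-mechanism release from the Dwork and Roth book (sigma ≥ sqrt(2 ln(1.25/δ)) · Δf / ε, valid for ε < 1); this is not the accountant the library uses.

```python
import math

def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Smallest noise scale for which the classic Gaussian-mechanism bound
    guarantees (epsilon, delta)-differential privacy (valid for epsilon < 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon

# Stronger privacy (smaller epsilon) demands more noise for the same delta.
sigma_mild = gaussian_sigma(epsilon=0.9, delta=1e-5)
sigma_strict = gaussian_sigma(epsilon=0.1, delta=1e-5)
```

The same trade-off drives DP-SGD: a larger noise_multiplier buys a smaller epsilon at the cost of noisier gradients and, usually, lower accuracy.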

23. Take aways
Privacy leaks · Differential privacy · TensorFlow Privacy
[Diagram: two adjacent databases X and Y, each passed through algorithm M, with Pr[M(X)] ≅ Pr[M(Y)]]