Luca Corbucci
June 16, 2024

# AI Conf 2024 - Is Your Model Private?

June 16, 2024

## Transcript

1. ### Milan, 17 June 2024 >>AI CONF Is your model private?

Luca Corbucci, Ph.D. candidate in Computer Science

5. ### How many of you know about the privacy risks of training ML models?
6. ### Luca Corbucci

PhD Student in Computer Science @ University of Pisa. Podcaster @ PointerPodcast. Community Manager @ SuperHero Valley. Community Manager @ Pisa.dev. https://lucacorbucci.me/
7. ### Why should we care about privacy when training ML models?

i.e. What could possibly go wrong?

9. ### What’s the color of the cat?

In a Membership Inference Attack, an attacker wants to know if a sample was used to train the model.
10. ### What’s the color of the cat?

[Bar charts comparing the model’s per-class confidence for the same query image, with and without that image in the training set.]

11. ### What’s the color of the cat?

The model will be more confident when we query it with the image that was in the training dataset.
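To make the membership-inference intuition above concrete, here is a minimal sketch (not from the slides) of the simplest confidence-threshold attack: the attacker queries the model and guesses "member" whenever the prediction is suspiciously confident. The `model`, the input `x`, and the threshold value are hypothetical placeholders.

```python
# Minimal confidence-threshold membership inference sketch (illustrative only).
import torch
import torch.nn.functional as F

def confidence_mia(model, x, threshold=0.9):
    """Guess whether x was in the training set from the model's confidence."""
    model.eval()
    with torch.no_grad():
        logits = model(x.unsqueeze(0))                     # shape: (1, num_classes)
        confidence = F.softmax(logits, dim=1).max().item()
    # Overconfident predictions are (weak) evidence that x was a training sample.
    return confidence > threshold, confidence
```

Real attacks such as the one in reference 3 train shadow models instead of using a fixed threshold, but the signal they exploit is the same overconfidence shown in the bar charts above.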

22. ### Differential Privacy (An intuition using databases)

Suppose you have two databases that differ in one single instance.
23. ### Differential Privacy (An intuition using databases)

You query both of them and you get two different results.
24. ### Differential Privacy (An intuition using databases)

Asking both databases “How many patients have diabetes?”, one answers “10” and the other answers “9”.
25. ### Differential Privacy (An intuition using databases)

Asking both databases “How many patients have diabetes?”, one answers “10” and the other answers “9”. You can infer something about the missing instance.

27. ### Differential Privacy (An intuition using databases)

Differential Privacy allows you to query the database, adding some randomisation to the answer. You will get (more or less) the same output regardless of the presence of one sample.
28. ### Differential Privacy (An intuition using databases)

Differential Privacy allows you to query the database, adding some randomisation to the answer. You will get (more or less) the same output regardless of the presence of one sample. Different queries -> Different results.
29. ### Differential Privacy (A slightly more advanced definition)

Given two databases D and D' which differ in only one instance:

P[A(D) = O] ≤ e^ϵ · P[A(D') = O]
30. ### How to interpret the ϵ

Given two databases D and D' which differ in only one instance:

P[A(D) = O] ≤ e^ϵ · P[A(D') = O]

e^ϵ tells us how similar these two probabilities are. ϵ is called the “privacy budget” and represents an upper bound on how much information we can leak.
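To get a feel for the scale of ϵ (numbers mine, not from the talk), the bound above says the two probabilities can differ by at most a factor of e^ϵ:

```python
# How loose the DP guarantee becomes as epsilon grows.
import math

for eps in [0.1, 1.0, 5.0, 10.0]:
    print(f"eps={eps:>4}: the two probabilities may differ by a factor of up to {math.exp(eps):.1f}")
```

With ϵ = 0.1 the two worlds are nearly indistinguishable; with ϵ = 10 the bound is so loose that it guarantees very little.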
31. ### Differential Privacy (A more relaxed definition)

Given two databases D and D' which differ in only one instance:

P[A(D) = O] ≤ e^ϵ · P[A(D') = O] + δ

The parameter δ quantifies the probability that something goes wrong: the algorithm will be differentially private with probability 1 - δ.
32. ### Essentially, instead of returning the real output of the query, we return a noisy output.
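One standard way to produce that noisy output, and the one used in the example that follows, is the Laplace mechanism: add Laplace noise with scale sensitivity/ϵ, where the sensitivity is the maximum amount the query result can change when a single instance is added or removed. A short, standard derivation (not shown in the slides) of why this satisfies the ϵ-DP definition above:

```latex
% Laplace mechanism: M(D) = f(D) + Lap(\Delta/\epsilon),
% with sensitivity \Delta = \max_{D \sim D'} |f(D) - f(D')|.
\frac{\Pr[M(D)=x]}{\Pr[M(D')=x]}
  = \frac{\exp\left(-\epsilon\,|x - f(D)|/\Delta\right)}
         {\exp\left(-\epsilon\,|x - f(D')|/\Delta\right)}
  = \exp\left(\frac{\epsilon\,\big(|x - f(D')| - |x - f(D)|\big)}{\Delta}\right)
  \le \exp\left(\frac{\epsilon\,|f(D) - f(D')|}{\Delta}\right)
  \le e^{\epsilon}
```

For a counting query like the one in the next slides, the sensitivity is 1: adding or removing one patient changes the count by at most one.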
33. ### Example

We want to query our database to know how many patients have Diabetes.

>>> df[df["Disease"] == "Diabetes"].shape[0]
10
34. ### Example

We want to query our database to know how many patients have Diabetes, but this time we add Laplace noise to the count.

>>> df[df["Disease"] == "Diabetes"].shape[0] + rnd.laplace(loc=0, scale=sensitivity/eps_1)

36. ### Example

>>> df[df["Disease"] == "Diabetes"].shape[0] + rnd.laplace(loc=0, scale=sensitivity/eps_1)
11.19888273257044

37. ### Example

>>> df[df["Disease"] == "Diabetes"].shape[0] + rnd.laplace(loc=0, scale=sensitivity/eps_2)
9.0943263602294

38. ### Example

What’s the privacy cost here?

39. ### Example

What’s the privacy cost here? (eps_1 + eps_2)-DP

40. ### Example

The more you query the database, the higher the privacy budget spent.

41. ### Example

The more privacy budget is spent, the higher the upper bound on the privacy loss.

42. ### Example

>>> int(df[df["Disease"] == "Diabetes"].shape[0] + rnd.laplace(loc=0, scale=sensitivity/eps_3))
12

Am I removing DP when I round the result?

43. ### Example

Are the returned results useful?

44. ### Example

Are the returned results useful? It depends on the privacy parameters.
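As a self-contained version of the REPL fragments above (a sketch with made-up toy data; `sensitivity`, `eps_1` and `eps_2` reuse the names from the slides, and `rnd` is assumed to be a NumPy random generator):

```python
# Self-contained sketch of the noisy-count example (toy data, made-up epsilons).
import numpy as np
import pandas as pd

rnd = np.random.default_rng(seed=42)
df = pd.DataFrame({"Disease": ["Diabetes"] * 10 + ["Healthy"] * 40})

# A counting query changes by at most 1 when one record is added or removed,
# so its sensitivity is 1.
sensitivity = 1
eps_1, eps_2 = 0.5, 0.5

true_count = df[df["Disease"] == "Diabetes"].shape[0]
noisy_1 = true_count + rnd.laplace(loc=0, scale=sensitivity / eps_1)
noisy_2 = true_count + rnd.laplace(loc=0, scale=sensitivity / eps_2)

print(true_count)  # 10
print(noisy_1)     # a noisy value around 10, different on every run
print(noisy_2)     # another noisy value around 10

# Sequential composition: answering both queries costs eps_1 + eps_2 in total.
print("total privacy budget spent:", eps_1 + eps_2)

# Rounding the noisy answer is post-processing: it does not consume extra
# budget, so int(noisy_1) is still (eps_1)-DP.
print(int(noisy_1))
```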

49. ### Differential Privacy

P[A(D) = O] ≤ e^ϵ · P[A(D') = O]

The outputs of the two neural networks will be similar regardless of the presence of the sample in the dataset.
50. ### Where do we apply DP in this case?

Can we apply it directly on the prediction?

51. ### Where do we apply DP in this case?

Can we apply it directly on the prediction? Class 2 + Noise
53. ### Where do we apply DP in this case?

We need to apply it DURING the training.

54. ### Where do we apply DP in this case?

We need to apply it DURING the training. The privacy budget will be spent during the training.

55. ### Where do we apply DP in this case?

We need to apply it DURING the training. The privacy budget will be spent during the training. Once trained, we can query the model without additional privacy cost.
56. ### SGD

def sgd():
    for each batch L_t:
        for each sample x_i in the batch:
            g_t(x_i) = compute_gradient(M, x_i)
        g_t = average of gradients
        M = M - lr * g_t
    return M
57. ### SGD vs DP-SGD

def sgd():
    for each batch L_t:
        for each sample x_i in the batch:
            g_t(x_i) = compute_gradient(M, x_i)
        g_t = average of gradients
        M = M - lr * g_t
    return M

58. ### SGD vs DP-SGD

def dp_sgd():
    for each batch L_t:
        for each sample x_i:
            g_t(x_i) = compute_gradient(M, x_i)
            g_t(x_i) = clip_gradient(C)
        g_t = average of clipped gradients + Noise
        M = M - lr * g_t
    return M
59. ### DP-SGD

clip_gradient(C)

60. ### DP-SGD

clip_gradient(C): we need to bound the information of each gradient computation.

61. ### DP-SGD

C is the maximum value for the gradients.

62. ### DP-SGD

The Noise can be Gaussian: 𝒩(0, σ²C²I).

63. ### DP-SGD

The noise depends on C and on the privacy budget we want to guarantee.

64. ### DP-SGD

High C -> High Noise

65. ### DP-SGD

Low privacy budget -> High Noise

66. ### DP-SGD

High C -> High Noise. Low ϵ -> High Noise. High Noise -> Lower model accuracy.
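For reference, here is a minimal PyTorch sketch of a single DP-SGD step (my illustration, not the talk's code and not how Opacus implements it; the per-sample loop is slow but keeps the clipping and noise explicit). `model`, `loss_fn`, and the batch tensors are assumed to be provided by the caller, and `C` and `sigma` correspond to the clipping value and noise multiplier above.

```python
# One DP-SGD step: per-sample gradient clipping + Gaussian noise (sketch only).
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=0.1, C=1.0, sigma=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):                # per-sample gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Clip each per-sample gradient so its total L2 norm is at most C.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        clip_coef = min(1.0, C / (total_norm + 1e-6))
        for s, g in zip(summed, grads):
            s += g * clip_coef

    batch_size = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.normal(0.0, sigma * C, size=p.shape)  # N(0, sigma^2 C^2 I)
            p -= lr * (s + noise) / batch_size                  # noisy average gradient
    return model
```

Tracking how much ϵ an entire training run spends requires a privacy accountant on top of this, which is what libraries like Opacus (next slide) provide.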

68. ### Differentially Private NN are just a wrapper away*

*if you carefully choose your privacy parameters

model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,                 # the model you want to train with DP
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=EPOCHS,
    target_epsilon=EPSILON,       # privacy budget
    target_delta=DELTA,
    max_grad_norm=MAX_GRAD_NORM,  # clipping value
)
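To make the wrapper concrete, here is a self-contained sketch around that call (toy model, toy data, and made-up hyperparameter values; `PrivacyEngine` and `make_private_with_epsilon` come from the Opacus library cited in the references):

```python
# Sketch: wrapping an ordinary PyTorch setup with Opacus (toy data and values).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

EPOCHS = 5
EPSILON = 8.0        # target privacy budget (made-up value)
DELTA = 1e-5         # usually chosen smaller than 1 / len(dataset)
MAX_GRAD_NORM = 1.0  # clipping value C

# Toy data and model, just to make the example runnable.
X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=EPOCHS,
    target_epsilon=EPSILON,
    target_delta=DELTA,
    max_grad_norm=MAX_GRAD_NORM,
)

# Training afterwards looks like ordinary PyTorch; the wrapped optimizer clips
# per-sample gradients and adds noise, and the engine tracks the budget spent.
criterion = nn.CrossEntropyLoss()
for _ in range(EPOCHS):
    for xb, yb in train_loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()

print("epsilon spent:", privacy_engine.get_epsilon(DELTA))
```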
69. ### A few notes on the privacy parameters

Choosing the ϵ is a tradeoff between the utility of the model and the privacy we want to guarantee.

70. ### A few notes on the privacy parameters

If we set a low ϵ we will need to introduce a lot of noise during the training. Choosing the ϵ is a tradeoff between the utility of the model and the privacy we want to guarantee.

71. ### A few notes on the privacy parameters

If we set a low ϵ we will need to introduce a lot of noise during the training. This will degrade the model performance! Choosing the ϵ is a tradeoff between the utility of the model and the privacy we want to guarantee.
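A tiny sketch (mine, not from the talk) that makes this tradeoff visible on the counting query from earlier: the lower the ϵ, the larger the Laplace scale, and the further the noisy answers drift from the true count of 10.

```python
# Lower epsilon -> larger noise scale -> less useful answers (toy illustration).
import numpy as np

rng = np.random.default_rng(0)
true_count, sensitivity = 10, 1

for eps in [10.0, 1.0, 0.1, 0.01]:
    noisy = true_count + rng.laplace(loc=0, scale=sensitivity / eps, size=5)
    print(f"eps={eps:>5}: noisy answers = {np.round(noisy, 1)}")
```

The same effect happens inside DP-SGD: a lower target ϵ forces a larger noise multiplier, which is what hurts accuracy.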

73. ### 17/06/2024, 4th edition >>AI CONF
74. ### References

1) Evaluating and Testing Unintended Memorization in Neural Networks: https://bair.berkeley.edu/blog/2019/08/13/memorization/
2) Scalable Extraction of Training Data from (Production) Language Models: https://arxiv.org/pdf/2311.17035
3) Membership Inference Attacks against Machine Learning Models: https://arxiv.org/abs/1610.05820
4) A friendly, non-technical introduction to differential privacy: https://desfontain.es/blog/friendly-intro-to-differential-privacy.html
5) Deep Learning with Differential Privacy: https://arxiv.org/abs/1607.00133
6) Opacus: https://opacus.ai/
7) TensorFlow Privacy: https://github.com/tensorflow/privacy

75. ### References

8) A list of real-world uses of differential privacy: https://desfontain.es/blog/real-world-differential-privacy.html
9) Improving Gboard language models via private federated analytics: https://research.google/blog/improving-gboard-language-models-via-private-federated-analytics/
10) Learning with Privacy at Scale: https://docs-assets.developer.apple.com/ml-research/papers/learning-with-privacy-at-scale.pdf