Tsubasa TAKAHASHI, Ph.D.
Senior Research Scientist at LINE Data Science Center

R&D Activity
• R&D on Privacy Techs (LINE Data Science Center)
• Differential Privacy / Federated Learning / …
• R&D on Trustworthy AI

Selected Publications
• Differential Privacy @VLDB22 / SIGMOD22 / ICLR22 / ICDE21
• DP w/ Homomorphic Encryption @BigData22
• Adversarial Attacks @BigData19
• Anomaly/OOD Detection @WWW17 / WACV23

Career
• B.E. / M.E. (CS) from U. Tsukuba; Ph.D. from U. Tsukuba
• Visiting Scholar @CMU
• Kambayashi Award (上林奨励賞)
• NEC Central Labs (2010~18): R&D on Data Privacy (2010~15), R&D on AI Security (2016~18)
• LINE (2018.12~): R&D on Privacy Tech (2019~)
LINE's R&D on Privacy Techs
• Publications at major database and machine learning conferences
• These achievements are based on collaborations w/ academia
https://linecorp.com/ja/pr/news/ja/2022/4269
Federated Learning w/ Differential Privacy
• Released in late September 2022
• The sticker recommendation feature trained with federated learning is now on your app
https://www.youtube.com/watch?v=kTBshg1O7b0
https://tech-verse.me/ja/sessions/124
Privacy Techs are an "Innovation Trigger"
Market trends: the 2021 Gartner Hype Cycle for Privacy
https://infocert.digital/analyst-reports/2021-gartner-hype-cycle-for-privacy/
Difference Attack
• Even when only statistical information is disclosed, the "difference" can reveal the data of specific individuals.
• Example: 30 engineers have an avg. salary of 7M JPY; after Alice retires, the remaining 29 engineers have an avg. salary of 6.8M JPY.
• Alice's salary can be revealed with simple math: 7M x 30 − 6.8M x 29 = 12.8M JPY.
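A minimal sketch of the arithmetic above, using only the values from the example:

```python
# Difference attack: recover Alice's salary from two published averages.
n_before, avg_before = 30, 7_000_000   # 30 engineers, avg. salary 7M JPY
n_after, avg_after = 29, 6_800_000     # after Alice retires

alice_salary = n_before * avg_before - n_after * avg_after
print(alice_salary)                    # 12800000 JPY
```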
A case study of difference attack: Facebook's PII-based Targeting
• From the stats delivered to advertisers, various user info could be estimated (phone number estimation from e-mail address, web access history estimation)
• Facebook had installed thresholding, rounding, etc. for disclosure control, but they could be bypassed
• The vulnerability has been fixed
https://www.youtube.com/watch?v=Lp-IwYvxGpk
https://www.ftc.gov/system/files/documents/public_events/1223263/p155407privacyconmislove_1.pdf
Database Reconstruction
• Recreation of individual-level data from tabular or aggregate data
• Reconstruction works like solving a sudoku: there are rules, algorithms and dependencies
https://www2.census.gov/about/training-workshops/2021/2021-05-07-das-presentation.pdf
A case study of reconstruction: US Census 2010
• The US Census reports various stats for policy-making and academic research
• For the results of 2010, the Census Bureau found that reconstruction attacks are possible
https://www2.census.gov/about/training-workshops/2021/2021-05-07-das-presentation.pdf
k-anonymization
• k-anonymization: every record has at least k−1 other records sharing the same quasi-identifiers
• Quasi-identifiers: a (predefined) combination of attributes that has a chance to identify individuals
https://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity.pdf
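A minimal sketch of checking k-anonymity with pandas; the column names and the toy records are illustrative assumptions:

```python
import pandas as pd

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
    """True iff every combination of quasi-identifier values appears in at least k records."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

# Hypothetical toy table: zip, age bucket and gender act as quasi-identifiers.
records = pd.DataFrame({
    "zip":    ["130**", "130**", "130**", "141**", "141**", "141**"],
    "age":    ["20-29", "20-29", "20-29", "30-39", "30-39", "30-39"],
    "gender": ["F", "F", "F", "M", "M", "M"],
    "salary": [5, 7, 6, 10, 20, 3],   # sensitive attribute (millions of JPY)
})
print(is_k_anonymous(records, ["zip", "age", "gender"], k=3))   # True
```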
k-anonymization is vulnerable
• k-anonymization is effective only under the assumed adversary's knowledge
• By linking external knowledge, an adversary can achieve re-identification
A case study of de-anonymization: Netflix Prize
• Data analytics competition that published a dataset with identifiers removed
• Unfortunately, the records can be re-identified and linked w/ public data
• Linking the pseudonymized Netflix data (pseudo ID, title, rating, review date) with public IMDB reviews (title, rating, comments, which sometimes include political interests) identified anonymous users as real persons
• 8 ratings → identify w/ 99% acc.; 2 ratings → identify w/ 68% acc.
What is Differential Privacy?
At WWDC 2016, Craig Federighi (Apple) said,
"Differential privacy is a research topic in the area of statistics and data analytics that uses hashing, subsampling and noise injection to enable crowdsourced learning while keeping the data of individual users completely private."
https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/
ε-Differential Privacy
A randomized mechanism ℳ: 𝒟 → 𝒮 satisfies ε-DP if, for any two neighboring databases D, D′ ∈ 𝒟 such that D′ differs from D in at most one record, and for any subset of outputs S ⊆ 𝒮, it holds that

Pr[ℳ(D) ∈ S] ≤ exp(ε) · Pr[ℳ(D′) ∈ S]

ε: privacy parameter, privacy budget (0 ≤ ε ≤ ∞); smaller ε (e.g., 0.5, 1) means stronger privacy, larger ε (e.g., 4, 8, …) weaker privacy.
C. Dwork. Differential privacy. ICALP, 2006.
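To make the definition concrete, here is a minimal sketch of the Laplace mechanism for a counting query (covered later under "Query Release via Laplace Mechanism"); the data and query are illustrative:

```python
import numpy as np

def laplace_count(data: np.ndarray, predicate, epsilon: float) -> float:
    """ε-DP counting query via the Laplace mechanism.

    Adding or removing one record changes the count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/ε yields ε-differential privacy."""
    true_count = float(np.sum(predicate(data)))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

salaries = np.array([10, 20, 5, 3, 15])                         # in millions of JPY
print(laplace_count(salaries, lambda x: x >= 10, epsilon=1.0))  # noisy count of high earners
```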
Neighboring Databases
• Any pair of databases that differ in only one record: d_H(D, D′) = 1, where d_H(·,·) is the Hamming distance
• Differential privacy aims to conceal the difference among the neighbors
• In the most standard case, we assume adding/removing one record (e.g., a salary table {Alice, Bob, Cynthia, David} becomes a neighbor by adding Eve or Franc, or by removing Bob or Cynthia)
Differential Privacy as Hypothesis Testing
• An interpretation of DP in view of statistical testing
• Assume a game that guesses the input source (D or D′) from the randomized output y = ℳ(·)
• False positive (FP): the true input is D but the guess is D′; false negative (FN): the true input is D′ but the guess is D

Empirical differential privacy:
ε_emp = max( log((1 − δ − FP) / FN), log((1 − δ − FN) / FP) )

Peter Kairouz, et al. The composition theorem for differential privacy. ICML 2015.
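A small sketch that plugs observed attack error rates into the formula above; the example rates are made up:

```python
import math

def empirical_epsilon(fp: float, fn: float, delta: float = 0.0) -> float:
    """Empirical ε implied by an attacker's false-positive / false-negative rates
    in the D-vs-D′ guessing game (hypothesis-testing view of (ε, δ)-DP)."""
    return max(
        math.log((1.0 - delta - fp) / fn),
        math.log((1.0 - delta - fn) / fp),
    )

# Example: a distinguishing attack with 10% false positives and 30% false negatives.
print(empirical_epsilon(fp=0.10, fn=0.30))   # ≈ 1.95
```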
Sequential Composition is Loose
• Sequential composition is the most conservative upper bound → very loose
• Seeking tighter composition theorems is a core topic of DP research
• Existing composition theorems: Strong Composition, Advanced Composition, Rényi Differential Privacy (RDP), …
(Figure: total ε vs. #compositions; sequential composition grows far above the ideal curve)
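A minimal comparison of the basic (sequential) bound and the advanced composition bound of Dwork & Roth for k adaptive runs of an (ε, δ)-DP mechanism; the parameter values are illustrative:

```python
import math

def sequential_composition(eps: float, k: int) -> float:
    """Basic composition: the total budget grows linearly, k·ε."""
    return k * eps

def advanced_composition(eps: float, k: int, delta_prime: float) -> float:
    """Advanced composition: the k-fold composition is (ε', kδ + δ')-DP with
    ε' = ε·sqrt(2k·ln(1/δ')) + k·ε·(e^ε − 1)."""
    return eps * math.sqrt(2 * k * math.log(1 / delta_prime)) + k * eps * (math.exp(eps) - 1)

eps, k = 0.01, 1000
print(sequential_composition(eps, k))                   # 10.0
print(advanced_composition(eps, k, delta_prime=1e-5))   # ≈ 1.6, a much tighter total ε
```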
HDPView: A Differentially Private View
• Construction of an intermediate privatized "view" (P-view) towards actualizing arbitrary query responses with smaller noise
• Properties: Noise Resistance, Space Efficiency, Query Agnostic, Analytical Reliability
Accepted at VLDB 2022
https://arxiv.org/abs/2203.06791
DP-SGD: Differentially Private Stochastic Gradient Descent
• Get randomized model parameters by randomizing the gradient
• Employ gradient clipping, since the sensitivity of raw gradients is intractable

Non-private SGD over a sensitive database D (starting from θ₀, repeated until convergence): sample a batch, compute the gradient g_t = ∇_θ ℒ(x; θ_t), and update the parameters θ_{t+1} = θ_t − η·g_t.

https://arxiv.org/abs/1607.00133
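A minimal numpy sketch of the DP-SGD recipe (per-example clipping plus Gaussian noise) for a linear model with squared loss; the model, loss, and hyperparameters are illustrative assumptions, not the exact algorithm of the paper:

```python
import numpy as np

def dp_sgd_step(theta, X_batch, y_batch, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: clip each per-example gradient to L2 norm <= clip_norm
    (bounding the sensitivity), sum, add Gaussian noise, average, then update."""
    grads = []
    for x, y in zip(X_batch, y_batch):
        g = 2 * (x @ theta - y) * x                       # per-example gradient (squared loss)
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)   # gradient clipping
        grads.append(g)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_mean = (np.sum(grads, axis=0) + noise) / len(X_batch)
    return theta - lr * noisy_mean

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
theta = np.zeros(5)
theta = dp_sgd_step(theta, X, y)
print(theta)
```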
Privacy Preserving Data Synthesis
• Training a data synthesis model that imitates the original sensitive dataset
• Issue: the training process is sensitive to noise since the process is complicated
• Approach: data embedding that is robust against noise under the DP constraint
• Naïve method (VAE w/ DP-SGD): end-to-end embedding w/ DP-SGD, reconstruction w/ DP-SGD
• P3GM (ours): embedding via DP-PCA, reconstruction w/ DP-SGD
• PEARL (ours): embedding via characteristic functions under DP, non-private (adversarial) reconstruction
• High reconstruction performance under a practical privacy level (ε ≤ 1)
Accepted at ICDE 2021 / ICLR 2022
https://arxiv.org/abs/2006.12101
https://arxiv.org/abs/2106.04590
Privacy-Preserving Mechanism for Collecting Data
• A privacy-preserving mechanism allows inferring statistics about populations while preserving the privacy of individuals
• No trusted entity is required
(Figure: each client randomizes its value x_i ∈ 𝒳 with ℳ and sends the randomized report to the server; original and randomized values are indistinguishable)
(Central) DP vs Local DP
• Central DP: clients send their raw values x_i to a trusted server, which applies ℳ; neighboring DB: add/remove one record
• Local DP: each client randomizes x_i with ℳ before sending, so the server is not required to be trusted; neighboring DB: replacement of one value
Stats Gathering w/ Privacy at Scale
• Examples on synthetic data (N randomized reports over a domain of 100 items)
• Errors are significantly reduced when gathering more randomized reports (N = 10,000 vs. N = 10,000,000)
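A minimal sketch of stats gathering with k-ary randomized response and the standard unbiased frequency estimator; the domain size, ε, and N are illustrative:

```python
import numpy as np

def k_rr(x: int, k: int, epsilon: float, rng) -> int:
    """k-ary randomized response: keep the true item with prob e^ε/(e^ε+k−1),
    otherwise report one of the other k−1 items uniformly at random."""
    if rng.random() < np.exp(epsilon) / (np.exp(epsilon) + k - 1):
        return x
    return int(rng.choice([v for v in range(k) if v != x]))

def estimate_frequencies(reports: np.ndarray, k: int, epsilon: float) -> np.ndarray:
    """Unbiased estimate: invert the known randomization probabilities."""
    n = len(reports)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)   # P[report = true item]
    q = 1.0 / (np.exp(epsilon) + k - 1)               # P[report = a specific other item]
    return (np.bincount(reports, minlength=k) - n * q) / (p - q)

rng = np.random.default_rng(0)
k, epsilon, n = 100, 2.0, 100_000
true_items = rng.integers(0, k, size=n)
reports = np.array([k_rr(x, k, epsilon, rng) for x in true_items])
print(estimate_frequencies(reports, k, epsilon)[:5])   # each entry ≈ 1,000
```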
Rand. Mech. w/ Probabilistic Data Structure
• Probabilistic data structures are very useful for frequency estimation, offering both noise resistance and communication efficiency
• RAPPOR by Google (Bloom filter)
• Private Count Mean Sketch by Apple
https://petsymposium.org/2016/files/papers/Building_a_RAPPOR_with_the_Unknown__Privacy-Preserving_Learning_of_Associations_and_Data_Dictionaries.pdf
https://machinelearning.apple.com/research/learning-with-privacy-at-scale
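To show why such sketches help, here is a simplified, non-private count-mean-sketch-style toy (the real RAPPOR / Apple mechanisms apply randomized response on top of the hashed encodings); the hash construction, sizes, and item names are all illustrative:

```python
import hashlib
import numpy as np

M, W = 16, 1024   # number of hash functions (rows) and sketch width (columns)

def h(item: str, row: int) -> int:
    """Row-specific hash of an item into [0, W)."""
    return int.from_bytes(hashlib.sha256(f"{row}:{item}".encode()).digest()[:4], "big") % W

def update(sketch, row_counts, item: str) -> None:
    """Each client sends only one bucket of one randomly chosen row (cheap to transmit)."""
    row = np.random.randint(M)
    sketch[row, h(item, row)] += 1
    row_counts[row] += 1

def estimate(sketch, row_counts, item: str) -> float:
    """Count-mean-sketch style estimator: debias hash collisions, average over rows."""
    est = [(W / (W - 1)) * (sketch[r, h(item, r)] - max(row_counts[r], 1) / W) for r in range(M)]
    return M * float(np.mean(est))

sketch, row_counts = np.zeros((M, W)), np.zeros(M)
for _ in range(5000):
    update(sketch, row_counts, "sticker_A")
for _ in range(1000):
    update(sketch, row_counts, "sticker_B")
print(estimate(sketch, row_counts, "sticker_A"))   # ≈ 5000
```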
Federated Learning
• Collaborative learning w/ a server and clients (some users remain non-participants of FL)
• Raw data never leaves clients' devices; only model updates are sent to the server's global model
First FL paper: https://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf
Survey paper: https://arxiv.org/abs/1912.04977
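A minimal numpy sketch of the federated averaging (FedAvg) idea from the first FL paper; the model (linear, squared loss), client data, and hyperparameters are illustrative:

```python
import numpy as np

def client_update(theta_global, X, y, lr=0.1, local_steps=5):
    """Local training on one client's private data; only the updated model leaves the device."""
    theta = theta_global.copy()
    for _ in range(local_steps):
        grad = 2 * X.T @ (X @ theta - y) / len(y)    # squared-loss gradient, computed locally
        theta -= lr * grad
    return theta

def server_aggregate(client_models, client_sizes):
    """FedAvg: weighted average of client models by local dataset size."""
    weights = np.array(client_sizes) / np.sum(client_sizes)
    return np.sum([w * m for w, m in zip(weights, client_models)], axis=0)

rng = np.random.default_rng(0)
theta = np.zeros(3)                                                    # global model
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]

for _ in range(10):                                                    # communication rounds
    local_models = [client_update(theta, X, y) for X, y in clients]
    theta = server_aggregate(local_models, [len(y) for _, y in clients])
print(theta)
```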
Gradient Inversion - Privacy Issues in FL
• Can we reconstruct an image used in training from a gradient? → Yes.
(Source) "Inverting Gradients - How easy is it to break privacy in federated learning?" https://arxiv.org/abs/2003.14053
Federated Learning under Differential Privacy
• Central model: clients send raw gradients and the server aggregates them w/ noise injection
• Local model: clients send randomized gradients and the server aggregates them
LDP-SGD
• Randomized response via randomizing the gradient's direction
• Randomly select the green zone or the white zone, and then uniformly pick a vector from the selected zone
https://arxiv.org/abs/2001.03618
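A minimal sketch of the direction-randomization idea, reading the "green zone" as the hemisphere aligned with the true gradient and the "white zone" as the opposite hemisphere; this is an illustrative simplification, not the paper's exact mechanism (which also debiases and rescales the reports):

```python
import numpy as np

def randomize_direction(grad, epsilon, rng):
    """Report a uniform random unit vector: from the hemisphere aligned with the gradient
    with prob e^ε/(e^ε+1), otherwise from the opposite hemisphere."""
    g = grad / (np.linalg.norm(grad) + 1e-12)      # keep only the direction
    v = rng.normal(size=g.shape)
    v /= np.linalg.norm(v)                         # uniform point on the unit sphere
    keep_aligned = rng.random() < np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    if (v @ g >= 0) != keep_aligned:
        v = -v                                     # reflect into the selected hemisphere
    return v

rng = np.random.default_rng(0)
grad = np.array([3.0, -1.0, 2.0])
reports = np.array([randomize_direction(grad, epsilon=1.0, rng=rng) for _ in range(10_000)])
print(reports.mean(axis=0))   # points roughly along grad's direction, but each report is noisy
```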
Empirical Privacy Measurement in LDP-SGD
• Empirical measurement with instantiated adversaries for LDP-SGD
• The worst case, flipping the gradient direction, reaches the theoretical bound
https://arxiv.org/abs/2206.09122
Issues in Local DP
• LDP enables us to collect users' data in a privatized way, but the amount of noise tends to be prohibitive
Shuffle model – an intermediate privacy model
• An intermediate trusted entity, the "shuffler", anonymizes local users' identities
• Each client randomizes its report with ε₀ and encrypts it w/ the server's public key; the shuffler only mixes the identities w/o looking at the contents, then sends the shuffled, anonymized batch to the server
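A minimal simulation of the data flow (local randomization, then an identity-stripping shuffle); the binary randomized response and the omission of real encryption are simplifying assumptions:

```python
import math
import random

def local_randomize(bit: int, epsilon0: float) -> int:
    """Binary randomized response with local budget ε0."""
    p_keep = math.exp(epsilon0) / (math.exp(epsilon0) + 1.0)
    return bit if random.random() < p_keep else 1 - bit

def shuffler(reports_with_ids):
    """Strip client identities and randomly permute the batch.
    (In a real deployment the contents are encrypted to the server, so the
    shuffler mixes identities without ever seeing the values.)"""
    reports = [report for _, report in reports_with_ids]
    random.shuffle(reports)
    return reports

clients = {f"client_{i}": random.randint(0, 1) for i in range(1000)}
randomized = [(cid, local_randomize(v, epsilon0=2.0)) for cid, v in clients.items()]
anonymized_batch = shuffler(randomized)          # the server sees values only, in random order
print(sum(anonymized_batch) / len(anonymized_batch))
```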
Privacy Amplification via Shuffling
• The shuffler can amplify differential privacy → possibility to decrease the local noise
• The amplification on the shuffler translates LDP on the clients into CDP: reports randomized w/ ε₀ (LDP) yield a shuffled batch satisfying ε < ε₀ (CDP)
• Example with k-randomized response (k = 10, ε₀ = 8 under LDP), by "Hiding Among the Clones"
https://arxiv.org/abs/2012.12803
Shuffle Model in Federated Learning
• Using a shuffler and sub-sampling, FL can also employ privacy amplification
• Clients randomly check in to federated learning at each iteration
• Sub-sampling & shuffling in FL: e.g., ε_ldp = 8 on the clients amplifies to ε_cdp = 1
• Higher accuracy at a strong privacy level (smaller ε)
https://arxiv.org/abs/2206.03151
Network Shuffling
• Decentralized shuffling via multi-round random walks on a graph
• In each round, every client relays her randomized reports to one of her neighbors (e.g., friends on a social network) via an encrypted channel
• The larger the graph, the more privacy is amplified
Accepted at SIGMOD 2022
https://arxiv.org/abs/2204.03919
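A minimal sketch of the relay step on a toy graph (plain adjacency dict, no encryption); the graph, number of rounds, and report labels are illustrative:

```python
import random

# Toy social graph: every node relays the (already randomized) reports it currently holds.
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"],
    "C": ["A", "B", "E"], "D": ["B", "E"], "E": ["C", "D"],
}

def network_shuffle(initial_reports: dict, rounds: int) -> dict:
    """Multi-round random walk: after enough rounds, which node holds which report
    is nearly independent of its origin, giving the shuffling effect."""
    held = {node: list(reports) for node, reports in initial_reports.items()}
    for _ in range(rounds):
        next_held = {node: [] for node in graph}
        for node, reports in held.items():
            for r in reports:
                next_held[random.choice(graph[node])].append(r)  # relay to a random neighbor
        held = next_held
    return held

initial = {node: [f"report_from_{node}"] for node in graph}
print(network_shuffle(initial, rounds=10))   # reports scattered, decoupled from their origin
```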
Topics in this lecture
• Privacy Risks, Issues, and Case-studies
• Differential Privacy (Central Model)
• Query Release via Laplace Mechanism
• Machine Learning via DP-SGD
• Local Differential Privacy
• Stats Gathering via Randomized Response
• Federated Learning via LDP-SGD
• Shuffle Model – an intermediate privacy model