
Over-the-Air Computation for Scalable, Lightweight, and Privacy Preserving Edge Machine Learning

Yusuke Koda
September 29, 2021

A foundational introduction to over-the-air computation (AirComp) and differential privacy.

2021-09-15, IEICE Society Conference, Tutorial Session BT-2: Federated Learning Tutorial, toward privacy protection and improved communication efficiency for IoT.


Transcript

  1. Over-the-Air Computation for Scalable, Lightweight, and Privacy Preserving Edge Machine

     Learning. Yusuke Koda, Postdoctoral researcher. 2021-09-15, IEICE Society Conference, Tutorial Session BT-2: Federated Learning Tutorial, toward privacy protection and improved communication efficiency for IoT.
  2. Who is the presenter? Yusuke Koda (香田優介)

     Affiliation: 2021–, postdoctoral researcher, University of Oulu (Finland); 2018–2021, doctoral course, Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University. Research topics: machine learning applications to wireless communication systems (Ph.D. thesis: “Visual data-driven millimeter wave communication systems”); wireless communication design for privacy-oriented distributed machine learning. Ph.D. (Informatics).
  3. Who is the presenter? (Same content as the previous slide, adding a photo taken the day before.)
  4. Academic efforts on Federated Learning (FL)

     Although FL has already been put to practical use in part, as of 2021 a large number of papers are still being published across fields (mainly communications and security). What this implies: researchers who encounter FL from those fields keep identifying problems to solve, both in the platforms that support FL (communications) and in the FL learning method itself (security). Aim of this talk: share the sense of FL's challenges as seen from these two fields, introduce the research-stage techniques (AirComp and differential privacy) one at a time, and present one of Koda's works that fuses them (DP-AirComp FL [1]). I hope this helps in exploring the latest cross-disciplinary trends in FL. [1] Y. Koda, K. Yamamoto, T. Nishio, et al., “Differentially private aircomp federated learning with power adaptation harnessing receiver noise,” in Proc. IEEE GLOBECOM 2020, held online, 2020, pp. 1–6.
  5. Agenda (overview diagram)

     Part 1: AirComp (analog communications) for scalable data aggregation. Part 2: Differential privacy for lightweight attack robustness and private collaborative learning. Part 3: DP-AirComp FL. Context: collaborative machine learning (centralized learning, federated learning, split learning), with related areas wireless communications (digital communications) and data security (cryptography).
  6. Part I: Basics of Over-the-Air Computation & Application to Federated Learning

     (Same agenda diagram as the previous slide, now highlighting Part 1: AirComp and Part 1': AirComp FL; the remaining parts are Part 2: Differential Privacy, Part 3: DP-AirComp FL, and Part 4: Beyond FL: AirMixML.)
  7. Question for FL from the wireless communication perspective

     General question: is 5G/beyond ready for federated learning? Wireless communications community: OK, FL is attractive as an application of 5G or beyond wireless communications. The number of UEs per cell will be massive; intra-cell intelligence means there is no need to upload parameters to the cloud; per-UE computational capability keeps increasing; and a massive number of UEs uploads model parameters at once (UEs → BS + edge server). Q: Can we do better than the 5G key technologies in terms of resource efficiency? Security community: Q: Wait, is model parameter exchange secure for privacy? Q: If not, how can we enhance attack robustness?
  8. Overview of over-the-air computation (AirComp)

     Function computation leveraging waveform superposition: each source UE applies analog linear modulation and channel pre-processing, and the receiver applies channel post-processing (with some distortion). Fig. 1: AirComp (uncoded, analog transmission). What is the benefit of AirComp? It is scalable in the number of UEs in terms of time-frequency resources, owing to simultaneous co-channel transmission. Note: strictly speaking, Fig. 1 is one design of AirComp, and AirComp is not necessarily identical to Fig. 1. In fact, AirComp with source and channel coding also exists, and was actually proposed earlier (p. 9). However, since the FL literature overwhelmingly builds on the design of Fig. 1, this talk explains that design.
  9. History of academic research on AirComp

     2007–2014: information-theoretic studies by Michael Gastpar et al. (EPFL) and Mario Goldenbaum (Bremen Univ.): discussion for sensor networks, rate/distortion analysis, code design, coded or uncoded? 2014–2019: initial studies on wireless communications by Deniz Gunduz et al. and Osvaldo Simeone et al. (London): achievable rate in wireless, channel estimation, proof of concept, MIMO by zero-forcing. 2019–2021: application to FL by Guangxu Zhu et al. (Hong Kong): learning performance checks, scheduling, power control, etc. Findings: ▶ AirComp is not necessarily new, and has been progressing gradually. ▶ A landmark study [2] in 2019 made AirComp (with FL) an active research area. [2] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wirel. Commun., vol. 19, no. 1, pp. 491–506, Oct. 2019. (The original paper proposes OFDM transmission carrying one parameter per subcarrier.)
  10. Intuitive design of AirComp (ex.: computation of a summation)

     1. Amplitude modulation & channel inversion at each UE; 2. simultaneous & co-channel transmission; 3. post-processing at the BS, which is the same as signal detection in digital communications, yielding the computation result. (ρ is a constant common to all UEs, detailed on page 14.) Note: for simplicity, the pulse-shaping filter is omitted.
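As a sanity check, the three steps on this slide can be simulated in a few lines. The UE count, channel model, scaling factor ρ, and noise level below are illustrative assumptions, not parameters from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                         # number of UEs (illustrative)
x = rng.uniform(0.0, 1.0, K)  # source value at each UE
h = (rng.normal(size=K) + 1j * rng.normal(size=K)) / np.sqrt(2)  # flat-fading channels
rho = 0.1                     # power scaling factor, common to all UEs (illustrative)
noise_std = 1e-3              # receiver noise standard deviation (illustrative)

# 1. Amplitude modulation & channel inversion: UE i transmits sqrt(rho) * x_i / h_i
s = np.sqrt(rho) * x / h

# 2. Simultaneous co-channel transmission: the air superimposes h_i * s_i, plus receiver noise
y = np.sum(h * s) + noise_std * (rng.normal() + 1j * rng.normal()) / np.sqrt(2)

# 3. Post-processing at the BS (same as signal detection): rescale by 1/sqrt(rho)
sum_hat = y.real / np.sqrt(rho)
```

The estimate `sum_hat` is close to the true sum of the K source values, up to the rescaled receiver noise.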
  11. What kind of function can we compute?

     Nomographic function: a function f that can be written with pre-processing functions ϕi and a post-processing function ψ as
     f(x1, …, xK) = ψ( Σ_{i=1}^{K} ϕi(xi) ),
     where f : [0, 1]^K → R, ϕi : [0, 1] → R, and ψ : R → R. In the AirComp design above, each UE applies ϕi before amplitude modulation and channel inversion, simultaneous co-channel transmission computes the sum over the air, and the BS applies ψ after detection.
  12. Is my function nomographic?

     Theorem 8 in [3]: every f : [0, 1]^K → R is nomographic, i.e., f(x1, …, xK) = ψ( Σ_{i=1}^{K} ϕi(xi) ). Examples:
     ▶ Arithmetic mean: ϕi(xi) = xi/K, ψ(x) = x.
     ▶ Geometric mean: ϕi(xi) = ln(xi)/K, ψ(x) = e^x.
     ▶ Euclidean norm: ϕi(xi) = xi², ψ(x) = √x.
     ▶ Maximum, minimum: see [4].
     ▶ Others can be approximated algorithmically (see [5]).
     [3] R. C. Buck, “Approximate complexity and functional representation,” Wisconsin Univ. Madison Mathematics Research Center, Tech. Rep., 1976. [4] M. Goldenbaum, H. Boche, and S. Stańczak, “Harnessing interference for analog function computation in wireless sensor networks,” IEEE Trans. Signal Process., vol. 61, no. 20, pp. 4893–4906, Oct. 2013. [5] S. Limmer, J. Mohammadi, and S. Stańczak, “A simple algorithm for approximation by nomographic functions,” in Proc. IEEE Allerton 2015, Monticello, IL, USA, Sep. 2015, pp. 453–458.
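The example decompositions above can be checked numerically; the input vector below is an arbitrary illustration:

```python
import math

x = [0.2, 0.5, 0.8, 0.4]  # arbitrary inputs in [0, 1]
K = len(x)

# Arithmetic mean: phi_i(x) = x/K, psi(s) = s
arith = sum(xi / K for xi in x)

# Geometric mean: phi_i(x) = ln(x)/K, psi(s) = e^s
geo = math.exp(sum(math.log(xi) / K for xi in x))

# Euclidean norm: phi_i(x) = x^2, psi(s) = sqrt(s)
norm = math.sqrt(sum(xi ** 2 for xi in x))
```

In each case the inner sum is exactly what AirComp computes over the air, and ψ is applied once at the receiver.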
  13. Wireless federated learning design based on AirComp (AirComp-FL) [2]

     1. Each UE updates its model parameters locally; 2. pre-processing, amplitude modulation & channel inversion, then simultaneous co-channel transmission; the BS performs post-processing (the same as signal detection in digital communications), so the computation result equals the global model parameter update; 3. the BS broadcasts the global model back to the UEs. [2] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wirel. Commun., vol. 19, no. 1, pp. 491–506, Oct. 2019. (The original paper proposes OFDM transmission carrying one parameter per subcarrier.)
  14. Q. Is channel noise harmful in AirComp FL?

     The same pipeline as on the previous slide, now asking whether the receiver noise added to the computation result (= global model parameter update) is harmful. (c.f.) The work in [2] suppresses noise maximally by setting the maximum scaling factor
     ρ = P0 C⁻² min_{i∈{UEs}} |hi|²,
     where P0 is the maximum transmit power and C is an upper bound on ∆xi.
  15. Part II: Basics of Differential Privacy & Application to Federated Learning

     (Same agenda diagram, now highlighting Part 2: Differential Privacy and Part 2': DP-FL.)
  16. Question from the security perspective

     General question: is FL perfect for data security? Security community: Q: Wait, is model parameter exchange secure for privacy? Q: If not, how can we enhance attack robustness? (The wireless communications community, as on slide 7, finds FL attractive for 5G/beyond applications and asks about resource efficiency.)
  17. Leakage of training data from model parameter updates

     Model update exchange is not necessarily safe for data privacy (ex. [6]): in a multilayer perceptron, the input data, the output of the NN, the loss against the true label, and back-propagation produce a gradient, and the model update sent to the server reveals levels of the input data. (Example at left from [6].) Note: in FedAvg, each UE repeats gradient computation and accumulation over many data samples, so the uploaded model update ∆xi is a sum of the different ∆J terms in the figure; complete leakage as above is therefore unlikely. Still, these results show that “not sharing clients' data ≠ secure,” and from that standpoint, one major line of work studies how to predict data from model updates and how to defend against such prediction. [6] L. Phong, Y. Aono, T. Hayashi, et al., “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Trans. Inf. Forensics Secur., vol. 13, no. 5, pp. 1333–1345, May 2018.
  18. Standard FL allows malicious clients to analyze model updates

     The global model update is distributed to all UEs, including malicious ones (e.g., UE K). Distributing the global model update allows malicious UEs to analyze what other UEs' model updates are (and their private data accordingly). For now, the BS is assumed trustworthy; Part III extends, via AirComp, to the case where the BS cannot be trusted either. Also, since the extension to higher dimensions is straightforward, ∆xi is treated as a scalar for simplicity.
  19. How can a malicious UE reveal other UEs' updates only from the average?

     (Ex.) Differencing attack leveraging: ▶ the drop of one UE from federated model training, and ▶ the similarity of model updates between adjacent rounds. Suppose UE 2 participates in round t but not in round t+1. Then
     K ∆xg^(t) − (K−1) ∆xg^(t+1)   (both averages are available to everyone)
     = ∆x2^(t) + Σ_{i ∉ {UE 2}} ( ∆xi^(t) − ∆xi^(t+1) ).
     → If the second term (the difference between rounds) is small, ∆x2^(t) is leaked. Note: a paper [7] that examined the difference of updates between adjacent rounds found it to be on the order of 10⁻²; hence, if ∆x2^(t) is of a larger order, it leaks accurately. [7] W. Luping, W. Wei, and L. Bo, “CMFL: Mitigating communication overhead for federated learning,” in Proc. IEEE ICDCS 2019, Richardson, TX, USA, Jul. 2019, pp. 954–964.
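A minimal numeric sketch of this differencing attack, with synthetic updates: the 10⁻² inter-round drift follows the order of magnitude quoted from [7], while everything else is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 10  # number of UEs; "UE 2" is index 1

dx_t = rng.normal(0.0, 1.0, K)    # per-UE updates at round t (synthetic)
drift = rng.normal(0.0, 1e-2, K)  # small inter-round change, order 10^-2 (cf. [7])
dx_t1 = dx_t + drift              # updates at round t+1

avg_t = dx_t.mean()                  # round t: all K UEs averaged
avg_t1 = np.delete(dx_t1, 1).mean()  # round t+1: UE 2 has dropped out

# The malicious UE only needs the two distributed averages:
leaked = K * avg_t - (K - 1) * avg_t1  # ~ dx_t[1] when the drift is small
```

The residual error equals the summed inter-round drift of the other UEs, so the smaller that drift, the more accurately UE 2's update leaks.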
  20. Why does artificial noise prevent the differencing attack?

     (PDFs of the distributed average with and without artificial noise.) The larger the noise variance, the less meaningful the difference becomes between the result obtained when a given UE participates and the result obtained when it does not. → Privacy is preserved in the sense that the model update of a departing UE becomes hard to infer.
  21. A metric that quantifies privacy: differential privacy

     Differential privacy of the operation “add noise to the average”: the operation that adds noise to the average of the model parameter updates is (ϵ, δ)-differentially private when, for every UE i′ and any S ∈ R,
     P(∆xg ∈ S | UE i′ present) ≤ e^ϵ P(∆xg ∈ S | UE i′ absent) + δ,
     P(∆xg ∈ S | UE i′ absent) ≤ e^ϵ P(∆xg ∈ S | UE i′ present) + δ.
     What ϵ and δ express: the closeness of the two distributions. 1) For roughly any region S, the ratio of the two probabilities is at most e^ϵ; 2) δ is the probability that an exception occurs. The smaller ϵ and δ, the closer the two distributions (they coincide at ϵ = 0, δ = 0) → strong privacy in the sense that the difference between a given UE being present or absent cannot be told. Note: as far as I have checked, there is no unified view on how small ϵ and δ must be for privacy to be considered preserved (various interpretations have of course been proposed at the research level). It is a measure that Dwork et al. defined to discuss the privacy of a broad class of operations on databases, and the current understanding is simply “the smaller, the better.” That said, papers evaluate performance at roughly the 10⁻²–1 order, so I presume this range is the de facto standard.
  22. Extending from “take the average of the data and add noise” to a more general operation M(·)

     General definition of differential privacy [8]: given any adjacent databases d and d′ and an output operation M(·) on them, M is called (ϵ, δ)-differentially private when
     P(M(d) ∈ S) ≤ e^ϵ P(M(d′) ∈ S) + δ   ∀S ∈ R,
     where R denotes the output space of M. [8] C. Dwork, A. Roth, et al., “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, Aug. 2014.
  23. Example 1: distributing ∆xg without noise

     Taking a region S as in the figure, the ratio of the two probabilities (UE present vs. absent) diverges to infinity: ϵ = ∞. From the differential privacy viewpoint, this is the worst case. (Recall from the previous page: ϵ expresses the closeness of the two distributions; ▶ for roughly any region S, the ratio of the two probabilities is at most e^ϵ.)
  24. Example 2: distributing ∆xg after adding Laplace noise

     For any region S, the ratio of the two probabilities is at most e^{2∆x_{i′}/σ}. If |∆x_{i′}| ≤ C for all i′, then ϵ = 2C/σ. Conversely, to achieve a desired privacy level ϵtarget, it suffices to:
     ▶ clip ∆x_{i′} so that |∆x_{i′}| ≤ C,¹ and
     ▶ add Laplace noise with σ = 2C/ϵtarget.
     ¹ That is, replace ∆xi by min{C, ∆xi} (if ∆xi ≥ 0) or by max{−C, ∆xi} (if ∆xi < 0).
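The two steps (clip, then add Laplace noise scaled to match ϵ = 2C/σ) can be sketched as follows; the budget, clipping threshold, and synthetic updates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

eps_target = 1.0  # per-round privacy budget (illustrative)
C = 0.5           # clipping threshold (illustrative)
K = 100

dx = rng.normal(0.0, 1.0, K)  # synthetic per-UE model updates

# 1. Clip so that |dx_i| <= C
dx_clipped = np.clip(dx, -C, C)

# 2. Average, then add Laplace noise with scale sigma = 2*C/eps_target,
#    so the density ratio is bounded by e^(2C/sigma) = e^eps_target
sigma = 2.0 * C / eps_target
dx_g = dx_clipped.mean() + rng.laplace(0.0, sigma)
```

Only the noisy average `dx_g` is distributed; no UE's individual clipped update leaves the BS.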
  25. Differentially private federated learning w/ Laplace noise

     1. Clipping + uploading at each UE (UE 1, …, UE K); 2. averaging + adding noise at the BS; 3. distributing the result. The per-round operation is guaranteed to be ϵtarget-differentially private.
  26. Caution: the privacy level (ϵ, δ) per round is not the privacy level (ϵ, δ) of FL

     Composition theorem (in this case derived from Th. 3 in [9]): the operation “repeat the previous page's procedure for T rounds” is Tϵtarget-differentially private (w/ T-fold adaptive composition). This means that even if each round is ϵtarget-differentially private, the overall privacy level is T times worse. Hence, the per-round privacy level must be determined by working backwards from the number of rounds and the desired overall privacy level. [9] C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and differential privacy,” in Proc. IEEE FOCS 2010, Las Vegas, NV, USA, Oct. 2010, pp. 51–60. (ϵ-DP w/ T-fold adaptive composition can be thought of as an extension of the original DP definition that quantifies the privacy of applying an operation T consecutive times to different data; see the reference above for the exact definition.) Note: with the second parameter δ (exception probability), δ can be tuned so that ϵ grows more slowly than linearly in the number of rounds; this is called advanced composition [9], but it is outside the scope of this talk.
  27. Part III: Differentially Private AirComp Federated Learning w/ Power Adaptation Harnessing Receiver Noise

     (Same agenda diagram, now highlighting Part 3: DP-AirComp FL.) Y. Koda, K. Yamamoto, T. Nishio, et al., “Differentially private aircomp federated learning with power adaptation harnessing receiver noise,” in Proc. IEEE GLOBECOM 2020, held online, 2020, pp. 1–6.
  28. Key message of this part

     Let's not suppress the noise perturbation maximally, but harness it to ensure UEs' privacy. In the AirComp-FL pipeline (updating model parameters; pre-processing, amplitude modulation & channel inversion, simultaneous co-channel transmission; post-processing at the BS; broadcasting), the receiver noise in the computation result (= global model parameter update) is useful to ensure differential privacy against the BS.
  29. How can we control the power scaling factor?

     Maximize the power scaling factor under a privacy constraint:
     maximize_ρ ρ
     subject to  UE i's transmit power ≤ max transmit power P0, ∀i;
                 per-round privacy budget ϵ ≤ ϵtarget;
                 per-round privacy budget δ ≤ δtarget.
     ↓ solve
     Power adaptation rule targeting (ϵtarget, δtarget)-DP under AWGN:
     ρ⋆_priv = P0 C⁻² min{ min_{i∈{UEs}} |hi|²,  σn² ϵ²target / (4 G P0 ln(1.25/δtarget)) },
     where the second term is new and due to the privacy budget. Note: the derivation uses the theorem for Gaussian (rather than Laplace) noise (Theorem A.1 in [8]). [8] C. Dwork, A. Roth, et al., “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, Aug. 2014.
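A sketch of evaluating a power adaptation rule of this shape. The formula as transcribed here is partly reconstructed from the slide, and all parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

P0 = 0.1                        # max transmit power (illustrative units)
C = 1.0                         # clipping bound on the updates (illustrative)
G = 1.0                         # antenna gain, linear (0 dBi)
sigma_n2 = 1e-6                 # receiver noise power (illustrative)
eps_t, delta_t = 0.01, 0.01     # per-round privacy budget (as in the evaluation)
h2 = rng.exponential(1.0, 100)  # |h_i|^2 for 100 UEs under Rayleigh fading

# rho*_priv = P0 * C^-2 * min( min_i |h_i|^2,
#                              sigma_n^2 * eps^2 / (4 * G * P0 * ln(1.25/delta)) )
privacy_term = sigma_n2 * eps_t ** 2 / (4.0 * G * P0 * np.log(1.25 / delta_t))
rho_star = P0 / C ** 2 * min(np.min(h2), privacy_term)
```

With these numbers the privacy term, not the worst channel, is the binding constraint, i.e., the transmit power is deliberately backed off so that the receiver noise alone delivers the target (ϵ, δ).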
  30. But... the privacy constraint prohibits setting a larger transmit power

     (Plot: square root of the power scaling factor vs. increase in privacy level; due to the constraint for differential privacy, the scaling factor falls below the non-private value.) → There might be a tradeoff between SNR and privacy budget. How does the SNR relate to them?
  31. How does the SNR relate to the privacy budget?

     Upper limit of the SNR under a Rayleigh fading channel:
     SNR ≤ [ G β I² P0 / ( Σ_{i∈I} r_i^α σn² ) ] · [ 1 − exp( − ( Σ_{i∈I} r_i^α σn² / (4 G β P0) ) · ϵ²target / ln(1.25/δtarget) ) ]   (exact form)
        ≈ I² ϵ²target / ( 4 ln(1.25/δtarget) )   (approximated form for a higher privacy level),
     where I is the number of UEs. Two key insights: ▶ Challenge: there is a tradeoff between the privacy budget and the SNR. ▶ Solution: increasing the number of UEs is key to enhancing the SNR.
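The approximated bound can be turned into a one-line function to see the scaling in the number of UEs; the budget values are illustrative:

```python
import math

eps_target, delta_target = 0.1, 0.01  # per-round privacy budget (illustrative)

def snr_upper_approx(num_ues: int) -> float:
    """Approximated bound for a higher privacy level: I^2 eps^2 / (4 ln(1.25/delta))."""
    return num_ues ** 2 * eps_target ** 2 / (4.0 * math.log(1.25 / delta_target))

# The bound grows quadratically in the number of UEs:
ratio = snr_upper_approx(100) / snr_upper_approx(5)
```

This matches the two insights above: tightening ϵtarget lowers the achievable SNR, while adding UEs raises it quadratically.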
  32. Numerical Evaluations

     Table 1: Parameters in wireless communications. Antenna gain G: 0 dBi; distance between BS and UEs ri: 100 m; path loss exponent α: 4; noise power σn²: −60 dBm.
     Table 2: Parameters in FL. Data set: MNIST; optimizer: Adam; learning rate: 1 × 10⁻³; # local epochs: 20; batch size: 32; # rounds: 50; model: two fully connected layers (units: 512).
  33. Validity of the SNR analysis, focusing on the tradeoff between SNR and privacy budget

     (Plot: SNR (dB) vs. per-round privacy budget; analytical bound and simulation for 100 UEs and for 5 UEs.) ▶ The actual SNR agrees with the analysis. ▶ There is a tradeoff between the SNR and the privacy budget. ▶ More UEs lead to a higher SNR.
  34. Impact of the number of UEs on test accuracy

     # UEs I = 5, 100 (ϵtarget = 0.01, δtarget = 0.01 per round). (Plot: test accuracy vs. number of global updates; DP-AirComp FL for 5 and 100 UEs, against AirComp FL w/ max. TP.) ▶ A larger number of UEs results in higher model performance. ▶ With 100 UEs, performance is competitive with AirComp FL with maximum transmit power.
  35. Let's not suppress the noise perturbation maximally, but harness it to ensure UEs' differential privacy

     In the AirComp-FL pipeline, the receiver noise in the computation result (= global model parameter update) is useful to ensure differential privacy against the BS. Summary: ▶ We derived a power adaptation rule that meets differential privacy budgets. ▶ We analyzed the relationship between the SNR and the differential privacy budgets.
  36. Summary

     (Closing agenda diagram: Part 1: AirComp (analog commun.) for scalable data aggregation; Part 2: Differential privacy for lightweight attack robustness and private collaborative learning; Part 3: DP-AirComp FL.)
  37. References

     [1] Y. Koda, K. Yamamoto, T. Nishio, and M. Morikura, “Differentially private aircomp federated learning with power adaptation harnessing receiver noise,” in Proc. IEEE GLOBECOM 2020, held online, 2020, pp. 1–6.
     [2] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wirel. Commun., vol. 19, no. 1, pp. 491–506, Oct. 2019.
     [3] R. C. Buck, “Approximate complexity and functional representation,” Wisconsin Univ. Madison Mathematics Research Center, Tech. Rep., 1976.
     [4] M. Goldenbaum, H. Boche, and S. Stańczak, “Harnessing interference for analog function computation in wireless sensor networks,” IEEE Trans. Signal Process., vol. 61, no. 20, pp. 4893–4906, Oct. 2013.
     [5] S. Limmer, J. Mohammadi, and S. Stańczak, “A simple algorithm for approximation by nomographic functions,” in Proc. IEEE Allerton 2015, Monticello, IL, USA, Sep. 2015, pp. 453–458.
     [6] L. Phong, Y. Aono, T. Hayashi, L. Wang, and S. Moriai, “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Trans. Inf. Forensics Secur., vol. 13, no. 5, pp. 1333–1345, May 2018.
     [7] W. Luping, W. Wei, and L. Bo, “CMFL: Mitigating communication overhead for federated learning,” in Proc. IEEE ICDCS 2019, Richardson, TX, USA, Jul. 2019, pp. 954–964.
     [8] C. Dwork, A. Roth, et al., “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, Aug. 2014.
     [9] C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and differential privacy,” in Proc. IEEE FOCS 2010, Las Vegas, NV, USA, Oct. 2010, pp. 51–60.