NLP Colloquium: Unlearning Traces the Influential Training Data of Language Models

Masaru Isonuma
July 03, 2024

Transcript

  1. Unlearning Traces the Influential Training Data of Language Models

     Masaru Isonuma (1,2), Ivan Titov (1,3)
     1 University of Edinburgh, 2 University of Tokyo, 3 University of Amsterdam
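  2. What Is Unlearning?
     • Standard learning
       – Update the model θ to maximize the probability $p_\theta(z)$ of generating the training data z: $\theta = \arg\max_\theta \log p_\theta(z)$, i.e. gradient descent on the loss $-\log p_\theta(z)$
     • Unlearning
       – Update the model θ to minimize the probability $p_\theta(z)$ of generating the training data z: $\theta = \arg\min_\theta \log p_\theta(z)$, i.e. gradient ascent on the loss $-\log p_\theta(z)$
     [Figure: for z = "Masaru lives in the UK", bar charts over the candidates UK / Japan / US after learning and after unlearning]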
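To make the sign convention concrete, here is a minimal sketch that assumes a toy "language model": a softmax over the three candidate completions from the slide's example, with the logits as parameters. The names `log_prob`, `grad_log_prob`, and `train_step` are hypothetical and do not come from the talk. Learning takes gradient-descent steps on $-\log p_\theta(z)$, and unlearning takes gradient-ascent steps on the same loss.

```python
import numpy as np

# Hypothetical toy "language model": a softmax over three candidate
# completions of "Masaru lives in the ...". The parameters theta are the logits.
CANDIDATES = ["UK", "Japan", "US"]


def log_prob(theta: np.ndarray, target: int) -> float:
    """log p_theta(z), where z ends with the candidate at index `target`."""
    return float(theta[target] - np.log(np.sum(np.exp(theta))))


def grad_log_prob(theta: np.ndarray, target: int) -> np.ndarray:
    """Gradient of log p_theta(z) w.r.t. the logits: one_hot(target) - softmax(theta)."""
    probs = np.exp(theta - np.max(theta))
    probs /= probs.sum()
    one_hot = np.zeros_like(theta)
    one_hot[target] = 1.0
    return one_hot - probs


def train_step(theta: np.ndarray, target: int, lr: float, unlearn: bool) -> np.ndarray:
    """One update on the loss -log p_theta(z).

    Learning: gradient descent on -log p (equivalently, ascent on log p).
    Unlearning: gradient ascent on -log p (equivalently, descent on log p).
    """
    direction = -1.0 if unlearn else 1.0
    return theta + direction * lr * grad_log_prob(theta, target)


uk = CANDIDATES.index("UK")
theta_init = np.zeros(len(CANDIDATES))  # uniform: p(UK) = 1/3
theta_learned = theta_init
for _ in range(10):
    theta_learned = train_step(theta_learned, uk, lr=1.0, unlearn=False)
theta_unlearned = theta_learned
for _ in range(10):
    theta_unlearned = train_step(theta_unlearned, uk, lr=1.0, unlearn=True)

for name, theta in [("init", theta_init), ("learned", theta_learned), ("unlearned", theta_unlearned)]:
    print(f"{name:>9}: p(UK) = {np.exp(log_prob(theta, uk)):.3f}")
```

Learning pushes p(UK) above its initial 1/3, and the subsequent ascent steps on the same loss push it back down.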
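  3. Unlearning Research: Trends and Positioning
     • Purpose: make a trained model forget harmful training data, e.g.
       – preventing privacy violations
       – preventing copyright infringement
       – removing bias
       – removing duplicate data
     • Trends in unlearning research
       – 2015: unlearning used mainly in computer vision (Cao et al., Towards making systems forget with machine unlearning. IEEE Symposium on Security and Privacy, 2015)
       – 2023: unlearning applied to LLMs (Jang et al., Knowledge Unlearning for Mitigating Privacy Risks in Language Models. ACL, 2023)
       – 2024: this work, whose purpose is estimating the influence of training data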
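  4. Approach
     • Instead of removing training data and retraining, unlearn it from the already-trained model
     • Estimate the influence of each training dataset by comparing the model before and after unlearning
     [Figure: a model trained on news, books, and SNS data; each source is unlearned in turn (news −, books −, SNS −) and the resulting model outputs are compared, with labels harmless, harmful, and somewhat harmful]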
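  5. Problem Setting and Approach
     • Define the (true) influence of a training dataset $Z_i$ on an evaluation dataset $Z_{\text{test}}$ as a difference in loss (leave-one-out):
       $I_{\text{true}}(Z_i, Z_{\text{test}}) = L(Z_{\text{test}}, \theta_{-i}) - L(Z_{\text{test}}, \theta_0)$
       where $\theta_{-i}$ is the model trained without $Z_i$ and $\theta_0$ is the model trained on all data
     [Figure: the model trained on all data (Z_1, Z_2, Z_3) and the model trained without Z_1, each evaluated on Z_test]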
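When retraining is cheap, the definition can be computed directly. The sketch below assumes a toy setting where linear regression with half mean squared error stands in for the language model, so `fit` returns the exact minimizer of the summed loss. The datasets and the helpers `make_dataset`, `loss`, and `fit` are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: linear regression with half-MSE loss stands in for a
# language model, and each Z_j is a small (X, y) dataset.
rng = np.random.default_rng(0)


def make_dataset(w, n=200, noise=0.1):
    """Toy dataset Z = (X, y) with y = X w + Gaussian noise."""
    X = rng.normal(size=(n, len(w)))
    return X, X @ np.asarray(w, dtype=float) + noise * rng.normal(size=n)


def loss(Z, theta):
    """L(Z, theta) = 0.5 * mean((X theta - y)^2)."""
    X, y = Z
    return 0.5 * float(np.mean((X @ theta - y) ** 2))


def fit(datasets):
    """Exact argmin_theta sum_j L(Z_j, theta) for this quadratic loss (normal equations)."""
    A = sum(X.T @ X / len(y) for X, y in datasets)
    b = sum(X.T @ y / len(y) for X, y in datasets)
    return np.linalg.solve(A, b)


# Two training datasets share the evaluation set's underlying task; one does not.
train = {
    "similar_1": make_dataset([1.0, 2.0]),
    "similar_2": make_dataset([1.0, 2.0]),
    "dissimilar": make_dataset([-2.0, 0.5]),
}
test = make_dataset([1.0, 2.0])

theta_0 = fit(list(train.values()))  # model trained on all data

# I_true(Z_i, Z_test) = L(Z_test, theta_{-i}) - L(Z_test, theta_0)
i_true = {}
for name in train:
    theta_minus_i = fit([Z for other, Z in train.items() if other != name])  # retrain without Z_i
    i_true[name] = loss(test, theta_minus_i) - loss(test, theta_0)
print(i_true)
```

A positive value means that removing $Z_i$ raises the evaluation loss, so $Z_i$ was helpful. Here the two datasets drawn from the evaluation task get positive influence, and the conflicting one gets negative influence.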
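  6. Problem Setting and Approach
     • Define the (true) influence of a training dataset $Z_i$ on an evaluation dataset $Z_{\text{test}}$ as a difference in loss (leave-one-out):
       $I_{\text{true}}(Z_i, Z_{\text{test}}) = L(Z_{\text{test}}, \theta_{-i}) - L(Z_{\text{test}}, \theta_0)$
     • Computing $\theta_{-i}$ is expensive ⇒ instead of removing $Z_i$ and retraining, unlearn $Z_i$ from the model trained on all data to estimate $I_{\text{true}}$:
       $\theta_{-i} = \arg\min_\theta \sum_{j \neq i} L(Z_j, \theta) = \arg\min_\theta \left[ \sum_j L(Z_j, \theta) - L(Z_i, \theta) \right]$
     [Figure: the model trained on all data (Z_1, Z_2, Z_3) and the model trained without Z_1, each evaluated on Z_test]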
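  7. Proposed Method 1: UnTrac
     • UnTrac: unlearn the training dataset $Z_i$ and evaluate on the evaluation dataset $Z_{\text{test}}$:
       $\text{UnTrac}(Z_i, Z_{\text{test}}) = L(Z_{\text{test}}, \theta_T) - L(Z_{\text{test}}, \theta_0)$
       where $\theta_T$ is obtained from the model $\theta_0$ trained on all data by T steps of unlearning $Z_i$ (gradient ascent)
     • Computationally expensive when there are many training datasets, since each one requires its own unlearning run
     [Figure: starting from the model trained on all data (Z_1, Z_2, Z_3), one training dataset (e.g. Z_1) is unlearned by gradient ascent and the resulting model is evaluated on Z_test]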
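Under the same toy assumptions (linear regression in place of a language model, full-batch gradient ascent in place of minibatch unlearning, hypothetical helper names such as `untrac`), UnTrac reduces to a short loop: copy $\theta_0$, take T ascent steps on $L(Z_i, \theta)$, and measure the change in evaluation loss.

```python
import numpy as np

# Hypothetical toy setup: linear regression with half-MSE loss stands in for a
# language model, and each unlearning step uses the whole dataset as one batch.
rng = np.random.default_rng(0)


def make_dataset(w, n=200, noise=0.1):
    X = rng.normal(size=(n, len(w)))
    return X, X @ np.asarray(w, dtype=float) + noise * rng.normal(size=n)


def loss(Z, theta):
    X, y = Z
    return 0.5 * float(np.mean((X @ theta - y) ** 2))


def grad(Z, theta):
    """Gradient of loss(Z, theta) with respect to theta."""
    X, y = Z
    return X.T @ (X @ theta - y) / len(y)


def fit(datasets):
    """Exact argmin_theta sum_j L(Z_j, theta) (normal equations)."""
    A = sum(X.T @ X / len(y) for X, y in datasets)
    b = sum(X.T @ y / len(y) for X, y in datasets)
    return np.linalg.solve(A, b)


def untrac(Z_i, Z_test, theta_0, lr=0.1, steps=5):
    """UnTrac(Z_i, Z_test) = L(Z_test, theta_T) - L(Z_test, theta_0)."""
    theta = theta_0.copy()
    for _ in range(steps):
        theta = theta + lr * grad(Z_i, theta)  # gradient ascent on L(Z_i, theta)
    return loss(Z_test, theta) - loss(Z_test, theta_0)


train = {
    "similar_1": make_dataset([1.0, 2.0]),
    "similar_2": make_dataset([1.0, 2.0]),
    "dissimilar": make_dataset([-2.0, 0.5]),
}
test = make_dataset([1.0, 2.0])
theta_0 = fit(list(train.values()))  # model trained on all data

# One separate unlearning run per training dataset.
scores = {name: untrac(Z, test, theta_0) for name, Z in train.items()}
print(scores)
```

The dictionary comprehension makes the cost issue visible: it launches one unlearning run per training dataset.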
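  8. Proposed Method 2: UnTrac-Inv
     • UnTrac-Inv: unlearn the evaluation dataset $Z_{\text{test}}$ and evaluate on each training dataset $Z_i$:
       $\text{UnTrac-Inv}(Z_i, Z_{\text{test}}) = L(Z_i, \theta_{T_{\text{test}}}) - L(Z_i, \theta_0)$
       where $\theta_{T_{\text{test}}}$ is obtained from $\theta_0$ by $T_{\text{test}}$ steps of unlearning $Z_{\text{test}}$ (gradient ascent)
     • Under certain assumptions and conditions, UnTrac-Inv approximates UnTrac
     [Figure: starting from the model trained on all data, Z_test is unlearned once by gradient ascent and the resulting model is evaluated on each training dataset Z_1, Z_2, Z_3]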
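The same toy sketch for UnTrac-Inv (again assuming linear regression, full-batch ascent, and hypothetical helper names such as `unlearn`) unlearns $Z_{\text{test}}$ once and reuses the resulting parameters to score every training dataset. That reuse is what makes it cheaper than UnTrac when there are many training datasets.

```python
import numpy as np

# Hypothetical toy setup: linear regression with half-MSE loss stands in for a
# language model, and each unlearning step uses the whole dataset as one batch.
rng = np.random.default_rng(0)


def make_dataset(w, n=200, noise=0.1):
    X = rng.normal(size=(n, len(w)))
    return X, X @ np.asarray(w, dtype=float) + noise * rng.normal(size=n)


def loss(Z, theta):
    X, y = Z
    return 0.5 * float(np.mean((X @ theta - y) ** 2))


def grad(Z, theta):
    X, y = Z
    return X.T @ (X @ theta - y) / len(y)


def fit(datasets):
    A = sum(X.T @ X / len(y) for X, y in datasets)
    b = sum(X.T @ y / len(y) for X, y in datasets)
    return np.linalg.solve(A, b)


def unlearn(Z, theta_0, lr=0.1, steps=5):
    """Full-batch gradient ascent on L(Z, theta), starting from theta_0."""
    theta = theta_0.copy()
    for _ in range(steps):
        theta = theta + lr * grad(Z, theta)
    return theta


train = {
    "similar_1": make_dataset([1.0, 2.0]),
    "similar_2": make_dataset([1.0, 2.0]),
    "dissimilar": make_dataset([-2.0, 0.5]),
}
test = make_dataset([1.0, 2.0])
theta_0 = fit(list(train.values()))

theta_T = unlearn(test, theta_0)  # a single unlearning run, on Z_test
# UnTrac-Inv(Z_i, Z_test) = L(Z_i, theta_T) - L(Z_i, theta_0) for every Z_i
scores_inv = {name: loss(Z, theta_T) - loss(Z, theta_0) for name, Z in train.items()}
print(scores_inv)
```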
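  9. Conditions Under Which UnTrac-Inv Approximates UnTrac
     • With gradient-ascent updates $\theta_t = \theta_{t-1} + \eta \nabla_\theta L(z_t, \theta_{t-1})$ on minibatches $z_t$ and the first-order approximation $L(Z_{\text{test}}, \theta_t) - L(Z_{\text{test}}, \theta_{t-1}) \approx \nabla_\theta L(Z_{\text{test}}, \theta_{t-1})^\top (\theta_t - \theta_{t-1})$, the telescoping sum gives
       $\text{UnTrac}(Z, Z_{\text{test}}) = \sum_{t=1}^{T} \left[ L(Z_{\text{test}}, \theta_t) - L(Z_{\text{test}}, \theta_{t-1}) \right] \approx \sum_{t=1}^{T} \eta \nabla_\theta L(z_t, \theta_{t-1})^\top \nabla_\theta L(Z_{\text{test}}, \theta_{t-1})$
     • Likewise, with $\theta_t$ now following the unlearning trajectory of $Z_{\text{test}}$ (minibatches $z_t^{\text{test}}$),
       $\text{UnTrac-Inv}(Z, Z_{\text{test}}) = \sum_{t=1}^{T_{\text{test}}} \left[ L(Z, \theta_t) - L(Z, \theta_{t-1}) \right] \approx \sum_{t=1}^{T_{\text{test}}} \eta \nabla_\theta L(Z, \theta_{t-1})^\top \nabla_\theta L(z_t^{\text{test}}, \theta_{t-1})$
     • When unlearning takes a single step ($T = T_{\text{test}} = 1$) and one batch contains all samples ($z_t = Z$, $z_t^{\text{test}} = Z_{\text{test}}$), the two coincide
       ⇒ UnTrac-Inv is effective when the batch size is large and the number of unlearning steps is small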
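The coincidence condition can be checked numerically. This sketch again assumes the toy linear-regression setting with hypothetical helpers. It takes a single full-batch ascent step of size $\eta$ for both methods and compares each score with the shared first-order term $\eta \nabla_\theta L(Z, \theta_0)^\top \nabla_\theta L(Z_{\text{test}}, \theta_0)$. The value $\eta = 10^{-3}$ is an assumption chosen so that the second-order terms, which differ between the two methods, are negligible.

```python
import numpy as np

# Hypothetical toy setup: linear regression with half-MSE loss stands in for a
# language model; every "batch" is a whole dataset.
rng = np.random.default_rng(0)


def make_dataset(w, n=200, noise=0.1):
    X = rng.normal(size=(n, len(w)))
    return X, X @ np.asarray(w, dtype=float) + noise * rng.normal(size=n)


def loss(Z, theta):
    X, y = Z
    return 0.5 * float(np.mean((X @ theta - y) ** 2))


def grad(Z, theta):
    X, y = Z
    return X.T @ (X @ theta - y) / len(y)


def fit(datasets):
    A = sum(X.T @ X / len(y) for X, y in datasets)
    b = sum(X.T @ y / len(y) for X, y in datasets)
    return np.linalg.solve(A, b)


train = {
    "similar_1": make_dataset([1.0, 2.0]),
    "similar_2": make_dataset([1.0, 2.0]),
    "dissimilar": make_dataset([-2.0, 0.5]),
}
test = make_dataset([1.0, 2.0])
theta_0 = fit(list(train.values()))

eta = 1e-3  # small step so that second-order terms are negligible
g_test = grad(test, theta_0)
results = {}
for name, Z in train.items():
    g_i = grad(Z, theta_0)
    # T = T_test = 1 with z_1 = Z and z_1^test = Z_test: one full-batch ascent step each
    untrac_1 = loss(test, theta_0 + eta * g_i) - loss(test, theta_0)
    untrac_inv_1 = loss(Z, theta_0 + eta * g_test) - loss(Z, theta_0)
    first_order = eta * float(g_i @ g_test)  # the term both methods share
    results[name] = (untrac_1, untrac_inv_1, first_order)
    print(f"{name:>10}: UnTrac={untrac_1:+.6f}  UnTrac-Inv={untrac_inv_1:+.6f}  "
          f"first-order={first_order:+.6f}")
```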
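  10. Relationship to Existing Methods
     • Many existing methods can be viewed as special cases of UnTrac/UnTrac-Inv
     • A first-order approximation of UnTrac, when all training samples are unlearned in a single step, gives
       $\text{UnTrac}(Z, Z_{\text{test}}) = L(Z_{\text{test}}, \theta_1) - L(Z_{\text{test}}, \theta_0) \approx \nabla_\theta L(Z_{\text{test}}, \theta_0)^\top (\theta_1 - \theta_0)$
     • Each existing estimator corresponds to a particular update $\theta_1 - \theta_0$ (existing method: estimate; condition under which it coincides):
       – GradDot: $\nabla_\theta L(Z_{\text{test}}, \theta_0)^\top \nabla_\theta L(Z, \theta_0)$; coincides when $\theta_1 - \theta_0 = \eta \nabla_\theta L(Z, \theta_0)$ (SGD)
       – GradCos: $\frac{\nabla_\theta L(Z_{\text{test}}, \theta_0)^\top \nabla_\theta L(Z, \theta_0)}{\|\nabla_\theta L(Z_{\text{test}}, \theta_0)\| \, \|\nabla_\theta L(Z, \theta_0)\|}$; coincides when $\theta_1 - \theta_0 = \eta \frac{\nabla_\theta L(Z, \theta_0)}{\|\nabla_\theta L(Z, \theta_0)\|}$ (RMSProp, Adam)
       – HIF: $\nabla_\theta L(Z_{\text{test}}, \theta_0)^\top H_{\theta_0}^{-1} \nabla_\theta L(Z, \theta_0)$; coincides when $\theta_1 - \theta_0 = H_{\theta_0}^{-1} \nabla_\theta L(Z, \theta_0)$ (Newton's method)
       – Fisher kernel: $\nabla_\theta L(Z_{\text{test}}, \theta_0)^\top F_{\theta_0}^{-1} \nabla_\theta L(Z, \theta_0)$; coincides when $\theta_1 - \theta_0 = F_{\theta_0}^{-1} \nabla_\theta L(Z, \theta_0)$ (natural gradient)
     • For GradDot and GradCos, the match holds up to the positive factors $\eta$ and $\eta \|\nabla_\theta L(Z_{\text{test}}, \theta_0)\|$ respectively, which are the same for every training dataset and so do not change the ranking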
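These estimators are cheap to compute from gradients. The sketch below, again in the toy linear-regression setting with hypothetical helpers, computes GradDot, GradCos, and HIF for each training dataset. It takes $H_{\theta_0}$ to be the Hessian of the total training loss, which is an assumption because the slide does not say which loss $H$ belongs to. It also checks that one Newton-style ascent step, scaled by a small $\eta$ (another assumption, for first-order accuracy), reproduces the HIF score. The Fisher kernel is left out.

```python
import numpy as np

# Hypothetical toy setup: linear regression with half-MSE loss stands in for a language model.
rng = np.random.default_rng(0)


def make_dataset(w, n=200, noise=0.1):
    X = rng.normal(size=(n, len(w)))
    return X, X @ np.asarray(w, dtype=float) + noise * rng.normal(size=n)


def loss(Z, theta):
    X, y = Z
    return 0.5 * float(np.mean((X @ theta - y) ** 2))


def grad(Z, theta):
    X, y = Z
    return X.T @ (X @ theta - y) / len(y)


def fit(datasets):
    A = sum(X.T @ X / len(y) for X, y in datasets)
    b = sum(X.T @ y / len(y) for X, y in datasets)
    return np.linalg.solve(A, b)


train = {
    "similar_1": make_dataset([1.0, 2.0]),
    "similar_2": make_dataset([1.0, 2.0]),
    "dissimilar": make_dataset([-2.0, 0.5]),
}
test = make_dataset([1.0, 2.0])
theta_0 = fit(list(train.values()))

g_test = grad(test, theta_0)
# Assumption: H is the Hessian of the total training loss at theta_0; for this
# quadratic loss it is constant and equal to sum_j X_j^T X_j / n_j.
H = sum(X.T @ X / len(y) for X, y in train.values())

scores = {}
for name, Z in train.items():
    g = grad(Z, theta_0)
    graddot = float(g_test @ g)
    gradcos = graddot / float(np.linalg.norm(g_test) * np.linalg.norm(g))
    hif = float(g_test @ np.linalg.solve(H, g))  # positive = removing Z raises test loss
    scores[name] = {"GradDot": graddot, "GradCos": gradcos, "HIF": hif}
    print(name, {k: round(v, 4) for k, v in scores[name].items()})

# A single Newton-style ascent step, scaled by a small eta, reproduces HIF to first order.
eta = 1e-3
g = grad(train["similar_1"], theta_0)
newton_untrac = loss(test, theta_0 + eta * np.linalg.solve(H, g)) - loss(test, theta_0)
print("UnTrac after Newton step / eta:", newton_untrac / eta)
```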
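  11. Experiment 1: Results
     • Both methods estimate the influence of similar tasks (1, 2) to be higher than that of dissimilar tasks (3, 4)
     • Actually running leave-one-out confirms that similar tasks (1, 2) have higher influence than dissimilar tasks (3, 4)
     ⇒ Both methods estimate the influence of training tasks appropriately, largely regardless of output format
     [Figure: two panels (UnTrac, UnTrac-Inv) with the number of unlearning epochs on the x-axis, one curve per training task. 1: similar task / same output format; 2: similar task / different output format; 3: dissimilar task / same output format; 4: dissimilar task / different output format]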
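  12. Experiment 2: OPT Pretraining
     • Test whether the methods can accurately estimate the influence of pretraining datasets on inappropriate generations by OPT (125M)
     • Pretraining datasets
       – 8 of OPT's pretraining datasets (320,000 samples in total)
       – Two cases: pretraining datasets of uniform size and of non-uniform size
     • Evaluation datasets
       – ToxiGen: text that discriminates against minorities
       – WinoBias: text containing gender bias
       – TruthfulQA: inaccurate answers to questions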
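  13. Appendix
     • Leave-one-out can be viewed as continual unlearning
       – Removing $Z_i$ from training = unlearning $Z_i$ from the model trained on all data:
         $\theta_{-i} = \arg\min_\theta \sum_{j \neq i} L(Z_j, \theta) = \arg\min_\theta \left[ \sum_j L(Z_j, \theta) - L(Z_i, \theta) \right]$
     • To approximate leave-one-out more closely, incorporating continual-learning techniques looks promising
       – e.g. elastic weight consolidation (EWC): learn new data without forgetting previously learned data (Kirkpatrick et al., Overcoming catastrophic forgetting in neural networks. PNAS, 2017)
     • EWC was included at first but dropped, because it did not substantially change the trends in Experiment 1
       – A more detailed investigation might yield results showing that continual unlearning should be used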
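For a quadratic loss, the appendix's view can be made exact. Because $\theta_0$ minimizes $\sum_j L(Z_j, \theta)$, that sum equals its value at $\theta_0$ plus $\tfrac12 (\theta - \theta_0)^\top H (\theta - \theta_0)$, where $H$ is its Hessian. Gradient ascent on $L(Z_i, \theta) - \tfrac12 (\theta - \theta_0)^\top H (\theta - \theta_0)$ therefore has $\theta_{-i}$ as its optimum. The sketch below uses the toy linear-regression setting with hypothetical helpers and replaces $H$ with its diagonal. This gives an EWC-style penalty whose "Fisher" is the diagonal of $\sum_j X_j^\top X_j / n_j$ over all training data, with $\lambda = 1$. These choices are illustrative assumptions, not the configuration tried in the talk.

```python
import numpy as np

# Hypothetical toy setup: linear regression with half-MSE loss stands in for a language model.
rng = np.random.default_rng(0)


def make_dataset(w, n=200, noise=0.1):
    X = rng.normal(size=(n, len(w)))
    return X, X @ np.asarray(w, dtype=float) + noise * rng.normal(size=n)


def grad(Z, theta):
    X, y = Z
    return X.T @ (X @ theta - y) / len(y)


def fit(datasets):
    A = sum(X.T @ X / len(y) for X, y in datasets)
    b = sum(X.T @ y / len(y) for X, y in datasets)
    return np.linalg.solve(A, b)


def unlearn(Z_i, theta_0, lr=0.1, steps=50, ewc_lambda=0.0, fisher_diag=None):
    """Gradient ascent on L(Z_i, theta), optionally with an EWC-style penalty."""
    theta = theta_0.copy()
    for _ in range(steps):
        step = grad(Z_i, theta)
        if fisher_diag is not None:
            # Gradient of -(lambda / 2) * sum_k F_k * (theta_k - theta_0_k)^2:
            # pulls theta back toward theta_0, weighted per coordinate by fisher_diag.
            step = step - ewc_lambda * fisher_diag * (theta - theta_0)
        theta = theta + lr * step
    return theta


train = {
    "similar_1": make_dataset([1.0, 2.0]),
    "similar_2": make_dataset([1.0, 2.0]),
    "dissimilar": make_dataset([-2.0, 0.5]),
}
theta_0 = fit(list(train.values()))

target = "similar_1"
theta_loo = fit([Z for name, Z in train.items() if name != target])  # exact theta_{-i}

# Assumption: the diagonal of sum_j X_j^T X_j / n_j over ALL training data stands in
# for the diagonal Fisher of a unit-variance Gaussian model at theta_0.
fisher_diag = sum(np.sum(X ** 2, axis=0) / len(y) for X, y in train.values())

theta_plain = unlearn(train[target], theta_0)
theta_ewc = unlearn(train[target], theta_0, ewc_lambda=1.0, fisher_diag=fisher_diag)

for label, theta in [("plain ascent", theta_plain), ("EWC-style", theta_ewc)]:
    print(f"{label:>12}: distance to leave-one-out = {np.linalg.norm(theta - theta_loo):.3f}")
```

Plain gradient ascent keeps moving away from $\theta_0$ and overshoots the leave-one-out solution. The penalized version settles near $\theta_{-i}$, and it would hit it exactly in this quadratic toy if the full $H$ were used instead of its diagonal.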