Slide 1

Slide 1 text

Unlearning Traces the Influential Training Data of Language Models
Masaru Isonuma (1, 2), Ivan Titov (1, 3)
1: University of Edinburgh, 2: The University of Tokyo, 3: University of Amsterdam

Slide 2

Slide 2 text

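What is unlearning?
• Standard training
  – Update the model $\theta$ to maximize the probability $p_\theta(z)$ of generating the training example $z$: $\theta = \arg\max_\theta \log p_\theta(z)$, optimized by gradient descent on the loss $-\log p_\theta(z)$.
• Unlearning
  – Update the model $\theta$ to minimize the probability $p_\theta(z)$ of generating the training example $z$: $\theta = \arg\min_\theta \log p_\theta(z)$, optimized by gradient ascent on the same loss.
[Figure: two panels for the example $z$: "Masaru lives in the UK", each showing the candidate continuations UK, Japan, and US.]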
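
The contrast between the two updates can be sketched on a toy model. The example below is an illustration only, not the setup used in the talk: it assumes a three-way softmax over the hypothetical candidates UK, Japan, and US, trains it to raise $\log p_\theta(\text{UK})$, and then unlearns the same example by flipping the sign of the update.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(logits, target):
    """Gradient of log p(target) with respect to the logits: onehot - softmax."""
    probs = softmax(logits)
    return [(1.0 if k == target else 0.0) - p for k, p in enumerate(probs)]

def update(logits, target, lr, unlearn=False):
    """One step that raises log p(target) when training and lowers it when unlearning."""
    sign = -1.0 if unlearn else 1.0
    g = grad_log_prob(logits, target)
    return [v + sign * lr * gk for v, gk in zip(logits, g)]

CANDIDATES = ["UK", "Japan", "US"]  # hypothetical candidate continuations
UK = 0

logits = [0.0, 0.0, 0.0]            # start from a uniform distribution
for _ in range(50):                 # standard training on z
    logits = update(logits, UK, lr=0.5)
p_trained = softmax(logits)[UK]

for _ in range(50):                 # unlearning of the same z
    logits = update(logits, UK, lr=0.5, unlearn=True)
p_unlearned = softmax(logits)[UK]

print(f"p({CANDIDATES[UK]}) after training:   {p_trained:.3f}")
print(f"p({CANDIDATES[UK]}) after unlearning: {p_unlearned:.3f}")
```

The only difference between the two phases is the sign of the step, which is the point of the slide: unlearning reuses the training machinery with the objective reversed.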

Slide 3

Slide 3 text

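Trends in unlearning research and where this work fits
• 2015: Unlearning comes into use, mainly in computer vision.
  Cao et al., Towards making systems forget with machine unlearning. IEEE Symposium on Security and Privacy, 2015.
• 2023: Unlearning is applied to LLMs.
  Jang et al., Knowledge Unlearning for Mitigating Privacy Risks in Language Models. ACL, 2023.
• Purpose of this earlier work: make the model forget harmful training data
  – Preventing privacy violations
  – Preventing copyright infringement
  – Removing bias
  – Removing duplicated data
• 2024: This work
  – Purpose: estimating the influence of training data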

Slide 4

Slide 4 text

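What is training data influence estimation?
• For a given model output, how much did each training dataset contribute to it?
• Retraining repeatedly with each training dataset left out would measure its influence, but the computational cost is enormous.
[Figure: training data (news + books + SNS) → model output, drawn three times; the outputs are labeled harmless, harmful, and slightly harmful.]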

Slide 5

Slide 5 text

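Approach
• Instead of leaving training data out, unlearn it from the trained model.
• Estimate the influence of the training data by comparing the model before and after unlearning.
[Figure: a trained model (news + books + SNS) → model output; unlearning subtracts news, books, or SNS in turn; the outputs are labeled harmless, harmful, and slightly harmful.]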

Slide 6

Slide 6 text

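Problem setting and approach
• The influence (ground truth) of a training dataset $Z_i$ on an evaluation dataset $Z'$ is defined as a difference of losses (leave-one-out):
  $I_{\mathrm{truth}}(Z_i, Z') = L(Z', \theta_{-Z_i}) - L(Z', \theta_0)$
  Here $\theta_0$ is the model trained on all data and $\theta_{-Z_i}$ is the model trained with $Z_i$ left out.
[Figure: training on $Z_1 + Z_2 + Z_3$ and evaluating on $Z'$ (model trained on all data), compared with training on $Z_2 + Z_3$ only (model trained with $Z_1$ left out).]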

Slide 7

Slide 7 text

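Problem setting and approach
• The influence (ground truth) of a training dataset $Z_i$ on an evaluation dataset $Z'$ is defined as a difference of losses (leave-one-out):
  $I_{\mathrm{truth}}(Z_i, Z') = L(Z', \theta_{-Z_i}) - L(Z', \theta_0)$
• Computing $\theta_{-Z_i}$ is expensive ⇒ instead of leaving $Z_i$ out, unlearn $Z_i$ from the model trained on all data and use the result to estimate $I_{\mathrm{truth}}$:
  $\theta_{-Z_i} = \arg\min_\theta \sum_{Z_j \neq Z_i} L(Z_j, \theta) = \arg\min_\theta \left[ \sum_{Z_j} L(Z_j, \theta) - L(Z_i, \theta) \right]$
[Figure: training on $Z_1 + Z_2 + Z_3$ and evaluating on $Z'$ (model trained on all data), compared with training on $Z_2 + Z_3$ only (model trained with $Z_1$ left out).]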
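
To make the leave-one-out definition concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration: the model is a single scalar $\theta$, the loss is half the mean squared distance to the samples, and the three training datasets and the evaluation dataset are made-up numbers. With this loss the minimizer of the summed dataset losses has a closed form (the mean of the dataset means), so "retraining" is one line.

```python
def loss(data, theta):
    """Toy loss L(Z, theta): half the mean squared distance to the samples."""
    return sum((theta - z) ** 2 for z in data) / (2 * len(data))

def fit(datasets):
    """argmin_theta sum_j L(Z_j, theta); for this toy loss, the mean of dataset means."""
    means = [sum(d) / len(d) for d in datasets]
    return sum(means) / len(means)

def loo_influence(datasets, i, eval_data):
    """I_truth(Z_i, Z') = L(Z', theta_{-Z_i}) - L(Z', theta_0)."""
    theta_0 = fit(datasets)                                # trained on all data
    theta_minus_i = fit(datasets[:i] + datasets[i + 1:])   # retrained without Z_i
    return loss(eval_data, theta_minus_i) - loss(eval_data, theta_0)

# Hypothetical data: Z_1 and Z_2 resemble Z', Z_3 does not.
train_sets = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.4]]
eval_set = [1.0, 1.1]

influences = [loo_influence(train_sets, i, eval_set) for i in range(len(train_sets))]
for i, value in enumerate(influences, start=1):
    print(f"I_truth(Z_{i}, Z') = {value:+.4f}")
```

A positive value means that removing the dataset raises the loss on $Z'$. Each call to `loo_influence` retrains once per left-out dataset; that is trivial here but is exactly the cost that becomes prohibitive for a language model.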

Slide 8

Slide 8 text

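Proposed method 1: UnTrac
• UnTrac: unlearn the training dataset $Z_i$ and evaluate on the evaluation dataset $Z'$:
  $\mathrm{UnTrac}(Z_i, Z') = L(Z', \theta_T) - L(Z', \theta_0)$
  Here $\theta_0$ is the model trained on all data and $\theta_T$ is the model after unlearning $Z_i$ by gradient ascent.
• The computational cost is high when there are many training datasets, because unlearning has to be run once per dataset.
[Figure: starting from the model trained on all data ($Z_1 + Z_2 + Z_3$), unlearn $Z_1$, $Z_2$, or $Z_3$ in turn by gradient ascent, then evaluate each unlearned model on $Z'$.]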
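
A minimal sketch of the UnTrac score follows, under assumptions that are not from the slides: a scalar toy model, a squared-distance loss, made-up datasets, full-batch ascent steps, and arbitrary values for the learning rate and step count.

```python
def loss(data, theta):
    """Toy loss L(Z, theta): half the mean squared distance to the samples."""
    return sum((theta - z) ** 2 for z in data) / (2 * len(data))

def grad(data, theta):
    """dL/dtheta for the toy loss above."""
    return theta - sum(data) / len(data)

def unlearn(data, theta, lr, steps):
    """Gradient ascent on L(data, theta), one full-batch step at a time."""
    for _ in range(steps):
        theta = theta + lr * grad(data, theta)
    return theta

def untrac(train_set, eval_set, theta_0, lr=0.05, steps=5):
    """UnTrac(Z_i, Z') = L(Z', theta_T) - L(Z', theta_0), unlearning Z_i."""
    theta_T = unlearn(train_set, theta_0, lr, steps)
    return loss(eval_set, theta_T) - loss(eval_set, theta_0)

# Hypothetical data: Z_1 and Z_2 resemble Z', Z_3 does not.
train_sets = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.4]]
eval_set = [1.0, 1.1]
# "Model trained on all data": minimizer of the summed toy losses.
theta_0 = sum(sum(d) / len(d) for d in train_sets) / len(train_sets)

scores = [untrac(d, eval_set, theta_0) for d in train_sets]
for i, s in enumerate(scores, start=1):
    print(f"UnTrac(Z_{i}, Z') = {s:+.4f}")
```

Note that `unlearn` is called once per training dataset, each time starting from the same `theta_0`; this per-dataset loop is the cost the slide refers to.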

Slide 9

Slide 9 text

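Proposed method 2: UnTrac-Inv
• UnTrac-Inv: unlearn the evaluation dataset $Z'$ and evaluate on the training dataset $Z_i$:
  $\mathrm{UnTrac\text{-}Inv}(Z_i, Z') = L(Z_i, \theta_T) - L(Z_i, \theta_0)$
  Here $\theta_0$ is the model trained on all data and $\theta_T$ is the model after unlearning $Z'$ by gradient ascent.
• Under certain assumptions and conditions, UnTrac-Inv approximates UnTrac.
[Figure: starting from the model trained on all data ($Z_1 + Z_2 + Z_3$), unlearn the evaluation dataset $Z'$ by gradient ascent, then evaluate the unlearned model on $Z_1$, $Z_2$, and $Z_3$.]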
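
The inverted procedure can be sketched in the same spirit. As before, the scalar model, the squared-distance loss, the data, and the hyperparameters are illustrative assumptions rather than anything specified in the talk.

```python
def loss(data, theta):
    """Toy loss L(Z, theta): half the mean squared distance to the samples."""
    return sum((theta - z) ** 2 for z in data) / (2 * len(data))

def grad(data, theta):
    """dL/dtheta for the toy loss above."""
    return theta - sum(data) / len(data)

def unlearn(data, theta, lr, steps):
    """Gradient ascent on L(data, theta), one full-batch step at a time."""
    for _ in range(steps):
        theta = theta + lr * grad(data, theta)
    return theta

def untrac_inv_all(train_sets, eval_set, theta_0, lr=0.05, steps=5):
    """UnTrac-Inv(Z_i, Z') = L(Z_i, theta_T) - L(Z_i, theta_0) for every Z_i."""
    theta_T = unlearn(eval_set, theta_0, lr, steps)   # a single unlearning run on Z'
    return [loss(d, theta_T) - loss(d, theta_0) for d in train_sets]

# Hypothetical data: Z_1 and Z_2 resemble Z', Z_3 does not.
train_sets = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.4]]
eval_set = [1.0, 1.1]
theta_0 = sum(sum(d) / len(d) for d in train_sets) / len(train_sets)

inv_scores = untrac_inv_all(train_sets, eval_set, theta_0)
for i, s in enumerate(inv_scores, start=1):
    print(f"UnTrac-Inv(Z_{i}, Z') = {s:+.4f}")
```

The design difference from UnTrac is visible in the structure: one unlearning run on $Z'$ is shared by all training datasets, and only cheap loss evaluations are repeated per dataset.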

Slide 10

Slide 10 text

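When UnTrac-Inv approximates UnTrac
• With the unlearning update $\theta_t = \theta_{t-1} + \eta \nabla_\theta L(z_t, \theta_{t-1})$ and the first-order approximation $L(Z', \theta_t) - L(Z', \theta_{t-1}) \approx \nabla_\theta L(Z', \theta_{t-1})^\top (\theta_t - \theta_{t-1})$:
  $\mathrm{UnTrac}(Z, Z') = \sum_{t=1}^{T} \left[ L(Z', \theta_t) - L(Z', \theta_{t-1}) \right] \approx \sum_{t=1}^{T} \eta \, \nabla_\theta L(z_t, \theta_{t-1})^\top \nabla_\theta L(Z', \theta_{t-1})$
• Similarly:
  $\mathrm{UnTrac\text{-}Inv}(Z, Z') = \sum_{t=1}^{T'} \left[ L(Z, \theta_t) - L(Z, \theta_{t-1}) \right] \approx \sum_{t=1}^{T'} \eta \, \nabla_\theta L(Z, \theta_{t-1})^\top \nabla_\theta L(z'_t, \theta_{t-1})$
• The two coincide when unlearning takes a single step ($T = T' = 1$) and a single batch contains all samples ($z_t = Z$, $z'_t = Z'$).
  → UnTrac-Inv is effective when the batch size is large and the number of unlearning steps is small.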
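
The single-step, full-batch case can be checked numerically. The sketch below assumes a scalar toy model with a squared-distance loss and made-up data, none of which comes from the slides. It takes one ascent step for each method and compares both scores with the shared first-order term $\eta \, \nabla_\theta L(Z, \theta_0)^\top \nabla_\theta L(Z', \theta_0)$.

```python
def loss(data, theta):
    """Toy loss L(Z, theta): half the mean squared distance to the samples."""
    return sum((theta - z) ** 2 for z in data) / (2 * len(data))

def grad(data, theta):
    """dL/dtheta for the toy loss above."""
    return theta - sum(data) / len(data)

train_set = [1.0, 1.2]   # Z  (hypothetical)
eval_set = [1.0, 1.1]    # Z' (hypothetical)
theta_0 = 2.0            # stands in for the trained model
lr = 1e-3                # eta

# UnTrac with T = 1: one full-batch ascent step on Z, evaluated on Z'.
theta_after_z = theta_0 + lr * grad(train_set, theta_0)
untrac = loss(eval_set, theta_after_z) - loss(eval_set, theta_0)

# UnTrac-Inv with T' = 1: one full-batch ascent step on Z', evaluated on Z.
theta_after_eval = theta_0 + lr * grad(eval_set, theta_0)
untrac_inv = loss(train_set, theta_after_eval) - loss(train_set, theta_0)

# Shared first-order term: eta * grad L(Z, theta_0) * grad L(Z', theta_0).
first_order = lr * grad(train_set, theta_0) * grad(eval_set, theta_0)

print(f"UnTrac      = {untrac:.9f}")
print(f"UnTrac-Inv  = {untrac_inv:.9f}")
print(f"first order = {first_order:.9f}")
```

For this quadratic loss the two scores differ from the first-order term only by terms of order $\eta^2$, so the gap shrinks as the step size decreases and grows once several steps or mini-batches are used.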

Slide 11

Slide 11 text

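Relation to existing methods
• Many existing methods can be viewed as special cases of UnTrac/UnTrac-Inv.
• Taking a first-order approximation of UnTrac when all training samples are unlearned in a single step:
  $\mathrm{UnTrac}(Z, Z') = L(Z', \theta_1) - L(Z', \theta_0) \approx \nabla_\theta L(Z', \theta_0)^\top (\theta_1 - \theta_0)$
• Existing methods, their estimates, and the update under which UnTrac matches them:
  – GradDot: $\nabla_\theta L(Z', \theta_0)^\top \nabla_\theta L(Z, \theta_0)$; matches when $\theta_1 - \theta_0 = \eta \nabla_\theta L(Z, \theta_0)$ (SGD)
  – GradCos: $\dfrac{\nabla_\theta L(Z', \theta_0)^\top \nabla_\theta L(Z, \theta_0)}{\lVert \nabla_\theta L(Z', \theta_0) \rVert \, \lVert \nabla_\theta L(Z, \theta_0) \rVert}$; matches when $\theta_1 - \theta_0 = \eta \dfrac{\nabla_\theta L(Z, \theta_0)}{\lVert \nabla_\theta L(Z, \theta_0) \rVert}$ (RMSProp, Adam)
  – HIF: $\nabla_\theta L(Z', \theta_0)^\top H_\theta^{-1} \nabla_\theta L(Z, \theta_0)$; matches when $\theta_1 - \theta_0 = H_\theta^{-1} \nabla_\theta L(Z, \theta_0)$ (Newton's method)
  – Fisher kernel: $\nabla_\theta L(Z', \theta_0)^\top F_\theta^{-1} \nabla_\theta L(Z, \theta_0)$; matches when $\theta_1 - \theta_0 = F_\theta^{-1} \nabla_\theta L(Z, \theta_0)$ (natural gradient method)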
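
As a small worked example of the first two correspondences, the sketch below plugs an SGD step and a norm-scaled step into the first-order expression $\nabla_\theta L(Z', \theta_0)^\top (\theta_1 - \theta_0)$. The gradient vectors and the learning rate are made-up numbers, and the norm-scaled step is a simplified stand-in for the RMSProp/Adam update, not an implementation of those optimizers.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def grad_dot(g_eval, g_train):
    """GradDot: inner product of the two gradients."""
    return dot(g_eval, g_train)

def grad_cos(g_eval, g_train):
    """GradCos: cosine similarity of the two gradients."""
    return dot(g_eval, g_train) / (norm(g_eval) * norm(g_train))

def first_order_untrac(g_eval, step):
    """grad L(Z', theta_0)^T (theta_1 - theta_0) for a given one-step update."""
    return dot(g_eval, step)

g_eval = [4.0, 3.0]    # hypothetical gradient of L(Z', theta_0)
g_train = [3.0, 4.0]   # hypothetical gradient of L(Z, theta_0)
lr = 0.1               # eta

sgd_step = [lr * g for g in g_train]
scaled_step = [lr * g / norm(g_train) for g in g_train]

via_sgd = first_order_untrac(g_eval, sgd_step)        # eta * GradDot
via_scaled = first_order_untrac(g_eval, scaled_step)  # eta * ||g_eval|| * GradCos

print("GradDot:", grad_dot(g_eval, g_train))
print("GradCos:", grad_cos(g_eval, g_train))
print("one SGD step:", via_sgd)
print("one norm-scaled step:", via_scaled)
```

With the norm-scaled step the result equals GradCos multiplied by $\eta \lVert \nabla_\theta L(Z', \theta_0) \rVert$. For a fixed evaluation dataset that factor is the same for every training dataset, so the ranking of training datasets is unchanged.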

Slide 12

Slide 12 text

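Experiment 1: Fine-tuning T5
• Test whether the influence of the tasks used to fine-tune a pretrained T5 (3B) can be estimated properly.
• A concern with the method: if a task with a particular output format is unlearned, will the unlearned model stop producing that format on every task?
  → The influence of tasks that share the evaluation task's output format could be overestimated.
• Train on four tasks: similar/dissimilar to the evaluation task × same/different output format.
[Table: example samples from the datasets. The parts in {} differ from sample to sample.]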

Slide 13

Slide 13 text

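Experiment 1: Results
• Both methods estimate the influence of the similar tasks (1, 2) to be higher than that of the dissimilar tasks (3, 4).
• Actual leave-one-out also shows that the influence of the similar tasks (1, 2) is higher than that of the dissimilar tasks (3, 4).
  ⇒ The influence of the training tasks is estimated properly, without being swayed much by the output format.
[Figure: two panels, UnTrac and UnTrac-Inv. x-axis: number of unlearning epochs. Lines: 1: similar task / same output format; 2: similar task / different output format; 3: dissimilar task / same output format; 4: dissimilar task / different output format.]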

Slide 14

Slide 14 text

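Experiment 2: Pretraining OPT
• Test whether the influence of pretraining datasets on inappropriate generations of OPT (125M) can be estimated accurately.
• Pretraining datasets
  – Eight of OPT's pretraining datasets (320,000 samples in total)
  – Two cases are considered: pretraining datasets of equal size and of unequal size
• Evaluation datasets
  – ToxiGen: discriminatory text about minorities
  – WinoBias: text containing gender bias
  – TruthfulQA: inaccurate answers to questions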

Slide 15

Slide 15 text

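Experiment 2: Results
• Compute the Pearson correlation coefficient between the influence measured by leave-one-out (ground truth) and the estimate from each method.
• Across all datasets, UnTrac and UnTrac-Inv estimate the actual influence with comparatively high accuracy.
  – WinoBias/TruthfulQA: the variance of the training datasets' influence is small, and existing methods also perform well.
  – ToxiGen: the variance of the training datasets' influence is large; existing methods perform poorly, while the proposed methods remain robust.
[Table: correlation per method, grouped into existing and proposed methods.]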
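
The evaluation metric is simple enough to sketch directly. The function below computes the Pearson correlation coefficient with the standard formula; the two lists are hypothetical placeholder values (one entry per training dataset), not results from the talk.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical numbers for illustration only.
loo_influence = [0.80, 0.10, -0.30, 0.45]   # ground truth from leave-one-out
estimated = [1.50, 0.30, -0.70, 1.00]       # scores from some attribution method

r = pearson(loo_influence, estimated)
print(f"Pearson r = {r:.3f}")
```

Because the coefficient is invariant to positive rescaling of either list, an attribution method is rewarded for tracking the leave-one-out values linearly, not for matching their magnitude.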

Slide 16

Slide 16 text

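Experiment 2: Effect of the number of unlearning steps and the batch size
• UnTrac: performance becomes more stable as the number of unlearning steps increases, regardless of batch size.
• UnTrac-Inv: performance is high when the number of unlearning steps is small and the batch size is large (consistent with the discussion on slide 10).
• Existing methods correspond to the case of a single unlearning step ⇒ this suggests that estimating influence from a single step was asking too much.
[Figure: two panels, UnTrac and UnTrac-Inv. x-axis: number of unlearning epochs.]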

Slide 17

Slide 17 text

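Summary and impressions
• The influence of training data can be estimated by unlearning the training data or the evaluation data from the trained model.
  – It was surprising that such a simple approach was still unexplored, given how much research there is on training data influence estimation.
  – It was a reminder of the value of going back to the problem setting, such as leave-one-out, and thinking about simple ideas.
• Many existing methods are special cases of the proposed method.
  – The properties of each method can be interpreted in terms of which gradient descent method is used for unlearning.
• Unlearning needs the same amount of memory as training, so it can be applied to LLMs.
  – Many lightweight unlearning methods have appeared recently, and they may be usable here.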

Slide 18

Slide 18 text

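Appendix
• Leave-one-out can be regarded as continual unlearning.
  – Leaving $Z_i$ out of training = unlearning $Z_i$ from the model trained on all data:
    $\theta_{-Z_i} = \arg\min_\theta \sum_{Z_j \neq Z_i} L(Z_j, \theta) = \arg\min_\theta \left[ \sum_{Z_j} L(Z_j, \theta) - L(Z_i, \theta) \right]$
• To approximate leave-one-out well, incorporating continual learning techniques looks promising.
  – Example: elastic weight consolidation (EWC), which learns new data without forgetting data that has already been learned.
• EWC was introduced at first, but it was dropped because the trends in Experiment 1 did not change much.
  – A more detailed study might yield results showing that continual unlearning should be used.
Kirkpatrick et al., Overcoming catastrophic forgetting in neural networks. PNAS, 2017.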
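
For reference, one way such an EWC-regularized unlearning objective could be written is shown below. This is an assumption for illustration, since the slides do not give the formulation that was tried: it takes the standard EWC quadratic penalty from Kirkpatrick et al. and attaches it to gradient ascent on $L(Z_i, \theta)$, with $\lambda$ a hypothetical penalty weight and $F_k$ the diagonal Fisher information of parameter $k$ at $\theta_0$.

```latex
% Sketch only: unlearning Z_i with an EWC-style penalty (assumed form).
% theta_0 : model trained on all data
% F_k     : diagonal Fisher information of parameter k, computed at theta_0
% lambda  : penalty weight (hypothetical hyperparameter)
\hat{\theta}_{-Z_i}
  = \arg\min_{\theta}
    \left[
      - L(Z_i, \theta)
      + \frac{\lambda}{2} \sum_{k} F_k \left( \theta_k - \theta_{0,k} \right)^2
    \right]
```

The first term is the unlearning objective, and the second discourages movement along parameters that matter for the data already learned, which plays the role of the $\sum_{Z_j} L(Z_j, \theta)$ term in the leave-one-out objective.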