Yano
September 22, 2025

[Reading-Group Slides] Length-Induced Embedding Collapse in PLM-based Models

These are the slides I used at the ACL Reading Group @ Nagoya University 2025 (ACL読み会@名大 2025).

Transcript

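  1. Theoretical analysis of length collapse: preliminaries
    • The Transformer encoders considered here contain a self-attention (SA) mechanism, expressed as $\mathrm{SA}(X) = \mathrm{softmax}\left(\frac{X W_q (X W_k)^\top}{\sqrt{d}}\right) X W_v$
    • X: input sentence; Wq, Wk, Wv: query, key, and value weights; d: query/key dimension
    1. Because softmax is applied last, every entry of the attention matrix is positive.
    2. Because softmax is applied last, every row of the attention matrix sums to 1 (the probabilities sum to 1).
    ➡ From 2, the attention matrix maps [1,1,1,…,1] to itself; since its entries are also positive (1), no eigenvalue can exceed 1 in absolute value. So the largest eigenvalue of the attention matrix is 1, and the corresponding eigenvector is [1,1,1,…,1].
    (Section: Digging into the problem)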
  2. Supplement to the preliminaries: eigenvalues and eigenvectors
    • For a matrix A, an eigenvector x is a vector whose direction does not change when it is transformed by A; it is only scaled by a factor λ, the eigenvalue: $Ax = \lambda x$
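  3. Lemma 1: self-attention attenuates high-frequency components
    • The largest eigenvalue of the attention matrix (the softmax output) is 1, with eigenvector [1,1,1,…,1].
    • Multiplying by the same matrix again and again (taking matrix powers) pulls the vector ever closer to the direction of the eigenvector for the largest eigenvalue, [1,1,1,…,1], so the high-frequency components are lost.
    • This has already been proven in prior work: the loss of high-frequency components as layers get deeper is known as over-smoothing.
    • High-frequency component: each element's value minus the mean; think of it as the fine detail of the representation.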
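  4. Definition 3: the attenuation rate of high-frequency components grows as the sequence length increases
    • Definition 3: the coefficient $\sigma_\alpha$ can be bounded in terms of the sequence length $n$.
    • Intuition: the longer the sequence, the flatter softmax makes the distribution of attention scores, so the high-frequency component approaches the zero matrix (= its singular values become small). In other words, the filter rate can be bounded by the sequence length.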
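  5. Impressions
    • My impression is that this is a thorough, costly paper, substantial both theoretically and experimentally.
    • The appendix contains a large number of experiments and descriptions of each evaluation dataset, and it also discusses LLM-based embedding models.
    • Credit to the authors for strengthening the theory since the version rejected at ICLR.
    • That said, I am not sure TempScale really works well…
    • The trends vary considerably across models and tasks.
    • Perhaps setting TempScale as a function of the sequence length and training with it would give a reasonable performance gain…?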
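The two softmax properties in item 1, and the eigenvalue conclusion drawn from them, can be checked numerically. The sketch below is a minimal plain-Python version of the standard scaled dot-product self-attention formula; the toy sizes, the random Gaussian inputs and weights, and the helper names (`matmul`, `softmax_rows`, `self_attention`) are illustrative assumptions, not taken from the paper or the slides.

```python
import math
import random


def matmul(a, b):
    """Plain-Python product of an (n x k) matrix a and a (k x m) matrix b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(z):
    """Row-wise softmax; subtracting each row's max avoids overflow in exp()."""
    out = []
    for row in z:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def self_attention(x, wq, wk, wv):
    """SA(X) = softmax(X Wq (X Wk)^T / sqrt(d)) X Wv; returns (output, attention matrix)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wq[0])  # query/key dimension
    scores = matmul(q, transpose(k))
    attn = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])
    return matmul(attn, v), attn


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Assumption: toy sizes and random Gaussian inputs/weights, chosen only for illustration.
rng = random.Random(0)
n, d = 6, 4
x = rand_matrix(n, d, rng)
wq, wk, wv = rand_matrix(d, d, rng), rand_matrix(d, d, rng), rand_matrix(d, d, rng)
out, attn = self_attention(x, wq, wk, wv)

# A @ [1, ..., 1] is simply the vector of row sums.
a_times_ones = [sum(row) for row in attn]
print("1. all entries positive:", all(p > 0 for row in attn for p in row))
print("2. A @ ones == ones:", all(abs(v - 1.0) < 1e-12 for v in a_times_ones))

# Positive entries + unit row sums: A never increases max|y|, so no |eigenvalue| exceeds 1.
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
ay = [sum(a * b for a, b in zip(row, y)) for row in attn]
print("max|A y| <= max|y|:", max(map(abs, ay)) <= max(map(abs, y)) + 1e-12)
```

The last check is the eigenvalue argument in miniature: if $Av = \lambda v$, then $|\lambda| \max_i |v_i| = \max_i |(Av)_i| \le \max_i |v_i|$, so $|\lambda| \le 1$.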
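The definition in item 2 can be made concrete with a hand-picked 2×2 row-stochastic matrix (an illustrative example, not from the slides), whose eigenpairs are checked exactly with Python's `fractions.Fraction`. The all-ones vector has eigenvalue 1, as item 1 predicts for any matrix whose rows sum to 1; the other eigenvalue, 3/10, is smaller.

```python
from fractions import Fraction as F


def matvec(a, x):
    """Multiply matrix a (a list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


# Illustrative, hand-picked matrix: positive entries, each row sums to 1.
A = [[F(7, 10), F(3, 10)],
     [F(4, 10), F(6, 10)]]

# (eigenvalue, eigenvector) pairs to verify against A x = lambda x, exactly.
pairs = [
    (F(1), [F(1), F(1)]),        # all-ones direction, eigenvalue 1
    (F(3, 10), [F(3), F(-4)]),   # remaining direction, eigenvalue 3/10
]

for lam, x in pairs:
    holds = matvec(A, x) == [lam * xi for xi in x]
    print(f"lambda = {lam}: A x == lambda x -> {holds}")
```

Running it prints `True` for both pairs. Writing any vector as $a[1,1] + b[3,-4]$, applying $A$ $k$ times gives $a[1,1] + 0.3^k\, b[3,-4]$: the component along $[3,-4]$ decays geometrically and only the all-ones direction survives, which is the mechanism item 3 describes.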
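Item 3's argument can be reproduced in a few lines. Following the slide's simplification of multiplying by the same matrix repeatedly, the sketch below assumes a single random softmax matrix (random Gaussian logits standing in for $XW_q(XW_k)^\top/\sqrt{d}$) and applies it 20 times to a random vector, tracking the norm of the high-frequency part (each element minus the mean) and the spread max - min. The sizes and seed are arbitrary.

```python
import math
import random


def softmax_rows(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    out = []
    for row in z:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def high_freq(x):
    """High-frequency part: each element minus the mean (drops the [1,...,1] component)."""
    mu = sum(x) / len(x)
    return [v - mu for v in x]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


rng = random.Random(0)
n, steps = 8, 20
# Assumption: random logits stand in for X Wq (X Wk)^T / sqrt(d); the same matrix is reused every step.
attn = softmax_rows([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)])

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
hf_norms = [norm(high_freq(x))]
spreads = [max(x) - min(x)]
for _ in range(steps):
    x = matvec(attn, x)  # after k steps, x equals A^k x0
    hf_norms.append(norm(high_freq(x)))
    spreads.append(max(x) - min(x))

for k in (0, 1, 5, 10, 20):
    print(f"step {k:2d}: ||HF(x)|| = {hf_norms[k]:.3e}, max - min = {spreads[k]:.3e}")
```

Because every entry of $Ax$ is a weighted average of the entries of $x$, the spread can never grow, and the high-frequency norm shrinks toward zero as $x$ lines up with $[1,1,\dots,1]$.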
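Item 4's intuition (longer sequence, flatter attention, stronger attenuation) can be probed with a toy experiment. The sketch assumes random unit-variance logits as a stand-in for $XW_q(XW_k)^\top/\sqrt{d}$ and random Gaussian value vectors, and measures a hypothetical filter-rate proxy `hf_ratio`: the fraction of the high-frequency (mean-subtracted) Frobenius norm that survives one attention step. It illustrates the trend only; it is not the paper's $\sigma_\alpha$ or its bound.

```python
import math
import random


def softmax_rows(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in z:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def hf_norm(m):
    """Frobenius norm of m after subtracting each column's mean over the tokens (rows)."""
    n = len(m)
    total = 0.0
    for col in zip(*m):
        mu = sum(col) / n
        total += sum((v - mu) ** 2 for v in col)
    return math.sqrt(total)


def hf_ratio(n, d, rng):
    """Hypothetical filter-rate proxy: ||HF(A V)|| / ||HF(V)|| for one random attention step."""
    # Assumption: random values and unit-variance random logits replace a real model.
    v = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    logits = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    attn = softmax_rows(logits)
    return hf_norm(matmul(attn, v)) / hf_norm(v)


rng = random.Random(0)
d = 16
ratios = {n: hf_ratio(n, d, rng) for n in (16, 64, 256)}
for n, r in ratios.items():
    print(f"n = {n:3d}: surviving high-frequency fraction = {r:.3f}")
```

With unit-variance logits each attention weight is on the order of 1/n, so each output row averages over more tokens as n grows, and in this setup the surviving fraction falls roughly like $1/\sqrt{n}$.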
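The last bullet of item 5 suggests choosing TempScale according to the sequence length. The sketch below assumes that TempScale works by scaling the softmax temperature, i.e. dividing the attention logits by some τ < 1 so that attention becomes sharper; the schedule `length_temperature` (τ = log n_ref / log n with n_ref = 16) is purely hypothetical. In a toy setting with random logits and values, it compares how much of the high-frequency component survives one attention step at n = 256 with and without the hypothetical schedule.

```python
import math
import random


def softmax_rows(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in z:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def hf_norm(m):
    """Frobenius norm after subtracting each column's mean over the tokens."""
    n = len(m)
    total = 0.0
    for col in zip(*m):
        mu = sum(col) / n
        total += sum((v - mu) ** 2 for v in col)
    return math.sqrt(total)


def length_temperature(n, n_ref=16):
    """Hypothetical schedule: tau = 1 at n_ref, smaller (sharper softmax) for longer inputs."""
    return math.log(n_ref) / math.log(n)


def hf_ratio(v, logits, temperature):
    """||HF(A V)|| / ||HF(V)|| with A = softmax(logits / temperature)."""
    attn = softmax_rows([[s / temperature for s in row] for row in logits])
    return hf_norm(matmul(attn, v)) / hf_norm(v)


rng = random.Random(0)
n, d = 256, 16
# Assumption: random values and unit-variance random logits stand in for a real model.
v = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
logits = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

tau = length_temperature(n)
plain = hf_ratio(v, logits, 1.0)    # ordinary softmax
scaled = hf_ratio(v, logits, tau)   # same logits, hypothetical length-dependent temperature
print(f"tau = {tau:.2f}; surviving high-frequency fraction: {plain:.3f} -> {scaled:.3f}")
```

Dividing by this τ multiplies the logits by log n / log n_ref, which offsets the flattening of softmax as n grows; whether such a schedule actually helps a trained embedding model is the open question the slide raises.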