Yano
September 22, 2025
[Reading group material] Length-Induced Embedding Collapse in PLM-based Models
Slides used at the ACL reading group @ Nagoya University 2025.
Transcript
Length-Induced Embedding Collapse in PLM-based Models
Presenter: Yano, D1 (first-year doctoral student)
Paper: Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu, ACL 2025
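Overview
• In text embedding models, downstream task performance degrades as the input text gets longer. The paper attributes this to Length Collapse, in which the embeddings of long texts cluster together.
• The paper identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.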
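Notes
• Figures in these slides are taken from the original paper or made by the presenter.
• The derivations of the equations and the proofs are not explained in full.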
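Introduction: Text embedding models
✓ Models that encode a sentence into a single representation (e.g., a vector).
• LLM-based models have become popular recently, but this paper targets so-called encoder-based models with bidirectional attention, such as BERT.
[Figure: a text embedding model encodes the example sentences "I want to eat onigiri", "Let's eat sushi", and "Ume is sour"; sentences close in meaning get high similarity, sentences far apart in meaning get low similarity.]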
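Text embedding models and sequence length
• Building text embedding models that can embed a long sequence into a single representation is an active area of work.
• Being able to embed a long sequence into a single representation enables, for example, retrieval over long documents and classification over long dialogue histories.
• Recent models (BAAI/bge-m3, jinaai/jina-embeddings-v3, etc.) are specified with a theoretical maximum sequence length of, for example, 8192.
• 🤔 However, my impression is that training data and benchmarks are still insufficient…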
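Introduction (existing issues): Text embedding models are weak on long sequences (1)
✓ Classification results organized by sequence length and by model.
• Classification performance drops as the sequence length grows.
• 🤔 Or is the task itself simply easier for short texts…?
[Figure: movie review classification performance per model, by sequence length (tokens).]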
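Text embedding models are weak on long sequences (2)
✓ Texts averaging 670 characters are summarized by an LLM down to averages of 120 and 36.5 characters, then encoded with BGE and visualized.
• Dataset: NFCorpus (medical papers)
• The embeddings of the long sequences (•) are densely clustered.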
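Text embedding models are weak on long sequences (3)
✓ Measure cosine similarity per sequence length. Note: look only at the τ=1 curve for now; what τ means is explained later…
• Dataset: NFCorpus
• The longer the sequences, the higher the similarity between their embeddings: they are densely clustered!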
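The measurement on this slide can be sketched roughly as follows. This is a minimal illustration, not the paper's evaluation code: the function names, the `bin_edges` argument, and the use of mean pairwise cosine similarity within each length bin are assumptions made for the example, and the embeddings are expected to come from whatever encoder is under study.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of rows."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(normed)
    # Exclude the diagonal: self-similarity is always 1.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

def similarity_by_length(embeddings, lengths, bin_edges):
    """Group texts into length bins [lo, hi) and report the mean pairwise
    cosine similarity inside each bin (hypothetical helper)."""
    lengths = np.asarray(lengths)
    result = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (lengths >= lo) & (lengths < hi)
        if mask.sum() >= 2:  # at least one pair is needed
            result[(lo, hi)] = mean_pairwise_cosine(embeddings[mask])
    return result

# Toy usage with hand-made 2-d "embeddings" and token counts.
toy_emb = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
toy_len = [10, 20, 300, 400]
print(similarity_by_length(toy_emb, toy_len, [0, 100, 1000]))
```

If Length Collapse is present, the value for the long-text bins would be expected to be higher than for the short-text bins.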
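Digging into the problem: Length Collapse
• The embeddings of long sequences are densely clustered, and this appears to be why downstream task performance drops = this is called Length Collapse.
• Why does this happen…?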
Theoretical analysis of Length Collapse: preliminaries
• The Transformer encoder considered here contains a self-attention mechanism (Self Attention, SA), defined by an equation in X, Wq, Wk, and Wv.
• X: input sentence; Wq, Wk, Wv: query, key, and value weights
1. Because softmax is applied last, every element of the attention matrix is positive.
2. Because softmax is applied last, each row of the attention matrix sums to 1 (a sum of probabilities).
➡ From 2, the largest eigenvalue of the attention matrix is 1, and the corresponding eigenvector is [1,1,1,…,1].
Supplement: eigenvalues and eigenvectors
• A: matrix, x: eigenvector, λ: eigenvalue
• An eigenvector is a vector x whose direction is unchanged by the transformation A and which is only scaled by a factor of λ: Ax = λx
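The two properties above can be checked numerically. The sketch below assumes the standard scaled dot-product form, SA(X) = softmax(X Wq (X Wk)^T / sqrt(d)) X Wv, which is an assumption on my part since the slide's equation is not in the transcript; the toy sizes and random weights are likewise hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(X, Wq, Wk):
    """A = softmax(X Wq (X Wk)^T / sqrt(d)); the SA output would be A @ X @ Wv."""
    d = Wq.shape[1]
    return softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))

rng = np.random.default_rng(0)
n, d = 6, 4  # hypothetical toy sizes: 6 tokens, 4 dimensions
X = rng.normal(size=(n, d))
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
A = attention_matrix(X, Wq, Wk)

ones = np.ones(n)
eigvals = np.linalg.eigvals(A)
print(np.all(A > 0))                   # property 1: all entries are positive
print(np.allclose(A.sum(axis=1), 1))   # property 2: every row sums to 1
print(np.allclose(A @ ones, ones))     # hence A [1,...,1]^T = 1 * [1,...,1]^T
print(np.abs(eigvals).max())           # largest eigenvalue magnitude
```

Because every row sums to 1, multiplying A by the all-ones vector returns the all-ones vector, which is exactly the statement Ax = λx with λ = 1.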
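Lemma 1: self-attention attenuates the high-frequency component
• The largest eigenvalue of the attention matrix is 1, and the corresponding eigenvector is [1,1,1,…,1].
• Multiplying by the same matrix over and over (taking powers) pulls the result toward the direction of the eigenvector for the largest eigenvalue, [1,1,1,…,1], so the high-frequency component is lost.
• High-frequency component: what remains after subtracting the mean from each element; think of it as the fine detail of the representation…
• This has been proven in prior work: the high-frequency component is known to be lost as the layers get deeper (Over-Smoothing).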
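A small numerical sketch of this over-smoothing effect follows. It is an illustration under simplifying assumptions, not the proof from prior work: the same random row-stochastic matrix is applied at every step (a real encoder has a different attention matrix per layer, plus residuals and feed-forward blocks), and `high_freq` implements the "subtract the mean" description from the slide as a mean over tokens.

```python
import numpy as np

def high_freq(X):
    """HC[X]: subtract the mean over tokens (rows) from every row."""
    return X - X.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n, d = 8, 4  # hypothetical toy sizes
A = np.exp(rng.normal(size=(n, n)))
A /= A.sum(axis=1, keepdims=True)    # positive entries, rows sum to 1
X = rng.normal(size=(n, d))

norms = []
H = X.copy()
for _ in range(50):
    norms.append(np.linalg.norm(high_freq(H)))  # Frobenius norm of the high-frequency part
    H = A @ H                                    # apply the same matrix once more

print(norms[0], norms[-1])
```

As the products accumulate, every row of `H` approaches the same vector, so the part that distinguishes tokens from one another fades away.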
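Theorem 2: attenuation of the high-frequency component by Self Attention
• Theorem 2: the filter rate that Self Attention applies to the high-frequency component HC[X] can be bounded by σ_α, the largest singular value of the high-frequency component of the attention matrix.
• Largest singular value: the maximum factor by which a transformation by that matrix can change a vector's magnitude.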
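One bound of this kind can be checked numerically. The sketch below assumes the form ||HC[SA(X)]||_F <= σ_a ||Wv||_2 ||HC[X]||_F with σ_a the largest singular value of HC[A]; this exact statement is my assumption rather than a quotation of the paper. It does hold in general, because rows of A summing to 1 give HC[A X] = HC[A] HC[X]. Sizes and weights are hypothetical.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def high_freq(M):
    """HC[M]: subtract the mean over rows (tokens) from every row."""
    return M - M.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n, d = 32, 16  # hypothetical toy sizes
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

A = softmax_rows((X @ Wq) @ (X @ Wk).T / np.sqrt(d))
sa_out = A @ X @ Wv

sigma_a = np.linalg.norm(high_freq(A), ord=2)   # largest singular value of HC[A]
lhs = np.linalg.norm(high_freq(sa_out))          # ||HC[SA(X)]||_F
rhs = sigma_a * np.linalg.norm(Wv, ord=2) * np.linalg.norm(high_freq(X))
print(lhs, rhs, lhs <= rhs)
```

The smaller `sigma_a` is, the less high-frequency content can survive one self-attention step.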
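Theorem 3: the attenuation of the high-frequency component grows as the sequence gets longer
• Theorem 3: σ_α can be bounded in terms of the sequence length n.
• Intuition: as the sequence gets longer, softmax makes the distribution of attention scores flatter, so the high-frequency component approaches the zero matrix = its singular values get smaller.
In other words, the filter rate can be bounded by the sequence length.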
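The intuition can be illustrated with random inputs. This is a toy experiment under my own assumptions (Gaussian queries and keys, d = 64, a single attention matrix), not a reproduction of the paper's bound; `sigma_a` is a hypothetical helper name.

```python
import numpy as np

def sigma_a(n, d=64, seed=0):
    """Largest singular value of HC[A] for a random length-n input (toy setting)."""
    rng = np.random.default_rng(seed)
    Q = rng.normal(size=(n, d))
    K = rng.normal(size=(n, d))
    logits = Q @ K.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)            # row-wise softmax
    hc = A - A.mean(axis=0, keepdims=True)       # high-frequency part of A
    return float(np.linalg.norm(hc, ord=2))

for n in (16, 64, 256, 512):
    print(n, sigma_a(n))
```

In this setting each attention row spreads its mass over more tokens as n grows, so the rows look more and more alike and the printed value should shrink.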
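The filter rate can be bounded by the sequence length
= the longer the sequence, the more the high-frequency component is attenuated, leaving only similar embeddings (Length Collapse)
• This explains the relationship between cosine similarity and sequence length observed at the start. Length Collapse does seem to be happening…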
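Proposed solution: introducing TempScale
• Length Collapse was caused by the attenuation of the high-frequency component (over-smoothing). ➡ So why not make the original output distribution sharper?
• Introduce into attention a temperature 0 < τ < 1 that divides each row of the pre-softmax logits.
• The larger τ is, the flatter the output distribution; the smaller τ is, the sharper it becomes.
• In other words, the smaller τ is, the more of the high-frequency component is preserved!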
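A minimal sketch of the temperature idea, assuming it amounts to softmax(Q K^T / (τ sqrt(d))). Which layers receive the temperature and how τ is chosen are not specified in this transcript, so the function below is only an illustration with hypothetical names and random inputs.

```python
import numpy as np

def attention_with_temperature(Q, K, tau=1.0):
    """Row-wise softmax of Q K^T / sqrt(d), with every logit row divided by tau."""
    d = Q.shape[1]
    logits = Q @ K.T / np.sqrt(d) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    return A / A.sum(axis=1, keepdims=True)

def row_entropy(A):
    """Mean Shannon entropy of the attention rows (higher means flatter)."""
    return float(-(A * np.log(A + 1e-12)).sum(axis=1).mean())

def sigma_a(A):
    """Largest singular value of the high-frequency part of A."""
    return float(np.linalg.norm(A - A.mean(axis=0, keepdims=True), ord=2))

rng = np.random.default_rng(0)
n, d = 128, 64  # hypothetical toy sizes
Q, K = rng.normal(size=(n, d)), rng.normal(size=(n, d))
for tau in (1.0, 0.7, 0.5):
    A = attention_with_temperature(Q, K, tau)
    print(tau, row_entropy(A), sigma_a(A))
```

Lowering τ should lower the row entropy (sharper attention) and raise the largest singular value of HC[A], which is the direction the slide argues for.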
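Applying the solution: applying TempScale
✓ Measure cosine similarity per sequence length.
• Dataset: NFCorpus
• As the temperature is lowered, the effect of sequence length on similarity appears to shrink!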
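Effect of TempScale on downstream tasks
[Figure/table: downstream task performance with TempScale.]
• The differences are positive, but the gains are rather small…
• It seems to suit models and tasks with long sequence lengths…?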
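Effect of TempScale on a downstream task (STS)
• STS: a task that measures how well the semantic closeness of sentence pairs is captured.
• The lower the temperature, the lower the similarity of unrelated (Random / Unrelated) pairs.
• 🤔 Is that really so…?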
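The optimal temperature depends on the sequence length of the target text
• Intuitively, the longer the sequences being handled, the weaker we want the attenuation to be = the lower we want the temperature to be.
• Task: SummScreenFD (retrieving the summary of a TV show from its script)
• The results are mostly as expected, but ANCE alone shows the opposite trend.
[Figure: relationship between temperature and performance for each model and text length.]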
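Summary
• In text embedding models, downstream task performance degrades as the input text gets longer. The paper attributes this to Length Collapse, in which the embeddings of long texts cluster together.
• The paper identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.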
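Impressions
• My impression is that a lot of effort went into this paper: it is substantial both theoretically and experimentally.
• The appendix contains a large number of experiments and descriptions of each evaluation dataset, and it also discusses LLM-based embedding models.
• It is commendable that the theory has also been strengthened since the version that was rejected at ICLR.
• That said, it is unclear whether TempScale really works…
• The results vary considerably across models and tasks.
• If the TempScale temperature were chosen as a function of sequence length and the model were then trained, would performance improve by a reasonable margin…?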