Paper introduction: ∞-former: Infinite Memory Transformer
yuri
September 20, 2022
Slides for the 14th Advanced NLP Study Group (September 26–27, 2022) at Ochanomizu University
Transcript
∞-former: Infinite Memory Transformer
Pedro Henrique Martins, Zita Marinho, André F. T. Martins (ACL 2022)
Presented by Yuri Murayama, Ochanomizu University
Prior Work
• How should long contexts be handled?
[Figure: layer diagrams of Transformer-XL [Dai+ 2019], where each Transformer layer attends (q against k, v) over the input X and a short-term memory (STM), and of the Compressive Transformer [Rae+ 2019], which adds a compressed memory (CM)]
Infinite Memory Transformer
• Represents the past input sequence as a continuous signal
Long-term Memory
• Apply a convolution (stride = 1, width = 3) to the input $X \in \mathbb{R}^{L \times e}$ for smoothing, where $L$ is the input size and $e$ the embedding size
• Convert $X$ into a continuous signal $\tilde{X}(t) = B^\top \psi(t)$, with $t \in [0, 1]$ and $t_i = i / L$
  – $\psi(t) \in \mathbb{R}^N$ is a vector of $N$ RBFs (radial basis functions)
  – $B \in \mathbb{R}^{N \times e}$ is the coefficient matrix obtained by multivariate ridge regression (a fitting sketch follows)
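A minimal sketch of this fitting step, assuming evenly spaced Gaussian RBF centers and an illustrative ridge penalty (the paper's exact basis and hyperparameters may differ):

```python
import numpy as np

def fit_continuous_memory(X, N=64, rbf_sigma=0.05, lam=1e-3):
    """Fit X_tilde(t) = psi(t)^T B to the rows of X (L x e) by ridge regression."""
    L, _ = X.shape
    t = np.arange(L) / L                                   # t_i = i / L, as on the slide
    centers = np.linspace(0.0, 1.0, N)                     # assumed: evenly spaced RBF centers
    Psi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * rbf_sigma ** 2))  # L x N
    # Multivariate ridge regression: B = (Psi^T Psi + lam * I)^(-1) Psi^T X
    B = np.linalg.solve(Psi.T @ Psi + lam * np.eye(N), Psi.T @ X)
    return B, centers                                      # B is N x e

def eval_x_tilde(tq, B, centers, rbf_sigma=0.05):
    """Evaluate the continuous signal X_tilde at query positions tq in [0, 1]."""
    tq = np.atleast_1d(tq)
    Psi = np.exp(-(tq[:, None] - centers[None, :]) ** 2 / (2 * rbf_sigma ** 2))
    return Psi @ B                                         # len(tq) x e
```

Because only $B$ ($N \times e$) is stored, the memory footprint no longer grows with how much text has been absorbed.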
Long-term Memory
• $Q = X W^Q \in \mathbb{R}^{L \times d}$, $K = B W^K \in \mathbb{R}^{N \times d}$, $V = B W^V \in \mathbb{R}^{N \times d}$
• A Gaussian distribution is used as the attention mechanism
Long-term Memory
• Each $z_{l,i}$ (attention $\times$ value) forms a row of $Z_{\mathrm{LTM},l} \in \mathbb{R}^{L \times d}$
• Summing with the Transformer's context vector $Z_T$ gives the final context vector $Z = Z_T + Z_{\mathrm{LTM}}$ (a sketch of this attention follows)
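A minimal sketch of the Gaussian attention over the continuous LTM, assuming each query has already produced a mean $\mu_i \in (0,1)$ and width $\sigma_i$ (in the paper these are predicted from the query; that parametrization is omitted here), with the integral $\int_0^1 \mathcal{N}(t;\mu_i,\sigma_i^2)\,\tilde{X}(t)\,dt$ approximated on a grid:

```python
import numpy as np

def gaussian_ltm_attention(mu, sig, B, centers, rbf_sigma=0.05, n_grid=1024):
    """Z_LTM rows: z_i ~= integral over [0,1] of N(t; mu_i, sig_i^2) * X_tilde(t) dt.

    mu, sig: per-query Gaussian parameters, shape (L,); B: N x e coefficients
    from the fitting sketch above."""
    t = np.linspace(0.0, 1.0, n_grid)                      # quadrature grid on [0, 1]
    Psi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * rbf_sigma ** 2))
    Xt = Psi @ B                                           # X_tilde evaluated on the grid
    # Gaussian attention density per query, renormalized on the grid
    p = np.exp(-(t[None, :] - mu[:, None]) ** 2 / (2 * sig[:, None] ** 2))
    p /= p.sum(axis=1, keepdims=True)
    return p @ Xt                                          # Z_LTM, shape L x e
```

Per the slide, this output is then summed with the standard discrete-attention context: $Z = Z_T + Z_{\mathrm{LTM}}$.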
Unbounded Memory
• Compress $\tilde{X}(t)$
• Sample $M$ vectors from $\tilde{X}(t)$ at evenly spaced positions (sketched below)
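A sketch of the even-spacing compression, reusing the fitting sketch above (the function name and the exact refit schedule are assumptions):

```python
import numpy as np

def compress_memory(B, centers, M, rbf_sigma=0.05):
    """Summarize the old continuous signal by M evenly spaced samples of X_tilde(t)."""
    t = np.linspace(0.0, 1.0, M)
    Psi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * rbf_sigma ** 2))
    return Psi @ B   # M x e; B is then refit on these samples plus the new inputs
```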
Sticky Memories
• Wouldn't it be better to preferentially keep the memory of important regions?
• Build a histogram from the previous step's attention and split it into $D$ evenly spaced bins $\{d_1, \dots, d_D\}$
• Compute the attention probability $p(d_j)$ for each bin
• Sample $M$ locations according to $p$ (sketched below)
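A sketch of the sticky-memories sampling, assuming the previous step's attention mass is available on a fine grid over $[0, 1]$; the binning and within-bin uniform sampling are an assumed reading of the slide:

```python
import numpy as np

def sticky_sample_locations(attn_mass, D=16, M=64, seed=0):
    """attn_mass: previous-step attention mass on a fine grid over [0, 1].
    Returns M locations in [0, 1], drawn according to per-bin probability p(d_j)."""
    rng = np.random.default_rng(seed)
    grid = (np.arange(len(attn_mass)) + 0.5) / len(attn_mass)
    edges = np.linspace(0.0, 1.0, D + 1)                    # D evenly spaced bins
    mass, _ = np.histogram(grid, bins=edges, weights=attn_mass)
    p = mass / mass.sum()                                   # attention probability p(d_j)
    bins = rng.choice(D, size=M, p=p)                       # pick bins according to p
    return edges[bins] + rng.uniform(0.0, 1.0 / D, size=M)  # uniform within each bin
```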
Complexity
• The key matrix $K$ depends only on the number of basis functions $N$, not on the context length
• The complexity is therefore also independent of the context length
• With a short-term memory of size $L_{\mathrm{STM}}$: $\mathcal{O}(L(L + L_{\mathrm{STM}}) + LN)$
• With the LTM only: $\mathcal{O}(L^2 + LN)$
• Both are smaller than a vanilla transformer attending over the full context (a worked size comparison follows)
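A rough worked comparison using the sizes from the sorting experiment below ($L = N = 1{,}024$), under the assumption that the vanilla baseline attends over the entire 16k-token context:

```latex
\underbrace{\mathcal{O}(L^2 + LN)}_{\infty\text{-former: } 1024^2 + 1024^2 \approx 2.1\mathrm{M},\ \text{fixed}}
\quad \text{vs.} \quad
\underbrace{\mathcal{O}(L \cdot L_{\mathrm{ctx}})}_{\text{vanilla: } 1024 \times 16000 \approx 16.4\mathrm{M},\ \text{growing with } L_{\mathrm{ctx}}}
```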
Sorting
• Task: output the sequence's tokens sorted by frequency
• To check whether the model uses its long-term memory rather than only the most recent tokens, the token probability distribution is changed over time
• $\alpha \in [0, 1]$ increases gradually from 0 to 1 as the sequence grows
• Vocabulary size 20
• Experiments with sequences of 4,000, 8,000, and 16,000 tokens (a hypothetical generator is sketched below)
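A hypothetical data generator for this task: the slide only states that the distribution drifts as $\alpha$ rises from 0 to 1, so the linear interpolation between two fixed distributions `p0` and `p1` is an assumption, not the paper's exact scheme:

```python
import numpy as np

def make_sorting_example(seq_len=4000, vocab=20, seed=0):
    rng = np.random.default_rng(seed)
    p0 = rng.dirichlet(np.ones(vocab))         # early-sequence token distribution
    p1 = rng.dirichlet(np.ones(vocab))         # late-sequence token distribution
    tokens = np.empty(seq_len, dtype=np.int64)
    for i in range(seq_len):
        alpha = i / (seq_len - 1)              # grows 0 -> 1 along the sequence
        tokens[i] = rng.choice(vocab, p=(1 - alpha) * p0 + alpha * p1)
    # Target: the vocabulary sorted by frequency of occurrence in the sequence
    counts = np.bincount(tokens, minlength=vocab)
    return tokens, np.argsort(-counts)
```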
Sorting
• Transformer: 3 layers, 6 attention heads
• Input size L = 1,024
• Memory size 2,048
• LTM with N = 1,024 basis functions
Document Grounded Dialogue
• CMU Document Grounded Conversation dataset (CMU-DoG) [Zhou+ 2018]
• To make the task harder, the document is accessible only before the conversation begins
• GPT-2 small + continuous LTM (N = 512 basis functions)
Document Grounded Dialogue
[Results shown in figures/tables over two slides; not recoverable from the transcript]
How LTM attention differs across layers
[Figure slide]
Summary
• Proposed the Infinite Memory Transformer
• Unbounded context
• Computational cost independent of the context length
• Experiments on sorting, language modeling, and document grounded dialogue
• Demonstrated the usefulness of long-term memory