Slide 53
① Viewing the feed-forward network as a knowledge-storage module
[28] Mor Geva et al. (2021) “Transformer Feed-Forward Layers Are Key-Value Memories”
Treating the feed-forward network as a memory device
§ The feed-forward network (a 2-layer MLP) is similar to the attention mechanism
[Figure: a Feed-Forward Network (FFN) as a key-value memory — the Hidden State is matched against FFN(key) by inner product, the resulting Activation weights the FFN(val) vectors, and their weighted sum gives the FFN Output. In the example, the input "The capital of Ireland is [MASK]" passes through L stacked blocks of Self-Attention Layer + Feed-Forward Network, and Knowledge Neurons inside the FFN drive the prediction "Dublin".]
Figure 2: Illustration of how an FFN module in a Transformer block works as a key-value memory. The first linear layer FFN(key) computes intermediate neurons through inner product. Taking the activation of these neurons as weights, the second linear layer FFN(val) integrates value vectors through weighted sum. We hypothesize that knowledge neurons in the FFN module are responsible for expressing factual knowledge.
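To make this key-value reading concrete, here is a minimal PyTorch sketch (not from the slide; the sizes d_model and d_ff and the random weights are illustrative assumptions): each column of W1 acts as a key matched against the hidden state by an inner product, and each row of W2 is the value vector weighted by that neuron's activation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_ff = 8, 32              # illustrative sizes (assumption, not from the slide)
W1 = torch.randn(d_model, d_ff)    # each column of W1 acts as a "key"
W2 = torch.randn(d_ff, d_model)    # each row of W2 acts as a "value vector"
h  = torch.randn(d_model)          # hidden state of one token

# Standard 2-layer FFN: FFN(h) = gelu(h W1) W2
ffn_out = F.gelu(h @ W1) @ W2

# Key-value-memory view: inner product with every key -> activation,
# then a weighted sum of the value vectors using those activations.
activations = F.gelu(h @ W1)                              # one coefficient per neuron
memory_out  = sum(activations[i] * W2[i] for i in range(d_ff))

print(torch.allclose(ffn_out, memory_out, atol=1e-5))     # True: identical computation
```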
[Figure: a single Attention head — the Query vector is compared with the Key vectors by inner product to give the Attention weights, and the output is the weighted sum of the Value vectors.]
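Reading the figure above as an equation (a standard per-query expansion; the symbols q, k_i, v_i, α_i are our notation, not text from the slide): inner products of the query with the key vectors give the attention weights, and the head output is a weighted sum of the value vectors.

\[
\mathrm{Att}(q) \;=\; \sum_{i} \alpha_i\, v_i,
\qquad
\alpha_i \;=\; \frac{\exp(q \cdot k_i)}{\sum_{j} \exp(q \cdot k_j)} .
\]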
2 Background: Transformer

Transformer (Vaswani et al., 2017) is one of the most popular and effective NLP architectures. A Transformer encoder is stacked with L identical blocks. Each Transformer block mainly contains two modules: a self-attention module and a feed-forward network (abbreviated as FFN) module. Let $X \in \mathbb{R}^{n \times d}$ denote the input matrix; the two modules can be formulated as follows:

\[
Q_h = X W^{Q}_{h}, \quad K_h = X W^{K}_{h}, \quad V_h = X W^{V}_{h}, \tag{1}
\]
\[
\text{Self-Att}_h(X) = \operatorname{softmax}\!\left(Q_h K_h^{\top}\right) V_h, \tag{2}
\]
\[
\operatorname{FFN}(H) = \operatorname{gelu}\!\left(H W_1\right) W_2, \tag{3}
\]

where $W^{Q}_{h}$, $W^{K}_{h}$, $W^{V}_{h}$, $W_1$, $W_2$ are parameter matrices, and $\text{Self-Att}_h(X)$ computes a single attention head.
Q, K, V (weight matrices) — the self-attention of Eq. (2) and the FFN of Eq. (3) take a very similar form.
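Spelling out the "very similar" annotation (this side-by-side follows the reading of [28] Geva et al. and is not verbatim from the slide): both modules compute inner products with keys, turn them into coefficients, and return a weighted sum of values. The differences are that the FFN's keys (columns of W1) and values (rows of W2) are static parameters rather than computed from the input, and the coefficients use an unnormalized gelu instead of a softmax.

\[
\underbrace{\operatorname{softmax}\!\left(Q_h K_h^{\top}\right) V_h}_{\text{Eq. (2): self-attention}}
\quad\longleftrightarrow\quad
\underbrace{\operatorname{gelu}\!\left(H W_1\right) W_2}_{\text{Eq. (3): FFN}}
\]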
… the effectiveness of the proposed knowledge attribution method. First, suppressing and amplifying knowledge neurons notably affects the expression of the corresponding knowledge. Second, we find that knowledge neurons of a fact tend to be activated more by corresponding knowledge-expressing prompts. Third, given the knowledge neurons of a fact, the top activating prompts retrieved from open-domain texts usually express the corresponding fact, while the bottom activating prompts do not express the correct relation.

In our case studies, we try to leverage knowledge neurons to explicitly edit factual knowledge in pretrained Transformers without any fine-tuning. We present two preliminary studies: updating facts, and erasing relations. After identifying the knowledge neurons, we perform a knowledge surgery for pretrained Transformers by directly modifying …
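As a rough operational picture of what "suppressing and amplifying knowledge neurons" means, here is a minimal sketch over a generic 2-layer FFN (the neuron index and the 0×/2× scaling factors are illustrative assumptions, not the paper's exact procedure):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_ff = 8, 32
W1 = torch.randn(d_model, d_ff)
W2 = torch.randn(d_ff, d_model)
h  = torch.randn(d_model)          # hidden state of one token

def ffn(h, neuron_scale=None):
    """2-layer FFN; neuron_scale maps neuron index -> factor (0.0 suppresses, 2.0 amplifies)."""
    act = F.gelu(h @ W1)           # intermediate activations = candidate knowledge neurons
    if neuron_scale:
        for idx, scale in neuron_scale.items():
            act[idx] = act[idx] * scale
    return act @ W2

out_original = ffn(h)
out_suppress = ffn(h, {7: 0.0})    # zero out neuron 7 (hypothetical knowledge neuron)
out_amplify  = ffn(h, {7: 2.0})    # double neuron 7's activation

# Suppressing shifts the output by exactly minus that neuron's activation times its value vector (row 7 of W2).
delta = out_suppress - out_original
print(torch.allclose(delta, -F.gelu(h @ W1)[7] * W2[7], atol=1e-5))   # True
```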