Slide 22
Part 2 : Multi-Head Attention
• First, compute the query, key, and value from Source and Target using the parameters $W_i^q$, $W_i^k$, $W_i^v$
• For h-head attention, $W_i^q, W_i^k \in \mathbb{R}^{d_{\text{model}} \times d_k}$ and $W_i^v \in \mathbb{R}^{d_{\text{model}} \times d_v}$
• $d_k = d_v = d_{\text{model}} / h$, $i = 1, \dots, h$
• In other words, with h heads, the dimension of the query, key, and value is 1/h of what it is with a single head
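The shapes in the bullets above can be checked with a minimal sketch. It assumes source-target attention, where queries are projected from Target and keys and values from Source. The sizes (`d_model`, `h`, `src_len`, `tgt_len`), the random weights, and the helper names are hypothetical and chosen only for illustration. Plain Python lists are used to keep the example dependency-free; in practice an array library would do the same job.

```python
import random

def rand_matrix(rows, cols, rng):
    """Return a rows x cols matrix (list of lists) of Gaussian samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def shape(m):
    """Return (rows, cols) of a list-of-lists matrix."""
    return (len(m), len(m[0]))

rng = random.Random(0)

d_model, h = 8, 2             # hypothetical model width and number of heads
d_k = d_v = d_model // h      # d_k = d_v = d_model / h
src_len, tgt_len = 7, 6       # hypothetical sequence lengths

S = rand_matrix(src_len, d_model, rng)   # Source representations
T = rand_matrix(tgt_len, d_model, rng)   # Target representations

heads = []
for i in range(h):
    # Per-head parameters W_i^q, W_i^k in R^{d_model x d_k}, W_i^v in R^{d_model x d_v}
    W_q = rand_matrix(d_model, d_k, rng)
    W_k = rand_matrix(d_model, d_k, rng)
    W_v = rand_matrix(d_model, d_v, rng)
    heads.append({
        "query": matmul(T, W_q),   # assumption: queries come from Target
        "key": matmul(S, W_k),     # assumption: keys come from Source
        "value": matmul(S, W_v),   # assumption: values come from Source
    })

for i, head in enumerate(heads, start=1):
    print(i, shape(head["query"]), shape(head["key"]), shape(head["value"]))
```

Each head works in a `d_model / h`-dimensional space, so concatenating the `h` value outputs gives back a `d_model`-wide representation.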
[Figure: the Source and Target token sequences (I, have, a, pen, EOS, pad, pad) are projected per head into Key and Value ($S W_i^k$, $S W_i^v$) and Query ($T W_i^q$), shown for heads $i = 1$ and $i = h$.]