Slide 2
Transformer (Google Brain)
Proposed in "Attention Is All You Need" (2017)^1; achieves SOTA performance across many tasks.
Built from attention and feed-forward layers, arranged as an encoder and a decoder.
Variants: speech recognition (Conformer), image recognition (Vision Transformer).
1: Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
Slide 3
DeepMind's generalist AI agent, Gato^1, is also built on the Transformer.
1: Reed, Scott, et al. "A Generalist Agent." arXiv preprint arXiv:2205.06175 (2022).
RNN ( )  GAN ( )  CNN ( )  Transformer ( )
Dataset: D = {(x_i, y_i)}, where x_i is an input and y_i the corresponding output.
Slide 9
ON, OFF
1: > > > > AI, Information and Communications White Paper (MIC), https://www.soumu.go.jp/johotsusintokei/whitepaper/ja/r01/html/nd113210.html
Slide 10
1.
2.
y = f(x)
Slide 11
→ Attention
Slide 12
Attention
Slide 13
Attention
"A mechanism that extracts, from a set of inputs, the information relevant to a given query" [1, Chapter 7.2]
Slide 14
Attention > computing the output from relevance to a query
Given a set of input vectors {z_i} and a query q:
- relevance of each z_i to the query: r_i = r(z_i, q)
- weights: (a_1, ..., a_N) = softmax(r_1, ..., r_N)^1
- output: F = ∑_i a_i z_i
1: softmax: a_i = e^{r_i} / (∑_j e^{r_j}); hence ∑_i a_i = 1, a_i ≥ 0, and r_i ≥ r_j ⇒ a_i ≥ a_j.
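The recipe above (relevance scores → softmax weights → weighted sum) fits in a few lines of Python. This is a minimal sketch; the input vectors, the query, and the choice of a plain dot product as the relevance function are toy assumptions, not values from the slides.

```python
import math

def softmax(rs):
    # a_i = e^{r_i} / sum_j e^{r_j}; subtract max(rs) for numerical stability
    m = max(rs)
    exps = [math.exp(r - m) for r in rs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    # plain inner product, used here as an example relevance function r(z, q)
    return sum(a * b for a, b in zip(u, v))

def attention(zs, q, relevance):
    # 1) relevance of each input to the query: r_i = r(z_i, q)
    rs = [relevance(z, q) for z in zs]
    # 2) weights: (a_1, ..., a_N) = softmax(r_1, ..., r_N)
    ws = softmax(rs)
    # 3) output: F = sum_i a_i z_i (a weighted average of the inputs)
    dim = len(zs[0])
    return [sum(w * z[k] for w, z in zip(ws, zs)) for k in range(dim)]

# Toy inputs and query (hypothetical values).
zs = [[0.0, 1.0, 3.0], [3.0, 4.0, -1.0], [1.0, 0.0, -4.0], [-3.0, 2.0, 1.0]]
q = [1.0, 0.0, 0.0]
F = attention(zs, q, dot)
```

Because the weights are a softmax, they are non-negative and sum to 1, so F always stays inside the convex hull of the inputs.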
Slide 15
Attention > source-to-target attention and self-attention
The output F = ∑_i a_i z_i depends on where the query comes from:
- If the query comes from a different sequence (the target) than the z_i (the source), it is called source-to-target attention.
- If the query comes from the same set as the z_i, it is called self-attention.
Slide 16
Attention > tokenization and word embedding
A sentence is split into tokens (tokenize), and each token is mapped to a vector (word embedding).
" " → [" ", " ", " ", " "]
→ z_0 = [0, 1, 3],
  z_1 = [3, 4, -1],
  z_2 = [1, 0, -4],
  z_3 = [-3, 2, 1]
Slide 17
Attention > word embedding
Word embeddings are learned so that words with similar meaning get similar vectors.
→ The similarity of two words a and b can be measured by the inner product ⟨v_a, v_b⟩ of their embeddings.
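As a quick numeric check, the inner-product similarity can be computed directly on the embedding vectors from the tokenization example:

```python
def inner(u, v):
    # <u, v> = sum_k u_k * v_k
    return sum(a * b for a, b in zip(u, v))

# The four embedding vectors from the tokenization example.
z0, z1, z2, z3 = [0, 1, 3], [3, 4, -1], [1, 0, -4], [-3, 2, 1]

sim_01 = inner(z0, z1)  # 0*3 + 1*4 + 3*(-1) = 1
sim_02 = inner(z0, z2)  # 0*1 + 1*0 + 3*(-4) = -12
# Under this measure, z0 is more similar to z1 than to z2.
```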
Slide 18
Attention > scaled dot-product attention
In the Transformer, relevance is computed by scaled dot-product attention.
Each token z_i is also converted into a query q_i (one query per token, i = 1, ..., n).
r(z_i, q_j) = ⟨z_i, q_j⟩ / √d,  d = n (the vector dimension)^1
F_j = ∑_i a_i z_i = softmax(q_j Z^T / √d) Z^2
1: i indexes the input vectors, j the queries.
2: Z is the matrix whose rows are the z_i.
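The formula F_j = softmax(q_j Z^T / √d) Z can be sketched in pure Python. Below it is run as self-attention (the queries are the input vectors themselves); the vectors are toy values, not taken from any real model.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scaled_dot_product_attention(Z, Q):
    # F_j = softmax(q_j Z^T / sqrt(d)) Z, where d is the vector dimension
    # and Z is the matrix whose rows are the z_i.
    d = len(Z[0])
    F = []
    for q in Q:
        scores = [dot(z, q) / math.sqrt(d) for z in Z]  # q_j Z^T / sqrt(d)
        a = softmax(scores)
        F.append([sum(ai * z[k] for ai, z in zip(a, Z)) for k in range(d)])
    return F

# Self-attention: the queries are the input vectors themselves.
Z = [[0.0, 1.0, 3.0], [3.0, 4.0, -1.0], [1.0, 0.0, -4.0], [-3.0, 2.0, 1.0]]
F = scaled_dot_product_attention(Z, Z)
```

The √d scaling keeps the scores from growing with the dimension, which would otherwise push the softmax toward a hard one-hot selection.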
Slide 19
Attention > Key-Value
The Transformer converts each token not only into a query q_i but also into a key k_i and a value v_i (i = 1, ..., n).
This mirrors a key-value store (a Python dict, a C++ map): the query is matched against the keys, and the values of the matching entries are read out, softly.
F_j = softmax(q_j K^T / √d) V
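The key-value analogy can be made concrete: with nearly orthogonal keys and a query that matches the first key, the output is almost exactly the first stored value, like a soft, differentiable dict lookup. A minimal sketch; the toy keys, values, and query below are assumptions, not taken from the slides.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kv_attention(Q, K, V):
    # F_j = softmax(q_j K^T / sqrt(d)) V: match the query against the keys,
    # then mix the corresponding values with the resulting weights.
    d = len(K[0])
    F = []
    for q in Q:
        a = softmax([dot(k, q) / math.sqrt(d) for k in K])
        F.append([sum(ai * v[c] for ai, v in zip(a, V)) for c in range(len(V[0]))])
    return F

# Toy key-value "store" (hypothetical values): two entries, one value each.
K = [[10.0, 0.0], [0.0, 10.0]]  # keys
V = [[1.0], [2.0]]              # value stored under each key
Q = [[10.0, 0.0]]               # a query that matches the first key
F = kv_attention(Q, K, V)       # close to [[1.0]]: mostly the first value
```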