Slide 35
Details of Self-Attention
[Diagram: self-attention over a 5-step sequence. Inputs x1–x5 pass through Embedding layers (with positional encoding, PE) to give e1–e5, and each position t produces a query qt, key kt, and value vt. Dot products of qt with every key ks give raw attention scores α(t,s); a softmax per position normalizes them to α̂(t,s). Each output_t (output1–output5) is the weighted sum (⊗, ⊕) of the values v1–v5 with weights α̂(t,1)…α̂(t,5).]
Figure 1: The Transformer - model architecture.

3.1 Encoder and Decoder Stacks
Encoder: The encoder is composed of a stack of N = 6 identical layers.
By computing the attention weights and applying them to the value features, the model captures the relationships between features at different time steps:

Attention(Q, K, V) = α̂ V,  where α̂ = softmax(QKᵀ / √d_k)
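The weighted-sum computation in the diagram can be sketched as scaled dot-product self-attention in NumPy. This is a minimal illustration, not the slide's original code; the projection matrices W_q, W_k, W_v and the dimensions (5 time steps, d_model = 8, d_k = 4) are assumptions chosen to match the five-step example above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    alpha = Q @ K.T / np.sqrt(d_k)        # raw scores alpha[t, s]
    alpha_hat = softmax(alpha, axis=-1)   # normalized weights; each row sums to 1
    return alpha_hat @ V, alpha_hat       # output_t = sum_s alpha_hat[t, s] * v_s

# Toy example: 5 time steps (x1..x5), d_model = 8, d_k = d_v = 4 (assumed sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
```

Each row of `weights` is one bracketed vector [α̂(t,1) … α̂(t,5)] from the diagram, and `out` stacks output1–output5.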