Paper Reading: January 24
Pay Less Attention with Lightweight and Dynamic Convolutions
gumigumi7
January 24, 2019
Transcript
Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, Michael Auli. International Conference on Learning Representations (ICLR), 2019.
Slide 2: Overview
• Proposes lightweight and dynamic convolutions as a replacement for the Transformer's self-attention.
• Achieves state-of-the-art results on machine translation (paper accepted to ICLR 2019).
• The convolutions are simpler and cheaper to compute than self-attention.
Slide 3: Background
• RNNs, CNNs, and self-attention are all widely used for sequence modeling.
• Recently, models built on self-attention have become dominant.
  • Ex.: the Transformer.
  • Self-attention computes content-based attention weights over the entire context.
Slide 4: Motivation
• Self-attention compares every position with every other position, so its cost grows quadratically with sequence length.
• It is not obvious that attending to the entire context is actually necessary.
• Content-based attention over the full context does not clearly outperform simpler alternatives (Tang et al., 2018).
• This motivates replacing it with a cheaper, fixed-width operation.
Slide 5: Key idea
• Self-attention: a separate weight for every pair of positions, computed from the whole context.
• Dynamic convolution: weights over a fixed-width window, computed from the current timestep alone.
  • Cost is linear, not quadratic, in the sequence length.
Slide 6: Module overview
• The self-attention block is replaced by a convolutional module:
• a gated linear unit (GLU) on the input, followed by a lightweight convolution;
• dynamic convolution extends the lightweight convolution with per-timestep kernels.
Slide 7: Self-attention (review)
• Each output position is a weighted sum of values, with weights softmax(QKᵀ/√d) computed over all positions.
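As a point of reference (my illustration, not from the slides), a minimal PyTorch sketch of single-head scaled dot-product self-attention; the n×n score matrix is what makes the cost quadratic in sequence length:

```python
# Minimal single-head scaled dot-product self-attention (illustrative sketch).
import math
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    """x: (n, d); Wq/Wk/Wv: (d, d) projection matrices."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / math.sqrt(k.shape[-1])   # (n, n): one weight per position pair
    return F.softmax(scores, dim=-1) @ v        # weighted sum over the whole context

n, d = 7, 512
x = torch.randn(n, d)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)      # (7, 512)
```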
Slide 8: Depthwise convolutions
• Convolve each channel independently instead of mixing all d channels.
• This reduces the number of kernel weights from d²·k to d·k.
• (Figure: normal vs. depthwise convolutions over the example sentence "we have to go to Tokyo tonight".)
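To make the parameter saving concrete, a minimal PyTorch sketch (my illustration, not the deck's code) contrasting a standard 1-D convolution with a depthwise one (groups=d):

```python
# Standard vs. depthwise 1-D convolution over a sequence of word embeddings.
import torch
import torch.nn as nn

d, k, n = 512, 3, 7          # embedding dim, kernel width, sequence length
x = torch.randn(1, d, n)     # (batch, channels, time), e.g. "we have to go to Tokyo tonight"

normal = nn.Conv1d(d, d, k, padding=k // 2)               # mixes all channels: d*d*k weights
depthwise = nn.Conv1d(d, d, k, padding=k // 2, groups=d)  # one filter per channel: d*k weights

print(sum(p.numel() for p in normal.parameters()))     # 786944 (d*d*k + d bias terms)
print(sum(p.numel() for p in depthwise.parameters()))  # 2048   (d*k + d bias terms)
print(normal(x).shape, depthwise(x).shape)             # both (1, 512, 7)
```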
Slide 9: Lightweight convolutions
• A depthwise convolution whose weights are shared across groups of channels (heads), shrinking the kernel further to H·k weights.
• Kernel weights are normalized with a softmax over the kernel width.
• (Figure: lightweight convolutions over the example sentence "we have to go to Tokyo tonight".)
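A minimal sketch of the lightweight convolution, assuming PyTorch; the function name and shapes are my own, not the fairseq implementation:

```python
# Lightweight convolution: depthwise conv with head-shared, softmax-normalized kernels.
import torch
import torch.nn.functional as F

def lightweight_conv(x, weight):
    """x: (batch, d, n); weight: (H, k) raw kernel weights, H divides d."""
    B, d, n = x.shape
    H, k = weight.shape
    w = F.softmax(weight, dim=-1)              # normalize each head's kernel over width k
    w = w.repeat_interleave(d // H, dim=0)     # share each head across d/H channels -> (d, k)
    w = w.unsqueeze(1)                         # (d, 1, k): depthwise filter bank
    return F.conv1d(x, w, padding=k // 2, groups=d)

x = torch.randn(2, 512, 7)
weight = torch.randn(16, 3)                    # H=16 heads, kernel width k=3
print(lightweight_conv(x, weight).shape)       # (2, 512, 7)
```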
Slide 10: Dynamic convolutions
• The kernel is not fixed: at every timestep it is predicted from the current input by a learned linear function.
• Like self-attention, the weights vary with the input; unlike self-attention, they depend only on the current position, not on the entire context.
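A minimal sketch of a dynamic convolution under the same assumptions; the DynamicConv class and kernel_proj name here are illustrative, not the paper's released code:

```python
# Dynamic convolution: the kernel at each timestep is predicted from that
# timestep's input alone, then applied as a softmax-normalized windowed sum.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, d, H, k):
        super().__init__()
        self.H, self.k = H, k
        self.kernel_proj = nn.Linear(d, H * k)   # one kernel per head per timestep

    def forward(self, x):                        # x: (batch, n, d)
        B, n, d = x.shape
        H, k = self.H, self.k
        w = self.kernel_proj(x).view(B, n, H, k)
        w = F.softmax(w, dim=-1)                 # normalize over kernel width
        pad = k // 2
        xp = F.pad(x, (0, 0, pad, pad))                    # pad the time axis
        windows = xp.unfold(1, k, 1)                       # (B, n, d, k) sliding windows
        windows = windows.view(B, n, H, d // H, k)         # split channels into heads
        out = torch.einsum('bnhck,bnhk->bnhc', windows, w) # weighted sum over each window
        return out.reshape(B, n, d)

x = torch.randn(2, 7, 512)
print(DynamicConv(512, 16, 3)(x).shape)          # (2, 7, 512)
```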
Slide 11: Model
• Encoder-decoder architecture.
• The Transformer's self-attention modules are replaced with Lightweight Conv / Dynamic Conv; the rest of the architecture is kept as in the Transformer.
Slide 12: (architecture figure)
Slide 13: Results (machine translation)
• On WMT En-De and En-Fr, outperforms the self-attention model of Vaswani et al. (2017) and sets a new state of the art.
• Also evaluated on WMT Zh-En.
Slide 14: Analysis (ablations)
• Compared against a plain convolutional baseline (CNN, k=3).
• Widening the kernel improves accuracy.
• Softmax normalization of the kernel weights contributes to accuracy.
Slide 15: Results (language modeling)
• Performance comparable to self-attention.
Slide 16: Results (summarization)
• Abstractive summarization with a sequence-to-sequence model; outperforms the self-attention baseline.
• Compared against task-specific systems such as Bottom-Up summarization and Celikyilmaz et al. (2018); LightConv and DynamicConv are competitive without task-specific extensions.
Slide 17: Conclusion
• Proposed replacing self-attention with lightweight and dynamic convolutions.
• Achieves state-of-the-art machine translation results and strong performance on other tasks.
• The proposed modules are simpler and faster than self-attention, suggesting full-context content-based attention is not essential.