Slide 6
Tracing the efforts to improve the Transformer's baseline performance
• ELU [Clevert+, ICLR 2016]
• GeLU [Hendrycks+Gimpel, 2016]
• Swish [Ramachandran+, ICLR WS 2018]
• SELU [Klambauer+, NIPS 2017]
• GLU [Dauphin+, ICML 2017]
• RMSNorm [Zhang+Sennrich, NeurIPS 2019]
• ReZero [Bachlechner+, 2020]
• Fixup [Zhang+, ICLR 2019]
• Adaptive Softmax [Grave+, ICML 2017]
• Mixture of Softmaxes [Yang+, ICLR 2018]
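
For orientation, several of the components listed above can be written down in a few lines. The following is a minimal scalar sketch in plain Python, using the commonly cited textbook forms of each function rather than any particular paper's or library's implementation. The function names, the default values (alpha, beta, eps), and the list-based interface are illustrative assumptions; Fixup, Adaptive Softmax, and Mixture of Softmaxes are omitted because they do not reduce to a one-line formula.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, saturating exponential otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def gelu(x):
    # GELU (exact form): x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 is an assumed default.
    return x * sigmoid(beta * x)

# Commonly cited SELU constants (assumed here to the precision shown).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # SELU: a scaled ELU with fixed alpha and scale.
    return SELU_SCALE * elu(x, SELU_ALPHA)

def glu(a, b):
    # GLU: one half of the projection gates the other via a sigmoid.
    return a * sigmoid(b)

def rms_norm(xs, gain=None, eps=1e-8):
    # RMS-based normalization: divide by the root mean square, no mean-centering.
    rms = math.sqrt(sum(v * v for v in xs) / len(xs) + eps)
    if gain is None:
        gain = [1.0] * len(xs)
    return [g * v / rms for g, v in zip(gain, xs)]

def rezero_residual(x, fx, alpha=0.0):
    # ReZero: residual branch scaled by a learnable scalar initialised at 0.
    return x + alpha * fx
```

With alpha left at its initial value of 0, `rezero_residual` returns its input unchanged, which is the property ReZero relies on at the start of training.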