SOTA attention
[Architecture diagram: FeedForward, Encoder, and Decoder blocks (Conformer) (Vision Transformer), refs 1, 2]
1: Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems 30 (2017).
2:
Attention over inputs $z_1, \dots, z_N$ with query $q$:
Relevance scores: $r_i = r(z_i, q)$
Attention weights: $(a_1, \dots, a_N) = \mathrm{softmax}(r_1, \dots, r_N)$
Output: $F = \sum_i a_i z_i$
Softmax: $a_i = e^{r_i} / \sum_j e^{r_j}$
Properties: $\sum_i a_i = 1$, $a_i \ge 0$, and $r_i \ge r_j \Rightarrow a_i \ge a_j$.
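A minimal NumPy sketch of these equations, assuming the inputs $z_i$ are rows of a matrix and taking the relevance score $r(z_i, q)$ to be a dot product (the slide leaves the scoring function abstract, so that choice is an assumption):

```python
import numpy as np

def attention(z, q):
    """Attend over N input vectors z[i] given a query q.

    z: (N, d) array of inputs, q: (d,) query vector.
    Relevance is taken as a dot product, r_i = z_i . q (an assumption;
    the scoring function r(z_i, q) is not specified in the slide).
    """
    r = z @ q                          # relevance scores r_i = r(z_i, q)
    r = r - r.max()                    # shift for numerical stability
    a = np.exp(r) / np.exp(r).sum()    # softmax: a_i = e^{r_i} / sum_j e^{r_j}
    F = a @ z                          # output: F = sum_i a_i z_i
    return F, a

# Example: 4 input vectors of dimension 3
z = np.random.randn(4, 3)
q = np.random.randn(3)
F, a = attention(z, q)
assert np.isclose(a.sum(), 1.0) and (a >= 0).all()  # weights form a distribution
```

The assertion checks the listed properties: the weights are non-negative and sum to 1, so $F$ is a convex combination of the inputs, and larger relevance scores receive larger weights because the softmax is monotone.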