Slide 28
Conclusion
• Linformer: an efficient variant of the Transformer
• Linformer projects the keys and values along the sequence axis from length N down to k,
which reduces the complexity of self-attention from O(N²) to O(Nk)
• When k is fixed and much smaller than N, the complexity becomes linear in N, i.e. O(N)
• Experiments show that Linformer performs well even with k = 128,
which is smaller than 512, BERT's default maximum sequence length
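The projection idea summarized above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the Linformer reference implementation: the function names are hypothetical, and the projection matrices E and F (shape k × N) are random stand-ins for the learned projections. The key point is that the score matrix shrinks from N × N to N × k.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # Q, K, V: (N, d). Score matrix is (N, N), so cost grows as O(N^2)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def linformer_attention(Q, K, V, E, F):
    # E, F: (k, N) projections along the sequence axis (hypothetical random
    # stand-ins here; in Linformer they are learned parameters)
    d = Q.shape[-1]
    K_proj = E @ K                      # (k, d)
    V_proj = F @ V                      # (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)  # (N, k): linear in N for fixed k
    return softmax(scores) @ V_proj     # (N, d)

rng = np.random.default_rng(0)
N, d, k = 512, 64, 128                  # sequence length, head dim, projected length
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, d))
E = rng.standard_normal((k, N)) / np.sqrt(N)
F = rng.standard_normal((k, N)) / np.sqrt(N)

out = linformer_attention(Q, K, V, E, F)
print(out.shape)       # (512, 64)
print(N * N, N * k)    # 262144 65536 attention-score entries
```

As a sanity check on the construction, setting k = N and E = F = identity makes `linformer_attention` reproduce `standard_attention` exactly; the savings only appear once k is chosen well below N.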
28 / 29