Jimmy T.H. Smith, Andrew Warrington, Scott W. Linderman (Stanford University), "Simplified State Space Layers for Sequence Modeling," in ICLR (2023)
Presenter: Yuiga Wada (D1), Komei Sugiura Laboratory, Keio University
• Weaknesses
  • Because its state size is small, S5 falls behind S4 on some tasks.
• Comments
  • In the end, Mamba keeps the SISO structure and still runs a fast parallel scan, which makes me appreciate how great Tri Dao is. Algorithms solve everything!
  > Our proposed S6 shares the scan, but differs by (i) keeping the SISO dimensions, which provides a larger effective recurrent state, (ii) using a hardware-aware algorithm to overcome the computation issue, (iii) adding the selection mechanism.
  Quoted from: Mamba [Gu+, 24]
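The "scan" that S5 and Mamba share is a parallel prefix scan over the linear recurrence x_k = Ā x_{k-1} + B̄ u_k. The sketch below is a minimal illustration, assuming a diagonal Ā (as S5 obtains by diagonalization) and a zero initial state. It uses the simple Hillis-Steele scan in pure Python, not the work-efficient variant or Mamba's hardware-aware kernel, and all names (`combine`, `parallel_scan`, `sequential`, the example `a_bar` and `bu`) are hypothetical.

```python
from typing import List, Tuple

# (diagonal of A-bar, accumulated state contribution) for one span of time steps
Elem = Tuple[List[complex], List[complex]]


def combine(earlier: Elem, later: Elem) -> Elem:
    """Associative operator for x_k = a_k * x_{k-1} + b_k with diagonal A.

    Applying `earlier` then `later` to x gives a_j * (a_i * x + b_i) + b_j.
    """
    a_i, b_i = earlier
    a_j, b_j = later
    a = [aj * ai for ai, aj in zip(a_i, a_j)]
    b = [aj * bi + bj for bi, aj, bj in zip(b_i, a_j, b_j)]
    return a, b


def parallel_scan(elems: List[Elem]) -> List[Elem]:
    """Hillis-Steele inclusive scan: O(log L) rounds.

    Each round reads only the previous round's values, so all updates
    within a round are independent and could run in parallel.
    """
    out = list(elems)
    offset = 1
    while offset < len(out):
        out = [out[i] if i < offset else combine(out[i - offset], out[i])
               for i in range(len(out))]
        offset *= 2
    return out


def sequential(a_bar: List[complex], bu: List[List[complex]]) -> List[List[complex]]:
    """Reference O(L) recurrence, starting from x_0 = 0."""
    x = [0j] * len(a_bar)
    xs = []
    for b in bu:
        x = [a * xi + bi for a, xi, bi in zip(a_bar, x, b)]
        xs.append(x)
    return xs


# Hypothetical example: 2-dimensional diagonal state, sequence length 8.
a_bar = [0.9 + 0j, 0.5 + 0.2j]                              # diagonal of discretized A
bu = [[complex(k), complex(-k, 1)] for k in range(1, 9)]    # B-bar u_k per step
states = [b for _, b in parallel_scan([(a_bar, v) for v in bu])]
ref = sequential(a_bar, bu)
```

Because `combine` is associative, the scan returns the same states as the step-by-step recurrence, but in a logarithmic number of rounds. This is what makes the recurrence trainable in parallel when Ā does not depend on the input.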