Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, et al.
Presenter: Hayato Tsukagoshi (D1, Graduate School of Informatics, Nagoya University, Japan)
Prior work: Structured State Space Sequence model (S4)
• It has no mechanism like Attention's QKV, so its expressive power is comparatively weak
Gu+: Efficiently Modeling Long Sequences with Structured State Spaces. ICLR 2022 (outstanding paper).
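To make the contrast concrete, here is a minimal sketch (my own illustration, not code from S4 or the Hyena paper): a convolution/SSM-style layer mixes tokens with weights that are fixed after training, whereas attention recomputes its mixing matrix from the input via Q and K.

```python
# Illustration only: fixed (input-independent) mixing vs. QKV-style mixing.
import torch

N, d = 8, 4
x = torch.randn(N, d)

# Convolution/SSM-style mixing: the kernel does not depend on the input x.
kernel = torch.randn(N)                       # fixed (learned) filter taps
conv_mix = torch.stack([
    sum(kernel[j] * x[t - j] for j in range(t + 1)) for t in range(N)
])                                            # causal convolution over the sequence

# Attention-style mixing: the mixing matrix A is a function of the input.
Wq, Wk = torch.randn(d, d), torch.randn(d, d)
A = torch.softmax((x @ Wq) @ (x @ Wk).T / d**0.5, dim=-1)
attn_mix = A @ x                              # mixing weights change with every input
```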
Prior work: Hungry Hungry Hippos (H3)
• In the hybrid model, inference is still held back by the Attention layers
Fu+: Hungry Hungry Hippos: Towards Language Modeling with State Space Models. ICLR 2023 (spotlight).
Aside: the QKV computation in Linear Attention
• Compute Q · (KV) instead of (QK) · V
• Attention: (QK)V costs O(N²d); Linear Attention: Q(KV) costs O(Nd²) (N: sequence length, d: head dimension)
• Much cheaper to compute!
Shen+: Efficient Attention: Attention with Linear Complexities. WACV 2021.
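As a concrete illustration of the cost difference, here is a minimal sketch (softmax is omitted so the matrix product is exactly associative; the sizes and variable names are mine, not from the paper):

```python
# Contrast the two association orders of the attention product.
import torch

N, d = 512, 64
Q, K, V = torch.randn(N, d), torch.randn(N, d), torch.randn(N, d)

# Standard order: (QK^T)V materializes an N x N matrix -> O(N^2 d) time/memory.
out_quadratic = (Q @ K.T) @ V          # N x N intermediate

# Linear-attention order: Q(K^T V) only needs a d x d intermediate -> O(N d^2).
out_linear = Q @ (K.T @ V)             # d x d intermediate

# Without softmax the two orders give identical results (up to float error).
print(torch.allclose(out_quadratic, out_linear, rtol=1e-3, atol=1e-3))
```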
Hyena: convolution filters
• The filter f = [h0, h1, h2, …, hN] is generated on the fly each time, matched to the input sequence (length)
Multi-scale Retention — Sun+: Retentive Network: A Successor to Transformer for Large Language Models. arXiv 2023.
RoPE — Su+: RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv 2021.
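A minimal sketch of this idea, assuming only the recipe stated on the slide (filter values produced by a small FFN from positional features, regenerated per input length); the names FilterFFN and positional_features are illustrative, not from the Hyena codebase:

```python
# Implicit filter parameterization: h_t = FFN(positional features of t).
import math
import torch
import torch.nn as nn

class FilterFFN(nn.Module):
    """Maps a positional encoding of position t to a filter value h_t."""
    def __init__(self, pos_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.pos_dim = pos_dim
        self.ffn = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def positional_features(self, n: int) -> torch.Tensor:
        # Sinusoidal features of the normalized positions t/n, shape (n, pos_dim).
        t = torch.linspace(0, 1, n).unsqueeze(-1)                 # (n, 1)
        freqs = 2 ** torch.arange(self.pos_dim // 2)              # (pos_dim/2,)
        angles = 2 * math.pi * t * freqs                          # (n, pos_dim/2)
        return torch.cat([angles.sin(), angles.cos()], dim=-1)    # (n, pos_dim)

    def forward(self, seq_len: int) -> torch.Tensor:
        # Regenerate the filter with exactly `seq_len` taps on every call.
        return self.ffn(self.positional_features(seq_len)).squeeze(-1)  # (seq_len,)

filt = FilterFFN()
print(filt(128).shape, filt(1024).shape)  # same module, filters of any length
```

Because the filter is a function of position rather than a length-N parameter tensor, the same module can produce a filter for any sequence length, which is what lets it adapt to the input on the fly.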