
SEA Model series Op.1: Saint Lupinus pre-release

Rikka Botan
September 07, 2025

A novel architecture and framework to bridge the gap between Attention mechanism and SSMs: State Space Models

Transcript

  1. Introduction

    ❖Abstract
    In recent years, new time-series modeling methods have been proposed to overcome the inefficiency of time-series models. However, various downstream tasks have revealed that these methods cannot completely replace the Attention mechanism. To address this situation, we propose a novel architecture and framework to bridge the gap between the Attention mechanism and SSMs (State Space Models). Our methods, SEA Duality IRIS and DGF (Discrete Generic Filter), aim to integrate restricted states and interconnected states more efficiently. They also aim to provide a framework that can incorporate recent time-series modeling methods. This document is a pre-release and therefore does not include detailed data or modeling code. These are coming soon.

    ❖Contents: Model overview; Main concepts; Pseudo code
  2. ❖Structural Overview: SEA Model Series Op.1: Saint Lupinus

    [Figure: architecture diagram. Recoverable labels: input tokens T1 to T7; Embedding; Decoder Block (×N) with Normalization layers, Selective Efficient Adaptation, Selective Synthetic Reweighting Module, and Liquid Convolution Module, with an "or" marker between alternative modules; Embedding to vocab; output tokens T’1 to T’7.]

    ◆Main Components
    SSRM: Selective Synthesis Reweighting Module
    SLC: Substitution Liquid Convolution
    SEA Duality IRIS: Selective Efficient Adaptation by Duality to Integrate Restricted and Interconnected States
  3. ❖Concept 1: Kernel fused liquid module (Saint Lupinus Model Main Concepts)

    DeepSeek Mixture of Experts implementation: a lot of lines! (Cited from https://github.com/huggingface/transformers/blob/main/src/transformers/models/deepseek_v3/modeling_deepseek_v3.py)
    Saint Lupinus Mixture of Experts implementation (toward SSRM): only 1 line!
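  4. ❖Concept 1: Kernel fused liquid module

    Kernel Fused Mixture of Experts: mathematical explanation

    $$h = \sum_{i=1}^{N} g_i \, FFN_i(u) = \sum_{i=1}^{N} \sigma(g_i) \cdot W_i \cdot u = \sigma([g_1, \dots, g_{N-1}, g_N]) \cdot [W_1, \dots, W_{N-1}, W_N] \cdot u$$

    If $h, u \in \mathbb{R}^{B \times L \times H}$, $g_i \in \mathbb{R}^{B \times L \times 1}$, $W_i \in \mathbb{R}^{H \times H}$, then

    $$h = \mathrm{Einsum}(u, \sigma([g_1, \dots, g_{N-1}, g_N]), [W_1, \dots, W_{N-1}, W_N])$$

    If $g_G \in \mathbb{R}^{B \times L \times G}$ and $W_G \in \mathbb{R}^{G \times H \times H}$ are given, then

    $$h = \mathrm{Einsum}(u, \sigma(g_G), W_G)$$

    Q.E.D.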
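The equivalence between the per-expert sum and the single Einsum can be checked numerically. The following is a minimal sketch in NumPy, not the author's implementation: the function names `moe_loop` and `moe_fused` are hypothetical, and σ is assumed to be a sigmoid because the slides do not define it.

```python
import numpy as np


def sigmoid(z):
    # Assumption: the slides do not define sigma; a sigmoid gate is used here.
    return 1.0 / (1.0 + np.exp(-z))


def moe_loop(u, g, W):
    """Reference form: h = sum_i sigma(g_i) * (W_i . u), one expert at a time.

    u: (B, L, H) inputs, g: (B, L, G) gate logits, W: (G, H, H) expert weights.
    """
    h = np.zeros_like(u)
    for i in range(W.shape[0]):
        # (u @ W[i].T)[b, l, k] = sum_h W[i][k, h] * u[b, l, h]
        h += sigmoid(g[..., i:i + 1]) * (u @ W[i].T)
    return h


def moe_fused(u, g, W):
    """Fused form: the same sum written as one einsum, with no loop or branch."""
    return np.einsum("blh,blg,gkh->blk", u, sigmoid(g), W)


rng = np.random.default_rng(0)
u = rng.standard_normal((2, 3, 4))   # B=2, L=3, H=4
g = rng.standard_normal((2, 3, 5))   # G=5 experts
W = rng.standard_normal((5, 4, 4))
print(np.allclose(moe_loop(u, g, W), moe_fused(u, g, W)))
```

The fused form evaluates every expert for every token, so it trades the sparsity of routed Mixture of Experts for a static graph made of a single contraction.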
  5. ❖Concept 1: Kernel fused liquid module

    Kernel Fused Mixture of Experts: advantages
    1. Efficient operations through matrix multiplication. General GPUs are designed to perform matrix multiplication faster than other operations.
    2. A static operational graph, obtained by avoiding "for" and "if". With a dynamic operational graph, kernel operations are difficult to optimize.
    3. Optimizations within the library are available. Machine learning libraries, such as PyTorch, are highly optimized internally.
  6. ❖Concept 2: Bridge the gap between Attention and State Space Models

    S4 (Structured State Space Sequence Models), discrete form, and SEA Duality IRIS, discrete form (DGF: Discrete Generic Filter)

    $$y = S4_{parametric}(\delta, W_{pos}, x)$$

    ($S4_{parametric}$: a Linear Input Varying System)

    Then $S4_{parametric}$ is given as follows, provided that the delta has convexity:

    $$S4_{parametric} = x * \bar{K} = \sum W_{pos} \, \delta \, x = \sum W_{pos} \, \mathrm{softplus}(W_d x) \, x = \sum \bar{A} \bar{B} x \approx S4$$
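The discrete form above can be illustrated with a small NumPy sketch. This is one possible reading, not the author's code: the slides do not state the summation index, so the sum is assumed here to be a causal convolution over sequence positions in which `W_pos[k]` weights the input at lag `k`. The function name `dgf` and the shapes of `W_d` and `W_pos` are likewise assumptions.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def dgf(x, W_d, W_pos):
    """Hypothetical DGF sketch: y_t = sum_{s <= t} W_pos[t - s] * delta_s * x_s.

    x: (L, H) sequence, W_d: (H, H) projection, W_pos: (L,) weights per lag.
    The causal sum over positions is an assumption, not taken from the slides.
    """
    L = x.shape[0]
    delta = softplus(x @ W_d.T)          # input-dependent step, delta = softplus(W_d x)
    v = delta * x                        # elementwise delta * x
    lag = np.arange(L)[:, None] - np.arange(L)[None, :]
    # Lower-triangular Toeplitz kernel: kernel[t, s] = W_pos[t - s] for s <= t.
    kernel = np.where(lag >= 0, W_pos[np.clip(lag, 0, None)], 0.0)
    return kernel @ v


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
y = dgf(x, rng.standard_normal((4, 4)), rng.standard_normal(6))
print(y.shape)
```

Because delta is computed from the input itself, the effective kernel changes with the input, which matches the description of a Linear Input Varying System.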
  7. ❖Concept 2: Bridge the gap between Attention and State Space Models

    SEA Duality IRIS: from the discrete form to Any Selective Mechanism (ASM)

    $$\Delta R, \Delta S, \Delta I = S4_{parametric}(\delta, W_{pos}, R, S, I)$$

    $$\Delta O = ASM(\Delta R, \Delta S, \Delta I, x)$$

    Then $ASM(S4_{parametric})$ is equivalent to S6 (SSM + Selection) under certain conditions: "ROC: Repeated Operational-Cycles". Therefore, SEA Duality IRIS has been shown to have a theoretical complexity similar to that of S6 only in cases where it is bridged with an ASM.
  8. ❖Concept 3: Complexity consisting of the Base Space and the Excitation Space

    Duality between Base Space and Excitation Space, like electron orbitals

    [Figure labels: state-to-state abstract concept conversion; internal relationship extraction]

    Duality IRIS: detailed description

    $$\Delta R, \Delta S, \Delta I = S4_{parametric}(\delta, W_{pos}, R, S, I)$$

    Here, $\Delta R$, $\Delta S$, and $\Delta I$ represent the combined state of multiple tokens. Dividing the representation into separate spaces, over multiple tokens and over a single token, improves expressiveness and efficiency.