years to overcome the inefficiency of time-series models. However, it has been shown on various downstream tasks that these methods cannot completely replace the Attention mechanism. To address this, we propose a novel architecture and framework that bridges the gap between the Attention mechanism and SSMs (State Space Models). Our methods, SEA Duality IRIS and DGF (Discrete Generic Filter), aim to integrate restricted states and interconnected states more efficiently. They also aim to provide a framework that can be combined with recent time-series modeling methods. This document is a pre-release and therefore does not include detailed data or modeling code; these are coming soon.

❖ Contents
Model overview
Main concepts
Pseudo code
Concepts
DeepSeek Mixture of Experts implementation: a lot of lines! (Cited from https://github.com/huggingface/transformers/blob/main/src/transformers/models/deepseek_v3/modeling_deepseek_v3.py)
Saint Lupinus Mixture of Experts implementation (toward SSRM): only 1 line!
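The following minimal NumPy sketch illustrates why a fused formulation can be so short. It is neither the DeepSeek code nor the Saint Lupinus implementation. It compares a per-expert gather/scatter loop with a single `einsum`, assuming linear experts `w[e]` and a dense gate matrix `gates` that is zero for unselected experts. `moe_loop` and `moe_fused` are hypothetical names.

```python
import numpy as np

def moe_loop(x, w, gates):
    """Per-expert dispatch: gather routed tokens, apply the expert, scatter back.

    x: (T, D) tokens, w: (E, D, H) linear experts (assumed), gates: (T, E)
    with zeros for experts a token was not routed to.
    """
    out = np.zeros((x.shape[0], w.shape[2]))
    for e in range(w.shape[0]):
        idx = np.nonzero(gates[:, e])[0]          # tokens routed to expert e
        if idx.size == 0:
            continue
        out[idx] += gates[idx, e][:, None] * (x[idx] @ w[e])
    return out

def moe_fused(x, w, gates):
    # Every expert on every token, weighted by its gate: one einsum, no loop or branch.
    return np.einsum("te,td,edh->th", gates, x, w)

# Usage: random tokens and experts, top-2 gates stored densely as a (T, E) matrix.
rng = np.random.default_rng(0)
T, D, H, E, K = 8, 6, 5, 4, 2
x = rng.standard_normal((T, D))
w = rng.standard_normal((E, D, H))
scores = rng.random((T, E))
top = np.argsort(scores, axis=-1)[:, -K:]
gates = np.zeros((T, E))
np.put_along_axis(gates, top, np.take_along_axis(scores, top, axis=-1), axis=-1)

y_loop = moe_loop(x, w, gates)
y_fused = moe_fused(x, w, gates)
```

The fused version evaluates every expert for every token and relies on zero gates to discard unused results. It spends about E/k times the expert FLOPs of a sparse top-k dispatch, and in exchange the computation is fixed and loop-free.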
Kernel-fused Mixture of Experts: advantages
1. Efficient operations through matrix multiplication. General-purpose GPUs are designed to perform matrix multiplication faster than other operations.
2. A static computational graph, achieved by avoiding “for” loops and “if” branches. Kernel operations are difficult to optimize in a dynamic computational graph.
3. Access to in-library optimizations. Machine learning libraries such as PyTorch are already highly optimized internally.
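The static-graph point can be illustrated with the router. The minimal NumPy sketch below assumes softmax routing with top-k selection, renormalized over the chosen experts. That is a common choice assumed here, not one stated in the source. `static_topk_gates` builds the selection mask with `argsort` and `put_along_axis`, so every token follows the same sequence of array operations. `loop_topk_gates` is a reference version with per-token “for” and “if”. Both names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the expert axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def static_topk_gates(logits, k):
    """Dense (T, E) top-k gate matrix built from whole-array operations only."""
    probs = softmax(logits)
    top = np.argsort(probs, axis=-1)[:, -k:]          # k largest per token
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, top, 1.0, axis=-1)         # 1.0 at selected experts
    gated = probs * mask
    return gated / gated.sum(axis=-1, keepdims=True)   # renormalize over the top-k

def loop_topk_gates(logits, k):
    # Reference version with per-token "for" and "if" (dynamic control flow).
    probs = softmax(logits)
    out = np.zeros_like(probs)
    for t in range(probs.shape[0]):
        chosen = sorted(range(probs.shape[1]), key=lambda e: probs[t, e])[-k:]
        total = sum(probs[t, e] for e in chosen)
        for e in range(probs.shape[1]):
            if e in chosen:
                out[t, e] = probs[t, e] / total
    return out

rng = np.random.default_rng(1)
logits = rng.standard_normal((6, 8))
g_static = static_topk_gates(logits, k=2)
g_loop = loop_topk_gates(logits, k=2)
```

Both functions return the same gates. Only the first has a shape of computation that does not depend on which experts were chosen, and that independence is what allows a framework to fuse the whole step into fixed kernels.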
Models
Saint Lupinus Model: Main Concepts
S4 (Structured State Space Sequence Models): discrete form
SEA Duality IRIS: discrete form (DGF: Discrete Generic Filter)
$$y = \mathrm{S4}_{\text{parametric}}(\delta, W_{\text{pos}}, x)$$
where $\mathrm{S4}_{\text{parametric}}$ is a linear input-varying system. It is then given as follows (provided that $\delta$ is convex):
$$\mathrm{S4}_{\text{parametric}} = x * \bar{K} = W_{\text{pos}}\,\delta\,x = W_{\text{pos}}\,\mathrm{softplus}(W_d x)\,x = \bar{A}\bar{B}x \approx \mathrm{S4}$$
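The S4 side of this relation can be checked numerically. Take a diagonal state matrix discretized with zero-order hold. The recurrence $h_k = \bar{A}h_{k-1} + \bar{B}x_k$, $y_k = Ch_k$ then equals the causal convolution $y = x * \bar{K}$ with $\bar{K}_j = C\bar{A}^j\bar{B}$, and that convolution is a multiplication by a lower-triangular Toeplitz matrix.

The DGF part is a hypothetical reading, not the authors' code. It assumes the following:
- $W_{\text{pos}}$ is an $L \times L$ position-mixing matrix.
- $W_d$ is an $L \times L$ matrix.
- $\delta x$ is an elementwise product.

Under that reading, choosing $W_{\text{pos}}$ as the Toeplitz matrix of $\bar{K}$ recovers S4 exactly. The matrix is rescaled to cancel the constant $\delta = \mathrm{softplus}(0) = \ln 2$ obtained when $W_d = 0$.

```python
import numpy as np

def softplus(z):
    # log(1 + exp(z)), computed stably
    return np.logaddexp(0.0, z)

def discretize_zoh(a, b, step):
    # Zero-order hold for a diagonal SSM: A_bar = exp(step*A), B_bar = (A_bar - 1)/A * B
    a_bar = np.exp(step * a)
    return a_bar, (a_bar - 1.0) / a * b

def s4_recurrent(x, a_bar, b_bar, c):
    # h_k = A_bar h_{k-1} + B_bar x_k,  y_k = C h_k
    h = np.zeros_like(a_bar)
    y = np.empty(len(x))
    for k, xk in enumerate(x):
        h = a_bar * h + b_bar * xk
        y[k] = c @ h
    return y

def s4_kernel(a_bar, b_bar, c, length):
    # K_bar[j] = C A_bar^j B_bar for j = 0 .. length-1 (diagonal A_bar)
    powers = a_bar[None, :] ** np.arange(length)[:, None]   # (length, N)
    return powers @ (c * b_bar)

def causal_toeplitz(kernel):
    # T[i, j] = K[i - j] for i >= j, else 0, so T @ x is the causal convolution x * K
    n = len(kernel)
    lag = np.arange(n)[:, None] - np.arange(n)[None, :]
    return np.where(lag >= 0, kernel[np.clip(lag, 0, None)], 0.0)

def s4_parametric(x, w_pos, w_d):
    # Hypothetical reading of the DGF form: delta = softplus(W_d x), y = W_pos (delta * x)
    delta = softplus(w_d @ x)
    return w_pos @ (delta * x)

rng = np.random.default_rng(0)
N, L = 4, 16
a = -np.linspace(0.5, 2.0, N)                # stable diagonal A
b, c = rng.standard_normal(N), rng.standard_normal(N)
x = rng.standard_normal(L)

a_bar, b_bar = discretize_zoh(a, b, step=0.1)
k_bar = s4_kernel(a_bar, b_bar, c, L)
y_rec = s4_recurrent(x, a_bar, b_bar, c)
y_conv = np.convolve(x, k_bar)[:L]            # y = x * K_bar (causal part)
T = causal_toeplitz(k_bar)
# With W_d = 0, delta = softplus(0) = ln 2 everywhere; rescale W_pos to cancel it.
y_dgf = s4_parametric(x, T / np.log(2.0), np.zeros((L, L)))
```

With a learned, input-dependent $\delta$ and a free $W_{\text{pos}}$, the same form departs from a fixed convolution. This is one way to read the "$\approx$" in the relation above.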
SEA Duality IRIS: from Discrete form to Any Selective Mechanism
$$\Delta R, \Delta S, \Delta I = \mathrm{S4}_{\text{parametric}}(\delta, W_{\text{pos}}, R, S, I)$$
$$\Delta O = \mathrm{ASM}(\Delta R, \Delta S, \Delta I, x) \qquad \text{(ASM: Any Selective Mechanism)}$$
Then $\mathrm{ASM}(\mathrm{S4}_{\text{parametric}})$ is equivalent to S6 (SSM + Selection) under a certain condition, ROC (Repeated Operational-Cycles). Therefore, SEA Duality IRIS has been shown to have S6-like theoretical complexity, but only when it is bridged with an ASM.
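ASM stands for any selective mechanism, so the sketch below takes it as a pluggable function and instantiates it with one S6-style selective scan. The role mapping is an assumption and does not come from the source:
- softplus($\Delta R$) gives per-state step sizes.
- $\Delta S$ plays the input-dependent $B_t$.
- $\Delta I$ plays $C_t$.
- $x$ is the scanned input.

The scan uses the simplified discretization $\bar{B}_t = \text{step}_t\,B_t$. The inputs `R`, `S`, `I`, the causal `w_pos`, and all function names are hypothetical. The ROC condition is not modeled.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def s4_parametric(delta, w_pos, u):
    # Hypothetical: positional mixing of delta-scaled rows; u has shape (L, N)
    return w_pos @ (delta[:, None] * u)

def selective_scan(d_r, d_s, d_i, x, a):
    """One possible ASM: an S6-style selective scan with an assumed role mapping.

    step_t = softplus(dR_t), B_t = dS_t, C_t = dI_t (each shape (N,) per step),
    h_t = exp(step_t * a) * h_{t-1} + step_t * B_t * x_t,  o_t = C_t . h_t
    """
    step = softplus(d_r)
    h = np.zeros(a.shape[0])
    out = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        h = np.exp(step[t] * a) * h + step[t] * d_s[t] * x[t]
        out[t] = d_i[t] @ h
    return out

def sea_duality_iris(x, R, S, I, delta, w_pos, asm, **asm_kwargs):
    # dR, dS, dI = S4_parametric(delta, W_pos, R, S, I);  dO = ASM(dR, dS, dI, x)
    d_r, d_s, d_i = (s4_parametric(delta, w_pos, u) for u in (R, S, I))
    return asm(d_r, d_s, d_i, x, **asm_kwargs)

rng = np.random.default_rng(2)
L, N = 10, 4
x = rng.standard_normal(L)
R, S, I = (rng.standard_normal((L, N)) for _ in range(3))   # hypothetical inputs
delta = softplus(rng.standard_normal(L))
w_pos = np.tril(rng.standard_normal((L, L))) / L             # causal mixing
a = -np.linspace(0.5, 2.0, N)                                 # stable diagonal A

d_o = sea_duality_iris(x, R, S, I, delta, w_pos, selective_scan, a=a)

# Causality check: changing late inputs must not change early outputs.
x_late = x.copy()
x_late[5:] += 1.0
d_o_late = sea_duality_iris(x_late, R, S, I, delta, w_pos, selective_scan, a=a)
```

Passing `asm` as an argument keeps the S4-parametric stage independent of the selective mechanism. A different ASM, for example an attention-style selection, could be swapped in without changing `s4_parametric`.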
Excitation Space
Duality between Base Space and Excitation Space, like electron orbitals: state-to-state abstract concept conversion; internal relationship extraction.
Duality IRIS: detailed description
$$\Delta R, \Delta S, \Delta I = \mathrm{S4}_{\text{parametric}}(\delta, W_{\text{pos}}, R, S, I)$$
Here, $\Delta R$, $\Delta S$, and $\Delta I$ represent the combined state of multiple tokens. Dividing the representation into separate spaces, one over multiple tokens and one over a single token, improves both expressiveness and efficiency.
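One way to picture the split between a single-token space and a multi-token space is the hypothetical sketch below. `single_token_space` projects each token on its own. `multi_token_space` applies causal positional mixing (`np.tril(w_pos)`), so each row combines all tokens up to that position. The function names, the shapes, and the use of concatenation are illustrative assumptions. Perturbing one token shows the difference: only that token's single-token features change, while the multi-token features change at that position and at every later one.

```python
import numpy as np

def single_token_space(x, w):
    # Base space (hypothetical): each row depends only on its own token.
    return x @ w

def multi_token_space(x, w, w_pos):
    # Excitation space (hypothetical): each row is a causal weighted
    # combination of the projected tokens up to and including that position.
    return np.tril(w_pos) @ (x @ w)

def split_spaces(x, w_single, w_multi, w_pos):
    # Concatenate both views along the feature axis.
    return np.concatenate(
        [single_token_space(x, w_single), multi_token_space(x, w_multi, w_pos)], axis=-1
    )

rng = np.random.default_rng(3)
L, D, H = 7, 5, 3
x = rng.standard_normal((L, D))
w_single, w_multi = rng.standard_normal((D, H)), rng.standard_normal((D, H))
w_pos = rng.standard_normal((L, L))
feats = split_spaces(x, w_single, w_multi, w_pos)

# Perturb token 3 only and record which output rows move in each space.
x2 = x.copy()
x2[3] += 1.0
diff = np.abs(split_spaces(x2, w_single, w_multi, w_pos) - feats) > 1e-12
single_changed = diff[:, :H].any(axis=1)
multi_changed = diff[:, H:].any(axis=1)
```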