DecompSSM: A Decomposition-based State Space Model for Multivariate Time-Series Forecasting | ICASSP 2026

Neurogica

May 12, 2026

Transcript

  1. A Decomposition-Based State Space Model for Multivariate Time-Series Forecasting

    Shunya Nagashima, Shuntaro Suzuki, Shuitsu Koyama, Shinnosuke Hirano
    Neurogica Inc., Japan
    @Neurogica Inc.
  2. Introduction: MTS forecasting: predict H future steps for M variables jointly

    ◼ MTS forecasting underlies real-world decision making
      ◼ Energy – power-load planning
      ◼ Weather – temperature & precipitation
      ◼ Finance – asset-price prediction
      ◼ Traffic – city-sensor flow
    Figure: past window (t) and forecast window (H) across # variables.
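
The task statement above (predict H future steps for M variables jointly) can be made concrete with a small windowing sketch. The function name `make_windows` and the toy series are illustrative assumptions, not part of the deck.

```python
def make_windows(series, lookback, horizon):
    """Slice a multivariate series into (past, future) pairs.

    series: list of T time steps, each a list of M variable values.
    Each pair keeps all M variables, so the forecast is joint.
    """
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        past = series[start:start + lookback]
        future = series[start + lookback:start + lookback + horizon]
        pairs.append((past, future))
    return pairs


# Toy series (hypothetical): T = 6 steps, M = 2 variables.
toy = [[t, 10 * t] for t in range(6)]
windows = make_windows(toy, lookback=3, horizon=2)
first_past, first_future = windows[0]
```

Each `past` has `lookback` rows and each `future` has `horizon` rows; both carry one column per variable.
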
  3. Background (1/3): Decomposition makes the timescale structure explicit

    ◼ A real-world series decomposes into three additive components
      ◼ Trend - long-term direction
      ◼ Seasonal - periodic oscillations
      ◼ Residual - short-term shocks, noise
    ◼ Each component lives on its own timescale
    Figure panels: Original, Trend, Seasonal, Residual.
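
As a point of reference for the additive split above, here is a minimal sketch of a classical fixed decomposition (moving-average trend, per-phase seasonal mean, leftover residual). This is the fixed kind of decomposition, not DecompSSM's learned one; the function names, the odd-window choice and the toy series are assumptions made for illustration.

```python
def moving_average(x, window):
    """Centered moving average with edge padding (window must be odd)."""
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(x))]


def decompose(x, period):
    """Split x into trend + seasonal + residual (additive)."""
    # Use an odd window so the average is centered on each point.
    window = period if period % 2 == 1 else period + 1
    trend = moving_average(x, window)
    detrended = [xi - ti for xi, ti in zip(x, trend)]
    # Seasonal component: mean of the detrended values at each phase.
    phase_mean = []
    for p in range(period):
        vals = detrended[p::period]
        phase_mean.append(sum(vals) / len(vals))
    seasonal = [phase_mean[i % period] for i in range(len(x))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


# Toy series (hypothetical): linear drift plus a spike every 4 steps.
series = [0.1 * t + (1.0 if t % 4 == 0 else 0.0) for t in range(24)]
trend, seasonal, residual = decompose(series, period=4)
```

By construction the three components sum back to the input, which is the additive property the slide relies on.
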
  4. Background (2/3): No method is adaptive + specialized + end-to-end at once

    Table columns: Strategy | Adaptive | Specialized | End-to-end
    ◼ (i) Fixed moving average: Autoformer [Wu+, NeurIPS21], FEDformer [Zhou+, ICML22], DLinear [Zeng+, AAAI23], TimeMixer [Wang+, ICLR24]
    ◼ (ii) Latent disentanglement: LaST [Wang+, NeurIPS22], CoST [Woo+, ICLR22]
    ◼ (iii) Pre-processing pipeline: PPDformer [Wan+, ICASSP25] ∆ ∆
    ◼ DecompSSM (Ours): yes | yes | yes
    Figure panels: DecompSSM (Ours), TimeMixer [Wang+, ICLR24], CoST [Woo+, ICLR22], PPDformer [Wan+, ICASSP25].
  5. Background (3/3): Why an SSM, not a Transformer?

    Transformer vs SSM

    |                  | Transformer       | SSM                    |
    |------------------|-------------------|------------------------|
    | Origin           | NLP (2017)        | Control theory (1960s) |
    | Designed for     | Discrete tokens   | Continuous signals     |
    | Compute          | O(T²) (Attention) | O(T) (Recurrence)      |
    | Time-series fit  | Borrowed          | Native                 |

    ◼ MTS forecasting today: dominated by Transformers
      ◼ Informer, Autoformer, FEDformer, PatchTST, iTransformer…
    ◼ SSMs in MTS forecasting: gradually emerging since 2024
      ◼ Mamba family: TimeMachine, S-Mamba
    SSMs are natively suited to time-series
  6. Proposed method: DecompSSM

    A decomposition-based SSM for MTS forecasting
    ◼ GT-SSM: three specialized + adaptive branches → trend / seasonal / residual
    ◼ GCRM: shares cross-variable context across branches
    ◼ ADL: decomposition loss, trained end-to-end
  7. Proposed method (1/3): GT-SSM for trend / seasonal / residual decomposition

    ◼ Each branch = S5 (MIMO, unlike Mamba's SISO): a natural fit for MTS
    ◼ But S5 is not input-dependent (unlike Mamba) → ASP makes Δ input-dependent
    ◼ Per-branch ASP init (slow / mid / fast Δ) → trend / seasonal / residual
  8. Proposed method (1/3): GT-SSM for trend / seasonal / residual decomposition

    ◼ Each branch = S5 (MIMO, unlike Mamba's SISO): a natural fit for MTS
    ◼ But S5 is not input-dependent (unlike Mamba) → ASP makes Δ input-dependent
    ◼ Per-branch ASP init (slow / mid / fast Δ) → trend / seasonal / residual
      ▪ Input-dependent timescale
      ▪ Δ′ controls the S5 timescale
      ▪ Small Δ′ → slow, large Δ′ → fast
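
To illustrate why a small step size gives a slow branch and a large one a fast branch, here is a minimal one-state sketch of a diagonal SSM scanned with zero-order-hold discretization, where the step size is computed from the input. The softplus parameterization and the names `w_delta` and `bias_delta` are assumptions for illustration; the deck does not specify how ASP computes the step size, and the real model uses multi-dimensional S5 states.

```python
import math


def softplus(z):
    """Numerically stable softplus; keeps the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def run_branch(u, lam, b, c, w_delta, bias_delta):
    """Scan a one-state diagonal SSM with an input-dependent step size.

    lam < 0 is the continuous-time pole. w_delta and bias_delta are
    hypothetical parameters that map each input to a step size.
    """
    x, outputs = 0.0, []
    for u_k in u:
        delta = softplus(w_delta * u_k + bias_delta)  # input-dependent step
        a_bar = math.exp(lam * delta)                 # per-step state decay
        b_bar = (a_bar - 1.0) / lam * b               # matching input gain
        x = a_bar * x + b_bar * u_k
        outputs.append(c * x)
    return outputs


impulse = [1.0] + [0.0] * 9
# A low bias gives a small step (slow branch); a high bias a large step (fast branch).
slow = run_branch(impulse, lam=-1.0, b=1.0, c=1.0, w_delta=0.5, bias_delta=-3.0)
fast = run_branch(impulse, lam=-1.0, b=1.0, c=1.0, w_delta=0.5, bias_delta=3.0)
slow_retention = slow[-1] / slow[0]
fast_retention = fast[-1] / fast[0]
```

With a small step the decay factor `exp(lam * delta)` stays close to 1, so the impulse lingers in the state (long memory, suited to trend). With a large step it is forgotten almost immediately (suited to residual). Initializing `bias_delta` differently per branch is one plausible reading of "per-branch ASP init".
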
  9. Proposed method (2/3): GCRM re-aligns variables with a shared global context

    ◼ Per-variable states drift → components get misassigned across branches
    ◼ GCRM: mean over variables → broadcast back as residual correction
      ▪ M : # variables
      ▪ W_g : linear projection
      ▪ learned gate
      ▪ mean_M : average over variables
    \begin{align*} W_g \times \mathrm{mean}_M(H) \end{align*}
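
The "mean over variables → broadcast back as residual correction" step can be sketched as follows. The exact form of the gate is not recoverable from the slide, so the scalar sigmoid gate (`gate_logit`) is an assumption, as are the function name and the toy values. Only the term W_g × mean_M(H) comes from the deck.

```python
import math


def gcrm_sketch(H, W_g, gate_logit):
    """Add a gated, projected cross-variable mean back to every variable.

    H: M per-variable state vectors of size D.
    W_g: D x D linear projection (list of rows).
    gate_logit: hypothetical scalar, squashed to (0, 1) by a sigmoid.
    """
    M, D = len(H), len(H[0])
    # mean_M(H): average the states over the M variables.
    mean_state = [sum(h[d] for h in H) / M for d in range(D)]
    # W_g x mean_M(H): project the shared context.
    context = [sum(W_g[i][d] * mean_state[d] for d in range(D)) for i in range(D)]
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    # Broadcast: every variable receives the same residual correction.
    return [[h[i] + gate * context[i] for i in range(D)] for h in H]


H = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]  # M = 3 variables, D = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
refined = gcrm_sketch(H, identity, gate_logit=0.0)
```

Because the correction is identical for all variables, differences between variables are preserved while every state is pulled toward a common context.
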
  10. Proposed method (3/3): ADL keeps the decomposition meaningful

    ◼ Forecast loss alone doesn't force branches to differ → they may collapse
    ◼ ADL = reconstruction (sum ≈ input) + orthogonality (different directions)
    Trend + Seasonal + Residual ≈ input
    Figure: Reconstruction loss and Orthogonality loss panels; labels Target, Sum of branches, Trend, Seasonal, Residual.
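
A minimal sketch of a loss with the two ingredients named above: a reconstruction term (sum of branches ≈ input) and an orthogonality term (branches point in different directions). The deck does not give the formula, so the mean-squared reconstruction error, the squared-cosine penalty and the `ortho_weight` parameter are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adl_sketch(trend, seasonal, residual, target, ortho_weight=1.0):
    """Return (total, reconstruction, orthogonality) for three branch outputs."""
    n = len(target)
    # Reconstruction: the branches should sum back to the input.
    recon = sum((t + s + r - y) ** 2
                for t, s, r, y in zip(trend, seasonal, residual, target)) / n
    # Orthogonality: penalize squared cosine similarity of each branch pair.
    branches = [trend, seasonal, residual]
    ortho = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            a, b = branches[i], branches[j]
            ortho += dot(a, b) ** 2 / (dot(a, a) * dot(b, b) + 1e-12)
    return recon + ortho_weight * ortho, recon, ortho


# Well-separated branches (hypothetical values, mutually orthogonal).
trend = [1.0, 1.0, 1.0, 1.0]
seasonal = [1.0, -1.0, 1.0, -1.0]
residual = [1.0, 1.0, -1.0, -1.0]
target = [t + s + r for t, s, r in zip(trend, seasonal, residual)]
good_total, good_recon, good_ortho = adl_sketch(trend, seasonal, residual, target)

# Collapsed branches: each one is a third of the target.
third = [y / 3.0 for y in target]
bad_total, bad_recon, bad_ortho = adl_sketch(third, third, third, target)
```

The collapsed case reconstructs the input just as well as the separated case, so only the orthogonality term tells the two apart. This is the failure mode the slide describes.
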
  12. Experiments: Four datasets, seven baselines, standard protocol

    ◼ Datasets

    | Dataset | Domain                              | # Variates (M) | Sampling |
    |---------|-------------------------------------|----------------|----------|
    | ECL     | Electricity load                    | 321            | 1 hour   |
    | Weather | 21 weather variables                | 21             | 10 min   |
    | ETTm2   | Electricity transformer temperature | 7              | 15 min   |
    | PEMS04  | Traffic flow (CA sensors)           | 307            | 5 min    |

    ◼ Baseline methods
      ◼ Transformer-based: Autoformer, PatchTST, iTransformer, PPDformer
      ◼ Linear-based: DLinear, HDMixer
      ◼ TCN-based: TimesNet
  13. Quantitative results: DecompSSM outperforms all 7 baselines on all 4 datasets

    | Dataset | Metric | Autoformer    | DLinear       | TimesNet      | PatchTST      | iTransformer  | HDMixer       | PPDformer     | DecompSSM     |
    |---------|--------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|
    | ECL     | MSE↓   | 0.227 ± 0.022 | 0.231 ± 0.022 | 0.216 ± 0.049 | 0.216 ± 0.028 | 0.176 ± 0.026 | 0.194 ± 0.020 | 0.168 ± 0.025 | 0.167 ± 0.026 |
    | ECL     | MAE↓   | 0.338 ± 0.018 | 0.323 ± 0.022 | 0.311 ± 0.035 | 0.304 ± 0.024 | 0.268 ± 0.025 | 0.294 ± 0.018 | 0.267 ± 0.025 | 0.261 ± 0.025 |
    | Weather | MSE↓   | 0.338 ± 0.066 | 0.266 ± 0.065 | 0.258 ± 0.080 | 0.258 ± 0.076 | 0.261 ± 0.077 | 0.255 ± 0.080 | 0.246 ± 0.080 | 0.242 ± 0.082 |
    | Weather | MAE↓   | 0.382 ± 0.039 | 0.316 ± 0.055 | 0.285 ± 0.055 | 0.280 ± 0.055 | 0.282 ± 0.056 | 0.282 ± 0.057 | 0.276 ± 0.059 | 0.270 ± 0.061 |
    | ETTm2   | MSE↓   | 0.311 ± 0.089 | 0.353 ± 0.149 | 0.298 ± 0.100 | 0.289 ± 0.100 | 0.292 ± 0.097 | 0.286 ± 0.095 | 0.285 ± 0.095 | 0.280 ± 0.098 |
    | ETTm2   | MAE↓   | 0.354 ± 0.052 | 0.401 ± 0.095 | 0.334 ± 0.060 | 0.333 ± 0.061 | 0.336 ± 0.058 | 0.330 ± 0.057 | 0.329 ± 0.058 | 0.320 ± 0.060 |
    | PEMS04  | MSE↓   | 0.610 ± 0.224 | 0.295 ± 0.135 | 0.129 ± 0.045 | 0.195 ± 0.082 | 0.120 ± 0.040 | 0.163 ± 0.072 | 0.104 ± 0.029 | 0.103 ± 0.035 |
    | PEMS04  | MAE↓   | 0.589 ± 0.118 | 0.388 ± 0.103 | 0.241 ± 0.047 | 0.307 ± 0.072 | 0.232 ± 0.041 | 0.281 ± 0.068 | 0.213 ± 0.031 | 0.212 ± 0.040 |

    Lower is better. Bold = best, underline = second-best.
    MSE / MAE on the test set, averaged over the four prediction horizons H ∈ {96, 192, 336, 720} (12/24/48/96 for PEMS04)
  14. Ablation (1/2): Modules

    Effect of removing each component
    GT-SSM is the key module; ADL and GCRM also contribute
    Each row is one prediction horizon H. Lower is better; bold = best, underline = second best.

    | Horizon (H) | (i) w/o GT-SSM MSE | (i) w/o GT-SSM MAE | (ii) w/o ADL MSE | (ii) w/o ADL MAE | (iii) w/o GCRM MSE | (iii) w/o GCRM MAE | DecompSSM MSE | DecompSSM MAE |
    |-------------|--------------------|--------------------|------------------|------------------|--------------------|--------------------|---------------|---------------|
    | 96          | 0.184              | 0.223              | 0.156            | 0.202            | 0.159              | 0.205              | 0.155         | 0.201         |
    | 192         | 0.230              | 0.262              | 0.204            | 0.246            | 0.207              | 0.251              | 0.204         | 0.246         |
    | 336         | 0.285              | 0.301              | 0.269            | 0.293            | 0.271              | 0.297              | 0.264         | 0.291         |
    | 720         | 0.360              | 0.350              | 0.355            | 0.350            | 0.353              | 0.348              | 0.345         | 0.343         |

    (i) w/o GT-SSM - largest drop observed across all horizons
    (ii) w/o ADL - degraded performance without auxiliary decomposition loss
    (iii) w/o GCRM - degraded performance without global context refinement
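
The deck does not state which dataset the ablation uses or what the ± in the results table denotes. As a worked check under those caveats, the per-horizon DecompSSM values above can be averaged and compared with the results table. The reading of ± as a sample standard deviation across horizons is an assumption that the arithmetic happens to support.

```python
import statistics

# DecompSSM column of the module ablation, one value per horizon
# H = 96, 192, 336, 720 (copied from the slide).
mse_per_horizon = [0.155, 0.204, 0.264, 0.345]
mae_per_horizon = [0.201, 0.246, 0.291, 0.343]

mse_mean = statistics.mean(mse_per_horizon)
mse_std = statistics.stdev(mse_per_horizon)   # sample standard deviation
mae_mean = statistics.mean(mae_per_horizon)
mae_std = statistics.stdev(mae_per_horizon)

summary = (round(mse_mean, 3), round(mse_std, 3),
           round(mae_mean, 3), round(mae_std, 3))
```

The rounded values are 0.242 ± 0.082 (MSE) and 0.270 ± 0.061 (MAE). These coincide with the Weather row for DecompSSM in the quantitative results, which suggests the ablations were run on Weather and that ± reflects spread across the four horizons rather than across random seeds.
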
  15. Ablation (2/2): Replacing S5 with alternative sequence models

    S5 is particularly well suited to MTS forecasting
    Each row is one prediction horizon H. Lower is better; bold = best, underline = second best.

    | Horizon (H) | (i) Attention MSE | (i) Attention MAE | (ii) Mamba MSE | (ii) Mamba MAE | (iii) Mamba-2 MSE | (iii) Mamba-2 MAE | S5 (DecompSSM) MSE | S5 (DecompSSM) MAE |
    |-------------|-------------------|-------------------|----------------|----------------|-------------------|-------------------|--------------------|--------------------|
    | 96          | 0.165             | 0.211             | 0.161          | 0.208          | 0.159             | 0.203             | 0.155              | 0.201              |
    | 192         | 0.212             | 0.253             | 0.209          | 0.251          | 0.208             | 0.250             | 0.204              | 0.246              |
    | 336         | 0.271             | 0.296             | 0.267          | 0.293          | 0.268             | 0.296             | 0.264              | 0.291              |
    | 720         | 0.353             | 0.349             | 0.355          | 0.351          | 0.359             | 0.356             | 0.345              | 0.343              |

    (i) Attention - largest MSE drop at H = 96, 192 and 336; at H = 720, Mamba-2 degrades most
    (ii) Mamba - degraded performance relative to the S5-based design
    (iii) Mamba-2 - competitive at short horizons, but still degraded
  16. Recap

    ◼ Background
      ◼ Real-world MTS mixes multiple timescales
      ◼ Prior methods miss at least one key property
    ◼ Proposed method: DecompSSM
      ◼ (specialized) Three S5 branches + frequency priors
      ◼ (adaptive) ASP for input-dependent timescales
      ◼ (end-to-end) ADL for reconstruction + separation
    ◼ Results
      ◼ Best on all 8 / 8 settings against 7 baselines on 4 datasets
    Our code is available here!