DecompSSM: A Decomposition-based State Space Model for Multivariate Time-Series Forecasting | ICASSP 2026

A Decomposition-Based State Space Model for Multivariate Time-Series Forecasting
Shunya Nagashima, Shuntaro Suzuki, Shuitsu Koyama, Shinnosuke Hirano
Neurogica Inc., Japan
© Neurogica Inc.

Introduction: MTS forecasting predicts H future steps for M variables jointly
▪ MTS forecasting underlies real-world decision making
  ▪ Energy – power-load planning
  ▪ Weather – temperature & precipitation
  ▪ Finance – asset-price prediction
  ▪ Traffic – city-sensor flow
[Figure: Past (t), Forecast (H), # variables]
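To make the task shape concrete, here is a minimal sketch of how a multivariate series can be cut into (past, future) pairs. The function name `make_windows` and the look-back length `T` are illustrative assumptions; the slides only name H and M.

```python
def make_windows(series, T, H):
    """Split a multivariate series into (past, future) training pairs.

    series: list of time steps, each a list of M values.
    Each pair holds T past steps and the H steps that follow them.
    """
    pairs = []
    for start in range(len(series) - T - H + 1):
        past = series[start:start + T]
        future = series[start + T:start + T + H]
        pairs.append((past, future))
    return pairs


# Toy series: 10 time steps, M = 2 variables (hypothetical values).
series = [[float(t), float(10 * t)] for t in range(10)]
pairs = make_windows(series, T=4, H=2)
past, future = pairs[0]
print(len(pairs), len(past), len(future), len(future[0]))
```

Every variable appears in both the past window and the target, which is what "jointly" refers to.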

Background (1/3): Decomposition makes the timescale structure explicit
▪ A real-world series decomposes into three additive components
  ▪ Trend - long-term direction
  ▪ Seasonal - periodic oscillations
  ▪ Residual - short-term shocks, noise
▪ Each component lives on its own timescale
[Figure: Original series with its Trend, Seasonal, and Residual components]
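The additive split can be illustrated with a classical, fixed decomposition: a moving average for the trend and a per-phase mean for the seasonal part. This is only a sketch of the general idea, not DecompSSM's learned decomposition; the toy series, `period`, and `window` are assumptions.

```python
import math


def moving_average(x, window):
    """Centered moving average; edges are padded by repeating the end values."""
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(x))]


def decompose(x, period, window):
    """Classical additive split: x = trend + seasonal + residual."""
    trend = moving_average(x, window)
    detrended = [xi - ti for xi, ti in zip(x, trend)]
    # Seasonal part: average the detrended values at each phase of the period.
    phase_mean = [sum(detrended[p::period]) / len(detrended[p::period])
                  for p in range(period)]
    seasonal = [phase_mean[i % period] for i in range(len(x))]
    residual = [xi - ti - si for xi, ti, si in zip(x, trend, seasonal)]
    return trend, seasonal, residual


# Toy series: slow drift + period-12 oscillation + a small deterministic wiggle.
x = [0.05 * t + math.sin(2 * math.pi * t / 12) + 0.1 * math.sin(1.7 * t)
     for t in range(120)]
trend, seasonal, residual = decompose(x, period=12, window=13)
rebuilt = [t + s + r for t, s, r in zip(trend, seasonal, residual)]
print(max(abs(a - b) for a, b in zip(x, rebuilt)))  # tiny: exact up to float rounding
```

Because the residual is defined as whatever the trend and seasonal parts leave over, the three components always sum back to the input.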

Background (2/3): No method is adaptive + specialized + end-to-end at once

Strategy                        Adaptive  Specialized  End-to-end
(i) Fixed moving average        ✖         ✖            ✔
    Autoformer [Wu+, NeurIPS21], FEDformer [Zhou+, ICML22],
    DLinear [Zeng+, AAAI23], TimeMixer [Wang+, ICLR24]
(ii) Latent disentanglement     ✔         ✖            ✔
    LaST [Wang+, NeurIPS22], CoST [Woo+, ICLR22]
(iii) Pre-processing pipeline   ∆         ∆            ✖
    PPDformer [Wan+, ICASSP25]
DecompSSM (Ours)                ✔         ✔            ✔

Background (3/3): Why an SSM, not a Transformer?

Transformer vs SSM
                  Transformer          SSM
Origin            NLP (2017)           Control theory (1960s)
Designed for      Discrete tokens      Continuous signals
Compute           O(T²) (Attention)    O(T) (Recurrence)
Time-series fit   Borrowed             Native

▪ MTS forecasting today: dominated by Transformers
  ▪ Informer, Autoformer, FEDformer, PatchTST, iTransformer…
▪ SSMs in MTS forecasting: gradually emerging since 2024
  ▪ Mamba family: TimeMachine, S-Mamba
SSMs are natively suited to time-series
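The O(T) entry comes from the recurrent form of a linear SSM: one state update per time step, where full attention scores every pair of steps. Below is a minimal scalar sketch; the parameter names `a`, `b`, `c` and their values are illustrative assumptions.

```python
def ssm_scan(u, a, b, c):
    """Run x_k = a * x_{k-1} + b * u_k, y_k = c * x_k in a single pass.

    The loop body executes once per time step, so the cost grows
    linearly with the sequence length T.
    """
    x = 0.0
    ys = []
    for u_k in u:
        x = a * x + b * u_k
        ys.append(c * x)
    return ys


# Impulse response: the state remembers the input and decays geometrically.
impulse = [1.0, 0.0, 0.0, 0.0]
response = ssm_scan(impulse, a=0.5, b=1.0, c=1.0)
print(response)
```

Practical SSM layers use vector states and often a parallel scan, but the per-step work stays independent of how far back the history goes.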

Proposed method: DecompSSM
A decomposition-based SSM for MTS forecasting
▪ GT-SSM: three specialized + adaptive branches → trend / seasonal / residual
▪ GCRM (global context refinement): shares cross-variable context across branches
▪ ADL (auxiliary decomposition loss): decomposition loss, trained end-to-end

Proposed method (1/3): GT-SSM for trend / seasonal / residual decomposition
▪ Each branch = S5 (MIMO, unlike Mamba's SISO), a natural fit for MTS
▪ But S5 is not input-dependent (unlike Mamba) → ASP makes Δ input-dependent
▪ Per-branch ASP init (slow / mid / fast Δ) → trend / seasonal / residual
▪ Input-dependent timescale
  ▪ Δ′ controls the S5 timescale
  ▪ Small Δ′ → slow, large Δ′ → fast
[Figure label: Predictor]
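The slow / fast behaviour of Δ can be seen in a scalar sketch that discretizes a continuous-time pole with a zero-order hold. The slides do not give the ASP formula, so the form `softplus(w * u + bias)` and the three bias values are assumptions chosen only to show the effect of the per-branch initialization.

```python
import math


def softplus(z):
    """Smooth positive map, used here to keep the step size above zero."""
    return math.log1p(math.exp(z))


def adaptive_scan(u, lam, w, bias):
    """Scalar diagonal SSM whose step size depends on the current input.

    lam < 0 is the continuous-time pole; delta plays the role of the
    timescale: small delta -> slow state updates, large delta -> fast ones.
    """
    x, ys = 0.0, []
    for u_k in u:
        delta = softplus(w * u_k + bias)   # assumed form of the adaptive step
        a_bar = math.exp(delta * lam)      # zero-order-hold state transition
        b_bar = (a_bar - 1.0) / lam        # zero-order-hold input gain (B = 1)
        x = a_bar * x + b_bar * u_k
        ys.append(x)
    return ys


# The same step input through three branches that differ only in their init.
step = [1.0] * 5
y_trend = adaptive_scan(step, lam=-1.0, w=0.5, bias=-4.0)     # slow
y_seasonal = adaptive_scan(step, lam=-1.0, w=0.5, bias=-2.0)  # mid
y_residual = adaptive_scan(step, lam=-1.0, w=0.5, bias=0.0)   # fast
print(y_trend[-1], y_seasonal[-1], y_residual[-1])
```

After five steps the fast branch has almost reached the steady-state value of 1, while the slow branch has barely moved.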

Proposed method (2/3): GCRM re-aligns variables with a shared global context
▪ Per-variable states drift → components get misassigned across branches
▪ GCRM: mean over variables → broadcast back as residual correction
▪ Ingredients: the number of variables (M), a linear projection, a learned gate, and an average over variables
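The pieces named on this slide (average over variables, linear projection, learned gate, residual correction) can be wired together as below. The exact GCRM equation is not recoverable from the slides, so this is one plausible arrangement; the names `W` and `gate_w` and the scalar sigmoid gate are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def gcrm(states, W, gate_w):
    """Refine per-variable states with a shared global context.

    states: M state vectors (one per variable), each of dimension D.
    W: D x D projection applied to the global context (assumed shape).
    gate_w: D weights producing one scalar gate per variable (assumed form).
    """
    M, D = len(states), len(states[0])
    context = [sum(s[d] for s in states) / M for d in range(D)]  # mean over variables
    projected = matvec(W, context)                               # linear projection
    refined = []
    for s in states:
        gate = sigmoid(sum(g * x for g, x in zip(gate_w, s)))    # learned gate
        refined.append([x + gate * p for x, p in zip(s, projected)])  # residual add
    return refined


states = [[1.0, 0.0], [3.0, 2.0]]   # M = 2 variables, D = 2
W = [[1.0, 0.0], [0.0, 1.0]]        # identity projection for readability
refined = gcrm(states, W, gate_w=[0.0, 0.0])  # zero weights -> gate = 0.5
print(refined)
```

Each variable keeps its own state and receives the same broadcast context, scaled by its own gate.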

Proposed method (3/3): ADL keeps the decomposition meaningful
▪ Forecast loss alone doesn't force branches to differ → they may collapse
▪ ADL = reconstruction (sum ≈ input) + orthogonality (different directions)
  ▪ Reconstruction loss: Trend + Seasonal + Residual ≈ input
  ▪ Orthogonality loss
[Figure: Reconstruction loss (Target vs. Sum of branches: Trend, Seasonal, Residual) and Orthogonality loss (Trend, Seasonal, Residual)]
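A small sketch shows why reconstruction alone is not enough and what the orthogonality term adds. The slides do not give the loss formulas, so mean squared error for reconstruction and mean squared cosine similarity for orthogonality are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruction_loss(branches, target):
    """Mean squared error between the sum of the branches and the input."""
    total = [sum(vals) for vals in zip(*branches)]
    return sum((t - s) ** 2 for t, s in zip(target, total)) / len(target)


def orthogonality_loss(branches, eps=1e-12):
    """Mean squared cosine similarity over all pairs of branches."""
    loss, pairs = 0.0, 0
    for i in range(len(branches)):
        for j in range(i + 1, len(branches)):
            a, b = branches[i], branches[j]
            cos = dot(a, b) / (math.sqrt(dot(a, a) * dot(b, b)) + eps)
            loss += cos ** 2
            pairs += 1
    return loss / pairs


target = [1.0, 1.0, 1.0]
separated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1 / 3, 1 / 3, 1 / 3]] * 3   # all branches learn the same thing

for name, branches in [("separated", separated), ("collapsed", collapsed)]:
    print(name, reconstruction_loss(branches, target), orthogonality_loss(branches))
```

Both sets of branches reconstruct the target, so only the orthogonality term tells the collapsed case apart.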

Experiments: Four datasets, seven baselines, standard protocol
▪ Datasets

Dataset   Domain                                # Variates (M)  Sampling
ECL       Electricity load                      321             1 hour
Weather   21 weather variables                  21              10 min
ETTm2     Electricity transformer temperature   7               15 min
PEMS04    Traffic flow (CA sensors)             307             5 min

▪ Baseline methods
  ▪ Transformer-based: Autoformer, PatchTST, iTransformer, PPDformer
  ▪ Linear-based: DLinear, HDMixer
  ▪ TCN-based: TimesNet

Quantitative results: DecompSSM outperforms all 7 baselines on all 4 datasets

Dataset  Metric  Autoformer     DLinear        TimesNet       PatchTST       iTransformer   HDMixer        PPDformer      DecompSSM
ECL      MSE↓    0.227 ± 0.022  0.231 ± 0.022  0.216 ± 0.049  0.216 ± 0.028  0.176 ± 0.026  0.194 ± 0.020  0.168 ± 0.025  0.167 ± 0.026
ECL      MAE↓    0.338 ± 0.018  0.323 ± 0.022  0.311 ± 0.035  0.304 ± 0.024  0.268 ± 0.025  0.294 ± 0.018  0.267 ± 0.025  0.261 ± 0.025
Weather  MSE↓    0.338 ± 0.066  0.266 ± 0.065  0.258 ± 0.080  0.258 ± 0.076  0.261 ± 0.077  0.255 ± 0.080  0.246 ± 0.080  0.242 ± 0.082
Weather  MAE↓    0.382 ± 0.039  0.316 ± 0.055  0.285 ± 0.055  0.280 ± 0.055  0.282 ± 0.056  0.282 ± 0.057  0.276 ± 0.059  0.270 ± 0.061
ETTm2    MSE↓    0.311 ± 0.089  0.353 ± 0.149  0.298 ± 0.100  0.289 ± 0.100  0.292 ± 0.097  0.286 ± 0.095  0.285 ± 0.095  0.280 ± 0.098
ETTm2    MAE↓    0.354 ± 0.052  0.401 ± 0.095  0.334 ± 0.060  0.333 ± 0.061  0.336 ± 0.058  0.330 ± 0.057  0.329 ± 0.058  0.320 ± 0.060
PEMS04   MSE↓    0.610 ± 0.224  0.295 ± 0.135  0.129 ± 0.045  0.195 ± 0.082  0.120 ± 0.040  0.163 ± 0.072  0.104 ± 0.029  0.103 ± 0.035
PEMS04   MAE↓    0.589 ± 0.118  0.388 ± 0.103  0.241 ± 0.047  0.307 ± 0.072  0.232 ± 0.041  0.281 ± 0.068  0.213 ± 0.031  0.212 ± 0.040

Lower is better. DecompSSM is best and PPDformer second-best in every row.
MSE / MAE on the test set, averaged over the four prediction horizons H ∈ {96, 192, 336, 720} (12/24/48/96 for PEMS04).

Ablation (1/2): Modules
Effect of removing each component
GT-SSM is the key module; ADL and GCRM also contribute
Each row is one prediction horizon H. Lower is better. The full DecompSSM is best or tied for best in every row.

Horizon (H)  (i) w/o GT-SSM  (ii) w/o ADL   (iii) w/o GCRM  DecompSSM
             MSE    MAE      MSE    MAE     MSE    MAE      MSE    MAE
96           0.184  0.223    0.156  0.202   0.159  0.205    0.155  0.201
192          0.230  0.262    0.204  0.246   0.207  0.251    0.204  0.246
336          0.285  0.301    0.269  0.293   0.271  0.297    0.264  0.291
720          0.360  0.350    0.355  0.350   0.353  0.348    0.345  0.343

(i) w/o GT-SSM - largest MSE increase at every horizon
(ii) w/o ADL - matches the full model at H = 192 and is worse at the other horizons without the auxiliary decomposition loss
(iii) w/o GCRM - worse at every horizon without global context refinement
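As a worked check of the reporting protocol, the DecompSSM column of this ablation can be averaged over the four horizons. The result coincides with the Weather row of the quantitative results (0.242 ± 0.082 MSE, 0.270 ± 0.061 MAE). That suggests, although the slides do not state it, that the ablation was run on Weather and that ± is the sample standard deviation across horizons.

```python
from statistics import mean, stdev

# DecompSSM column of the module ablation, one value per horizon H.
horizons = [96, 192, 336, 720]
mse = [0.155, 0.204, 0.264, 0.345]
mae = [0.201, 0.246, 0.291, 0.343]

# Mean over horizons and sample standard deviation (n - 1 in the denominator).
mse_summary = (round(mean(mse), 3), round(stdev(mse), 3))
mae_summary = (round(mean(mae), 3), round(stdev(mae), 3))

print("MSE", mse_summary)  # (0.242, 0.082)
print("MAE", mae_summary)  # (0.27, 0.061)
```

The population standard deviation (dividing by n) would give about 0.071 for MSE, which does not match the reported value.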

Ablation (2/2): Replacing S5 with alternative sequence models
S5 is particularly well suited to MTS forecasting
Each row is one prediction horizon H. Lower is better. S5 is best in every row.

Horizon (H)  (i) Attention   (ii) Mamba     (iii) Mamba-2   S5 (DecompSSM)
             MSE    MAE      MSE    MAE     MSE    MAE      MSE    MAE
96           0.165  0.211    0.161  0.208   0.159  0.203    0.155  0.201
192          0.212  0.253    0.209  0.251   0.208  0.250    0.204  0.246
336          0.271  0.296    0.267  0.293   0.268  0.296    0.264  0.291
720          0.353  0.349    0.355  0.351   0.359  0.356    0.345  0.343

(i) Attention - largest drop at H = 96 and 192; at H = 720 it is the closest of the three alternatives to S5
(ii) Mamba - degraded performance relative to the S5-based design at every horizon
(iii) Mamba-2 - competitive at short horizons, but the weakest variant at H = 720

Recap
▪ Background
  ▪ Real-world MTS mixes multiple timescales
  ▪ Prior methods miss at least one key property
▪ Proposed method: DecompSSM
  ▪ (specialized) Three S5 branches + frequency priors
  ▪ (adaptive) ASP for input-dependent timescales
  ▪ (end-to-end) ADL for reconstruction + separation
▪ Results
  ▪ Best on all 8 / 8 settings against 7 baselines on 4 datasets
Our code is available here!

Proposed method (2/3): GCRM injects shared context into each variable
Global context keeps variable-wise states aligned
▪ Per-variable states can drift
  ▪ Noise or missing values can misalign components
▪ GCRM: global summary fed back to each variable

Proposed method (3/5): GT-SSM + ASP for input-dependent timescales
GT-SSM = parallel S5 branches + ASP for input-dependent timescales
▪ ASP modulates each branch inside GT-SSM

Proposed method (3/3): ADL = forecast loss + reconstruction + orthogonality
▪ Without ADL, branches may collapse
  ▪ They can learn overlapping representations
▪ Reconstruction loss
▪ Orthogonality loss

Proposed method (1/5): Why S5, not Mamba?
▪ Mamba [Gu+, COLM24], SISO: no cross-variable mixing ☹
▪ S5 [Smith+, ICLR23], MIMO: cross-variable mixing in the state ☺
We use S5 (MIMO); Mamba (SISO) loses cross-variable interaction
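The SISO / MIMO contrast on this slide can be sketched with two toy recurrences: a bank of independent scalar SSMs, one per variable, versus a single shared state driven by all variables through an input matrix. This covers only the SSM core as the slide frames it and is not an implementation of Mamba or S5; `A_diag`, `B`, `C`, and all values are illustrative assumptions.

```python
def siso_bank(u_seq, a):
    """One independent scalar SSM per variable: x_m <- a * x_m + u_m."""
    x = [0.0] * len(u_seq[0])
    ys = []
    for u in u_seq:
        x = [a * x_m + u_m for x_m, u_m in zip(x, u)]
        ys.append(list(x))
    return ys


def mimo_ssm(u_seq, A_diag, B, C):
    """One shared state: x <- A x + B u, y = C x, with B and C mixing variables."""
    x = [0.0] * len(A_diag)
    ys = []
    for u in u_seq:
        Bu = [sum(b * u_m for b, u_m in zip(row, u)) for row in B]
        x = [a * x_p + bu for a, x_p, bu in zip(A_diag, x, Bu)]
        ys.append([sum(c * x_p for c, x_p in zip(row, x)) for row in C])
    return ys


# Impulse on variable 0 only; variable 1 receives no input at all.
u_seq = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

y_siso = siso_bank(u_seq, a=0.5)
y_mimo = mimo_ssm(
    u_seq,
    A_diag=[0.5, 0.5],
    B=[[1.0, 0.0], [1.0, 1.0]],   # second state component sees both variables
    C=[[1.0, 0.0], [0.0, 1.0]],
)
print([y[1] for y in y_siso])  # variable 1 never reacts
print([y[1] for y in y_mimo])  # variable 1 reacts to variable 0's impulse
```

In the SISO bank the second output stays at zero, while in the MIMO model the impulse on variable 0 reaches variable 1 through the shared state.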

Proposed method (4/4): ADL keeps the decomposition meaningful
ADL = forecast loss + reconstruction + orthogonality
▪ Without constraints, branches can collapse
  ▪ They may learn the same representation
▪ Reconstruction loss
▪ Orthogonality loss
