Inc.
▪ Forecast loss alone doesn't force the branches to differ → they may collapse
▪ ADL = reconstruction (sum ≈ input) + orthogonality (different directions)
[Figure: Trend + Seasonal + Residual ≈ input. Reconstruction loss compares the sum of branches with the target; orthogonality loss acts on the Trend, Seasonal, and Residual branches.]
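A minimal sketch of the two ADL terms in plain Python. It assumes each branch output is a flat list of floats, that reconstruction is the mean squared error between the branch sum and the input, and that orthogonality is the mean squared cosine similarity over branch pairs. These forms, and the names `reconstruction_loss`, `orthogonality_loss`, and `adl_loss`, are hypothetical; the slide only states what each term encourages.

```python
import math
from itertools import combinations


def reconstruction_loss(branches, target):
    """Mean squared error between the element-wise sum of branches and the target."""
    total = [sum(vals) for vals in zip(*branches)]
    return sum((s - t) ** 2 for s, t in zip(total, target)) / len(target)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def orthogonality_loss(branches):
    """Mean squared cosine similarity over all branch pairs (0 = mutually orthogonal)."""
    pairs = list(combinations(branches, 2))
    return sum(cosine(a, b) ** 2 for a, b in pairs) / len(pairs)


def adl_loss(branches, target):
    """Hypothetical ADL: unweighted sum of the reconstruction and orthogonality terms."""
    return reconstruction_loss(branches, target) + orthogonality_loss(branches)


# Toy series of length 4 split into three mutually orthogonal components.
trend = [1.0, 1.0, 1.0, 1.0]
seasonal = [1.0, -1.0, 1.0, -1.0]
residual = [1.0, 1.0, -1.0, -1.0]
x = [t + s + r for t, s, r in zip(trend, seasonal, residual)]
print(adl_loss([trend, seasonal, residual], x))  # 0.0
```

Three mutually orthogonal components that sum exactly to the input give zero loss; three identical branches would still reconstruct the input but push the orthogonality term toward 1.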
GT-SSM is the key module; ADL and GCRM also contribute
Each row is one prediction horizon H; lower is better.

H     (i) w/o GT-SSM   (ii) w/o ADL    (iii) w/o GCRM   DecompSSM
      MSE    MAE       MSE    MAE      MSE    MAE       MSE    MAE
96    0.184  0.223     0.156  0.202    0.159  0.205     0.155  0.201
192   0.230  0.262     0.204  0.246    0.207  0.251     0.204  0.246
336   0.285  0.301     0.269  0.293    0.271  0.297     0.264  0.291
720   0.360  0.350     0.355  0.350    0.353  0.348     0.345  0.343

(i) w/o GT-SSM: largest drop at every horizon
(ii) w/o ADL: worse without the auxiliary decomposition loss at H = 96, 336, and 720; ties the full model at H = 192
(iii) w/o GCRM: worse at every horizon without global context refinement
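The claim that removing GT-SSM causes the largest drop can be checked with simple arithmetic. The snippet below transcribes the MSE columns of the ablation table and computes each variant's relative MSE increase over the full DecompSSM model; the dictionary layout and the function name `relative_increase` are illustrative only.

```python
# MSE values transcribed from the ablation table, keyed by horizon H.
mse = {
    96: {"w/o GT-SSM": 0.184, "w/o ADL": 0.156, "w/o GCRM": 0.159, "DecompSSM": 0.155},
    192: {"w/o GT-SSM": 0.230, "w/o ADL": 0.204, "w/o GCRM": 0.207, "DecompSSM": 0.204},
    336: {"w/o GT-SSM": 0.285, "w/o ADL": 0.269, "w/o GCRM": 0.271, "DecompSSM": 0.264},
    720: {"w/o GT-SSM": 0.360, "w/o ADL": 0.355, "w/o GCRM": 0.353, "DecompSSM": 0.345},
}


def relative_increase(row):
    """Percent MSE increase of each ablated variant over the full model."""
    base = row["DecompSSM"]
    return {name: 100.0 * (v - base) / base for name, v in row.items() if name != "DecompSSM"}


for h, row in mse.items():
    inc = relative_increase(row)
    worst = max(inc, key=inc.get)  # variant with the largest relative increase
    print(h, worst, {k: round(v, 1) for k, v in inc.items()})
```

At every horizon the largest increase comes from removing GT-SSM, and that gap narrows as H grows (about 18.7% at H = 96 versus about 4.3% at H = 720).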
S5 is particularly well suited to MTS (multivariate time series) forecasting
Each row is one prediction horizon H; lower is better.

H     (i) Attention    (ii) Mamba      (iii) Mamba-2    S5 (DecompSSM)
      MSE    MAE       MSE    MAE      MSE    MAE       MSE    MAE
96    0.165  0.211     0.161  0.208    0.159  0.203     0.155  0.201
192   0.212  0.253     0.209  0.251    0.208  0.250     0.204  0.246
336   0.271  0.296     0.267  0.293    0.268  0.296     0.264  0.291
720   0.353  0.349     0.355  0.351    0.359  0.356     0.345  0.343

(i) Attention: largest drop at H = 96, 192, and 336
(ii) Mamba: worse than the S5-based design at every horizon
(iii) Mamba-2: closest to S5 at H = 96 and 192, but the worst variant at H = 720
Global context keeps variable-wise states aligned
▪ Per-variable states can drift
▪ Noise or missing values can misalign components
▪ GCRM: global summary fed back to each variable
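A minimal sketch of the idea, assuming the global summary is the mean of the per-variable state vectors and that it is fed back by blending it into each variable's state with a fixed weight. The weight `alpha` and the names `global_context_refine` and `spread` are hypothetical; the actual module presumably uses learned projections rather than a plain average.

```python
import math


def global_context_refine(states, alpha=0.5):
    """Blend each per-variable state with the mean state over all variables.

    states: list of N state vectors (one per variable), each of length D.
    Returns (1 - alpha) * own state + alpha * global mean, for every variable.
    """
    n, dim = len(states), len(states[0])
    summary = [sum(s[d] for s in states) / n for d in range(dim)]  # global summary
    return [[(1 - alpha) * s[d] + alpha * summary[d] for d in range(dim)] for s in states]


def spread(states):
    """Largest Euclidean distance between any variable's state and the mean state."""
    n, dim = len(states), len(states[0])
    mean = [sum(s[d] for s in states) / n for d in range(dim)]
    return max(math.sqrt(sum((s[d] - mean[d]) ** 2 for d in range(dim))) for s in states)


states = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]  # the third variable has drifted
refined = global_context_refine(states)
print(spread(states), spread(refined))
```

Because the blend is a convex combination with the shared mean, the mean is preserved and every variable's distance from it shrinks by a factor of (1 - alpha), which is one simple way to keep drifting states aligned.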
… COLM24]: SISO, no cross-variable mixing
S5 [Smith+, ICLR23]: MIMO, cross-variable mixing in the state
We use S5 (MIMO); Mamba (SISO) loses cross-variable interaction
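The SISO/MIMO distinction can be seen in a toy linear recurrence with a diagonal state matrix. In the SISO case each channel drives only its own state (block-diagonal input matrix), as with independent per-channel SSMs; in the MIMO case a dense input matrix lets every channel write into every state. This is a simplified sketch: real S5 discretizes continuous-time parameters and uses a complex diagonal state matrix, and Mamba makes its parameters input-dependent; both details are omitted, and `simulate` is a hypothetical helper.

```python
def simulate(A, B, C, inputs):
    """Run a discrete linear SSM with a diagonal state matrix.

    x_k = diag(A) x_{k-1} + B u_k,   y_k = C x_k
    A: P diagonal entries; B: P x H input matrix; C: H x P output matrix;
    inputs: list of length-H input vectors (one per time step).
    """
    P = len(A)
    x = [0.0] * P
    outputs = []
    for u in inputs:
        x = [A[p] * x[p] + sum(B[p][h] * u[h] for h in range(len(u))) for p in range(P)]
        outputs.append([sum(C[h][p] * x[p] for p in range(P)) for h in range(len(C))])
    return outputs


A = [0.9, 0.8]                     # one state per channel for a direct comparison
C = [[1.0, 0.0], [0.0, 1.0]]       # read each state out to its own channel
B_siso = [[1.0, 0.0], [0.0, 1.0]]  # SISO-style: channel h drives only its own state
B_mimo = [[1.0, 0.5], [0.5, 1.0]]  # MIMO: every channel drives every state

impulse_on_channel_1 = [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
siso_out = simulate(A, B_siso, C, impulse_on_channel_1)
mimo_out = simulate(A, B_mimo, C, impulse_on_channel_1)
print([y[0] for y in siso_out])  # [0.0, 0.0, 0.0]
print([y[0] for y in mimo_out])  # channel 0 now responds to channel 1's impulse
```

With SISO-style wiring, an impulse on channel 1 never reaches channel 0's output; with MIMO wiring it does, which is the cross-variable interaction the slide attributes to S5.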
… = forecast loss + reconstruction loss + orthogonality loss
▪ Without constraints, branches can collapse
▪ They may learn the same representation
▪ Reconstruction loss
▪ Orthogonality loss
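To illustrate why reconstruction alone cannot prevent collapse, the sketch below scores a decomposed solution and a collapsed one (three identical branches, each x/3) under a hypothetical total objective. It assumes MSE for the forecast and reconstruction terms, mean squared pairwise cosine similarity for orthogonality, and unit weights `lam_rec` and `lam_orth`; none of these choices come from the slide, and `total_loss` is an illustrative name.

```python
import math
from itertools import combinations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 0.0 if norm == 0.0 else dot / norm


def total_loss(forecast, future, branches, x, lam_rec=1.0, lam_orth=1.0):
    """Hypothetical objective: forecast MSE + weighted reconstruction + weighted orthogonality."""
    recon = mse([sum(v) for v in zip(*branches)], x)
    pairs = list(combinations(branches, 2))
    orth = sum(cosine(a, b) ** 2 for a, b in pairs) / len(pairs)
    return mse(forecast, future) + lam_rec * recon + lam_orth * orth


x = [3.0, 1.0, 1.0, -1.0]              # input window
decomposed = [[1.0, 1.0, 1.0, 1.0],    # trend
              [1.0, -1.0, 1.0, -1.0],  # seasonal
              [1.0, 1.0, -1.0, -1.0]]  # residual
collapsed = [[v / 3 for v in x]] * 3   # three identical branches
forecast, future = [0.5, 0.5], [0.5, 0.5]  # identical forecasts in both cases

print(total_loss(forecast, future, decomposed, x))  # 0.0
print(total_loss(forecast, future, collapsed, x))   # about 1.0, from orthogonality
```

Both candidates reconstruct the input and share the same forecast, so only the orthogonality term tells them apart.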