A decoder-only foundation model for time-series forecasting

Paper review: A decoder-only foundation model for time-series forecasting

1 Paper Overview
• Title
  – A decoder-only foundation model for time-series forecasting
• Authors –
• Published
  – arXiv https://arxiv.org/abs/2310.10688
  – Google Research Blog post https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html
• Summary
  – Proposes TimesFM, a foundation model for time-series forecasting

2 Introduction
• Deep learning in time-series forecasting
  – Deep learning models (DeepAR [Salinas+ 2020], N-BEATS [Oreshkin+ 2019]) sometimes outperform classical statistical approaches (ARIMA, GARCH [Box+ 1968])
• Large foundation models with zero-shot performance in natural language processing
• Designing a foundation model with zero-shot performance for time-series forecasting
  – Challenges
    • Time series have no well-defined vocabulary or grammar
    • Large volumes of time-series data are not readily available
[Salinas+ 2020] Salinas, D., Flunkert, V., Gasthaus, J., and Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020.
[Oreshkin+ 2019] Oreshkin, B. N., Carpov, D., Chapados, N., and Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2019.
[Box+ 1968] Box, G. E. and Jenkins, G. M. Some recent advances in forecasting and control. Journal of the Royal Statistical Society. Series C (Applied Statistics), 17(2):91–109, 1968.

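3 TimesFM
• Designed TimesFM, a foundation model for time-series forecasting
  – A large time-series corpus built from real and synthetic data
  – A decoder-style attention architecture that uses input patches for efficient pretraining
• TimesFM is smaller than recent LLMs in both parameter count and pretraining data size
  – Parameters: 200M
  – Pretraining data: 100B time-points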

4 Related Work
• Local univariate models
  – ARIMA, exponential smoothing [McKenzie 1984], Prophet [Taylor+ 2018]
• Global univariate models
  – DeepAR [Salinas+ 2020], Temporal Convolutions [Borovykh+ 2017], N-BEATS [Oreshkin+ 2019], long-term forecasting models [Nie+ 2022, Das+ 2023]
• Global multivariate models
  – Classical VAR models [Zivot+ 2006], deep learning models [Sen+ 2019, Zhou+ 2022; 2021]
• None of the above works aims to develop a foundation model

5 Related Work
• Work that repurposes large language models for time-series forecasting
  – LLMTime [Gruver+ 2023] benchmarks the zero-shot forecasting performance of GPT-3 and LLaMA-2
  – Work that fine-tunes GPT-2 on time-series forecasting tasks [Zhou+ 2023, Chang+ 2023]
• Work on foundation models for time-series forecasting: TimeGPT-1 [Garza+ 2023]
  – Model details and benchmark datasets are not disclosed
[Gruver+ 2023] Gruver, N., Finzi, M., Qiu, S., and Wilson, A. G. Large language models are zero-shot time series forecasters. arXiv preprint arXiv:2310.07820, 2023.
[Zhou+ 2023] Zhou, T., Niu, P., Wang, X., Sun, L., and Jin, R. One fits all: Power general time series analysis by pretrained LM. arXiv preprint arXiv:2302.11939, 2023.
[Chang+ 2023] Chang, C., Peng, W.-C., and Chen, T.-F. LLM4TS: Two-stage fine-tuning for time-series forecasting with pre-trained LLMs. arXiv preprint arXiv:2308.08469, 2023.
[Garza+ 2023] Garza, A. and Mergenthaler-Canseco, M. TimeGPT-1. arXiv preprint arXiv:2310.03589, 2023.

6 Problem Definition
• Using the past L time-points of a series as context, forecast the next H time-points
• Forecast accuracy is measured by closeness to the actual values, e.g. with MAE
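As a small illustration of this setup, the sketch below splits a series into a context of L points and a target of H points and scores a forecast with MAE. The naive last-value forecaster and the helper names (`split_context_horizon`, `mae`) are hypothetical stand-ins for any model; this is not TimesFM.

```python
import numpy as np

def split_context_horizon(series: np.ndarray, L: int, H: int):
    """Split a 1-D series into a context of L points and the next H points to forecast."""
    if len(series) < L + H:
        raise ValueError("series must contain at least L + H points")
    return series[:L], series[L:L + H]

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over the forecast horizon."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy example with a hypothetical "repeat the last value" baseline (not TimesFM).
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
context, target = split_context_horizon(series, L=4, H=3)
forecast = np.full(3, context[-1])     # [4, 4, 4]
print(mae(target, forecast))           # errors 1, 2, 3 -> 2.0
```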

7 Model Architecture (TimesFM)
• The time series is split into patches, which are processed by a decoder-only transformer
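What "decoder-only" means for patch tokens can be shown with a causal attention mask: each patch attends only to itself and earlier patches. This is a minimal single-head sketch with random weights; the real model stacks many layers with multi-head attention, residual connections, and normalization as in standard transformers, none of which is shown, and the sizes here are arbitrary.

```python
import numpy as np

def causal_self_attention(x: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    """Single-head causal self-attention over patch tokens.

    x: (num_patches, d_model). Token i attends only to tokens 0..i, so the
    output for patch i depends only on the context seen so far.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    n = x.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(4, d))                        # 4 embedded input patches
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(tokens, Wq, Wk, Wv)
```

Because of the mask, changing the last patch leaves the outputs for all earlier patches unchanged, which is what lets every patch position be trained to predict what comes after it.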

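8 Model Architecture (TimesFM)
• Patching
  – The series is split into patches, following PatchTST [Nie+ 2022]
• Decoder-only model
  – A key difference from PatchTST [Nie+ 2022]
• Longer output patches
  – For long-horizon forecasting, predicting the whole horizon directly is more accurate than multi-step forecasting [Zeng+ 2023]
  – Output patches are longer than input patches
  – Handles a variety of horizon lengths in the zero-shot setting
• Patch masking
  – During training, part of the patches is masked so that the model can handle context lengths that are not multiples of the input patch length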
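A minimal sketch of input patching, longer output patches, and prefix masking, assuming an input patch length p = 32 and an output patch length h = 128 (illustrative values, not given on this slide). Left-padding a context whose length is not a multiple of p and marking the padded points in a mask is one simple way to realize the idea; all helper names are hypothetical.

```python
import numpy as np

INPUT_PATCH = 32    # assumed input patch length p
OUTPUT_PATCH = 128  # assumed (longer) output patch length h

def patchify(context: np.ndarray, p: int = INPUT_PATCH):
    """Split a context into non-overlapping input patches of length p.

    If len(context) is not a multiple of p, the front is left-padded with zeros
    and the mask marks padded points (1 = masked, 0 = observed).
    """
    pad = (-len(context)) % p
    values = np.concatenate([np.zeros(pad), context])
    mask = np.concatenate([np.ones(pad), np.zeros(len(context))])
    return values.reshape(-1, p), mask.reshape(-1, p)

def random_prefix_mask(num_points: int, p: int, rng) -> np.ndarray:
    """Training-time masking sketch: hide the first r points, r drawn from {0, ..., p-1}."""
    r = rng.integers(0, p)
    mask = np.zeros(num_points)
    mask[:r] = 1.0
    return mask

def decode_steps(horizon: int, h: int = OUTPUT_PATCH) -> int:
    """Autoregressive steps needed when each step emits h future points."""
    return -(-horizon // h)  # ceiling division

rng = np.random.default_rng(0)
patches, mask = patchify(np.arange(100, dtype=float))
print(patches.shape)       # (4, 32): 28 padded points + 100 observed
print(decode_steps(336))   # 3 steps of 128 points cover a 336-step horizon
```

With h = 128, a 336-step horizon needs 3 autoregressive steps; if the output patch were as short as the input patch (h = 32) it would need 11, and each extra step feeds the model's own predictions back in.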

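9 Pretraining (TimesFM)
• Pretraining datasets
  – Google Trends
    • About 22k head queries selected based on search interest over the 15 years from 2007 to 2022
    • Time-point granularities: hourly, daily, weekly, monthly
    • 1B time-points
  – Wiki pageview statistics
    • Hourly page views of all Wikipedia pages from January 2012 to November 2023
    • Cleaned and aggregated to hourly, daily, weekly, and monthly granularities
    • 100B time-points
  – Synthetic data
    • Combinations of ARMA [McKenzie 1984], seasonal patterns (sine, cosine), trends (linear, exponential), and step functions
    • 3M synthetic series, each with 2048 time-points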
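To make the synthetic-data recipe concrete, here is a hypothetical generator that sums an ARMA(1,1) process, a sine/cosine seasonal component, a linear or exponential trend, and a step function over 2048 time-points. The parameter ranges and candidate periods are assumptions for illustration only; the slide does not give the actual sampling distributions.

```python
import numpy as np

def synthetic_series(n: int = 2048, rng=None) -> np.ndarray:
    """Hypothetical generator: ARMA(1,1) + seasonality + trend + step."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n)

    # ARMA(1,1): z_t = phi * z_{t-1} + e_t + theta * e_{t-1}
    phi, theta = rng.uniform(-0.9, 0.9, size=2)
    e = rng.normal(size=n)
    z = np.zeros(n)
    for i in range(1, n):
        z[i] = phi * z[i - 1] + e[i] + theta * e[i - 1]

    # Seasonal pattern from a sine and a cosine with a randomly chosen period (assumed values)
    period = rng.choice([7, 24, 52, 365])
    season = (rng.normal() * np.sin(2 * np.pi * t / period)
              + rng.normal() * np.cos(2 * np.pi * t / period))

    # Trend: linear or exponential, chosen at random
    if rng.random() < 0.5:
        trend = rng.normal() * t / n
    else:
        trend = np.exp(rng.uniform(0.0, 2.0) * t / n) - 1.0

    # Step function that switches on at a random change point
    step = np.where(t >= rng.integers(1, n), rng.normal(), 0.0)
    return z + season + trend + step

rng = np.random.default_rng(42)
batch = np.stack([synthetic_series(rng=rng) for _ in range(3)])  # the slide cites 3M such series
```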

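10 Pretraining (TimesFM)
• Pretraining datasets
  – Other real-world data
    • M4 dataset [Makridakis+ 2022]
    • Traffic dataset [Zhou+ 2021]
    • Weather dataset [Zou+ 2021]
    • Traffic time series [Wang+ 2023]
• Dataset mixture
  – 40% real data and 60% synthetic data
  – Real data
    • Each time-point granularity is sampled equally
  – Context length of 512 (256 for weekly data, 64 for monthly data)
  – Each series is standardized by the mean and standard deviation of the first input patch in its context [Kim+ 2021]
  – Training uses the MSE loss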
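The mixture and normalization rules above can be sketched as follows. Only the 40/60 split, the context lengths, first-patch standardization, and the MSE loss come from the slide; the input patch length of 32, the `eps` guard, the granularity keys, and all helper names are assumptions.

```python
import numpy as np

# Context lengths per granularity, from the slide (hourly/daily use the default 512).
CONTEXT_LEN = {"hourly": 512, "daily": 512, "weekly": 256, "monthly": 64}
INPUT_PATCH = 32  # assumed input patch length

def sample_source(rng, p_real=0.4):
    """Choose where one training example comes from: 40% real, 60% synthetic."""
    return "real" if rng.random() < p_real else "synthetic"

def sample_granularity(rng):
    """For real data, pick each granularity with equal probability."""
    return str(rng.choice(list(CONTEXT_LEN)))

def normalize_by_first_patch(context, target, p=INPUT_PATCH, eps=1e-8):
    """Standardize context and target with the mean/std of the first input patch."""
    mu = context[:p].mean()
    scale = context[:p].std() + eps  # eps guards against a perfectly flat first patch
    return (context - mu) / scale, (target - mu) / scale, (mu, scale)

def mse(pred, target):
    """Training loss: mean squared error on the normalized scale."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
gran = sample_granularity(rng)
ctx = 100.0 + 5.0 * rng.normal(size=CONTEXT_LEN[gran])
tgt = 100.0 + 5.0 * rng.normal(size=128)
ctx_n, tgt_n, (mu, scale) = normalize_by_first_patch(ctx, tgt)
# A forecast made on the normalized scale maps back via pred * scale + mu.
```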

11 Empirical Results
• On diverse datasets, TimesFM comes close to the accuracy of supervised forecasting models while being zero-shot
  – TimesFM and LLMTime are zero-shot

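12 Conclusions
• Designed TimesFM, a foundation model for time-series forecasting
  – A decoder-style attention architecture with about 200M parameters
  – Pretrained on real and synthetic data totaling about 100B time-points
  – Achieves zero-shot performance close to the accuracy of supervised models on diverse datasets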

13 Supplementary Material

14 Related Work: Reference Details
[McKenzie 1984] McKenzie, E. General exponential smoothing and the equivalent ARMA process. Journal of Forecasting, 3(3):333–344, 1984.
[Taylor+ 2018] Taylor, S. J. and Letham, B. Forecasting at scale. The American Statistician, 72(1):37–45, 2018.
[Salinas+ 2020] Salinas, D., Flunkert, V., Gasthaus, J., and Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020.
[Borovykh+ 2017] Borovykh, A., Bohte, S., and Oosterlee, C. W. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
[Oreshkin+ 2019] Oreshkin, B. N., Carpov, D., Chapados, N., and Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2019.
[Nie+ 2022] Nie, Y., Nguyen, N. H., Sinthong, P., and Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2022.
[Das+ 2023] Das, A., Kong, W., Leach, A., Mathur, S. K., Sen, R., and Yu, R. Long-term forecasting with TiDE: Time-series dense encoder. Transactions on Machine Learning Research, 2023. ISSN 2835-8856. https://openreview.net/forum?id=pCbC3aQB5W.
[Zivot+ 2006] Zivot, E. and Wang, J. Vector autoregressive models for multivariate time series. In Modeling Financial Time Series with S-PLUS®, pp. 385–429, 2006.
[Sen+ 2019] Sen, R., Yu, H.-F., and Dhillon, I. S. Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting. Advances in Neural Information Processing Systems, 32, 2019.
[Zhou+ 2022] Zhou, T., Ma, Z., Wen, Q., Wang, X., Sun, L., and Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, pp. 27268–27286. PMLR, 2022.
[Zhou+ 2021] Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., and Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.

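15 Related Work Details (Local univariate models)
• Exponential smoothing
  – A smoothing method that places larger weights on more recent observations
    • $S_t = \alpha y_t + (1 - \alpha) S_{t-1}$, with $0 < \alpha \le 1$
• ARIMA (Autoregressive Integrated Moving Average)
  – Combines an autoregressive (AR) model, a moving-average (MA) model, and an integrated (I) model
    • $y'_t = c + \epsilon_t + \sum_{i=1}^{p} \phi_i y'_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i}$, where $y'_t = (1 - B)^d y_t$ is the series differenced $d$ times ($B$ is the backshift operator; for $d = 1$, $y'_t = y_t - y_{t-1}$)
    • AR part: the $\phi_i$ terms; MA part: the $\theta_i$ terms; I part: the differencing
• Prophet
  – $y(t) = g(t) + s(t) + h(t) + \epsilon_t$
    • $g(t)$: trend function that models non-periodic changes
    • $s(t)$: periodic changes (weekly or yearly seasonality)
    • $h(t)$: effects of holidays, which occur on potentially irregular schedules over one or more days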
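The smoothing recursion and the differencing step of ARIMA can be checked on small arrays. A minimal NumPy sketch, assuming the smoothed series is initialized with the first observation (other initializations exist); estimating the AR and MA coefficients is out of scope here, and the helper names are hypothetical.

```python
import numpy as np

def exponential_smoothing(y, alpha):
    """Simple exponential smoothing: S_t = alpha * y_t + (1 - alpha) * S_{t-1}, starting from y_0."""
    s = np.empty(len(y))
    s[0] = y[0]
    for t in range(1, len(y)):
        s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
    return s

def difference(y, d=1):
    """Apply (1 - B)^d, the 'I' part of ARIMA, before an ARMA model is fitted."""
    return np.diff(y, n=d)

y = np.array([10.0, 12.0, 11.0, 13.0])
smoothed = exponential_smoothing(y, alpha=0.5)          # values 10, 11, 11, 12
second_diff = difference(np.array([1.0, 4.0, 9.0, 16.0]), d=2)  # values 2, 2
```

Unrolling the recursion gives weights $\alpha, \alpha(1-\alpha), \alpha(1-\alpha)^2, \dots$ on $y_t, y_{t-1}, y_{t-2}, \dots$, which is why recent observations count more.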

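16 Related Work Details (Global univariate models)
• DeepAR
  – An RNN predicts the parameters of the distribution of the target value
  – x: covariates that provide additional information
[Figure: DeepAR during training and during inference]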
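A rough sketch of the DeepAR-style loop: the network reads the previous target value and the covariate x_t, outputs Gaussian parameters, and at inference feeds its own samples back in to build sample paths. This uses a hypothetical one-layer tanh RNN with random, untrained weights (DeepAR itself uses LSTMs trained by maximizing the likelihood with the true previous values fed in); the sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16
# Random, untrained weights for a hypothetical one-layer RNN.
W_in = rng.normal(scale=0.3, size=(HIDDEN, 2))        # input = [z_{t-1}, x_t]
W_h = rng.normal(scale=0.3, size=(HIDDEN, HIDDEN))
w_mu = rng.normal(scale=0.3, size=HIDDEN)
w_sigma = rng.normal(scale=0.3, size=HIDDEN)

def step(h, z_prev, x_t):
    """One RNN step: new hidden state plus Gaussian parameters (mu, sigma)."""
    h = np.tanh(W_in @ np.array([z_prev, x_t]) + W_h @ h)
    mu = w_mu @ h
    sigma = np.log1p(np.exp(w_sigma @ h))            # softplus keeps sigma > 0
    return h, mu, sigma

def sample_path(z_context, x, horizon):
    """Condition on observed values, then sample the horizon step by step."""
    h, z_prev = np.zeros(HIDDEN), 0.0
    for t, z_t in enumerate(z_context):               # conditioning range: feed true z_{t-1}
        h, _, _ = step(h, z_prev, x[t])
        z_prev = z_t
    samples = []
    for t in range(len(z_context), len(z_context) + horizon):  # prediction range
        h, mu, sigma = step(h, z_prev, x[t])
        z_prev = rng.normal(mu, sigma)                # the sample becomes the next input
        samples.append(z_prev)
    return np.array(samples)

x = np.sin(np.arange(30) / 5.0)                       # toy covariate x_t
paths = np.stack([sample_path(np.ones(20), x, horizon=10) for _ in range(100)])
quantiles = np.quantile(paths, [0.1, 0.5, 0.9], axis=0)  # probabilistic forecast
```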

17 Related Work Details (Global univariate models)
• DeepAR
  – An RNN predicts the parameters of the distribution of the target value
    • Gaussian likelihood
    • Negative-binomial likelihood
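The two likelihoods can be written as log-densities. The negative binomial here is parameterized by mean μ and shape α, so its variance is μ + αμ²; to our understanding this matches the DeepAR paper, which suits count data whose variance grows faster than the mean. The function names are illustrative.

```python
import math

def gaussian_log_likelihood(z, mu, sigma):
    """log N(z | mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2 * sigma ** 2)

def negbin_log_likelihood(z, mu, alpha):
    """Negative binomial log-pmf for a count z, with mean mu and shape alpha.

    pmf = Gamma(z + 1/a) / (Gamma(z + 1) Gamma(1/a)) * (1 / (1 + a mu))^(1/a) * (a mu / (1 + a mu))^z
    """
    inv_a = 1.0 / alpha
    return (math.lgamma(z + inv_a) - math.lgamma(z + 1) - math.lgamma(inv_a)
            + inv_a * math.log(1.0 / (1.0 + alpha * mu))
            + z * math.log(alpha * mu / (1.0 + alpha * mu)))

# Training minimizes the negative of these values summed over the conditioning range.
probs = [math.exp(negbin_log_likelihood(k, mu=3.0, alpha=0.5)) for k in range(200)]
```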

18 Related Work Details (Global univariate models)
• Temporal Convolutions
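To our understanding, the convolutional forecaster of [Borovykh+ 2017] builds on WaveNet-style dilated causal convolutions. The sketch below shows only that building block: each output depends on past inputs alone, and stacking dilations 1, 2, 4, 8 with kernel size 2 widens the receptive field to 16 steps. Weights and sizes are illustrative.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """y_t = sum_k w[k] * x[t - k * dilation], zero-padded for t - k * dilation < 0.

    The output at time t never depends on inputs after t (causal).
    """
    K = len(w)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[k] * xp[pad + t - k * dilation] for k in range(K))
        for t in range(len(x))
    ])

# Impulse at t = 0 pushed through four layers with kernel [1, 1] and dilations 1, 2, 4, 8.
x = np.zeros(32)
x[0] = 1.0
h = x
for d in (1, 2, 4, 8):
    h = dilated_causal_conv1d(h, np.array([1.0, 1.0]), d)
```

The impulse spreads to exactly the first 16 outputs, i.e. a receptive field of $1 + (K - 1)(1 + 2 + 4 + 8) = 16$, while the number of layers grows only logarithmically with that field.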

19 Related Work Details (Global univariate models)
• N-BEATS
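The core idea of N-BEATS is doubly residual stacking: each block emits a backcast that is subtracted from its input and a partial forecast that is added to the total. This sketch uses "generic" blocks with random weights and 2 hidden layers for brevity (the original uses more layers per block and also offers interpretable trend/seasonality bases); sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
L, H, HID = 24, 6, 32  # lookback, horizon, hidden width (illustrative)

def make_block():
    """Random weights for one generic block: MLP trunk + linear backcast/forecast heads."""
    return {
        "W1": rng.normal(scale=0.1, size=(HID, L)),
        "W2": rng.normal(scale=0.1, size=(HID, HID)),
        "Wb": rng.normal(scale=0.1, size=(L, HID)),  # backcast head
        "Wf": rng.normal(scale=0.1, size=(H, HID)),  # forecast head
    }

def block_forward(blk, x):
    h = np.maximum(0.0, blk["W1"] @ x)
    h = np.maximum(0.0, blk["W2"] @ h)
    return blk["Wb"] @ h, blk["Wf"] @ h

def nbeats_forward(blocks, x):
    """Subtract each block's backcast from the residual; sum all partial forecasts."""
    residual, forecast = x.copy(), np.zeros(H)
    for blk in blocks:
        backcast, partial = block_forward(blk, residual)
        residual = residual - backcast
        forecast = forecast + partial
    return forecast

blocks = [make_block() for _ in range(3)]
lookback = np.sin(np.arange(L) / 3.0)
y_hat = nbeats_forward(blocks, lookback)
```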

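20 Related Work Details (Global univariate models)
• PatchTST
  – Processes M univariate series independently
  – Splits each series into patches and feeds them to a Transformer
  – Trained with the MSE loss
  – Also supports self-supervised representation learning with masked patches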
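Channel independence plus patching amounts to a reshape followed by strided windowing, so one shared Transformer sees every channel as a separate univariate sample. A sketch assuming patch length 16 and stride 8 (overlapping patches, as we understand PatchTST's default); PatchTST also pads the end of the series before patching, which is omitted here.

```python
import numpy as np

def channel_independent_patches(x, patch_len=16, stride=8):
    """x: (batch, M, L) multivariate input -> (batch * M, num_patches, patch_len)."""
    B, M, L = x.shape
    x = x.reshape(B * M, L)                    # every channel becomes its own sample
    num_patches = (L - patch_len) // stride + 1
    # idx[i, j] = start of patch i + offset j
    idx = np.arange(patch_len)[None, :] + stride * np.arange(num_patches)[:, None]
    return x[:, idx]

x = np.arange(2 * 3 * 64, dtype=float).reshape(2, 3, 64)   # batch 2, M = 3 channels, L = 64
p = channel_independent_patches(x)                         # shape (6, 7, 16)
```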

21 Related Work Details (Global univariate models)
• TiDE
  – An MLP-based model
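TiDE is built from MLP residual blocks and, to our understanding, adds a global linear skip from the lookback window straight to the horizon. A stripped-down sketch with random weights: one residual block (dense, ReLU, dense, plus a linear skip, then layer norm) encodes the lookback, a stand-in linear decoder maps it to the horizon, and the global linear residual is added. Dropout, covariate projection, and TiDE's temporal decoder are omitted; sizes and names are illustrative.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain/bias)."""
    return (v - v.mean(axis=-1, keepdims=True)) / np.sqrt(v.var(axis=-1, keepdims=True) + eps)

def residual_block(x, W1, W2, W_skip):
    """Dense -> ReLU -> Dense, plus a linear skip connection, followed by layer norm."""
    hidden = np.maximum(0.0, x @ W1)
    return layer_norm(hidden @ W2 + x @ W_skip)

rng = np.random.default_rng(0)
L, H, D = 48, 12, 64                                   # lookback, horizon, hidden size
lookback = np.sin(np.arange(L) / 4.0)[None, :]         # one series, shape (1, L)

W1, W2, W_skip = (rng.normal(scale=0.1, size=s) for s in [(L, D), (D, D), (L, D)])
encoded = residual_block(lookback, W1, W2, W_skip)     # (1, D)

W_dec = rng.normal(scale=0.1, size=(D, H))             # stand-in decoder
W_global = rng.normal(scale=0.1, size=(L, H))          # global linear residual: lookback -> horizon
forecast = encoded @ W_dec + lookback @ W_global       # (1, H)
```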

Slides by ty
