Paper introduction: Dynamic Hyperparameter Optimization for Real-Time Data Streams

Bayesian Stream Tuner: Dynamic Hyperparameter Optimization for Real-Time Data Streams
2025.08.25 Naoki Chihara

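Overview
• A paper accepted at KDD 2025
• Proposes online hyperparameter optimization (HPO) for non-stationary data streams, based on the principles of Bayesian optimization
• Existing HPO methods assume an offline setting and cannot cope with non-stationary data
• The proposed method reconfigures the hyperparameters while taking concept drift in the data into account
• Upper and lower bounds on the error are derived theoretically
• Keywords: hyperparameter optimization, stream processing, non-stationarity
Study group © 2025 Naoki Chihara et al.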

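Background: Bayesian optimization
A method for efficiently finding the input that maximizes a black-box function: $x^\star = \arg\max_{x \in \mathcal{X}} f(x)$
• In HPO, $f$ is the model likelihood and $x$ is the hyperparameters
• Grid search naively enumerates every $x$
• Bayesian optimization applied to HPO searches by forming educated guesses from past results
• Example: it automatically works out things like "a too-small $x_1$ looks bad" or "a large $x_2 \cdot x_3$ looks good", and proposes the hyperparameters to try next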

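Background: Bayesian optimization (acquisition function)
An acquisition function proposes the next evaluation point $x_{\text{next}}$: $a(x) = \mathbb{P}(y > y^\star \mid x, D)$
• $y^\star$: the best value observed so far, $y^\star = \max_i y_i$
• $a(x)$ measures how likely $y > y^\star$ is for input $x$
• We next want to see the result at the $x_{\text{next}}$ that maximizes $a(x)$
• Once enough data has accumulated to represent $f$ well, $x_{\text{next}}$ is very likely a good input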

Background: Bayesian optimization (algorithm)
The algorithm repeats the following procedure:
1. Fit a model $p(f \mid D_n)$ using the samples held so far, $D_n = \{(x_i, y_i)\}_{i=1}^{n}$
2. Use the model's output to propose the next evaluation point $x_{n+1}$
3. Obtain the value $y_{n+1}$ at that point and add it to the dataset: $D_{n+1} = D_n \cup \{(x_{n+1}, y_{n+1})\}$
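The three-step loop can be sketched as below. This is only a minimal illustration, not the paper's method: the Gaussian-process surrogate with an RBF kernel, the length scale, the candidate grid, and `toy_objective` are all assumptions chosen to keep the example short.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean GP surrogate at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)         # K^{-1} k(x_obs, x_query)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_oq * solved, axis=0)    # k(x, x) = 1 for this kernel
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def probability_of_improvement(mean, std, y_best):
    """a(x) = P(y > y_best | x, D) under a Gaussian predictive distribution."""
    z = (mean - y_best) / std
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])

def bayes_opt(objective, x_init, grid, n_iter=15):
    x_obs = np.array(x_init, dtype=float)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)      # step 1: fit the surrogate
        acq = probability_of_improvement(mean, std, y_obs.max())
        x_next = grid[int(np.argmax(acq))]                # step 2: propose x_{n+1}
        y_next = objective(x_next)                        # step 3: evaluate and store
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, y_next)
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]

def toy_objective(x):
    """Hypothetical black-box function with its maximum at x = 0.7."""
    return -(x - 0.7) ** 2

grid = np.linspace(0.0, 1.0, 101)
x_star, y_star = bayes_opt(toy_objective, x_init=[0.0, 1.0], grid=grid)
```

The returned pair is the best point among everything evaluated, so it can never be worse than the initial samples; how quickly it approaches the true maximizer depends on the assumed kernel and on the acquisition function.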

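Problem definition
Figure: a data stream whose current window at time $t$ holds $(x_t, y_t)$, and a model pool $f_1, \dots, f_n$ whose members are configured with different hyperparameters; inference uses the best model in the pool.
• We want to obtain, online, the hyperparameters of some (incrementally updatable) predictor $f$
• This predictor is a very ordinary regression / classification model, $y_t = f(x_t)$
• There is a model pool $\mathcal{P}$ that contains $n$ models
• The final inference uses the single model that fits best
• The $n$ models differ only in their hyperparameters
• By updating $\mathcal{P}$ adaptively, inference can always be done with a model whose hyperparameters suit the data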

Problem definition (continued)
As the current window advances over $(x_t, y_t)$, $(x_{t+1}, y_{t+1})$, $(x_{t+2}, y_{t+2})$, and so on, every predictor in the pool is updated with the latest data.

Problem definition (continued)
• Updating a model whose hyperparameters are wrong does not move it in the right direction ⇒ reinitialize it and assign different hyperparameters
• Open questions: How should the new hyperparameters be chosen? Is it safe to keep updating incrementally on data whose nature has changed completely? And, on top of that, is the proposed algorithm theoretically sound?
• Answer: a Bayesian-optimization-based method for sequential hyperparameter optimization on non-stationary data streams
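A rough sketch of the model-pool idea: several online models that differ only in one hyperparameter are all updated on each new sample, and the prediction comes from whichever currently fits best. The SGD linear regressor, the exponentially weighted loss, and the synthetic stream are assumptions for illustration; the paper does not prescribe them.

```python
import numpy as np

class OnlineLinearModel:
    """Linear regressor trained by SGD; the learning rate is its only hyperparameter."""

    def __init__(self, n_features, learning_rate):
        self.learning_rate = learning_rate
        self.weights = np.zeros(n_features)
        self.running_loss = 0.0  # exponentially weighted squared error

    def predict(self, x):
        return float(self.weights @ x)

    def update(self, x, y, decay=0.9):
        error = self.predict(x) - y
        self.running_loss = decay * self.running_loss + (1 - decay) * error ** 2
        self.weights -= self.learning_rate * error * x

def run_pool(stream, learning_rates, n_features):
    pool = [OnlineLinearModel(n_features, lr) for lr in learning_rates]
    predictions = []
    for x, y in stream:
        best = min(pool, key=lambda m: m.running_loss)  # best-fitting model so far
        predictions.append(best.predict(x))             # predict first ...
        for model in pool:                              # ... then update every model
            model.update(x, y)
    return pool, predictions

# Hypothetical noise-free stream: y = 2*x0 - x1
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
xs = rng.normal(size=(500, 2))
stream = [(x, float(true_w @ x)) for x in xs]
pool, predictions = run_pool(stream, learning_rates=[1e-4, 1e-2, 1e-1], n_features=2)
best_model = min(pool, key=lambda m: m.running_loss)
```

In this toy run the model with the tiny learning rate barely moves, which is the situation the slide describes: a badly configured model does not recover just by receiving more updates.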

Proposed method: overview
Step 1: Extract the necessary information from the sliding window.
Step 2: Represent the relationship between the composite features and the inference error with a Bayesian linear regression (BLR) model.
Step 3: Reorganize the pooled models.
Step 4: Reset the models when concept drift is detected.

Steps 1 & 2
A Bayesian linear regression (BLR) model is used to regress the error (the loss function $\ell$).
• This is needed because later steps require knowing which hyperparameters are likely to give a small error on which kind of data
• Statistics obtained from the current window are used as explanatory variables for this estimate

From the paper: [...] each configuration $c$ in the configuration space $\mathcal{C}$ to form a composite representation,
$$\mathbf{f}(c, W_j) = \begin{bmatrix} \mathbf{f}_h(c) \\ \mathbf{f}_s(W_j) \end{bmatrix}$$
• $c$: hyperparameters; $W_j$: current window
• $\mathbf{f}_h(c)$: features obtained from the hyperparameters
• $\mathbf{f}_s(W_j)$: features obtained from the data in the current window (e.g., mean, variance, skewness)

This composite feature vector is used as input to an online Bayesian Linear Regression (BLR) model [15] that assumes a Gaussian likelihood for the expected loss $\ell(f(x; c))$ associated with configuration $c$:
$$p(\ell \mid \mathbf{f}, \boldsymbol{\theta}, \beta) = \mathcal{N}(\ell \mid \mathbf{f}^\top \boldsymbol{\theta}, \beta^{-1})$$
Here, $\boldsymbol{\theta}$ represents the BLR weight vector and $\beta$ is a precision parameter that controls the observation noise. We impose a Gaussian prior on the weight vector, with $\alpha$ controlling the prior precision: [...]
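A minimal sketch of Steps 1 and 2 under stated assumptions: the composite feature is the hyperparameter vector stacked on the window's mean, variance, and skewness, and the BLR uses a zero-mean Gaussian prior with precision `alpha` and noise precision `beta` (the textbook conjugate update). The class and function names, the default values, and the use of the raw hyperparameters as their own features are hypothetical.

```python
import numpy as np

def window_statistics(window):
    """Statistical features of the current window: mean, variance, skewness."""
    window = np.asarray(window, dtype=float)
    mean, std = window.mean(), window.std()
    skew = 0.0 if std == 0 else float(np.mean(((window - mean) / std) ** 3))
    return np.array([mean, std ** 2, skew])

def composite_features(config, window):
    """Hyperparameter features stacked on top of the window statistics."""
    return np.concatenate([np.asarray(config, dtype=float), window_statistics(window)])

class OnlineBLR:
    """Bayesian linear regression with prior N(0, alpha^{-1} I) and noise precision beta."""

    def __init__(self, n_features, alpha=1.0, beta=25.0):
        self.beta = beta
        self.precision = alpha * np.eye(n_features)  # posterior precision matrix
        self.b = np.zeros(n_features)                # beta * sum(feature * loss)

    def update(self, feature, loss):
        """Absorb one (feature, observed loss) pair into the posterior."""
        self.precision += self.beta * np.outer(feature, feature)
        self.b += self.beta * loss * feature

    def predict(self, feature):
        """Predictive mean and standard deviation of the loss for a feature vector."""
        cov = np.linalg.inv(self.precision)
        weight_mean = cov @ self.b
        pred_mean = float(weight_mean @ feature)
        pred_var = 1.0 / self.beta + float(feature @ cov @ feature)
        return pred_mean, float(np.sqrt(pred_var))
```

`predict` returns exactly the two quantities that Step 3 consumes: a predictive mean and a predictive standard deviation of the loss for a given configuration and window.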

Step 3
A model with wrong hyperparameters is assumed to be unable to learn properly, so it is swapped for parameters that are likely to behave better.
• Hyperparameters are recommended with the probability of improvement (PI) acquisition function

From the paper: At the end of each window $W_j$, BST refines its configuration pool $\mathcal{P}$ through a systematic update process. First, the algorithm ranks the configurations based on their empirical loss $L_{c,W_j}$ and then replaces half of the under-performing configurations with promising alternatives. The selection of new configurations uses the acquisition function of probability of improvement (PI) [3], which effectively balances exploration and exploitation. The PI function is defined as:
$$c_{\text{new}} = \arg\min_{c \in \mathcal{C}_{\text{cand}}} \Phi\!\left(\frac{\mu_j(c) - L_{c^*,W_j}}{\sigma_j(c) + \epsilon}\right),$$
where $\mu_j(c)$ and $\sigma_j(c)$ denote the BLR model's predictive mean and standard deviation for configuration $c$, $\Phi(\cdot)$ is the standard normal cumulative distribution function (CDF), and $L_{c^*,W_j}$ is the [...]

• $\mu_j(c)$: mean of the predicted error when hyperparameter $c$ is used
• $L_{c^*,W_j}$: error of the hyperparameter with the lowest error so far
• $\Phi$: the standard normal CDF; $\Phi(z)$ is the probability that a standard normal variable is at most $z$
• The predicted error itself is given probabilistically, as a Gaussian with mean $\mu_j(c)$ and standard deviation $\sigma_j(c)$
• The expression scores how likely the error under hyperparameter $c$ is to fall below the smallest error so far: with $z = (\mu_j(c) - L_{c^*,W_j}) / (\sigma_j(c) + \epsilon)$, that probability is $1 - \Phi(z)$, so taking the argmin of $\Phi(z)$ picks the candidate most likely to improve
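A small sketch of the pool refresh, assuming the score is $\Phi((\mu - L^*)/(\sigma + \epsilon))$ and that "half" means the worse-ranked half of the pool. The function names, the dictionary-based interface, and the numbers in the example are hypothetical; `predict` stands in for the BLR's predictive mean and standard deviation.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pi_score(mu, sigma, best_loss, eps=1e-9):
    """Phi((mu - L*) / (sigma + eps)); a smaller score means improvement is more likely."""
    return normal_cdf((mu - best_loss) / (sigma + eps))

def refresh_pool(pool_losses, candidates, predict):
    """pool_losses maps config -> empirical loss; predict(config) returns (mu, sigma)."""
    ranked = sorted(pool_losses, key=pool_losses.get)   # best (lowest loss) first
    survivors = ranked[: (len(ranked) + 1) // 2]        # keep the better half
    best_loss = pool_losses[ranked[0]]
    fresh = [c for c in candidates if c not in survivors]
    fresh.sort(key=lambda c: pi_score(*predict(c), best_loss))
    n_replace = len(ranked) - len(survivors)
    return survivors + fresh[:n_replace]

# Hypothetical pool of learning rates with their losses on the last window
pool_losses = {0.001: 0.9, 0.01: 0.3, 0.1: 0.2, 1.0: 0.8}
predicted = {0.03: (0.15, 0.05), 0.3: (0.25, 0.05), 3.0: (0.9, 0.1)}
new_pool = refresh_pool(pool_losses, list(predicted), predicted.get)
```

Here the two worst configurations (`1.0` and `0.001`) are dropped, and the candidates whose predicted loss is most likely to beat the current best loss of 0.2 take their place.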

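Step 4
Sequential updating is weak against abrupt concept drift.
• Concept drift: the phenomenon in which the characteristics of the data, or the distribution being modeled, change over time
• For the sake of stability, incremental updates are by design not meant to follow abrupt changes
Remedy: use a concept drift detection method (ADWIN). Information from before the detection point is assumed to have no influence on the future, so all of it is discarded and learning restarts from scratch.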
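The detect-then-reset behaviour can be illustrated with a deliberately simplified detector. This is a stand-in, not ADWIN: it compares the means of the older and newer halves of a fixed-size buffer against a Hoeffding-style threshold, whereas ADWIN adapts its window and checks many split points. The class name, buffer size, and `delta` are assumptions.

```python
import math
from collections import deque

class MeanShiftDetector:
    """Simplified stand-in for ADWIN: flags drift when the two halves of a
    fixed-size buffer have means that differ by more than a threshold."""

    def __init__(self, size=100, delta=0.002, value_range=1.0):
        self.buffer = deque(maxlen=size)
        self.delta = delta
        self.value_range = value_range

    def update(self, value):
        """Add one observation; return True if drift is flagged."""
        self.buffer.append(value)
        if len(self.buffer) < self.buffer.maxlen:
            return False
        values = list(self.buffer)
        half = len(values) // 2
        old_mean = sum(values[:half]) / half
        new_mean = sum(values[half:]) / (len(values) - half)
        threshold = self.value_range * math.sqrt(2.0 * math.log(4.0 / self.delta) / half)
        if abs(old_mean - new_mean) > threshold:
            self.buffer.clear()  # discard everything observed before the drift
            return True
        return False

# Hypothetical loss stream in [0, 1] whose level jumps at t = 200
losses = [0.1] * 200 + [0.9] * 200
detector = MeanShiftDetector()
detections = []
for t, loss in enumerate(losses):
    if detector.update(loss):
        detections.append(t)  # at this point the pooled models would be reset
```

The detection arrives some steps after the actual change, because the newer half of the buffer must first fill with enough post-drift values; that delay is the price of not reacting to noise.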

Complexity analysis
• Time complexity per window: $O(d^2 + |\mathcal{C}_{\text{cand}}| + w \log w)$
  • $d$: dimensionality of the data (feature vector)
  • $w$: window size
  • $|\mathcal{C}_{\text{cand}}|$: size of the set of candidate parameters
  • Main components: $O(d^2)$ for BLR updates, $O(|\mathcal{C}_{\text{cand}}|)$ for candidate evaluation, and $O(w \log w)$ for drift detection
• Space complexity: $O(nd + d^2 + w + \log w)$
  • $n$: number of items (models) stored in the model pool
  • Covers the model pool ($O(nd)$), the BLR parameters, and the sliding window along with drift detection

Theoretical analysis
• $R(T)$ is the dynamic regret, a metric for evaluating algorithms
• An online predictor that makes the following $R(T)$ small is a "good algorithm"

Problem setup (from the paper; KDD '25, August 3–7, 2025, Toronto, ON, Canada; Nilesh Verma, [...]): Consider a data stream $\{(x_t, y_t)\}_{t \ge 1}$ where $x_t \in \mathcal{X}$ and $y_t \in \mathcal{Y}$. Let $\mathcal{C}$ be a compact configuration space equipped with the metric $d$. Each configuration $c \in \mathcal{C}$ corresponds to a trained model $f_c(\cdot)$ with instantaneous loss $\ell(f_c(x), y)$. We define the regret over a period $T$ as:
$$R(T) = \sum_{t=1}^{T} \ell(f_{c_t}(x_t), y_t) - \sum_{t=1}^{T} \ell(f_{c_t^*}(x_t), y_t),$$
where the instantaneous optimal configuration is given by:
$$c_t^* = \arg\min_{c \in \mathcal{C}} \mathbb{E}\left[\ell(f_c(x_t), y_t)\right].$$
• First sum: total error when BST chooses $\{c_t\}_{t=1}^{T}$
• Second sum: total error when $\{c_t^*\}_{t=1}^{T}$ is chosen in batch fashion
• $c_t^*$: the best configuration at time $t$, chosen with the whole sequence in view

4.2 Assumptions
(1) Bounded Loss: For all $c \in \mathcal{C}$ and $(x, y)$, there exists $B > 0$ such that $0 \le \ell(f_c(x), y) \le B$.
(2) Lipschitz Continuity [14]: For all $c, c' \in \mathcal{C}$, there exists $L > 0$ with [...]
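As a concrete reading of the regret definition, the sketch below computes $R(T)$ from a table of per-step losses. It assumes a finite configuration set and uses the realized loss at each step as a proxy for the expectation that defines $c_t^*$; the function name and the numbers are made up for illustration.

```python
import numpy as np

def dynamic_regret(loss_table, chosen):
    """loss_table[t, c] is the loss of configuration c at step t;
    chosen[t] is the index of the configuration the algorithm used at step t."""
    loss_table = np.asarray(loss_table, dtype=float)
    chosen = np.asarray(chosen)
    incurred = loss_table[np.arange(len(chosen)), chosen].sum()  # first sum
    oracle = loss_table.min(axis=1).sum()                        # per-step best config
    return float(incurred - oracle)

# Three time steps, two configurations (hypothetical losses)
loss_table = [[0.1, 0.5],
              [0.4, 0.2],
              [0.3, 0.3]]
regret = dynamic_regret(loss_table, chosen=[0, 0, 1])
```

The comparator changes at every step, which is what makes the regret "dynamic": at the second step the oracle switches to the other configuration while the algorithm stays put, and that gap of 0.2 is the entire regret here.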

Theoretical analysis (upper bound)
An online predictor that makes $R(T)$ small is a "good algorithm".
• An upper bound on the regret is derived
• It is affected by the length of the sequence, the degree of variation of the hyperparameters, and the size of the search space

From the paper: [...]
$$V_T = \sum_{j=1}^{M} d(c_j^*, c_{j+1}^*),$$
where $M = \lfloor T/w \rfloor$ and $w$ is the window size.

4.3 Main Results
Theorem 4.1 (Upper Bound). Under Assumptions 1–3, with probability $1 - \delta$ and for $w = \Theta\left((T/V_T)^{2/3}\right)$, BST achieves
$$R(T) = \tilde{O}\left(T^{2/3}\left(V_T + \log|\mathcal{C}|\right)^{1/3}\right),$$
where $\tilde{O}$ hides logarithmic factors in $T$ and $1/\delta$.
Proof. We begin with the regret decomposition:
$$R(T) \le \sum_{t=1}^{T}\left[\ell(f_{c_t}, y_t) - \hat{L}_t(c_t)\right] + \sum_{t=1}^{T}\left[\hat{L}_t(c_t) - \hat{L}_t(c_t^*)\right] + \sum_{t=1}^{T}\left[\hat{L}_t(c_t^*) - \ell(f_{c_t^*}, y_t)\right]$$
[...]
Theorem 4.2 (Lower Bound): [...] budget $V_T$, there exists a se[...]

⇒ This assumes that $c_t^*$ does not keep changing abruptly ($V_T$: upper bound on the total amount of variation)
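To get a feel for the rates in Theorem 4.1, the helper below evaluates $w = (T/V_T)^{2/3}$ and $T^{2/3}(V_T + \log|\mathcal{C}|)^{1/3}$ numerically. Constants and the logarithmic factors hidden by $\tilde{O}$ and $\Theta$ are ignored, so the outputs only show scaling, not actual regret values; the function names and inputs are hypothetical.

```python
import math

def suggested_window(T, V_T):
    """Window size scaling w ~ (T / V_T)^(2/3), constants ignored."""
    return (T / V_T) ** (2.0 / 3.0)

def regret_rate(T, V_T, n_configs):
    """Upper-bound scaling T^(2/3) * (V_T + log|C|)^(1/3), constants and logs ignored."""
    return T ** (2.0 / 3.0) * (V_T + math.log(n_configs)) ** (1.0 / 3.0)

T, V_T, n_configs = 1_000_000, 1_000, 100
w = suggested_window(T, V_T)
rate = regret_rate(T, V_T, n_configs)
per_step = rate / T  # shrinks like T^(-1/3) when V_T stays fixed
```

Two things follow directly from the formulas: a larger variation budget $V_T$ calls for shorter windows, and for fixed $V_T$ the average regret per step goes to zero as $T$ grows, which is what makes the bound meaningful.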

Experimental setup
Accuracy was evaluated on two tasks, regression and classification.
• Baselines
  • MESSPT
  • RandomSearch
  • SSPT
• Datasets
  • 20 datasets in total, including both synthetic and real-world data (figure on the right)

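Experimental results: robustness to concept drift
• The horizontal axis of the upper figure is the number of samples used for training (≈ time)
• BST (Bayesian) is robust to concept drift (red line in the figure)
• Commendably, its accuracy is good on both the regression and the classification tasks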

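Experimental results: comparison of computation time and memory usage
• Computational efficiency is "poor" in terms of both time and memory
• The cause is that two operations are heavy: maintaining the BLR and dynamically extracting the statistical features
• Nevertheless, the authors conclude that the proposed method is sufficiently effective in terms of accuracy and robustness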

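Experimental results: ablation study (DD: drift detection, ST: statistical features)
• Both of these components contribute to improved accuracy
• I am curious about the result on SINE, the dataset used in the robustness experiment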

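Future work
• Improve computation speed and extend the method to handle multi-objective Bayesian optimization
  • Mentioned by the authors in the paper
• Put a little more thought into how the feature vector is constructed
  • If the input is restricted to "time-series data streams", some information about temporal dependence could probably be added (STL, Fourier, AR, etc.)
• Try attaching accuracy guarantees to future stream forecasting methods
  • It seems feasible, but my understanding is still too shallow to say so for sure; more study needed
• (Aside) Introduce an acquisition function for selecting and discarding regimes
  • Simply do "regime selection" instead of "hyperparameter selection"
  • How to embed regimes leaves room for investigation
  • Model parameters would be perfectly fine, but they risk becoming high-dimensional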


Experimental results: BST (Bayesian) achieves higher accuracy than the baselines.
