Slide 13
• Online, nonstationary, and nonlinear contextual multi-armed bandit (MAB) policy

Our Proposed Policy
• Nonlinearity and nonstationarity
• Introduce a forgetting mechanism into the nonlinear Gaussian process (GP) regression model.
• Online performance
• Using random Fourier features (RFF), compute the predictive distribution of GP regression in the form of linear regression in an R-dimensional space, where R ⋘ N.
• Apply a linear recursive least squares (RLS) algorithm.
• Key features
• Fast decision-making with recursive learning.
• Accurate error correction in the predictive distribution.
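The bullets above combine three ingredients: an RFF feature map, a linear model in the R-dimensional feature space, and a recursive update with forgetting. The following is a minimal sketch of that combination, not the authors' exact recursion. The RBF kernel, the class name `RFFForgettingRLS`, the standard exponentially weighted RLS update, and all parameter names and default values are assumptions made for illustration.

```python
import numpy as np


class RFFForgettingRLS:
    """Linear regression on random Fourier features, updated by RLS with forgetting.

    Illustrative sketch only: kernel choice, names, and defaults are assumptions.
    """

    def __init__(self, input_dim, num_features=50, lengthscale=1.0,
                 gamma=0.99, noise_var=0.1, prior_var=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Spectral samples for an RBF kernel (assumed kernel).
        self.W = rng.normal(0.0, 1.0 / lengthscale, size=(num_features, input_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        self.R = num_features
        self.gamma = gamma          # forgetting factor in (0, 1]
        self.noise_var = noise_var  # assumed observation noise variance
        # Inverse of the regularized, exponentially weighted Gram matrix.
        self.P = prior_var * np.eye(num_features)
        self.w = np.zeros(num_features)

    def features(self, x):
        # Map an input to the R-dimensional RFF space.
        return np.sqrt(2.0 / self.R) * np.cos(self.W @ x + self.b)

    def update(self, x, y):
        # One RLS step with forgetting: O(R^2) cost, independent of N.
        z = self.features(x)
        Pz = self.P @ z
        gain = Pz / (self.gamma + z @ Pz)
        self.w = self.w + gain * (y - z @ self.w)
        self.P = (self.P - np.outer(gain, Pz)) / self.gamma

    def predict(self, x):
        # Approximate Gaussian predictive mean and variance at x.
        z = self.features(x)
        mean = z @ self.w
        var = self.noise_var * (1.0 + z @ self.P @ z)
        return mean, var


def demo(steps=300, seed=1):
    # Stream noise-free samples of sin(x) and update after each one.
    rng = np.random.default_rng(seed)
    model = RFFForgettingRLS(input_dim=1)
    for _ in range(steps):
        x = rng.uniform(-3.0, 3.0, size=1)
        model.update(x, np.sin(x[0]))
    return model
```

Calling `demo().predict(np.array([0.5]))` returns a mean and a variance for a test input. Setting `gamma` below 1 discounts old observations, which is one common way to realize a forgetting mechanism; `gamma = 1` recovers ordinary RLS.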
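[Figure: recursive update diagram. Stages shown: (P_{N−2,M−2}, Q_{N−2,M−2}) → (P_{N−1,M−1}, Q_{N−1,M−1}) → (P_{N,M}, Q_{N,M}). Inputs shown: x_{N−1}, y_{N−1}; x_N, y_N; test input x_*. Other labels: z, γ, "(M = 0)". Predictive distributions: p(y_{N−1} ∣ x_{N−1}, X, y), p(y_N ∣ x_N, X, y), p(y_* ∣ x_*, X, y).]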