state of the environment changes, but you don't have any "hint" about the current state.
  ◦ What if you have a "hint" about the state?
    ▪ e.g., the color of the slot machines changes according to the reward distribution R(a).
• The "hint" can be formalized as a state "s" of the environment.
  ◦ The action-value function and the policy can be functions of "s".
    ▪ Q(a | s)
    ▪ P(a | Q(s), s)
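
To make the state-dependent action-value function Q(a | s) concrete, here is a minimal sketch of a tabular agent that keeps one row of estimates per state. It is an illustration under stated assumptions, not the method of these slides: the epsilon-greedy policy, the Gaussian rewards, the sample-average update, and all names (`run_contextual_bandit`, `reward_means`, `epsilon`) are hypothetical choices made for the example.

```python
import random


def run_contextual_bandit(reward_means, steps=5000, epsilon=0.1, seed=0):
    """Learn Q(a | s) with an epsilon-greedy policy (illustrative sketch).

    reward_means[s][a] is an assumed mean reward for action a in state s,
    i.e. the "color" s tells the agent which reward distribution is active.
    """
    rng = random.Random(seed)
    n_states = len(reward_means)
    n_actions = len(reward_means[0])

    # One row of action-value estimates per state: q[s][a] approximates Q(a | s).
    q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]

    for _ in range(steps):
        # The environment picks a state and reveals it as the "hint".
        s = rng.randrange(n_states)

        # Policy P(a | Q(s), s): explore with probability epsilon,
        # otherwise take the greedy action for the observed state.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: q[s][i])

        # Noisy reward drawn around the assumed mean (std 0.5 is an assumption).
        r = rng.gauss(reward_means[s][a], 0.5)

        # Sample-average update of the estimate for this (state, action) pair.
        counts[s][a] += 1
        q[s][a] += (r - q[s][a]) / counts[s][a]

    return q


# Two "colors" (states) and two slot machines (actions);
# the better machine differs by state.
reward_means = [[1.0, 0.0], [0.0, 1.0]]
q = run_contextual_bandit(reward_means)
greedy = [max(range(len(row)), key=lambda a: row[a]) for row in q]
print("Estimated Q(a | s):", q)
print("Greedy action per state:", greedy)
```

Without the state, a single table Q(a) would average the two reward distributions and could not tell the machines apart; indexing the estimates by s lets the agent pick a different best action for each color.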