Q_k(s',a')]

Q-Learning

In MDP:

In RL (Model-Free):
✤ Receive a sample (s,a,s',r)
✤ Consider your old estimate: Q(s,a)
✤ Consider your new sample estimate (the Q-value the sample suggests):
    sample = Q_suggest(s,a) = R(s,a,s') + γ max_{a'} Q(s',a')
✤ Incorporate the new estimate into a running average:
    Q(s,a) ← (1−α) Q(s,a) + α ⋅ sample

From CS188x
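
The model-free update above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name q_update, the dictionary-based Q table, the state and action labels, and the default values of alpha and gamma are all assumptions made for the example. The observed reward r stands in for R(s,a,s'), and the sketch assumes s' is non-terminal with at least one available action.

```python
from collections import defaultdict


def q_update(Q, s, a, s_next, r, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update for the sample (s, a, s_next, r).

    Q is a mapping from (state, action) pairs to estimates; unseen pairs
    default to 0.0. `actions` lists the actions available in s_next
    (assumed non-empty here).
    """
    # New sample estimate: observed reward plus the discounted value of
    # the best action in the next state under the current estimates.
    sample = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    # Running average: blend the old estimate with the new sample.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
    return Q[(s, a)]


# Hypothetical two-state example: states "A" and "B", two actions.
Q = defaultdict(float)
actions = ["left", "right"]
print(q_update(Q, "A", "right", "B", 1.0, actions))  # prints 0.5
```

With every estimate starting at 0, the first sample is 1.0 + 0.9 ⋅ 0 = 1.0, so the running average moves Q("A","right") halfway from 0 to 1.0. A smaller alpha would weight the old estimate more heavily and move the value more slowly.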