Slide 22
Slide 22 text
• Expected immediate reward: R[x](d, π) = Σ_a π[a | x] ζ[x, a](d, π)
• State stochastic matrix: P[x+ | x](d, π) = Σ_a π[a | x] ρ[x+ | x, a](d, π)
• Infinite-horizon value function: V[x](d, π) = R[x](d, π) + α Σ_{x+} P[x+ | x](d, π) V[x+](d, π)
• Single-stage deviation rewards: Q[x, a](d, π) = ζ[x, a](d, π) + α Σ_{x+} ρ[x+ | x, a](d, π) V[x+](d, π)
• Best response per state: B[x](d, π) := set of randomizations over actions a that maximize Q[x, a](d, π) (numerical sketch below)
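As a rough numerical sketch of these definitions (not part of the slides), the snippet below assumes ζ[x, a] and ρ[x+ | x, a] have already been evaluated at a fixed (d, π) and are given as NumPy arrays; all names, shapes, and the uniform policy are illustrative assumptions.

```python
# Minimal sketch of R, P, V, and Q from this slide, assuming zeta[x, a] and
# rho[x_next, x, a] are already evaluated at a fixed (d, pi). Shapes and the
# random placeholder data are hypothetical.
import numpy as np

n_x, n_a = 3, 2                      # individual states and actions (illustrative sizes)
alpha = 0.9                          # future discount factor

rng = np.random.default_rng(0)
zeta = rng.random((n_x, n_a))        # zeta[x, a]: immediate reward at (d, pi)
rho = rng.random((n_x, n_x, n_a))    # rho[x_next, x, a]: transition probs at (d, pi)
rho /= rho.sum(axis=0, keepdims=True)  # normalize over next state

pi = np.full((n_a, n_x), 1.0 / n_a)  # pi[a | x]: uniform policy, for illustration only

# Expected immediate reward: R[x] = sum_a pi[a|x] zeta[x, a]
R = np.einsum('ax,xa->x', pi, zeta)

# State stochastic matrix: P[x+ | x] = sum_a pi[a|x] rho[x+ | x, a]
P = np.einsum('ax,yxa->yx', pi, rho)

# Infinite-horizon value: V = R + alpha * P^T V, i.e. solve (I - alpha P^T) V = R
V = np.linalg.solve(np.eye(n_x) - alpha * P.T, R)

# Single-stage deviation rewards: Q[x, a] = zeta[x, a] + alpha sum_{x+} rho[x+|x,a] V[x+]
Q = zeta + alpha * np.einsum('yxa,y->xa', rho, V)
```

Solving the linear system gives V directly for the fixed (d, π); value iteration would converge to the same fixed point under the discount α < 1.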
Best response in dynamic population games
Defined per state w.r.t. single-stage deviation rewards
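Continuing the sketch above, one way to represent B[x](d, π) is the set of actions attaining max_a Q[x, a]; any randomization supported on those actions is a per-state best response. The tolerance and the uniform representative below are assumptions of the sketch.

```python
# Best response per state: actions attaining the maximum of Q[x, .].
tol = 1e-12
best_actions = [np.flatnonzero(Q[x] >= Q[x].max() - tol) for x in range(n_x)]

# One representative element of B[x](d, pi): uniform randomization over maximizers.
pi_br = np.zeros((n_a, n_x))
for x, acts in enumerate(best_actions):
    pi_br[acts, x] = 1.0 / len(acts)
```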
Notation
[·] for discrete quantities
(·) for continuous quantities
x - individual state
a - individual action
d - state distribution
π - policy
ζ - immediate reward
ρ - state transition probabilities
α - future discount factor
