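Exact gradients of $\mathbb{E}_{p \sim \mu_{\mathrm{ss}}(u)}[\Phi(u, p)] = \mathbb{E}_{d \sim \mu_d}[\Phi(u, h(u, d))] \triangleq \tilde{\Phi}(u)$

$$
\begin{aligned}
\nabla \tilde{\Phi}(u_k)
&= \mathbb{E}_{d \sim \mu_d}\big[\nabla \Phi(u_k, h(u_k, d))\big] \\
&= \mathbb{E}_{d \sim \mu_d}\big[\nabla_u \Phi(u_k, h(u_k, d)) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big|_{p = h(u_k, d)}\big] \\
&= \mathbb{E}_{(p, d) \sim \gamma_{\mathrm{ss}}(u_k)}\big[\nabla_u \Phi(u_k, p) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big]
\end{aligned}
$$

Steady state induced by $d \sim \mu_d$ and $p \sim h(u, \cdot)_{\#}\, \mu_d$
Chain rule and law of total derivative
Conditions on $\Phi$ allow swapping $\nabla$ and $\mathbb{E}$

Challenges
Hard to evaluate $\mathbb{E}$ (integral)
No access to the steady state

Online decision-making! Use current samples from $\mu_k$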
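
As a minimal sketch of the sample-based idea, the expectation in the gradient can be replaced by an average over available $(p, d)$ pairs: each pair contributes the partial gradient of $\Phi$ in the decision plus the sensitivity of $h$ times the partial gradient of $\Phi$ in $p$. The scalar model below (`Phi(u, p) = 0.5*(u-1)^2 + 0.5*p^2`, `h(u, d) = 0.5*u + d`, standard normal `d`) is a hypothetical choice, not taken from the slides, picked only because its exact gradient `1.25*u - 1` is available for comparison. The samples here are drawn at the steady state for illustration; in the online setting they would come from the current distribution instead.

```python
import random

# Hypothetical scalar model (an assumption for illustration only):
#   Phi(u, p) = 0.5 * (u - 1)^2 + 0.5 * p^2,   h(u, d) = 0.5 * u + d

def phi_grad_u(u, p):
    """Partial derivative of Phi with respect to the decision u."""
    return u - 1.0

def phi_grad_p(u, p):
    """Partial derivative of Phi with respect to the distribution variable p."""
    return p

def h(u, d):
    """Steady-state map from decision u and exogenous sample d to p."""
    return 0.5 * u + d

def h_grad_u(u, d):
    """Sensitivity of the steady-state map with respect to u."""
    return 0.5

def sample_gradient(u, samples):
    """Average of grad_u Phi(u, p) + grad_u h(u, d) * grad_p Phi(u, p) over (p, d) pairs."""
    total = 0.0
    for p, d in samples:
        total += phi_grad_u(u, p) + h_grad_u(u, d) * phi_grad_p(u, p)
    return total / len(samples)

def exact_gradient(u):
    """Closed-form gradient for this model when d has zero mean."""
    return 1.25 * u - 1.0

rng = random.Random(0)                      # fixed seed for reproducibility
u = 2.0
ds = [rng.gauss(0.0, 1.0) for _ in range(20000)]
samples = [(h(u, d), d) for d in ds]        # steady-state pairs (p, d) with p = h(u, d)
estimate = sample_gradient(u, samples)
print(estimate, exact_gradient(u))
```

The estimator never evaluates the integral; its only error relative to the exact gradient comes from the finite sample average.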