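Exact gradients of $\mathbb{E}_{p \sim \mu_{ss}(u)}[\Phi(u, p)] = \mathbb{E}_{d \sim \mu_d}[\Phi(u, h(u, d))] \triangleq \tilde{\Phi}(u)$

$$
\begin{aligned}
\nabla \tilde{\Phi}(u_k)
&= \mathbb{E}_{d \sim \mu_d}\big[\nabla \Phi(u_k, h(u_k, d))\big] \\
&= \mathbb{E}_{d \sim \mu_d}\big[\nabla_u \Phi(u_k, h(u_k, d)) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big|_{p = h(u_k, d)}\big] \\
&= \mathbb{E}_{(p, d) \sim \gamma_{ss}(u_k)}\big[\nabla_u \Phi(u_k, p) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big]
\end{aligned}
$$

Notes on the derivation:
- First equality: conditions on $\Phi$ allow swapping $\nabla$ and $\mathbb{E}$.
- Second equality: chain rule and law of total derivative.
- Third equality: $\gamma_{ss}(u_k)$ is the steady state induced by $d \sim \mu_d$ and $p \sim h(u, \cdot)_{\#}\, \mu_d$.

Challenges:
- The expectation $\mathbb{E}$ (an integral) is hard to evaluate.
- There is no access to the steady state.

Online decision-making: use current samples from $\mu_k$.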
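As a rough illustration of the gradient formula above, the sketch below estimates $\nabla \tilde{\Phi}(u)$ by averaging the per-sample term $\nabla_u \Phi + \nabla_u h\, \nabla_p \Phi$ over draws of $d$. Everything specific here is an assumption made for the example and not taken from the slides: the linear map `h(u, d) = A u + d`, the quadratic cost, the Gaussian choice for $\mu_d$, and all function names. It also assumes that $h$ and samples of $d$ are available, which is exactly what the challenges listed above say is not the case in the online setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem data (assumed for illustration only).
A = np.array([[0.5, 0.1], [0.0, 0.3]])
p_ref = np.array([1.0, -1.0])

def h(u, d):
    """Assumed steady-state map: p = h(u, d) = A u + d."""
    return A @ u + d

def phi(u, p):
    """Assumed cost: Phi(u, p) = 0.5*||u||^2 + 0.5*||p - p_ref||^2."""
    return 0.5 * u @ u + 0.5 * (p - p_ref) @ (p - p_ref)

def grad_u_phi(u, p):
    return u

def grad_p_phi(u, p):
    return p - p_ref

def grad_u_h(u, d):
    # Convention assumed here: transposed Jacobian of h w.r.t. u,
    # so that it can left-multiply grad_p Phi as in the formula.
    return A.T

def sample_gradient(u, d):
    """Per-sample term: grad_u Phi(u, p) + grad_u h(u, d) grad_p Phi(u, p), p = h(u, d)."""
    p = h(u, d)
    return grad_u_phi(u, p) + grad_u_h(u, d) @ grad_p_phi(u, p)

def estimate_gradient(u, d_samples):
    """Monte Carlo average of the per-sample term over draws of d."""
    return np.mean([sample_gradient(u, d) for d in d_samples], axis=0)

def empirical_objective(u, d_samples):
    """Sample average of Phi(u, h(u, d)), the empirical version of Phi-tilde."""
    return np.mean([phi(u, h(u, d)) for d in d_samples])

def finite_difference_gradient(u, d_samples, eps=1e-5):
    """Central differences on the empirical objective, used only as a check."""
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (empirical_objective(u + e, d_samples)
                - empirical_objective(u - e, d_samples)) / (2 * eps)
    return g

u = np.array([0.2, -0.4])
d_samples = rng.normal(loc=0.0, scale=1.0, size=(2000, 2))  # assumed mu_d

grad_mc = estimate_gradient(u, d_samples)
grad_fd = finite_difference_gradient(u, d_samples)
print("chain-rule estimate:", grad_mc)
print("finite differences: ", grad_fd)
```

On the same set of samples the two estimates should agree up to rounding, since the chain-rule expression is the exact gradient of the empirical objective in this toy problem.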