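[Figure: agent-system interaction loop. In state s_t with reward r_t, the agent applies action a_t; the system returns the next state s_{t+1} and reward r_{t+1}.]

Gas flow model: unsteady, non-isothermal, one-dimensional flow in a pipe, written for the pressure p, flow rate q, and temperature T.

Momentum:
$$\frac{\partial p}{\partial x} = -\frac{32\,\lambda\,\rho_{st}^{2}\,zRT}{\pi^{2} D_{in}^{5}}\,\frac{q\,|q|}{p} - \frac{4\,\rho_{st}}{\pi D_{in}^{2}}\,\frac{\partial q}{\partial t} - \frac{16\,\rho_{st}^{2} R}{\pi^{2} D_{in}^{4}}\,\frac{\partial}{\partial x}\!\left(\frac{zTq^{2}}{p}\right) - \frac{g\,p}{zRT}\,\frac{dH}{dx}$$

Continuity:
$$\frac{\partial p}{\partial t} = -\frac{4\,\rho_{st}\,zRT}{\pi D_{in}^{2}}\,\frac{\partial q}{\partial x} + \frac{p}{zT}\,\frac{\partial (zT)}{\partial t}$$

Energy:
$$\frac{\partial T}{\partial t} + \frac{\rho_{st}\,zRT\,q}{F p}\,\frac{\partial T}{\partial x} = -\frac{\pi K_{ht} D_{in}\,zRT\,(T - T_{env})}{c_p F p} + \frac{\rho_{st}\,R^{2} z T^{3} q}{c_p F p^{2}}\left(\frac{\partial z}{\partial T}\right)_{p}\frac{\partial p}{\partial x} + \frac{RT}{c_p\,p}\left(z + T\left(\frac{\partial z}{\partial T}\right)_{p}\right)\frac{\partial p}{\partial t} - \frac{\rho_{st}\,g\,q\,zRT}{c_p F p}\,\frac{dH}{dx}$$

Subscripts: in = inner (pipe diameter), st = standard conditions, env = environment, ht = heat transfer.

Basic notation:
I: set of active elements (edges of the graph that defines the network structure);
G: set of nodes at which products are delivered to consumers;
vectors of gas pressure and temperature at the nodes of the graph;
vector of gas flow in the edges of the graph;
vector of power consumption of the active elements;
unit monetary or energy cost of the active elements.

Control actions:
Discrete: start / stop of equipment
Continuous: the speed of rotation of the shafts

1) Determine the optimal strategy for moving from the current product delivery plan to the maximum possible delivery.
Approach: common RL with shaped rewards.
Objective: delivery → max, subject to two two-sided operating limits (min ≤ value ≤ max) and one upper limit (value ≤ max).

2) Determine the optimal strategy for switching from the current operating mode of the system to the specified target plan for the supply of products (gas) to consumers.
Approach: multi-goal RL with sparse rewards.
Goal: for every consumer node in G, the delivery lies within its target band (lower bound ≤ delivery ≤ upper bound), subject to the same two two-sided operating limits and one upper limit.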
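The difference between the two reward schemes named above (shaped rewards for the maximum-delivery task, sparse goal-conditioned rewards for the target-plan task) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the penalty value, the use of total delivery as the shaping signal, and the 0 / -1 sparse reward convention are not taken from the slides.

```python
from typing import Sequence


def within_limits(values: Sequence[float],
                  lower: Sequence[float],
                  upper: Sequence[float]) -> bool:
    """True if every value lies inside its [lower, upper] band."""
    return all(lo <= v <= hi for v, lo, hi in zip(values, lower, upper))


def shaped_reward(delivery: Sequence[float],
                  prev_delivery: Sequence[float],
                  feasible: bool,
                  penalty: float = 1.0) -> float:
    """Dense (shaped) reward for task 1 (hypothetical form).

    Pays the increase in total delivery at every step and returns a fixed
    penalty when an operating limit is violated.
    """
    if not feasible:
        return -penalty
    return sum(delivery) - sum(prev_delivery)


def sparse_goal_reward(delivery: Sequence[float],
                       goal_low: Sequence[float],
                       goal_high: Sequence[float],
                       feasible: bool) -> float:
    """Sparse reward for task 2 (hypothetical form).

    Returns 0 only when every consumer node is inside its target band and
    all operating limits hold; otherwise returns -1.
    """
    reached = feasible and within_limits(delivery, goal_low, goal_high)
    return 0.0 if reached else -1.0


# Usage with made-up numbers: two nodes, one pressure-like limit check.
feasible = within_limits([5.2, 4.8], [4.0, 4.0], [6.0, 6.0])
r_shaped = shaped_reward([10.0, 7.0], [9.0, 7.0], feasible)
r_goal = sparse_goal_reward([10.0, 7.0], [9.5, 6.5], [10.5, 7.5], feasible)
r_miss = sparse_goal_reward([10.0, 8.0], [9.5, 6.5], [10.5, 7.5], feasible)
```

With the shaped reward the agent receives a signal after every step, whereas the sparse reward only distinguishes "target plan reached" from "not reached", which is why the second task is framed as multi-goal RL: the target band is passed in as a goal that can change between episodes.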